Isolate Hologres Resources to Prevent Mixed Workload Contention

Hologres lets you run ETL pipelines, BI dashboards, and online recommendation services on a single instance. When multiple teams share the same instance, workloads compete for resources—ETL peaks can slow dashboards, and large ad-hoc queries can trigger out-of-memory (OOM) errors for other users. This guide explains how to use workload isolation, scheduled scaling, Serverless Computing, and query queues together to balance stability and cost across mixed workloads.

Scenario overview

The strategies in this guide are illustrated with a real-world e-commerce scenario in which three teams share one Hologres instance.

Team	Tasks	Characteristics
Data team	Real-time and near-real-time ETL using Flink and DataWorks Data Integration; near-real-time ETL using Dynamic Table; batch ETL using MaxCompute and Hologres	ETL peaks at night; batch ETL runs in early morning
Data analysts	BI dashboards for sales data; self-service analytics	BI traffic peaks during work hours, with occasional night surges; self-service queries can be resource-intensive
Recommendation team	Real-time product recommendations using primary key lookups	Traffic peaks every evening

How resource management features interact

Before choosing a strategy, understand how Hologres resource management features interact. Some combinations are mutually exclusive.

Feature	Works with virtual warehouses	Works with Serverless Computing	Works with query queues
Fixed Plan	Scale up only	Not supported	Not supported
Serverless Computing	Overflow from VW	Direct routing	Route queue to serverless
Adaptive Serverless Computing	Small tasks stay in VW	Large tasks routed automatically	Compatible
Auto-scaling	Scale out clusters	N/A	Compatible
Scheduled scaling	Scale up/down on schedule	N/A	Compatible

Important

Requests optimized by Fixed Plan cannot use Serverless Computing or query queues. Address peak loads for these requests by scaling up the virtual warehouse. Write peaks cannot be handled by auto-scaling.

Choose a resource management strategy

Use the following decision tree to select a strategy.

Extremely large workloads (massive data backfills, full table scans, joins across 10+ tables, deeply nested subqueries): use Serverless Computing to prevent these tasks from affecting other workloads. Enable Adaptive Serverless Computing for automatic routing.
Fixed Plan-optimized requests: scale up the virtual warehouse to handle peaks. Serverless Computing and query queues are not available for these requests.
All other requests: choose based on performance requirements, peak patterns, and request type. See the challenges below.
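
The three rules above can be sketched as a small function. This is only an illustration of the decision order, not a Hologres API: the `Request` type, its fields, and the returned strings are hypothetical names. The sketch checks Fixed Plan first, on the assumption that a Fixed Plan-optimized request can never be sent to Serverless Computing regardless of its size.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical description of a workload; not a Hologres object."""
    uses_fixed_plan: bool   # request is optimized by Fixed Plan
    extremely_large: bool   # e.g. massive backfill, full table scan, 10+ table join


def choose_strategy(req: Request) -> str:
    # Fixed Plan requests cannot use Serverless Computing or query queues,
    # so scaling up the virtual warehouse is the only lever.
    if req.uses_fixed_plan:
        return "scale up the virtual warehouse"
    # Extremely large workloads are kept away from other workloads.
    if req.extremely_large:
        return "Serverless Computing"
    # Everything else depends on performance needs and peak patterns.
    return "see Challenges and solutions"


if __name__ == "__main__":
    print(choose_strategy(Request(uses_fixed_plan=False, extremely_large=True)))
```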

Challenges and solutions

Challenge 1: Resource contention between teams

Problem: ETL pipelines and query workloads compete for the same computing resources, causing interference.

Solution: Deploy multiple virtual warehouses—one per team—to isolate workloads. See Architecture of virtual warehouses.

Example:

Primary virtual warehouse (init_warehouse): data team, for writes and ETL
Read-only virtual warehouse 1: data analysts
Read-only virtual warehouse 2: recommendation team

Challenge 2: Fixed peak hours

Problem: Resource demand follows a predictable daily pattern, making it inefficient to maintain peak capacity at all times.

Solution: Use scheduled scaling (beta) to scale virtual warehouses up before peak hours and scale them back down afterward. If resource expansion is needed for fewer than 16 hours per day, scheduled scaling costs less than maintaining dedicated resources around the clock.
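
The 16-hour threshold can be checked with simple arithmetic. The sketch below assumes that temporarily scaled resources cost 1.5 times the hourly price of dedicated resources. That multiplier is an assumption chosen only because it reproduces the 16-hour break-even stated above (24 / 1.5 = 16); check current pricing before relying on it. The function names and the unit price are hypothetical.

```python
# ASSUMPTION: scaled-up capacity costs 1.5x the dedicated hourly price.
# This value is inferred from the 16-hour break-even, not taken from a price list.
ELASTIC_PRICE_MULTIPLIER = 1.5


def daily_cost_dedicated(extra_cu: float, unit_price: float) -> float:
    """Cost of keeping the extra capacity provisioned for all 24 hours."""
    return extra_cu * unit_price * 24


def daily_cost_scheduled(extra_cu: float, unit_price: float, hours: float) -> float:
    """Cost of adding the extra capacity only for `hours` per day."""
    return extra_cu * unit_price * ELASTIC_PRICE_MULTIPLIER * hours


def scheduled_is_cheaper(hours: float, extra_cu: float = 32, unit_price: float = 1.0) -> bool:
    return daily_cost_scheduled(extra_cu, unit_price, hours) < daily_cost_dedicated(extra_cu, unit_price)


if __name__ == "__main__":
    for hours in (8, 16, 20):
        print(hours, scheduled_is_cheaper(hours))
```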

Example:

Team	Peak pattern	Scheduled scaling configuration
Data team (real-time ETL)	Nightly	Scale up the primary virtual warehouse in the evening; scale down in the morning. Because real-time ETL uses Fixed Plan, scaling up is the only option.
Data analysts	Work hours, with occasional night surges	Scale up the read-only virtual warehouse at the start of the workday; scale down in the evening. Handle night surges with auto-scaling (see Challenge 6).
Recommendation team	Every evening	Scale up the read-only virtual warehouse before the evening peak; scale down overnight. Point queries use Fixed Plan, so scaling up is the only option.

Challenge 3: Large tasks causing OOM errors or blocking other tasks

Problem: Large tasks consume excessive resources, triggering OOM errors or blocking smaller tasks for extended periods.

The following workload types are commonly affected.

Batch ETL

A single batch task often requires significant resources and can block the queue for a long time.

Priority	Solution
Stability	Run all batch ETL tasks on Serverless Computing. Configure this at the SQL or user level. See Run read and write tasks with Serverless Computing resources.
Balance stability and cost	Run small tasks on the primary virtual warehouse; route large tasks to Serverless Computing.

Near-real-time ETL with Dynamic Table

Dynamic Table runs an incremental refresh every minute per table. The compute cost per refresh varies with incremental data volume, making it unpredictable.

Priority	Solution
Stability	Run all Dynamic Table refresh tasks on Serverless Computing. See Create dynamic table.
Balance stability and cost	Route refresh tasks for large tables or tables with significant data fluctuations to Serverless Computing; run other tasks on the primary virtual warehouse.
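
One way to decide which refresh tasks count as "large" or "fluctuating" is to look at the incremental row counts of recent refreshes. The sketch below is a heuristic under stated assumptions: the thresholds (`large_rows`, `max_cv`) and the function name are hypothetical, and the row counts are assumed to come from your own monitoring data.

```python
from statistics import mean, pstdev
from typing import Sequence


def route_refresh(rows_per_refresh: Sequence[int],
                  large_rows: int = 1_000_000,
                  max_cv: float = 1.0) -> str:
    """Suggest where a Dynamic Table refresh should run.

    rows_per_refresh: incremental row counts from recent refreshes.
    large_rows, max_cv: hypothetical thresholds; tune them for your workload.
    """
    if not rows_per_refresh:
        return "primary virtual warehouse"
    avg = mean(rows_per_refresh)
    # Coefficient of variation: spread relative to the average volume.
    cv = pstdev(rows_per_refresh) / avg if avg > 0 else 0.0
    if avg >= large_rows or cv >= max_cv:
        return "Serverless Computing"
    return "primary virtual warehouse"


if __name__ == "__main__":
    print(route_refresh([1000, 1100, 900]))        # steady, small increments
    print(route_refresh([100, 100, 100, 50000]))   # occasional large bursts
```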

BI dashboards

Many dashboards run simultaneously, and large queries can block smaller ones.

Priority	Solution
Balance stability, cost, and setup effort	Enable Adaptive Serverless Computing at the database or user level. Large tasks are automatically routed to Serverless Computing; small tasks remain in the virtual warehouse.
Balance stability and cost	Route specific large queries to Serverless Computing using SQL fingerprints, following the steps listed after this table.
Cost savings	Run all queries in the virtual warehouse and enable large query auto-rerun for the query queue. Timed-out and OOM queries automatically rerun on Serverless Computing without affecting the user experience. See Large query control.

Route specific large queries to Serverless Computing using SQL fingerprints:

Run the dashboard to ensure Hologres processes all requests.
In slow query logs, filter by cpu_time_ms to identify large tasks and extract the digest field (SQL fingerprint). See Get and analyze slow query logs.
Create a query queue for these SQL fingerprints. See Configure classifier matching rules.
Configure the query queue to run on Serverless Computing. See Run query queue queries using Serverless Computing resources.
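
The log-filtering step of the SQL fingerprint approach can be sketched offline. The example assumes the slow query log has been exported to a list of dictionaries that carry the `cpu_time_ms` and `digest` fields named above. The function name, the 60-second threshold, and the sample digests are hypothetical.

```python
from typing import Dict, Iterable, List


def large_query_digests(rows: Iterable[Dict],
                        cpu_time_threshold_ms: int = 60_000) -> List[str]:
    """Return SQL fingerprints of large queries, heaviest first.

    rows: exported slow-query-log records with `cpu_time_ms` and `digest`.
    cpu_time_threshold_ms: hypothetical cut-off for a "large" query.
    """
    worst: Dict[str, int] = {}
    for row in rows:
        if row["cpu_time_ms"] >= cpu_time_threshold_ms:
            digest = row["digest"]
            # Keep the highest CPU time observed for each fingerprint.
            worst[digest] = max(worst.get(digest, 0), row["cpu_time_ms"])
    return sorted(worst, key=worst.get, reverse=True)


if __name__ == "__main__":
    sample = [
        {"digest": "md5aaa", "cpu_time_ms": 120_000},
        {"digest": "md5bbb", "cpu_time_ms": 500},
        {"digest": "md5ccc", "cpu_time_ms": 300_000},
        {"digest": "md5aaa", "cpu_time_ms": 90_000},
    ]
    print(large_query_digests(sample))
```

The returned fingerprints are the candidates for the query queue's classifier rule.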

Challenge 4: Sudden large ad-hoc queries destabilize the instance

Problem: Sporadic analytical queries create unpredictable, high resource usage that affects overall instance stability.

Priority	Solution
Stability	Route all ad-hoc analytical requests to Serverless Computing at the user level. See Configure at the user level.
Balance stability and cost	Enable Adaptive Serverless Computing at the user level. Large tasks are routed to Serverless Computing automatically; small tasks run in the virtual warehouse.
Cost savings	Run all requests in the virtual warehouse and enable large query auto-rerun. See Large query control.

Challenge 5: Different performance requirements across dashboard users

Problem: BI dashboards are accessed by different roles—data developers, operations staff, sales teams, and senior management—with varying performance expectations.

High-performance requirements

Priority	Solution
Stability	Create a dedicated virtual warehouse for high-performance requests.
Balance stability and cost	Route all requests through Serverless Computing, or enable Adaptive Serverless Computing.

Latency-tolerant workloads

Request pattern	Solution
Fixed (a role consistently accesses the same dashboard)	Configure manual throttling: run a stress test to determine the virtual warehouse's read capacity, then set a fixed concurrency limit for the query queue. See Create a query queue.
Variable (multiple dashboards accessed by different roles)	Enable automatic throttling. Hologres automatically adjusts the query queue's concurrency limit based on current workload. See Automatic throttling for query queues (beta).

Challenge 6: Unexpected request surges

Problem: Sudden, unpredictable traffic spikes, such as an unexpected surge of nighttime queries, fall outside any scheduled scaling window. Serverless Computing alone cannot absorb these surges either, because the users, tables, and SQL statements involved are not known in advance and so cannot be routed ahead of time.

Solution: Enable auto-scaling. Virtual warehouses automatically scale out during peak loads and scale in when demand subsides. See Multi-cluster and auto scaling (beta).

Advanced settings

Configure task priorities for Serverless Computing

When multiple services share Serverless Computing resources, set priorities at the user level to control which tasks run first under resource contention. See Set priorities for Serverless Computing tasks.

Example:

User type	Priority	Behavior when resources are scarce
Roles with high performance requirements	5	Executed first
Batch ETL tasks	1	Wait in queue
All other tasks	3 (default)	Standard scheduling
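
The effect of the priorities in the table can be illustrated with a small priority queue. This is a toy model of the ordering only, not the Hologres scheduler: the task names and the `execution_order` function are hypothetical, and the sketch assumes that tasks with equal priority run in arrival order.

```python
import heapq
from itertools import count
from typing import Iterable, List, Optional, Tuple

DEFAULT_PRIORITY = 3  # default priority from the table above


def execution_order(tasks: Iterable[Tuple[str, Optional[int]]]) -> List[str]:
    """Order queued tasks: higher priority first, then arrival order.

    tasks: (name, priority) pairs; priority None means "use the default".
    """
    arrival = count()
    heap = []
    for name, priority in tasks:
        p = DEFAULT_PRIORITY if priority is None else priority
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(heap, (-p, next(arrival), name))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


if __name__ == "__main__":
    queued = [("batch_etl", 1), ("adhoc_report", None), ("management_dashboard", 5)]
    print(execution_order(queued))
```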

Configure daily quotas for Serverless Computing

If multiple services use Serverless Computing, costs can be unpredictable. Set daily quotas at the instance level and per user to control costs. See Daily Usage Limit.
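
The interaction between the two quota levels can be sketched as a single check: a request fits only if it stays within both the user's quota and the instance's quota. The function name, the parameters, and the usage unit are hypothetical, and the sketch does not model what Hologres does with a request that exceeds a quota.

```python
def fits_daily_quota(request_usage: float,
                     user_used: float, user_quota: float,
                     instance_used: float, instance_quota: float) -> bool:
    """Return True if the request stays within both daily quotas.

    All amounts share one hypothetical usage unit. How an over-quota
    request is handled is intentionally not modeled here.
    """
    within_user = user_used + request_usage <= user_quota
    within_instance = instance_used + request_usage <= instance_quota
    return within_user and within_instance


if __name__ == "__main__":
    # The user still has headroom, but the instance-level quota is nearly spent.
    print(fits_daily_quota(10, user_used=20, user_quota=100,
                           instance_used=495, instance_quota=500))
```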

Enable high availability for read-only virtual warehouses

For read-only virtual warehouses that serve latency-sensitive workloads—especially online recommendation services—configure multiple shard-level replicas. If a query node fails, queries continue without data loss. See Concurrent queries with shard replicas.