Terms and usage of the resource isolation feature of StarRocks - E-MapReduce

This topic describes the resource isolation feature of StarRocks, including its terms and usage.

Limits

The resource isolation feature is available for E-MapReduce (EMR) clusters of V5.7.0 or a later minor version.

Feature overview

The resource isolation feature isolates the resources used by queries that different tenants run in a cluster. This limits the resources that each query can consume and improves resource utilization in the cluster.

When you start a data query job, StarRocks selects a classifier for the job based on the job information. The classifier that is most suitable for the job is selected and the resource group to which the classifier belongs provides resources for the job.

Terms

resource group

A resource group is allocated a specific amount of CPU and memory resources from a backend node. A resource group can be associated with one or more classifiers. The amount of resources allocated to a resource group is determined by the following parameters:

cpu_core_limit: the proportion of the number of CPU cores allocated to a resource group to the total number of CPU cores on the backend node. The value is a positive integer. StarRocks specifies the time periods during which CPU resources in execution threads and I/O threads can be used in each resource group based on the value of the cpu_core_limit parameter.
For example, a backend node on an Elastic Compute Service (ECS) instance with a 16-core CPU has the following resource groups: rg1, rg2, and rg3. The values of the cpu_core_limit parameter for these resource groups are 2, 6, and 8. If all three resource groups have loads, the number of CPU cores allocated to each resource group is calculated based on the following formulas: Number of CPU cores of the backend node × (2/16), Number of CPU cores of the backend node × (6/16), and Number of CPU cores of the backend node × (8/16). If rg1 and rg2 have loads and rg3 has no requests, the idle share of rg3 is divided between rg1 and rg2, and the number of CPU cores allocated to rg1 and rg2 is calculated based on the following formulas: Number of CPU cores of the backend node × (2/8) and Number of CPU cores of the backend node × (6/8).
mem_limit: the upper limit on the proportion of the backend node's query memory (the value of the query_pool parameter) that the resource group can use for data queries. Valid values: 0.0 to 1.0.
For information about how to view the value of the query_pool parameter, see Manage memory resources.
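The cpu_core_limit arithmetic in the example above can be checked with plain SQL. The following sketch only evaluates the formulas for a 16-core backend node with limits of 2, 6, and 8. It does not read any StarRocks state, and the column aliases are illustrative.

```sql
-- All three resource groups have loads: each limit is divided by the sum 2 + 6 + 8 = 16.
SELECT 16 * 2 / (2 + 6 + 8) AS rg1_cores,  -- 2 cores
       16 * 6 / (2 + 6 + 8) AS rg2_cores,  -- 6 cores
       16 * 8 / (2 + 6 + 8) AS rg3_cores;  -- 8 cores

-- rg3 has no requests: only the limits of rg1 and rg2 are summed (2 + 6 = 8).
SELECT 16 * 2 / (2 + 6) AS rg1_cores,  -- 4 cores
       16 * 6 / (2 + 6) AS rg2_cores;  -- 12 cores
```

Because the limits are weights relative to each other, doubling every cpu_core_limit value would leave these shares unchanged.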

classifier

A classifier is used to match a data query job. The classifier that is most suitable for the job is selected and the resource group to which the classifier belongs provides resources for the job. A resource group can be associated with one or more classifiers.
A classifier can contain the following conditions:
- user: the username.
- role: the role of the user.
- query_type: the query type. Only SELECT queries are supported.
- source_ip: the IP address from which the query is initiated. The value is a CIDR block.
Classifiers and data query jobs are matched based on the following rules:
- The job information meets the conditions of a classifier.
- If the job information meets the conditions of multiple classifiers, StarRocks calculates and compares the matching scores. Only the classifier that has the highest matching score is selected. The matching score is calculated based on the following rules:
  - If the username of the job meets the condition, one point is added to the score.
  - If the role of the user meets the condition, one point is added to the score.
  - If the query type meets the condition, the following points are added to the score: 1 + 1/Number of query types specified in the query_type parameter of the classifier.
  - If the IP address from which the query is initiated meets the condition, the following points are added to the score: 1 + (32 - cidr_prefix)/64.
Examples:
- If a classifier requires more conditions, the classifier has a higher score.
```
-- In this example, Classifier B that requires more conditions has a higher score. 
classifier A (user='Alice')
classifier B (user='Alice', source_ip = '192.168.**.**/24')
```
- If two classifiers require the same number of conditions, the classifier whose conditions are more specific has the higher score.
```
-- In this example, Classifier B requires a smaller range of IP addresses, and therefore has a higher score. 
classifier A (user='Alice', source_ip = '192.168.**.**/16')
classifier B (user='Alice', source_ip = '192.168.**.**/24')

-- In this example, Classifier C supports fewer query types, and therefore has a higher score. 
classifier C (user='Alice', query_type in ('select'))
classifier D (user='Alice', query_type in ('insert','select', 'ctas'))
```
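As a rough check of the scoring rules, the scores for two of the classifier pairs above can be worked out with plain SQL arithmetic. This sketch assumes that every listed condition matches the job, that the query_type term divides by the number of query types listed (which is how the example above describes it), and that the division operator returns a fractional result, as it does in MySQL-compatible dialects. The masked addresses do not affect the result because only the CIDR prefix length is used.

```sql
-- Classifier A (user) versus classifier B (user, source_ip with a /24 prefix).
SELECT 1                      AS score_a,  -- user: 1
       1 + 1 + (32 - 24) / 64 AS score_b;  -- user: 1, source_ip: 1 + 8/64 = 1.125, total 2.125

-- Classifier C (user, one query type) versus classifier D (user, three query types).
SELECT 1 + 1 + 1 / 1 AS score_c,  -- user: 1, query_type: 1 + 1/1 = 2, total 3
       1 + 1 + 1 / 3 AS score_d;  -- user: 1, query_type: 1 + 1/3, total about 2.33
```

In both pairs the ordering agrees with the examples: B scores higher than A, and C scores higher than D.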

Usage

Enable the resource group feature

To use the resource group feature, you must set a session variable to enable the pipeline engine.

-- Enable the pipeline engine and the resource group feature for the current session:
SET enable_pipeline_engine = true;
-- Enable the pipeline engine and the resource group feature globally:
SET GLOBAL enable_pipeline_engine = true;

Note

By default, the resource group feature is enabled in StarRocks V3.1.0 or later. The enable_resource_group session variable is deprecated.
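To confirm that the setting took effect, you can inspect the variable from the same session. This is a minimal sketch that assumes a MySQL-compatible client connection to the cluster. SHOW VARIABLES is standard MySQL syntax that StarRocks also accepts.

```sql
-- Check the value of the variable in the current session.
SHOW VARIABLES LIKE 'enable_pipeline_engine';
```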

Create resource groups and classifiers

Syntax

CREATE RESOURCE GROUP group_name
TO (
    user='string',
    role='string',
    query_type in ('select'),
    source_ip='cidr'
) -- Create classifiers. Separate multiple classifiers with commas (,). 
WITH (
    "cpu_core_limit" = "INT",
    "mem_limit" = "m%",
    "concurrency_limit" = "INT",
    "type" = "normal" -- The type of the resource group. The value is fixed to normal. 
);

Example

CREATE RESOURCE GROUP rg1
TO
    (user='rg1_user1', role='rg1_role1', query_type in ('select'), source_ip='192.168.**.**/24'),
    (user='rg1_user2', query_type in ('select'), source_ip='192.168.**.**/24'),
    (user='rg1_user3', source_ip='192.168.**.**/24'),
    (user='rg1_user4'),
    (db='db1')
WITH (
    'cpu_core_limit' = '10',
    'mem_limit' = '20%',
    'type' = 'normal',
    'big_query_cpu_second_limit' = '100',
    'big_query_scan_rows_limit' = '100000',
    'big_query_mem_limit' = '1073741824'
);

(Optional) Specify a resource group

StarRocks uses classifiers to select a resource group that is most suitable for a job. You can also specify a resource group for a job based on your business requirements.

SET resource_group = 'group_name';

Query resource groups and classifiers

Syntax

Query all resource groups and classifiers.
```
SHOW RESOURCE GROUPS ALL;
```
Query resource groups and classifiers that match the current user.
```
SHOW RESOURCE GROUPS;
```
Query a specific resource group and classifier.
```
SHOW RESOURCE GROUP <yourResourceName>;
```

Manage quotas for resource groups and classifiers

Modify the quota for a resource group.

Syntax

ALTER RESOURCE GROUP <yourResourceName> WITH (
    'cpu_core_limit' = '10',
    'mem_limit' = '20%'
);

Create or delete classifiers.

Syntax

Create a classifier.

ALTER RESOURCE GROUP <yourResourceName> ADD CLASSIFIER[,...];

Delete specific classifiers.

ALTER RESOURCE GROUP <yourResourceName> DROP (CLASSIFIER_ID_1, CLASSIFIER_ID_2, ...);

Delete all classifiers.

ALTER RESOURCE GROUP <yourResourceName> DROP ALL;
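
As an illustration of the classifier statements above, the following sketch adds a classifier to the rg1 group from the earlier example and then removes one classifier by ID. The username rg1_user5 and the classifier ID 10001 are hypothetical. Actual IDs are expected to come from the output of SHOW RESOURCE GROUP, and the ADD form shown assumes that a classifier is written in parentheses, as in CREATE RESOURCE GROUP.

```sql
-- Add a classifier (hypothetical user; the address stays masked as elsewhere in this topic).
ALTER RESOURCE GROUP rg1 ADD (user='rg1_user5', source_ip='192.168.**.**/24');

-- Look up the classifier IDs of rg1.
SHOW RESOURCE GROUP rg1;

-- Drop one classifier by ID (10001 is a placeholder, not a real ID).
ALTER RESOURCE GROUP rg1 DROP (10001);
```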

Delete a resource group

Syntax

DROP RESOURCE GROUP <yourResourceName>;