To further optimize query performance and management flexibility, StarRocks 3.3 introduces the concept of a warehouse, which is a group of compute nodes (CNs) or backend (BE) nodes within a StarRocks instance and is responsible for executing computational logic. This topic describes how to use a warehouse in E-MapReduce (EMR) Serverless StarRocks.
Background information
In a distributed database system, rational allocation and utilization of resources is key to ensuring high system performance. StarRocks is a high-performance Massively Parallel Processing (MPP) analytic database designed for real-time analysis and processing of large amounts of data. By intelligently distributing queries and executing them in parallel, StarRocks delivers high throughput while ensuring low latency.
You can use warehouses in scenarios that include, but are not limited to, the following:
Load balancing and resource isolation: Query requests from different businesses or user groups are distributed to different warehouses to achieve load balancing and ensure that the resources of different businesses do not interfere with each other.
Performance optimization: For queries that require fast responses or for critical business operations, you can specify a high-performance warehouse for execution. Such warehouses can be provisioned with more powerful computing resources or better network conditions.
Data security and compliance: When the system handles sensitive data, the execution of specific queries must comply with relevant security standards or must be performed on specific warehouses to meet strict data protection regulations.
To support these requirements, StarRocks provides a series of configuration and management features that allow you to flexibly specify and use warehouses to optimize query and resource management policies.
Prerequisites
An EMR Serverless StarRocks instance is created. For more information, see Create an instance.
You can add warehouses only to shared-data instances of StarRocks 3.3 or later. Shared-nothing instances support only a default warehouse named default_warehouse and do not support the multi-warehouse mode.
Usage notes
By default, asynchronous tasks, such as dynamic partition creation, compaction, schema change, materialized view refresh, and statistics collection, are performed in default_warehouse. To ensure high performance and smooth user experience, we recommend that you configure at least three CNs or BE nodes in default_warehouse.
Query warehouse and node information
Query information about all warehouses
Syntax
SHOW WAREHOUSES [LIKE '<pattern>'];
[LIKE '<pattern>'] is optional and is used to match warehouses by name. <pattern> supports the following wildcards:
%: matches characters of any length, including zero characters.
_: matches a single character.
Examples of <pattern>:
LIKE 'test%': matches all warehouses whose names start with test.
LIKE '%_warehouse': matches all warehouses whose names end with _warehouse.
Examples
Query information about all warehouses.
SHOW WAREHOUSES;
Query information about warehouses whose names match the specified pattern.
SHOW WAREHOUSES LIKE 'test%';
Query information about all nodes in warehouses
The information of each node includes the name of the warehouse to which it belongs.
SHOW COMPUTE NODES;
Query information about nodes in a specific warehouse
Query information about nodes in a single warehouse.
SHOW NODES FROM WAREHOUSE <warehouse_name>;
Query information about nodes in multiple warehouses.
SHOW NODES FROM WAREHOUSES [LIKE '<pattern>'];
If you do not specify [LIKE '<pattern>'], information about nodes in all warehouses is displayed.
Use a specific warehouse to perform operations
If you do not specify a warehouse, all operations are performed in default_warehouse.
Queries
You can specify a warehouse for a query only by using a SET_VAR hint or by manually switching the warehouse in the current session.
Specify a warehouse by using SET_VAR
In StarRocks, you can use SET_VAR hints to dynamically specify the warehouse used by the current query. This method is suitable for scenarios where you need to temporarily switch the warehouse. It does not affect other queries or session configurations. For more information about SET_VAR hints, see System variables.
The following syntax is supported:
SELECT /*+SET_VAR(warehouse="<warehouse_name>")*/ * FROM <database_name>.<table_name>;
/*+SET_VAR(warehouse="<warehouse_name>")*/ is written in a fixed format and is used to dynamically configure parameters at the query level. <warehouse_name> indicates the name of the warehouse. <database_name> and <table_name> indicate the names of the database and table. Replace them with actual names.
SET_VAR hints allow you to flexibly specify a warehouse for a single query, enabling resource isolation and performance optimization. Take note of the following items:
Required version: You must use StarRocks 3.3 or later.
Syntax specifications: Ensure that the position and format of /*+SET_VAR(...)*/ are correct.
Permissions and resources: Ensure that the specified warehouse exists and you have access permissions on the warehouse.
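For example, assuming a warehouse named test exists and a table sales_db.orders is accessible (both names are placeholders for illustration), the following query is executed in that warehouse:
SELECT /*+SET_VAR(warehouse="test")*/ * FROM sales_db.orders LIMIT 10;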
Manually switch the warehouse in the current session
Specify a warehouse.
SET [session] warehouse [=] <warehouse_name>;
In the statement, session and = are optional and do not affect the semantics.
Query the warehouse in use.
SHOW variables LIKE "%warehouse%";
The specified warehouse takes effect only in the current session. default_warehouse is used if no warehouse is specified. default_warehouse is also used for asynchronous tasks, such as compaction, asynchronous materialized view refresh, and statistics collection.
Execute a query.
SELECT * FROM my_db.my_table;
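Putting these steps together, a minimal session sketch that follows the syntax above (assuming a warehouse named test exists and you have the USAGE permission on it) looks as follows:
SET warehouse = test;
SHOW variables LIKE "%warehouse%";
SELECT * FROM my_db.my_table;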
Data import operations (Load)
Stream Load
Add the warehouse information to the header of the HTTP request. The following syntax is supported:
curl --location-trusted -u <username>:<password> \
    -H "label:<label_name>" \
    -H "timeout:100" \
    -H "max_filter_ratio:1" \
    -H "warehouse:<warehouse_name>" \
    -T <file_path> \
    -XPUT http://<fe_host>:<fe_http_port>/api/<database_name>/<table_name>/_stream_load
The following table describes the parameters in the code.
| Parameter | Required | Description |
| --- | --- | --- |
| <username>:<password> | Yes | The username and password of the StarRocks instance. If no password is set for the account, you only need to enter <username>:, such as admin:. |
| label | No | The label of the import job. If you do not specify a label, StarRocks automatically generates one for the import job. Data with the same label cannot be imported repeatedly. This prevents repeated import of the same data. |
| timeout | No | The timeout period of the import job. Valid values: 1 to 259200. Unit: seconds. |
| max_filter_ratio | No | The maximum fault tolerance rate of the import job, which is the maximum proportion of data rows that can be filtered out due to unqualified data quality. Valid values: 0 to 1. |
| warehouse | Yes | The warehouse that is used to perform the Stream Load operation. |
| <file_path> | Yes | The path where the data file is located. |
| fe_host | Yes | The internal endpoint of the frontend (FE) node in the StarRocks instance. You can view the endpoint on the Instance Details page of the StarRocks instance. Note: If you submit the import job to a BE node or a CN, you must specify the IP address of the BE node or CN. You can run the SHOW COMPUTE NODES; command to view the IP address. |
| fe_http_port | Yes | The HTTP port number of the FE node in the StarRocks instance. Default value: 8030. Note: If you submit the import job to a BE node or a CN, you must specify the HTTP port number of the BE node or CN. Default value: 8040. |
| database_name | Yes | The name of the database where the destination StarRocks table resides. |
| table_name | Yes | The name of the destination StarRocks table. |
External transaction import
When you call the /api/transaction/begin interface to start a transaction, you must specify a label and add the warehouse information to the HTTP request header. The following syntax is supported:
curl --location-trusted -u <username>:<password> \
    -H "label:<label_name>" \
    -H "db:<database_name>" \
    -H "table:<table_name>" \
    -H "warehouse:<warehouse_name>" \
    -XPOST http://<fe_host>:<fe_http_port>/api/transaction/begin
After you run the preceding command, when you call the /api/transaction/load, /api/transaction/prepare, or /api/transaction/commit interface, you must use the label that was specified for the /api/transaction/begin interface, but you do not need to specify the warehouse information again in the HTTP request header.
Broker Load
LOAD LABEL [<database_name>.]<label_name>
( data_desc[, data_desc ...] )
WITH BROKER ( StorageCredentialParams )
[PROPERTIES ( opt_properties )]
Add the warehouse parameter to opt_properties in the statement that is used to create an import task. The warehouse of the current session is used by default. Sample statement:
LOAD LABEL my_db.path_extra_fields_label
(
    DATA INFILE("hdfs://<hdfs_ip>:9000/starrocks_test_data/hdfs_data/*.orc")
    INTO TABLE `my_tbl`
    FORMAT AS "orc"
)
WITH BROKER hdfs_broker
(
    "hadoop.security.authentication" = "simple",
    "username" = "<hdfs_username>",
    "password" = "<hdfs_password>"
)
PROPERTIES
(
    "warehouse" = "test"
);
Routine Load
Add the warehouse parameter to job_properties in the statement. The warehouse of the current session is used by default.
CREATE ROUTINE LOAD <database_name>.<job_name> ON <table_name>
[load_properties]
[job_properties]
FROM data_source
[data_source_properties]
Specify a warehouse named test for Routine Load to continuously consume Kafka messages and import the messages to StarRocks. Sample statement:
CREATE ROUTINE LOAD my_db.routine_load_basic_types_fe824e00_680a_11ee_81d7_00163e135aa9 ON my_table
COLUMNS (k1,k2,k3,k4,k5,k6,v1,v2,v3,v4,v5,v6,v7,v8),
COLUMNS TERMINATED BY ',',
WHERE v1 != 0
PROPERTIES
(
    "desired_concurrent_number" = "1",
    "max_error_number" = "1000",
    "max_batch_interval" = "7",
    "warehouse" = "test"
)
FROM KAFKA
(
    "kafka_broker_list" = "<kafka_broker_ip>:<kafka_broker_port>",
    "kafka_topic" = "xxx",
    "kafka_partitions" = "0",
    "kafka_offsets" = "OFFSET_BEGINNING"
);
Flink Connector
StarRocks supports native integration with Apache Flink, allowing efficient data ingestion into StarRocks via Flink. Starting from version 1.2.9 of the StarRocks Connector for Apache Flink, you can configure the sink.properties.warehouse parameter to write data into a specified warehouse, enabling more flexible resource management and performance optimization.
Use sink.properties.warehouse to specify the target warehouse:
sink.properties.warehouse=<warehouse_name>
<warehouse_name>: the name of the target warehouse. If this parameter is not specified, the system uses default_warehouse.
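For illustration, a Flink SQL sink definition that routes writes to a specific warehouse might look like the following sketch. The table schema, connection options (jdbc-url and load-url), credentials, and warehouse name are placeholders; adjust them to your instance:
CREATE TABLE starrocks_sink (
    order_id BIGINT,
    total    DOUBLE
) WITH (
    'connector' = 'starrocks',
    'jdbc-url' = 'jdbc:mysql://<fe_host>:<fe_query_port>',
    'load-url' = '<fe_host>:<fe_http_port>',
    'database-name' = 'my_db',
    'table-name' = 'my_table',
    'username' = '<username>',
    'password' = '<password>',
    -- Pass the warehouse through to the underlying load requests.
    'sink.properties.warehouse' = '<warehouse_name>'
);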
Materialized views
Add the warehouse parameter to PROPERTIES in the CREATE MATERIALIZED VIEW statement. The warehouse of the current session is used by default. The following syntax is supported:
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [database_name.]<mv_name>
[COMMENT ""]
[PROPERTIES ("key"="value", ...)]
AS
<query_statement>
For example, create a materialized view named order_mv and specify a warehouse named test to support subsequent refresh operations on the materialized view. Sample statement:
CREATE MATERIALIZED VIEW order_mv
DISTRIBUTED BY HASH(`order_id`)
REFRESH ASYNC EVERY (interval 1 MINUTE)
PROPERTIES ("warehouse" = "test")
AS SELECT
order_list.order_id,
sum(goods.price) as total
FROM order_list INNER JOIN goods ON goods.item_id1 = order_list.item_id2
GROUP BY order_id;
After the execution succeeds, you can use one of the following methods to view the warehouse information. The fe.audit.log file also records the execution status of refresh statements on the materialized view, which includes the Warehouse field.
Method 1: Run the following command to view the details of the order_mv materialized view:
SHOW materialized views WHERE name="order_mv";
Method 2: Run the following command to view the complete SQL statement for creating the order_mv materialized view:
SHOW CREATE materialized VIEW order_mv;
Compaction
Compaction is executed in default_warehouse by default. If you want to specify a different warehouse, you can modify the lake_compaction_warehouse configuration item. You can modify this configuration item dynamically at runtime, or modify it in the fe.conf file and then restart the FE node.
For example, to specify a warehouse named test for compaction:
Method 1: You can run the ADMIN SET FRONTEND CONFIG command to dynamically modify the configuration item.
ADMIN SET FRONTEND CONFIG ("lake_compaction_warehouse" = "test");
Method 2: Add lake_compaction_warehouse = test to the fe.conf file and restart the FE node for the modification to take effect.
After the modification, you can run the SHOW PROC '/compactions'; command to view the latest compaction records. The warehouse information is in the last column.
Connect to a specific warehouse
JDBC connection
The following syntax is supported:
jdbc.url=jdbc:mysql://<mysql_host>:3306/dbName?sessionVariables=warehouse=<warehouse_name>
Superset connection
Install starrocks-sqlalchemy by following the instructions in Superset Support and use init_command to specify a warehouse.
starrocks://<username>:<password>@<host>:<port>/<database_name>?init_command=set warehouse = <warehouse_name>
Specify a default warehouse for users
This feature is supported only in shared-data instances of StarRocks 3.3.8 or later.
You can specify a default warehouse (for example, sr_warehouse) for an existing user (for example, sr_user) by modifying the user properties, or specify the default warehouse when you create a user.
Method 1: Modify the properties of an existing user by using ALTER USER.
ALTER USER sr_user SET PROPERTIES ('session.warehouse' = 'sr_warehouse');
Method 2: Create a new user and specify the default warehouse by using CREATE USER.
CREATE USER 'sr_user' PROPERTIES ('session.warehouse' = 'sr_warehouse');
After you specify the default warehouse, you can view the properties of a specific user by executing the SHOW PROPERTY command.
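For example, the following statement (a sketch that assumes the sr_user account from above) shows that user's properties, including session.warehouse:
SHOW PROPERTY FOR 'sr_user';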
Grant permissions on a warehouse
You can execute the GRANT statement to grant permissions to users or roles. The following syntax is supported:
GRANT
{ USAGE }
ON { WAREHOUSE <warehouse_name> [, <warehouse_name>,...] | ALL WAREHOUSE }
TO { ROLE | USER } { <role_name> | <user_identity> } [ WITH GRANT OPTION ]
| Permission | Description |
| --- | --- |
| USAGE | Allows users to perform queries and other operations by using a specific warehouse. |
By default, a user who has the USAGE permission on a warehouse also has the SHOW permission on the warehouse.
For example, execute the following statement to grant the USAGE permission to user1:
GRANT USAGE ON warehouse waa TO user1;
For more permission-related operations, see Manage user privileges.
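Based on the syntax above, you can also grant the permission to a role instead of a user; the role name analyst_role below is a hypothetical example:
GRANT USAGE ON WAREHOUSE waa TO ROLE analyst_role;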