The StarRocks connector lets you query MaxCompute data directly from a StarRocks cluster using an external catalog—no data import or external table creation required.
Prerequisites
Before you begin, ensure that you have:
-
A StarRocks cluster (V3.2.3 or later) that can access MaxCompute
-
An exclusive resource group for Data Transmission Service (subscription)
Limitations
-
Supported table types: standard tables, partitioned tables, clustered tables, Delta tables, and materialized views
-
Not supported: foreign tables, logical views, or JSON data
-
MaxCompute projects with the schema feature enabled are not supported. See Schema-related operations.
Create a MaxCompute catalog
Log on to your StarRocks cluster and run CREATE EXTERNAL CATALOG to create a MaxCompute catalog.
Syntax
CREATE EXTERNAL CATALOG <catalog_name>
[COMMENT <comment>]
PROPERTIES
(
"type" = "odps",
CatalogParams,
ScanParams,
CachingMetaParams
)
Parameters
catalog_name
The name of the MaxCompute catalog. The name must meet the following requirements:
-
Contains only letters (a–z, A–Z), digits (0–9), or underscores (
_), and must start with a letter -
Does not exceed 1,023 characters
-
Is case-sensitive
type
Set to odps. This identifies MaxCompute as the data source type.
comment
(Optional) A description of the MaxCompute catalog.
CatalogParams
Parameters for StarRocks to connect to MaxCompute. All four connection parameters are required.
| Parameter | Description |
|---|---|
odps.endpoint |
The endpoint of the MaxCompute service. Configure this based on the region and network mode of your MaxCompute project. See Endpoints. Only Alibaba Cloud Virtual Private Cloud (VPC) networks are supported. |
odps.project |
The name of the MaxCompute project. Find it in the MaxCompute console under Manage Configurations > Projects. If your workspace is in standard mode, use the development environment name (with _dev suffix) for dev and the production name for prod. |
odps.access.id |
The AccessKey ID of your Alibaba Cloud account or RAM user. Retrieve it from the AccessKey Management page. |
odps.access.key |
The AccessKey secret that corresponds to the AccessKey ID. |
odps.tunnel.quota |
The quota name for accessing MaxCompute. Two billing options are supported: Subscription (exclusive resource groups for Data Transmission Service (subscription)) and Pay-as-you-go (open storage). Specify only one quota name. See the table below for how to find the quota name for each option. |
How to find your quota name
| Option | Steps |
|---|---|
| Subscription | In the MaxCompute console, switch to the target region, then go to Manage Configurations > Quotas. See Computing resources - Quota management. |
| Pay-as-you-go | In the MaxCompute console, switch to the target region, then go to Manage Configurations > Tenant Property and turn on Storage API Switch. |
ScanParams
(Optional) Parameters that control how StarRocks scans MaxCompute data.
| Parameter | Required | Default | Description |
|---|---|---|---|
odps.split.policy |
No | size |
The data sharding policy. size: shards by data size (default shard size: 256 MB). row_offset: shards by row count. The sharding policy affects scan concurrency—adjust it if the default does not meet your throughput requirements. |
odps.split.row.count |
No | 4,194,304 | The maximum number of rows per shard. Applies only when odps.split.policy is row_offset. Value range: any integer greater than 0. |
CachingMetaParams
(Optional) Parameters that control how StarRocks caches MaxCompute metadata.
| Parameter | Required | Default | Description |
|---|---|---|---|
odps.cache.table.enable |
No | true |
Whether to cache table metadata. Enable this for environments with frequent DDL operations, multiple concurrent users, or complex queries. Disable it when table metadata changes frequently or caching memory is constrained. |
odps.cache.table.expire |
No | 86400 |
How long (in seconds) to retain cached table metadata before deletion. Default is 86,400 seconds (24 hours). Range: any integer >= 0. |
odps.cache.table.size |
No | 1000 |
The maximum number of tables whose metadata is cached. |
odps.cache.partition.enable |
No | true |
Whether to cache partition metadata for the target table. Has no effect on non-partitioned tables. |
odps.cache.partition.expire |
No | 86400 |
How long (in seconds) to retain cached partition metadata before deletion. Default is 86,400 seconds (24 hours). Range: any integer >= 0. |
odps.cache.partition.size |
No | 1000 |
The maximum number of partitioned tables whose partition metadata is cached. |
odps.cache.table-name.enable |
No | false |
Whether to cache the names of all tables in the MaxCompute project. |
odps.cache.table-name.expire |
No | 86400 |
How long (in seconds) to retain cached table names before deletion. Default is 86,400 seconds (24 hours). Range: any integer >= 0. |
Example
The following example creates a catalog named odps_catalog for the MaxCompute project mf_mc_bj.
CREATE EXTERNAL CATALOG odps_catalog
PROPERTIES (
"type" = "odps",
"odps.access.id" = "<yourAccessKeyId>",
"odps.access.key" = "<yourAccessKeySecret>",
"odps.endpoint" = "http://service.cn-beijing.maxcompute.aliyun.com/api",
"odps.tunnel.quota" = "pay-as-you-go",
"odps.project" = "mf_mc_bj"
);
Replace the placeholder values:
| Placeholder | Description |
|---|---|
<yourAccessKeyId> |
Your Alibaba Cloud AccessKey ID |
<yourAccessKeySecret> |
Your Alibaba Cloud AccessKey secret |
Query MaxCompute data
-
Switch to the catalog and select the project. Use either of the following methods:
-
Method 1: Switch in two steps.
SET CATALOG odps_catalog; USE mf_mc_bj; -
Method 2: Switch in one step.
USE odps_catalog.mf_mc_bj;
-
-
Query a table.
SELECT * FROM src LIMIT 10;Expected output:
+------+-------+ | key | value | +------+-------+ | 1 | 1 | | 3 | 3 | | 2 | 2 | | 4 | 100 | | 5 | 200 | | 6 | 300 | | 3 | 400 | +------+-------+
Manage catalogs and metadata
View catalogs
List all catalogs in the current StarRocks cluster:
SHOW CATALOGS;
Expected output:
+-----------------+----------+------------------------------------------------------------------+
| Catalog | Type | Comment |
+-----------------+----------+------------------------------------------------------------------+
| default_catalog | Internal | An internal catalog contains this cluster's self-managed tables. |
| odps_catalog | Odps | NULL |
+-----------------+----------+------------------------------------------------------------------+
View the creation statement for a specific catalog:
SHOW CREATE CATALOG odps_catalog;
View the schema of a MaxCompute table
View the column definitions of a table:
-- Syntax
DESC[RIBE] <catalog_name>.<database_name>.<table_name>;
-- Example
DESC odps_catalog.mf_mc_bj.src;
Expected output:
+-------+------+------+-------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------+------+-------+---------+-------+
| id | INT | Yes | false | NULL | |
| a | INT | Yes | false | NULL | |
+-------+------+------+-------+---------+-------+
View the schema and creation statement of a table:
-- Syntax
SHOW CREATE TABLE <catalog_name>.<database_name>.<table_name>;
-- Example
SHOW CREATE TABLE odps_catalog.mf_mc_bj.src;
Delete a catalog
DROP CATALOG odps_catalog;
Collect statistics for the cost-based optimizer (CBO)
StarRocks does not automatically collect statistics for MaxCompute tables. Run the following command to trigger manual collection:
-- Replace <table_name> with the name of the MaxCompute table.
ANALYZE TABLE <table_name>;
The cost-based optimizer (CBO) uses statistical information to estimate execution costs and select the most efficient query plan. For details, see Gather statistics for CBO.
Refresh cached metadata
By default, StarRocks caches MaxCompute metadata to improve query performance. If you change a table schema or update data, run the following command to force a metadata refresh:
-- Replace <table_name> with the name of the MaxCompute table.
REFRESH EXTERNAL TABLE <table_name>;
Troubleshooting
"Your slot quota is exceeded" error
This error occurs when the number of concurrent read slots exceeds the quota of your Data Transmission Service resource group. When StarRocks reads data, it splits the table into shards and distributes them to backend (BE) nodes, which read data in parallel. The number of occupied slots is approximately Max(Number of shards, Number of BEs × Concurrency of the BE thread pool).
To resolve this, try one or more of the following:
-
Purchase an exclusive resource group for Data Transmission Service that supports higher concurrency. See Purchase and use an exclusive resource group for Data Transmission Service.
-
Reduce the number of shards by adjusting
ScanParams. If you use therow_offsetsharding policy, increase the value ofodps.split.row.countto merge smaller shards. -
Modify the number of threads in the StarRocks BE thread pool by changing
scanner_thread_pool_thread_numorscanner_thread_pool_queue_size. See StarRocks configuration.
Appendix: Data type mappings
| MaxCompute type | StarRocks type |
|---|---|
| BOOLEAN | BOOLEAN |
| TINYINT | TINYINT |
| SMALLINT | SMALLINT |
| INT | INT |
| BIGINT | BIGINT |
| FLOAT | FLOAT |
| DOUBLE | DOUBLE |
| DECIMAL(p, s) | DECIMAL(p, s) |
| STRING | VARCHAR(1073741824) |
| VARCHAR(n) | VARCHAR(n) |
| CHAR(n) | CHAR(n) |
| JSON | VARCHAR(1073741824) |
| BINARY | VARBINARY |
| DATE | DATE |
| DATETIME | DATETIME |
| TIMESTAMP | DATETIME (precision loss: StarRocks does not support TIMESTAMP and converts it to DATETIME) |
| ARRAY | ARRAY |
| MAP | MAP |
| STRUCT | STRUCT |