E-MapReduce: Get started with an instance with separated storage and compute resources

Last Updated: Sep 24, 2025

Instances with separated storage and compute resources use a storage-compute separation architecture and store data in Alibaba Cloud Object Storage Service (OSS). They are ideal for business scenarios that are sensitive to storage costs and have relatively low query efficiency requirements, such as online analytical processing (OLAP) multidimensional analysis and data warehouse applications. These instances are also ideal for querying data lake tables in formats such as Apache Hive, Apache Iceberg, Apache Hudi, and Apache Paimon. Supported storage systems include OSS, OSS-HDFS, and Hadoop Distributed File System (HDFS). You can query and analyze data in data lakes without migrating the data, with query performance three to five times that of Presto.

Prerequisites

  • You have registered an Alibaba Cloud account and completed identity verification.

  • If you are a Resource Access Management (RAM) user, you have been granted the AliyunEMRStarRocksFullAccess system policy. For more information, see Grant permissions to a RAM user.

    Note

    The AliyunEMRStarRocksFullAccess system policy is required for creating and managing StarRocks instances.

Precautions

The runtime environment of the code is managed and configured by the owner of the environment.

Procedure

Step 1: Create a StarRocks instance with separated storage and compute resources

  1. Go to the homepage of E-MapReduce (EMR) Serverless StarRocks.

    1. Log on to the E-MapReduce console.

    2. In the left-side navigation pane, choose EMR Serverless > StarRocks.

    3. In the top menu bar, select a region based on your business requirements.

  2. On the Instance List page, click Create Instance.

  3. On the E-MapReduce Serverless StarRocks page, configure the instance parameters.

    Configuration Item

    Example

    Description

    Product Type

    Pay-as-you-go

    Select Pay-as-you-go. For more information about billing, see Pay-as-you-go.

    Region

    China (Beijing)

    The physical location of the instance.

    Important

    You cannot change the region after the instance is created. Select a region with caution.

    Network And Zone

    • vpc_Hangzhou/vpc-bp1f4epmkvncimpgs****

    • Zone I

    • vsw_i/vsw-bp1e2f5fhaplp0g6p****

    Select a virtual private cloud (VPC), a zone, and the corresponding vSwitch.

    • Virtual Private Cloud: A VPC is an isolated network environment that you can define on Alibaba Cloud. You have full control over your VPC.

      Select an existing VPC or click Create VPC to go to the VPC console and create a VPC. For more information, see Create and manage a VPC.

      Note

      If your Serverless StarRocks instance needs to access the Internet, for example, to import data or query foreign tables, make sure that its VPC has Internet access enabled. You can deploy an Internet NAT gateway in the VPC and enable the SNAT feature. This allows the Serverless StarRocks instance to access Internet resources through the gateway. For more information, see Use the SNAT feature of an Internet NAT gateway to access the Internet.

    • Zone: The zone where the instance is located.

    • vSwitch: A vSwitch is a basic network module of a VPC that connects different cloud resources.

      Select an existing vSwitch or click Create VSwitch to go to the VPC console and create a vSwitch. For more information, see Create and manage a vSwitch.

    Instance Type

    Storage-compute separation

    Suitable for scenarios with relatively low query efficiency requirements, such as OLAP multidimensional analysis, data lake analytics, federated queries on foreign tables, real-time data analytics, and data warehouses.

    Instance Series

    Standard Edition

    Supports Starter Edition and Standard Edition. For more information, see Instance series.

    Note

    Starter Edition is available only in the China (Beijing), China (Shanghai), China (Shenzhen), and China (Hangzhou) regions.

    StarRocks Version

    3.3

    The community version number of StarRocks.

    FE Specifications

    • Specification Type: Standard

    • Compute CU: 8 CU

    • Data Disk: PL1 ESSD,

    • High Availability: Enabled by default.

    • Number of Nodes: 3

    • Load Balancing: Built-in PrivateZone

    • Specification Type: Different Instance Series values correspond to different FE node specification types.

      • Starter Edition: supports Standard Specifications.

      • Standard Edition: supports Standard Specifications and Memory-optimized Specifications.

    • Compute CU: Select a compute unit (CU).

      Select an appropriate CU specification based on your business requirements. For more information about CU fees, see Billable items.

    • Data Disk: Only PL1 ESSD is supported. The data disk size ranges from 100 GB to 65000 GB, with a step size of 100.

      For more information about cloud disks, see ESSD.

    • High Availability: Enabled by default. You can turn on high availability (HA) only if you set Instance Series to Standard Edition. When HA is turned on, the system increases the number of frontend (FE) nodes from 1 to 3 to reduce the risk of failures.

      Important

      We strongly recommend that you turn on HA in the production environment.

    • Number of Nodes: The number of FE nodes. Valid values: 1 to 11. Only odd numbers are supported.

    CN Specifications

    • Specification Type: Standard

    • Compute CU: 8 CU

    • Data Disk: PL1 ESSD, 200 GB, 1

    • Number of Nodes: 3

    • Specification Type: Different Instance Series values correspond to different CN node specification types.

      • Starter Edition: supports Standard Specifications.

      • Standard Edition: supports Standard Specifications, Memory-optimized Specifications, Network-enhanced Specifications, High-performance Storage, and Large-scale Storage specifications.

        • Standard Specifications: The default specifications. One CU is equal to 1 CPU core and 4 GiB of memory. Enterprise SSDs (ESSDs) are used for data storage.

        • Memory-optimized Specifications: One CU is equal to 1 CPU core and 8 GiB of memory. This option is suitable for scenarios in which a large amount of memory resources are required, such as scenarios where complex queries are made or scenarios where high concurrency is required. ESSDs are used for data storage.

        • Network-enhanced Specifications: One CU is equal to 1 CPU core and 4 GiB of memory. The network bandwidth is two or more times that of the standard specifications. This option is suitable for analysis of external tables that contain a large amount of data. ESSDs are used for data storage.

        • High-performance Storage: If you select this option, you must select the desired specifications based on your business requirements. This option is suitable for scenarios in which high storage I/O performance is required. Local SSDs are used for data storage.

        • Large-scale Storage: If you select this option, you must select the desired specifications based on your business requirements. Local HDDs are used for data storage. This option is suitable for scenarios in which a large volume of data needs to be stored and cost-effective storage is required, but high storage I/O performance is not required.

        Note

        High-performance Storage and Large-scale Storage are currently available only to users on the whitelist.

    • Compute CU: Select a compute unit (CU).

      Select an appropriate CU specification based on your business requirements. For more information about CU fees, see Billable items.

    • Data Disk: PL0 ESSD, PL1 ESSD (recommended), PL2 ESSD, and PL3 ESSD are supported. For more information, see ESSD.

      The cache disk size ranges from 200 GB to 65000 GB. The default number of cache disks is 1. Valid values: 1 to 8. Step size: 1.

      Note

      After you specify a disk size based on your business requirements, the system adjusts the disk size based on the value that you specified. If the specified disk size is out of the supported range, the system will adjust the value to ensure optimal performance.

    • Number of Nodes: The number of CN nodes. Valid values: 1 to 100.

    Instance Name

    Custom instance name.

    The name of the instance. The name must be 1 to 64 characters in length and can contain Chinese characters, letters, digits, hyphens (-), and underscores (_).

    Administrator

    admin

    The administrator used to manage StarRocks. The default value is admin and cannot be changed.

    Logon Password and Confirm Password

    Custom password.

    The password for the built-in administrator `admin` of the StarRocks instance. Record this password. You need it to manage and use the StarRocks instance. If you forget the password, you can reset it. For more information, see How do I reset the password of an instance?

    For more information about the instance parameters, see Create an instance.

  4. Read and accept the Terms of Service, click Create Instance, and then complete the payment as prompted.

    After you complete the payment, return to the instance management page to view the new instance. The instance is created when its Status changes to Running.

Step 2: Connect to the StarRocks instance

  1. In the Instances section, find the instance that you want to connect to and click Connect in the Actions column.

    You can use other methods to connect to a StarRocks instance. For more information, see the topics in the Instance connection directory.

  2. Connect to the StarRocks instance.

    1. On the New Connection tab of the page that appears, configure parameters. The following table describes the parameters.

      image.png

      | Parameter | Example | Description |
      | --- | --- | --- |
      | Region | China (Hangzhou) | Select the region in which the instance resides. |
      | Instance | StarRocks_Serverless | Select the desired instance. |
      | Connection Name | Connection_Serverless | The name of the connection. Enter a custom name. The name must be 1 to 64 characters in length and can contain only letters, digits, hyphens (-), and underscores (_). |
      | Username | Default username or custom username | The default initial username is admin. You can use the default username or create a username to connect to the instance. For information about how to create a user, see User management and data authorization. |
      | Password | Custom password | The password of the username that is created for the instance. |

    2. Click Test Network Connectivity.

    3. After the network connectivity test is successful, click OK.

      In the left-side navigation pane, click SQL Editor. You can then create a query on the Queries tab and run it. For more information, see Use EMR StarRocks Manager to connect to an EMR Serverless StarRocks instance.
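      Once a query file is open in the SQL Editor (Step 3 shows how to create one), a quick sanity check is to list the nodes of the instance and compare them with the FE and CN counts configured in Step 1. The statements below are a minimal sketch, not part of the original procedure. It assumes that the connected user has cluster-level privileges; a custom user with fewer privileges may not be allowed to run them.

```sql
-- List the frontend (FE) nodes. With High Availability turned on,
-- the example configuration in Step 1 uses 3 FE nodes.
SHOW FRONTENDS;

-- List the compute nodes (CNs). The example configuration in Step 1 uses 3 CNs.
SHOW COMPUTE NODES;
```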

Step 3: Run an SQL query

  1. On the Query List page of the SQL Editor, click the icon for creating a file.

  2. In the Create File dialog box, click OK.

  3. In the new file, enter the following commands. Select all the commands and click Run.

    /**Create a database**/
    CREATE DATABASE IF NOT EXISTS load_test;
    
    /**Use the database**/
    USE load_test;
    
    /**Create a table**/
    CREATE TABLE insert_wiki_edit
    (
        event_time DATETIME,
        channel VARCHAR(32) DEFAULT '',
        user VARCHAR(128) DEFAULT '',
        is_anonymous TINYINT DEFAULT '0',
        is_minor TINYINT DEFAULT '0',
        is_new TINYINT DEFAULT '0',
        is_robot TINYINT DEFAULT '0',
        is_unpatrolled TINYINT DEFAULT '0',
        delta INT SUM DEFAULT '0',
        added INT SUM DEFAULT '0',
        deleted INT SUM DEFAULT '0'
    )
    AGGREGATE KEY(event_time, channel, user, is_anonymous, is_minor, is_new, is_robot, is_unpatrolled)
    PARTITION BY RANGE(event_time)
    (
        PARTITION p06 VALUES LESS THAN ('2015-09-12 06:00:00'),
        PARTITION p12 VALUES LESS THAN ('2015-09-12 12:00:00'),
        PARTITION p18 VALUES LESS THAN ('2015-09-12 18:00:00'),
        PARTITION p24 VALUES LESS THAN ('2015-09-13 00:00:00')
    )
    DISTRIBUTED BY HASH(user) BUCKETS 10
    PROPERTIES("replication_num" = "1");
    
    /**Insert data**/
    INSERT INTO insert_wiki_edit VALUES
        ("2015-09-12 00:00:00","#en.wikipedia","GELongstreet",0,0,0,0,0,36,36,0),
        ("2015-09-12 00:00:00","#ca.wikipedia","PereBot",0,1,0,1,0,17,17,0);
    
    /**Query data**/
    SELECT * FROM insert_wiki_edit;

The following figure shows the output.

image.png

Note

You can run the SHOW CREATE TABLE load_test.insert_wiki_edit; command and check the result for the datacache.enable property to confirm that the storage-compute separated instance is working correctly. After the database and table for the storage-compute separation architecture are created, you can also find the new database and table directories in the OSS bucket.

image
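The table created in Step 3 uses the aggregate key model: rows that share the same values in all AGGREGATE KEY columns are merged, and the columns declared with SUM are added together. The following sketch illustrates this behavior. It assumes that the Step 3 script was run exactly once, so the table holds one row for the user GELongstreet. The inserted values (4, 6, 2) are arbitrary illustration values, and running the sketch changes the data in the table.

```sql
USE load_test;

-- Insert one more row whose key columns match the existing #en.wikipedia row.
INSERT INTO insert_wiki_edit VALUES
    ("2015-09-12 00:00:00", "#en.wikipedia", "GELongstreet", 0, 0, 0, 0, 0, 4, 6, 2);

-- The query still returns a single row for this key.
-- The SUM columns are merged: delta = 36 + 4 = 40, added = 36 + 6 = 42, deleted = 0 + 2 = 2.
SELECT channel, `user`, delta, added, deleted
FROM insert_wiki_edit
WHERE `user` = "GELongstreet";
```

The column name user is wrapped in backticks in the query as a precaution, because it can be parsed as a keyword in some contexts.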

Step 4: View table information

  1. In the new file, enter the following command to view the databases.

    SHOW PROC '/dbs';

    The following result is returned.

    image

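  2. Enter the following command to view the tables in the database. In this example, 10120 is the database ID. Replace it with the ID of your database returned by the previous command.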

    SHOW PROC '/dbs/10120';

    The following result is returned.

    image

    In storage-compute separation mode, the table type field shows CLOUD_NATIVE, and StoragePath is the path of the table in OSS. You can use this path to find where the table data is stored.

Step 5: Demonstrate the cache feature

  1. In the left-side navigation pane, choose Diagnostics And Analysis > SQL Tasks.

  2. Find the target query and click the query ID.

  3. Click the Execution Details tab.

    In the Profile execution tree, locate the relevant nodes and check two metrics on the right: CompressedBytesReadLocalDisk (data read from the local cache) and CompressedBytesReadRemote (data read from remote OSS).

    In this example, the local cache feature is enabled for the `insert_wiki_edit_cache` table. By observing the metric values, you can determine that the query fully hit the local cache.

    image.png

    The local cache feature is not enabled for the `insert_wiki_edit_nocache` table. By analyzing its corresponding metric values, you can determine that the query on this table did not hit the local cache. All data was read from the remote OSS.

    image.png
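This topic does not show how the `insert_wiki_edit_cache` and `insert_wiki_edit_nocache` tables were created. The following is a minimal sketch of one way to set up such a pair. It assumes that both tables reuse the columns of `insert_wiki_edit` from Step 3 (without partitions, for brevity) and differ only in the datacache.enable table property. Treat the property values and the copy statements as assumptions for illustration, not as the exact setup behind the figures above.

```sql
USE load_test;

-- Table with the local data cache turned on.
CREATE TABLE IF NOT EXISTS insert_wiki_edit_cache
(
    event_time DATETIME,
    channel VARCHAR(32) DEFAULT '',
    `user` VARCHAR(128) DEFAULT '',
    is_anonymous TINYINT DEFAULT '0',
    is_minor TINYINT DEFAULT '0',
    is_new TINYINT DEFAULT '0',
    is_robot TINYINT DEFAULT '0',
    is_unpatrolled TINYINT DEFAULT '0',
    delta INT SUM DEFAULT '0',
    added INT SUM DEFAULT '0',
    deleted INT SUM DEFAULT '0'
)
AGGREGATE KEY(event_time, channel, `user`, is_anonymous, is_minor, is_new, is_robot, is_unpatrolled)
DISTRIBUTED BY HASH(`user`) BUCKETS 10
PROPERTIES("replication_num" = "1", "datacache.enable" = "true");

-- Same columns, with the local data cache turned off.
CREATE TABLE IF NOT EXISTS insert_wiki_edit_nocache
(
    event_time DATETIME,
    channel VARCHAR(32) DEFAULT '',
    `user` VARCHAR(128) DEFAULT '',
    is_anonymous TINYINT DEFAULT '0',
    is_minor TINYINT DEFAULT '0',
    is_new TINYINT DEFAULT '0',
    is_robot TINYINT DEFAULT '0',
    is_unpatrolled TINYINT DEFAULT '0',
    delta INT SUM DEFAULT '0',
    added INT SUM DEFAULT '0',
    deleted INT SUM DEFAULT '0'
)
AGGREGATE KEY(event_time, channel, `user`, is_anonymous, is_minor, is_new, is_robot, is_unpatrolled)
DISTRIBUTED BY HASH(`user`) BUCKETS 10
PROPERTIES("replication_num" = "1", "datacache.enable" = "false");

-- Copy the sample rows from Step 3 into both tables.
INSERT INTO insert_wiki_edit_cache SELECT * FROM insert_wiki_edit;
INSERT INTO insert_wiki_edit_nocache SELECT * FROM insert_wiki_edit;

-- Confirm the property, then run the same query on each table and compare
-- CompressedBytesReadLocalDisk and CompressedBytesReadRemote in their profiles.
SHOW CREATE TABLE insert_wiki_edit_nocache;
SELECT * FROM insert_wiki_edit_cache;
SELECT * FROM insert_wiki_edit_nocache;
```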

Step 6: Perform a performance test

This example compares the query performance of the storage-compute separation mode (with local cache) with the storage-compute integration mode. You can use the SSB test dataset for a more detailed performance comparison. For more information, see SSB performance test.

  • Prepare the data environment.

    • Cluster resource configuration: 1 FE (8 CUs) + 3 CNs (Computing power: 16 CUs | Storage: 1000 GB).

    • Cluster parameters: Use the default settings. Enable local cache for the storage-compute separation cluster.

    • Data volume: 500 GB (sf=500)

  • Test results.

    • Storage-compute integration total time: 21.586 s.

    • Storage-compute separation total time (with local cache enabled on the second execution): 27.364 s.

    • Storage-compute separation without local cache total time: 117.529 s.

The results of running the sh ssb_query.sh ssb command show that, with local cache enabled, the query performance of the storage-compute separation mode is close to that of the storage-compute integration mode: 27.364 s in total versus 21.586 s, about 27% slower. Both are significantly better than a cluster without local cache, whose total of 117.529 s is about 4.3 times the cached total. The per-query times are as follows.

| SQL | Storage-compute integration | Storage-compute separation with data cache enabled | Storage-compute separation with data cache disabled |
| --- | --- | --- | --- |
| Q1.1 | 0m0.373s | 0m0.380s | 0m2.080s |
| Q1.2 | 0m0.303s | 0m0.292s | 0m2.141s |
| Q1.3 | 0m0.101s | 0m0.097s | 0m0.144s |
| Q2.1 | 0m2.461s | 0m2.821s | 0m14.401s |
| Q2.2 | 0m2.272s | 0m2.735s | 0m13.048s |
| Q2.3 | 0m2.168s | 0m2.588s | 0m13.957s |
| Q3.1 | 0m4.536s | 0m4.864s | 0m14.810s |
| Q3.2 | 0m2.371s | 0m2.682s | 0m11.292s |
| Q3.3 | 0m2.082s | 0m2.648s | 0m13.651s |
| Q3.4 | 0m0.195s | 0m0.212s | 0m0.572s |
| Q4.1 | 0m5.122s | 0m5.847s | 0m29.576s |
| Q4.2 | 0m1.141s | 0m1.369s | 0m1.465s |
| Q4.3 | 0m0.661s | 0m0.829s | 0m0.792s |
| Total | 21.586s | 27.364s | 117.529s |
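For orientation, the query labels Q1.1 through Q4.3 correspond to the 13 queries of the Star Schema Benchmark (SSB). The following sketch shows Q1.1 in its standard multi-table form. The table names `lineorder` and `dates` are assumptions based on the SSB specification and common StarRocks SSB setups. The query text that the ssb_query.sh script actually runs is not shown in this topic and may differ, for example by using a flattened table.

```sql
-- SSB Q1.1: revenue from orders in 1993 with a discount of 1 to 3
-- and a quantity below 25. Table names are assumed, see the note above.
SELECT SUM(lo_extendedprice * lo_discount) AS revenue
FROM lineorder, dates
WHERE lo_orderdate = d_datekey
  AND d_year = 1993
  AND lo_discount BETWEEN 1 AND 3
  AND lo_quantity < 25;
```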

(Optional) Step 7: Release the instance

Important

This operation deletes the instance and all its resources. This action is irreversible. Proceed with caution.

If you no longer need an instance, you can release it to avoid incurring further charges.

  1. On the Instance List page, click Release in the Actions column of the instance you want to release.

  2. In the dialog box that appears, click OK.

References

  • For more information about operations in the SQL Editor, see SQL Editor.

  • To view the SQL query information of the current instance, analyze the SQL execution plan, and promptly diagnose and troubleshoot SQL issues, see Diagnostics and analysis.

  • To view and analyze all operations that occurred in the database, see Audit log.

Contact us

If you have any questions, search for the DingTalk group ID 24010016636 and join the group to ask your questions.