DataWorks: Process data

Last Updated: Aug 05, 2025

This topic describes how to use StarRocks nodes in DataWorks to process data in the ods_user_info_d_starrocks and ods_raw_log_d_starrocks tables, which are synchronized to StarRocks, to obtain user profile data. The ods_user_info_d_starrocks table stores basic user information, and the ods_raw_log_d_starrocks table stores user website access logs. This topic helps you understand how to use DataWorks and StarRocks to compute and analyze the synchronized data and complete simple data processing in a data warehouse.

Prerequisites

Before you start this tutorial, complete the steps in Synchronize data.

Step 1: Design a data processing link

In the data synchronization phase, the required data is synchronized to StarRocks tables. The next objective is to further process the data to generate the basic user profile data.

  1. Log on to the DataWorks console and go to the DATA STUDIO pane of the Data Studio page. In the Workspace Directories section of the DATA STUDIO pane, find the prepared workflow and click the workflow name to go to the configuration tab of the workflow.

  2. Drag StarRocks from the Database section of the configuration tab to the canvas on the right. In the Create Node dialog box, configure the Node Name parameter.

    In this tutorial, you need to create three StarRocks nodes. The following list describes the nodes that are used in this tutorial and their functionalities.

    • dwd_log_info_di_starrocks (StarRocks node): This node is used to split data in the ods_raw_log_d_starrocks table and synchronize the data to multiple fields in the dwd_log_info_di_starrocks table based on a built-in function or user-defined function (UDF).

    • dws_user_info_all_di_starrocks (StarRocks node): This node is used to aggregate data in the basic user information table ods_user_info_d_starrocks and the log data table dwd_log_info_di_starrocks and synchronize the aggregation result to the dws_user_info_all_di_starrocks table.

    • ads_user_info_1d_starrocks (StarRocks node): This node is used to further process data in the dws_user_info_all_di_starrocks table and synchronize the processed data to the ads_user_info_1d_starrocks table to generate a basic user profile.

  3. Draw lines to configure ancestor nodes for the StarRocks nodes, as shown in the following figure.

    (Figure: the lines that are drawn between the nodes in the workflow to indicate their scheduling dependencies, which follow the data flow described in the preceding list.)
    Note

    You can draw lines to configure scheduling dependencies for nodes in a workflow. You can also use the automatic parsing feature to enable the system to automatically identify scheduling dependencies between nodes. In this tutorial, scheduling dependencies between nodes are configured by drawing lines. For information about the automatic parsing feature, see Method 1: Configure scheduling dependencies based on the lineage in the code of a node.

Step 2: Register a function

In this step, you register a function that converts the raw log data used in this experiment into structured table fields. An example of calling the registered function is provided after the following notes.

Important
  • In this example, the resources required for the function that converts IP addresses into regions are provided. You only need to download the resources to your on-premises machine, store them in an Object Storage Service (OSS) bucket, and register them as a function by following the steps below.

  • The IP address resources for this function are used only in this tutorial. If you need to map IP addresses to geographical locations in production business scenarios, obtain a professional IP address lookup service from a specialized provider.
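
For reference, after the function is registered in the following steps, you call it in SQL in the same way as a built-in function: it takes an IP address string and returns a region string. The following is a minimal sketch; the IP address 192.0.2.10 is only a documentation example, and the returned value depends on the IP address data in the resource package.

    -- Example call: map a sample IP address to a region string.
    SELECT getregion('192.0.2.10') AS region;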

Upload a resource (ip2region-starrocks.jar)

  1. Download the ip2region-starrocks.jar package.

    Note

    The ip2region-starrocks.jar package is used only in this tutorial.

  2. Upload a resource to OSS.

    1. Log on to the OSS console and go to the Buckets page. On the Buckets page, find the bucket that is created when you prepare environments and create the dataworks_starrocks directory in the path of the bucket.

    2. Upload the ip2region-starrocks.jar package to the dataworks_starrocks directory.

      In this tutorial, the full path in which the package is stored is https://test.oss-cn-shanghai-internal.aliyuncs.com/dataworks_starrocks/ip2region-starrocks.jar. Determine the full path of your uploaded OSS resource in the same way.

      Note
      • In this tutorial, a bucket named test is used.

      • The endpoint of the bucket in which the UDF resource is stored is the one that is used for Access from ECS over the Classic Network (internal network).

      • If you use an internal endpoint, the OSS bucket must reside in the same region as the DataWorks workspace. In this example, the China (Shanghai) region is used.

Register a function (getregion)

  1. Create a StarRocks node to register a function.

    Log on to the DataWorks console and go to the DATA STUDIO pane of the Data Studio page. In the Workspace Directories section of the DATA STUDIO pane, click the add icon and choose Create Node > Database > StarRocks to create a StarRocks node.

  2. Write code to register a function.

    • Register a function.

      CREATE FUNCTION getregion(string)
      RETURNS string
      PROPERTIES ( 
          "symbol" = "com.starrocks.udf.sample.Ip2Region", 
          "type" = "StarrocksJar",
          "file" = "Enter the full storage path of the OSS bucket. You can obtain the path in the preceding substep."
      );
    • Check whether the function is registered.

      SELECT getregion('The IP address of your on-premises machine');
    Important

    A function can be registered only once in each of the development and production environments. You must deploy the StarRocks node to the production environment before you can register the function in the production environment. A filled-in example of the registration statement is provided after these steps.

  3. In the top toolbar of the configuration tab of the StarRocks node, click Save. Then, click Deploy and follow the instructions on the DEPLOY tab to deploy the node to the StarRocks computing resource in the development environment and the production environment. Then, backfill data for the StarRocks node to complete the registration of the function in the production environment. After the function is registered, manually freeze the StarRocks node in the production environment in Operation Center. This prevents the node from failing due to repeated registration.
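
For reference, the following shows what the registration statement might look like after the file property is filled in. This sketch assumes the test bucket, the China (Shanghai) region, and the dataworks_starrocks directory that are used in this tutorial; replace the file value with the actual path of your uploaded resource.

    -- The file value below is the OSS path used in this tutorial; replace it with your own path.
    CREATE FUNCTION getregion(string)
    RETURNS string
    PROPERTIES (
        "symbol" = "com.starrocks.udf.sample.Ip2Region",
        "type" = "StarrocksJar",
        "file" = "https://test.oss-cn-shanghai-internal.aliyuncs.com/dataworks_starrocks/ip2region-starrocks.jar"
    );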

Step 3: Configure the StarRocks nodes

To perform data processing, you must schedule the related StarRocks node to implement each layer of processing logic. In this tutorial, complete sample code for data processing is provided. You must configure the code separately for the dwd_log_info_di_starrocks, dws_user_info_all_di_starrocks, and ads_user_info_1d_starrocks nodes.

Configure the dwd_log_info_di_starrocks node

In the sample code for this node, the registered function and SQL statements are used to process fields in the ancestor table ods_raw_log_d_starrocks and synchronize the processed data to the dwd_log_info_di_starrocks table.

  1. In the canvas of the workflow, move the pointer over the dwd_log_info_di_starrocks node and click Open Node.

  2. On the configuration tab of the node, select the StarRocks computing resource that is associated with the workspace when you prepare environments from the Select DataSource drop-down list.

  3. Copy the following SQL statements and paste them in the code editor:

    Note

    In the sample code for the dwd_log_info_di_starrocks node, the registered function and SQL statements are used to process fields in the ancestor table ods_raw_log_d_starrocks and synchronize the processed data to the dwd_log_info_di_starrocks table.

    Sample code for the dwd_log_info_di_starrocks node

    CREATE TABLE IF NOT EXISTS dwd_log_info_di_starrocks (
        uid STRING COMMENT 'The user ID',
        ip STRING COMMENT 'The IP address',
        TIME STRING COMMENT 'The time in the yyyymmddhh:mi:ss format',
        status STRING COMMENT 'The status code that is returned by the server',
        bytes STRING COMMENT 'The number of bytes that are returned to the client',
        region STRING COMMENT 'The region, which is obtained based on the IP address',
        method STRING COMMENT 'The HTTP request type',
        url STRING COMMENT 'url',
        protocol STRING COMMENT 'The version number of HTTP',
        referer STRING COMMENT 'The source URL',
        device STRING COMMENT 'The terminal type',
        identity STRING COMMENT 'The access type, which can be crawler, feed, user, or unknown',
        dt DATE NOT NULL COMMENT 'The time'
    ) DUPLICATE KEY(uid) 
    COMMENT 'User behavior analysis case - The fact table for the user website access logs' 
    PARTITION BY(dt) 
    PROPERTIES ("replication_num" = "1");
    
    -- In this example, field data is dynamically partitioned based on the dt field. To prevent the node from rerunning and repeatedly writing data, the following SQL statement is used to delete the existing destination partitions before each processing operation. 
    ALTER TABLE dwd_log_info_di_starrocks DROP PARTITION IF EXISTS p${var} FORCE;
    
    -- Scenario: The following SQL statements use the getregion function to parse the IP address in the raw log data, and use regular expressions to split the raw data to analyzable fields and write the fields to the dwd_log_info_di_starrocks table. 
    -- Note:
    --     1. Before you can use a UDF on a DataWorks node, you must register a function. 
    --     2. You can configure scheduling parameters for nodes in DataWorks to synchronize incremental data to the related partition in the desired table every day in the scheduling scenario. 
    --        In actual development scenarios, you can define variables in the code of a node in the ${Variable name} format and assign scheduling parameters to the variables on the Properties tab of the configuration tab of the node. This way, the values of scheduling parameters can be dynamically replaced in the node code based on the configurations of the scheduling parameters. 
    INSERT INTO dwd_log_info_di_starrocks
    SELECT
        uid,
        ip,
        time,
        status,
        bytes,
        getregion(ip) AS region, -- Obtain the region based on the IP address by using the UDF.
        REGEXP_EXTRACT(request, '([^ ]+)', 1) AS method,
        REGEXP_EXTRACT(request, '^[^ ]+ (.*) [^ ]+$', 1) AS url,
        REGEXP_EXTRACT(request, '([^ ]+)$', 1) AS protocol,
        REGEXP_EXTRACT(referer, '^[^/]+://([^/]+)', 1) AS referer,
        CASE
            WHEN LOWER(agent) REGEXP 'android' THEN 'android'
            WHEN LOWER(agent) REGEXP 'iphone' THEN 'iphone'
            WHEN LOWER(agent) REGEXP 'ipad' THEN 'ipad'
            WHEN LOWER(agent) REGEXP 'macintosh' THEN 'macintosh'
            WHEN LOWER(agent) REGEXP 'windows phone' THEN 'windows_phone'
            WHEN LOWER(agent) REGEXP 'windows' THEN 'windows_pc'
            ELSE 'unknown'
        END AS device,
        CASE
            WHEN LOWER(agent) REGEXP '(bot|spider|crawler|slurp)' THEN 'crawler'
            WHEN LOWER(agent) REGEXP 'feed' OR REGEXP_EXTRACT(request, '^[^ ]+ (.*) [^ ]+$', 0) REGEXP 'feed' THEN 'feed'
            WHEN NOT (LOWER(agent) REGEXP '(bot|spider|crawler|feed|slurp)')
                 AND agent REGEXP '^(Mozilla|Opera)'
                 AND NOT (REGEXP_EXTRACT(request, '^[^ ]+ (.*) [^ ]+$', 0) REGEXP 'feed') THEN 'user'
            ELSE 'unknown'
        END AS identity,
        CAST('${var}' AS DATE) AS dt
    FROM (
        SELECT
            SPLIT_PART(CAST(col AS VARCHAR(65533)), '##@@', 1) AS ip,
            SPLIT_PART(CAST(col AS VARCHAR(65533)), '##@@', 2) AS uid,
            SPLIT_PART(CAST(col AS VARCHAR(65533)), '##@@', 3) AS time,
            SPLIT_PART(CAST(col AS VARCHAR(65533)), '##@@', 4) AS request,
            SPLIT_PART(CAST(col AS VARCHAR(65533)), '##@@', 5) AS status,
            SPLIT_PART(CAST(col AS VARCHAR(65533)), '##@@', 6) AS bytes,
            SPLIT_PART(CAST(col AS VARCHAR(65533)), '##@@', 7) AS referer,
            SPLIT_PART(CAST(col AS VARCHAR(65533)), '##@@', 8) AS agent
        FROM ods_raw_log_d_starrocks
        WHERE dt = '${var}'
    ) a;
  4. Configure debugging parameters.

    In the right-side navigation pane of the configuration tab of the node, click Debugging Configurations. On the Debugging Configurations tab, configure the following parameters. These parameters are used to test the workflow in Step 4.

    • Computing Resource: Select the StarRocks computing resource that is associated with the workspace when you prepare environments.

    • Resource Group: Select the serverless resource group that you purchase when you prepare environments.

    • Script Parameters: In the Parameter Value column of the var parameter, enter a constant value in the yyyymmdd format. Example: var=20250223. When you debug the workflow, Data Studio replaces the variables defined for nodes in the workflow with the constant.

  5. Optional. Configure scheduling properties.

    You can retain default values for parameters related to scheduling properties in this tutorial. You can click Properties in the right-side navigation pane of the node configuration tab. For information about parameters on the Properties tab, see Scheduling properties.

    • Scheduling Parameters: In this tutorial, scheduling parameters are configured for the workflow. You do not need to configure scheduling parameters for the inner nodes of the workflow. The configured scheduling parameters can be directly used for code and tasks developed based on the inner nodes. For an illustration of how the ${var} variable in the node code is replaced at runtime, see the example after these steps.

    • Scheduling Policies: You can configure the Time for Delayed Execution parameter to specify the duration by which the running of the node lags behind the running of the workflow. In this tutorial, you do not need to configure this parameter.

  6. In the top toolbar of the configuration tab, click Save to save the node.
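
The following is a minimal illustration of how the ${var} placeholder in the preceding node code is resolved at run time. It assumes that the scheduling parameter (or debugging constant) named var resolves to 20250223, as in the debugging configuration above; the actual value depends on how you assign the scheduling parameter.

    -- With var resolving to 20250223, the partition-cleanup statement is executed as:
    ALTER TABLE dwd_log_info_di_starrocks DROP PARTITION IF EXISTS p20250223 FORCE;
    -- and the filter in the INSERT ... SELECT statement is executed as:
    -- WHERE dt = '20250223'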

Configure the dws_user_info_all_di_starrocks node

This node is used to aggregate data in the basic user information table ods_user_info_d_starrocks and the log data table dwd_log_info_di_starrocks and synchronize the aggregation result to the dws_user_info_all_di_starrocks table.

  1. In the canvas of the workflow, move the pointer over the dws_user_info_all_di_starrocks node and click Open Node.

  2. On the configuration tab of the node, select the StarRocks computing resource that is associated with the workspace when you prepare environments from the Select DataSource drop-down list.

  3. Copy the following SQL statements and paste them in the code editor:

    Note

    On the configuration tab of the dws_user_info_all_di_starrocks node, write code to aggregate data in the ancestor tables dwd_log_info_di_starrocks and ods_user_info_d_starrocks and synchronize the aggregation result to the dws_user_info_all_di_starrocks table. A query that you can use to spot-check the result after the workflow runs is provided after these steps.

    Sample code for the dws_user_info_all_di_starrocks node

    CREATE TABLE IF NOT EXISTS dws_user_info_all_di_starrocks (
        uid STRING COMMENT 'The user ID',
        gender STRING COMMENT 'The gender',
        age_range STRING COMMENT 'The age range',
        zodiac STRING COMMENT 'The zodiac sign',
        region STRING COMMENT 'The region, which is obtained based on the IP address',
        device STRING COMMENT 'The terminal type',
        identity STRING COMMENT 'The access type, which can be crawler, feed, user, or unknown',
        method STRING COMMENT 'The HTTP request type',
        url STRING COMMENT 'url',
        referer STRING COMMENT 'The source URL',
        TIME STRING COMMENT 'The time in the yyyymmddhh:mi:ss format',
        dt DATE NOT NULL COMMENT 'The time'
    ) DUPLICATE KEY(uid) 
    COMMENT 'User behavior analysis case - Wide table for the website access information about users' 
    PARTITION BY(dt) 
    PROPERTIES ("replication_num" = "1");
    
    -- In this example, field data is dynamically partitioned based on the dt field. To prevent the node from rerunning and repeatedly writing data, the following SQL statement is used to delete the existing destination partitions before each processing operation. 
    ALTER TABLE dws_user_info_all_di_starrocks DROP PARTITION IF EXISTS p${var} FORCE;
    
    
    -- Scenario: Merge the processed log data stored in the dwd_log_info_di_starrocks table and the basic user information stored in the ods_user_info_d_starrocks table and synchronize the merged data to the dws_user_info_all_di_starrocks table. 
    -- Note: You can configure scheduling parameters for nodes in DataWorks to synchronize incremental data to the related partition in the desired table every day in the scheduling scenario. 
    -- In actual development scenarios, you can define variables in the code of a node in the ${Variable name} format and assign scheduling parameters to the variables on the Properties tab of the configuration tab of the node. This way, the values of scheduling parameters can be dynamically replaced in the node code based on the configurations of the scheduling parameters. 
    INSERT INTO dws_user_info_all_di_starrocks 
    SELECT 
        IFNULL(a.uid, b.uid) AS uid,
        b.gender,
        b.age_range,
        b.zodiac,
        a.region,
        a.device,
        a.identity,
        a.method,
        a.url,
        a.referer,
        a.time,
        a.dt
    FROM dwd_log_info_di_starrocks a
    LEFT JOIN ods_user_info_d_starrocks b
    ON a.uid = b.uid
    WHERE a.dt = '${var}';
    
  4. Configure debugging parameters.

    In the right-side navigation pane of the configuration tab of the node, click Debugging Configurations. On the Debugging Configurations tab, configure the following parameters. These parameters are used to test the workflow in Step 4.

    • Computing Resource: Select the StarRocks computing resource that is associated with the workspace when you prepare environments.

    • Resource Group: Select the serverless resource group that you purchase when you prepare environments.

    • Script Parameters: In the Parameter Value column of the var parameter, enter a constant value in the yyyymmdd format. Example: var=20250223. When you debug the workflow, Data Studio replaces the variables defined for nodes in the workflow with the constant.

  5. Optional. Configure scheduling properties.

    You can retain default values for parameters related to scheduling properties in this tutorial. You can click Properties in the right-side navigation pane of the node configuration tab. For information about parameters on the Properties tab, see Scheduling properties.

    • Scheduling Parameters: In this tutorial, scheduling parameters are configured for the workflow. You do not need to configure scheduling parameters for the inner nodes of the workflow. The configured scheduling parameters can be directly used for code and tasks developed based on the inner nodes.

    • Scheduling Policies: You can configure the Time for Delayed Execution parameter to specify the duration by which the running of the node lags behind the running of the workflow. In this tutorial, you do not need to configure this parameter.

  6. In the top toolbar of the configuration tab, click Save to save the node.
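
After the workflow runs in Step 4, you can optionally spot-check the wide table with queries like the following. This is a sketch that assumes the debugging constant 20250223 used in this tutorial; replace the value with the data timestamp of your run.

    -- Check how many rows were written to the partition and preview a few records.
    SELECT COUNT(*) AS row_count FROM dws_user_info_all_di_starrocks WHERE dt = '20250223';
    SELECT * FROM dws_user_info_all_di_starrocks WHERE dt = '20250223' LIMIT 10;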

Configure the ads_user_info_1d_starrocks node

This node is used to further process data in the dws_user_info_all_di_starrocks table and synchronize the processed data to the ads_user_info_1d_starrocks table to generate a basic user profile. An example query over the generated table is provided after the following steps.

  1. In the canvas of the workflow, move the pointer over the ads_user_info_1d_starrocks node and click Open Node.

  2. On the configuration tab of the node, select the StarRocks computing resource that is associated with the workspace when you prepare environments from the Select DataSource drop-down list.

  3. Copy the following SQL statements and paste them in the code editor:

    Sample code for the ads_user_info_1d_starrocks node

    CREATE TABLE IF NOT EXISTS ads_user_info_1d_starrocks (
        uid STRING COMMENT 'The user ID',
        region STRING COMMENT 'The region, which is obtained based on the IP address',
        device STRING COMMENT 'The terminal type',
        pv BIGINT COMMENT 'pv',
        gender STRING COMMENT 'The gender',
        age_range STRING COMMENT 'The age range',
        zodiac STRING COMMENT 'The zodiac sign',
        dt DATE NOT NULL COMMENT 'The time'
    ) DUPLICATE KEY(uid) 
    COMMENT 'User behavior analysis case - User profile data' 
    PARTITION BY(dt) 
    PROPERTIES ("replication_num" = "1");
    
    -- In this example, field data is dynamically partitioned based on the dt field. To prevent the node from rerunning and repeatedly writing data, the following SQL statement is used to delete the existing destination partitions before each processing operation. 
    ALTER TABLE ads_user_info_1d_starrocks DROP PARTITION IF EXISTS p${var} FORCE;
    
    -- Scenario: The following SQL statements are used to further process data in the dws_user_info_all_di_starrocks wide table that stores the website access information about users to generate basic user profile data, and synchronize the data to the ads_user_info_1d_starrocks table. 
    -- Note: You can configure scheduling parameters for nodes in DataWorks to synchronize incremental data to the related partition in the desired table every day in the scheduling scenario. 
    -- In actual development scenarios, you can define variables in the code of a node in the ${Variable name} format and assign scheduling parameters to the variables on the Properties tab of the configuration tab of the node. This way, the values of scheduling parameters can be dynamically replaced in the node code based on the configurations of the scheduling parameters. 
    INSERT INTO ads_user_info_1d_starrocks
    SELECT
        uid,
        MAX(region) AS region,
        MAX(device) AS device,
        COUNT(*) AS pv,
        MAX(gender) AS gender,
        MAX(age_range) AS age_range,
        MAX(zodiac) AS zodiac,
        dt
    FROM dws_user_info_all_di_starrocks
    WHERE dt = '${var}'
    GROUP BY uid, dt;
    
    -- Preview the generated user profile data.
    SELECT * FROM ads_user_info_1d_starrocks
    WHERE dt = '${var}';
  4. Configure debugging parameters.

    In the right-side navigation pane of the configuration tab of the node, click Debugging Configurations. On the Debugging Configurations tab, configure the following parameters. These parameters are used to test the workflow in Step 4.

    • Computing Resource: Select the StarRocks computing resource that is associated with the workspace when you prepare environments.

    • Resource Group: Select the serverless resource group that you purchase when you prepare environments.

    • Script Parameters: In the Parameter Value column of the var parameter, enter a constant value in the yyyymmdd format. Example: var=20250223. When you debug the workflow, Data Studio replaces the variables defined for nodes in the workflow with the constant.

  5. Optional. Configure scheduling properties.

    You can retain default values for parameters related to scheduling properties in this tutorial. You can click Properties in the right-side navigation pane of the node configuration tab. For information about parameters on the Properties tab, see Scheduling properties.

    • Scheduling Parameters: In this tutorial, scheduling parameters are configured for the workflow. You do not need to configure scheduling parameters for the inner nodes of the workflow. The configured scheduling parameters can be directly used for code and tasks developed based on the inner nodes.

    • Scheduling Policies: You can configure the Time for Delayed Execution parameter to specify the duration by which the running of the node lags behind the running of the workflow. In this tutorial, you do not need to configure this parameter.

  6. In the top toolbar of the configuration tab, click Save to save the node.
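
As an example of how the generated profile table can be used, the following query aggregates page views by region for one data timestamp. This is a sketch that assumes the value 20250223; replace it with the data timestamp of your run.

    -- Hypothetical follow-up analysis: total page views by region for a single day.
    SELECT region, SUM(pv) AS total_pv
    FROM ads_user_info_1d_starrocks
    WHERE dt = '20250223'
    GROUP BY region
    ORDER BY total_pv DESC
    LIMIT 10;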

Step 4: Process data

  1. Run the workflow.

    In the top toolbar of the configuration tab of the workflow, click Run. In the Enter runtime parameters dialog box, specify the value to be used for the scheduling parameters that are defined for the nodes in this run, and click OK. In this tutorial, 20250223 is specified. You can specify a value based on your business requirements.

  2. Query the result.

    1. Go to the SQL Query page.

      Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Analysis and Service > DataAnalysis. On the page that appears, click Go to DataAnalysis. In the left-side navigation pane of the page that appears, click SQL Query.

    2. Configure an SQL query file.

      1. In the SQL Query pane, click the icon next to My Files and select Create File. In the Create File dialog box, configure the File Name parameter.

      2. In the left-side navigation tree, find the created SQL query file and click the file name to go to the configuration tab of the file.

      3. In the upper-right corner of the configuration tab, click the icon. In the popover that appears, configure the following parameters.

        • Workspace: Select the workspace to which the user_profile_analysis_starrocks workflow belongs.

        • Data Source Type: Select StarRocks from the drop-down list.

        • Data Source Name: Select the StarRocks computing resource that is associated with the workspace when you prepare environments.

      4. Click OK.

    3. Write an SQL statement for the query.

      After all nodes in this topic are successfully run, write and execute the following SQL statement to check whether the user profile data is generated as expected. A filled-in example with a concrete data timestamp is provided after the statement.

      -- In the query statements, change the partition key value to the data timestamp of the ads_user_info_1d_starrocks node. For example, if the node is scheduled to run on February 23, 2025, the data timestamp of the node is 20250222, which is one day earlier than the scheduling time of the node. 
      SELECT * FROM ads_user_info_1d_starrocks  WHERE dt=The data timestamp; 
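
      For example, if the data timestamp of the node is 20250222, the query might look like the following:

      SELECT * FROM ads_user_info_1d_starrocks WHERE dt = '20250222';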

Step 5: Deploy the workflow

An auto triggered node can be automatically scheduled to run only after you deploy the node to the production environment. You can refer to the following steps to deploy the workflow to the production environment:

Note

In this tutorial, scheduling parameters are configured for the workflow when you configure scheduling properties for the workflow. You do not need to separately configure scheduling parameters for each node in the workflow.

  1. In the left-side navigation pane of the Data Studio page, click the icon to open the DATA STUDIO pane. In the Workspace Directories section of the DATA STUDIO pane, find the created workflow and click the workflow name to go to the configuration tab of the workflow.

  2. In the top toolbar of the configuration tab, click Deploy.

  3. On the DEPLOY tab, click Start Deployment to Production Environment to deploy the workflow by following the on-screen instructions.

Step 6: Run the nodes in the production environment

The instances generated for nodes that you deploy on a given day are scheduled to run starting from the next day. You can use the data backfill feature to backfill data for nodes in a deployed workflow, which allows you to check whether the nodes can run in the production environment. For more information, see Backfill data and view data backfill instances (new version).

  1. After all nodes in the workflow are deployed, click Operation Center in the upper-right corner of the configuration tab of the node.

    You can also click the icon in the upper-left corner of the DataWorks console and choose All Products > Data Development And Task Operation > Operation Center.

  2. In the left-side navigation pane of the Operation Center page, choose Auto Triggered Node O&M > Auto Triggered Nodes. On the Auto Triggered Nodes page, find the zero load node workshop_start_starrocks and click the node name.

  3. In the directed acyclic graph (DAG) of the node, right-click the workshop_start_starrocks node and choose Run > Current and Descendant Nodes Retroactively.

  4. In the Backfill Data panel, select the nodes for which you want to backfill data, configure the Data Timestamp parameter, and then click Submit and Redirect.

  5. In the upper part of the Data Backfill page, click Refresh to check whether the workshop_start_starrocks node and its descendant nodes are successfully run.

Note

To prevent excessive fees from being generated after you complete the operations in this tutorial, you can configure the Effective Period parameter for all nodes in the workflow or freeze the zero load node workshop_start_starrocks.

What to do next

  • Display data in a visualized manner: After you complete user profile analysis, use DataAnalysis to display the processed data in charts. This helps you quickly extract key information to gain insights into the business trends behind the data.

  • Monitor data quality: Configure monitoring rules for tables that are generated after data processing to help identify and intercept dirty data in advance to prevent the impacts of dirty data from escalating.

  • Manage data: Data tables are generated in StarRocks after user profile analysis is complete. You can view the generated data tables in Data Map and view the relationships between the tables based on data lineages.

  • Use DataService Studio APIs to provide services: After you obtain the final processed data, use standardized APIs in DataService Studio to share data and to provide data for other business modules that use APIs to receive data.