DataWorks provides StarRocks Reader and StarRocks Writer for you to read data from and write data to StarRocks data sources. This topic describes the capabilities of synchronizing data from or to StarRocks data sources.

Supported StarRocks versions

EMR StarRocks 2.1.

Data type mappings

Most StarRocks data types, including numeric, STRING, and DATE data types, are supported.

Develop a data synchronization node

For information about the entry point and the procedure for configuring a data synchronization node, see the following sections. For information about each parameter, view its infotip on the configuration tab of the node.

Add a data source

Before you configure a data synchronization node to synchronize data from or to a specific data source, you must add the data source to DataWorks. For more information, see Add and manage data sources.

Configure a batch synchronization node to synchronize data of a single table

Appendix: Code and parameters

Appendix: Configure a batch synchronization node by using the code editor

If you use the code editor to configure a batch synchronization node, you must configure the reader and writer parameters of the data source in the format that the code editor requires. For more information about the format requirements, see Configure a batch synchronization node by using the code editor. The following sections describe the StarRocks Reader and StarRocks Writer parameters that you configure in the code editor.

Code for StarRocks Reader

{
    "stepType": "starrocks",
    "parameter": {
        "selectedDatabase": "didb1",
        "datasource": "starrocks_datasource",
        "column": [
            "id",
            "name"
        ],
        "where": "id>100",
        "table": "table1",
        "splitPk": "id"
    },
    "name": "Reader",
    "category": "reader"
}
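To make the Reader configuration above concrete, the following Python sketch parses it and prints a SELECT statement that roughly describes the data it asks for. The helper approximate_query is hypothetical and for illustration only. It is not the SQL that StarRocks Reader actually issues, and it ignores splitPk and parallel reads.

```python
import json

# The Reader configuration from the example above.
READER_JSON = """
{
    "stepType": "starrocks",
    "parameter": {
        "selectedDatabase": "didb1",
        "datasource": "starrocks_datasource",
        "column": ["id", "name"],
        "where": "id>100",
        "table": "table1",
        "splitPk": "id"
    },
    "name": "Reader",
    "category": "reader"
}
"""


def approximate_query(reader: dict) -> str:
    """Build a SELECT statement that roughly matches what the reader asks for.

    Hypothetical helper for illustration only: StarRocks Reader does not
    necessarily issue this exact SQL.
    """
    p = reader["parameter"]
    table = p["table"]
    # selectedDatabase is optional; without it, the data source's database is used.
    if p.get("selectedDatabase"):
        table = f'{p["selectedDatabase"]}.{table}'
    sql = f'SELECT {", ".join(p["column"])} FROM {table}'
    # A missing or empty where parameter means "read all data".
    where = (p.get("where") or "").strip()
    if where:
        sql += f" WHERE {where}"
    return sql


if __name__ == "__main__":
    # prints: SELECT id, name FROM didb1.table1 WHERE id>100
    print(approximate_query(json.loads(READER_JSON)))
```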

Parameters in code for StarRocks Reader

datasource
  Description: The name of the StarRocks data source.
  Required: Yes
  Default value: No default value

selectedDatabase
  Description: The name of the StarRocks database.
  Required: No
  Default value: The name of the database that is configured in the StarRocks data source

column
  Description: The names of the columns from which you want to read data.
  Required: Yes
  Default value: No default value

where
  Description: The filter condition of the WHERE clause, specified without the WHERE keyword. For example, you can set this parameter to gmt_create > $bizdate to read the data that is generated on the current day.
    • You can use this parameter to read incremental data.
    • If the where parameter is not specified or is left empty, StarRocks Reader reads all data.
  Required: No
  Default value: No default value

table
  Description: The name of the table from which you want to read data.
  Required: Yes
  Default value: No default value

splitPk
  Description: The field that StarRocks Reader uses to shard data. If you specify this parameter, data is sharded based on the values of this field and read by parallel threads, which improves synchronization efficiency. We recommend that you set this parameter to the primary key column of the table. Primary key values are usually distributed evenly, so the data is spread across shards instead of being concentrated in only a few of them.
  Required: No
  Default value: No default value
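The following Python sketch illustrates the general idea behind splitPk: range-based sharding over an integer key. It assumes that the minimum and maximum key values are already known (for example, from a MIN/MAX query), splits that interval into near-equal-width ranges, and builds one query per shard that a separate thread could run. The function names are hypothetical, and this is a generic technique, not necessarily how StarRocks Reader shards data internally.

```python
from __future__ import annotations


def split_pk_ranges(min_pk: int, max_pk: int, shards: int) -> list[tuple[int, int]]:
    """Split the closed key interval [min_pk, max_pk] into at most `shards`
    contiguous, non-overlapping half-open ranges [lo, hi) of near-equal width."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    total = max_pk - min_pk + 1
    if total <= 0:
        return []  # empty table or invalid bounds
    shards = min(shards, total)  # never create empty ranges
    width, extra = divmod(total, shards)
    ranges = []
    lo = min_pk
    for i in range(shards):
        size = width + (1 if i < extra else 0)  # spread the remainder over the first ranges
        ranges.append((lo, lo + size))
        lo += size
    return ranges


def shard_queries(columns, table, split_pk, where, ranges):
    """Build one SELECT per shard, combining the where filter with a key range."""
    cols = ", ".join(columns)
    base = f"({where}) AND " if where else ""
    return [
        f"SELECT {cols} FROM {table} WHERE {base}{split_pk} >= {lo} AND {split_pk} < {hi}"
        for lo, hi in ranges
    ]


if __name__ == "__main__":
    # Assume a MIN(id)/MAX(id) query over the filtered rows returned 101 and 200.
    for q in shard_queries(["id", "name"], "table1", "id", "id>100", split_pk_ranges(101, 200, 4)):
        print(q)
```

Because the ranges have equal width, each shard holds a similar number of rows only when key values are spread evenly across the interval. This is why a primary key column is usually a good choice for splitPk.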

Code for StarRocks Writer

{
    "stepType": "starrocks",
    "parameter": {
        "selectedDatabase": "didb1",
        "loadProps": {
            "row_delimiter": "\\x02",
            "column_separator": "\\x01"
        },
        "datasource": "starrocks_public",
        "column": [
            "id",
            "name"
        ],
        "loadUrl": [
            "1.1.1.1:8030"
        ],
        "table": "table1",
        "preSql": [
            "truncate table table1"
        ],
        "postSql": [
        ]
    },
    "name": "Writer",
    "category": "writer"
}
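In the Writer configuration above, the loadUrl, selectedDatabase, table, and loadProps parameters together describe a Stream Load import. The following Python sketch derives one StarRocks Stream Load endpoint per frontend address, following the pattern http://<fe_host>:<fe_http_port>/api/<database>/<table>/_stream_load, and pairs it with a set of request headers. Passing loadProps entries through one-to-one as HTTP headers is an assumption made for illustration. The sketch sends no request and does not reflect how StarRocks Writer is implemented. It also shows that the JSON string "\\x01" decodes to the four characters \x01, the hex notation that Stream Load accepts for a delimiter byte.

```python
from __future__ import annotations

import json

# Trimmed Writer "parameter" object from the example above. The raw string
# keeps the JSON escapes exactly as they appear in the code editor.
WRITER_PARAMETER_JSON = r"""
{
    "selectedDatabase": "didb1",
    "loadProps": {"row_delimiter": "\\x02", "column_separator": "\\x01"},
    "loadUrl": ["1.1.1.1:8030"],
    "table": "table1"
}
"""


def stream_load_targets(parameter: dict) -> list[tuple[str, dict[str, str]]]:
    """Return one (URL, headers) pair per frontend address in loadUrl.

    Assumption for illustration: loadProps entries are passed through as
    Stream Load request headers. selectedDatabase is optional in DataWorks,
    but this sketch requires it. No HTTP request is sent.
    """
    db = parameter["selectedDatabase"]
    table = parameter["table"]
    headers = dict(parameter.get("loadProps", {}))
    return [
        (f"http://{fe}/api/{db}/{table}/_stream_load", dict(headers))
        for fe in parameter["loadUrl"]
    ]


if __name__ == "__main__":
    param = json.loads(WRITER_PARAMETER_JSON)
    # The JSON text "\\x01" decodes to the 4-character string \x01 (backslash, x, 0, 1).
    for url, headers in stream_load_targets(param):
        print(url, headers)
```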

Parameters in code for StarRocks Writer

datasource
  Description: The name of the StarRocks data source.
  Required: Yes
  Default value: No default value

selectedDatabase
  Description: The name of the StarRocks database.
  Required: No
  Default value: The name of the database that is configured in the StarRocks data source

loadProps
  Description: The request parameters for the StarRocks Stream Load import method. You can configure these parameters when you import data in CSV format by using Stream Load. If you have no special requirements, set this parameter to {}. You can configure the following request parameters:
    • column_separator: the column delimiter of the CSV data. Default value: \t.
    • row_delimiter: the row delimiter of the CSV data. Default value: \n.
  If the data that you want to write to StarRocks contains \t or \n, you must use other characters as delimiters. Example:
    { "column_separator": "\\x01", "row_delimiter": "\\x02" }
  Required: Yes
  Default value: No default value

column
  Description: The names of the columns to which you want to write data.
  Required: Yes
  Default value: No default value

loadUrl
  Description: The address of a StarRocks frontend node, which consists of the IP address of the frontend node and its HTTP port number in the IP address:port format. The default HTTP port number is 8030. To specify multiple frontend nodes, add each address as a separate element of the loadUrl array and separate the elements with commas (,).
  Required: Yes
  Default value: No default value

table
  Description: The name of the table to which you want to write data.
  Required: Yes
  Default value: No default value

preSql
  Description: The SQL statements that you want to execute before the synchronization node is run. For example, you can set this parameter to the TRUNCATE TABLE tablename statement to delete outdated data.
  Required: No
  Default value: No default value

postSql
  Description: The SQL statements that you want to execute after the synchronization node is run.
  Required: No
  Default value: No default value
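To show why the loadProps delimiters matter, the following Python sketch serializes rows into CSV text, with fields joined by column_separator and each row terminated by row_delimiter. It rejects values that contain either delimiter. The function names are hypothetical. Only the single-byte \xHH hex form is decoded, and the writer's real serialization (for example, NULL handling and escaping) is not modeled.

```python
def decode_delimiter(value: str) -> str:
    """Convert Stream Load hex notation such as \\x01 into the character it
    denotes. Simplification: only the single-byte form \\xHH is handled;
    any other value is returned unchanged."""
    if len(value) == 4 and value.startswith("\\x"):
        return chr(int(value[2:], 16))
    return value


def to_csv_payload(rows, column_separator: str = "\t", row_delimiter: str = "\n") -> str:
    """Serialize rows as CSV text with the given delimiters.

    The defaults mirror the Stream Load defaults described above (\\t and \\n).
    Raises ValueError if a value contains either delimiter, because such a
    row would be split incorrectly during the import.
    """
    col_sep = decode_delimiter(column_separator)
    row_sep = decode_delimiter(row_delimiter)
    out = []
    for row in rows:
        fields = [str(v) for v in row]
        for field in fields:
            if col_sep in field or row_sep in field:
                raise ValueError(f"{field!r} contains a delimiter; choose other delimiters")
        out.append(col_sep.join(fields) + row_sep)
    return "".join(out)


if __name__ == "__main__":
    rows = [(1, "a\tb"), (2, "line1\nline2")]
    try:
        to_csv_payload(rows)  # the default \t and \n conflict with the data
    except ValueError as err:
        print("default delimiters rejected:", err)
    # The same delimiters as in the loadProps example above.
    print(repr(to_csv_payload(rows, "\\x01", "\\x02")))
```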