
E-MapReduce: Stream Load

Last Updated: Mar 26, 2026

Stream Load is a synchronous HTTP-based method for loading local files or data streams directly into StarRocks. Submit a PUT request, and the HTTP response tells you immediately whether the load succeeded. Stream Load supports CSV and JSON, with a single-load limit of 10 GB.

Submit a load task

Submit a Stream Load task by sending a PUT request to the FE node. The examples in this topic use curl, but any HTTP client works.

Syntax

curl --location-trusted -u <username>:<password> \
     -H "Expect:100-continue" \
     -XPUT <url> \
     (data_desc) \
     [opt_properties]
Include Expect:100-continue in the request header. This prevents unnecessary data transfer if the server rejects the task before the payload is sent.

URL

http://<fe_host>:<fe_http_port>/api/<database_name>/<table_name>/_stream_load
Parameter | Required | Description
<fe_host> | Yes | IP address of the FE node.
<fe_http_port> | Yes | HTTP port of the FE node. Default: 18030. Run SHOW FRONTENDS to look up the address and port, or check the http_port parameter on the Configuration tab of the Cluster Service page.
<database_name> | Yes | Name of the database that contains the target table.
<table_name> | Yes | Name of the target table.
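In scripts, the URL is conveniently assembled from these four parts. A minimal Bash sketch; the host, port, database, and table below are made-up placeholders, not values from this topic's cluster:

```shell
# All values here are hypothetical; substitute your own FE address and target table.
fe_host="172.17.0.10"
fe_http_port=18030
database_name="load_test"
table_name="example_table"
url="http://${fe_host}:${fe_http_port}/api/${database_name}/${table_name}/_stream_load"
echo "$url"
```

The resulting variable can then be passed directly to curl as -XPUT "$url".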

Source data parameters

The data_desc block describes the source file and controls how StarRocks maps and filters the data.

Common parameters

Parameter | Required | Description
-T <file_path> | Yes | Path to the source data file. The filename can include or omit a file extension.
format | No | File format. Valid values: CSV (default) and JSON.
partitions | No | Target partitions. If omitted, data is loaded into all partitions of the table.
temporary_partitions | No | Target temporary partitions.
columns | No | Column mapping between the source file and the StarRocks table. Required when the file schema does not match the table schema. See Configure column mapping.

CSV parameters

Parameter | Required | Description
column_separator | No | Column delimiter. Default: \t. For invisible characters, use hexadecimal with the \x prefix. For example, -H "column_separator:\x01" specifies the default Hive delimiter.
row_delimiter | No | Row delimiter. Default: \n. Note: curl cannot pass \n directly; use Bash escaped string syntax, such as -H $'row_delimiter:\n'.
skip_header | No | Number of header rows to skip at the start of the CSV file. Default: 0.
where | No | Filter condition. Only rows that match the condition are loaded. Example: -H "where: k1 = 20180601".
max_filter_ratio | No | Maximum ratio of rows that can be filtered out due to data quality issues. Default: 0 (zero tolerance). Rows filtered by where are not counted.
strict_mode | No | Controls type-conversion filtering. false (default): disabled. true: strict filtering is applied during column type conversions.
timeout | No | Load timeout in seconds. Default: 600. Valid range: 1–259200.
timezone | No | Time zone for the load. Default: UTC+8. Affects all time zone-related functions in the load.
exec_mem_limit | No | Memory limit for the load. Default: 2 GB.
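The quoting rule for row_delimiter can be verified locally without a cluster. This Bash sketch shows why $'...' (ANSI-C quoting) is required for \n, whereas the \x01 form for column_separator is passed through literally and decoded by StarRocks itself:

```shell
# A double-quoted "\n" stays as the two characters backslash + n;
# Bash ANSI-C quoting ($'...') expands it to a single newline byte.
plain="row_delimiter:\n"     # 14 chars + '\' + 'n'  = 16 bytes
ansi=$'row_delimiter:\n'     # 14 chars + newline    = 15 bytes
printf '%s' "$plain" | wc -c
printf '%s' "$ansi" | wc -c
```

This is why the header must be written as -H $'row_delimiter:\n' rather than -H "row_delimiter:\n".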

JSON parameters

Parameter | Required | Description
jsonpaths | No | Fields to load, in JSON format. Required only for JSON matching mode.
strip_outer_array | No | Whether to strip the outermost JSON array. false (default): the entire array is imported as a single value. true: each element of the array is imported as a separate row.
json_root | No | Root element of the JSON data. Required only for JSON matching mode. Must be a valid JsonPath string. Default: empty (the entire JSON file is imported).
ignore_json_size | No | Whether to skip the JSON body size check. By default, the JSON body cannot exceed 100 MB. Set to true to skip the check, but note that large payloads may cause high memory usage.
compression / Content-Encoding | No | Compression algorithm for data transmission. Supported: GZIP, BZIP2, LZ4_FRAME, and Zstandard.

Load task options

opt_properties are optional settings that apply to the entire load task.

-H "label: <label_name>"
-H "where: <condition>"
-H "max_filter_ratio: <num>"
-H "log_rejected_record_num: <num>"
-H "timeout: <num>"
-H "strict_mode: true | false"
-H "timezone: <string>"
-H "load_mem_limit: <num>"
-H "partial_update: true | false"
-H "partial_update_mode: row | column"
-H "merge_condition: <column_name>"
Parameter | Required | Description
label | No | Label for the load task. Prevents duplicate imports: StarRocks rejects any task that reuses a label from a successfully completed task in the last 30 minutes.
where | No | Filter condition applied to transformed data. Only rows matching the condition are loaded.
max_filter_ratio | No | Maximum ratio of rows that can be filtered out due to data quality issues. Default: 0. Rows filtered by where are not counted.
log_rejected_record_num | No | Maximum number of filtered rows to log per BE (or CN) node. Supported in StarRocks 3.1 and later. Valid values: 0 (default, no logging), -1 (log all), or a positive integer.
timeout | No | Load timeout in seconds. Default: 600. Valid range: 1–259200.
strict_mode | No | false (default): disabled. true: strict filtering is applied.
timezone | No | Time zone for the load. Default: UTC+8.
load_mem_limit | No | Memory limit for the load task. Default: 2 GB.
partial_update | No | Whether to use partial column updates. Default: false.
partial_update_mode | No | Mode for partial updates. row (default): suitable for real-time updates of many columns in small batches. column: suitable for batch updates of a few columns across many rows. For example, updating 10 out of 100 columns for all rows can improve performance by a factor of 10.
merge_condition | No | Column used as the update condition. The update takes effect only when the imported value is greater than or equal to the current value. Must be a non-primary-key column. Supported only for primary key tables.
In StarRocks SQL, reserved keywords must be enclosed in backticks. See Keywords.
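The interaction between max_filter_ratio and where can be sanity-checked with simple arithmetic. A sketch with made-up counts, assuming the ratio is computed as filtered rows over rows that survive the where filter, that is, filtered / (total - unselected):

```shell
# Hypothetical counts from a load result: 1000 rows read, 50 dropped by
# `where` (NumberUnselectedRows), 30 rejected for data quality
# (NumberFilteredRows). Rows dropped by `where` do not count against
# max_filter_ratio.
total=1000; unselected=50; filtered=30
ratio=$(awk -v f="$filtered" -v t="$total" -v u="$unselected" \
    'BEGIN { printf "%.4f", f / (t - u) }')
echo "filter ratio: $ratio"
# 0.0316 <= 0.2, so a task submitted with -H "max_filter_ratio:0.2" succeeds;
# with the default of 0 it would fail.
```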

Return value

After the load completes, StarRocks returns the result in JSON format:

{
    "TxnId": 9,
    "Label": "label2",
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 4,
    "NumberLoadedRows": 4,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 45,
    "LoadTimeMs": 235,
    "BeginTxnTimeMs": 101,
    "StreamLoadPlanTimeMs": 102,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 11,
    "CommitAndPublishTimeMs": 19
}
Field | Description
TxnId | Transaction ID of the load.
Label | Label of the load task.
Status | Final status of the load. See the Load status table below.
ExistingJobStatus | Status of the existing task when Status is Label Already Exists. Valid values: RUNNING and FINISHED.
Message | Details about the load status. If the load fails, contains the failure reason.
NumberTotalRows | Total rows read from the data stream.
NumberLoadedRows | Rows successfully loaded. Valid only when Status is Success.
NumberFilteredRows | Rows filtered out due to data quality issues.
NumberUnselectedRows | Rows filtered out by the where condition.
LoadBytes | Size of the source data, in bytes.
LoadTimeMs | Total time for the load task, in milliseconds.
BeginTxnTimeMs | Time to start the transaction, in milliseconds.
StreamLoadPlanTimeMs | Time to generate the execution plan, in milliseconds.
ReadDataTimeMs | Time to read data, in milliseconds.
WriteDataTimeMs | Time to write data, in milliseconds.
CommitAndPublishTimeMs | Time to commit and publish the transaction, in milliseconds.
ErrorURL | URL for error details when the load fails. See Check error details.
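In automation, check the Status field rather than curl's exit code, because the HTTP request itself can succeed while the load fails. A minimal sketch in Bash plus python3 for JSON parsing; the response below is hard-coded for illustration rather than captured from a live cluster:

```shell
# In practice: response=$(curl --location-trusted -u "<username>:<password>" ... )
response='{"TxnId": 9, "Label": "label2", "Status": "Success", "Message": "OK", "NumberTotalRows": 4, "NumberLoadedRows": 4, "NumberFilteredRows": 0}'
status=$(printf '%s' "$response" | python3 -c 'import sys, json; print(json.load(sys.stdin)["Status"])')
case "$status" in
    "Success"|"Publish Timeout") echo "load ok: $status" ;;
    "Label Already Exists")      echo "retry with a new label" >&2 ;;
    *)                           echo "load failed: $status" >&2 ;;
esac
```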

Load status

Status | Meaning | Action
Success | Data is loaded and visible. | None.
Publish Timeout | The load was submitted successfully, but data visibility may be delayed. | No retry needed.
Label Already Exists | The label is already used by another task. | Change the label and resubmit.
Fail | The load failed. | Check ErrorURL for details, fix the issue, and retry.

Check error details

When Status is Fail, use ErrorURL to retrieve details about filtered rows:

# View error rows directly
curl "<ErrorURL>"

# Save error rows to a local file
wget "<ErrorURL>"

Example:

wget "http://172.17.**.**:18040/api/_load_error_log?file=error_log_b74dccdcf0ceb4de_e82b2709c6c013ad"

Cancel a load task

Stream Load tasks cannot be canceled manually. A task is automatically canceled if it times out or an import error occurs.

Examples

Load CSV data

This example loads data.csv into example_table in the load_test database:

curl --location-trusted -u "root:" \
     -H "Expect:100-continue" \
     -H "label:label2" \
     -H "column_separator:," \
     -T data.csv -XPUT \
     http://172.17.**.**:18030/api/load_test/example_table/_stream_load

Load JSON data

This example loads JSON records from json.data:

curl --location-trusted -u "root:" \
     -H "Expect:100-continue" \
     -H "label:label2" \
     -H "format:json" \
     -T json.data -XPUT \
     http://172.17.**.**:18030/api/load_test/example_table/_stream_load

Set error tolerance

To allow up to 20% of rows to be filtered out due to data quality issues:

curl --location-trusted -u "root:" \
     -H "Expect:100-continue" \
     -H "label:label3" \
     -H "column_separator:," \
     -H "max_filter_ratio:0.2" \
     -T data.csv -XPUT \
     http://172.17.**.**:18030/api/load_test/example_table/_stream_load

Filter rows by condition

To load only rows where k1 = 20180601:

curl --location-trusted -u "root:" \
     -H "Expect:100-continue" \
     -H "label:label4" \
     -H "column_separator:," \
     -H "where:k1 = 20180601" \
     -T data.csv -XPUT \
     http://172.17.**.**:18030/api/load_test/example_table/_stream_load

Configure column mapping

When the source file column order does not match the table schema, use the columns parameter to specify the mapping.

Example 1: The table has columns c1, c2, c3. The source file columns are in the order c3, c2, c1:

curl --location-trusted -u "root:" \
     -H "Expect:100-continue" \
     -H "label:label5" \
     -H "column_separator:," \
     -H "columns:c3, c2, c1" \
     -T data.csv -XPUT \
     http://172.17.**.**:18030/api/load_test/example_table/_stream_load

Example 2: The source file has an extra column that does not exist in the table. Use a placeholder for the extra column:

curl --location-trusted -u "root:" \
     -H "Expect:100-continue" \
     -H "label:label6" \
     -H "column_separator:," \
     -H "columns:c1, c2, c3, temp" \
     -T data.csv -XPUT \
     http://172.17.**.**:18030/api/load_test/example_table/_stream_load

Example 3: The table has columns year, month, day. The source file has a single timestamp column in 2018-06-01 01:02:03 format. Use derived column expressions:

curl --location-trusted -u "root:" \
     -H "Expect:100-continue" \
     -H "label:label7" \
     -H "column_separator:," \
     -H "columns:col, year=year(col), month=month(col), day=day(col)" \
     -T data.csv -XPUT \
     http://172.17.**.**:18030/api/load_test/example_table/_stream_load

Load compressed data

To load an LZ4-compressed JSON file:

curl --location-trusted -u "root:" \
     -H "Expect:100-continue" \
     -H "label:label8" \
     -H "format:json" \
     -H "compression:lz4_frame" \
     -T data.json.lz4 -XPUT \
     http://172.17.**.**:18030/api/load_test/example_table/_stream_load

End-to-end example

Step 1: Create the target table

  1. Log in to the master node of the StarRocks cluster using SSH. For details, see Log on to a cluster.

  2. Connect to the StarRocks cluster using a MySQL client:

    mysql -h127.0.0.1 -P 9030 -uroot
  3. Create the database and table:

    CREATE DATABASE IF NOT EXISTS load_test;
    USE load_test;
    
    CREATE TABLE IF NOT EXISTS example_table (
        id INT,
        name VARCHAR(50),
        age INT
    )
    DUPLICATE KEY(id)
    DISTRIBUTED BY HASH(id) BUCKETS 3
    PROPERTIES (
        "replication_num" = "1"  -- Set the number of replicas to 1.
    );

    Press Ctrl+D to exit the MySQL client.

Step 2: Prepare the source data

Prepare CSV data

Create a file named data.csv with the following content:

id,name,age
1,Alice,25
2,Bob,30
3,Charlie,35

Prepare JSON data

Create a file named json.data with the following content:

{"id":1,"name":"Emily","age":25}
{"id":2,"name":"Benjamin","age":35}
{"id":3,"name":"Olivia","age":28}
{"id":4,"name":"Alexander","age":60}
{"id":5,"name":"Ava","age":17}
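Because a malformed record counts against max_filter_ratio, it can be worth validating the newline-delimited JSON locally before submitting. A hypothetical pre-flight check, with the records inlined so the sketch is self-contained (point python3 at json.data in real use):

```shell
# Validate that every newline-delimited record parses as JSON.
records='{"id":1,"name":"Emily","age":25}
{"id":2,"name":"Benjamin","age":35}
{"id":3,"name":"Olivia","age":28}'
count=$(printf '%s\n' "$records" | python3 -c 'import json, sys; print(len([json.loads(l) for l in sys.stdin if l.strip()]))')
echo "ok: $count records"
```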

Step 3: Run the load task

Import CSV data

Load the CSV data. Because data.csv starts with a header row, skip_header:1 is added so that the header is not rejected as invalid data:

curl --location-trusted -u "root:" \
     -H "Expect:100-continue" \
     -H "label:label1" \
     -H "column_separator:," \
     -H "skip_header:1" \
     -T data.csv -XPUT \
     http://172.17.**.**:18030/api/load_test/example_table/_stream_load

Import JSON data

Load JSON data:

curl --location-trusted -u "root:" \
     -H "Expect:100-continue" \
     -H "label:label2" \
     -H "format:json" \
     -T json.data -XPUT \
     http://172.17.**.**:18030/api/load_test/example_table/_stream_load
