Stream Load is a synchronous HTTP-based method for loading local files or data streams directly into StarRocks. Submit a PUT request, and the HTTP response tells you immediately whether the load succeeded. Stream Load supports CSV and JSON, with a single-load limit of 10 GB.
Submit a load task
Stream Load uses curl to send a PUT request to the FE node. Other HTTP clients work too.
Syntax
curl --location-trusted -u <username>:<password> \
-H "Expect:100-continue" \
-XPUT <url> \
(data_desc) \
[opt_properties]
Include Expect:100-continue in the request header. This prevents unnecessary data transfer if the server rejects the task before the payload is sent.
URL
http://<fe_host>:<fe_http_port>/api/<database_name>/<table_name>/_stream_load
| Parameter | Required | Description |
|---|---|---|
| `<fe_host>` | Yes | IP address of the FE node |
| `<fe_http_port>` | Yes | HTTP port of the FE node. Default: 18030. Run `SHOW FRONTENDS` to look up the address and port, or check the `http_port` parameter on the Configuration tab of the Cluster Service page. |
| `<database_name>` | Yes | Name of the database containing the target table |
| `<table_name>` | Yes | Name of the target table |
Source data parameters
The data_desc block describes the source file and controls how StarRocks maps and filters the data.
Common parameters
| Parameter | Required | Description |
|---|---|---|
| `-T <file_path>` | Yes | Path to the source data file. The filename can include or omit a file extension. |
| `format` | No | File format. Valid values: `CSV` (default), `JSON` |
| `partitions` | No | Target partitions. If omitted, data is loaded into all partitions of the table. |
| `temporary_partitions` | No | Target temporary partitions |
| `columns` | No | Column mapping between the source file and the StarRocks table. Required when the file schema does not match the table schema. See Column mapping examples. |
CSV parameters
| Parameter | Required | Description |
|---|---|---|
| `column_separator` | No | Column delimiter. Default: `\t`. For invisible characters, use hexadecimal with the `\x` prefix. For example, use `-H "column_separator:\x01"` for the Hive delimiter. |
| `row_delimiter` | No | Row delimiter. Default: `\n`. |
| `skip_header` | No | Number of header rows to skip at the start of the CSV file. Default: 0. |
| `where` | No | Filter condition. Only rows that match the condition are loaded. Example: `-H "where: k1 = 20180601"` |
| `max_filter_ratio` | No | Maximum ratio of rows that can be filtered out due to data quality issues. Default: 0 (zero tolerance). Rows filtered by `where` are not counted. |
| `strict_mode` | No | Controls type-conversion filtering. `false` (default): disabled. `true`: strict filtering applied during column type conversions. |
| `timeout` | No | Load timeout in seconds. Default: 600. Valid range: 1–259200. |
| `timezone` | No | Time zone for the load. Default: UTC+8. Affects all time zone-related functions in the load. |
| `exec_mem_limit` | No | Memory limit for the load. Default: 2 GB. |
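For instance, skipping a header row is combined with the delimiter settings when loading a typical CSV export. The command below is a sketch, not a runnable example: the label, file name, and FE host are placeholders to adapt to your cluster.

```shell
# Sketch: load a CSV file whose first line is a header row.
# <fe_host>, the label, and data_with_header.csv are placeholders.
curl --location-trusted -u "root:" \
    -H "Expect:100-continue" \
    -H "label:skip_header_example" \
    -H "column_separator:," \
    -H "skip_header:1" \
    -T data_with_header.csv -XPUT \
    http://<fe_host>:18030/api/<database_name>/<table_name>/_stream_load
```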
JSON parameters
| Parameter | Required | Description |
|---|---|---|
| `jsonpaths` | No | Fields to load, in JSON format. Required only for JSON matching mode. |
| `strip_outer_array` | No | Whether to strip the outermost JSON array. `false` (default): the entire array is imported as a single value. `true`: each element of the array is imported as a separate row. |
| `json_root` | No | Root element of the JSON data. Required only for JSON matching mode. Must be a valid JsonPath string. Default: empty (the entire JSON file is imported). |
| `ignore_json_size` | No | Whether to skip the JSON body size check. By default, the JSON body cannot exceed 100 MB. Set to `true` to skip the check, but note that large payloads may cause high memory usage. |
| `compression` / `Content-Encoding` | No | Compression algorithm for data transmission. Supported: GZIP, BZIP2, LZ4_FRAME, Zstandard. |
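In JSON matching mode, `json_root`, `strip_outer_array`, and `jsonpaths` are typically combined to pick fields out of a wrapped array. The command below is a sketch: the `RECORDS` wrapper, the field names, the file name, and the FE host are hypothetical placeholders.

```shell
# Sketch: load from wrapped.json, a hypothetical file shaped like
#   {"RECORDS": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]}
# json_root points at the array, strip_outer_array splits it into rows,
# and jsonpaths selects the fields mapped by the columns header.
curl --location-trusted -u "root:" \
    -H "Expect:100-continue" \
    -H "label:json_matched_example" \
    -H "format:json" \
    -H 'json_root:$.RECORDS' \
    -H "strip_outer_array:true" \
    -H 'jsonpaths:["$.id","$.name"]' \
    -H "columns:id, name" \
    -T wrapped.json -XPUT \
    http://<fe_host>:18030/api/<database_name>/<table_name>/_stream_load
```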
Load task options
opt_properties are optional settings that apply to the entire load task.
-H "label: <label_name>"
-H "where: <condition>"
-H "max_filter_ratio: <num>"
-H "timeout: <num>"
-H "strict_mode: true | false"
-H "timezone: <string>"
-H "load_mem_limit: <num>"
-H "partial_update: true | false"
-H "partial_update_mode: row | column"
-H "merge_condition: <column_name>"
| Parameter | Required | Description |
|---|---|---|
| `label` | No | Label for the load task. Prevents duplicate imports. StarRocks rejects any task that reuses a label from a successfully completed task in the last 30 minutes. |
| `where` | No | Filter condition applied to transformed data. Only rows matching the condition are loaded. |
| `max_filter_ratio` | No | Maximum ratio of rows that can be filtered out due to data quality issues. Default: 0. Rows filtered by `where` are not counted. |
| `log_rejected_record_num` | No | Maximum number of filtered rows to log per BE (or CN) node. Supported in StarRocks 3.1 and later. Valid values: 0 (default, no logging), -1 (log all), or a positive integer. |
| `timeout` | No | Load timeout in seconds. Default: 600. Valid range: 1–259200. |
| `strict_mode` | No | `false` (default): disabled. `true`: strict filtering applied. |
| `timezone` | No | Time zone for the load. Default: UTC+8. |
| `load_mem_limit` | No | Memory limit for the load task. Default: 2 GB. |
| `partial_update` | No | Whether to use partial column updates. Default: `false`. |
| `partial_update_mode` | No | Mode for partial updates. `row` (default): suitable for real-time updates of many columns in small batches. `column`: suitable for batch updates of a few columns across many rows. For example, updating 10 out of 100 columns for all rows can improve performance by 10x. |
| `merge_condition` | No | Column used as the update condition. The update takes effect only when the imported value is greater than or equal to the current value. Must be a non-primary-key column. Supported only for primary key tables. |
In StarRocks SQL, reserved keywords must be enclosed in backticks. See Keywords.
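As a sketch of a partial column update (the target table must use the Primary Key model; the column names, file name, and FE host below are placeholders), `partial_update` is combined with a `columns` header listing the key column plus only the columns being updated:

```shell
# Sketch: update only the name column for existing rows keyed by id.
# partial.csv is a hypothetical file containing "id,name" pairs.
curl --location-trusted -u "root:" \
    -H "Expect:100-continue" \
    -H "label:partial_update_example" \
    -H "column_separator:," \
    -H "partial_update:true" \
    -H "columns:id, name" \
    -T partial.csv -XPUT \
    http://<fe_host>:18030/api/<database_name>/<table_name>/_stream_load
```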
Return value
After the load completes, StarRocks returns the result in JSON format:
{
"TxnId": 9,
"Label": "label2",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 4,
"NumberLoadedRows": 4,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 45,
"LoadTimeMs": 235,
"BeginTxnTimeMs": 101,
"StreamLoadPlanTimeMs": 102,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 11,
"CommitAndPublishTimeMs": 19
}
| Field | Description |
|---|---|
| `TxnId` | Transaction ID of the load |
| `Label` | Label of the load task |
| `Status` | Final status of the load. See the table below. |
| `ExistingJobStatus` | Status of the existing task when `Status` is `Label Already Exists`. Values: `RUNNING` or `FINISHED`. |
| `Message` | Details about the load status. If the load fails, contains the failure reason. |
| `NumberTotalRows` | Total rows read from the data stream |
| `NumberLoadedRows` | Rows successfully loaded. Valid only when `Status` is `Success`. |
| `NumberFilteredRows` | Rows filtered out due to data quality issues |
| `NumberUnselectedRows` | Rows filtered out by the `where` condition |
| `LoadBytes` | Size of the loaded data, in bytes |
| `LoadTimeMs` | Total time for the load task, in milliseconds |
| `BeginTxnTimeMs` | Time to start the transaction, in milliseconds |
| `StreamLoadPlanTimeMs` | Time to generate the execution plan, in milliseconds |
| `ReadDataTimeMs` | Time to read data, in milliseconds |
| `WriteDataTimeMs` | Time to write data, in milliseconds |
| `CommitAndPublishTimeMs` | Time to commit and publish the transaction, in milliseconds |
| `ErrorURL` | URL for error details when the load fails. See Check error details. |
Load status
| Status | Meaning | Action |
|---|---|---|
| `Success` | Data loaded and visible | None |
| `Publish Timeout` | Load submitted successfully, but data visibility may be delayed | No retry needed |
| `Label Already Exists` | The label is already used by another task | Change the label and resubmit |
| `Fail` | Load failed | Check `ErrorURL` for details, fix the issue, and retry |
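In automation, the `Status` field of the response is what determines whether a retry is needed. The sketch below is an illustration, not part of the StarRocks API: the sample response and the `response.json` file name are assumptions, and only `Success` and `Publish Timeout` are treated as success, matching the table above.

```shell
# Sample response for illustration; in practice, save the real response with:
#   curl ... -o response.json
cat > response.json <<'EOF'
{"TxnId": 9, "Label": "label2", "Status": "Success", "Message": "OK"}
EOF

# Extract the Status field (simple sed parse; assumes a one-line JSON body).
status=$(sed -n 's/.*"Status": *"\([^"]*\)".*/\1/p' response.json)

case "$status" in
    "Success"|"Publish Timeout")
        echo "Load OK: $status"
        ;;
    "Label Already Exists")
        echo "Resubmit with a new label" >&2
        ;;
    *)
        echo "Load failed: $status (check ErrorURL in the response)" >&2
        ;;
esac
```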
Check error details
When Status is Fail, use ErrorURL to retrieve details about filtered rows:
# View error rows directly
curl "<ErrorURL>"
# Save error rows to a local file
wget "<ErrorURL>"
Example:
wget "http://172.17.**.**:18040/api/_load_error_log?file=error_log_b74dccdcf0ceb4de_e82b2709c6c013ad"
Cancel a load task
Stream Load tasks cannot be canceled manually. A task is automatically canceled if it times out or an import error occurs.
Examples
Load CSV data
This example loads data.csv into example_table in the load_test database:
curl --location-trusted -u "root:" \
-H "Expect:100-continue" \
-H "label:label2" \
-H "column_separator:," \
-T data.csv -XPUT \
http://172.17.**.**:18030/api/load_test/example_table/_stream_load
Load JSON data
This example loads JSON records from json.data:
curl --location-trusted -u "root:" \
-H "Expect:100-continue" \
-H "label:label2" \
-H "format:json" \
-T json.data -XPUT \
http://172.17.**.**:18030/api/load_test/example_table/_stream_load
Set error tolerance
To allow up to 20% of rows to be filtered out due to data quality issues:
curl --location-trusted -u "root:" \
-H "Expect:100-continue" \
-H "label:label3" \
-H "column_separator:," \
-H "max_filter_ratio:0.2" \
-T data.csv -XPUT \
http://172.17.**.**:18030/api/load_test/example_table/_stream_load
Filter rows by condition
To load only rows where k1 = 20180601:
curl --location-trusted -u "root:" \
-H "Expect:100-continue" \
-H "label:label4" \
-H "column_separator:," \
-H "where:k1 = 20180601" \
-T data.csv -XPUT \
http://172.17.**.**:18030/api/load_test/example_table/_stream_load
Configure column mapping
When the source file column order does not match the table schema, use the columns parameter to specify the mapping.
Example 1: The table has columns c1, c2, c3. The source file columns are in the order c3, c2, c1:
curl --location-trusted -u "root:" \
-H "Expect:100-continue" \
-H "label:label5" \
-H "column_separator:," \
-H "columns:c3, c2, c1" \
-T data.csv -XPUT \
http://172.17.**.**:18030/api/load_test/example_table/_stream_load
Example 2: The source file has an extra column that does not exist in the table. Use a placeholder for the extra column:
curl --location-trusted -u "root:" \
-H "Expect:100-continue" \
-H "label:label6" \
-H "column_separator:," \
-H "columns:c1, c2, c3, temp" \
-T data.csv -XPUT \
http://172.17.**.**:18030/api/load_test/example_table/_stream_load
Example 3: The table has columns year, month, day. The source file has a single timestamp column in 2018-06-01 01:02:03 format. Use derived column expressions:
curl --location-trusted -u "root:" \
-H "Expect:100-continue" \
-H "label:label7" \
-H "column_separator:," \
-H "columns:col, year=year(col), month=month(col), day=day(col)" \
-T data.csv -XPUT \
http://172.17.**.**:18030/api/load_test/example_table/_stream_load
Load compressed data
To load an LZ4-compressed JSON file:
curl --location-trusted -u "root:" \
-H "Expect:100-continue" \
-H "label:label8" \
-H "format:json" \
-H "compression:lz4_frame" \
-T data.json.lz4 -XPUT \
http://172.17.**.**:18030/api/load_test/example_table/_stream_load
End-to-end example
Step 1: Create the target table
- Log in to the master node of the StarRocks cluster using SSH. For details, see Log on to a cluster.

- Connect to the StarRocks cluster using a MySQL client:

  mysql -h127.0.0.1 -P 9030 -uroot

- Create the database and table:

  CREATE DATABASE IF NOT EXISTS load_test;
  USE load_test;
  CREATE TABLE IF NOT EXISTS example_table (
      id INT,
      name VARCHAR(50),
      age INT
  )
  DUPLICATE KEY(id)
  DISTRIBUTED BY HASH(id) BUCKETS 3
  PROPERTIES (
      "replication_num" = "1" -- Set the number of replicas to 1.
  );

  Press Ctrl+D to exit the MySQL client.
Step 2: Prepare the source data
Prepare CSV data
Create a file named data.csv with the following content:
id,name,age
1,Alice,25
2,Bob,30
3,Charlie,35
Prepare JSON data
Create a file named json.data with the following content:
{"id":1,"name":"Emily","age":25}
{"id":2,"name":"Benjamin","age":35}
{"id":3,"name":"Olivia","age":28}
{"id":4,"name":"Alexander","age":60}
{"id":5,"name":"Ava","age":17}
Step 3: Run the load task
Import CSV data
Load CSV data:
curl --location-trusted -u "root:" \
-H "Expect:100-continue" \
-H "label:label1" \
-H "column_separator:," \
-T data.csv -XPUT \
http://172.17.**.**:18030/api/load_test/example_table/_stream_load
Import JSON data
Load JSON data:
curl --location-trusted -u "root:" \
-H "Expect:100-continue" \
-H "label:label2" \
-H "format:json" \
-T json.data -XPUT \
http://172.17.**.**:18030/api/load_test/example_table/_stream_load
What's next
- For Java integration examples, see the stream_load demo.
- For Spark Streaming integration, see SparkStreaming to StarRocks.