Usage of the INSERT INTO statement in StarRocks is similar to that in MySQL and other databases. However, in StarRocks, all data writes are performed by using independent import jobs. Therefore, INSERT INTO is described as an import method in StarRocks. This topic describes the scenarios, configurations, and examples of the INSERT INTO import method.

Scenarios

  • The result is synchronously returned if the INSERT INTO statement is executed to import data.
  • If you want to import a few data rows to test the StarRocks system, you can execute the INSERT INTO VALUES statement.
  • If you want to perform extract, transform, and load (ETL) operations on the data in the StarRocks table and import the data into another StarRocks table, you can execute the INSERT INTO SELECT statement.
  • You can create a foreign table. For example, you create a MySQL foreign table that maps data in a MySQL table. Then, you can import the data in the foreign table into the StarRocks table by executing the INSERT INTO SELECT statement.

Precautions

  • By default, the data that does not conform to the format requirements of the destination table is filtered out when you execute the INSERT INTO statement. For example, a string that exceeds the specified length is filtered out. For business scenarios in which data cannot be filtered out, you can set the enable_insert_strict session variable to true to ensure that the INSERT INTO statement is not executed if data is filtered out.
  • In StarRocks, the INSERT INTO statement reuses the import data logic. As a result, each execution of the INSERT INTO statement generates a new data version. If you frequently perform operations to import a small amount of data, excessive data versions are generated. This affects the query performance. We recommend that you do not frequently execute the INSERT INTO statement to import data or use it as a routine import job in the production environment. If you need to import stream data or small batches of data, you can use the Stream Load or Routine Load method.

Basic operations

Syntax

INSERT INTO table_name
[ PARTITION (p1, ...) ]
[ WITH LABEL label]
[ (column [, ...]) ]
[ [ hint [, ...] ] ]
{ VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
The following table describes the parameters.
Parameter Description
table_name The name of the destination table to which the data is imported. Enter the table name in the db_name.table_name format.
partitions Specifies the partitions into which the data is imported. The partitions that you specify must exist in the table_name table. Separate multiple partition names with commas (,). If you specify a partition, only the data that meets the requirements of the partition is imported. If this parameter is left empty, the default value is the names of all partitions in the table_name table.
label Specifies a label for an import job. The label is used to identify the INSERT INTO job. Each import job has a unique label within a single database.
Important We recommend that you specify a label for an import job instead of using the label that is automatically allocated by the system. If you use the label that is allocated by the system, you cannot know the execution result of the INSERT INTO statement when the connection is closed due to a network error. If you specify a label for the import job, you can query the execution result of the INSERT INTO statement by using the label.
column_name Specifies destination columns. The columns must exist in the table_name table.

The destination columns to which the data is imported from the source table. The columns can exist in any order. By default, all columns of the table_name table are destination columns if you leave this parameter empty. If a column in the source table does not exist in the destination columns, the column must have a default value. Otherwise, the INSERT INTO statement fails to be executed. If the type of the column returned by the query statement is different from that of the destination column, an implicit conversion is performed. If the conversion fails, the INSERT INTO statement returns a syntax parsing error.

expression An expression that assigns values to a column.
default Assigns default values to a column.
query A common query statement whose result is written to the destination table. All SQL statements supported by StarRocks can be used.
values You can write one or more data rows by executing the INSERT INTO VALUES statement.
Important You can execute the INSERT INTO VALUES statement to verify a demo by using several data rows. The statement is not applicable to any testing or production environment. StarRocks is not suitable for the scenario in which only a single data row is imported. We recommend that you import multiple data rows by executing the INSERT INTO SELECT statement.

Import results

The following sample return results are provided for the INSERT INTO statement, which is also an SQL statement.
  • Sample results for a successful execution
    • Example 1
      Execute the insert into tbl1 select * from empty_tbl; statement. The following result is returned:
      Query OK, 0 rows affected (0.02 sec)
    • Example 2
      Execute the insert into tbl1 select * from tbl2; statement. The following result is returned:
      Query OK, 4 rows affected (0.38 sec)
      {'label':'insert_8510c568-9eda-4173-9e36-6adc7d35291c', 'status':'visible', 'txnId':'4005'}
    • Example 3
      Execute the insert into tbl1 with label my_label1 select * from tbl2; statement. The following result is returned:
      Query OK, 4 rows affected (0.38 sec)
      {'label':'my_label1', 'status':'visible', 'txnId':'4005'}
    • Example 4
      Execute the insert into tbl1 select * from tbl2; statement. The following result is returned:
      Query OK, 2 rows affected, 2 warnings (0.31 sec)
      {'label':'insert_f0747f0e-7a35-46e2-affa-13a235f4020d', 'status':'visible', 'txnId':'4005'}
      The following result can also be returned:
      Query OK, 2 rows affected, 2 warnings (0.31 sec)
      {'label':'insert_f0747f0e-7a35-46e2-affa-13a235f4020d', 'status':'committed', 'txnId':'4005'}
    The following table describes the parameters involved in the return results.
    Parameter Description
    rows affected Indicates the total number of data rows that are imported.
    warnings Indicates the number of data rows that are filtered out.
    status Indicates whether the imported data is visible. Valid values:
    • visible: The imported data is visible.
    • committed: The imported data is invisible.
    txnId The ID of the INSERT transaction.
    err Indicates unexpected errors. If you need to view the data rows that are filtered out, you can execute the SHOW LOAD statement. The URL in the returned result can be used to query the error data.
  • Sample results for a failed execution

    A failed execution indicates that no data is imported.

    For example, you execute the insert into tbl1 select * from tbl2 where k1 = "a"; statement. The following result is returned:
    ERROR 1064 (HY000): all partitions have no load data. url: [http://10.74.167.16:8042/api/_load_error_log?file=__shard_2/error_log_insert_stmt_ba8bb9e158e4879-ae8de8507c0bf8a2_ba8bb9e158e4879_ae8de8507c0bf8a2](http://10.74.**.**:8042/api/_load_error_log?file=__shard_2/error_log_insert_stmt_ba8bb9e158e4879-ae8de8507c0bf8a2_ba8bb9e158e4879_ae8de8507c0bf8a2)

    The ERROR 1064 (HY000): all partitions have no load data error in the returned result shows the cause of the failure. The URL can be used to query the error data.

Related parameters

Parameters of FE nodes

timeout: The timeout period of the import job. Unit: seconds. If the import job is not complete within the specified timeout period, the system cancels the job. In this case, the job is in the CANCELLED state. The execution of INSERT INTO statement does not support the custom timeout period. The timeout period for all import jobs that use the INSERT INTO import method is the same. The default timeout period is 1 hour. If the import task is not complete within the specified timeout period, you need to adjust the value of the insert_load_default_timeout_second parameter of the frontend (FE) node.

Session parameters

Parameter Description
enable_insert_strict The INSERT INTO import method cannot control the tolerable failure rate of an import job. You can set the enable_insert_strict session parameter to control the tolerable failure rate of an import job. Valid values:
  • true: specifies that the import job fails if an error occurs for one data row. This is the default value.
  • false: specifies that a success result is returned if at least one data row is imported. If the import job fails, a label is returned.

    You can set this parameter to false by running the SET enable_insert_strict = false command.

query_timeout The INSERT INTO statement is also an SQL statement. As a result, the INSERT INTO statement is limited by the query_timeout session parameter. You can increase the timeout period by running the SET query_timeout = xxx command. The timeout period is measured in seconds.

Examples

Create databases and tables

  1. Execute the following statement to create a database:
    CREATE DATABASE IF NOT EXISTS load_test;
  2. Execute the following statement to specify a database:
    USE load_test;
  3. Execute the following statement to create a table:
    CREATE TABLE insert_wiki_edit
    (
        event_time DATETIME,
        channel VARCHAR(32) DEFAULT '',
        user VARCHAR(128) DEFAULT '',
        is_anonymous TINYINT DEFAULT '0',
        is_minor TINYINT DEFAULT '0',
        is_new TINYINT DEFAULT '0',
        is_robot TINYINT DEFAULT '0',
        is_unpatrolled TINYINT DEFAULT '0',
        delta INT SUM DEFAULT '0',
        added INT SUM DEFAULT '0',
        deleted INT SUM DEFAULT '0'
    )
    AGGREGATE KEY(event_time, channel, user, is_anonymous, is_minor, is_new, is_robot, is_unpatrolled)
    PARTITION BY RANGE(event_time)
    (
        PARTITION p06 VALUES LESS THAN ('2015-09-12 06:00:00'),
        PARTITION p12 VALUES LESS THAN ('2015-09-12 12:00:00'),
        PARTITION p18 VALUES LESS THAN ('2015-09-12 18:00:00'),
        PARTITION p24 VALUES LESS THAN ('2015-09-13 00:00:00')
    )
    DISTRIBUTED BY HASH(user) BUCKETS 10
    PROPERTIES("replication_num" = "1");

Import data by executing the INSERT INTO VALUES statement

Execute the following statement to import data:
INSERT INTO insert_wiki_edit VALUES("2015-09-12 00:00:00","#en.wikipedia","GELongstreet",0,0,0,0,0,36,36,0),("2015-09-12 00:00:00","#ca.wikipedia","PereBot",0,1,0,1,0,17,17,0);
The following information is returned:
Query OK, 2 rows affected (0.29 sec)
{'label':'insert_1f12c916-5ff8-4ba9-8452-6fc37fac2e75', 'status':'visible', 'txnId':'601'}

Import data by executing the INSERT INTO SELECT statement

Execute the following statement to import data:
INSERT INTO insert_wiki_edit WITH LABEL insert_load_wikipedia SELECT * FROM routine_wiki_edit;
The following information is returned:
Query OK, 18203 rows affected (0.40 sec)
{'label':'insert_load_wikipedia', 'status':'visible', 'txnId':'618'}