ApsaraDB for SelectDB works with Object Storage Service (OSS) to provide the high-speed import feature over an internal network. You can use this feature to import hundreds of GB of data at a time. This topic describes how to use OSS to import data to an ApsaraDB for SelectDB instance.
Prerequisites
An AccessKey pair is obtained. For more information, see Create an AccessKey pair.
OSS is activated and an OSS bucket is created. For more information, see Get started by using the OSS console.
The OSS bucket resides in the same region as the ApsaraDB for SelectDB instance to which you want to import data, so that OSS can be accessed over the internal network of a virtual private cloud (VPC).
Syntax for importing data
LOAD LABEL load_label
(
data_desc1[, data_desc2, ...]
)
WITH S3
(
"AWS_PROVIDER" = "OSS",
"AWS_REGION" = "REGION",
"AWS_ENDPOINT" = "ENDPOINT",
"AWS_ACCESS_KEY" = "AccessKey ID",
"AWS_SECRET_KEY"="AccessKey Secret"
)
PROPERTIES
(
"key1"="value1", ...
);

ApsaraDB for SelectDB allows you to import data from all object storage systems that are compatible with the Amazon Simple Storage Service (Amazon S3) protocol. Therefore, the syntax for importing data uses keywords such as AWS and S3.
Parameters
The data_desc1 parameter is used to describe a group of files to be imported. The following sample code provides the complete definition of the parameter.
[MERGE|APPEND|DELETE]
DATA INFILE
(
"file_path1"[, file_path2, ...]
)
[NEGATIVE]
INTO TABLE `table_name`
[PARTITION (p1, p2, ...)]
[COLUMNS TERMINATED BY "column_separator"]
[FORMAT AS "file_type"]
[(column_list)]
[COLUMNS FROM PATH AS (c1, c2, ...)]
[PRECEDING FILTER predicate]
[SET (column_mapping)]
[WHERE predicate]
[DELETE ON expr]
[ORDER BY source_sequence]
[PROPERTIES ("key1"="value1", ...)]

| Parameter | Description |
| --- | --- |
| merge_type | The type of data merging. Default value: APPEND. Valid values: APPEND, DELETE, and MERGE. APPEND appends the data of this batch to existing data. DELETE deletes all rows that have the same keys as the data of this batch. MERGE must be used together with the DELETE ON clause. |
| file_path | The path of the file to be imported. You can specify multiple values for this parameter. Wildcards are supported. The path that you specify must be the path of a file. If the path is the path of a directory, the import fails. |
| NEGATIVE | Imports the data of this batch in negative mode. This mode is applicable only to aggregate tables that contain integer columns aggregated by using the SUM method. In this mode, the values of the SUM-aggregated columns in the imported data are negated to offset error data that was previously imported. |
| PARTITION | The partitions of the table into which data is imported. Data that does not belong to the specified partitions is ignored. |
| COLUMNS TERMINATED BY | The column delimiter. The column delimiter is valid only for files in the CSV format. |
| FORMAT AS | The format of the file to be imported. Valid values: CSV, PARQUET, and ORC. Default value: CSV. |
| column_list | The order in which the columns are sorted in the source file. For more information, see Converting Source Data. |
| COLUMNS FROM PATH AS | The columns that are extracted from the path of the file to be imported. |
| PRECEDING FILTER | The preset conditions for filtering raw data. The data is concatenated into source data rows in sequence based on the column_list and COLUMNS FROM PATH AS clauses, and then filtered based on the specified conditions. |
| SET | The function that is used to convert the specified column. |
| WHERE | The conditions based on which the imported data is filtered. For more information, see Converting Source Data. |
| DELETE ON | This clause must be used together with the MERGE merge type and specifies the condition based on which rows in the imported data are marked for deletion. |
| ORDER BY | The sequence_col column in the data to be imported. This clause is applicable only to tables in the Unique key model and ensures the data sequence when data is imported. |
| PROPERTIES | The parameters that specify the format of the file to be imported, such as parsing options for the specified file type. |
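To illustrate how the merge_type and DELETE ON clauses work together, the following sketch merges a batch into a Unique key table and deletes the rows whose flag column equals 1. The database, table, bucket path, column names, and credentials are placeholders, not values from this topic:

```sql
-- Hypothetical example: merge a CSV batch into a Unique key table.
-- Rows whose "flag" column equals 1 are deleted; all other rows are upserted.
LOAD LABEL example_db.example_merge_label
(
    MERGE
    DATA INFILE("s3://example_bucket/example_file.txt")
    INTO TABLE `example_table`
    COLUMNS TERMINATED BY ","
    (id, name, age, flag)
    DELETE ON flag = 1
)
WITH S3
(
    "AWS_PROVIDER" = "OSS",
    "AWS_REGION" = "oss-cn-beijing",
    "AWS_ENDPOINT" = "oss-cn-beijing-internal.aliyuncs.com",
    "AWS_ACCESS_KEY" = "<your_access_key>",
    "AWS_SECRET_KEY" = "<your_secret_key>"
)
PROPERTIES
(
    "timeout" = "3600"
);
```

If the same statement uses APPEND instead of MERGE, the DELETE ON clause must be removed and all rows of the batch are imported as new data.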
The following table describes the parameters that are used to configure the cloud storage service.

| Parameter | Description |
| --- | --- |
| AWS_PROVIDER | The service provider of the object storage service. The value is fixed to OSS. |
| AWS_REGION | The region in which the OSS data to be accessed resides. Example: oss-cn-beijing. |
| AWS_ENDPOINT | The internal endpoint of the OSS bucket. Example: oss-cn-beijing-internal.aliyuncs.com. Important: Make sure that the OSS bucket and your ApsaraDB for SelectDB instance reside in the same region. |
| AWS_ACCESS_KEY | The AccessKey ID of your Alibaba Cloud account. |
| AWS_SECRET_KEY | The AccessKey secret of your Alibaba Cloud account. |
The following table describes the fields of PROPERTIES.

| Field | Description |
| --- | --- |
| timeout | The timeout period for the import job. Unit: seconds. Default value: 14400. |
| max_filter_ratio | The maximum ratio of data that can be filtered out, for example, because the data does not conform to standards. Default value: 0. |
| exec_mem_limit | The maximum size of memory that is available for the import job. Unit: bytes. Default value: 2147483648 (2 GB). |
| strict_mode | Specifies whether to enable the strict mode for the import job. Default value: false. |
| timezone | The time zone used by time zone-dependent functions, such as strftime and from_unixtime. Default value: Asia/Shanghai. |
| load_parallelism | The maximum number of import jobs that can concurrently run at a time. Default value: 1. |
| send_batch_parallelism | The maximum number of concurrent jobs that send data for processing in batches. If the value of this parameter is greater than the value of the max_send_batch_parallelism_per_job parameter in the backend configuration, the value of max_send_batch_parallelism_per_job is used. |
| load_to_single_tablet | Specifies whether to import data to only one tablet of the corresponding partition. Default value: false. |
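For example, the following PROPERTIES fragment combines several of the fields above; it is a sketch of one possible configuration, not a recommended setting, and it belongs at the end of a complete LOAD LABEL statement:

```sql
PROPERTIES
(
    "timeout" = "7200",          -- fail the job if it runs longer than 2 hours
    "max_filter_ratio" = "0.1",  -- tolerate up to 10% of nonconforming rows
    "strict_mode" = "false"      -- do not reject the job on lenient conversions
);
```

With max_filter_ratio set to 0.1, the job still succeeds if no more than 10% of the rows are filtered out; with the default value 0, a single nonconforming row causes the job to fail.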
Example
Create a table into which you want to import data in an ApsaraDB for SelectDB instance. Sample code:
CREATE TABLE test_table
(
    id int,
    name varchar(50),
    age int,
    address varchar(50),
    url varchar(500)
)
DISTRIBUTED BY HASH(id) BUCKETS 4
PROPERTIES("replication_num" = "1");

Create a file to be imported, which is named test_file.txt. Sample code:

1,yang,32,shanghai,http://example.com
2,wang,22,beijing,http://example.com
3,xiao,23,shenzhen,http://example.com
4,jess,45,hangzhou,http://example.com
5,jack,14,shanghai,http://example.com
6,tomy,25,hangzhou,http://example.com
7,lucy,45,shanghai,http://example.com
8,tengyin,26,shanghai,http://example.com
9,wangli,27,shenzhen,http://example.com
10,xiaohua,37,shanghai,http://example.com

Import the data by using OSS Load. Sample code:

LOAD LABEL test_db.test_label_1
(
    DATA INFILE("s3://your_bucket_name/test_file.txt")
    INTO TABLE test_table
    COLUMNS TERMINATED BY ","
)
WITH S3
(
    "AWS_PROVIDER" = "OSS",
    "AWS_REGION" = "oss-cn-beijing",
    "AWS_ENDPOINT" = "oss-cn-beijing-internal.aliyuncs.com",
    "AWS_ACCESS_KEY" = "<your_access_key>",
    "AWS_SECRET_KEY" = "<your_secret_key>"
)
PROPERTIES
(
    "timeout" = "3600"
);

Note: The path of the OSS bucket must start with s3://.
Cancel an OSS Load job
If an OSS Load job is not in the CANCELLED or FINISHED state, you can manually cancel the job. You must specify the label of the import job to be canceled. After an import job is canceled, the data written in the job is rolled back and does not take effect.
Syntax
CANCEL LOAD
[FROM db_name]
WHERE [LABEL = "load_label" | LABEL like "label_pattern"];

Parameters
| Parameter | Description |
| --- | --- |
| db_name | The name of the database. If you do not specify this parameter, the current database is used by default. |
| load_label / label_pattern | The label of the import job to be canceled. The LABEL = clause matches the label exactly. The LABEL like clause matches import jobs whose labels contain the label_pattern string. |
Examples
Cancel an import job whose label is example_db_test_load_label from the example_db database.

CANCEL LOAD FROM example_db WHERE LABEL = "example_db_test_load_label";

Cancel import jobs whose labels contain example_ from the example_db database.

CANCEL LOAD FROM example_db WHERE LABEL like "example_";
View the data import job
OSS Load is an asynchronous data import method. If a data import statement is successfully executed, the OSS Load job is successfully committed, but the data may not be imported yet. To query the status of an OSS Load job, you can execute the SHOW LOAD statement.
Syntax
SHOW LOAD
[FROM db_name]
[
WHERE
[LABEL [ = "your_label" | LIKE "label_matcher"]]
[STATE = ["PENDING"|"ETL"|"LOADING"|"FINISHED"|"CANCELLED"]]
]
[ORDER BY ...]
[LIMIT limit][OFFSET offset];

Parameters
| Parameter | Description |
| --- | --- |
| db_name | The name of the database. If you do not specify this parameter, the current database is used by default. |
| LABEL | The label of the import job. The LABEL = clause matches the label exactly. The LABEL LIKE clause matches import jobs whose labels contain the label_matcher string. |
| STATE | The status of the import job. If you specify this parameter, only import jobs that are in the specified state are returned. |
| ORDER BY | The order in which the returned data records are sorted. |
| LIMIT | The maximum number of records to return. |
| OFFSET | The number of initial records to skip before the query results are displayed. Default value: 0. |
Examples
Query the import jobs whose labels contain 2014_01_02 in the example_db database, and display 10 import jobs that are stored for the longest period of time.

SHOW LOAD FROM example_db WHERE LABEL LIKE "2014_01_02" LIMIT 10;

Query the import jobs whose labels are load_example_db_20140102 in the example_db database, and sort these jobs by LoadStartTime in descending order.

SHOW LOAD FROM example_db WHERE LABEL = "load_example_db_20140102" ORDER BY LoadStartTime DESC;

Query the import jobs that are in the loading state and whose labels are load_example_db_20140102 in the example_db database.

SHOW LOAD FROM example_db WHERE LABEL = "load_example_db_20140102" AND STATE = "loading";

Query the import jobs in the example_db database and sort these jobs by LoadStartTime in descending order. Skip the first five query results and display the next 10 query results. The following two statements are equivalent:

SHOW LOAD FROM example_db ORDER BY LoadStartTime DESC LIMIT 5,10;
SHOW LOAD FROM example_db ORDER BY LoadStartTime DESC LIMIT 10 OFFSET 5;