AnalyticDB for MySQL Data Lakehouse Edition (V3.0) allows you to access and import data by using external tables. You can import data by using regular import or elastic import. Compared with regular import, elastic import consumes fewer resources and reduces the impact on real-time data reads and writes. This topic describes how to use external tables to query Object Storage Service (OSS) data and import data from OSS to AnalyticDB for MySQL Data Lakehouse Edition (V3.0).
Prerequisites
OSS is activated. A bucket and a directory are created in OSS. For more information, see Activate OSS, Create buckets, and Create directories.
Data is uploaded to the OSS directory. For more information, see Upload objects.
An AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster is created in the same region as the OSS bucket.
Sample data
In this example, the person object is uploaded to the testBucketName/adb/dt=2023-06-15 directory in OSS. Line feeds are used as row delimiters, and commas (,) are used as column delimiters. Sample data in the person object:
1,james,10,2023-06-15
2,bond,20,2023-06-15
3,jack,30,2023-06-15
4,lucy,40,2023-06-15
Procedure
- Go to the SQL editor.
- Log on to the AnalyticDB for MySQL console.
- In the upper-left corner of the page, select the region where the cluster resides.
- In the left-side navigation pane, click Clusters.
- On the Data Lakehouse Edition (V3.0) tab, find the cluster and click the Cluster ID.
- In the left-side navigation pane, choose Job Development > SQL Development.
- Import data.
You can import data by using regular import or elastic import. The default method is regular import. Regular import reads data on compute nodes and creates indexes on storage nodes, which consumes computing and storage resources. Elastic import reads data and creates indexes by using Serverless Spark jobs, which consumes resources of job resource groups. Only AnalyticDB for MySQL Data Lakehouse Edition (V3.0) clusters of V3.1.10.0 or later that have job resource groups support elastic import. For more information, see Data import methods.
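For example, you can check whether your cluster meets the version requirement before you choose elastic import. The following statement is a minimal sketch that assumes the adb_version() function, which AnalyticDB for MySQL provides for querying the minor kernel version:

-- Returns the minor kernel version. Elastic import requires V3.1.10.0 or later.
SELECT adb_version();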
If you use elastic import, take note of the following items:
To perform elastic import, you must execute the following statement to enable the elastic import feature:
SET adb_config RC_ELASTIC_JOB_SCHEDULER_ENABLE=true;
Elastic import allows you to import only CSV, Parquet, or ORC data from OSS.
Elastic import reads data and creates indexes by using job resource groups. This consumes resources of job resource groups, and you are charged for the resources. For more information, see View monitoring information of resource groups and Billable items of Data Lakehouse Edition (V3.0).
Make sure that job resource groups have sufficient resources to prevent long waiting time for jobs, long execution durations, and job failures.
An elastic import job requires at least 2 to 3 minutes to complete and is not suitable for small amounts of data. If an import job must be completed within 3 minutes, we recommend that you use regular import.
An elastic import job requires a longer period of time to complete than a regular import job that uses the same amount of resources. If you want an import job to be completed faster, we recommend that you increase the maximum amount of resources for the elastic import job.
Regular import
Create an external database.
CREATE EXTERNAL DATABASE adb_external_db;
Use the CREATE EXTERNAL TABLE statement to create an OSS external table in the adb_external_db database. In this example, the external table is named adb_external_db.person.
Note: The OSS external table of AnalyticDB for MySQL must use the same names, quantity, order, and data types of fields as the OSS object.
You can create a non-partitioned or a partitioned OSS external table, as shown in the sketch below. For information about the syntax, see CREATE EXTERNAL TABLE.
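The following statements are a minimal sketch that assumes Hive-style CSV clauses and the sample testBucketName bucket; verify the exact clauses against the CREATE EXTERNAL TABLE topic.

-- Non-partitioned OSS external table: dt is stored as a regular column, and
-- LOCATION points directly to the dt=2023-06-15 directory.
CREATE EXTERNAL TABLE adb_external_db.person (
    id INT,
    name VARCHAR(1023),
    age INT,
    dt VARCHAR(1023)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'oss://testBucketName/adb/dt=2023-06-15/';

-- Partitioned OSS external table: dt is declared as a partition column that maps
-- to the dt=<value> subdirectories under the LOCATION path.
CREATE EXTERNAL TABLE adb_external_db.person (
    id INT,
    name VARCHAR(1023),
    age INT
)
PARTITIONED BY (dt VARCHAR(1023))
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'oss://testBucketName/adb/';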
Query data.
After you create an external table, you can execute the SELECT statement in AnalyticDB for MySQL to query data of the external table.
SELECT * FROM adb_external_db.person;
Sample result:
+------+-------+------+------------+
| id   | name  | age  | dt         |
+------+-------+------+------------+
|    1 | james |   10 | 2023-06-15 |
|    2 | bond  |   20 | 2023-06-15 |
|    3 | jack  |   30 | 2023-06-15 |
|    4 | lucy  |   40 | 2023-06-15 |
+------+-------+------+------------+
4 rows in set (0.35 sec)
Create a database in the AnalyticDB for MySQL cluster. If you have already created a database, skip this step.
CREATE DATABASE adb_demo;
Create a table in the AnalyticDB for MySQL cluster to store data that is imported from OSS.
Note: The created internal table must use the same names, quantity, order, and data types of fields as the external table that is created in the previous step.

CREATE TABLE IF NOT EXISTS adb_demo.adb_import_test (
    id INT,
    name VARCHAR(1023),
    age INT,
    dt VARCHAR(1023)
) DISTRIBUTE BY HASH(id);
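To confirm that the internal table matches the external table, you can compare their schemas. This quick check assumes the standard MySQL DESCRIBE statement, which AnalyticDB for MySQL's MySQL-compatible syntax generally supports:

-- The column names, order, and data types of the two tables should match.
DESCRIBE adb_external_db.person;
DESCRIBE adb_demo.adb_import_test;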
Import data to the table.
- Method 1: Execute the INSERT INTO statement to import data. If the primary key has duplicate values, data is not repeatedly inserted, and the INSERT INTO statement is equivalent to INSERT IGNORE INTO. For more information, see INSERT INTO.
  INSERT INTO adb_demo.adb_import_test
  SELECT * FROM adb_external_db.person;
- Method 2: Execute the INSERT OVERWRITE INTO statement to synchronously import data. If the primary key has duplicate values, the original value is overwritten by the new value.
  INSERT OVERWRITE INTO adb_demo.adb_import_test
  SELECT * FROM adb_external_db.person;
- Method 3: Execute the INSERT OVERWRITE INTO statement to asynchronously import data. For more information, see Asynchronous writing.
  SUBMIT JOB INSERT OVERWRITE INTO adb_demo.adb_import_test
  SELECT * FROM adb_external_db.person;
Elastic import
Create a database. If you have already created a database, skip this step.
CREATE DATABASE adb_demo;
Create an OSS external table.
Note: The OSS external table of AnalyticDB for MySQL must use the same names, quantity, order, and data types of fields as the OSS object.
Elastic import allows you to create external tables by using only the CREATE TABLE statement.

CREATE TABLE oss_import_test_external_table (
    id INT,
    name VARCHAR(1023),
    age INT,
    dt VARCHAR(1023)
) ENGINE='OSS'
TABLE_PROPERTIES='{
    "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
    "url":"oss://<bucket-name>/adb/oss_import_test_data.csv",
    "accessid":"LTAI5t8sqJn5GhpBVtN8****",
    "accesskey":"HlClegbiV5mJjBYBJHEZQOnRF7****",
    "delimiter":","
}';
Important: When you create an external table, you can configure the following TABLE_PROPERTIES parameters based on the format of the external table (see the sketch after this list):
- CSV: endpoint, url, accessid, accesskey, format, delimiter, null_value, and partition_column.
- Parquet: endpoint, url, accessid, accesskey, format, and partition_column.
- ORC: endpoint, url, accessid, accesskey, format, and partition_column.
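For example, the following statement is a minimal sketch of a partitioned Parquet external table. The parquet_partition directory layout and the table name are assumptions for illustration; verify the parameters against the external table documentation.

-- Partitioned Parquet external table (sketch). The partition_column value dt maps
-- to dt=<value> subdirectories under the url path.
CREATE TABLE oss_parquet_partition_external_table (
    id INT,
    name VARCHAR(1023),
    age INT,
    dt VARCHAR(1023)
) ENGINE='OSS'
TABLE_PROPERTIES='{
    "endpoint":"oss-cn-hangzhou-internal.aliyuncs.com",
    "url":"oss://<bucket-name>/adb/parquet_partition/",
    "accessid":"LTAI5t8sqJn5GhpBVtN8****",
    "accesskey":"HlClegbiV5mJjBYBJHEZQOnRF7****",
    "format":"parquet",
    "partition_column":"dt"
}';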
For more information about external table parameters, see the "Create an OSS external table for a non-partitioned object" and "Create an OSS external table for a partitioned object" sections of the Use external tables to import data to Data Warehouse Edition topic.
Query data.
After you create an external table, you can execute the SELECT statement in AnalyticDB for MySQL to query data of the external table.
SELECT * FROM oss_import_test_external_table;
Sample result:
+------+-------+------+------------+
| id   | name  | age  | dt         |
+------+-------+------+------------+
|    1 | james |   10 | 2023-06-15 |
|    2 | bond  |   20 | 2023-06-15 |
|    3 | jack  |   30 | 2023-06-15 |
|    4 | lucy  |   40 | 2023-06-15 |
+------+-------+------+------------+
4 rows in set (0.35 sec)
Create a table in the AnalyticDB for MySQL cluster to store data that is imported from OSS.
Note: The created internal table must use the same names, quantity, order, and data types of fields as the external table that is created in the previous step.

CREATE TABLE adb_import_test (
    id INT,
    name VARCHAR(1023),
    age INT,
    dt VARCHAR(1023),
    PRIMARY KEY (id)
) DISTRIBUTE BY HASH(id);
Import data.
Important: Elastic import allows you to import data by using only the INSERT OVERWRITE INTO statement.
- Method 1: Execute the INSERT OVERWRITE INTO statement to elastically import data. If the primary key has duplicate values, the original value is overwritten by the new value.
  /*elastic_load=true, elastic_load_configs=[adb.load.resource.group.name=resource_group]*/
  INSERT OVERWRITE INTO adb_demo.adb_import_test
  SELECT * FROM adb_demo.oss_import_test_external_table;
- Method 2: Asynchronously execute the INSERT OVERWRITE INTO statement to elastically import data. Typically, the SUBMIT JOB statement is used to submit an asynchronous job.
  /*elastic_load=true, elastic_load_configs=[adb.load.resource.group.name=resource_group]*/
  SUBMIT JOB INSERT OVERWRITE INTO adb_demo.adb_import_test
  SELECT * FROM adb_demo.oss_import_test_external_table;
Important: When you asynchronously submit an elastic import job, you cannot configure priority queues.
Sample result:
+---------------------------------------+
| job_id                                |
+---------------------------------------+
| 2023081517195102101701907203151****** |
+---------------------------------------+
After you use the SUBMIT JOB statement to submit an asynchronous job, only a job ID is returned, which indicates that the asynchronous job is successfully submitted. You can use the returned job ID to terminate the asynchronous job or query the status of the asynchronous job. For more information, see Asynchronously submit an import job.
Hint parameters:
elastic_load: specifies whether to use elastic import. Valid values: true and false. Default value: false.
elastic_load_configs: the configuration parameters of the elastic import feature. You must enclose the parameters within brackets ([ ]) and separate multiple parameters with vertical bars (|). The following list describes the parameters; a combined example follows the list.
- adb.load.resource.group.name (required): the name of the job resource group that runs the elastic import job.
- adb.load.job.max.acu (optional): the maximum amount of resources for an elastic import job. Unit: AnalyticDB compute units (ACUs). Minimum value: 5 ACUs. Default value: number of shards plus 1. Execute the following statement to query the number of shards in the cluster:
  SELECT count(1) FROM information_schema.kepler_meta_shards;
- spark.driver.resourceSpec (optional): the resource type of the Spark driver. Default value: small. For information about the valid values, see the Type column in the "Spark resource specifications" table of the Conf configuration parameters topic.
- spark.executor.resourceSpec (optional): the resource type of the Spark executor. Default value: large. For information about the valid values, see the Type column in the "Spark resource specifications" table of the Conf configuration parameters topic.
- spark.adb.executorDiskSize (optional): the disk capacity of the Spark executor. Valid values: (0,100]. Unit: GiB. Default value: 10 GiB. For more information, see the "Specify driver and executor resources" section of the Conf configuration parameters topic.
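For example, the following statement combines multiple configuration parameters in one hint. This is a minimal sketch; resource_group and the parameter values are placeholders that you must adapt to your cluster:

/*elastic_load=true, elastic_load_configs=[adb.load.resource.group.name=resource_group|adb.load.job.max.acu=16|spark.executor.resourceSpec=large|spark.adb.executorDiskSize=20]*/
SUBMIT JOB INSERT OVERWRITE INTO adb_demo.adb_import_test
SELECT * FROM adb_demo.oss_import_test_external_table;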
(Optional) Check whether the submitted job is an elastic import job.
SELECT job_name, (job_type = 3) AS is_elastic_load
FROM INFORMATION_SCHEMA.kepler_meta_async_jobs
WHERE job_name = "2023081818010602101701907303151******";
Sample result:
+---------------------------------------+-----------------+
| job_name                              | is_elastic_load |
+---------------------------------------+-----------------+
| 2023081517195102101701907203151****** |               1 |
+---------------------------------------+-----------------+
If an elastic import job is submitted, 1 is returned for the is_elastic_load parameter. If a regular import job is submitted, 0 is returned.
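If you need to check the progress of an asynchronous job or stop it, you can reuse the returned job ID. The status query below reuses the kepler_meta_async_jobs table from the previous step; the CANCEL JOB statement is an assumption based on the asynchronous import documentation, so verify the exact syntax in Asynchronously submit an import job.

-- Query the status of the asynchronous job by its job ID.
SELECT * FROM INFORMATION_SCHEMA.kepler_meta_async_jobs
WHERE job_name = "2023081517195102101701907203151******";

-- Terminate the asynchronous job (assumed syntax).
CANCEL JOB "2023081517195102101701907203151******";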