AnalyticDB for MySQL:Data import methods

Last Updated: Oct 18, 2023

To meet a variety of data import requirements, AnalyticDB for MySQL allows you to import data by using external tables, DataWorks, or a program that uses Java Database Connectivity (JDBC). This topic describes the features and typical scenarios of each import method to help you select the one that suits your needs.

Use external tables to import data

AnalyticDB for MySQL allows you to access different data sources and create external tables to map external data sources. This way, you can concurrently read external data and import the data to AnalyticDB for MySQL. This data import method maximizes the use of cluster resources to improve import performance.

Basic characteristics

  • This method is suitable for importing a large amount of data in a single job.

  • A data import job consumes large amounts of cluster resources to ensure high performance. We recommend that you use this method during off-peak hours.

  • The imported data is invisible while the import job is in progress and becomes visible only after the job is completed.

  • If you import data into a partition whose name matches an existing partition, the data of the existing partition is overwritten.

  • When a data import job that uses an external table is completed, table indexes are created to improve query performance.
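The characteristics above can be illustrated with a minimal sketch, assuming a hypothetical OSS source. The Java snippet below builds the two SQL statements involved: a mapping table over an OSS path and the INSERT OVERWRITE INTO statement that runs the import. The TABLE_PROPERTIES keys, endpoint, bucket, credentials, and table names are placeholders; take the exact syntax from the OSS import topic.

```java
class OssExternalTableSketch {

    // A hypothetical mapping table over an OSS path. The TABLE_PROPERTIES keys,
    // endpoint, bucket, and credentials below are placeholders; verify the
    // exact syntax against the OSS import topic.
    static String createMappingTable() {
        return String.join("\n",
                "CREATE TABLE oss_orders_ext (",
                "  order_id BIGINT,",
                "  amount DECIMAL(15, 2),",
                "  order_date DATE",
                ") ENGINE = 'OSS'",
                "TABLE_PROPERTIES = '{",
                "  \"endpoint\": \"oss-cn-hangzhou-internal.aliyuncs.com\",",
                "  \"url\": \"oss://my-bucket/orders/\",",
                "  \"accessid\": \"<ACCESS_KEY_ID>\",",
                "  \"accesskey\": \"<ACCESS_KEY_SECRET>\",",
                "  \"delimiter\": \",\"",
                "}';");
    }

    // The import itself: INSERT OVERWRITE INTO reads the mapped files in
    // parallel and overwrites any existing partition that has the same name.
    static String importStatement() {
        return "INSERT OVERWRITE INTO orders SELECT * FROM oss_orders_ext;";
    }
}
```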

Common scenarios

  • Data initialization in data warehouses

    If you want to initialize and import terabytes of data to AnalyticDB for MySQL for analysis, we recommend that you store the data in Object Storage Service (OSS) or Apsara File Storage for HDFS and then use external tables to import the data.

  • Accelerated analysis of batch data

    Batch data is stored in batch data warehouses such as MaxCompute. Each day, gigabytes or terabytes of data are imported from batch data warehouses to AnalyticDB for MySQL for accelerated analysis.

Methods

You can import data by using regular import or elastic import. The default method is regular import. Regular import reads data on compute nodes and creates indexes on storage nodes, which consumes computing and storage resources. Elastic import reads data and creates indexes in Serverless Spark jobs, which consumes resources of job resource groups.

Regular import

AnalyticDB for MySQL supports regular import from various data sources such as MaxCompute, Apsara File Storage for HDFS, OSS, and ApsaraDB RDS for MySQL based on external tables. For more information, see the following topics:

Elastic import

Only AnalyticDB for MySQL Data Lakehouse Edition (V3.0) clusters of V3.1.10.0 or later that have job resource groups support elastic import. Elastic import consumes only a small amount of storage resources, or no computing or storage resources at all. This reduces the impact on real-time data reads and writes and improves resource isolation. If job resource groups have sufficient resources, AnalyticDB for MySQL can run multiple elastic import jobs at the same time. You can also increase the maximum amount of resources available to an elastic import job to accelerate it. This improves job concurrency and elasticity.

AnalyticDB for MySQL supports elastic import only from MaxCompute and OSS data sources based on external tables. For more information, see the following topics:
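As a sketch, an elastic import job is submitted as a regular INSERT OVERWRITE INTO statement that carries a hint. The hint keys shown here (elastic_load, elastic_load_configs) and the adb.load.job.max.acu parameter are assumptions to be verified against the "Hint parameters" documentation; the table names are placeholders.

```java
class ElasticImportStatement {

    // Builds a hypothetical elastic import statement. The hint keys
    // (elastic_load, elastic_load_configs) and the adb.load.job.max.acu
    // parameter should be verified against the "Hint parameters" section;
    // targetTable and externalTable are placeholders.
    static String build(String targetTable, String externalTable, int maxAcu) {
        return "/*+ elastic_load=true, elastic_load_configs=[adb.load.job.max.acu="
                + maxAcu + "] */ INSERT OVERWRITE INTO " + targetTable
                + " SELECT * FROM " + externalTable + ";";
    }
}
```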

Scenarios

Elastic import consumes only a small amount of worker node resources and can accelerate data import. Elastic import is suitable for the following scenarios:

  • Multiple import jobs need to be run. In this case, you can use elastic import to accelerate the import. Regular import supports up to two concurrent import jobs. Elastic import supports up to 32 concurrent import jobs.

  • Large amounts of data need to be imported to multiple partitions of a partitioned table in AnalyticDB for MySQL. In this case, you can use elastic import and specify the adb.load.job.max.acu parameter to increase the maximum amount of resources for an elastic import job and improve job scalability.

    Note

    The default value of the adb.load.job.max.acu parameter is the number of shards plus 1. You can set the parameter to K × the default value (K ≥ 1). We recommend that you set K to a value that is less than or equal to the number of partitions involved in the current import job. For more information, see the "Hint parameters" section of the Use external tables to import data to Data Lakehouse Edition topic.

  • A regular import job results in high CPU and I/O usage of worker nodes, which affects real-time data reads and writes. In this case, you can use elastic import that consumes a small amount of storage resources or does not consume computing and storage resources to improve resource isolation.
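The sizing guidance in the Note above can be sketched as a small helper. The shard and partition counts are inputs you would read from your own cluster; the formula (default = shards + 1, scaled by K, with K recommended not to exceed the partition count) comes from the Note.

```java
class ElasticImportAcu {

    // Default adb.load.job.max.acu: the number of shards plus 1 (per the Note).
    static int defaultMaxAcu(int shardCount) {
        return shardCount + 1;
    }

    // Scale the default by K, where K >= 1 and, as recommended, K does not
    // exceed the number of partitions involved in the import job.
    static int scaledMaxAcu(int shardCount, int k, int partitionCount) {
        int bounded = Math.max(1, Math.min(k, partitionCount));
        return bounded * defaultMaxAcu(shardCount);
    }
}
```

For example, with 16 shards the default is 17 ACUs; importing 8 partitions with K = 4 raises the ceiling to 68 ACUs.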

Limits

  • You can use elastic import to import data to AnalyticDB for MySQL Data Lakehouse Edition (V3.0) based on external tables only by using the INSERT OVERWRITE INTO statement.

  • Take note of the following limits for SQL operations:

    • You can use elastic import to import data from a single table, but not multiple joined tables.

    • Elastic import supports only the WHERE and LIMIT clauses. Other syntax such as ORDER BY is not supported.

    • When you run an elastic import job, you can use asterisks (*), column names, default values, or constants for the column and value fields in the SELECT statement. Other syntax such as SQL functions is not supported.

  • You can use elastic import to import data only to fact tables.

  • Elastic import supports the following data types: BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, VARCHAR, DATE, TIME, DATETIME, and TIMESTAMP.
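As a rough illustration of the SQL limits above (not a real SQL parser), a pre-flight check for a candidate statement might look like the following keyword scan:

```java
class ElasticImportSqlCheck {

    // Rough pre-flight check against the limits listed above. This is a
    // keyword scan for illustration only, not a real SQL parser.
    static boolean isElasticImportCandidate(String sql) {
        String s = sql.trim().toUpperCase();
        if (!s.startsWith("INSERT OVERWRITE INTO")) {
            return false;   // only the INSERT OVERWRITE INTO statement is supported
        }
        if (s.contains(" JOIN ")) {
            return false;   // a single source table; joined tables are not supported
        }
        if (s.contains("ORDER BY") || s.contains("GROUP BY")) {
            return false;   // only WHERE and LIMIT clauses are allowed
        }
        return true;
    }
}
```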

Usage notes

  • Elastic import reads data and creates indexes by using job resource groups. This consumes resources of the job resource groups, and you are charged for the resources that are consumed. For more information, see View monitoring information of resource groups and Billable items of Data Lakehouse Edition (V3.0).

  • Make sure that job resource groups have sufficient resources to prevent long waiting time for jobs, long execution durations, and job failures.

  • Compared with regular import, elastic import displays different metrics, such as CPU utilization and I/O usage, on the monitoring page of the AnalyticDB for MySQL console.

Recommendations

  • An elastic import job requires at least 2 to 3 minutes to complete and is therefore not suitable for small amounts of data. If an import job must be completed within 3 minutes, we recommend that you use regular import.

  • An elastic import job requires a longer period of time to complete than a regular import job that uses the same resources. If you want an import job to be completed within a short period of time, we recommend that you increase the maximum amount of resources for an elastic import job to accelerate the job.

Use DataWorks to import data

DataWorks provides a visual data import method that allows you to import data from a variety of data sources to AnalyticDB for MySQL. Compared with external tables, DataWorks is more suitable when only a small amount of data needs to be imported.

Note

To import several hundred gigabytes of data, we recommend that you use external tables instead of DataWorks. For more information, see the "Use external tables to import data" section of this topic.

Common scenarios

  • Data import every minute or hour

    Small amounts of data need to be imported to AnalyticDB for MySQL every minute or hour for analysis.

  • Data import from multiple heterogeneous data sources

    Data needs to be imported from multiple data sources such as Tablestore, Redis, and PostgreSQL to AnalyticDB for MySQL.

Methods

To use DataWorks to import data, perform the following steps:

  1. Configure the data source. The following data sources are supported: ApsaraDB RDS for MySQL, Oracle, ApsaraDB RDS for SQL Server, OSS, MaxCompute, and HDFS. For more information, see the following topics:

  2. Add an AnalyticDB for MySQL V3.0 data source.

  3. Configure the source and destination connections for a synchronization job.

Optimization of import performance

For information about how to improve the performance of this import method, see the "Optimize the performance when you use DataWorks to import data" section of the Optimize data import performance topic.

Use a program and JDBC to import data

If external tables or DataWorks cannot meet your import requirements, such as in data cleansing scenarios or other complex scenarios that involve unstructured data, you can write a program and use JDBC to import data.

Common scenarios

  • Import after data pre-processing

    Log files are generated and need to be automatically parsed and imported to AnalyticDB for MySQL in real time.

  • On-premises data import

    On-premises data that fails to be uploaded to OSS, Apsara File Storage for HDFS, or MaxCompute needs to be imported to AnalyticDB for MySQL.
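For scenarios such as the ones above, a minimal program-plus-JDBC sketch might look like the following. The endpoint URL, database, table, and columns are hypothetical, and the batch size and error handling would need tuning for production use.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

class JdbcImportSketch {

    // Builds a parameterized INSERT that is reused for every batch.
    // Table and column names here are hypothetical.
    static String insertSql(String table, String... cols) {
        String placeholders = "?" + ", ?".repeat(cols.length - 1);
        return "INSERT INTO " + table + " (" + String.join(", ", cols)
                + ") VALUES (" + placeholders + ")";
    }

    // Minimal batch-import loop. Requires a MySQL-protocol JDBC driver on the
    // classpath and a reachable AnalyticDB for MySQL endpoint, for example a
    // placeholder URL such as jdbc:mysql://<cluster-endpoint>:3306/<database>.
    static void importRows(String url, String user, String password,
                           Iterable<String[]> rows) throws Exception {
        try (Connection conn = DriverManager.getConnection(url, user, password);
             PreparedStatement ps = conn.prepareStatement(
                     insertSql("log_events", "ts", "level", "message"))) {
            int pending = 0;
            for (String[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setString(i + 1, row[i]);
                }
                ps.addBatch();
                if (++pending == 1000) {   // flush every 1,000 rows
                    ps.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch();
            }
        }
    }
}
```

Batching rows through PreparedStatement.addBatch and executeBatch avoids a network round trip per row, which matters when parsing and importing log data in real time.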

Usage notes

Optimization of import performance

For information about how to improve the performance of this import method, see the "Optimize the performance when you use a program and JDBC to import data" section of the Optimize data import performance topic.