AnalyticDB for MySQL supports five data import methods: external tables, DataWorks, Data Transmission Service (DTS), Java Database Connectivity (JDBC)-based programs, and data synchronization. The right choice depends on where your data lives, how much of it there is, and how often you need to move it.
Choose an import method
| Data source | Volume / frequency | Recommended method |
|---|---|---|
| OSS, HDFS, MaxCompute, RDS, PolarDB | Large batch (GB–TB), one-time or daily | External tables |
| RDS for MySQL, Oracle, OSS, MaxCompute, HDFS | Small batch, every minute or hour | DataWorks |
| RDS for MySQL, PolarDB for MySQL, HBase | Real-time, within seconds | DTS |
| Log files, on-premises data, custom pipelines | Any volume, requires preprocessing | JDBC-based program |
| SLS, ApsaraMQ for Kafka, Hive, OSS | Streaming or metadata-driven | Data synchronization |
Use external tables to import data
External tables map to data stored in external systems (OSS, HDFS, MaxCompute, and several RDS and PolarDB sources). AnalyticDB for MySQL reads data concurrently across all cluster nodes and builds indexes automatically when the job completes, making this the highest-throughput option for large imports.
Key behaviors:
Imported data is invisible while the job runs and becomes queryable only after it completes.
If the target partition already exists, its data is overwritten.
Index building runs automatically at job completion to optimize query performance.
Schedule large jobs during off-peak hours to avoid resource contention.
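The flow above can be sketched in SQL: map the external data with a CREATE TABLE statement, then load it with INSERT OVERWRITE INTO. This is a hedged sketch; all table names, the OSS endpoint and URL, and the credentials are placeholders, and the exact TABLE_PROPERTIES keys may vary by source type and cluster version.

```sql
-- Sketch: map a CSV dataset in OSS as an external table.
-- All names, the bucket URL, and credentials are placeholders.
CREATE TABLE oss_orders_ext (
  order_id BIGINT,
  amount DECIMAL(18, 2),
  order_date DATE
) ENGINE = 'OSS'
TABLE_PROPERTIES = '{
  "endpoint": "oss-cn-hangzhou-internal.aliyuncs.com",
  "url": "oss://my-bucket/orders/",
  "accessid": "<ACCESS_KEY_ID>",
  "accesskey": "<ACCESS_KEY_SECRET>",
  "delimiter": ","
}';

-- Import into the target table. If the target partition already
-- exists, its data is overwritten (see the key behaviors above).
INSERT OVERWRITE INTO orders
SELECT * FROM oss_orders_ext;
```

The imported rows become queryable only after the job completes and index building has run.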
Regular import vs. elastic import
By default, AnalyticDB for MySQL uses regular import, which reads source data on interactive resource groups (resident compute nodes) and writes indexes on storage nodes. This can drive high CPU utilization and I/O usage on storage nodes during the job.
Elastic import (available on V3.1.10.0 and later) offloads both reading and index building to job resource groups (dynamically scaled compute nodes), leaving storage nodes nearly unaffected. Use elastic import when you need higher concurrency or want to protect interactive workloads during data loading.
| | Regular import | Elastic import |
|---|---|---|
| Minimum version | No limit | V3.1.10.0 and later |
| Concurrent jobs | Up to 2 | Up to 32 |
| Resources consumed | Interactive resource groups + storage nodes | Job resource groups only; minimal storage node impact |
| Enabled by default | Yes | Yes, on V3.1.10.0 and later |
| Supported data sources | MaxCompute, OSS, HDFS, ApsaraDB RDS for MySQL, ApsaraDB RDS for SQL Server, PolarDB-X, PolarDB for MySQL | MaxCompute, OSS |
| Minimum job duration | Varies | ~2–3 minutes |
| Recommended use scenarios | You want to import a small amount of data; no requirements on speed or concurrency | You want to consume fewer storage node resources to accelerate data import; an import job involves multiple tables; you want to import a large amount of data to partitioned tables |
Elastic import limits:
Supported only on AnalyticDB for MySQL Data Lakehouse Edition (V3.0), based on external tables, using INSERT OVERWRITE INTO.
Imports data from a single table only; joined tables are not supported.
Supports only WHERE and LIMIT clauses; ORDER BY is not supported. In the SELECT statement, you can use asterisks (*), column names, default values, or constants for the column and value fields; SQL functions are not supported.
Targets fact tables only.
Supported data types: BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, VARCHAR, DATE, TIME, DATETIME, and TIMESTAMP.
Elastic import requires at least 2–3 minutes to complete. For jobs that must finish within 3 minutes, use regular import instead.
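Within those limits, an elastic import job is typically submitted as an asynchronous INSERT OVERWRITE with a hint that routes the work to a job resource group. The sketch below is hedged: the table names and the resource group name (job_rg) are placeholders, and the hint syntax should be verified against your cluster's minor version.

```sql
-- Sketch: submit an elastic import job on a job resource group
-- named job_rg (placeholder). SUBMIT JOB runs it asynchronously,
-- leaving storage nodes nearly unaffected.
/*+ elastic_load=true, elastic_load_configs=[adb.load.resource.group.name=job_rg]*/
SUBMIT JOB INSERT OVERWRITE INTO target_table
SELECT * FROM oss_ext_table;
```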
Tune elastic import concurrency
To increase concurrency beyond the default, set the adb.load.job.max.acu parameter. Its default value is (number of shards) + 1. Set it to K × default value where K ≥ 1. Keep K at or below the number of partitions in the target partitioned table.
Query the number of shards in your cluster:
SELECT count(1) FROM information_schema.kepler_meta_shards;
For partition counts, see Storage analysis.
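As an illustration of the sizing rule above: a cluster with 16 shards has a default adb.load.job.max.acu of 17, so K = 2 doubles that to 34. The sketch below is hedged; the table names and resource group name are placeholders, and the config separator inside the hint should be checked against your cluster's version.

```sql
-- 16 shards → default adb.load.job.max.acu = 16 + 1 = 17.
-- K = 2 raises it to 34; keep K at or below the number of
-- partitions in the target partitioned table.
/*+ elastic_load=true,
    elastic_load_configs=[adb.load.resource.group.name=job_rg|adb.load.job.max.acu=34]*/
SUBMIT JOB INSERT OVERWRITE INTO target_table
SELECT * FROM oss_ext_table;
```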
Use cases
Data warehouse initialization: Load terabytes of historical data from OSS or HDFS into AnalyticDB for MySQL for the first time. Schedule during off-peak hours to avoid resource contention.
Daily batch ingestion: Import gigabytes to terabytes of processed data from MaxCompute into AnalyticDB for MySQL each day for accelerated analysis.
Performance optimization
For optimization strategies specific to external table imports, see Optimize data import performance — external tables.
Use DataWorks to import data
DataWorks provides a visual, no-code interface for configuring data synchronization jobs between a wide range of sources and AnalyticDB for MySQL. It suits frequent small-batch imports and scenarios where data comes from multiple heterogeneous sources.
For imports of several hundred gigabytes or more, use external tables instead. DataWorks is optimized for smaller, more frequent transfers.
Use cases
Frequent small-batch imports: Load small amounts of data every minute or every hour for near-real-time analysis.
Multi-source consolidation: Pull data from sources like Tablestore, Redis, and PostgreSQL into a single AnalyticDB for MySQL cluster.
Get started
Configure the data source. Supported sources:
Configure source and destination connections for the synchronization job.
Performance optimization
For optimization strategies specific to DataWorks imports, see Optimize data import performance — DataWorks.
Use DTS to import data
Data Transmission Service (DTS) is a real-time data streaming service that supports relational database management system (RDBMS), NoSQL, and online analytical processing (OLAP) databases as sources. DTS captures change data continuously and replicates it to AnalyticDB for MySQL within seconds, making it the right choice when your analysis must reflect the latest state of an operational database.
Use cases
Real-time synchronization: Stream changes from ApsaraDB RDS for MySQL or PolarDB for MySQL to AnalyticDB for MySQL within seconds.
Multi-source aggregation: Consolidate data from multiple RDS for MySQL instances or PolarDB for MySQL clusters into a single AnalyticDB for MySQL cluster. Use DTS's multi-table merging feature to unify multiple source tables with the same schema into one destination table.
Get started
Use a JDBC-based program to import data
A JDBC-based program gives you full control over the import pipeline. Use this method when data requires preprocessing — parsing log files, applying transformations, or handling formats that external tables cannot read — or when staging data to OSS, HDFS, or MaxCompute first is not feasible.
Use cases
Import after data preprocessing: Parse log files or apply custom transformations before loading data into AnalyticDB for MySQL in real time.
On-premises data import: Load on-premises data that cannot be staged to OSS, HDFS, or MaxCompute beforehand.
Usage notes
Configure an appropriate JDBC driver before connecting. See Supported MySQL JDBC driver versions.
For large imports, configure a connection pool to reduce overhead. See Druid connection pool.
Use batch inserts and concurrent connections to maximize throughput.
For streaming data (for example, Apache Flink pipelines), see Import data from Apache Flink.
For non-customized on-premises data, see Use LOAD DATA to import data to Data Warehouse Edition and Use the AnalyticDB for MySQL import tool to import data to Data Warehouse Edition.
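The batch-insert guidance above usually means multi-row INSERT statements rather than one row per statement, since each round trip carries fixed overhead. A minimal sketch, with a hypothetical app_log table:

```sql
-- Sketch: batch multiple rows into one INSERT sent over JDBC.
-- app_log and its columns are placeholders; batching on the order
-- of thousands of rows per statement reduces per-request overhead.
INSERT INTO app_log (id, level, msg, ts) VALUES
  (1, 'INFO', 'service started',  '2024-01-01 00:00:00'),
  (2, 'WARN', 'retrying request', '2024-01-01 00:00:01'),
  (3, 'INFO', 'request ok',       '2024-01-01 00:00:02');
```

In a JDBC program, the same effect is achieved with PreparedStatement batching (addBatch/executeBatch) over a pooled connection, with several connections writing concurrently.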
Performance optimization
For optimization strategies specific to JDBC-based imports, see Optimize data import performance — JDBC.
Use data synchronization to import data
AnalyticDB for MySQL includes built-in data synchronization features for streaming ingestion, metadata management, and data migration: AnalyticDB Pipeline Service (APS), metadata discovery, and Hive data migration.
Use cases
Low-cost log and message ingestion: Use APS to stream data continuously from Simple Log Service (SLS) and ApsaraMQ for Kafka into AnalyticDB for MySQL for cost-effective storage and analysis.
OSS metadata discovery: When OSS data lacks structured metadata, the metadata discovery feature scans OSS and builds database, table, and partition metadata — including data formats and field definitions — making the data accessible to AnalyticDB for MySQL and other analysis engines.
Hive data migration: Migrate Hive data to OSS using the Hive data migration feature. AnalyticDB for MySQL then automatically organizes the metadata for subsequent analysis.
Get started
Automatically synchronize the metadata of a PolarDB-X instance to a Data Lakehouse Edition cluster
Use data synchronization to synchronize data from Simple Log Service to Data Warehouse Edition
Use data synchronization to synchronize data from Simple Log Service to Data Lakehouse Edition
Use metadata discovery to import data to Data Lakehouse Edition