AnalyticDB for MySQL supports five data import methods: external tables, DataWorks, Data Transmission Service (DTS), Java Database Connectivity (JDBC)-based programs, and data synchronization. The right choice depends on where your data lives, how much of it there is, and how often you need to move it.
Choose an import method
| Data source | Volume / frequency | Recommended method |
|---|---|---|
| OSS, HDFS, MaxCompute, RDS, PolarDB | Large batch (GB–TB), one-time or daily | External tables |
| RDS for MySQL, Oracle, OSS, MaxCompute, HDFS | Small batch, every minute or hour | DataWorks |
| RDS for MySQL, PolarDB for MySQL, HBase | Real-time, within seconds | DTS |
| Log files, on-premises data, custom pipelines | Any volume, requires preprocessing | JDBC-based program |
| SLS, ApsaraMQ for Kafka, Hive, OSS | Streaming or metadata-driven | Data synchronization |
Use external tables to import data
External tables map to data stored in external systems (OSS, HDFS, MaxCompute, and several RDS and PolarDB sources). AnalyticDB for MySQL reads data concurrently across all cluster nodes and builds indexes automatically when the job completes, making this the highest-throughput option for large imports.
Key behaviors:
Imported data is invisible while the job runs and becomes queryable only after it completes.
If the target partition already exists, its data is overwritten.
Index building runs automatically at job completion to optimize query performance.
Schedule large jobs during off-peak hours to avoid resource contention.
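The flow above can be sketched in SQL: map the external data with a CREATE TABLE statement, then load it with INSERT OVERWRITE INTO. This is a hedged sketch; all table names, the OSS endpoint and URL, and the credentials are placeholders, and the exact TABLE_PROPERTIES keys may vary by source type and cluster version.

```sql
-- Sketch: map a CSV dataset in OSS as an external table.
-- All names, the bucket URL, and credentials are placeholders.
CREATE TABLE oss_orders_ext (
  order_id BIGINT,
  amount DECIMAL(18, 2),
  order_date DATE
) ENGINE = 'OSS'
TABLE_PROPERTIES = '{
  "endpoint": "oss-cn-hangzhou-internal.aliyuncs.com",
  "url": "oss://my-bucket/orders/",
  "accessid": "<ACCESS_KEY_ID>",
  "accesskey": "<ACCESS_KEY_SECRET>",
  "delimiter": ","
}';

-- Import into the target table. If the target partition already
-- exists, its data is overwritten (see the key behaviors above).
INSERT OVERWRITE INTO orders
SELECT * FROM oss_orders_ext;
```

The imported rows become queryable only after the job completes and index building has run.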
Regular import vs. elastic import
By default, AnalyticDB for MySQL uses regular import, which reads source data on interactive resource groups (resident compute nodes) and writes indexes on storage nodes. This can drive high CPU utilization and I/O usage on storage nodes during the job.
Elastic import (available on V3.1.10.0 and later) offloads both reading and index building to job resource groups (dynamically scaled compute nodes), leaving storage nodes nearly unaffected. Use elastic import when you need higher concurrency or want to protect interactive workloads during data loading.
| | Regular import | Elastic import |
|---|---|---|
| Minimum version | No limit | V3.1.10.0 and later |
| Concurrent jobs | Up to 2 | Up to 32 |
| Resources consumed | Interactive resource groups + storage nodes | Job resource groups only; minimal storage node impact |
| Enabled by default | Yes | Yes, on V3.1.10.0 and later |
| Supported data sources | MaxCompute, OSS, HDFS, ApsaraDB RDS for MySQL, ApsaraDB RDS for SQL Server, PolarDB-X, PolarDB for MySQL | MaxCompute, OSS |
| Minimum job duration | Varies | ~2–3 minutes |
| Recommended use scenarios | You want to import a small amount of data; no requirements on speed or concurrency | You want to consume fewer storage node resources to accelerate data import; an import job involves multiple tables; you want to import a large amount of data to partitioned tables |
Elastic import limits:
Supported only on AnalyticDB for MySQL Data Lakehouse Edition (V3.0), based on external tables, using INSERT OVERWRITE INTO.
Imports data from a single table only; joined tables are not supported.
Supports only WHERE and LIMIT clauses; ORDER BY is not supported. In the SELECT statement, you can use asterisks (*), column names, default values, or constants for the column and value fields; SQL functions are not supported.
Targets fact tables only.
Supported data types: BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, VARCHAR, DATE, TIME, DATETIME, and TIMESTAMP.
Elastic import requires at least 2–3 minutes to complete. For jobs that must finish within 3 minutes, use regular import instead.
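Within those limits, an elastic import job is typically submitted as an asynchronous INSERT OVERWRITE with a hint that routes the work to a job resource group. The sketch below is hedged: the table names and the resource group name (job_rg) are placeholders, and the hint syntax should be verified against your cluster's minor version.

```sql
-- Sketch: submit an elastic import job on a job resource group
-- named job_rg (placeholder). SUBMIT JOB runs it asynchronously,
-- leaving storage nodes nearly unaffected.
/*+ elastic_load=true, elastic_load_configs=[adb.load.resource.group.name=job_rg]*/
SUBMIT JOB INSERT OVERWRITE INTO target_table
SELECT * FROM oss_ext_table;
```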
Tune elastic import concurrency
To increase concurrency beyond the default, set the adb.load.job.max.acu parameter. Its default value is (number of shards) + 1. Set it to K × default value where K ≥ 1. Keep K at or below the number of partitions in the target partitioned table.
Query the number of shards in your cluster:
SELECT count(1) FROM information_schema.kepler_meta_shards;
For partition counts, see Storage analysis.
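As an illustration of the sizing rule above: a cluster with 16 shards has a default adb.load.job.max.acu of 17, so K = 2 doubles that to 34. The sketch below is hedged; the table names and resource group name are placeholders, and the config separator inside the hint should be checked against your cluster's version.

```sql
-- 16 shards → default adb.load.job.max.acu = 16 + 1 = 17.
-- K = 2 raises it to 34; keep K at or below the number of
-- partitions in the target partitioned table.
/*+ elastic_load=true,
    elastic_load_configs=[adb.load.resource.group.name=job_rg|adb.load.job.max.acu=34]*/
SUBMIT JOB INSERT OVERWRITE INTO target_table
SELECT * FROM oss_ext_table;
```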
Use cases
Data warehouse initialization: Load terabytes of historical data from OSS or HDFS into AnalyticDB for MySQL for the first time. Schedule during off-peak hours to avoid resource contention.
Daily batch ingestion: Import gigabytes to terabytes of processed data from MaxCompute into AnalyticDB for MySQL each day for accelerated analysis.
Performance optimization
For optimization strategies specific to external table imports, see Optimize data import performance — external tables.
Use DataWorks to import data
DataWorks provides a visual, no-code interface for configuring data synchronization jobs between a wide range of sources and AnalyticDB for MySQL. It suits frequent small-batch imports and scenarios where data comes from multiple heterogeneous sources.
For imports of several hundred gigabytes or more, use external tables instead. DataWorks is optimized for smaller, more frequent transfers.
Use cases
Frequent small-batch imports: Load small amounts of data every minute or every hour for near-real-time analysis.
Multi-source consolidation: Pull data from sources like Tablestore, Redis, and PostgreSQL into a single AnalyticDB for MySQL cluster.
Get started
Configure the data source. Supported sources:
Configure source and destination connections for the synchronization job.
Performance optimization
For optimization strategies specific to DataWorks imports, see Optimize data import performance — DataWorks.
Use DTS to import data
Data Transmission Service (DTS) is a real-time data streaming service that supports relational database management system (RDBMS), NoSQL, and online analytical processing (OLAP) databases as sources. DTS captures change data continuously and replicates it to AnalyticDB for MySQL within seconds, making it the right choice when your analysis must reflect the latest state of an operational database.
Use cases
Real-time synchronization: Stream changes from ApsaraDB RDS for MySQL or PolarDB for MySQL to AnalyticDB for MySQL within seconds.
Multi-source aggregation: Consolidate data from multiple RDS for MySQL instances or PolarDB for MySQL clusters into a single AnalyticDB for MySQL cluster. Use DTS's multi-table merging feature to unify multiple source tables with the same schema into one destination table.
Get started
Use a JDBC-based program to import data
A JDBC-based program gives you full control over the import pipeline. Use this method when data requires preprocessing — parsing log files, applying transformations, or handling formats that external tables cannot read — or when staging data to OSS, HDFS, or MaxCompute first is not feasible.
Use cases
Import after data preprocessing: Parse log files or apply custom transformations before loading data into AnalyticDB for MySQL in real time.
On-premises data import: Load on-premises data that cannot be staged to OSS, HDFS, or MaxCompute beforehand.
Usage notes
Configure an appropriate JDBC driver before connecting. See Supported MySQL JDBC driver versions.
For large imports, configure a connection pool to reduce overhead. See Druid connection pool.
Use batch inserts and concurrent connections to maximize throughput.
For streaming data (for example, Apache Flink pipelines), see Import data from Apache Flink.
For non-customized on-premises data, see Use LOAD DATA to import data to Data Warehouse Edition and Use the AnalyticDB for MySQL import tool to import data to Data Warehouse Edition.
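The batch-insert guidance above usually means multi-row INSERT statements rather than one row per statement, since each round trip carries fixed overhead. A minimal sketch, with a hypothetical app_log table:

```sql
-- Sketch: batch multiple rows into one INSERT sent over JDBC.
-- app_log and its columns are placeholders; batching on the order
-- of thousands of rows per statement reduces per-request overhead.
INSERT INTO app_log (id, level, msg, ts) VALUES
  (1, 'INFO', 'service started',  '2024-01-01 00:00:00'),
  (2, 'WARN', 'retrying request', '2024-01-01 00:00:01'),
  (3, 'INFO', 'request ok',       '2024-01-01 00:00:02');
```

In a JDBC program, the same effect is achieved with PreparedStatement batching (addBatch/executeBatch) over a pooled connection, with several connections writing concurrently.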
Performance optimization
For optimization strategies specific to JDBC-based imports, see Optimize data import performance — JDBC.
Use data synchronization to import data
AnalyticDB for MySQL includes built-in data synchronization features for streaming ingestion, metadata management, and data migration: AnalyticDB Pipeline Service (APS), metadata discovery, and Hive data migration.
Use cases
Low-cost log and message ingestion: Use APS to stream data continuously from Simple Log Service (SLS) and ApsaraMQ for Kafka into AnalyticDB for MySQL for cost-effective storage and analysis.
OSS metadata discovery: When OSS data lacks structured metadata, the metadata discovery feature scans OSS and builds database, table, and partition metadata — including data formats and field definitions — making the data accessible to AnalyticDB for MySQL and other analysis engines.
Hive data migration: Migrate Hive data to OSS using the Hive data migration feature. AnalyticDB for MySQL then automatically organizes the metadata for subsequent analysis.
Get started
Automatically synchronize the metadata of a PolarDB-X instance to a Data Lakehouse Edition cluster
Use data synchronization to synchronize data from Simple Log Service to Data Warehouse Edition
Use data synchronization to synchronize data from Simple Log Service to Data Lakehouse Edition
Use metadata discovery to import data to Data Lakehouse Edition