
AnalyticDB: Data import methods

Last Updated: Mar 28, 2026

AnalyticDB for MySQL supports five data import methods: external tables, DataWorks, Data Transmission Service (DTS), Java Database Connectivity (JDBC)-based programs, and data synchronization. The right choice depends on where your data lives, how much of it there is, and how often you need to move it.

Choose an import method

| Data source | Volume / frequency | Recommended method |
| --- | --- | --- |
| OSS, HDFS, MaxCompute, RDS, PolarDB | Large batch (GB–TB), one-time or daily | External tables |
| RDS for MySQL, Oracle, OSS, MaxCompute, HDFS | Small batch, every minute or hour | DataWorks |
| RDS for MySQL, PolarDB for MySQL, HBase | Real-time, within seconds | DTS |
| Log files, on-premises data, custom pipelines | Any volume, requires preprocessing | JDBC-based program |
| SLS, ApsaraMQ for Kafka, Hive, OSS | Streaming or metadata-driven | Data synchronization |

Use external tables to import data

External tables map to data stored in external systems (OSS, HDFS, MaxCompute, and several RDS and PolarDB sources). AnalyticDB for MySQL reads data concurrently across all cluster nodes and builds indexes automatically when the job completes, making this the highest-throughput option for large imports.

Key behaviors:

  • Imported data is invisible while the job runs and becomes queryable only after it completes.

  • If the target partition already exists, its data is overwritten.

  • Index building runs automatically at job completion to optimize query performance.

  • Schedule large jobs during off-peak hours to avoid resource contention.
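As a sketch of the external-table flow, the statements below map an OSS path as an external table and then import it. The table names, bucket URL, and TABLE_PROPERTIES keys are illustrative assumptions, not guaranteed syntax; consult the external table documentation for the exact options for your source and version.

```sql
-- Map an OSS CSV path as an external table (property keys are illustrative;
-- check the OSS external table documentation for the exact names).
CREATE TABLE oss_orders_ext (
  order_id BIGINT,
  order_date DATE,
  amount DECIMAL(10, 2)
) ENGINE = 'OSS'
TABLE_PROPERTIES = '{
  "endpoint": "oss-cn-hangzhou-internal.aliyuncs.com",
  "url": "oss://my-bucket/orders/",
  "format": "csv"
}';

-- Import into a native AnalyticDB for MySQL table. The data becomes
-- queryable only after the job completes, and an existing target
-- partition is overwritten.
INSERT OVERWRITE INTO orders
SELECT * FROM oss_orders_ext;
```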

Regular import vs. elastic import

By default, AnalyticDB for MySQL uses regular import, which reads source data on interactive resource groups (resident compute nodes) and writes indexes on storage nodes. This can drive high CPU utilization and I/O usage on storage nodes during the job.

Elastic import (available on V3.1.10.0 and later) offloads both reading and index building to job resource groups (dynamically scaled compute nodes), leaving storage nodes nearly unaffected. Use elastic import when you need higher concurrency or want to protect interactive workloads during data loading.

| | Regular import | Elastic import |
| --- | --- | --- |
| Minimum version | No limit | V3.1.10.0 and later |
| Concurrent jobs | Up to 2 | Up to 32 |
| Resources consumed | Interactive resource groups + storage nodes | Job resource groups only; minimal storage node impact |
| Enabled by default | Yes | Yes, on V3.1.10.0 and later |
| Supported data sources | MaxCompute, OSS, HDFS, ApsaraDB RDS for MySQL, ApsaraDB RDS for SQL Server, PolarDB-X, PolarDB for MySQL | MaxCompute, OSS |
| Minimum job duration | Varies | ~2–3 minutes |
| Recommended use scenarios | You want to import a small amount of data and have no requirements on speed or concurrency | You want to consume fewer storage node resources, accelerate data import, run a job that involves multiple tables, or import a large amount of data into partitioned tables |

Elastic import limits:

  • Supported only on AnalyticDB for MySQL Data Lakehouse Edition (V3.0) based on external tables using INSERT OVERWRITE INTO.

  • Imports data from a single table only — joined tables are not supported.

  • Supports only WHERE and LIMIT clauses; ORDER BY is not supported. In the SELECT statement, you can use asterisks (*), column names, default values, or constants for the column and value fields — SQL functions are not supported.

  • Targets fact tables only.

  • Supported data types: BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, VARCHAR, DATE, TIME, DATETIME, TIMESTAMP.

Elastic import requires at least 2–3 minutes to complete. For jobs that must finish within 3 minutes, use regular import instead.

Tune elastic import concurrency

To increase concurrency beyond the default, set the adb.load.job.max.acu parameter. Its default value is (number of shards) + 1. Set it to K × the default value, where K ≥ 1 and K does not exceed the number of partitions in the target partitioned table.

Query the number of shards in your cluster:

SELECT count(1) FROM information_schema.kepler_meta_shards;

For partition counts, see Storage analysis.
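Putting the pieces together, an elastic import job might be submitted with a hint such as the one below. The elastic_load hint syntax shown here is a sketch and may differ across versions; adb.load.job.max.acu is the parameter described above, and the shard count and table names are assumptions for illustration.

```sql
-- Sketch: run an elastic import with doubled concurrency.
-- Assuming 15 shards, the default adb.load.job.max.acu is 15 + 1 = 16;
-- K = 2 raises it to 32. Hint names may vary by cluster version.
/*+ elastic_load=true, elastic_load_configs=[adb.load.job.max.acu=32] */
INSERT OVERWRITE INTO sales_fact
SELECT * FROM maxcompute_sales_ext;
```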

Use cases

  • Data warehouse initialization: Load terabytes of historical data from OSS or HDFS into AnalyticDB for MySQL for the first time. Schedule during off-peak hours to avoid resource contention.

  • Daily batch ingestion: Import gigabytes to terabytes of processed data from MaxCompute into AnalyticDB for MySQL each day for accelerated analysis.

Performance optimization

For optimization strategies specific to external table imports, see Optimize data import performance — external tables.

Use DataWorks to import data

DataWorks provides a visual, no-code interface for configuring data synchronization jobs between a wide range of sources and AnalyticDB for MySQL. It suits frequent small-batch imports and scenarios where data comes from multiple heterogeneous sources.

For imports of several hundred gigabytes or more, use external tables instead. DataWorks is optimized for smaller, more frequent transfers.

Use cases

  • Frequent small-batch imports: Load small amounts of data every minute or every hour for near-real-time analysis.

  • Multi-source consolidation: Pull data from sources like Tablestore, Redis, and PostgreSQL into a single AnalyticDB for MySQL cluster.

Get started

  1. Configure the data source in DataWorks.

  2. Add an AnalyticDB for MySQL V3.0 data source.

  3. Configure source and destination connections for the synchronization job.

Performance optimization

For optimization strategies specific to DataWorks imports, see Optimize data import performance — DataWorks.

Use DTS to import data

Data Transmission Service (DTS) is a real-time data streaming service that supports relational database management system (RDBMS), NoSQL, and online analytical processing (OLAP) databases as sources. DTS captures change data continuously and replicates it to AnalyticDB for MySQL within seconds, making it the right choice when your analysis must reflect the latest state of an operational database.

Use cases

  • Real-time synchronization: Stream changes from ApsaraDB RDS for MySQL or PolarDB for MySQL to AnalyticDB for MySQL within seconds.

  • Multi-source aggregation: Consolidate data from multiple RDS for MySQL instances or PolarDB for MySQL clusters into a single AnalyticDB for MySQL cluster. Use DTS's multi-table merging feature to unify multiple source tables with the same schema into one destination table.

Get started

Use a JDBC-based program to import data

A JDBC-based program gives you full control over the import pipeline. Use this method when data requires preprocessing — parsing log files, applying transformations, or handling formats that external tables cannot read — or when staging data to OSS, HDFS, or MaxCompute first is not feasible.

Use cases

  • Import after data preprocessing: Parse log files or apply custom transformations before loading data into AnalyticDB for MySQL in real time.

  • On-premises data import: Load on-premises data that cannot be staged to OSS, HDFS, or MaxCompute beforehand.
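Regardless of the client language, write throughput over JDBC generally improves when preprocessed rows are batched into multi-value INSERT statements instead of being sent one at a time. The table and column names below are hypothetical, chosen only to illustrate the pattern:

```sql
-- Batch several preprocessed log rows per statement to reduce round trips.
INSERT INTO access_log (log_time, client_ip, status_code)
VALUES
  ('2026-03-28 10:00:01', '192.0.2.10', 200),
  ('2026-03-28 10:00:01', '192.0.2.11', 404),
  ('2026-03-28 10:00:02', '192.0.2.12', 200);
```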

Usage notes

Performance optimization

For optimization strategies specific to JDBC-based imports, see Optimize data import performance — JDBC.

Use data synchronization to import data

AnalyticDB for MySQL includes built-in data synchronization features for streaming ingestion, metadata management, and data migration: AnalyticDB Pipeline Service (APS), metadata discovery, and Hive data migration.

Use cases

  • Low-cost log and message ingestion: Use APS to stream data continuously from Simple Log Service (SLS) and ApsaraMQ for Kafka into AnalyticDB for MySQL for cost-effective storage and analysis.

  • OSS metadata discovery: When OSS data lacks structured metadata, the metadata discovery feature scans OSS and builds database, table, and partition metadata — including data formats and field definitions — making the data accessible to AnalyticDB for MySQL and other analysis engines.

  • Hive data migration: Migrate Hive data to OSS using the Hive data migration feature. AnalyticDB for MySQL then automatically organizes the metadata for subsequent analysis.

Get started