AnalyticDB for MySQL supports importing data from various data sources, such as RDS MySQL, MongoDB, OSS, MaxCompute, and Kafka, into a data warehouse or a data lake. The import methods vary depending on the data source. Use this document to select an appropriate import method.
Overview
The differences between ingesting data into a data warehouse and a data lake are as follows:
Data ingestion into a data warehouse:
Data is pre-processed and then imported into the data warehouse.
The data warehouse uses the proprietary Xuanwu analytic storage engine developed by AnalyticDB for MySQL. This storage engine provides enterprise-grade data storage that is highly reliable, highly available, high-performance, and cost-effective. This engine enables AnalyticDB for MySQL to support high-throughput real-time writes and high-performance real-time queries.
Ingesting data into a data warehouse is suitable for business scenarios that require high performance for data analytics.
Data ingestion into a data lake:
Raw data is imported into the data lake in open source table formats, such as Iceberg and Paimon.
You can use the lake storage provided by AnalyticDB for MySQL or your own OSS bucket as the data lake storage. Because the data lake stores data in open table formats such as Iceberg and Paimon, the data can be accessed not only by the Spark and XIHE engines of AnalyticDB for MySQL, but also by external engines such as MaxCompute.
Ingesting data into a data lake is suitable for business scenarios that require open source solutions and do not have strict requirements for analytics performance. If you require high access performance for your data lake, you can enable LakeCache to achieve higher bandwidth and lower latency than accessing OSS directly.
Data ingestion into a data warehouse
| Category | Data source | Import method | Product edition | Documentation |
| --- | --- | --- | --- | --- |
| Database | RDS MySQL | External table | Data Warehouse Edition, Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Database | RDS MySQL | DTS | Data Warehouse Edition, Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Database | RDS MySQL | DataWorks | Data Warehouse Edition, Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Database | RDS MySQL | Seamless integration | Data Warehouse Edition, Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Database | RDS SQL Server | DTS | Data Warehouse Edition, Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Database | RDS SQL Server | DataWorks | Data Warehouse Edition, Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Database | PolarDB Distributed Edition (formerly DRDS) | DTS | Data Warehouse Edition, Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Database | PolarDB Distributed Edition (formerly DRDS) | DataWorks | Data Warehouse Edition, Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Database | PolarDB Distributed Edition (formerly DRDS) | One-stop synchronization | Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Database | PolarDB for MySQL | Federated analytics | Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Database | PolarDB for MySQL | DTS | Data Warehouse Edition, Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Database | PolarDB for MySQL | Seamless integration | Data Warehouse Edition, Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Database | MongoDB | External table | Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Database | MongoDB | Seamless integration | Data Warehouse Edition, Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Database | Lindorm | Seamless integration | Data Warehouse Edition, Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Database | Oracle | DataWorks | Data Warehouse Edition, Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Database | Self-managed MySQL | External table | Data Warehouse Edition | |
| Database | Self-managed HBase | DTS | Data Warehouse Edition | |
| Storage | OSS | External table | Data Warehouse Edition, Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Storage | OSS | DataWorks | Data Warehouse Edition, Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Storage | Tablestore | External table | Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Storage | HDFS | External table | Data Warehouse Edition, Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Storage | HDFS | DataWorks | Data Warehouse Edition, Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Big data | MaxCompute | External table | Data Warehouse Edition, Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Big data | MaxCompute | DataWorks | Data Warehouse Edition, Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Big data | Flink | Flink | Data Warehouse Edition | |
| Message queue | Kafka | DataWorks | Data Warehouse Edition, Enterprise Edition, Basic Edition, or Data Lakehouse Edition | |
| Message queue | Kafka | Logstash plugin | Data Warehouse Edition | |
| Log data | Simple Log Service (SLS) | Data synchronization | Data Warehouse Edition, Enterprise Edition, Basic Edition, or Data Lakehouse Edition | Synchronize SLS data to Data Warehouse Edition using the data synchronization feature |
| Log data | Simple Log Service (SLS) | Logstash plugin | Data Warehouse Edition | |
| Local data | Local data | SQLAlchemy | Data Warehouse Edition, Enterprise Edition, Basic Edition, or Data Lakehouse Edition | Import DataFrame data using SQLAlchemy |
| Local data | Local data | LOAD DATA | Data Warehouse Edition | |
| Local data | Local data | Import tool | Data Warehouse Edition | |
| Local data | Local data | Kettle | Data Warehouse Edition | |
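For the local data rows above, note that AnalyticDB for MySQL speaks the MySQL wire protocol, so local rows can be written with ordinary batched INSERT statements through any MySQL client (for example PyMySQL or SQLAlchemy, as referenced in the table). The following is a minimal sketch of how such a batched statement can be assembled; the database, table, and column names are hypothetical, and only the statement-building logic is shown so the example stays self-contained.

```python
def build_batch_insert(table, columns, rows):
    """Render one multi-row INSERT; batching rows per statement reduces round trips."""
    col_list = ", ".join(columns)

    def quote(value):
        # Naive literal rendering for the sketch only; a real client should
        # use bind parameters instead of string interpolation.
        if value is None:
            return "NULL"
        if isinstance(value, (int, float)):
            return str(value)
        return "'" + str(value).replace("'", "''") + "'"

    values = ", ".join(
        "(" + ", ".join(quote(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {table} ({col_list}) VALUES {values}"


# Hypothetical target table and data.
sql = build_batch_insert(
    "demo_db.orders",
    ["order_id", "amount"],
    [(1, 9.5), (2, 12.0)],
)
print(sql)
# -> INSERT INTO demo_db.orders (order_id, amount) VALUES (1, 9.5), (2, 12.0)
```

With a real cluster, you would send the rendered statement through your MySQL client connection; for larger volumes, the LOAD DATA and import tool methods listed above are better suited than row-by-row INSERTs.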
Data ingestion into a data lake
This feature is available only for Enterprise Edition, Basic Edition, or Data Lakehouse Edition clusters.
| Category | Data source | Import method | Documentation |
| --- | --- | --- | --- |
| Message queue | Kafka | Data synchronization | Synchronize Kafka data using the data synchronization feature (Recommended) |
| Log data | Simple Log Service (SLS) | Data synchronization | Synchronize SLS data using the data synchronization feature (Recommended) |
| Big data | Hive | Data migration | |
| Storage | OSS | Metadata discovery | |
References
AnalyticDB for MySQL also supports the asynchronous submission of data import tasks. For more information, see Submit an asynchronous import task.
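As a sketch of what asynchronous submission looks like: AnalyticDB for MySQL documents a SUBMIT JOB prefix that turns an INSERT ... SELECT import statement into an asynchronous task, which returns a job ID instead of blocking until the import finishes. The table names below are hypothetical placeholders; the helper only shows how the prefix is applied.

```python
def as_async_import(insert_select_sql):
    """Prefix an import statement so the cluster queues it as an asynchronous job."""
    # Normalize trailing whitespace/semicolon before prepending the prefix.
    return "SUBMIT JOB " + insert_select_sql.strip().rstrip(";") + ";"


# Hypothetical import from an OSS external table into a local table.
stmt = as_async_import(
    "INSERT OVERWRITE INTO adb_target_table SELECT * FROM oss_external_table;"
)
print(stmt)
# -> SUBMIT JOB INSERT OVERWRITE INTO adb_target_table SELECT * FROM oss_external_table;
```

You would execute the resulting statement through a MySQL client connection and then poll the returned job ID for completion; see the linked documentation for the exact status-checking commands.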