Data Integration Architecture, Pipelines & Dev-Prod Mode - Dataphin

Data Integration is a simple, efficient data synchronization platform built on Dataphin. It provides data pre-processing capabilities and supports high-speed, stable synchronization between heterogeneous data sources.

Background information

As big data is adopted across industries, the demand for data integration keeps growing. Common requirements include efficiently configuring sync tasks for large numbers of tables, integrating multiple heterogeneous data sources, performing lightweight pre-processing on source data, and tuning sync tasks for fault tolerance, rate limiting, and concurrency.

Function overview

Note

If you purchased Dataphin after April 2020, the data synchronization feature has been upgraded to Data Integration.

Dataphin has upgraded its Data Integration features to help you build a simple, efficient, secure, and reliable data synchronization platform:

Improved data integration efficiency: You can use full database migration to quickly generate batch sync tasks. You can also use one-click target table creation to sync data to MaxCompute without manually creating tables. For more information, see Configure Integration Tasks Using Full Database Migration.
Data pre-processing: You can use flow and transform components to pre-process data with operations such as cleansing, transformation, field desensitization, calculation, merging, distribution, and filtering. For more information, see Create and configure a cold migration pipeline.
Multiple developer modes: Data Integration supports both Dev-Prod and Basic developer modes, allowing you to choose the one that best fits your business scenario.
Logical table synchronization: You can quickly sync logical tables created in Dataphin to a destination database.
Custom components: You can create custom components to meet data synchronization needs in different business scenarios. Relational Database Management System (RDBMS) components connect through Java Database Connectivity (JDBC). For non-RDBMS components, you must upload a JAR package.
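To make the pre-processing operations above more concrete, the following Python sketch chains three of them (filtering, calculation, and field desensitization) over in-memory rows, in the same order components would be wired in a pipeline. It is only an illustration of the idea, not Dataphin's component API: the row layout, the field names (`status`, `amount`, `phone`), the 6% tax factor, and the masking rule are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
# A transform takes a stream of rows and yields a (possibly modified or reduced) stream.
Transform = Callable[[Iterable[Row]], Iterator[Row]]


def filter_rows(predicate: Callable[[Row], bool]) -> Transform:
    """Drop rows for which the predicate is false (like a filter component)."""
    def apply(rows: Iterable[Row]) -> Iterator[Row]:
        return (row for row in rows if predicate(row))
    return apply


def add_field(name: str, fn: Callable[[Row], object]) -> Transform:
    """Add a computed field to each row (like a calculation component)."""
    def apply(rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            yield {**row, name: fn(row)}
    return apply


def mask_field(name: str, keep_prefix: int = 3, keep_suffix: int = 4) -> Transform:
    """Replace the middle characters of a field with '*' (a simple desensitization rule)."""
    def apply(rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            value = str(row[name])
            if len(value) > keep_prefix + keep_suffix:
                middle = "*" * (len(value) - keep_prefix - keep_suffix)
                value = value[:keep_prefix] + middle + value[len(value) - keep_suffix:]
            yield {**row, name: value}
    return apply


def run_pipeline(rows: Iterable[Row], transforms: List[Transform]) -> List[Row]:
    """Apply transforms in order, like components connected from source to target."""
    stream: Iterable[Row] = rows
    for transform in transforms:
        stream = transform(stream)
    return list(stream)


# Hypothetical source rows.
source = [
    {"id": 1, "phone": "13812345678", "amount": 100.0, "status": "paid"},
    {"id": 2, "phone": "13987654321", "amount": 50.0, "status": "cancelled"},
]
result = run_pipeline(source, [
    filter_rows(lambda r: r["status"] == "paid"),
    add_field("amount_with_tax", lambda r: round(r["amount"] * 1.06, 2)),
    mask_field("phone"),
])
print(result)
```

Because each transform consumes and produces a row stream, the steps can be reordered or extended (for example, with a merge or distribution step) without changing the others, which mirrors how components are assembled in a pipeline.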

Data Integration provides a variety of component types, so you can build an offline single pipeline by dragging, configuring, and connecting components. It can also generate batch sync tasks quickly through full database migration, which supports MySQL, SQL Server, and Oracle as sources and MaxCompute as the destination. If the built-in components do not cover a scenario, you can create custom components to meet specific data synchronization needs.
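The idea behind full database migration with one-click target table creation can be sketched as: for every source table, derive a target table definition and one batch sync task. The Python sketch below does this for MySQL-style column metadata and emits MaxCompute `CREATE TABLE IF NOT EXISTS ... PARTITIONED BY (...)` statements. It is an assumption-laden illustration, not Dataphin's implementation: the type mapping is deliberately simplified (decimal precision and scale are dropped), and the `ods_` prefix, the `ds` partition column, and the task dictionary format are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical, simplified MySQL -> MaxCompute type mapping (not Dataphin's actual rules).
TYPE_MAP: Dict[str, str] = {
    "tinyint": "BIGINT", "smallint": "BIGINT", "int": "BIGINT", "bigint": "BIGINT",
    "float": "DOUBLE", "double": "DOUBLE", "decimal": "DECIMAL",
    "char": "STRING", "varchar": "STRING", "text": "STRING",
    "date": "DATETIME", "datetime": "DATETIME", "timestamp": "DATETIME",
}


@dataclass
class SourceTable:
    name: str
    columns: List[Tuple[str, str]]  # (column name, MySQL type such as "varchar(64)")


def to_maxcompute_type(mysql_type: str) -> str:
    # "varchar(64)" -> "varchar"; precision/scale are dropped in this sketch.
    base = mysql_type.split("(")[0].strip().lower()
    return TYPE_MAP.get(base, "STRING")  # fall back to STRING for unmapped types


def build_task(table: SourceTable, target_prefix: str = "ods_") -> Dict[str, str]:
    """Build one batch sync task spec plus the DDL for its target table."""
    target = f"{target_prefix}{table.name}"
    cols = ",\n  ".join(f"{col} {to_maxcompute_type(t)}" for col, t in table.columns)
    ddl = (
        f"CREATE TABLE IF NOT EXISTS {target} (\n  {cols}\n)\n"
        f"PARTITIONED BY (ds STRING);"
    )
    return {"source": table.name, "target": target, "ddl": ddl}


def migrate_database(tables: List[SourceTable]) -> List[Dict[str, str]]:
    # One batch sync task per source table.
    return [build_task(t) for t in tables]


tasks = migrate_database([
    SourceTable("orders", [("id", "bigint"), ("amount", "decimal(10,2)"), ("created_at", "datetime")]),
    SourceTable("users", [("id", "int"), ("name", "varchar(64)")]),
])
for task in tasks:
    print(task["ddl"])
```

Generating the target DDL from source metadata is what removes the need to create MaxCompute tables by hand; a real mapping would also need to preserve decimal precision and handle source-specific types from SQL Server and Oracle.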

Access Data Integration

Quick access (recommended)

On the Dataphin home page, click Import Data in the product usage path to open Data Integration.

Standard access

On the Dataphin home page, choose Data Studio > Data Integration from the top menu bar to open the Data Integration page.