A data source connects to databases and storage services such as MaxCompute, MySQL, and OSS. Data Integration synchronization tasks use data sources to define where to read and write data.
Role of a data source
In Data Integration, a data source serves as an endpoint for data flow:
- Source (Reader): The task reads data from the source data source.
- Destination (Writer): The task writes processed data to the destination data source.
Configure both a source and a destination data source before you synchronize data. A task requires that both data sources are correctly configured and reachable over a stable network connection.
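The Reader and Writer roles can be pictured with a small sketch. The `SyncTask` class, the `missing_datasources` helper, and the data source names below are hypothetical and exist only for illustration; they are not part of any DataWorks API. The sketch shows a pre-flight check that a task's source and destination data sources have both been configured.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SyncTask:
    """Hypothetical model of a synchronization task; not a DataWorks API."""
    name: str
    reader_datasource: str  # data source the task reads from (Reader)
    writer_datasource: str  # data source the task writes to (Writer)


def missing_datasources(task: SyncTask, configured: set[str]) -> list[str]:
    """Return the data sources the task references that are not configured yet."""
    needed = [task.reader_datasource, task.writer_datasource]
    return [name for name in needed if name not in configured]


# Illustrative names only: the destination has not been configured yet.
task = SyncTask("orders_to_dw", reader_datasource="mysql_orders", writer_datasource="odps_dw")
print(missing_datasources(task, {"mysql_orders"}))
```

An empty result means both endpoints are configured. The sketch does not model network connectivity, which the connectivity test verifies separately.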
Supported data source types
DataWorks Data Integration supports the data source types listed in Supported data sources and synchronization solutions. Configuration parameters vary by data source type.
Create a data source
We recommend that you create and manage all data sources in Management Center. Data sources in Management Center are reusable, support environment isolation, and align with enterprise best practices.
For configuration details, see Data source management.
You can create a data source in Management Center or Data Integration. The following table compares the two methods.
| Capability | Management Center (recommended) | Data Integration |
| --- | --- | --- |
| Management location | . | . |
| Environment isolation | Supports separate development and production configurations to isolate data. | Not supported. Only the production environment is available. |
| Multi-module reusability | Usable across all modules: Data Integration, Data Studio, and data analysis. | Limited functionality in other modules. |
| Permission control | Supports cross-workspace authorization. | Does not support authorization. |
| Applicable mode | Recommended for standard mode workspaces. | Suitable for basic mode or standard mode without isolation. |
| Cloning capability | Supports cloning data sources. | Not supported. |
Both methods support third-party authentication and configuring data sources by using a RAM role.
- The procedure for creating a data source is the same in both locations.
- Creating a data source in Management Center automatically creates a matching data source in Data Integration. Both share the same production environment configuration.
- Creating a data source in Data Integration automatically creates a matching data source in Management Center with only the production environment configuration. You must add the development environment configuration manually.
- Configuration parameters vary by data source type. For details, see Data source list.
Using data sources
Basic mode:
Basic mode workspaces have one environment. Data sources in Management Center and Data Integration are identical.
Standard mode:
A workspace in standard mode supports data source environment isolation. Each data source can have separate development environment and production environment configurations that point to different databases or instances.
- In Data Integration, only single-table batch synchronization tasks support environment isolation. Other synchronization tasks use the production environment data source.
- Data sources created in Data Integration contain only the production environment configuration. To use them in Data Studio or single-table batch synchronization, add the development environment configuration in Management Center.
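The selection rules above can be sketched as a small function. Everything here is a hypothetical model for illustration, not a DataWorks API: the `DataSource` class, the `resolve_target` function, the mode and task type strings, and the connection targets. Raising an error when the development configuration is missing is also an assumption of this sketch; the text above only says that the configuration must be added.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class DataSource:
    """Hypothetical model of a data source; not a DataWorks API."""
    name: str
    prod_target: str                  # production environment configuration
    dev_target: Optional[str] = None  # development configuration, if one was added


def resolve_target(ds: DataSource, workspace_mode: str, task_type: str, run_env: str) -> str:
    """Pick the configuration a task would use under the rules described above."""
    # Basic mode has a single environment, so there is nothing to choose.
    if workspace_mode == "basic":
        return ds.prod_target
    # In standard mode, only single-table batch tasks honor environment isolation.
    if task_type == "single_table_batch" and run_env == "dev":
        if ds.dev_target is None:
            # Assumption: a missing development configuration is treated as an error.
            raise ValueError(f"{ds.name}: add a development configuration in Management Center")
        return ds.dev_target
    # Every other synchronization task uses the production configuration.
    return ds.prod_target


orders = DataSource("mysql_orders", prod_target="prod-db/orders", dev_target="dev-db/orders")
print(resolve_target(orders, "standard", "single_table_batch", "dev"))
print(resolve_target(orders, "standard", "full_database_realtime", "dev"))
```

In this model, a data source created in Data Integration corresponds to a `DataSource` whose `dev_target` is `None` until the development configuration is added in Management Center.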
Next steps
After the data source passes the connectivity test, configure a synchronization task:
- Batch synchronization for a single table: Configure by using the codeless UI and Configure by using the code editor.
- Real-time synchronization for a single table: Configure a single-table real-time synchronization task.
- Batch synchronization for a full database: Configure a batch synchronization task for a full database.
- Real-time synchronization for a full database: Configure a real-time synchronization task for a full database.
- Full and incremental synchronization for a full database: Configure a full and incremental synchronization task for a full database.
FAQ
- Why does data source connectivity sometimes succeed and sometimes fail?
- The connectivity test fails when I access a database in a VPC. How do I fix this?
More data source questions are answered in Data Integration FAQ.