A data source connects to a database or storage service, such as MaxCompute, MySQL, or OSS. It is a prerequisite for Data Integration synchronization tasks because it defines the system that a task reads from (the source) or writes to (the destination).
The role of a data source
In a Data Integration task, data sources serve as the endpoints of the data flow:
Source (Reader): The task reads data from the data source configured as the source.
Destination (Writer): The task writes the processed data to the data source configured as the destination.
You must configure both a source and a destination data source before you can synchronize a single table or a full database, in either batch or real-time mode. A correctly configured data source with proper network connectivity is required for tasks to run successfully.
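The two roles can be pictured with a small Python sketch. The `DataSource` and `SyncTask` classes below are hypothetical illustrations, not part of any DataWorks SDK. They only model the rule that a task needs both a source and a destination before it can run; network connectivity is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DataSource:
    """Hypothetical stand-in for a configured data source."""
    name: str
    type: str  # for example "mysql", "maxcompute", or "oss"


@dataclass
class SyncTask:
    """Hypothetical model of a synchronization task with two endpoints."""
    source: Optional[DataSource] = None       # Reader side
    destination: Optional[DataSource] = None  # Writer side

    def missing_endpoints(self) -> List[str]:
        """Return the endpoints that have not been configured yet."""
        missing = []
        if self.source is None:
            missing.append("source")
        if self.destination is None:
            missing.append("destination")
        return missing

    def is_runnable(self) -> bool:
        """A task is runnable only when both endpoints are configured."""
        return not self.missing_endpoints()


# Usage: a task with only a source is not yet runnable.
task = SyncTask(source=DataSource("mysql_orders", "mysql"))
print(task.missing_endpoints())
```

The data source names used here (`mysql_orders`) are placeholders chosen for the example.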
Supported data source types
For a list of data sources supported by DataWorks Data Integration, see Supported data sources and synchronization solutions. The configuration process might vary slightly depending on the data source type. Refer to the UI for specific details.
Create a data source
DataWorks recommends that you create and manage all data sources centrally in Management Center. Data sources created here are reusable, manageable, and support features like environment isolation. This approach is a best practice for enterprise-level data development and production workloads.
Configuration: See Data source management.
You can create a data source in either Management Center or Data Integration. The following table compares the two methods.
Capability | Management Center (recommended) | Data Integration |
Management location | . | . |
Environment isolation | Supports separate configurations for the development and production environments to protect production data. | Not supported. Only a production environment is available. |
Multi-module reusability | Can be used across all modules, including Data Integration, DataStudio, and Data Analysis. | Has limited functionality when used in other modules. |
Permission control | Supports cross-workspace authorization. | Does not support authorization. |
Applicable mode | Recommended for workspaces in standard mode. Aligns with enterprise standards. | Suitable for basic mode or for standard mode scenarios that do not require isolation. |
Cloning capability | Supports cloning to quickly create a new data source. | Not supported. |
Both methods support third-party authentication and configuring data sources by using a RAM role.
The procedure for creating a data source is the same in both locations.
When you create a data source in Management Center, a corresponding data source with the same name is automatically created in Data Integration. Both share the same production environment configuration.
When you create a data source in Data Integration, a corresponding data source with the same name is also automatically created in Management Center. However, this data source contains only the production environment information. Its development environment is marked as incomplete, and you must configure it manually.
Configuration parameters vary by data source type. For more details, see Data source list.
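The mirroring behavior of the two creation paths can be summarized in a minimal Python sketch. It assumes a workspace in standard mode. The function `create_data_source`, the dictionary layout, and the connection fields are hypothetical and exist only to illustrate the rules described above; they are not a DataWorks API.

```python
from typing import Any, Dict, Optional

MANAGEMENT_CENTER = "management_center"
DATA_INTEGRATION = "data_integration"


def create_data_source(
    name: str,
    created_in: str,
    prod: Dict[str, str],
    dev: Optional[Dict[str, str]] = None,
) -> Dict[str, Dict[str, Any]]:
    """Return how the new data source appears in both locations (illustrative)."""
    if created_in not in (MANAGEMENT_CENTER, DATA_INTEGRATION):
        raise ValueError(f"unknown creation location: {created_in}")

    # Data Integration captures only production information, so any
    # development configuration is dropped in this model.
    if created_in == DATA_INTEGRATION:
        dev = None

    return {
        # Management Center holds both environments; the development
        # environment is incomplete until it is configured.
        MANAGEMENT_CENTER: {
            "name": name,
            "prod": prod,
            "dev": dev,
            "dev_complete": dev is not None,
        },
        # Data Integration shares the same production configuration.
        DATA_INTEGRATION: {"name": name, "prod": prod},
    }


# Usage: creating in Data Integration leaves the development side incomplete.
views = create_data_source("mysql_orders", DATA_INTEGRATION, {"host": "prod-db"})
print(views[MANAGEMENT_CENTER]["dev_complete"])
```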
Using data sources
Basic mode:
In a workspace in basic mode, there is only one environment. Data sources created in Management Center and Data Integration are identical.
Standard mode:
A workspace in standard mode supports environment isolation for data sources. A single data source name can have two separate configurations: one for the development environment and one for the production environment. You can set them to different databases or instances to isolate test data from production data. This isolation protects your production data.
In Data Integration, only single-table batch synchronization tasks support data source environment isolation. All other types of synchronization tasks use the production environment data source.
A data source created in Data Integration contains only the production environment configuration. Because its development environment information is missing, it cannot be used directly in data development tasks. You must complete the development environment configuration in Management Center before using it in DataStudio or for batch synchronization of a single table.
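The rules in this section can be condensed into a short decision function. This is a sketch under stated assumptions, not DataWorks behavior copied from an API: the function name, the string constants, and the choice to label the single environment of a basic-mode workspace as `"prod"` are all hypothetical.

```python
def resolve_environment(
    workspace_mode: str,   # "basic" or "standard"
    task_type: str,        # e.g. "single_table_batch", "full_database_realtime"
    run_env: str,          # environment the task is run in: "dev" or "prod"
    dev_configured: bool,  # whether the development configuration is complete
) -> str:
    """Return which data source configuration a task would use (illustrative)."""
    if workspace_mode not in ("basic", "standard"):
        raise ValueError(f"unknown workspace mode: {workspace_mode}")
    if run_env not in ("dev", "prod"):
        raise ValueError(f"unknown environment: {run_env}")

    # Basic mode has a single environment; it is labeled "prod" here
    # purely for illustration.
    if workspace_mode == "basic":
        return "prod"

    # In standard mode, only single-table batch synchronization
    # supports environment isolation.
    if task_type != "single_table_batch":
        return "prod"

    # A data source whose development configuration is missing must be
    # completed in Management Center before this kind of task can use it.
    if not dev_configured:
        raise ValueError("development environment configuration is incomplete")

    return run_env


# Usage: a real-time full-database task always resolves to production.
print(resolve_environment("standard", "full_database_realtime", "dev", True))
```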
Next steps
After you configure the data source and it passes the connectivity test, you can configure a synchronization task in Data Integration:
Batch synchronization for a single table: Configure by using the codeless UI or Configure by using the code editor.
Real-time synchronization for a single table: Configure a single-table real-time synchronization task.
Batch synchronization for a full database: Configure a batch synchronization task for a full database.
Real-time synchronization for a full database: Configure a real-time synchronization task for a full database.
Full and incremental synchronization for a full database: Configure a full and incremental synchronization task for a full database.
FAQ
Why does data source connectivity sometimes succeed and sometimes fail?
The connectivity test fails when I access a database in a VPC. How do I fix this?
For more common questions about using data sources, see Data Integration FAQ.