Data sources connect DataWorks to various databases and storage services, such as MaxCompute, MySQL, and OSS. They are required for Data Integration sync tasks, defining where tasks read data from (the source) and write data to (the destination).
The role of a data source
In a Data Integration task, a data source acts as an endpoint at both ends of the data flow:
Source (Reader): The data source where the task reads data.
Destination (Writer): The data source where the task writes processed data.
Whether you are performing single-table, full-database, offline, or real-time synchronization, you must configure both the source and the destination data source. For a task to run successfully, both data sources must be configured correctly and the network connection must be stable.
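The reader and writer roles described above can be pictured with a minimal Python sketch. This is only an illustration, not the DataWorks task format or API: the class names `DataSourceRef` and `SyncTask`, and the data source names `orders_mysql` and `warehouse_mc`, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSourceRef:
    """Hypothetical reference to a configured data source."""
    name: str  # name of the data source (illustrative)
    kind: str  # data source type, for example "mysql" or "maxcompute"


@dataclass(frozen=True)
class SyncTask:
    """A sync task has exactly two endpoints: a reader and a writer."""
    reader: DataSourceRef  # source: where the task reads data
    writer: DataSourceRef  # destination: where the task writes data

    def describe(self) -> str:
        # Render the direction of the data flow as "source -> destination".
        return (
            f"{self.reader.name} ({self.reader.kind}) -> "
            f"{self.writer.name} ({self.writer.kind})"
        )


task = SyncTask(
    reader=DataSourceRef(name="orders_mysql", kind="mysql"),
    writer=DataSourceRef(name="warehouse_mc", kind="maxcompute"),
)
print(task.describe())
```

The point of the sketch is that a task cannot be defined without both endpoints, which mirrors the requirement to configure the source and the destination before any synchronization runs.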
Supported data source types
For a list of data sources currently supported by DataWorks Data Integration, see Supported data sources and sync solutions. Configuration settings can vary by data source type. For details, refer to the user interface.
Create a data source
Create and manage data sources centrally in Management Center so that they are reusable, easy to manage, and can use features such as environment isolation. This is a recommended practice for enterprise-level data development and production tasks.
For configuration instructions, see Data Source Management.
You can create data sources in both Management Center and Data Integration. The following table describes the differences between the two methods:
Capability | Created in Management Center (Recommended) | Created in Data Integration |
Management location | . | . |
Environment isolation | Supports separate configurations for the development environment and production environment to ensure production security. | Not supported. Only a production environment configuration is available. |
Multi-module reusability | Usable across all modules, including integration, development, and analysis. | Limited functionality when used in other modules. |
Access control | Supports cross-workspace authorization. | Does not support authorization. |
Applicable mode | Best for workspaces in standard mode. Adheres to enterprise-level standards. | Suitable for basic mode or for standard mode scenarios that do not require environment isolation. |
Cloning capability | Supports quick cloning to create new data sources. | Not supported. |
Both methods support managing third-party authentication files and using the RAM role-based authorization mode to add a data source.
The steps to create a data source are the same in both locations.
When you create a data source in Management Center, a data source with the same name is automatically created in Data Integration. The production environment configuration is shared between them.
When you create a data source in Data Integration, a data source with the same name is also automatically created in Management Center. However, this data source contains only production environment information. You must add the development environment information manually.
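The mirroring behavior described above can be modeled with a short Python sketch. It is an illustration under stated assumptions rather than a DataWorks API: the function `create_data_source`, the location strings, and the dictionary layout are all hypothetical.

```python
def create_data_source(name, created_in, prod_config, dev_config=None):
    """Model the two same-named records that result from one creation.

    Hypothetical helper: it only mirrors the rules described in the text.
    """
    if created_in not in ("management_center", "data_integration"):
        raise ValueError(f"unknown creation location: {created_in}")
    if created_in == "data_integration":
        # Only production information is captured here; the development
        # configuration must be added manually in Management Center later.
        dev_config = None
    return {
        # Management Center holds both environment configurations.
        "management_center": {"name": name, "prod": prod_config, "dev": dev_config},
        # Data Integration shares the same production configuration.
        "data_integration": {"name": name, "prod": prod_config},
    }


from_mc = create_data_source(
    "orders_mysql", "management_center", {"host": "prod-db"}, {"host": "dev-db"}
)
from_di = create_data_source("orders_mysql", "data_integration", {"host": "prod-db"})
```

In this model, `from_di["management_center"]["dev"]` is `None`, which corresponds to the development environment information that still has to be filled in by hand.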
Configuration parameters vary depending on the data source type. For details, see Data source list.
Use a data source
Basic mode:
When a workspace is in basic mode, there is only one environment. In this case, there is no difference between a data source created in Management Center and one created in Data Integration.
Standard mode:
Workspaces in standard mode support environment isolation for data sources. A data source with the same name can have two sets of configurations: one for the development environment and one for the production environment. You can point them at different databases or instances, which isolates test data from production data and protects your production systems.
In Data Integration, only single-table offline sync tasks support environment isolation for data sources. All other types of sync tasks always use the production environment data source.
A data source created in Data Integration contains only the production environment configuration. Because the development environment configuration is missing, you cannot select it directly in Data Studio. To use the data source in Data Studio, go to Management Center and complete the development environment configuration.
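The standard mode rules above can be summarized as a small decision function. The following Python sketch assumes a standard mode workspace. The names `EnvAwareDataSource` and `resolve_config`, and the task type string `single_table_offline`, are hypothetical labels chosen for illustration and are not DataWorks identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EnvAwareDataSource:
    """Hypothetical data source with per-environment configurations."""
    name: str
    prod: dict
    dev: Optional[dict] = None  # None when created in Data Integration


def resolve_config(ds: EnvAwareDataSource, task_type: str, env: str) -> dict:
    """Pick the configuration a task would use in a standard mode workspace."""
    if env not in ("dev", "prod"):
        raise ValueError(f"unknown environment: {env}")
    # Only single-table offline sync tasks honor environment isolation;
    # every other task type always uses the production configuration.
    if task_type != "single_table_offline" or env == "prod":
        return ds.prod
    if ds.dev is None:
        # Models the case where the development configuration must first
        # be completed in Management Center.
        raise LookupError(f"{ds.name} has no development configuration")
    return ds.dev


orders = EnvAwareDataSource(
    name="orders_mysql",
    prod={"database": "orders"},
    dev={"database": "orders_dev"},
)
prod_only = EnvAwareDataSource(name="logs_oss", prod={"bucket": "logs"})
```

With these definitions, a single-table offline task in the development environment resolves `orders` to its development configuration, a real-time task resolves to the production configuration regardless of environment, and `prod_only` raises an error when a development configuration is requested.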
Next steps
After a data source is configured and passes the connectivity test, you can:
Go to Data Integration to configure sync tasks: Sync tasks in Data Integration.
Go to Data Studio to configure single-table sync tasks: Offline sync in DataStudio, Real-time sync in DataStudio.
FAQ
What should I do if the connectivity test for a data source sometimes succeeds and sometimes fails?
What should I do if the connectivity test fails when accessing a database in a VPC environment?
For more FAQs on data sources, see Data Integration FAQ.