Create a data source in Data Integration

A data source connects to various databases and storage services, such as MaxCompute, MySQL, and OSS. It is a prerequisite for Data Integration synchronization tasks because it defines where a task reads data from (the source) and where it writes data to (the destination).

The role of a data source

In a Data Integration task, a data source acts as an endpoint at both ends of the data flow:

Source (Reader): The task reads data from the data source configured as the source.
Destination (Writer): The task writes the processed data to the data source configured as the destination.

You must configure both a source and a destination data source before you can synchronize a single table or a full database, in either batch or real-time mode. A correctly configured data source with proper network connectivity is required for tasks to run successfully.

Supported data source types

For a list of data sources supported by DataWorks Data Integration, see Supported data sources and synchronization solutions. The configuration process might vary slightly depending on the data source type. Refer to the UI for specific details.

Create a data source

Important

DataWorks recommends that you create and manage all data sources centrally in Management Center. Data sources created in Management Center are reusable and manageable, and they support features such as environment isolation. This approach is a best practice for enterprise-level data development and production workloads.

Configuration: See Data source management.

You can create a data source in either Management Center or Data Integration. The following table compares the two methods.

Capability	Management Center (recommended)	Data Integration
Management location	Management Center > Data Sources.	Data Integration > Data Sources.
Environment isolation	Supports separate configurations for the development and production environments to protect production data.	Not supported. Only a production environment is available.
Multi-module reusability	Can be used across all modules, including Data Integration, DataStudio, and Data Analysis.	Has limited functionality when used in other modules.
Permission control	Supports cross-workspace authorization.	Does not support authorization.
Applicable mode	Recommended for workspaces in standard mode. Aligns with enterprise standards.	Suitable for basic mode or for standard mode scenarios that do not require isolation.
Cloning capability	Supports cloning to quickly create a new data source.	Not supported.

Both methods support third-party authentication and configuring data sources by using a RAM role.

The procedure for creating a data source is the same in both locations.
When you create a data source in Management Center, a corresponding data source with the same name is automatically created in Data Integration. Both share the same production environment configuration.
When you create a data source in Data Integration, a corresponding data source with the same name is also automatically created in Management Center. However, this data source contains only the production environment information. Its development environment is marked as incomplete, and you must configure it manually.
Configuration parameters vary by data source type. For more details, see Data source list.
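
The difference between the two creation paths can be pictured with a small Python model. This is only an illustrative sketch, not a DataWorks API: the class name, function names, and config keys are all hypothetical, and the model captures only the behavior described above (one record shared by both locations, with the development configuration missing when the data source is created in Data Integration).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DataSourceRecord:
    """Hypothetical model of one data source name as seen in both locations."""
    name: str
    prod_config: Dict[str, str]
    dev_config: Optional[Dict[str, str]] = None

    @property
    def dev_status(self) -> str:
        # The development environment is "incomplete" until it is configured.
        return "complete" if self.dev_config is not None else "incomplete"


def create_in_management_center(
    name: str, prod_config: Dict[str, str], dev_config: Dict[str, str]
) -> DataSourceRecord:
    # Assumption: both environments are configured at creation time.
    return DataSourceRecord(name, prod_config, dev_config)


def create_in_data_integration(
    name: str, prod_config: Dict[str, str]
) -> DataSourceRecord:
    # Only production information exists; development must be added later.
    return DataSourceRecord(name, prod_config)


# Example with placeholder names and hosts (not real endpoints).
from_di = create_in_data_integration("orders_mysql", {"host": "prod-db.example"})
from_mc = create_in_management_center(
    "users_mysql", {"host": "prod-db.example"}, {"host": "dev-db.example"}
)
```

In this model, `from_di.dev_status` is "incomplete" until a development configuration is assigned, which mirrors the manual step required in Management Center.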

Using data sources

Basic mode:

In a workspace in basic mode, there is only one environment. Data sources created in Management Center and Data Integration are identical.

Standard mode:

A workspace in standard mode supports environment isolation for data sources. A single data source name can have two separate configurations: one for the development environment and one for the production environment. You can point them to different databases or instances to isolate test data from production data, which protects your production data.

In Data Integration, only single-table batch synchronization tasks support data source environment isolation. All other types of synchronization tasks use the production environment data source.
A data source created in Data Integration contains only the production environment configuration. Because its development environment information is missing, it cannot be used directly in data development tasks. You must complete the development environment configuration in Management Center before using it in DataStudio or for batch synchronization of a single table.
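
The rules for basic and standard mode can be summarized as a small decision function. The following Python sketch is an assumption-laden illustration, not DataWorks code: the mode strings, the task type label, and the function names are hypothetical, and the single environment of a basic-mode workspace is labeled "prod" purely for convenience.

```python
from typing import Dict, Optional

# Hypothetical label for the one task type that honors environment isolation.
SINGLE_TABLE_BATCH = "single_table_batch"


def pick_environment(workspace_mode: str, task_type: str, run_env: str) -> str:
    """Return the environment whose data source configuration a task uses."""
    if workspace_mode == "basic":
        # Basic mode has a single environment (called "prod" in this sketch).
        return "prod"
    if task_type == SINGLE_TABLE_BATCH and run_env == "dev":
        # Only single-table batch synchronization uses the development config.
        return "dev"
    # Every other synchronization task type uses the production config.
    return "prod"


def resolve_config(
    configs: Dict[str, Optional[dict]],
    workspace_mode: str,
    task_type: str,
    run_env: str,
) -> dict:
    """Look up the configuration, failing if that environment is incomplete."""
    env = pick_environment(workspace_mode, task_type, run_env)
    config = configs.get(env)
    if config is None:
        raise ValueError(f"The {env} environment of this data source is not configured")
    return config


# A data source created in Data Integration: production only (placeholder host).
created_in_di = {"prod": {"host": "prod-db.example"}, "dev": None}
```

With `created_in_di`, resolving a single-table batch task in the development environment raises an error until the development configuration is completed, while any other task type resolves to the production configuration.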

Next steps

After you configure the data source and it passes the connectivity test, you can configure a synchronization task in Data Integration:

Batch synchronization for a single table: Configure by using the codeless UI or Configure by using the code editor.
Real-time synchronization for a single table: Configure a single-table real-time synchronization task.
Batch synchronization for a full database: Configure a batch synchronization task for a full database.
Real-time synchronization for a full database: Configure a real-time synchronization task for a full database.
Full and incremental synchronization for a full database: Configure a full and incremental synchronization task for a full database.

FAQ

For more common questions about using data sources, see Data Integration FAQ.