Before you use Dataphin, select a database or data warehouse that fits your business scenario to serve as a data source. This data source is used to read raw data and to write data throughout the data construction process. Dataphin integrates multiple data engines and can connect to data warehouses such as MaxCompute and Hive, as well as traditional enterprise databases such as MySQL and Oracle.
Background information
Dataphin supports a range of data sources, including big data storage, file, message queue, relational, and NoSQL data sources. The data sources supported by each module are outlined below:
To connect a data source to Dataphin, you must first create the data source in data source management.
When you add data sources to Dataphin, you can add both production and development data sources. The Prod environment of Basic projects and of Dev-Prod projects uses production data sources, while the Dev environment of Dev-Prod projects uses development data sources. The same applies in DataService Studio: the Prod environment in Basic and Dev-Prod modes accesses production data sources, and the Dev environment in Dev-Prod mode accesses development data sources. Sync tasks do not support separate production and development environments; the data sources they use interact with production data sources.
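The environment rules above can be summarized as a small lookup. The following is a minimal sketch for illustration only, not a Dataphin API: the names `ProjectMode`, `Environment`, `DataSourceEnv`, and `resolve_data_source_env` are hypothetical, and treating a Dev environment in a Basic project as invalid input is an assumption, since the text only describes the Prod environment for Basic projects.

```python
from enum import Enum


class ProjectMode(Enum):
    BASIC = "Basic"
    DEV_PROD = "Dev-Prod"


class Environment(Enum):
    DEV = "Dev"
    PROD = "Prod"


class DataSourceEnv(Enum):
    DEVELOPMENT = "development"
    PRODUCTION = "production"


def resolve_data_source_env(mode, env, is_sync_task=False):
    """Return which data source environment a task interacts with.

    Hypothetical helper that mirrors the rules described in the text.
    """
    # Sync tasks do not support dual environments, so they always
    # interact with production data sources.
    if is_sync_task:
        return DataSourceEnv.PRODUCTION

    if mode is ProjectMode.BASIC:
        # Assumption: only the Prod environment is described for Basic
        # projects, so any other environment is rejected here.
        if env is not Environment.PROD:
            raise ValueError("only the Prod environment is described for Basic mode")
        return DataSourceEnv.PRODUCTION

    # Dev-Prod mode: Dev uses development data sources, Prod uses production.
    if env is Environment.DEV:
        return DataSourceEnv.DEVELOPMENT
    return DataSourceEnv.PRODUCTION
```

For example, `resolve_data_source_env(ProjectMode.DEV_PROD, Environment.DEV)` yields the development data source, while the same call with `is_sync_task=True` yields the production data source.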
Note: If the data source type you need is not among the built-in types, you can define custom offline or real-time data source types and connect them to Dataphin to meet additional data source access requirements. For specific procedures, see:
Data source description
Common scenarios | Description | References |
Offline integration | Offline integration supports various components, including input, output, and transform components, which can be assembled into a single offline integration pipeline through simple drag-and-drop, configuration, and assembly on the canvas. Offline integration also supports code editor mode, allowing for more personalized configurations. Additionally, input and output components of custom RDBMS data sources created by users will be automatically created in the component library to meet diverse data synchronization needs. | |
Real-time integration | Real-time integration synchronizes data changes from an entire database, or from all tables in the source data source, to the target data source in real time, keeping the source and target data sources in sync. | |
Offline development | After connecting data sources to Dataphin, you can create database SQL tasks in Dataphin for development. | |
Metadata acquisition | The Metadata Center is responsible for extracting, processing, centrally storing, and managing metadata from various business systems to support data governance and enhance the organization's internal data organization, retrieval, and analysis capabilities. | |
Real-time development | The connected data sources support the creation of real-time meta tables and the development of real-time tasks. | |
Data Quality | Data Quality, also known as asset quality, is a comprehensive data quality solution provided by the Dataphin platform for data development and usage. Data Quality features include quality rule configuration, quality monitoring, schedule configuration, intelligent alerting, and verification administration. | |
DataService Studio | DataService Studio (OneService) is the final step in building a data middle platform based on Dataphin. As a unified outlet for data services, DataService Studio provides unified market management of data, lowering the barrier to opening up data while keeping that data secure. | |
Tag Factory | Tag Factory is a one-stop tag development and service platform for enterprise data development teams and developers, built on full-link services that start from tag creation. It suits scenarios such as risk control and marketing, and provides offline, real-time, and service tag development, management, exploration, and service capabilities. These capabilities support upper-layer business applications and help enterprises accumulate tag assets, making tags efficient to develop and easy to find, use, and manage. |
This topic only lists the data sources supported for integration with Dataphin and the application scenarios supported in Dataphin. For detailed information on the specific features supported by data sources in each scenario, see:
Big data storage data sources
Data Source Type | Offline Integration | Real-Time Integration | Offline Development | Metadata Acquisition | Real-Time Development | Data Quality | DataService Studio | Tag Factory | Creation Guide |
MaxCompute | Available | Available | No | No | Available | Available | No | Available | |
Hive | Available | Available | No | No | Available | Available | No | No | |
Hologres | Available | No | No | No | Available | Available | Available | Available | |
Impala | Available | No | No | No | No | No | Available | No | |
TDH Inceptor | Available | No | No | No | No | No | No | No | |
Kudu | Available | No | No | No | No | No | No | No | |
StarRocks | Available | No | No | No | Available | No | Available | No | |
Hudi | Available | No | No | No | Available | No | No | No | |
Doris | Available | No | No | No | Available | No | No | No | |
GreenPlum | Available | No | No | No | No | No | No | Available | |
TDengine | No | No | No | No | No | No | Available | No | |
ArgoDB | Available | No | No | No | No | No | No | No | |
Paimon | No | No | No | No | Available | No | No | No | |
SelectDB | Available | No | No | No | No | No | Available | No | |
Lindorm (Compute Engine) | Available | No | No | No | No | No | No | No |
File data sources
Data Source Type | Offline Integration | Real-time Integration | Offline Development | Metadata Acquisition | Real-time Development | Data Quality | DataService Studio | Tag Factory | Creation Guide |
HDFS | Supported | No | No | No | No | No | No | No | |
FTP | Supported | No | No | No | No | No | No | No | |
OSS | Supported | No | No | No | No | No | No | No | |
Amazon S3 | Supported | No | No | No | No | No | No | No |
Message queue data sources
Data Source Type | Offline Integration | Real-time Integration | Offline Development | Metadata Acquisition | Real-time Development | Data Quality | DataService Studio | Tag Factory | Creation Guide |
Log Service | Supported | No | No | No | Supported | No | No | No | |
Kafka | Supported | Supported | No | No | Supported | No | No | Supported | |
DataHub | Supported | Supported | No | No | Supported | No | No | Supported |
Relational data sources
Data Source Type | Offline Integration | Real-Time Integration | Offline Development | Metadata Acquisition | Real-Time Development | Data Quality | DataService Studio | Tag Factory | Creation Guide |
PolarDB | Supported | No | No | No | Supported | No | No | No | |
PolarDB-X (formerly DRDS) | Supported | No | No | No | No | No | No | No | |
MySQL | Supported | Supported | Supported | Supported | Supported | Supported | Supported | Supported | |
SAP HANA | Supported | No | No | No | Supported | Supported | Supported | No | |
Microsoft SQL Server | Supported | Supported | No | Supported | Supported | Supported | Supported | No | |
PostgreSQL | Supported | Supported | No | Supported | Supported | Supported | Supported | Supported | |
AnalyticDB for MySQL 2.0 | Supported | No | No | No | Supported | No | Supported | No | |
AnalyticDB for MySQL 3.0 | Supported | No | No | No | No | No | No | No | |
AnalyticDB for PostgreSQL | Supported | No | Supported | No | Supported | Supported | Supported | Supported | |
OceanBase | Supported | No | No | No | Supported | No | No | No | |
Data Source Type | Supported | Supported | Supported | Supported | Supported | Supported | Supported | Supported | |
Vertica | Supported | No | No | No | No | No | No | No | |
IBM DB2 | Supported | Supported | No | No | No | Supported | No | No | |
Teradata | Supported | No | No | No | No | No | No | No | |
ClickHouse | Supported | Supported | No | No | Supported | Supported | Supported | No | |
DM (Dameng) | Supported | No | No | No | No | Supported | Supported | No | |
GBase 8a | Supported | No | No | No | No | No | No | No | |
KingbaseES | Supported | No | No | No | No | No | No | No | |
TiDB | Supported | No | No | No | Supported | No | No | No | |
GoldenDB | Supported | No | No | No | No | No | No | No | |
PolarDB | Supported | No | No | No | Supported | No | No | No |
NoSQL data sources
Data Source Type | Offline Integration | Real-time Integration | Offline Development | Metadata Acquisition | Real-time Development | Data Quality | DataService Studio | Tag Factory | Creation Guide |
HBase0.9.4 | No | No | No | No | No | No | Supported | Supported | |
HBase1.1x | Supported | No | No | No | Supported | No | Supported | Supported | |
HBase2.0 | Supported | No | No | No | Supported | No | Supported | Supported | |
Elasticsearch | Supported | No | No | No | Supported | No | Supported | Supported | |
MongoDB | Supported | No | No | No | Supported | No | Supported | No | |
Tablestore | Supported | No | No | No | No | No | No | Supported | |
Aliyun HBase | No | No | No | No | No | No | No | No | |
Redis | Supported | No | No | No | Supported | No | No | No | |
Lindorm (LindormTable) | No | No | No | No | No | No | Supported | Supported |
Semi-structured storage data sources
Data Source Type | Offline Integration | Real-time Integration | Offline Development | Metadata Acquisition | Real-time Development | Data Quality | DataService Studio | Tag Factory | Creation Guide |
API | Available | Not applicable | Not applicable | Not applicable | Not applicable | Not applicable | Not applicable | Available | |
SAP Table | Available | Not applicable | Not applicable | Not applicable | Not applicable | Not applicable | Not applicable | Not applicable | |
Salesforce | Available | Not applicable | Not applicable | Not applicable | Not applicable | Not applicable | Not applicable | Not applicable |