By creating an OSS data source, you can enable Dataphin to read business data from OSS or write data to OSS. This topic describes how to create an OSS data source.
Background information
OSS (Alibaba Cloud Object Storage Service) is a secure, cost-effective, and highly reliable cloud storage service that lets you store large amounts of data in the cloud. If you use OSS, you must create an OSS data source before Dataphin can read data from OSS for data development or write Dataphin data to OSS. For more information, see What is OSS?
Permissions
Only custom global roles with the Create Data Source permission and system roles such as Super Administrator, Data Source Administrator, Division Architect, and Project Administrator can create data sources.
Procedure
On the Dataphin homepage, click Management Center > Datasource Management in the top navigation bar.
On the Datasource page, click +Create Data Source.
On the Create Data Source page, select OSS in the File section.
If you have recently used OSS, you can also select it in the Recently Used section, or enter a keyword in the search box to find it quickly.
On the Create OSS Data Source page, configure the connection parameters.
Configure the basic information of the data source.
Parameter
Description
Datasource Name
The name must meet the following requirements:
It can contain only Chinese characters, letters, digits, underscores (_), and hyphens (-).
It cannot exceed 64 characters in length.
Datasource Code
After you configure the data source code, you can reference tables in the data source in Flink_SQL tasks by using the data source code.table name or data source code.schema.table name format. If you need to automatically access the data source that corresponds to the current environment, you can use the variable format ${data source code}.table or ${data source code}.schema.table. For more information, see Dataphin data source table development method.
Important
The data source code cannot be modified after it is configured successfully.
After the data source code is configured successfully, you can preview data on the object details page in the asset directory and asset inventory.
In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, and SelectDB data sources are currently supported.
Data Source Description
A brief description of the data source, which cannot exceed 128 characters.
Data Source Configuration
Select the data source to be configured:
If the business data source distinguishes between production and development data sources, select Production + Development Data Source.
If the business data source does not distinguish between production and development data sources, select Production Data Source.
Tag
You can categorize and tag data sources based on tags. For information about how to create tags, see Manage data source tags.
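The naming rules and reference formats in the table above can be checked before you fill in the form. The following is a minimal sketch, not Dataphin's own validation logic: the function names are hypothetical, the data source code my_source is a placeholder, and "Chinese characters" is approximated by the Unicode range U+4E00 to U+9FFF.

```python
import re
from typing import Optional

# Assumption: "Chinese characters" is approximated by U+4E00-U+9FFF.
NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]+")
MAX_NAME_LENGTH = 64          # limit stated for Datasource Name
MAX_DESCRIPTION_LENGTH = 128  # limit stated for Data Source Description


def is_valid_datasource_name(name: str) -> bool:
    """Return True if the name uses only allowed characters and is at most 64 characters."""
    return len(name) <= MAX_NAME_LENGTH and NAME_PATTERN.fullmatch(name) is not None


def is_valid_description(description: str) -> bool:
    """Return True if the description is at most 128 characters."""
    return len(description) <= MAX_DESCRIPTION_LENGTH


def table_reference(code: str, table: str, schema: Optional[str] = None,
                    per_environment: bool = False) -> str:
    """Build a table reference from a data source code.

    With per_environment=True the code is wrapped as ${code}, the variable
    format that resolves to the data source of the current environment.
    """
    prefix = "${" + code + "}" if per_environment else code
    parts = [prefix, schema, table] if schema else [prefix, table]
    return ".".join(parts)


if __name__ == "__main__":
    print(is_valid_datasource_name("oss_source-01"))
    print(table_reference("my_source", "orders"))
    print(table_reference("my_source", "orders", schema="public", per_environment=True))
```

The helper only builds the reference string. Whether a given data source type can be referenced in Flink SQL depends on the supported list noted above.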
Configure the connection parameters between the data source and Dataphin.
If you select Production + Development Data Source for Data Source Configuration, you must configure the connection information for both the production and development data sources. If you select Production Data Source, you only need to configure the connection information for the production data source.
Note
Typically, production and development data sources should be configured as different data sources to isolate the development environment from the production environment and reduce the impact of the development data source on the production data source. However, Dataphin also supports configuring them as the same data source with identical parameter values.
Parameter
Description
Endpoint
The endpoint of the region where OSS is located. The format is http://{oss-Region}.aliyuncs.com, where Region is the region where the bucket is located. For example, the endpoint for China (Hangzhou) is https://oss-cn-hangzhou.aliyuncs.com. OSS endpoints are region-specific, so you must enter a different domain name for each region that you access.
Bucket
The bucket information corresponding to the region where OSS is located. A bucket is a container used to store objects. You can obtain the bucket corresponding to the region where OSS is located on the Bucket List page.
You can create one or more buckets and add one or more objects to each bucket. During data synchronization, Dataphin can search for objects only in the bucket that is specified by this parameter.
Directory
If you only have permissions for a specific directory, you can specify the directory path here. For example, /dataphin/.
CNAME
Optional. The custom domain name of the OSS data source. For details about Alibaba Cloud OSS custom domain names, see Use custom domain names to access OSS resources.
Access ID, Access Key
The AccessKey ID and AccessKey Secret of the account to which the OSS data source belongs.
For information about how to obtain them, see Create an AccessKey pair.
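The connection parameters above can be assembled and sanity-checked before you enter them. The sketch below is illustrative only: the class and function names are hypothetical, the bucket name and credentials are placeholders, and it assumes that Endpoint, Bucket, Access ID, and Access Key are required while Directory and CNAME are optional. It uses https, as in the China (Hangzhou) example.

```python
from dataclasses import dataclass
from typing import List, Optional


def oss_endpoint(region: str, scheme: str = "https") -> str:
    """Build a public endpoint for a region ID such as cn-hangzhou."""
    return f"{scheme}://oss-{region}.aliyuncs.com"


def normalize_directory(path: str) -> str:
    """Return the directory with one leading and one trailing slash, for example /dataphin/."""
    stripped = path.strip("/")
    return f"/{stripped}/" if stripped else "/"


@dataclass
class OssConnection:
    """Hypothetical container for the parameters described in the table above."""
    endpoint: str
    bucket: str
    access_id: str
    access_key: str
    directory: str = "/"
    cname: Optional[str] = None  # optional custom domain name

    def missing_fields(self) -> List[str]:
        """List the parameters assumed to be required that are still empty."""
        required = {
            "endpoint": self.endpoint,
            "bucket": self.bucket,
            "access_id": self.access_id,
            "access_key": self.access_key,
        }
        return [name for name, value in required.items() if not value.strip()]


if __name__ == "__main__":
    conn = OssConnection(
        endpoint=oss_endpoint("cn-hangzhou"),
        bucket="example-bucket",            # placeholder bucket name
        access_id="<AccessKey ID>",         # placeholder, never hard-code real keys
        access_key="<AccessKey Secret>",    # placeholder
        directory=normalize_directory("dataphin"),
    )
    print(conn.endpoint, conn.directory, conn.missing_fields())
```

Keeping production and development values in two separate OssConnection objects mirrors the Production + Development Data Source option, where each environment has its own connection information.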
Select Default Resource Group, which is used to run tasks related to the current data source, including database SQL, offline database migration, and data preview.
Click Test Connection or directly click OK to save and complete the creation of the OSS data source.
If you click Test Connection, the system tests whether the data source can connect to Dataphin. If you click OK directly, the system automatically tests the connection for all selected clusters. However, the data source is still created even if all selected clusters fail the connection test.