By creating an FTP data source, you can enable Dataphin to read business data from FTP or write data to FTP. This topic describes how to create an FTP data source.
Background information
File Transfer Protocol (FTP) is an application-layer protocol in the TCP/IP protocol suite. For example, when developing a website, you can use an FTP client to upload website programs or web pages to a web server over FTP. To read data from FTP in Dataphin for data development, or to write Dataphin data to FTP, you must first create an FTP data source.
Permissions
Only custom global roles with the Create Data Source permission and system roles such as Super Administrator, Data Source Administrator, Domain Architect, and Project Administrator can create data sources.
Procedure
In the top navigation bar of the Dataphin homepage, choose Management Center > Datasource Management.
On the Datasource page, click +Create Data Source.
On the Create Data Source page, select FTP in the File section.
If you have used FTP recently, you can also select it in the Recently Used section, or enter the keyword FTP in the search box to filter the list.
On the Create FTP Data Source page, configure the connection parameters.
Configure the basic information of the data source.
Parameter
Description
Datasource Name
Enter the name of the data source. The name must meet the following requirements:
It can contain only Chinese characters, letters, digits, underscores (_), or hyphens (-).
It cannot exceed 64 characters in length.
Datasource Code
After you configure the data source code, you can reference tables in the data source in Flink_SQL tasks by using the format data source code.table name or data source code.schema.table name. To automatically access the data source that corresponds to the current environment, use the variable format ${data source code}.table or ${data source code}.schema.table. For more information, see Dataphin data source table development method.
Important: The data source code cannot be modified after it is configured successfully.
After the data source code is configured successfully, you can preview data on the object details page in the asset directory and asset inventory.
In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, and SelectDB data sources are currently supported.
Data Source Description
A brief description of the data source. It cannot exceed 128 characters.
Data Source Configuration
Select the data source to configure:
If the business data source distinguishes between production and development data sources, select Production + Development Data Source.
If the business data source does not distinguish between production and development data sources, select Production Data Source.
Tag
You can categorize and tag data sources based on tags. For information about how to create tags, see Manage data source tags.
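The naming rules and reference formats described above can be checked locally before you fill in the form. The following is a minimal sketch, not Dataphin's own validation logic. The function names are hypothetical, and two assumptions are made: "Chinese characters" is approximated by the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), and "letters" is read as ASCII letters.

```python
import re
from typing import Optional

# Assumption: letters are ASCII, and Chinese characters are approximated
# by the basic CJK Unified Ideographs block (U+4E00..U+9FFF).
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_\-\u4e00-\u9fff]{1,64}")


def is_valid_datasource_name(name: str) -> bool:
    """Check the documented rules: allowed characters, at most 64 characters."""
    return _NAME_PATTERN.fullmatch(name) is not None


def is_valid_description(text: str) -> bool:
    """The description cannot exceed 128 characters."""
    return len(text) <= 128


def table_reference(code: str, table: str,
                    schema: Optional[str] = None,
                    env_aware: bool = False) -> str:
    """Build a reference in one of the documented formats.

    env_aware=True produces the ${data source code} variable form, which
    resolves to the data source of the current environment.
    """
    prefix = "${" + code + "}" if env_aware else code
    parts = [prefix] + ([schema] if schema else []) + [table]
    return ".".join(parts)


if __name__ == "__main__":
    print(is_valid_datasource_name("ftp_prod-01"))
    print(table_reference("my_ds", "orders", schema="public"))
    print(table_reference("my_ds", "orders", env_aware=True))
```

Here `my_ds`, `orders`, and `public` are placeholder values. As noted above, table references in Flink SQL apply only to the data source types listed as supported.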
Configure the connection parameters between the data source and Dataphin.
If you select Production + Development Data Source for Data Source Configuration, you must configure the connection information for both the production and development data sources. If you select Production Data Source, you only need to configure the connection information for the production data source.
Note: Production and development data sources should typically be configured as different data sources. This isolates the two environments and reduces the impact of development work on production. However, Dataphin also supports configuring them as the same data source with identical parameter values.
Parameter
Description
Protocol
Select the file transfer protocol based on the protocol used by your FTP server. Currently supported transfer protocols include the following:
FTP: File Transfer Protocol, which controls bidirectional file transfer. The term also refers to applications that implement the protocol.
SFTP: Secure File Transfer Protocol based on SSH, providing a secure encryption method for file transfer.
FTPS: File Transfer Protocol based on SSL/TLS, equivalent to encrypted FTP.
Host
The address of the FTP server.
Port
The port of the FTP server.
Username
The username used to access the FTP server.
Authentication Type
When the Protocol type is SFTP, both Enter Password and Upload Key File authentication methods are supported.
When the Protocol type is FTP or FTPS, only the Enter Password authentication method is supported.
Note: Key file authentication requires you to upload the SFTP private key file for access authentication. Only PEM files are supported.
SSLImplicit
Implicit mode. When Protocol is set to FTPS, you need to configure the SSLImplicit parameter. If implicit SSL/TLS mode is enabled on the FTP server, select TRUE. Otherwise, select FALSE.
connectPattern
Connection mode. When Protocol is set to FTP or FTPS, you need to configure the connectPattern parameter. The parameter includes the following two options:
PORT (active mode): The client opens a port and waits for the server to establish a data connection.
PASV (passive mode): The server opens a port and waits for the client to establish a data connection.
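The dependencies between Protocol and the other connection parameters can be summarized as a small validation routine. This is an illustrative sketch only: the class, field names, and the `password` and `key_file` labels are hypothetical stand-ins for the form fields above, not a Dataphin API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FtpConnectionConfig:
    """Hypothetical mirror of the connection form fields."""
    protocol: str                            # "FTP", "SFTP", or "FTPS"
    host: str
    port: int
    username: str
    auth_type: str                           # "password" or "key_file"
    ssl_implicit: Optional[bool] = None      # only meaningful for FTPS
    connect_pattern: Optional[str] = None    # "PORT" or "PASV", for FTP and FTPS


def validate(cfg: FtpConnectionConfig) -> List[str]:
    """Return a list of problems; an empty list means the combination is consistent."""
    errors: List[str] = []
    if cfg.protocol not in ("FTP", "SFTP", "FTPS"):
        return ["protocol must be FTP, SFTP, or FTPS"]
    if cfg.auth_type not in ("password", "key_file"):
        errors.append("auth_type must be password or key_file")
    elif cfg.auth_type == "key_file" and cfg.protocol != "SFTP":
        # Key file upload is offered only when the protocol is SFTP.
        errors.append("key file authentication is only available for SFTP")
    if cfg.protocol == "FTPS" and cfg.ssl_implicit is None:
        errors.append("SSLImplicit must be set for FTPS")
    if cfg.protocol in ("FTP", "FTPS") and cfg.connect_pattern not in ("PORT", "PASV"):
        errors.append("connectPattern must be PORT or PASV for FTP and FTPS")
    return errors


if __name__ == "__main__":
    cfg = FtpConnectionConfig("FTPS", "ftp.example.com", 990, "etl_user", "password")
    for problem in validate(cfg):
        print(problem)
```

The host `ftp.example.com`, port `990`, and user `etl_user` are placeholder values. Use the address and port that your own server exposes.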
Select Default Resource Group, which is used to run tasks related to the current data source, including database SQL, offline database migration, data preview, and more.
Click Test Connection, or click OK directly, to save the settings and complete the creation of the FTP data source.
Click Test Connection, and the system will test whether the data source can connect normally with Dataphin. If you directly click OK, the system will automatically test the connection for all selected clusters, but even if all selected clusters fail to connect, the data source can still be created normally.
Test Connection tests the connection for the Default Cluster or Registered Scheduling Clusters that have been registered in Dataphin and are in normal use. The Default Cluster is selected by default and cannot be deselected. If there are no resource groups under a Registered Scheduling Cluster, connection testing is not supported. You need to create a resource group first before testing the connection.
The selected clusters are only used to test network connectivity with the current data source and are not used for running related tasks later.
The connection test usually takes less than 2 minutes. If it times out, you can click the icon to view the specific reason and retry.
Regardless of whether the test result is Connection Failed, Connection Successful, or Succeeded With Warning, the system records the time at which the final result was generated.
Note: Only the test results for the Default Cluster include three connection statuses: Succeeded With Warning, Connection Successful, and Connection Failed. The test results for Registered Scheduling Clusters in Dataphin include only two connection statuses: Connection Successful and Connection Failed.
When the test result is Connection Failed, you can click the icon to view the specific failure reason.
When the test result is Succeeded With Warning, the application cluster connected successfully but the scheduling cluster connection failed, and the current data source cannot be used for data development and integration. You can click the icon to view the log information.
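When a connection test fails, it can help to confirm independently that the host, port, credentials, and connection mode are correct. The following is a rough sketch using Python's standard `ftplib` module for plain FTP. The function name is hypothetical, and the check runs from your own machine rather than from a Dataphin cluster, so a success here does not guarantee that Dataphin can reach the server.

```python
import ftplib
from typing import List, Tuple


def check_ftp_login(host: str, port: int, username: str, password: str,
                    passive: bool = True, timeout: float = 10.0) -> Tuple[bool, str]:
    """Try to log in and list the current directory over plain FTP.

    passive=True corresponds to PASV mode, passive=False to PORT (active) mode.
    Returns (ok, detail), where detail describes the result or the error.
    """
    ftp = ftplib.FTP()
    try:
        ftp.connect(host, port, timeout=timeout)
        ftp.login(username, password)
        ftp.set_pasv(passive)
        lines: List[str] = []
        # LIST opens a data connection, so it exercises the chosen mode.
        ftp.retrlines("LIST", lines.append)
        return True, f"login ok, {len(lines)} entries listed"
    except ftplib.all_errors as exc:
        return False, f"{type(exc).__name__}: {exc}"
    finally:
        ftp.close()


if __name__ == "__main__":
    # Placeholder values; replace them with your own server details.
    ok, detail = check_ftp_login("ftp.example.com", 21, "etl_user", "secret")
    print(ok, detail)
```

This sketch covers plain FTP only. For FTPS in explicit mode, the standard library provides `ftplib.FTP_TLS`. SFTP is a different protocol and requires a third-party SSH library.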