
Dataphin: Create FTP data source

Last Updated: Mar 24, 2025

Creating an FTP data source allows Dataphin to read data from and write data to FTP servers. This topic describes how to create an FTP data source.

Background information

File Transfer Protocol (FTP) is a standard protocol in the TCP/IP suite for transferring files between a client and a server, for example to upload website files to a web server. Before Dataphin can read data from or write data to an FTP server in data integration or data development tasks, you must create an FTP data source.

Permissions description

Only the following roles can create data sources: custom global roles that have the Create Data Source permission, and the Super Administrator, Datasource Administrator, Section Architect, and Project Administrator system roles.

Procedure

  1. On the Dataphin home page, from the top menu bar, select Management Center > Datasource Management.

  2. On the Datasource page, click + Create Data Source.

  3. In the Create Data Source dialog box, select File, and then select FTP.

    If you've recently used FTP, you can select it from the Recently Used area. You can also quickly filter results by entering FTP-related keywords in the search box.

  4. In the Create FTP Data Source dialog box, configure the connection parameters for the data source.

    1. Configure the basic information for the data source.

      Parameter

      Description

      Datasource Name

      Enter a name for the data source. The name must meet the following requirements:

      • Can contain only Chinese characters, letters, digits, underscores (_), and hyphens (-).

      • Cannot exceed 64 characters in length.

      Datasource Code

      After you configure the data source code, you can reference tables in the data source in Flink_SQL tasks by using the format data source code.table name or data source code.schema.table name. To have Dataphin automatically access the data source that corresponds to the current environment, use the variable format ${data source code}.table or ${data source code}.schema.table. For more information, see Dataphin Datasource Table Development Method.

      Important
      • After the data source code is configured, it cannot be modified.

      • After the data source code is configured, you can preview data on the object details pages of the asset directory and asset checklist.

      • Currently, Flink SQL supports this reference format only for MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, and SelectDB data sources.

      Datasource Description

      Enter a brief description of the data source. The description cannot exceed 128 characters.

      Datasource Configuration

      Select the environments to configure for the data source:

      • If the business data source has separate production and development environments, select Production + Development Data Source.

      • If the business data source does not distinguish between production and development environments, select Production Data Source.

      Tag

      Select tags to classify the data source. For information about how to create tags, see Manage Data Source Tags.
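The naming and length rules for the basic information can be checked before you submit the form. The following Python sketch is a client-side pre-check, not Dataphin's own validation. The function name `validate_basic_info` is hypothetical, and the sketch assumes that "Chinese characters" means the CJK Unified Ideographs block (U+4E00 to U+9FFF); Dataphin may accept a wider range.

```python
import re

# Assumption: "Chinese characters" is approximated by the CJK Unified
# Ideographs block (U+4E00 to U+9FFF).
NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]+")
MAX_NAME_LENGTH = 64
MAX_DESCRIPTION_LENGTH = 128


def validate_basic_info(name: str, description: str = "") -> list[str]:
    """Return the rules that the input violates; an empty list means it passes."""
    errors = []
    # fullmatch rejects empty names and any character outside the allowed set.
    if not NAME_PATTERN.fullmatch(name):
        errors.append("name may contain only Chinese characters, letters, digits, _ or -")
    if len(name) > MAX_NAME_LENGTH:
        errors.append(f"name exceeds {MAX_NAME_LENGTH} characters")
    if len(description) > MAX_DESCRIPTION_LENGTH:
        errors.append(f"description exceeds {MAX_DESCRIPTION_LENGTH} characters")
    return errors


if __name__ == "__main__":
    print(validate_basic_info("ftp_sales-01"))  # []
    print(validate_basic_info("ftp sales"))     # the space is not allowed
```

Returning a list of violations rather than raising on the first one lets a form show every problem at once.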

    2. Configure the connection parameters between the data source and Dataphin.

      If you set Datasource Configuration to Production + Development Data Source, configure the connection information for both the production and development data sources. If you set it to Production Data Source, configure only the production data source.

      Note

      Typically, production and development data sources should be configured separately to maintain environment isolation and minimize the impact of development activities on production. However, Dataphin also allows for the same data source to be used for both purposes.

      Parameter

      Description

      Protocol

      Select the file transfer protocol used by the FTP server. The following protocols are supported:

      • FTP: File Transfer Protocol, used for bidirectional file transfer between a client and a server.

      • SFTP: SSH File Transfer Protocol, which transfers files over an encrypted Secure Shell (SSH) connection.

      • FTPS: FTP over SSL/TLS, which adds SSL/TLS encryption to FTP.

      Host

      FTP server address.

      Port

      FTP server port.

      Username

      Username for accessing the FTP server.

      Authentication Type

      • If Protocol is SFTP, you can select Enter Password or Upload Key File.

      • If Protocol is FTP or FTPS, only Enter Password is supported.

      Note

      If you select Upload Key File, upload the SFTP private key file used for authentication. Only PEM files are supported.

      SSLImplicit

      Specifies whether to use implicit SSL/TLS. This parameter is required when Protocol is set to FTPS. If the FTP server uses implicit FTPS, select TRUE. Otherwise, select FALSE.

      connectPattern

      Connection mode. This parameter is required when Protocol is set to FTP or FTPS. Valid values:

      • PORT (Active Mode): The client opens a port and waits for the server to establish a data connection.

      • PASV (Passive Mode): The server opens a port and waits for the client to establish a data connection.
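The protocol-dependent rules in the connection parameter table (which authentication types each protocol allows, when SSLImplicit and connectPattern are required, and the PEM-only key file) can be summarized as a small validation model. This is a hypothetical Python sketch: the class `FtpConnectionSettings`, the function `validate_connection`, and the field names are illustrative assumptions, not a Dataphin API. The PEM check only looks at the file extension, and `DEFAULT_PORTS` lists conventional defaults (21 for FTP and explicit FTPS, 22 for SFTP, 990 for implicit FTPS) that a given server may not use.

```python
from dataclasses import dataclass
from typing import Optional

# Conventional default ports (assumption: the server may be configured differently).
DEFAULT_PORTS = {"FTP": 21, "SFTP": 22, "FTPS_EXPLICIT": 21, "FTPS_IMPLICIT": 990}


@dataclass
class FtpConnectionSettings:
    """Hypothetical model of the connection parameters described above."""
    protocol: str                          # "FTP", "SFTP", or "FTPS"
    host: str
    port: int
    username: str
    auth_type: str = "Enter Password"      # "Upload Key File" is allowed for SFTP only
    key_file_name: Optional[str] = None    # name of the uploaded private key file
    ssl_implicit: Optional[bool] = None    # SSLImplicit: required for FTPS
    connect_pattern: Optional[str] = None  # connectPattern: "PORT" or "PASV", FTP/FTPS only


def validate_connection(s: FtpConnectionSettings) -> list[str]:
    """Return the documented rules that the settings violate (empty list = valid)."""
    errors = []
    if s.protocol not in ("FTP", "SFTP", "FTPS"):
        errors.append(f"unsupported protocol: {s.protocol}")
    if not s.host:
        errors.append("host is required")
    if not 1 <= s.port <= 65535:
        errors.append("port must be between 1 and 65535")
    allowed_auth = ("Enter Password", "Upload Key File") if s.protocol == "SFTP" else ("Enter Password",)
    if s.auth_type not in allowed_auth:
        errors.append(f"{s.protocol} does not support authentication type {s.auth_type!r}")
    # Assumption: the PEM requirement is approximated by checking the file extension.
    if s.auth_type == "Upload Key File" and not (s.key_file_name or "").lower().endswith(".pem"):
        errors.append("key file must be a PEM file")
    if s.protocol == "FTPS" and s.ssl_implicit is None:
        errors.append("SSLImplicit (TRUE or FALSE) is required for FTPS")
    if s.protocol in ("FTP", "FTPS") and s.connect_pattern not in ("PORT", "PASV"):
        errors.append("connectPattern must be PORT or PASV for FTP and FTPS")
    return errors


if __name__ == "__main__":
    sftp = FtpConnectionSettings("SFTP", "sftp.example.com", DEFAULT_PORTS["SFTP"], "etl_user",
                                 auth_type="Upload Key File", key_file_name="etl_user.pem")
    print(validate_connection(sftp))  # []
```

Keeping SSLImplicit and connectPattern optional mirrors the table: they only become required for the protocols that use them, and SFTP ignores both.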

  5. Click Test Connection to verify if the data source can communicate properly with Dataphin.

    After you configure the data source information, click Test Connection in the operation column to test connectivity from the default cluster or from any registered scheduling cluster that is running normally in Dataphin. The default cluster is always selected and cannot be cleared. A registered scheduling cluster that has no resource group cannot be tested; create a resource group for it first.

    • The selected cluster is only for testing network connectivity and is not used for subsequent task operations.

    • Connection tests typically complete within 2 minutes. If a test times out, click the icon next to the result to view the cause, and then retry.

    • The system records the time at which the final result is generated, whether the result is Connection Failed, Connection Successful, or Succeeded With Warning.

    • If the result is Connection Failed, click the icon next to the result to view the cause of the failure.

    • If the result is Succeeded With Warning, the application cluster connected successfully but the connection to the scheduling cluster failed. As a result, the data source cannot be used for data development and integration tasks. Click the icon next to the result to view the log details.

      Note

      Test results for the default cluster have three possible statuses: Succeeded With Warning, Connection Successful, or Connection Failed. Test results for a registered scheduling cluster have only two: Connection Successful or Connection Failed.

    • At least one cluster must be able to connect to the data source; otherwise, you cannot save the data source.

  6. After a successful test, click OK to finalize the creation of the FTP data source.
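Test Connection runs from Dataphin's clusters, so its result cannot be reproduced exactly from your own machine. If a test fails, a quick probe from a host on the same network can help separate server-side problems from network problems. The following is a troubleshooting sketch under stated assumptions, not how Dataphin performs the test: `probe_ftp` is a hypothetical name, it uses Python's standard `ftplib`, and it covers only FTP and explicit FTPS (SSLImplicit = FALSE). SFTP and implicit FTPS need a different client. The `passive` flag stands in for connectPattern: True for PASV, False for PORT.

```python
import ftplib


def probe_ftp(host: str, port: int, username: str, password: str,
              use_tls: bool = False, passive: bool = True,
              timeout: float = 10.0) -> tuple[bool, str]:
    """Log in and list the working directory; return (success, reason)."""
    # FTP_TLS handles explicit FTPS only; it upgrades the control
    # connection with AUTH TLS during login().
    client = ftplib.FTP_TLS(timeout=timeout) if use_tls else ftplib.FTP(timeout=timeout)
    try:
        client.connect(host, port)
        client.login(username, password)
        if use_tls:
            client.prot_p()       # also encrypt the data connection
        client.set_pasv(passive)  # True = PASV (passive), False = PORT (active)
        # LIST opens a data connection, so it exercises the chosen mode;
        # the callback discards the listing lines.
        client.retrlines("LIST", lambda _line: None)
        return True, "connected"
    except ftplib.all_errors as exc:
        return False, f"{type(exc).__name__}: {exc}"
    finally:
        client.close()
```

A successful login alone only proves the control connection works. Listing the directory also opens a data connection, which is where PORT/PASV mismatches and firewall rules usually show up. For example, if login succeeds but the listing times out with `passive=False`, retrying with `passive=True` suggests choosing PASV for connectPattern.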