Dataphin: Configure FTP Input Widget

Last Updated: Nov 20, 2025

The FTP input widget transfers data from an FTP server to the storage system of the big data platform, enabling data integration and further processing. This topic describes how to configure the FTP input widget.

Prerequisites

  • You have successfully created an FTP data source. For more information, see Create FTP Data Source.

  • To configure the FTP input widget properties, your account must have read-through permission on the data source. If you lack the necessary permissions, request them first. For more information, see Request, Renew, and Return Data Source Permissions.

Procedure

  1. On the Dataphin home page, navigate to the top menu bar and select Development > Data Integration.

  2. At the top of the integration page, select a project. In Dev-Prod mode, you must also select an environment.

  3. In the navigation pane on the left, click Batch Pipeline, and in the Batch Pipeline list, click the Offline Pipeline you want to develop to access its configuration page.

  4. Click Component Library in the upper-right corner to open the Component Library panel.

  5. In the Component Library panel's left-side navigation pane, select Input. Locate the FTP component in the input widget list on the right and drag it onto the canvas.

  6. Click the configuration icon on the FTP input widget card to open the FTP Input Configuration dialog box.

  7. In the FTP Input Configuration dialog box, set the necessary parameters.

    The FTP input component supports the following file types: Text, CSV, xls, xlsx, and JsonL. Each file type requires a different configuration, as detailed below.

    Parameters for Text and CSV files

    Basic Configuration

    Step Name

    Enter a name for the component based on the scenario. The name must follow these rules:

    • Can contain only Chinese characters, letters, underscores (_), and digits.

    • Can be up to 64 characters long.

    Datasource

    Select a data source. The data source must be configured in Dataphin and meet the following conditions:

    • The data source type is FTP Data Source, SFTP Data Source, or FTPS Data Source.

    • The account used to perform the Attribute Configuration has read-through permissions on the data source. If the account does not have the required permissions, request them. For more information, see Request, renew, and return data source permissions.

    You can also click New next to Datasource to go to the Management Center to add a data source. For more information, see Create an FTP data source.

    Compression Format (Optional)

    If the source file is compressed, select the compression format. Dataphin can then decompress the file. Supported formats include zip, gzip, tar.gz, bzip2, lzo, lzo-deflate, hadoop-snappy, and framing-snappy.

    Note

    If you select zip or tar.gz as the compression format, you can also configure the file name.

    File Name (Optional)

    The matching rule for files within the compressed package. You can specify multiple rules separated by semicolons (;). Wildcard characters are supported. For example, specify * to read all files in the package. If you leave this empty, the system uses * by default.

    File Path

    Enter the file path. You can enter multiple paths separated by semicolons (;). Wildcard characters are supported. For example, /dataphin/* reads all files in the dataphin directory.
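
    For example, the following value (the directory names are hypothetical) reads all CSV files in two directories in a single run:

    // Example: multiple paths separated by semicolons (;), with wildcards.
    /dataphin/orders/*.csv;/dataphin/users/*.csv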

    File Type

    Select Text or CSV.

    Start Row Of Data Content

    Set the row from which the component starts reading data. The default value is 1, which means reading starts from the first row. To skip the first N rows, set this parameter to N+1.
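
    For example, if the first row of the file is a column header (the sample rows are hypothetical), skip it by setting this parameter to 2:

    // Example: the file's first row is a header (N=1), so set the start row to N+1=2.
    user_id,user_name    <- row 1 (header, skipped)
    0001,Alice           <- row 2 (first row read)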

    Advanced Configuration

    Splitting Method

    Text files support Delimiter-based Splitting and Fixed-length Splitting. CSV files support only Delimiter-based Splitting.

    • Delimiter-based Splitting: Splits rows and fields based on the configured field and row separators.

    • Fixed-length Splitting: Treats each line of the file as a long string and extracts fields based on the start and end character positions.
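
    As a sketch (the field layout and sample line are hypothetical), the two methods read the same line as follows:

    // Sample fixed-width line:
    0001Alice      Beijing
    // Fixed-length Splitting extracts fields by character position:
    //   characters 1-4   -> user_id   ("0001")
    //   characters 5-15  -> user_name ("Alice      ")
    //   characters 16-22 -> city      ("Beijing")
    // Delimiter-based Splitting instead splits on the configured separators,
    // for example a comma in the line "0001,Alice,Beijing".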

    Field Separator

    If you select Delimiter-based Splitting, specify the field separator. If you leave this empty, a comma (,) is used by default.

    Row Separator

    If the splitting method is Fixed-length Splitting, you cannot configure the row separator. If you do not specify a value, the system uses the line feed character (\n) as the default row separator. For Text files, you cannot configure both the row separator and the textReaderConfig parameter at the same time.

    File Encoding

    Select the file encoding format. Supported formats are UTF-8 and GBK.

    NULL Value Conversion

    Specify a string that represents a NULL value. The component replaces all occurrences of this string in the source data with NULL. If you do not configure this parameter, no special processing is performed.
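
    For example, if the source uses the string \N to represent missing values (the sample rows are hypothetical):

    // NULL Value Conversion is set to: \N
    0001,Alice,\N      // the third field is written as NULL
    0002,\N,Beijing    // the second field is written as NULL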

    Check For Completion Marker File

    Before data is read, checks whether a marker file exists that indicates the data is ready. This feature is Disabled by default.

    1. To enable this feature, click the toggle and then click Check Configuration.

    2. In the Check For Completion Marker File Configuration dialog box, configure the parameters.

      • Completion Marker File Path: Enter the path of the marker file to check. System parameters, global parameters, and cross-node parameters are supported. Example: /${check}/dataphin.

      • Health Check Interval (s): Specify the interval between file checks. The default value is 60 seconds.

      • Check Duration (min): Specify the maximum duration of the file check. The default value is 60 minutes.

        Important
        • The check duration plus the data transmission duration determines the total runtime of the integration node, and resources are occupied during the check. Configure the check duration and the runtime timeout settings based on your needs.

        • If the check time exceeds the node timeout period, the node is forcibly terminated.

      • Failure Handling Policy: If the file check fails, data is not extracted or written. You can set the policy to Fail the node or Succeed the node.

        • Fail The Node: If the check fails, the system sets the check node to Failed and does not run the integration node.

        • Succeed The Node: If the check fails, the system sets the check node to Succeeded and continues to run subsequent integration nodes.

    3. Click OK to save the configuration.
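
    For example, the following settings (the marker file path and values are hypothetical) check for a _SUCCESS marker every 30 seconds for up to 2 hours and fail the node if the marker never appears:

    // Example check configuration:
    Completion Marker File Path: /dataphin/ods/${bizdate}/_SUCCESS
    Health Check Interval (s):   30
    Check Duration (min):        120
    Failure Handling Policy:     Fail the node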

    If File Does Not Exist

    Specify the policy for when a source file does not exist. Supported policies are Ignore and Fail the node. This parameter is not available if you enable Check for Completion Marker File.

    • Ignore: If a file does not exist, the component ignores it and continues to read other files.

    • Fail The Node: If a file does not exist, the node is terminated and set to Failed.

    More Configuration

    Enter other configuration items to control data reading. For example, textReaderConfig controls how Text files are read, as shown in the following code.

    {
      "textReaderConfig": {
        "useTextQualifier": false, // Specifies whether a text qualifier exists.
        "textQualifier": "\"",     // The qualifier character.
        "caseSensitive": true,     // Specifies whether the qualifier is case-sensitive.
        "trimWhitespace": false    // Specifies whether to trim whitespace from the beginning and end of each column.
      }
    }
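
    As a sketch of the effect (the sample line is hypothetical), the qualifier controls whether a comma inside quotes is treated as a field separator:

    // Input line:  "Smith, John",Beijing
    // useTextQualifier = true  -> 2 fields: Smith, John | Beijing
    // useTextQualifier = false -> 3 fields: "Smith | John" | Beijing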

    Output Fields

    The output fields are displayed. You can add output fields in the following ways:

    • Add multiple output fields at a time.

      • Format: Click Add In Batches. You can configure fields in JSON or TEXT format.

        • JSON format:

          // Example:
           [{
             "startPos": 0,
             "endPos": 10,
             "name": "user_id",
             "type": "String"
            },
            {
             "startPos": 11,
             "endPos": 15,
             "name": "user_name",
             "type": "String"
            }]
        • TEXT format:

          // Example:
          0,10,user_id,String
          11,15,user_name,String
      • Splitting Method: If the file type is Text and the splitting method is Fixed-length splitting, you can configure how to add fields in batches. Valid values are By field start position and By field length.

        • By field start position: The first number specifies the start character position of the field, the second number specifies the end position, and the following two values specify the field name and field type. For example, in TEXT format, 0,10,user_id,String imports the characters from the 1st through the 11th position of each line as a field named user_id of type String.

        • By field length: The first value specifies the field length, and the following two values specify the field name and field type. For example, in TEXT format, 11,user_id,String defines a field with a length of 11, named user_id, of type String. Each field starts from the character immediately after the previous one.

      • Row delimiter and Column delimiter: When you add fields in batches in TEXT format, you can configure row and column delimiters. The row delimiter separates the definitions of individual fields; the default is the line feed character (\n), and the supported delimiters are the line feed (\n), the semicolon (;), and the period (.). The column delimiter separates the field name from the field type; the default is a comma (,).
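
      For example, with the semicolon (;) as the row delimiter and the default column delimiter, the two fields from the earlier TEXT example can be entered on a single line:

      // Example: row delimiter ";" and column delimiter ",".
      0,10,user_id,String;11,15,user_name,String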

    • Preview the splitting result.

      If the file type is Text and the splitting method is Fixed-length splitting, you can preview the splitting result.

      1. Click Preview Splitting Result.

      2. In the dialog box that appears, enter a test string and click Test to view the result.

    • Add a single output field.

      Click Add Output Field, enter the Source Ordinal Number and Field, and select a Type. For Text and CSV files, you must enter the numeric index of the column as the source ordinal number. The index starts from 0.
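
      For example, for the hypothetical CSV line below, the source ordinal number selects a column by its zero-based position:

      // CSV line: 0001,Alice,Beijing
      // Source ordinal 0 -> 0001, 1 -> Alice, 2 -> Beijing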

    • Manage existing output fields.

      You can perform the following operations on added fields:

      • In the Actions column, click the edit icon to edit a field.

      • In the Actions column, click the delete icon to delete a field.

    Parameters for xls and xlsx files

    Basic Configuration

    Step Name

    Enter a name for the component based on the scenario. The name must follow these rules:

    • Can contain only Chinese characters, letters, underscores (_), and digits.

    • Can be up to 64 characters long.

    Datasource

    Select a data source. The data source must be configured in Dataphin and meet the following conditions:

    • The data source type is FTP Data Source, SFTP Data Source, or FTPS Data Source.

    • The account used to perform the Attribute Configuration has read-through permissions on the data source. If the account does not have the required permissions, request them. For more information, see Request, renew, and return data source permissions.

    You can also click New next to Datasource to go to the Management Center to add a data source. For more information, see Create an FTP data source.

    Compression Format

    If the source file is compressed, select the compression format. Dataphin can then decompress the file. Supported formats include zip, gzip, tar.gz, bzip2, lzo, lzo-deflate, hadoop-snappy, framing-snappy, and zlib.

    Note

    If you select zip or tar.gz as the compression format, you can also configure the file name.

    File Name

    The matching rule for files within the compressed package. You can specify multiple rules separated by semicolons (;). Wildcard characters are supported. For example, specify * to read all files in the package. If you leave this empty, the system uses * by default.

    File Path

    Enter the file path. You can enter multiple paths separated by semicolons (;). Wildcard characters are supported. For example, /dataphin/* reads all files in the dataphin directory.

    File Type

    Select xls or xlsx.

    Start Row Of Data Content

    Set the row from which the component starts reading data. The default value is 1, which means reading starts from the first row. To skip the first N rows, set this parameter to N+1.

    Sheet Selection

    Select sheets By Name or By Index. If you read data from multiple sheets, their data formats must be the same.

    Sheet Name/Sheet Index

    • Sheet Name: You can read data from multiple sheets. Separate the sheet names with commas (,). You can also enter * to read all sheets. You cannot use * and commas together. Example: sheet1,sheet2.

    • Sheet Index: You can read data from multiple sheets. Separate the sheet indexes with commas (,). You can also enter * to read all sheets. You cannot use * and commas together. For example, you can use 0,3,7-9 to specify individual or consecutive sheets.

    Advanced Configuration

    End Row Of Data Content

    Set the last row of data to read. If you do not specify this parameter, the component reads data to the last row. The value of End Row Of Data Content must be greater than or equal to the value of Start Row Of Data Content.

    Export Sheet Name

    Choose whether to export the sheet name. If you choose Export, a new field is added. This field contains the name of the source sheet for each row of data.
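
    For example, if you choose Export (the sample rows are hypothetical), each output row carries the name of the sheet it was read from:

    // Rows read from sheet1 and sheet2, with the sheet name appended as a new field:
    0001,Alice,sheet1
    0002,Bob,sheet2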

    File Encoding

    Select the file encoding format. Supported formats are UTF-8 and GBK.

    NULL Value Conversion

    Specify a string that represents a NULL value. The component replaces all occurrences of this string in the source data with NULL. If you do not configure this parameter, no special processing is performed.

    Check For Completion Marker File

    Before data is read, checks whether a marker file exists that indicates the data is ready. This feature is Disabled by default.

    1. To enable this feature, click the toggle and then click Check Configuration.

    2. In the Check For Completion Marker File Configuration dialog box, configure the parameters.

      • Completion Marker File Path: Enter the path of the marker file to check. System parameters, global parameters, and cross-node parameters are supported. Example: /${check}/dataphin.

      • Health Check Interval (s): Specify the interval between file checks. The default value is 60 seconds.

      • Check Duration (min): Specify the maximum duration of the file check. The default value is 60 minutes.

        Important
        • The check duration plus the data transmission duration determines the total runtime of the integration node, and resources are occupied during the check. Configure the check duration and the runtime timeout settings based on your needs.

        • If the check time exceeds the node timeout period, the node is forcibly terminated.

      • Failure Handling Policy: If the file check fails, data is not extracted or written. You can set the policy to Fail the node or Succeed the node.

        • Fail The Node: If the check fails, the system sets the check node to Failed and does not run the integration node.

        • Succeed The Node: If the check fails, the system sets the check node to Succeeded and continues to run subsequent integration nodes.

    3. Click OK to save the configuration.

    If File Does Not Exist

    Specify the policy for when a source file does not exist. Supported policies are Ignore and Fail the node. This parameter is not available if you enable Check for Completion Marker File.

    • Ignore: If a file does not exist, the component ignores it and continues to read other files.

    • Fail The Node: If a file does not exist, the node is terminated and set to Failed.

    Output Fields

    The output fields are displayed. You can add output fields in the following ways:

    • Add multiple output fields at a time.

      • Click Add In Batches. You can configure fields in JSON or TEXT format.

        • JSON format:

          // Example:
           [{
             "startPos": 0,
             "endPos": 10,
             "name": "user_id",
             "type": "String"
            },
            {
             "startPos": 11,
             "endPos": 15,
             "name": "user_name",
             "type": "String"
            }]
        • TEXT format:

          Row delimiter and Column delimiter: When you add fields in batches in TEXT format, you can configure row and column delimiters. The row delimiter separates the definitions of individual fields; the default is the line feed character (\n), and the supported delimiters are the line feed (\n), the semicolon (;), and the period (.). The column delimiter separates the field name from the field type; the default is a comma (,).

          // Example:
          0,10,user_id,String
          11,15,user_name,String
    • Add a single output field.

      Click Add Output Field, enter the Source Ordinal Number and Field, and select a Type. For xls and xlsx files, enter the uppercase letter of the column as the source ordinal number. You can also enter the numeric index of the column, which starts from 0. If you enter a lowercase letter, the system automatically converts it to uppercase. If you choose to export the sheet name, the source ordinal number of the sheet name field is (-) and cannot be modified.
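
      For example (hypothetical worksheet):

      // Column A = index 0, column B = index 1, column C = index 2.
      // Entering "c" is automatically converted to "C", the same column as index 2.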

    • Manage existing output fields.

      You can perform the following operations on added fields:

      • In the Actions column, click the edit icon to edit a field.

      • In the Actions column, click the delete icon to delete a field.

    Parameters for JsonL files

    Basic Configuration

    Step Name

    Enter a name for the component based on the scenario. The name must follow these rules:

    • Can contain only Chinese characters, letters, underscores (_), and digits.

    • Can be up to 64 characters long.

    Datasource

    Select a data source. The data source must be configured in Dataphin and meet the following conditions:

    • The data source type is FTP Data Source, SFTP Data Source, or FTPS Data Source.

    • The account used to perform the Attribute Configuration has read-through permissions on the data source. If the account does not have the required permissions, request them. For more information, see Request, renew, and return data source permissions.

    You can also click New next to Datasource to go to the Management Center to add a data source. For more information, see Create an FTP data source.

    Compression Format

    If the source file is compressed, select the compression format. Dataphin can then decompress the file. Supported formats include zip, gzip, tar.gz, bzip2, lzo, lzo-deflate, hadoop-snappy, and framing-snappy.

    Note

    If you select zip or tar.gz as the compression format, you can also configure the file name.

    File Name

    The matching rule for files within the compressed package. You can specify multiple rules separated by semicolons (;). Wildcard characters are supported. For example, specify * to read all files in the package. If you leave this empty, the system uses * by default.

    File Path

    Enter the file path. You can enter multiple paths separated by semicolons (;). Wildcard characters are supported. For example, /dataphin/* reads all files in the dataphin directory.

    File Type

    Select the JsonL file type. This specifies the parsing method and does not restrict the file name extension.
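
    A JsonL (JSON Lines) file contains one complete JSON object per line. A minimal sample (the field names are hypothetical):

    // Example JsonL content:
    {"user_id": "0001", "user_name": "Alice"}
    {"user_id": "0002", "user_name": "Bob"}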

    Start Row Of Data Content

    Set the row from which the component starts reading data. The default value is 1, which means reading starts from the first row. To skip the first N rows, set this parameter to N+1.

    Note

    This parameter does not take effect if you select a compression format.

    Advanced Configuration

    Check For Completion Marker File

    Before data is read, checks whether a marker file exists that indicates the data is ready. This feature is Disabled by default.

    1. To enable this feature, click the toggle and then click Check Configuration.

    2. In the Check For Completion Marker File Configuration dialog box, configure the parameters.

      • Completion Marker File Path: Enter the path of the marker file to check. System parameters, global parameters, and cross-node parameters are supported. Example: /${check}/dataphin.

      • Health Check Interval (s): Specify the interval between file checks. The default value is 60 seconds.

      • Check Duration (min): Specify the maximum duration of the file check. The default value is 60 minutes.

        Important
        • The check duration plus the data transmission duration determines the total runtime of the integration node, and resources are occupied during the check. Configure the check duration and the runtime timeout settings based on your needs.

        • If the check time exceeds the node timeout period, the node is forcibly terminated.

      • Failure Handling Policy: If the file check fails, data is not extracted or written. You can set the policy to Fail the node or Succeed the node.

        • Fail The Node: If the check fails, the system sets the check node to Failed and does not run the integration node.

        • Succeed The Node: If the check fails, the system sets the check node to Succeeded and continues to run subsequent integration nodes.

    3. Click OK to save the configuration.

    If File Does Not Exist

    Specify the policy for when a source file does not exist. Supported policies are Ignore and Fail the node. This parameter is not available if you enable Check for Completion Marker File.

    • Ignore: If a file does not exist, the component ignores it and continues to read other files.

    • Fail The Node: If a file does not exist, the node is terminated and set to Failed.

    Output Fields

    The output fields are displayed. You can add output fields in the following ways:

    • Add multiple output fields at a time.

      • Click Add In Batches. You can configure fields in JSON or TEXT format.

        • JSON format:

          // Example:
           [{
             "startPos": 0,
             "endPos": 10,
             "name": "user_id",
             "type": "String"
            },
            {
             "startPos": 11,
             "endPos": 15,
             "name": "user_name",
             "type": "String"
            }]
        • TEXT format:

          Row delimiter and Column delimiter: When you add fields in batches in TEXT format, you can configure row and column delimiters. The row delimiter separates the definitions of individual fields; the default is the line feed character (\n), and the supported delimiters are the line feed (\n), the semicolon (;), and the period (.). The column delimiter separates the field name from the field type; the default is a comma (,).

          // Example:
          0,10,user_id,String
          11,15,user_name,String
    • Add a single output field.

      Click Add Output Field, enter the Source Ordinal Number and Field, and select a Type.

    • Manage existing output fields.

      You can perform the following operations on added fields:

      • In the Actions column, click the edit icon to edit a field.

      • In the Actions column, click the delete icon to delete a field.

  8. Click Confirm to finalize the configuration of the FTP input widget.