Dataphin: Configure the OSS input component

Last Updated: Mar 05, 2026

The OSS input component reads data from an OSS data source. To synchronize data from an OSS data source to another data source, first configure the OSS input component to read from the source, and then configure the destination data source. This topic describes how to configure the OSS input component.

Prerequisites

  • An OSS data source is created. For more information, see Create an OSS data source.

  • The account that configures the properties of the OSS input component has the read-through permission on the data source. If you do not have the permission, you must request the permission on the data source. For more information, see Request permissions on a data source.

Procedure

  1. In the top navigation bar of the Dataphin homepage, choose Develop > Data Integration.

  2. In the top navigation bar of the integration page, select a project. In Dev-Prod mode, you also need to select an environment.

  3. In the left-side navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the batch pipeline that you want to develop to open its configuration page.

  4. Click Component Library in the upper-right corner of the page to open the Component Library panel.

  5. In the left-side navigation pane of the Component Library panel, select Inputs. Find the OSS component in the input component list on the right and drag it to the canvas.

  6. Click the icon on the OSS input component card to open the OSS Input Configuration dialog box.

  7. In the OSS Input Configuration dialog box, configure the following parameters.

    Parameter

    Description

    Step Name

    The name of the OSS input component. Dataphin automatically generates a step name. You can also modify the name based on your business scenario. The name must meet the following requirements:

    • The name can contain only Chinese characters, letters, underscores (_), and digits.

    • The name cannot exceed 64 characters in length.

    Datasource

    Select a data source that is configured in Dataphin and meets the following conditions:

    • The data source type is OSS Data Source.

    • The account that configures the properties has the read-through permission on the data source. If you do not have the permission, you must request the permission on the data source. For more information, see Request permissions on a data source.

    You can also click Create next to Data Source to go to the planning module and add a data source. For more information, see Create an OSS data source.

    Object Prefix

    The name of the OSS object from which you want to read data. You can specify multiple objects. For example, if a bucket contains a data folder that includes the phin.txt file, set Object Prefix to data/phin.txt to synchronize only that file. To synchronize all files in the folder, use a wildcard character, for example data/*.

    File Type

    The format of the files to read. Supported formats are Text, CSV, xls, and xlsx. The additional parameters that you need to configure depend on the format.

    Output Fields

    The fields that the component outputs. You can add output fields manually in either of the following ways:

    • Click Batch Add and enter the field definitions in one of the following formats:

      • Configure in JSON format, for example:

        [{"index": 0, "name": "user_id", "type": "String"},
         {"index": 1, "name": "user_name", "type": "String"}]
        Note

        index is the zero-based column number in the source object, name is the field name after import, and type is the field type after import. For example, {"index":3,"name":"user_id","type":"String"} imports the fourth column of the file as a field named user_id of type String.

      • Configure in TEXT format, for example:

        1,user_name,String
        • The row delimiter separates one field definition from the next. The default value is a line feed (\n). Line feeds (\n), semicolons (;), and periods (.) are supported.

        • The column delimiter separates the index, name, and type within a field definition. The default value is a comma (,).

    • Click Create Output Field, enter Source Index and Column, and select Type as prompted. For Text and CSV files, Source Index is required: enter the zero-based index of the column that contains the field.

    You can also perform the following operations on added fields:

    • Drag the icon next to a field to change its position.

    • In the Actions column, click the edit icon to edit an existing field.

    • In the Actions column, click the delete icon to delete an existing field.
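The Batch Add TEXT format carries the same information as the JSON format: each definition lists the column index, the field name, and the field type, in that order. The following Python sketch converts TEXT definitions into the JSON array format, which can help if you maintain field lists outside Dataphin. The function name text_fields_to_json is hypothetical and is not part of Dataphin; the sketch assumes that each definition has exactly three parts and that the default delimiters apply unless you pass others.

```python
import json


def text_fields_to_json(text, row_delimiter="\n", column_delimiter=","):
    """Convert TEXT-format field definitions to the Batch Add JSON format.

    Hypothetical helper: assumes each definition is "index<delimiter>name<delimiter>type",
    as in the example "1,user_name,String".
    """
    fields = []
    for definition in text.split(row_delimiter):
        definition = definition.strip()
        if not definition:
            continue  # skip empty entries, for example after a trailing delimiter
        parts = [part.strip() for part in definition.split(column_delimiter)]
        if len(parts) != 3:
            raise ValueError(f"expected index, name, and type: {definition!r}")
        index, name, field_type = parts
        if not index.isdigit():
            raise ValueError(f"index must be a non-negative integer: {index!r}")
        fields.append({"index": int(index), "name": name, "type": field_type})
    return json.dumps(fields)


if __name__ == "__main__":
    print(text_fields_to_json("0,user_id,String\n1,user_name,String"))
```

Because index is zero-based in both formats, the converter copies it unchanged; only the layout differs.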

  8. Configure the parameters for the file type that you selected in File Type.

    Text and CSV formats

    Parameter

    Description

    Column Delimiter

    The column delimiter of the file. If you do not specify this parameter, the system uses a comma (,) as the default value.

    Row Delimiter

    The row delimiter of the file. If you do not specify this parameter, the system uses a line feed (\n) as the default value.

    File Encoding

    The encoding of the file from which you want to read data. Supported values are UTF-8 and GBK.

    Null Value

    Enter the string that represents a null value in the source file. Any field whose value matches this string is converted to null.

    Compression Format

    The format in which files are compressed. By default, this parameter is left empty, which indicates that files are not compressed. The system supports the following compression formats:

    • zip

    • gzip

    • bzip2

    • lzo

    • lzo_deflate

    First Row Content Type

    Select whether the first row of the file contains Data Content or Column Name.
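To illustrate how the Text and CSV parameters interact, the following Python sketch applies them to the raw bytes of an object: it decompresses according to Compression Format, decodes with File Encoding, splits on Row Delimiter and Column Delimiter, treats the first row as column names when First Row Content Type is Column Name, and converts the Null Value string to None. This is an approximation under stated assumptions, not Dataphin's reader: it ignores CSV quoting, reads only the first entry of a zip archive, and does not handle lzo or lzo_deflate, which need third-party libraries. The function names decompress and parse_text are hypothetical.

```python
import bz2
import gzip
import io
import zipfile


def decompress(data, compression=None):
    """Undo Compression Format. An empty value means the object is not compressed."""
    if not compression:
        return data
    if compression == "gzip":
        return gzip.decompress(data)
    if compression == "bzip2":
        return bz2.decompress(data)
    if compression == "zip":
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # Assumption: the archive holds a single data file, so read the first entry.
            return archive.read(archive.namelist()[0])
    # lzo and lzo_deflate need third-party libraries and are out of scope here.
    raise ValueError(f"compression format not handled in this sketch: {compression}")


def parse_text(data, column_delimiter=",", row_delimiter="\n", encoding="utf-8",
               null_value=None, first_row_is_header=False, compression=None):
    """Return (header, records) for a Text or CSV object. Quoted fields are not handled."""
    text = decompress(data, compression).decode(encoding)  # File Encoding: "utf-8" or "gbk"
    # Blank rows, such as the one after a trailing row delimiter, are skipped.
    rows = [row.split(column_delimiter) for row in text.split(row_delimiter) if row]
    # First Row Content Type = Column Name: treat the first row as the header.
    header = rows.pop(0) if first_row_is_header and rows else None
    # Null Value: cells equal to this string become None; if null_value is None, nothing matches.
    records = [[None if cell == null_value else cell for cell in row] for row in rows]
    return header, records


if __name__ == "__main__":
    raw = gzip.compress("id,name\n1,NULL\n2,bob\n".encode("utf-8"))
    print(parse_text(raw, null_value="NULL", first_row_is_header=True, compression="gzip"))
    # prints (['id', 'name'], [['1', None], ['2', 'bob']])
```

Splitting on the delimiters directly keeps the sketch close to the parameter descriptions; a production reader would also need to handle quoted values that contain delimiters.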

    Xls and xlsx formats

    Parameter

    Description

    Sheet Selection

    You can select sheets to read by name or index. If you want to read multiple sheets, make sure that they have the same data format.

    • By Name: Enter the name of the sheet to read in Sheet Name.

    • By Index: Enter the index of the sheet to read in Sheet Index. The index starts from 0.

    Data Content Start Row

    The row at which the data content starts. The default value is 1, which means that the data content starts from the first row. To skip the first N rows, set this parameter to N+1.

    Data Content End Row

    The row at which the data content ends. If you leave this parameter empty, the system reads up to the last row that contains data.

    Export Sheet Name

    Select whether to export the name of the sheet that each row was read from. The exported value is the sheet name.

    File Encoding

    The system supports UTF-8 and GBK encoding.

    Compression Format

    The system supports zip, gzip, bzip2, lzo, and lzo_deflate compression formats.

    Null Value Conversion

    You can specify any string to be converted to a Null value.
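The row-range and sheet options can be checked with a small model. The following Python sketch represents a workbook as a dict that maps sheet names to lists of rows, so it needs no Excel library. It assumes that Data Content Start Row and Data Content End Row are 1-based and inclusive, that By Index follows the sheet order, and that Export Sheet Name appends the sheet name as an extra last column; Dataphin may place the sheet name differently. The function names last_data_row and select_rows are hypothetical.

```python
def last_data_row(rows):
    """Return the 1-based number of the last row that has a non-empty cell, or 0 if none."""
    for number in range(len(rows), 0, -1):
        if any(cell not in (None, "") for cell in rows[number - 1]):
            return number
    return 0


def select_rows(workbook, sheets, start_row=1, end_row=None, export_sheet_name=False):
    """Apply Sheet Selection, the start and end rows, and Export Sheet Name.

    workbook maps sheet name -> list of rows; sheets holds names (By Name)
    or 0-based positions (By Index).
    """
    if start_row < 1:
        raise ValueError("Data Content Start Row starts from 1")
    names = list(workbook)  # dicts keep insertion order, used here as the sheet order
    selected = []
    for sheet in sheets:
        name = names[sheet] if isinstance(sheet, int) else sheet
        rows = workbook[name]
        # Empty end row: read up to the last row that contains data.
        stop = end_row if end_row is not None else last_data_row(rows)
        for row in rows[start_row - 1:stop]:  # 1-based, inclusive range
            selected.append(list(row) + [name] if export_sheet_name else list(row))
    return selected


if __name__ == "__main__":
    workbook = {"Sheet1": [["id", "name"], [1, "a"], [2, "b"], [None, None]],
                "Sheet2": [["id", "name"], [3, "c"]]}
    # Skip the header row (N = 1) in both sheets, so Data Content Start Row = 2.
    print(select_rows(workbook, [0, 1], start_row=2, export_sheet_name=True))
    # prints [[1, 'a', 'Sheet1'], [2, 'b', 'Sheet1'], [3, 'c', 'Sheet2']]
```

The example also shows why sheets read together should share one layout: rows from every selected sheet end up in a single output.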

  9. Click OK to complete the property configuration of the OSS input component.

What to do next

After you configure the input component, you can configure downstream components to implement data synchronization. For more information, see Development description of the integration component library.