The FTP Output Widget writes data to an FTP data source. When you synchronize data from other data sources to an FTP data source, first configure the source data source, and then configure the FTP Output Widget as the target. This topic describes how to configure the FTP Output Widget.
Prerequisites
You have successfully created an FTP data source. For more information, see Create FTP Data Source.
To configure the FTP output widget properties, your account must have write-through permission for the data source. If you do not have this permission, request access to the data source. For more information, see Request Data Source Permission.
Procedure
On the Dataphin home page, navigate to the top menu bar and select Development > Data Integration.
On the integration page, use the top menu bar to select Project (Dev-Prod mode requires selecting Environment).
In the left-side navigation pane, click Batch Pipeline. In the Batch Pipeline list, select the offline pipeline you want to develop to open its configuration page.
To open the Component Library panel, click Component Library at the top right corner of the page.
In the Component Library panel's left-side navigation pane, select Output. Then, in the right-hand list, find the FTP widget and drag it onto the canvas.
Connect the FTP output widget to the target input, transform, or flow widget by clicking and dragging the icon.
To configure the widget, click the icon on the FTP output widget card to open the FTP Output Configuration dialog box.
In the FTP Output Configuration dialog box, set the necessary parameters.
Parameter
Description
Basic Settings
Step Name
This is the name of the FTP output widget. Dataphin automatically generates the step name, and you can also modify it according to the business scenario. The naming convention is as follows:
Can only contain Chinese characters, letters, underscores (_), and numbers.
Cannot exceed 64 characters.
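The naming convention above can be checked before you type a step name into the dialog box. The following is a minimal sketch, not Dataphin's own validation: the function name is hypothetical, "Chinese characters" is approximated by the CJK Unified Ideographs block (U+4E00 to U+9FFF), and an empty name is assumed to be invalid.

```python
import re

# Assumption: "Chinese characters" is approximated by the CJK Unified
# Ideographs block (U+4E00 to U+9FFF); Dataphin's real check may differ.
# Assumption: a name must have at least 1 character.
_STEP_NAME_RE = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]{1,64}")


def is_valid_step_name(name):
    """Return True if name uses only allowed characters and is 1-64 long."""
    return _STEP_NAME_RE.fullmatch(name) is not None
```

For example, `is_valid_step_name("ftp_output_1")` passes, while a name containing a hyphen or longer than 64 characters does not.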
Datasource
The drop-down list shows all FTP-type data sources, whether or not you have write-through permission for them.
For a data source without write-through permission, click Request next to the data source to request write-through permission. For specific operations, see Request Data Source Permission.
If you do not have an FTP-type data source, click Create to create a data source. For specific operations, see Create FTP Data Source.
File Path
Enter the path where the file is stored. You can obtain the file path on FTP or the FTP server.
File Type
Select the file type in which the data is stored. Supported file types are Text and CSV.
File Encoding
Select the encoding method for the file stored in the target data source. File Encoding includes UTF-8 and GBK.
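The File Encoding setting determines the bytes that end up in the file, so whatever reads the file later has to decode it with the same encoding. As a rough illustration using Python's standard codecs (not Dataphin's internals), the same text occupies a different number of bytes under each option. The sample value is made up for the example.

```python
# Illustrative only: compare the byte length of the same text under the
# two encodings the widget offers.
sample = "数据"  # hypothetical cell value

utf8_bytes = sample.encode("utf-8")  # 3 bytes per character for this text
gbk_bytes = sample.encode("gbk")     # 2 bytes per character for this text

print(len(utf8_bytes), len(gbk_bytes))  # prints "6 4"
```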
Loading Policy
The policy applied when data is written to the target FTP data source. Loading policies include Overwrite Data, Append Data, and File Name Conflict Error:
Append Data: Directly add new data files to the target directory and use a random UUID suffix to ensure no file name conflicts.
Overwrite Data: Clean up files with conflicting file names in the directory before writing, and then add new data files for writing.
File Name Conflict Error: If there are files with conflicting file names in the directory, an error is reported at runtime.
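The three loading policies can be compared with a small model of the target directory. This is a sketch of the behavior described above, not Dataphin's implementation: the function name, the policy keywords (`"append"`, `"overwrite"`, `"error"`), and the exact way the UUID is attached to the file name are all assumptions made for illustration.

```python
import uuid


def plan_write(existing, new_name, policy):
    """Return (files_to_delete, final_name) for one output file.

    existing: set of file names already in the target directory.
    policy:   "append", "overwrite", or "error" (hypothetical keywords).
    """
    if policy == "append":
        # Append Data: leave existing files alone; a random UUID suffix
        # keeps the new file from clashing with them.
        return set(), f"{new_name}_{uuid.uuid4().hex}"
    if policy == "overwrite":
        # Overwrite Data: remove the conflicting file first, then write.
        return {new_name} & set(existing), new_name
    if policy == "error":
        # File Name Conflict Error: fail at runtime on a name clash.
        if new_name in existing:
            raise FileExistsError(new_name)
        return set(), new_name
    raise ValueError(f"unknown policy: {policy}")
```

With `existing = {"orders.csv"}`, the `"overwrite"` policy returns `orders.csv` as a file to delete, `"append"` returns a fresh name, and `"error"` raises.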
Number Of Files To Write
Supports a single file or multiple files.
Single File: Writes into a single file on the target FTP.
Multiple Files: Writes into multiple files on the target FTP. The number of files equals the task concurrency. A sequence suffix such as _1, _2, _3, or a UUID random suffix, is added to each file name. When the task concurrency is 1, selecting Multiple Files still adds a _1 suffix or a UUID random suffix.
Suffix Format
When Number of Files to Write is set to Multiple Files, select Sequence Suffix or UUID Random Suffix.
Important: When the loading policy is set to Append Data, only UUID random suffixes can be generated.
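The interaction between task concurrency, suffix format, and loading policy can be sketched as follows. The function and its keyword values are hypothetical, and appending the suffix to the end of the base name is an assumption. The source only states that a suffix "is added to the file name".

```python
import uuid


def output_file_names(base, concurrency, suffix_format="sequence",
                      loading_policy="overwrite"):
    """List the file names produced when Multiple Files is selected."""
    if loading_policy == "append" and suffix_format != "uuid":
        # Per the Important note: Append Data only allows UUID suffixes.
        raise ValueError("Append Data only supports UUID random suffixes")
    if suffix_format == "sequence":
        # One file per concurrent writer: base_1, base_2, ...
        return [f"{base}_{i}" for i in range(1, concurrency + 1)]
    if suffix_format == "uuid":
        return [f"{base}_{uuid.uuid4().hex}" for _ in range(concurrency)]
    raise ValueError(f"unknown suffix format: {suffix_format}")
```

Note that a concurrency of 1 still yields a suffixed name (`orders_1`), matching the behavior described for Multiple Files.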
Advanced Settings
Row Delimiter (optional)
Enter the delimiter between rows. Supports multi-character delimiters. If not filled in, the system uses the line feed symbol (\n) as the delimiter.
Field Delimiter (optional)
Enter the delimiter between fields. Supports multi-character delimiters. If not filled in, the system uses a half-width comma (,) as the delimiter.
Export Compressed File
Select a compression format (zip or gzip), or select Do Not Compress, Directly Export In The Selected File Type. If you select a compression format, the files are merged into a compressed file in that format and written to FTP. Otherwise, the files are exported directly in the selected file type.
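To show what the two compression formats produce, here is a minimal sketch using Python's standard `gzip` and `zipfile` modules. It is not how Dataphin compresses files internally. The function name and the `"none"` keyword are assumptions for illustration.

```python
import gzip
import io
import zipfile


def compress(data, file_name, fmt):
    """Compress the bytes of one exported file as gzip or zip."""
    if fmt == "none":
        # Do Not Compress: export the file as-is.
        return data
    if fmt == "gzip":
        # gzip wraps a single byte stream; it stores no archive entries.
        return gzip.compress(data)
    if fmt == "zip":
        # zip is an archive: the file is stored under an entry name.
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
            archive.writestr(file_name, data)
        return buffer.getvalue()
    raise ValueError(f"unknown format: {fmt}")
```

Because a zip archive stores an entry name for each file while gzip does not, only zip can carry a file path inside the package.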
Export Column Header
Select whether to export the column header:
If selected, the field name is output in the first row of each file.
If not selected, the first row of the file is data.
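The effect of the Row Delimiter, Field Delimiter, and Export Column Header settings on the file content can be sketched as below. The function is hypothetical, and whether a trailing row delimiter is written after the last row is not stated in this topic, so the sketch assumes none.

```python
def render_file(columns, rows, field_delimiter=",", row_delimiter="\n",
                export_header=False):
    """Render rows as text using the configured delimiters."""
    lines = []
    if export_header:
        # Export Column Header: field names go in the first row.
        lines.append(field_delimiter.join(columns))
    for row in rows:
        lines.append(field_delimiter.join(str(value) for value in row))
    # Assumption: no trailing row delimiter after the last row.
    return row_delimiter.join(lines)
```

For example, `render_file(["id", "name"], [[1, "a"], [2, "b"]], field_delimiter="||", export_header=True)` returns `"id||name\n1||a\n2||b"`, which shows a multi-character field delimiter together with the default row delimiter.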
Compress File Path (optional)
When Number of Files to Write is Single File and the Export Compressed File format is zip, you can configure whether to compress the file path, that is, whether to include the path of the file in the compressed package. In other scenarios, the file path is not compressed; only the single file is compressed.
Mark Completion File
Whether to mark the completion file. The mark completion file includes Task Level and File Level:
Task Level: Only one mark completion file is generated after the task is completed. For example, /ftpuser/test/SUCCESS.
File Level: Use the symbol * as a placeholder for the data file name. For example, with /ftpuser/test/*.flg, a mark completion file with the same name is generated for each data file.
When Required is selected, you also need to configure Mark Completion File Content. The configurable file information parameters include the following:
File Name: $filename.
File Name with Path: $filenamewithpath.
File Size: $filesize.
Number of Data Rows in File: $rowcount.
Pipeline-level configurable parameters are also supported. You can freely choose the required parameters and delimiters. In the task-level mark completion file, the information of each file is written line by line.
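The placeholders above behave like simple template variables. The sketch below uses Python's `string.Template`, whose `$name` syntax happens to match, to show how one line of mark completion file content could be filled in and how the `*` placeholder in a file-level path could be resolved. Both helper functions are hypothetical. Replacing `*` with the full data file name (including its extension) and treating the file size as a plain number are assumptions, since this topic does not specify either.

```python
import posixpath
from string import Template


def completion_line(content_template, data_file_path, size, row_count):
    """Fill the mark completion file placeholders for one data file."""
    return Template(content_template).substitute(
        filename=posixpath.basename(data_file_path),
        filenamewithpath=data_file_path,
        filesize=size,
        rowcount=row_count,
    )


def file_level_marker(pattern, data_file_path):
    """Resolve '*' in a file-level pattern to the data file name."""
    # Assumption: '*' is replaced by the full file name, extension included.
    return pattern.replace("*", posixpath.basename(data_file_path))
```

For a task-level file, calling `completion_line` once per data file and joining the results with line feeds gives the line-by-line layout described above.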
Null Value Conversion (optional)
The string that represents null.
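As a small illustration of this setting, the sketch below replaces null values in a row with the configured string before the row is written. The function name and the sample `\N` marker are made up for the example, and an empty string is assumed as the default.

```python
def convert_nulls(row, null_string=""):
    """Replace None values with the configured null string."""
    return [null_string if value is None else str(value) for value in row]
```

For example, `convert_nulls([1, None, "a"], null_string="\\N")` returns `["1", "\\N", "a"]`.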
Field Mapping
Input Field
The field read from the upstream input widget.
Output Field
You need to configure the output field. Dataphin supports configuring output fields through Batch Add and Create New Output Field:
Batch Add: Click Batch Add to configure multiple fields at once in JSON or TEXT format.
Batch configuration in JSON format, for example:
// Example: [{"name": "user_id","type": "String"}, {"name": "user_name","type": "String"}]
Note: name represents the name of the introduced field, and type represents the type of the introduced field. For example, "name":"user_id","type":"String" means introducing a field named user_id and setting the field type to String.
Batch configuration in TEXT format, for example:
// Example:
user_id,String
user_name,String
The row delimiter is used to separate the information of each field. The default is a line feed (\n). Supports line feed (\n), semicolon (;), and period (.).
The column delimiter is used to separate the field name and field type. The default is a comma (,).
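The two batch formats carry the same information, which the following sketch makes explicit by parsing each into a list of (name, type) pairs. The parser functions are hypothetical helpers for preparing or checking input before pasting it into the Batch Add dialog box. They do not reflect how Dataphin parses the input.

```python
import json


def parse_json_fields(text):
    """Parse the JSON batch format into (name, type) pairs."""
    return [(field["name"], field["type"]) for field in json.loads(text)]


def parse_text_fields(text, row_delimiter="\n", column_delimiter=","):
    """Parse the TEXT batch format into (name, type) pairs."""
    fields = []
    for row in text.split(row_delimiter):
        row = row.strip()
        if not row:
            continue  # ignore blank rows, e.g. after a trailing delimiter
        name, field_type = row.split(column_delimiter, 1)
        fields.append((name.strip(), field_type.strip()))
    return fields
```

Both `parse_json_fields('[{"name": "user_id","type": "String"}]')` and `parse_text_fields("user_id,String")` return `[("user_id", "String")]`.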
Create New Output Field.
Click +Create New Output Field, fill in Column, and select Type according to the page prompts.
Copy Upstream Field.
Click Copy Upstream Field. The system automatically generates output fields based on the upstream field names.
Manage Output Field.
You can also perform the following operations on the added fields:
Click the icon in the Actions column to edit an existing field.
Click the icon in the Actions column to delete an existing field.
Mapping
The mapping relationship is used to map the input fields of the source table to the output fields of the target table, facilitating subsequent data synchronization. The mapping relationship includes same-name mapping and same-row mapping. The applicable scenarios are as follows:
Same-name Mapping: Map fields with the same field name.
Same-row Mapping: Use this when the field names of the source table and the target table differ but the fields in the same row position correspond to each other. Only fields in the same row are mapped.
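The difference between the two mapping modes can be sketched with plain lists of field names. The functions are hypothetical, and exact, case-sensitive name matching is an assumption.

```python
def same_name_mapping(input_fields, output_fields):
    """Pair each input field with the output field of the same name."""
    output_names = set(output_fields)
    # Assumption: names must match exactly (case-sensitive).
    return [(name, name) for name in input_fields if name in output_names]


def same_row_mapping(input_fields, output_fields):
    """Pair fields by row position, ignoring their names."""
    # zip stops at the shorter list, so unmatched trailing rows stay unmapped.
    return list(zip(input_fields, output_fields))
```

With input fields `["id", "name", "age"]` and output fields `["user_id", "name"]`, same-name mapping yields `[("name", "name")]`, while same-row mapping yields `[("id", "user_id"), ("name", "name")]`.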
Click Confirm to finalize the FTP Output Widget configuration.