DataWorks: FTP

Last Updated: Mar 26, 2026

The FTP data source provides a bidirectional channel for reading data from and writing data to FTP servers. This topic describes the data synchronization capabilities of the FTP data source in DataWorks.

Limitations

FTP Reader

FTP Reader reads data from remote FTP files. Because remote FTP files are inherently unstructured, the following capabilities and limitations apply.

Capability | Supported
Read TXT files (two-dimensional table schema) | Yes
CSV-like files with custom separators | Yes
Multiple data types (represented as STRING), column pruning, and column constants | Yes
Recursive reads and file name filtering | Yes
Text compression (gzip, bzip2, zip, lzo, lzo_deflate) | Yes
Concurrent reads from multiple files | Yes
Multi-threaded concurrent reads from a single file | No
Multi-threaded concurrent reads from a single compressed file | No

FTP Writer

FTP Writer converts data and writes it to FTP files. Because FTP files are inherently unstructured, the following capabilities and limitations apply.

Capability | Supported
Write text files with two-dimensional table schema | Yes
CSV-like and TEXT file formats with custom separators | Yes
Multi-threaded writes (each thread writes to a different sub-file) | Yes
Concurrent writes to a single file | No
Native data types (all data is written as STRING) | No
Text compression when writing | No

Supported field types

Remote FTP files have no native data types. The following field types are defined by DataX FtpReader.

DataX internal type | Remote FTP file data type
LONG | LONG
DOUBLE | DOUBLE
STRING | STRING
BOOLEAN | BOOLEAN
DATE | DATE

Add a data source

Before developing a synchronization task in DataWorks, add the FTP data source by following the instructions in Data source management. Parameter descriptions are available in the DataWorks console when you add a data source.

Develop a data synchronization task

For configuration entry points and procedures, see the following guides.

Single-table offline synchronization task

For a script demo and parameter descriptions for the code editor, see Appendix: Script demo and parameter description.

Appendix: Script demo and parameter description

Configure a batch synchronization task in the code editor

To configure a batch synchronization task in the code editor, set the parameters in the script according to the unified script format. For more information, see Configure a task in the code editor.

Reader script demo

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "ftp",
            "parameter": {
                "path": [],
                "nullFormat": "",
                "compress": "",
                "datasource": "",
                "column": [
                    {
                        "index": 0,
                        "type": ""
                    }
                ],
                "skipHeader": "",
                "fieldDelimiter": ",",
                "encoding": "UTF-8",
                "fileFormat": "csv"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "throttle": true,
            "concurrent": 1,
            "mbps": "12"
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}

Reader script parameters

Parameter | Description | Required | Default value
datasource | The data source name. Must match the name of the added data source. | Yes | None
path | The path and name of the file in the remote FTP file system. Specify the full path, including the file name and extension. Multiple paths are supported. See Path parameter behavior. | Yes | None
column | The list of fields to read. Each entry requires type and either index or value. See Column parameter format. | Yes | None
fieldDelimiter | The field separator used when reading data. | Yes | ,
compress | The compression format of the files to read. Valid values: gzip, bzip2, zip, lzo, lzo_deflate. If not set, files are read as uncompressed text. | No | No compression
skipHeader | Specifies whether to skip the header row in CSV-like files. Not supported for compressed files. | No | false
encoding | The encoding format of the files to read. | No | utf-8
nullFormat | The string in the source files that represents a null value. For example, if nullFormat is set to "null", a source value of null is treated as a null field. If not set, source data is read as-is without conversion. | No | None
markDoneFileName | The name of the mark file. The synchronization task waits for this file to appear before it starts. | No | None
maxRetryTime | The maximum number of times the task checks for the mark file. Checks run at 1-minute intervals, so the default value of 60 makes the task wait up to 60 minutes. | No | 60
csvReaderConfig | Configuration options for reading CSV files, specified as a map. If not set, default values are used. | No | None
fileFormat | The file type to read. By default, files are read as CSV files and parsed into a two-dimensional table. Set to binary for a peer-to-peer binary copy between storage systems such as FTP and Object Storage Service (OSS). | No | None
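As an illustration, the following parameter block sketches a reader that waits for a mark file and then reads one daily CSV file. The data source name, directory, and file names are hypothetical. ${bizdate} is a scheduling parameter, as used elsewhere in this topic. The sketch assumes that markDoneFileName accepts a path on the FTP server, as its Writer counterpart does. Because skipHeader is not supported for compressed files, compress is left unset.

```json
{
    "datasource": "my_ftp_source",
    "path": ["/data/orders/orders_${bizdate}.csv"],
    "column": ["*"],
    "fieldDelimiter": ",",
    "skipHeader": true,
    "encoding": "UTF-8",
    "nullFormat": "null",
    "markDoneFileName": "/data/orders/orders_${bizdate}.done",
    "maxRetryTime": 60
}
```

With maxRetryTime set to 60, the task checks for the mark file once per minute for up to 60 minutes. Setting column to ["*"] reads every column as STRING.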

Path parameter behavior

The behavior of path depends on what you specify:

Path type | Behavior
Single file | FTP Reader uses a single thread to extract data.
Multiple files | FTP Reader uses multi-threaded extraction. The number of threads equals the number of channels.
Wildcard (*) | FTP Reader traverses the directory and reads all matching files. For example, /* reads all files in the root directory, and /bazhen/* reads all files in the bazhen directory. FTP Reader supports only the asterisk (*) as a file wildcard character.
Warning

Avoid using the * wildcard, as it may cause a Java Virtual Machine (JVM) memory overflow error.

You can use scheduling parameters, such as ${bizdate}, in path to configure file names and paths dynamically.
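For example, the following path value lists two files and embeds the ${bizdate} scheduling parameter in each file name, so that each scheduled run reads that day's files. Because the value lists multiple files, FTP Reader extracts them with multiple threads, one per channel. The directory and file names are hypothetical.

```json
{
    "path": [
        "/data/orders/orders_${bizdate}.csv",
        "/data/refunds/refunds_${bizdate}.csv"
    ]
}
```

Both files must share the same schema, because all files in a job are treated as a single data table.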

Additional constraints:

  • All text files in a job are treated as a single data table. Make sure all files conform to the same schema.

  • All files to read must be in CSV-like format and readable by the data synchronization system.

  • If no files match the specified path, the synchronization task reports an error.

Column parameter format

To read all columns as STRING, set "column": ["*"].

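To specify individual fields, list them in column. The following example reads the first column of each file (index 0) as LONG and adds a constant STRING column with the value alibaba:

{
    "column": [
        {
            "type": "long",
            "index": 0
        },
        {
            "type": "string",
            "value": "alibaba"
        }
    ]
}
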
  • type is required for each entry.

  • Specify either index (column position, starting from 0) or value (a constant injected by FTP Reader without reading from the source file).
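The following sketch combines column pruning with a constant column. It reads the first, second, and fourth columns of each file (indexes 0, 1, and 3), skips the third column, and appends a constant STRING column. The column positions and the constant value ftp_import are hypothetical. The type names follow Supported field types and are written in lowercase, as in the example above.

```json
{
    "column": [
        { "index": 0, "type": "long" },
        { "index": 1, "type": "string" },
        { "index": 3, "type": "double" },
        { "type": "string", "value": "ftp_import" }
    ]
}
```

Each entry sets type together with exactly one of index or value.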

Writer script demo

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "ftp",
            "parameter": {
                "path": "",
                "fileName": "",
                "nullFormat": "null",
                "dateFormat": "yyyy-MM-dd HH:mm:ss",
                "datasource": "",
                "writeMode": "",
                "fieldDelimiter": ",",
                "encoding": "",
                "fileFormat": ""
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "throttle": true,
            "concurrent": 1,
            "mbps": "12"
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}

Writer script parameters

Parameter | Description | Required | Default value
datasource | The data source name. Must match the name of the added data source. | Yes | None
path | The directory path in the FTP file system. FTP Writer writes multiple sub-files to this directory. | Yes | None
fileName | The base file name. By default, a random suffix is appended for each write thread to avoid file name conflicts. | Yes | None
singleFileOutput | Set to true to omit the random suffix and write to the exact file name specified in fileName. | No | false
writeMode | The cleanup behavior before writing. Valid values: truncate, append, nonConflict. See WriteMode values. | Yes | None
fieldDelimiter | The field separator used when writing data. Must be a single character. | Yes | None
timeout | The timeout for connecting to the FTP server. Unit: milliseconds. | No | 60000 (1 minute)
skipHeader | Specifies whether to skip the header row. Not supported for compressed files. | No | false
compress | The compression format for writing. Valid values: gzip, bzip2. | No | No compression
encoding | The encoding format for writing. | No | utf-8
nullFormat | The string used to represent null values. For example, if nullFormat is set to "null", null values are written as the literal string null. | No | None
dateFormat | The format used to serialize DATE-type data. Example: "yyyy-MM-dd". | No | None
fileFormat | The format of the files to write. Valid values: CSV, TEXT. CSV is strict CSV: if a value contains the column delimiter, it is escaped using double quotation marks ("). TEXT separates values with the column delimiter only and does not escape delimiters that appear in the data. | No | TEXT
header | The header row to write as the first row of the output file. Example: ["id", "name", "age"]. | No | None
markDoneFileName | The absolute path of the mark file generated after the synchronization task completes. In auto-triggered tasks, include scheduling parameters in the file name. Example: /user/ftp/markDone_${bizdate}.txt. | No | None
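The following parameter block sketches a writer that produces exactly one CSV file per run, with a header row and a mark file. The data source name, directory, and file names are hypothetical. fileFormat is written in lowercase, as in the Reader script demo, which assumes the value is not case-sensitive. Concurrent writes to a single file are not supported, so this configuration presumably also needs concurrent set to 1 in setting.speed, as in the Writer script demo.

```json
{
    "datasource": "my_ftp_source",
    "path": "/export/orders",
    "fileName": "orders_${bizdate}.csv",
    "singleFileOutput": true,
    "writeMode": "truncate",
    "fieldDelimiter": ",",
    "fileFormat": "csv",
    "encoding": "UTF-8",
    "nullFormat": "null",
    "dateFormat": "yyyy-MM-dd HH:mm:ss",
    "header": ["id", "name", "age"],
    "markDoneFileName": "/export/orders/markDone_${bizdate}.txt"
}
```

Because singleFileOutput is true and writeMode is truncate, a rerun clears the existing file with the same name before writing.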

WriteMode values

Value | Behavior
truncate | Clears existing files before writing. If singleFileOutput is true, clears files with the same name. If false, clears all files whose names start with the fileName prefix.
append | Writes without any pre-processing. Data Integration FTP Writer ensures that file names do not conflict.
nonConflict | Reports an error if a file with the fileName prefix already exists.
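For example, the following fragment keeps the default random per-thread suffix (singleFileOutput is left at false) and sets writeMode to nonConflict. If any file whose name starts with orders already exists in the directory, the run reports an error instead of writing output alongside it. The directory and file name are hypothetical.

```json
{
    "path": "/export/orders",
    "fileName": "orders",
    "writeMode": "nonConflict"
}
```

To replace the previous output instead, set writeMode to truncate. That setting clears all files with the orders prefix before writing.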