
FTP log data upload

Last Updated: Mar 27, 2018

This article uses an FTP data source as an example to describe how to upload log data to DataWorks (formerly Data IDE) through the Data Integration service.

Procedure

Add a data source

  1. After entering the project space, choose Data Integration > Data Sources, and click New Source in the upper-right corner.

  2. Enter the required information on the Add Data Source page. Select FTP as the data source type and configure it as follows.

    Configuration of FTP data source:

    • Data source name: ftp_workshop_log

    • Data source description: ftp log file synchronization

    • Data source type: ftp

    • Network Type: classic network

    • Protocol: sftp

    • Host: 10.80.177.33

    • Port: 22

    • Username/password: workshop/workshop

  3. Click Test Connectivity. If the test is successful, click Complete to add the data source.

Create target table

  1. Click Data Development in the upper menu to enter the Data Development homepage, and then click New > Create Script.

  2. In the Create Script File window that appears, enter the file name, select ODPS SQL as the type, and click Submit. See the following figure.

  3. Enter the following statement to create a target table for the FTP logs.

    DROP TABLE IF EXISTS ods_raw_log_d;

    CREATE TABLE ods_raw_log_d (
        col STRING
    )
    PARTITIONED BY (
        dt STRING
    );
  4. Click Run. If the log indicates success, the target table has been created.

    Note:

    You can use the DESC syntax to check if the table is created successfully.
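
    For example, running a DESC statement in the same script file prints the table schema, including the dt partition column, if the table was created:

    ```sql
    -- Verify that the table and its partition column exist
    DESC ods_raw_log_d;
    ```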

  5. Click Save to save the table creation statement.

Create a data synchronization task

  1. Click New and select Create Task.

  2. Configure the new node task, and click Create. The configuration items are as shown in the following figure.

Configure the data synchronization task

  1. Enter the node configuration page, and select the source. See the following figure.

    The configuration items for the data source are described as follows:

    • Data source: Select the created ftp data source.

    • File path: /home/workshop/user_log.txt

    • Column separator: |

  2. Click Next to select the target. See the following figure.

    The configuration items for the data target are described as follows:

    • Data source: Select odps_first as the data storage target source.

    • Table: Select ods_raw_log_d as the data storage target table.

    • Partition information: ${bdp.system.bizdate}.

    • Clearing rule: Clear existing data before the write.

  3. Click Next to configure the field mapping between the source and target columns. See the following figure.

  4. Click Next to configure channel control with a maximum job rate of 10 MB/s. See the following figure.

  5. Click Next to enter the preview and save page to preview the configurations or make modifications, if necessary. After confirming the configurations, click Save.
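
    The wizard settings above correspond to a script-mode synchronization configuration. The following JSON is only a hedged sketch of what such a script might look like for this task; the exact reader/writer names and parameter keys depend on your Data Integration version, so treat the field names here as assumptions rather than a definitive reference.

    ```json
    {
      "type": "job",
      "configuration": {
        "reader": {
          "plugin": "ftp",
          "parameter": {
            "datasource": "ftp_workshop_log",
            "protocol": "sftp",
            "path": ["/home/workshop/user_log.txt"],
            "fieldDelimiter": "|",
            "column": [{ "index": 0, "type": "string" }]
          }
        },
        "writer": {
          "plugin": "odps",
          "parameter": {
            "datasource": "odps_first",
            "table": "ods_raw_log_d",
            "partition": "dt=${bdp.system.bizdate}",
            "truncate": true,
            "column": ["col"]
          }
        }
      }
    }
    ```

    The truncate flag mirrors the "Clear existing data before the write" rule chosen in the wizard, and the partition value uses the same ${bdp.system.bizdate} scheduling variable.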

Submit a data synchronization task

  1. Click Submit to submit the configured data synchronization task.

  2. Click Confirm Submission in the Submit New Version window that appears to submit the data synchronization task to the scheduling system.

Test a data synchronization task

  1. Click Test run in the toolbar.

  2. Click OK in the Cyclical task run reminder window that appears.

  3. In the Test run window that appears, keep the default instance name and business date, and click Run.

  4. Click Go to O&M Center in the Task Test run window that appears.

    You can check the instance running status in the O&M center as shown in the following figure.

Check if the data is successfully imported into MaxCompute

  1. Return to the create_table_ddl script file.

  2. Write and run an SQL statement to check the number of entries imported into ods_raw_log_d.

    The SQL statement is as follows. Change the partition value to your business date. For example, if the task is tested on 20170712, the business date is 20170711.

    -- Check if the data is successfully written into MaxCompute
    select count(*) from ods_raw_log_d where dt = 'business date';
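
    If the count looks wrong, it can help to first confirm which partitions the synchronization task actually created. Assuming standard MaxCompute SQL, you can list them with:

    ```sql
    -- List the partitions written to the target table
    SHOW PARTITIONS ods_raw_log_d;
    ```

    The partition matching your business date (for example, dt=20170711) should appear in the output before the count query can return any rows.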