An FTP Check node periodically checks, over File Transfer Protocol (FTP), whether a specific file exists. If the file exists, the scheduling system starts to run the descendant node of the FTP Check node. Otherwise, the FTP Check node retries the check at the configured detection interval and keeps retrying until the condition for stopping the detection is met. In most cases, FTP Check nodes are used for communication between the DataWorks scheduling system and external scheduling systems. This topic describes how to use an FTP Check node and the related precautions.

Prerequisites

Background information

An FTP Check node is typically used in the following scenario: A node in the DataWorks scheduling system needs to read from an external database, but the task that writes data to that database is run by an external scheduling system, not by DataWorks. DataWorks therefore does not know when the data write task finishes or when the database is ready to be accessed. If the node accesses the database before the write task is complete, the data that it reads may be incomplete, or the read may fail.

To ensure that the node can successfully read data from the external database, you can have the external scheduling system generate a mark that indicates the data write task is complete. For example, the external scheduling system can create a marker file with the suffix .done in a file system. You can then create an FTP Check node in the DataWorks scheduling system to periodically detect whether the marker file with the suffix .done exists. If the file exists, the node that needs to access the external database can be scheduled.
Note
  • You can choose the file system in which the marker files are stored.
  • In this example, a marker file with the suffix .done is used. You can customize the format and name of your marker file.
Note External databases include but are not limited to Oracle, MySQL, and SQL Server.
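
The producer side of this handshake can be sketched as follows. This is a minimal illustration, not part of DataWorks: it assumes that the external scheduling system can write to a local directory that the FTP server exposes, and the function name write_done_marker and the job_name argument are hypothetical. The marker is written to a temporary file and then renamed, so that a check never sees a partially created marker.

```python
import os
import tempfile
from pathlib import Path


def write_done_marker(directory: str, job_name: str) -> Path:
    """Create '<job_name>.done' in directory once the data write has finished.

    Hypothetical helper: the directory is assumed to be served by the FTP
    server that the FTP Check node connects to.
    """
    target = Path(directory) / f"{job_name}.done"
    # Write to a temporary file in the same directory, then rename it, so
    # the marker appears in a single step.
    fd, tmp_name = tempfile.mkstemp(
        dir=directory, prefix=f".{job_name}.", suffix=".tmp"
    )
    with os.fdopen(fd, "w") as handle:
        handle.write("done\n")
    os.replace(tmp_name, target)
    return target
```

The external system would call write_done_marker only after its data write task has completed successfully. If the task fails, no marker is created and the FTP Check node keeps waiting until its stop condition is met.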

Limits

  • Only the China (Beijing), China (Shanghai), China (Hangzhou), China (Shenzhen), China (Zhangjiakou), China (Chengdu), and Singapore (Singapore) regions support FTP Check nodes.
  • FTP Check nodes can run only on exclusive resource groups for scheduling.
  • If an FTP Check node is scheduled by minute or hour, you can set the Check stop policy parameter only to Number of Check stops for the node.

Create an FTP Check node

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. On the DataStudio page, move the pointer over the Create icon and choose General > FTP Check.
    Alternatively, you can find the desired workflow, right-click the workflow name, and then choose Create > General > FTP Check.
  3. In the Create Node dialog box, set the Node Name and Location parameters.
    Note The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
  4. Click Commit.
  5. Click the Properties tab in the right-side navigation pane and configure properties for the FTP Check node.
    The properties include basic properties, time properties, resource properties, and scheduling dependencies. For more information, see Basic properties, Configure time properties, Configure the resource group, and Instructions to configure scheduling dependencies.
  6. Configure a detection object and a detection policy.
    1. Select the FTP data source that you want to detect from the Select FTP data source drop-down list.
      You can select an FTP or SFTP data source. If no data source is available, you must add one. For more information, see Configure an FTP connection.
    2. Specify the path of the marker file in the Specify the file to Check field. If the file path is dynamic, you can use scheduling parameters as variables in the path. For more information, see Configure scheduling parameters.
    3. Specify an interval at which the detection is performed in the Check interval (seconds) field.
    4. Select a policy for Check stop policy. The following policies are available:
      • Check stop time: the point in time at which the detection stops. Specify this parameter in the hh24:mi:ss format, which is based on the 24-hour clock. If a check does not detect the marker file, the check fails and the system does not schedule the descendant node of the FTP Check node. The node then repeats the check at the configured detection interval until the marker file is detected or the specified stop time is reached. The system schedules the descendant node only after a check succeeds. You can view the node logs to find the detailed cause of a failure.
        Note The scheduling cycle of the FTP Check node affects its stop policy.
        • If the FTP Check node is scheduled by minute or hour, the Check stop policy parameter can be set only to Number of Check stops for the node. For more information, see Configure a detection policy for an FTP Check node.
        • If Check stop policy is set to Check stop time for the FTP Check node and you change the scheduling cycle of the node from day to minute or hour, the Check stop time policy becomes invalid. In this case, you must set Check stop policy to Number of Check stops. Otherwise, the FTP Check node cannot be committed.
      • Number of Check stops: the maximum number of times that the detection can be performed. If a check does not detect the marker file, the check fails and the system does not schedule the descendant node of the FTP Check node. The node then repeats the check at the specified detection interval until the marker file is detected or the maximum number of checks is reached. The system schedules the descendant node only after a check succeeds. You can view the node logs to find the detailed cause of a failure.
  7. Save and commit the node.
    Notice You must set the Rerun and Parent Nodes parameters before you can commit the node.
    1. Click the Save icon in the toolbar to save the node.
    2. Click the Commit icon in the toolbar.
    3. In the Commit Node dialog box, enter your comments in the Change description field.
    4. Click OK.
    In a workspace in standard mode, you must click Deploy in the upper-right corner after you commit the node. For more information, see Deploy nodes.
  8. Test the node. For more information, see View auto triggered nodes.
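
The detection behavior configured in step 6 can be sketched as a polling loop. This is an illustration of the logic described in this topic, not the DataWorks implementation. The names resolve_path and check_marker are hypothetical, the ${name} placeholder syntax is assumed for the example, and file_exists stands in for the lookup against the FTP or SFTP data source. The sketch also assumes that the first check counts toward the maximum number of checks and that the stop time falls on the same day as the run.

```python
import time
from datetime import datetime, time as clock_time
from string import Template
from typing import Callable, Optional


def resolve_path(template: str, params: dict) -> str:
    """Fill ${name} placeholders in the marker path (syntax assumed here)."""
    return Template(template).substitute(params)


def check_marker(
    file_exists: Callable[[str], bool],
    path: str,
    interval_seconds: float,
    max_checks: Optional[int] = None,
    stop_time: Optional[clock_time] = None,
    now: Callable[[], datetime] = datetime.now,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the marker exists, False once the stop policy ends the run."""
    # Exactly one stop policy applies, mirroring the Check stop policy setting.
    if (max_checks is None) == (stop_time is None):
        raise ValueError("set exactly one of max_checks or stop_time")
    attempts = 0
    while True:
        attempts += 1
        if file_exists(path):
            return True  # descendant node can be scheduled
        if max_checks is not None and attempts >= max_checks:
            return False  # "Number of Check stops" reached
        if stop_time is not None and now().time() >= stop_time:
            return False  # "Check stop time" reached
        sleep(interval_seconds)  # wait for the configured check interval
```

For example, check_marker(exists_fn, resolve_path("/data/${bizdate}.done", {"bizdate": "20240101"}), interval_seconds=60, max_checks=10) checks the resolved path up to 10 times, 60 seconds apart, where exists_fn is whatever function you supply to test for the file. Passing now and sleep as arguments keeps the loop easy to test without real waiting.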