DataWorks provides Object Storage Service (OSS) object inspection nodes, which check whether an object exists in a specified OSS storage path. This type of node is suitable for scenarios in which descendant nodes can run only after specific objects are generated in OSS. For example, if a node that synchronizes OSS data to DataWorks must run only after the required OSS objects are generated, you can place an OSS object inspection node upstream of it to check whether the objects exist. This topic describes how to create and use an OSS object inspection node.

Implementation mechanism

If a specific OSS object is detected within the timeout period of an OSS object inspection node, the OSS object inspection node succeeds and exits. If the specific OSS object is not detected within the timeout period, the OSS object inspection node fails and exits.
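The behavior described above amounts to a poll-until-timeout loop. The following Python sketch only illustrates that idea and is not DataWorks code: the function wait_for_object, the object_exists callback, and the injectable clock and sleep arguments are hypothetical names, and the default 5-second interval is taken from the description of the Timeout parameter later in this topic.

```python
import time
from typing import Callable


def wait_for_object(
    object_exists: Callable[[], bool],
    timeout_minutes: float,
    interval_seconds: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the object is detected before the timeout, else False."""
    deadline = clock() + timeout_minutes * 60
    while True:
        if object_exists():
            return True   # corresponds to "the node succeeds and exits"
        if clock() >= deadline:
            return False  # corresponds to "the node fails and exits"
        sleep(interval_seconds)
```

Under these assumptions, a timeout of 1440 minutes with a 5-second interval would result in roughly 17,280 checks before the loop gives up.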

Permissions

MaxCompute uses Resource Access Management (RAM) and Security Token Service (STS) of Alibaba Cloud to secure data access. When an OSS object inspection node is running, it checks whether the required OSS object exists by using MaxCompute. In the development or production environment, the node implements the check based on the access identity of the development or production environment. Make sure that the access identity has the required permissions on the OSS bucket. For more information, see STS authorization.

Node configuration

You must configure scheduling dependencies and the scheduling time for an OSS object inspection node in the same way as for other types of nodes.
  • Dependency configuration

    Each node in DataWorks must be configured with upstream dependencies. If the OSS object inspection node does not have ancestor nodes, you can select the zero load node or the root node in the current workspace as the ancestor node of the OSS object inspection node based on the complexity of your business.

  • Check time
    • Specify the scheduling time for the OSS object inspection node to define the point in time at which the OSS object inspection node is expected to start to check whether a specific OSS object exists.
    • Configure the Timeout parameter to specify a timeout period for the OSS object inspection node. If the OSS object is not detected before the timeout period ends, the OSS object inspection node fails and exits.
  • Check object

    Specify the storage path for objects to be detected in OSS. You can use DataWorks scheduling parameters to specify the storage path. Variables in the code are automatically replaced with specific values based on the point in time at which the OSS object inspection node is run to implement dynamic detection of OSS objects.
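To illustrate the kind of substitution described above, the following Python sketch expands a placeholder in a storage path by using a date derived from the run time. DataWorks performs the actual replacement itself. The helper render_path, the path example-dir/${bizdate}/data.csv, and the run date are hypothetical, and treating bizdate as the day before the run date in yyyymmdd format is an assumption made for this example.

```python
import re
from datetime import date, timedelta


def render_path(template: str, values: dict[str, str]) -> str:
    """Replace each ${name} placeholder with its value; fail on unknown names."""
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for placeholder {name!r}")
        return values[name]

    return re.sub(r"\$\{(\w+)\}", lookup, template)


run_date = date(2024, 3, 15)  # hypothetical scheduled run date
# Assumption: the data timestamp is the day before the run date.
bizdate = (run_date - timedelta(days=1)).strftime("%Y%m%d")
path = render_path("example-dir/${bizdate}/data.csv", {"bizdate": bizdate})
print(path)  # example-dir/20240314/data.csv
```

Because the path changes with each run, the same node definition can check for a different object on every scheduled day.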

Create and use an OSS object inspection node

You can use an OSS object inspection node to check OSS objects in the storage space of the current tenant or of another tenant. To create and use an OSS object inspection node, perform the following steps:

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides. Find your workspace and click DataStudio in the Actions column.
  2. Create a workflow.
    If you have an existing workflow, skip this step.
    1. Move the pointer over the Create icon and select Create Workflow.
    2. In the Create Workflow dialog box, configure the Workflow Name parameter.
    3. Click Create.
  3. Create an OSS object inspection node.
    1. Move the pointer over the Create icon and choose Create Node > General > OSS Object Inspection.
      Alternatively, you can right-click the name of a desired workflow and choose Create Node > General > OSS Object Inspection.
    2. In the Create Node dialog box, configure the parameters, such as Name, and click Confirm.
  4. Configure the OSS object inspection node.
    The following list describes the parameters.
    • OSS Object: The storage path of the OSS object. You can specify a scheduling parameter in the parameter value. For information about scheduling parameters, see DataWorks scheduling parameters.
    • Timeout: The timeout period, in minutes. During this period, DataWorks checks every 5 seconds whether the OSS object exists. If the OSS object is not detected before the timeout period ends, the OSS object inspection node fails. Maximum value: 1440.
    • Storage Address (Storage): The storage space of the OSS object. Valid values:
      • Myself: detects the OSS object in the storage space of the current tenant.
      • Other: detects the OSS object in the storage space of another tenant.
  5. Configure scheduling properties for the OSS object inspection node.
    If you want the system to periodically run the OSS object inspection node, you can click Properties in the right-side navigation pane on the configuration tab of the OSS object inspection node to configure properties for the node based on your business requirements.
  6. Commit and deploy the OSS object inspection node.
    1. Click the Save icon in the top toolbar to save the node.
    2. Click the Submit icon in the top toolbar to commit the node.
    3. In the Commit Node dialog box, configure the Change description parameter.
    4. Click OK.
    If you use a workspace in standard mode, you must deploy the node in the production environment after you commit the node. On the left side of the top navigation bar, click Deploy. For more information, see Deploy nodes.
  7. View the result.
    Go to Operation Center and view run logs of the OSS object inspection node. If the following error information appears, the OSS object is not detected:
    <Error>
     <Code>NoSuchKey</Code>
     <Message>The specified key does not exist.</Message>
     <RequestId></RequestId>
     <HostId>OSS object</HostId>
     <Key>xc/111.txt</Key>
    </Error>
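If you collect run logs for further processing, an error body like the one above can be parsed to extract the error code and the object key. The following Python sketch uses only the standard library. The function name parse_oss_error is hypothetical, and the sketch assumes that the XML fragment has already been isolated from the rest of the log.

```python
import xml.etree.ElementTree as ET

# The error body from the run log, copied from the example above.
log_excerpt = """<Error>
 <Code>NoSuchKey</Code>
 <Message>The specified key does not exist.</Message>
 <RequestId></RequestId>
 <HostId>OSS object</HostId>
 <Key>xc/111.txt</Key>
</Error>"""


def parse_oss_error(xml_text: str) -> dict[str, str]:
    """Map each child element of <Error> to its text (empty string if none)."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


error = parse_oss_error(log_excerpt)
if error.get("Code") == "NoSuchKey":
    print(f"Object not found: {error.get('Key')}")
```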

What do I do if the error "The specified key does not exist" occurs?

  • Cause: The specified OSS object does not exist. No OSS object is detected.
  • Solution:
    • Check whether the storage path of the OSS object is correct and contains the file name extension of the object.
    • Check whether the value of the scheduling parameter that you specified for the storage path matches the OSS object name.
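The checks above can be partly automated before you commit the node. The following Python sketch flags two common problems in a resolved storage path: a missing file name extension and a placeholder that was never replaced. The function check_object_path is hypothetical, and the ${name} placeholder form is an assumption for this example. The sketch cannot confirm that the object actually exists in OSS.

```python
import posixpath
import re


def check_object_path(path: str) -> list[str]:
    """Return a list of likely problems with an OSS object path."""
    problems = []
    # A path such as "xc/111" has no extension on its last component.
    if not posixpath.splitext(posixpath.basename(path))[1]:
        problems.append("no file name extension")
    # A leftover ${name} means the scheduling parameter was not replaced.
    if re.search(r"\$\{\w+\}", path):
        problems.append("unresolved scheduling parameter")
    return problems


for candidate in ["xc/111.txt", "xc/111", "xc/${bizdate}/111.txt"]:
    print(candidate, check_object_path(candidate))
```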