DataWorks: Create and manage ODPS nodes

Last Updated: Jan 18, 2024

DataWorks provides multiple types of ODPS nodes that you can use to develop MaxCompute tasks based on your business requirements. DataWorks also provides various node scheduling configurations to help you configure scheduling properties for a MaxCompute task in a flexible manner. This topic describes how to create and manage an ODPS node.

Prerequisites

  • A workflow is created.

    Development operations in different types of compute engines in DataStudio are performed based on workflows. Before you create a node, you must create a workflow. For more information, see Create a workflow.

  • A MaxCompute data source is added and associated with DataStudio.

    Before you create an ODPS node to develop a MaxCompute task, you must add a MaxCompute project to your DataWorks workspace as a MaxCompute data source and associate the MaxCompute data source with DataStudio as an underlying engine for MaxCompute task development.

  • The RAM user that you want to use to develop a MaxCompute task is added to the workspace as a member and is assigned the Development or Workspace Administrator role. The Workspace Administrator role has high permissions. Assign this role to the RAM user only if necessary. For information about how to add a member to a workspace and assign a role to the member, see Add workspace members and assign roles to them.

Create an ODPS node

  1. Go to the DataStudio page.

    Log on to the DataWorks console. In the left-side navigation pane, choose Data Modeling and Development > DataStudio. On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.

  2. On the DataStudio page, create a node. In this example, an ODPS SQL node is created.

    1. In the Scheduled Workflow pane of the DataStudio page, find the created workflow, right-click the workflow name, and then choose Create Node > MaxCompute > ODPS SQL.

      Note

      You can also click the Create icon in the Scheduled Workflow pane, or move the pointer over it, and then select an ODPS node type to create a node of that type.

    2. In the Create Node dialog box, configure the Name parameter and click Confirm. After the node is created, you can develop and configure a MaxCompute task based on the node.

Develop a MaxCompute task

DataWorks supports multiple types of ODPS nodes. The following list describes each node type, its use scenario, and the corresponding task development guide. A minimal code sketch of a node body follows the list.

  • ODPS SQL: This type of node is used to develop MaxCompute SQL tasks. Task development guide: Develop a MaxCompute SQL task.

  • SQL Snippet: This type of node is also used to develop MaxCompute SQL tasks. In actual business scenarios, a large number of SQL code processes are similar. The input tables or output tables of these processes may have the same schema or compatible data types but different table names. In this case, developers can create a script template based on an SQL code process to reuse SQL code. The script template extracts input parameters from input tables and output parameters from output tables. Task development guide: Overview of a script template.

  • PyODPS 3: This type of node is used to develop PyODPS tasks of MaxCompute. The underlying language version of a PyODPS 3 node is Python 3. Task development guide: Develop a PyODPS 3 task.

  • PyODPS 2: This type of node is used to develop PyODPS tasks of MaxCompute. The underlying language version of a PyODPS 2 node is Python 2. Task development guide: Develop a PyODPS 2 task.

  • ODPS Spark: This type of node is used to develop MaxCompute Spark tasks. Task development guide: Develop a MaxCompute Spark task.

  • ODPS Script: This type of node is used to develop MaxCompute script tasks. Task development guide: Develop a MaxCompute script task.

  • ODPS MR: This type of node is used to develop MaxCompute MapReduce tasks. Task development guide: Develop a MaxCompute MapReduce task.
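
The following sketch illustrates what the body of a PyODPS 3 node might contain. It is a minimal example and not part of the official guides above: the global o object is the MaxCompute entry point that DataWorks provides to PyODPS nodes, and the table name sample_orders is a hypothetical placeholder.

```python
# Minimal sketch of a PyODPS 3 node body (Python 3).
# `o` is the MaxCompute entry object that DataWorks provides to PyODPS nodes.
# The table name `sample_orders` is a hypothetical placeholder.

# Run a MaxCompute SQL statement and wait for it to complete.
instance = o.execute_sql('SELECT order_id, amount FROM sample_orders LIMIT 10')

# Read the query result.
with instance.open_reader() as reader:
    for record in reader:
        print(record['order_id'], record['amount'])
```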

Develop a MaxCompute task: advanced capabilities

In addition to MaxCompute task development capabilities, DataWorks provides capabilities related to tables, resources, and functions for MaxCompute. You can use these capabilities to develop tasks more efficiently.

  • Table-related capabilities: You can quickly create a MaxCompute table, view information about a MaxCompute table, and manage MaxCompute tables by using the entry points and features in the DataWorks console. For more information, see Create and manage MaxCompute tables and Manage tables.

  • Function- and resource-related capabilities:

    • You can directly use built-in functions of MaxCompute when you develop a MaxCompute task in the DataWorks console. For information about how to view built-in functions of MaxCompute, see Use built-in functions.

    • You can upload a user-defined function (UDF) to DataWorks as a resource and register the UDF in DataWorks. When you develop a MaxCompute task, you can directly call the UDF. For information about how to use a UDF, see Create and use MaxCompute resources and Create and use a MaxCompute function.

    • You can upload a resource package that is developed on your on-premises machine to DataWorks or directly create a resource in DataWorks.

      DataWorks allows you to upload text files, Python code, and compressed packages such as .zip, .tgz, .tar.gz, .tar, and .jar files to MaxCompute as different types of resources. When you create UDFs or run MapReduce tasks, you can reference these resources. For information about how to upload and use a resource, see Create and use MaxCompute resources. A programmatic sketch of the resource and UDF workflow is provided after this list.
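
If you prefer to script these steps instead of performing them in the DataWorks console, the same resource and function objects can also be created with the PyODPS SDK. The following is a minimal sketch, not the authoritative procedure: the resource name my_udf.py, the UDF class Upper, the function name to_upper, and the table demo_table are hypothetical placeholders.

```python
# Sketch: register a Python resource and a MaxCompute function (UDF) with the
# PyODPS SDK, then call the function in SQL. All names below (my_udf.py, Upper,
# to_upper, demo_table) are hypothetical placeholders.
# `o` is the MaxCompute entry object, as in a PyODPS node.

udf_code = '''
from odps.udf import annotate

@annotate("string->string")
class Upper(object):
    def evaluate(self, arg):
        return arg.upper() if arg is not None else None
'''

# Upload the UDF code as a Python resource.
resource = o.create_resource('my_udf.py', 'py', file_obj=udf_code)

# Register a function that points to the class inside the resource.
o.create_function('to_upper', class_type='my_udf.Upper', resources=[resource])

# The function can now be called in MaxCompute SQL.
o.execute_sql('SELECT to_upper(name) FROM demo_table')
```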

Operations that you can perform after a task is developed

After you complete the development of a task by using the created node, you can perform the following operations:

  • Configure scheduling properties: You can configure properties for periodic scheduling of the node. If you want the system to periodically schedule and run the task, you must configure items for the node, such as rerun settings and scheduling dependencies. For more information, see Overview. For an example of how a scheduling parameter can be consumed in node code, see the sketch after this list.

  • Debug the node: You can debug and test the code of the node to check whether the code logic meets your expectations. For more information, see Debugging procedure.

  • Deploy the node: After you complete all development operations, you can deploy the node. After the node is deployed, the system periodically schedules the node based on the scheduling properties of the node. For more information, see Deploy nodes.
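
The following sketch shows one way a scheduling parameter might be consumed inside a PyODPS node after the scheduling properties are configured. It assumes a parameter named ds (for example, assigned the value $bizdate) has been defined in the node's scheduling configuration; the tables sample_orders and sample_orders_daily and the partition column pt are hypothetical placeholders.

```python
# Sketch: consume a scheduling parameter in a PyODPS node.
# Assumes a scheduling parameter named `ds` (for example, assigned $bizdate)
# is defined in the node's scheduling configuration. The tables and the
# partition column `pt` below are hypothetical placeholders.

bizdate = args['ds']  # DataWorks exposes scheduling parameters to PyODPS nodes through `args`

# Process only the partition that corresponds to the scheduled business date.
o.execute_sql(
    "INSERT OVERWRITE TABLE sample_orders_daily PARTITION (pt='{0}') "
    "SELECT order_id, amount FROM sample_orders WHERE pt='{0}'".format(bizdate)
)
```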

Manage a node

After a node is created, you can perform various operations on the node, such as modifying or deleting the node. You can also combine the node with other nodes into a node group and then reference the node group in other workflows. For more information about management operations that you can perform on a node, see Create and manage a node group.