This topic describes the limits of PyODPS 3 nodes in DataWorks and how to create a PyODPS 3 node.

Limits

  • Python 3 compiles bytecode differently across its minor versions, such as Python 3.7 and Python 3.8.

    MaxCompute is compatible with Python 3.7. A MaxCompute client that uses a later minor version of Python 3 returns an error when code that uses specific syntax is run. For example, a client that uses Python 3.8 returns an error when code that contains a finally block is run, as the snippet after this list illustrates. We recommend that you use Python 3.7.

  • Each PyODPS 3 node can process a maximum of 50 MB of data and can occupy a maximum of 1 GB of memory. If a node exceeds either limit, DataWorks terminates it. Do not write code that processes an excessively large amount of data in a PyODPS 3 node.
  • PyODPS 3 nodes can run on a shared resource group or an exclusive resource group for scheduling that was purchased after April 2020. If your exclusive resource group for scheduling was purchased before April 2020, submit a ticket to upgrade the resource group.
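
For example, the following snippet illustrates the first limit. It is a minimal sketch, and the function name and reader object are hypothetical. Because a try...finally construct compiles to different bytecode in Python 3.8 than in Python 3.7, running such code from a Python 3.8 client can return an error.

def close_after_read(reader):
    # The finally block below compiles to different bytecode in
    # Python 3.8 than in Python 3.7, which MaxCompute is compatible with.
    try:
        return reader.read()
    finally:
        reader.close()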

Create a PyODPS 3 node

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region in which the workspace that you want to manage resides. Find the workspace and click DataStudio in the Actions column.
  2. Move the pointer over the Create icon and choose Create Node > MaxCompute > PyODPS 3.
    Alternatively, you can click the name of the desired workflow in the Business Flow section, right-click MaxCompute, and then choose Create Node > PyODPS 3.

    For more information about how to create a workflow, see Create a workflow.

  3. In the Create Node dialog box, configure the Name and Path parameters.
    Note The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
  4. Click Commit.
  5. Configure and run the PyODPS 3 node.
    If you want to use the execute_sql() method, you must configure the parameters that are used to execute SQL statements:
    hints={'odps.sql.python.version': 'cp37', 'odps.isolation.session.enable': True}
    If you want to use a DataFrame user-defined function (UDF), such as df.map, df.map_reduce, df.apply, or df.agg, add the following configuration:
    hints={'odps.isolation.session.enable': True}

    PyODPS determines the runtime environment of a UDF and commits SQL statements based on the Python version that the client uses. For example, if a public Python UDF is called through the DataFrame API and the client uses Python 3, the statements are interpreted based on Python 3. If the UDF executes a statement that is specific to Python 2, such as a print statement, the client returns a ScriptError error. For more information about how to reference a third-party package in a PyODPS node, see Use a PyODPS node to reference a third-party package.
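
    The following minimal sketch shows where these hints fit. It assumes the default entry object o that DataWorks provides in PyODPS nodes, and a hypothetical table named sample_table with a numeric id column:

    # The entry object o is preconfigured in a DataWorks PyODPS 3 node.
    # Run SQL with the Python 3.7 runtime and session isolation enabled.
    o.execute_sql(
        'select 1',
        hints={'odps.sql.python.version': 'cp37',
               'odps.isolation.session.enable': True})

    # Call a DataFrame UDF (df.map); session isolation must be enabled.
    df = o.get_table('sample_table').to_df()
    df.id.map(lambda x: x + 1).execute(
        hints={'odps.isolation.session.enable': True})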

  6. On the configuration tab of the PyODPS 3 node, click Properties in the right-side navigation pane. On the Properties tab, configure scheduling properties for the node. For more information, see Configure basic properties.
  7. Save and commit the node.
    Notice You must configure the Rerun and Parent Nodes parameters on the Properties tab before you commit the node.
    1. Click the Save icon in the top toolbar to save the node.
    2. Click the Submit icon in the toolbar.
    3. In the Commit Node dialog box, configure the Change description parameter.
    4. Click OK.
    If the workspace that you use is in standard mode, you must click Deploy in the upper-right corner to deploy the node after you commit it. For more information, see Deploy nodes.
  8. Perform O&M operations on the node. For more information, see O&M overview of auto triggered nodes.

FAQ: How do I determine whether a custom Python script is successfully run?

The logic for determining whether a custom Python script is successfully run is the same as that for a custom Shell script. For more information, see Create a Shell node.
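
A PyODPS 3 node reports success or failure through the exit code of the Python process: exit code 0 indicates success, and a nonzero exit code or an uncaught exception indicates failure. The following minimal sketch assumes a hypothetical check named records_ok:

import sys

records_ok = True  # hypothetical result of your own business-logic check
if records_ok:
    sys.exit(0)  # exit code 0: DataWorks marks the node as successful
else:
    sys.exit(1)  # nonzero exit code: DataWorks marks the node as failed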