This topic describes the usage limits of PyODPS 3 nodes and how to create a PyODPS 3 node in DataWorks.

Limits

  • Python 3 defines bytecode differently across its minor versions, such as Python 3.7 and Python 3.8.

    MaxCompute is compatible with Python 3.7. A MaxCompute client that uses another minor version of Python 3 returns an error when it executes code that uses specific syntax. For example, a MaxCompute client that uses Python 3.8 returns an error when it executes code that contains a finally block. We recommend that you use Python 3.7. To confirm which Python version your runtime uses, see the sketch after this list.

  • Each PyODPS 3 node can process a maximum of 50 MB of data and can occupy a maximum of 1 GB of memory. If a node exceeds either limit, DataWorks terminates it. Do not write code that processes an excessively large amount of data in a PyODPS 3 node.
  • PyODPS 3 nodes can run on a shared resource group or on an exclusive resource group for scheduling that was purchased after April 2020. If your exclusive resource group for scheduling was purchased before April 2020, submit a ticket to upgrade the resource group.
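
The following sketch prints the Python version of the runtime that executes your node, so that you can confirm that it matches the recommended Python 3.7:

    import sys
    # Print the exact Python version of the runtime that executes this node.
    print(sys.version_info)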

Create a PyODPS 3 node

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. Move the pointer over the Create icon and choose MaxCompute > PyODPS 3.
    Alternatively, click the required workflow under Business Flow, right-click MaxCompute, and choose Create > PyODPS 3.

    For more information about how to create a workflow, see Create a workflow.

  3. In the Create Node dialog box, set the Node Name and Location parameters.
    Note The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
  4. Click Commit.
  5. Edit and run the PyODPS 3 node.
    For example, if you want to use the execute_sql() method, you must specify the following runtime parameters for the SQL statements:
    hints={'odps.sql.python.version': 'cp37', 'odps.isolation.session.enable': True}
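    The following is a minimal sketch of how to pass the hints, assuming a hypothetical table named sales_detail. In a DataWorks PyODPS node, the MaxCompute entry object is preconfigured as the variable o.

    # Run an SQL statement with the required runtime parameters.
    instance = o.execute_sql(
        'SELECT * FROM sales_detail',
        hints={'odps.sql.python.version': 'cp37',
               'odps.isolation.session.enable': True}
    )
    # Read the query results after the instance finishes.
    with instance.open_reader() as reader:
        for record in reader:
            print(record)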
    If you want to use a user-defined function (UDF) for DataFrame, such as df.map, df.map_reduce, df.apply, or df.agg, specify the following setting:
    hints={'odps.isolation.session.enable': True}
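    The following is a minimal sketch, assuming a hypothetical table named sales_detail with a numeric column named price. Passing hints to execute is one way to supply the setting; setting options.sql.settings globally also works.

    from odps.df import DataFrame

    df = DataFrame(o.get_table('sales_detail'))
    # df.price.map runs the lambda as a Python UDF on the server side,
    # which requires the isolation setting to be enabled.
    result = df.price.map(lambda x: x * 2).execute(
        hints={'odps.isolation.session.enable': True}
    )
    print(result)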

    PyODPS determines the runtime environment of UDFs based on the Python version that the client uses when it commits SQL statements. Assume that a DataFrame call uses a public Python UDF. If the client uses Python 3, the statements are interpreted as Python 3 code. If the UDF contains a print statement that is specific to Python 2, the client returns a ScriptError error, as shown in the sketch below. For more information about how to reference a third-party package in a PyODPS node, see Reference a third-party package in a PyODPS node.
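
    The following sketch illustrates the mismatch, continuing the hypothetical df from the preceding example:

    # A Python 2 print statement inside the UDF raises ScriptError when the
    # statements are interpreted as Python 3 code:
    #     def double(x):
    #         print 'doubling', x    # Python 2 syntax: SyntaxError in Python 3
    #         return x * 2
    # The Python 3 function form works:
    def double(x):
        print('doubling', x)
        return x * 2

    result = df.price.map(double).execute(
        hints={'odps.isolation.session.enable': True}
    )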

  6. Click the Properties tab in the right-side navigation pane and set the scheduling properties for the node. For more information, see Basic properties.
  7. Save and commit the node.
    Notice You must set the Rerun and Parent Nodes parameters before you can commit the node.
    1. Click the Save icon in the toolbar to save the node.
    2. Click the Commit icon in the toolbar.
    3. In the Commit Node dialog box, enter your comments in the Change description field.
    4. Click OK.
    In a workspace in standard mode, you must click Deploy in the upper-right corner after you commit the node. For more information, see Deploy a node.
  8. Test the node. For more information, see View auto triggered nodes.