This topic describes the usage limits of PyODPS 3 nodes and how to create a PyODPS
3 node in DataWorks.
Limits
- Python 3 defines bytecode differently in its subversions, such as Python 3.7 and Python 3.8.
MaxCompute is compatible with Python 3.7. A MaxCompute client that uses another subversion
of Python 3 returns an error when it executes code with specific syntax. For example,
a MaxCompute client that uses Python 3.8 returns an error when it executes code
that contains a finally block. We recommend that you use Python 3.7. You can check the
Python version of your client, as shown in the sketch after this list.
- Each PyODPS 3 node can process a maximum of 50 MB of data and can occupy a maximum of
1 GB of memory. If either limit is exceeded, DataWorks terminates the PyODPS 3 node.
Do not write code that processes an excessively large amount of data in a PyODPS 3 node.
- PyODPS 3 nodes can run on a shared resource group or an exclusive resource group
for scheduling that was purchased after April 2020. If your exclusive resource group
for scheduling was purchased before April 2020, submit a ticket to upgrade the resource group.
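You can check the Python version of your client with plain Python, as shown in the
following sketch. This is a generic check, not a DataWorks-specific command.
# A minimal check of the Python version that the client runs.
# MaxCompute is compatible with Python 3.7, so (3, 7) is the recommended output.
import sys
print(sys.version_info[:2])  # for example, (3, 7)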
Create a PyODPS 3 node
- Go to the DataStudio page.
- Log on to the DataWorks console.
- In the left-side navigation pane, click Workspaces.
- In the top navigation bar, select the region where your workspace resides, find the
workspace, and then click Data Analytics in the Actions column.
- Move the pointer over the Create icon and choose MaxCompute > PyODPS 3. Alternatively,
click the required workflow under Business Flow, right-click
MaxCompute, and choose Create > PyODPS 3.
For more information about how to create a workflow, see Create a workflow.
- In the Create Node dialog box, set the Node Name and Location parameters.
Note The node name must be 1 to 128 characters in length and can contain letters, digits,
underscores (_), and periods (.).
- Click Commit.
- Edit and run the PyODPS 3 node.
For example, if you want to execute SQL statements by using the
execute_sql() method, you must specify the runtime parameters of the SQL statements:
hints={'odps.sql.python.version': 'cp37', 'odps.isolation.session.enable': True}
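The following sketch shows how these hints might be passed to execute_sql(). In a
PyODPS node, DataWorks provides the MaxCompute entry object as the built-in variable
o; the table name sample_table is a hypothetical placeholder.
# Pass the runtime parameters through the hints argument of execute_sql().
# sample_table is a hypothetical table name used only for illustration.
o.execute_sql(
    'select * from sample_table',
    hints={'odps.sql.python.version': 'cp37',
           'odps.isolation.session.enable': True},
)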
If you want to use a user-defined function (UDF) for DataFrame, such as
df.map, df.map_reduce, df.apply, or df.agg, specify the following setting:
hints={'odps.isolation.session.enable': True}
PyODPS determines the runtime environment of the UDF and submits SQL statements based
on the Python version that the client uses. Assume that a public Python UDF is called
in a DataFrame operation. If the client uses Python 3, the statements are interpreted
as Python 3 code. If the UDF contains a print statement that is specific to Python 2,
the client returns a ScriptError error. For more information about how to reference a
third-party package in a PyODPS node, see Reference a third-party package in a PyODPS node.
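The following sketch shows one way to apply the isolation setting when a DataFrame
UDF such as df.map is executed. The table name sample_table and the column value are
hypothetical placeholders.
from odps.df import DataFrame

# Build a DataFrame on a hypothetical table and apply a Python UDF with map.
# The hints argument of execute() carries the isolation setting.
df = DataFrame(o.get_table('sample_table'))
result = df.value.map(lambda v: v + 1).execute(
    hints={'odps.isolation.session.enable': True}
)
print(result)
Alternatively, the setting can be applied globally through the PyODPS options object
so that every SQL statement that PyODPS submits carries it:
from odps import options

# Apply the isolation setting to all SQL statements submitted by PyODPS.
options.sql.settings = {'odps.isolation.session.enable': True}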
- Click the Properties tab in the right-side navigation pane and set the scheduling properties for the node.
For more information, see Basic properties.
- Save and commit the node.
Notice You must set the Rerun and Parent Nodes parameters before you can commit the node.
- Click the Save icon in the toolbar to save the node.
- Click the Commit icon in the toolbar.
- In the Commit Node dialog box, enter your comments in the Change description field.
- Click OK.
In a workspace in standard mode, you must click
Deploy in the upper-right corner after you commit the node. For more information, see
Deploy a node.
- Test the node. For more information, see View auto triggered nodes.