This topic describes the usage limits of PyODPS 3 nodes and how to create a PyODPS
3 node in the DataWorks console.
Limits
- Different subversions of Python 3, such as Python 3.7 and Python 3.8, define bytecode differently.
MaxCompute is compatible with Python 3.7. A MaxCompute client that uses another subversion
of Python 3 returns an error when it executes code with specific syntax. For example,
a MaxCompute client that uses Python 3.8 returns an error when it executes code
that contains a finally block. We recommend that you use a MaxCompute client with
Python 3.7.
- Each PyODPS 3 node can process a maximum of 50 MB of data and can occupy a maximum of
1 GB of memory. If a node exceeds either limit, DataWorks terminates it. Do not write Python
code that processes an extremely large amount of data in a PyODPS 3 node.
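The limits above mean that heavy computation should run in MaxCompute rather than inside the node process. The following sketch shows the recommended pattern: aggregate server-side and fetch only the small result. It assumes the entry object `o` that DataWorks injects into every PyODPS node; the table name is a placeholder.

```python
# Stay under the 50 MB data / 1 GB memory limit: aggregate in
# MaxCompute and download only the result, instead of pulling a
# whole table into node memory.

def fetch_row_count(o):
    """Count rows of a table server-side; only one value comes back."""
    from odps.df import DataFrame
    df = DataFrame(o.get_table('your_large_table'))  # placeholder name
    # The count() aggregation is executed in MaxCompute, not locally.
    return df.count().execute()

# Example call inside the node:
# print(fetch_row_count(o))

# Avoid patterns such as df.to_pandas() on a large table, which
# downloads the full data set into the PyODPS 3 node process.
```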
Create a PyODPS 3 node
- Go to the DataStudio page.
- Log on to the DataWorks console.
- In the left-side navigation pane, click Workspaces.
- In the top navigation bar, select the region where the target workspace resides. Find
the target workspace and click Data Analytics in the Actions column.
- On the Data Development tab, move the pointer over the Create icon
and choose MaxCompute > PyODPS 3.
You can also click the target workflow, right-click MaxCompute, and then choose Create > PyODPS 3.
- In the Create Node dialog box, set the Node Name and Location parameters.
Note The node name can be up to 128 characters in length and can contain letters, digits,
underscores (_), and periods (.).
- Click OK.
- Enter the code of the PyODPS 3 node.
Enter the code of the PyODPS 3 node based on your needs. For example, if you use the
execute_sql() method to execute SQL statements in the node, you must specify runtime parameters
for the SQL statements. For more information, see SQL.
hints={'odps.sql.python.version': 'cp37', 'odps.isolation.session.enable': True}
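The following sketch shows how these hints can be passed to execute_sql(). It assumes the entry object `o` that DataWorks injects into every PyODPS node, so no import or authentication code is needed; the SQL statement and table name are placeholders.

```python
# Runtime hints for execute_sql(): force the CPython 3.7 runtime and
# enable session isolation so that the hints take effect.
hints = {
    'odps.sql.python.version': 'cp37',
    'odps.isolation.session.enable': True,
}

def run_sql(o, sql):
    """Execute an SQL statement with the hints above and print each row."""
    instance = o.execute_sql(sql, hints=hints)
    with instance.open_reader() as reader:
        for record in reader:
            print(record)

# Example call inside the node (the table name is a placeholder):
# run_sql(o, 'SELECT COUNT(*) FROM your_table')
```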
If you need to use a user defined function (UDF), such as
df.map,
df.map_reduce,
df.apply, or
df.agg, to call the DataFrame API in the node, specify the following setting:
hints={'odps.isolation.session.enable': True}
PyODPS determines the runtime environment of the UDF and commits SQL statements based
on the Python version that the MaxCompute client uses. For example, assume that the
MaxCompute client uses Python 3. If the UDF contains a print statement that is specific
to Python 2, the MaxCompute client returns the ScriptError error.
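The DataFrame UDF case above can be sketched as follows, again assuming the injected entry object `o`; the table and column names are placeholders.

```python
# Isolation must be enabled when a UDF is shipped to MaxCompute
# through the DataFrame API (df.map, df.map_reduce, df.apply, df.agg).
hints = {'odps.isolation.session.enable': True}

def double(x):
    # Plain Python 3 function used as the UDF. Avoid Python 2-only
    # syntax such as `print x`, which raises ScriptError at run time.
    return x * 2

def run_map(o):
    """Apply the UDF to one column; the map runs inside MaxCompute."""
    from odps.df import DataFrame
    df = DataFrame(o.get_table('your_table'))      # placeholder name
    return df['your_column'].map(double).execute(hints=hints)

# Example call inside the node:
# print(run_map(o))
```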
- On the configuration tab of the PyODPS 3 node, click the Scheduling configuration tab in the right-side navigation pane and set the
scheduling properties of the node. For more information, see Basic attributes.
- Save and commit the node.
Notice Before you can commit the node, you must set its rerun attribute and its dependent upstream node.
- Click the save icon in the toolbar to save the node.
- Click the submit icon in the toolbar.
- In the Submit New Version dialog box, enter your remarks.
- Click OK.
In a workspace in standard mode, you must click Publish in the upper-right corner after
you commit the node. For more information, see Deploy a node.
- Test the node. For more information, see Auto triggered nodes.