DataWorks Shell nodes can run Python scripts: you upload the script as a resource and then reference that resource from a Shell node. Both Python 2 and Python 3 are supported. You can run Python scripts on a common Shell node, which references a MaxCompute Python resource, or on an EMR Shell node (backed by E-MapReduce), which references an EMR file resource.
Limitations
- Third-party packages must support both Python 2 and Python 3.
- For additional limits on common Shell nodes, see the Limits section of Create a Shell node.
- For additional limits on EMR Shell nodes, see the Limits section of Create an EMR Shell node.
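The limitation above applies to third-party packages. The same reasoning helps for your own resource file if you might launch it with either interpreter command. The sketch below assumes you want a single file that both `python` and `/home/tops/bin/python3` can execute. The `average` function is illustrative only. The `__future__` imports make `print` and `/` behave identically under both versions.

```python
# -*- coding: utf-8 -*-
# Sketch: a resource script that behaves the same under Python 2 and Python 3.
# The average() helper is hypothetical and exists only to show true division.
from __future__ import division, print_function


def average(values):
    """Return the arithmetic mean, using true division on both versions."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


def main():
    # With print_function, print() is a function under Python 2 as well.
    print("average:", average([1, 2]))


if __name__ == "__main__":
    main()
```

Under either interpreter this prints `average: 1.5`. Without the `division` import, Python 2 would compute `3 / 2` as `1`.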
How it works
You upload your Python script as a resource and reference that resource from a Shell node. The node then runs the script with the interpreter command you specify:
| Python version | Interpreter command |
|---|---|
| Python 3 | /home/tops/bin/python3 <script>.py |
| Python 2 | python <script>.py |
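You may want to confirm which interpreter a run actually used, for example that `/home/tops/bin/python3` resolves to Python 3. To do this, a resource script can print its own version and executable path to the run log. This is a minimal sketch. `interpreter_summary` is a hypothetical helper name, not a DataWorks API.

```python
# Sketch: report which Python interpreter is executing this resource script.
from __future__ import print_function

import platform
import sys


def interpreter_summary():
    """Describe the running interpreter: its version and executable path."""
    return "Python %s at %s" % (platform.python_version(), sys.executable)


if __name__ == "__main__":
    print(interpreter_summary())
```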
Prerequisites
- A DataWorks workspace with DataStudio access.
- A Shell node or EMR Shell node. See Create a Shell node or Create an EMR Shell node.
- If your script requires third-party packages, install them on the resource group before running the node:
  - Serverless resource group (recommended): use the image management feature to install the packages.
  - Exclusive resource group for scheduling: use the O&M Assistant feature to install the packages.
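After installing packages, you may want the script to fail fast, with a clear message, when a package is not importable on the resource group. This sketch uses `importlib.import_module`, which is in the standard library of Python 2.7 and Python 3. The helper names `missing_modules` and `require_modules` are hypothetical.

```python
# Sketch: verify that third-party modules are importable before doing real work.
# The helper names are hypothetical; the module names you pass in are your own.
import importlib


def missing_modules(names):
    """Return the subset of module names that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def require_modules(names):
    """Raise ImportError naming every missing module, so the script stops early."""
    missing = missing_modules(names)
    if missing:
        raise ImportError("missing third-party modules: %s" % ", ".join(missing))
```

For example, you could call `require_modules(["numpy"])` at the top of the script. It stops with an `ImportError` that names `numpy` if the package is not installed for the interpreter that runs the node. Packages installed for Python 3 are not visible to the Python 2 interpreter, and the reverse is also true.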
Run a Python script on a common Shell node
Use this procedure to run a script stored as a MaxCompute Python resource from a common Shell node.
Step 1: Create a MaxCompute Python resource
- Log on to the DataWorks console. In the top navigation bar, select a region. In the left-side navigation pane, choose Data Development and O&M > Data Development, select a workspace, and click Go to Data Development.
- On the DataStudio page, right-click a workflow, then choose Create Resource > MaxCompute > Python. In the dialog box, set Name to mc.py and click Create. mc.py is an example name; use any name that suits your project.
- On the resource configuration tab, write your Python script. For example:
  Python 3:
  print('This is a test text')
  Python 2:
  print "This is a test text"
- Click the save icon to save the resource, then click the commit icon to commit it.
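In a shell, any words after the script name are passed to the script as positional arguments. Node code such as `/home/tops/bin/python3 mc.py 20240101 --label demo` would therefore hand those values to `mc.py`. The sketch below parses such arguments with `argparse`, which is in the standard library of Python 2.7 and Python 3. The argument names `run_date` and `--label` and their defaults are hypothetical.

```python
# Sketch: a resource script that accepts optional command-line arguments.
# The argument names (run_date, --label) and their defaults are hypothetical.
from __future__ import print_function

import argparse
import sys


def build_parser():
    """Build a parser for the hypothetical arguments this script accepts."""
    parser = argparse.ArgumentParser(description="Sample DataWorks resource script")
    parser.add_argument("run_date", nargs="?", default="19700101",
                        help="hypothetical date argument, for example 20240101")
    parser.add_argument("--label", default="test", help="hypothetical free-form label")
    return parser


def main(argv):
    args = build_parser().parse_args(argv)
    print("run_date=%s label=%s" % (args.run_date, args.label))


if __name__ == "__main__":
    main(sys.argv[1:])
```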
Step 2: Reference the resource in a Shell node
- On the DataStudio page, right-click a workflow, then choose Create Node > General > Shell. Configure the Name parameter and click Confirm.
- On the Shell node configuration tab, locate mc.py under Resource in the MaxCompute folder. Right-click the resource name and select Insert Resource Path. When the resource is referenced, the configuration tab displays the resource reference directive ##@resource_reference{"mc.py"}.
Step 3: Configure and run the node
Add the resource reference directive and the interpreter command to the configuration tab, then run the node.
To run the script with Python 3, enter:
##@resource_reference{"mc.py"}
/home/tops/bin/python3 mc.py
To run the script with Python 2, enter:
##@resource_reference{"mc.py"}
python mc.py
To run the node, click the run icon in the top toolbar. In the warning dialog box, click Continue to Run. In the Runtime Parameters dialog box, select a resource group, specify a custom image, and click OK.
If the script runs successfully, the run log shows the output This is a test text.
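The interpreter command is the last line of the node code. Its exit status is therefore presumably what decides whether the run is reported as successful, as with any shell script. This is an assumption; verify it against the Shell node documentation. An uncaught exception already makes Python exit with status 1. An explicit non-zero return is useful for logical failures, such as a job that produces no data. `load_records` below is a hypothetical placeholder for your own work.

```python
# Sketch: signal success or failure to the Shell node through the exit status.
# load_records() is a hypothetical placeholder for the script's real work.
from __future__ import print_function

import sys


def load_records():
    """Hypothetical unit of work; replace with your own logic."""
    return ["row-1", "row-2"]


def main(work=load_records):
    """Return 0 on success and a non-zero code on failure."""
    try:
        records = work()
    except Exception as exc:  # report the failure with a readable message
        print("job failed: %s" % exc, file=sys.stderr)
        return 1
    if not records:
        print("job produced no records", file=sys.stderr)
        return 2
    print("processed %d records" % len(records))
    return 0


if __name__ == "__main__":
    status = main()
    if status != 0:
        sys.exit(status)  # a non-zero exit status marks the run as failed
```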
Run a Python script on an EMR Shell node
Use this procedure to run a script stored as an EMR file resource from an EMR Shell node, which runs on E-MapReduce (EMR).
Step 1: Create an EMR file resource
- Log on to the DataWorks console. In the top navigation bar, select a region. In the left-side navigation pane, choose Data Development and O&M > Data Development, select a workspace, and click Go to Data Development.
- On the DataStudio page, right-click a workflow, then choose Create Resource > EMR > EMR File. In the dialog box, set File Source to Local, click Upload to upload the emr.py script, and click Create. emr.py is an example name; use any name that suits your project. Sample script content:
  Python 3:
  print('This is a test text')
  Python 2:
  print "This is a test text"
- Click the commit icon in the top toolbar to commit the resource.
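The run commands refer to the script by its bare file name. This suggests that the referenced resource is made available in the node's current working directory at run time; treat that as an assumption. If a run fails because the file cannot be found, a small diagnostic resource can show where the node executes. You could upload it as its own EMR file resource, for example a hypothetical where.py. `describe_location` is a hypothetical helper name.

```python
# Sketch: a diagnostic resource script for an EMR Shell node.
# It reports the host, the working directory, and whether a file is present there.
from __future__ import print_function

import os
import socket


def describe_location(script_name):
    """Return the host name, working directory, and whether script_name is in it."""
    cwd = os.getcwd()
    return {
        "host": socket.gethostname(),
        "cwd": cwd,
        "script_in_cwd": os.path.isfile(os.path.join(cwd, script_name)),
    }


if __name__ == "__main__":
    info = describe_location("emr.py")  # emr.py is the example resource name in this topic
    print("host=%(host)s cwd=%(cwd)s script_in_cwd=%(script_in_cwd)s" % info)
```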
Step 2: Reference the resource in an EMR Shell node
- On the DataStudio page, right-click a workflow, then choose Create Node > EMR > EMR Shell. Configure the Name parameter and click Confirm.
- On the EMR Shell node configuration tab, locate emr.py under Resource in the EMR folder. Right-click the resource name and select Insert Resource Path. When the resource is referenced, the configuration tab displays the resource reference directive ##@resource_reference{"emr.py"}.
Step 3: Configure and run the node
Add the resource reference directive and the interpreter command to the configuration tab, then run the node.
To run the script with Python 3, enter:
##@resource_reference{"emr.py"}
/home/tops/bin/python3 emr.py
To run the script with Python 2, enter:
##@resource_reference{"emr.py"}
python emr.py
To run the node, click the run icon in the top toolbar. In the Parameters dialog box, select a resource group, specify a custom image, and click Run.
If the script runs successfully, the run log shows the output This is a test text.
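The same resource file can be launched with either `python` or `/home/tops/bin/python3`. A script written for one version may therefore fail in confusing ways under the other. One option is to check the major version at startup and exit with a clear message. This is a sketch of a script-level guard, not a DataWorks feature, and `REQUIRED_MAJOR` is a hypothetical setting. The guard only helps if the file parses under both versions. That is why the sketch uses the `print_function` import and avoids Python 3-only syntax.

```python
# Sketch: stop early if the script is launched with the wrong Python major version.
# REQUIRED_MAJOR is a hypothetical setting; match it to the command in the node code.
from __future__ import print_function

import sys

REQUIRED_MAJOR = 3  # use 2 if the node runs the script with the bare `python` command


def check_major_version(required, actual=None):
    """Return an error message if the running major version differs, else None."""
    if actual is None:
        actual = sys.version_info[0]
    if actual != required:
        return "this script requires Python %d but is running on Python %d" % (required, actual)
    return None


if __name__ == "__main__":
    error = check_major_version(REQUIRED_MAJOR)
    if error:
        print(error, file=sys.stderr)
        sys.exit(1)
    print("This is a test text")
```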