This topic describes how to use a PyODPS node in DataWorks to reference code that is not built into the node. You can reference a common Python script or a third-party open source package.

Reference a common Python script

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. Create a Python resource.
    1. On the Data Analytics tab, move the pointer over the Create icon and choose MaxCompute > Resource > Python.
      Alternatively, you can click the required workflow in the Business Flow section, right-click MaxCompute, and then choose Create > Resource > Python.
    2. In the Create Resource dialog box, set the Resource Name and Location parameters. In this example, the Resource Name parameter is set to pyodps_packagetest.py.
      Notice: The resource name can contain letters, digits, periods (.), underscores (_), and hyphens (-), and must end with .py.
    3. Click Create.
    4. On the configuration tab of the newly created Python resource, enter the common Python script that you want to reference. In this example, the following script is used:
      # import os
      # print os.getcwd()
      # print os.path.abspath('.')
      # print os.path.abspath('..')
      # print os.path.abspath(os.curdir)
      
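      def printname():
          # Python 2 print statement; PyODPS 2 nodes run Python 2.
          print 'test2'
      # Module-level statement: runs once when the module is first imported.
      print 123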
    5. Click the Submit icon in the top toolbar.
  3. Create a PyODPS 2 node.
    1. In the Business Flow section, find the workflow in which you want to create a PyODPS 2 node, right-click MaxCompute, and then choose Create > PyODPS 2.
    2. In the Create Node dialog box, set the Node Name and Location parameters. In this example, the Node Name parameter is set to pyodps_testpackage.
      Note: The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
    3. Click Commit.
  4. Open the configuration tab of the newly created PyODPS 2 node. Then, right-click the name of the Python resource in the Resource folder of your workflow and select Insert Resource Path.
    After the resource is referenced, the ##@resource_reference{"pyodps_packagetest.py"} statement is automatically written in the code editor of the PyODPS 2 node.
  5. Enter the code that is used to reference the common Python script in the code editor of the PyODPS 2 node. In this example, the following code is used:
    ##@resource_reference{"pyodps_packagetest.py"} # This statement is required to reference the created Python resource. 
    
    import sys
    import os
    sys.path.append(os.path.dirname(os.path.abspath('pyodps_packagetest.py'))) # Add the directory that contains the resource to the module search path.
    import pyodps_packagetest # Import the resource as a module. Omit the .py suffix of the resource name.
    pyodps_packagetest.printname() # Call the function that is defined in the resource.
  6. Click the Run icon in the top toolbar and view the results on the Runtime Log tab in the lower part of the configuration tab.
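The import mechanism used in step 5 can also be tried outside DataWorks. The following is a minimal sketch, assuming that a temporary directory stands in for the directory in which the node finds the resource file. The module name demo_packagetest, the helper import_from_directory, and the returned string are hypothetical and chosen only for illustration. The function returns its value instead of printing it so that the result is easy to check, and the sketch is written for Python 3 rather than for the Python 2 syntax of the resource above.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for the Python resource created in step 2.
MODULE_SOURCE = "def printname():\n    return 'test2'\n"


def import_from_directory(directory, module_name):
    """Add `directory` to the module search path and import `module_name`."""
    if directory not in sys.path:
        sys.path.append(directory)
    # Make sure the import system notices the file that was just written.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# A temporary directory plays the role of the node's working directory.
resource_dir = tempfile.mkdtemp()
with open(os.path.join(resource_dir, "demo_packagetest.py"), "w") as handle:
    handle.write(MODULE_SOURCE)

# The module name is the file name without the .py suffix.
module = import_from_directory(resource_dir, "demo_packagetest")
result = module.printname()
print(result)
```

As in the node code, the directory is appended to sys.path, not the file itself, and the import statement uses the file name without the .py suffix.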

Reference a third-party open source package

Before you reference a third-party open source package, you must use pip to install the package and make sure that the following requirements are met:
  • An exclusive resource group for scheduling is available. For more information, see Create an exclusive resource group for scheduling.
  • The third-party open source package is installed in O&M Assistant of the exclusive resource group for scheduling. For more information, see O&M Assistant. PyODPS nodes include PyODPS 2 nodes and PyODPS 3 nodes.
    • If you want to use a PyODPS 2 node to reference the third-party open source package, run the following command to install the package:
      pip install <Package that you want to reference> -i https://pypi.tuna.tsinghua.edu.cn/simple
      If you are prompted to upgrade pip after you run the preceding command, run the following command:
      pip install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
    • If you want to use a PyODPS 3 node to reference the third-party open source package, run the following command to install the package:
      /home/tops/bin/pip3 install <Package that you want to reference> -i https://pypi.tuna.tsinghua.edu.cn/simple

      After the package is installed, use an import statement to import the package in the node. For example, run the /home/tops/bin/pip3 install oss2 command in O&M Assistant to install the oss2 package. Then, add the import oss2 statement to the code of the PyODPS 3 node to import and reference oss2.

      If you are prompted to upgrade pip after you run the preceding commands, run the following command:
      /home/tops/bin/pip3 install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
      If the following error occurs when you use the PyODPS 3 node, submit a ticket to apply for permissions.
      "/home/admin/usertools/tools/cmd-0.sh:Line 3: /home/tops/bin/python3: The file or directory does not exist."
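After the installation, it can help to confirm from inside the node that the interpreter actually finds the package before the rest of the code depends on it. The following is a minimal sketch that uses only the Python 3 standard library, so it is aimed at a PyODPS 3 node. The helper name is_importable is hypothetical, and oss2 appears only because it is the package used in the example above.

```python
import importlib.util
import sys


def is_importable(package_name):
    """Return True if the current interpreter can locate a top-level package."""
    return importlib.util.find_spec(package_name) is not None


# Show which interpreter and Python version the code is running on.
print(sys.executable, sys.version_info[:2])

for name in ("oss2",):
    if is_importable(name):
        print(name, "is available")
    else:
        print(name, "is not installed for this interpreter")
```

If the check reports that the package is missing even though pip finished without errors, one possible cause is that the package was installed for a different interpreter than the one that runs the node, for example with pip instead of /home/tops/bin/pip3.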