This topic describes how to reference third-party code from a PyODPS node in DataWorks, either by depending on common Python scripts or by installing open source third-party packages.

Depend on common Python scripts

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. Create a Python resource.
    1. On the Data Development tab, move the pointer over the Create icon and choose MaxCompute > Resources > Python.
      Alternatively, you can click a workflow in the Business Flow section, right-click MaxCompute, and then choose New > Resource > Python.
    2. In the Create Resource dialog box, set the Resource Name and Location parameters. For example, set the Resource Name parameter to pyodps_packagetest.py.
      Notice: The resource name can contain letters, digits, periods (.), underscores (_), and hyphens (-), and must end with .py.
    3. Click Confirm.
    4. On the configuration tab of the created Python resource, enter the code that you want to reference from the PyODPS node. In this example, enter the following code:
      # import os
      # print os.getcwd()
      # print os.path.abspath('.')
      # print os.path.abspath('..')
      # print os.path.abspath(os.curdir)
      
      def printname():
          print 'test2'
      print 123
    5. Click the Submit icon in the top toolbar.
  3. Create a PyODPS 2 node.
    1. Click the required workflow under Business Flow. Right-click MaxCompute and choose Create > PyODPS 2.
    2. In the Create Node dialog box, set the Node Name and Location parameters. For example, set the Node Name parameter to pyodps_testpackage.
      Note: The node name must be 1 to 128 characters in length. It can contain letters, digits, underscores (_), and periods (.).
    3. Click Commit.
  4. Open the configuration tab of the PyODPS 2 node. Right-click the name of the created Python resource and select Insert Resource Path.
    After the resource is referenced, the ##@resource_reference{"pyodps_packagetest.py"} statement is automatically written in the code editor of the PyODPS 2 node.
  5. In the code editor of the PyODPS 2 node, enter the code that imports and calls the referenced resource. In this example, enter the following code:
    ##@resource_reference{"pyodps_packagetest.py"} # This statement is required to reference the created Python resource.
    
    import sys
    import os
    sys.path.append(os.path.dirname(os.path.abspath('pyodps_packagetest.py'))) # Add the directory that contains the resource to the module search path.
    import pyodps_packagetest # Import the resource as a module. Omit the .py suffix of the resource name.
    pyodps_packagetest.printname() # Call the function defined in the resource.
  6. Click the Run icon in the top toolbar and view the results on the Runtime Log tab in the lower part of the page.
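
The import mechanism in step 5 can be tried outside DataWorks. The following is a minimal sketch, assuming a local Python 3 interpreter. The temporary directory and the module name demo_resource are hypothetical stand-ins for the location where the referenced resource is placed and for pyodps_packagetest.py. Unlike the resource above, the function returns its string instead of printing it so that the result can be checked.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for the directory that holds the referenced resource.
resource_dir = tempfile.mkdtemp()

# Write a small module that mirrors the structure of pyodps_packagetest.py.
module_path = os.path.join(resource_dir, "demo_resource.py")
with open(module_path, "w") as f:
    f.write("def printname():\n    return 'test2'\n")

# Add the directory that contains the file to the module search path,
# in the same way as the sys.path.append call in the node code.
sys.path.append(os.path.dirname(os.path.abspath(module_path)))
importlib.invalidate_caches()

# Import by module name, without the .py suffix, and call the function.
demo_resource = importlib.import_module("demo_resource")
result = demo_resource.printname()
print(result)
```

The key point is that Python locates a module by searching the directories in sys.path for a file named after the module, which is why the node code appends the resource directory before the import statement.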

Depend on an open source third-party package

If you want to depend on an open source third-party package, you must use pip to install the package. In addition, the following requirements must be met:
  • Use an exclusive resource group for scheduling. For more information, see Add an exclusive resource group for scheduling.
  • Install the required third-party package in O&M Assistant of the exclusive resource group for scheduling. For more information, see O&M Assistant. PyODPS nodes include PyODPS 2 nodes and PyODPS 3 nodes.
    • If you use a PyODPS 2 node, run the following command:
      pip install <Package to be installed> -i https://pypi.tuna.tsinghua.edu.cn/simple
      If you are prompted to upgrade pip after you run the preceding command, run the following command:
      pip install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
    • If you use a PyODPS 3 node, run the following command:
      /home/tops/bin/pip3 install <Package to be installed> -i https://pypi.tuna.tsinghua.edu.cn/simple
      If you are prompted to upgrade pip after you run the preceding command, run the following command:
      /home/tops/bin/pip3 install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
      If the following error is reported when you use the PyODPS 3 node, submit a ticket to apply for permissions:
      "/home/admin/usertools/tools/cmd-0.sh: line 3: /home/tops/bin/python3: The file or directory does not exist."
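
After installation, it can be useful to confirm that the package is visible to the interpreter before you rely on it in a node. The following is a minimal sketch, assuming Python 3 (as in a PyODPS 3 node). The helper name is_installed and the module names passed to it are hypothetical and are not part of DataWorks or PyODPS.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# "json" is part of the standard library, so it should be found.
print(is_installed("json"))

# A made-up name that is assumed not to be installed anywhere.
print(is_installed("no_such_package_for_demo"))
```

Replace the module names with the import name of the package you installed, which can differ from the name used in the pip command.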