This topic describes how to reference a third-party package in a PyODPS node.

Prerequisites

Procedure

  1. Download the packages that are listed in the following table.
    Package          File                                                  Resource file
    python-dateutil  python-dateutil-2.6.0.zip                             python-dateutil.zip
    pytz             pytz-2017.2.zip                                       pytz.zip
    six              six-1.11.0.tar.gz                                     six.tar.gz
    pandas           pandas-0.20.2-cp27-cp27m-manylinux1_x86_64.whl        pandas.zip
    scipy            scipy-0.19.0-cp27-cp27m-manylinux1_x86_64.whl         scipy.zip
    scikit-learn     scikit_learn-0.18.1-cp27-cp27m-manylinux1_x86_64.whl  sklearn.zip
    Note
    • Manually change the file name extension of the pandas, scipy, and scikit-learn packages from .whl to .zip.
    • Rename each downloaded file to the name listed in the Resource file column of the preceding table.
    • Upload the renamed files as resources of the Archive type.
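The renaming described in the note can also be scripted on the machine where the packages were downloaded. The following is a minimal sketch, not part of DataWorks: it assumes that all six downloads sit in one local directory, and the helper name stage_resources and the directory names in the usage line are hypothetical. The file names are taken from the preceding table.

```python
import shutil
from pathlib import Path

# Downloaded file name -> resource file name, as listed in the table.
RESOURCE_NAMES = {
    "python-dateutil-2.6.0.zip": "python-dateutil.zip",
    "pytz-2017.2.zip": "pytz.zip",
    "six-1.11.0.tar.gz": "six.tar.gz",
    "pandas-0.20.2-cp27-cp27m-manylinux1_x86_64.whl": "pandas.zip",
    "scipy-0.19.0-cp27-cp27m-manylinux1_x86_64.whl": "scipy.zip",
    "scikit_learn-0.18.1-cp27-cp27m-manylinux1_x86_64.whl": "sklearn.zip",
}


def stage_resources(download_dir, output_dir):
    """Copy each downloaded package to output_dir under its resource file name.

    Hypothetical helper: returns the paths that were written and raises
    FileNotFoundError if an expected download is missing.
    """
    download_dir, output_dir = Path(download_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for source_name, resource_name in RESOURCE_NAMES.items():
        source = download_dir / source_name
        if not source.is_file():
            raise FileNotFoundError("missing download: %s" % source)
        target = output_dir / resource_name
        # Only the file name changes; the file content is copied as is.
        shutil.copyfile(source, target)
        staged.append(target)
    return staged
```

For example, stage_resources("downloads", "resources") leaves the originals untouched and writes the six renamed copies to the resources directory, ready to be uploaded.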
  2. Log on to the DataWorks console.
  3. Create a workflow.
    1. On the DataStudio page, right-click Business Flow and select Create Workflow.
    2. In the Create Workflow dialog box, specify Workflow Name and click Create.
  4. Create and commit resources.
    1. On the DataStudio page, move the pointer over the Create icon and choose MaxCompute > Resource > Archive.
      You can also expand Business Flow, right-click a workflow, and then choose Create > MaxCompute > Resource > Archive.
    2. In the Create Resource dialog box, click Upload and select the python-dateutil-2.6.0.zip file.
    3. Enter python-dateutil.zip in the Resource Name field and click OK.
    4. Click the Commit icon to upload the resource file.
    5. Repeat the preceding steps to create and commit the resource files named pytz.zip, six.tar.gz, pandas.zip, sklearn.zip, and scipy.zip.
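For scripted setups, the same archive resources can in principle be created with the PyODPS SDK instead of the console. The following is a hedged sketch under stated assumptions: o is assumed to be an odps.ODPS entry object whose create_resource method accepts a resource name, the type string "archive", and a file_obj keyword; the helper name upload_archives is hypothetical. Resources created this way do not go through the DataStudio commit flow, so treat this as an alternative rather than a replacement for the console steps.

```python
from pathlib import Path

# Resource file names used throughout this topic.
ARCHIVE_RESOURCES = [
    "python-dateutil.zip",
    "pytz.zip",
    "six.tar.gz",
    "pandas.zip",
    "scipy.zip",
    "sklearn.zip",
]


def upload_archives(o, resource_dir, names=None):
    """Create one archive resource per file name found in resource_dir.

    Hypothetical helper. `o` is assumed to be an odps.ODPS entry object,
    or any object that exposes a compatible create_resource method.
    Returns the list of resource names that were created.
    """
    names = ARCHIVE_RESOURCES if names is None else names
    uploaded = []
    for name in names:
        path = Path(resource_dir) / name
        with open(path, "rb") as file_obj:
            # The resource name matches the file name, as in the console steps.
            o.create_resource(name, "archive", file_obj=file_obj)
        uploaded.append(name)
    return uploaded
```

A call such as upload_archives(o, "resources") expects the renamed files to already exist in the given directory.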
  5. Create a PyODPS node.
    1. Right-click the workflow that you created and choose Create > MaxCompute > PyODPS 2.
    2. In the Create Node dialog box, specify Node Name and click Commit.
    3. On the tab of the created node, enter the code of the node in the code editor.
      Sample code:
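      def test(x):
          # Import inside the function: it runs as a UDF in MaxCompute,
          # where the packages listed in libraries are available.
          from sklearn import datasets, svm
          from scipy import misc
          import numpy as np

          # Check that scikit-learn can load its bundled iris dataset.
          iris = datasets.load_iris()
          assert iris.data.shape == (150, 4)
          assert np.array_equal(np.unique(iris.target), [0, 1, 2])

          # Train a linear SVM and check its prediction for one sample.
          clf = svm.LinearSVC()
          clf.fit(iris.data, iris.target)
          pred = clf.predict([[5.0, 3.6, 1.3, 0.25]])
          assert pred[0] == 0

          # Check that SciPy can be imported and used.
          assert misc.face().shape is not None

          # Return the input unchanged; the function only verifies the packages.
          return x

      # Job-level hint that is passed to MaxCompute.
      hints = {
          'odps.isolation.session.enable': True
      }
      # Archive resources that contain the third-party packages.
      libraries = ['python-dateutil.zip', 'pytz.zip', 'six.tar.gz', 'pandas.zip', 'scipy.zip', 'sklearn.zip']

      # o is the MaxCompute entry object that the PyODPS node provides.
      iris = o.get_table('pyodps_iris').to_df()

      # Apply test() to the first row of the sepallength column and print the result.
      print(iris[:1].sepallength.map(test).execute(hints=hints, libraries=libraries))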
  6. Click the Run icon.
  7. View the running result of the node on the Run Log tab.
    Sql compiled:
    CREATE TABLE tmp_pyodps_a3172c30_a0d7_4c88_bc39_434168263897 LIFECYCLE 1 AS
    SELECT pyodps_udf_1576485276_94d9d978_af66_4e27_a874_e787022dfb3d(t1.`sepallength`) AS `sepallength`
    FROM WB_BestPractice_dev.`pyodps_iris` t1
    LIMIT 1
    
    Instance ID: 20191216083438175gcv6n4pr2
      Log view: http://logview.odps.aliyun.com/logview/?h=xxxxxx
    
       sepallength
    0          5.1
    Note
    For more best practices, see Use a PyODPS node to segment Chinese text based on Jieba.