PAI-TensorFlow is a deep learning computing framework. It supports training based on multiple models. You can use Machine Learning Studio, the MaxCompute client, or SQL nodes in the DataWorks console to call PAI-TensorFlow.

Limits

You can call PAI-TensorFlow only in the China (Beijing) and China (Shanghai) regions.

Use Machine Learning Studio to call PAI-TensorFlow

  1. Log on to the PAI console. In the left-side navigation pane, choose Model Training > Studio-Modeling Visualization. The PAI Visualization Modeling page appears.
    When you create a project, we recommend that you select By usage in the Open GPU column. PAI-TensorFlow tasks can run only on GPU resources.
  2. Find the created project and click Machine Learning in the Operation column.
  3. The Algorithm Platform tab of Machine Learning Platform for AI (PAI) appears. In the left-side navigation pane, click Home.
  4. Choose Templates > Tensorflow Pictures Classification and click Create.
    The training code and data are built into the template. You can use the template to get started with PAI-TensorFlow.

    The template sample is based on the image classification samples of the CIFAR-10 dataset. For more information about the code and data, visit CIFAR-10 dataset.

  5. In the New Experiment dialog box, configure the Name, Description, and Save To parameters.
  6. Click OK to go to the Experiment page.
    The following figure shows the workflow of the experiment. [Figure: TensorFlow workflow]
    Description of the workflow:
    • You can use the data stored in Object Storage Service (OSS) buckets or MaxCompute tables as data sources. For more information, see I/O methods of PAI-TensorFlow.
    • To run a TensorFlow task, you must use your Alibaba Cloud account to authorize PAI to read OSS data. Perform the following steps:
      1. In the left-side navigation pane of the Algorithm Platform tab of PAI, click Settings.
      2. On the General page, select Authorize Machine Learning Platform for AI to access my OSS resources in the OSS Authorization section.
    • If you use the experiment template, you must change the TensorFlow checkpoint path to a path in your own OSS bucket.
    The following table describes the parameters of the training component.
    Tab | Parameter | Description
    Parameters Setting | TensorFlow Version | Select the TensorFlow version that matches your code.
    Parameters Setting | Python Code Files | Add the code file that you want to run to an OSS path. Compress the project files into a TAR.GZ file. Note: The OSS bucket and the project must reside in the same region.
    Parameters Setting | Primary Python File | If the code file is a TAR.GZ file, select the entry Python file for this parameter.
    Parameters Setting | Data Source Directory | The endpoint of the OSS bucket.
    Parameters Setting | Configuration File Hyperparameters and Custom Parameters | Click the folder icon, and then select a file from the OSS bucket or upload a file to the OSS bucket.
    Parameters Setting | Checkpoint Output Directory/Model Input Directory | Select the directory in which the model is stored.
    Parameters Setting | MaxCompute output table (optional) | If you specify this parameter, set it to an existing table. The table name must be the same as the name of the output table in the code.
    Parameters Setting | SQL create table statement | If the output table in the code does not exist, enter an SQL statement to create it. The statement is executed before the TensorFlow script runs. Example: create table iris_output(f1 DOUBLE,f2 DOUBLE,f3 DOUBLE,f4 DOUBLE,f5 STRING);
    Parameters Setting | Maximum Scheduled Job Runtime | The maximum duration for which a scheduled task can run.
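Packaging a project for the Python Code Files parameter can be scripted. The following is a minimal sketch that uses Python's standard tarfile module. The function name pack_project and the example names my_project, train.py, and my_project.tar.gz are hypothetical. Uploading the archive to OSS is not shown.

```python
import tarfile
from pathlib import Path
from typing import List


def pack_project(project_dir: str, archive_path: str) -> List[str]:
    """Compress project_dir into a gzip-compressed tar archive.

    Returns the member names stored in the archive so that the path of
    the entry file (the Primary Python File parameter) can be checked.
    """
    root = Path(project_dir)
    if not root.is_dir():
        raise NotADirectoryError(project_dir)

    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the project folder instead of
        # the absolute path on the local machine.
        tar.add(str(root), arcname=root.name)

    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

For example, pack_project("my_project", "my_project.tar.gz") would return a list that contains "my_project/train.py" if the folder holds a file named train.py. That relative path is the one to look for when selecting the entry file.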
    The following table describes the distributed parameters.
    Tab | Parameter | Description
    Tuning | Standalone/Distributed | Specifies whether the task runs on one machine (standalone) or on multiple machines (distributed).
    Tuning | GPUs per Worker | The number of GPUs for each worker. For example, if the number of workers is 3 and the number of GPUs for each worker is 2, the total number of GPUs is 6.
    Tuning | Workers | The number of machines for distributed computing.
    Tuning | Parameter Servers | The number of parameter servers, which cannot exceed 50% of the number of workers.
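The relationship between the distributed parameters can be checked before a task is submitted. The following is a small sketch of the arithmetic described above. The class DistributedConfig and its field names are hypothetical and are not part of PAI.

```python
from dataclasses import dataclass


@dataclass
class DistributedConfig:
    """Hypothetical container for the values on the Tuning tab."""
    workers: int
    gpus_per_worker: int
    parameter_servers: int

    def total_gpus(self) -> int:
        # Total GPUs = number of workers x GPUs per worker.
        return self.workers * self.gpus_per_worker

    def validate(self) -> None:
        if self.workers < 1:
            raise ValueError("at least one worker is required")
        if self.gpus_per_worker < 0 or self.parameter_servers < 0:
            raise ValueError("counts must not be negative")
        # Parameter servers must be no more than 50% of the workers.
        if 2 * self.parameter_servers > self.workers:
            raise ValueError(
                "parameter servers must not exceed 50% of the workers"
            )


config = DistributedConfig(workers=3, gpus_per_worker=2, parameter_servers=1)
config.validate()
print(config.total_gpus())  # 3 workers x 2 GPUs each = 6
```

With 3 workers, a value of 2 parameter servers would fail validation because 2 is more than 50% of 3.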

Use the MaxCompute client to call PAI-TensorFlow

MaxCompute is the big data computing platform provided by Alibaba Cloud. PAI is the artificial intelligence (AI) platform based on MaxCompute. Therefore, you can run PAI-TensorFlow tasks in MaxCompute. This section describes how to run PAI tasks by using the MaxCompute client.

To run PAI-TensorFlow tasks by using the MaxCompute client, you must activate PAI with the pay-as-you-go billing method.

The MaxCompute client is a command-line tool, so calls to it can be wrapped in scripts. For more information about how to download and install the client, see Client. After you configure the environment, enter the command that runs the PAI-TensorFlow task in the command-line tool. For more information, see Task parameters of PAI-TensorFlow.
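A PAI command is a single statement made of an algorithm name and a series of -D parameters, so it can be assembled programmatically. The following is a hedged sketch of such a helper. The function build_pai_command is hypothetical, and the algorithm name, parameter names, and OSS paths in the usage example are assumptions for illustration only. Confirm the actual names and values against Task parameters of PAI-TensorFlow.

```python
from typing import Dict


def build_pai_command(algorithm: str, params: Dict[str, str]) -> str:
    """Assemble a one-line command of the form
    pai -name <algorithm> -D<key>="<value>" ...;
    """
    parts = ["pai", "-name", algorithm]
    for key, value in params.items():
        if '"' in value:
            # Keep quoting simple: reject values that would break the quotes.
            raise ValueError(f"value for {key} must not contain double quotes")
        parts.append(f'-D{key}="{value}"')
    return " ".join(parts) + ";"


# All names and paths below are assumed placeholders, not verified values.
command = build_pai_command(
    "tensorflow1120_ext",
    {
        "script": "oss://my-bucket.my-oss-endpoint/my_project.tar.gz",
        "entryFile": "train.py",
        "checkpointDir": "oss://my-bucket.my-oss-endpoint/checkpoints/",
    },
)
print(command)
```

The resulting string can be entered in the MaxCompute client. Building it from a dictionary keeps each parameter on its own line in the source, which makes the command easier to review than a long hand-typed line.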

Use SQL nodes in the DataWorks console to call PAI-TensorFlow

  1. Log on to the DataWorks console.
  2. Find the required workspace and click Data Analytics in the Actions column.
    DataWorks and Machine Learning Studio share projects. When you create a project, you must activate MaxCompute and use the pay-as-you-go billing method.
  3. In the Data Analytics section, move the pointer over the Create icon and choose MaxCompute > ODPS SQL.
  4. In the Create Node dialog box, configure the Node Type, Node Name, and Location parameters.
    Before you create a node, you must create a task workflow. For more information, see Create a workflow.
  5. Copy the PAI command that is used to run the PAI task to the edit box for SQL statements. Click the Run icon to run the command.
    Example: [Figure: Example]