
Platform for AI: PAI-Flow manifest parameters

Last Updated: Dec 03, 2024

When you call the CreatePipelineRun operation to create a pipeline run, you can use a manifest file to specify information about the pipeline run. This topic describes the parameters in the manifest file and provides sample configurations for different types of pipelines.

Common parameters

apiVersion and metadata

apiVersion: The version of the manifest schema.
metadata:
  name: The name of the pipeline.
  identifier: The unique identifier of the pipeline.
  version: The version of the pipeline.
  provider: The provider of the pipeline.
  guid: The unique ID of the temporary node.
  displayName: The display name of the pipeline.
  annotations: The annotations of the pipeline.

| Parameter | Type | Default value | Required | Description |
| --- | --- | --- | --- | --- |
| apiVersion | String | core/v1 | Yes | The version of the manifest schema. Set the value to core/v1. |
| metadata | Object | N/A | Yes | The metadata of the pipeline. A pipeline is uniquely identified by the combination of the identifier, version, and provider parameters. |
| name | String | N/A | No | The name of the pipeline. The name does not need to be unique. If you leave this parameter empty, a random value is used. |
| identifier | String | N/A | Yes | The unique identifier of the pipeline. |
| version | String | N/A | Yes | The version of the pipeline. If the inputs, outputs, or implementation of the pipeline change, create a new version. We recommend that you use the v1.0.0 version format. |
| provider | String | N/A | Yes | The provider of the pipeline. Valid values: your user ID (you created the pipeline) and PAI (the pipeline is provided by Platform for AI). |
| guid | String | N/A | No | The unique identifier of a node of the pipeline. Configure this parameter only for temporary nodes. |
| displayName | String | N/A | No | The display name of the pipeline. The display name does not need to be unique. |
| annotations | List<Object> | N/A | No | The annotations of the pipeline. You can use this parameter to identify an Alink group node. |

Example:

  pai.aliyun.com/algo-type:
    alink:
      type: "batch"

inputs and outputs

spec:
  inputs: The inputs of the pipeline.
    artifacts:
    parameters:
  outputs: The outputs of the pipeline.
    artifacts:
    parameters:
  • Artifact

    Specifies the artifacts of a node. Artifacts are used to connect a node to its upstream and downstream nodes.

    artifacts:
    - name: The name of the artifact.
      metadata: The metadata of the artifact.
      value: The value of the artifact.
      desc: The description of the artifact.
      repeated: Specifies whether the artifact can be duplicated.
      required: Specifies whether the value parameter is required. The default value is false. 

    | Parameter | Type | Default value | Description |
    | --- | --- | --- | --- |
    | name | String | N/A | The name of the artifact. The name must be unique within a node. |
    | metadata | Object | N/A | The metadata of the artifact, including the artifact type and storage type. |
    | value | String | N/A | The value of the artifact. |
    | desc | String | N/A | The description of the artifact. You can leave this parameter empty. |
    | repeated | Bool | false | Specifies whether to duplicate the artifact to generate multiple artifacts that have the same metadata. Valid values: true and false. |
    | required | Bool | false | Specifies whether the value parameter is required. Valid values: true and false. |
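
    For example, a minimal sketch of an input artifact that accepts a MaxCompute table. The artifact name is illustrative:

      inputs:
        artifacts:
        - name: inputTable                    # illustrative name
          metadata:
            type:
              DataSet:
                locationType: MaxComputeTable
          desc: The input MaxCompute table.
          required: true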

  • Parameter

    Specifies the parameters that control how the pipeline runs:

    parameters:
    - name: The name of the parameter.
      type: The type of the parameter.
      value: The default value of the parameter.
      desc: The description of the parameter.
      feasible: The constraints on the parameter value.

    | Parameter | Type | Default value | Description |
    | --- | --- | --- | --- |
    | name | String | N/A | The name of the parameter. The name must be unique within a node. |
    | type | String | N/A | The type of the parameter. Valid values: String, Double, Int, Bool, Map, and List. |
    | value | String | N/A | The default value of the parameter. |
    | desc | String | N/A | The description of the parameter. |
    | feasible | Feasible | N/A | The constraints on the parameter value. Supported fields: min (the minimum value), max (the maximum value), len (the length of the value), and regex (a regular expression that the value must match). |
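
    For example, a sketch of a parameter that uses the feasible constraint. The parameter name and range are illustrative:

      parameters:
      - name: learning_rate                   # illustrative name
        type: Double
        value: '0.01'
        desc: The learning rate.
        feasible:
          min: 0
          max: 1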

Create a single-node pipeline

Template

You can use a single-node pipeline to define an algorithm that can be reused in a Directed Acyclic Graph (DAG) pipeline.

apiVersion: The version of the manifest schema.
metadata:
  name: The name of the pipeline.
  identifier: The identifier of the pipeline.
  version: The version of the pipeline.
  provider: The provider of the pipeline.
  displayName: The display name of the pipeline.
  annotations: The annotations of the pipeline.
spec:
  inputs: The inputs of the pipeline.
    artifacts:
    parameters:
  outputs: The outputs of the pipeline.
    artifacts:
    parameters:
  initContainers: The containers that are initialized before the main container and sidecar containers start.
  - name: The name of the container.
    image: The image that is used to start the container.
    command: The startup command of the container.
    args: The startup parameters of the container.
  container: The description of the main container.
    image: The image that is used to start the container.
    command: The startup command of the container.
    args: The startup parameters of the container.
  sideCarContainers: The sidecar containers.
  - name: The name of the container.
    image: The image that is used to start the container.
    command: The startup command of the container.
    args: The startup parameters of the container.

Parameter description

This section describes only the parameters related to containers. For information about other parameters, see Common parameters.

spec:
  initContainers: The containers that are initialized before the main container and sidecar containers start.
  - name: The name of the container.
    image: The image that is used to start the container.
    command: The startup command of the container.
    args: The startup parameters of the container.
  container: The main container.
    image: The image that is used to start the container.
    command: The startup command of the container.
    args: The startup parameters of the container.
  sideCarContainers: The sidecar containers.
  - name: The name of the container.
    image: The image that is used to start the container.
    command: The startup command of the container.
    args: The startup parameters of the container.

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| initContainers | List<InitContainer> | N/A | The containers that are initialized before the main container and sidecar containers start. |
| container | MainContainer | N/A | The main container. The status of the main container determines the result of a single-node pipeline run. |
| sideCarContainers | List<SideCarContainer> | N/A | The sidecar containers. |

Fields of InitContainer and SideCarContainer:

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| name | String | N/A | The name of the container. The name must be unique within a node. |
| image | String | N/A | The image that is used to start the container. You can use an official image provided by PAI or a custom image. |
| command | String | N/A | The startup command of the container. |
| args | List | N/A | The startup parameters of the container. You can leave this parameter empty. |

Fields of MainContainer:

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| image | String | N/A | The image that is used to start the container. You can use an official image provided by PAI or a custom image. |
| command | String | N/A | The startup command of the container. |
| args | List | N/A | The startup parameters of the container. You can leave this parameter empty. |
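
The following is a minimal sketch of a spec that combines all three container types. The image names and commands are placeholders rather than official PAI images:

spec:
  initContainers:
  - name: prepare-data                                  # runs to completion before the main container starts
    image: 'registry.example.com/demo/busybox:latest'   # placeholder image
    command:
    - sh
    - -c
    args:
    - 'echo preparing inputs'
  container:                                            # its status determines the result of the pipeline run
    image: 'registry.example.com/demo/trainer:v1'       # placeholder image
    command:
    - sh
    - -c
    args:
    - 'echo training'
  sideCarContainers:
  - name: log-collector                                 # runs alongside the main container
    image: 'registry.example.com/demo/logger:v1'        # placeholder image
    command:
    - sh
    - -c
    args:
    - 'echo collecting logs'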

Examples

This example creates a single-node pipeline that outputs a MaxCompute table.

  • The outputs parameter specifies an artifact of the MaxComputeTable type.

  • The inputs parameter specifies the name of the output MaxCompute table.

apiVersion: core/v1
metadata:
  provider: '13266******76250'
  version: v1
  identifier: echo
spec:
  inputs:
    parameters:
    - name: outputTableName
      type: String
      value: pai_temp_outputTable
  outputs:
    artifacts:
    - name: outputTable
      metadata:
        type:
          DataSet:
            locationType: MaxComputeTable
      desc: SQL Script Output Port
  container:
    image: 'registry.cn-shanghai.aliyuncs.com/paiflow-core/max-compute-executor:v1.1.4'
    command:
    - sh
    - -c
    args:
    - |
      mkdir -p /pai/outputs/artifacts/outputTable/
      echo '{\"metadata\":{\"type\":{\"DataSet\":{\"locationType\":\"MaxComputeTable\"}}}' > /pai/outputs/artifacts/outputTable/metadata
      echo '{\"location\": {\"endpoint\": \"http://service.cn-shanghai.maxcompute.aliyun.com/api\",\n      \"project\": \"wyl_t******t2\", \"table\":  \"{{inputs.parameters.outputTableName}}\"}}' > /pai/outputs/artifacts/outputTable/value

Create a DAG pipeline

Template

You can create a DAG pipeline to define the execution order of multiple nodes.

apiVersion: The version of the manifest schema.
metadata:
  name: The name of the pipeline.
  identifier: The identifier of the pipeline.
  version: The version of the pipeline.
  provider: The provider of the pipeline.
  displayName: The display name of the pipeline.
  annotations: The annotations of the pipeline.
spec:
  inputs: The inputs of the pipeline.
    artifacts:
    parameters:
  outputs: The outputs of the pipeline.
    artifacts:
    parameters:
  pipelines: The nodes in the pipeline.
  - metadata:
      name: The name of the node.
      displayName: The display name of the node.
      guid: The unique identifier of the node. Configure this parameter only for a temporary node.
      provider: The provider of the node.
      identifier: The unique identifier of the node. Configure this parameter only for a non-temporary node.
      version: The version of the node.
    spec:
      arguments: The arguments of the node.
        artifacts:
        parameters:
      dependencies: The dependent nodes of the node.
      when: The conditional expression for executing the node.
      withSequence: The index-based loop expression for executing the node.
        start: The start index.
        end: The end index.
        format: The format of the expression. For more information, visit https://pkg.go.dev/fmt.
      withItems: The iteration-based loop expression for executing the node.
      withParam: The parameter-based loop expression for executing the node.
      parallelism: The maximum number of nodes that can run at the same time.

Parameter description

This section describes only the parameters related to a DAG pipeline. For information about other parameters, see Common parameters.

The following section describes the details of nodes in the pipeline.

spec:
  pipelines: The nodes in the pipeline.
  - metadata:
      name: The name of the node.
      displayName: The display name of the node.
      guid: The unique identifier of the node. Configure this parameter only for a temporary node.
      provider: The provider of the node.
      identifier: The unique identifier of the node. Configure this parameter only for a non-temporary node.
      version: The version of the node.
    spec:
      arguments: The arguments of the node.
        artifacts: The artifacts of the node.
        parameters: The parameters of the node.
      dependencies: The dependent nodes of the node.
      when: The conditional expression for executing the node.
      withSequence: The index-based loop expression for executing the node.
        start: The start index.
        end: The end index.
        format: The format of the expression. For more information, visit https://pkg.go.dev/fmt.
      withItems: The iteration-based loop expression for executing the node.
      withParam: The parameter-based loop expression for executing the node.
      parallelism: The maximum number of nodes that can run at the same time.

The pipelines parameter specifies a collection of nodes that are registered with PAI-Flow. Each node in a collection can itself contain another collection of nodes, and the nesting depth is not restricted (see the sketch after the following table). A node is defined by using the following parameters:

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| metadata | Object | N/A | The metadata of the node. |
| spec | Object | N/A | The specification of the node. |

Fields of metadata:

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| name | String | N/A | The name of the node. The name must be unique within the collection. |
| displayName | String | N/A | The display name of the node. The display name does not need to be unique. |
| provider | String | N/A | The provider of the node. |
| guid | String | N/A | The unique identifier of the node. Configure this parameter only for a temporary node. |
| identifier | String | N/A | The unique identifier of the node. Configure this parameter only for a non-temporary node. |
| version | String | N/A | The version of the node. |

Fields of spec:

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| arguments | Object | N/A | The parameters and artifacts of the node. If you specify this parameter, the default values are overridden. |
| dependencies | List<String> | N/A | The dependent nodes of the node. The node is executed only after all of its dependent nodes complete execution. |
| when | String | N/A | The conditional expression for executing the node. The node is executed only when the requirements specified by the dependencies parameter and this parameter are met. |
| withSequence | Object | N/A | The index-based loop expression for executing the node. |
| withItems | List | N/A | The iteration-based loop expression for executing the node. |
| withParam | String | N/A | The parameter-based loop expression for executing the node. |
| parallelism | Int | N/A | The maximum number of nodes that can run at the same time in the pipeline. This is a pipeline-level parameter. |
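
Because nodes can be nested, a node in the pipelines collection can itself contain a pipelines collection. A compact sketch with hypothetical identifiers and names:

spec:
  pipelines:
  - apiVersion: core/v1
    metadata:
      identifier: outer_node       # hypothetical identifier
      provider: pai
      version: v1
      name: outer_node
    spec:
      pipelines:                   # a nested collection of nodes
      - apiVersion: core/v1
        metadata:
          identifier: inner_node   # hypothetical identifier
          provider: pai
          version: v1
          name: inner_node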

Examples

  • A simple DAG pipeline

    In this example, the pipeline consists of the following nodes:

    • Read data source: This node outputs a MaxCompute table.

    • Type conversion: This node converts the data types in the output table of the Read data source node and outputs a new MaxCompute table.

    • SQL script: This node uses an SQL script to convert the data types in the output table of the Type conversion node and outputs a new MaxCompute table.

    apiVersion: "core/v1"
    metadata:
      provider: "11577*******4901"
      version: "v1"
      identifier: "job-root-pipeline-identifier"
    spec:
      inputs:
        parameters:
        - name: "execution_maxcompute"
          type: "Map"
      pipelines:
      - apiVersion: "core/v1"
        metadata:
          provider: "pai"
          version: "v1"
          identifier: "data_source"
          name: "data_source"
          displayName: "Read data source"
        spec:
          arguments:
            parameters:
            - name: "inputTableName"
              value: "pai_online_project.wumai_data"
            - name: "execution"
              from: "{{inputs.parameters.execution_maxcompute}}"
      - apiVersion: "core/v1"
        metadata:
          provider: "pai"
          version: "v1"
          identifier: "type_transform"
          name: "type_transform"
          displayName: "Type conversion"
        spec:
          arguments:
            artifacts:
            - name: "inputTable"
              from: "{{pipelines.data_source.outputs.artifacts.outputTable}}"
            parameters:
            - name: "cols_to_double"
              value: "time,hour,pm2,pm10,so2,co,no2"
            - name: "default_int_value"
              value: "0"
            - name: "reserveOldFeat"
              value: "false"
            - name: "execution"
              from: "{{inputs.parameters.execution_maxcompute}}"
          dependencies:
          - "data_source"
      - apiVersion: "core/v1"
        metadata:
          provider: "pai"
          version: "v1"
          identifier: "sql"
          name: "sql"
          displayName: "SQL script"
        spec:
          arguments:
            artifacts:
            - name: "inputTable1"
              from: "{{pipelines.type_transform.outputs.artifacts.outputTable}}"
            parameters:
            - name: "sql"
              value: "select time,hour,(case when pm2>200 then 1 else 0 end),pm10,so2,co,no2\
                \ from ${t1}"
            - name: "execution"
              from: "{{inputs.parameters.execution_maxcompute}}"
          dependencies:
          - "type_transform"
  • A pipeline that uses a conditional expression

    In this example, the pipeline consists of the following nodes:

    • Read data source: This node outputs a MaxCompute table.

    • If the execution of the Read data source node is successful, the Type conversion pm2 node is executed to convert the data type of the pm2 column in the output table of the upstream node and then output a new MaxCompute table.

    • If the execution of the Read data source node fails, the Type conversion pm10 node is executed to convert the data type of the pm10 column in the output table of the upstream node and then output a new MaxCompute table.

    apiVersion: "core/v1"
    metadata:
      provider: "1157******4901"
      version: "v1"
      identifier: "job-root-pipeline-identifier"
    spec:
      inputs:
        parameters:
        - name: "execution_maxcompute"
          type: "Map"
      pipelines:
      - apiVersion: "core/v1"
        metadata:
          provider: "pai"
          version: "v1"
          identifier: "data_source"
          name: "data_source"
          displayName: "Read data source"
        spec:
          arguments:
            parameters:
            - name: "inputTableName"
              value: "pai_online_project.wumai_data"
            - name: "execution"
              from: "{{inputs.parameters.execution_maxcompute}}"
      - apiVersion: "core/v1"
        metadata:
          provider: "pai"
          version: "v1"
          identifier: "type_transform"
          name: "type_transform_pm10"
          displayName: "Type conversion pm10"
        spec:
          when: '{{pipelines.data_source.status}} == Failed'
          arguments:
            artifacts:
            - name: "inputTable"
              from: "{{pipelines.data_source.outputs.artifacts.outputTable}}"
            parameters:
            - name: "cols_to_double"
              value: "pm10"
            - name: "default_int_value"
              value: "0"
            - name: "reserveOldFeat"
              value: "false"
            - name: "execution"
              from: "{{inputs.parameters.execution_maxcompute}}"
          dependencies:
          - "data_source"
      - apiVersion: "core/v1"
        metadata:
          provider: "pai"
          version: "v1"
          identifier: "type_transform"
          name: "type_transform_pm2"
          displayName: "Type conversion pm2"
        spec:
          when: '{{pipelines.data_source.status}} == Succeeded'
          arguments:
            artifacts:
            - name: "inputTable"
              from: "{{pipelines.data_source.outputs.artifacts.outputTable}}"
            parameters:
            - name: "cols_to_double"
              value: "pm2"
            - name: "default_int_value"
              value: "0"
            - name: "reserveOldFeat"
              value: "false"
            - name: "execution"
              from: "{{inputs.parameters.execution_maxcompute}}"
          dependencies:
          - "data_source"

  • A pipeline that uses a parameter-based loop expression

    In this example, the pipeline consists of the following two nodes, which expand into four nodes during execution.

    • echo: This node outputs the ["a","b","c"] array.

    • date_consumer: This node is executed once for each element in the ["a","b","c"] array. This loop execution generates the following nodes:

      • date_consumer(0,"a"): the node generated for the "a" element.

      • date_consumer(1,"b"): the node generated for the "b" element.

      • date_consumer(2,"c"): the node generated for the "c" element.

    apiVersion: "core/v1beta1"
    metadata:
      name: "echo_repeat"
      provider: "15577*****4904"
      identifier: "forcast"
      version: "v1"
    spec:
      pipelines:
      # The echo node has an output parameter whose value is ["a","b","c"]
      - apiVersion: "core/v1"
        metadata:
          provider: "15577*****4904"
          version: "v1"
          identifier: "echo"
          name: "echo"
      # The date_consumer node is executed three times.
      - apiVersion: "core/v1"
        metadata:
          provider: "15577*****4904"
          version: "v1"
          identifier: "date_consumer"
          name: "date_consumer"
        spec:
          arguments:
            parameters:
            - name: "data"
              value: "{{item.num}}"
          withParam: "{{pipelines.echo.outputs.parameters}}"
          dependencies: "echo"

  • A pipeline that uses OSS input, a custom image, and DLC jobs

    In this example, the pipeline consists of the following four nodes, which expand into six nodes during execution:

    • Read OSS data-1: This node reads data from an Object Storage Service (OSS) bucket.

    • Data to TFRecord-1: This node converts OSS data to the TFRecord format.

    • Image self-supervised training-1: This node creates a Deep Learning Container (DLC) job for image-related training.

    • Python Script-1: This node is executed based on the withSequence parameter. In this example, this node is executed three times and generates the following three nodes. Each node uses a custom image to run a DLC job.

      • Python Script-1(0,1).

      • Python Script-1(1,2).

      • Python Script-1(2,3).

    apiVersion: "core/v1"
    metadata:
      provider: "11577*****4901"
      version: "v1"
      identifier: "oss-dlc"
    spec:
      inputs:
        parameters:
        - name: "execution_maxcompute"
          type: "Map"
        - name: "execution_maxcompute_optional"
          type: "Map"
        - name: "execution_dlc_optional"
          type: "Map"
        - name: "execution_dlc"
          type: "Map"
      pipelines:
      - apiVersion: "core/v1"
        metadata:
          provider: "pai"
          version: "v1"
          identifier: "tfReadFileData"
          name: "id-26425a11-939f-4c9e-b261"
          displayName: "Read OSS data-1"
        spec:
          arguments:
            parameters:
            - name: "ossBucket"
              value: "oss://pai-vision-data-sh.oss-cn-shanghai.aliyuncs.com/data/demo_image_match/meta/train_crop_label_lt_10k_nolabel.txt"
            - name: "arn"
              value: "acs:ram::1157******94901:role/aliyunodpspaidefaultrole"
            - name: "execution"
              from: "{{inputs.parameters.execution_maxcompute}}"
      - apiVersion: "core/v1"
        metadata:
          provider: "pai"
          version: "v1"
          identifier: "ev_ssl"
          name: "id-548b1789-3acc-4874-a06b"
          displayName: "Image self-supervised training-1"
        spec:
          arguments:
            artifacts:
            - name: "input_train_data"
              from: "{{pipelines.id-97c4cd39-b8df-4456-9ba6.outputs.artifacts.output_train_data}}"
            parameters:
            - name: "model_type"
              value: "MOBY_TIMM_TFRECORD_OSS"
            - name: "arn"
              value: "acs:ram::11577*****4901:role/aliyunodpspaidefaultrole"
            - name: "model_dir"
              value: "oss://alink-test-2.oss-cn-shanghai-internal.aliyuncs.com/${pai_system_run_id}/${pai_system_node_id}/"
            - name: "use_pretrained_model"
              value: "false"
            - name: "backbone"
              value: "resnet50"
            - name: "optimizer"
              value: "AdamW"
            - name: "initial_learning_rate"
              value: "0.001"
            - name: "train_batch_size"
              value: "64"
            - name: "num_epochs"
              value: "100"
            - name: "save_checkpoint_epochs"
              value: "10"
            - name: "train_num_readers"
              value: "4"
            - name: "fp16"
              value: "false"
            - name: "singleOrDistribute"
              value: "distribute_dlc"
            - name: "count"
              value: "1"
            - name: "gpuMachineType"
              value: "ecs.gn6e-c12g1.12xlarge"
            - name: "execution"
              from: "{{inputs.parameters.execution_dlc}}"
          dependencies:
          - "id-97c4cd39-b8df-4456-9ba6"
      - apiVersion: "core/v1"
        metadata:
          provider: "pai"
          version: "v1"
          identifier: "ev_convert"
          name: "id-97c4cd39-b8df-4456-9ba6"
          displayName: "Data to TFRecord-1"
        spec:
          arguments:
            artifacts:
            - name: "input_label_file"
              from: "{{pipelines.id-26425a11-939f-4c9e-b261.outputs.artifacts.output_1}}"
            parameters:
            - name: "arn"
              value: "acs:ram::11577*****4901:role/aliyunodpspaidefaultrole"
            - name: "output_dir"
              value: "oss://alink-test-2.oss-cn-shanghai-internal.aliyuncs.com/${pai_system_run_id}/${pai_system_node_id}/"
            - name: "output_prefix"
              value: "tx"
            - name: "model_type"
              value: "CLASSIFICATION"
            - name: "class_list_file"
              value: "oss://pai-vision-data-sh.oss-cn-shanghai.aliyuncs.com/data/demo_image_match/meta/test.config"
            - name: "test_ratio"
              value: "0.0"
            - name: "converter_class"
              value: "SelfSupervisedConverter"
            - name: "image_format"
              value: "jpg"
            - name: "read_parallel_num"
              value: "10"
            - name: "write_parallel_num"
              value: "1"
            - name: "num_samples_per_tfrecord"
              value: "1000"
            - name: "count"
              value: "5"
            - name: "cpu"
              value: "800"
            - name: "memory"
              value: "20000"
            - name: "execution"
              from: "{{inputs.parameters.execution_maxcompute}}"
          dependencies:
          - "id-26425a11-939f-4c9e-b261"
      - apiVersion: "core/v1"
        metadata:
          provider: "pai"
          version: "v1"
          identifier: "python_v2"
          name: "id-2e2c22ad-134f-4010-926f"
          displayName: "Python Script-1"
        spec:
          withSequence:
            start: 1
            end: 3
          arguments:
            artifacts:
            - name: "input2"
              metadata:
                type:
                  DataSet:
                    locationType: "OSS"
              from: "{{pipelines.id-26425a11-939f-4c9e-b261.outputs.artifacts.output_1}}"
            parameters:
            - name: "roleArn"
              value: "acs:ram::11577*****94901:role/aliyunodpspaidefaultrole"
            - name: "codeUri"
              value: "oss://alink-test-2.oss-cn-shanghai-internal.aliyuncs.com/a.py"
            - name: "_codeSourceAdvanced"
              value: "false"
            - name: "resourceId"
              value: "public-cluster"
            - name: "instanceType"
              value: "ecs.c6.large"
            - name: "_runConfigAdvanced"
              value: "true"
            - name: "imageUri"
              value: "registry.cn-shanghai.aliyuncs.com/pai-dlc/tensorflow-training:1.12PAI-cpu-py27-ubuntu18.04"
            - name: "jobType"
              value: "TFJob"
            - name: "execution_dlc"
              from: "{{inputs.parameters.execution_dlc_optional}}"
            - name: "execution_maxcompute"
              from: "{{inputs.parameters.execution_maxcompute_optional}}"
          dependencies:
          - "id-26425a11-939f-4c9e-b261"