When you call the CreatePipelineRun operation to create a pipeline run, you can use a manifest file to specify information about the pipeline run. This topic describes the parameters in the manifest file and provides sample configurations for different types of pipelines.
Common parameters
apiVersion and metadata
apiVersion: The version of the manifest schema.
metadata:
  name: The name of the pipeline.
  identifier: The unique identifier of the pipeline.
  version: The version of the pipeline.
  provider: The provider of the pipeline.
  guid: The unique ID of the temporary node.
  displayName: The display name of the pipeline.
  annotations: The annotations of the pipeline.
Parameter | Type | Default value | Required | Description |
apiVersion | String | core/v1 | Yes | The version of the manifest schema. Set the value to |
metadata | Object | N/A | Yes | The metadata of the pipeline. You can identify a pipeline by using the |
name | String | N/A | No | The name of the pipeline. The value of this parameter does not need to be unique. If you leave this parameter empty, a random value is used. |
identifier | String | N/A | Yes | The unique identifier of the pipeline. |
version | String | N/A | Yes | The version of the pipeline. If the input, output, and implementation of the pipeline change, create a new version. We recommend that you use the v1.0.0 version format. |
provider | String | N/A | Yes | The provider of the pipeline. Valid values:
|
guid | String | N/A | No | The unique identifier of a node of the pipeline. Configure this parameter only for temporary nodes. |
displayName | String | N/A | No | The display name of the pipeline. The value of this parameter does not need to be unique. |
annotations | List<Object> | N/A | No | The annotations of the pipeline. You can use this parameter to identify an Alink group node. Example: |
inputs and outputs
spec:
  inputs: The inputs of the pipeline.
    artifacts:
    parameters:
  outputs: The outputs of the pipeline.
    artifacts:
    parameters:
Artifact
Specifies the artifacts of a node. Artifacts connect a node to its upstream and downstream nodes.
artifacts:
  - name: The name of the artifact.
    metadata: The metadata of the artifact.
    value: The value of the artifact.
    desc: The description of the artifact.
    repeated: Specifies whether the artifact can be duplicated.
    required: Specifies whether the value parameter is required. The default value is false.
Parameter | Type | Default value | Description |
name | String | N/A | The name of the artifact. The name must be unique within a node. |
metadata | Object | N/A | The metadata of the artifact, including the artifact type and storage type. |
value | String | N/A | The value of the artifact. |
desc | String | N/A | The description of the artifact. You can leave this parameter empty. |
repeated | Bool | false | Specifies whether to duplicate the artifact to generate multiple artifacts that have the same metadata. Valid values: true and false. |
required | Bool | false | Specifies whether the value parameter is required. Valid values: true and false (default). |
Parameter
Contains the following parameters that specify how to run the pipeline:
parameters:
  - name: The name of the parameter.
    type: The type of the parameter.
    value: The default value of the parameter.
    desc: The description of the parameter.
    feasible: The specification of the parameter.
Parameter | Type | Default value | Description |
name | String | N/A | The name of the parameter, which must be unique within a node. |
type | String | N/A | The type of the parameter. Valid values: String, Double, Int, Bool, Map, and List. |
value | String | N/A | The default value of the parameter. |
desc | String | N/A | The description of the parameter. |
feasible | Feasible | N/A | The specification of the parameter. Supported constraints: min (the minimum value), max (the maximum value), len (the length of the value), and regex (a regular expression). |
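The feasible constraints can be pictured as a small validation step applied to a parameter value before a run starts. The following Python sketch is illustrative only and is not the PAI-Flow implementation: the function name check_feasible is hypothetical, min and max are assumed to be inclusive bounds, and len is assumed to mean the exact required length, which this topic does not specify.

```python
import re
from typing import Any, Dict, List


def check_feasible(value: Any, feasible: Dict[str, Any]) -> List[str]:
    """Return the list of violated constraints (empty if the value is acceptable)."""
    errors = []
    # min / max: assumed to be inclusive numeric bounds.
    if "min" in feasible and value < feasible["min"]:
        errors.append(f"{value!r} is less than min {feasible['min']}")
    if "max" in feasible and value > feasible["max"]:
        errors.append(f"{value!r} is greater than max {feasible['max']}")
    # len: assumed here to be the exact length of the value as a string.
    if "len" in feasible and len(str(value)) != feasible["len"]:
        errors.append(f"{value!r} does not have length {feasible['len']}")
    # regex: the whole value must match the pattern.
    if "regex" in feasible and re.fullmatch(feasible["regex"], str(value)) is None:
        errors.append(f"{value!r} does not match {feasible['regex']}")
    return errors


# An Int parameter restricted to the range 1..10.
print(check_feasible(5, {"min": 1, "max": 10}))
print(check_feasible(42, {"min": 1, "max": 10}))
# A String parameter restricted to identifier-like table names.
print(check_feasible("pai_temp_outputTable", {"regex": r"[A-Za-z_][A-Za-z0-9_]*"}))
```

Returning a list of violations rather than raising on the first one makes it possible to report every problem with a manifest in a single pass.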
Create a single-node pipeline
Template
You can use a single-node pipeline to define an algorithm that can be reused in a Directed Acyclic Graph (DAG) pipeline.
apiVersion: The version of the manifest schema.
metadata:
  name: The name of the pipeline.
  identifier: The identifier of the pipeline.
  version: The version of the pipeline.
  provider: The provider of the pipeline.
  displayName: The display name of the pipeline.
  annotations: The annotations of the pipeline.
spec:
  inputs: The inputs of the pipeline.
    artifacts:
    parameters:
  outputs: The outputs of the pipeline.
    artifacts:
    parameters:
  initContainers: The containers that you must initialize before you initialize the main container and sidecar containers.
    - name: The name of the container.
      image: The image that is used to start the container.
      command: The startup command of the container.
      args: The startup parameters of the container.
  container: The description of the main container.
    image: The image that is used to start the container.
    command: The startup command of the container.
    args: The startup parameters of the container.
  sideCarContainers: The sidecar containers.
    - name: The name of the container.
      image: The image that is used to start the container.
      command: The startup command of the container.
      args: The startup parameters of the container.
Parameter description
This section describes only the parameters related to containers. For information about other parameters, see Common parameters.
spec:
  initContainers: The containers that you must initialize before you initialize the main container and sidecar containers.
    - name: The name of the container.
      image: The image that is used to start the container.
      command: The startup command of the container.
      args: The startup parameters of the container.
  container: The main container.
    image: The image that is used to start the container.
    command: The startup command of the container.
    args: The startup parameters of the container.
  sideCarContainers: The sidecar containers.
    - name: The name of the container.
      image: The image that is used to start the container.
      command: The startup command of the container.
      args: The startup parameters of the container.
Parameter | Type | Default value | Description |
initContainers | List<InitContainer> | N/A | The containers that you must initialize before you initialize the main container and sidecar containers. |
InitContainer.image | String | N/A | The image that is used to start the container. You can use an official image provided by PAI or a custom image. |
InitContainer.args | List | N/A | The startup parameters of the container. You can leave this parameter empty. |
InitContainer.command | String | N/A | The startup command of the container. |
InitContainer.name | String | N/A | The name of the container, which must be unique within a node. |
container | MainContainer | N/A | The main container. The status of the main container determines the result of a single-node pipeline run. |
MainContainer.image | String | N/A | The image that is used to start the container. You can use an official image provided by PAI or a custom image. |
MainContainer.args | List | N/A | The startup parameters of the container. You can leave this parameter empty. |
MainContainer.command | String | N/A | The startup command of the container. |
sideCarContainers | List<SideCarContainer> | N/A | The sidecar containers. |
SideCarContainer.image | String | N/A | The image that is used to start the container. You can use an official image provided by PAI or a custom image. |
SideCarContainer.args | List | N/A | The startup parameters of the container. You can leave this parameter empty. |
SideCarContainer.command | String | N/A | The startup command of the container. |
SideCarContainer.name | String | N/A | The name of the container, which must be unique within a node. |
Examples
This example creates a single-node pipeline that outputs a MaxCompute table.
The outputs parameter specifies an artifact of the MaxComputeTable type.
The inputs parameter specifies the name of the output MaxCompute table.
apiVersion: core/v1
metadata:
  provider: '13266******76250'
  version: v1
  identifier: echo
spec:
  inputs:
    parameters:
      - name: outputTableName
        type: String
        value: pai_temp_outputTable
  outputs:
    artifacts:
      - name: outputTable
        metadata:
          type:
            DataSet:
              locationType: MaxComputeTable
        desc: SQL Script Output Port
  container:
    image: 'registry.cn-shanghai.aliyuncs.com/paiflow-core/max-compute-executor:v1.1.4'
    command:
      - sh
      - -c
    args:
      - |
        mkdir -p /pai/outputs/artifacts/outputTable/
        echo '{\"metadata\":{\"type\":{\"DataSet\":{\"locationType\":\"MaxComputeTable\"}}}}' > /pai/outputs/artifacts/outputTable/metadata
        echo '{\"location\": {\"endpoint\": \"http://service.cn-shanghai.maxcompute.aliyun.com/api\",\n \"project\": \"wyl_t******t2\", \"table\": \"{{inputs.parameters.outputTableName}}\"}}' > /pai/outputs/artifacts/outputTable/value
Create a DAG pipeline
Template
You can create a DAG pipeline to define the execution order of multiple nodes.
apiVersion: The version of the manifest schema.
metadata:
  name: The name of the pipeline.
  identifier: The identifier of the pipeline.
  version: The version of the pipeline.
  provider: The provider of the pipeline.
  displayName: The display name of the pipeline.
  annotations: The annotations of the pipeline.
spec:
  inputs: The inputs of the pipeline.
    artifacts:
    parameters:
  outputs: The outputs of the pipeline.
    artifacts:
    parameters:
  pipelines: The nodes in the pipeline.
    - metadata:
        name: The name of the node.
        displayName: The name to display for the node.
        guid: The unique identifier of the node. Configure this parameter only for a temporary node.
        provider: The provider of the node.
        identifier: The unique identifier of the node. Configure this parameter only for a non-temporary node.
        version: The version of the node.
      spec:
        arguments: The arguments of the node.
          artifacts:
          parameters:
        dependencies: The dependencies of the node.
        when: The conditional expression for executing the node.
        withSequence: The index-based loop expression for executing the node.
          start: The start index.
          end: The end index.
          format: The format of the expression. For more information, visit https://pkg.go.dev/fmt.
        withItems: The iteration-based loop expression for executing the node.
        withParam: The parameter-based loop expression for executing the node.
  parallelism: The maximum number of nodes that can run at the same time.
Parameter description
This section describes only the parameters related to a DAG pipeline. For information about other parameters, see Common parameters.
The following section describes the details of nodes in the pipeline.
spec:
  pipelines: The nodes in the pipeline.
    - metadata:
        name: The name of the node.
        displayName: The name to display for the node.
        guid: The unique identifier of the node. Configure this parameter only for a temporary node.
        provider: The provider of the node.
        identifier: The unique identifier of the node. Configure this parameter only for a non-temporary node.
        version: The version of the node.
      spec:
        arguments: The arguments of the node.
          artifacts: The artifacts of the node.
          parameters: The parameters of the node.
        dependencies: The dependent nodes of the node.
        when: The conditional expression for executing the node.
        withSequence: The index-based loop expression for executing the node.
          start: The start index.
          end: The end index.
          format: The format of the expression. For more information, visit https://pkg.go.dev/fmt.
        withItems: The iteration-based loop expression for executing the node.
        withParam: The parameter-based loop expression for executing the node.
  parallelism: The maximum number of nodes that can run at the same time.
The pipelines parameter specifies a collection of nodes that are registered to PAI-Flow. Each node in a collection can contain another collection of nodes. The depth of nesting has no restrictions. A node is defined by using the following parameters:
Parameter | Type | Default value | Description |
metadata | Object | N/A | The metadata of the node. |
name | String | N/A | The name of the node, which must be unique in the collection. |
displayName | String | N/A | The display name of the node, which does not need to be a unique value. |
provider | String | N/A | The provider of the node. |
guid | String | N/A | The unique identifier of the node. Configure this parameter only for a temporary node. |
identifier | String | N/A | The unique identifier of the node. Configure this parameter only for a non-temporary node. |
version | String | N/A | The version of the node. |
spec | Object | N/A | The specification of the node. |
arguments | Object | N/A | The parameters and artifacts of the node. If you specify this parameter, the default values are overridden. |
dependencies | List<String> | N/A | The dependent nodes of the node. The node is executed only after its dependent nodes complete execution. |
when | String | N/A | The conditional expression for executing the node. The node is executed only when the requirements specified by the dependencies parameter and this parameter are met. |
withSequence | Object | N/A | The index-based loop expression for executing the node. |
withItems | List | N/A | The iteration-based loop expression for executing the node. |
withParam | String | N/A | The parameter-based loop expression for executing the node. |
parallelism | Int | N/A | The maximum number of nodes that can run at the same time in the pipeline. This is a pipeline-level parameter. |
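The way dependencies and parallelism interact can be sketched as a topological ordering that is split into bounded batches. The following Python sketch uses the standard graphlib module and is an approximation, not the PAI-Flow scheduler: the node names are hypothetical, and a real scheduler would start a node as soon as a slot frees up instead of waiting for a whole batch to finish.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def execution_order(deps: Dict[str, List[str]]) -> List[str]:
    """Return one order in which every node follows all of its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def execution_batches(deps: Dict[str, List[str]], parallelism: int) -> List[List[str]]:
    """Group runnable nodes into batches of at most `parallelism` nodes."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # Nodes whose dependencies have all completed.
        ready = sorted(sorter.get_ready())
        for i in range(0, len(ready), parallelism):
            batches.append(ready[i:i + parallelism])
        # Mark the whole wave as complete so that downstream nodes unlock.
        sorter.done(*ready)
    return batches


# Hypothetical node names; each value lists that node's dependencies.
chain = {
    "read_data_source": [],
    "type_conversion": ["read_data_source"],
    "sql_script": ["type_conversion"],
}
fan_out = {"a": [], "b": ["a"], "c": ["a"], "d": ["a"], "e": ["b", "c", "d"]}

print(execution_order(chain))
print(execution_batches(fan_out, parallelism=2))
```

In the fan-out graph, nodes b, c, and d become runnable together, but a parallelism of 2 forces them into two batches.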
Examples
A simple DAG pipeline
In this example, the pipeline consists of the following nodes:
Read data source: This node outputs a MaxCompute table.
Type conversion: This node converts the data types in the output table of the Read data source node and outputs a new MaxCompute table.
SQL script: This node uses an SQL script to convert the data types in the output table of the Type conversion node and outputs a new MaxCompute table.
A pipeline that uses a conditional expression
In this example, the pipeline consists of the following nodes:
Read data source: This node outputs a MaxCompute table.
If the execution of the Read data source node is successful, the Type conversion pm2 node is executed to convert the data type of the pm2 column in the output table of the upstream node and then output a new MaxCompute table.
If the execution of the Read data source node fails, the Type conversion pm10 node is executed to convert the data type of the pm10 column in the output table of the upstream node and then output a new MaxCompute table.
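The branching described in this example can be modeled as two nodes that share a dependency but carry opposite conditions. The following Python sketch is a rough model only: the Node type, the select_nodes function, the node names, and the "Succeeded" and "Failed" status strings are all hypothetical, and a Python callable stands in for the when expression, whose actual syntax is not described in this topic.

```python
from typing import Callable, Dict, List, NamedTuple


class Node(NamedTuple):
    name: str
    dependencies: List[str]
    # Stand-in for the `when` expression; receives upstream statuses.
    when: Callable[[Dict[str, str]], bool]


def select_nodes(nodes: List[Node], status: Dict[str, str]) -> List[str]:
    """Return nodes whose dependencies have finished and whose condition holds."""
    selected = []
    for node in nodes:
        finished = all(dep in status for dep in node.dependencies)
        if finished and node.when(status):
            selected.append(node.name)
    return selected


branches = [
    Node("type_conversion_pm2", ["read_data_source"],
         lambda s: s["read_data_source"] == "Succeeded"),
    Node("type_conversion_pm10", ["read_data_source"],
         lambda s: s["read_data_source"] == "Failed"),
]

print(select_nodes(branches, {"read_data_source": "Succeeded"}))
print(select_nodes(branches, {"read_data_source": "Failed"}))
```

Exactly one of the two branches is selected for any final status of the upstream node, and neither is selected while the upstream node is still running.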
A pipeline that uses a parameter-based loop expression
In this example, the pipeline consists of the following two nodes, which expand into four nodes during execution.
echo: This node outputs the ["a","b","c"] array.
date_consumer: This node is executed once for each element in the ["a","b","c"] array. This loop execution generates the following nodes:
date_consumer(0,"a"): the node generated for the "a" element.
date_consumer(1,"b"): the node generated for the "b" element.
date_consumer(2,"c"): the node generated for the "c" element.
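The expansion in this example can be reproduced with a few lines of Python. This is a sketch under the assumption that the value passed through withParam is a JSON array string and that generated nodes are labeled with the index and the element, as in the names listed above. The function name expand_with_param is hypothetical.

```python
import json
from typing import List


def expand_with_param(node_name: str, param_json: str) -> List[str]:
    """Generate one node label per element of a JSON array."""
    items = json.loads(param_json)
    # json.dumps keeps the double quotes around string elements.
    return [
        f"{node_name}({index},{json.dumps(item)})"
        for index, item in enumerate(items)
    ]


# The array produced by the upstream echo node in this example.
for label in expand_with_param("date_consumer", '["a","b","c"]'):
    print(label)
```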
A pipeline that uses OSS input, a custom image, and DLC jobs
In this example, the pipeline consists of the following four nodes, which turn into six nodes during execution:
Read OSS data-1: This node reads data from an Object Storage Service (OSS) bucket.
Data to TFRecord-1: This node converts OSS data to the TFRecord format.
Image self-supervised training-1: This node creates a Deep Learning Container (DLC) job for image-related training.
Python Script-1: This node is executed based on the withSequence parameter. In this example, this node is executed three times and generates the following three nodes. Each node uses a custom image to run a DLC job.
Python script-1(0,1)
Python script-1(1,2)
Python script-1(2,3)
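The three generated node names are consistent with a sequence that runs from 1 to 3. The following Python sketch shows that expansion under stated assumptions: the start and end values are inferred from the node names and are not given in this topic, the end index is assumed to be inclusive, and Python's % operator stands in for the Go fmt verbs that the format field accepts (simple verbs such as %d behave the same way). The function name expand_with_sequence is hypothetical.

```python
from typing import List


def expand_with_sequence(node_name: str, start: int, end: int, fmt: str = "%d") -> List[str]:
    """Generate one node label per value in start..end (end assumed inclusive)."""
    values = range(start, end + 1)
    # Each label carries the loop index and the formatted sequence value.
    return [
        f"{node_name}({index},{fmt % value})"
        for index, value in enumerate(values)
    ]


for label in expand_with_sequence("Python script-1", start=1, end=3):
    print(label)
```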