All Products
Search
Document Center

Platform For AI:Python Script

Last Updated:Jun 10, 2026

Use the Python Script component in Machine Learning Designer to install dependency packages and run custom Python functions.

Locate the component

The Python Script component is in the UserDefinedScript folder of the Machine Learning Designer component list.

Prerequisites

  • DLC permissions are granted. Cloud service dependencies and authorizations: DLC.

  • The Python Script component runs on DLC compute resources. Associate DLC compute resources with your workspace. Manage workspaces.

  • The Python Script component stores code in OSS. Create an OSS bucket. Create a bucket.

    Important

    OSS bucket must be in the same region as Machine Learning Designer and DLC.

  • The RAM user must have the Algorithm Development role in the workspace. Manage workspace members. To use MaxCompute as a data source, also grant the MaxCompute Developer role.

Configure the component

  • Input ports

    The Python Script component has four input ports. Connect them to data from an OSS path or MaxCompute table.

    • OSS path input

      Input from an upstream component's OSS path is mounted to the node where the script runs. The system passes the mounted file path as an argument. For example, --input1 /ml/input/data/input1 specifies the path for the first input port. Read the mounted files from /ml/input/data/input1 as local files in your script.

    • MaxCompute table input

      MaxCompute table inputs are not mounted. The system passes table information as a URI argument. For example, python main.py --input1 odps://some-project-name/tables/table indicates the MaxCompute table for the first input port. Use the parse_odps_url function from the code template to parse metadata such as ProjectName, TableName, and Partition. Usage examples.

  • Output ports

    The Python Script component has four output ports. OSS Output Port 1 and OSS Output Port 2 output to OSS paths. Table Output Port 1 and Table Output Port 2 output to MaxCompute tables.

    • OSS path output

      The OSS path configured for Job output path on the Code Config tab is mounted to /ml/output/. OSS Output Port 1 and OSS Output Port 2 correspond to /ml/output/output1 and /ml/output/output2. Write files to these directories in your script to pass them to downstream components.

    • MaxCompute table output

      If the workspace has a MaxCompute project, the system passes a temporary table URI to the script, for example: python main.py --output3 odps://<some-project-name>/tables/<output-table-name>. Use PyODPS to create and write data to this table, then pass it to downstream components through connections.

  • Parameters

    Code Config

    Parameter

    Description

    Job output path

    OSS path for job output.

    • The configured OSS directory is mounted to /ml/output/ in the job container. Data written to /ml/output/ is persisted to the corresponding OSS directory.

    • The output ports OSS Output-1 and OSS Output-2 correspond to subpaths output1 and output2 under /ml/output/. When an OSS output port is connected to a downstream component, the downstream component receives data from the corresponding subpath.

    Code source

    (Choose one)

    Literal code

    • Python code: OSS path where code is saved. Code written in the editor is saved to this path. Default filename is main.py.

      Important

      Before you click Save for the first time, make sure that the specified OSS path does not contain a file with the same name. Otherwise, the existing file is overwritten.

    • Python code editor: The editor provides sample code by default. Usage examples. Write your code directly in the editor.

    Specify Git configuration

    • Git repository address: Address of the Git repository.

    • Code branch: Code branch. Default value is master.

    • Code commit: Commit ID. Takes priority over the branch. If specified, the branch setting is ignored.

    • Git username: Required if you need to access a private repository.

    • Git access token: Required to access a private code repository. Appendix: Obtain a GitHub account token.

    Select code source

    • Select code source repositories: Select a created code configuration. Code configurations.

    • Code branch: Code branch. Default value is master.

    • Code commit: Commit ID. Takes priority over the branch. If specified, the branch setting is ignored.

    Select OSS path

    In the OSS Code Path field, select the path where the code is uploaded.

    Command

    Command to execute, such as python main.py.

    Note

    The system generates the execution command based on the script name and port connections. Manual configuration is not required.

    Advanced option

    • Third-party dependencies: Specify third-party libraries in Python requirements.txt format. The system installs these libraries before the node runs.

      cycler==0.10.0            # via matplotlib
      kiwisolver==1.2.0         # via matplotlib
      matplotlib==3.2.1
      numpy==1.18.5
      pandas==1.0.4
      pyparsing==2.4.7          # via matplotlib
      python-dateutil==2.8.1    # via matplotlib, pandas
      pytz==2020.1              # via pandas
      scipy==1.4.1              # via seaborn
    • Enable container monitoring: Displays a configuration text box for fault tolerance monitoring parameters.

    Run Config

    Parameter

    Description

    Select Resource Group

    Select a public DLC resource group:

    • For a public resource group, configure InstanceType. Select a CPU or GPU instance. Default: ecs.c6.large.

    By default, the default DLC cloud-native resource group in the current workspace is used.

    VPC Settings

    Select an existing Virtual Private Cloud (VPC).

    Security Group

    Select an existing security group.

    Advanced option

    If you select this option, configure the following parameters:

    • Instance count: Number of instances. Default value is 1.

    • Job image URI: URI of the job image. The default image uses XGBoost 1.6.0. Change the image if you need a deep learning framework.

    • Job type: Change this only if your code is implemented for distributed execution. Supported values:

      • XGBoost/LightGBM Job

      • TensorFlow Job

      • PyTorch Job

      • MPI Job

Usage examples

Default sample code

The Python Script component provides the following sample code by default.

import os
import argparse
import json
"""
Sample code for the Python Script component
"""
# Default MaxCompute execution environment in the current workspace, which includes the MaxCompute project name and endpoint.
# This environment is injected only when a MaxCompute project exists in the current workspace.
# Example: {"endpoint": "http://service.cn.maxcompute.aliyun-inc.com/api", "odpsProject": "lq_test_mc_project"}.
ENV_JOB_MAX_COMPUTE_EXECUTION = "JOB_MAX_COMPUTE_EXECUTION"


def init_odps():
    from odps import ODPS
    # Information about the default MaxCompute project in the current workspace.
    mc_execution = json.loads(os.environ[ENV_JOB_MAX_COMPUTE_EXECUTION])
    o = ODPS(
        access_id="<YourAccessKeyId>",
        secret_access_key="<YourAccessKeySecret>",
        # Select the endpoint based on the region where your project is located, for example: http://service.cn-shanghai.maxcompute.aliyun-inc.com/api.
        endpoint=mc_execution["endpoint"],
        project=mc_execution["odpsProject"],
    )
    return o


def parse_odps_url(table_uri):
    from urllib import parse
    parsed = parse.urlparse(table_uri)
    project_name = parsed.hostname
    r = parsed.path.split("/", 2)
    table_name = r[2]
    if len(r) > 3:
        partition = r[3]
    else:
        partition = None
        return project_name, table_name, partition


def parse_args():
    parser = argparse.ArgumentParser(description="PythonV2 component script example.")
    parser.add_argument("--input1", type=str, default=None, help="Component input port 1.")
    parser.add_argument("--input2", type=str, default=None, help="Component input port 2.")
    parser.add_argument("--input3", type=str, default=None, help="Component input port 3.")
    parser.add_argument("--input4", type=str, default=None, help="Component input port 4.")
    parser.add_argument("--output1", type=str, default=None, help="Output OSS port 1.")
    parser.add_argument("--output2", type=str, default=None, help="Output OSS port 2.")
    parser.add_argument("--output3", type=str, default=None, help="Output MaxComputeTable 1.")
    parser.add_argument("--output4", type=str, default=None, help="Output MaxComputeTable 2.")
    args, _ = parser.parse_known_args()
    return args


def write_table_example(args):
    # Example: Execute an SQL statement to copy data from a public table provided by PAI to the temporary table specified for Table Output Port 1 (--output3).
    output_table_uri = args.output3
    o = init_odps()
    project_name, table_name, partition = parse_odps_url(output_table_uri)
    o.run_sql(f"create table {project_name}.{table_name} as select * from pai_online_project.heart_disease_prediction;")


def write_output1(args):
    # Example: Write data results to the mounted OSS path (subdirectory for OSS Output Port 1), and results can be passed to downstream components through connections.
    output_path = args.output1
    os.makedirs(output_path, exist_ok=True)
    p = os.path.join(output_path, "result.text")
    with open(p, "w") as f:
        f.write("TestAccuracy=0.88")


if __name__ == "__main__":
    args = parse_args()
    print("Input1={}".format(args.input1))
    print("Output1={}".format(args.output1))
    # write_table_example(args)
    # write_output1(args)

Description of common functions:

  • init_odps(): Initializes an ODPS instance for reading MaxCompute table data. Provide your AccessKeyId and AccessKeySecret. Obtain an AccessKey pair.

  • parse_odps_url(table_uri): Parses the input MaxCompute table URI and returns the project name, table name, and partition. The table_uri format is odps://${your_projectname}/tables/${table_name}/${pt_1}/${pt_2}/, for example, odps://test/tables/iris/pa=1/pb=1, where pa=1/pb=1 is a multi-level partition.

  • parse_args(): Parses arguments passed to the script. Input and output data are passed to the executed script as arguments.

Example 1: Chain with other components

This example modifies the heart disease prediction template to demonstrate how to use the Python Script component with other Machine Learning Designer components.Combined usePipeline configuration:

  1. Create a pipeline from the heart disease prediction template and open it. Heart disease prediction.

  2. Drag the Python Script component to the canvas, rename it to SMOTE, and configure the following code.

    Important

    The imblearn library is not included in the default image. Add imblearn in the Third-party dependencies field on the Code Config tab. The library is installed automatically before the node runs.

    import argparse
    import json
    import os
    from odps.df import DataFrame
    from imblearn.over_sampling import SMOTE
    from urllib import parse
    from odps import ODPS
    ENV_JOB_MAX_COMPUTE_EXECUTION = "JOB_MAX_COMPUTE_EXECUTION"
    
    
    def init_odps():
        # Information about the default MaxCompute project in the current workspace.
        mc_execution = json.loads(os.environ[ENV_JOB_MAX_COMPUTE_EXECUTION])
        o = ODPS(
            access_id="<Your_AccessKeyId>",
            secret_access_key="<Your_AccessKeySecret>",
            # Select the endpoint based on the region where your project is located, for example: http://service.cn-shanghai.maxcompute.aliyun-inc.com/api.
            endpoint=mc_execution["endpoint"],
            project=mc_execution["odpsProject"],
        )
        return o
    
    
    def get_max_compute_table(table_uri, odps):
        parsed = parse.urlparse(table_uri)
        project_name = parsed.hostname
        table_name = parsed.path.split('/')[2]
        table = odps.get_table(project_name + "." + table_name)
        return table
    
    
    def run():
        parser = argparse.ArgumentParser(description='PythonV2 component script example.')
        parser.add_argument(
            '--input1', type=str, default=None, help='Component input port 1.'
        )
        parser.add_argument(
            '--output3', type=str, default=None, help='Component input port 1.'
        )
        args, _ = parser.parse_known_args()
        print('Input1={}'.format(args.input1))
        print('output3={}'.format(args.output3))
        o = init_odps()
        imbalanced_table = get_max_compute_table(args.input1, o)
        df = DataFrame(imbalanced_table).to_pandas()
        sm = SMOTE(random_state=2)
        X_train_res, y_train_res = sm.fit_resample(df, df['ifhealth'].ravel())
        new_table = o.create_table(get_max_compute_table(args.output3, o).name, imbalanced_table.schema, if_not_exists=True)
        with new_table.open_writer() as writer:
            writer.write(X_train_res.values.tolist())
    
    
    if __name__ == '__main__':
        run()
    

    Replace <Your_AccessKeyId> and <Your_AccessKeySecret> with your own values. Obtain an AccessKey pair.

  3. Connect the SMOTE component downstream of the Split component. SMOTE oversamples training data to balance class distribution by creating synthetic samples for the minority class.

  4. Connect the new data from the SMOTE component to the Logistic Regression for Binary Classification component for training.

  5. Connect the trained model to the same prediction data and evaluation components as the left branch for a side-by-side comparison. After the component runs, click the visualization icon (Visualization) to view evaluation results.Evaluation results

    Additional oversampling does not significantly improve model performance, indicating the original sample distribution and model were already effective.

Example 2: Orchestrate DLC jobs

Connect multiple Python Script components in Machine Learning Designer to orchestrate a pipeline of DLC jobs. The following example starts four DLC jobs arranged in a DAG to control execution order.

Note

If the DLC job code does not read data from upstream nodes or pass data to downstream nodes, connections only represent scheduling dependencies and execution order.

DAGDeploy the pipeline to DataWorks for scheduled execution. Use DataWorks to schedule Machine Learning Designer pipelines for offline execution.

Example 3: Pass global variables

  1. Configure global variables.

    On the Machine Learning Designer pipeline page, click a blank area of the canvas and configure variables on the Global Variables tab in the right pane.Global variables configuration

  2. Pass the global variables to the Python Script component in one of two ways.

    • Click the Python Script component node. On the Code Config tab, select Advanced option and configure global variables as incoming parameters in the Command field.Command configuration

    • Modify the Python code to parse arguments using argparse.

      The following code uses the global variables configured in step 1 as an example. Update the code based on your actual global variables. Replace the code in the code editing area on the Code Config tab.

      import os
      
      import argparse
      import json
      
      """
      Sample code for the Python Script component
      """
      
      ENV_JOB_MAX_COMPUTE_EXECUTION = "JOB_MAX_COMPUTE_EXECUTION"
      
      
      def init_odps():
      
          from odps import ODPS
      
          mc_execution = json.loads(os.environ[ENV_JOB_MAX_COMPUTE_EXECUTION])
      
          o = ODPS(
              access_id="<YourAccessKeyId>",
              secret_access_key="<YourAccessKeySecret>",
              endpoint=mc_execution["endpoint"],
              project=mc_execution["odpsProject"],
          )
          return o
      
      
      def parse_odps_url(table_uri):
          from urllib import parse
      
          parsed = parse.urlparse(table_uri)
          project_name = parsed.hostname
          r = parsed.path.split("/", 2)
          table_name = r[2]
          if len(r) > 3:
              partition = r[3]
          else:
              partition = None
          return project_name, table_name, partition
      
      
      def parse_args():
          parser = argparse.ArgumentParser(description="PythonV2 component script example.")
      
          parser.add_argument("--input1", type=str, default=None, help="Component input port 1.")
          parser.add_argument("--input2", type=str, default=None, help="Component input port 2.")
          parser.add_argument("--input3", type=str, default=None, help="Component input port 3.")
          parser.add_argument("--input4", type=str, default=None, help="Component input port 4.")
      
          parser.add_argument("--output1", type=str, default=None, help="Output OSS port 1.")
          parser.add_argument("--output2", type=str, default=None, help="Output OSS port 2.")
          parser.add_argument("--output3", type=str, default=None, help="Output MaxComputeTable 1.")
          parser.add_argument("--output4", type=str, default=None, help="Output MaxComputeTable 2.")
          # Add code based on the configured global variables.
          parser.add_argument("--arg1", type=str, default=None, help="Argument 1.")
          parser.add_argument("--arg2", type=int, default=None, help="Argument 2.")
          args, _ = parser.parse_known_args()
          return args
      
      
      def write_table_example(args):
      
          output_table_uri = args.output3
      
          o = init_odps()
          project_name, table_name, partition = parse_odps_url(output_table_uri)
          o.run_sql(f"create table {project_name}.{table_name} as select * from pai_online_project.heart_disease_prediction;")
      
      
      def write_output1(args):
          output_path = args.output1
      
          os.makedirs(output_path, exist_ok=True)
          p = os.path.join(output_path, "result.text")
          with open(p, "w") as f:
              f.write("TestAccuracy=0.88")
      
      
      if __name__ == "__main__":
          args = parse_args()
      
          print("Input1={}".format(args.input1))
          print("Output1={}".format(args.output1))
          # Add code based on the configured global variables.
          print("Argument1={}".format(args.arg1))
          print("Argument2={}".format(args.arg2))
          # write_table_example(args)
          # write_output1(args)