
Data Management: DMSNotebookOperator

Last Updated: Sep 19, 2025

This topic describes the configuration of the DMSNotebookOperator operator.

Feature description

Executes a Notebook file (.ipynb) managed by DMS.

Prerequisites

Parameters

Note

The file_path, run_params, profile, session_id, profile_id, cluster_id, session_name, profile_name, and cluster_name parameters support Jinja templates.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file_path | string | Yes | The path of the Notebook file (.ipynb). |
| profile | dict | No | The configuration information of the Notebook session. autoStopTime: the time when the resource is released. mountPoints: the data storage location, in the format `[{"mntPath": "/mnt/data***", "dataPath": "oss://test/***"}, ...]`, where mntPath is the mount path and dataPath is the OSS path. dependencies: PyPI package management. environments: environment variables. |
| profile_id | string | No. Required when not reusing a session. | The ID of the configuration. Specify either profile_id or profile_name; profile_id takes precedence. |
| profile_name | string | No. Required when not reusing a session. | The name of the configuration. Specify either profile_id or profile_name; profile_id takes precedence. |
| cluster_type | string | No | The type of the compute cluster for the DMS workspace. Valid values: cpu, spark. |
| cluster_id | string | No | The ID of the compute cluster for the DMS workspace. Specify either cluster_id or cluster_name; cluster_id takes precedence. |
| cluster_name | string | No | The name of the compute cluster for the DMS workspace. Specify either cluster_id or cluster_name; cluster_id takes precedence. |
| spec | string | No | The resource specification of the driver. Valid values: 1C4G (1 core, 4 GB), 2C8G (2 cores, 8 GB), 4C16G (4 cores, 16 GB), 8C32G (8 cores, 32 GB), 16C64G (16 cores, 64 GB). |
| runtime_name | string | No | The image name. |
| session_id | string | No. Required when reusing a session. | The ID of the reused session. Specify either session_id or session_name; session_id takes precedence. |
| session_name | string | No. Required when reusing a session. | The name of the reused session. Specify either session_id or session_name; session_id takes precedence. |
| run_params | dict | No | The runtime parameters that replace variables in the Notebook file. |
| timeout | int | No | The execution timeout of the Notebook file, in seconds. |
| polling_interval | int | No | The interval at which the execution result is refreshed, in seconds. Default value: 10. |
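The shape of the profile dictionary described above can be sketched as follows; the field names follow the parameter description, but all values (paths, bucket, packages, variables) are hypothetical placeholders:

```python
# A hypothetical `profile` dictionary for a DMS Notebook session.
# Field names follow the parameter table; every value is illustrative only.
profile = {
    # Time at which the session's resources are released.
    "autoStopTime": "2025-09-19T18:00:00",
    # Data storage mounts: mntPath is the mount path inside the session,
    # dataPath is the corresponding OSS path.
    "mountPoints": [
        {"mntPath": "/mnt/data001", "dataPath": "oss://example-bucket/data"}
    ],
    # PyPI packages to install in the session.
    "dependencies": ["pandas", "numpy"],
    # Environment variables visible to the Notebook.
    "environments": {"ENV_NAME": "prod"},
}
```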

Example

Note

The task_id and dag parameters are specific to Airflow. For more information, see the official Airflow documentation.

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.providers.alibaba_dms.cloud.operators.dms_notebook import DMSNotebookOperator


with DAG(
    "dms_notebook_test",
    params={
        "x": 3
    },
) as dag:

    notebook_operator = DMSNotebookOperator(
        task_id='notebook_test_hz_name',
        profile_name='hansheng_profile.48',
        profile={},
        cluster_type='spark',
        cluster_name='spark_general2.218',
        spec='4C16G',
        runtime_name='Spark3.5_Scala2.12_Python3.9_General:1.0.9',
        file_path='/Workspace/code/default/test.ipynb',
        run_params={
            'a': "{{ params.x }}"
        },
        polling_interval=5,
        dag=dag
    )

    run_this_last = EmptyOperator(
        task_id="run_this_last",
        dag=dag,
    )

    notebook_operator >> run_this_last

if __name__ == "__main__":
    dag.test(
        run_conf={}
    )