
AnalyticDB: Develop an interactive Jupyter job

Last Updated: Nov 27, 2025

AnalyticDB for MySQL Spark provides a Docker image that starts an interactive JupyterLab development environment. In this environment, you can connect to AnalyticDB for MySQL Spark and perform interactive testing and computing by using the elastic resources of AnalyticDB for MySQL.

Prerequisites

  • An AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster is created.

  • A job resource group is created for the AnalyticDB for MySQL cluster.

  • A database account is created for the AnalyticDB for MySQL cluster.

  • You have completed account authorization.

  • The log storage path of Spark applications is configured.

    Note

    Log on to the AnalyticDB for MySQL console. Find the cluster that you want to manage and click the cluster ID. In the left-side navigation pane, choose Job Development > Spark JAR Development. Click Log Settings. In the dialog box that appears, select the default path or specify a custom storage path. You cannot set the custom storage path to the root directory of OSS. Make sure that the custom storage path contains at least one layer of folders.

Usage notes

  • AnalyticDB for MySQL Spark supports interactive Jupyter jobs only in Python 3.7 or Scala 2.12.

  • Interactive Jupyter jobs automatically release Spark resources after they remain idle for a period of time. By default, the resources are released 1,200 seconds after the last code block finishes executing. To change this idle timeout, configure the spark.adb.sessionTTLSeconds parameter in a Jupyter Notebook cell by using the following command:

    %%configure -f
    {
       "spark.adb.sessionTTLSeconds": "3600"
    }

Connect to AnalyticDB for MySQL Spark

Use JupyterLab in the image provided by AnalyticDB for MySQL to connect to Spark

  1. Install and start Docker.

  2. Pull the Jupyter image of AnalyticDB for MySQL. Run the following command:

    docker pull registry.cn-hangzhou.aliyuncs.com/adb-public-image/adb-spark-public-image:adb.notebook.0.5.pre
  3. Start the interactive JupyterLab development environment.

    Command syntax:

    docker run -it \
      -p {Host port}:8888 \
      -v {Host file path}:{Docker file path} \
      registry.cn-hangzhou.aliyuncs.com/adb-public-image/adb-spark-public-image:adb.notebook.0.5.pre \
      -d {ADB Instance Id} \
      -r {Resource Group Name} \
      -e {API Endpoint} \
      -i {AK Id} \
      -k {AK Sec} \
      -t {StsToken} # Specify either an STS token (-t) or an AccessKey pair (-i and -k)

    The following list describes the parameters.

    • -p (Required: No)

      Maps a host port to a container port. Specify the parameter in the -p {Host port}:{Container port} format. You can specify any available host port, but you must set the container port to 8888. Example: -p 8888:8888.

    • -v (Required: No)

      Mounts a host path to a file path in the Docker container. If you do not mount a host path and you stop the Docker container, the files that you edit may be lost. When the Docker container stops, it also attempts to terminate all interactive Spark jobs that are running. You can use one of the following methods to prevent loss of the files that you edit:

      • When you start the interactive JupyterLab development environment, mount a host path to the Docker container and store the job files in that path. Specify the parameter in the -v {Host path}:{Docker file path} format. You can specify any file path inside the Docker container. Recommended value: /root/jupyter.

      • Before you stop the Docker container, copy all files to a safe location.

      Example: -v /home/admin/notebook:/root/jupyter. In this example, the host files that are stored in the /home/admin/notebook path are mounted to the /root/jupyter path of the Docker container.

      Note

      Save the notebook files that you edit to the mounted container path (/root/jupyter in this example). After you stop the Docker container, you can view the corresponding files in the /home/admin/notebook path of the host. After you restart the Docker container, you can continue to run the files. For more information, see Volumes.

    • -d (Required: Yes)

      The ID of the AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster. You can log on to the AnalyticDB for MySQL console and go to the Clusters page to view cluster IDs.

    • -r (Required: Yes)

      The name of the job resource group in the AnalyticDB for MySQL cluster. You can log on to the AnalyticDB for MySQL console, choose Cluster Management > Resource Management in the left-side navigation pane, and then click the Resource Groups tab to view resource group names.

    • -e (Required: Yes)

      The endpoint of the AnalyticDB for MySQL cluster. For more information, see Endpoints.

    • -i and -k (Required in specific scenarios: specify either an AccessKey pair or an STS token)

      The AccessKey ID (-i) and AccessKey secret (-k) of your Alibaba Cloud account or RAM user. For information about how to view the AccessKey ID and AccessKey secret, see Accounts and permissions.

    • -t (Required in specific scenarios: specify either an STS token or an AccessKey pair)

      The Security Token Service (STS) token, which is the temporary identity credential of a RAM role. A RAM user that has the required permissions can call the AssumeRole (obtain temporary identity credentials of a RAM role) API operation with its own AccessKey pair to obtain the STS token of a RAM role and use the STS token to access Alibaba Cloud resources.

    Example:

    docker run -it  -p 8888:8888 -v /home/admin/notebook:/root/jupyter registry.cn-hangzhou.aliyuncs.com/adb-public-image/adb-spark-public-image:adb.notebook.0.5.pre -d amv-bp164l******** -r test -e adb.aliyuncs.com -i LTAI**************** -k ****************

    After you start the interactive JupyterLab development environment, output similar to the following is returned. Copy the URL that contains the token (http://127.0.0.1:8888/lab?token=1e2caca216c1fd159da607c6360c82213b643605f11ef291 in this example) into your browser to use JupyterLab to connect to AnalyticDB for MySQL Spark.

    [I 2023-11-24 09:55:09.852 ServerApp] nbclassic | extension was successfully loaded.
    [I 2023-11-24 09:55:09.852 ServerApp] sparkmagic extension enabled!
    [I 2023-11-24 09:55:09.853 ServerApp] sparkmagic | extension was successfully loaded.
    [I 2023-11-24 09:55:09.853 ServerApp] Serving notebooks from local directory: /root/jupyter
    [I 2023-11-24 09:55:09.853 ServerApp] Jupyter Server 1.24.0 is running at:
    [I 2023-11-24 09:55:09.853 ServerApp] http://419e63fc7821:8888/lab?token=1e2caca216c1fd159da607c6360c82213b643605f11ef291
    [I 2023-11-24 09:55:09.853 ServerApp]  or http://127.0.0.1:8888/lab?token=1e2caca216c1fd159da607c6360c82213b643605f11ef291
    [I 2023-11-24 09:55:09.853 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
    Note

    If an error message appears when you start the interactive JupyterLab development environment, you can view the proxy_{timestamp}.log file for troubleshooting.

Use locally installed Jupyter Notebook to connect to Spark

Install and configure the Jupyter Notebook environment

  1. Install JupyterLab or JupyterHub.

  2. Install the SparkMagic plugin in Jupyter to run interactive Spark jobs. Choose the appropriate method based on your Jupyter version. The following example is for JupyterLab 3.x.

    Important

    Perform all of the following steps, including the optional ones, in the listed order. Do not skip or reorder steps. Otherwise, the on-duty engineer cannot analyze environment issues from the Jupyter startup logs, and you must resolve any errors on your own.

    1. Install SparkMagic.

      pip install sparkmagic
    2. Install ipywidgets.

      pip install ipywidgets 
    3. (Optional) Install wrapper kernels. Run pip show sparkmagic, which will display the installation path of sparkmagic. Switch to that directory and run:

       jupyter-kernelspec install sparkmagic/kernels/sparkkernel
       jupyter-kernelspec install sparkmagic/kernels/pysparkkernel
       jupyter-kernelspec install sparkmagic/kernels/sparkrkernel
    4. (Optional) Modify the SparkMagic config.json configuration file (default path: ~/.sparkmagic/config.json) and change 127.0.0.1:5000 to the IP address and port on which the AnalyticDB for MySQL proxy listens. The following is a partial example of the configuration structure. For more details, refer to the related examples.

      "kernel_python_credentials": {
      "username": "",
      "password": "",
      "url": "http://127.0.0.1:5000",
      "auth": "None"
      },
      "kernel_scala_credentials": {
      "username": "",
      "password": "",
      "url": "http://127.0.0.1:5000",
      "auth": "None"
      },
      "kernel_r_credentials": {
      "username": "",
      "password": "",
      "url": "http://127.0.0.1:5000"
      },
    5. (Optional) Enable server extensions to change clusters through code.

      jupyter server extension enable --py sparkmagic
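
      Independently of the wrapper kernels, you can manage endpoints and sessions from a notebook that runs the regular Python (IPython) kernel by loading the sparkmagic magics. The following cell is a minimal sketch; the exact magics that are available depend on your sparkmagic version.

      # Run in a notebook cell that uses the regular Python (IPython) kernel.
      # Load the sparkmagic magics so that endpoints and sessions can be managed from code.
      %load_ext sparkmagic.magics

      # Open the management widget, in which you can add an endpoint
      # (for example, the AnalyticDB for MySQL proxy at http://127.0.0.1:5000)
      # and create or switch Spark sessions that use it.
      %manage_spark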

Start the AnalyticDB for MySQL proxy

You can use any of the following methods to start the AnalyticDB for MySQL proxy.

Method 1: Start the proxy using Docker

  1. Install and start Docker.

  2. Pull the Jupyter image of AnalyticDB for MySQL. Run the following command:

    docker pull registry.cn-hangzhou.aliyuncs.com/adb-public-image/adb-spark-public-image:adb.notebook.0.5.pre
  3. Start the Docker proxy. Execute the following command to start the container and listen on local port 5000.

    docker run -it \
      -p 5000:5000 \
      -v {Host file path}:{Docker file path} \
      registry.cn-hangzhou.aliyuncs.com/adb-public-image/adb-spark-public-image:adb.notebook.0.5.pre \
      -d {ADB Instance Id} \
      -r {Resource Group Name} \
      -e {API Endpoint} \
      -i {AK Id} \
      -k {AK Sec} \
      -t {StsToken} # Specify either an STS token (-t) or an AccessKey pair (-i and -k)

    The following list describes the parameters. A complete example command is provided after the parameter descriptions.

    • -p (Required: No)

      Maps a host port to a container port. Specify the parameter in the -p {Host port}:{Container port} format. You can specify any available host port, but you must set the container port to 5000. Example: -p 5000:5000.

    • -v (Required: No)

      Mounts a host path to a file path in the Docker container. If you do not mount a host path and you stop the Docker container, the files that you edit may be lost. When the Docker container stops, it also attempts to terminate all interactive Spark jobs that are running. You can use one of the following methods to prevent loss of the files that you edit:

      • When you start the Docker container, mount a host path to the container and store the job files in that path. Specify the parameter in the -v {Host path}:{Docker file path} format. You can specify any file path inside the Docker container. Recommended value: /root/jupyter.

      • Before you stop the Docker container, copy all files to a safe location.

      Example: -v /home/admin/notebook:/root/jupyter. In this example, the host files that are stored in the /home/admin/notebook path are mounted to the /root/jupyter path of the Docker container.

      Note

      Save the notebook files that you edit to the mounted container path (/root/jupyter in this example). After you stop the Docker container, you can view the corresponding files in the /home/admin/notebook path of the host. After you restart the Docker container, you can continue to run the files. For more information, see Volumes.

    • -d (Required: Yes)

      The ID of the AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster. You can log on to the AnalyticDB for MySQL console and go to the Clusters page to view cluster IDs.

    • -r (Required: Yes)

      The name of the job resource group in the AnalyticDB for MySQL cluster. You can log on to the AnalyticDB for MySQL console, choose Cluster Management > Resource Management in the left-side navigation pane, and then click the Resource Groups tab to view resource group names.

    • -e (Required: Yes)

      The endpoint of the AnalyticDB for MySQL cluster. For more information, see Endpoints.

    • -i and -k (Required in specific scenarios: specify either an AccessKey pair or an STS token)

      The AccessKey ID (-i) and AccessKey secret (-k) of your Alibaba Cloud account or RAM user. For information about how to view the AccessKey ID and AccessKey secret, see Accounts and permissions.

    • -t (Required in specific scenarios: specify either an STS token or an AccessKey pair)

      The Security Token Service (STS) token, which is the temporary identity credential of a RAM role. A RAM user that has the required permissions can call the AssumeRole (obtain temporary identity credentials of a RAM role) API operation with its own AccessKey pair to obtain the STS token of a RAM role and use the STS token to access Alibaba Cloud resources.
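
    For reference, the following command fills in the syntax above with the same placeholder values that are used in the JupyterLab example earlier in this topic. Replace the cluster ID, resource group name, and AccessKey values with your own:

    docker run -it -p 5000:5000 -v /home/admin/notebook:/root/jupyter registry.cn-hangzhou.aliyuncs.com/adb-public-image/adb-spark-public-image:adb.notebook.0.5.pre -d amv-bp164l******** -r test -e adb.aliyuncs.com -i LTAI**************** -k ****************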

Method 2: Install the proxy using the command line

  1. Download and install the proxy.

    pip install aliyun-adb-livy-proxy-0.0.1.zip
  2. Execute the following command to start the proxy.

    Note

    After the AnalyticDB for MySQL proxy is successfully installed, you can run adbproxy --help to view the parameter list.

    adbproxy --db {ADB Instance Id} \
      --rg {Resource Group Name} \
      --endpoint {API Endpoint} \
      --host 127.0.0.1 \
      --port 5000 \
      -i {AK Id} \
      -k {AK Sec} \
      -t {StsToken} # Specify either an STS token (-t) or an AccessKey pair (-i and -k)

    The following list describes the parameters.

    • --db (Required: Yes)

      The ID of the AnalyticDB for MySQL Enterprise Edition, Basic Edition, or Data Lakehouse Edition cluster. You can log on to the AnalyticDB for MySQL console and go to the Clusters page to view cluster IDs.

    • --rg (Required: Yes)

      The name of the job resource group in the AnalyticDB for MySQL cluster. You can log on to the AnalyticDB for MySQL console, choose Cluster Management > Resource Management in the left-side navigation pane, and then click the Resource Groups tab to view resource group names.

    • --endpoint (Required: Yes)

      The endpoint of the AnalyticDB for MySQL cluster. For more information, see Endpoints.

    • --host (Required: No)

      The local IP address to which the adbproxy service binds. Default value: 127.0.0.1.

    • --port (Required: No)

      The port number on which the adbproxy service listens. Default value: 5000.

    • -i and -k (Required in specific scenarios: specify either an AccessKey pair or an STS token)

      The AccessKey ID (-i) and AccessKey secret (-k) of your Alibaba Cloud account or RAM user that has permissions to access AnalyticDB for MySQL. For information about how to obtain the AccessKey ID and AccessKey secret, see Accounts and permissions.

    • -t (Required in specific scenarios: specify either an STS token or an AccessKey pair)

      The Security Token Service (STS) token, which is the temporary identity credential of a RAM role. A RAM user that has the required permissions can call the AssumeRole (obtain temporary identity credentials of a RAM role) API operation with its own AccessKey pair to obtain the STS token of a RAM role and use the STS token to access Alibaba Cloud resources.

  3. After successful startup, the console will display relevant log information.
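
    Optionally, you can check that the proxy is reachable before you start Jupyter. The package name aliyun-adb-livy-proxy suggests that the proxy exposes a Livy-compatible REST API, which is what SparkMagic expects. The following Python sketch relies on that assumption and on the default listening address 127.0.0.1:5000; it only verifies that the HTTP endpoint responds.

    # Minimal reachability check for the local proxy.
    # Assumes a Livy-compatible /sessions endpoint on the default address and port.
    import requests

    response = requests.get("http://127.0.0.1:5000/sessions", timeout=10)
    print(response.status_code)  # 200 indicates that the proxy is serving requests.
    print(response.json())       # Lists the interactive sessions that the proxy currently tracks.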

Start Jupyter

Use the following command to start the Jupyter interactive development environment.

jupyter lab
Note

If you have set a custom listening address, execute jupyter lab --ip=*** to start Jupyter, where *** is your custom listening address.

After Jupyter starts, output similar to the following is returned. Copy the URL that contains the token (http://127.0.0.1:8888/lab?token=1e2caca216c1fd159da607c6360c82213b643605f11ef291 in this example) into your browser to use Jupyter to connect to AnalyticDB for MySQL Spark.

[I 2025-07-02 17:36:16.051 ServerApp] Serving notebooks from local directory: /home/newuser
[I 2025-07-02 17:36:16.052 ServerApp] Jupyter Server 2.16.0 is running at:
[I 2025-07-02 17:36:16.052 ServerApp] http://419e63fc7821:8888/lab?token=1e2caca216c1fd159da607c6360c82213b643605f11ef291
[I 2025-07-02 17:36:16.052 ServerApp]     http://127.0.0.1:8888/lab?token=1e2caca216c1fd159da607c6360c82213b643605f11ef291
[I 2025-07-02 17:36:16.052 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

Run jobs in Jupyter

Start resources and define the maximum ACU usage for jobs

  1. After connecting to AnalyticDB for MySQL Spark using Jupyter, click PySpark on the page to create a new PySpark job. The Spark job will run with the following default configuration parameters:

    {
        "kind": "pyspark",
        "heartbeatTimeoutInSecond": "60",
        "spark.driver.resourceSpec": "medium",
        "spark.executor.resourceSpec": "medium",
        "spark.executor.instances": "1",
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": "0",
        "spark.dynamicAllocation.maxExecutors": "1",
        "spark.adb.sessionTTLSeconds": "1200"
    }
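
    If you want to confirm which configuration values are in effect for a running session, you can read them back from the SparkContext in a PySpark cell. The following is a minimal sketch and assumes that the SparkSession has already started (see the Run jobs section):

    # Run in a PySpark cell after the SparkSession has started.
    # Print the Spark configuration values that are in effect for the current session.
    for key, value in sorted(spark.sparkContext.getConf().getAll()):
        print(key, "=", value)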
  2. To modify the Spark application configuration parameters, you can use the %%configure -f statement.

    1. Restart the kernel.

      In the top navigation bar, choose Kernel > Restart Kernel and Clear All Outputs. Make sure that no running Spark applications are displayed on the Jupyter development page.


    2. Enter custom Spark application configuration parameters in the Jupyter Notebook Cell.

      Important

      When you specify custom Spark application configuration parameters, you must set the spark.dynamicAllocation.enabled parameter to false.

      Example:

      This configuration allocates 32 executors to the Spark job, each with the medium specification (2 cores and 8 GB of memory). The job can use up to a total of 64 ACUs of computing resources.

      %%configure -f
      {
         "spark.driver.resourceSpec":"large",
         "spark.sql.hive.metastore.version":"adb",
         "spark.executor.resourceSpec":"medium",
         "spark.adb.executorDiskSize":"100Gi",
         "spark.executor.instances":"32",
         "spark.dynamicAllocation.enabled":"false",
         "spark.network.timeout":"30000",
         "spark.memory.fraction":"0.75",
         "spark.memory.storageFraction":"0.3"
      }

      For more information about Spark application configuration parameters, see Spark application configuration parameters and the Spark official website.

  3. Run the cell to apply the modified Spark application configuration parameters.

    Important
    • After you close the Jupyter Notebook page, the specified custom configuration parameters no longer take effect. If you do not specify Spark application parameters after you re-open the Jupyter Notebook page, the default configuration parameters are used to run a Spark job.

    • When you run a Spark job on the Jupyter Notebook page, all job configurations are written directly at the top level of the JSON structure, instead of inside the conf object that is required when you submit a batch application. The snippets after this note illustrate the difference.
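
    The following snippets illustrate the difference. The parameter values are placeholders chosen for this example. In an interactive notebook cell, the parameters sit at the top level of the JSON:

      %%configure -f
      {
         "spark.executor.resourceSpec": "medium",
         "spark.executor.instances": "2",
         "spark.dynamicAllocation.enabled": "false"
      }

    In a batch application, the same parameters are nested inside the conf object of the submission JSON (other batch fields are omitted here):

      {
        "conf": {
          "spark.executor.resourceSpec": "medium",
          "spark.executor.instances": "2"
        }
      }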

Run jobs

  1. Enter the spark command to start a SparkSession.

    Note

    Click Link in the return value to access the Spark UI interface and view information such as Spark job logs.

  2. Execute Spark SQL in the Jupyter Notebook Cell to query the list of available databases in the AnalyticDB for MySQL cluster.

    Important

    You must add %%sql before Spark SQL code. Otherwise, the code is parsed as Python code by default. You can run %%help to learn more about magic commands and their usage.

    %%sql
    show databases

    The query results are consistent with those in AnalyticDB for MySQL.
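
    Beyond the %%sql magic, you can also use the spark session object directly in a PySpark cell. The following is a minimal sketch; it uses only locally created data and a built-in SQL command, so it does not depend on any particular database or table in your cluster.

    # Run in a PySpark cell. The spark object is the SparkSession of the interactive job.

    # Build a small DataFrame from local data and inspect it.
    df = spark.createDataFrame(
        [(1, "hello"), (2, "adb")],
        ["id", "text"],
    )
    df.show()

    # The same SparkSession can also run SQL statements programmatically.
    spark.sql("SHOW DATABASES").show()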