
AnalyticDB: Develop an interactive Jupyter job

Last Updated: Mar 28, 2026

AnalyticDB for MySQL Spark lets you run interactive Spark jobs directly from a Jupyter environment, using elastic computing resources from your cluster. Choose between a Docker-based all-in-one setup that requires no manual installation, or a locally installed Jupyter Notebook connected through the ADB proxy.

Prerequisites

Before you begin, configure log storage for Spark jobs:

In the AnalyticDB for MySQL console, click the cluster ID, then choose Job Development > Spark JAR Development in the left-side navigation pane, and click Log Settings. Select the default path or enter a custom path. The custom path cannot be the root directory of an OSS bucket; it must contain at least one subfolder.
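The subfolder rule above can be checked before you save the setting. The sketch below is illustrative only (the function name and the `oss://` URI convention are assumptions, not part of the console):

```python
from urllib.parse import urlparse

def is_valid_log_path(oss_uri: str) -> bool:
    """Return True if the OSS URI points below the bucket root,
    i.e. it contains at least one subfolder."""
    parsed = urlparse(oss_uri)
    if parsed.scheme != "oss" or not parsed.netloc:
        return False
    # The path must contain at least one non-empty segment (a subfolder).
    return bool(parsed.path.strip("/"))

print(is_valid_log_path("oss://my-bucket/spark-logs/"))  # True
print(is_valid_log_path("oss://my-bucket/"))             # False: bucket root
```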

Usage notes

  • Interactive Jupyter jobs support Python 3.7 and Scala 2.12 only.

  • Spark resources are automatically released after the session is idle for 1,200 seconds (measured from the last executed code block). To change this timeout, run the following in a Jupyter Notebook cell:

    %%configure -f
    {
       "spark.adb.sessionTTLSeconds": "3600"
    }

Choose a connection method

Method | Best for | What's included
ADB Docker image (recommended) | Quick start, no local Jupyter setup | JupyterLab + SparkMagic + ADB proxy, pre-configured in one image
Locally installed Jupyter | Existing Jupyter environment you want to reuse | Install SparkMagic manually; start the ADB proxy separately

Method 1: Use the ADB Docker image

This method starts a pre-configured JupyterLab environment inside a Docker container. The image includes SparkMagic and the ADB proxy — no additional installation is needed.

Start the Docker container

  1. Install and start Docker.

  2. Pull the ADB Jupyter image:

    docker pull registry.cn-hangzhou.aliyuncs.com/adb-public-image/adb-spark-public-image:adb.notebook.0.5.pre
  3. Start the container:

    Parameters:

    • -p (optional): Maps a host port to container port 8888. Example: -p 8888:8888
    • -v (optional): Mounts a host directory into the container to prevent file loss when the container stops. Recommended container path: /root/jupyter. Save notebook files to /tmp inside the container; they will appear at the host path after the container stops. See Volumes. Example: -v /home/admin/notebook:/root/jupyter
    • -d (required): The cluster ID. Find it on the Clusters page in the ADB console. Example: amv-bp164l********
    • -r (required): The job resource group name. Find it under Cluster Management > Resource Management > Resource Groups in the console. Example: test
    • -e (required): The API endpoint of the cluster. See Endpoints. Example: adb.aliyuncs.com
    • -i / -k (conditional): Your Alibaba Cloud account or RAM user's AccessKey ID and AccessKey secret. See Accounts and permissions. Use this or -t, not both. Example: LTAI****************
    • -t (conditional): A Security Token Service (STS) token, a temporary credential for a RAM role. Obtain it by calling AssumeRole with an AccessKey pair. Use this or -i/-k, not both.
    docker run -it \
      -p {host-port}:8888 \
      -v {host-path}:{docker-path} \
      registry.cn-hangzhou.aliyuncs.com/adb-public-image/adb-spark-public-image:adb.notebook.0.5.pre \
      -d {cluster-id} \
      -r {resource-group-name} \
      -e {api-endpoint} \
      -i {ak-id} \
      -k {ak-secret} \
      -t {sts-token}   # Use either -t (STS token) or -i/-k (AccessKey pair)

    Example:

    docker run -it -p 8888:8888 -v /home/admin/notebook:/root/jupyter \
      registry.cn-hangzhou.aliyuncs.com/adb-public-image/adb-spark-public-image:adb.notebook.0.5.pre \
      -d amv-bp164l******** -r test -e adb.aliyuncs.com \
      -i LTAI**************** -k ****************
  4. After the container starts, copy the URL from the output and open it in your browser:

    [I 2023-11-24 09:55:09.852 ServerApp] nbclassic | extension was successfully loaded.
    [I 2023-11-24 09:55:09.852 ServerApp] sparkmagic extension enabled!
    [I 2023-11-24 09:55:09.853 ServerApp] sparkmagic | extension was successfully loaded.
    [I 2023-11-24 09:55:09.853 ServerApp] Serving notebooks from local directory: /root/jupyter
    [I 2023-11-24 09:55:09.853 ServerApp] Jupyter Server 1.24.0 is running at:
    [I 2023-11-24 09:55:09.853 ServerApp] http://419e63fc7821:8888/lab?token=1e2caca216c1fd159da607c6360c82213b643605f11ef291
    [I 2023-11-24 09:55:09.853 ServerApp]  or http://127.0.0.1:8888/lab?token=1e2caca216c1fd159da607c6360c82213b643605f11ef291
    [I 2023-11-24 09:55:09.853 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

    Open http://127.0.0.1:8888/lab?token=<token> to access JupyterLab.
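The option rules above (three required flags, and -t being mutually exclusive with -i/-k) can be captured in a small helper that assembles the docker run argument list. This is a sketch; the helper name and its parameters are hypothetical, not part of the image:

```python
ADB_IMAGE = ("registry.cn-hangzhou.aliyuncs.com/adb-public-image/"
             "adb-spark-public-image:adb.notebook.0.5.pre")

def build_docker_run(cluster_id, resource_group, endpoint,
                     host_port=8888, ak_id=None, ak_secret=None,
                     sts_token=None, image=ADB_IMAGE):
    """Assemble the `docker run` argument list for the ADB Jupyter image."""
    if sts_token and (ak_id or ak_secret):
        raise ValueError("Use either an STS token or an AccessKey pair, not both")
    if not sts_token and not (ak_id and ak_secret):
        raise ValueError("Provide an STS token or a full AccessKey pair")
    cmd = ["docker", "run", "-it", "-p", f"{host_port}:8888", image,
           "-d", cluster_id, "-r", resource_group, "-e", endpoint]
    if sts_token:
        cmd += ["-t", sts_token]          # STS token path
    else:
        cmd += ["-i", ak_id, "-k", ak_secret]  # AccessKey pair path
    return cmd

print(" ".join(build_docker_run("amv-bp164l********", "test",
                                "adb.aliyuncs.com",
                                ak_id="LTAI****", ak_secret="****")))
```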

If the container fails to start, check the proxy_{timestamp}.log file for error details.

Method 2: Use a locally installed Jupyter Notebook

This method connects an existing Jupyter installation to ADB Spark through the ADB proxy. It involves three stages: install SparkMagic, start the proxy, then start Jupyter.

Stage 1: Install SparkMagic

Important

Perform the steps in this stage in order, including the optional ones, without skipping or reordering them. Otherwise, the on-duty engineer cannot analyze environment issues from the Jupyter startup logs, and you will need to resolve any errors on your own.

  1. Install JupyterLab or JupyterHub.

  2. Install SparkMagic:

    pip install sparkmagic
  3. Install ipywidgets:

    pip install ipywidgets
  4. (Optional) Install wrapper kernels. The following example applies to JupyterLab 3.x. Run pip show sparkmagic to find the SparkMagic installation path, then run the following commands from that directory:

    jupyter-kernelspec install sparkmagic/kernels/sparkkernel
    jupyter-kernelspec install sparkmagic/kernels/pysparkkernel
    jupyter-kernelspec install sparkmagic/kernels/sparkrkernel
  5. (Optional) Edit the SparkMagic configuration file at ~/.sparkmagic/config.json to change the Livy server address from 127.0.0.1:5000 to your preferred IP and port. For a full configuration reference, see the example config. The following shows the relevant fields:

    "kernel_python_credentials": {
      "username": "",
      "password": "",
      "url": "http://127.0.0.1:5000",
      "auth": "None"
    },
    "kernel_scala_credentials": {
      "username": "",
      "password": "",
      "url": "http://127.0.0.1:5000",
      "auth": "None"
    },
    "kernel_r_credentials": {
      "username": "",
      "password": "",
      "url": "http://127.0.0.1:5000"
    }
  6. (Optional) Enable server extensions to switch clusters from code:

    jupyter server extension enable --py sparkmagic
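Step 5 above can also be scripted. The sketch below rewrites the `url` field of the three kernel credential blocks in ~/.sparkmagic/config.json; the file path and field names follow the example configuration shown above, and the helper name is illustrative:

```python
import json
from pathlib import Path

def set_livy_url(config_path: str, host: str, port: int) -> None:
    """Point every SparkMagic kernel credential block at a new Livy address."""
    path = Path(config_path).expanduser()
    config = json.loads(path.read_text())
    new_url = f"http://{host}:{port}"
    for key in ("kernel_python_credentials",
                "kernel_scala_credentials",
                "kernel_r_credentials"):
        if key in config:
            config[key]["url"] = new_url
    path.write_text(json.dumps(config, indent=2))

# Example: point SparkMagic at a proxy listening on 0.0.0.0:5000.
# set_livy_url("~/.sparkmagic/config.json", "0.0.0.0", 5000)
```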

Stage 2: Start the ADB proxy

Start the ADB proxy using either Docker or the command line.

Option A: Docker

  1. Install and start Docker.

  2. Pull the ADB image:

    docker pull registry.cn-hangzhou.aliyuncs.com/adb-public-image/adb-spark-public-image:adb.notebook.0.5.pre
  3. Start the proxy container, which listens on port 5000:

    • -p (optional): Maps a host port to container port 5000. Example: -p 5000:5000
    • -v (optional): Mounts a host directory into the container to prevent file loss when the container stops. Example: -v /home/admin/notebook:/root/jupyter
    • -d (required): The cluster ID. Find it on the Clusters page in the ADB console. Example: amv-bp164l********
    • -r (required): The job resource group name. Find it under Cluster Management > Resource Management > Resource Groups. Example: test
    • -e (required): The API endpoint of the cluster. See Endpoints. Example: adb.aliyuncs.com
    • -i / -k (conditional): AccessKey ID and AccessKey secret. See Accounts and permissions. Use this or -t, not both.
    • -t (conditional): STS token for a RAM role. Obtain it by calling AssumeRole. Use this or -i/-k, not both.
    docker run -it \
      -p {host-port}:5000 \
      -v {host-path}:{docker-path} \
      registry.cn-hangzhou.aliyuncs.com/adb-public-image/adb-spark-public-image:adb.notebook.0.5.pre \
      -d {cluster-id} \
      -r {resource-group-name} \
      -e {api-endpoint} \
      -i {ak-id} \
      -k {ak-secret} \
      -t {sts-token}   # Use either -t (STS token) or -i/-k (AccessKey pair)

Option B: Command line

  1. Download and install the proxy package:

    pip install aliyun-adb-livy-proxy-0.0.1.zip
  2. Start the proxy:

    Run adbproxy --help to view all available parameters.
    • --db (required): The cluster ID. Find it on the Clusters page in the ADB console.
    • --rg (required): The job resource group name. Find it under Cluster Management > Resource Management > Resource Groups.
    • --endpoint (required): The API endpoint. See Endpoints.
    • --host (optional, default 127.0.0.1): The local IP address the proxy binds to.
    • --port (optional, default 5000): The port the proxy listens on.
    • -i / -k (conditional): AccessKey ID and AccessKey secret. See Accounts and permissions. Use this or -t, not both.
    • -t (conditional): STS token for a RAM role. Obtain it by calling AssumeRole. Use this or -i/-k, not both.
    adbproxy \
      --db {cluster-id} \
      --rg {resource-group-name} \
      --endpoint {api-endpoint} \
      --host 127.0.0.1 \
      --port 5000 \
      -i {ak-id} \
      -k {ak-secret} \
      -t {sts-token}   # Use either -t (STS token) or -i/-k (AccessKey pair)

    After startup, the console displays proxy log messages confirming the service is running.
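Because SparkMagic talks to the proxy over the Livy REST protocol, you can check that the proxy is reachable before starting Jupyter. A minimal sketch, assuming the proxy answers the standard Livy GET /sessions endpoint on the host and port you configured:

```python
import json
import urllib.error
import urllib.request

def proxy_is_up(host: str = "127.0.0.1", port: int = 5000,
                timeout: float = 3.0) -> bool:
    """Return True if a Livy-compatible endpoint answers GET /sessions."""
    url = f"http://{host}:{port}/sessions"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = json.load(resp)
            # A Livy-style response contains a "sessions" list.
            return "sessions" in body
    except (urllib.error.URLError, OSError, ValueError):
        return False

if __name__ == "__main__":
    print("proxy reachable:", proxy_is_up())
```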

Stage 3: Start Jupyter

Start JupyterLab:

jupyter lab

If you configured a custom listening address, specify it with --ip:

jupyter lab --ip=<custom-ip>

After startup, copy the URL from the output and open it in your browser:

[I 2025-07-02 17:36:16.051 ServerApp] Serving notebooks from local directory: /home/newuser
[I 2025-07-02 17:36:16.052 ServerApp] Jupyter Server 2.16.0 is running at:
[I 2025-07-02 17:36:16.052 ServerApp] http://419e63fc7821:8888/lab?token=1e2caca216c1fd159da607c6360c82213b643605f11ef291
[I 2025-07-02 17:36:16.052 ServerApp]     http://127.0.0.1:8888/lab?token=1e2caca216c1fd159da607c6360c82213b643605f11ef291
[I 2025-07-02 17:36:16.052 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

Open http://127.0.0.1:8888/lab?token=<token> to access Jupyter.

Run jobs in Jupyter

Configure Spark resources

When you open a new PySpark notebook, Spark starts with the following default configuration:

{
   "kind": "pyspark",
   "heartbeatTimeoutInSecond": "60",
   "spark.driver.resourceSpec": "medium",
   "spark.executor.resourceSpec": "medium",
   "spark.executor.instances": "1",
   "spark.dynamicAllocation.shuffleTracking.enabled": "true",
   "spark.dynamicAllocation.enabled": "true",
   "spark.dynamicAllocation.minExecutors": "0",
   "spark.dynamicAllocation.maxExecutors": "1",
   "spark.adb.sessionTTLSeconds": "1200"
}

To use custom Spark configuration parameters:

  1. Restart the kernel: in the top navigation bar, choose Kernel > Restart Kernel and Clear All Outputs. Confirm that no Spark applications are running.


  2. Enter your custom parameters using %%configure -f. The following example allocates 32 executors with a medium spec (2 cores and 8 GB of memory each), for a total of 64 ACUs:

    Important

    When specifying custom Spark parameters, set spark.dynamicAllocation.enabled to false.

    %%configure -f
    {
       "spark.driver.resourceSpec": "large",
       "spark.sql.hive.metastore.version": "adb",
       "spark.executor.resourceSpec": "medium",
       "spark.adb.executorDiskSize": "100Gi",
       "spark.executor.instances": "32",
       "spark.dynamicAllocation.enabled": "false",
       "spark.network.timeout": "30000",
       "spark.memory.fraction": "0.75",
       "spark.memory.storageFraction": "0.3"
    }

    For the full parameter reference, see Spark application configuration parameters and the Spark documentation.

  3. Run the cell to apply the configuration.

Important
  • Custom configuration parameters are reset when you close the notebook. On reopening, default parameters apply unless you run %%configure -f again.

  • For interactive jobs, all configuration parameters are written directly to the JSON structure — not inside the conf object required for batch jobs.
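The two notes above, a flat JSON structure and dynamic allocation disabled when you pin the executor count, can be checked before you run a cell. The validator below is a hypothetical helper, not part of SparkMagic or ADB:

```python
import json

def check_interactive_conf(cell_body: str) -> list:
    """Return a list of problems in a %%configure JSON body for interactive jobs."""
    problems = []
    conf = json.loads(cell_body)
    if "conf" in conf:
        problems.append("Interactive jobs take parameters at the top level, "
                        "not inside a 'conf' object (that form is for batch jobs).")
    if ("spark.executor.instances" in conf
            and conf.get("spark.dynamicAllocation.enabled") != "false"):
        problems.append("Set spark.dynamicAllocation.enabled to 'false' "
                        "when fixing spark.executor.instances.")
    return problems

cell = ('{"spark.executor.instances": "32", '
        '"spark.dynamicAllocation.enabled": "false"}')
print(check_interactive_conf(cell))  # []
```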

Run Spark jobs

  1. Enter the spark command to start a SparkSession.

    Click Link in the return value to open the Spark UI, where you can view job logs and execution details.


  2. Run Spark SQL by prefixing queries with %%sql. The following example lists all databases in the cluster:

    Important

    The %%sql prefix is required. Without it, the cell content is parsed as Python. Run %%help to see all available magic commands.

    %%sql
    show databases

    The results match the databases available in your AnalyticDB for MySQL cluster.
