AnalyticDB for MySQL:Develop an interactive Jupyter job

Last Updated: Dec 29, 2023

AnalyticDB for MySQL Spark allows you to use a Docker image to start the interactive JupyterLab development environment. This environment helps you connect to AnalyticDB for MySQL Spark and perform interactive testing and computing based on elastic resources.

Prerequisites

Usage notes

  • AnalyticDB for MySQL Spark supports interactive Jupyter jobs only in Python 3.7 or Scala 2.12.

  • If an interactive Jupyter job remains idle for a time-to-live (TTL) period of 1,200 seconds after the last code snippet is executed, the job is automatically released. You can use the spark.adb.sessionTTLSeconds parameter to specify the TTL period for interactive Jupyter jobs, as shown in the sketch after this list.
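
  The following is a minimal sketch of how such a Spark configuration might be set from a notebook cell. It assumes that the sparkmagic %%configure magic enabled in the JupyterLab environment passes the conf map to the interactive session, and the value 3600 is only an example:

    %%configure -f
    {
      "conf": {
        "spark.adb.sessionTTLSeconds": "3600"
      }
    }

  The -f flag makes sparkmagic drop any existing session and re-create it with the new configuration, so run a cell like this before you start executing code snippets.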

Procedure

  1. Install and start Docker. For more information, see the Docker documentation.

  2. Pull the Jupyter image of AnalyticDB for MySQL. Sample command:

    docker pull registry.cn-hangzhou.aliyuncs.com/adb-public-image/adb-spark-public-image:livy.0.2.pre
  3. Start the interactive JupyterLab development environment.

    Command syntax:

    docker run -it -p {Host port}:8888 -v {Host file path}:{Docker file path} registry.cn-hangzhou.aliyuncs.com/adb-public-image/adb-spark-public-image:livy.0.2.pre -d {Cluster ID} -r {Resource group name} -e {Endpoint} -i {AccessKey ID} -k {AccessKey secret}

    The following list describes the parameters.

    • -p (optional): Maps a host port to a container port. Specify the parameter in the -p {Host port}:{Container port} format. You can specify a custom host port, but the container port must be 8888. Example: -p 8888:8888.

    • -v (optional): Mounts a host file path to the Docker container. If you do not mount a host path, the files that you edit may be lost when you stop the Docker container. When you stop the Docker container, it also attempts to terminate all interactive Spark jobs that are running. You can use one of the following methods to prevent file loss:

      • When you start the interactive JupyterLab development environment, mount a host path to the Docker container and store the job files in the corresponding file path. Specify the parameter in the -v {Host file path}:{Docker file path} format. You can specify a custom file path for the Docker container. Recommended value: /root/jupyter.

      • Before you stop the Docker container, make sure that all files are copied and stored elsewhere.

      Example: -v /home/admin/notebook:/root/jupyter. In this example, the host files that are stored in the /home/admin/notebook path are mounted to the /root/jupyter path of the Docker container.

      Note
      Save the notebook files that you edit to the mounted path of the Docker container, which is /root/jupyter in this example. After you stop the Docker container, you can view the corresponding files in the /home/admin/notebook path of the host. After you restart the Docker container, you can continue to modify and execute the files. For more information, see Volumes.

    • -d (required): The ID of the AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster. You can log on to the AnalyticDB for MySQL console and go to the Clusters page to view cluster IDs.

    • -r (required): The name of the resource group in the AnalyticDB for MySQL cluster. You can log on to the AnalyticDB for MySQL console, choose Cluster Management > Resource Management in the left-side navigation pane, and then click the Resource Groups tab to view resource group names.

    • -e (required): The endpoint of the AnalyticDB for MySQL cluster. For more information, see Endpoints.

    • -i (required): The AccessKey ID of the Resource Access Management (RAM) user. For information about how to view the AccessKey ID, see Accounts and permissions.

    • -k (required): The AccessKey secret of the RAM user. For information about how to view the AccessKey secret, see Accounts and permissions.

    Example:

    docker run -it  -p 8888:8888 -v /home/admin/notebook:/root/jupyter registry.cn-hangzhou.aliyuncs.com/adb-public-image/adb-spark-public-image:livy.0.2.pre -d amv-bp164l3xt9y3**** -r test -e adb.aliyuncs.com -i LTAI55stlJn5GhpBDtN8**** -k DlClrgjoV5LmwBYBJHEZQOnRF7****

    After you start the interactive JupyterLab development environment, output similar to the following is returned. Copy the http://127.0.0.1:8888/lab?token=1e2caca216c1fd159da607c6360c82213b643605f11ef291 URL from the output into your browser to use JupyterLab to connect to AnalyticDB for MySQL Spark.

    [I 2023-11-24 09:55:09.852 ServerApp] nbclassic | extension was successfully loaded.
    [I 2023-11-24 09:55:09.852 ServerApp] sparkmagic extension enabled!
    [I 2023-11-24 09:55:09.853 ServerApp] sparkmagic | extension was successfully loaded.
    [I 2023-11-24 09:55:09.853 ServerApp] Serving notebooks from local directory: /root/jupyter
    [I 2023-11-24 09:55:09.853 ServerApp] Jupyter Server 1.24.0 is running at:
    [I 2023-11-24 09:55:09.853 ServerApp] http://419e63fc7821:8888/lab?token=1e2caca216c1fd159da607c6360c82213b643605f11ef291
    [I 2023-11-24 09:55:09.853 ServerApp]  or http://127.0.0.1:8888/lab?token=1e2caca216c1fd159da607c6360c82213b643605f11ef291
    [I 2023-11-24 09:55:09.853 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
    Note

    If an error message is returned when you start the interactive JupyterLab development environment, you can view the proxy_{timestamp}.log file for troubleshooting.
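
    After JupyterLab opens in your browser, you can run a quick sanity check in a notebook before you start development. The following is a minimal sketch, assuming that you create a notebook with the PySpark kernel provided by sparkmagic, which supplies a remote spark session that runs on AnalyticDB for MySQL Spark; the commented query uses placeholder database and table names.

    # The sparkmagic PySpark kernel creates the remote `spark` session for you;
    # do not create your own SparkSession or SparkContext.
    import sys
    print(sys.version)     # reports the Python version on the Spark side (3.7 per the usage notes)

    df = spark.range(100)  # builds a small test DataFrame on the cluster
    print(df.count())      # prints 100 if the session is healthy

    # Hypothetical query; replace your_db.your_table with a table that exists in your cluster.
    # spark.sql("SELECT * FROM your_db.your_table LIMIT 10").show()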