Alibaba Cloud Data Lake Analytics (DLA) provides solutions that support the Spark read-eval-print loop (REPL) feature. You can install JupyterLab and the Livy proxy of DLA on your on-premises machine, or use the Docker image provided by DLA to quickly start JupyterLab. Then, you can connect JupyterLab to the serverless Spark engine of DLA. After the connection is established, you can perform interactive testing and compute data by using the elastic resources of DLA.
Usage notes
- The serverless Spark engine of DLA supports JupyterLab interactive jobs that are programmed in Python 3.0 or Scala 2.11.
- The latest version of JupyterLab supports Python 3.6 and later.
- To develop a JupyterLab interactive job, we recommend that you use the Docker image to quickly start JupyterLab. For more information, see Use the Docker image to quickly start JupyterLab.
- JupyterLab interactive jobs are automatically released after they are idle for a specified period of time. By default, a JupyterLab interactive job is released 1200 seconds after the last code block of the job is executed. You can use the spark.dla.session.ttl parameter to configure the idle time after which a JupyterLab interactive job is automatically released. For an example of how to set this parameter, see the configuration cell after this list.
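The following cell is a minimal sketch of one way to set this parameter from a notebook. It assumes that your notebook uses a sparkmagic kernel that supports the %%configure magic and that the value is specified in seconds, in line with the 1200-second default described above. The 1800 value is only an illustration; adjust it to your needs.
%%configure -f
{"conf": {"spark.dla.session.ttl": "1800"}}
Run this cell before any Spark code so that the setting is applied when the Livy session is created. The -f option forces the session to be re-created if one already exists.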
Install the Livy proxy of DLA and JupyterLab on your on-premises machine
Use the Docker image to quickly start JupyterLab
You can use the Docker image provided by DLA to quickly start JupyterLab. For more information about how to install and use a Docker image, see Docker Documentation.
- When you perform troubleshooting, query the related information in the dlaproxy.log file. If the log file shows that the startup is successful, JupyterLab is started.
- You must mount a host path to a path in the Docker container. Otherwise, the system automatically deletes the notebooks that are in Edit mode when you terminate the Docker container. When you terminate the Docker container, the system also automatically attempts to terminate all JupyterLab interactive jobs that are running. To address this issue, you can use one of the following solutions:
  - Before you terminate the Docker container, make sure that all required files are saved.
  - Mount a host path to a path in the Docker container and save job files to that path.
    For example, in Linux, if you want to mount the host path /home/admin/notebook to the container path /root/notebook, run the following command:
    docker run -it --privileged=true -p 8888:8888 -v /home/admin/notebook:/root/notebook registry.cn-hangzhou.aliyuncs.com/dla_spark/dla-jupyter:0.2 -i {AkId} -k {AkSec} -r {RegionId} -c {VcName}
    You must save the notebooks that are in Edit mode to the /root/notebook path in the container. This ensures that you can view the related files in the /home/admin/notebook path on the host and continue to use the notebooks the next time the container starts.
    Note: For more information, see Use volumes.
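After JupyterLab starts and you open a notebook that uses the PySpark kernel, you can run a short cell to confirm that the notebook reaches the serverless Spark engine of DLA. The following cell is only a minimal sketch: the spark session object is assumed to be provided by the kernel, and the sample data is illustrative.
# The first cell you run creates the Livy session, so it can take longer than later cells.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.show()
print(spark.version)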
FAQ
- Problem description: JupyterLab fails to start and the following error messages appear:
    [C 09:53:15.840 LabApp] Bad config encountered during initialization:
    [C 09:53:15.840 LabApp] Could not decode '\xe6\x9c\xaa\xe5\x91\xbd\xe5\x90\x8d' for unicode trait 'untitled_notebook' of a LargeFileManager instance.
  Solution: Run the LANG=zn jupyter lab command.
- Problem description: The following error message appears:
    $ jupyter nbextension enable --py --sys-prefix widgetsnbextension
    Enabling notebook extension jupyter-js-widgets/extension...
        - Validating: problems found:
        - require? X jupyter-js-widgets/extension
  Solution: Run the jupyter nbextension install --py widgetsnbextension --user and jupyter nbextension enable widgetsnbextension --user --py commands.
- Problem description: The following error message appears:
    ValueError: Please install nodejs >=12.0.0 before continuing. nodejs may be installed using conda or directly from the nodejs website.
  Solution: Run the conda install nodejs command. For more information about how to install Conda, see the Conda official documentation.
- Problem description: Sparkmagic fails to be installed and an error message appears during the installation.
  Solution: Install Rust.
- Problem description: Charts cannot be created by using Matplotlib. An error occurs after the %matplotlib inline command is run.
  Solution: If you use PySpark in the cloud, run the %matplot plt command together with the plt.show() function to create a chart, as shown in the example after this list.
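The following cell is a minimal sketch of this approach. It assumes that your notebook uses a sparkmagic PySpark kernel in which the %matplot magic is available; the plotted values are illustrative only.
import matplotlib.pyplot as plt

# Build the chart on the Spark side.
plt.plot([1, 2, 3, 4], [10, 20, 25, 30])
plt.title("Sample chart")
plt.show()

# Render the current Matplotlib figure in the notebook.
%matplot plt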