E-MapReduce: Use Jupyter Notebook to interact with EMR Serverless Spark

Last Updated: Mar 26, 2026

Connect Jupyter Notebook to EMR Serverless Spark for interactive PySpark and Scala development. This guide covers two setup methods: a Docker image for a quick, portable environment, and the sparkmagic plugin for direct integration with an existing Jupyter installation.

How it works

EMR Serverless Spark exposes a Livy Gateway that implements the Apache Livy RESTful API. When you create a Spark session from Jupyter, sparkmagic (or the preconfigured Docker image) authenticates via a token and communicates with the Livy Gateway to submit code and retrieve results. The gateway handles session lifecycle and resource scheduling on the Serverless Spark side.

Jupyter Notebook / JupyterLab
        │
        │  sparkmagic (Livy protocol)
        ▼
  Livy Gateway  ──── token auth
        │
        ▼
 EMR Serverless Spark cluster

For details about the Livy API, see the REST API section of the Apache Livy documentation.
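As a rough illustration of the flow above, the following Python sketch builds (but does not send) the kind of HTTP requests that a Livy client issues: POST /sessions to create a session and POST /sessions/{id}/statements to submit code, as defined by the Apache Livy REST API. The helper names and the placeholder endpoint are hypothetical. Sending the token as the password of an HTTP Basic header is an assumption that mirrors the -a and -p options of sparkmagic; it is not a documented gateway contract.

```python
import base64
import json
from urllib.request import Request


def _auth_header(token: str, username: str = "username") -> str:
    """Build an HTTP Basic header (assumption: the token is the password)."""
    raw = f"{username}:{token}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def _post(url: str, token: str, payload: dict) -> Request:
    """Prepare, but do not send, a JSON POST request."""
    return Request(
        url=url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": _auth_header(token),
        },
    )


def create_session_request(base_url: str, token: str, kind: str = "pyspark") -> Request:
    """POST /sessions asks Livy to start an interactive session."""
    return _post(f"{base_url.rstrip('/')}/sessions", token, {"kind": kind})


def run_statement_request(base_url: str, token: str, session_id: int, code: str) -> Request:
    """POST /sessions/{id}/statements submits code to an existing session."""
    url = f"{base_url.rstrip('/')}/sessions/{session_id}/statements"
    return _post(url, token, {"code": code})


if __name__ == "__main__":
    # Hypothetical endpoint and token; nothing is sent over the network.
    req = run_statement_request(
        "https://livy.example.invalid", "my-token", 0,
        'spark.sql("show databases").show()',
    )
    print(req.get_method(), req.full_url)
```

In practice sparkmagic issues these requests for you; the sketch only shows what travels between Jupyter and the gateway.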

Choose a method

Method | Use when
Method 1: Use a Docker image | You want a self-contained environment, or need to reproduce the same setup across machines
Method 2: Use the sparkmagic plugin | You already have Jupyter Notebook installed and want to add sparkmagic to it

Prerequisites

Before you begin, ensure that you have:

Method 1: Use a Docker image to quickly build and start an environment

Step 1: Create a gateway and a token

  1. Create and start a gateway.

    1. Log on to the EMR console.

    2. In the left navigation bar, select EMR Serverless > Spark.

    3. On the Spark page, click the name of the target workspace.

    4. On the EMR Serverless Spark page, click O&M Center > Gateway in the left-side navigation pane.

    5. Click the Livy Gateway tab.

    6. Click Create Livy Gateway.

    7. On the Create Gateway page, enter a Name (for example, Livy-gateway) and click Create. To configure additional parameters, see Manage gateways.

    8. On the Livy Gateway page, find the created gateway and click Start in the Actions column.

  2. Create a token.

    1. On the Livy Gateway page, find the gateway that you created (Livy-gateway in this example) and click Tokens in the Actions column.

    2. Click Create Token.

    3. In the Create Token dialog box, enter a Name (for example, Livy-token) and click OK.

    4. Copy the token immediately after it is created.

      Important

      The token is not displayed again after you leave the page. If the token expires or is lost, reset it or create a new one.

Step 2: Pull and start the Docker image

  1. Pull the image:

    docker pull emr-registry-registry.cn-hangzhou.cr.aliyuncs.com/serverless-spark-public/emr-spark-jupyter:latest
  2. Start the image:

    docker run -p <host_port>:8888 emr-registry-registry.cn-hangzhou.cr.aliyuncs.com/serverless-spark-public/emr-spark-jupyter:latest <endpoint> <token>
    Parameter | Description
    <host_port> | The port on the host machine to map to the container's port 8888
    <endpoint> | The Livy gateway endpoint. To find it, click the gateway name on the Livy Gateway page, then check the Overview tab.
    <token> | The token you copied in Step 1

    After the container starts, the output includes a URL similar to:

    [I 2024-09-23 05:38:14.640 ServerApp] http://127.0.0.1:8888/lab?token=258c0dd75e22a10fb6e2c87ac738c2a7ba6a314c6b******
  3. Open the URL in a browser to access Jupyter.

    Note
    • If the container runs on a remote server, replace 127.0.0.1 with the actual IP address of that server.

    • If <host_port> is not 8888, replace 8888 in the URL with the actual host port.
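The commands and URL adjustments in this step can be sketched in Python as follows. This is a minimal sketch: the function names are hypothetical, and the argument list simply mirrors the docker run command shown above so that it could be passed to subprocess.run without shell quoting.

```python
from urllib.parse import urlsplit, urlunsplit

# Image name copied from the docker pull command in this step.
IMAGE = (
    "emr-registry-registry.cn-hangzhou.cr.aliyuncs.com/"
    "serverless-spark-public/emr-spark-jupyter:latest"
)


def docker_run_args(host_port: int, endpoint: str, token: str) -> list[str]:
    """Return the docker run command as an argument list."""
    return ["docker", "run", "-p", f"{host_port}:8888", IMAGE, endpoint, token]


def browser_url(printed_url: str, server_ip: str, host_port: int) -> str:
    """Swap the host and port of the URL printed by the container,
    keeping the path and the Jupyter token query string unchanged."""
    parts = urlsplit(printed_url)
    return urlunsplit(parts._replace(netloc=f"{server_ip}:{host_port}"))


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(" ".join(docker_run_args(9999, "<endpoint>", "<token>")))
    print(browser_url("http://127.0.0.1:8888/lab?token=abc", "203.0.113.10", 9999))
```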

Step 3: Test the connectivity

  1. On the JupyterLab page, click PySpark in the Notebook section.


  2. Run the following code to query all accessible databases:

    spark.sql("show databases").show()

    The command returns a list of the databases that you can access.

Method 2: Use the sparkmagic plugin to build and start an environment

Step 1: Create a gateway and a token

Follow the same steps as Method 1, Step 1.

Step 2: Install and enable the sparkmagic plugin

  1. Install sparkmagic:

    pip install sparkmagic
  2. Enable the plugin based on your Jupyter environment:

    • Jupyter Notebook:

      jupyter nbextension enable --py --sys-prefix widgetsnbextension
    • JupyterLab:

      jupyter labextension install "@jupyter-widgets/jupyterlab-manager"

For more information, see sparkmagic on GitHub.
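Before you open a notebook, you may want to confirm that the package is importable from the Python environment that Jupyter uses. The following is a minimal sketch; the helper name is hypothetical, and it only checks that a module can be located, not that the Jupyter extension is enabled.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    status = "found" if is_installed("sparkmagic") else "missing"
    print(f"sparkmagic: {status}")
```

Run the script with the same interpreter that launches Jupyter; a different virtual environment can report a different result.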

Step 3: Configure and start a Spark session

  1. Open Jupyter. For instructions, see the JupyterLab documentation.

  2. Import the sparkmagic plugin:

    %load_ext sparkmagic.magics
  3. Extend the session startup timeout to avoid failures caused by resource scheduling delays:

    import sparkmagic.utils.configuration as conf
    conf.override("livy_session_startup_timeout_seconds", 1000)
  4. (Optional) Customize Spark resource configuration. The following example sets driver cores and memory. For all available parameters (including ttl and conf), see Livy Docs - REST API.

    %%spark config
    {
       "conf": {
           "spark.driver.cores": "1",
           "spark.driver.memory": "7g"
       }
    }
  5. Create a Spark session. Choose Python or Scala based on your requirements.

    Python:

    %spark add -s <session_name> -l python -u https://<endpoint> -a username -p <token>

    Scala:

    %spark add -s <session_name> -l scala -u https://<endpoint> -a username -p <token>

    Parameter | Description
    <session_name> | A name for the Spark session. Specify any custom name.
    <endpoint> | The Endpoint(Public) or Endpoint(Private) value from the Overview tab of the Livy gateway. If you use a private endpoint, change https:// to http:// and make sure the machine running Jupyter is in the same region as the Livy gateway.
    <token> | The token you copied in Step 1
  6. Wait 1 to 5 minutes for the session to become ready. The session is ready when idle appears in the State column. To confirm, log on to the EMR Serverless Spark console and check the Sessions tab of the Livy gateway.
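The wait for the idle state can also be scripted. The following Python sketch polls a caller-supplied function until the session reports idle, fails fast on the failure states defined by the Apache Livy session model, and gives up after a timeout. The function name, the 10-second polling interval, and the get_state callback are hypothetical; the 1000-second default only echoes the startup timeout used earlier in this step.

```python
import time
from typing import Callable

# Session states from the Apache Livy session model that mean the
# session will not become usable.
FAILED_STATES = {"error", "dead", "killed", "shutting_down"}


def wait_until_idle(
    get_state: Callable[[], str],
    timeout_s: float = 1000.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_state() until it returns "idle".

    Raises RuntimeError if the session fails and TimeoutError if it is
    still not idle when the deadline passes.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state == "idle":
            return state
        if state in FAILED_STATES:
            raise RuntimeError(f"session entered state {state!r}")
        if clock() >= deadline:
            raise TimeoutError(f"session still {state!r} after {timeout_s} s")
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated state sequence; a real get_state would query the gateway.
    states = iter(["not_started", "starting", "idle"])
    print(wait_until_idle(lambda: next(states), sleep=lambda s: None))
```

A real get_state callback could, for example, read the state field returned by GET /sessions/{id} in the Livy REST API. The sleep and clock parameters are injectable so that the loop can be exercised without waiting.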


Step 4: Verify the session

After the session is ready, run code using %%spark. The following example lists all databases in the current Spark environment:

%%spark
spark.sql("show databases").show()

The command returns a list of the databases in the current Spark environment.

Step 5: Release session resources (optional)

When you are done, release the session resources to avoid unnecessary charges.

  • Automatic release: Sessions idle for two hours are terminated automatically.

  • Manual release via sparkmagic:

    %spark delete -s <session_name>
  • Manual release via the console: On the Sessions tab of the Livy gateway, find the session and click Close in the Actions column.
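For scripted cleanup outside a notebook, the Apache Livy REST API defines DELETE /sessions/{id} for ending a session. The following Python sketch only builds such a request without sending it. The function name and the endpoint are hypothetical, and passing the token as the password of an HTTP Basic header is an assumption based on the -a and -p options of the %spark add command.

```python
import base64
from urllib.request import Request


def delete_session_request(
    base_url: str, token: str, session_id: int, username: str = "username"
) -> Request:
    """Prepare, but do not send, DELETE /sessions/{id}."""
    # Assumption: the gateway accepts HTTP Basic auth with the token as password.
    credentials = base64.b64encode(
        f"{username}:{token}".encode("utf-8")
    ).decode("ascii")
    return Request(
        url=f"{base_url.rstrip('/')}/sessions/{session_id}",
        method="DELETE",
        headers={"Authorization": f"Basic {credentials}"},
    )


if __name__ == "__main__":
    # Hypothetical endpoint, token, and session ID.
    req = delete_session_request("https://livy.example.invalid", "my-token", 7)
    print(req.get_method(), req.full_url)
```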
