
E-MapReduce:Get started with notebook development

Last Updated: Mar 26, 2026

EMR Serverless Spark supports interactive development through notebooks. This guide walks you through creating a notebook, loading sample data, running PySpark code, visualizing results, and publishing your first notebook.

Prerequisites

Before you begin, ensure that you have an EMR Serverless Spark workspace and an OSS bucket that the workspace can access.

Step 1: Prepare a test file

Download employee.csv to use in this guide. The file contains employee names, departments, and salaries.
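If you cannot download the file, the following sketch generates a small employee.csv locally with the same kind of content. The column names and rows here are illustrative assumptions; the file provided with this guide may differ.

```python
import csv

# Hypothetical sample data; the real employee.csv may use different
# column names and rows.
rows = [
    ("name", "department", "salary"),
    ("Alice", "Sales", "5000"),
    ("Bob", "Engineering", "6500"),
    ("Carol", "Engineering", "6200"),
]

# Write the rows as a comma-delimited file with a header line
with open("employee.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```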

Step 2: Upload the test file to OSS

Notebooks in EMR Serverless Spark read data from Object Storage Service (OSS). Upload employee.csv to an OSS bucket so the notebook can access it. See Upload files for instructions.

Note the OSS path of the uploaded file; you will use it in the next step.

Step 3: Create and run a notebook

Create a notebook

  1. On the EMR Serverless Spark page, click Development in the left navigation pane.

  2. On the Development tab, click the icon for creating a new development item.

  3. In the dialog box, enter a name, select Interactive Analytic > Notebook as the type, and then click OK.

Attach a notebook session

In the upper-right corner, select a running notebook session instance from the drop-down list.

Multiple notebooks can share a single notebook session instance, so you can access the same session resources from multiple notebooks simultaneously without creating a new session for each one. To create a new session instead, select Create Notebook Session from the drop-down list. For details, see Manage notebook sessions.

Run a PySpark job

The following code reads the CSV file from OSS, displays the first five rows, and calculates the total salary per department.

  1. Copy the following code into the Python cell of the notebook. Replace oss://path/to/file with the OSS path of the file you uploaded in Step 2.

    # Read the CSV file from OSS
    df = spark.read.option("delimiter", ",").option("header", True).csv("oss://path/to/file")
    # Display the first five rows
    df.show(5)
    # Calculate the total salary per department
    sum_salary_per_department = df.groupBy("department").agg({"salary": "sum"})
    sum_salary_per_department.show()
  2. Click Execute All Cells to run the notebook. To run a single cell, click the run icon next to that cell.


  3. (Optional) To inspect job execution, hover over the icon for the current session in the session drop-down list and click Spark UI. The Spark Jobs page opens, where you can view detailed Spark job information.

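You can sanity-check the aggregation logic locally with pandas before running the cell against the full dataset. The sample rows and column names below are assumptions based on the file description in Step 1; match them to your actual file.

```python
import pandas as pd

# Sample rows mirroring the employee.csv schema described in Step 1
# (column names are assumptions; adjust them to your file)
df = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Carol", "Dave"],
        "department": ["Sales", "Sales", "HR", "HR"],
        "salary": [5000, 6000, 4500, 4700],
    }
)

# Same computation as the PySpark cell: total salary per department
totals = df.groupby("department")["salary"].sum()
print(totals.to_dict())  # {'HR': 9200, 'Sales': 11000}
```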

Visualize data with third-party libraries

Notebook sessions come with matplotlib, numpy, and pandas pre-installed. The following example uses matplotlib to plot a simple line chart.

For other third-party libraries, see Use third-party Python libraries in a notebook.
  1. Copy the following code into a cell.

    import matplotlib.pyplot as plt
    
    values = sc.parallelize(range(20)).collect()
    plt.plot(values)
    plt.ylabel('some numbers')
    plt.show()
  2. Click Execute All Cells to run the notebook. To run a single cell, click the run icon next to that cell.

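To try the plotting code outside a notebook session, where the SparkContext `sc` is not predefined, a plain-Python equivalent can write the chart to a file instead of displaying it. The Agg backend and the output filename below are choices made for a headless environment, not part of the notebook workflow.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for headless environments
import matplotlib.pyplot as plt

# Plain Python stand-in for sc.parallelize(range(20)).collect()
values = list(range(20))

plt.plot(values)
plt.ylabel("some numbers")
plt.savefig("line_chart.png")  # save to a file instead of plt.show()
```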

Step 4: Publish the notebook

After the notebook finishes running, click Publish in the upper-right corner. In the Publish dialog box, configure the parameters and click OK to save the notebook as a new version.

What's next