
E-MapReduce:Get started with JAR development

Last Updated: Mar 26, 2026

Build and deploy a Spark JAR job on E-MapReduce (EMR) Serverless Spark — from Maven configuration through execution and publishing.

EMR Serverless Spark does not provide an integrated development environment (IDE) for JAR packages. Build and package your Spark application on a local or standalone development platform before uploading it.

Prerequisites

Before you begin, ensure that you have created an EMR Serverless Spark workspace. The procedures in this guide assume that a workspace already exists.

Step 1: Configure Maven dependencies

In the pom.xml of your Maven project, add the Spark dependencies with scope set to provided. The EMR Serverless Spark runtime already includes these libraries, so setting provided prevents duplicate packaging and version conflicts while keeping the dependencies available during compilation and testing.

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>3.5.2</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.5.2</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.12</artifactId>
    <version>3.5.2</version>
    <scope>provided</scope>
</dependency>
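Because the Spark dependencies use the provided scope, they are excluded from the output JAR, so the default Maven JAR packaging is sufficient for these examples and no shading is required. The following is a minimal build section sketch, assuming Java 8 bytecode (the plugin version shown is illustrative):

```xml
<build>
    <plugins>
        <!-- Compile for the Java version supported by your Spark runtime. -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.11.0</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
    </plugins>
</build>
```

Running mvn clean package then produces target/&lt;artifactId&gt;-&lt;version&gt;.jar, for example SparkExample-1.0-SNAPSHOT.jar, which you upload in Step 2.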

Code examples

The following two examples are used throughout this guide. Each targets a different main class, which you specify when configuring the job.

Example 1: Query a Data Lake Formation (DLF) table

Main class: org.example.HiveTableAccess

package org.example;

import org.apache.spark.sql.SparkSession;

public class HiveTableAccess {
    public static void main(String[] args) {
        // enableHiveSupport routes spark.sql() queries to the Hive-compatible
        // (DLF-backed) catalog instead of the default in-memory catalog.
        SparkSession spark = SparkSession.builder()
                .appName("DlfTableAccessExample")
                .enableHiveSupport()
                .getOrCreate();
        spark.sql("SELECT * FROM test_table").show();
        spark.stop();
    }
}

Example 2: Calculate the approximate value of pi (π)

Main class: org.example.JavaSparkPi

package org.example;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

import java.util.ArrayList;
import java.util.List;

/**
 * Computes an approximation to pi
 * Usage: JavaSparkPi [partitions]
 */
public final class JavaSparkPi {

  public static void main(String[] args) throws Exception {
    SparkSession spark = SparkSession
      .builder()
      .appName("JavaSparkPi")
      .getOrCreate();

    JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

    int slices = (args.length == 1) ? Integer.parseInt(args[0]) : 2;
    int n = 100000 * slices;
    List<Integer> l = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      l.add(i);
    }

    JavaRDD<Integer> dataSet = jsc.parallelize(l, slices);

    int count = dataSet.map(integer -> {
      double x = Math.random() * 2 - 1;
      double y = Math.random() * 2 - 1;
      return (x * x + y * y <= 1) ? 1 : 0;
    }).reduce((integer, integer2) -> integer + integer2);

    System.out.println("Pi is roughly " + 4.0 * count / n);

    spark.stop();
  }
}
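The Spark job above distributes a Monte Carlo estimate across partitions: points are sampled uniformly in the square [-1, 1] x [-1, 1], and the fraction that lands inside the unit circle approaches π/4, so 4 × fraction approaches π. Stripped of Spark, the same math fits in a few lines of plain Java; this sketch is for illustration only (the class name and fixed seed below are not part of the example job):

```java
import java.util.Random;

public class PiEstimate {
    static double estimatePi(int samples, long seed) {
        Random rnd = new Random(seed); // fixed seed for a repeatable estimate
        int inside = 0;
        for (int i = 0; i < samples; i++) {
            // Sample a point uniformly in the square [-1, 1] x [-1, 1].
            double x = rnd.nextDouble() * 2 - 1;
            double y = rnd.nextDouble() * 2 - 1;
            // Count points that fall inside the unit circle.
            if (x * x + y * y <= 1) {
                inside++;
            }
        }
        // Area ratio circle/square = pi/4, so scale the fraction by 4.
        return 4.0 * inside / samples;
    }

    public static void main(String[] args) {
        System.out.println("Pi is roughly " + estimatePi(1_000_000, 42L));
    }
}
```

With one million samples the estimate is typically within a few thousandths of π; the Spark version simply parallelizes the sampling loop across partitions.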

Click SparkExample-1.0-SNAPSHOT.jar to download a prebuilt test JAR package.

Step 2: Upload the JAR package

  1. Log on to the EMR console.

  2. In the left navigation pane, choose EMR Serverless > Spark.

  3. On the Spark page, click the name of your workspace.

  4. In the left navigation pane of the workspace, click Artifacts.

  5. On the Artifacts page, click Upload File.

  6. In the Upload File dialog box, click the upload area to select a local JAR package, or drag the package into the area. This guide uses SparkExample-1.0-SNAPSHOT.jar as an example.

Step 3: Create and run a job

  1. In the left navigation pane, click Development.

  2. On the Development tab, click the icon for creating a new job.

  3. Enter a name, set Type to Application(Batch) > JAR, and click OK.

  4. In the upper-right corner, select a resource queue. For instructions on adding a queue, see Manage resource queues.

  5. Configure the following parameters, leave the remaining settings at their defaults, and click Run.

    Parameter         | Description
    Main JAR Resource | Select the JAR package uploaded in Step 2. In this example, select SparkExample-1.0-SNAPSHOT.jar.
    Main Class        | The entry point class for your Spark job. Enter org.example.JavaSparkPi for the pi example, or org.example.HiveTableAccess for the DLF table query.
  6. After the job runs, go to the Execution Records section and click Logs in the Actions column to view the output.

Step 4: Publish the job

Important

Publishing a job makes it available as a node in a workflow.

  1. After the job completes, click Publish in the upper-right corner.

  2. In the dialog box, enter the release information and click OK.

(Optional) Step 5: View the Spark UI

After the job runs successfully, inspect its execution details on the Spark UI.

  1. In the left navigation pane, click Job History.

  2. On the Application page, find your job and click Spark UI in the Actions column.

  3. On the Spark Jobs page, view the job details.

What's next

After publishing, use your job as a scheduled node in a workflow. See Manage workflows for details. For a complete walkthrough of job orchestration, see Get started with SparkSQL development.