
E-MapReduce:Get started with SparkSQL development

Last Updated:Mar 26, 2026

EMR Serverless Spark lets you write and run SQL jobs directly in the console. This guide walks you through the full workflow: create two SparkSQL jobs, assemble them into a workflow, run the workflow, and verify the results.

By the end of this guide, you will be able to:

  1. Create and publish SparkSQL jobs in the development editor.

  2. Build a workflow that chains those jobs with a dependency.

  3. Run the workflow and monitor its execution.

  4. Verify the output data using a SQL query.

Prerequisites

Before you begin, make sure you have:

  • An EMR Serverless Spark workspace that you can access.

  • An OSS bucket for storing table data. The sample code references it as oss://<bucketname>/path/.

Key concepts

  • Job: A unit of SQL code you develop, test, and publish. Jobs are the building blocks of workflows.

  • Workflow: A directed pipeline of nodes. Each node runs one published job. Nodes can declare upstream dependencies to control execution order.

  • Node: A reference to a published job inside a workflow. Adding a node to a workflow and configuring its Upstream Nodes defines the execution graph.

Step 1: Create and publish the jobs

Important

A job must be published before it can be added to a workflow.

Navigate to the development editor:

  1. Log on to the EMR console.

  2. In the left navigation pane, choose EMR Serverless > Spark.

  3. Click the target workspace name.

  4. In the left navigation pane, click Development.

Create the users_task job

  1. On the Development tab, click the new job icon.

  2. In the New dialog box, enter users_task as the name, leave the type as SparkSQL, and click OK.

  3. Copy the following code into the users_task tab:

    CREATE TABLE IF NOT EXISTS students (
      name VARCHAR(64),
      address VARCHAR(64)
    )
    USING PARQUET
    PARTITIONED BY (data_date STRING)
    OPTIONS (
      'path'='oss://<bucketname>/path/'
    );
    
    INSERT OVERWRITE TABLE students PARTITION (data_date = '${ds}') VALUES
      ('Ashua Hill', '456 Erica Ct, Cupertino'),
      ('Brian Reed', '723 Kern Ave, Palo Alto');

    The ${ds} placeholder is a date variable whose default value is the previous day's date. The following table lists all supported date variables.

    Variable              Data type    Format                 Example
    {data_date}           str          YYYY-MM-DD             2023-09-18
    {ds}                  str          YYYY-MM-DD             2023-09-18
    {dt}                  str          YYYY-MM-DD             2023-09-18
    {data_date_nodash}    str          YYYYMMDD               20230918
    {ds_nodash}           str          YYYYMMDD               20230918
    {dt_nodash}           str          YYYYMMDD               20230918
    {ts}                  str          YYYY-MM-DDTHH:MM:SS    2023-09-18T16:07:43
    {ts_nodash}           str          YYYYMMDDHHMMSS         20230918160743
  4. From the database and session drop-down lists, select a database and a running session instance. To create a new session instead, select Connect to SQL Session from the drop-down list. See Manage SQL sessions.

  5. Click Run. Results appear on the Execution Results tab. If an exception occurs, check the Execution Issues tab.

  6. Click Publish.

  7. In the dialog box, enter a description and click OK.

Note

Job parameters are published with the job and applied when the job runs in a pipeline. Session parameters are applied when the job runs in the SQL editor.
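
The date variables used in the job can be checked locally. The following Python sketch is purely illustrative (it is not part of the product) and reproduces the date-based formats, assuming the default value is the previous day's date:

```python
from datetime import date, timedelta

# Assumption: by default each date variable resolves to the previous day.
ds = date.today() - timedelta(days=1)

variables = {
    "data_date":        ds.strftime("%Y-%m-%d"),  # e.g. 2023-09-18
    "ds":               ds.strftime("%Y-%m-%d"),
    "dt":               ds.strftime("%Y-%m-%d"),
    "data_date_nodash": ds.strftime("%Y%m%d"),    # e.g. 20230918
    "ds_nodash":        ds.strftime("%Y%m%d"),
    "dt_nodash":        ds.strftime("%Y%m%d"),
}
print(variables["ds"])
```

The timestamp variables ({ts} and {ts_nodash}) additionally carry the time portion of the scheduled run and are omitted from this sketch.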

Create the users_count job

  1. On the Development tab, click the new job icon.

  2. In the New dialog box, enter users_count as the name, leave the type as SparkSQL, and click OK.

  3. Copy the following code into the users_count tab:

    SELECT COUNT(1) FROM students;
  4. From the database and session drop-down lists, select a database and a running session instance. To create a new session instead, select Connect to SQL Session from the drop-down list. See Manage SQL sessions.

  5. Click Run. Results appear on the Execution Results tab. If an exception occurs, check the Execution Issues tab.

  6. Click Publish.

  7. In the dialog box, enter a description and click OK.

Step 2: Create a workflow

  1. In the left navigation pane, click Workflows.

  2. On the Workflows page, click Create Workflow.

  3. In the Create Workflow panel, enter a name such as spark_workflow_task, then click Next. Configure the options in Other Settings as needed. See Manage workflows for parameter details.

  4. Add the users_task node:

    1. On the node canvas, click Add Node.

    2. In the Add Node panel, select users_task from the Source Path drop-down list, then click Save.

  5. Add the users_count node:

    1. Click Add Node.

    2. In the Add Node panel, select users_count from the Source Path drop-down list, select users_task from the Upstream Nodes drop-down list, then click Save.

  6. Click Publish Workflow.

  7. In the Publish dialog box, enter a description and click OK.
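
Conceptually, the workflow you just published is a small directed graph: users_count declares users_task as its upstream node, so it runs only after users_task succeeds. The following Python sketch (purely illustrative, reusing the node names from this guide) shows the execution order such a dependency graph implies:

```python
from graphlib import TopologicalSorter

# Maps each node to the set of upstream nodes that must finish first.
workflow = {
    "users_task": set(),            # no upstream nodes
    "users_count": {"users_task"},  # upstream node: users_task
}

order = list(TopologicalSorter(workflow).static_order())
print(order)  # ['users_task', 'users_count']
```

Adding more nodes with Upstream Nodes settings extends this graph in the same way: a node becomes runnable only once all of its upstream nodes have completed.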

Step 3: Run the workflow

  1. On the Workflows page, click the workflow name (for example, spark_workflow_task).

  2. On the Workflow Runs page, click Run.

    Note

    After you configure a scheduling cycle, you can also start the schedule on the Workflows page by turning on the switch.

  3. In the Start Workflow dialog box, click OK.

Step 4: Monitor the run

  1. On the Workflows page, click the target workflow.

  2. On the Workflow Runs page, view all workflow instances with their runtime and status.

  3. Click a Workflow Run ID in the Workflow Runs section, or open the Workflow Runs Diagram tab, to view the instance graph.

  4. Click a node instance to open the node information dialog, where you can perform operations or view details. See View node instances for available operations. Click Spark UI to open the Spark Jobs page and view real-time task information.

  5. Click the job name to open the Job History page, where you can review metrics, diagnostics, and logs.

Step 5: Manage the workflow

On the Workflows page, click the workflow name to open the Workflow Runs page:

  • In the Workflow Information section, edit workflow parameters as needed.

  • In the Workflow Runs section, view all workflow instances. Click a Workflow Run ID to open the corresponding instance graph.

Step 6: Verify the data

  1. In the left navigation pane, click Development.

  2. Create a SparkSQL job and run the following query to confirm the data was written:

    SELECT * FROM students;

    The query returns the two rows inserted by the users_task job.

What's next