
E-MapReduce:Connect to EMR Serverless Spark by using Livy Interpreter for Apache Zeppelin

Last Updated: Apr 10, 2025

Apache Zeppelin provides an interactive development environment that enables users to write code, run queries, and perform data visualization and analysis in a web UI. This topic describes how to connect to E-MapReduce (EMR) Serverless Spark by using Livy Interpreter for Apache Zeppelin to efficiently build and optimize an interactive development environment.

Prerequisites

An EMR Serverless Spark workspace is created, and Apache Zeppelin is installed and accessible.

Procedure

Step 1: Create a gateway and a token

  1. Create and start a gateway.

    1. Go to the Gateways page.

      1. Log on to the EMR console.

      2. In the left-side navigation pane, choose EMR Serverless > Spark.

      3. On the Spark page, find the desired workspace and click the name of the workspace.

      4. In the left-side navigation pane of the EMR Serverless Spark page, choose Operation Center > Gateways.

    2. On the Gateways page, click the Livy Gateways tab.

    3. On the Livy Gateways tab, click Create Livy Gateway.

    4. On the Create Livy Gateway page, set the Name parameter and click Create. In this example, the Name parameter is set to Livy-gateway.

      You can configure other parameters based on your business requirements. For more information, see Manage gateways.

    5. On the Livy Gateways tab, find the created gateway and click Start in the Actions column.

  2. Create a token.

    1. On the Gateways page, find the gateway Livy-gateway and click Tokens in the Actions column.

    2. On the Tokens tab, click Create Token.

    3. In the Create Token dialog box, configure the Name parameter and click OK.

    4. Copy the token.

      Important

      Copy the token immediately after it is created. After you leave the page, you can no longer view the token. If the token expires or is lost, reset it or create a new one.
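Before you configure Zeppelin, you can verify the gateway endpoint and token with a quick smoke test against the Livy REST API. The following Python sketch builds an authenticated GET /sessions request using only the standard library; the endpoint and token values are placeholders that you must replace with the internal endpoint of your Livy gateway and the token you just copied.

```python
# Sketch: verify a Livy gateway token by listing sessions over the Livy REST API.
# The endpoint and token below are placeholders for illustration.
import json
import urllib.request


def build_livy_request(endpoint: str, token: str, path: str = "/sessions"):
    """Build an authenticated request for the Livy gateway.

    The x-acs-spark-livy-token header carries the gateway token that you
    created on the Tokens tab.
    """
    return urllib.request.Request(
        url=f"http://{endpoint}{path}",
        headers={"x-acs-spark-livy-token": token},
    )


def list_sessions(endpoint: str, token: str) -> dict:
    """Return the JSON body of GET /sessions. Raises on HTTP errors."""
    req = build_livy_request(endpoint, token)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

# Example (placeholder values, do not run as-is):
# sessions = list_sessions("<livy-gateway-internal-endpoint>", "<your-token>")
```

A 2xx response confirms that the endpoint is reachable and the token is valid; a 401 or 403 response usually indicates an expired or mistyped token.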

Step 2: Configure Livy Interpreter for Apache Zeppelin

  1. Log on to Apache Zeppelin, click the username in the upper-right corner, and then select Interpreter from the drop-down list.


  2. Click +Create in the upper-right corner and set the required parameters to create an interpreter.

    Interpreter Name: Enter a custom name, such as mylivy.

    Interpreter Group: Set this parameter to livy.

  3. After you set the Interpreter Group parameter to livy, configure the required parameters.


    The following list describes the required parameters. You can also configure other parameters based on your business requirements. For more information, see the official Apache Zeppelin documentation.

    zeppelin.livy.url: The URL of the Livy gateway, in the http://{endpoint} format. {endpoint} is the internal endpoint of the Livy gateway that you created.

    zeppelin.livy.session.create_timeout: The maximum amount of time, in seconds, that Apache Zeppelin waits for a session to be created. We recommend that you set this parameter to 600.

    zeppelin.livy.http.headers: The custom HTTP request header. Click the add icon to add the configuration and enter x-acs-spark-livy-token:{token}. {token} is the token that you created on the Tokens tab in Step 1.

  4. Click Save in the lower part of the page to save the settings.
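For reference, the three required interpreter properties from the steps above look like the following once saved. The endpoint and token are placeholders; use the values from your own Livy gateway.

```
zeppelin.livy.url                       http://<livy-gateway-internal-endpoint>
zeppelin.livy.session.create_timeout    600
zeppelin.livy.http.headers              x-acs-spark-livy-token:<your-token>
```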

Step 3: Create a notebook for data analytics

  1. In the top navigation bar, click Notebook. Then, select Create new note.

  2. Enter a custom note name and select mylivy from the Default Interpreter drop-down list.


  3. Click Create.

  4. Enter the following code in the created notebook to start a Spark session.

    The first startup takes 1 to 3 minutes. Enter %pyspark to use the Python environment, or %spark to use the Scala environment.

    %pyspark

    After the Spark session is started, you can view the link to the Spark UI and execute code. You can mix Python and Scala code in the same notebook.


  5. Enter the following code in the new notebook to query the available databases in the current Spark environment.

    %pyspark
    
    spark.sql("show databases").show()

    The output lists the databases that are available in the current Spark environment.
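Under the hood, each notebook paragraph is sent to the Livy gateway as a statement in the running session. The following Python sketch builds the equivalent POST /sessions/{id}/statements request with the standard library; the endpoint, session ID, and token are placeholders for illustration.

```python
# Sketch of what Zeppelin does under the hood: submit a statement to an
# existing Livy session over the Livy REST API. Endpoint, session ID, and
# token below are placeholders.
import json
import urllib.request


def submit_statement(endpoint: str, session_id: int, token: str, code: str,
                     kind: str = "pyspark") -> urllib.request.Request:
    """Build a POST /sessions/{id}/statements request for the Livy gateway."""
    body = json.dumps({"code": code, "kind": kind}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{endpoint}/sessions/{session_id}/statements",
        data=body,
        headers={
            "Content-Type": "application/json",
            "x-acs-spark-livy-token": token,  # gateway token from Step 1
        },
        method="POST",
    )

# The same query as in the notebook paragraph above:
req = submit_statement("gw.example.internal:8998", 0, "<your-token>",
                       'spark.sql("show databases").show()')
```

Sending the request with `urllib.request.urlopen(req)` would return a statement object whose output you can poll, which is exactly how the notebook paragraph receives its result.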

  6. Optional. View session information.

    After you create a Spark session by using the Livy interface, you can view information about the Spark session, such as the session ID and status, on the Sessions tab of a specified Livy gateway.

    1. On the Livy Gateways tab, find the desired Livy gateway and click the name of the gateway.

    2. Click the Sessions tab.

      On the Sessions tab, you can view information about the Spark session that is created by using the Livy interface.

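Besides the console, you can check a session's state over the same Livy REST API. Livy reports states such as "starting", "idle", "busy", and "dead"; a session is ready to accept statements once it is "idle". The following sketch builds a GET /sessions/{id} request and extracts the state from a response body; the endpoint and token are placeholders.

```python
# Sketch: check the state of a Livy session programmatically.
# Endpoint and token are placeholders for illustration.
import urllib.request


def session_state_request(endpoint: str, session_id: int, token: str):
    """Build a GET /sessions/{id} request for the Livy gateway."""
    return urllib.request.Request(
        url=f"http://{endpoint}/sessions/{session_id}",
        headers={"x-acs-spark-livy-token": token},
    )


def extract_state(session_json: dict) -> str:
    """Return the session state from a GET /sessions/{id} response body."""
    return session_json.get("state", "unknown")

# Example response body (abridged) and its state:
sample = {"id": 0, "state": "idle", "appId": "application_123"}
print(extract_state(sample))  # idle
```

Polling this endpoint until the state becomes "idle" is a simple way to wait for the session before submitting statements from a script.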