Livy gateways and Kyuubi gateways provide APIs for you to submit jobs to E-MapReduce (EMR) Serverless Spark.
Background information
Livy is a service that interacts with Spark through RESTful APIs. With Livy, you can use open source projects such as the Airflow livy_operator and spark_magic to submit jobs to EMR Serverless Spark, query the job status, and obtain computing results.
Kyuubi provides Java Database Connectivity (JDBC) and Open Database Connectivity (ODBC) APIs for you to connect to EMR Serverless Spark by using SQL queries or BI tools such as Tableau and Power BI. Kyuubi allows you to isolate resources in a multi-tenant environment to meet the requirements of enterprise-level applications.
Manage Livy gateways
Create a Livy gateway
Go to the Gateways page.
Log on to the EMR console.
In the left-side navigation pane, go to the Spark page.
On the Spark page, find the desired workspace and click the name of the workspace.
In the left-side navigation pane of the EMR Serverless Spark page, go to the Gateways page.
Click the Livy Gateways tab. On the tab, click Create Livy Gateway.
On the Create Livy Gateway page, configure parameters and click Create. The following table describes the parameters.
Parameter: Description
Name: The name of the Livy gateway. The name can contain lowercase letters, digits, and hyphens (-). It must start and end with a letter or digit.
Livy Gateway Resources: The resource configurations. Default value: 1 CPU, 4 GB.
Livy Version: The Livy version. Default value: 0.8.0.
Engine Version: The version of the Spark engine that is used by the Livy gateway. For more information about engine versions, see Engine versions.
Use Fusion Acceleration: Specifies whether to enable Fusion acceleration. The Fusion engine helps accelerate the processing of Spark workloads and lower the overall cost of jobs. For more information about billing, see Billing. For more information about the Fusion engine, see Fusion engine.
Associated Queue: The queue in which the Livy gateway is deployed. When a Spark job is submitted by using a gateway, the Spark job is submitted by using the identity of the gateway creator.
Runtime Environment: The runtime environment. When you use a Livy gateway to submit a job, the resources used to run the job are pre-installed based on the runtime environment.
Automatic Stop: By default, the switch is turned off. After you turn on the switch for a gateway, the system automatically stops the gateway if no activity is detected in the gateway in the previous 45 minutes.
Authentication Method: The authentication mode. You can select only Token. After you create a gateway, you must generate a unique authentication token for the gateway. This way, you can use the token for identity authentication and access control when you submit requests over the gateway. For information about how to generate a token, see the Manage tokens section in this topic.
On the Livy Gateways tab, find the created Livy gateway and click Start in the Actions column.
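The naming rule for the Name parameter can be checked before you fill in the console form. The following is a minimal sketch, not part of the product: the function name is_valid_gateway_name is hypothetical, and the sketch assumes that only lowercase letters are allowed, that a single-character name is acceptable, and that no length limit applies, because this topic does not state otherwise.

```python
import re

# Lowercase letters, digits, and hyphens; the first and last characters must
# be a letter or digit. Assumption: no length limit is enforced here.
_NAME_PATTERN = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")


def is_valid_gateway_name(name: str) -> bool:
    """Return True if name satisfies the documented gateway naming rule."""
    return _NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("livy-gw-01", "-livy", "Livy", "kyuubi-"):
        print(candidate, is_valid_gateway_name(candidate))
```

The console remains the authority on which names are accepted; this check only catches the obvious violations early.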
Manage tokens
To use a token, add --header `x-acs-spark-livy-token: <token>` to your requests, where <token> is the token that you create in the following steps.
On the Livy Gateways tab, find the desired Livy gateway and click Tokens in the Actions column.
On the Tokens tab, click Create Token.
In the Create Token dialog box, configure parameters and click OK. The following table describes the parameters.
Parameter: Description
Name: The name of the token.
Expired At: The validity period of the token. The validity period must be greater than or equal to 1 day. By default, this parameter is enabled and set to 365 days.
Copy the token.
Important: After you create the token, you must immediately copy the token. You can no longer view the token after you leave the page. If the token expires or is lost, reset the token or create a new token.
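Once you have a token, each request to the Livy gateway carries it in the x-acs-spark-livy-token header. The following is a minimal sketch that builds such a request with the Python standard library without sending it. It assumes that the gateway serves the open source Livy REST path POST /sessions directly under its endpoint, which this topic does not confirm. The function name build_create_session_request is hypothetical, and <endpoint> and <token> are placeholders.

```python
import json
import urllib.request

TOKEN_HEADER = "x-acs-spark-livy-token"  # header name taken from this topic


def build_create_session_request(
    endpoint: str, token: str, kind: str = "pyspark"
) -> urllib.request.Request:
    """Build (but do not send) a POST /sessions request that carries the token.

    Assumption: the gateway exposes the open source Livy REST paths
    directly under its endpoint.
    """
    body = json.dumps({"kind": kind}).encode("utf-8")
    request = urllib.request.Request(
        url=endpoint.rstrip("/") + "/sessions",
        data=body,
        method="POST",
    )
    request.add_header("Content-Type", "application/json")
    request.add_header(TOKEN_HEADER, token)
    return request


if __name__ == "__main__":
    # Placeholders: replace with the endpoint and token of your gateway.
    req = build_create_session_request("https://<endpoint>", "<token>")
    print(req.get_method(), req.full_url)
    # To send the request, call urllib.request.urlopen(req).
```

Keeping the request construction separate from the network call makes it easy to inspect the headers before anything is sent.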
View information about a Spark session
After you create a Spark session by using the Livy interface, you can view the information about the Spark session, such as the session ID and status, on the Sessions tab of a specified Livy gateway.
On the Livy Gateways tab, find the desired Livy gateway and click the name of the gateway.
Click the Sessions tab.
The Sessions tab lists the Spark sessions that were created by using the Livy interface, together with their session IDs and statuses.
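The session ID and status shown in the console can also be read from the JSON document that the open source Livy REST API returns for a session. The following is a minimal sketch that assumes the gateway returns the standard Livy fields id and state. The sample payload is illustrative and was not captured from a real gateway, and the function name summarize_session is hypothetical.

```python
import json
from typing import Tuple


def summarize_session(payload: str) -> Tuple[int, str]:
    """Extract (session ID, state) from a Livy session JSON document."""
    session = json.loads(payload)
    return session["id"], session["state"]


if __name__ == "__main__":
    # Illustrative payload only, not output from a real gateway.
    sample = '{"id": 0, "kind": "pyspark", "state": "idle"}'
    session_id, state = summarize_session(sample)
    print(f"session {session_id} is {state}")
```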
Manage Kyuubi gateways
You can create only one Kyuubi gateway for each workspace.
Create a Kyuubi gateway
On the Kyuubi Gateways tab, click Create Kyuubi Gateway.
On the Create Kyuubi Gateway page, configure parameters and click Create. The following table describes the parameters.
Parameter: Description
Name: The name of the Kyuubi gateway. The name can contain only lowercase letters, digits, and hyphens (-). It must start and end with a letter or digit.
Kyuubi Gateway Resources: The resource configurations. Default value: 1 CPU, 4 GB.
Kyuubi Version: The Kyuubi version. Default value: 1.9.2.
Engine Version: The version of the Spark engine that is used by the Kyuubi gateway. For more information about engine versions, see Engine versions.
Associated Queue: The queue in which the Kyuubi gateway is deployed. When a Spark job is submitted by using a gateway, the Spark job is submitted by using the identity of the gateway creator.
Kyuubi Configuration: The Kyuubi configurations. Separate the key and value of a configuration item with spaces. Example: kyuubi.engine.pool.size 1. Only the following Kyuubi configuration items are supported:
  kyuubi.engine.pool.size
  kyuubi.engine.pool.size.threshold
  kyuubi.engine.share.level
  kyuubi.engine.single.spark.session
  kyuubi.session.engine.idle.timeout
  kyuubi.session.engine.initialize.timeout
  kyuubi.engine.security.token.max.lifetime
  kyuubi.session.engine.check.interval
  kyuubi.session.idle.timeout
  kyuubi.session.engine.request.timeout
  kyuubi.session.engine.login.timeout
  kyuubi.backend.engine.exec.pool.shutdown.timeout
  kyuubi.backend.server.exec.pool.shutdown.timeout
  kyuubi.backend.server.exec.pool.keepalive.time
  kyuubi.frontend.thrift.login.timeout
  kyuubi.operation.status.polling.timeout
Spark Configuration: The Spark configurations. Separate the key and value of a configuration item with spaces. Example: spark.sql.catalog.paimon.metastore dlf. spark.kubernetes.* configuration items are not supported.
Authentication Type: The authentication mode. You can select only Token. After you create a gateway, you must generate a unique authentication token for the gateway. This way, you can use the token for identity authentication and access control when you submit requests over the gateway.
On the Kyuubi Gateways tab, find the created Kyuubi gateway and click Start in the Actions column.
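The rules for the Kyuubi Configuration and Spark Configuration parameters (space-separated key and value, a fixed list of supported Kyuubi items, no spark.kubernetes.* items) can be checked locally before you paste the settings into the console. The following is a minimal sketch; the function names are hypothetical, and the one-item-per-line input format is an assumption about how you keep your settings, not a statement about how the console parses them.

```python
SUPPORTED_KYUUBI_KEYS = {
    "kyuubi.engine.pool.size",
    "kyuubi.engine.pool.size.threshold",
    "kyuubi.engine.share.level",
    "kyuubi.engine.single.spark.session",
    "kyuubi.session.engine.idle.timeout",
    "kyuubi.session.engine.initialize.timeout",
    "kyuubi.engine.security.token.max.lifetime",
    "kyuubi.session.engine.check.interval",
    "kyuubi.session.idle.timeout",
    "kyuubi.session.engine.request.timeout",
    "kyuubi.session.engine.login.timeout",
    "kyuubi.backend.engine.exec.pool.shutdown.timeout",
    "kyuubi.backend.server.exec.pool.shutdown.timeout",
    "kyuubi.backend.server.exec.pool.keepalive.time",
    "kyuubi.frontend.thrift.login.timeout",
    "kyuubi.operation.status.polling.timeout",
}


def parse_config(text: str) -> dict:
    """Parse lines of the form 'key value' (separated by spaces) into a dict."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            raise ValueError(f"missing value in configuration line: {line!r}")
        key, value = parts
        config[key] = value
    return config


def unsupported_kyuubi_keys(config: dict) -> list:
    """Return Kyuubi keys that are not in the supported list of this topic."""
    return sorted(key for key in config if key not in SUPPORTED_KYUUBI_KEYS)


def unsupported_spark_keys(config: dict) -> list:
    """Return Spark keys that match the unsupported spark.kubernetes.* pattern."""
    return sorted(key for key in config if key.startswith("spark.kubernetes."))


if __name__ == "__main__":
    kyuubi = parse_config("kyuubi.engine.pool.size 1")
    spark = parse_config("spark.sql.catalog.paimon.metastore dlf")
    print(unsupported_kyuubi_keys(kyuubi), unsupported_spark_keys(spark))
```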
Manage tokens
On the Kyuubi Gateways tab, find the desired gateway and click Tokens in the Actions column.
On the Tokens tab, click Create Token.
In the Create Token dialog box, configure parameters and click OK. The following table describes the parameters.
Parameter: Description
Name: The name of the token.
Expired At: The validity period of the token. The validity period must be greater than or equal to 1 day. By default, this parameter is enabled and set to 365 days.
Copy the token.
Important: After you create the token, you must immediately copy the token. You can no longer view the token after you leave the page. If the token expires or is lost, reset the token or create a new token.
Connect to a Kyuubi gateway
When you connect to a Kyuubi gateway, replace the following placeholders in the JDBC URL jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token> based on your business requirements:
<endpoint>: the endpoint that you obtain on the Overview tab of the gateway.
<port>: the port number. If you use the public endpoint to connect to the Kyuubi gateway, the port number is 443. If you use the internal endpoint to connect to the Kyuubi gateway, the port number is 80.
<token>: the token that you copy on the Tokens tab of the gateway.
<tokenname>: the name of the token. You can view the name of the token on the Tokens tab. This parameter is not part of the JDBC URL. It is required when you use Python to connect to a Kyuubi gateway.
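The placeholder rules above can be sketched as a small helper that assembles the JDBC URL and picks the port from the endpoint type. This is a minimal illustration: the function name build_jdbc_url and the public flag are hypothetical, and the host and token in the example are made up.

```python
PUBLIC_PORT = 443    # port for the public endpoint, as stated in this topic
INTERNAL_PORT = 80   # port for the internal endpoint, as stated in this topic


def build_jdbc_url(endpoint: str, token: str, public: bool = True) -> str:
    """Assemble the Kyuubi gateway JDBC URL from its documented parts."""
    port = PUBLIC_PORT if public else INTERNAL_PORT
    return (
        f"jdbc:hive2://{endpoint}:{port}/"
        f";transportMode=http;httpPath=cliservice/token/{token}"
    )


if __name__ == "__main__":
    # Made-up endpoint and token, for illustration only.
    print(build_jdbc_url("gw.example.invalid", "my-token", public=False))
```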
Use Beeline to connect to a Kyuubi gateway
When you use Beeline to connect to a Kyuubi gateway, make sure that the version of Beeline is compatible with the version of the Kyuubi server. If Beeline is not installed, install Beeline by referring to Getting Started.
beeline -u "jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>"
If you use this method to connect to a Kyuubi gateway, you can append session-related parameters to the URL and modify the parameter values. Example:
beeline -u "jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>;#spark.sql.shuffle.partitions=100;spark.executor.instances=2;"
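The session-parameter syntax in the Beeline example (a ;# separator followed by key=value; pairs) can be generated instead of typed by hand. The following is a minimal sketch that mirrors only the format shown in that example. The function name with_session_params is hypothetical, and the sketch does not escape special characters in keys or values.

```python
def with_session_params(jdbc_url: str, params: dict) -> str:
    """Append session parameters to a JDBC URL in the ';#k=v;k=v;' form."""
    if not params:
        return jdbc_url
    suffix = "".join(f"{key}={value};" for key, value in params.items())
    return f"{jdbc_url};#{suffix}"


if __name__ == "__main__":
    base = "jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>"
    url = with_session_params(
        base,
        {"spark.sql.shuffle.partitions": 100, "spark.executor.instances": 2},
    )
    # Argument list that could be passed to subprocess.run to start Beeline.
    print(["beeline", "-u", url])
```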
Use Java to connect to a Kyuubi gateway
Update the pom.xml file.
Replace the versions of the hadoop-common and hive-jdbc dependencies based on your business requirements.
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-jdbc</artifactId>
        <version>2.3.9</version>
    </dependency>
</dependencies>
Write Java code to connect to the desired Kyuubi gateway.
import org.apache.hive.jdbc.HiveStatement;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;

public class Main {
    public static void main(String[] args) throws Exception {
        // Replace <endpoint>, <port>, and <token> with the values of your gateway.
        String url = "jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>";
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(url);
        HiveStatement stmt = (HiveStatement) conn.createStatement();

        String sql = "select * from students;";
        System.out.println("Running " + sql);
        ResultSet res = stmt.executeQuery(sql);

        // Collect the column names once. JDBC column indexes start at 1.
        ResultSetMetaData md = res.getMetaData();
        String[] columns = new String[md.getColumnCount()];
        for (int i = 0; i < columns.length; i++) {
            columns[i] = md.getColumnName(i + 1);
        }

        // Print each row as Row N=[col='value', ...].
        while (res.next()) {
            System.out.print("Row " + res.getRow() + "=[");
            for (int i = 0; i < columns.length; i++) {
                if (i != 0) {
                    System.out.print(", ");
                }
                System.out.print(columns[i] + "='" + res.getObject(i + 1) + "'");
            }
            System.out.println("]");
        }

        // Release resources in reverse order of creation.
        res.close();
        stmt.close();
        conn.close();
    }
}
Use Python to connect to a Kyuubi gateway
Run the following command to install PyHive and Thrift:
pip3 install pyhive thrift
Write a Python script to connect to the desired Kyuubi gateway.
The following Python sample code shows how to connect to a Kyuubi gateway and query databases.
from pyhive import hive

if __name__ == '__main__':
    # Replace <endpoint>, <port>, <tokenname>, and <token> with the values of your gateway.
    conn = hive.connect('<endpoint>',
                        port="<port>",
                        scheme='http',
                        username='<tokenname>',
                        password='<token>')
    cursor = conn.cursor()
    cursor.execute('show databases')
    print(cursor.fetchall())
    cursor.close()
    conn.close()
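To label each value with its column name, as the Java example does, you can read cursor.description, which the Python DB-API defines and PyHive cursors provide. The following is a minimal sketch. The helper name rows_as_dicts is hypothetical, and an in-memory sqlite3 database stands in for the Kyuubi connection so that the sketch runs without a gateway.

```python
import sqlite3


def rows_as_dicts(cursor) -> list:
    """Return the remaining rows of a DB-API cursor as column-name dicts."""
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


if __name__ == "__main__":
    # Stand-in for hive.connect(...).cursor(); any DB-API cursor works here.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE students (id INTEGER, name TEXT)")
    cur.execute("INSERT INTO students VALUES (1, 'alice')")
    cur.execute("SELECT * FROM students")
    print(rows_as_dicts(cur))
    conn.close()
```

With a Kyuubi gateway, you would pass the PyHive cursor to the helper after calling cursor.execute.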