
AnalyticDB:Spark SQL execution modes

Last Updated: May 13, 2025

AnalyticDB for MySQL provides two Spark SQL execution modes: batch and interactive. In both modes, you can use the metadata feature of AnalyticDB for MySQL to read and write its databases and tables. This topic describes the usage notes, scenarios, features, and startup methods of the two execution modes.

Batch execution mode

Usage notes

  • When you execute an SQL statement in batch mode, you must first execute the USE <database_name>; statement to select a database.

  • When you specify a table in an SQL statement, you must specify the table in the database_name.table_name format.

  • When you execute a DML, DDL, or DQL statement in batch mode, the system returns a message that indicates execution success or failure, but does not return data. Sample results of successful SQL statements are displayed in logs. For information about how to view the returned data of SQL statements, see the "View information about a Spark application" section of the Spark editor topic.
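The notes above can be illustrated with a short batch-mode script. This is a sketch only; the database name adb_demo and table name customer are hypothetical:

```sql
-- A database must be selected first in batch mode.
USE adb_demo;

-- Tables must be qualified in the database_name.table_name format.
SELECT id, name
FROM adb_demo.customer
WHERE created_at >= '2024-01-01';
```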

Scenarios

  • Mutually dependent SQL statements are executed.

  • Resource isolation is highly required for SQL statements.

  • A large amount of data is involved. For example, an SQL statement is executed to perform extract-transform-load (ETL) operations at one time.

  • Complex third-party dependency packages must be uploaded and may be repeatedly tested and replaced.

Features

  • Each SQL statement that is submitted in batch mode runs in a separate Spark application, which ensures stability.

  • You can execute an SQL statement to specify an independent configuration, such as SET spark.sql.adaptive.coalescePartitions.minPartitionSize = 2MB;.

  • If SELECT statements are contained in the SQL statements that are executed in batch mode, sample execution results of the SELECT statements are displayed in logs.
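The features above can be combined in a single batch job. The following is a sketch of an ETL-style job; the database, table, and column names are hypothetical:

```sql
-- A statement-level configuration for this batch application.
SET spark.sql.adaptive.coalescePartitions.minPartitionSize = 2MB;

USE adb_demo;

-- An ETL-style statement; the job runs in its own Spark application.
INSERT INTO adb_demo.daily_summary
SELECT dt, COUNT(*) AS cnt
FROM adb_demo.events
GROUP BY dt;

-- Sample rows of this SELECT appear in the application logs,
-- not in the SQLConsole result set.
SELECT * FROM adb_demo.daily_summary LIMIT 10;
```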

Startup methods

On the SQLConsole tab, select the Spark engine and a job resource group. After you enter an SQL statement, click Execute.

Interactive execution mode

Usage notes

  • When you execute a DML or DQL statement in interactive mode, the system returns up to 1,000 rows of result data.

  • When you execute a DDL statement in interactive mode, the system returns a message that indicates execution success or failure, but does not return data. For example, if you execute a CREATE TABLE statement, the system returns only a success or failure message, not table data. This behavior is consistent with open source Spark SQL.

  • A Spark interactive resource group requires some time to start. If the resource group fails to start, wait a while and try again.
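The difference between DDL and DQL results in interactive mode can be seen in the following sketch; the table and column names are hypothetical:

```sql
-- DDL: returns only a success or failure message, no data.
CREATE TABLE adb_demo.orders (
  order_id BIGINT,
  amount   DOUBLE
);

-- DQL: returns result rows directly, capped at 1,000 rows.
SELECT order_id, amount
FROM adb_demo.orders
LIMIT 100;
```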

Scenarios

  • A data computing operation that does not require all data to be returned is performed.

  • A large number of DDL statements must be executed.

  • A DQL statement must be executed immediately after it is submitted. The statement does not require strict resource isolation and can tolerate failures caused by the lack of isolation.

Features

  • Resources are isolated at the thread level. If multiple users execute SQL statements in the same Spark application, the SQL statements may interfere with each other.

  • Configurations that you specify in SQL statements take effect at the thread level.

  • Application-level configurations take effect only after you restart the Spark interactive resource group. To modify application-level configurations, stop the Spark interactive resource group, reconfigure the parameters, and then restart the Spark interactive resource group.
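The thread-level behavior described above can be sketched as follows; the configuration value and table names are hypothetical examples:

```sql
-- Thread-level setting: applies to subsequent statements in this
-- session without restarting the Spark interactive resource group.
SET spark.sql.shuffle.partitions = 64;

-- This query runs with the setting above.
SELECT dt, COUNT(*) AS cnt
FROM adb_demo.events
GROUP BY dt;
```

Application-level parameters, by contrast, must be changed in the resource group configuration and take effect only after the resource group is restarted.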

Startup methods

On the SQLConsole tab, select the Spark engine and a Spark interactive resource group. After you enter an SQL statement, click Execute.