This topic describes the precautions, scenarios, features, and startup methods of two Spark SQL execution modes.

Background information

AnalyticDB for MySQL Data Lakehouse Edition (V3.0) provides two Spark SQL execution modes: batch and interactive. In each mode, you can read and write databases and tables of AnalyticDB for MySQL by using the metadata feature of AnalyticDB for MySQL.

Batch execution mode

Precautions

  • Before you execute an SQL statement in batch mode, you must execute the USE <database_name>; statement to select a database.
  • When you specify a table in an SQL statement, use the database_name.table_name format.
  • When you execute a DML, DDL, or DQL statement in batch mode, the system returns a message that indicates whether the execution succeeded, but does not return data. Sample results of successful SQL statements are displayed in the logs. For information about how to view the returned data of SQL statements, see View information about a Spark application.
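
For example, a batch-mode script that follows the preceding rules might look like the following. The database and table names are hypothetical and used only for illustration:

```sql
-- Select a database first, as required in batch mode.
USE adb_demo;

-- Reference tables in the database_name.table_name format.
INSERT INTO adb_demo.target_table
SELECT id, name
FROM adb_demo.source_table;
```

In batch mode, this script returns only a success or failure message; sample rows, if any, appear in the application logs.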

Scenarios

  • You need to execute SQL statements that depend on each other.
  • The SQL statements require strict resource isolation.
  • A large amount of data is involved. For example, an SQL statement performs one-time extract, transform, and load (ETL) operations.
  • You need to upload complex third-party dependency packages that may be repeatedly tested and replaced.

Features

  • An SQL statement that is submitted in batch mode runs in an individual Spark application to ensure stability.
  • You can specify a statement-level configuration in an SQL statement, such as SET spark.sql.adaptive.coalescePartitions.minPartitionSize = 2MB;.
  • If SELECT statements are contained in the SQL statements that are executed in batch mode, sample execution results of the SELECT statements are displayed in logs.
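
The two features above can be combined in a single batch-mode submission. In the following sketch, the table name is hypothetical:

```sql
-- A statement-level configuration, as described in the feature list above.
SET spark.sql.adaptive.coalescePartitions.minPartitionSize = 2MB;

-- A SELECT statement executed in batch mode. Sample execution results
-- are displayed in the logs rather than returned as query results.
SELECT id, name FROM adb_demo.test;
```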

Startup methods

On the SQLConsole tab, select a job resource group and the Spark engine. When you execute an SQL statement, click Proceed in the message that appears.

Interactive execution mode

Precautions

  • When you execute a DQL statement in interactive mode, the system returns up to 1,000 rows of result data.
  • When you execute a DDL or DML statement in interactive mode, the system returns a message of execution success or failure, but does not return data. For example, if you execute an INSERT statement on the test table, the system returns a message of execution success or failure, but does not return the written data. This behavior is consistent with open source Spark SQL.
  • If you enter multiple SQL statements that end with semicolons (;), the Spark engine executes only the last SQL statement.
  • A period of time is required to start a Thrift server. If the Thrift server fails to start, wait a while and try again.
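
The semicolon rule above means that a multi-statement submission is not executed as a script. In the following sketch, the table names are hypothetical:

```sql
-- If you submit both statements together in interactive mode,
-- the Spark engine executes only the last statement.
SELECT * FROM adb_demo.orders;   -- not executed
SELECT * FROM adb_demo.users;    -- executed
```

To run both queries, submit each statement separately.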

Scenarios

  • A data computing operation that does not require all data to be returned is performed.
  • A large number of DDL statements must be executed.
  • A DQL statement must be executed immediately after it is submitted. The statement does not require strict resource isolation and can tolerate occasional execution failures.

Features

  • The SQL statements in the same resource group are executed by a shared Thrift server, and up to 1,000 rows of result data are displayed in the AnalyticDB for MySQL console.
  • Resources are isolated at the thread level. If multiple users execute SQL statements in the same Spark application, the SQL statements may interfere with each other.
  • Configurations that you specify in SQL statements take effect at the thread level.
  • Application-level configurations take effect only after you restart the Thrift server. To modify application-level configurations, stop the Thrift server, reconfigure the parameters, and then restart the Thrift server. For more information about how to start and stop a Thrift server, see Start a Thrift server.

Startup methods

If you use a job resource group to execute SQL statements in interactive mode, you must start a Thrift server. To start a Thrift server, use one of the following methods:
  • On the SQLConsole tab, select a job resource group and the Spark engine. When you execute an SQL statement, click Start Thrift Server in the message that appears.
  • On the Resource Groups page, find the job resource group for which you want to start a Thrift server and click Advanced Settings in the Actions column. Configure the parameters and click Start. For more information, see Start a Thrift server.

After you start a Thrift server for a job resource group, SQL statements are executed in interactive mode.