How do I use the Spark-SQL CLI? - Data Lake Analytics - Deprecated

This topic describes how to use the Spark-SQL CLI and provides sample code.

Important

Data Lake Analytics (DLA) is discontinued. AnalyticDB for MySQL Data Lakehouse Edition supports the existing features of DLA and provides more features and better performance. For more information about how to use AnalyticDB for MySQL to develop Spark applications, see Use spark-submit to develop Spark applications.

Prerequisites

The Spark-SQL CLI is installed and configured. For more information, see Use the spark-submit CLI.

Submit an offline SQL job

The Spark-SQL CLI provides the -e command to execute multiple SQL statements that are separated by semicolons (;). The CLI also provides the -f command to execute statements in an SQL file. The SQL statements in the file are separated by semicolons (;). You can place the configuration specified by the conf field into the spark-defaults.conf file in the conf folder and submit the file in the following format:

## Run the -e command to execute multiple statements that are separated by semicolons (;). The --database parameter specifies the default database.

$ ./bin/spark-sql \
--verbose \
--database mydb \
--name offlinesql \
-e "select * from t1;insert into table t1 values(4,'test');select * from t1" 


## You can also place SQL statements in a file and separate statements in the file with semicolons (;). Run the -f command to specify the local directory in which the SQL file is stored. 
The directory must be an absolute path.

$ ./bin/spark-sql \
--verbose \
--name offlinesql \
-f /path/to/your/sql/file


## The following result is returned:
++++++++++++++++++executing sql: select * from t1
| id|name|
|  1|  zz|
|  2|  xx|
|  3|  yy|
|  4|test|
++++++++++++++++++ end ++++++++++++++++++
++++++++++++++++++executing sql: insert into table t1 values(4,'test')
||
++++++++++++++++++ end ++++++++++++++++++
++++++++++++++++++executing sql: select * from t1
| id|name|
|  1|  zz|
|  2|  xx|
|  3|  yy|
|  4|test|


## The Spark-SQL CLI also supports variable substitution. You can use the -d key=value format to specify the value of a parameter, and use the ${key} format to reference the value of a key in SQL statements. 
$ ./bin/spark-sql \
--verbose \
--name variableSubstitution \
-d table=mytable \
-e "use <db>; select * from \${table}"

## The $ symbol has a special meaning. If you reference the variable table, you must escape the $ symbol. If you reference the variable table in an SQL file, you do not need to escape the $ symbol.
## After the variable substitution, the following SQL statement is used:
use <db>; select * from mytable