This topic describes how to use the Spark-SQL command-line tool and provides sample code.
Prerequisites
The Spark-SQL package is obtained.
You can download the dla-spark-toolkit.tar.gz package by running the following wget command:
wget https://dla003.oss-cn-hangzhou.aliyuncs.com/dla_spark_toolkit_1/dla-spark-toolkit.tar.gz
After you download the package, decompress it.
tar zxvf dla-spark-toolkit.tar.gz
Note To use the Spark-SQL command-line tool, make sure that JDK 8 or later is installed.
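A quick way to confirm that the JDK requirement is met is to check the version from the command line (the version string below is illustrative; the exact output varies by JDK distribution):
java -version
# Example output: openjdk version "1.8.0_292" -- any JDK 8 or later is sufficient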
Procedure
- View help information.
- Run the following command to view the help information:
cd /path/to/dla-spark-toolkit
./bin/spark-sql --help
- After the preceding command is executed, the following result is returned:
./spark-sql [options] [cli option]
Options:
  --keyId                    Your ALIYUN_ACCESS_KEY_ID, required
  --secretId                 Your ALIYUN_ACCESS_KEY_SECRET, required
  --regionId                 Your Cluster Region Id, required
  --vcName                   Your Virtual Cluster Name, required
  --oss-keyId                Your ALIYUN_ACCESS_KEY_ID to upload local resource to oss
                             default is same as --keyId
  --oss-secretId             Your ALIYUN_ACCESS_KEY_SECRET, default is same as --secretId
  --oss-endpoint             Oss endpoint where the resource will upload. default is
                             http://oss-$regionId.aliyuncs.com
  --oss-upload-path          the user oss path where the resource will upload
                             If you want to upload a local jar package to the OSS directory,
                             you need to specify this parameter
  --class CLASS_NAME         Your application's main class (for Java / Scala apps).
  --name NAME                A name of your application.
  --jars JARS                Comma-separated list of jars to include on the driver and
                             executor classpaths.
  --conf PROP=VALUE          Arbitrary Spark configuration property
  --help, -h                 Show this help message and exit.
  --driver-resource-spec     Indicates the resource specifications used by the driver:
                             small | medium | large
  --executor-resource-spec   Indicates the resource specifications used by the executor:
                             small | medium | large
  --num-executors            Number of executors to launch
  --properties-file          Default properties file location, only local files are supported
  --py-files PY_FILES        Comma-separated list of .zip, .egg, or .py files to place
                             on the PYTHONPATH for Python apps.
  --files FILES              Comma-separated list of files to be placed in the working
                             directory of each executor. File paths of these files in
                             executors can be accessed via SparkFiles.get(fileName).
                             Specially, you can pass in a custom log output format file
                             named `log4j.properties`
                             Note: The file name must be `log4j.properties` to take effect
  --status job_id            If given, requests the status and details of the job specified
  --verbose                  print more messages, enable spark-submit print job status and
                             more job details.

  List Spark Job Only:
  --list                     List Spark Job, should use specify --vcName and --regionId
  --pagenumber, -pn          Set page number which want to list (default: 1)
  --pagesize, -ps            Set page size which want to list (default: 1)

  Get Job Log Only:
  --get-log job_id           Get job log

  Kill Spark Job Only:
  --kill job_id,job_id       Comma-separated list of job to kill spark job with specific ids

  Spark Offline SQL options:
  -e <quoted-query-string>   SQL from command line
  -f <filename>              SQL from files
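As an illustration, the job management options listed above can be combined as follows. This sketch assumes that keyId, secretId, regionId, and vcName are already set in the conf/spark-defaults.conf file (described in the next step); the job ID is a placeholder.
## List the first 10 Spark jobs of the virtual cluster
./bin/spark-sql --list -pn 1 -ps 10
## Query the status of a job, then fetch its log (replace <job_id> with a real job ID)
./bin/spark-sql --status <job_id>
./bin/spark-sql --get-log <job_id>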
- Use the spark-defaults.conf file to configure common parameters.
The spark-defaults.conf file allows you to configure the following parameters. Only the common parameters in the spark conf section are listed.
# cluster information
# AccessKeyId
#keyId =
# AccessKeySecret
#secretId =
# RegionId
#regionId =
# set vcName
#vcName =
# set OssUploadPath, if you need upload local resource
#ossUploadPath =

##spark conf
# driver specifications : small 1c4g | medium 2c8g | large 4c16g
#spark.driver.resourceSpec =
# executor instance number
#spark.executor.instances =
# executor specifications : small 1c4g | medium 2c8g | large 4c16g
#spark.executor.resourceSpec =
# when use ram, role arn
#spark.dla.roleArn =
# when use option -f or -e, set catalog implementation
#spark.sql.catalogImplementation =
# config dla oss connectors
#spark.dla.connectors = oss
# config eni, if you want to use eni
#spark.dla.eni.enable =
#spark.dla.eni.vswitch.id =
#spark.dla.eni.security.group.id =
# config log location, need an oss path to store logs
#spark.dla.job.log.oss.uri =
# config spark access dla metadata
#spark.sql.hive.metastore.version = dla
Note
- The spark-submit scripts automatically read the spark-defaults.conf file in the conf folder.
- Command-line parameters take precedence over the spark-defaults.conf file.
- For mappings between regions and regionIds, see Regions and zones.
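For reference, a filled-in spark-defaults.conf might look like the following. All values are placeholders; replace them with your own account, region, and virtual cluster information.
# cluster information (placeholder values)
keyId = <your-AccessKey-ID>
secretId = <your-AccessKey-Secret>
regionId = cn-hangzhou
vcName = <your-virtual-cluster-name>
ossUploadPath = oss://<your-bucket>/dla-spark/upload/

# spark conf
spark.driver.resourceSpec = medium
spark.executor.instances = 2
spark.executor.resourceSpec = medium
spark.dla.connectors = oss
spark.sql.hive.metastore.version = dla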
- Submit an offline SQL job.
The Spark-SQL command-line tool provides the -e option to execute multiple SQL statements that are separated by semicolons (;). This tool also provides the -f option to execute the statements in an SQL file, where each SQL statement also ends with a semicolon (;). You can place the configuration that you would otherwise pass with --conf into the spark-defaults.conf file in the conf folder and submit the job in the following format:
$ ./bin/spark-sql \
  --verbose \
  --name offlinesql \
  -e "select * from t1;insert into table t1 values(4,'test');select * from t1"

## You can also place SQL statements in a file. Separate the statements in the file with
## semicolons (;) and use the -f option to specify the path of the SQL file.
$ ./bin/spark-sql \
  --verbose \
  --name offlinesql \
  -f /path/to/your/sql/file

## The following result is returned:
++++++++++++++++++executing sql: select * from t1
| id|name|
|  1|  zz|
|  2|  xx|
|  3|  yy|
|  4|test|
++++++++++++++++++ end ++++++++++++++++++
++++++++++++++++++executing sql: insert into table t1 values(4,'test')
||
++++++++++++++++++ end ++++++++++++++++++
++++++++++++++++++executing sql: select * from t1
| id|name|
|  1|  zz|
|  2|  xx|
|  3|  yy|
|  4|test|
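For example, an SQL file passed with -f that produces the output shown above could contain the following statements. The table t1 comes from the preceding example, and the file path is arbitrary.
-- /path/to/your/sql/file (example content)
select * from t1;
insert into table t1 values(4,'test');
select * from t1;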