Add a Spark SQL job to your E-MapReduce (EMR) project on the Data Platform. Once saved, the job is ready to include in a workflow or run on demand.
Prerequisites
Before you begin, make sure you have:
An EMR project. See Manage projects to create one.
How it works
EMR submits Spark SQL jobs using the following command:
spark-sql [options] [cli options] {SQL_CONTENT}| Parameter | Description |
|---|---|
options | Spark CLI flags passed via the SPARK_CLI_PARAMS environment variable. Set this in Job Settings > Advanced Settings > Environment Variables. Example: SPARK_CLI_PARAMS="--executor-memory 1g --executor-cores" |
cli options | Flags that control how SQL input is provided: -e <quoted-query-string> runs inline SQL; -f <filename> runs SQL from a file. |
SQL_CONTENT | The SQL statements entered in the Content field. |
By default, a Spark SQL job is submitted in yarn-client mode.
Create a Spark SQL job
Log on to the Alibaba Cloud EMR console.
In the top navigation bar, select the region where your cluster resides and select a resource group.
Click the Data Platform tab.
In the Projects section, find the project you want to edit and click Edit Job in the Actions column.
In the Edit Job pane on the left, right-click the folder where you want to add the job and select Create Job.
In the Create Job dialog box, fill in the following fields and click OK:
Name: Enter a name for the job.
Description: Add a description.
Job Type: Select SparkSQL from the drop-down list.
In the Content field, enter your Spark SQL statements. For example:
Note SQL statements cannot exceed 64 KB. EMR automatically appendsLIMIT 2000to SELECT statements.-- Show all databases show databases; -- Show all tables show tables; -- Query a table (LIMIT 2000 is automatically added to SELECT statements) select * from test1;Click Save.
Pass Spark configuration flags
To customize executor resources or other Spark settings for this job, set the SPARK_CLI_PARAMS environment variable:
On the job page, click Job Settings in the upper-right corner.
In the Job Settings panel, click the Advanced Settings tab.
In the Environment Variables section, click the + icon and add the variable. For example:
SPARK_CLI_PARAMS="--executor-memory 1g --executor-cores"
These flags are passed as the options argument when EMR submits the job.
What's next
After saving the job, you can:
Add it to a workflow and schedule it to run automatically.
Run it manually from the job page to verify the SQL output.
Check execution logs in the EMR console to confirm the job completed successfully.