You can develop PySpark jobs by writing a Python script with your business logic and uploading it to EMR Serverless Spark. This topic provides an example to guide you through the development process.
Prerequisites
You have an Alibaba Cloud account. For more information, see Account registration.
You have been granted the required roles. For more information, see Role authorization for an Alibaba Cloud account.
A workspace has been created. For more information, see Create a workspace.
Procedure
Step 1: Prepare test files
You can develop Python files on an on-premises machine or another development platform and then submit them to EMR Serverless Spark for execution. This quick start provides test files to help you quickly become familiar with PySpark jobs. Download the test files for use in the following steps.
Click DataFrame.py and employee.csv to download the test files.
The DataFrame.py file contains code that uses the Apache Spark framework to process data in OSS.
The employee.csv file contains sample employee data, including names, departments, and salaries.
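If you want a sense of what such a script looks like before downloading it, the following is a minimal PySpark sketch that reads employee.csv from OSS and aggregates salaries by department. It is only an illustration, not the contents of the downloadable DataFrame.py: the column names (department, salary) and the use of sys.argv to receive the input path are assumptions.

```python
# Illustrative sketch only -- not the actual DataFrame.py provided above.
# Assumes the OSS path of employee.csv is passed as the first command-line
# argument (the Execution Parameters value configured in Step 3), and that
# the CSV contains department and salary columns.
import sys

from pyspark.sql import SparkSession

if __name__ == "__main__":
    input_path = sys.argv[1]  # e.g. oss://<yourBucketName>/employee.csv

    spark = SparkSession.builder.appName("EmployeeDataFrameExample").getOrCreate()

    # Read the CSV file from OSS; treat the first row as the header.
    df = spark.read.csv(input_path, header=True, inferSchema=True)

    # Example transformation: average salary per department.
    result = df.groupBy("department").avg("salary")

    # Print the result to stdout so it appears in the job logs.
    result.show()

    spark.stop()
```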
Step 2: Upload the test files
Upload the Python file to EMR Serverless Spark.
Go to the resource upload page.
Log on to the EMR console.
In the navigation pane on the left, choose EMR Serverless > Spark.
On the Spark page, click the name of the target workspace.
On the EMR Serverless Spark page, in the left navigation pane, click Artifacts.
On the Artifacts page, click Upload File.
In the Upload File dialog box, click the upload area to select the Python file, or drag the file into the area.
In this example, upload the DataFrame.py file.
Upload the data file (employee.csv) to Object Storage Service (OSS) by using the OSS console. For more information, see Upload files.
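If you prefer to upload employee.csv to OSS programmatically instead of through the console, a minimal sketch using the OSS Python SDK (oss2) is shown below. The endpoint, bucket name, and credentials are placeholders that you must replace with your own values.

```python
# Minimal sketch: upload employee.csv to OSS with the oss2 SDK.
# The endpoint, bucket name, and credentials are placeholders.
import oss2

auth = oss2.Auth("<yourAccessKeyId>", "<yourAccessKeySecret>")
bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com", "<yourBucketName>")

# Upload the local employee.csv as oss://<yourBucketName>/employee.csv.
bucket.put_object_from_file("employee.csv", "employee.csv")
```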
Step 3: Develop and run the job
On the EMR Serverless Spark page, click Development in the navigation pane on the left.
On the Development tab, click the icon to create a job. In the dialog box that appears, enter a name, select PySpark for Type, and then click OK.
In the upper-right corner, select a queue.
For more information about how to add a queue, see Manage resource queues.
On the new job tab, configure the following parameters and keep the default settings for the others. Then, click Run.

Main Python Resource: Select the Python file that you uploaded on the Artifacts page in the previous step. In this example, select DataFrame.py.

Execution Parameters: Enter the OSS path of the data file (employee.csv) that you uploaded in Step 2. Example: oss://<yourBucketName>/employee.csv. The sketch after this list shows how the script receives this value.
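The value that you enter in Execution Parameters is passed to the main Python resource as command-line arguments when the job runs. Assuming DataFrame.py reads its input path from sys.argv, as in the sketch in Step 1, the mapping looks like this:

```python
# Hypothetical mapping, assuming the script reads sys.argv as in the Step 1 sketch.
# Execution Parameters: oss://<yourBucketName>/employee.csv
import sys

input_path = sys.argv[1]  # -> "oss://<yourBucketName>/employee.csv"
```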
After the job runs, in the Execution Records section below, click Logs in the Actions column for the job.
On the Log Exploration tab, you can view the log information.

Step 4: Publish the job
A published job can be used as a node in a workflow.
After the job runs, click Publish on the right.
In the Publish Job dialog box, enter the release information and click OK.
Step 5: View the Spark UI
After the job runs successfully, you can view its status on the Spark UI.
In the navigation pane on the left, click Job History.
On the Application page, in the Actions column for the target job, click Spark UI.
On the Spark Jobs page, you can view the job details.

References
After a job is published, you can use it in a workflow for scheduling. For more information, see Manage workflows. For a complete example of the job development and orchestration process, see Quick Start for SparkSQL development.
For an example of how to develop a PySpark streaming job, see Submit a PySpark streaming job using Serverless Spark.