The Python environment for EMR Serverless Spark includes pre-installed libraries such as matplotlib, numpy, and pandas. To use other third-party libraries, you must create a runtime environment.
Prerequisites
A workspace must be created. For more information, see Manage workspaces.
Create a runtime environment
Go to the Runtime Environment Management page.
Log on to the E-MapReduce console.
In the left-side navigation pane, choose EMR Serverless > Spark.
On the Spark page, click the name of the target workspace.
On the EMR Serverless Spark page, choose Environment in the navigation pane on the left.
Click Create Environment.
On the Create Environment page, configure the following parameters.
Name (Required): Enter a name for the runtime environment.
Description (Optional): Enter a description for the environment.
Resource Queue (Required): Select the queue that is used to initialize the environment. When you create the runtime environment, 1 core and 4 GB of resources from this queue are used for initialization. The resources are automatically released after the initialization is complete.
Network Connection (Optional): If you want to add PyPI libraries from sources other than the Alibaba Cloud source, select a network connection. The network connection is used to access the package source when the runtime environment is created. For more information about how to create a network connection, see Establish network connectivity between EMR Serverless Spark and other VPCs.
Python Version (Required): Python 3.8 is used by default. You can select another version as needed. Make sure that the selected Python version is compatible with the target Python libraries to prevent packaging failures or runtime errors caused by version mismatches.
Add library information.
Click Add Library.
In the New Library dialog box, select a Type, configure the related parameters, and then click OK.
Type: PyPI
PyPI Package: Enter the name and version of the PyPI library, for example, Plotly or Plotly==4.9.0. If you do not specify a version, the latest version is installed by default.
Package Source: Specify the source address for the PyPI package. If you leave this field empty, the Alibaba Cloud source is used by default. If you use a custom source address, make sure that you have selected an appropriate network connection.
Type: Workspace
From the Workspace drop-down list, select a file resource from the current workspace. If no resources are available, upload a file on the Artifacts page. Supported file types: .zip, .tar, .whl, .tar.gz, .jar, and .txt.
Note: If the file type is .txt, the system installs the Python libraries and versions that are specified in the file, similar to a requirements.txt file.
Type: OSS
Enter the path of the file stored in Alibaba Cloud OSS. Supported file types: .zip, .tar, .whl, .tar.gz, .jar, and .txt.
Note: If the file type is .txt, the system installs the Python libraries and versions that are specified in the file, similar to a requirements.txt file. For an example of the .txt format, see the sketch that follows this table.
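A .txt file that you provide as a Workspace or OSS resource lists one library per line, optionally pinned to a version, in the same way as a requirements.txt file. The following content is a minimal sketch; the library names and versions are illustrative placeholders only.

    Plotly==4.9.0
    requests>=2.28.0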
Click Create. The environment initialization then starts.
Edit a runtime environment
You can edit a runtime environment to update the libraries it contains.
On the Environment page, find the target runtime environment and click Edit in the Actions column.
On the Edit Environment page, update the environment configuration.
Click Save Changes.
After you save the changes, the environment is re-initialized based on the updated configuration.
Note: After the environment is re-initialized, the changes do not take effect immediately in active Notebook sessions. To use the latest runtime environment in a Notebook session, you must restart the session resources.
Use a runtime environment
When a runtime environment is in the Ready state, you can use it in data development and in the corresponding sessions:
PySpark batch jobs: When a job starts, the system pre-installs the necessary libraries based on the selected runtime environment.
Job orchestration: When adding a Notebook node to a workflow, select the corresponding runtime environment.
Notebook sessions: When a Notebook session starts, libraries are pre-installed according to the selected environment.
Livy Gateway: When you submit a job through Livy Gateway, the resources required for the job are pre-configured based on the selected environment.
When you submit jobs by using Spark Submit, Apache Airflow, or Livy, specify the runtime environment by configuring the --conf spark.emr.serverless.environmentId=<runtime_environment_id> parameter.
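The following spark-submit command is a minimal sketch that shows where to place this configuration. The script name is a placeholder, and any other submission parameters that your setup requires are omitted; replace <runtime_environment_id> with the ID of your runtime environment.

    spark-submit \
      --conf spark.emr.serverless.environmentId=<runtime_environment_id> \
      your_pyspark_job.py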