Create an E-MapReduce (EMR) Data Science cluster on an existing Container Service for Kubernetes (ACK) cluster to run big data ETL workloads and GPU-accelerated AI training jobs.
Prerequisites
Before you begin, ensure that you have:
The AliyunOSSFullAccess and AliyunDLFFullAccess policies attached to your Alibaba Cloud account. For details, see Attach policies to a RAM role.
An ACK cluster (dedicated or managed) that meets the following requirements. To create an ACK cluster, see Create an ACK dedicated cluster or Create an ACK managed cluster.

| Requirement | Value |
|---|---|
| Kubernetes version | 1.22–1.24 |
| vCPU | 16 or more |
| Memory | 64 GiB or more |
| Instance type | General-purpose, compute-optimized, or memory-optimized (ecs.g5, ecs.g6, ecs.g7, or higher) |

A node pool created in the ACK cluster. For details, see Create and manage a node pool.
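The numeric requirements above can be checked before you open the console. The following is a minimal sketch, not part of any EMR or ACK tooling: the function names are hypothetical, the caller supplies the version string and resource figures, and the instance type check is left out because it depends on how instance families are named in your account.

```python
import re
from typing import List, Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as 'v1.24.6-aliyun.1'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_ack_requirements(k8s_version: str, vcpus: int, memory_gib: float) -> List[str]:
    """Return a list of unmet requirements; an empty list means all checks pass."""
    problems = []
    # Supported Kubernetes versions per the table: 1.22 through 1.24.
    if not (1, 22) <= parse_major_minor(k8s_version) <= (1, 24):
        problems.append(f"Kubernetes version {k8s_version} is outside 1.22-1.24")
    if vcpus < 16:
        problems.append(f"{vcpus} vCPUs is fewer than the required 16")
    if memory_gib < 64:
        problems.append(f"{memory_gib} GiB of memory is less than the required 64 GiB")
    return problems


if __name__ == "__main__":
    for problem in check_ack_requirements("v1.25.3", 8, 64):
        print(problem)
```

An empty result only means the three numeric checks passed; the policy and node pool prerequisites still need to be confirmed separately.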
Each ACK cluster can be associated with only one Data Science cluster.
Creating a Data Science cluster overwrites the following namespaces in the associated ACK cluster: anonymous, cert-manager, fluid-system, ingress-nginx, istio-system, knative-serving, kubeflow, kubernetes-dashboard, and monitoring.
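Because these namespaces are overwritten, it can help to check whether the ACK cluster already uses any of them. The sketch below assumes you have collected the existing namespace names yourself (for example, from the output of `kubectl get namespaces`); the function name is hypothetical.

```python
from typing import Iterable, List

# Namespaces that Data Science cluster creation overwrites, as listed above.
RESERVED_NAMESPACES = frozenset({
    "anonymous",
    "cert-manager",
    "fluid-system",
    "ingress-nginx",
    "istio-system",
    "knative-serving",
    "kubeflow",
    "kubernetes-dashboard",
    "monitoring",
})


def overwritten_namespaces(existing: Iterable[str]) -> List[str]:
    """Return the existing namespaces that would be overwritten, sorted by name."""
    return sorted(set(existing) & RESERVED_NAMESPACES)


if __name__ == "__main__":
    current = ["default", "kube-system", "monitoring", "kubeflow"]
    print(overwritten_namespaces(current))
```

If the result is not empty, back up or relocate the workloads in those namespaces before you create the Data Science cluster.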
Create a Data Science cluster
Log on to the EMR console. In the left-side navigation pane, click EMR on ACK.
On the EMR on ACK page, click Create Cluster.
On the E-MapReduce on ACK page, configure the cluster parameters. See Parameter reference for details on each field.
Click Create.
The cluster is ready when its status changes to Running.
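If you script around cluster creation, waiting for the Running status might look like the sketch below. This procedure describes only the console, so `get_status` is a hypothetical callable that you would implement yourself to return the current status string. The timeout and polling interval are arbitrary defaults, not documented values.

```python
import time
from typing import Callable


def wait_until_running(
    get_status: Callable[[], str],
    timeout_s: float = 1800.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_status until it returns 'Running' or the timeout elapses.

    Returns True if the cluster reached Running, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status() == "Running":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so the helper can be exercised without real delays.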
Parameter reference
| Parameter | Description |
|---|---|
| Region | The region where the cluster is created. The region cannot be changed after the cluster is created. |
| Cluster type | Select Data Science. Data Science clusters support offline ETL with Hive and Spark, as well as TensorFlow model training on a CPU+GPU heterogeneous computing framework that uses NVIDIA GPUs for deep learning. This makes them suited to combined big data and AI workloads. |
| Product version | The EMR version to deploy. The latest version is selected by default. |
| Component version | Read-only. Displays the components and their versions included in the selected cluster type. |
| ACK Cluster | Select an existing ACK cluster, or go to the ACK console to create one. |
| Configure Dedicated Nodes | (Optional) Add taints and labels to a node or node pool to reserve it exclusively for EMR workloads. |
| Cluster name | A name for the cluster. Must be 1–64 characters and can contain letters, digits, hyphens (-), and underscores (_). |
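The cluster name rule can be validated up front. The following is a minimal sketch that assumes "letters" means ASCII letters, which the table does not state explicitly. The function name is hypothetical.

```python
import re

# 1 to 64 characters drawn from ASCII letters, digits, hyphens, and underscores.
# Assumption: non-ASCII letters are rejected; the documented rule does not say.
_CLUSTER_NAME = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_cluster_name(name: str) -> bool:
    """Return True if name satisfies the documented length and character rules."""
    return _CLUSTER_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["emr-ds_01", "", "bad name", "a" * 65]:
        print(repr(candidate[:20]), is_valid_cluster_name(candidate))
```

If the console accepts a wider character set than assumed here, widen the character class accordingly.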