
E-MapReduce: Create a Data Science cluster

Last Updated: Jan 26, 2025

This topic describes how to use an Alibaba Cloud account to log on to the E-MapReduce (EMR) console and create a cluster on the EMR on ACK page.

Prerequisites

  • The AliyunOSSFullAccess and AliyunDLFFullAccess policies are attached to the Alibaba Cloud account. For more information, see Attach policies to a RAM role.

  • A Container Service for Kubernetes (ACK) cluster is created. For more information, see Create an ACK dedicated cluster or Create an ACK managed cluster.

    Important: Before you create the ACK cluster, note the following limits:

    • Kubernetes version: Only Kubernetes versions 1.22 to 1.24 are supported.

    • vCPU: The number of vCPUs must be greater than or equal to 16.

    • Memory: The memory size must be greater than or equal to 64 GiB.

    • Instance type:

      • Only general-purpose, compute-optimized, and memory-optimized instance families are supported.

      • Only instance types in the ecs.g5, ecs.g6, and ecs.g7 families, and instance types with higher specifications, are supported.

  • A node pool is created. For more information, see Create and manage a node pool.
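The version, vCPU, and memory limits above can be checked before you start. The following Python sketch is a hypothetical helper, not part of EMR or ACK. It assumes you already know the Kubernetes version string (ACK reports versions such as 1.24.6-aliyun.1) and the vCPU and memory figures you want to check. This topic does not say whether the vCPU and memory limits apply per node or to the whole cluster, so pass whichever figures you intend to compare. The instance-type rule is left out because "higher specifications" is not precisely defined here.

```python
import re
from typing import List, Tuple

# Limits stated in the Prerequisites section of this topic.
MIN_K8S = (1, 22)
MAX_K8S = (1, 24)
MIN_VCPUS = 16
MIN_MEMORY_GIB = 64


def parse_minor_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as 'v1.24.6-aliyun.1'."""
    match = re.match(r"^v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_ack_limits(k8s_version: str, vcpus: int, memory_gib: float) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    # Tuple comparison checks major first, then minor.
    if not MIN_K8S <= parse_minor_version(k8s_version) <= MAX_K8S:
        problems.append(f"Kubernetes {k8s_version} is outside 1.22 to 1.24")
    if vcpus < MIN_VCPUS:
        problems.append(f"{vcpus} vCPUs is fewer than {MIN_VCPUS}")
    if memory_gib < MIN_MEMORY_GIB:
        problems.append(f"{memory_gib} GiB of memory is less than {MIN_MEMORY_GIB} GiB")
    return problems


if __name__ == "__main__":
    print(check_ack_limits("1.24.6-aliyun.1", 16, 64))  # []
    for problem in check_ack_limits("1.26.3", 8, 32):
        print(problem)
```

Returning a list of problems rather than raising on the first failure lets you see every unmet limit in one run.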

Precautions

Each ACK cluster can be associated with only one Data Science cluster.

Procedure

  1. Log on to the EMR console. In the left-side navigation pane, click EMR on ACK.

  2. On the EMR on ACK page, click Create Cluster.

  3. On the E-MapReduce on ACK page, configure the following parameters.

    • Region: The region in which you want to create the cluster. You cannot change the region after the cluster is created.

    • Cluster Type: Data Science. Data Science clusters are commonly used in big data and AI scenarios. They support offline extract, transform, and load (ETL) of big data based on Hive and Spark, as well as TensorFlow model training. You can use the CPU+GPU heterogeneous computing framework and deep learning algorithms supported by NVIDIA GPUs to run computing jobs more efficiently.

    • Product Version: The EMR version. By default, the latest version is selected.

    • Component Version: Displays the types and versions of the components that are deployed in a cluster of the selected type.

    • ACK Cluster: Select an existing ACK cluster, or create one in the ACK console.

      Note: A Data Science cluster uses the following namespaces: anonymous, cert-manager, fluid-system, ingress-nginx, istio-system, knative-serving, kubeflow, kubernetes-dashboard, and monitoring. If your ACK cluster already contains any of these namespaces, they are overwritten when the Data Science cluster associated with the ACK cluster is created.

    • Configure Dedicated Nodes: Click Configure Dedicated Nodes to set up EMR-dedicated nodes. To make a node or node pool EMR-dedicated, add taints and labels to it. The node or node pool can then be used only for EMR.

    • Cluster Name: The name of the cluster. The name must be 1 to 64 characters in length and can contain only letters, digits, hyphens (-), and underscores (_).

  4. Click Create.

    When the cluster status changes to Running, the cluster is created.
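Two of the step 3 parameters are easy to get wrong: the cluster name format and namespaces that already exist in the ACK cluster. The following Python sketch is a hypothetical pre-flight check, not an EMR API. It encodes the naming rule and the namespace list from step 3 and assumes that "letters" means ASCII letters only. The existing namespace names can be obtained with kubectl, for example `kubectl get namespaces -o jsonpath='{.items[*].metadata.name}'`, which prints them separated by spaces.

```python
import re
from typing import Iterable, List

# Namespaces that step 3 says a Data Science cluster uses and overwrites.
DATA_SCIENCE_NAMESPACES = frozenset({
    "anonymous", "cert-manager", "fluid-system", "ingress-nginx", "istio-system",
    "knative-serving", "kubeflow", "kubernetes-dashboard", "monitoring",
})

# Assumption: "letters" means ASCII letters; the rule allows 1 to 64 characters.
_CLUSTER_NAME = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_cluster_name(name: str) -> bool:
    """True if name is 1 to 64 letters, digits, hyphens, or underscores."""
    # fullmatch requires the whole string to match, including its length.
    return _CLUSTER_NAME.fullmatch(name) is not None


def conflicting_namespaces(existing: Iterable[str]) -> List[str]:
    """Return, sorted, the existing namespaces that would be overwritten."""
    return sorted(DATA_SCIENCE_NAMESPACES.intersection(existing))


if __name__ == "__main__":
    # Example input shaped like the kubectl jsonpath output, split on spaces.
    existing = "default kube-system monitoring kubeflow".split()
    print(conflicting_namespaces(existing))  # ['kubeflow', 'monitoring']
    print(is_valid_cluster_name("emr-ds_01"))  # True
    print(is_valid_cluster_name("bad name!"))  # False
```

If the check reports any conflicting namespaces, back up or move whatever they contain before you click Create.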