This topic describes the Elastic Compute Service (ECS) instance families supported by E-MapReduce (EMR) and their application scenarios.

ECS instance families supported by EMR

  • General purpose

    This instance family uses cloud disks for storage. The vCPU-to-memory ratio is 1:4, for example, 8 vCPUs and 32 GiB of memory.

  • Compute optimized

    This instance family uses cloud disks for storage and provides more computing resources. The vCPU-to-memory ratio is 1:2, for example, 8 vCPUs and 16 GiB of memory.

  • Memory optimized

    This instance family uses cloud disks for storage and provides more memory resources. The vCPU-to-memory ratio is 1:8, for example, 8 vCPUs and 64 GiB of memory.

  • Big data

    This instance family uses local SATA disks, which makes storage highly cost-effective. We recommend this instance family if you need to store large volumes of data (at the terabyte level).

    Note: This instance family applies only to core nodes. Core nodes can be created only in Hadoop, Flink, and Druid clusters.

  • Local SSD type

    This instance family uses local SSDs for storage, which provide high local IOPS and throughput.

  • Shared type (entry level)

    Instances in this instance family share CPUs, so their performance is not stable under compute-intensive workloads. This instance family is suitable for entry-level users, not enterprise customers.

  • GPU

    This instance family provides heterogeneous GPU-based instances and applies to scenarios such as machine learning.
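
The vCPU-to-memory ratios listed above can be turned into a quick sizing check. The following is a minimal sketch, not an EMR or ECS API: the dictionary keys and the function name memory_gib are hypothetical labels chosen for illustration, and only the three families for which a ratio is stated above are included.

```python
# Memory (GiB) per vCPU, taken from the ratios stated for each family above.
# The keys are hypothetical labels, not official ECS instance family names.
MEMORY_GIB_PER_VCPU = {
    "general_purpose": 4,    # ratio 1:4
    "compute_optimized": 2,  # ratio 1:2
    "memory_optimized": 8,   # ratio 1:8
}


def memory_gib(family: str, vcpus: int) -> int:
    """Return the memory size in GiB implied by the family's ratio."""
    if vcpus <= 0:
        raise ValueError("vcpus must be a positive integer")
    if family not in MEMORY_GIB_PER_VCPU:
        raise KeyError(f"no ratio is documented for family: {family}")
    return vcpus * MEMORY_GIB_PER_VCPU[family]


if __name__ == "__main__":
    # Reproduces the 8-vCPU examples given in the family descriptions.
    for name in MEMORY_GIB_PER_VCPU:
        print(f"{name}: 8 vCPUs -> {memory_gib(name, 8)} GiB")
```

For 8 vCPUs this yields 32, 16, and 64 GiB, which matches the examples in the family descriptions.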

Application scenarios of instance families

  • Master nodes

    Instances in the general-purpose and memory-optimized instance families can serve as master nodes for EMR. They are suitable for scenarios where data is stored on cloud disks provided by Alibaba Cloud. Cloud disks keep three replicas of the data to ensure high data reliability.

  • Core nodes
    • Instances in the general-purpose, compute-optimized, and memory-optimized instance families can serve as core nodes for EMR. They are suitable for small volumes of data (below the terabyte level) and for scenarios in which Object Storage Service (OSS) is used as the primary data storage.
    • If the volume of data is 10 terabytes or more, we recommend that you use the big data instance family because it is more cost-effective.
    • If local disks are used, data reliability cannot be ensured because the data is maintained on the EMR platform.
  • Task nodes

    All instance families except the big data type can be used for task nodes, which supplement the computing capabilities of a cluster. Support for the local SSD type is in development.
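
The node-role guidance above can be summarized as a small lookup. The following is a minimal sketch, not part of any EMR SDK: the function name candidate_families, the family labels, and the data_tb parameter are hypothetical. The text does not say which family to prefer for data volumes between the terabyte level and 10 terabytes, so the sketch assumes the cloud-disk families apply to everything below 10 terabytes.

```python
# Hypothetical labels for the instance families described in this topic.
ALL_FAMILIES = [
    "general_purpose",
    "compute_optimized",
    "memory_optimized",
    "big_data",
    "local_ssd",
    "shared",
    "gpu",
]


def candidate_families(role: str, data_tb: float = 0.0) -> list[str]:
    """Return the instance families suggested for a node role.

    role is one of "master", "core", or "task".
    data_tb is the expected data volume in terabytes (used for core nodes only).
    """
    if role == "master":
        return ["general_purpose", "memory_optimized"]
    if role == "core":
        # 10 TB or more: the big data family is recommended as more cost-effective.
        if data_tb >= 10:
            return ["big_data"]
        # Assumption: anything below 10 TB falls back to the cloud-disk families.
        return ["general_purpose", "compute_optimized", "memory_optimized"]
    if role == "task":
        # All families except big data. The topic notes that the local SSD
        # type is still in development, so check availability before relying on it.
        return [f for f in ALL_FAMILIES if f != "big_data"]
    raise ValueError(f"unknown node role: {role}")


if __name__ == "__main__":
    print(candidate_families("master"))
    print(candidate_families("core", data_tb=12))
    print(candidate_families("task"))
```

The lookup only narrows the choice of family. The instance size still depends on the workload and on the vCPU-to-memory ratio of the selected family.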