When you create an E-MapReduce (EMR) cluster, consider the following factors: business scenario, region and storage service, metadata service, hardware and network configurations, and billing method. This topic describes how to plan your cluster based on these factors.
Factors to consider
| Factor | Description |
| --- | --- |
| Business scenario | EMR provides the following predefined cluster business scenarios: Data Lake, Data Analytics, Real-time Data Streaming, and Data Service. Select the scenario that matches your business requirements. You can also create a custom cluster to meet specific requirements. |
| Region and storage service | EMR is available in multiple regions, so you can deploy your cluster in the same region as your data. For storage architecture, EMR supports both compute-storage integration and compute-storage separation. |
| Metadata service | EMR can store metadata in Data Lake Formation (DLF), ApsaraDB RDS for MySQL, or built-in MySQL. |
| Hardware and network configurations | EMR provides various instance types, such as general-purpose, compute-optimized, and memory-optimized types, to match your workload characteristics. You can also deploy multiple master nodes to ensure high availability (HA) of services. For network architecture, Alibaba Cloud provides independent, fully isolated virtual private clouds (VPCs). |
| Billing method | EMR supports two billing methods. Subscription: you pay for resources before you use them. This method is suitable for long-term, stable workloads. Pay-as-you-go: you use resources first and pay for them afterward. This method is suitable for short-term or temporary workloads. |
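The choice between Subscription and Pay-as-you-go can be framed as a break-even calculation: if a cluster runs for more hours per month than the subscription price divided by the hourly pay-as-you-go price, Subscription is likely cheaper. The following is a minimal sketch under that assumption. The function names and the prices in the usage note are hypothetical and are not actual EMR prices; real costs depend on factors such as instance type and region, so check the official pricing before you decide.

```python
def break_even_hours(monthly_subscription_price: float, hourly_payg_price: float) -> float:
    """Return the monthly usage (in hours) at which both billing methods cost the same.

    Hypothetical helper for illustration; prices are supplied by the caller.
    """
    if hourly_payg_price <= 0:
        raise ValueError("hourly_payg_price must be positive")
    return monthly_subscription_price / hourly_payg_price


def cheaper_billing_method(
    hours_per_month: float,
    monthly_subscription_price: float,
    hourly_payg_price: float,
) -> str:
    """Pick the cheaper method for an expected monthly usage (hypothetical helper).

    Ties go to Pay-as-you-go, because it does not require paying up front.
    """
    payg_cost = hours_per_month * hourly_payg_price
    if monthly_subscription_price < payg_cost:
        return "Subscription"
    return "Pay-as-you-go"


if __name__ == "__main__":
    # Hypothetical prices in arbitrary currency units, for illustration only.
    sub_price, payg_price = 500.0, 1.0
    print(break_even_hours(sub_price, payg_price))             # 500.0
    print(cheaper_billing_method(720, sub_price, payg_price))  # Subscription (always-on cluster)
    print(cheaper_billing_method(100, sub_price, payg_price))  # Pay-as-you-go (occasional jobs)
```

With these made-up numbers, a cluster that runs around the clock (about 720 hours per month) favors Subscription, while a cluster used for roughly 100 hours per month favors Pay-as-you-go. This matches the guidance in the table: Subscription for long-term, stable workloads and Pay-as-you-go for short-term or temporary ones.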