This topic describes the big data instance families and their instance types.

Description

Big data instance families are designed to deliver cloud computing and massive data storage for big data workloads. These instances are suitable for scenarios that require offline computing and storage of massive data, such as Hadoop distributed computing, massive log processing, and large data warehousing. The storage performance, storage capacity, internal bandwidth, and other specifications of these instance types satisfy the requirements of distributed computing workloads, which typically run on Hadoop.

These instance families are suitable for customers in industries such as Internet and finance that need to compute, store, and analyze big data. To meet the requirements of highly available distributed computing frameworks such as Hadoop, big data instance families use local storage to provide massive storage space and high storage performance.

Big data instances have the following features:
  • Enterprise-grade computing power that guarantees efficient and stable data processing.
  • Enhanced network performance, including the maximum internal bandwidth per instance and the maximum packet forwarding rate, to satisfy the demand for data transfer during peak hours, such as the shuffle phase in Hadoop MapReduce.
  • A sequential read and write performance of 190 MB/s for a single disk (when you create an instance for the first time, disks require time to warm up before they can achieve optimal performance) and a maximum storage throughput of 5 GB/s for a single instance. These features shorten the time required to read data from and write data to Hadoop Distributed File System (HDFS) files.
  • Local storage that costs 97% less than SSD disk storage, which greatly reduces the cost of building Hadoop clusters.
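
As a rough illustration of how the per-disk and per-instance figures relate, the following sketch estimates the aggregate sequential throughput of an instance from its disk count. It assumes that throughput scales linearly with the number of disks until it reaches the 5 GB/s per-instance maximum, and that 1 GB/s equals 1,000 MB/s. Neither assumption is stated in this topic, and the function name is illustrative.

```shell
#!/usr/bin/env bash
# Rough estimate only: assumes linear scaling per disk, capped per instance.
PER_DISK_MBPS=190        # sequential read/write of a single disk (from this topic)
INSTANCE_CAP_MBPS=5000   # 5 GB/s per instance, assuming 1 GB = 1,000 MB

# estimate_throughput DISK_COUNT: prints the estimated throughput in MB/s
estimate_throughput() {
  local disks=$1
  local total=$(( disks * PER_DISK_MBPS ))
  if [ "$total" -gt "$INSTANCE_CAP_MBPS" ]; then
    total=$INSTANCE_CAP_MBPS
  fi
  echo "$total"
}

estimate_throughput 8    # 8 disks: 1520
estimate_throughput 30   # 30 disks: capped at 5000
```

Actual throughput depends on the workload and on disk warm-up, so treat the result as an upper bound rather than a guarantee.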
When you use big data instances, take note of the following items:
  • Instances with local disks do not support changes to the instance type, bandwidth, or billing method, and do not support automatic recovery upon host failures.
  • The association between an instance and its local disks is fixed. The number and capacity of local disks are determined by the instance type. You cannot attach additional local disks to an instance, and you cannot detach a local disk from one instance and attach it to another.
  • You cannot create snapshots for local disks. To create an image of the system disk and data disks of an instance with local disks, we recommend that you create the image from snapshots of the system disk and the data disks (the data disks must be non-local disks).
  • You cannot create images that contain system disks and data disks based on instance IDs.
  • You can attach a standard SSD to an instance with local disks. The capacity of the standard SSD is scalable.
  • Local disks are attached to a single physical server, which increases the risk of single point of failure (SPOF). The reliability of data stored on local disks depends on the reliability of the physical server. To ensure data availability, we recommend that you implement data redundancy at the application layer. You can use deployment sets to distribute ECS instances across multiple physical machines to achieve high availability and disaster recovery. For more information, see Create a deployment set.
    Warning: Data stored on local disks may be lost, for example, when a hardware failure occurs. We recommend that you do not use local disks for long-term data storage. If your application does not implement its own data reliability architecture, we recommend that you use cloud disks in your ECS instances to ensure data reliability.
  • Operations on an instance with local SSDs may affect the data stored on the local disk. For more information, see Impact of instance operations on the data stored on local disks.

Best practices for mounting a file system for a big data instance

The inode table must be initialized when you mount an ext4 file system for the first time. In Linux kernel 2.6.37 and later, the lazyinit feature is enabled by default, which defers initialization of the inode table until the file system is mounted. The local disks then consume a large amount of throughput during initialization, which affects service stability. For example, the initialization throughput of 30 local disks may reach 600 MB/s. The concurrency of lazyinit is improved in Linux kernel 4.x to solve this problem. For more information, visit Community. We recommend that you use the following best practices to initialize the inode table in a relatively short time:
  1. Obtain a list of all local hard disk drives (SATA HDDs).
  2. Run the following command for each local disk to create the file system with lazy initialization disabled. The trailing & runs the command in the background so that the disks are initialized independently and in parallel.

    In this example, an ext4 file system is created on the local disk named /dev/vdb.

    mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdb &
  3. After initialization is started on all local disks, run iostat -x 5 and wait until the I/O activity of all local disks is displayed as 0.
  4. Mount the ext4 file systems.
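
The steps above can be sketched as a single script. This is a minimal sketch, not a definitive procedure. It assumes that the local disks appear as /dev/vdb, /dev/vdc, and so on, and that mount points named /mnt/disk1, /mnt/disk2, and so on are acceptable. The DISKS and DRY_RUN variables and the function names are illustrative and do not come from this topic. By default the script only prints the commands, because formatting destroys all data on the listed disks.

```shell
#!/usr/bin/env bash
# Sketch: format local disks in parallel with eager inode-table initialization.
# WARNING: with DRY_RUN=0 this erases every disk listed in DISKS.

# Step 1 (assumption): local disks are /dev/vdb, /dev/vdc, ... Verify first with:
#   lsblk -d -o NAME,SIZE,ROTA,TYPE    (ROTA=1 marks rotational disks such as HDDs)
DISKS=${DISKS:-"/dev/vdb /dev/vdc"}
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only (default), 0 = execute them

# run CMD ARGS...: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 2: start one mkfs per disk in the background, then wait for all of them.
format_disks() {
  local disk
  for disk in $DISKS; do
    run mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 "$disk" &
  done
  wait
}

# Step 3 is manual: watch `iostat -x 5` until I/O on all local disks is 0.

# Step 4: mount each disk on its own (hypothetical) mount point.
mount_disks() {
  local disk i=1
  for disk in $DISKS; do
    run mkdir -p "/mnt/disk$i"
    run mount "$disk" "/mnt/disk$i"
    i=$(( i + 1 ))
  done
}

format_disks
mount_disks
```

To execute the commands instead of printing them, set the variables explicitly, for example `DRY_RUN=0 DISKS="/dev/vdb /dev/vdc" bash format-local-disks.sh` (the script name is a placeholder), and run the script as root.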

d2s, storage-intensive big data instance family

Features:
  • I/O optimized
  • Supports standard SSDs and ultra disks
  • High-capacity local SATA hard disk drives (SATA HDDs) with high throughput and a maximum of 35 Gbit/s of bandwidth among instances
  • Supports hot swapping of damaged disks. If a local disk fails, it can be hot swapped to avoid instance shutdown. If a backup disk is available on the physical machine, the disk is replaced online. If no backup disk is available, you can recover the disk after the damaged disk is replaced.
    Note: The data in the damaged disk cannot be recovered.
  • Equipped with 2.5 GHz Intel® Xeon® Platinum 8163 (Skylake) processors
  • Provides strong network performance proportional to computing capacity
  • Scenarios:
    • Big data computing and storage business scenarios such as Hadoop MapReduce, HDFS, Hive, and HBase
    • Machine learning scenarios such as Spark in-memory computing and MLlib
    • Search and log data processing scenarios such as Elasticsearch and Kafka
Instance types

| Instance type | vCPUs | Memory (GiB) | Local storage (GiB) | Bandwidth (Gbit/s) | Packet forwarding rate (Kpps) | IPv6 support | NIC queues | ENIs (including one primary ENI) | Private IP addresses per ENI |
|---|---|---|---|---|---|---|---|---|---|
| ecs.d2s.5xlarge | 20 | 88.0 | 8 × 7,300 | 12.0 | 1,600 | Yes | 8 | 8 | 20 |
| ecs.d2s.10xlarge | 40 | 176.0 | 15 × 7,300 | 20.0 | 2,000 | Yes | 16 | 8 | 20 |
| ecs.d2s.20xlarge | 80 | 352.0 | 30 × 7,300 | 35.0 | 4,500 | Yes | 32 | 8 | 20 |
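
To make the "count × size" notation in the Local storage column concrete, the following sketch multiplies it out into total raw capacity for the d2s instance types listed above. The function name is illustrative, and the result is raw capacity before any file system or replication overhead.

```shell
#!/usr/bin/env bash
# raw_capacity_gib COUNT SIZE_GIB: prints the total raw local storage in GiB
raw_capacity_gib() {
  echo $(( $1 * $2 ))
}

# d2s instance types and their disk counts; each disk is 7,300 GiB
for spec in "ecs.d2s.5xlarge 8" "ecs.d2s.10xlarge 15" "ecs.d2s.20xlarge 30"; do
  set -- $spec   # $1 = instance type, $2 = number of local disks
  echo "$1: $(raw_capacity_gib "$2" 7300) GiB"
done
```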

d1ne, big data instance family with enhanced network performance

Features:
  • I/O optimized
  • Supports standard SSDs and ultra disks
  • High-capacity local SATA HDDs with high throughput and a maximum of 35 Gbit/s of bandwidth among instances
  • CPU-to-memory ratio of 1:4, designed for big data scenarios
  • Equipped with 2.5 GHz Intel® Xeon® E5-2682 v4 (Broadwell) processors
  • Provides strong network performance proportional to computing capacity
  • Scenarios:
    • Hadoop MapReduce, HDFS, Hive, and HBase
    • Spark in-memory computing and MLlib
    • Elasticsearch and logging
Instance types

| Instance type | vCPUs | Memory (GiB) | Local storage (GiB) | Bandwidth (Gbit/s) | Packet forwarding rate (Kpps) | IPv6 support | NIC queues | ENIs (including one primary ENI) | Private IP addresses per ENI |
|---|---|---|---|---|---|---|---|---|---|
| ecs.d1ne.2xlarge | 8 | 32.0 | 4 × 5,500 | 6.0 | 1,000 | Yes | 4 | 4 | 10 |
| ecs.d1ne.4xlarge | 16 | 64.0 | 8 × 5,500 | 12.0 | 1,600 | Yes | 4 | 8 | 20 |
| ecs.d1ne.6xlarge | 24 | 96.0 | 12 × 5,500 | 16.0 | 2,000 | Yes | 6 | 8 | 20 |
| ecs.d1ne-c8d3.8xlarge | 32 | 128.0 | 12 × 5,500 | 20.0 | 2,000 | Yes | 6 | 8 | 20 |
| ecs.d1ne.8xlarge | 32 | 128.0 | 16 × 5,500 | 20.0 | 2,500 | Yes | 8 | 8 | 20 |
| ecs.d1ne-c14d3.14xlarge | 56 | 160.0 | 12 × 5,500 | 35.0 | 4,500 | Yes | 14 | 8 | 20 |
| ecs.d1ne.14xlarge | 56 | 224.0 | 28 × 5,500 | 35.0 | 4,500 | Yes | 14 | 8 | 20 |

d1, big data instance family

Features:
  • I/O optimized
  • Supports standard SSDs and ultra disks
  • High-capacity local SATA HDDs with high throughput and a maximum of 17 Gbit/s of bandwidth among instances
  • CPU-to-memory ratio of 1:4, designed for big data scenarios
  • Equipped with 2.5 GHz Intel® Xeon® E5-2682 v4 (Broadwell) processors
  • Provides strong network performance proportional to computing capacity
  • Scenarios:
    • Hadoop MapReduce, HDFS, Hive, and HBase
    • Spark in-memory computing and MLlib
    • Enterprises in industries, such as Internet and finance, that need to compute, store, and analyze large volumes of data
    • Elasticsearch and logging
Instance types

| Instance type | vCPUs | Memory (GiB) | Local storage (GiB) | Bandwidth (Gbit/s) | Packet forwarding rate (Kpps) | IPv6 support | NIC queues | ENIs (including one primary ENI) | Private IP addresses per ENI |
|---|---|---|---|---|---|---|---|---|---|
| ecs.d1.2xlarge | 8 | 32.0 | 4 × 5,500 | 3.0 | 300 | No | 1 | 4 | 10 |
| ecs.d1.3xlarge | 12 | 48.0 | 6 × 5,500 | 4.0 | 400 | No | 1 | 6 | 10 |
| ecs.d1.4xlarge | 16 | 64.0 | 8 × 5,500 | 6.0 | 600 | No | 2 | 8 | 20 |
| ecs.d1.6xlarge | 24 | 96.0 | 12 × 5,500 | 8.0 | 800 | No | 2 | 8 | 20 |
| ecs.d1-c8d3.8xlarge | 32 | 128.0 | 12 × 5,500 | 10.0 | 1,000 | No | 4 | 8 | 20 |
| ecs.d1.8xlarge | 32 | 128.0 | 16 × 5,500 | 10.0 | 1,000 | No | 4 | 8 | 20 |
| ecs.d1-c14d3.14xlarge | 56 | 160.0 | 12 × 5,500 | 17.0 | 1,800 | No | 6 | 8 | 20 |
| ecs.d1.14xlarge | 56 | 224.0 | 28 × 5,500 | 17.0 | 1,800 | No | 6 | 8 | 20 |