This topic describes the features of big data instance families and lists the instance types of each family.

Description

Big data instance families provide cloud computing and storage for large volumes of data to support big data workloads. They are suited to scenarios that require offline computing and storage of large amounts of data, such as Hadoop distributed computing, extensive log processing, and large-scale data warehousing, and to businesses that use a distributed network and require high-performance storage systems.

These instance families are suitable for customers in Internet, finance, and other industries that need to compute, store, and analyze big data. Big data instance families use local storage to ensure a large amount of storage space and high storage performance.

Common features of big data instances:
  • Enterprise-level computing power ensures efficient and stable data processing.
  • Higher bandwidth per instance and higher packet forwarding rates improve network performance so that the network can handle peak-time demand.
  • After an instance is created, its disks need time to warm up before they reach optimal performance. Big data instances provide sequential read and write performance of 190 MB/s per disk and a maximum storage throughput of 5 GB/s per instance, which shortens the time needed to read data from and write data to Hadoop Distributed File System (HDFS).
  • The cost of local storage is 97% lower than that of standard SSDs, which greatly reduces the cost of building Hadoop clusters.
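
To see how the per-disk and per-instance figures interact, the following is a rough worked calculation in shell. It is a sketch under two simplifying assumptions that are not stated in this topic: sequential throughput scales linearly with the number of disks up to the per-instance ceiling, and 5 GB/s is treated as 5,000 MB/s. The function and variable names are hypothetical.

```shell
# Rough estimate of aggregate sequential throughput (MB/s) for N local disks.
# ASSUMPTIONS: linear scaling per disk, and 5 GB/s treated as 5000 MB/s.
PER_DISK_MBPS=190
INSTANCE_CAP_MBPS=5000

estimate_throughput_mbps() {
  local disks total
  disks="$1"
  total=$((disks * PER_DISK_MBPS))
  # Clamp the result to the per-instance ceiling.
  if [ "$total" -gt "$INSTANCE_CAP_MBPS" ]; then
    total="$INSTANCE_CAP_MBPS"
  fi
  echo "$total"
}

estimate_throughput_mbps 12   # 12 disks: below the instance ceiling
estimate_throughput_mbps 30   # 30 disks: limited by the instance ceiling
```

Under these assumptions, an instance with 12 local disks stays below the per-instance ceiling, whereas an instance with 30 local disks is bounded by it rather than by the per-disk figure.
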
When you use big data instances, take note of the following items:
  • Instances with local SSDs do not support changes in instance types, bandwidth, and billing methods, and do not support failover.
  • Local disks are bound to their instance. The number and capacity of the local disks of an instance are determined by the instance type. You cannot attach additional local disks to an instance, and you cannot detach its local disks and attach them to another instance.
  • You cannot create snapshots for local disks. If you need to create an image for the system disk and data disks of an instance with local SSDs, we recommend that you create an image by using the snapshots of both the system disk and data disks (data disks must be non-local disks).
  • You cannot create images that contain system disks and data disks based on instance IDs.
  • You can attach a standard SSD to an instance with local SSDs. The capacity of the standard SSD is scalable.
  • Local disks are attached to a single physical server, which makes the server a single point of failure (SPOF). The reliability of data stored on local disks depends on the reliability of the physical server.
    Warning: Data stored on local disks may be lost, for example, when a hardware failure occurs. We recommend that you do not use local disks for long-term data storage.
    • To ensure data availability, we recommend that you implement data redundancy at the application layer. You can use deployment sets to distribute ECS instances across multiple physical machines to achieve high availability and disaster recovery. For more information, see Create a deployment set.
    • If your application architecture does not provide data reliability, we recommend that you use cloud disks or the backup service in your ECS instances to protect your data. For more information, see Disk overview or What is Hybrid Backup Recovery?.
  • Operations on an instance with local SSDs may affect the data stored on the local disks. For more information, see Impacts of instance operations on the data stored on local disks.

Best practices for mounting a file system on a big data instance

When a file system such as ext4 is mounted for the first time, its inode table must be initialized. In Linux kernel 2.6.37 and later, the lazyinit feature is enabled by default, which defers inode table initialization until the file system is mounted. Because this initialization consumes a large amount of local disk throughput (for example, the throughput of 30 local disks may reach 600 MB/s), it can affect service stability. The concurrency of lazyinit is improved in Linux kernel 4.x to address this problem. For more information, visit index: kernel/git/stable/linux.git. We recommend that you use the following best practices to initialize the inode table up front, before the disks are put into service.
  1. Obtain a list of all local SATA HDDs.
  2. Run the following command to enable separate initialization for each local disk:

    In this example, an ext4 file system is created on a local disk named /dev/vdb.

    mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdb &
  3. After you start initialization on all local disks, run the iostat -x 5 command and wait until the I/O activity of all local disks is displayed as 0.
  4. Run the mount command in batches.
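
The steps above can be sketched as a small shell script. This is a minimal sketch rather than an official tool. It assumes that you pass the local disk devices explicitly (step 1 is left to you, because device names depend on the instance; only /dev/vdb is named in this topic), and that each disk is mounted at /mnt/<device name>. The function names, the mount point layout, and the DRY_RUN switch are hypothetical. With DRY_RUN=1, which is the default here, the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: format local disks in parallel with lazy initialization disabled,
# then mount them.
# ASSUMPTIONS (not from this topic): disks are passed as arguments, mount
# points follow /mnt/<device name>, and DRY_RUN is an illustrative switch.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; 0 = execute them

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

init_local_disks() {
  local disk
  # Step 2: start one mkfs job per disk in the background so that the
  # inode tables and journals of all disks are initialized concurrently.
  for disk in "$@"; do
    run mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 "$disk" &
  done
  # Block until every background mkfs job has exited.
  wait
  # Step 3 is manual: confirm with `iostat -x 5` that disk I/O is at 0.
  # Step 4: create the mount points and mount the disks.
  for disk in "$@"; do
    run mkdir -p "/mnt/$(basename "$disk")"
    run mount "$disk" "/mnt/$(basename "$disk")"
  done
}

# Example (dry run): init_local_disks /dev/vdb /dev/vdc
```

Running the functions with DRY_RUN=0 formats the listed devices and destroys any data on them, so check the device list carefully before you execute the commands.
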

d2c, compute intensive big data instance family

d2c is under invitational preview. To use d2c, submit a ticket.

Features:
  • I/O optimized.
  • Supports enhanced SSDs, standard SSDs, and ultra disks.
  • High-capacity local SATA HDDs with high throughput and a maximum of 35 Gbit/s bandwidth among instances.
  • Supports online replacement and hot swapping of damaged disks to avoid instance shutdown.
    If a local disk fails, you will receive a notification about the system event. You can respond to the system event by initiating the process to fix the damaged disk. For more information, see Overview of system events on ECS instances equipped with local disks.
    • If a backup disk is available on the physical machine, Alibaba Cloud will replace the damaged disk with the backup disk online.
    • If no backup disk is available on the physical machine, the disk hardware must be replaced manually before Alibaba Cloud can replace the damaged disk.
    Notice: After you start the process to fix the damaged disk, data on the damaged disk cannot be recovered.
  • Equipped with 2.5 GHz Intel® Xeon® Platinum 8269CY (Cascade Lake) processors.
  • Provides a fast and reliable network based on large computing capacity.
  • Suitable for the following scenarios:
    • Big data computing and storage business scenarios that use Hadoop MapReduce, HDFS, Hive, and HBase
    • Scenarios in which EMR JindoFS and OSS are used to store hot and cold data separately and to decouple storage from computing
    • Machine learning scenarios such as in-memory computing with Spark and scalable machine learning with MLlib
    • Search and log data processing scenarios that use solutions such as Elasticsearch and Kafka
Instance types
| Instance type | vCPUs | Memory (GiB) | Local storage (GiB) | Bandwidth (Gbit/s) | Packet forwarding rate (Kpps) | IPv6 support | NIC queues | ENIs (including one primary ENI) | Private IP addresses per ENI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ecs.d2c.6xlarge | 24 | 88.0 | 3 × 4,000 | 12.0 | 1,600 | Yes | 8 | 8 | 20 |
| ecs.d2c.12xlarge | 48 | 176.0 | 6 × 4,000 | 20.0 | 2,000 | Yes | 16 | 8 | 20 |
| ecs.d2c.24xlarge | 96 | 352.0 | 12 × 4,000 | 35.0 | 4,500 | Yes | 16 | 8 | 20 |

d2s, storage intensive big data instance family

Features:
  • I/O optimized.
  • Supports enhanced SSDs, standard SSDs, and ultra disks.
  • High-capacity local SATA HDDs with high throughput and a maximum of 35 Gbit/s bandwidth among instances.
  • Supports online replacement and hot swapping of damaged disks to avoid instance shutdown.
    If a local disk fails, you will receive a notification about the system event. You can respond to the system event by initiating the process to fix the damaged disk. For more information, see Overview of system events on ECS instances equipped with local disks.
    • If a backup disk is available on the physical machine, Alibaba Cloud will replace the damaged disk with the backup disk online.
    • If no backup disk is available on the physical machine, the disk hardware must be replaced manually before Alibaba Cloud can replace the damaged disk.
    Notice: After you start the process to fix the damaged disk, data on the damaged disk cannot be recovered.
  • Equipped with 2.5 GHz Intel® Xeon® Platinum 8163 (Skylake) processors.
  • Provides a fast and reliable network based on large computing capacity.
  • Suitable for the following scenarios:
    • Big data computing and storage business scenarios that use Hadoop MapReduce, HDFS, Hive, and HBase
    • Machine learning scenarios such as in-memory computing with Spark and scalable machine learning with MLlib
    • Search and log data processing scenarios that use solutions such as Elasticsearch and Kafka
Instance types
| Instance type | vCPUs | Memory (GiB) | Local storage (GiB) | Bandwidth (Gbit/s) | Packet forwarding rate (Kpps) | IPv6 support | NIC queues | ENIs (including one primary ENI) | Private IP addresses per ENI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ecs.d2s.5xlarge | 20 | 88.0 | 8 × 7,300 | 12.0 | 1,600 | Yes | 8 | 8 | 20 |
| ecs.d2s.10xlarge | 40 | 176.0 | 15 × 7,300 | 20.0 | 2,000 | Yes | 16 | 8 | 20 |
| ecs.d2s.20xlarge | 80 | 352.0 | 30 × 7,300 | 35.0 | 4,500 | Yes | 32 | 8 | 20 |

d1ne, big data instance family with enhanced network performance

Features:
  • I/O optimized.
  • Supports standard SSDs and ultra disks.
  • High-capacity local SATA HDDs with high throughput and a maximum of 35 Gbit/s bandwidth among instances.
  • Offers a CPU-to-memory ratio of 1:4, which is designed for big data scenarios.
  • Equipped with 2.5 GHz Intel® Xeon® E5-2682 v4 (Broadwell) processors.
  • Provides a fast and reliable network based on large computing capacity.
  • Suitable for the following scenarios:
    • Scenarios that use Hadoop MapReduce, HDFS, Hive, and HBase
    • Machine learning scenarios such as in-memory computing with Spark and scalable machine learning with MLlib
    • Use of solutions such as Elasticsearch for log data processing
Instance types
| Instance type | vCPUs | Memory (GiB) | Local storage (GiB) | Bandwidth (Gbit/s) | Packet forwarding rate (Kpps) | IPv6 support | NIC queues | ENIs (including one primary ENI) | Private IP addresses per ENI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ecs.d1ne.2xlarge | 8 | 32.0 | 4 × 5,500 | 6.0 | 1,000 | Yes | 4 | 4 | 10 |
| ecs.d1ne.4xlarge | 16 | 64.0 | 8 × 5,500 | 12.0 | 1,600 | Yes | 4 | 8 | 20 |
| ecs.d1ne.6xlarge | 24 | 96.0 | 12 × 5,500 | 16.0 | 2,000 | Yes | 6 | 8 | 20 |
| ecs.d1ne-c8d3.8xlarge | 32 | 128.0 | 12 × 5,500 | 20.0 | 2,000 | Yes | 6 | 8 | 20 |
| ecs.d1ne.8xlarge | 32 | 128.0 | 16 × 5,500 | 20.0 | 2,500 | Yes | 8 | 8 | 20 |
| ecs.d1ne-c14d3.14xlarge | 56 | 160.0 | 12 × 5,500 | 35.0 | 4,500 | Yes | 14 | 8 | 20 |
| ecs.d1ne.14xlarge | 56 | 224.0 | 28 × 5,500 | 35.0 | 4,500 | Yes | 14 | 8 | 20 |

d1, big data instance family

Features:
  • I/O optimized.
  • Supports standard SSDs and ultra disks.
  • High-capacity local SATA HDDs with high throughput and up to 17 Gbit/s of bandwidth among instances.
  • Offers a CPU-to-memory ratio of 1:4, which is designed for big data scenarios.
  • Equipped with 2.5 GHz Intel® Xeon® E5-2682 v4 (Broadwell) processors.
  • Provides a fast and reliable network based on large computing capacity.
  • Suitable for the following scenarios:
    • Scenarios that use Hadoop MapReduce, HDFS, Hive, and HBase
    • Machine learning scenarios such as in-memory computing with Spark and scalable machine learning with MLlib
    • Big data computing, storage, and analysis for customers in Internet, finance, and other industries
    • Use of solutions such as Elasticsearch for log data processing
Instance types
| Instance type | vCPUs | Memory (GiB) | Local storage (GiB) | Bandwidth (Gbit/s) | Packet forwarding rate (Kpps) | IPv6 support | NIC queues | ENIs (including one primary ENI) | Private IP addresses per ENI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ecs.d1.2xlarge | 8 | 32.0 | 4 × 5,500 | 3.0 | 300 | No | 1 | 4 | 10 |
| ecs.d1.3xlarge | 12 | 48.0 | 6 × 5,500 | 4.0 | 400 | No | 1 | 6 | 10 |
| ecs.d1.4xlarge | 16 | 64.0 | 8 × 5,500 | 6.0 | 600 | No | 2 | 8 | 20 |
| ecs.d1.6xlarge | 24 | 96.0 | 12 × 5,500 | 8.0 | 800 | No | 2 | 8 | 20 |
| ecs.d1-c8d3.8xlarge | 32 | 128.0 | 12 × 5,500 | 10.0 | 1,000 | No | 4 | 8 | 20 |
| ecs.d1.8xlarge | 32 | 128.0 | 16 × 5,500 | 10.0 | 1,000 | No | 4 | 8 | 20 |
| ecs.d1-c14d3.14xlarge | 56 | 160.0 | 12 × 5,500 | 17.0 | 1,800 | No | 6 | 8 | 20 |
| ecs.d1.14xlarge | 56 | 224.0 | 28 × 5,500 | 17.0 | 1,800 | No | 6 | 8 | 20 |