Hologres V1.1 and later support a multi-instance high-availability deployment mode in which primary and secondary instances share storage. This mode is intended for online production environments that require high availability, and it isolates faults and loads between instances. This topic describes the basic principles of the high-availability solutions and how to configure primary and secondary instances that share storage.

Single-instance high-availability deployment with automatic recovery

Hologres compute nodes (the worker nodes in the following figure) are scheduled like containers, and the resource manager performs periodic health checks on them. If a compute node fails to respond for more than 1 minute because of an out-of-memory (OOM) error or a hardware or software fault, the resource manager automatically starts a new compute node and migrates shards from the faulty node to the new node. For example, if Worker Node 3 fails to respond for more than 1 minute, the resource manager starts Worker Node 4 to replace it.

Recovery is fast because data is stored in Apsara Distributed File System and does not need to be migrated between compute nodes, which keeps compute nodes lightweight and stateless.

By default, the single-instance deployment mode is enabled for each Hologres instance. If an exception occurs on a node, the node recovers automatically without manual O&M. If a query operator attempts to access a node while the node is in automatic recovery, the query fails immediately. Hologres V1.1 and later use a new recovery mechanism that recovers nodes within about 1 minute, which is 5 to 10 times faster than earlier versions.

Figure: Single-instance deployment
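
Because a query fails immediately if it reaches a node that is still recovering, a client may want to retry for slightly longer than the roughly 1-minute recovery window. The following is a minimal sketch of such a client-side retry loop, not a Hologres feature. The helper name `run_with_retry`, the 90-second budget, and the doubling back-off are all assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    run_query: Callable[[], T],
    max_wait_seconds: float = 90.0,   # assumption: a bit longer than ~1 minute recovery
    initial_delay: float = 1.0,       # assumption: first pause between attempts
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call run_query until it succeeds or the total pause time is used up."""
    delay = initial_delay
    waited = 0.0
    while True:
        try:
            return run_query()
        except Exception:
            # In real code, catch only the connection errors raised by your driver.
            if waited >= max_wait_seconds:
                raise
            # Never pause longer than the remaining budget.
            pause = min(delay, max_wait_seconds - waited)
            sleep(pause)
            waited += pause
            delay *= 2  # exponential back-off
```

`run_query` is any zero-argument callable that executes one query and raises on failure. Retrying is only safe for idempotent statements such as reads.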

Multi-instance high-availability deployment

How it works

In single-instance deployment mode, faults are monitored in real time, and faulty nodes are replaced for recovery. During node recovery, the service may be unavailable. Key business scenarios require a higher-level high-availability solution that isolates faults and loads. Hologres V1.1 and later support the multi-instance high-availability deployment mode in which instances share storage. In this deployment mode, the primary instance has full capabilities, including data reads and writes and the configuration of permissions and system parameters. Secondary instances are read-only. All write and configuration operations are performed on the primary instance. The following figure shows the multi-instance high-availability deployment mode.

Figure: Multi-instance deployment

Primary and secondary instances do not share computing resources, so loads and faults of these instances are isolated. All instances share the same data, access control configurations, and storage charges.

The memory status of instances is automatically synchronized in real time. Within the same region, synchronization completes within milliseconds. When you write data to the primary instance, the system automatically synchronizes the data to each secondary instance. Therefore, a secondary instance consumes CPU and memory resources even if it is not used. The consumed resources are about 1/8 of the CPU and memory resources of the primary instance. We recommend that the specifications of the primary instance do not differ significantly from those of the secondary instances.
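
The 1/8 figure explains why the specifications should be similar: the synchronization cost scales with the primary instance, not with the secondary instance. The following sketch works through that arithmetic. It assumes that the 1/8 ratio applies directly to core counts, and the core counts used in the example are hypothetical.

```python
from fractions import Fraction
from typing import Tuple

# From the text: synchronization consumes about 1/8 of the primary's resources.
SYNC_SHARE_OF_PRIMARY = Fraction(1, 8)


def secondary_headroom(primary_cores: int, secondary_cores: int) -> Tuple[Fraction, Fraction]:
    """Return (cores used for synchronization, fraction of the secondary left for queries)."""
    sync_cores = primary_cores * SYNC_SHARE_OF_PRIMARY
    remaining_fraction = (secondary_cores - sync_cores) / secondary_cores
    return sync_cores, remaining_fraction


# Equal specifications: 8 of 64 cores go to synchronization, 7/8 remain for queries.
print(secondary_headroom(64, 64))
# A secondary that is much smaller than the primary: 16 of 32 cores go to
# synchronization, so only half of the secondary remains for queries.
print(secondary_headroom(128, 32))
```

Under these assumptions, a secondary instance with one quarter of the primary's cores loses half of its capacity to synchronization before it serves a single query.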

Usage notes

  • A maximum of nine read-only secondary instances can be configured for each primary instance. Resource configurations among the instances may be different, but the differences must not be significant. The shard count must be the same for all instances.
  • Each read-only secondary instance has an independent endpoint, and different read-only secondary instances serve different business scenarios. Endpoints can be used to isolate business scenarios.
  • In Hologres V1.3.27 and later, the latency threshold of data synchronization from the primary instance to a secondary instance is changed from 20 minutes to 60 minutes. If the synchronization latency exceeds 60 minutes and the resource utilization of the secondary instance remains at 100% for a long period of time, the secondary instance automatically restarts to reduce the synchronization latency. If the resource utilization of the secondary instance remains at 100% for a long period of time, we recommend that you optimize query statements that run on the secondary instance or you scale out the secondary instance.
  • While a read-only secondary instance is being associated with a primary instance, the primary instance is not affected and can be used as usual.
  • Associating a read-only secondary instance with a primary instance takes about 3 to 5 minutes. After the association is complete, you can use the read-only secondary instance as usual.
  • You cannot access read-only secondary instances that are not associated with a primary instance.
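
The automatic restart rule in the notes above can be written as a small predicate, which may be useful when you define monitoring alerts. This is a sketch and not Hologres code. The function names are hypothetical, "a long period of time" is left as a flag that the caller supplies because the text does not define it, and applying the same rule with the 20-minute threshold to versions earlier than V1.3.27 is an assumption.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn 'V1.3.27' or '1.3.27' into (1, 3, 27) for comparison."""
    return tuple(int(part) for part in version.lstrip("Vv").split("."))


def sync_latency_threshold_minutes(version: str) -> int:
    """60 minutes in V1.3.27 and later, 20 minutes in earlier versions."""
    return 60 if parse_version(version) >= (1, 3, 27) else 20


def secondary_will_restart(
    version: str,
    sync_latency_minutes: float,
    saturated_for_long_period: bool,  # utilization stayed at 100%; duration is caller-defined
) -> bool:
    """True if both restart conditions from the usage notes hold."""
    over_threshold = sync_latency_minutes > sync_latency_threshold_minutes(version)
    return over_threshold and saturated_for_long_period
```

An alert that fires well before the threshold gives you time to optimize queries or scale out the secondary instance before a restart occurs.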

Suggestions in different scenarios

  • Common scenario:

    We recommend that you use the primary instance to write and process data and use read-only secondary instances to analyze data. This implements read/write splitting.

  • Multiple scenarios:
    • Online service queries require a stable P99 latency. Therefore, we recommend that you specify a read-only secondary instance for data queries. This ensures high availability of online services.
    • For online analytical processing (OLAP) queries, we recommend that you specify a separate secondary instance for data analysis, different from the secondary instance used for the preceding online service queries. This isolates OLAP queries from online service queries, so online service queries are not affected even if a large amount of data is queried.
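
On the application side, these recommendations amount to choosing an endpoint per workload. The following sketch shows one way to express that mapping. The `Workload` names and the endpoint host names are placeholders invented for this example; use the endpoints shown for your own instances.

```python
from enum import Enum


class Workload(Enum):
    WRITE = "write"                    # data writes and processing
    ONLINE_SERVING = "online_serving"  # latency-sensitive online service queries
    OLAP = "olap"                      # large analytical queries


# Placeholder endpoints: one primary and two separate read-only secondaries.
ENDPOINTS = {
    Workload.WRITE: "primary.example.com:80",
    Workload.ONLINE_SERVING: "secondary-serving.example.com:80",
    Workload.OLAP: "secondary-olap.example.com:80",
}

# Each workload must have its own instance, otherwise loads are not isolated.
assert len(set(ENDPOINTS.values())) == len(ENDPOINTS)


def endpoint_for(workload: Workload) -> str:
    """Return the endpoint that a given workload should connect to."""
    return ENDPOINTS[workload]
```

Keeping the mapping in one place makes it easy to move a workload to another secondary instance without touching query code.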

Configure primary and secondary instances that share storage

When you configure multi-instance high-availability deployment, take note of the following limits:
  • Only Hologres instances whose version is V1.1 or later can be used as primary instances. If the version of your Hologres instance is earlier than V1.1, manually upgrade your Hologres instance in the Hologres console or join a Hologres DingTalk group to apply for an instance upgrade. For more information about how to manually upgrade a Hologres instance, see Instance upgrades. For more information about how to join a Hologres DingTalk group, see Obtain online support for Hologres.
  • A primary instance and its read-only secondary instances must be of the same version.
  • A primary instance and its read-only secondary instances must reside in the same region.
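
Before you purchase instances, it can help to check a planned topology against these limits and against the usage notes earlier in this topic (at most nine secondary instances, equal shard counts). The following is a minimal sketch of such a check. The `Instance` class and `validate_topology` function are hypothetical helpers and not part of any Hologres SDK.

```python
from dataclasses import dataclass
from typing import List, Tuple

MAX_SECONDARIES = 9
MIN_PRIMARY_VERSION = (1, 1)


@dataclass(frozen=True)
class Instance:
    name: str
    version: str      # for example "V1.3.27"
    region: str
    shard_count: int


def _version_tuple(version: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in version.lstrip("Vv").split("."))


def validate_topology(primary: Instance, secondaries: List[Instance]) -> List[str]:
    """Return a list of violated limits; an empty list means the plan looks valid."""
    problems = []
    if _version_tuple(primary.version) < MIN_PRIMARY_VERSION:
        problems.append(f"{primary.name}: primary must be V1.1 or later")
    if len(secondaries) > MAX_SECONDARIES:
        problems.append(f"at most {MAX_SECONDARIES} secondary instances are allowed")
    for sec in secondaries:
        if _version_tuple(sec.version) != _version_tuple(primary.version):
            problems.append(f"{sec.name}: version differs from primary")
        if sec.region != primary.region:
            problems.append(f"{sec.name}: region differs from primary")
        if sec.shard_count != primary.shard_count:
            problems.append(f"{sec.name}: shard count differs from primary")
    return problems
```

The function collects all problems instead of stopping at the first one, so a single run shows everything that needs to change.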

Permissions for associating or disassociating read-only secondary instances

If you want to associate read-only secondary instances with or disassociate them from a primary instance as a RAM user, attach the AliyunHologresFullAccess policy to the RAM user. For more information about the policies that can be attached to a RAM user in Hologres, see Grant permissions on Hologres to RAM users.

Perform the following steps to configure multi-instance high-availability deployment:

  1. Purchase a Hologres instance.
    Important: A read-only secondary instance must reside in the same region as its primary instance.

    When you purchase a Hologres instance, set Specifications to Read-only Secondary Instance, and set Primary Instance ID of the Read-only Secondary Instance to the ID of the primary instance with which you want to associate the read-only secondary instance. The primary instance must be in the same region. For more information about other parameters, see Purchase a Hologres instance.

  2. Associate the read-only secondary instance with a primary instance.
    After you purchase the read-only secondary instance, the read-only secondary instance is associated with the primary instance that you selected on the buy page. You can use the read-only secondary instance after the instance state becomes Running.
  3. Use the instances.
    When you use the multi-instance high-availability deployment mode, take note of the following items:
    • The endpoint of the read-only secondary instance can be used to provide online services.
    • All operations such as creating tables and granting permissions to users must be performed on the primary instance. Only data reads can be performed on the read-only secondary instance.
    • The read-only secondary instance automatically inherits all objects of the primary instance. The objects include users and tables. Users cannot be separately created for the read-only secondary instance.