ApsaraMQ for Kafka: Cluster resource specification evaluation

Last Updated: Jan 03, 2025

Many factors affect the resource usage of a Cloud MSMQ Confluent Edition cluster, including your business scenarios and the performance of your business applications. This topic provides resource evaluation suggestions for general scenarios to help you size a cluster when you purchase and create it. After the cluster is created, you can modify its resource configurations based on actual resource usage.

Composition architecture

Cloud MSMQ Confluent is a stable and efficient streaming data platform that organizes and manages data from different data sources. It consists of six components: Kafka Broker, REST Proxy, Connect, ZooKeeper, ksqlDB, and Control Center.

Note

By default, the numbers of replicas of Kafka Broker, REST Proxy, Connect, ZooKeeper, ksqlDB, and Control Center in a Cloud MSMQ Confluent cluster are 3, 2, 2, 3, 2, and 1, respectively. You can set the number of replicas based on your business requirements.

Cluster resource evaluation

Kafka brokers

  • First, evaluate your business requirements. The requirement parameters are listed in the following table.

    | Requirement parameter | Description |
    | --- | --- |
    | Fan-out factor | The number of times that written data is consumed by consumers. This excludes the replication traffic between broker replicas. |
    | Peak data inflow | The peak traffic of business data. Unit: MB/s. |
    | Average data inflow | The average traffic of business data. Unit: MB/s. |
    | Data retention period | The data retention period. Default value: 7 days. |
    | Partition replica factor | The number of replicas of each partition. Default value: 3, which means that each partition has three replicas. |

  • Estimate the number of broker nodes: Ideally, a single Kafka broker supports up to 100 MB/s of traffic. We recommend at least four broker nodes in a production cluster, with 50% of the I/O bandwidth kept in reserve. In addition, the number of partition replicas on each broker should not exceed 2,000, and the number of partition replicas in the entire cluster should not exceed 200,000. If you expect the total number of partition replicas in the cluster to be large, estimate the number of brokers based on the total number of partitions instead.

    Note

    Number of broker nodes = Max(4, Peak data inflow × (Fan-out factor + 2 × Partition replica factor - 1) × 2 / 400 MB/s).

  • Estimate the number of CUs per broker: The required number of CUs is difficult to estimate because it depends on factors such as the cluster configuration, the client configuration and usage pattern, the number of partitions, the cluster size, and the numbers of consumers and producers. We recommend more than 8 CUs per broker for a production cluster and 4 CUs per broker for a development or test cluster. We also recommend no more than 100 leader replicas or 300 partition replicas (including leader replicas) on each 4-CU broker.

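  • Estimate the disk size of each broker: Disk size of each broker = Max(1 TB, Average data inflow × Data retention period × Partition replica factor / Number of brokers). Convert the average data inflow and the data retention period to consistent units before you multiply them.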
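The two formulas above can be sketched in Python as follows. This is a minimal illustration, not an official sizing tool: the function names are hypothetical, rounding a fractional broker count up is an assumption, and the unit conversion assumes a retention period in days, 86,400 seconds per day, and 1 TB = 1,000,000 MB.

```python
import math


def estimate_broker_count(peak_inflow_mb_s, fan_out, replica_factor=3):
    """Number of broker nodes, following the broker formula above."""
    required = peak_inflow_mb_s * (fan_out + 2 * replica_factor - 1) * 2 / 400
    # Assumption: a fractional result is rounded up to a whole broker.
    return max(4, math.ceil(required))


def estimate_disk_per_broker_tb(avg_inflow_mb_s, retention_days, replica_factor, brokers):
    """Disk size per broker in TB, following the disk formula above."""
    # Assumptions: retention in days, 86,400 s per day, 1 TB = 1,000,000 MB.
    stored_tb = avg_inflow_mb_s * 86_400 * retention_days * replica_factor / 1_000_000
    return max(1.0, stored_tb / brokers)


if __name__ == "__main__":
    brokers = estimate_broker_count(peak_inflow_mb_s=200, fan_out=2, replica_factor=3)
    disk_tb = estimate_disk_per_broker_tb(50, retention_days=7, replica_factor=3, brokers=brokers)
    print(f"brokers={brokers}, disk per broker={disk_tb:.2f} TB")
```

Both functions apply the documented floors (four brokers, 1 TB per broker), so small workloads return the minimum values rather than the raw result of the formula.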

Connect

  • Node estimation: We recommend that you configure at least two nodes to ensure that Connect is highly available.

  • CU evaluation: We recommend that you select more than 8 CUs per node.

Schema Registry

  • In the production environment, we recommend that you configure two Schema Registry nodes with 2 CUs each.

Control Center

  • In the production environment, we recommend that you configure one Control Center node with at least 4 CUs and at least 300 GB of data storage.

ksqlDB

  • Node estimation: We recommend that you configure at least two nodes to ensure that ksqlDB is highly available.

  • CU evaluation: We recommend that you select at least 4 CUs per node.

  • Storage evaluation: The storage size of ksqlDB depends on the number of aggregate statements and concurrent queries. The default is 100 GB.

REST Proxy

  • Node estimation: We recommend that you configure at least two nodes to ensure that REST Proxy is highly available.

  • CU evaluation: If you use REST Proxy to continuously produce and consume messages, you must select more than 8 CUs per node. Otherwise, you can select 4 CUs per node.

Cluster resource performance comparison

The following table shows the changes in the total throughput of the cluster and the latency of a single producer for different numbers of CUs. You can select an appropriate number of CUs based on the data traffic and latency requirements of your business.

Note

The following test results were obtained on a cluster with 4 brokers, a single topic, 300 partitions, 60 producers, and a message size of 1 KB. Performance in actual business scenarios may differ from the test environment.

| Cluster specification | Total cluster throughput (non-throttled) | Average producer throughput (non-throttled) | Average latency (non-throttled) | Total cluster throughput (latency under 100 ms) |
| --- | --- | --- | --- | --- |
| 4 CUs per broker | 370 MB/s | 5.95 MB/s | 9718 ms | 130 MB/s |
| 8 CUs per broker | 400 MB/s | 7.33 MB/s | 8351 ms | 195 MB/s |
| 12 CUs per broker | 400 MB/s | 7.39 MB/s | 8343 ms | 240 MB/s |
| 16 CUs per broker | 400 MB/s | 7.47 MB/s | 8335 ms | 290 MB/s |
| 20 CUs per broker | 400 MB/s | 7.58 MB/s | 8237 ms | 305 MB/s |
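One way to read the table above is as a lookup: given a target throughput, find the smallest tested CU size that reached it. The sketch below hard-codes the measured values from the table. The function name is hypothetical, and the results apply only to the 4-broker test setup described in the preceding note.

```python
# (CUs per broker, non-throttled MB/s, MB/s with latency under 100 ms),
# copied from the test results table above.
MEASURED = [
    (4, 370, 130),
    (8, 400, 195),
    (12, 400, 240),
    (16, 400, 290),
    (20, 400, 305),
]


def min_cus_for(target_mb_s, low_latency=True):
    """Smallest tested CU size whose measured throughput meets the target.

    Returns None if no tested specification reaches the target.
    """
    for cus, unthrottled, under_100ms in MEASURED:
        capacity = under_100ms if low_latency else unthrottled
        if capacity >= target_mb_s:
            return cus
    return None


if __name__ == "__main__":
    print(min_cus_for(200))                      # latency-sensitive workload
    print(min_cus_for(380, low_latency=False))   # throughput-oriented workload
```

A result of None indicates that the target exceeds what the 4-broker test cluster delivered at any CU size, in which case adding brokers is the relevant option.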

By default, a Cloud MSMQ Confluent cluster uses 4 brokers and delivers a throughput of 400 MB/s. To increase the throughput, you can scale out the cluster. Each added broker increases the cluster throughput by 100 MB/s.

Note

When you scale out the cluster by adding brokers, we recommend that you configure more than 8 CUs for each broker. This ensures that the number of CUs does not become a bottleneck for the throughput of the cluster.

The following table lists the cluster throughput, message processing capability, and number of supported partitions for different numbers of brokers.

| Cluster specification | Cluster throughput | Messages processed per hour | Partitions supported at 1 MB/s per partition |
| --- | --- | --- | --- |
| 4 brokers | 400 MB/s | 1.47 billion | 400 |
| 8 brokers | 800 MB/s | 2.95 billion | 800 |
| 12 brokers | 1,200 MB/s | 4.42 billion | 1,200 |
| 16 brokers | 1,600 MB/s | 5.9 billion | 1,600 |
| 20 brokers | 2,000 MB/s | 7.37 billion | 2,000 |

Note

In general scenarios, the recommended throughput of a partition ranges from 1 MB/s to 5 MB/s. In latency-sensitive scenarios, limit the throughput of each partition. When the number of partitions exceeds a certain threshold, the cluster throughput decreases and the latency increases.
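The figures in the broker scaling table can be reproduced with simple arithmetic. The sketch below assumes 100 MB/s per broker (as stated above), 1 KB messages (the size used in the earlier test), and 1 MB = 1,024 KB. The last two are assumptions chosen because they match the published message counts. The function name and parameters are hypothetical.

```python
def cluster_capacity(brokers, message_kb=1, partition_mb_s=1, per_broker_mb_s=100):
    """Return (throughput in MB/s, messages per hour, partitions supported)."""
    throughput = brokers * per_broker_mb_s
    # Assumption: 1 MB = 1,024 KB, which reproduces the table's message counts.
    messages_per_hour = throughput * 1024 // message_kb * 3600
    partitions = throughput // partition_mb_s
    return throughput, messages_per_hour, partitions


if __name__ == "__main__":
    for brokers in (4, 8, 12, 16, 20):
        throughput, messages, partitions = cluster_capacity(brokers)
        print(f"{brokers} brokers: {throughput} MB/s, "
              f"{messages / 1e9:.2f} billion messages/hour, {partitions} partitions")
```

Raising partition_mb_s toward the 5 MB/s end of the recommended range lowers the number of partitions that the same throughput can sustain. For example, 400 MB/s supports 80 partitions at 5 MB/s each.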

Recommended cluster specification selection

The following table describes the specifications of clusters in typical scenarios. For more information, see Confluent.

Production specifications: 400 MB/s throughput, excluding replication traffic.

Minimum specifications: 300 MB/s throughput, excluding replication traffic.

| Component | Production: CU | Production: disk | Production: nodes | Minimum: CU | Minimum: disk | Minimum: nodes |
| --- | --- | --- | --- | --- | --- | --- |
| Kafka Brokers | 12 | 2400 GB | 4 | 4 | 800 GB | 3 |
| ZooKeeper | 4 | 100 GB | 3 | 2 | 100 GB | 3 |
| Kafka Connect | 12 | N/A | 2 | 4 | N/A | 2 |
| Control Center | 12 | 300 GB | 1 | 4 | 300 GB | 1 |
| Schema Registry | 2 | N/A | 2 | 2 | N/A | 2 |
| REST Proxy | 16 | N/A | 2 | 4 | N/A | 2 |
| ksqlDB | 4 | 100 GB | 2 | 4 | 100 GB | 2 |

Note

After you create a cluster, you can still adjust its resource configurations to a different specification based on your business needs.

Resource allocation rules for cluster components

The following table lists the resource configuration ranges for each component.

Note

You can select an appropriate cluster specification from the following supported resource ranges based on your business requirements.

| Component | Supported edition | Replicas | CUs per node | Disk per node (in 100 GB increments) |
| --- | --- | --- | --- | --- |
| Kafka Brokers | Professional Edition / Enterprise Edition | Default: 3. Minimum: 3. Maximum: 20. | Default: 4. Minimum: 4. Maximum: 20. | Default: 800 GB. Range: 800 GB to 30,000 GB. |
| ZooKeeper | Professional Edition / Enterprise Edition | Default: 3. Minimum: 3. Maximum: 3. | Default: 2. Minimum: 2. Maximum: 20. | Default: 100 GB. Range: 100 GB to 30,000 GB. |
| Control Center | Professional Edition / Enterprise Edition | Default: 1. Minimum: 1. Maximum: 20. | Default: 4. Minimum: 4. Maximum: 20. | Default: 300 GB. Range: 300 GB to 30,000 GB. |
| Schema Registry | Professional Edition / Enterprise Edition | Default: 2. Minimum: 2. Maximum: 3. | Default: 2. Minimum: 2. Maximum: 20. | No storage |
| Kafka Connect | Professional Edition / Enterprise Edition (selected by default; you can deselect this component if you do not need it) | Default: 2. Minimum: 2. Maximum: 20. | Default: 8. Minimum: 4. Maximum: 20. | No storage |
| ksqlDB | Professional Edition / Enterprise Edition (selected by default; you can deselect this component if you do not need it) | Default: 2. Minimum: 2. Maximum: 20. | Default: 4. Minimum: 4. Maximum: 20. | Default: 100 GB. Range: 100 GB to 30,000 GB. |
| REST Proxy | Professional Edition / Enterprise Edition (selected by default; you can deselect this component if you do not need it) | Default: 2. Minimum: 2. Maximum: 20. | Default: 8. Minimum: 4. Maximum: 20. | No storage |
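The ranges in the table above can be encoded as data and used to check a planned configuration before you purchase a cluster. The sketch below is illustrative only: the RANGES dictionary, the validate function, and the error messages are hypothetical and are not part of any product API. It also assumes that disk sizes must be multiples of 100 GB.

```python
# (minimum, maximum) per setting, copied from the table above.
# disk_gb is None for components that have no storage.
RANGES = {
    "Kafka Brokers": {"replicas": (3, 20), "cus": (4, 20), "disk_gb": (800, 30000)},
    "ZooKeeper": {"replicas": (3, 3), "cus": (2, 20), "disk_gb": (100, 30000)},
    "Control Center": {"replicas": (1, 20), "cus": (4, 20), "disk_gb": (300, 30000)},
    "Schema Registry": {"replicas": (2, 3), "cus": (2, 20), "disk_gb": None},
    "Kafka Connect": {"replicas": (2, 20), "cus": (4, 20), "disk_gb": None},
    "ksqlDB": {"replicas": (2, 20), "cus": (4, 20), "disk_gb": (100, 30000)},
    "REST Proxy": {"replicas": (2, 20), "cus": (4, 20), "disk_gb": None},
}


def validate(component, replicas, cus, disk_gb=None):
    """Return a list of problems; an empty list means all values are in range."""
    limits = RANGES[component]
    problems = []
    for name, value in (("replicas", replicas), ("cus", cus)):
        low, high = limits[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}-{high}")
    disk_range = limits["disk_gb"]
    if disk_range is None:
        if disk_gb is not None:
            problems.append("this component has no storage")
    elif disk_gb is None:
        problems.append("disk_gb is required")
    else:
        low, high = disk_range
        # Assumption: disk sizes are selected in 100 GB steps.
        if not low <= disk_gb <= high or disk_gb % 100 != 0:
            problems.append(f"disk_gb={disk_gb} is not a 100 GB step in {low}-{high}")
    return problems


if __name__ == "__main__":
    print(validate("Kafka Brokers", replicas=4, cus=12, disk_gb=2400))
    print(validate("ZooKeeper", replicas=5, cus=2, disk_gb=100))
```

Returning a list of problems instead of raising on the first one lets a caller report every out-of-range setting in a single pass.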

References

Confluent provides a resource evaluation tool for Kafka and Confluent Platform. This tool is applicable to Cloud MSMQ Confluent. For more information, see Sizing Calculator for Apache Kafka and Confluent Platform.