Many factors affect the resource usage of a Message Queue for Confluent cluster, including business scenarios and application performance. This topic provides reference suggestions for evaluating cluster resources in common scenarios to help you size a cluster when you purchase and create it. After the cluster is created, you can still change its resource configuration by scaling out based on actual resource utilization.
Architecture components
Message Queue for Confluent is a streaming data platform that organizes and manages data from different data sources to provide a stable and efficient system. It consists of six components: Kafka Broker, REST Proxy, Connect, ZooKeeper, ksqlDB, and Control Center.
By default, the numbers of replicas for the Kafka Broker, REST Proxy, Connect, ZooKeeper, ksqlDB, and Control Center components in a Message Queue for Confluent cluster are 3, 2, 2, 3, 2, and 1, respectively. You can also set the number of replicas for each component based on your business requirements.
Cluster resource evaluation
Kafka Broker resource evaluation
First, you need to evaluate your business requirements. The requirement parameters are listed in the following table.
Requirement parameter | Parameter description |
Fan-out factor | Number of times the written data is consumed by consumers, excluding replication traffic within the Broker. |
Peak data inflow | Peak traffic of business data, in MB/s. |
Average data inflow | Average traffic of business data, in MB/s. |
Data retention period | How long data is retained. Default: 7 days. |
Partition replication factor | Number of replicas of each partition. Default: 3, meaning each partition has 3 replicas. |
Estimate the number of Broker nodes: Ideally, a single Kafka Broker can sustain a maximum traffic of 100 MB/s. A production cluster needs at least 4 Broker nodes and 50% I/O bandwidth redundancy. In addition, each Broker should hold no more than 2,000 partition replicas, and the entire cluster should hold no more than 200,000 partition replicas. If the estimated total number of partition replicas in the cluster is large, we recommend that you size the number of Brokers based on the total number of partitions instead.
Note: Number of Broker nodes = Max(4, peak data inflow × (fan-out factor + 2 × partition replication factor - 1) × 2 / 400 MB/s).
Estimate the number of CUs per Broker: The required number of CUs depends on cluster configuration, client configuration and usage, the number of partitions, cluster size, and the numbers of Consumers and Producers, so it is difficult to estimate precisely. Configure at least 8 CUs per Broker for a production cluster and 4 CUs per Broker for a development or test cluster. In addition, a Broker with 4 CUs should hold no more than 100 leader replicas or 300 partition replicas (including leader replicas).
Estimate the disk size of each Broker: Disk size per Broker = Max(1 TB, average data inflow × data retention period × partition replication factor / number of Broker nodes).
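As a worked example, the following Python sketch applies the two formulas above to a hypothetical workload; all input values are assumptions for illustration, not recommendations:

```python
import math

# Assumed example workload -- replace these inputs with your own measurements.
peak_inflow_mb_s = 50      # peak data inflow, MB/s
avg_inflow_mb_s = 10       # average data inflow, MB/s
fan_out = 2                # written data is consumed twice
replication_factor = 3     # partition replication factor
retention_days = 7         # data retention period

# Number of Broker nodes = Max(4, peak inflow x (fan-out + 2 x RF - 1) x 2 / 400 MB/s)
brokers = max(4, math.ceil(
    peak_inflow_mb_s * (fan_out + 2 * replication_factor - 1) * 2 / 400))
print(f"Broker nodes: {brokers}")          # 50 x 7 x 2 / 400 = 1.75 -> minimum of 4 applies

# Disk per Broker = Max(1 TB, average inflow x retention x RF / number of Brokers)
mb_per_day = avg_inflow_mb_s * 86_400      # MB written per day
disk_tb = max(1.0,
              mb_per_day * retention_days * replication_factor / brokers / 1_048_576)
print(f"Disk per Broker: {disk_tb:.1f} TB")  # ~4.3 TB for this workload
```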
Connect resource evaluation
Node number evaluation: It is recommended to choose 2 or more nodes to ensure Connect is highly available.
CU evaluation: It is recommended to choose 8 or more CUs per node.
Schema Registry resource evaluation
For Schema Registry in a production environment, it is recommended to configure 2 nodes with 2 CUs per node.
Control Center resource evaluation
For Control Center in a production environment, it is recommended to configure 1 node with more than 4 CUs of computing resources and more than 300 GB of data storage.
ksqlDB resource evaluation
Node number evaluation: It is recommended to choose 2 or more nodes to ensure ksqlDB is highly available.
CU evaluation: It is recommended to choose 4 or more CUs per node.
Storage evaluation: The storage size of ksqlDB depends on the number of aggregation statements and concurrent queries; the default is 100 GB.
REST Proxy resource evaluation
Node number evaluation: It is recommended to choose 2 or more nodes to ensure REST Proxy is highly available.
CU evaluation: If you need to continuously produce and consume messages through REST Proxy, you need to choose 8 or more CUs per node. Otherwise, you can choose 4 CUs per node.
Cluster resource performance comparison
The following table shows the changes in total cluster throughput and single Producer latency with different numbers of CUs. You can choose an appropriate number of cluster CUs based on your business data traffic and latency requirements.
The following test results were obtained under test conditions with a 4-Broker specification cluster, a single Topic, 300 Partitions, 60 Producers, and a 1 KB message size. Actual business scenarios may differ from the test environment in performance.
Cluster specification | Total cluster throughput without flow control | Average Producer throughput without flow control | Average latency without flow control | Total cluster throughput with latency less than 100 ms |
Single Broker 4 CU | 370 MB/s | 5.95 MB/s | 9718 ms | 130 MB/s |
Single Broker 8 CU | 400 MB/s | 7.33 MB/s | 8351 ms | 195 MB/s |
Single Broker 12 CU | 400 MB/s | 7.39 MB/s | 8343 ms | 240 MB/s |
Single Broker 16 CU | 400 MB/s | 7.47 MB/s | 8335 ms | 290 MB/s |
Single Broker 20 CU | 400 MB/s | 7.58 MB/s | 8237 ms | 305 MB/s |
By default, a Message Queue for Confluent cluster uses a 4-Broker specification and delivers 400 MB/s of cluster throughput. To increase throughput, scale out the cluster horizontally: each additional Broker adds 100 MB/s of cluster throughput.
When adding Brokers horizontally, configure 8 or more CUs of computing resources for each Broker so that the number of CUs does not become a bottleneck for cluster throughput.
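A minimal sketch of this scale-out rule, assuming 100 MB/s of added throughput per Broker and the 4-Broker production minimum:

```python
import math

def brokers_for_throughput(target_mb_s: float) -> int:
    """Number of Brokers needed for a target cluster throughput,
    at 100 MB/s per Broker and a minimum cluster size of 4 Brokers.
    Each Broker should also be given 8 or more CUs (see above)."""
    return max(4, math.ceil(target_mb_s / 100))

print(brokers_for_throughput(400))   # 4
print(brokers_for_throughput(1000))  # 10
```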
The following table lists the cluster throughput performance, message processing capability, and the number of Partitions that can be supported with different numbers of Brokers.
Cluster specification | Cluster throughput | Messages processed per hour | Number of Partitions supported at 1 MB/s of throughput per Partition |
4 Brokers | 400 MB/s | 1.47 billion | 400 |
8 Brokers | 800 MB/s | 2.95 billion | 800 |
12 Brokers | 1,200 MB/s | 4.42 billion | 1,200 |
16 Brokers | 1,600 MB/s | 5.9 billion | 1,600 |
20 Brokers | 2,000 MB/s | 7.37 billion | 2,000 |
In typical scenarios, the recommended throughput per Partition is between 1 MB/s and 5 MB/s. For low-latency scenarios, limit the throughput of each Partition. When the number of Partitions grows too large, cluster throughput decreases and latency increases.
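The figures in the table above can be reproduced with the following arithmetic, assuming a 1 KB message size (as in the earlier test conditions) and 1 MB/s of throughput per Partition:

```python
brokers = 8
throughput_mb_s = brokers * 100               # 100 MB/s per Broker -> 800 MB/s

# Messages per hour at a 1 KB message size (1 MB = 1,024 KB).
messages_per_hour = throughput_mb_s * 1024 * 3600
print(f"{messages_per_hour / 1e9:.2f} billion messages/hour")   # ~2.95 billion

# Partitions supported at 1 MB/s of throughput per Partition.
print(throughput_mb_s // 1)   # 800 Partitions

# At the upper end of the recommended 1-5 MB/s range, fewer Partitions carry the same traffic.
print(throughput_mb_s // 5)   # 160 Partitions
```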
Cluster specification selection recommendations
The following provides cluster specification recommendations for typical scenarios. You can also refer to the Confluent official documentation.
The production specification supports 400 MB/s of throughput and the minimum test specification supports 300 MB/s of throughput, both excluding replication traffic.
Component | Production CU | Production disk | Production nodes | Test CU | Test disk | Test nodes |
Kafka Brokers | 12 | 3 TB | 4 | 4 | 1 TB | 3 |
Zookeeper | 4 | 100 GB | 3 | 2 | 100 GB | 3 |
Kafka Connect | 12 | N/A | 2 | 4 | N/A | 2 |
Control Center | 12 | 300 GB | 1 | 4 | 250 GB | 1 |
Schema Registry | 2 | N/A | 2 | 2 | N/A | 2 |
REST Proxy | 16 | N/A | 2 | 4 | N/A | 2 |
ksqlDB | 5 | 100 GB | 2 | 5 | 100 GB | 2 |
After creating a cluster, you can still adjust its resource configuration to a different specification based on business needs.
Cluster component resource allocation rules
The following table lists the supported resource configuration range for each component. You can choose an appropriate cluster specification within these ranges based on your business requirements.
Product component | Supported version | Number of replicas | CUs per node | Disk per node (in 100 GB increments) |
Kafka Brokers | Professional Edition/Enterprise Edition | Default: 3; minimum: 3; maximum: 20 | Default: 4; minimum: 4; maximum: 20 | Default: 1,000 GB; range: 1,000 GB to 30,000 GB |
ZooKeeper | Professional Edition/Enterprise Edition | Default: 3; minimum: 3; maximum: 3 | Default: 2; minimum: 2; maximum: 20 | Default: 100 GB; range: 100 GB to 30,000 GB |
Control Center | Professional Edition/Enterprise Edition | Default: 1; minimum: 1; maximum: 1 | Default: 8; minimum: 8; maximum: 20 | Default: 300 GB; range: 300 GB to 30,000 GB |
Schema Registry | Professional Edition/Enterprise Edition | Default: 2; minimum: 2; maximum: 3 | Default: 1; minimum: 1; maximum: 20 | No storage |
Kafka Connect | Professional Edition/Enterprise Edition (selected by default; clear the selection if you do not need this component) | Default: 2; minimum: 1; maximum: 20 | Default: 4; minimum: 1; maximum: 20 | No storage |
ksqlDB | Professional Edition/Enterprise Edition (selected by default; clear the selection if you do not need this component) | Default: 2; minimum: 1; maximum: 20 | Default: 5; minimum: 5; maximum: 20 | Default: 100 GB; range: 100 GB to 30,000 GB |
REST Proxy | Professional Edition/Enterprise Edition (selected by default; clear the selection if you do not need this component) | Default: 2; minimum: 2; maximum: 20 | Default: 4; minimum: 4; maximum: 20 | No storage |
References
Confluent provides a resource evaluation tool for Kafka and Confluent Platform, which is applicable to Message Queue for Confluent. For more information, see Sizing Calculator for Apache Kafka and Confluent Platform.