This topic describes how to configure client parameters for ApsaraMQ for Kafka. Properly configured parameters directly affect message throughput, delivery reliability, and consumer stability. The following sections cover producer and consumer parameters with recommended values and tuning guidance for production workloads.
Producer parameters
Message delivery
acks
Controls how many broker acknowledgments the producer requires before considering a send successful.
| Value | Behavior | Trade-off |
|---|---|---|
| 0 | No acknowledgment from the broker. | Highest throughput, highest risk of data loss. |
| 1 | Acknowledgment after the leader writes the data. | Balanced throughput and durability. Data loss is possible if the leader fails before followers replicate the data. |
| all | Acknowledgment after the leader and all in-sync replicas write the data. | Lowest throughput, strongest durability. Data loss occurs only if the leader and all in-sync replicas fail simultaneously. |
Recommended value: 1 for most workloads that prioritize throughput over strict durability.
retries
Maximum number of times the producer retries a failed send. A higher value helps the producer recover from transient broker failures such as leader elections. Combine with retry.backoff.ms to control retry pacing.
retry.backoff.ms
Delay between send retries. A value that is too low can cause retry storms during broker failovers.
| Recommended | Default | Unit |
|---|---|---|
| 1000 | 100 | milliseconds |
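The delivery settings above can be collected into a single producer configuration. This is a sketch using standard Kafka configuration key names in a plain dict (the dict-of-keys form matches librdkafka-style clients; the `retries` value of 3 is illustrative, not from the original text):

```python
# Producer delivery settings, combining the acks, retries, and
# retry.backoff.ms recommendations above.
producer_delivery_config = {
    "acks": "1",               # leader-only acknowledgment (throughput-leaning)
    "retries": 3,              # illustrative; set per availability requirements
    "retry.backoff.ms": 1000,  # recommended backoff to avoid retry storms
}
```

For strict durability, switch `"acks"` to `"all"` and accept the throughput cost described above.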
Batching and throughput
Batching amortizes network overhead by combining multiple records into a single request. Two parameters control when a batch is sent: size and time.
batch.size
Maximum size of a single batch per partition. When a batch reaches this size, the producer sends it immediately.
| Type | Default | Valid values | Unit |
|---|---|---|---|
| int | 16384 | [0,...] | bytes |
Keep the default value of 16384 for most workloads. A smaller value increases the number of network requests and reduces throughput. If you increase batch.size, make sure buffer.memory is large enough to accommodate the larger batches.
linger.ms
Maximum time the producer waits for a batch to fill before sending it. This works like Nagle's algorithm in TCP: once a batch reaches batch.size, it is sent immediately regardless of the linger timer. If the batch is still below batch.size when linger.ms elapses, the producer sends whatever has accumulated.
| Recommended | Default | Unit |
|---|---|---|
| 100 to 1000 | 0 | milliseconds |
A higher linger.ms increases batching efficiency and throughput at the cost of per-message latency.
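As a sketch, the two batching triggers map onto configuration like this (standard Kafka key names; the dict form is an assumption matching librdkafka-style clients):

```python
# Batching configuration: a batch is sent when it reaches batch.size
# OR when linger.ms elapses, whichever comes first.
producer_batching_config = {
    "batch.size": 16384,  # default; per-partition batch ceiling in bytes
    "linger.ms": 100,     # low end of the recommended 100-1000 ms range
}
```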
Memory management
buffer.memory
Total memory the producer allocates for buffering unsent records across all partitions. If this pool is exhausted, send() blocks or throws an exception depending on max.block.ms. An undersized buffer causes slow memory allocation, reduced throughput, or send timeouts.
Unit: bytes. Default: 33554432 (32 MB).
Sizing formula:
buffer.memory >= batch.size x number_of_partitions x 2

For example, with batch.size=16384 and 50 partitions:

16384 x 50 x 2 = 1,638,400 bytes (~1.6 MB minimum)

If you increase batch.size for throughput, scale buffer.memory proportionally.
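The sizing formula above can be expressed as a small helper (the function name is illustrative):

```python
def min_buffer_memory(batch_size: int, num_partitions: int) -> int:
    """Lower bound for buffer.memory: batch.size x partitions x 2."""
    return batch_size * num_partitions * 2

# Worked example from the text: batch.size=16384 and 50 partitions.
needed = min_buffer_memory(16384, 50)  # 1,638,400 bytes (~1.6 MB)
default_buffer = 33554432              # producer default (32 MB)
assert needed <= default_buffer        # the default comfortably covers this case
```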
Partitioning
partitioner.class
Determines how the producer assigns records to partitions. The sticky partitioning strategy reduces the number of incomplete batches by filling one partition's batch before moving to the next.
| Kafka client version | Default strategy |
|---|---|
| 2.4 and later | Sticky partitioning (default) |
| Earlier than 2.4 | Round-robin |
If your producer client is earlier than version 2.4, explicitly set the sticky partitioner to improve batching efficiency.
Consumer parameters
Fetch tuning
These parameters control how much data the consumer retrieves per fetch request. Tuning them affects both throughput and latency.
fetch.min.bytes
Minimum amount of data the broker accumulates before returning a fetch response. A larger value reduces fetch frequency and broker CPU overhead, which improves throughput but increases end-to-end message latency. Unit: bytes.
Evaluate your producer's message rate before setting this value. If messages arrive slowly, a large fetch.min.bytes adds unnecessary delay.
fetch.max.wait.ms
Maximum time the broker waits to accumulate fetch.min.bytes before returning a response. Unit: milliseconds.
Behavior varies by storage type:
- Local storage: The broker waits until fetch.min.bytes is reached or fetch.max.wait.ms elapses, whichever comes first.
- Cloud storage: The broker returns a response immediately when new data arrives, regardless of fetch.min.bytes.
max.partition.fetch.bytes
Maximum amount of data the broker returns per partition in a single fetch. Unit: bytes.
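The three fetch parameters above work together; a sketch of a starting configuration using their defaults (standard Kafka key names; the dict form is an assumption matching librdkafka-style clients):

```python
# Consumer fetch tuning: all three values below are the Kafka defaults.
# Raise fetch.min.bytes only for high-rate topics where added latency
# is acceptable, per the guidance above.
consumer_fetch_config = {
    "fetch.min.bytes": 1,                  # broker responds as soon as data exists
    "fetch.max.wait.ms": 500,              # cap on broker wait time
    "max.partition.fetch.bytes": 1048576,  # 1 MB per-partition fetch cap
}
```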
Session management and rebalancing
Misconfigured session and polling parameters are the most common cause of unexpected consumer rebalances. A rebalance pauses all consumption in the group until partitions are reassigned, so avoid triggering unnecessary rebalances.
session.timeout.ms
Maximum time between heartbeats before the broker considers the consumer dead and triggers a rebalance.
| Recommended | Valid range | Default | Unit |
|---|---|---|---|
| 30000 to 60000 | 6000 to 300000 | 10000 | milliseconds |
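The heartbeat interval should stay well below the session timeout; the common rule of thumb used in this document is one-third. As a quick sizing helper (the function name is illustrative):

```python
def max_heartbeat_interval_ms(session_timeout_ms: int) -> int:
    """Heartbeat ceiling: one-third of session.timeout.ms."""
    return session_timeout_ms // 3

# Worked example from the text: a 45-second session timeout.
assert max_heartbeat_interval_ms(45000) == 15000
```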
In Java client 0.10.1 and later, heartbeats are sent on a background thread, independent of poll(). In earlier Java versions or non-Java clients, heartbeats are sent during poll() calls, so session.timeout.ms must account for both data processing time and the heartbeat interval. Set heartbeat.interval.ms to no more than one-third of session.timeout.ms. For example, if session.timeout.ms is 45000, set heartbeat.interval.ms to 15000 or lower.
max.poll.records
Maximum number of records returned in a single poll() call. If the consumer cannot process this many records before the next poll() deadline, the broker considers it dead and triggers a rebalance.
Sizing formula:
max.poll.records < messages_per_thread_per_second x consumer_threads x session_timeout_seconds

For example, with 500 msg/s per thread, 4 threads, and a 45-second session timeout:

500 x 4 x 45 = 90,000

Set max.poll.records below this value to ensure the consumer always finishes processing before the session times out.
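The sizing formula above as a helper (the function name is illustrative):

```python
def max_poll_records_bound(msgs_per_thread_per_sec: int,
                           threads: int,
                           session_timeout_sec: int) -> int:
    """Upper bound for max.poll.records from the sizing formula above."""
    return msgs_per_thread_per_sec * threads * session_timeout_sec

# Worked example from the text: 500 msg/s per thread, 4 threads, 45 s timeout.
bound = max_poll_records_bound(500, 4, 45)  # 90,000
# Choose max.poll.records safely below the bound; the Kafka default of 500
# is far inside it for this workload.
assert 500 < bound
```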
max.poll.interval.ms
Maximum interval between consecutive poll() calls before the broker removes the consumer from the group. This parameter applies only to Java client 0.10.1 and later, where heartbeats run on a separate thread.
| Default | Unit |
|---|---|
| 300000 | milliseconds |
Sizing formula:
max.poll.interval.ms > time_per_record x max.poll.records

In most cases, the default of 300000 (5 minutes) is sufficient. Increase it only if your processing logic is exceptionally slow.
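The check above as a helper (the function name and the per-record time are illustrative assumptions):

```python
def min_poll_interval_ms(time_per_record_ms: float, max_poll_records: int) -> float:
    """Lower bound for max.poll.interval.ms from the formula above."""
    return time_per_record_ms * max_poll_records

# Illustrative numbers: 10 ms per record with the default 500 records per poll.
needed = min_poll_interval_ms(10, 500)  # 5,000 ms
assert needed < 300000                  # the default 5-minute interval suffices
```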
Offset management
enable.auto.commit
Controls whether the consumer automatically commits offsets at a fixed interval.
| Value | Behavior |
|---|---|
| true (default) | Offsets are committed automatically every auto.commit.interval.ms milliseconds. Simpler to manage, but may cause duplicate processing after a crash. |
| false | Your application must call commitSync() or commitAsync() explicitly. Use this for at-least-once or exactly-once semantics when combined with idempotent processing. |
auto.commit.interval.ms
Interval for automatic offset commits when enable.auto.commit is true.
| Default | Unit |
|---|---|
| 1000 | milliseconds |
A shorter interval reduces the window for duplicate messages after a crash but increases the number of commit requests to the broker.
auto.offset.reset
Determines what happens when the consumer has no committed offset or the committed offset is invalid (e.g., the offset has been deleted due to retention policies).
| Value | Behavior |
|---|---|
| latest | Start consuming from the most recent offset. New messages only. |
| earliest | Start consuming from the oldest available offset. Reprocesses all retained messages. |
| none | Throw an exception. Use this when your application manages offsets manually. |
Recommended value: latest. Using earliest causes the consumer to reprocess all retained messages whenever an invalid offset is encountered, which can lead to duplicate processing and a spike in consumer lag.
If your application handles offset management manually (e.g., storing offsets in an external database), set this to none.
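The offset-management guidance above yields two typical profiles. A sketch using standard Kafka key names (the dict form is an assumption matching librdkafka-style clients):

```python
# Profile 1: automatic commits with the recommended reset policy.
auto_offset_config = {
    "enable.auto.commit": True,       # default
    "auto.commit.interval.ms": 1000,  # default commit interval
    "auto.offset.reset": "latest",    # recommended: new messages only
}

# Profile 2: manual commits with externally stored offsets, per the
# guidance above; the application calls commitSync()/commitAsync() itself.
manual_offset_config = {
    "enable.auto.commit": False,  # application commits explicitly
    "auto.offset.reset": "none",  # fail fast if no valid offset exists
}
```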
Parameter quick reference
Producer parameters
| Parameter | Default | Recommended | Unit |
|---|---|---|---|
| acks | -- | 1 (throughput) or all (durability) | -- |
| retries | -- | Set based on availability requirements | -- |
| retry.backoff.ms | 100 | 1000 | ms |
| batch.size | 16384 | 16384 (default) | bytes |
| linger.ms | 0 | 100 to 1000 | ms |
| buffer.memory | 33554432 | >= batch.size x partitions x 2 | bytes |
| partitioner.class | Sticky (2.4+) | Sticky partitioning | -- |
Consumer parameters
| Parameter | Default | Recommended | Unit |
|---|---|---|---|
| fetch.min.bytes | 1 | Tune based on producer message rate | bytes |
| fetch.max.wait.ms | 500 | -- | ms |
| max.partition.fetch.bytes | 1048576 | -- | bytes |
| session.timeout.ms | 10000 | 30000 to 60000 | ms |
| heartbeat.interval.ms | 3000 | <= 1/3 of session.timeout.ms | ms |
| max.poll.records | 500 | See sizing formula | -- |
| max.poll.interval.ms | 300000 | 300000 (default) | ms |
| enable.auto.commit | true | Depends on delivery semantics | -- |
| auto.commit.interval.ms | 1000 | 1000 (default) | ms |
| auto.offset.reset | latest | latest | -- |