Each Alibaba Cloud Time Series Database (TSDB) instance has a maximum write throughput limit. Exceeding that limit triggers a throttling rule, causing write failures. Configure Prometheus remote_write settings to match your TSDB instance's capacity so that metrics flow into TSDB smoothly and reliably.
For the full list of remote_write configuration options, see the Prometheus documentation.
How it works
When Prometheus writes to remote storage, it reads samples from the write-ahead log (WAL) and distributes them across in-memory queues, one per shard. Each shard sends batched requests to the configured endpoint:

                     |--> queue (shard_1) --> TSDB endpoint
    WAL --> buffer --|--> queue (shard_2) --> TSDB endpoint
                     |--> queue (shard_n) --> TSDB endpoint

The queue_config block controls this pipeline. Two parameters determine the maximum write throughput: max_shards sets the number of concurrent shards, and max_samples_per_send sets the batch size. Together, they cap how fast Prometheus can push data:

    Max throughput = max_shards × max_samples_per_send / send_latency

For example, with the default values (max_shards: 1000, max_samples_per_send: 100) and a 100 ms send latency, Prometheus can reach 1,000,000 data points per second, far above most TSDB instance limits.
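Rearranging the formula gives the shard count needed for a target write rate. The worked case below is only a sketch. It assumes the same 100 ms send latency as the example above, and it uses a 5,000 data points/s target with 500-sample batches, which are the mlarge values in the reference table later in this topic. Actual send latency depends on your network and instance, so measure it before relying on the result.

```latex
\[
\text{max\_shards}
  = \frac{\text{target throughput} \times \text{send latency}}
         {\text{max\_samples\_per\_send}}
  = \frac{5{,}000\ \text{points/s} \times 0.1\ \text{s}}{500\ \text{points}}
  = 1
\]
```

If the real latency is lower than assumed, the same settings allow a higher rate. At 50 ms, for example, one shard sending 500-sample batches could reach 10,000 data points per second. Treat the computed shard count as a starting point, not a guarantee.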
Queue configuration parameters
The following parameters control how Prometheus buffers and sends data to TSDB:
| Parameter | Default | Description |
|---|---|---|
| capacity | 10,000 | Number of samples buffered per shard before Prometheus stops reading more samples from the WAL. |
| max_shards | 1,000 | Maximum number of concurrent shards. Controls maximum throughput and memory usage. |
| min_shards | 1 | Minimum number of shards. Supported in Prometheus v2.6.0 and later; has no effect in earlier versions. |
| max_samples_per_send | 100 | Samples per request batch. Higher values improve efficiency but increase per-request size. |
| batch_send_deadline | 5s | Maximum time a sample waits before the batch is sent, regardless of batch size. |
| max_retries | 3 | Maximum retry attempts on recoverable errors before the batch is dropped. |
| min_backoff | 30ms | Initial retry delay. Doubles on each retry attempt. |
| max_backoff | 100ms | Upper bound for the retry delay. |
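
To make the batching and retry parameters concrete, the fragment below writes out the defaults from the table explicitly and annotates the retry delays they imply. It is a sketch, not a recommended configuration. The delay sequence assumes strict doubling with a cap, as the table describes, and the set of accepted keys can differ between Prometheus releases, so check the documentation for the version you run.

```yaml
# Sketch only: defaults from the table above, written out explicitly.
# Verify each key against the documentation for your Prometheus version.
remote_write:
  - url: "http://<tsdb-instance-id>.hitsdb.rds.aliyuncs.com:3242/api/prom_write"
    queue_config:
      batch_send_deadline: 5s  # Send a partial batch after at most 5 s
      max_retries: 3           # Drop the batch after 3 failed retries
      min_backoff: 30ms        # Delay before the first retry
      max_backoff: 100ms       # Cap on the retry delay
      # Implied delays, assuming strict doubling with a cap:
      #   retry 1: 30ms, retry 2: 60ms, retry 3: 100ms (120ms capped)
```

Under these assumptions, a failing batch spends about 190 ms in backoff before it is dropped. Throttling that lasts longer than that can therefore cause data loss.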
Reference configuration by TSDB specification
Set max_shards based on your TSDB instance specification. Keep capacity and max_samples_per_send at the values below unless you have specific latency or memory constraints.
| TSDB specification | Max write throughput (data points/s) | capacity | max_samples_per_send | max_shards |
|---|---|---|---|---|
| mlarge | 5,000 | 10,000 | 500 | 1 |
| large | 10,000 | 10,000 | 500 | 2 |
| 3xlarge | 30,000 | 10,000 | 500 | 6 |
| 4xlarge | 40,000 | 10,000 | 500 | 8 |
| 6xlarge | 60,000 | 10,000 | 500 | 12 |
| 12xlarge | 120,000 | 10,000 | 500 | 24 |
| 24xlarge | 240,000 | 10,000 | 500 | 48 |
| 48xlarge | 480,000 | 10,000 | 500 | 96 |
| 96xlarge | 960,000 | 10,000 | 500 | 192 |
Complete configuration example
The following example shows a full Prometheus configuration for an mlarge TSDB instance. Replace the placeholders with your actual values before using it.
| Placeholder | Description | Example |
|---|---|---|
| <tsdb-instance-id> | Your TSDB instance ID | ts-bp1234567890abcdef |

    # Global settings
    global:
      scrape_interval: 15s     # How often to scrape targets
      evaluation_interval: 15s # How often to evaluate alerting rules

    # Alertmanager configuration
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
              # - alertmanager:9093

    # Alerting rule files
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"

    # Scrape configuration
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']

    # Remote write configuration -- sends metrics to TSDB
    remote_write:
      - url: "http://<tsdb-instance-id>.hitsdb.rds.aliyuncs.com:3242/api/prom_write"
        queue_config:
          capacity: 10000            # Buffer per shard
          max_shards: 1              # Set based on your TSDB specification
          max_samples_per_send: 500  # Batch size per request

    # Remote read configuration -- reads recent metrics from TSDB
    remote_read:
      - url: "http://<tsdb-instance-id>.hitsdb.rds.aliyuncs.com:3242/api/prom_read"
        read_recent: true

To use a different TSDB specification, adjust max_shards according to the reference table above and keep the other parameters unchanged.
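
As an illustration, the remote_write block for a 3xlarge instance might look like the following sketch. The choice of 3xlarge is arbitrary. The values come from the 3xlarge row of the reference table, and the rest of the configuration file is assumed to stay the same as in the mlarge example.

```yaml
# Sketch: remote_write block for a 3xlarge instance (30,000 data points/s).
# Only max_shards differs from the mlarge example.
remote_write:
  - url: "http://<tsdb-instance-id>.hitsdb.rds.aliyuncs.com:3242/api/prom_write"
    queue_config:
      capacity: 10000            # Unchanged
      max_shards: 6              # From the 3xlarge row of the reference table
      max_samples_per_send: 500  # Unchanged
```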