Use DataX to migrate historical time series data from Prometheus to a Time Series Database (TSDB) instance. DataX is an open source offline data synchronization tool from Alibaba Group that uses two plug-ins for this workflow: Prometheus Reader reads data points from Prometheus, and TSDB Writer writes them to TSDB.
How it works
1. Prometheus Reader calls the `/api/v1/query_range` API to read data points from Prometheus within the specified time range.
2. DataX transfers the data through one or more parallel channels.
3. TSDB Writer calls the `/api/put` API to write the data points to TSDB.
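The two API calls can be sketched as follows. This is an illustrative Python sketch of the request shapes involved, not DataX's internal code; the endpoint URLs, the `up` metric, and the tag values are placeholders.

```python
import json
from urllib.parse import urlencode

# Shape of the read request: the Prometheus range-query API.
# Endpoint, metric, and step are placeholders.
params = {
    "query": "up",
    "start": "2019-05-20T16:00:00Z",
    "end": "2019-05-20T16:00:10Z",
    "step": "15s",
}
read_url = "http://localhost:9090/api/v1/query_range?" + urlencode(params)

# Shape of the write request: one data point in the OpenTSDB-compatible
# JSON body that the TSDB /api/put endpoint accepts (an array of points).
point = {
    "metric": "up",
    "timestamp": 1558368000,
    "value": 1,
    "tags": {"instance": "localhost:9090", "job": "prometheus"},
}
write_body = json.dumps([point])

print(read_url)
print(write_body)
```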
Prerequisites
Before you begin, make sure you have:
A Linux environment
Java Development Kit (JDK) 1.8 or later installed. JDK 1.8 is recommended. Download from the Oracle website.
Python 2.6.x installed. Download from python.org.
Prometheus 2.9.x. Earlier versions are not fully compatible with DataX.
TSDB 2.4.x or later. Earlier versions are not fully compatible with DataX.
Network connectivity from the machine running DataX to both the Prometheus endpoint and the TSDB endpoint. DataX calls their HTTP APIs directly — a connection exception is thrown if either endpoint is unreachable.
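A quick pre-check of connectivity can save a failed run. The sketch below is a local helper, not part of DataX; the two endpoint URLs (a Prometheus health path and a TSDB version path) are placeholders to replace with your own.

```python
import urllib.request
import urllib.error

def reachable(url, timeout=3):
    """Return True if the endpoint answers an HTTP request at all.

    Any HTTP status counts as reachable; only a network-level failure
    (DNS, refused connection, timeout) counts as unreachable.
    """
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True  # server responded, just with an error status
    except (urllib.error.URLError, OSError):
        return False

# Placeholder endpoints - replace with your Prometheus and TSDB URLs.
for url in ("http://localhost:9090/-/healthy", "http://localhost:8242/api/version"):
    print(url, "reachable" if reachable(url) else "UNREACHABLE")
```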
Download DataX and the plug-ins
Download DataX with the TSDB Writer plug-in.
Download the Prometheus Reader plug-in.
For general DataX documentation, see the DataX README.
Test the DataX installation
Before migrating your Prometheus data, run the built-in test job to confirm DataX is working correctly. The test uses Stream Reader and Stream Writer — two plug-ins that require no external dependencies. Stream Reader generates random strings; Stream Writer prints them to your terminal.
Decompress the DataX package and run the built-in job:
cd ${DATAX_HOME}
python bin/datax.py job/job.json

A successful run produces output similar to the following:
Task start time: 2019-04-26 11:18:07
Task end time: 2019-04-26 11:18:17
Time consumed: 10s
Average traffic: 253.91KB/s
Write rate: 10000rec/s
Number of records obtained: 100000
Number of write and read failures: 0

If Number of write and read failures is 0, DataX is installed correctly. For a full walkthrough, see the quick start demo.
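For reference, a minimal stream-to-stream job has roughly the shape shown below. The shipped job/job.json is larger, but uses the same structure: Stream Reader generates sliceRecordCount rows of the constant column values, and Stream Writer prints them. The specific values here are illustrative, not a copy of the shipped file.

```json
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "sliceRecordCount": 10,
            "column": [
              { "type": "string", "value": "hello" }
            ]
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": { "print": true }
        }
      }
    ],
    "setting": {
      "speed": { "channel": 1 }
    }
  }
}
```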
Migrate data from Prometheus to TSDB
Create the job configuration
Create a JSON file for the migration job. This example uses the filename prometheus2tsdb.json.
All jobs use the same top-level structure: a reader block for Prometheus Reader and a writer block for TSDB Writer.
{
"job": {
"content": [
{
"reader": {
"name": "prometheusreader",
"parameter": {
"endpoint": "http://localhost:9090",
"column": [
"up"
],
"beginDateTime": "2019-05-20T16:00:00Z",
"endDateTime": "2019-05-20T16:00:10Z"
}
},
"writer": {
"name": "tsdbwriter",
"parameter": {
"endpoint": "http://localhost:8242"
}
}
}
],
"setting": {
"speed": {
"channel": 1
}
}
}
}

Replace the placeholder values with your actual endpoints, metric names, and time range. See the Parameters section for a full description of each field.
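Before launching the job, you can sanity-check the file locally. The helper below is not part of DataX; it simply verifies that the required parameters listed in the Parameters section are present in the reader and writer blocks.

```python
import json

# The example job configuration from above, inlined for illustration.
EXAMPLE = """
{
  "job": {
    "content": [{
      "reader": {
        "name": "prometheusreader",
        "parameter": {
          "endpoint": "http://localhost:9090",
          "column": ["up"],
          "beginDateTime": "2019-05-20T16:00:00Z",
          "endDateTime": "2019-05-20T16:00:10Z"
        }
      },
      "writer": {
        "name": "tsdbwriter",
        "parameter": {"endpoint": "http://localhost:8242"}
      }
    }],
    "setting": {"speed": {"channel": 1}}
  }
}
"""

# Required fields per the Parameters tables in this document.
REQUIRED = {"reader": {"endpoint", "column", "beginDateTime", "endDateTime"},
            "writer": {"endpoint"}}

def missing_params(cfg):
    """List required parameters absent from the job's reader/writer blocks."""
    content = cfg["job"]["content"][0]
    return sorted(
        side + "." + key
        for side, keys in REQUIRED.items()
        for key in keys
        if key not in content[side]["parameter"]
    )

cfg = json.loads(EXAMPLE)
print(missing_params(cfg))  # [] means all required fields are present
```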
Run the migration
Place prometheus2tsdb.json in the parent directory of the extracted DataX package, then run:
cd ${DATAX_HOME}/..
ls
# datax/ datax.tar.gz prometheus2tsdb.json
python datax/bin/datax.py prometheus2tsdb.json

Performance tuning: For large migrations, increase the Java Virtual Machine (JVM) heap size with the -j flag:
python datax/bin/datax.py prometheus2tsdb.json -j "-Xms4096m -Xmx4096m"

Verify the migration
A successful migration produces output similar to the following:
Task start time: 2019-05-20 20:22:39
Task end time: 2019-05-20 20:22:50
Time consumed: 10s
Average traffic: 122.07KB/s
Write rate: 1000rec/s
Number of records obtained: 10000
Number of write and read failures: 0

Check these two fields to diagnose problems:
| Field | What it means |
|---|---|
| Number of records obtained | Total data points read from Prometheus. A value of 0 usually means the time range or metric name is incorrect. |
| Number of write and read failures | Failed write attempts after retries. A non-zero value indicates a network issue or a TSDB connectivity problem. Check that the TSDB endpoint is accessible and that the IP address of the machine running DataX is on the TSDB IP address whitelist. |
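If you script repeated migrations, these two checks can be automated. The sketch below parses summary text with the English field labels shown in the sample above; your DataX build may print the labels differently (for example, localized), in which case the patterns need adjusting.

```python
import re

# Sample summary text, matching the run shown above.
SUMMARY = """\
Task start time: 2019-05-20 20:22:39
Task end time: 2019-05-20 20:22:50
Time consumed: 10s
Average traffic: 122.07KB/s
Write rate: 1000rec/s
Number of records obtained: 10000
Number of write and read failures: 0
"""

def diagnose(summary):
    """Apply the two checks from the table above to a DataX summary."""
    obtained = int(re.search(r"Number of records obtained:\s*(\d+)", summary).group(1))
    failures = int(re.search(r"Number of write and read failures:\s*(\d+)", summary).group(1))
    if obtained == 0:
        return "no data read - check the time range and metric names"
    if failures > 0:
        return "writes failed - check TSDB connectivity and the IP whitelist"
    return "migration succeeded"

print(diagnose(SUMMARY))  # migration succeeded
```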
For a full walkthrough of this migration, see the Prometheus to TSDB migration demo.
Parameters
Prometheus Reader
| Parameter | Type | Required | Default | Description | Example |
|---|---|---|---|---|---|
| endpoint | String | Yes | — | HTTP endpoint of the Prometheus instance. | http://127.0.0.1:9090 |
| column | Array | Yes | [] | List of metric names to migrate. | ["m"] |
| beginDateTime | String | Yes | — | Start of the time range to migrate. Used together with endDateTime. | 2019-05-13 15:00:00 |
| endDateTime | String | Yes | — | End of the time range to migrate. Used together with beginDateTime. | 2019-05-13 17:00:00 |
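Note that the table's examples use a space-separated local-time format, while the job file earlier in this topic uses the RFC 3339 UTC form (2019-05-20T16:00:00Z); which formats the plug-in accepts may depend on the plug-in version, so converting explicitly avoids ambiguity. A small helper for the conversion (the UTC+8 offset is an assumption for illustration, not something the plug-in requires):

```python
from datetime import datetime, timezone, timedelta

def to_rfc3339_utc(local_str, utc_offset_hours=8):
    """Convert 'YYYY-MM-DD HH:MM:SS' in a fixed-offset local time to the
    'YYYY-MM-DDTHH:MM:SSZ' form used in the example job configuration.

    The +8 default assumes a host in UTC+8; adjust for your time zone.
    """
    dt = datetime.strptime(local_str, "%Y-%m-%d %H:%M:%S")
    dt = dt.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

print(to_rfc3339_utc("2019-05-13 15:00:00"))  # 2019-05-13T07:00:00Z
```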
TSDB Writer
| Parameter | Type | Required | Default | Description | Example |
|---|---|---|---|---|---|
| endpoint | String | Yes | — | HTTP endpoint of the destination TSDB instance. | http://127.0.0.1:8242 |
| batchSize | Integer | No | 100 | Number of data points written per batch. Must be greater than 0. | 100 |
| maxRetryTime | Integer | No | 3 | Maximum number of retries after a write failure. Must be greater than 1. | 3 |
| ignoreWriteError | Boolean | No | false | If true, write errors are ignored and the job continues. If false, the job stops after maxRetryTime retries are exhausted. | false |
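The interplay of batchSize, maxRetryTime, and ignoreWriteError can be sketched as follows. This is a behavioral model of the semantics described in the table, not DataX's actual writer code; `send` stands in for the HTTP call to /api/put.

```python
def write_points(points, send, batch_size=100, max_retry_time=3,
                 ignore_write_error=False):
    """Model of the writer semantics: split points into batches of
    batch_size, retry a failed batch up to max_retry_time times, then
    either skip it (ignore_write_error=True) or stop the job by
    re-raising the error. Returns (written, skipped) point counts.
    """
    written = skipped = 0
    for i in range(0, len(points), batch_size):
        batch = points[i:i + batch_size]
        for attempt in range(1, max_retry_time + 1):
            try:
                send(batch)  # placeholder for the /api/put request
                written += len(batch)
                break
            except IOError:
                if attempt == max_retry_time:
                    if ignore_write_error:
                        skipped += len(batch)
                    else:
                        raise
    return written, skipped
```

For example, 250 points with the default batchSize of 100 produce three batches (100, 100, 50); with ignoreWriteError left at false, a batch that still fails after the last retry aborts the whole job.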
FAQ
If my migration job runs on an Elastic Compute Service (ECS) instance, how do I configure the VPC network?
See Use cases of ECS security groups for guidance on configuring virtual private cloud (VPC) access and security group rules.
What's next
Set the IP address whitelist — add the IP address of the machine running DataX to the TSDB whitelist before running a migration.
DataX README — learn more about DataX configuration, plug-ins, and advanced options.