To migrate all time series data from a Time Series Database (TSDB) instance to the Lindorm time series engine (LindormTSDB), use the LindormTSDB migration tool. The tool reads data from TSDB, splits the workload into parallel subtasks, and writes the data to LindormTSDB using the multi-value data model.
Prerequisites
Before you begin, make sure that you have:
A Linux or macOS client with the following installed:
Java Development Kit (JDK) 1.8 or later
Python 2.x or 3.x
A TSDB instance running version 2.7.4 or later
A Lindorm instance with LindormTSDB activated. For details, see Create an instance.
How the migration tool works
The tool processes data in the following sequence:
1. Reads all time series from the TSDB instance and saves them to a local file.
2. Splits the migration task into time groups based on beginDateTime, endDateTime, and splitIntervalMs. Within each time group, splits the workload into read subtasks based on oidBatch. Each read subtask reads a batch of time series within the specified time range and sends the data to the write component.
3. After all read subtasks in a time group complete, records the time group ID, migration task ID, and task status in a list named internal_datax_job<jobName>.
4. The write component receives data from each read subtask and writes it to LindormTSDB using the multi-value data model.
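The splitting described above can be sketched as follows. This is a simplified illustration of the documented behavior, not the migration tool's actual code; the function and variable names are hypothetical.

```python
# Sketch: split a migration into time groups (splitIntervalMs) and
# read subtasks (oidBatch), as described in the sequence above.

def split_into_subtasks(begin_ms, end_ms, split_interval_ms, oids, oid_batch):
    """Return (group_id, group_start_ms, group_end_ms, oid_batch) tuples."""
    subtasks = []
    for group_id, group_start in enumerate(range(begin_ms, end_ms, split_interval_ms)):
        group_end = min(group_start + split_interval_ms, end_ms)
        # Within each time group, the time series list is cut into
        # batches of oidBatch; each batch becomes one read subtask.
        for i in range(0, len(oids), oid_batch):
            subtasks.append((group_id, group_start, group_end, oids[i:i + oid_batch]))
    return subtasks

# Example: a 2-day range, 1-day time groups, 10 series, 5 series per subtask.
tasks = split_into_subtasks(0, 2 * 86400000, 86400000,
                            [f"oid{i}" for i in range(10)], 5)
print(len(tasks))  # 2 time groups x 2 batches = 4 subtasks
```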
Before you start
Network connectivity
Deploy the client, the Lindorm instance, and the TSDB instance in the same Virtual Private Cloud (VPC) to avoid latency and connectivity issues.
To migrate over the Internet, enable the public endpoints of both the Lindorm and TSDB instances, and add your client's IP address to the whitelists of both instances. For details, see Configure whitelists.
Performance planning
Assess how the migration may affect your business before you start. Key factors include:
TSDB instance specification
Specification of the environment (such as an Elastic Compute Service (ECS) instance) running your application
Number of time series in the TSDB instance
Total data size to migrate
Average reporting frequency per time series
Time range of the data
splitIntervalMs value
See Performance test results for reference throughput numbers by instance size.
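Several of these factors combine into a single per-subtask read volume, which is often the most useful planning number. The sketch below estimates it from oidBatch, splitIntervalMs, and the average reporting interval; the function name is hypothetical.

```python
# Rough planning aid: estimate how many data points one read subtask fetches.
def points_per_subtask(oid_batch, split_interval_ms, reporting_interval_ms):
    """oidBatch series x (time group duration / reporting interval)."""
    return oid_batch * split_interval_ms // reporting_interval_ms

# 100 series per batch, 7-day time groups, hourly reporting.
print(points_per_subtask(100, 604800000, 3600000))  # 16800

# Second-level reporting reads far more per subtask, even with 1-day groups,
# which is why a smaller splitIntervalMs is recommended in that case.
print(points_per_subtask(100, 86400000, 1000))      # 8640000
```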
Data model and query changes after migration
Be aware of the following data model differences before you migrate:
| Item | TSDB | LindormTSDB | Impact |
|---|---|---|---|
| Timestamp length | 10 digits (seconds) | 13 digits (milliseconds) | Timestamps are automatically converted to milliseconds after migration |
| Write model | Single-value | Multi-value | Data written with the single-value model must be queried using multi-value syntax after migration |
| SQL queries | Supported | Not available for multi-value writes | Create a time series table before migrating if you need SQL queries |
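The timestamp conversion in the table is a plain seconds-to-milliseconds scaling, illustrated below; the values are examples only.

```python
# TSDB stores 10-digit second timestamps; LindormTSDB stores 13-digit
# millisecond timestamps. Migration multiplies each timestamp by 1000.
tsdb_ts = 1657004460         # 10 digits, seconds
lindorm_ts = tsdb_ts * 1000  # 13 digits, milliseconds
print(lindorm_ts)            # 1657004460000
```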
Configure the migration task
Save your migration configuration as a JSON file (for example, job.json). The configuration has three sections: task settings, reader (TSDB source), and writer (LindormTSDB destination).
Task settings
| Parameter | Required | Description |
|---|---|---|
| channel | No | Number of concurrent tasks. Default: 1. |
| errorLimit | No | Number of write errors allowed before the task fails. Default: 0. |
Reader parameters (TSDB source)
Configure these parameters based on your TSDB instance specification and data volume.
| Parameter | Required | Description |
|---|---|---|
| sinkDbType | Yes | Set to LINDORM-MIGRATION. |
| endpoint | Yes | TSDB instance endpoint. For details, see Network connection. |
| beginDateTime | Yes | Start time of the migration range. |
| endDateTime | Yes | End time of the migration range. |
| splitIntervalMs | Yes | Duration of each time group in milliseconds. Together with oidBatch, controls how much data is read per subtask. Example: 604800000 (7 days). Set a value smaller than one day if data is reported at second-level frequency; use a larger value for hourly reporting. |
| selfId | Yes | ID of this migration task. If you run multiple concurrent tasks, list all task IDs in jobIds. |
| jobIds | Yes | List of migration task IDs. |
| jobName | Yes | Name of the migration task. Used as the suffix of the task status list (internal_datax_job<jobName>). All concurrent tasks that belong to the same job must use the same name. |
| oidPath | Yes | Path to the local file that stores all time series to migrate from the TSDB instance. |
| oidBatch | Yes | Number of time series each read subtask processes at a time. Together with splitIntervalMs and the reporting frequency, determines how many data points are read per subtask. |
| oidCache | Yes | Whether to cache time series in memory. Set to false when you migrate tens of billions of time series, because they cannot all fit in memory. |
| metrics | No | Specific metrics to migrate. No default value. Example: ["METRIC_1","METRIC_2"...]. |
How splitIntervalMs and oidBatch determine read volume
Each read subtask reads oidBatch × splitIntervalMs / reportingInterval data points. For example, with oidBatch set to 100, splitIntervalMs set to 604800000, and hourly reporting:
100 × 604800000 / 3600000 = 16,800 data points per subtask
Writer parameters (LindormTSDB destination)
| Parameter | Required | Description |
|---|---|---|
| endpoint | Yes | LindormTSDB endpoint. For details, see View endpoints. |
| batchSize | Yes | Maximum number of data points sent to LindormTSDB in a single write request. |
| multiField | Yes | Set to true to use the multi-value data model. |
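Conceptually, batchSize just caps each write request: the writer cuts the stream of data points into chunks of at most batchSize points. A simplified sketch (not the tool's code):

```python
# Split a list of data points into write requests of at most batch_size points.
def chunk(points, batch_size):
    return [points[i:i + batch_size] for i in range(0, len(points), batch_size)]

# 1,200 points with batchSize 500 become three requests: 500 + 500 + 200.
requests = chunk(list(range(1200)), 500)
print([len(r) for r in requests])  # [500, 500, 200]
```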
Example configuration
The following example shows a complete job.json file:
{
"job": {
"setting": {
"speed": {
"channel": 1
},
"errorLimit": {
"record": 0,
"percentage": 0.00
}
},
"content": [
{
"reader": {
"name": "tsdbreader",
"parameter": {
"sinkDbType": "LINDORM-MIGRATION",
"endpoint": "ts-xxxx:3242",
"beginDateTime": "2022-5-2 00:00:00",
"endDateTime": "2022-7-2 00:00:00",
"splitIntervalMs": 86400000,
"jobName": "myjob",
"selfId": 1,
"jobIds": [1],
"oidPath": "{$myworkplace}/oidfile",
"oidBatch": 100,
"oidCache": true
}
},
"writer": {
"name": "tsdbwriter",
"parameter": {
"endpoint": "ld-xxxx:8242",
"multiField": true,
"batchSize": 500
}
}
}
]
}
}
Replace the placeholders with your actual values:
| Placeholder | Description | Example |
|---|---|---|
| ts-xxxx:3242 | TSDB instance endpoint (port 3242) | ts-bp1xxxxx:3242 |
| ld-xxxx:8242 | LindormTSDB endpoint (port 8242) | ld-bp1xxxxx:8242 |
| {$myworkplace}/oidfile | Path to the local time series file (oidPath) | /data/migration/oidfile |
Run the migration
Download the migration tool.
Decompress the package:
tar -zxvf tsdb2lindorm.tar.gz
Start the migration task:
python datax/bin/datax.py --jvm="-Xms8G -Xmx8G" job.json > job.result
Replace job.json with the name of your configuration file.
After the command finishes, check job.result for errors. If the file contains no error output, the migration completed successfully.
(Optional) If the task fails, query the task status list to identify which time groups succeeded and where to resume:
curl -u <username>:<password> <tsdb-endpoint>:3242/api/mquery -XPOST -d '{ "start": 1, "queries": [ { "metric": "internal_datax_job<jobName>", "fields": [ { "field": "*", "aggregator": "none" } ] } ] }'
Replace the placeholders:
| Placeholder | Description | Example |
|---|---|---|
| <username>:<password> | TSDB account credentials. See Manage accounts. | admin:mypassword |
| <tsdb-endpoint> | TSDB instance endpoint | ts-**** |
| <jobName> | Migration task name. With jobName set to myjob, the full metric name is internal_datax_jobmyjob. | myjob |
The response lists each time group with its completion status:
| Timestamp (endtime) | jobId (tag) | state (field) |
|---|---|---|
| 1651795199999 (2022-05-05 23:59:59.999) | 3 | ok |
| 1651795199999 (2022-05-05 23:59:59.999) | 2 | ok |
| 1651795199999 (2022-05-05 23:59:59.999) | 1 | ok |
| 1651881599999 (2022-05-06 23:59:59.999) | 2 | ok |
To resume without re-migrating completed time groups, update beginDateTime in job.json to the start of the first incomplete time group before restarting. In this example, set beginDateTime to 2022-05-06 00:00:00.
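The resume point can be derived mechanically from the status rows: find the earliest time-group end time that not every task ID has marked ok, and restart from that group's start. A sketch of that check, using the example status list above (the parsing of the mquery response is simplified, and the helper name is hypothetical):

```python
# Rows mirror the example status list: (group end in ms, jobId, state).
rows = [
    (1651795199999, 3, "ok"),
    (1651795199999, 2, "ok"),
    (1651795199999, 1, "ok"),
    (1651881599999, 2, "ok"),
]
job_ids = {1, 2, 3}          # all concurrent task IDs (jobIds)
split_interval_ms = 86400000  # splitIntervalMs from the example config

def resume_begin_ms(rows, job_ids, split_interval_ms):
    """Start of the earliest time group that some task has not completed."""
    ok = {}
    for end_ms, job_id, state in rows:
        if state == "ok":
            ok.setdefault(end_ms, set()).add(job_id)
    incomplete = [end for end, done in ok.items() if done != job_ids]
    # A group that ends at end_ms started split_interval_ms - 1 ms earlier.
    return min(incomplete) - split_interval_ms + 1

print(resume_begin_ms(rows, job_ids, split_interval_ms))
# 1651795200000, i.e. 2022-05-06 00:00:00 in the example above
```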
Query migrated data
TSDB data originally written with the single-value model is stored in LindormTSDB under the multi-value data model. Use the /api/mquery endpoint and the multi-value query format when reading this data.
The following example shows the query difference for a metric named test_metric:
Query in TSDB:
curl -u username:password ts-xxxxx:3242/api/query -XPOST -d '{
"start": 1657004460,
"queries": [
{
"aggregator": "none",
"metric": "test_metric"
}
]
}'
Response from TSDB:
[
{
"aggregateTags": [],
"dps": {
"1657004460": 1.0
},
"fieldName": "",
"metric": "test_metric",
"tags": {
"tagkey1": "1"
}
}
]
Query in LindormTSDB:
curl -u username:password ld-xxxxx:8242/api/mquery -XPOST -d '{
"start": 1657004460,
"queries": [
{
"metric": "test_metric",
"fields": [
{
"field": "*",
"aggregator": "none"
}
],
"aggregator": "none"
}
]
}'
Response from LindormTSDB:
[
{
"aggregatedTags": [],
"columns": [
"timestamp",
"value"
],
"metric": "test_metric",
"tags": {
"tagkey1": "1"
},
"values": [
[
1657004460000,
1.0
]
]
}
]
Key differences:
Use /api/mquery instead of /api/query.
Specify fields explicitly using the fields array.
Timestamps in LindormTSDB responses are in milliseconds (13 digits), compared to seconds (10 digits) in TSDB.
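If existing client code still expects the old single-value dps shape, a small adapter can flatten the multi-value response. This is a sketch only; the field names (columns, values, timestamp, value) follow the example responses above.

```python
# Flatten one LindormTSDB /api/mquery result into a TSDB-style "dps" map
# keyed by second-granularity timestamps.
def to_dps(mquery_result):
    cols = mquery_result["columns"]
    ts_idx = cols.index("timestamp")
    val_idx = cols.index("value")
    # Millisecond timestamps (13 digits) are divided back down to seconds.
    return {str(row[ts_idx] // 1000): row[val_idx]
            for row in mquery_result["values"]}

result = {
    "metric": "test_metric",
    "columns": ["timestamp", "value"],
    "values": [[1657004460000, 1.0]],
    "tags": {"tagkey1": "1"},
}
print(to_dps(result))  # {'1657004460': 1.0}
```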
Performance test results
Use the following results as a reference when planning your migration. Actual performance depends on your instance specification, data volume, and reporting frequency.
TSDB Basic Edition II (4 CPU cores, 8 GB memory) — 2 instances
| Test | Time series | Data points | Processes | Configuration | Time series file size | Throughput (data points/sec) | Duration | CPU utilization |
|---|---|---|---|---|---|---|---|---|
| 1 | 30,000 | 86,400,000 | 1 | channel: 2, oidCache: true, oidBatch: 100, splitInterval: 6h, mem: -Xms6G -Xmx6G | 1.5 MB | 230,000 | 12 min 30 sec | 30% |
| 2 | 6,000,000 | 2,592,000,000 | 1 | channel: 10, oidCache: true, oidBatch: 100, splitInterval: 6h, mem: -Xms8G -Xmx8G | 292 MB | 200,000 | 2 hr 55 min 30 sec | 70%–90% |
| 3 | 30,000,000 | 4,320,000,000 | 1 | channel: 10, oidCache: false, oidBatch: 100, splitInterval: 6h, mem: -Xms28G -Xmx28G | 1.5 GB | 140,000 | 9 hours | 40%–80% |
| 4 | 30,000,000 | 4,320,000,000 | 3 | channel: 10, oidCache: false, oidBatch: 100, splitInterval: 6h, mem: -Xms8G -Xmx8G | 1.5 GB | 250,000 | 5 hours | 90% |
TSDB Standard Edition I (8 CPU cores, 16 GB memory) — 2 instances
| Time series | Data points | Processes | Configuration | Time series file size | Throughput (data points/sec) | Duration | CPU utilization |
|---|---|---|---|---|---|---|---|
| 40,000,000 | 5,760,000,000 | 3 | channel: 10, oidCache: false, oidBatch: 100, splitInterval: 6h, mem: -Xms8G -Xmx8G | 2 GB | 150,000–200,000 | 9 hours | 10%–20% |
What's next
To query migrated data using SQL, create a time series table in LindormTSDB before migrating. See the LindormTSDB documentation for instructions on creating time series tables.
After migration, validate a sample of your data in LindormTSDB using the query examples in Query migrated data.