To migrate all time series data from a Time Series Database (TSDB) instance to the Lindorm time series engine (LindormTSDB), use the LindormTSDB migration tool. The tool reads data from TSDB, splits the workload into parallel subtasks, and writes the data to LindormTSDB using the multi-value data model.
Prerequisites
Before you begin, make sure that you have:
A Linux or macOS client with the following installed:
Java Development Kit (JDK) 1.8 or later
Python 2.x or 3.x
A TSDB instance running version 2.7.4 or later
A Lindorm instance with LindormTSDB activated. For details, see Create an instance.
How the migration tool works
The tool processes data in the following sequence:
1. Reads all time series from the TSDB instance and saves them to a local file.
2. Splits the migration task into time groups based on beginDateTime, endDateTime, and splitIntervalMs. Within each time group, splits the workload into read subtasks based on oidBatch. Each read subtask reads a batch of time series within the specified time range and sends the data to the write component.
3. After all read subtasks in a time group complete, records the time group ID, migration task ID, and task status in a list named internal_datax_job<jobName>.
4. The write component receives data from each read subtask and writes it to LindormTSDB using the multi-value data model.
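The splitting described above can be sketched as follows. This is a simplified illustration of the documented behavior, not the migration tool's actual code; the function and variable names are hypothetical.

```python
# Sketch: split a migration into time groups (splitIntervalMs) and
# read subtasks (oidBatch), as described in the sequence above.

def split_into_subtasks(begin_ms, end_ms, split_interval_ms, oids, oid_batch):
    """Return (group_id, group_start_ms, group_end_ms, oid_batch) tuples."""
    subtasks = []
    for group_id, group_start in enumerate(range(begin_ms, end_ms, split_interval_ms)):
        group_end = min(group_start + split_interval_ms, end_ms)
        # Within each time group, the time series list is cut into
        # batches of oidBatch; each batch becomes one read subtask.
        for i in range(0, len(oids), oid_batch):
            subtasks.append((group_id, group_start, group_end, oids[i:i + oid_batch]))
    return subtasks

# Example: a 2-day range, 1-day time groups, 10 series, 5 series per subtask.
tasks = split_into_subtasks(0, 2 * 86400000, 86400000,
                            [f"oid{i}" for i in range(10)], 5)
print(len(tasks))  # 2 time groups x 2 batches = 4 subtasks
```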
Before you start
Network connectivity
Deploy the client, the Lindorm instance, and the TSDB instance in the same Virtual Private Cloud (VPC) to avoid latency and connectivity issues.
To migrate over the Internet, enable the public endpoints of both the Lindorm and TSDB instances, and add your client's IP address to the whitelists of both instances. For details, see Configure whitelists.
Performance planning
Assess how the migration may affect your business before you start. Key factors include:
TSDB instance specification
Specification of the environment (such as an Elastic Compute Service (ECS) instance) running your application
Number of time series in the TSDB instance
Total data size to migrate
Average reporting frequency per time series
Time range of the data
splitIntervalMs value
See Performance test results for reference throughput numbers by instance size.
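Several of these factors combine into a single per-subtask read volume, which is often the most useful planning number. The sketch below estimates it from oidBatch, splitIntervalMs, and the average reporting interval; the function name is hypothetical.

```python
# Rough planning aid: estimate how many data points one read subtask fetches.
def points_per_subtask(oid_batch, split_interval_ms, reporting_interval_ms):
    """oidBatch series x (time group duration / reporting interval)."""
    return oid_batch * split_interval_ms // reporting_interval_ms

# 100 series per batch, 7-day time groups, hourly reporting.
print(points_per_subtask(100, 604800000, 3600000))  # 16800

# Second-level reporting reads far more per subtask, even with 1-day groups,
# which is why a smaller splitIntervalMs is recommended in that case.
print(points_per_subtask(100, 86400000, 1000))      # 8640000
```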
Data model and query changes after migration
Be aware of the following data model differences before you migrate:
| Item | TSDB | LindormTSDB | Impact |
|---|---|---|---|
| Timestamp length | 10 digits (seconds) | 13 digits (milliseconds) | Timestamps are automatically converted to milliseconds after migration |
| Write model | Single-value | Multi-value | Data written with the single-value model must be queried using multi-value syntax after migration |
| SQL queries | Supported | Not available for multi-value writes | Create a time series table before migrating if you need SQL queries |
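The timestamp conversion in the table is a plain seconds-to-milliseconds scaling, illustrated below; the values are examples only.

```python
# TSDB stores 10-digit second timestamps; LindormTSDB stores 13-digit
# millisecond timestamps. Migration multiplies each timestamp by 1000.
tsdb_ts = 1657004460         # 10 digits, seconds
lindorm_ts = tsdb_ts * 1000  # 13 digits, milliseconds
print(lindorm_ts)            # 1657004460000
```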
Configure the migration task
Save your migration configuration as a JSON file (for example, job.json). The configuration has three sections: task settings, reader (TSDB source), and writer (LindormTSDB destination).
Task settings
| Parameter | Required | Description |
|---|---|---|
| channel | No | Number of concurrent tasks. Default: 1. |
| errorLimit | No | Number of write errors allowed before the task fails. Default: 0. |
Reader parameters (TSDB source)
Configure these parameters based on your TSDB instance specification and data volume.
| Parameter | Required | Description |
|---|---|---|
| sinkDbType | Yes | Set to LINDORM-MIGRATION. |
| endpoint | Yes | TSDB instance endpoint. For details, see Network connection. |
| beginDateTime | Yes | Start time of the migration range. |
| endDateTime | Yes | End time of the migration range. |
| splitIntervalMs | Yes | Duration of each time group in milliseconds. Together with oidBatch, controls how much data is read per subtask. Example: 604800000 (7 days). Set a value smaller than one day if data is reported at second-level frequency; use a larger value for hourly reporting. |
| selfId | Yes | ID of this migration task. If you run multiple concurrent tasks, list all task IDs in jobIds. |
| jobIds | Yes | List of migration task IDs. |
| jobName | Yes | Name of the migration task. Used as the suffix of the task status list (internal_datax_job<jobName>). All concurrent tasks that belong to the same job must use the same name. |
| oidPath | Yes | Path to the local file that stores all time series to migrate from the TSDB instance. |
| oidBatch | Yes | Number of time series each read subtask processes at a time. Together with splitIntervalMs and the reporting frequency, determines how many data points are read per subtask. |
| oidCache | Yes | Whether to cache time series in memory. Set to false when you migrate tens of billions of time series, because they cannot all fit in memory. |
| metrics | No | Specific metrics to migrate. No default value. Example: ["METRIC_1","METRIC_2"...]. |
How splitIntervalMs and oidBatch determine read volume
Each read subtask reads oidBatch × splitIntervalMs / reportingInterval data points. For example, with oidBatch set to 100, splitIntervalMs set to 604800000, and hourly reporting:
100 × 604800000 / 3600000 = 16,800 data points per subtask
Writer parameters (LindormTSDB destination)
| Parameter | Required | Description |
|---|---|---|
| endpoint | Yes | LindormTSDB endpoint. For details, see View endpoints. |
| batchSize | Yes | Maximum number of data points sent to LindormTSDB in a single write request. |
| multiField | Yes | Set to true to use the multi-value data model. |
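Conceptually, batchSize just caps each write request: the writer cuts the stream of data points into chunks of at most batchSize points. A simplified sketch (not the tool's code):

```python
# Split a list of data points into write requests of at most batch_size points.
def chunk(points, batch_size):
    return [points[i:i + batch_size] for i in range(0, len(points), batch_size)]

# 1,200 points with batchSize 500 become three requests: 500 + 500 + 200.
requests = chunk(list(range(1200)), 500)
print([len(r) for r in requests])  # [500, 500, 200]
```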
Example configuration
The following example shows a complete job.json file:
{
"job": {
"setting": {
"speed": {
"channel": 1
},
"errorLimit": {
"record": 0,
"percentage": 0.00
}
},
"content": [
{
"reader": {
"name": "tsdbreader",
"parameter": {
"sinkDbType": "LINDORM-MIGRATION",
"endpoint": "ts-xxxx:3242",
"beginDateTime": "2022-5-2 00:00:00",
"endDateTime": "2022-7-2 00:00:00",
"splitIntervalMs": 86400000,
"jobName": "myjob",
"selfId": 1,
"jobIds": [1],
"oidPath": "{$myworkplace}/oidfile",
"oidBatch": 100,
"oidCache": true
}
},
"writer": {
"name": "tsdbwriter",
"parameter": {
"endpoint": "ld-xxxx:8242",
"multiField": true,
"batchSize": 500
}
}
}
]
}
}
Replace the placeholders with your actual values:
| Placeholder | Description | Example |
|---|---|---|
| ts-xxxx:3242 | TSDB instance endpoint (port 3242) | ts-bp1xxxxx:3242 |
| ld-xxxx:8242 | LindormTSDB endpoint (port 8242) | ld-bp1xxxxx:8242 |
| {$myworkplace}/oidfile | Path to the local time series file (oidPath) | /data/migration/oidfile |
Run the migration
Download the migration tool.
Decompress the package:
tar -zxvf tsdb2lindorm.tar.gz
Start the migration task:
python datax/bin/datax.py --jvm="-Xms8G -Xmx8G" job.json > job.result
Replace job.json with the name of your configuration file.
After the command finishes, check job.result for errors. If the file contains no error output, the migration completed successfully.
(Optional) If the task fails, query the task status list to identify which time groups succeeded and where to resume:
curl -u <username>:<password> <tsdb-endpoint>:3242/api/mquery -XPOST -d '{ "start": 1, "queries": [ { "metric": "internal_datax_job<jobName>", "fields": [ { "field": "*", "aggregator": "none" } ] } ] }'
Replace the placeholders:
| Placeholder | Description | Example |
|---|---|---|
| <username>:<password> | TSDB account credentials. See Manage accounts. | admin:mypassword |
| <tsdb-endpoint> | TSDB instance endpoint | ts-**** |
| <jobName> | Migration task name. With jobName set to myjob, the full metric name is internal_datax_jobmyjob. | myjob |
The response lists each time group with its completion status:
| Timestamp (endtime) | jobId (tag) | state (field) |
|---|---|---|
| 1651795199999 (2022-05-05 23:59:59.999) | 3 | ok |
| 1651795199999 (2022-05-05 23:59:59.999) | 2 | ok |
| 1651795199999 (2022-05-05 23:59:59.999) | 1 | ok |
| 1651881599999 (2022-05-06 23:59:59.999) | 2 | ok |
To resume without re-migrating completed time groups, update beginDateTime in job.json to the start of the first incomplete time group before restarting. In this example, set beginDateTime to 2022-05-06 00:00:00.
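The resume point can be derived mechanically from the status rows: find the earliest time-group end time that not every task ID has marked ok, and restart from that group's start. A sketch of that check, using the example status list above (the parsing of the mquery response is simplified, and the helper name is hypothetical):

```python
# Rows mirror the example status list: (group end in ms, jobId, state).
rows = [
    (1651795199999, 3, "ok"),
    (1651795199999, 2, "ok"),
    (1651795199999, 1, "ok"),
    (1651881599999, 2, "ok"),
]
job_ids = {1, 2, 3}          # all concurrent task IDs (jobIds)
split_interval_ms = 86400000  # splitIntervalMs from the example config

def resume_begin_ms(rows, job_ids, split_interval_ms):
    """Start of the earliest time group that some task has not completed."""
    ok = {}
    for end_ms, job_id, state in rows:
        if state == "ok":
            ok.setdefault(end_ms, set()).add(job_id)
    incomplete = [end for end, done in ok.items() if done != job_ids]
    # A group that ends at end_ms started split_interval_ms - 1 ms earlier.
    return min(incomplete) - split_interval_ms + 1

print(resume_begin_ms(rows, job_ids, split_interval_ms))
# 1651795200000, i.e. 2022-05-06 00:00:00 in the example above
```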
Query migrated data
TSDB data originally written with the single-value model is stored in LindormTSDB under the multi-value data model. Use the /api/mquery endpoint and the multi-value query format when reading this data.
The following example shows the query difference for a metric named test_metric:
Query in TSDB:
curl -u username:password ts-xxxxx:3242/api/query -XPOST -d '{
"start": 1657004460,
"queries": [
{
"aggregator": "none",
"metric": "test_metric"
}
]
}'
Response from TSDB:
[
{
"aggregateTags": [],
"dps": {
"1657004460": 1.0
},
"fieldName": "",
"metric": "test_metric",
"tags": {
"tagkey1": "1"
}
}
]
Query in LindormTSDB:
curl -u username:password ld-xxxxx:8242/api/mquery -XPOST -d '{
"start": 1657004460,
"queries": [
{
"metric": "test_metric",
"fields": [
{
"field": "*",
"aggregator": "none"
}
],
"aggregator": "none"
}
]
}'
Response from LindormTSDB:
[
{
"aggregatedTags": [],
"columns": [
"timestamp",
"value"
],
"metric": "test_metric",
"tags": {
"tagkey1": "1"
},
"values": [
[
1657004460000,
1.0
]
]
}
]
Key differences:
Use /api/mquery instead of /api/query.
Specify fields explicitly using the fields array.
Timestamps in LindormTSDB responses are in milliseconds (13 digits), compared to seconds (10 digits) in TSDB.
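If existing client code still expects the old single-value dps shape, a small adapter can flatten the multi-value response. This is a sketch only; the field names (columns, values, timestamp, value) follow the example responses above.

```python
# Flatten one LindormTSDB /api/mquery result into a TSDB-style "dps" map
# keyed by second-granularity timestamps.
def to_dps(mquery_result):
    cols = mquery_result["columns"]
    ts_idx = cols.index("timestamp")
    val_idx = cols.index("value")
    # Millisecond timestamps (13 digits) are divided back down to seconds.
    return {str(row[ts_idx] // 1000): row[val_idx]
            for row in mquery_result["values"]}

result = {
    "metric": "test_metric",
    "columns": ["timestamp", "value"],
    "values": [[1657004460000, 1.0]],
    "tags": {"tagkey1": "1"},
}
print(to_dps(result))  # {'1657004460': 1.0}
```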
Performance test results
Use the following results as a reference when planning your migration. Actual performance depends on your instance specification, data volume, and reporting frequency.
TSDB Basic Edition II (4 CPU cores, 8 GB memory) — 2 instances
| Test | Time series | Data points | Processes | Configuration | Time series file size | Throughput (data points/sec) | Duration | CPU utilization |
|---|---|---|---|---|---|---|---|---|
| 1 | 30,000 | 86,400,000 | 1 | channel: 2, oidCache: true, oidBatch: 100, splitInterval: 6h, mem: -Xms6G -Xmx6G | 1.5 MB | 230,000 | 12 min 30 sec | 30% |
| 2 | 6,000,000 | 2,592,000,000 | 1 | channel: 10, oidCache: true, oidBatch: 100, splitInterval: 6h, mem: -Xms8G -Xmx8G | 292 MB | 200,000 | 2 hr 55 min 30 sec | 70%–90% |
| 3 | 30,000,000 | 4,320,000,000 | 1 | channel: 10, oidCache: false, oidBatch: 100, splitInterval: 6h, mem: -Xms28G -Xmx28G | 1.5 GB | 140,000 | 9 hours | 40%–80% |
| 4 | 30,000,000 | 4,320,000,000 | 3 | channel: 10, oidCache: false, oidBatch: 100, splitInterval: 6h, mem: -Xms8G -Xmx8G | 1.5 GB | 250,000 | 5 hours | 90% |
TSDB Standard Edition I (8 CPU cores, 16 GB memory) — 2 instances
| Time series | Data points | Processes | Configuration | Time series file size | Throughput (data points/sec) | Duration | CPU utilization |
|---|---|---|---|---|---|---|---|
| 40,000,000 | 5,760,000,000 | 3 | channel: 10, oidCache: false, oidBatch: 100, splitInterval: 6h, mem: -Xms8G -Xmx8G | 2 GB | 150,000–200,000 | 9 hours | 10%–20% |
What's next
To query migrated data using SQL, create a time series table in LindormTSDB before migrating. See the LindormTSDB documentation for instructions on creating time series tables.
After migration, validate a sample of your data in LindormTSDB using the query examples in Query migrated data.