This topic describes how to migrate full data from a Time Series Database instance to LindormTSDB.

Prerequisites

  • Linux or macOS is installed on the client, together with the following software:
    • Java Development Kit (JDK) 1.8 or later
    • Python 2.x or 3.x
  • The version of the TSDB instance is 2.7.4 or later.
  • A Lindorm instance is created and LindormTSDB is activated for the instance. For more information, see Create an instance.

Background information

LindormTSDB is developed by Alibaba Cloud and is compatible with most APIs of TSDB. Compared with TSDB, LindormTSDB offers higher performance with lower costs and supports more features. TSDB is no longer available for sale. We recommend that you migrate all data in your TSDB instances to LindormTSDB.

Process

You can perform the following steps to migrate full data from a TSDB instance to LindormTSDB:
  1. Use the migration tool provided by LindormTSDB to read the information about all time series in the TSDB instance and save it to a local file.
  2. The migration tool splits the migration task into multiple time groups based on the task configurations, including the start time, end time, and interval. It then splits each time group into multiple read subtasks based on the value of the oidBatch parameter in the configurations of the migration task. Each read subtask reads the data of multiple time series within the specified time range and sends the data to the write component.
  3. After all read subtasks in a time group are complete, the tool records the ID of the time group, the ID of the migration task, and the task status in a list whose name is in the following format: internal_datax_jobjobName.
    Note The migration tool provided by LindormTSDB supports multiple migration tasks. The ID of each migration task is recorded in a task ID list. The migration of data in a time group does not start until all read subtasks in the previous time group are complete.
  4. The write component receives the data sent by each read subtask and writes the data to LindormTSDB by using the multi-value data model.
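The splitting described in steps 2 and 3 can be sketched in Python as follows. This is a simplified illustration of the logic only, not the migration tool's actual implementation; the names split_interval_ms and oid_batch mirror the splitIntervalMs and oidBatch parameters described later in this topic.

```python
def split_into_time_groups(begin_ms, end_ms, split_interval_ms):
    """Split the [begin_ms, end_ms) migration range into consecutive time groups."""
    groups = []
    start = begin_ms
    while start < end_ms:
        stop = min(start + split_interval_ms, end_ms)
        groups.append((start, stop))
        start = stop
    return groups


def split_group_into_subtasks(series_ids, oid_batch):
    """Split the time series of one time group into read subtasks of oid_batch series each."""
    return [series_ids[i:i + oid_batch] for i in range(0, len(series_ids), oid_batch)]


# Example: a 7-day range split at 1-day intervals, and 250 time series with oidBatch = 100.
groups = split_into_time_groups(0, 7 * 86400000, 86400000)
subtasks = split_group_into_subtasks(list(range(250)), 100)
print(len(groups))                 # 7 time groups
print([len(s) for s in subtasks])  # [100, 100, 50]
```

Each read subtask then pulls the data of its batch of series for the group's time range, which is why splitIntervalMs and oidBatch together bound the amount of data read at a time.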

Usage notes

  • If your application is deployed on an ECS instance, we recommend that you deploy the ECS instance, the Lindorm instance, and the TSDB instance that you want to migrate in the same VPC to ensure network connectivity between the instances.
  • If you migrate data from a TSDB instance to LindormTSDB by using the Internet, make sure that the public endpoints of the Lindorm and TSDB instances are enabled, and the IP address of your client is added to the whitelists of the Lindorm and TSDB instances. For more information, see Configure a whitelist.
  • During the migration process, data is read from the TSDB instance and is written to LindormTSDB. Therefore, check whether your business is affected during the migration from the following dimensions before you migrate data:
    • The specification of the TSDB instance
    • The specifications of the environment (such as an ECS instance) on which applications are deployed
    • The number of time series in the TSDB instance
    • The total size of data that you want to migrate
    • The average frequency at which data in each time series is reported
    • The time range of the data that you want to migrate
    • The interval at which each migration task is split
    Note For more information about performance evaluation, see Performance testing.
  • Data written by using the multi-value data model cannot be queried by using SQL statements. To use SQL statements to query migrated data, create a time series table before you migrate data to LindormTSDB.
  • By default, the timestamps used in LindormTSDB are 13 digits in length and indicate time values in milliseconds, whereas the timestamps used in TSDB are 10 digits in length and indicate time values in seconds. After data is migrated from the TSDB instance to LindormTSDB, the timestamps of the data are converted to 13-digit values.
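For example, the second-level timestamp 1657004460 used in the queries later in this topic becomes a millisecond-level timestamp after migration. The conversion is a simple multiplication:

```python
tsdb_ts = 1657004460           # 10-digit TSDB timestamp, in seconds
lindorm_ts = tsdb_ts * 1000    # 13-digit LindormTSDB timestamp, in milliseconds
print(lindorm_ts)              # 1657004460000
```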
  • We recommend that you do not use the single-value data model to write data to LindormTSDB. Therefore, data that was written to the TSDB instance by using the single-value data model must be queried in LindormTSDB by using the multi-value data model. The following sample code shows how to query data that was written by using the single-value data model, in TSDB and in LindormTSDB:
    // The statement used to query data in TSDB.
    curl -u username:password ts-xxxxx:3242/api/query -XPOST -d '{
        "start": 1657004460,
        "queries": [
            {
                "aggregator": "none",
                "metric": "test_metric"
            }
        ]
    }'
    // The query results in TSDB.
    [
        {
            "aggregateTags": [],
            "dps": {
                "1657004460": 1.0
            },
            "fieldName": "",
            "metric": "test_metric",
            "tags": {
                "tagkey1": "1"
            }
        }
    ]
    
    // The statement used to query data in LindormTSDB.
    curl -u username:password ld-xxxxx:8242/api/mquery -XPOST -d '{
        "start":1657004460,
        "queries": [
            {
                "metric": "test_metric",
                "fields": [
                    {
                        "field": "*",
                        "aggregator": "none"
                    }
                ],
                "aggregator": "none"
            }
        ]
    }'
    // The query results in LindormTSDB.
    [
      {
        "aggregatedTags": [],
        "columns": [
          "timestamp",
          "value"
        ],
        "metric": "test_metric",
        "tags": {
          "tagkey1": "1"
        },
        "values": [
          [
            1657004460000,
            1.0
          ]
        ]
      }
    ]

Configure a data migration task

Configure the parameters described in the following three tables and save the configurations as a JSON file such as job.json.
  • Configure parameters related to the task
    Parameter | Required | Description
    channel | No | The number of concurrent tasks that can run at the same time. Default value: 1.
    errorLimit | No | The number of write errors that are allowed during the migration task. Default value: 0.
  • Configure parameters related to data reading. Specify the values of the parameters based on the specifications of the TSDB instance.
    Parameter | Required | Description
    sinkDbType | Yes | The type of the destination database. Set this parameter to LINDORM-MIGRATION.
    endpoint | Yes | The endpoint that is used to connect to the TSDB instance. For more information, see Network connection.
    beginDateTime | Yes | The time when the migration task starts.
    endDateTime | Yes | The time when the migration task ends.
    splitIntervalMs | Yes | The interval at which the migration task is split. Calculate the value based on the total duration of the migration task and the average frequency at which data in each time series is reported. Example: 604800000 (7 days). If data in each time series is reported at a frequency of seconds or less, we recommend that you set the interval to a value shorter than one day. If data in each time series is reported at a frequency of hours, you can set the interval to a larger value based on your requirements.
    selfId | Yes | The custom ID of the migration task. If you use multiple concurrent tasks to migrate data, specify the IDs of all tasks in the value of the jobIds parameter. If you use only one task to migrate data, specify the ID of the task in the value of the jobIds parameter.
    jobIds | Yes | The IDs of the migration tasks.
    jobName | Yes | The name of the migration task. The name of a migration task is used as the suffix of the task status list. If you use multiple concurrent tasks to migrate data, the names of the migration tasks must be the same.
    oidPath | Yes | The local path in which the time series that you want to migrate from the TSDB instance are stored.
    oidBatch | Yes | The number of time series that are read by each read subtask at a time.
    oidCache | Yes | Specifies whether to cache the time series migrated by the migration task in memory. If you want to migrate tens of billions of time series, not all time series can be cached in memory.
    metrics | No | The table that you want to migrate. This parameter does not have a default value.
    Note The amount of data that is read each time in a migration task is determined by the splitIntervalMs and oidBatch parameters and the average frequency at which data in each time series is reported. For example, if the value of splitIntervalMs is set to 604800000 and the value of oidBatch is set to 100, and data in each time series is reported on an hourly basis, the number of data records that can be read each time can be calculated by using the following formula: 100 × 604800000/3600000 = 16800.
  • Configure parameters related to data writing
    Parameter | Required | Description
    endpoint | Yes | The endpoint used to access LindormTSDB. For more information, see View endpoints.
    batchSize | Yes | The maximum number of data points that can be sent to LindormTSDB at a time.
    multiField | Yes | Specifies whether the multi-value data model is used to write data. If you use the multi-value data model to write data to LindormTSDB, set this parameter to true.
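As a sanity check, the read-batch estimate in the Note for the data reading parameters can be reproduced with a few lines of arithmetic. This is illustrative only and assumes one data point per series per hour, as in the Note's example:

```python
oid_batch = 100                  # time series read by each subtask at a time
split_interval_ms = 604800000    # 7-day split interval
report_interval_ms = 3600000     # one data point per series per hour
records_per_read = oid_batch * (split_interval_ms // report_interval_ms)
print(records_per_read)          # 16800
```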
The following example shows the content contained in the job.json file:
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.00
            }
        },
        "content": [
            {
                "reader": {
                    "name": "tsdbreader",
                    "parameter": {
                        "sinkDbType": "LINDORM-MIGRATION",
                        "endpoint": "ts-xxxx:3242",
                        "beginDateTime": "2022-5-2 00:00:00",
                        "endDateTime": "2022-7-2 00:00:00",
                        "splitIntervalMs": 86400000,
                        "jobName":"myjob",
                        "selfId":1,
                        "jobIds":[1],
                        "oidPath":"{$myworkplace}/oidfile",
                        "oidBatch":100,
                        "oidCache":true
                    }
                },
                "writer": {
                    "name": "tsdbwriter",
                    "parameter": {
                        "endpoint": "ld-xxxx:8242",
                        "multiField":true,
                        "batchSize":500
                    }
                }
            }
        ]
    }
}
                

Start the data migration task

  1. Download the migration tool for time series data.
  2. Run the following command to decompress the downloaded package of the migration tool:
    tar -zxvf tsdb2lindorm.tar.gz
  3. Run the following command to start the migration task:
    python datax/bin/datax.py  --jvm="-Xms8G -Xmx8G" job.json > job.result
    Note Replace job.json in the preceding command with the actual JSON file that you use to store the parameter configurations.

    After the command is run, check whether error information is recorded in the job.result file. If no error information is returned, the migration task is successful.
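For example, you can scan job.result for error keywords with grep. The exact log format depends on the version of the DataX-based migration tool, so treat the "error" keyword as an assumption; the sample file created below only stands in for real task output:

```shell
# Create a sample job.result for illustration; in practice, the migration task writes this file.
printf 'stage 1 finished\nstage 2 finished\n' > job.result
# Count lines that mention "error" (case-insensitive); a count of 0 suggests the task succeeded.
error_count=$(grep -ci "error" job.result || true)
echo "error lines: ${error_count}"
```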

  4. Optional: If the migration task fails, you can run the following multi-value query to view the task status list of the TSDB instance:
    curl -u username:password ts-****:3242/api/mquery -XPOST -d '{
        "start": 1,
        "queries": [
            {
                "metric": "internal_datax_jobjobName",
                "fields": [
                    {
                        "field": "*",
                        "aggregator": "none"
                    }
                ]
            }
        ]
    }'
    Note
    • username:password: Replace this value with the account and password that you use to access the TSDB instance. For more information, see Manage accounts.
    • ts-****: Replace this value with the ID of the TSDB instance.
    • jobName: Replace this value with the name of the migration task. Example: internal_datax_jobmyjob.
    The following table describes the returned task status list.
    Timestamp (endtime) | jobId (tag) | state (field)
    1651795199999 (2022-05-05 23:59:59.999) | 3 | ok
    1651795199999 (2022-05-05 23:59:59.999) | 2 | ok
    1651795199999 (2022-05-05 23:59:59.999) | 1 | ok
    1651881599999 (2022-05-06 23:59:59.999) | 2 | ok
    To prevent an executed migration task from being executed again, modify the value of beginDateTime in the job.json file before you start the task. In this example, the value of beginDateTime is changed to 2022-05-06 00:00:00.
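The adjusted beginDateTime can be derived from the end timestamp of the last time group that all tasks completed. The sketch below assumes the timestamps in the status list are interpreted in UTC, which matches the readable times shown in the table:

```python
from datetime import datetime, timezone

# End timestamp of the last time group that every task completed (from the status list).
last_completed_end_ms = 1651795199999   # 2022-05-05 23:59:59.999

# The new beginDateTime is the millisecond right after the completed group.
next_begin = datetime.fromtimestamp((last_completed_end_ms + 1) / 1000, tz=timezone.utc)
print(next_begin.strftime("%Y-%m-%d %H:%M:%S"))   # 2022-05-06 00:00:00
```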

Performance testing

Before you migrate data from a TSDB instance, you must evaluate the performance of the TSDB instance. The following tables show the performance test results of TSDB Basic Edition and TSDB Standard Edition instances for reference.

  • Test results of two TSDB Basic Edition II instances each with 4 CPU cores and 8 GB of memory
    Test | Amount of data | Number of task processes | Configurations | Size of time series files | Number of data points migrated per second | Migration duration | Consumed TSDB resources
    1 | Time series: 30,000; data points: 86,400,000 | 1 | channel: 2; oidCache: true; oidBatch: 100; splitInterval: 6h; mem: -Xms6G -Xmx6G | 1.5 MB | 230,000 | 12 minutes 30 seconds | CPU utilization: 30%
    2 | Time series: 6,000,000; data points: 2,592,000,000 | 1 | channel: 10; oidCache: true; oidBatch: 100; splitInterval: 6h; mem: -Xms8G -Xmx8G | 292 MB | 200,000 | 2 hours 55 minutes 30 seconds | CPU utilization: 70% to 90%
    3 | Time series: 30,000,000; data points: 4,320,000,000 | 1 | channel: 10; oidCache: false; oidBatch: 100; splitInterval: 6h; mem: -Xms28G -Xmx28G | 1.5 GB | 140,000 | 9 hours | CPU utilization: 40% to 80%
    4 | Time series: 30,000,000; data points: 4,320,000,000 | 3 | channel: 10; oidCache: false; oidBatch: 100; splitInterval: 6h; mem: -Xms8G -Xmx8G | 1.5 GB | 250,000 | 5 hours | CPU utilization: 90%
  • Test results of two TSDB Standard Edition I instances each with 8 CPU cores and 16 GB of memory
    Amount of data | Number of task processes | Configurations | Size of time series files | Number of data points migrated per second | Migration duration | Consumed TSDB resources
    Time series: 40,000,000; data points: 5,760,000,000 | 3 | channel: 10; oidCache: false; oidBatch: 100; splitInterval: 6h; mem: -Xms8G -Xmx8G | 2 GB | 150,000 to 200,000 | 9 hours | CPU utilization: 10% to 20%