This topic describes how to migrate full data from a Time Series Database instance to LindormTSDB.

Prerequisites

  • Linux or macOS is installed on the client, together with the following software:
    • Java Development Kit (JDK) 1.8 or later
    • Python 2.x or 3.x
  • The version of the TSDB instance is 2.7.4 or later.
  • A Lindorm instance is created and LindormTSDB is activated for the instance. For more information, see Create an instance.

Background information

LindormTSDB is developed by Alibaba Cloud and is compatible with most APIs of TSDB. Compared with TSDB, LindormTSDB offers higher performance with lower costs and supports more features. TSDB is no longer available for sale. We recommend that you migrate all data in your TSDB instances to LindormTSDB.

Process

You can perform the following steps to migrate full data from a TSDB instance to LindormTSDB:
  1. Use the migration tool provided by LindormTSDB to read the information about all time series in the TSDB instance and save it to a local file.
  2. The migration tool splits the migration task into multiple time groups based on the task configurations, including the start time, end time, and interval. It then splits each time group into multiple read subtasks based on the value of the oidBatch parameter in the configurations of the migration task. Each read subtask reads the data of multiple time series within the specified time range and sends the data to the write component.
  3. After all read subtasks in a time group are complete, the tool records the ID of the time group, the ID of the migration task, and the task status in a list whose name is in the following format: internal_datax_jobjobName.
    Note The migration tool provided by LindormTSDB supports multiple migration tasks. The ID of each migration task is recorded in a task ID list. The migration of data in a time group does not start until all read subtasks in the previous time group are complete.
  4. The write component receives the data sent by each read subtask and writes the data to LindormTSDB by using the multi-value data model.
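The splitting described in steps 2 and 3 can be sketched in Python as follows. This is a simplified illustration of the logic only, not the migration tool's actual implementation; the names split_interval_ms and oid_batch mirror the splitIntervalMs and oidBatch parameters described later in this topic.

```python
def split_into_time_groups(begin_ms, end_ms, split_interval_ms):
    """Split the [begin_ms, end_ms) migration range into consecutive time groups."""
    groups = []
    start = begin_ms
    while start < end_ms:
        stop = min(start + split_interval_ms, end_ms)
        groups.append((start, stop))
        start = stop
    return groups


def split_group_into_subtasks(series_ids, oid_batch):
    """Split the time series of one time group into read subtasks of oid_batch series each."""
    return [series_ids[i:i + oid_batch] for i in range(0, len(series_ids), oid_batch)]


# Example: a 7-day range split at 1-day intervals, and 250 time series with oidBatch = 100.
groups = split_into_time_groups(0, 7 * 86400000, 86400000)
subtasks = split_group_into_subtasks(list(range(250)), 100)
print(len(groups))                 # 7 time groups
print([len(s) for s in subtasks])  # [100, 100, 50]
```

Each read subtask then pulls the data of its batch of series for the group's time range, which is why splitIntervalMs and oidBatch together bound the amount of data read at a time.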

Usage notes

  • If your application is deployed on an ECS instance, we recommend that you deploy the ECS instance, the Lindorm instance, and the TSDB instance that you want to migrate in the same VPC to ensure network connectivity between the instances.
  • If you migrate data from a TSDB instance to LindormTSDB by using the Internet, make sure that the public endpoints of the Lindorm and TSDB instances are enabled, and the IP address of your client is added to the whitelists of the Lindorm and TSDB instances. For more information, see Configure a whitelist.
  • During the migration process, data is read from the TSDB instance and is written to LindormTSDB. Therefore, check whether your business is affected during the migration from the following dimensions before you migrate data:
    • The specification of the TSDB instance
    • The specifications of the environment (such as an ECS instance) on which applications are deployed
    • The number of time series in the TSDB instance
    • The total size of data that you want to migrate
    • The average frequency at which data in each time series is reported
    • The time range of the data that you want to migrate
    • The interval at which each migration task is split
    Note For more information about performance evaluation, see Performance testing.
  • Data written by using the multi-value data model cannot be queried by using SQL statements. To use SQL statements to query migrated data, create a time series table before you migrate data to LindormTSDB.
  • By default, the timestamps used in LindormTSDB are 13 digits in length and indicate time values in milliseconds, whereas the timestamps used in TSDB are 10 digits in length and indicate time values in seconds. After data is migrated from the TSDB instance to LindormTSDB, the timestamps of the data are converted to 13-digit values.
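For example, the second-level timestamp 1657004460 used in the queries later in this topic becomes a millisecond-level timestamp after migration. The conversion is a simple multiplication:

```python
tsdb_ts = 1657004460           # 10-digit TSDB timestamp, in seconds
lindorm_ts = tsdb_ts * 1000    # 13-digit LindormTSDB timestamp, in milliseconds
print(lindorm_ts)              # 1657004460000
```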
  • We recommend that you do not use the single-value data model to write data to LindormTSDB. Therefore, data that was written to the TSDB instance by using the single-value data model must be queried in LindormTSDB by using the multi-value data model. The following sample code shows how to query data that was written by using the single-value data model, in TSDB and in LindormTSDB:
    // The statement used to query data in TSDB.
    curl -u username:password ts-xxxxx:3242/api/query -XPOST -d '{
        "start": 1657004460,
        "queries": [
            {
                "aggregator": "none",
                "metric": "test_metric"
            }
        ]
    }'
    // The query results in TSDB.
    [
        {
            "aggregateTags": [],
            "dps": {
                "1657004460": 1.0
            },
            "fieldName": "",
            "metric": "test_metric",
            "tags": {
                "tagkey1": "1"
            }
        }
    ]
    
    // The statement used to query data in LindormTSDB.
    curl -u username:password ld-xxxxx:8242/api/mquery -XPOST -d '{
        "start":1657004460,
        "queries": [
            {
                "metric": "test_metric",
                "fields": [
                    {
                        "field": "*",
                        "aggregator": "none"
                    }
                ],
                "aggregator": "none"
            }
        ]
    }'
    // The query results in LindormTSDB.
    [
      {
        "aggregatedTags": [],
        "columns": [
          "timestamp",
          "value"
        ],
        "metric": "test_metric",
        "tags": {
          "tagkey1": "1"
        },
        "values": [
          [
            1657004460000,
            1.0
          ]
        ]
      }
    ]

Configure a data migration task

Configure the parameters described in the following three tables and save the configurations as a JSON file such as job.json.
  • Configure parameters related to the task
    Parameter | Required | Description
    channel | No | The number of concurrent tasks that can run at the same time. Default value: 1.
    errorLimit | No | The number of write errors that are allowed during the migration task. Default value: 0.
  • Configure parameters related to data reading. Specify the values of the parameters based on the specifications of the TSDB instance.
    Parameter | Required | Description
    sinkDbType | Yes | The type of the destination database. Set this parameter to LINDORM-MIGRATION.
    endpoint | Yes | The endpoint that is used to connect to the TSDB instance. For more information, see Network connection.
    beginDateTime | Yes | The time when the migration task starts.
    endDateTime | Yes | The time when the migration task ends.
    splitIntervalMs | Yes | The interval at which the migration task is split. Calculate the value based on the total duration of the migration task and the average frequency at which data in each time series is reported. Example: 604800000 (7 days). If data in each time series is reported at a frequency of seconds or less, we recommend that you set the interval to a value shorter than one day. If data in each time series is reported at a frequency of hours, you can set the interval to a larger value based on your requirements.
    selfId | Yes | The custom ID of the migration task. If you use multiple concurrent tasks to migrate data, specify the IDs of all tasks in the value of the jobIds parameter. If you use only one task to migrate data, specify the ID of the task in the value of the jobIds parameter.
    jobIds | Yes | The IDs of the migration tasks.
    jobName | Yes | The name of the migration task. The name of a migration task is used as the suffix of the task status list. If you use multiple concurrent tasks to migrate data, the names of the migration tasks must be the same.
    oidPath | Yes | The local path in which the time series that you want to migrate from the TSDB instance are stored.
    oidBatch | Yes | The number of time series that are read by each read subtask at a time.
    oidCache | Yes | Specifies whether to cache the time series migrated by the migration task in memory. If you want to migrate tens of billions of time series, not all time series can be cached in memory.
    metrics | No | The table that you want to migrate. This parameter does not have a default value.
    Note The amount of data that is read each time in a migration task is determined by the splitIntervalMs and oidBatch parameters and the average frequency at which data in each time series is reported. For example, if the value of splitIntervalMs is set to 604800000 and the value of oidBatch is set to 100, and data in each time series is reported on an hourly basis, the number of data records that can be read each time can be calculated by using the following formula: 100 × 604800000/3600000 = 16800.
  • Configure parameters related to data writing
    Parameter | Required | Description
    endpoint | Yes | The endpoint used to access LindormTSDB. For more information, see View endpoints.
    batchSize | Yes | The maximum number of data points that can be sent to LindormTSDB at a time.
    multiField | Yes | Specifies whether the multi-value data model is used to write data. If you use the multi-value data model to write data to LindormTSDB, set this parameter to true.
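As a sanity check, the read-batch estimate in the Note for the data reading parameters can be reproduced with a few lines of arithmetic. This is illustrative only and assumes one data point per series per hour, as in the Note's example:

```python
oid_batch = 100                  # time series read by each subtask at a time
split_interval_ms = 604800000    # 7-day split interval
report_interval_ms = 3600000     # one data point per series per hour
records_per_read = oid_batch * (split_interval_ms // report_interval_ms)
print(records_per_read)          # 16800
```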
The following example shows the content contained in the job.json file:
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.00
            }
        },
        "content": [
            {
                "reader": {
                    "name": "tsdbreader",
                    "parameter": {
                        "sinkDbType": "LINDORM-MIGRATION",
                        "endpoint": "ts-xxxx:3242",
                        "beginDateTime": "2022-5-2 00:00:00",
                        "endDateTime": "2022-7-2 00:00:00",
                        "splitIntervalMs": 86400000,
                        "jobName":"myjob",
                        "selfId":1,
                        "jobIds":[1],
                        "oidPath":"{$myworkplace}/oidfile",
                        "oidBatch":100,
                        "oidCache":true
                    }
                },
                "writer": {
                    "name": "tsdbwriter",
                    "parameter": {
                        "endpoint": "ld-xxxx:8242",
                        "multiField":true,
                        "batchSize":500
                    }
                }
            }
        ]
    }
}
                

Start the data migration task

  1. Download the migration tool for time series data.
  2. Run the following command to decompress the downloaded package of the migration tool:
    tar -zxvf tsdb2lindorm.tar.gz
  3. Run the following command to start the migration task:
    python datax/bin/datax.py  --jvm="-Xms8G -Xmx8G" job.json > job.result
    Note Replace job.json in the preceding command with the actual JSON file that you use to store the parameter configurations.

    After the command is run, check whether error information is recorded in the job.result file. If no error information is returned, the migration task is successful.
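For example, you can scan job.result for error keywords with grep. The exact log format depends on the version of the DataX-based migration tool, so treat the "error" keyword as an assumption; the sample file created below only stands in for real task output:

```shell
# Create a sample job.result for illustration; in practice, the migration task writes this file.
printf 'stage 1 finished\nstage 2 finished\n' > job.result
# Count lines that mention "error" (case-insensitive); a count of 0 suggests the task succeeded.
error_count=$(grep -ci "error" job.result || true)
echo "error lines: ${error_count}"
```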

  4. Optional: If the migration task fails, you can run the following multi-value query to view the task status list of the TSDB instance:
    curl -u username:password ts-****:3242/api/mquery -XPOST -d '{
        "start": 1,
        "queries": [
            {
                "metric": "internal_datax_jobjobName",
                "fields": [
                    {
                        "field": "*",
                        "aggregator": "none"
                    }
                ]
            }
        ]
    }'
    Note
    • username:password: Replace this value with the account and password that you use to access the TSDB instance. For more information, see Manage accounts.
    • ts-****: Replace this value with the ID of the TSDB instance.
    • jobName: Replace this value with the name of the migration task. Example: internal_datax_jobmyjob.
    The following table describes the returned task status list.
    Timestamp (endtime) | jobId (tag) | state (field)
    1651795199999 (2022-05-05 23:59:59.999) | 3 | ok
    1651795199999 (2022-05-05 23:59:59.999) | 2 | ok
    1651795199999 (2022-05-05 23:59:59.999) | 1 | ok
    1651881599999 (2022-05-06 23:59:59.999) | 2 | ok
    To prevent an executed migration task from being executed again, modify the value of beginDateTime in the job.json file before you start the task. In this example, the value of beginDateTime is changed to 2022-05-06 00:00:00.
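The adjusted beginDateTime can be derived from the end timestamp of the last time group that all tasks completed. The sketch below assumes the timestamps in the status list are interpreted in UTC, which matches the readable times shown in the table:

```python
from datetime import datetime, timezone

# End timestamp of the last time group that every task completed (from the status list).
last_completed_end_ms = 1651795199999   # 2022-05-05 23:59:59.999

# The new beginDateTime is the millisecond right after the completed group.
next_begin = datetime.fromtimestamp((last_completed_end_ms + 1) / 1000, tz=timezone.utc)
print(next_begin.strftime("%Y-%m-%d %H:%M:%S"))   # 2022-05-06 00:00:00
```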

Performance testing

Before you migrate data from a TSDB instance, you must evaluate the performance of the TSDB instance. The following tables show the performance test results of TSDB Basic Edition and TSDB Standard Edition instances for reference.

  • Test results of two TSDB Basic Edition II instances each with 4 CPU cores and 8 GB of memory
    Test | Amount of data | Number of task processes | Configurations | Size of time series files | Number of data points migrated per second | Migration duration | Consumed TSDB resources
    1 | Time series: 30,000; data points: 86,400,000 | 1 | channel: 2; oidCache: true; oidBatch: 100; splitInterval: 6h; mem: -Xms6G -Xmx6G | 1.5 MB | 230,000 | 12 minutes 30 seconds | CPU utilization: 30%
    2 | Time series: 6,000,000; data points: 2,592,000,000 | 1 | channel: 10; oidCache: true; oidBatch: 100; splitInterval: 6h; mem: -Xms8G -Xmx8G | 292 MB | 200,000 | 2 hours 55 minutes 30 seconds | CPU utilization: 70% to 90%
    3 | Time series: 30,000,000; data points: 4,320,000,000 | 1 | channel: 10; oidCache: false; oidBatch: 100; splitInterval: 6h; mem: -Xms28G -Xmx28G | 1.5 GB | 140,000 | 9 hours | CPU utilization: 40% to 80%
    4 | Time series: 30,000,000; data points: 4,320,000,000 | 3 | channel: 10; oidCache: false; oidBatch: 100; splitInterval: 6h; mem: -Xms8G -Xmx8G | 1.5 GB | 250,000 | 5 hours | CPU utilization: 90%
  • Test results of two TSDB Standard Edition I instances each with 8 CPU cores and 16 GB of memory
    Amount of data | Number of task processes | Configurations | Size of time series files | Number of data points migrated per second | Migration duration | Consumed TSDB resources
    Time series: 40,000,000; data points: 5,760,000,000 | 3 | channel: 10; oidCache: false; oidBatch: 100; splitInterval: 6h; mem: -Xms8G -Xmx8G | 2 GB | 150,000 to 200,000 | 9 hours | CPU utilization: 10% to 20%