TSDB data source

DataWorks Data Integration provides TSDB Writer for you to write data points to Lindorm Time Series Database (TSDB) provided by Alibaba Cloud ApsaraDB for Lindorm. This topic describes the capabilities of synchronizing data to TSDB data sources.

Supported TSDB versions

TSDB Writer supports all versions of ApsaraDB for Lindorm and HiTSDB V2.4.X or later.

Limits

TSDB Writer supports only exclusive resource groups for Data Integration.
You can configure TSDB Writer only by using the code editor.

How it works

TSDB Writer connects to a TSDB instance by using the TSDB client hitsdb-client and writes data points by using the HTTP API endpoint. For more information, see TSDB SDK documentation.

Data type mappings

If the sourceDbType parameter is set to TSDB, source data is read by using TSDB Reader or OpenTSDB Reader. In this case, TSDB Writer writes the source data to Lindorm TSDB in the format of JSON strings. If the sourceDbType parameter is set to RDB, the source is a relational database. In this case, TSDB Writer parses the source data based on the records of the relational database. The following table lists the valid values of the columnType parameter and the data types that match the column types when the sourceDbType parameter is set to RDB.

Data model	Valid value of columnType	Data type
Tag	tag	A string data type. A tag describes the features of the data source. In most cases, a tag does not change over time.
Timestamp	timestamp	The TIMESTAMP data type. A timestamp specifies the point in time at which data is generated. The timestamp can be manually specified when data is written or automatically generated by the system.
Field	field_string	A string data type. A field describes the measurement metrics of the data source. In most cases, a field changes over time.
	field_double	A numeric data type. A field describes the measurement metrics of the data source. In most cases, a field changes over time.
	field_boolean	A Boolean data type. A field describes the measurement metrics of the data source. In most cases, a field changes over time.
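The mapping in the preceding table can be illustrated with a minimal Python sketch. It is not the TSDB Writer implementation: the function name row_to_point and the layout of the returned dictionary are assumptions made for illustration. Only the five columnType values come from this topic.

```python
# Hypothetical helper for illustration only; not part of TSDB Writer.
def row_to_point(columns, column_types, row):
    """Group one relational row into tags, fields, and a timestamp."""
    if not (len(columns) == len(column_types) == len(row)):
        raise ValueError("column, columnType, and the row must have the same length")

    # The dictionary layout is an assumption, not the format used by TSDB Writer.
    point = {"tags": {}, "fields": {}, "timestamp": None}
    for name, column_type, value in zip(columns, column_types, row):
        if column_type == "tag":
            point["tags"][name] = str(value)        # tags are strings
        elif column_type == "timestamp":
            point["timestamp"] = int(value)
        elif column_type == "field_string":
            point["fields"][name] = str(value)
        elif column_type == "field_double":
            point["fields"][name] = float(value)    # any numeric value
        elif column_type == "field_boolean":
            point["fields"][name] = bool(value)
        else:
            raise ValueError(f"unsupported columnType: {column_type}")
    return point


example = row_to_point(
    ["tag1", "tag2", "field1", "field2", "timestamp", "field3"],
    ["tag", "tag", "field_string", "field_double", "timestamp", "field_boolean"],
    ["host1", "z1", "ok", 1.5, 1546300800, True],
)
print(example)
```

The sketch relies on position: the n-th entry of columnType describes the n-th entry of column, which is why both lists must follow the column order of the reader.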

Develop a data synchronization task

For information about the configuration procedure, see Configure a batch synchronization task by using the code editor.
For information about all parameters that are configured and the code that is run when you use the code editor to configure a batch synchronization task, see Appendix: Code and parameters.

Appendix: Code and parameters

Appendix: Configure a batch synchronization task by using the code editor

If you use the code editor to configure a batch synchronization task, you must configure parameters for the reader and writer of the related data source based on the format requirements in the code editor. For more information about the format requirements, see Configure a batch synchronization task by using the code editor. The following information describes the configuration details of parameters for the reader and writer in the code editor.

Code for TSDB Writer

Write data from RDB to TSDB by using the following default configurations (recommended)

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "stream",// You can replace the stream plug-in with the specific RDB plug-in. RDB databases include MySQL, Oracle, PostgreSQL, and DRDS databases. 
            "parameter": {},
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "tsdb",
            "parameter": {
                "endpoint": "http://localhost:8242",
                "username": "xxx",
                "password": "xxx",
                "sourceDbType": "RDB",
                "batchSize": 256,
                "columnType": [
                    "tag",
                    "tag",
                    "field_string",
                    "field_double",
                    "timestamp",
                    "field_boolean"
                ],
                "column": [
                    "tag1",
                    "tag2",
                    "field1",
                    "field2",
                    "timestamp",
                    "field3"
                ],
                "multiField": "true",
                "table": "testmetric",
                "ignoreWriteError": "false",
                "database": "default"
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "throttle":true,// Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
            "concurrent":1, // The maximum number of parallel threads. 
            "mbps":"12"// The maximum transmission rate. Unit: MB/s. 
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}

Write data from a database that supports the OpenTSDB protocol to TSDB

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "opentsdb",
            "parameter": {
                "endpoint": "http://localhost:4242",
                "column": [
                    "m1",
                    "m2",
                    "m3",
                    "m4",
                    "m5",
                    "m6"
                ],
                "startTime": "2019-01-01 00:00:00",
                "endTime": "2019-01-01 03:00:00"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "tsdb",
            "parameter": {
                "endpoint": "http://localhost:8242"
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "throttle":true,// Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
            "concurrent":1, // The maximum number of parallel threads. 
            "mbps":"12"// The maximum transmission rate. Unit: MB/s. 
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}

Use the OpenTSDB protocol to write a univariate data point to TSDB (not recommended)


{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "stream",// You can replace the stream plug-in with the specific RDB plug-in. RDB databases include MySQL, Oracle, PostgreSQL, and DRDS databases. 
            "parameter": {},
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "tsdb",
            "parameter": {
                "endpoint": "http://localhost:8242",
                "username": "xxx",
                "password": "xxx",
                "sourceDbType": "RDB",
                "batchSize": 256,
                "columnType": [
                    "tag",
                    "tag",
                    "field_string",
                    "field_double",
                    "timestamp",
                    "field_boolean"
                ],
                "column": [
                    "tag1",
                    "tag2",
                    "field_metric_1",
                    "field_metric_2",
                    "timestamp",
                    "field_metric_3"
                ],
                "ignoreWriteError": "false"
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "throttle":true,// Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
            "concurrent":1, // The maximum number of parallel threads. 
            "mbps":"12"// The maximum transmission rate. Unit: MB/s. 
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}

Note

The names of the TSDB metrics are determined by the column names of fields for the column parameter. In the preceding code, a row of data in a relational database is written to three metrics: field_metric_1, field_metric_2, and field_metric_3.
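The effect described in this note can be sketched in Python as follows. This is an illustration under stated assumptions, not the code that TSDB Writer runs: the function name is hypothetical, and the metric, timestamp, value, and tags keys follow the general shape of an OpenTSDB data point rather than a payload confirmed by this topic.

```python
# Hypothetical helper for illustration only; not part of TSDB Writer.
def row_to_univariate_points(columns, column_types, row):
    """Split one relational row into one single-value data point per field column."""
    tags = {}
    timestamp = None
    metrics = []
    for name, column_type, value in zip(columns, column_types, row):
        if column_type == "tag":
            tags[name] = str(value)
        elif column_type == "timestamp":
            timestamp = value
        elif column_type.startswith("field_"):
            # In univariate mode, the column name becomes the metric name.
            metrics.append((name, value))

    # Every point shares the tags and the timestamp of the source row.
    return [
        {"metric": name, "timestamp": timestamp, "value": value, "tags": dict(tags)}
        for name, value in metrics
    ]


points = row_to_univariate_points(
    ["tag1", "tag2", "field_metric_1", "field_metric_2", "timestamp", "field_metric_3"],
    ["tag", "tag", "field_string", "field_double", "timestamp", "field_boolean"],
    ["host1", "z1", "ok", 1.5, 1546300800, True],
)
for point in points:
    print(point["metric"], point["value"])
```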

Parameters in code for TSDB Writer

Parameter type	Parameter	Description	Required	Default value
Common parameters	sourceDbType	The type of the source database.	No	TSDB Note Valid values: TSDB and RDB. The value TSDB indicates that the source database is an OpenTSDB, Prometheus, or Timescale database. The value RDB indicates that the source database is a relational database, such as a MySQL, Oracle, PostgreSQL, or DRDS database.
	endpoint	The HTTP URL of the destination TSDB database. Specify the endpoint in the format of http://IP address:Port number. You can obtain the HTTP endpoint in the ApsaraDB for Lindorm console.	Yes	No default value
	database	The name of the TSDB database to which data is written.	No	default Note You must create a database first.
	username	The username of the TSDB database. You must specify a value for this parameter if you configure authentication for the TSDB database.	No	No default value
	batchSize	The number of data records to write at a time. The value of this parameter is of the INT type and must be greater than 0. If you want to configure a large value for the batchSize parameter, you must reserve more memory space.	No	100
Parameters for TSDB	maxRetryTime	The maximum number of retries allowed after a failure. The value of this parameter is of the INT type and must be greater than 1.	No	3
	ignoreWriteError	Specifies whether to ignore write errors. The value of this parameter is of the BOOLEAN type. If you set this parameter to true, TSDB Writer ignores write errors and continues to write data. If you set this parameter to false and a write operation still fails after the specified number of retries, the synchronization task is terminated.	No	false
Parameters for RDB	table	The names of the metrics that you want to import to TSDB. If the multiField parameter is set to false, you can leave this parameter empty. In this case, you need to specify the names of the metrics for the column parameter. If the multiField parameter is set to true, you must configure this parameter.	No	No default value
	multiField	Specifies whether to write a multivariate data point to TSDB by using the HTTP API endpoint. Note If you want to use the native SQL capabilities of Lindorm TSDB to access data that is written by using the HTTP API endpoint, you must create a table in TSDB. Otherwise, you can query a multivariate data point only by using the TSDB HTTP API endpoint. For more information, see Query a multivariate data point.	Yes	false Note To write a multivariate data point to TSDB, you must set the value to true.
	column	The names of the columns whose data you want to write to the TSDB database.	Yes	No default value Note You must specify the columns in the same order as the columns specified for a reader.
	columnType	The data types of the columns in the relational database. The following types are supported: timestamp: a timestamp column. tag: a tag column. field_string: a metric column whose value is of a string data type. field_double: a metric column whose value is of a numeric data type. field_boolean: a metric column whose value is of a Boolean data type.	Yes	No default value Note You must specify the columns in the same order as the columns specified for a reader.
	batchSize	The number of data records to write at a time. The value of this parameter is of the INT type and must be greater than 0.	No	100
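Several constraints in the preceding table can be checked before a task is submitted. The following Python sketch is one possible pre-flight check for the case where sourceDbType is set to RDB. The function name and the error messages are hypothetical, and DataWorks itself may validate the configuration differently.

```python
VALID_COLUMN_TYPES = {"tag", "timestamp", "field_string", "field_double", "field_boolean"}


def _as_bool(value):
    # The sample configurations pass booleans as the strings "true" and "false".
    return str(value).strip().lower() == "true"


# Hypothetical pre-flight check; not an API of DataWorks or TSDB Writer.
def check_rdb_writer_parameter(parameter):
    """Return a list of problems found in a TSDB Writer "parameter" object."""
    problems = []
    if not parameter.get("endpoint"):
        problems.append("endpoint is required")

    columns = parameter.get("column") or []
    column_types = parameter.get("columnType") or []
    if not columns:
        problems.append("column is required")
    if not column_types:
        problems.append("columnType is required")
    if columns and column_types and len(columns) != len(column_types):
        problems.append("column and columnType must have the same number of entries")
    for column_type in column_types:
        if column_type not in VALID_COLUMN_TYPES:
            problems.append(f"unknown columnType: {column_type}")

    if _as_bool(parameter.get("multiField", "false")) and not parameter.get("table"):
        problems.append("table is required when multiField is true")

    batch_size = parameter.get("batchSize", 100)
    if not isinstance(batch_size, int) or batch_size <= 0:
        problems.append("batchSize must be an integer greater than 0")

    max_retry_time = parameter.get("maxRetryTime", 3)
    if not isinstance(max_retry_time, int) or max_retry_time <= 1:
        problems.append("maxRetryTime must be an integer greater than 1")
    return problems


print(check_rdb_writer_parameter({
    "endpoint": "http://localhost:8242",
    "sourceDbType": "RDB",
    "column": ["tag1", "field1", "timestamp"],
    "columnType": ["tag", "field_int", "timestamp"],  # field_int is not a valid value
    "multiField": "true",
}))
```

The function returns an empty list when no problem is found, so it can be used as a simple gate in a deployment script.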

Performance test report

Characteristics of test data
- metric: a single metric named m.
- tag_k and tag_v: the key and value of a tag. The key-value pairs of the first four tags define 2,000,000 time series. The number of time series is calculated by using the following formula: 10 (zones) × 20 (clusters) × 100 (groups) × 100 (applications). The ip tag holds the index of each of the 2,000,000 time series, starting from 1.
  tag_k	tag_v
  zone	z1 to z10
  cluster	c1 to c20
  group	g1 to g100
  app	a1 to a100
  ip	ip1 to ip2,000,000
- value: a random value from 1 to 100.
- interval: a collection interval of 10 seconds. The total duration of data collection is 3 hours, and a total number of 2,160,000,000 data points are collected. The number of data points is calculated by using the following formula: 3 × 60 × 60/10 × 2,000,000.
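The figures in the preceding list can be reproduced with a few lines of Python. The variable names are chosen for illustration. The input values are the ones stated in the list.

```python
# Tag cardinalities of the first four tags, as stated in the test description.
zones, clusters, groups, apps = 10, 20, 100, 100
series = zones * clusters * groups * apps

# One sample every 10 seconds over a 3-hour window.
interval_seconds = 10
duration_seconds = 3 * 60 * 60
samples_per_series = duration_seconds // interval_seconds

total_points = samples_per_series * series

print(f"{series:,} series x {samples_per_series:,} samples = {total_points:,} data points")
```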
Performance test results
Number of channels	Data integration speed (record/s)	Data integration bandwidth (Mbit/s)
1	129,753	15.45
2	284,953	33.70
3	385,868	45.71
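To see how throughput scales with the number of channels, the reported speeds can be compared with the single-channel result. The following Python sketch is only a reading aid for the reported numbers. The function name scaling and the efficiency measure (speedup divided by the number of channels) are assumptions and are not part of the test report.

```python
# Reported data integration speed in records per second, keyed by number of channels.
reported_speed = {1: 129_753, 2: 284_953, 3: 385_868}


def scaling(speed_by_channels):
    """Return {channels: (speedup, efficiency)} relative to the single-channel run."""
    baseline = speed_by_channels[1]
    result = {}
    for channels, speed in sorted(speed_by_channels.items()):
        speedup = speed / baseline
        result[channels] = (speedup, speedup / channels)
    return result


for channels, (speedup, efficiency) in scaling(reported_speed).items():
    print(f"{channels} channel(s): speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

An efficiency close to 1 means that throughput grows roughly in proportion to the number of channels.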
