同步資料到時序表 - Tablestore

您可以使用kafka-connect-tablestore包將Kafka中資料寫入Tablestore的時序表中。本文主要介紹了如何配置Kafka寫入時序資料。

前提条件

已安裝Kafka，並且已啟動ZooKeeper和Kafka。更多資訊，請參見Kafka官方文檔。
已開通Tablestore服務，建立執行個體以及建立時序表。具體操作，請參見快速使用時序模型。

说明您也可以通過Tablestore Sink Connector自動建立目標時序表，此時需要配置auto.create為true。
已擷取AccessKey。具體操作，請參見擷取AccessKey。

背景信息

Tablestore支援對時序資料進行儲存以及分析。更多資訊，請參見時序模型概述。

步驟一：部署Tablestore Sink Connector

通過以下任意一種方式擷取Tablestore Sink Connector。
- 通過GitHub下載源碼並編譯。源碼的GitHub路徑為Tablestore Sink Connector源碼。
  1. 通過Git工具執行以下命令下載Tablestore Sink Connector源碼。
```
git clone https://github.com/aliyun/kafka-connect-tablestore.git
```
  2. 進入到下載的源碼目錄後，執行以下命令進行Maven打包。
```
mvn clean package -DskipTests
```
    編譯完成後，產生的壓縮包（例如kafka-connect-tablestore-1.0.jar）會存放在target目錄。
- 直接下載編譯完成的kafka-connect-tablestore壓縮包。
將壓縮包複製到各個節點的$KAFKA_HOME/libs目錄下。

步驟二：啟動Tablestore Sink Connector

Tablestore Sink Connector具有standalone模式和distributed模式兩種工作模式。請根據實際選擇。

由於寫入時序資料時，Kafka側的訊息記錄必須為JSON格式，因此啟動Tablestore Sink Connector時需要使用Jsonconverter，且不需要提取schema以及不需要輸入key，請在connect-standalone.properties和connect-distributed.properties中按照如下樣本配置對應配置項。

说明如果輸入了key，請按照key的格式配置key.converter和key.converter.schemas.enable。

value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false

此處以配置standalone模式為例介紹，distributed模式的配置步驟與同步資料到資料表時的distributed模式配置步驟類似，只需按照上述樣本在worker設定檔connect-distributed.properties中修改對應配置項以及在connetor檔案connect-tablestore-sink-quickstart.json中修改時序相關配置即可。具體操作，請參見步驟二：啟動Tablestore Sink Connector中distributed模式的配置步驟。

standalone模式的配置步驟如下：

根據實際修改worker設定檔connect-standalone.properties和connetor設定檔connect-tablestore-sink-quickstart.properties。

worker設定檔connect-standalone.properties的配置樣本

worker配置中包括Kafka串連參數、序列化格式、提交位移量的頻率等配置項。此處以Kafka官方樣本為例介紹。更多資訊，請參見Kafka Connect。

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# These are defaults. This file just demonstrates how to override some settings.
bootstrap.servers=localhost:9092

# The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will
# need to configure these based on the format they want their data in when loaded from or stored into Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Converter-specific settings can be passed in by prefixing the Converter's setting with the converter we want to apply
# it to
key.converter.schemas.enable=true
value.converter.schemas.enable=false

offset.storage.file.filename=/tmp/connect.offsets
# Flush much faster than normal, which is useful for testing/debugging
offset.flush.interval.ms=10000

# Set to a list of filesystem paths separated by commas (,) to enable class loading isolation for plugins
# (connectors, converters, transformations). The list should consist of top level directories that include 
# any combination of: 
# a) directories immediately containing jars with plugins and their dependencies
# b) uber-jars with plugins and their dependencies
# c) directories immediately containing the package directory structure of classes of plugins and their dependencies
# Note: symlinks will be followed to discover dependencies or plugins.
# Examples: 
# plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors,
#plugin.path=

connetor設定檔connect-tablestore-sink-quickstart.properties的配置樣本

connetor配置中包括連接器類、Tablestore串連參數、資料對應等配置項。更多資訊，請參見配置說明。

# 設定連接器名稱。
name=tablestore-sink
# 指定連接器類。
connector.class=TableStoreSinkConnector
# 設定最大任務數。 
tasks.max=1
# 指定匯出資料的Kafka的Topic列表。
topics=test

# 以下為Tablestore串連參數的配置。
# Tablestore執行個體的Endpoint。
tablestore.endpoint=https://xxx.xxx.ots.aliyuncs.com
# 指定認證模式。
tablestore.auth.mode=aksk
# 填寫AccessKey ID和AccessKey Secret。如果使用aksk認證，則需要填入這兩項。
tablestore.access.key.id=xxx
tablestore.access.key.secret=xxx
# 指定Tablestore執行個體名稱。
tablestore.instance.name=xxx
## STS認證相關配置，如果使用STS認證，則下列各項必填。此外aksk還需要在環境變數中配置配入ACCESS_ID和ACCESS_KEY。
#sts.endpoint=
#region=
#account.id=
#role.name=

# 定義目標表名稱的格式字串，字串中可包含<topic>作為原始Topic的預留位置。
# topics.assign.tables配置的優先順序更高，如果配置了topics.assign.tables，則忽略table.name.format的配置。
# 例如當設定table.name.format為kafka_<topic>時，如果kafka中主題名稱為test，則將映射到Tablestore的表名為kafka_test。
table.name.format=<topic>
# 指定Topic與目標表的映射關係，以"<topic>:<tablename>"格式映射Topic和表名，Topic和表名之間的分隔字元為半形冒號（:），不同映射之間分隔字元為半形逗號（,）。
# 如果預設，則採取table.name.format的配置。
# topics.assign.tables=test:test_kafka


# 是否自動建立目標表，預設值為false。
auto.create=true


# 以下為髒資料處理相關配置。
# 在解析Kafka Record或者寫入時序表時可能發生錯誤，您可以可通過以下配置進行處理。
# 指定容錯能力，可選值包括none和all，預設值為none。
# none表示任何錯誤都將導致Sink Task立即失敗。
# all表示跳過產生錯誤的Record，並記錄該Record。
runtime.error.tolerance=none
# 指定髒資料記錄模式，可選值包括ignore、kafka和tablestore，預設值為ignore。
# ignore表示忽略所有錯誤。
# kafka表示將產生錯誤的Record和錯誤資訊儲存在Kafka的另一個Topic中。
# tablestore表示將產生錯誤的Record和錯誤資訊儲存在Tablestore另一張資料表中。
runtime.error.mode=ignore

# 當髒資料記錄模式為kafka時，需要配置Kafka叢集地址和Topic。
# runtime.error.bootstrap.servers=localhost:9092
# runtime.error.topic.name=errors

# 當髒資料記錄模式為tablestore時，需要配置Tablestore中資料表名稱。
# runtime.error.table.name=errors

##以下為時序表新增配置。

# connector工作模式，預設為normal。
tablestore.mode=timeseries
# 時序表主鍵欄位對應。
tablestore.timeseries.test.measurement=m
tablestore.timeseries.test.dataSource=d
tablestore.timeseries.test.tags=region,level
# 時序表時間欄位對應。
tablestore.timeseries.test.time=timestamp
tablestore.timeseries.test.time.unit=MILLISECONDS
# 是否將時序資料欄位（field）的列名轉為小寫，預設為true。由於當前時序模型中時序表的列名不支援大寫字母，如果配置為false，且列名中有大寫字母，寫入會報錯。
tablestore.timeseries.toLowerCase=true
# 是否將所有非主鍵以及時間的欄位以field的形式儲存在時序表，預設為true，如果為false，則只儲存tablestore.timeseries.test.field.name中配置的欄位
tablestore.timeseries.mapAll=true
# 配置field欄位名稱，多個欄位名稱之間用半形冒號（,）分隔。
tablestore.timeseries.test.field.name=cpu
# 配置field欄位類型。取值範圍為double、integer、string、binary和boolean。
# 當field中包含多個欄位時，欄位類型必須和欄位名稱一一對應。多個欄位類型之間用半形冒號（,）分隔。
tablestore.timeseries.test.field.type=double

進入到$KAFKA_HOME目錄後，執行以下命令啟動standalone模式。

bin/connect-standalone.sh config/connect-standalone.properties config/connect-tablestore-sink-quickstart.properties

步驟三：生產新的記錄

進入到$KAFKA_HOME目錄後，執行以下命令啟動一個控制台生產者。

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

配置項說明請參見下表。


配置項	樣本值	描述
--broker-list	localhost:9092	Kafka叢集broker地址和連接埠。
--topic	test	主題名稱。啟動Tablestore Sink Connetor時預設會自動建立Topic，您也可以選擇手動建立。

向主題test中寫入一些新的訊息。
注意如果要匯入資料到時序表，則向主題中寫入資料時必須輸入JSON格式的資料。
```
{"m":"cpu","d":"127.0.0.1","region":"shanghai","level":1,"timestamp":1638868699090,"io":5.5,"cpu":"3.5"}
```
登入Tablestore控制台查看資料。