By Zhang Youdong (Linqing) from the ApsaraDB team
Apache IoTDB (Database for Internet of Things) is a database designed specifically for IoT time series data. It provides data collection, storage, and analysis functions. IoTDB offers an integrated solution with high-performance reads and writes and rich query capabilities on the cloud. It uses an efficient directory organization structure customized for IoT scenarios and integrates seamlessly with big data systems, such as Apache Hadoop, Spark, and Flink. On edge nodes, it provides lightweight TsFile management: data can be written to a local TsFile, basic query capabilities are available, and TsFile data can be synchronized to the cloud.
TsFile is a file format customized for storing time series data on IoT devices. It is organized in a tree directory structure. One TsFile can store the data of multiple devices, and each device contains multiple measurements (metrics). The following figure shows a TsFile that contains the data of two devices, which are identified as d1 and d2. Each device contains three monitoring metrics: s1, s2, and s3.
A TsFile is organized as a multi-level mapping table:
TsFileMetadata ==> TimeSeriesMetadata ==> ChunkMetadata ==> Chunk.
TsFileMetadata describes an entire TsFile and contains metadata such as version information, the location of MetadataIndexNode, and the total number of chunks. TimeSeriesMetadata points to the metadata information of a device. ChunkMetadata points to the ChunkHeader location and corresponds to the final chunk data.
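The lookup chain can be pictured as nested maps. The following Python sketch is an in-memory illustration only, not the actual on-disk TsFile layout. The field names, the `(device, measurement)` index key, and the `find_chunks` helper are hypothetical and were chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ChunkMetadata:
    # Hypothetical fields: where the chunk header starts and the time range it covers.
    chunk_header_offset: int
    start_time: int
    end_time: int


@dataclass
class TimeSeriesMetadata:
    # All chunk metadata entries recorded for one series.
    chunks: List[ChunkMetadata] = field(default_factory=list)


@dataclass
class TsFileMetadata:
    # A plain dict stands in for the metadata index of the file.
    version: str
    index: Dict[Tuple[str, str], TimeSeriesMetadata] = field(default_factory=dict)

    def add_chunk(self, device: str, measurement: str, chunk: ChunkMetadata) -> None:
        series = self.index.setdefault((device, measurement), TimeSeriesMetadata())
        series.chunks.append(chunk)

    @property
    def total_chunks(self) -> int:
        return sum(len(series.chunks) for series in self.index.values())


def find_chunks(meta: TsFileMetadata, device: str, measurement: str,
                t_min: int, t_max: int) -> List[int]:
    """Walk file metadata -> series metadata -> chunk metadata and return
    the header offsets of chunks whose time range overlaps [t_min, t_max]."""
    series = meta.index.get((device, measurement))
    if series is None:
        return []
    return [c.chunk_header_offset for c in series.chunks
            if c.start_time <= t_max and c.end_time >= t_min]


meta = TsFileMetadata(version="demo")
meta.add_chunk("d1", "s1", ChunkMetadata(0, 0, 99))
meta.add_chunk("d1", "s1", ChunkMetadata(4096, 100, 199))
meta.add_chunk("d2", "s3", ChunkMetadata(8192, 0, 199))
print(find_chunks(meta, "d1", "s1", 150, 300))  # prints [4096]
```

Only the chunks whose time range overlaps the query are returned, which is the point of keeping a time range in each metadata entry: a reader can skip chunk data it does not need.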
The built-in query engine in IoTDB parses all user commands, generates a plan, submits the plan to the corresponding executor, and returns the result set. Through the query engine, IoTDB provides a JDBC API, which is simple and easy to use.
    IoTDB> CREATE TIMESERIES root.ln.wf01.wt01.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN
    IoTDB> CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE
    IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status) values(100,true);
    IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status,temperature) values(200,false,20.71)
    IoTDB> SELECT status FROM root.ln.wf01.wt01
    +-----------------------+------------------------+
    |                   Time|root.ln.wf01.wt01.status|
    +-----------------------+------------------------+
    |1970-01-01T08:00:00.100|                    true|
    |1970-01-01T08:00:00.200|                   false|
    +-----------------------+------------------------+
    Total line number = 2
The metadata model of IoTDB is organized in a tree structure. An instance contains multiple storage groups, which are similar to the concepts of namespace and database. A storage group contains multiple devices. A device contains multiple measurements. The time series data corresponding to measurements is stored in TsFile chunks. To facilitate data expiration, each storage group segments data by time range and stores the segments in different directories. By default, data is segmented by week.
    // Storage Group storage structure
    data
    -- sequence
    ---- [Storage group name 1]
    ------ [Time partition ID 1]
    -------- xxxx.tsfile
    -------- xxxx.resource
    ------ [Time partition ID 2]
    ---- [Storage group name 2]
    -- unsequence
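To make the layout concrete, the sketch below maps a data point to a directory of the form shown above. It rests on several assumptions that the text does not confirm: timestamps are epoch milliseconds, the time partition ID is the timestamp divided by a one-week interval, and the `unsequence` directory mirrors the `sequence` layout. The actual computation in IoTDB may differ, and `time_partition_id` and `partition_dir` are hypothetical helpers.

```python
from pathlib import PurePosixPath

# Assumed default partition interval: one week, in milliseconds.
WEEK_MS = 7 * 24 * 60 * 60 * 1000


def time_partition_id(timestamp_ms: int, interval_ms: int = WEEK_MS) -> int:
    """Assumed rule: every point within the same week shares one partition ID."""
    return timestamp_ms // interval_ms


def partition_dir(storage_group: str, timestamp_ms: int,
                  sequence: bool = True) -> PurePosixPath:
    """Build data/<sequence|unsequence>/<storage group>/<time partition ID>."""
    top = "sequence" if sequence else "unsequence"
    return PurePosixPath("data", top, storage_group,
                         str(time_partition_id(timestamp_ms)))


print(partition_dir("root.ln", 3 * WEEK_MS + 5))  # prints data/sequence/root.ln/3
```

Under this scheme, expiring old data amounts to removing whole time partition directories rather than rewriting individual files.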
The IoTDB storage engine is designed based on the LSM tree structure. Written data is first recorded in the write-ahead log (WAL). It is then written to the memtable in memory and gradually flushed to TsFiles on disk in the background. The TsFiles on disk are compacted based on certain rules to ensure query efficiency.
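The write path described here can be sketched with a toy engine. This is a simplified illustration of the general LSM pattern, not IoTDB's implementation: the class `ToyLsmEngine` and its methods are hypothetical, plain dicts stand in for TsFiles, the flush runs synchronously instead of in the background, and compaction simply merges all files into one.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, int]  # (series path, timestamp)


class ToyLsmEngine:
    def __init__(self, memtable_limit: int = 2) -> None:
        self.wal: List[Tuple[str, int, float]] = []  # append-only log
        self.memtable: Dict[Key, float] = {}         # in-memory buffer
        self.files: List[Dict[Key, float]] = []      # flushed files, oldest first
        self.memtable_limit = memtable_limit

    def write(self, series: str, ts: int, value: float) -> None:
        self.wal.append((series, ts, value))  # 1. record in the WAL first
        self.memtable[(series, ts)] = value   # 2. then update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. flush when the buffer is full

    def flush(self) -> None:
        if not self.memtable:
            return
        # Persist the memtable as a new sorted file, then reset buffer and log.
        self.files.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.wal.clear()

    def compact(self) -> None:
        # Merge all files into one; newer files override older ones.
        merged: Dict[Key, float] = {}
        for f in self.files:
            merged.update(f)
        self.files = [dict(sorted(merged.items()))] if merged else []

    def read(self, series: str, ts: int) -> Optional[float]:
        key = (series, ts)
        if key in self.memtable:
            return self.memtable[key]
        for f in reversed(self.files):  # check the newest file first
            if key in f:
                return f[key]
        return None
```

Before compaction, a read may have to check several files from newest to oldest. After compaction, a single file holds the latest value for each key, which is why compaction helps query efficiency.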
IoTDB can be deployed on edge nodes and in the cloud. Generally, data collected on edge nodes needs to be synchronized to a remote end for further analysis and processing. IoTDB provides a synchronization tool to synchronize TsFile data on terminals or devices to the cloud.
IoTDB supports seamless connection with existing big data processing systems, including Hive and Spark. IoTDB provides connectors, such as spark-iotdb, so Hive and Spark can directly access the TsFile data and IoTDB data.
Either HDFS or local disks can be used for storage. Using HDFS ensures the high availability of the storage layer, but not of the computing layer.