×
Community Blog PostgreSQL Time-Series Database Plug-in TimescaleDB Deployment Practices

PostgreSQL Time-Series Database Plug-in TimescaleDB Deployment Practices

This article will present an example of using PostgreSQL's TimescaleDB plug-in, which covers using PostGIS and TimescaleDB for a PG spatial-temporal database.

Background

In the real world, the data generated by many businesses have the attribute of time-series data (that is, the data is written sequentially in time dimension, and a large number of requests for time-interval query statistics are also included).

For example, FEED data of the business, time-series data generated by Internet of Things (such as weather sensors and vehicle trajectories) and real-time data from the financial industry.

PostgreSQL UDF and BRIN (block-level index) are ideal for processing time-series data. Specifically, see the two following examples.

Implementation of On-demand Slicing in PostgreSQL - plpgsql Schemaless Implementation of the Automatic Slicing Feature of the TimescaleDB Plug-in

PostgreSQL Time-series Best Practices - Design a Stock Exchange System Database - Alibaba Cloud RDS for PostgreSQL Best Practices

1

In fact, in PostgreSQL ecology, a time-series plug-in named TimescaleDB has been derived, which is specially used to process time-series data. (Timescale improvements, including improvements to the SQL optimizer (it supports "merge append", and time shard aggregation is very efficient), rotate interface, and automatic slicing)

Many investors are also interested in TimescaleDB and it has already received an investment of USD 50 million, which indirectly indicates that the time-series database will be very popular with users in the future.

The Advantage of TimescaleDB

First, TimescaleDB is automatically sharded and has no influence from the users' perspective. When the amount of data is very large, the write performance does not deteriorate. (This mainly refers to disks with lower IOPS. For disks with better IOPS, PG performs OK after writing a large amount of data.)

2

Secondly, Timescale improves SQL optimizer and adds the execution node of "merge append". When "group by" is performed on small time shards, it does not need to perform HASH or GROUP operation on the entire timestamp range, but instead performs calculation on shards, which makes it very efficient.

Finally, some APIs have been added to Timescale, making it very efficient for users to write, maintain, and query time-series data, and very easy to maintain the data.

These APIs are as follows: http://docs.timescale.com/v0.8/api

Deploy TimescaleDB

Take CentOS 7.x x64 as an example.

1.  Install PostgreSQL

Please see PostgreSQL on Linux Best Deployment Manual

export USE_NAMED_POSIX_SEMAPHORES=1  
LIBS=-lpthread CFLAGS="-O3" ./configure --prefix=/home/digoal/pgsql10 --with-segsize=8 --with-wal-segsize=256  
LIBS=-lpthread CFLAGS="-O3" make world -j 64  
LIBS=-lpthread CFLAGS="-O3" make install-world  

2.  Install cmake3

epel  
  
yum install -y cmake3  
  
ln -s /usr/bin/cmake3 /usr/bin/cmake  

3.  Compile TimescaleDB

git clone https://github.com/timescale/timescaledb/  
  
cd timescaledb  
git checkout release-0.8.0  
  
或  
  
wget https://github.com/timescale/timescaledb/archive/0.8.0.tar.gz  
  
  
  
export PATH=/home/digoal/pgsql10/bin:$PATH  
export LD_LIBRARY_PATH=/home/digoal/pgsql10/lib:$LD_LIBRARY_PATH  
  
# Bootstrap the build system  
./bootstrap  
  
cd ./build && make  
  
make install  
  
  
[  2%] Built target sqlupdatefile  
[  4%] Built target sqlfile  
[100%] Built target timescaledb  
Install the project...  
-- Install configuration: "Release"  
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb.control  
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.8.0.sql  
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.7.1--0.8.0.sql  
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.1.0--0.2.0.sql  
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.2.0--0.3.0.sql  
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.3.0--0.4.0.sql  
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.4.0--0.4.1.sql  
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.4.1--0.4.2.sql  
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.4.2--0.5.0.sql  
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.5.0--0.6.0.sql  
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.6.0--0.6.1.sql  
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.6.1--0.7.0.sql  
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.6.1--0.7.1.sql  
-- Installing: /home/dege.zzz/pgsql10/share/extension/timescaledb--0.7.0--0.7.1.sql  
-- Installing: /home/dege.zzz/pgsql10/lib/timescaledb.so  

4.  Configure postgresql.conf to automatically load the timescale lib library when the database is started

vi $PGDATA/postgresql.conf  
shared_preload_libraries = 'timescaledb'  
  
pg_ctl restart -m fast  

5.  Create plug-ins for databases that need to use TimescaleDB

psql  
psql (10.1)  
Type "help" for help.  
  
postgres=# create extension timescaledb ;  

6.  Parameters related to TimescaleDB

timescaledb.constraint_aware_append     
timescaledb.disable_optimizations       
timescaledb.optimize_non_hypertables    
timescaledb.restoring    
  
postgres=# show timescaledb.constraint_aware_append ;  
 timescaledb.constraint_aware_append   
-------------------------------------  
 on  
(1 row)  
  
postgres=# show timescaledb.disable_optimizations ;  
 timescaledb.disable_optimizations   
-----------------------------------  
 off  
(1 row)  
  
postgres=# show timescaledb.optimize_non_hypertables ;  
 timescaledb.optimize_non_hypertables   
--------------------------------------  
 off  
(1 row)  
  
postgres=# show timescaledb.restoring ;  
 timescaledb.restoring   
-----------------------  
 off  
(1 row)  

TimescaleDB usage example 1 - perspective analysis of New York taxi data

The first example is the actual New York city taxicab data, http://docs.timescale.com/v0.8/tutorials/tutorial-hello-nyc

The data is real, taken from New York city taxi cabs, http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml

1.  Download sample data

wget https://timescaledata.blob.core.windows.net/datasets/nyc_data.tar.gz  

2.  Extract

tar -zxvf nyc_data.tar.gz   

3.  Create a table, which involves using the create_hypertable API to convert ordinary tables into time-series storage tables.

psql -f nyc_data.sql  

Some of the truncated nyc_data.sql content is as follows:

cat nyc_data.sql  
  
-- 打车数据: 包括时长、计费、路程、上车、下车经纬度、时间、人数等等。  
  
CREATE TABLE "rides"(  
    vendor_id TEXT,  
    pickup_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,  
    dropoff_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,  
    passenger_count NUMERIC,  
    trip_distance NUMERIC,  
    pickup_longitude  NUMERIC,  
    pickup_latitude   NUMERIC,  
    rate_code         INTEGER,  
    dropoff_longitude NUMERIC,  
    dropoff_latitude  NUMERIC,  
    payment_type INTEGER,  
    fare_amount NUMERIC,  
    extra NUMERIC,  
    mta_tax NUMERIC,  
    tip_amount NUMERIC,  
    tolls_amount NUMERIC,  
    improvement_surcharge NUMERIC,  
    total_amount NUMERIC  
);  

This sentence converts the "rides" table into a time-series storage table

SELECT create_hypertable('rides', 'pickup_datetime', 'payment_type', 2, create_default_indexes=>FALSE);  

Create an index

CREATE INDEX ON rides (vendor_id, pickup_datetime desc);  
CREATE INDEX ON rides (pickup_datetime desc, vendor_id);  
CREATE INDEX ON rides (rate_code, pickup_datetime DESC);  
CREATE INDEX ON rides (passenger_count, pickup_datetime desc);  

4.  Import test data

psql -c "\COPY rides FROM nyc_data_rides.csv CSV"  
COPY 10906858  

5.  Execute some test SQL on the "rides" table that has been converted to a time-series storage table, the performance of which is better than PostgreSQL ordinary tables.

What is the average charge for transactions with more than two passengers per day?

-- Average fare amount of rides with 2+ passengers by day  
  
SELECT date_trunc('day', pickup_datetime) as day, avg(fare_amount)  
  FROM rides  
  WHERE passenger_count > 1 AND pickup_datetime < '2016-01-08'  
  GROUP BY day ORDER BY day;  
  
        day         |         avg  
--------------------+---------------------  
2016-01-01 00:00:00 | 13.3990821679715529  
2016-01-02 00:00:00 | 13.0224687415181399  
2016-01-03 00:00:00 | 13.5382068607068607  
2016-01-04 00:00:00 | 12.9618895561740149  
2016-01-05 00:00:00 | 12.6614611935518309  
2016-01-06 00:00:00 | 12.5775245695086098  
2016-01-07 00:00:00 | 12.5868802584437019  
(7 rows)  

6.  The performance of some queries is even more than 20 times better

How many transactions are there every day?

-- Total number of rides by day for first 5 days  
  
SELECT date_trunc('day', pickup_datetime) as day, COUNT(*) FROM rides  
  GROUP BY day ORDER BY day  
  LIMIT 5;  
  
        day         | count  
--------------------+--------  
2016-01-01 00:00:00 | 345037  
2016-01-02 00:00:00 | 312831  
2016-01-03 00:00:00 | 302878  
2016-01-04 00:00:00 | 316171  
2016-01-05 00:00:00 | 343251  
(5 rows)  

Timescale adds the execution optimization of "merge append", so it is highly efficient to aggregate by small granularity on time shards. The more data, the more obvious the difference in performance improvement.

For example, TimescaleDB introduces a time-based "merge append" optimization to minimize the number of groups which must be processed to execute the following (given its knowledge that time is already ordered).

For our 100M row table, this results in query latency that is 396x faster than PostgreSQL (82ms vs. 32566ms).

SELECT date_trunc('minute', time) AS minute, max(usage_user)  
  FROM cpu  
  WHERE time < '2017-01-01'  
  GROUP BY minute  
  ORDER BY minute DESC  
  LIMIT 5;  

7.  Execute some functions specific to TimescaleDB, such as time_bucket, and some acceleration algorithms built into TimescaleDB is also used here.

Every 5-minute interval is a BUCKET, which produces the number of orders generated in each interval.

-- Number of rides by 5 minute intervals  
--   (using the TimescaleDB "time_bucket" function)  
  
SELECT time_bucket('5 minute', pickup_datetime) as five_min, count(*)  
  FROM rides  
  WHERE pickup_datetime < '2016-01-01 02:00'  
  GROUP BY five_min ORDER BY five_min;  
  
      five_min       | count  
---------------------+-------  
 2016-01-01 00:00:00 |   703  
 2016-01-01 00:05:00 |  1482  
 2016-01-01 00:10:00 |  1959  
 2016-01-01 00:15:00 |  2200  
 2016-01-01 00:20:00 |  2285  
 2016-01-01 00:25:00 |  2291  
 2016-01-01 00:30:00 |  2349  
 2016-01-01 00:35:00 |  2328  
 2016-01-01 00:40:00 |  2440  
 2016-01-01 00:45:00 |  2372  
 2016-01-01 00:50:00 |  2388  
 2016-01-01 00:55:00 |  2473  
 2016-01-01 01:00:00 |  2395  
 2016-01-01 01:05:00 |  2510  
 2016-01-01 01:10:00 |  2412  
 2016-01-01 01:15:00 |  2482  
 2016-01-01 01:20:00 |  2428  
 2016-01-01 01:25:00 |  2433  
 2016-01-01 01:30:00 |  2337  
 2016-01-01 01:35:00 |  2366  
 2016-01-01 01:40:00 |  2325  
 2016-01-01 01:45:00 |  2257  
 2016-01-01 01:50:00 |  2316  
 2016-01-01 01:55:00 |  2250  
(24 rows)  

8.  Execute some statistical analysis SQL

The volume of taxi transactions in each city.

-- Join rides with rates to get more information on rate_code  
  
SELECT rates.description, COUNT(vendor_id) as num_trips FROM rides  
  JOIN rates on rides.rate_code = rates.rate_code  
  WHERE pickup_datetime < '2016-01-08'  
  GROUP BY rates.description ORDER BY rates.description;  
  
      description      | num_trips  
-----------------------+-----------  
 JFK                   |     54832  
 Nassau or Westchester |       967  
 Newark                |      4126  
 group ride            |        17  
 negotiated fare       |      7193  
 standard rate         |   2266401  
(6 rows)  

Statistics of taxi rides in some cities in January 2016 (including longest distance, shortest distance, average number of passengers, and hours)

-- Analysis of all JFK and EWR rides in Jan 2016  
  
SELECT rates.description, COUNT(vendor_id) as num_trips,  
    AVG(dropoff_datetime - pickup_datetime) as avg_trip_duration, AVG(total_amount) as avg_total,  
    AVG(tip_amount) as avg_tip, MIN(trip_distance) as min_distance, AVG(trip_distance) as avg_distance, MAX(trip_distance) as max_distance,  
    AVG(passenger_count) as avg_passengers  
  FROM rides  
  JOIN rates on rides.rate_code = rates.rate_code  
  WHERE rides.rate_code in (2,3) AND pickup_datetime < '2016-02-01'  
  GROUP BY rates.description ORDER BY rates.description;  
  
 description | num_trips | avg_trip_duration |      avg_total      |      avg_tip       | min_distance |    avg_distance     | max_distance |   avg_passengers  
-------------+-----------+-------------------+---------------------+--------------------+--------------+---------------------+--------------+--------------------  
 JFK         |    225019 | 00:45:46.822517   | 64.3278115181384683 | 7.3334228220728027 |         0.00 | 17.2602816651038357 |       221.00 | 1.7333869584346211  
 Newark      |     16822 | 00:35:16.157472   | 86.4633688027582927 | 9.5461657353465700 |         0.00 | 16.2706122934252764 |       177.23 | 1.7435501129473309  
(2 rows)  

9.  Automatic data sharding and run plan

postgres=# \d+ rides  
                                                     Table "public.rides"  
        Column         |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description   
-----------------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------  
 vendor_id             | text                        |           |          |         | extended |              |   
 pickup_datetime       | timestamp without time zone |           | not null |         | plain    |              |   
 dropoff_datetime      | timestamp without time zone |           | not null |         | plain    |              |   
 passenger_count       | numeric                     |           |          |         | main     |              |   
 trip_distance         | numeric                     |           |          |         | main     |              |   
 pickup_longitude      | numeric                     |           |          |         | main     |              |   
 pickup_latitude       | numeric                     |           |          |         | main     |              |   
 rate_code             | integer                     |           |          |         | plain    |              |   
 dropoff_longitude     | numeric                     |           |          |         | main     |              |   
 dropoff_latitude      | numeric                     |           |          |         | main     |              |   
 payment_type          | integer                     |           |          |         | plain    |              |   
 fare_amount           | numeric                     |           |          |         | main     |              |   
 extra                 | numeric                     |           |          |         | main     |              |   
 mta_tax               | numeric                     |           |          |         | main     |              |   
 tip_amount            | numeric                     |           |          |         | main     |              |   
 tolls_amount          | numeric                     |           |          |         | main     |              |   
 improvement_surcharge | numeric                     |           |          |         | main     |              |   
 total_amount          | numeric                     |           |          |         | main     |              |   
Indexes:  
    "rides_passenger_count_pickup_datetime_idx" btree (passenger_count, pickup_datetime DESC)  
    "rides_pickup_datetime_vendor_id_idx" btree (pickup_datetime DESC, vendor_id)  
    "rides_rate_code_pickup_datetime_idx" btree (rate_code, pickup_datetime DESC)  
    "rides_vendor_id_pickup_datetime_idx" btree (vendor_id, pickup_datetime DESC)  
Child tables: _timescaledb_internal._hyper_1_1_chunk,  
              _timescaledb_internal._hyper_1_2_chunk,  
              _timescaledb_internal._hyper_1_3_chunk,  
              _timescaledb_internal._hyper_1_4_chunk  
  
其中一个分片的约束如下  
Check constraints:  
    "constraint_1" CHECK (pickup_datetime >= '2015-12-31 00:00:00'::timestamp without time zone AND pickup_datetime < '2016-01-30 00:00:00'::timestamp without time zone)  
    "constraint_2" CHECK (_timescaledb_internal.get_partition_hash(payment_type) >= 1073741823)  
Inherits: rides  
-- Peek behind the scenes  
  
postgres=#  select count(*) from rides;  
  count     
----------  
 10906858  
(1 row)  
  
Time: 376.247 ms  
postgres=# explain select count(*) from rides;  
                                                 QUERY PLAN                                                   
------------------------------------------------------------------------------------------------------------  
 Finalize Aggregate  (cost=254662.23..254662.24 rows=1 width=8)  
   ->  Gather  (cost=254661.71..254662.22 rows=5 width=8)  
         Workers Planned: 5  
         ->  Partial Aggregate  (cost=253661.71..253661.72 rows=1 width=8)  
               ->  Append  (cost=0.00..247468.57 rows=2477258 width=0)  
                     ->  Parallel Seq Scan on rides  (cost=0.00..0.00 rows=1 width=0)  
                     ->  Parallel Seq Scan on _hyper_1_1_chunk  (cost=0.00..77989.57 rows=863657 width=0)  
                     ->  Parallel Seq Scan on _hyper_1_2_chunk  (cost=0.00..150399.01 rows=1331101 width=0)  
                     ->  Parallel Seq Scan on _hyper_1_3_chunk  (cost=0.00..6549.75 rows=112675 width=0)  
                     ->  Parallel Seq Scan on _hyper_1_4_chunk  (cost=0.00..12530.24 rows=169824 width=0)  
(10 rows)  

10.  You can also check the shards directly

postgres=# select count(*) from  _timescaledb_internal._hyper_1_1_chunk;  
  count    
---------  
 3454961  
(1 row)  

Slices Are Completely Transparent to Users

Sliced metadata:

postgres=# \dn  
         List of schemas  
         Name          |  Owner     
-----------------------+----------  
 _timescaledb_cache    | postgres  
 _timescaledb_catalog  | postgres  
 _timescaledb_internal | postgres  
 public                | postgres  
(4 rows)  

Timescaledb + Postgis Combination - Spatial-Temporal Database

The time-series database timescaleDB plug-in is combined with the spatial-temporal database PostGIS plug-in. PostgreSQL is very good at handling spatial data.

1.  Create a spatial database PostGIS

create extension postgis;  

2.  Add a spatial type field

http://postgis.net/docs/manual-2.4/AddGeometryColumn.html

postgres=# SELECT AddGeometryColumn ('public','rides','pickup_geom',2163,'POINT',2);  
                   addgeometrycolumn                      
--------------------------------------------------------  
 public.rides.pickup_geom  SRID:2163 TYPE:POINT DIMS:2   
(1 row)  
  
postgres=# SELECT AddGeometryColumn ('public','rides','dropoff_geom',2163,'POINT',2);  
                    addgeometrycolumn                      
---------------------------------------------------------  
 public.rides.dropoff_geom  SRID:2163 TYPE:POINT DIMS:2   
(1 row)  
  
postgres=#   
postgres=# \d+ rides  
                                                     Table "public.rides"  
        Column         |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description   
-----------------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------  
 vendor_id             | text                        |           |          |         | extended |              |   
 pickup_datetime       | timestamp without time zone |           | not null |         | plain    |              |   
 dropoff_datetime      | timestamp without time zone |           | not null |         | plain    |              |   
 passenger_count       | numeric                     |           |          |         | main     |              |   
 trip_distance         | numeric                     |           |          |         | main     |              |   
 pickup_longitude      | numeric                     |           |          |         | main     |              |   
 pickup_latitude       | numeric                     |           |          |         | main     |              |   
 rate_code             | integer                     |           |          |         | plain    |              |   
 dropoff_longitude     | numeric                     |           |          |         | main     |              |   
 dropoff_latitude      | numeric                     |           |          |         | main     |              |   
 payment_type          | integer                     |           |          |         | plain    |              |   
 fare_amount           | numeric                     |           |          |         | main     |              |   
 extra                 | numeric                     |           |          |         | main     |              |   
 mta_tax               | numeric                     |           |          |         | main     |              |   
 tip_amount            | numeric                     |           |          |         | main     |              |   
 tolls_amount          | numeric                     |           |          |         | main     |              |   
 improvement_surcharge | numeric                     |           |          |         | main     |              |   
 total_amount          | numeric                     |           |          |         | main     |              |   
 pickup_geom           | geometry(Point,2163)        |           |          |         | main     |              |   
 dropoff_geom          | geometry(Point,2163)        |           |          |         | main     |              |   
Indexes:  
    "rides_passenger_count_pickup_datetime_idx" btree (passenger_count, pickup_datetime DESC)  
    "rides_pickup_datetime_vendor_id_idx" btree (pickup_datetime DESC, vendor_id)  
    "rides_rate_code_pickup_datetime_idx" btree (rate_code, pickup_datetime DESC)  
    "rides_vendor_id_pickup_datetime_idx" btree (vendor_id, pickup_datetime DESC)  
Child tables: _timescaledb_internal._hyper_1_1_chunk,  
              _timescaledb_internal._hyper_1_2_chunk,  
              _timescaledb_internal._hyper_1_3_chunk,  
              _timescaledb_internal._hyper_1_4_chunk  

3.  Update the data to the geometry field (It is actually stored as two automatic fields, representing longitude and latitude respectively. In fact, it does not matter whether it is updated or not, because PG supports expression indexes, and you can use these two fields to create expression spatial indexes.)

-- Generate the geometry points and write to table  
--   (Note: These calculations might take a few mins)  
  
UPDATE rides SET pickup_geom = ST_Transform(ST_SetSRID(ST_MakePoint(pickup_longitude,pickup_latitude),4326),2163);  
UPDATE rides SET dropoff_geom = ST_Transform(ST_SetSRID(ST_MakePoint(dropoff_longitude,dropoff_latitude),4326),2163);  
  
  
vacuum full rides;  

4.  Examples of Spatio-Temporal Analysis.

How many cars are called every 30 minutes within 400 meters of (lat, long) (40.7589,-73.9851).

-- Number of rides on New Years Eve originating within  
--   400m of Times Square, by 30 min buckets  
--   Note: Times Square is at (lat, long) (40.7589,-73.9851)  
  
SELECT time_bucket('30 minutes', pickup_datetime) AS thirty_min, COUNT(*) AS near_times_sq  
  FROM rides  
  WHERE ST_Distance(pickup_geom, ST_Transform(ST_SetSRID(ST_MakePoint(-73.9851,40.7589),4326),2163)) < 400  
    AND pickup_datetime < '2016-01-01 14:00'  
  GROUP BY thirty_min ORDER BY thirty_min;  
  
     thirty_min      | near_times_sq  
---------------------+--------------  
 2016-01-01 00:00:00 |      74  
 2016-01-01 00:30:00 |     102  
 2016-01-01 01:00:00 |     120  
 2016-01-01 01:30:00 |      98  
 2016-01-01 02:00:00 |     112  
 2016-01-01 02:30:00 |     109  
 2016-01-01 03:00:00 |     163  
 2016-01-01 03:30:00 |     181  
 2016-01-01 04:00:00 |     214  
 2016-01-01 04:30:00 |     185  
 2016-01-01 05:00:00 |     158  
 2016-01-01 05:30:00 |     113  
 2016-01-01 06:00:00 |     102  
 2016-01-01 06:30:00 |      91  
 2016-01-01 07:00:00 |      88  
 2016-01-01 07:30:00 |      58  
 2016-01-01 08:00:00 |      72  
 2016-01-01 08:30:00 |      94  
 2016-01-01 09:00:00 |     115  
 2016-01-01 09:30:00 |     118  
 2016-01-01 10:00:00 |     135  
 2016-01-01 10:30:00 |     160  
 2016-01-01 11:00:00 |     212  
 2016-01-01 11:30:00 |     229  
 2016-01-01 12:00:00 |     244  
 2016-01-01 12:30:00 |     230  
 2016-01-01 13:00:00 |     235  
 2016-01-01 13:30:00 |     238  

Example 2 - Sensor Data and Weather Data

http://docs.timescale.com/v0.8/tutorials/other-sample-datasets

No more details are given here.

Common APIs for TimescaleDB

http://docs.timescale.com/v0.8/api

1.  Create a Time-Series Table

create_hypertable()

Required Arguments

Name Description
main_table Identifier of table to convert to hypertable
time_column_name Name of the column containing time values

Optional Arguments

Name Description
partitioning_column Name of an additional column to partition by. If provided, number_partitions must be set.
number_partitions Number of hash partitions to use for partitioning_column when this optional argument is supplied. Must be > 0.
chunk_time_interval Interval in event time that each chunk covers. Must be > 0. Default is 1 month.
create_default_indexes Boolean whether to create default indexes on time/partitioning columns. Default is TRUE.
if_not_exists Boolean whether to print warning if table already converted to hypertable or raise exception. Default is FALSE.
partitioning_func The function to use for calculating a value's partition.

2.  Add a Multi-Level Sharded Field

Hash and interval shards are supported

add_dimension()

Required Arguments

Name Description
main_table Identifier of hypertable to add the dimension to.
column_name Name of the column to partition by.

Optional Arguments

Name Description
number_partitions Number of hash partitions to use on column_name. Must be > 0.
interval_length Interval that each chunk covers. Must be > 0.
partitioning_func The function to use for calculating a value's partition (see create_hypertable instructions).

3.  Delete a Shard

Delete shards before the specified time

drop_chunks()

Required Arguments

Name Description
older_than Timestamp of cut-off point for data to be dropped, i.e., anything older than this should be removed.

Optional Arguments

Name Description
table_name Hypertable name from which to drop chunks. If not supplied, all hypertables are affected.
schema_name Schema name of the hypertable from which to drop chunks. Defaults to public.
cascade Boolean on whether to CASCADE the drop on chunks, therefore removing dependent objects on chunks to be removed. Defaults to FALSE.

4.  Set the Time Interval of Shards

set_chunk_time_interval()

Required Arguments

Name Description
main_table Identifier of hypertable to update interval for.
chunk_time_interval Interval in event time that each new chunk covers. Must be > 0.

5.  Analysis Function - the First Record

first()

Required Arguments

Name Description
value The value to return (anyelement)
time The timestamp to use for comparison (TIMESTAMP/TIMESTAMPTZ or integer type)

For example, find the earliest uploaded temperature values for all sensors.

SELECT device_id, first(temp, time)  
  FROM metrics  
  GROUP BY device_id;  

This can also be done by using recursive SQL:

Applications of PostgrSQL Recursive SQL - Geeks and Normal People

6.  Analysis Function - the Last Record

last()

Required Arguments

Name Description
value The value to return (anyelement)
time The timestamp to use for comparison (TIMESTAMP/TIMESTAMPTZ or integer type)

For example, find the latest temperature value of each sensor every 5 minutes.

SELECT device_id, time_bucket('5 minutes', time) as interval,  
  last(temp, time)  
  FROM metrics  
  WHERE time > now () - interval '1 day'  
  GROUP BY device_id, interval  
  ORDER BY interval DESC;  

This can also be done by using recursive SQL:

Applications of PostgrSQL Recursive SQL - Geeks and Normal People

7.  Analysis Function - Histogram

histogram()

Required Arguments

Name Description
value A set of values to partition into a histogram
min The histogram's lower bound used in bucketing
max The histogram's upper bound used in bucketing
nbuckets The integer value for the number of histogram buckets (partitions)

For example:

The battery level of 20 to 60 is divided into FIVE BUCKET intervals, and an array of 5 + 2 values (representing the number of records in each bucket interval) is returned. The two values at the beginning and the end indicate how many records are outside the boundary.

SELECT device_id, histogram(battery_level, 20, 60, 5)  
  FROM readings  
  GROUP BY device_id  
  LIMIT 10;  
  
 device_id  |          histogram  
------------+------------------------------  
 demo000000 | {0,0,0,7,215,206,572}  
 demo000001 | {0,12,173,112,99,145,459}  
 demo000002 | {0,0,187,167,68,229,349}  
 demo000003 | {197,209,127,221,106,112,28}  
 demo000004 | {0,0,0,0,0,39,961}  
 demo000005 | {12,225,171,122,233,80,157}  
 demo000006 | {0,78,176,170,8,40,528}  
 demo000007 | {0,0,0,126,239,245,390}  
 demo000008 | {0,0,311,345,116,228,0}  
 demo000009 | {295,92,105,50,8,8,442}  

8.  Analysis Function - Time Interval

This is similar to date_trunc, but it is more powerful and can be truncated with any interval. It is easy for users to use.

time_bucket()

Required Arguments

Name Description
bucket_width A PostgreSQL time interval for how long each bucket is (interval)
time The timestamp to bucket (timestamp/timestamptz/date)

Optional Arguments

Name Description
offset The time interval to offset all buckets by (interval)

9.  The View Function for Data Overview - Time-Series Table Overview

hypertable_relation_size_pretty()

SELECT * FROM hypertable_relation_size_pretty('conditions');  
  
 table_size | index_size | toast_size | total_size  
------------+------------+------------+------------  
 1171 MB    | 1608 MB    | 176 kB     | 2779 MB  

10.  The View Function for Data Overview - Shard Size

chunk_relation_size_pretty()

SELECT * FROM chunk_relation_size_pretty('conditions');  
  
                chunk_table                 | table_size | index_size | total_size  
---------------------------------------------+------------+------------+------------  
 "_timescaledb_internal"."_hyper_1_1_chunk"  | 28 MB      | 36 MB      | 64 MB  
 "_timescaledb_internal"."_hyper_1_2_chunk"  | 57 MB      | 78 MB      | 134 MB  
 ...  

11.  View Function for Data Overview - Index Size

indexes_relation_size_pretty()

SELECT * FROM indexes_relation_size_pretty('conditions');  
  
             index_name_              | total_size  
--------------------------------------+------------  
 public.conditions_device_id_time_idx | 1143 MB  
 public.conditions_time_idx           | 465 MB  

12.  Export Time-Series Metadata

https://raw.githubusercontent.com/timescale/timescaledb/master/scripts/dump_meta_data.sql

psql [your connect flags] -d your_timescale_db < dump_meta_data.sql > dumpfile.txt  

Summary

TimescaleDB is a very useful time-series data processing plug-in, hiding the shard logic (it is transparent to users) and providing a large number of API function interfaces and performance optimization. It is great for time-series scenarios.

Combined with PostGIS plug-in, PostgreSQL is more powerful in spatio-temporal processing.

0 0 0
Share on

digoal

48 posts | 2 followers

You may also like

Comments