×
Community Blog Use Anomaly Detection in Time Series Data with Alibaba Cloud Lindorm Machine Learning

Use Anomaly Detection in Time Series Data with Alibaba Cloud Lindorm Machine Learning

In this article, I'll introduce the anomaly detection in time series data using SQL only with the Machine Learning (ML) feature of Alibaba Cloud Lindorm.

Preface

Alibaba Cloud Lindorm is a cloud-native multi-modal database provided by Alibaba Cloud. Its time series engine includes Lindorm ML, which enables easy and quick time series machine learning tasks such as anomaly detection, predictive maintenance, clustering, exception detection, and SVM using only SQL.

In this tutorial, I will explain how to use Lindorm ML to automatically and regularly detect exceptions in time series data by creating continuous queries.

For example, let's assume that there are temperature and humidity sensors for IoT, and the related data is uploaded to Lindorm every second. Typically, temperature and humidity do not experience sudden fluctuations. However, extremely high or low values would be considered as exceptions in the data.

1

1.Architecture

Deploy ECS and Lindorm in the same region/zone and use a script on ECS instead of temperature and humidity sensors to write data to Lindorm every second. Then, use SQL on Lindorm to try timing exception detection with Lindorm ML.

2

2. Creating a Lindorm Instance with a Time Series Engine

As Lindorm ML only provides a time series engine, we need to create a Lindorm instance that comes with a time series engine.

3
4
5

  1. Click the "Create" button on the Lindorm console.
  2. Configure the instance settings such as Region, Zone, network, and name.
  3. Add specific nodes and enable the time series engine on the instance.
  4. Click the "Buy Now" button and proceed to the order confirmation page.
  5. Check the checkbox to agree to the terms of use.
  6. Click the "Activate Now" button to execute the operation.

Once completed, the new instance will be created with the Creating status.

6

Lindorm has specific requirements, such as the minimum number of nodes needed, depending on the type of engine. As a time series engine, you must verify that the number of nodes meets the requirements. If there are insufficient nodes, the "Buy Now" button will be disabled and an error message will appear.

7

3. Configure Lindorm Instance

Once the instance status changes to Running, you can perform several configurations, such as changing the maintenance time, instance name, and updating the whitelist.

8

General configuration items are detailed on the instance page. By clicking the Modify button next to the corresponding item, you can modify the settings.

9
10
11

Similar to other engines, you can apply for a public endpoint by clicking the relevant button based on the network requirements.

12

The instance comes with a whitelist, and it only accepts connections from the whitelisted IP addresses for security. If you don't add an IP address to the whitelist, an error may occur when connecting.

13
14

Since we are sending the request from an ECS under the same Region/Zone/VPN, we only need to add the intranet IP address to the whitelist, and there is no need to apply for a public endpoint.

4. Setup Lindorm CLI and Java Work Environment

We will use an ECS instance under the same VPC as the work environment. Connect to the pre-prepared ECS instance and perform several configurations.

15

5. Prepare Lindorm CLI

Lindorm CLI connects to the Lindorm time series engine and uses Lindorm ML to detect data anomalies. When you download and extract the Lindorm CLI installer, you will get an executable tool named "lindorm-cli." The author is using Linux to build it on ECS, but if you are using a different operating system, please refer to Lindorm CLI.

wget -O lindorm-cli-linux-latest.tar.gz https://tsdbtools.oss-cn-hangzhou.aliyuncs.com/lindorm-cli-linux-latest.tar.gz?spm=a2c63.p38356.0.0.338d5a2egBGtdx&file=lindorm-cli-linux-latest.tar.gz
tar -xvf lindorm-cli-linux-latest.tar.gz

16

6. Prepare Java Environment

We generate and send dummy data for temperature and humidity sensors using Java code built on ECS. To run the Java file on the ECS instance, you need to prepare the Java environment

sudo apt-get update
sudo apt-get install openjdk-8-jdk
java -version

17

7. Creating a Database and Table on the Lindorm Instance

Connect to the Lindorm instance using Lindorm CLI and create the necessary database and table. Although you can leave this step for the Java code, it is better to separate the data generation process by moving the creation step here since you may execute the Java code multiple times.

Check the connection information on the database connection page of the console.

18

Execute the command to connect to the Lindorm instance.

./lindorm-cli -url jdbc:lindorm:tsdb:url=http://ld-xxx-proxy-tsdb.lindorm.rds.aliyuncs.com:8242 -username root -password root

19

Create the database and table using SQL. You can also use a custom table structure, but you need to modify the relevant information in the Java code later.

create database demo;
use demo;
create table demo.sensor (device_id varchar tag,region varchar tag,time bigint,temperature double,humidity double,primary key(device_id));

20

8. Verify Preconfigured Data Anomaly Detection Functionality

Lindorm ML is a Machine Learning and Deep Learning technology for databasesd developed by DAMO Academy.

This time, we will perform data anomaly detection. The detection algorithm learns the data in a time-series manner and detects any exceptions if present. The supported detection algorithms are esd, nsigma, ttest, Online STL with T-test, and Online STL with ESD. As a simple demo using Lindorm ML, we will use esd in the following code, which is suitable for single data points or spike-type anomalies.

9. Simulating Sensors in Java Code

In this scenario, the sensors record temperature and humidity data in the environment and send the data to the Lindorm instance every second. To generate data anomalies, you need to decrease the probability of generating high or low dummy data in the middle.

The connection to the time series engine, data generation, and transmission are performed in the Java Native SDK on an hourly basis. There is a sample source code on the Help page that you can use as a reference.

Add the following to the pom.xml file of your Maven project. The dependency section contains the necessary packages for the Lindorm Java Native SDK. The build plugin section describes the tool used to build the executable package. The section needs to be updated according to your project configuration.

    <dependencies>
        <dependency>
            <groupId>com.aliyun.lindorm</groupId>
            <artifactId>lindorm-tsdb-client</artifactId>
            <version>1.0.0</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>xxxxxxx</mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.2</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.17</version>
            </plugin>
        </plugins>
    </build>

The following is the same code. Please update it with your own connection information from the console.

package org.lindorm.demo;

import com.aliyun.lindorm.tsdb.client.ClientOptions;
import com.aliyun.lindorm.tsdb.client.LindormTSDBClient;
import com.aliyun.lindorm.tsdb.client.LindormTSDBFactory;
import com.aliyun.lindorm.tsdb.client.exception.LindormTSDBException;
import com.aliyun.lindorm.tsdb.client.model.Record;
import com.aliyun.lindorm.tsdb.client.model.WriteResult;
import com.aliyun.lindorm.tsdb.client.utils.ExceptionUtils;

import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Random;
import java.util.concurrent.CompletableFuture;

public class MainAPP {

    public static void main(String[] args) {
        // 1. Create a Lindorm client based on the connection URL
        String url = "http://ld-xxxxx-proxy-tsdb.lindorm.rds.aliyuncs.com:8242";
        //  LindormTSDBClient - Thread-safe, reusable, no need to create and destroy frequently
        ClientOptions options = ClientOptions.newBuilder(url).build();
        LindormTSDBClient lindormTSDBClient = LindormTSDBFactory.connect(options);
        // 2. Create a database named "demo" and a table named "sensor."
        // Use the following statement to perform this operation.
        // Since this function may be executed multiple times, use the SQL interface of the HTTP API.
        /*lindormTSDBClient.execute("CREATE DATABASE demo");
        lindormTSDBClient.execute("demo", "CREATE TABLE sensor (device_id VARCHAR TAG,region VARCHAR TAG,time BIGINT,temperature DOUBLE,humidity DOUBLE,PRIMARY KEY(device_id))");*/
        // 3. Insert the table with the generated data.
        long currentTime = System.currentTimeMillis();
        SimpleDateFormat format=new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        for(int j = 0; j < 5; j++) {
            // Prepare an hour's worth of data.
            int numRecords = 3600;
            List<Record> records = new ArrayList<>(numRecords);
            currentTime += 3600 * 1000;
            for (int i = 0; i < numRecords; i++) {
                Record record = Record
                        .table("sensor")
                        .time(currentTime + i * 1000)
                        .tag("device_id", "demo")
                        .tag("region", "tokyo-jp")
                        .addField("temperature", generateRandomValue(15.0))
                        .addField("humidity", generateRandomValue(50.0))
                        .build();
                records.add(record);
            }
            System.out.println("Insert data from: " + format.format(new Date(currentTime)));
            CompletableFuture<WriteResult> future = lindormTSDBClient.write("demo", records);
            // Process asynchronous write results.
            future.whenComplete((r, ex) -> {
                // Process write failures.
                if (ex != null) {
                    System.out.println("Failed to write.");
                    Throwable throwable = ExceptionUtils.getRootCause(ex);
                    if (throwable instanceof LindormTSDBException) {
                        LindormTSDBException e = (LindormTSDBException) throwable;
                        System.out.println("LindormTSDBException occurred. This means that the request was delivered to Lindorm TSDB"
                                + " but it was rejected as an error response for some reason.");
                        System.out.println("Error Code: " + e.getCode());
                        System.out.println("SQL State:  " + e.getSqlstate());
                        System.out.println("Error Message: " + e.getMessage());
                    } else {
                        throwable.printStackTrace();
                    }
                } else {
                    System.out.println("Write successfully.");
                }
            });
            // We have a simple synchronization wait like the following example
            System.out.println(future.join());
        }
        lindormTSDBClient.shutdown();
    }

    private static double generateRandomValue(double baseValue){
        Random random = new Random();
        int flag = random.nextInt(100);
        double results = 0.0;
        if(flag > 5){
            results = baseValue + random.nextDouble()*(random.nextInt() % 5);
        }else {
            results =  baseValue + random.nextDouble()*(random.nextInt() % 5) + 100;
        }
        return Double.parseDouble(String.format("%.1f",results));
    }

}

Build an executable package using the mvn package command.

21
22

When you upload the package to the ECS instance and run it, dummy data is generated.

23

10. Detect Data Anomalies with Pre-defined Functions

Check the generated dummy data on the Lindorm instance.

24

We will detect data anomalies using the pre-defined anomaly_detect function. anomaly_detect is a function that performs anomaly detection using supervised machine learning, which is included in Lindorm ML. You need to input the field name to be detected, algorithm, and the behavior when detected as arguments to the anomaly_detect function. As mentioned earlier, we will use the esd algorithm here. To manually find exceptions, we combine the deterrent results with dummy data, as shown in the image below. The general temperature is around 15 degrees, but the exception data is 114.1 degrees.

select device_id, region, time, anomaly_detect(temperature, 'esd', 'adhoc_state=true') as detect_result from sensor where device_id in ('demo') and time >= '2022-11-16 15:00:00' and time < '2022-11-16 15:01:00' sample by 0;

select device_id, region, time, temperature from sensor where device_id in ('demo') and time >= '2022-11-16 15:00:00' and time < '2022-11-16 15:01:00';

25

The dummy data is generated based on the execution time, so please change the time zone in the SQL statement according to the situation.

11. Continuous Detection of Anomalies in Time-Series Data

Continuous queries are time-series queries that are automatically and regularly executed within the time-series engine.

In continuous detection queries, the same algorithm and parameters of the data anomaly detection function are used. Therefore, the queries share the same exception detection status. The second detection query is executed based on the exception detection status returned from the first query, resulting in more accurate results.

Next, we will update the Java code to send dummy data every 10 seconds. To perform continuous queries, it is desirable to generate continuous dummy data. When you update the sample Java code like below, dummy data will be sent every 10 seconds.

package org.lindorm.demo;

import com.aliyun.lindorm.tsdb.client.ClientOptions;
import com.aliyun.lindorm.tsdb.client.LindormTSDBClient;
import com.aliyun.lindorm.tsdb.client.LindormTSDBFactory;
import com.aliyun.lindorm.tsdb.client.exception.LindormTSDBException;
import com.aliyun.lindorm.tsdb.client.model.Record;
import com.aliyun.lindorm.tsdb.client.model.WriteResult;
import com.aliyun.lindorm.tsdb.client.utils.ExceptionUtils;

import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Random;
import java.util.concurrent.CompletableFuture;

public class MainAPPInterval {

    public static void main(String[] args) throws InterruptedException {
        // 1. Creating a Lindorm client based on the connection URL
        String url = "http://ld-xxxxx-proxy-tsdb.lindorm.rds.aliyuncs.com:8242";
        //  LindormTSDBClient - Thread-safe, reusable, no need to create and destroy frequently.
        ClientOptions options = ClientOptions.newBuilder(url).build();
        LindormTSDBClient lindormTSDBClient = LindormTSDBFactory.connect(options);
        // 2. Create a database named "demo" and a table named "sensor."
        // Use the following statement to perform this operation.
        // Since this function may be executed multiple times, use the SQL interface of the HTTP API.
        /*lindormTSDBClient.execute("CREATE DATABASE demo");
        lindormTSDBClient.execute("demo", "CREATE TABLE sensor (device_id VARCHAR TAG,region VARCHAR TAG,time BIGINT,temperature DOUBLE,humidity DOUBLE,PRIMARY KEY(device_id))");*/
        // 3. Insert the table with the generated data.
        SimpleDateFormat format=new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        for(int j = 0; j < 2 * 3600; j++) {
            // Prepare data
            int numRecords = 10;
            List<Record> records = new ArrayList<>(numRecords);
            long currentTime = System.currentTimeMillis();
            for (int i = 0; i < numRecords; i++) {
                Record record = Record
                        .table("sensor")
                        .time(currentTime + i * 1000)
                        .tag("device_id", "demo")
                        .tag("region", "tokyo-jp")
                        .addField("temperature", generateRandomValue(15.0))
                        .addField("humidity", generateRandomValue(50.0))
                        .build();
                records.add(record);
            }
            System.out.println("Insert data from: " + format.format(new Date(currentTime)));
            CompletableFuture<WriteResult> future = lindormTSDBClient.write("demo", records);
            // Process asynchronous write results.
            future.whenComplete((r, ex) -> {
                // Process write failures.
                if (ex != null) {
                    System.out.println("Failed to write.");
                    Throwable throwable = ExceptionUtils.getRootCause(ex);
                    if (throwable instanceof LindormTSDBException) {
                        LindormTSDBException e = (LindormTSDBException) throwable;
                        System.out.println("LindormTSDBException occurred. This means that the request was delivered to Lindorm TSDB"
                                + " but it was rejected as an error response for some reason.");
                        System.out.println("Error Code: " + e.getCode());
                        System.out.println("SQL State:  " + e.getSqlstate());
                        System.out.println("Error Message: " + e.getMessage());
                    } else {
                        throwable.printStackTrace();
                    }
                } else {
                    System.out.println("Write successfully.");
                }
            });
            // We have a simple synchronization wait like the following example
            System.out.println(future.join());
            Thread.sleep(10000);
        }
        lindormTSDBClient.shutdown();
    }

    private static double generateRandomValue(double baseValue){
        Random random = new Random();
        int flag = random.nextInt(100);
        double results = 0.0;
        if(flag > 5){
            results = baseValue + random.nextDouble()*(random.nextInt() % 5);
        }else {
            results =  baseValue + random.nextDouble()*(random.nextInt() % 5) + 100;
        }
        return Double.parseDouble(String.format("%.1f",results));
    }

}

Build an executable package and run it on an ECS instance as before.

26

12. Detect Data Anomalies with Continuous Queries

To continuously save query results, you need a new table. Create a new result table along with the query.

CREATE TABLE demo.anomaly_points(
    device_id varchar tag,
    region varchar tag,
    time bigint,
    anomaly_result boolean,
    PRIMARY KEY(device_id)
);

CREATE continuous query demo.cq_detector WITH(
    interval = '1m'
) AS
INSERT INTO demo.anomaly_points(
    device_id,
    region,
    time,
    anomaly_result
)
SELECT
    device_id,
    region,
    time,
    anomaly_detect(temperature, 'esd') AS anomaly_result
FROM
    demo.sensor
WHERE
    device_id = 'demo' sample BY
        0;

show continuous queries;

27

In this example, we detect exceptions from the previous minute's data in the dummy data table every minute using the esd algorithm.

28

You can display the data difference with the following command.

select count(*) from demo.anomaly_points;
select max(time) from demo.anomaly_points;

You can see that the number of table data and the maximum time are updated every minute with continuous queries.

29

After the verification is completed, you can delete the continuous query.

drop continuous query demo.cq_detector;

30

13. Additional Information

13-1. How Can I Add Parameters to the Data Anomaly Detection Function?

Pre-defined detection functions accept parameters such as maxAnomalyRatio and warmupCount. You can find related information in the help documentation.

SELECT
    device_id,
    region,
    time,
    anomaly_detect(temperature, 'esd', 'lenHistoryWindow=30,maxAnomalyRatio=0.1') AS detect_result
FROM
    sensor
WHERE
    device_id IN('demo')
AND time >= '2022-11-16 15:00:00'
AND time < '2022-11-16 00:01:00' SAMPLE BY
        0;

Different input parameters will yield different detection results.

31

However, if you input the wrong parameters, several errors may occur. For example, the lenHistoryWindow of the esd algorithm must not be less than 20. If you set it to 2, a NullPointerException will occur instead of the query result.

lindorm:demo> SELECT device_id, region, time, anomaly_detect(temperature, 'esd', 'lenHistoryWindow=2,maxAnomalyRatio=0.8') AS detect_result FROM sensor WHERE device_id in ('demo') and time >= '2022-11-16 15:00:00' and time < '2022-11-16 00:01:00' SAMPLE BY 0;
ERROR 9000 (HY000): Server internal error; Please try again, or follow the error message to troubleshoot the problem.
Caused by: java.lang.NullPointerException
        at com.alibaba.lindorm.tsdb.tsql.connector.LindormTSDBConnector.runDownsampleQuery(LindormTSDBConnector.java:141)
        at com.alibaba.lindorm.tsdb.tsql.connector.downsample.DownsampleExecutorImpl.getDataRows(DownsampleExecutorImpl.java:49)
        at com.alibaba.lindorm.tsdb.tsql.simple.SimpleExecutorCommand.lambda$execute$0(SimpleExecutorCommand.java:88)
        at com.alibaba.lindorm.tsdb.tsql.jdbc.LindormTSQLMetaImpl$LindormTSQLSignature$1.enumerator(LindormTSQLMetaImpl.java:430)
        at org.apache.calcite.linq4j.AbstractEnumerable.iterator(AbstractEnumerable.java:33)
        at org.apache.calcite.avatica.MetaImpl.createCursor(MetaImpl.java:90)
        at org.apache.calcite.avatica.AvaticaResultSet.execute(AvaticaResultSet.java:186)
        at org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:666)

32

13-2. How Can I Fix ERROR 8012(42000) In the Detection Function?

This error, ERROR 8012(42000), occurs because the detected field is in the select statement with the detection function. In other words, the field value and detection result could not be displayed together. To solve this issue, remove the detected field from the select statement.

lindorm:demo> select device_id, region, time, temperature, anomaly_detect(temperature, 'esd', 'adhoc_state=true') as detect_result from sensor where device_id in ('demo') and time >= '2022-11-16 15:00:00' and time < '2022-11-16 15:01:00' sample by 0;
ERROR 8012 (42000): Unsupported operation; Field aggregator must be specified in downsample query: temperature

33

14. Conclusion

This article showcases an example of exception detection using SQL with Lindorm ML.

Lindorm is a Multi-Modal Database that enables fast storage of time series data and early detection of anomalies and threats. This makes it possible to store time series data, such as IoT data, in a time series engine and JSON data, such as metrics, in a wide-column wide table engine. It also allows for quick retrieval of relevant JSON information related to abnormalities in time series data. The ability to perform supervised and unsupervised machine learning within the database without the need for separate product services for machine learning, ETL, and data transfer is a significant advantage of the Lindorm service.

15. Reference

In-database machine learning:
https://www.alibabacloud.com/help/en/lindorm/latest/in-database-machine-learning

Use Lindorm ML for time series forecasting:
https://www.alibabacloud.com/help/en/lindorm/latest/time-series-prediction

Use Lindorm ML for time series anomaly detection:
https://www.alibabacloud.com/help/en/lindorm/latest/time-series-anomaly-detections

This article is a translated piece of work from SoftBank: https://www.softbank.jp/biz/blog/cloud-technology/articles/202301/lindorm-ml-to-detect-anomalies/

1 2 1
Share on

Hironobu Ohara

9 posts | 0 followers

You may also like

Comments

Dikky Ryan Pratama June 27, 2023 at 12:48 am

awesome!

Hironobu Ohara

9 posts | 0 followers

Related Products