DataV is a data visualization service provided by Alibaba Cloud. DataV allows you to build professional visualization applications on a GUI with ease. You can use DataV to visualize log analysis data. This topic describes how to connect Simple Log Service to DataV and visualize log data on a dashboard.
Prerequisites
Data is collected. For more information, see Data collection overview.
Indexes are created. For more information, see Create indexes.
Background information
Real-time dashboards are widely used in large-scale online promotions. Real-time dashboards are based on a stream computing architecture. The architecture consists of the following modules:
Data collection: collects data from each source in real time.
Intermediate storage: uses Kafka queues to decouple production systems and consumption systems.
Real-time computing: subscribes to real-time data and uses computing rules to compute data on a dashboard.
Result storage: stores the computing results in SQL and NoSQL databases.
Visualization: calls API operations to obtain results and visualize the results.
Alibaba Group provides multiple services to support the modules.
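As a rough illustration of how the modules fit together, the following Python sketch wires the five stages with in-memory stand-ins. queue.Queue stands in for the Kafka queue and a dict stands in for the result database. The function names, the sample events, and the per-host counting rule are all hypothetical and are not part of any Alibaba Cloud service.

```python
import queue
from collections import Counter


def collect(events, buffer):
    """Data collection: push raw events into the intermediate buffer."""
    for event in events:
        buffer.put(event)


def compute(buffer):
    """Real-time computing: apply a fixed rule (count events per host)."""
    counts = Counter()
    while not buffer.empty():
        counts[buffer.get()["host"]] += 1
    return counts


def store(counts, result_store):
    """Result storage: persist the computed results."""
    result_store.update(counts)


def render(result_store):
    """Visualization: read the stored results and format them for display."""
    return [f"{host}: {n}" for host, n in sorted(result_store.items())]


buffer = queue.Queue()   # stand-in for the Kafka queue (assumption)
result_store = {}        # stand-in for a SQL or NoSQL database (assumption)

# Hypothetical sample events.
events = [
    {"host": "a.example.com"},
    {"host": "b.example.com"},
    {"host": "a.example.com"},
]

collect(events, buffer)
store(compute(buffer), result_store)
lines = render(result_store)
print(lines)
```

The buffer between collect and compute is what decouples the producing side from the consuming side: either side can be replaced without changing the other.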
You can connect Simple Log Service to DataV by calling the API operations that are related to the log query and analysis feature. Then, you can use DataV to visualize data on a dashboard.
Features
The following computing methods are supported:
Real-time computing (streaming computing): fixed computing and variable data.
Offline computing (data warehouse and offline computing): dynamic computing and variable data.
Simple Log Service can create indexes on logs that are stored in LogHub in real time. This allows you to query and analyze data in scenarios that have high timeliness requirements by using the LogSearch/Analytics method. The LogSearch/Analytics method provides the following benefits:
Fast processing: A query statement that contains up to five conditions can query billions of rows of data within 1 second. An analytic statement that contains up to five conditions and a GROUP BY clause can analyze and aggregate hundreds of millions of rows of data within 1 second. You do not need to wait for or predict the results.
Real-time display: Up to 99.9% of logs can be displayed on a dashboard within 1 second after the logs are generated.
Dynamic data refresh: When you modify analysis methods or import data to Logstores, the data displayed on a dashboard is refreshed in real time.
The LogSearch/Analytics method has the following limits:
Data volume: Up to 10 billion rows of data can be computed at the same time. If you need to compute more data, you must specify a time period during off-peak hours.
Computing flexibility: Only the SQL-92 syntax is supported. User-defined functions (UDFs) are not supported.
Example
During the Apsara Conference, the following requirement is raised: collect unique visitor (UV) statistics for the conference website from users across China, and visualize the data on a dashboard. Before the conference, you enabled full log collection and the log query and analysis feature in Simple Log Service. Therefore, you only need to execute a query statement to obtain the UV statistics. However, requirements often change during the conference. The following changes are made:
Original requirement: On the first day of the conference, you are required to collect the statistics of UVs for the current day.
To collect the statistics, you must query the forward field in all NGINX access logs. Each log contains a forward field, and the field records one or more IP addresses of the visitor. You can use the approx_distinct(forward) function to calculate the approximate number of distinct values of the field. Therefore, to collect the UV statistics in a time range from 00:00 on the first day of the conference to the current time, use the following query statement:
* | select approx_distinct(forward) as uv
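To make the semantics of this statement concrete, the following Python sketch computes the same kind of result on a handful of in-memory records. It is only an illustration under stated assumptions: the sample logs, the dates, and the helper names start_of_day and count_uv are hypothetical, the count is exact rather than approximate, and the half-open time range is a choice made for this sketch.

```python
from datetime import datetime


def start_of_day(now):
    """Return 00:00 of the day that contains `now`."""
    return now.replace(hour=0, minute=0, second=0, microsecond=0)


def count_uv(logs, start, end):
    """Exact counterpart of approx_distinct(forward) for logs in [start, end)."""
    return len({log["forward"] for log in logs if start <= log["time"] < end})


# Hypothetical "current time" and sample logs.
now = datetime(2024, 1, 1, 15, 30)
logs = [
    {"time": datetime(2024, 1, 1, 9, 0), "forward": "203.0.113.5"},
    {"time": datetime(2024, 1, 1, 10, 0), "forward": "203.0.113.5"},    # repeat visitor
    {"time": datetime(2024, 1, 1, 11, 0), "forward": "198.51.100.7"},
    {"time": datetime(2023, 12, 31, 23, 0), "forward": "192.0.2.9"},    # previous day
]

uv = count_uv(logs, start_of_day(now), now)
print(uv)
```

In Simple Log Service, the time range is supplied together with the query statement rather than written inside it, which is why the statement itself contains no time condition.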
First change: On the second day of the conference, you are required to collect the statistics of UVs for the yunqi.aliyun.com domain.
You can add a host filter condition to query the host data. To collect the UV statistics, use the following query statement:
host:yunqi.aliyun.com | select approx_distinct(forward) as uv
Second change: If the forward field of an NGINX access log contains multiple IP addresses, you are required to keep only the first IP address.
Use the split_part function to extract the first comma-separated value of the forward field:
host:yunqi.aliyun.com | select approx_distinct(split_part(forward,',',1)) as uv
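The effect of split_part(forward,',',1) can be sketched in Python as follows. The split_part helper below is a hypothetical re-implementation that mirrors the 1-based indexing of the SQL function, and the sample values are made up for illustration.

```python
def split_part(value, delimiter, index):
    """Split `value` on `delimiter` and return the field at 1-based `index`.

    Returns None when `index` is larger than the number of fields.
    """
    if index < 1:
        raise ValueError("index must be 1 or greater")
    parts = value.split(delimiter)
    return parts[index - 1] if index <= len(parts) else None


# Hypothetical forward values: client IP first, followed by proxy IPs.
forwards = [
    "203.0.113.5,10.0.0.1",
    "203.0.113.5",
    "198.51.100.7,10.0.0.2",
]

first_ips = [split_part(f, ",", 1) for f in forwards]
uv = len(set(first_ips))
print(first_ips, uv)
```

Without split_part, the first and second values would count as two different visitors even though they share the same client IP address.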
Third change: On the third day of the conference, you are required to collect the statistics of UVs that are not redirected from advertisements on UC browsers.
You can add a not filter condition to exclude the UC browser-related statistics. To collect the UV statistics, use the following query statement:
host:yunqi.aliyun.com not URL:uc-iflow | select approx_distinct(split_part(forward,',',1)) as uv
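The final statement combines a search condition with an analytic part. A minimal Python sketch of the same logic follows. The sample records are hypothetical, and the URL check uses plain substring matching, which is an assumption made for brevity and may differ from how Simple Log Service matches indexed terms.

```python
def matches(log):
    """Search part: host:yunqi.aliyun.com not URL:uc-iflow."""
    return log["host"] == "yunqi.aliyun.com" and "uc-iflow" not in log["URL"]


def count_uv(logs):
    """Analytic part: distinct count of the first IP address in forward."""
    return len({log["forward"].split(",")[0] for log in logs if matches(log)})


# Hypothetical sample logs.
logs = [
    {"host": "yunqi.aliyun.com", "URL": "/index.html", "forward": "203.0.113.5,10.0.0.1"},
    {"host": "yunqi.aliyun.com", "URL": "/landing?from=uc-iflow", "forward": "198.51.100.7"},
    {"host": "www.aliyun.com", "URL": "/", "forward": "192.0.2.9"},
    {"host": "yunqi.aliyun.com", "URL": "/agenda", "forward": "203.0.113.5"},
    {"host": "yunqi.aliyun.com", "URL": "/agenda", "forward": "192.0.2.44"},
]

uv = count_uv(logs)
print(uv)
```

The second record is dropped by the not condition, the third by the host condition, and the fourth repeats an IP address that is already counted.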