DataV is an Alibaba Cloud data visualization service. DataV allows you to build professional visualization applications by using a graphical user interface (GUI) with ease. You can use DataV to visualize log analysis data. This topic describes how to connect Log Service with DataV and show data on a dashboard.
Prerequisites
- Log data is collected. For more information, see Data collection.
- The index feature is enabled and configured. For more information, see Enable and configure the index feature for a Logstore.
Background information
Real-time dashboards are widely used in large online promotions. Real-time data dashboards
are based on a stream computing architecture. The architecture consists of the following
modules:
- Data collection: collects data from each source in real time.
- Intermediate storage: uses a message queue such as Kafka to decouple production systems from consumption systems.
- Real-time computing: subscribes to real-time data and uses computing rules to compute data on the dashboard.
- Result storage: stores the computing results in SQL and NoSQL databases.
- Visualization: calls API operations to obtain results and visualize the results.
Alibaba Group provides multiple services to support these modules, as shown in the
following figure.

You can connect Log Service with DataV by calling the API operations that are related
to the log search and analytics feature. Then, you can use DataV to show data on a
dashboard. 
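
As a rough sketch of what such a call needs, the following snippet assembles the time range and query statement for a log query. The parameter names `from`, `to`, and `query` are assumed to match the GetLogs operation of Log Service, so verify them against the API reference. The helper name `build_query_params` and the sample dates are hypothetical, and no request is sent.

```python
from datetime import datetime, timedelta, timezone


def build_query_params(range_start: datetime, range_end: datetime, query: str) -> dict:
    """Assemble parameters for a log query (hypothetical helper; sends nothing)."""
    if range_end < range_start:
        raise ValueError("range_end must not be earlier than range_start")
    return {
        "from": int(range_start.timestamp()),  # start of the time range, Unix seconds
        "to": int(range_end.timestamp()),      # end of the time range, Unix seconds
        "query": query,                        # search condition | analytic statement
    }


# Hypothetical sample values: from 00:00 to 12:00 on the same day in UTC+8.
tz = timezone(timedelta(hours=8))
params = build_query_params(
    datetime(2023, 1, 1, 0, 0, tzinfo=tz),
    datetime(2023, 1, 1, 12, 0, tzinfo=tz),
    "* | select count(*) as pv",
)
```

A DataV data source would issue a request of this shape on each refresh, typically moving the end of the range to the current time.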

Features
The following computing methods are supported:
- Real-time computing (streaming computing): the computation is fixed and the data is dynamic.
- Offline computing (data warehouse and offline computing): the computation is dynamic and the data is fixed.
For scenarios with strict timeliness requirements, Log Service allows
you to enable the real-time data indexing feature on logs that are stored in LogHub.
You can use the log search and analytics feature to query and analyze these logs in
an efficient manner. This method has the following benefits:
- Fast: Billions of data records can be processed and queried within one second. Each query statement can have a maximum of five specified conditions. Hundreds of millions of data records can be analyzed and aggregated within one second, so you do not need to wait for results or precompute them. Each query statement can have a maximum of five aggregate functions and a group by clause.
- Real-time display: Up to 99.9% of logs can be displayed on the data dashboard within one second after these logs are generated.
- Dynamic data refresh: When you modify analysis methods or import data to Logstores, the display result is refreshed in real time.
- Data volume: Up to 10 billion lines of data can be computed at a time. If you need to compute more data, you must set multiple time ranges.
- Flexibility: Only SQL-92 syntax is supported. User-defined functions (UDFs) are not supported.
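
If a computation exceeds the data volume limit, the time range can be split into smaller windows that are queried one by one. The following is a minimal sketch of that splitting step only. The function name `split_time_range` and the six-hour window are illustrative assumptions, not part of Log Service.

```python
def split_time_range(start: int, end: int, window: int) -> list:
    """Split [start, end) in Unix seconds into consecutive windows of at most `window` seconds."""
    if window <= 0:
        raise ValueError("window must be positive")
    ranges = []
    cursor = start
    while cursor < end:
        upper = min(cursor + window, end)
        ranges.append((cursor, upper))  # half-open window, so no second is covered twice
        cursor = upper
    return ranges


# Illustrative values: one day (86,400 seconds) split into six-hour windows.
windows = split_time_range(0, 86400, 21600)
```

Additive results such as count(*) can be summed across the windows. Distinct counts such as approx_distinct cannot simply be added, because the same visitor may appear in more than one window.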
Procedure
Example
You need to collect the page view (PV) statistics of your website across China during
the Computing Conference and show the data on a dashboard. You have configured full
log data collection and enabled the log search and analytics feature in Log Service.
You only need to enter a query statement in the Query field to obtain the PV statistics.
During this period, requirements often change. In this example, the following
changes are made:
- Original requirement: On the first day of the conference, you need the statistics
of unique visitors (UVs) for the present day.
You need to query the values of the forward field in all NGINX access logs. Each log has a forward field, which records one or more IP addresses of the visitor. You can use the following statement, in which the approx_distinct(forward) function estimates the number of distinct IP addresses, to obtain the UV statistics for the time range from 00:00 on the first day of the conference to the present time:
* | select approx_distinct(forward) as uv
- First change: On the second day of the conference, you need the UV statistics
of the yunqi.aliyun.com domain only.
You can add a filter condition on the host field to the statement:
host:yunqi.aliyun.com | select approx_distinct(forward) as uv
- Second change: If the forward field of an NGINX access log contains multiple IP addresses,
you can use the following statement to keep only the first IP address:
host:yunqi.aliyun.com | select approx_distinct(split_part(forward,',',1)) as uv
- Third change: On the third day of the conference, you need to exclude visits that
are generated by UC Browser advertisements from the access statistics.
You can add a not filter condition on the URL field to the statement:
host:yunqi.aliyun.com not URL:uc-iflow | select approx_distinct(split_part(forward,',',1)) as uv
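
To make the effect of the final statement concrete, the following sketch reproduces its logic locally on a few hypothetical log entries. It rests on several assumptions: the sample logs and the helper names `first_ip` and `count_uv` are invented for illustration, the not URL:uc-iflow condition is approximated by a substring check, and the count is exact, whereas approx_distinct returns an estimate.

```python
def first_ip(forward: str) -> str:
    """Mimic split_part(forward, ',', 1): the text before the first comma."""
    return forward.split(",")[0]


def count_uv(logs, host, excluded_url_term=None):
    """Count distinct first IP addresses for one host, optionally skipping URLs that contain a term."""
    visitors = set()
    for log in logs:
        if log["host"] != host:
            continue  # corresponds to the host:... search condition
        if excluded_url_term and excluded_url_term in log["URL"]:
            continue  # rough stand-in for the not URL:... search condition
        visitors.add(first_ip(log["forward"]))
    return len(visitors)


# Hypothetical sample logs; the IP addresses are from a documentation-only range.
sample_logs = [
    {"host": "yunqi.aliyun.com", "URL": "/index", "forward": "192.0.2.1, 192.0.2.9"},
    {"host": "yunqi.aliyun.com", "URL": "/agenda", "forward": "192.0.2.1"},
    {"host": "yunqi.aliyun.com", "URL": "/uc-iflow/ad", "forward": "192.0.2.2"},
    {"host": "yunqi.aliyun.com", "URL": "/live", "forward": "192.0.2.3,192.0.2.1"},
    {"host": "www.aliyun.com", "URL": "/index", "forward": "192.0.2.4"},
]

uv_all = count_uv(sample_logs, "yunqi.aliyun.com")                # 3
uv_no_ads = count_uv(sample_logs, "yunqi.aliyun.com", "uc-iflow")  # 2
```

Without split_part, the first and second entries would count as two different visitors, because their forward values differ as whole strings.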