DataV is a data visualization service provided by Alibaba Cloud. DataV allows you to build professional visualization applications by using a graphical user interface (GUI) with ease. You can use DataV to visualize log analysis data. This topic describes how to connect Simple Log Service to DataV and visualize log data on a dashboard.
Prerequisites
Data is collected. For more information, see Data collection overview.
Indexes are created. For more information, see Create indexes.
Background information
Real-time dashboards are widely used in large online promotion events. A real-time dashboard is based on a stream computing architecture that consists of the following modules:
Data collection: collects data from each source in real time.
Intermediate storage: uses Kafka queues to decouple production systems and consumption systems.
Real-time computing: subscribes to real-time data and uses computing rules to compute data on the dashboard.
Result storage: stores the computing results in SQL and NoSQL databases.
Visualization: calls API operations to obtain results and visualize the results.
Alibaba Group provides multiple services to support these modules, as shown in the following figure.
You can connect Simple Log Service to DataV by calling the API operations that are related to the log query and analysis feature. Then, you can use DataV to visualize data on a dashboard.
Features
The following computing methods are supported:
Real-time computing (stream computing): fixed computing logic over dynamic data.
Offline computing (data warehouse and offline computing): dynamic computing logic over fixed data.
For scenarios that require high timeliness, Simple Log Service allows you to enable the real-time data indexing feature for logs that are stored in LogHub. Then, you can query and analyze these logs in an efficient manner. This method has the following benefits:
Fast: A query statement with up to five conditions can process and return results over billions of rows of data within 1 second. An analytic statement with up to five aggregate functions and a GROUP BY clause can aggregate hundreds of millions of rows of data within 1 second. You do not need to precompute or wait for the results.
Real-time display: 99.9% of logs can be displayed on the dashboard within 1 second after the logs are generated.
Dynamic data refresh: When you modify analysis methods or import data to Logstores, the display result is refreshed in real time.
This method has the following limits:
Data volume: A maximum of 10 billion rows of data can be computed at a time. If you need to compute more data, you must specify multiple time ranges.
Flexibility: Only the SQL-92 syntax is supported. User-defined functions (UDFs) are not supported.
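The data volume limit above can be worked around by splitting a large query window into smaller time ranges and issuing one query per range. The following JavaScript sketch illustrates the idea; the helper name and range count are illustrative and are not part of the DataV or Simple Log Service APIs.

```javascript
// Split a [from, to] UNIX-timestamp window into equal sub-ranges so that
// each query stays under the per-query data volume limit.
function splitTimeRange(from, to, parts) {
  const step = Math.ceil((to - from) / parts);
  const ranges = [];
  for (let start = from; start < to; start += step) {
    ranges.push({ from: start, to: Math.min(start + step, to) });
  }
  return ranges;
}

// Example: split one day (86,400 seconds) into four 6-hour windows.
const ranges = splitTimeRange(1509897600, 1509984000, 4);
```

Each entry of `ranges` can then be used as the from and to values of a separate query.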
Procedure
Create a DataV data source.
Log on to the DataV console.
On the Data Sources tab, click Add Source.
In the Add Data Source dialog box, configure the parameters and click OK. The following table describes the parameters.
Parameter
Description
Type
Select Log Service from the drop-down list.
Custom Data Source Name
Specify a name for the data source. Example: log_service_api.
AppKey
The AccessKey ID of your Alibaba Cloud account or the AccessKey ID of a RAM user that has the read permissions on Simple Log Service.
AppSecret
The AccessKey secret of your Alibaba Cloud account or the AccessKey secret of a RAM user that has the read permissions on Simple Log Service.
EndPoint
The endpoint for the region where the Simple Log Service project resides. For more information, see Endpoints.
Open a canvas.
On the Projects tab, move the pointer over an existing project and click Edit.
Alternatively, create a project on the Projects tab. For more information, see Use a template to create a PC-side visual application.
Create a line chart and add a filter.
Create a line chart.
In the left-side navigation pane, choose a line chart widget. On the Data tab, click Configure Data Source and configure the parameters. The following table describes the parameters.
Parameter
Description
Data Source Type
Select Log Service.
Select an existing data source
Select the data source that you created in Step 1. Example: log_service_api.
Query
Sample query statement:
{
  "projectName": "dashboard-demo",
  "logStoreName": "access-log",
  "topic": "",
  "from": ":from",
  "to": ":to",
  "query": "*| select approx_distinct(remote_addr) as uv, count(1) as pv, date_format(from_unixtime(date_trunc('hour', __time__)), '%Y/%m/%d %H:%i:%s') as time group by time order by time limit 1000",
  "line": 100,
  "offset": 0
}
The following table describes the parameters in the sample query statement.
Parameter
Description
projectName
The name of the project.
logStoreName
The name of the Logstore.
topic
The topic of the logs. If you have not specified a log topic, leave this parameter empty.
from and to
The start time and the end time of the time range to query. The values are UNIX timestamps.
Note: In this example, the parameters are set to :from and :to. When you test your parameter settings, you can enter UNIX timestamps, such as 1509897600. After the test, change the timestamps back to :from and :to. Then, you can specify a time range in the URL parameters. For example, if the preview URL is http://datav.aliyun.com/screen/86312, you can open http://datav.aliyun.com/screen/86312?from=1510796077&to=1510798877 to compute the results based on the specified time range.
query
The query criteria. For more information about the query syntax, see Log analysis overview.
Note: The time in the query results must be in the YYYY/mm/dd HH:mm:ss format. Example: 2017/07/11 12:00:00. You must use the following syntax to align the time to the hour and convert it to the required format: date_format(from_unixtime(date_trunc('hour', __time__)), '%Y/%m/%d %H:%i:%s')
line
Enter the default value 100.
offset
Enter the default value 0.
After you configure the settings, view the response data.
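The :from and :to placeholders in the query configuration are replaced with concrete timestamps at request time, for example from URL parameters. The following sketch only mimics that substitution for illustration; the bindTimeRange helper is hypothetical and is not part of the DataV API.

```javascript
// Replace the :from and :to placeholders in a query configuration with
// concrete UNIX timestamps, for example taken from URL parameters.
function bindTimeRange(config, from, to) {
  return { ...config, from, to };
}

const config = {
  projectName: "dashboard-demo",
  logStoreName: "access-log",
  topic: "",
  from: ":from",
  to: ":to",
  query: "*| select approx_distinct(remote_addr) as uv, count(1) as pv group by time limit 1000",
  line: 100,
  offset: 0
};

// Bind the placeholders to the window 1510796077..1510798877.
const bound = bindTimeRange(config, 1510796077, 1510798877);
```

The stored configuration keeps the :from and :to placeholders; only the bound copy carries concrete timestamps.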
Create a filter.
Select Data Filter, and click the plus sign (+) next to Add Filter.
Enter a function in the New Filter field by using the following sample syntax:
return Object.keys(data).map((key) => {
  let d = data[key];
  d["pv"] = parseInt(d["pv"]);
  return d;
});
The filter converts the values that are used by the y-axis to the INT type. In this example, the y-axis indicates PVs, so the pv column must be converted.
The results contain the time and pv columns. You can set the x-axis to time and the y-axis to pv.
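The filter above can be checked locally with sample response rows; the row shape (string fields, one row per time bucket) matches what the query returns.

```javascript
// DataV passes the query response rows as `data`; the filter converts the
// string pv values to integers so that the y-axis can plot them.
function filter(data) {
  return Object.keys(data).map((key) => {
    let d = data[key];
    d["pv"] = parseInt(d["pv"]);
    return d;
  });
}

// Sample rows as returned by the query: pv arrives as a string.
const rows = [
  { time: "2017/07/11 12:00:00", pv: "120" },
  { time: "2017/07/11 13:00:00", pv: "95" }
];
const result = filter(rows);
```

After the filter runs, every pv value is a number and the rows can be bound to the chart axes.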
Create a pie chart and add a filter.
Create a carousel pie chart.
In the left-side navigation pane, choose a carousel pie chart widget. On the Data tab, click Configure Data Source and configure the parameters. The following table describes the parameters.
Parameter
Description
Data Source Type
Select Log Service.
Select an existing data source
Select the data source that you created in Step 1. Example: log_service_api.
Query
Sample query statement:
{
  "projectName": "dashboard-demo",
  "logStoreName": "access-log",
  "topic": "",
  "from": 1509897600,
  "to": 1509984000,
  "query": "*| select count(1) as pv, method group by method",
  "line": 100,
  "offset": 0
}
The parameters in the preceding code are described in the step of creating a line chart in this topic.
After you configure the settings, view the response data.
Add a filter.
Select Data Filter, and click the plus sign (+) next to Add Filter.
Enter a function in the New Filter field by using the following sample syntax:
return Object.keys(data).map((key) => {
  let d = data[key];
  d["pv"] = parseInt(d["pv"]);
  return d;
});
Enter method in the type field and pv in the value field for the pie chart.
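A pie chart consumes rows with type and value fields. In the console you only enter the field names as described above; the following sketch is an illustrative simulation of how the method and pv columns map to that shape.

```javascript
// Map query result rows ({ method, pv }) to the { type, value } shape
// that the pie chart consumes.
function toPieSlices(rows) {
  return rows.map((d) => ({ type: d.method, value: parseInt(d.pv) }));
}

// Sample rows as returned by the group-by-method query.
const rows = [
  { method: "GET", pv: "300" },
  { method: "POST", pv: "120" }
];
const slices = toPieSlices(rows);
```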
Use callback functions to retrieve a time range in real time.
Perform the following steps to show real-time logs in a sliding time window. In this example, the filters compute a window that covers the last 24 hours.
Create a static data source and add a filter.
On the Data tab, click Configure Data Source.
In the Set Data Sources panel, configure the following settings. Use the default settings for the static data source, and add one filter for each of the to and from variables.
Filter that returns the current time as the to timestamp:
return [{ value: Math.round(Date.now() / 1000) }];
Filter that returns the time 24 hours earlier as the from timestamp:
return [{ value: Math.round((Date.now() - 24 * 60 * 60 * 1000) / 1000) }];
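The two filters above only compute UNIX timestamps, so they can be verified locally. Date.now() returns milliseconds, and 24 * 60 * 60 * 1000 milliseconds is one day.

```javascript
// End of the window: the current time, in seconds.
function toTimestamp() {
  return Math.round(Date.now() / 1000);
}

// Start of the window: 24 hours before the current time, in seconds.
function fromTimestamp() {
  return Math.round((Date.now() - 24 * 60 * 60 * 1000) / 1000);
}

const from = fromTimestamp();
const to = toTimestamp();
```

The difference between the two values is one day (86,400 seconds), up to rounding.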
On the Interaction tab, select Enable in the corresponding event field, and add the required values to the Bind to Variable column.
On the Data tab, use the :from and :to parameters to implement callback functions. Sample code:
{
  "projectName": "dashboard-demo",
  "logStoreName": "access-log",
  "topic": "",
  "from": ":from",
  "to": ":to",
  "query": "*| select count(1) as pv, referer group by referer order by pv desc limit 30",
  "line": 100,
  "offset": 0
}
Preview and publish the DataV dashboard.
Click the Preview icon to preview the DataV dashboard.
Click the Publish icon to publish the DataV dashboard.
Example
You need to collect the page view (PV) statistics of your website across China during the Apsara Conference and visualize the data on a dashboard. You have configured full log data collection and enabled the log query and analysis feature in Simple Log Service. You only need to enter a query statement in the Query field to obtain the PV statistics. During this period, the requirements often change. In this example, the following changes are made:
Original requirement: On the first day of the conference, you need the statistics of unique visitors (UVs) for the present day.
You need to query the values of the forward field in the NGINX access logs. The forward field records one or more IP addresses of each visitor, and each log contains a forward field. The following statement uses the approx_distinct(forward) function to remove duplicate IP addresses and obtain the UV statistics for the time range from 00:00 on the first day of the conference to the present time:
* | select approx_distinct(forward) as uv
First change: On the second day of the conference, you need the statistics of PVs of the yunqi.aliyun.com domain.
You can add the following filter condition that starts with host to the statement:
host:yunqi.aliyun.com | select approx_distinct(forward) as uv
Second change: If the NGINX access logs contain multiple IP addresses, you can enter the following statement to reserve only the first IP address:
host:yunqi.aliyun.com | select approx_distinct(split_part(forward,',',1)) as uv
Third change: On the third day of the conference, you need to remove the access statistics that are generated by UC browser advertisements.
You can add the following filter condition that starts with not to the statement:
host:yunqi.aliyun.com not URL:uc-iflow | select approx_distinct(split_part(forward,',',1)) as uv
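The effect of split_part(forward, ',', 1) combined with a distinct count can be simulated locally. The sketch below only illustrates the logic; Simple Log Service actually uses the approximate approx_distinct aggregation, not an exact set.

```javascript
// Keep only the first IP address from each forward value, then count the
// distinct results -- the same logic that the query expresses in SQL.
function countUniqueVisitors(forwardValues) {
  const firstIps = forwardValues.map((f) => f.split(",")[0].trim());
  return new Set(firstIps).size;
}

// Sample forward values; some logs carry a chain of proxy IP addresses.
const forwards = [
  "1.2.3.4, 10.0.0.1",
  "1.2.3.4",
  "5.6.7.8, 10.0.0.2"
];
const uv = countUniqueVisitors(forwards);
```

Only the first IP address of each chain is kept, so the three sample logs count as two unique visitors.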