
3-2 Create a dataset

Last Updated: Jun 01, 2018

A dataset defines how logs collected by a monitoring job are pre-aggregated and persistently stored. With a few settings, you specify how ARMS organizes and stores the data after it analyzes and optimizes the multi-dimensional data.

Create a dataset

Prerequisites

A custom monitoring job is required before you can create a dataset. For instructions on how to create a monitoring job, see Step 1 and Step 2.

  1. In the left-side navigation pane of the console, choose Custom Monitoring > Jobs.

  2. On the Instance List page, click Edit next to the monitoring job you created, and click Next until the Dataset and Alarm Configuration tab is displayed.

  3. In the Dataset Settings pane, click Add Dataset.

    Add a dataset

  4. In the Add Dataset dialog box, enter the dataset name, and create a drill-down or common dataset as needed.

    Dataset

    • Filter: It defines the type of data that will be used for dataset calculation. Data that does not meet filter criteria will be filtered out from the dataset.

      Note: Select the filter criteria with caution. The Meet the following criteria simultaneously option combines the criteria with AND, and the Meet any of the following criteria option combines them with OR.

    • Metric: It is generally a numerical value that evaluates an object, similar to a measure in multidimensional online analytical processing (OLAP). ARMS metrics correspond to the values of Count, Max, Sum, and Count Distinct after real-time calculation.

    • Compound Metric: You can perform addition, subtraction, multiplication, and division on metric results of a dataset.

    • Time Field: It is the time field produced by log splitting, and is the most basic dimension for real-time monitoring.

    • Dimension: It is the dimension for evaluating an object. For example, if the number of students is counted by class, then the class is the dimension, which can be expressed as GROUP BY in SQL language. ARMS dimensions are classified into the common type and drill-down type. For more information, see Differences between common dimensions and drill-down dimensions.

    • Sampling Field: It specifies the field whose data is sampled every minute. The sampled data facilitates troubleshooting when an exception is detected during monitoring.
  5. On the monitoring job instance list page, click Start to start a custom monitoring job.

After the preceding steps are completed, a multi-dimensional dataset is created. For instructions on how to use advanced functions of a multi-dimensional dataset, see Dataset management.
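The dataset settings described in step 4 can be pictured with a small sketch. This is not ARMS code: the records, field names, and the `aggregate` helper are hypothetical, and the sketch only illustrates how a filter (AND or OR), a time field, a dimension, the four metric types, and a compound metric relate to each other, using the students-by-class example.

```python
from collections import defaultdict

# Hypothetical records after log splitting; all field names are illustrative.
records = [
    {"time": "2017-01-01 12:00", "class": "A", "student": "s1", "score": 80},
    {"time": "2017-01-01 12:00", "class": "A", "student": "s2", "score": 90},
    {"time": "2017-01-01 12:00", "class": "B", "student": "s3", "score": 70},
    {"time": "2017-01-01 12:00", "class": "B", "student": "s3", "score": 60},
    {"time": "2017-01-01 12:00", "class": "C", "student": "s4", "score": -1},
]


def matches(record, criteria, mode):
    """mode="all" mimics the AND option, mode="any" mimics the OR option."""
    results = [test(record) for test in criteria]
    return all(results) if mode == "all" else any(results)


def aggregate(records, criteria, mode, dimension, metric_field, distinct_field):
    # Group by (time field, dimension), like GROUP BY in SQL.
    groups = defaultdict(list)
    for record in records:
        if matches(record, criteria, mode):
            groups[(record["time"], record[dimension])].append(record)

    out = {}
    for key, rows in groups.items():
        values = [row[metric_field] for row in rows]
        metrics = {
            "count": len(rows),
            "max": max(values),
            "sum": sum(values),
            "count_distinct": len({row[distinct_field] for row in rows}),
        }
        # Compound metric: arithmetic on metric results (here sum / count).
        metrics["avg"] = metrics["sum"] / metrics["count"]
        out[key] = metrics
    return out


# Filter: keep records with a non-negative score AND a class other than "C".
criteria = [lambda r: r["score"] >= 0, lambda r: r["class"] != "C"]
result = aggregate(
    records,
    criteria,
    mode="all",
    dimension="class",
    metric_field="score",
    distinct_field="student",
)
for key, metrics in result.items():
    print(key, metrics)
```

Records that fail the filter never reach the grouping step, which is why filter criteria should be chosen with caution.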

Differences between common dimensions and drill-down dimensions

Common dimension: It is applicable to all scenarios, and has no acceleration index unless the ID dimension is enabled. For more information, see the following description.

Drill-down dimension: It is applicable to a specific scenario where a hierarchical relationship, such as Province > City > District, exists between dimensions. For drill-down dimensions, query at each layer will be accelerated.

Common dimensions

Analysis of a common dimension scenario

Take the log of an e-retailer as an example:

2017-01-01 12:00:00|Category: Men's wear|Province: Zhejiang|City: Hangzhou|District: Xihu District|Gender: Male|Height: L|Quantity: 5|Total: 100|

Split fields are as follows: time, category, province, city, district, gender, height, quantity, and total.
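The splitting of this sample log can be sketched as follows. In ARMS, log splitting is configured in the monitoring job, so the `split_log` function below is only a hypothetical illustration of how the pipe-separated line maps to the fields listed above.

```python
SAMPLE = (
    "2017-01-01 12:00:00|Category: Men's wear|Province: Zhejiang|"
    "City: Hangzhou|District: Xihu District|Gender: Male|Height: L|"
    " Quantity: 5|Total: 100|"
)


def split_log(line):
    """Split one pipe-separated log line into a dict of named fields."""
    parts = [part.strip() for part in line.split("|") if part.strip()]
    # The first segment is the timestamp and has no "name: value" form.
    record = {"time": parts[0]}
    for part in parts[1:]:
        name, _, value = part.partition(":")
        record[name.strip().lower()] = value.strip()
    # Quantity and total are numeric, so they can later serve as metrics.
    record["quantity"] = int(record["quantity"])
    record["total"] = int(record["total"])
    return record


record = split_log(SAMPLE)
print(list(record))
```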

If the data is analyzed by Category, Gender, and Province, the corresponding dimensions are Category, Gender, and Province, and the metrics are Total and Quantity. After pre-aggregation, the data is as follows:

| Total | Quantity | Time | Gender | Category | Province |
| --- | --- | --- | --- | --- | --- |
| 100 | 1 | 2017-01-01 12:00:00 | Male | Men’s wear | Zhejiang |
| 200 | 2 | 2017-01-01 12:00:00 | Female | Foods | Jiangsu |
| 300 | 3 | 2017-01-01 12:00:00 | Female | Men’s wear | Beijing |

To view data in the category of Men’s wear, the system must read the records of all categories, genders, and provinces and then pick out those that belong to Men’s wear. Here, the number of retrieved records is greater than the number of result records.
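The pre-aggregation and the full scan described above can be sketched as follows. The raw records are hypothetical values chosen so that the aggregated rows match the table, and `pre_aggregate` and `query_by_scan` are illustrative helpers, not ARMS functions.

```python
from collections import defaultdict

T = "2017-01-01 12:00:00"
# Hypothetical split log records (only the fields used by the dataset).
raw = [
    {"time": T, "gender": "Male", "category": "Men's wear", "province": "Zhejiang", "quantity": 1, "total": 100},
    {"time": T, "gender": "Female", "category": "Foods", "province": "Jiangsu", "quantity": 1, "total": 100},
    {"time": T, "gender": "Female", "category": "Foods", "province": "Jiangsu", "quantity": 1, "total": 100},
    {"time": T, "gender": "Female", "category": "Men's wear", "province": "Beijing", "quantity": 3, "total": 300},
]

DIMENSIONS = ("gender", "category", "province")
METRICS = ("total", "quantity")


def pre_aggregate(records):
    """Sum the metrics for each (time, gender, category, province) key."""
    table = defaultdict(lambda: dict.fromkeys(METRICS, 0))
    for record in records:
        key = (record["time"],) + tuple(record[d] for d in DIMENSIONS)
        for metric in METRICS:
            table[key][metric] += record[metric]
    return dict(table)


def query_by_scan(table, dimension, value):
    """Without an index, every aggregated row is read and then filtered."""
    position = 1 + DIMENSIONS.index(dimension)  # position 0 is the time field
    retrieved = 0
    hits = {}
    for key, row in table.items():
        retrieved += 1
        if key[position] == value:
            hits[key] = row
    return hits, retrieved


table = pre_aggregate(raw)
hits, retrieved = query_by_scan(table, "category", "Men's wear")
print(f"retrieved={retrieved}, results={len(hits)}")
```

The scan reads three aggregated rows to return two, which is the gap between retrieved records and result records that the text describes.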

Restrictions and optimization methods

If there are two million categories and you want to view the data in the category of Men’s wear, the system must read about N x 2,000,000 data records to find the Men’s wear records. Here, the number of retrieved records is much greater than the number of result records, and the large number of records read by the system slows down the query.

You can solve this problem by creating a category index.

In a dataset of the common dimension type, ARMS provides an auxiliary dimension called the ID dimension. An ID dimension is equivalent to an index dimension, and its value must be specified in a query so that the query can be accelerated. A plain dimension, by contrast, is a common non-indexed dimension, such as the gender, category, and province in the previous example.

Differences between a dimension and an ID dimension

Common dimensions consist of dimensions and ID dimensions. When you query a dataset, the ID dimension cannot be blank, but a dimension can be blank. Currently, a dataset in ARMS can contain up to one ID dimension and seven dimensions.

  • Dimension

    • A dimension can be used either separately or together with other dimensions. For example, a dataset has three dimensions: A, B, and C. You can select only A, B, or C, or use the combination of B and C or the combination of A, B, and C to query data.
  • ID dimension

    • An ID dimension is equivalent to creating an index for this dimension. You can specify an ID dimension to quickly query the desired data.

    • If data cannot be enumerated or the number of dimensions is large, ID dimensions are recommended.
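The difference between a dimension and an ID dimension can be sketched as follows. This is a minimal illustration that assumes an ID dimension behaves like a hash index on one field; the rows and the `build_id_index` and `query` helpers are hypothetical and do not reflect how ARMS stores data internally.

```python
from collections import defaultdict

# Hypothetical pre-aggregated rows; "category" plays the role of the ID dimension.
rows = [
    {"category": "Men's wear", "gender": "Male", "province": "Zhejiang", "total": 100},
    {"category": "Foods", "gender": "Female", "province": "Jiangsu", "total": 200},
    {"category": "Men's wear", "gender": "Female", "province": "Beijing", "total": 300},
]


def build_id_index(rows, id_field):
    """Group rows by the value of the ID dimension."""
    index = defaultdict(list)
    for row in rows:
        index[row[id_field]].append(row)
    return index


def query(index, id_value, **dimensions):
    """The ID dimension is mandatory; the other dimensions may be left blank."""
    if id_value is None:
        raise ValueError("the ID dimension cannot be blank")
    # Only the rows stored under id_value are read, not the whole dataset.
    candidates = index.get(id_value, [])
    return [
        row for row in candidates
        if all(row[name] == value for name, value in dimensions.items())
    ]


index = build_id_index(rows, "category")
mens_wear = query(index, "Men's wear")
mens_wear_female = query(index, "Men's wear", gender="Female")
print(len(mens_wear), len(mens_wear_female))
```

With the index, a query for Men's wear touches only the rows stored under that value, regardless of how many other categories exist.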

Drill-down dimensions

Analysis of a drill-down dimension scenario

Take an area monitored by the system as an example. System logs contain three dimensions: IDC, group, and IP address. Assume that a user needs to start from the running status of an IDC, drill down to the groups in that IDC, and then query the data of a specific machine in a group. If common dimensions are used for this purpose, queries may be delayed because of the large amount of data read. Drill-down dimensions are more suitable for this fixed hierarchical query scenario.

Using the drill-down dimensions, you can create multiple levels of indexes as follows for the IDC, group, and IP address: IDC (index 1), IDC - group (index 2), and IDC - group - IP address (index 3). To query data of an IDC, use index 1. To query data of a group in a specific IDC, use index 2. To query data of an IP address of a specific group, use index 3.
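The three index levels can be sketched as follows. The log records, the `requests` metric, and the helper functions are hypothetical; the sketch assumes that each level is a pre-aggregated lookup keyed by a prefix of the IDC, group, and IP address path.

```python
from collections import defaultdict

# Hypothetical system logs with the three drill-down dimensions and one metric.
logs = [
    {"idc": "idc-1", "group": "web", "ip": "10.0.0.1", "requests": 10},
    {"idc": "idc-1", "group": "web", "ip": "10.0.0.2", "requests": 20},
    {"idc": "idc-1", "group": "db", "ip": "10.0.1.1", "requests": 5},
    {"idc": "idc-2", "group": "web", "ip": "10.1.0.1", "requests": 7},
]

LEVELS = ("idc", "group", "ip")


def build_drilldown_indexes(logs, metric):
    """Build one summed index per level: (idc), (idc, group), (idc, group, ip)."""
    indexes = [defaultdict(int) for _ in LEVELS]
    for log in logs:
        path = tuple(log[level] for level in LEVELS)
        for depth in range(len(LEVELS)):
            indexes[depth][path[: depth + 1]] += log[metric]
    return indexes


def drill_down(indexes, *path):
    """Look up a path prefix; a deeper level always needs its parent values."""
    if not 1 <= len(path) <= len(LEVELS):
        raise ValueError("path must contain between 1 and 3 values")
    return indexes[len(path) - 1].get(tuple(path), 0)


indexes = build_drilldown_indexes(logs, "requests")
by_idc = drill_down(indexes, "idc-1")                       # uses index 1
by_group = drill_down(indexes, "idc-1", "web")              # uses index 2
by_ip = drill_down(indexes, "idc-1", "web", "10.0.0.2")     # uses index 3
print(by_idc, by_group, by_ip)
```

Because each key is a path prefix, a group cannot be queried without first naming its IDC, which mirrors the hierarchical restriction of drill-down dimensions.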

Drill-down dimensions are also applicable to the following scenarios: business statistics by province or region, query of the student distribution by school, grade, or class, or sales statistics by manufacturer, brand, or category.

Restrictions of drill-down dimensions

  • You can configure up to three drill-down dimensions in ARMS.

  • Drill-down dimensions have a hierarchical relationship with each other, similar to a tree structure. For example, to view data of the second dimension, you must first select a value of the first dimension. Drill-down dimensions must therefore be properly planned. For example, the first dimension is the province, the second dimension is the city, the third dimension is the district, and the metric is the citizen consumption information.

  • Unless required in special scenarios, two dimensions that are completely unrelated to each other, such as “Region” and “Commodity type”, should not be defined as drill-down dimensions in the same dataset.

  • ARMS provides the drill-down function that allows you to drill down from the summary data to the detailed data to observe or add dimensions. The purpose of drill-down is to change the dimension hierarchy and the analysis granularity.
