Data model

Last Updated: Mar 31, 2017

The following basic concepts will help you understand and use Log Service.

Region

A region is a service node of Alibaba Cloud. Deploying services in different Alibaba Cloud regions places them closer to your clients, which lowers access latency and improves user experience. Alibaba Cloud has multiple regions throughout the country.

Project

A project is the basic management unit in Log Service and is used for resource isolation and control. You can use a project to manage all logs and related log sources of one application.

Logstore

A logstore is the unit for collecting, storing, and consuming log data in Log Service. Each logstore belongs to one project, and you can create multiple logstores for one project according to actual needs. The common practice is to create an independent logstore for each type of log in one application. For example, assume that you have a game application named big-game, and there are three types of logs on the server: operation_log, application_log, and access_log. You can first create a project named big-game, and then create three logstores under this project, one for each type of log, to handle its collection, storage, and consumption.

Log

A log is the smallest data unit processed in Log Service. Log Service uses a semi-structured data model to define a log. The data model is as follows:

  • Topic: a custom field used to mark a batch of logs (for example, access_logs can be marked by site). This field is an empty string by default (the empty string is also a valid topic).
  • Time: a reserved field that indicates when the log was generated, expressed in seconds since 1970-1-1 00:00:00 UTC (precise to the second). It is generally taken directly from the time recorded in the log.
  • Content: the specific content of the log. Content is composed of one or more content items, and each content item is a Key-Value pair.
  • Source: the source of the log, for example, the IP address of the device generating the log. This field is empty by default.

Furthermore, Log Service has different requirements on values of different fields, as described in the following table:

Data Field | Requirement
time | Integer in standard Unix time format. The minimum unit is the second.
topic | Any UTF-8 encoded string of no more than 128 bytes.
source | Any UTF-8 encoded string of no more than 128 bytes.
content | One or more Key-Value pairs. Key is a UTF-8 encoded string of no more than 128 bytes, which contains only letters, underscores, and numbers and cannot begin with a number. Value is any UTF-8 encoded string of no more than 1024*1024 bytes.

The following keywords cannot be used in the key in content described in the preceding table: __time__, __source__, __topic__, __partition_time__, _extract_others_, and __extract_others__.
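
The field requirements and the reserved keywords above can be checked on the client before a log is written. The following Python sketch is only an illustration and is not part of Log Service or its SDK: the name validate_log and the constants are hypothetical, and it assumes that "letters" means ASCII letters and that time is a non-negative integer.

```python
import re

# Reserved keywords from the text above; not allowed as content keys.
RESERVED_KEYS = {
    "__time__", "__source__", "__topic__",
    "__partition_time__", "_extract_others_", "__extract_others__",
}
# Assumption: "letters" means ASCII letters. A key must not begin with a digit.
KEY_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
MAX_NAME_BYTES = 128            # limit for topic, source, and keys
MAX_VALUE_BYTES = 1024 * 1024   # limit for each value


def validate_log(time, topic, source, content):
    """Return a list of rule violations (empty if none were found)."""
    errors = []
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(time, bool) or not isinstance(time, int) or time < 0:
        errors.append("time must be a non-negative integer (Unix seconds)")
    for name, text in (("topic", topic), ("source", source)):
        if len(text.encode("utf-8")) > MAX_NAME_BYTES:
            errors.append(f"{name} exceeds {MAX_NAME_BYTES} bytes")
    if not content:
        errors.append("content needs at least one Key-Value pair")
    for key, value in content.items():
        if not KEY_PATTERN.fullmatch(key):
            errors.append(f"invalid key: {key!r}")
        elif len(key.encode("utf-8")) > MAX_NAME_BYTES:
            errors.append(f"key exceeds {MAX_NAME_BYTES} bytes: {key!r}")
        elif key in RESERVED_KEYS:
            errors.append(f"reserved key: {key!r}")
        if len(value.encode("utf-8")) > MAX_VALUE_BYTES:
            errors.append(f"value for {key!r} exceeds {MAX_VALUE_BYTES} bytes")
    return errors


ok = validate_log(1330589527, "", "10.249.201.117", {"ip": "10.1.168.193"})
bad = validate_log(1330589527, "", "", {"__topic__": "x", "1ip": "y"})
```

Here ok is an empty list, while bad reports one reserved key and one key that begins with a number. Lengths are measured in bytes after UTF-8 encoding, because the limits above are stated in bytes rather than characters.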

Log topic

Logs in the same logstore can be grouped by log topic. You can specify the topic when writing a log. For example, a platform user can use their user ID as the log topic when writing logs. If there is no need to group the logs in one logstore, the same log topic can be used for all logs.

NOTE: An empty string is a valid log topic, and it is the default log topic.
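
As a rough illustration of grouping by topic, the Python sketch below collects logs from one logstore into per-topic lists. The function group_by_topic and the sample user IDs are hypothetical and only model the idea locally; they do not call Log Service.

```python
from collections import defaultdict


def group_by_topic(logs):
    """Group (topic, content) pairs by topic; "" is a valid topic."""
    groups = defaultdict(list)
    for topic, content in logs:
        groups[topic].append(content)
    return dict(groups)


# Hypothetical logs: the user ID serves as the topic, "" is the default.
logs = [
    ("user_1001", {"action": "login"}),
    ("user_1002", {"action": "login"}),
    ("user_1001", {"action": "logout"}),
    ("", {"action": "heartbeat"}),
]
grouped = group_by_topic(logs)
```

Logs written without an explicit topic end up together under the empty-string key, which matches the default described above.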

The following diagram describes the relation among Logstore, log topic, and log:

Various log formats are used in actual application scenarios. For ease of understanding, the following describes how to map an original Nginx access_log to the log data model in Log Service. Assume that the IP address of your Nginx server is 10.249.201.117, and the following is the original log:

  10.1.168.193 - - [01/Mar/2012:16:12:07 +0800] "GET /Send?AccessKeyId=8225105404 HTTP/1.1" 200 5 "-" "Mozilla/5.0 (X11; Linux i686 on x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2"

Map the original log to the log data model in Log Service as follows:

Data Field | Content | Description
topic | “” | Use the default value (empty string).
time | 1330589527 | Generation time of the log (precise to the second), converted from the time stamp in the original log.
source | “10.249.201.117” | Use the IP address of the server as the log source.
content | Key-Value pairs | Content of the log.

You can decide how to extract the original content of the log and combine it into Key-Value pairs. For example, see the following table:

key | value
ip | “10.1.168.193”
method | “GET”
status | “200”
length | “5”
ref_url | “-”
browser | “Mozilla/5.0 (X11; Linux i686 on x86_64; rv:10.0.2) Gecko/20100101 Firef
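
The mapping above can be sketched in Python as follows. This is one possible extraction, not the way Log Service itself parses logs: the regular expression, the function map_nginx_line, and the choice of keys are assumptions made for illustration. The time conversion relies on the standard library's %z directive to honor the +0800 offset, and on %b matching English month names (true in the default C locale).

```python
import re
from datetime import datetime

# Hypothetical pattern for the combined access log format shown above.
NGINX_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<length>\d+) '
    r'"(?P<ref_url>[^"]*)" "(?P<browser>[^"]*)"'
)


def map_nginx_line(line, server_ip, topic=""):
    """Map one access log line to topic, time, source, and content."""
    match = NGINX_LINE.match(line)
    if match is None:
        raise ValueError("line does not look like an Nginx access log entry")
    fields = match.groupdict()
    # %z parses the "+0800" offset, so timestamp() returns Unix seconds (UTC).
    when = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    keys = ("ip", "method", "status", "length", "ref_url", "browser")
    return {
        "topic": topic,
        "time": int(when.timestamp()),
        "source": server_ip,
        "content": {key: fields[key] for key in keys},
    }


raw = (
    '10.1.168.193 - - [01/Mar/2012:16:12:07 +0800] '
    '"GET /Send?AccessKeyId=8225105404 HTTP/1.1" 200 5 "-" '
    '"Mozilla/5.0 (X11; Linux i686 on x86_64; rv:10.0.2) '
    'Gecko/20100101 Firefox/10.0.2"'
)
log = map_nginx_line(raw, "10.249.201.117")
print(log["time"])  # 1330589527
```

All content values stay strings, as in the table, because the data model defines every value as a UTF-8 string.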

Logs

A collection of multiple logs.

LogGroup

A group of logs.

LogGroupList

A group of LogGroups used for return of the results.

Encoding method

Currently, the system supports the following content encoding method (more may be added in the future). At the RESTful API layer, the encoding is indicated by Content-Type.

Encoding | Meaning | Content-Type
ProtoBuf | The data model is encoded by ProtoBuf. | application/x-protobuf

The following ProtoBuf (PB) definition describes the objects of the data model:

  message Log
  {
      required uint32 Time = 1; // UNIX Time Format
      message Content
      {
          required string Key = 1;
          required string Value = 2;
      }
      repeated Content Contents = 2;
  }
  message LogGroup
  {
      repeated Log Logs = 1;
      optional string Reserved = 2; // reserved fields
      optional string Topic = 3;
      optional string Source = 4;
  }
  message LogGroupList
  {
      repeated LogGroup logGroupList = 1;
  }

NOTE: PB does not enforce unique keys among the Key-Value pairs of a log, so you must make sure that no key is repeated within one log. Otherwise, the behavior is undefined.
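
Because uniqueness is left to the writer, a client can check for repeated keys before encoding. The Python sketch below mirrors the Log and LogGroup messages with plain dataclasses instead of generated ProtoBuf classes, so it runs without any extra library. The class layout and the helpers duplicate_keys and check_group are hypothetical and only illustrate the check.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Log:
    time: int  # Unix seconds, like the Time field
    # A list of (key, value) pairs, like the repeated Contents field;
    # a list (not a dict) is used so that repeated keys stay visible.
    contents: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class LogGroup:
    logs: List[Log] = field(default_factory=list)
    topic: str = ""
    source: str = ""


def duplicate_keys(log):
    """Return the sorted keys that occur more than once in one log."""
    counts = Counter(key for key, _ in log.contents)
    return sorted(key for key, n in counts.items() if n > 1)


def check_group(group):
    """Raise ValueError if any log in the group repeats a key."""
    for index, log in enumerate(group.logs):
        dupes = duplicate_keys(log)
        if dupes:
            raise ValueError(f"log {index} repeats keys: {dupes}")


good = Log(1330589527, [("ip", "10.1.168.193"), ("method", "GET")])
bad = Log(1330589527, [("ip", "10.1.168.193"), ("ip", "10.1.168.194")])
check_group(LogGroup(logs=[good], source="10.249.201.117"))  # passes silently
```

Calling check_group on a group that contains bad raises ValueError, which gives the caller a chance to fix the log instead of sending data whose handling is undefined.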
