Data model

Last Updated: Apr 12, 2018

To understand and use Log Service more easily, familiarize yourself with the following basic concepts first.

Region

A region is a service node of Alibaba Cloud. By deploying services in different Alibaba Cloud regions, you can bring your services closer to your users, which reduces access latency and improves the user experience. Alibaba Cloud currently provides multiple regions.

Project

A project is the basic unit in Log Service and is used for resource isolation and control. You can use a project to manage all the logs and related log sources of an application.

Logstore

The Logstore is the unit in Log Service for collecting, storing, and consuming logs. Each Logstore belongs to a project, and a project can contain multiple Logstores; create as many as your application needs. Typically, an independent Logstore is created for each type of log in an application. For example, suppose you have a game application named big-game whose servers produce three types of logs: operation_log, application_log, and access_log. You can first create a project named big-game and then create three Logstores in it, one for each log type, to collect, store, and consume those logs separately.
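As an illustration only, the big-game layout above can be sketched as a plain Python structure; the project and Logstore names come from the example, and this is not a Log Service API call.

    # Conceptual layout of the "big-game" example: one project, three Logstores,
    # one Logstore per log type. Plain Python, for illustration only.
    big_game = {
        "project": "big-game",
        "logstores": {
            "operation_log": [],    # operation logs collected from the game servers
            "application_log": [],  # application logs
            "access_log": [],       # access logs
        },
    }

    for logstore in big_game["logstores"]:
        print(f"project {big_game['project']} contains Logstore {logstore}")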

Log

A log is the minimum unit of data processed in Log Service. Log Service defines a log with a semi-structured data model. The specific data model is as follows:

  • Topic: A user-defined field used to group logs. For example, access logs can be grouped by site. By default, this field is an empty string, which is itself a valid topic.
  • Time: A reserved field that indicates when the log was generated, expressed as the number of seconds since 1970-1-1 00:00:00 UTC. This field is usually derived directly from the time recorded in the log.
  • Content: A field that records the specific log content. The content is composed of one or more content items, and each content item is a key-value pair.
  • Source: A field that indicates the source of the log, for example, the IP address of the machine on which the log was generated. By default, this field is empty.

Log Service imposes the following requirements on the values of these fields.

  • time: An integer in the standard UNIX time format, in seconds.
  • topic: A UTF-8 encoded string of up to 128 bytes.
  • source: A UTF-8 encoded string of up to 128 bytes.
  • content: One or more key-value pairs. Each key is a UTF-8 encoded string of up to 128 bytes that can contain letters, digits, and underscores (_) but cannot start with a digit. Each value is a UTF-8 encoded string of up to 1024*1024 bytes (1 MB).

The key in the content cannot use any of the following keywords: __time__, __source__, __topic__, __partition_time__, _extract_others_, and __extract_others__.
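The sketch below shows one way to represent a single log as a Python dictionary and check it against the requirements above; the function name validate_log and the dictionary layout are illustrative assumptions, not part of the Log Service API.

    import re
    import time

    # Reserved keywords that must not be used as content keys (listed above).
    RESERVED_KEYS = {"__time__", "__source__", "__topic__",
                     "__partition_time__", "_extract_others_", "__extract_others__"}
    # Keys may contain letters, digits, and underscores, but cannot start with a digit.
    KEY_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

    def validate_log(log):
        """Check a log (plain dict) against the field requirements above."""
        assert isinstance(log["time"], int), "time must be a UNIX timestamp in seconds"
        assert len(log.get("topic", "").encode("utf-8")) <= 128, "topic exceeds 128 bytes"
        assert len(log.get("source", "").encode("utf-8")) <= 128, "source exceeds 128 bytes"
        for key, value in log["content"].items():
            assert key not in RESERVED_KEYS, f"reserved key: {key}"
            assert KEY_PATTERN.match(key), f"invalid key: {key}"
            assert len(key.encode("utf-8")) <= 128, f"key exceeds 128 bytes: {key}"
            assert len(value.encode("utf-8")) <= 1024 * 1024, f"value of {key} exceeds 1 MB"

    validate_log({
        "topic": "",                     # default topic: empty string
        "time": int(time.time()),        # log generation time, in seconds
        "source": "10.249.201.117",      # IP address of the machine that generated the log
        "content": {"method": "GET", "status": "200"},
    })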

Topic

Logs in a Logstore can be classified by log topic. You can specify the topic when writing logs. For example, as a platform user you can use your user ID as the log topic when writing logs. If you do not need to classify the logs in a Logstore, use the same topic for all of them.

Note: An empty string is a valid log topic and is the default log topic.

The relationship among Logstores, log topics, and logs is hierarchical: a Logstore contains one or more log topics, and each log topic contains the logs written under it.
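To make this relationship concrete, here is a minimal Python sketch that groups the logs of one Logstore by their topic (using hypothetical user IDs as topics, as in the platform example above).

    from collections import defaultdict

    # Logs written to a single Logstore; each log carries a topic ("" by default).
    logs = [
        {"topic": "user-001", "time": 1330589527, "content": {"status": "200"}},
        {"topic": "user-002", "time": 1330589530, "content": {"status": "404"}},
        {"topic": "user-001", "time": 1330589533, "content": {"status": "200"}},
        {"topic": "",         "time": 1330589540, "content": {"status": "500"}},
    ]

    # One Logstore -> several log topics -> the logs written under each topic.
    by_topic = defaultdict(list)
    for log in logs:
        by_topic[log["topic"]].append(log)

    for topic, topic_logs in by_topic.items():
        print(f"topic {topic!r}: {len(topic_logs)} log(s)")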

Various log formats appear in real usage scenarios. For better understanding, the following example describes how to map an original Nginx access log to the Log Service log data model. Assume that the IP address of your Nginx server is 10.249.201.117 and that the server produces the following original log.

    10.1.168.193 - - [01/Mar/2012:16:12:07 +0800] "GET /Send?AccessKeyId=8225105404 HTTP/1.1" 200 5 "-" "Mozilla/5.0 (X11; Linux i686 on x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2"

Map the original log to the Log Service log data model as follows.

  • topic: "" (the default value, an empty string).
  • time: 1330589527, the log generation time in seconds, converted from the timestamp in the original log.
  • source: "10.249.201.117", the IP address of the server.
  • content: key-value pairs that hold the specific log content.

You can decide how to extract fields from the original log content and combine them into key-value pairs. The following extraction is one example.

  • ip: "10.1.168.193"
  • method: "GET"
  • status: "200"
  • length: "5"
  • ref_url: "-"
  • browser: "Mozilla/5.0 (X11; Linux i686 on x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
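The extraction above can be reproduced with a short script. The following Python sketch is one possible implementation; the regular expression and the field names ip, method, status, length, ref_url, and browser simply mirror the example above.

    import re
    from datetime import datetime

    line = ('10.1.168.193 - - [01/Mar/2012:16:12:07 +0800] '
            '"GET /Send?AccessKeyId=8225105404 HTTP/1.1" 200 5 "-" '
            '"Mozilla/5.0 (X11; Linux i686 on x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2"')

    pattern = (r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
               r'"(?P<method>\S+) \S+ \S+" (?P<status>\d+) (?P<length>\d+) '
               r'"(?P<ref_url>[^"]*)" "(?P<browser>[^"]*)"')
    match = re.match(pattern, line)

    # time: convert the Nginx timestamp to UNIX seconds (yields 1330589527).
    log_time = int(datetime.strptime(match.group("time"),
                                     "%d/%b/%Y:%H:%M:%S %z").timestamp())

    # content: the remaining named groups become key-value pairs.
    content = {k: v for k, v in match.groupdict().items() if k != "time"}

    log = {
        "topic": "",                   # default topic
        "time": log_time,              # 1330589527
        "source": "10.249.201.117",    # IP address of the Nginx server
        "content": content,
    }
    print(log)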

Logs

A collection of logs.

Log group

A group of logs that share the same topic and source (see the LogGroup message in the Protocol Buffer definition below).

Log group list

A collection of log groups, used as the format of returned results.

Encoding method

Currently, Log Service supports the following content encoding method, which is indicated by the Content-Type header at the RESTful API layer.

  • ProtoBuf: The data model is encoded with Protocol Buffers. The corresponding Content-Type is application/x-protobuf.

The following Protocol Buffer (PB) definition describes the objects of the data model.

    message Log
    {
        required uint32 Time = 1; // UNIX time format, in seconds
        message Content
        {
            required string Key = 1;
            required string Value = 2;
        }
        repeated Content Contents = 2;
    }
    message LogGroup
    {
        repeated Log Logs = 1;
        optional string Reserved = 2; // reserved field
        optional string Topic = 3;
        optional string Source = 4;
    }
    message LogGroupList
    {
        repeated LogGroup logGroupList = 1;
    }

Note: PB itself does not require keys to be unique within a log, but you must ensure that they are; otherwise, the behavior is undefined.
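As a usage sketch, assume the definition above is saved as log.proto (proto2 syntax, since it uses required and optional) and compiled with protoc --python_out=. into a log_pb2 module; a log group for the Nginx example could then be built and serialized as follows. This illustrates standard Protocol Buffers usage, not a complete Log Service client.

    import log_pb2  # generated by: protoc --python_out=. log.proto

    group = log_pb2.LogGroup()
    group.Topic = ""                    # default topic (empty string)
    group.Source = "10.249.201.117"     # IP address of the Nginx server

    log = group.Logs.add()
    log.Time = 1330589527               # UNIX time, in seconds
    for key, value in [("ip", "10.1.168.193"), ("method", "GET"), ("status", "200")]:
        item = log.Contents.add()       # each content item is a key-value pair
        item.Key = key
        item.Value = value

    payload = group.SerializeToString() # request body sent with Content-Type: application/x-protobuf
    print(len(payload), "bytes")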
