After Log Service collects logs, you can ship the logs to a MaxCompute table for data storage and analysis. This topic describes how to create a data shipping job of the new version to ship data to MaxCompute.
Prerequisites
- A project and a Logstore are created. For more information, see Create a project and a Logstore.
- Logs are collected. For more information, see Data collection overview.
- A table is created in MaxCompute. For more information, see Create tables.
Usage notes
- If a field value of the char type or varchar type exceeds the specified length, the excess is truncated after the value is shipped to MaxCompute.
For example, if the maximum length is set to 3 and a field value is 012345, the value is truncated to 012 after the value is shipped to MaxCompute.
- If a field value of the string type, char type, or varchar type is an empty string, the value is converted to Null after the value is shipped to MaxCompute.
- A field value of the datetime type must be in the YYYY-MM-DD HH:mm:ss format. Multiple spaces can exist between DD and HH. If a field value of the datetime type is in an invalid format, the value can be shipped to MaxCompute but the value is converted to Null.
- If a field value of the date type is in an invalid format, the value can be shipped to MaxCompute but the value is converted to Null.
- If the number of digits after the decimal point in a field value of the decimal type exceeds the specified limit, the value is rounded and the excess digits are dropped. If the number of digits before the decimal point exceeds the specified limit, the entire log is discarded as dirty data and counted as a failed log.
- By default, dirty data is discarded during data shipping.
- If a value that does not exist in a log is shipped to MaxCompute, the value is converted to a default value or Null.
- If a default value is specified when you create a MaxCompute table, a value that does not exist in a log is converted to the default value after the value is shipped to MaxCompute.
- If no default value is specified when you create a MaxCompute table and the value Null is allowed, a value that does not exist in a log is converted to Null after the value is shipped to MaxCompute.
- You can run a maximum of 64 data shipping instances to ship data to MaxCompute at the same time. A maximum of 10 MB of data can be written to each MaxCompute partition per second.
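The value conversion rules above can be modeled locally in Python to predict what a column will contain. This is a hypothetical sketch written for illustration, not Log Service or MaxCompute code: the helper names (`to_varchar`, `to_datetime`, `to_decimal`, `resolve_missing`, `DirtyData`) are invented, and the half-up rounding mode for decimals is an assumption because the rules do not name one.

```python
import re
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP


class DirtyData(Exception):
    """Raised when the whole log would be discarded as dirty data."""


def to_varchar(value, max_len):
    # Empty strings become Null (None); longer values keep only max_len characters.
    if value == "":
        return None
    return value[:max_len]


# YYYY-MM-DD HH:mm:ss, allowing one or more spaces between DD and HH.
_DATETIME_RE = re.compile(r"(\d{4}-\d{2}-\d{2}) +(\d{2}:\d{2}:\d{2})")


def to_datetime(value):
    # Values in an invalid format (or impossible dates) are shipped as Null.
    match = _DATETIME_RE.fullmatch(value)
    if match is None:
        return None
    try:
        return datetime.strptime(" ".join(match.groups()), "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None


def to_decimal(value, precision, scale):
    # Extra fractional digits are rounded away (assumption: half-up rounding).
    rounded = Decimal(value).quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    integer_part = abs(int(rounded))
    integer_digits = len(str(integer_part)) if integer_part else 0
    # Too many digits before the decimal point: the entire log is dirty data.
    if integer_digits > precision - scale:
        raise DirtyData(f"{value!r} does not fit decimal({precision},{scale})")
    return rounded


def resolve_missing(log, field, default=None):
    # A missing field takes the column default if one is defined, otherwise Null
    # (assumption: the column allows Null).
    return log.get(field, default)
```

For example, `to_varchar("012345", 3)` reproduces the truncation case described above, and `to_decimal` raises `DirtyData` instead of returning a value so that the caller can drop the whole log, mirroring the dirty-data rule.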
Procedure
Data model mapping
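When a data shipping job ships data from Log Service to MaxCompute, fields in Log Service are mapped to columns in a MaxCompute table based on the data model mapping between the two services. The following list describes the usage notes, and the following table provides an example of the mapping.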
- A MaxCompute table contains at least one data column and one partition key column.
- We recommend that you use the following reserved fields in Log Service: __partition_time__, __source__, and __topic__.
- Each MaxCompute table can have a maximum of 60,000 partitions. If the number of partitions exceeds the maximum value, data cannot be written to the table.
- The previous name of the system reserved field __extract_others__ is _extract_others_. Both names are supported.
- The value of the MaxCompute partition key column cannot be set to reserved words or keywords of MaxCompute. For more information, see Reserved words and keywords.
- You cannot leave the fields of MaxCompute partition key columns empty. The fields that are mapped to partition key columns must be reserved fields or log fields. You can use the cast function to convert a field of the string type to the type of the corresponding partition key column. Logs whose partition key column value is empty are discarded during data shipping.
- In Log Service, a log field can be mapped to only one data column or partition key column of a MaxCompute table. Field redundancy is not supported.
MaxCompute column type | Column name in MaxCompute | Data type in MaxCompute | Log field name in Log Service | Field type in Log Service | Description
---|---|---|---|---|---
Data column | log_source | string | __source__ | Reserved field | The source of the log.
Data column | log_time | bigint | __time__ | Reserved field | The UNIX timestamp of the log. This field corresponds to the Time field in the data model.
Data column | log_topic | string | __topic__ | Reserved field | The topic of the log.
Data column | time | string | time | Log content field | This field is parsed from logs and corresponds to the key-value field in the data model. In most cases, the value of the __time__ field in the data that is collected by using Logtail is the same as the value of the time field.
Data column | ip | string | ip | Log content field | This field is parsed from logs.
Data column | thread | string | thread | Log content field | This field is parsed from logs.
Data column | log_extract_others | string | __extract_others__ | Reserved field | Other log fields that are not mapped in the configuration are serialized into JSON data based on the key-value field. The JSON data has a single-layer structure, and JSON nesting is not supported for the log fields.
Partition key column | log_partition_time | string | __partition_time__ | Reserved field | This field is calculated based on the value of the __time__ field of the log. You can specify the partition granularity for the field.
Partition key column | status | string | status | Log content field | This field is parsed from logs. The values of this field should come from a small, enumerable set so that the number of partitions does not exceed the upper limit.
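The mapping in the table, together with the rules listed before it, can be modeled with plain dictionaries. The sketch below is hypothetical: the names `DATA_COLUMNS`, `PARTITION_COLUMNS`, `check_mapping`, and `map_log` are invented for illustration, the `__partition_time__` value is passed in precomputed, and the `log_extract_others` column is left out for brevity.

```python
# Hypothetical mapping that mirrors the example table:
# MaxCompute column name -> Log Service field name.
DATA_COLUMNS = {
    "log_source": "__source__",
    "log_time": "__time__",
    "log_topic": "__topic__",
    "time": "time",
    "ip": "ip",
    "thread": "thread",
}
PARTITION_COLUMNS = {
    "log_partition_time": "__partition_time__",
    "status": "status",
}


def check_mapping(*mappings):
    """Reject a configuration in which one log field feeds more than one column."""
    seen = set()
    for mapping in mappings:
        for field in mapping.values():
            if field in seen:
                raise ValueError(f"field {field!r} is mapped to more than one column")
            seen.add(field)


def map_log(log, partition_time):
    """Return (data_row, partition) for one log, or None if the log is discarded."""
    # __partition_time__ is derived from __time__, so the caller passes it in.
    fields = {**log, "__partition_time__": partition_time}
    partition = {col: fields.get(name) for col, name in PARTITION_COLUMNS.items()}
    # A log whose partition key value is missing or empty is discarded.
    if any(value in (None, "") for value in partition.values()):
        return None
    # Data columns with no matching field become None (Null or the column default).
    data_row = {col: fields.get(name) for col, name in DATA_COLUMNS.items()}
    return data_row, partition
```

Calling `check_mapping(DATA_COLUMNS, PARTITION_COLUMNS)` first mirrors the rule that a log field can back only one column. `map_log` then returns `None` for a log whose `status` is empty, which corresponds to the log being dropped during shipping.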
Shipping modes
- Real Time: reads data from the Logstore and ships the data to MaxCompute in real time.
- Batch Shipping: reads the data that is generated 5 to 10 minutes earlier than the current time from the Logstore and ships the data to MaxCompute in a batch.
If you set the Shipping Mode parameter to Batch Shipping, you can set the Start At or End Time parameter in the Start Time Range section only to a point in time that is a multiple of 5 minutes. For example, 2022-05-24 16:35:00 is a valid start time or end time, whereas 2022-05-24 16:36:00 is not.
In batch shipping mode, you can also ship the data of the __unique_id__ and __receive_time__ fields.
- The value of the __unique_id__ field is a unique 64-bit string that is used to identify a log. If you want to ship the data of the __unique_id__ field, you can add this field only to the MaxCompute Common Column parameter.
- The value of the __receive_time__ field indicates the time when a log is received by Log Service. You can use a partition format to configure the format of the value of the __receive_time__ field. The finest supported time granularity is 30 minutes. For more information about the time partition format, see References. If you want to ship the data of the __receive_time__ field, you can add this field only to the MaxCompute Partition Column parameter.
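A client that fills in the Start At and End Time parameters for batch shipping could check the 5-minute rule before it submits the job. The helpers below, `is_valid_batch_boundary` and `align_down`, are a hypothetical sketch and are not part of any Log Service SDK or console.

```python
from datetime import datetime


def is_valid_batch_boundary(ts: datetime) -> bool:
    """True if ts can be used as Start At or End Time in Batch Shipping mode."""
    return ts.minute % 5 == 0 and ts.second == 0 and ts.microsecond == 0


def align_down(ts: datetime) -> datetime:
    """Round ts down to the previous 5-minute boundary."""
    return ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)
```

With these helpers, 2022-05-24 16:35:00 passes the check, 2022-05-24 16:36:00 fails it, and `align_down` turns the invalid value into 16:35:00. Rounding down rather than up is an arbitrary choice here; it keeps the chosen range from extending past the time the user entered.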
References
- __partition_time__ field
In most cases, MaxCompute filters data by time or uses the timestamp of a log as a partition field.
- Format
The value of the __partition_time__ field is calculated based on the value of the __time__ field in Log Service. The value is a time string that is generated based on the time zone and the partition time format. The time is rounded down to a multiple of 1,800 seconds (30 minutes). This prevents the number of partitions in a single MaxCompute table from exceeding the limit.
For example, if the timestamp of a log in Log Service is 27/Jan/2022 20:50:13 +0800, Log Service calculates the value of the __time__ field based on the timestamp. The value is the UNIX timestamp 1643287813. The following table describes the values of the time partition column in different configurations.
Partition format | __partition_time__
---|---
%Y_%m_%d_%H_%M_00 | 2022_01_27_20_30_00
%Y_%m_%d_%H_%M | 2022_01_27_20_30
%Y%m%d | 20220127
- Usage
You can use the __partition_time__ field to filter data to prevent a full-table scan. For example, you can execute the following query statement to query the log data of January 26, 2022:
```sql
select * from {ODPS_TABLE_NAME} where log_partition_time >= "2022_01_26" and log_partition_time < "2022_01_27";
```
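The __partition_time__ calculation can be sketched in Python, assuming that the value is the log time converted to the configured time zone, rounded down to the previous 30-minute boundary, and then formatted with a strftime-style pattern. The function name `partition_time` and the default UTC+8 offset are assumptions chosen to match the example above.

```python
from datetime import datetime, timedelta, timezone


def partition_time(unix_time: int, fmt: str, utc_offset_hours: int = 8) -> str:
    """Approximate __partition_time__ for a log whose __time__ is unix_time."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.fromtimestamp(unix_time, tz)
    # Round down to the previous 30-minute (1,800-second) boundary.
    aligned = local.replace(minute=local.minute - local.minute % 30, second=0, microsecond=0)
    return aligned.strftime(fmt)
```

For the log time 1643287813 (27/Jan/2022 20:50:13 +0800), the three formats in the table yield 2022_01_27_20_30_00, 2022_01_27_20_30, and 20220127. Because the year, month, and day come first and are zero-padded, these strings sort chronologically, which is why the string range in the query selects exactly the partitions of one day.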
- __extract_others__ and __extract_others_all__ fields
- The value of the __extract_others__ field contains all fields that are not mapped, excluding the __topic__, __tag__:*, and __source__ fields.
- The value of the __extract_others_all__ field contains all fields that are not mapped, including the __topic__, __tag__:*, and __source__ fields.
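The difference between the two fields can be illustrated with a small hypothetical helper, `extract_others`. It assumes that each log is a flat dictionary of string values and that tag fields use the `__tag__:` prefix. It only approximates the service behavior and is not part of any SDK.

```python
import json


def _is_system_field(key):
    # Fields that __extract_others__ leaves out but __extract_others_all__ keeps.
    return key in ("__topic__", "__source__") or key.startswith("__tag__:")


def extract_others(log, mapped_fields, include_all=False):
    """Serialize the unmapped fields of a flat log into single-layer JSON.

    include_all=False approximates __extract_others__;
    include_all=True approximates __extract_others_all__.
    """
    others = {
        key: value
        for key, value in log.items()
        if key not in mapped_fields and (include_all or not _is_system_field(key))
    }
    return json.dumps(others, sort_keys=True)
```

For a log that maps only `ip` and `status`, the first variant keeps just the remaining content fields, while the second variant also carries `__topic__`, `__source__`, and every `__tag__:*` field into the JSON string.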