You can query collected logs in real time in the Log Service console and ship the logs to MaxCompute for business intelligence (BI) analysis and data mining. This topic describes how to ship logs to MaxCompute in the Log Service console.
Prerequisites
Limits
- You can create data shipping jobs only by using an Alibaba Cloud account. You cannot create data shipping jobs by using a RAM user.
- Do not ship logs from multiple Logstores to the same MaxCompute table. Otherwise, the existing data in the MaxCompute table may be overwritten.
- Logs that are generated more than 14 days before the current day are automatically discarded by log shipping jobs. The __time__ field specifies the point in time at which a log is generated.
- You cannot ship data of the DECIMAL, DATETIME, DATE, or TIMESTAMP type to MaxCompute. For more information, see MaxCompute V2.0 data type edition.
- The following table describes the regions that support data shipping to MaxCompute. If Log Service is deployed in other regions, use DataWorks to synchronize data. For more information, see Use Data Integration to synchronize data from a LogHub data source to a destination.

| Region where your Log Service project resides | Region where the MaxCompute project resides |
| --- | --- |
| China (Qingdao) | China (Shanghai) |
| China (Beijing) | China (Beijing) and China (Shanghai) |
| China (Zhangjiakou) | China (Shanghai) |
| China (Hohhot) | China (Shanghai) |
| China (Hangzhou) | China (Shanghai) |
| China (Shanghai) | China (Shanghai) |
| China (Shenzhen) | China (Shenzhen) and China (Shanghai) |
| China (Hong Kong) | China (Shanghai) |
Step 1: Create a data shipping job
Step 2: View data in MaxCompute
| log_source | log_time | log_topic | time | ip | thread | log_extract_others | log_partition_time | status |
+------------+------------+-----------+-----------+-----------+-----------+------------------+--------------------+-----------+
| 10.10.*.* | 1453899013 | | 27/Jan/2016:20:50:13 +0800 | 10.10.*.* | 414579208 | {"url":"POST /PutData?Category=YunOsAccountOpLog&AccessKeyId=****************&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=******************************** HTTP/1.1","user-agent":"aliyun-sdk-java"} | 2016_01_27_20_50 | 200 |
+------------+------------+-----------+-----------+-----------+-----------+------------------+--------------------+-----------+
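To inspect the shipped rows yourself, you can run a query similar to the following in MaxCompute. This is a minimal sketch: {ODPS_TABLE_NAME} is a placeholder for your destination table, and the partition values match the sample row above.

```sql
-- Inspect shipped rows in one partition of the destination table.
-- {ODPS_TABLE_NAME} is a placeholder for your table name.
select * from {ODPS_TABLE_NAME}
where log_partition_time = "2016_01_27_20_50" and status = "200"
limit 10;
```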
Grant Log Service the permissions to ship data
If you delete a MaxCompute table and then create a new table in the DataWorks console, the authorization that was performed for the previous table becomes invalid. In this case, you must grant Log Service the permissions to ship data again.
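If you need to perform the authorization manually, you can run statements similar to the following in the MaxCompute console. This is a sketch: shennong_open@aliyun.com is the shipping account that appears in the public documentation, {ODPS_PROJECT_NAME} and {ODPS_TABLE_NAME} are placeholders, and you should verify the account and the exact privilege list against the current MaxCompute documentation.

```sql
-- Add the Log Service shipping account to the MaxCompute project.
-- Verify the account name against the current documentation.
ADD USER aliyun$shennong_open@aliyun.com;
-- Grant project-level privileges. {ODPS_PROJECT_NAME} is a placeholder.
GRANT Read, List ON PROJECT {ODPS_PROJECT_NAME} TO USER aliyun$shennong_open@aliyun.com;
-- Grant table-level privileges. {ODPS_TABLE_NAME} is a placeholder.
GRANT Describe, Alter, Update ON TABLE {ODPS_TABLE_NAME} TO USER aliyun$shennong_open@aliyun.com;
```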
Related operations
- Modify a data shipping job.
Click Settings to modify the data shipping job. For information about the parameters, see Step 1: Create a data shipping job. If you want to add a column, modify the schema of the corresponding table in MaxCompute (see the ALTER TABLE sketch after this list).
- Disable the data shipping feature.
Click Disable. The data in the Logstore is no longer shipped to MaxCompute.
- View the status and error messages of shipping jobs.
You can view the shipping jobs that were run within the previous two days and the status of each job.
- Job status
| Status | Description |
| --- | --- |
| Success | The shipping job succeeded. |
| Running | The shipping job is running. Check whether the job succeeds later. |
| Failed | The shipping job failed. If the job cannot be restarted due to external reasons, troubleshoot the failure based on the error message and retry the job. For example, the MaxCompute schema does not comply with the Log Service specifications, or Log Service is not authorized to access MaxCompute. Log Service allows you to retry all failed jobs of the last two days. |
- Error messages
If a shipping job fails, an error message is returned for the job.
| Error message | Solution |
| --- | --- |
| The MaxCompute project does not exist. | Check whether the specified MaxCompute project exists in the MaxCompute console. If it does not exist, create a MaxCompute project. Log Service does not automatically retry a job that fails due to this error. Manually retry the job after you fix the issue. |
| The MaxCompute table does not exist. | Check whether the specified MaxCompute table exists in the MaxCompute console. If it does not exist, create a MaxCompute table. Log Service does not automatically retry a job that fails due to this error. Manually retry the job after you fix the issue. |
| Log Service is not authorized to access the MaxCompute project or table. | In the MaxCompute console, check whether the required permissions are granted to the account that Log Service uses. If not, grant the permissions. For more information, see Grant Log Service the permissions to ship data. Log Service does not automatically retry a job that fails due to this error. Manually retry the job after you fix the issue. |
| A MaxCompute error has occurred. | A MaxCompute error is returned for the shipping job. For more information, see the MaxCompute documentation or submit a ticket to contact MaxCompute technical support. Log Service automatically retries all failed jobs of the last two days. |
| The MaxCompute schema does not comply with the Log Service specifications. | Reconfigure the mappings between the columns of the MaxCompute table and the log fields in Log Service. Log Service does not automatically retry a job that fails due to this error. Manually retry the job after you fix the issue. |
- Retry a data shipping job
Log Service automatically retries the jobs that fail due to internal errors. In other cases, you need to manually retry failed jobs. The minimum interval between two consecutive automatic retries is 30 minutes. If a job fails, wait for 30 minutes before you retry the job. Log Service allows you to retry all failed jobs of the last two days.
To immediately retry a failed job, you can click Retry All Failed Tasks. You can also call an API operation or use an SDK to retry a job.
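The following statement is a sketch of how you might add a column to the destination table before you map a new log field to it in the shipping job; the table name demo_sls_table and the column name user_id are hypothetical.

```sql
-- Add a data column for a new log field before you map the field in
-- the shipping job. demo_sls_table and user_id are hypothetical names.
alter table demo_sls_table add columns (user_id string);
```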
References
- __partition_time__
In most cases, data in MaxCompute is filtered by time. In these scenarios, the timestamp of a log is used as a partition field.
- Format
The value of the __partition_time__ field is calculated from the value of the __time__ field in Log Service and is rounded down based on the partition time format. The value of the date partition key column is determined by the shipping interval. This prevents the number of partitions in a single MaxCompute table from exceeding the upper limit.
For example, if the timestamp of a log in Log Service is 27/Jan/2016 20:50:13 +0800, Log Service calculates the value of the __time__ field based on the timestamp. The value is a UNIX timestamp of 1453899013. The following table describes the values of the time partition column in different configurations.

| Shipping interval (seconds) | Partition format | __partition_time__ |
| --- | --- | --- |
| 1800 | yyyy_MM_dd_HH_mm_00 | 2016_01_27_20_30_00 |
| 1800 | yyyy-MM-dd HH:mm | 2016-01-27 20:30 |
| 1800 | yyyyMMdd | 20160127 |
| 3600 | yyyyMMddHHmm | 201601272000 |
| 3600 | yyyy_MM_dd_HH | 2016_01_27_20 |
- Usage
You can use the __partition_time__ field to filter data to prevent a full-table scan. For example, you can execute the following query statement to query the log data of January 26, 2016:
select * from {ODPS_TABLE_NAME} where log_partition_time >= "2016_01_26" and log_partition_time < "2016_01_27";
- __extract_others__
The value of the __extract_others__ field is a JSON string. For example, you can execute the following query statement to obtain the user-agent value from this field:
select get_json_object(log_extract_others, "$.user-agent") from {ODPS_TABLE_NAME} limit 10;
Note
- get_json_object is a standard user-defined function (UDF) provided by MaxCompute. Contact MaxCompute technical support to obtain the permissions that are required to use the standard UDF. For more information, see Standard UDF provided by MaxCompute.
- The preceding example is for reference only. The instructions provided in the MaxCompute documentation prevail.
- Data model mapping
When a data shipping job ships data from Log Service to MaxCompute, the data model of Log Service is mapped to that of MaxCompute. The following notes describe the mapping, and the table at the end of this section provides an example.
- A MaxCompute table contains at least one data column and one partition key column.
- We recommend that you use the following reserved fields in Log Service: __partition_time__, __source__, and __topic__.
- Each MaxCompute table can have a maximum of 60,000 partitions. If the number of partitions exceeds the maximum value, data cannot be written to the table.
- Shipping jobs are executed in batches. If you specify custom fields as partition key columns for a data shipping job, make sure that the number of partitions that each shipping batch generates does not exceed 512. Otherwise, no data can be written to MaxCompute.
- The previous name of the reserved field __extract_others__ is _extract_others_. Both names are supported.
- The values of MaxCompute partition key columns cannot be reserved words or keywords of MaxCompute. For more information, see Reserved words and keywords.
- The fields that are mapped to MaxCompute partition key columns cannot be empty and must be reserved fields or log fields. You can use the cast function to convert a field of the string type to the data type of the corresponding partition key column. Logs whose mapped partition key column value is empty are discarded during data shipping.
- A log field in Log Service can be mapped to only one data column or partition key column of a MaxCompute table. Field redundancy is not supported. If the same field name is mapped more than once, the value that is shipped is null. If null appears in a partition key column, the data cannot be shipped.
The following table describes the mappings between MaxCompute data columns, partition key columns, and Log Service fields. For more information about reserved fields in Log Service, see Reserved fields.

| MaxCompute column type | Column name in MaxCompute | Data type in MaxCompute | Log field name in Log Service | Field type in Log Service | Description |
| --- | --- | --- | --- | --- | --- |
| Data column | log_source | string | __source__ | Reserved field | The source of the log. |
| Data column | log_time | bigint | __time__ | Reserved field | The UNIX timestamp of the log. It is the number of seconds that have elapsed since 00:00:00 UTC, Thursday, January 1, 1970. This field corresponds to the Time field in the data model. |
| Data column | log_topic | string | __topic__ | Reserved field | The topic of the log. |
| Data column | time | string | time | Log content field | This field is parsed from logs and corresponds to the key-value field in the data model. In most cases, the value of the __time__ field in data collected by Logtail is the same as the value of the time field. |
| Data column | ip | string | ip | Log content field | This field is parsed from logs. |
| Data column | thread | string | thread | Log content field | This field is parsed from logs. |
| Data column | log_extract_others | string | __extract_others__ | Reserved field | Log fields that are not mapped in the configuration are serialized into JSON data based on the key-value field. The JSON data has a single-layer structure, and JSON nesting is not supported for the log fields. |
| Partition key column | log_partition_time | string | __partition_time__ | Reserved field | This field is calculated from the value of the __time__ field of the log. You can specify the partition granularity for the field. |
| Partition key column | status | string | status | Log content field | This field is parsed from logs. The value of this field supports enumeration to ensure that the number of partitions does not exceed the upper limit. |
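As a reference, the following DDL is a minimal sketch of a MaxCompute destination table that matches the preceding mapping; the table name demo_sls_table is hypothetical.

```sql
-- A destination table that matches the mapping above.
-- demo_sls_table is a hypothetical table name. If the column name
-- time conflicts with a keyword in your environment, rename it.
create table if not exists demo_sls_table (
  log_source string,
  log_time bigint,
  log_topic string,
  time string,
  ip string,
  thread string,
  log_extract_others string
)
partitioned by (
  log_partition_time string,
  status string
);
```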