You can query collected logs in real time in the Log Service console and ship the logs to MaxCompute for business intelligence (BI) analysis and data mining. This topic describes how to ship logs to MaxCompute in the Log Service console.

Prerequisites

MaxCompute is activated and a MaxCompute table is created. For more information, see Create tables.
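
For reference, the following is a minimal sketch of a DDL statement that creates a table matching the sample schema used later in this topic. The table name sls_access_log is a placeholder, and the columns mirror the data model mapping described in the References section; adjust them to your own log fields.

    -- Placeholder table for shipped Log Service data. The data columns
    -- receive mapped log fields; the partition key columns receive
    -- __partition_time__ and the status log field.
    CREATE TABLE IF NOT EXISTS sls_access_log (
      log_source string,
      log_time bigint,
      log_topic string,
      time string,
      ip string,
      thread string,
      log_extract_others string
    )
    PARTITIONED BY (
      log_partition_time string,
      status string
    );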

Limits

  • You can create data shipping jobs only by using an Alibaba Cloud account. You cannot create data shipping jobs by using a RAM user.
  • Do not ship logs from multiple Logstores to the same MaxCompute table. Otherwise, the existing data in the MaxCompute table may be overwritten.
  • Logs that were generated more than 14 days before the current day are automatically discarded by data shipping jobs. The __time__ field specifies the point in time when a log is generated.
  • You cannot ship log fields to MaxCompute columns of the DECIMAL, DATETIME, DATE, or TIMESTAMP data types. For more information, see MaxCompute V2.0 data type edition.
  • The following table describes the regions that support data shipping to MaxCompute. If your Log Service project resides in a region that is not listed, use DataWorks to synchronize data. For more information, see Use Data Integration to synchronize data from a LogHub data source to a destination.
    Region where your Log Service project resides | Region where the MaxCompute project resides
    China (Qingdao)                               | China (Shanghai)
    China (Beijing)                               | China (Beijing) and China (Shanghai)
    China (Zhangjiakou)                           | China (Shanghai)
    China (Hohhot)                                | China (Shanghai)
    China (Hangzhou)                              | China (Shanghai)
    China (Shanghai)                              | China (Shanghai)
    China (Shenzhen)                              | China (Shenzhen) and China (Shanghai)
    China (Hong Kong)                             | China (Shanghai)

Step 1: Create a data shipping job

  1. Log on to the Log Service console.
  2. In the Projects section, click the project that you want to view.
  3. Choose Log Storage > Logstores. On the Logstores tab, find the Logstore that you want to manage, click the > icon, and then choose Data Transformation > Export > MaxCompute (Formerly ODPS).
  4. On the MaxCompute (Formerly ODPS) LogShipper page, click Enable.
  5. In the Shipping Notes dialog box, click Ship.
  6. In the Ship Data to MaxCompute panel, configure the shipping rule and click OK.
    The following list describes the parameters.
    • Select Region: The regions where MaxCompute is supported vary based on the region where your Log Service project resides. For more information, see Limits.
    • Shipping Task Name: The name of the data shipping job.
    • MaxCompute Project Name: The name of the MaxCompute project.
    • Table Name: The name of the MaxCompute table.
    • MaxCompute Common Column: The mappings between log fields and common columns. In the left text box, enter the name of a log field that is mapped to a column in the MaxCompute table. In the right text box, enter the name of the column. For more information, see Data model mapping.
      Important
      • Log Service ships logs to MaxCompute based on the sequence of the specified log fields and MaxCompute table columns. Changing the names of these columns does not affect the data shipping process. If you modify the schema of a MaxCompute table, you must reconfigure the mappings between log fields and MaxCompute table columns.
      • The name of the log field that you specify in the left text box cannot contain double quotation marks ("") or single quotation marks (''), and cannot contain spaces.
      • If a log contains two fields that have the same name, such as request_time, Log Service displays one of the fields as request_time_0. The two fields are still stored as request_time in Log Service. When you configure a shipping rule, you can use only the original field name request_time.

        If a log contains fields that have the same name, Log Service randomly ships the value of one of the fields. We recommend that you do not include fields that have the same name in your logs.

    • MaxCompute Partition Column: The mappings between log fields and partition key columns. In the left text box, enter the name of a log field that is mapped to a partition key column in the MaxCompute table. In the right text box, enter the name of the partition key column. For more information, see Data model mapping.
      Note You can specify up to three partition key columns. When you specify custom fields as partition key columns and specify data types for a data shipping job, make sure that the number of partitions that each data shipping job needs to process is less than 512.
    • Partition Format: For information about the configuration examples and parameters of the partition format, see Examples and Java SimpleDateFormat.
      Note
      • The partition format takes effect only if a partition key column in the MaxCompute Partition Column parameter is set to __partition_time__.
      • Do not use a date format that is accurate to the second. Otherwise, the number of partitions in a single table may exceed the limit of 60,000.
      • Make sure that the number of partitions that each data shipping job needs to process is less than 512.
    • Shipping Interval: The duration of the data shipping job. Default value: 1800. Unit: seconds.

      When the specified duration elapses, another data shipping job is created.

    After data shipping is enabled, log data is shipped to MaxCompute within 1 hour after the data is written to the Logstore. After the log data is shipped, you can view the log data in MaxCompute. For more information, see How do I check the completeness of data that is shipped from Log Service to MaxCompute?.

Step 2: View data in MaxCompute

After data is shipped to MaxCompute, you can view the data in MaxCompute. The following example shows sample data. You can use Data IDE, the big data development tool that is integrated with MaxCompute, to consume data and perform BI analysis and data mining in a visualized manner.
| log_source | log_time | log_topic | time | ip | thread | log_extract_others | log_partition_time | status |
+------------+------------+-----------+-----------+-----------+-----------+------------------+--------------------+-----------+
| 10.10.*.* | 1453899013 | | 27/Jan/2016:20:50:13 +0800 | 10.10.*.* | 414579208 | {"url":"POST /PutData?Category=YunOsAccountOpLog&AccessKeyId=****************&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=******************************** HTTP/1.1","user-agent":"aliyun-sdk-java"} | 2016_01_27_20_50 | 200 |
+------------+------------+-----------+-----------+-----------+-----------+------------------+--------------------+-----------+
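
For example, you can preview a few shipped rows with a query similar to the following. The table name sls_access_log and the partition value are placeholders that match the sample data above.

    -- Filter on the partition key column to avoid a full-table scan.
    select * from sls_access_log
    where log_partition_time = "2016_01_27_20_50"
    limit 10;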

Grant the account for Log Service the permissions to ship data

If you delete a MaxCompute table and create a new MaxCompute table in the DataWorks console, the default authorization that was performed for the previous table becomes invalid. You must grant Log Service the permissions to ship data again.

  1. Log on to the DataWorks console.
  2. On the Workspaces page, find the workspace that you want to manage and click DataStudio in the Actions column.
  3. Create a workflow.
    1. On the Scheduled Workflow page, choose Create > Create Workflow.
    2. In the Create Workflow dialog box, configure the Workflow Name parameter and click Create.
  4. Create a node.
    1. On the Scheduled Workflow page, choose Create > Create Node > ODPS SQL.
    2. In the Create Node dialog box, configure the Name and Path parameters, and then click Commit.
      You must set the Path parameter to the workflow that you created in Step 3.
  5. In the editor of the node, run the required commands to complete the authorization. The following list describes the commands.
    • ADD USER aliyun$shennong_open@aliyun.com;

      Adds the user to the MaxCompute project. shennong_open@aliyun.com indicates the account for Log Service. You cannot change this value.

    • GRANT Read, List ON PROJECT {ODPS_PROJECT_NAME} TO USER aliyun$shennong_open@aliyun.com;

      Grants the user the permissions to read and list objects in the MaxCompute project. {ODPS_PROJECT_NAME} specifies the name of the MaxCompute project. Replace the variable with an actual project name.

    • GRANT Describe, Alter, Update ON TABLE {ODPS_TABLE_NAME} TO USER aliyun$shennong_open@aliyun.com;

      Grants the Describe, Alter, and Update permissions on the table to the user. {ODPS_TABLE_NAME} specifies the name of the MaxCompute table. Replace the variable with an actual table name.

    • SHOW GRANTS FOR aliyun$shennong_open@aliyun.com;

      Checks whether the authorization is successful. If an output that is similar to the following example appears, the authorization is successful:
      A       projects/{ODPS_PROJECT_NAME}: List | Read
      A       projects/{ODPS_PROJECT_NAME}/tables/{ODPS_TABLE_NAME}: Describe | Alter | Update
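
    For example, a complete authorization sequence looks similar to the following. The project name sls_ship_demo and the table name sls_access_log are placeholders; replace them with your own project and table names.

      -- Add the Log Service account to the project, grant the required
      -- project and table permissions, and then verify the grants.
      ADD USER aliyun$shennong_open@aliyun.com;
      GRANT Read, List ON PROJECT sls_ship_demo TO USER aliyun$shennong_open@aliyun.com;
      GRANT Describe, Alter, Update ON TABLE sls_access_log TO USER aliyun$shennong_open@aliyun.com;
      SHOW GRANTS FOR aliyun$shennong_open@aliyun.com;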

Related operations

After you create a data shipping job, you can modify the job on the MaxCompute (Formerly ODPS) LogShipper page. You can also disable the data shipping feature, view the status and error messages of jobs, and retry failed jobs.
  • Modify a data shipping job.

    Click Settings to modify the data shipping job. For information about the parameters, see Step 1: Create a data shipping job. If you want to add a column, you can modify the schema of the corresponding table in MaxCompute.

  • Disable the data shipping feature.

    Click Disable. The data in the Logstore is no longer shipped to MaxCompute.

  • View the status and error messages of shipping jobs.

    You can view the data shipping jobs of the previous two days and the status of the jobs.

    • Job status
      • Success: The shipping job succeeded.
      • Running: The shipping job is running. Check whether the job succeeds later.
      • Failed: The shipping job failed. If the job failed due to external causes, for example, the MaxCompute schema does not comply with the Log Service specifications or Log Service is not authorized to access MaxCompute, troubleshoot the failure based on the error message and then retry the job.

      Log Service allows you to retry all failed jobs of the last two days.

    • Error messages
      If a shipping job fails, an error message is returned for the job.
      • The MaxCompute project does not exist: Check whether the specified MaxCompute project exists in the MaxCompute console. If it does not exist, create a MaxCompute project. Log Service does not automatically retry a job that fails due to this error. Manually retry the job after you fix the issue.
      • The MaxCompute table does not exist: Check whether the specified MaxCompute table exists in the MaxCompute console. If it does not exist, create a MaxCompute table. Log Service does not automatically retry a job that fails due to this error. Manually retry the job after you fix the issue.
      • Log Service is not authorized to access the MaxCompute project or table: Check in the MaxCompute console whether the required permissions are granted to the account for Log Service. If not, grant the permissions. For more information, see Grant the account for Log Service the permissions to ship data. Log Service does not automatically retry a job that fails due to this error. Manually retry the job after you fix the issue.
      • A MaxCompute error has occurred: A MaxCompute error is returned for the shipping job. For more information, see the MaxCompute documentation or submit a ticket to contact MaxCompute technical support. Log Service automatically retries all failed jobs of the last two days.
      • The MaxCompute schema does not comply with the Log Service specifications: Reconfigure the mappings between the columns of the MaxCompute table and the log fields in Log Service. Log Service does not automatically retry a job that fails due to this error. Manually retry the job after you fix the issue.
    • Retry a data shipping job.

      Log Service automatically retries the jobs that fail due to internal errors. In other cases, you need to manually retry failed jobs. The minimum interval between two consecutive automatic retries is 30 minutes. If a job fails, wait for 30 minutes before you retry the job. Log Service allows you to retry all failed jobs of the last two days.

      To immediately retry a failed job, you can click Retry All Failed Tasks. You can also call an API operation or use an SDK to retry a job.

References

  • __partition_time__

    In most cases, data in MaxCompute is filtered by time, and the timestamp of a log is used as a partition field.

    • Format

      The value of the __partition_time__ field is calculated from the value of the __time__ field of a log: the timestamp is rounded down based on the shipping interval and then formatted based on the partition format. This way, the value of the date partition key column is aligned with the shipping interval, which prevents the number of partitions in a single MaxCompute table from exceeding the limit.

      For example, if the timestamp of a log in Log Service is 27/Jan/2016 20:50:13 +0800, Log Service calculates the value of the __time__ field based on the timestamp. The value is a UNIX timestamp of 1453899013. The following table describes the values of the time partition column in different configurations.
      Shipping Interval (seconds) | Partition Format    | __partition_time__
      1800                        | yyyy_MM_dd_HH_mm_00 | 2016_01_27_20_30_00
      1800                        | yyyy-MM-dd HH:mm    | 2016-01-27 20:30
      1800                        | yyyyMMdd            | 20160127
      3600                        | yyyyMMddHHmm        | 201601272000
      3600                        | yyyy_MM_dd_HH      | 2016_01_27_20
    • Usage
      You can use the __partition_time__ field to filter data to prevent a full-table scan. For example, you can execute the following query statement to query the log data of January 26, 2016:
      select * from {ODPS_TABLE_NAME} where log_partition_time >= "2016_01_26" and log_partition_time < "2016_01_27";
  • __extract_others__
    The __extract_others__ field is a JSON string. For example, you can execute the following query statement to obtain the user-agent content from this field:
    select get_json_object(log_extract_others, "$.user-agent") from {ODPS_TABLE_NAME} limit 10;
    Note
    • get_json_object is a standard user-defined function (UDF) provided by MaxCompute. Contact MaxCompute technical support to grant the permissions that are required to use standard UDFs. For more information, see Standard UDF provided by MaxCompute.
    • The preceding example is for reference only. The instructions provided in MaxCompute documentation prevail.
  • Data model mapping

    When a data shipping job ships data from Log Service to MaxCompute, data is mapped between the data models of the two services. The following usage notes apply.

    • A MaxCompute table contains at least one data column and one partition key column.
    • We recommend that you use the following reserved fields in Log Service: __partition_time__, __source__, and __topic__.
    • Each MaxCompute table can have a maximum of 60,000 partitions. If the number of partitions exceeds the maximum value, data cannot be written to the table.
    • Shipping jobs are executed in batches. When you specify custom fields as partition key columns and specify data types for a data shipping job, make sure that the number of partitions that each data shipping job needs to process does not exceed 512. Otherwise, no data can be written to MaxCompute.
    • The previous name of the system reserved field __extract_others__ is _extract_others_. Both names can be used.
    • The value of the MaxCompute partition key column cannot be set to reserved words or keywords of MaxCompute. For more information, see Reserved words and keywords.
    • The fields that are mapped to MaxCompute partition key columns cannot be empty, and they must be reserved fields or log fields. You can use the cast function to convert a field of the string type to the data type of the corresponding partition key column. Logs in which a mapped partition key field is empty are discarded during data shipping.
    • In Log Service, a log field can be mapped to only one data column or partition key column of a MaxCompute table. Field redundancy is not supported. If the same field name is reused, the value that is shipped is null. If null appears in the partition key column, data cannot be shipped.
    The following list describes the mappings between MaxCompute data columns, partition key columns, and Log Service fields. For more information about reserved fields in Log Service, see Reserved fields.
    • Data columns
      • log_source (string): mapped from the reserved field __source__. The source of the log.
      • log_time (bigint): mapped from the reserved field __time__. The UNIX timestamp of the log, which is the number of seconds that have elapsed since 00:00:00 UTC, Thursday, January 1, 1970. This field corresponds to the Time field in the data model.
      • log_topic (string): mapped from the reserved field __topic__. The topic of the log.
      • time (string): mapped from the log content field time. This field is parsed from logs and corresponds to a key-value pair in the data model. In most cases, the value of the __time__ field in the data that is collected by using Logtail is the same as the value of the time field.
      • ip (string): mapped from the log content field ip. This field is parsed from logs.
      • thread (string): mapped from the log content field thread. This field is parsed from logs.
      • log_extract_others (string): mapped from the reserved field __extract_others__. Log fields that are not mapped in the configuration are serialized into JSON data based on their key-value pairs. The JSON data has a single-layer structure, and JSON nesting is not supported for the log fields.
    • Partition key columns
      • log_partition_time (string): mapped from the reserved field __partition_time__. This field is calculated based on the value of the __time__ field of the log. You can specify the partition granularity for the field.
      • status (string): mapped from the log content field status. This field is parsed from logs. The values of this field must be enumerable to ensure that the number of partitions does not exceed the upper limit.
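
    For example, with the mappings above you can combine partition pruning and JSON extraction in a single query. The table name sls_access_log and the partition values are placeholders based on the sample data in this topic.

      -- Restrict the scan to one day of partitions, then extract a field
      -- from the serialized __extract_others__ column.
      select ip, status, get_json_object(log_extract_others, "$.user-agent")
      from sls_access_log
      where log_partition_time >= "2016_01_27" and log_partition_time < "2016_01_28"
      limit 10;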