After collecting log data, Log Service can ship the log data to Object Storage Service (OSS) for storage and analysis. This topic describes how to ship log data from Log Service to OSS.

Prerequisites

  • Log Service is activated. A project and a Logstore are created, and log data exists in the Logstore. For more information, see Manage a project and Manage a Logstore.
  • OSS is activated. A bucket is created in the region where the Log Service project resides. For more information, see Activate OSS.
  • Resource Access Management (RAM) is activated.

Background information

Log Service can automatically ship data in Logstores to OSS.
  • OSS allows you to configure the lifecycle of log data for long-term storage.
  • OSS data can be consumed by user-created programs and other systems, such as E-MapReduce and Data Lake Analytics.

Benefits

Shipping log data from Log Service to OSS has the following benefits:

  • Ease of use: You can easily configure the settings in the Log Service console to ship log data in Logstores to OSS.
  • High efficiency: Log Service integrates logs from multiple servers. This eliminates the need to repeatedly import logs from the servers to OSS.
  • Easy management: You can leverage the log classification and management feature of Log Service. After you ship log data in different Logstores and projects to different directories of an OSS bucket, you can easily manage the log data in OSS.

Additional considerations

  • The Log Service project and OSS bucket must be in the same region. You cannot ship log data across regions.
  • After you create or modify a shipping task, you can check whether the shipping rule meets your requirements as follows:
    1. Check whether the new shipping task is running as expected.
    2. Check whether the new shipping task ships data to the destination storage location as expected.

Procedure

  1. Use Resource Access Management (RAM) to authorize Log Service.
    Before you start a shipping task, you must authorize Log Service to write log data to OSS.

    Click Quick authorization. On the page that appears, click Confirm Authorization Policy. Then, Log Service obtains the permission to write log data to OSS.
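    If you create the RAM role manually instead of using quick authorization, the role must trust the Log Service service account so that Log Service can assume it. The following Python sketch only prints an example trust policy for illustration; the policy that quick authorization attaches may differ in detail.

      import json

      # Example trust policy that allows Log Service (log.aliyuncs.com)
      # to assume the RAM role. Shown for illustration only.
      trust_policy = {
          "Version": "1",
          "Statement": [
              {
                  "Action": "sts:AssumeRole",
                  "Effect": "Allow",
                  "Principal": {"Service": ["log.aliyuncs.com"]},
              }
          ],
      }

      print(json.dumps(trust_policy, indent=2))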

  2. Enable and configure the OSS LogShipper feature.
    1. On the Logstores tab, find the Logstore and click the expand icon next to the Logstore. Choose Logstore > Data Transformation > Export.
    2. Click Object Storage Service (OSS) to go to the OSS Shipper page.
    3. Click Enable, set the parameters, and then click OK.
      The following list describes the parameters.
      • OSS Shipper Name: The name of the LogShipper. The name can contain only lowercase letters, digits, hyphens (-), and underscores (_). It must start and end with a lowercase letter or digit and must be 3 to 63 bytes in length.
      • OSS Bucket: The name of the destination OSS bucket. The name must be the name of an existing bucket. The OSS bucket and the Log Service project must be in the same region.
      • OSS Prefix: The prefix of the OSS directory in which the data shipped from Log Service is stored. The prefix must be an existing prefix in OSS.
      • Shard Format: The directory format that is generated by formatting the creation time of the shipping task. Default value: %Y/%m/%d/%H/%M. This format defines the directory of the objects written to OSS, where a forward slash (/) indicates a level of the OSS directory. For more information about how an OSS prefix and a shard format define the OSS object path, see the Shard format section in this topic. For more information about the formatting placeholders, see the strptime API documentation.
      • RAM Role: The Alibaba Cloud Resource Name (ARN) of the RAM role that the OSS bucket owner creates for access control. Example: acs:ram::45643:role/aliyunlogdefaultrole. For more information, see Figure 2.
      • Shipping Size: The maximum size of log data in the cache. When this size is reached, a shipping task is generated. Valid values: 5 to 256. Unit: MB.
      • Storage Format: The format in which log data is stored in the OSS bucket. Valid values: JSON, Parquet, and comma-separated values (CSV). For more information, see JSON storage format, Parquet storage format, and CSV storage format.
      • Compress: Specifies whether to compress raw logs.
        • No Compress: Raw logs are not compressed.
        • Compress (snappy): The snappy algorithm is used to compress raw logs to reduce the usage of OSS bucket storage space.
      • Ship Tags: Specifies whether to ship log tags. Valid values: Yes and No.
      • Shipping Time: The time interval at which shipping tasks are generated. Valid values: 300 to 900. Default value: 300. Unit: seconds.
      Figure 1. Configure OSS LogShipper
      Figure 2. Obtain a role ARN
      Note Log Service uses multiple threads to ship log data in the backend and ships the log data written to each shard separately. For each shard, a shipping task is generated when either the shipping size or the shipping time condition is met.

Shard format

For each shipping task, log data is written to an OSS object. The path of the object is in the oss://OSS-BUCKET/OSS-PREFIX/PARTITION-FORMAT_RANDOM-ID format. A shard format is obtained by formatting the creation time of the shipping task. The following examples show the object paths that different shard formats produce for a shipping task created at 19:50:43 on January 20, 2017, with an OSS bucket named test-bucket.
  • OSS prefix: test-table; shard format: %Y/%m/%d/%H/%M
    Object path: oss://test-bucket/test-table/2017/01/20/19/50_1484913043351525351_2850008
  • OSS prefix: log_ship_oss_example; shard format: year=%Y/mon=%m/day=%d/log_%H%M%S
    Object path: oss://test-bucket/log_ship_oss_example/year=2017/mon=01/day=20/log_195043_1484913043351525351_2850008.parquet
  • OSS prefix: log_ship_oss_example; shard format: ds=%Y%m%d/%H
    Object path: oss://test-bucket/log_ship_oss_example/ds=20170120/19_1484913043351525351_2850008.snappy
  • OSS prefix: log_ship_oss_example; shard format: %Y%m%d/
    Object path: oss://test-bucket/log_ship_oss_example/20170120/_1484913043351525351_2850008
  • OSS prefix: log_ship_oss_example; shard format: %Y%m%d%H
    Object path: oss://test-bucket/log_ship_oss_example/2017012019_1484913043351525351_2850008

If you use big data platforms such as Hive, MaxCompute, or Data Lake Analytics to analyze OSS data, you can specify the directory at each layer in the key=value format. This format allows you to easily manage the partition information.

Example: oss://test-bucket/log_ship_oss_example/year=2017/mon=01/day=20/log_195043_1484913043351525351_2850008.parquet. In this example, the following three partition keys are specified: year, mon, and day.
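The shard format uses the same placeholders as the strptime/strftime functions, so you can preview the directory that a given task creation time produces. The following Python sketch reproduces the directory parts of the paths in the preceding examples; note that %S (not %s) denotes seconds.

  from datetime import datetime

  # Creation time of the example shipping task: 19:50:43, January 20, 2017.
  created = datetime(2017, 1, 20, 19, 50, 43)

  # Preview the directory part that each example shard format produces.
  for fmt in ["%Y/%m/%d/%H/%M",
              "year=%Y/mon=%m/day=%d/log_%H%M%S",
              "ds=%Y%m%d/%H",
              "%Y%m%d/",
              "%Y%m%d%H"]:
      print(created.strftime(fmt))

  # Output:
  # 2017/01/20/19/50
  # year=2017/mon=01/day=20/log_195043
  # ds=20170120/19
  # 20170120/
  # 2017012019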

Manage log shipping tasks

After you enable the OSS LogShipper feature, Log Service starts log shipping tasks in the backend at regular intervals. You can view the status of the tasks on the OSS Shipper page in the Log Service console.

You can view the following details:
  • The status of all log shipping tasks in the last two days. A task can be in the Success, Failed, or Running state. The Failed state indicates that the task encountered an error due to external causes and cannot succeed if it is simply retried. You must fix the error based on the cause.
  • Error messages of the failed tasks. After log data is written to a Logstore, the data is shipped to OSS within 30 minutes. If a task fails, an error message appears in the Log Service console. Log Service retries failed tasks by default. You can also manually retry a task.
    • By default, Log Service retries tasks that failed in the last two days based on an annealing policy. The minimum retry interval is 15 minutes, and the interval doubles after each subsequent failure: a task that fails for the first time is retried after 15 minutes, after the second failure in 30 minutes (2 × 15 minutes), and after the third failure in 60 minutes (2 × 30 minutes).
    • To retry all failed tasks immediately, click Retry All Failed Tasks. You can also use API operations or SDKs to retry the failed tasks, as shown in the sketch after the following list.
    The following list describes the common error messages, their causes, and solutions.
    • UnAuthorized: The RAM role is not authorized. To fix this error, check the following items:
      • The OSS bucket owner has created the RAM role.
      • The account ID in the role description is valid.
      • The role has been granted the permission to write data to the OSS bucket.
      • The role ARN is correctly configured.
    • ConfigNotExist: The shipping configuration does not exist, for example, because the task was deleted. Recreate the shipping configuration and then retry the task.
    • InvalidOssBucket: The specified OSS bucket does not exist. To fix this error, check the following items:
      • The OSS bucket and the Log Service project are in the same region.
      • The bucket name is correctly specified.
    • InternalServerError: An internal error occurred in Log Service. Retry the failed task.
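    The following Python sketch shows one way to query and retry failed tasks by using aliyun-log-python-sdk. The endpoint, AccessKey pair, and resource names are placeholders, and the "fail" status value and response layout are assumptions based on the GetShipperTasks API; check your SDK version before use.

      import time
      from aliyun.log import LogClient  # pip install aliyun-log-python-sdk

      # Placeholders: replace with your endpoint, AccessKey pair, and names.
      client = LogClient("cn-hangzhou.log.aliyuncs.com",
                         "<access-key-id>", "<access-key-secret>")
      project, logstore, shipper = "my-project", "my-logstore", "my-oss-shipper"

      # Query the tasks of the last two days that are in the failed state.
      end = int(time.time())
      start = end - 2 * 24 * 3600
      resp = client.get_shipper_tasks(project, logstore, shipper,
                                      start, end, status_type="fail")
      resp.log_print()

      # Retry the failed tasks by task ID ("tasks" and "id" follow the
      # GetShipperTasks response format).
      task_ids = [task["id"] for task in resp.get_body().get("tasks", [])]
      if task_ids:
          client.retry_shipper_tasks(project, logstore, shipper, task_ids)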

Store OSS data

You can use the OSS console, API operations, or SDKs to access OSS data.

If you use the OSS console, log on to the OSS console, select a bucket, and then click Files. You can view the log data shipped from Log Service.

For more information, see the OSS documentation.
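For example, you can list and download the shipped objects by using the oss2 Python SDK. The following is a minimal sketch that assumes the example bucket and prefix used in this topic and a valid AccessKey pair.

  import oss2  # pip install oss2

  # Placeholders: replace with your AccessKey pair, endpoint, and bucket.
  auth = oss2.Auth("<access-key-id>", "<access-key-secret>")
  bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com",
                       "test-bucket")

  # List the objects shipped under the example prefix.
  for obj in oss2.ObjectIterator(bucket, prefix="log_ship_oss_example/"):
      print(obj.key, obj.size)

  # Download one object by key. For uncompressed JSON objects, each line
  # is one log entry.
  # data = bucket.get_object("<object-key>").read()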

Object endpoint
oss://OSS-BUCKET/OSS-PREFIX/PARTITION-FORMAT_RANDOM-ID
  • Field description
    • OSS-BUCKET and OSS-PREFIX indicate the OSS bucket name and the directory prefix that you specified.
    • PARTITION-FORMAT is obtained by formatting the creation time of the shipping task, for example %Y/%m/%d/%H/%M, where %Y, %m, %d, %H, and %M indicate the year, month, day, hour, and minute. For more information, see the strptime API documentation.
    • RANDOM-ID is the unique identifier of the shipping task, which is a random number added by the system.
  • Directory time

    The OSS directory is determined based on the creation time of the shipping task. Assume that log data is shipped to OSS every five minutes. The shipping task created at 00:00:00 on June 23, 2016 ships the log data that is written to Log Service after 23:55:00 on June 22, 2016. To retrieve all logs of June 22, 2016, you must check all objects in the 2016/06/22/ directory. You must also check whether the objects generated in the first ten minutes in the 2016/06/23/00/ directory contain logs of June 22, 2016.
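    Based on this rule, you can compute the directories to scan for one full day of logs. The following is a minimal Python sketch that assumes the default %Y/%m/%d/%H/%M shard format.

      from datetime import datetime, timedelta

      # The day whose logs you want to retrieve.
      day = datetime(2016, 6, 22)
      next_day = day + timedelta(days=1)

      # All tasks created on that day write under its date prefix.
      prefixes = [day.strftime("%Y/%m/%d/")]
      # Tasks created shortly after midnight may still contain the previous
      # day's logs, so also check the first directory of the next day.
      prefixes.append(next_day.strftime("%Y/%m/%d/00/"))

      print(prefixes)  # ['2016/06/22/', '2016/06/23/00/']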

Object storage format