You can use Log Service to collect log data and ship the log data to Object Storage Service (OSS) for storage and analysis. This topic describes how to ship log data from Log Service to OSS.

Prerequisites

Background information

Log Service can automatically ship log data from a Logstore to an OSS bucket.
  • You can configure a lifecycle rule for the OSS bucket so that the log data stored in the bucket is retained for a long period of time.
  • You can use data processing platforms, such as E-MapReduce and Data Lake Analytics (DLA), or use custom programs to consume log data from the OSS bucket.

Ship log data

  1. Log on to the Log Service console.
  2. In the Projects section, click the project that you want to view.
  3. On the Log Storage > Logstores tab, click the > icon of the Logstore. Then, choose Data Transformation > Export > Object Storage Service (OSS).
  4. Move the pointer over Object Storage Service (OSS) and click +.
  5. In the OSS LogShipper panel, configure the parameters and click OK.
    You must set Shipping Version to Old Version and configure other parameters based on the following descriptions.
    Important
    • After you configure a shipping rule, multiple shipping tasks can run concurrently. The frequency at which shipping tasks are created depends on the Shipping Size and Shipping Time parameters that you specify in the shipping rule. A shipping task is created as soon as either of the two conditions is met.
    • After a shipping task is created, you can check whether the shipping rule meets your requirements based on the task status and the data that is shipped to OSS.
    The following descriptions apply to the parameters:
    • OSS Shipper Name: The name of the shipping rule.
    • OSS Bucket: The name of the OSS bucket to which you want to ship log data.
      Important You must specify the name of an existing OSS bucket. The OSS bucket that you specify must reside in the same region as the Log Service project.
    • File Delivery Directory: The directory in the OSS bucket to which you want to ship log data. The value cannot start with a forward slash (/) or a backslash (\). After an OSS shipping task is created, the data in the Logstore is shipped to this directory.
    • Shard Format: The partition format that is used to generate subdirectories in the OSS bucket. A subdirectory is dynamically generated based on the time at which a shipping task is created. The default partition format is %Y/%m/%d/%H/%M. The partition format cannot start with a forward slash (/). For partition format examples, see Shard Format. For more information about the parameters of partition formats, see strptime.
    • OSS Write RAM Role: The method that is used to authorize an OSS shipping task to write data to the OSS bucket. Valid values:
      • Default Role: The OSS shipping task assumes the system role AliyunLogDefaultRole to write data to the OSS bucket. If you select this option, the Alibaba Cloud Resource Name (ARN) of the system role AliyunLogDefaultRole is automatically passed in.
      • Custom Role: The OSS shipping task assumes a custom role to write data to the OSS bucket. If you select this option, you must grant the custom role the permissions to write data to the OSS bucket and then enter the ARN of the custom role in the OSS Write RAM Role field. For more information, see Perform authorization in the RAM console.
      For more information about how to obtain an ARN, see How do I obtain the ARN of a RAM role?
    • Shipping Size: The maximum size of log data in a shard before the data is shipped to OSS. The value of this parameter determines the maximum size of raw log data that is shipped to OSS and stored in an object. Valid values: 5 to 256. Unit: MB. If the size of the log data that you want to ship reaches the specified value, a shipping task is automatically created.
    • Storage Format: The storage format of log data. After log data is shipped to OSS, the log data can be stored in different formats. For more information, see JSON format, CSV format, and Parquet format.
    • Compress: Specifies whether to compress log data that is shipped to OSS. Valid values:
      • No Compress: Log data is not compressed.
      • Compress(snappy): The snappy algorithm is used to compress log data, which reduces the storage space that is occupied in the OSS bucket. For more information, see snappy.
    • Shipping Time: The maximum interval between two shipping tasks that ship the log data of a shard. Valid values: 300 to 900. Default value: 300. Unit: seconds. If the interval is reached, a shipping task is created.
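The interplay between the Shipping Size and Shipping Time parameters can be modeled as a simple either-or trigger. The following Python sketch is illustrative only (the function name and signature are hypothetical, not part of Log Service):

```python
def should_create_task(buffered_mb: float, seconds_since_last_task: float,
                       shipping_size_mb: int = 256, shipping_time_s: int = 300) -> bool:
    """Illustrative model of the trigger rule: a shipping task is created
    as soon as either the buffered data size in a shard reaches Shipping Size
    or the interval since the last task reaches Shipping Time."""
    return buffered_mb >= shipping_size_mb or seconds_since_last_task >= shipping_time_s
```

For example, a shard that accumulates 256 MB after only 60 seconds triggers a task on size, while a shard with only 10 MB of data triggers a task once 300 seconds have elapsed.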

View OSS data

After log data is shipped to OSS, you can view the log data in the OSS console. You can also view the log data by using other methods, such as the OSS API or an OSS SDK. For more information, see Manage objects.

The following script provides an example of the URL for an OSS object:
oss://OSS-BUCKET/OSS-PREFIX/PARTITION-FORMAT_RANDOM-ID
• OSS-BUCKET: the name of the OSS bucket.
• OSS-PREFIX: the directory in the OSS bucket.
• PARTITION-FORMAT: the partition format that is used to generate subdirectories. A subdirectory is generated based on the time at which a shipping task is created by using the strptime function. For more information about the strptime function, see strptime.
• RANDOM-ID: the unique identifier of a shipping task.
Note A subdirectory is generated based on the time at which a shipping task is created. For example, a shipping task was created at 00:00:00 on June 23, 2016. The data that was written to Log Service after 23:55:00 on June 22, 2016 is shipped to OSS. Data is shipped at 5-minute intervals. If you want to analyze the log data of June 22, 2016, you must check all the objects in the 2016/06/22 subdirectory. You must also check whether the objects that were generated before 00:10:00 on June 23, 2016 in the 2016/06/23/00/ subdirectory include the log data that was generated on June 22, 2016.
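Because a task created shortly after midnight can still ship the previous day's data, analyzing one day of logs requires scanning two date prefixes. The following sketch (the helper name is hypothetical) builds the list of prefixes to check for a daily partition format:

```python
from datetime import date, timedelta

def prefixes_to_scan(day: date, fmt: str = "%Y/%m/%d") -> list[str]:
    """Data generated on `day` can appear both under that day's subdirectory
    and among the first objects of the next day's subdirectory, because a
    shipping task created shortly after midnight ships late data from the
    previous day. Both prefixes must therefore be checked."""
    return [day.strftime(fmt), (day + timedelta(days=1)).strftime(fmt)]
```

For the example above, `prefixes_to_scan(date(2016, 6, 22))` yields the 2016/06/22 prefix plus the 2016/06/23 prefix, whose earliest objects may still contain June 22 data.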

Shard Format

Each shipping task corresponds to an OSS object URL, which is in the oss://OSS-BUCKET/OSS-PREFIX/PARTITION-FORMAT_RANDOM-ID format. PARTITION-FORMAT is the partition format that is used to generate subdirectories. A subdirectory is generated based on the time at which a shipping task is created. The following table describes various partition formats for a shipping task that was created at 19:50:43 on January 20, 2017.
OSS Bucket OSS prefix Partition format URL for the OSS object
test-bucket test-table %Y/%m/%d/%H/%M oss://test-bucket/test-table/2017/01/20/19/50_1484913043351525351_2850008
test-bucket log_ship_oss_example year=%Y/mon=%m/day=%d/log_%H%M%S oss://test-bucket/log_ship_oss_example/year=2017/mon=01/day=20/log_195043_1484913043351525351_2850008.parquet
test-bucket log_ship_oss_example ds=%Y%m%d/%H oss://test-bucket/log_ship_oss_example/ds=20170120/19_1484913043351525351_2850008.snappy
test-bucket log_ship_oss_example %Y%m%d/ oss://test-bucket/log_ship_oss_example/20170120/_1484913043351525351_2850008
Note If you use this format, platforms such as Hive may fail to parse log data in the OSS bucket. We recommend that you do not use this format.
test-bucket log_ship_oss_example %Y%m%d%H oss://test-bucket/log_ship_oss_example/2017012019_1484913043351525351_2850008
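The subdirectories shown in the table can be reproduced with Python's datetime formatting, which uses the same format directives as the partition formats:

```python
from datetime import datetime

# Creation time of the example shipping task: 19:50:43 on January 20, 2017.
created = datetime(2017, 1, 20, 19, 50, 43)

# Each partition format expands to the subdirectory shown in the table above.
print(created.strftime("%Y/%m/%d/%H/%M"))                    # 2017/01/20/19/50
print(created.strftime("year=%Y/mon=%m/day=%d/log_%H%M%S"))  # year=2017/mon=01/day=20/log_195043
print(created.strftime("ds=%Y%m%d/%H"))                      # ds=20170120/19
```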

You can use big data platforms such as Hive, MaxCompute, or DLA to analyze OSS data. If you want to use partition format information, you can set PARTITION-FORMAT in the key=value format. Example URL of an OSS object: oss://test-bucket/log_ship_oss_example/year=2017/mon=01/day=20/log_195043_1484913043351525351_2850008.parquet. In this example, year, mon, and day are specified as partition key columns.
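A platform that understands Hive-style layouts recovers the partition columns by parsing the key=value path segments of each object key. A minimal sketch of that parsing (the helper name is illustrative, not an API of any of these platforms):

```python
def partition_columns(object_key: str) -> dict[str, str]:
    """Extract Hive-style partition columns (key=value path segments)
    from an OSS object key; segments without '=' are ignored."""
    return dict(seg.split("=", 1) for seg in object_key.split("/") if "=" in seg)

key = ("log_ship_oss_example/year=2017/mon=01/day=20/"
       "log_195043_1484913043351525351_2850008.parquet")
print(partition_columns(key))  # {'year': '2017', 'mon': '01', 'day': '20'}
```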

What to do next

After a shipping task is created, you can modify the shipping rule, disable the data shipping feature, view the status and error messages of the shipping task, and retry the shipping task on the OSS Shipper page.
  • Modify a shipping rule

    Click Settings to modify the shipping rule. For more information about parameters, see Ship log data.

  • Disable the data shipping feature

    Click Disable to disable the data shipping feature.

  • View the status and error messages of a shipping task

    You can view the shipping tasks of the previous two days and the statuses of the tasks.

    • Task statuses
      • Succeeded: The shipping task succeeded.
      • Running: The shipping task is running. Check whether the task succeeds later.
      • Failed: The shipping task failed. If the task failed due to external causes, troubleshoot the failure based on the error message and then retry the task.
    • Error messages
      If a shipping task fails, an error message is returned for the task.
      • UnAuthorized
        Cause: The shipping task does not have the required permissions.
        Solution: Check the following configurations:
        • Check whether the AliyunLogDefaultRole role is created for the Alibaba Cloud account to which the OSS bucket belongs.
        • Check whether the Alibaba Cloud account ID that is configured in the policy is valid.
        • Check whether the AliyunLogDefaultRole role is granted the permissions to write data to the OSS bucket.
        • Check whether the ARN of the AliyunLogDefaultRole role is valid.
      • ConfigNotExist
        Cause: The configuration of the task does not exist. In most cases, this error occurs because the data shipping feature is disabled.
        Solution: Enable the data shipping feature, configure a shipping rule for the task, and then retry the task.
      • InvalidOssBucket
        Cause: The specified OSS bucket does not exist.
        Solution: Check the following configurations:
        • Check whether the specified OSS bucket resides in the same region as the Log Service project.
        • Check whether the name of the specified bucket is valid.
      • InternalServerError
        Cause: An internal error occurred in Log Service.
        Solution: Retry the shipping task.
    • Retry a shipping task

      If a shipping task fails, Log Service automatically retries the task based on its retry policy. You can also manually retry the task. By default, Log Service retries all failed tasks of the previous two days. The minimum interval between two consecutive retries is 15 minutes: after the first failure, Log Service retries the task 15 minutes later; after the second failure, 30 minutes later; after the third failure, 60 minutes later; and so on, doubling the interval after each failure.
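      The doubling schedule described above can be sketched as follows; this is an illustrative model of the documented behavior, not Log Service code, and the function name is hypothetical:

```python
def retry_delay_minutes(failure_count: int) -> int:
    """Delay before the next automatic retry, per the doubling schedule:
    15 minutes after the first failure, 30 after the second, 60 after
    the third, and so on (the documentation states no upper cap)."""
    return 15 * 2 ** (failure_count - 1)
```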

      To immediately retry a failed task, you can click Retry All Failed Tasks or Retry on the right side of the task. You can also call an API operation or use an SDK to retry the task.

FAQ

How do I obtain the ARN of a RAM role?

  1. Log on to the RAM console.
  2. In the left-side navigation pane, click RAM Roles.
  3. On the Roles page, find the AliyunLogDefaultRole role and click the name of the role.
  4. On the page that appears, obtain the ARN of the RAM role in the Basic Information section.