Ship logs to OSS

Last Updated: Jul 06, 2017

LogShipper can automatically archive the logs in a Logstore to an OSS Bucket, which enables uses such as:

  • OSS allows you to configure a lifecycle for the stored data as needed, so you can retain logs long term.
  • OSS data can be consumed by user-defined programs and by other systems (for example, E-MapReduce).

Using LogShipper to ship logs to an OSS Bucket provides advantages, such as:

  • Ease of use

    You can use Log Service to synchronize the data in a Logstore to OSS after completing a simple configuration.

  • Avoid repeated collection

    Log collection already centralizes your logs, so you do not need to collect them again before importing them into the OSS Bucket.

  • Full reuse of log grouping

    LogShipper automatically ships logs in different Projects and Logstores to different OSS Bucket directories, facilitating data management.

The following example illustrates how to use the OSS shipping function of Log Service. Assume there are two Alibaba Cloud primary accounts: A and B.

Note:

  • A and B may be the same account. In this case, RAM authorization is recommended.
  • The Log Service project and the OSS Bucket must be in the same region. Cross-region data shipping is not supported.

With primary account A, project-a is activated in the Shenzhen region of Log Service, and a Logstore is created to store the nginx access log.

With primary account B, bucket-b is created in the Shenzhen region of OSS.

The nginx access log of project-a of Log Service will be archived in the prefix-b directory of bucket-b.

To implement the OSS shipping function through Log Service, you need to perform two steps: RAM authorization and shipping rule configuration.

Step 1: Resource Access Management (RAM) authorization

Quick authorization

Use primary account B to log on to the Alibaba Cloud console and activate Resource Access Management (RAM), which is free of charge.

On the RAM quick authorization page, confirm the grant of permission to write to all OSS Buckets. With this authorization, Log Service writes to B's OSS Bucket on behalf of A.

View and modify the role

Use primary account B to log on to the Alibaba Cloud RAM console. Go to Role Management and view the role (quick authorization creates the role AliyunLogDefaultRole by default).

If A and B are different accounts, go to Change Role to adjust the role accordingly; if A and B are the same account, the role created by quick authorization can be used directly.
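
For reference, the trust policy of a quick-authorization role typically looks like the following sketch; treat it as illustrative rather than authoritative. If A and B are different accounts, the Service entry must identify account A (for example, "<A's account ID>@log.aliyuncs.com") so that shipping tasks run on A's behalf can assume the role.

    {
      "Statement": [
        {
          "Action": "sts:AssumeRole",
          "Effect": "Allow",
          "Principal": {
            "Service": [
              "log.aliyuncs.com"
            ]
          }
        }
      ],
      "Version": "1"
    }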

Record the role ARN (for example, acs:ram::1323431815622349:role/aliyunlogdefaultrole), as it must be provided to primary account A when the OSS shipping rule is created.

By default, quick authorization grants the role permission to write to all of B's OSS Buckets. For finer-grained access control, go to Change Policy, as sketched below.
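
As an illustrative sketch of such a narrower policy, the following custom RAM policy grants write access only to the prefix-b directory of bucket-b (the names reuse the example above; adjust the actions to your needs):

    {
      "Version": "1",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "oss:PutObject"
          ],
          "Resource": "acs:oss:*:*:bucket-b/prefix-b/*"
        }
      ]
    }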

Authorizing a sub-account to create the shipping rule

After B completes the role creation and authorization, primary account A can use the role created by B to write data into the OSS Bucket, provided that A creates the shipping rule.

To let a sub-account a_1 of primary account A create the shipping rule with this role, grant it PassRole authorization, as sketched below.
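
A minimal sketch of a PassRole policy for a_1 (using the example role ARN above) might look like the following; attach it to the sub-account in the RAM console:

    {
      "Version": "1",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "ram:PassRole",
          "Resource": "acs:ram::1323431815622349:role/aliyunlogdefaultrole"
        }
      ]
    }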

Step 2: Configure OSS shipping rule

  1. Log on to the Log Service console.

  2. Select a project, and click the project name or Manage on the right.

  3. Select the desired Logstore. Click OSS under the LogShipper column.


  4. Click Enable, set the following OSS shipping configuration, and click Confirm. A programmatic sketch follows this list.

    • OSS Shipping Name: The name of the shipping rule. It can be 3 to 63 characters long, can include lowercase letters, digits, hyphens (-), and underscores (_), and must begin and end with a lowercase letter or digit.
    • OSS Bucket: The OSS Bucket name. Make sure that the OSS Bucket and the Log Service project are in the same region.
    • OSS Prefix: The directory in the Bucket where the data synchronized from Log Service to OSS is stored.
    • Partition Format: Uses %Y, %m, %d, %H, and %M to format the creation time of the shipping task into the partition string (for details about the formatting, see the strptime API). The partition string defines the directory hierarchy of the OSS Object file, where a forward slash (/) indicates one OSS directory level. The table below illustrates how the OSS Prefix and the partition format together determine the OSS target file path.
    • RAM Role: The identifier of the role created by the OSS Bucket owner, used for access control. For example, acs:ram::1323431815622349:role/aliyunlogdefaultrole.
    • Delivery Size: The maximum size of a single OSS Object before compression, in MB. Together with Delivery Time, this setting controls how often shipping tasks are created.
    • Compression: The compression method for OSS data storage, either none or snappy. none stores the original data uncompressed; snappy compresses the data with the snappy algorithm, reducing OSS Bucket storage usage.
    • Delivery Time: The interval after which a shipping task is generated, in seconds. The default value is 300.

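The same configuration can also be created programmatically. The following Python sketch assumes the aliyun-log-python-sdk; the create_shipper call and the OssShipperConfig parameter names are assumptions to be checked against your SDK version, and nginx-logstore is a hypothetical Logstore name.

    # Sketch only: create an OSS shipping rule for project-a.
    # The OssShipperConfig/create_shipper signatures below are assumptions;
    # verify them against the aliyun-log-python-sdk version you use.
    from aliyun.log import LogClient
    from aliyun.log.shipper_config import OssShipperConfig

    client = LogClient('cn-shenzhen.log.aliyuncs.com',
                       '<your-access-key-id>', '<your-access-key-secret>')

    config = OssShipperConfig(
        oss_bucket='bucket-b',           # OSS Bucket (same region as the project)
        oss_prefix='prefix-b',           # OSS Prefix (directory in the Bucket)
        oss_role_arn='acs:ram::1323431815622349:role/aliyunlogdefaultrole',
        buffer_interval=300,             # Delivery Time, in seconds
        buffer_mb=256,                   # Delivery Size, in MB
        compress_type='snappy')          # 'none' or 'snappy'

    # 'nginx-logstore' and 'oss-shipper-demo' are hypothetical names.
    client.create_shipper('project-a', 'nginx-logstore', 'oss-shipper-demo',
                          'oss', config)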

Log Service ships data concurrently in the backend; large amounts of data may be processed by multiple shipping threads. Each shipping thread determines when to generate a task based on both the size and the time conditions: when either condition is met, the thread creates a task.

Each shipping task is written into one OSS file, with the path format oss://OSS-BUCKET/OSS-PREFIX/PARTITION-FORMAT_RANDOM-ID(.snappy). The following table uses a shipping task created at 2017/01/20 19:50:43 as an example to explain how the partition format is used.

| OSS Bucket | OSS Prefix | Partition Format | OSS File Path |
| --- | --- | --- | --- |
| test-bucket | test-table | %Y/%m/%d/%H/%M | oss://test-bucket/test-table/2017/01/20/19/50_1484913043351525351_2850008 |
| test-bucket | log_ship_oss_example | %Y/%m/%d/log_%H%M%S | oss://test-bucket/log_ship_oss_example/2017/01/20/log_195043_1484913043351525351_2850008 |
| test-bucket | log_ship_oss_example | %Y%m%d/%H | oss://test-bucket/log_ship_oss_example/20170120/19_1484913043351525351_2850008 |
| test-bucket | log_ship_oss_example | %Y%m%d/ | oss://test-bucket/log_ship_oss_example/20170120/_1484913043351525351_2850008 |
| test-bucket | log_ship_oss_example | %Y%m%d%H | oss://test-bucket/log_ship_oss_example/2017012019_1484913043351525351_2850008 |
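
In Python, the equivalent formatting call is strftime, which uses the same directives. The following runnable sketch reproduces the first row of the table (the random ID is hard-coded to the example value):

    # Build the OSS object path for a shipping task created at
    # 2017-01-20 19:50:43, matching the first row of the table above.
    from datetime import datetime

    created = datetime(2017, 1, 20, 19, 50, 43)
    partition = created.strftime('%Y/%m/%d/%H/%M')    # '2017/01/20/19/50'

    oss_bucket = 'test-bucket'
    oss_prefix = 'test-table'
    random_id = '1484913043351525351_2850008'         # example value from the table

    path = 'oss://{0}/{1}/{2}_{3}'.format(oss_bucket, oss_prefix, partition, random_id)
    print(path)
    # oss://test-bucket/test-table/2017/01/20/19/50_1484913043351525351_2850008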

Log shipping task management

After the OSS shipping function is enabled, Log Service regularly starts shipping tasks in the background. You can view the status of these tasks in the console.

Through Log Shipping Task Management, you can:

  • View the LogShipper tasks from the past two days and their statuses. A shipping task can be in the Success, In Progress, or Failed state. Failed indicates that the task encountered an error caused by external factors that a retry alone cannot fix; in this case, you must manually troubleshoot the issue.

  • For failed shipping tasks created within the last two days, view the external cause of the failure in the task list. After troubleshooting the cause, you can retry failed tasks individually or in batches.

Procedure

  1. Log on to the Log Service console.

  2. Select the desired project, and click the project name or Manage on the right.

  3. Select the desired Logstore. Click OSS under the LogShipper column.


    You can view the status of the shipping task.


If a shipping task fails, a corresponding error is displayed in the console. By default, the system retries the task according to the retry policy; you can also retry it manually.

Task retry

Typically, log data is synchronized to OSS within 30 minutes of being written into the Logstore.

By default, Log Service retries the tasks of the latest two days according to an exponential backoff policy, with a minimum retry interval of 15 minutes. A task that has failed once can be retried after 15 minutes, a task that has failed twice after 30 minutes (2 x 15 minutes), and a task that has failed three times after 60 minutes (2 x 30 minutes).

To retry a failed task immediately, click Retry All Failed Tasks in the console, or specify the task and retry it through the API/SDK, as sketched below.
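
A Python sketch of the API/SDK route follows, assuming the aliyun-log-python-sdk; the method names and response parsing are assumptions to verify against your SDK version, and the project/Logstore/shipper names are the hypothetical values used earlier.

    # Sketch only: list failed shipping tasks from the past two days and
    # retry them. Method names and response parsing are assumptions; check
    # them against your aliyun-log-python-sdk version.
    import time
    from aliyun.log import LogClient

    client = LogClient('cn-shenzhen.log.aliyuncs.com',
                       '<your-access-key-id>', '<your-access-key-secret>')

    end = int(time.time())
    start = end - 2 * 24 * 3600          # the retry window covers two days

    resp = client.get_shipper_tasks('project-a', 'nginx-logstore',
                                    'oss-shipper-demo', start, end,
                                    status_type='fail')
    task_ids = [task['id'] for task in resp.get_tasks()]

    if task_ids:
        client.retry_shipper_tasks('project-a', 'nginx-logstore',
                                   'oss-shipper-demo', task_ids)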

Failed task error

Error information about failed tasks is as follows.

  • UnAuthorized: permission error. Check the following:
    - Whether the OSS Bucket owner has created the role.
    - Whether the account ID in the role description is correct.
    - Whether the role has been granted write permission on the OSS Bucket.
    - Whether the role-arn is correctly configured.
  • ConfigNotExist: the configuration does not exist. This is usually caused by the deletion of the shipping rule. Recreate the rule and retry to resolve the problem.
  • InvalidOssBucket: the OSS Bucket does not exist. Check the following:
    - Whether the OSS Bucket is in the same region as the Log Service project.
    - Whether the Bucket name is correctly configured.
  • InternalServerError: internal error of Log Service. Retry to resolve the problem.

View data in OSS

OSS data can be accessed through the console, API/SDK, and other methods.

To access OSS data through the console, go to the OSS console, select the desired Bucket, and click Object Management to view the data shipped from Log Service.

Object address

  • Address format

    oss://OSS-BUCKET/OSS-PREFIX/PARTITION-FORMAT_RANDOM-ID

  • Field description

    • OSS-BUCKET and OSS-PREFIX indicate the OSS Bucket name and the directory prefix respectively, and are configured by the user.
    • PARTITION-FORMAT is defined as %Y/%m/%d/%H/%M, where %Y, %m, %d, %H, and %M indicate the year, month, day, hour, and minute respectively. They are calculated by the shipping task server from the task creation time through the strptime API.
    • RANDOM-ID is a random number appended by the system; it serves as the unique identifier of the shipping task.
  • Directory time

Assume the following:

  • Data is shipped to OSS every 5 minutes.
  • The shipping task is created at 2016-06-23 00:00:00.
  • The data to be shipped is the data written into Log Service at 2016-06-22 23:55.

To analyze the complete log of the full day of June 22, 2016, in addition to all objects in the 2016/06/22/ directory, you also need to check whether the objects generated in the first 10 minutes under the 2016/06/23/00/ directory contain logs dated 2016-06-22, as sketched below.
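
The check can be scripted with the official oss2 Python SDK, as in the sketch below. The endpoint and credentials are placeholders, and the prefix layout assumes the partition format %Y/%m/%d/%H/%M under the hypothetical prefix log_ship_oss_example.

    # List the objects for June 22 plus the 2016/06/23/00/ directory, whose
    # early objects may still contain logs written on June 22.
    import oss2

    auth = oss2.Auth('<your-access-key-id>', '<your-access-key-secret>')
    bucket = oss2.Bucket(auth, 'http://oss-cn-shenzhen.aliyuncs.com', 'test-bucket')

    for prefix in ('log_ship_oss_example/2016/06/22/',
                   'log_ship_oss_example/2016/06/23/00/'):
        for obj in oss2.ObjectIterator(bucket, prefix=prefix):
            print(obj.key)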

Object storage format

  • JSON
  • Parquet