Ship logs to OSS

Last Updated: Apr 02, 2018

Log Service can automatically archive the data in a Logstore to Object Storage Service (OSS), which extends what you can do with your logs:

  • OSS data supports lifecycle configuration for long-term log storage.
  • You can consume OSS data by using self-built programs and other systems (for example, E-MapReduce).

Function advantages

Using Log Service to ship logs to OSS has the following advantages:

  • Ease of use. You can configure the synchronization of Logstore data to OSS directly in the console.

  • Improved efficiency. Log Service already centralizes the logs of different machines, so you do not need to collect the same logs repeatedly from each machine in order to import them to OSS.

  • Ease of management. Shipping logs to OSS reuses the log grouping already defined in Log Service. Logs in different projects and Logstores are automatically shipped to different OSS bucket directories, which simplifies OSS data management.

Scenarios

Assume that you have two Alibaba Cloud main accounts: account A and account B.

  • Account A creates a project Project-a in China South 1 (Shenzhen) and creates a Logstore to store Nginx access logs in Log Service.
  • Account B creates a bucket Bucket-b in China South 1 (Shenzhen) in OSS.

Note:

  • A and B can be the same account, in which case Resource Access Management (RAM) authorization is the simplest.
  • The Log Service project and OSS bucket must be in the same region. Data cannot be shipped across regions.

To archive the Nginx access logs in Log Service Project-a to the prefix-b directory of Bucket-b, use the OSS Shipper function of Log Service.

To use the OSS Shipper function of Log Service, perform RAM authorization and then configure a shipping rule.

Procedure

Step 1 Perform RAM authorization

Quick authorization

Use the main account B to log on to the Alibaba Cloud console and activate RAM (activation is free of charge).

Click RAM quick authorization page and grant the main account A permission to write to all OSS buckets under B. Log Service then assumes this role on behalf of A to write to B's OSS buckets.

View and modify the role

Use the main account B to log on to the RAM console. Click Roles in the left-side navigation pane to view the role (quick authorization creates the role AliyunLogDefaultRole by default).

If A and B are different accounts, see Manage authorization roles in Advanced RAM authorization to modify the role. If A and B are the same account, the role created by quick authorization can be directly used.

Record the Arn of the role (for example, acs:ram::1323431815622349:role/aliyunlogdefaultrole). The Arn must be provided to the main account A for configuring an OSS shipping rule.

You can click Roles in the left-side navigation pane, click Manage at the right of the role, and find the Arn of the role on the Role details page.

By default, quick authorization grants the main account A permission to write to all OSS buckets under B. For more fine-grained permission control, see Manage authorization policies in Advanced RAM authorization to modify the policy.
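For illustration, such a narrower policy could limit writes to just Bucket-b and its prefix-b directory from the scenario above. The following is a minimal sketch, expressed as a Python dict that mirrors the RAM policy JSON; the bucket and prefix names are the hypothetical ones from this scenario, and you should verify the exact action and resource spellings against the RAM policy reference.

```python
import json

# Sketch of a narrower RAM policy for the shipping role: allow writes only
# to Bucket-b under the prefix-b directory (names from the scenario above).
# Verify the action and resource spellings against the RAM policy reference.
policy = {
    "Version": "1",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["oss:PutObject"],
            "Resource": ["acs:oss:*:*:bucket-b/prefix-b/*"],
        }
    ],
}

# Paste the printed JSON into the RAM policy editor.
print(json.dumps(policy, indent=2))
```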

Use a sub-account to configure a shipping rule

After the main account B creates the role and completes the authorization, the main account A has permission to use the role created by B to write data to OSS buckets. However, this permission applies only when the main account A itself configures the shipping rule.

If the sub-account a_1 of the main account A wants to use this role to configure a shipping rule, see Authorize roles to sub-accounts as main accounts in Advanced RAM authorization to perform PassRole authorization.
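The PassRole grant itself is also a RAM policy, attached to the sub-account a_1. Below is a hedged sketch, assuming the ram:PassRole action and reusing the example role Arn from above (confirm the action name in the RAM documentation):

```python
import json

# Sketch of a PassRole policy for sub-account a_1, scoped to the shipping
# role created by B. The Arn below is the example value from Step 1;
# replace it with your own role's Arn.
pass_role_policy = {
    "Version": "1",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ram:PassRole",
            "Resource": "acs:ram::1323431815622349:role/aliyunlogdefaultrole",
        }
    ],
}

print(json.dumps(pass_role_policy, indent=2))
```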

Step 2 Configure an OSS shipping rule in Log Service

  1. Log on to the Log Service console.

  2. On the Project List page, click the project name.

  3. Click LogShipper > OSS in the left-side navigation pane.

  4. Select a Logstore.

  5. Click Enable. The OSS LogShipper dialog box appears.

  6. Complete the configurations and then click Confirm.

    See the following table to complete the OSS shipping configurations.

    | Configuration item | Description | Value range |
    | --- | --- | --- |
    | OSS Shipping Name | The name of the OSS shipping rule. | 3–63 characters: lowercase letters, numbers, hyphens (-), and underscores (_); must begin and end with a lowercase letter or number. |
    | OSS Bucket | The name of the OSS bucket. | The bucket must exist, and it must be in the same region as the Log Service project. |
    | OSS Prefix | The OSS directory prefix. Data synchronized from Log Service to OSS is stored under this directory of the bucket. | The OSS prefix must exist. |
    | Partition Format | Use %Y, %m, %d, %H, and %M to format the creation time of the LogShipper task into a partition string, which defines the directory hierarchy of the object files written to OSS; a forward slash (/) indicates one level of OSS directory. The Partition format section below describes how the OSS prefix and partition format together define the OSS object path. | For more information about the format specifiers, see the Strptime API. |
    | RAM Role | The Arn and name of the RAM role. The RAM role, created by the OSS bucket owner, controls the access permissions. For how to obtain the Arn, see View and modify the role in Step 1. | For example, acs:ram::45643:role/aliyunlogdefaultrole. |
    | Shipping Size | The maximum size of an OSS object (before compression); together with the shipping time, it controls how often LogShipper tasks are created. | 5–256. The unit is MB. |
    | Compression | The compression method for OSS data storage. | Do Not Compress: the raw data is not compressed. Compress (snappy): the snappy algorithm is used to compress data, reducing OSS storage usage. |
    | Storage Format | The storage format of the log data after it is shipped to OSS. | Three formats are supported: JSON, Parquet, and CSV. For more information, see JSON, Parquet, and CSV. |
    | Shipping Time | The time interval between LogShipper tasks. | 300–900. The unit is seconds. The default value is 300. |


Note: Log Service ships data concurrently at the backend. A Logstore with a large volume of data may be processed by multiple shipping threads. Each shipping thread decides when to create a task based on both the size and the time conditions: a task is created as soon as either condition is met.
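For reference, the same shipping rule can also be created programmatically with the Log Service Python SDK (aliyun-log-python-sdk) rather than in the console. This is only a rough sketch: the OssShipperConfig helper and the create_shipper call are assumptions about the SDK, the project/Logstore/shipper names are hypothetical, and exact names and signatures may differ by SDK version, so check the SDK reference before use.

```python
from aliyun.log import LogClient
from aliyun.log.shipper_config import OssShipperConfig  # assumed helper class

# Endpoint and credentials are placeholders; fill in your own values.
client = LogClient('cn-shenzhen.log.aliyuncs.com',
                   'your-access-key-id', 'your-access-key-secret')

# Mirrors the console configuration above: bucket, prefix, RAM role Arn,
# shipping time (seconds), shipping size (MB), and compression.
oss_config = OssShipperConfig(
    oss_bucket='bucket-b',
    oss_prefix='prefix-b',
    oss_role_arn='acs:ram::1323431815622349:role/aliyunlogdefaultrole',
    buffer_interval=300,   # shipping time: 300-900 seconds
    buffer_mb=128,         # shipping size: 5-256 MB
    compress_type='snappy')

client.create_shipper('project-a', 'nginx-access-log', 'ship-to-oss', oss_config)
```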

Partition format

Each LogShipper task writes one OSS object, with the path format oss://OSS-BUCKET/OSS-PREFIX/PARTITION-FORMAT_RANDOM-ID. The PARTITION-FORMAT is obtained by formatting the creation time of the LogShipper task. The following examples, based on a LogShipper task created at 2017-01-20 19:50:43, show how the partition format works.

| OSS bucket | OSS prefix | Partition format | OSS object path |
| --- | --- | --- | --- |
| test-bucket | test-table | %Y/%m/%d/%H/%M | oss://test-bucket/test-table/2017/01/20/19/50/43_1484913043351525351_2850008 |
| test-bucket | log_ship_oss_example | year=%Y/mon=%m/day=%d/log_%H%M%S | oss://test-bucket/log_ship_oss_example/year=2017/mon=01/day=20/log_195043_1484913043351525351_2850008.parquet |
| test-bucket | log_ship_oss_example | ds=%Y%m%d/%H | oss://test-bucket/log_ship_oss_example/ds=20170120/19_1484913043351525351_2850008.snappy |
| test-bucket | log_ship_oss_example | %Y%m%d/ | oss://test-bucket/log_ship_oss_example/20170120/_1484913043351525351_2850008 |
| test-bucket | log_ship_oss_example | %Y%m%d%H | oss://test-bucket/log_ship_oss_example/2017012019_1484913043351525351_2850008 |
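The mapping from task creation time to object path can be reproduced with standard strftime formatting. Below is a small illustration in Python, using the example task time and the second row's partition format from the table above:

```python
from datetime import datetime

# Creation time of the example LogShipper task used in the table above.
task_time = datetime(2017, 1, 20, 19, 50, 43)

# Second row of the table: Hive-style partition directories plus a file prefix.
partition = task_time.strftime('year=%Y/mon=%m/day=%d/log_%H%M%S')

# The object path is OSS-BUCKET/OSS-PREFIX/PARTITION-FORMAT_RANDOM-ID;
# the RANDOM-ID suffix here is just the example value from the table.
print('oss://test-bucket/log_ship_oss_example/'
      + partition + '_1484913043351525351_2850008.parquet')
# -> oss://test-bucket/log_ship_oss_example/year=2017/mon=01/day=20/log_195043_1484913043351525351_2850008.parquet
```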

You can analyze the OSS data with big data platforms such as Hive and MaxCompute. To use the data as partitioned data, set each directory level to the key=value format (Hive-style partitioning).

For example, oss://test-bucket/log_ship_oss_example/year=2017/mon=01/day=20/log_195043_1484913043351525351_2850008.parquet can be read as three levels of partition columns: year, mon, and day.

Manage LogShipper tasks

After the LogShipper function is enabled, Log Service periodically starts LogShipper tasks at the backend. You can view the status of the LogShipper tasks in the console.

By managing LogShipper tasks, you can:

  • View all LogShipper tasks from the last two days and check their status. The status of a LogShipper task can be Success, Failed, or Running. Failed indicates that the task encountered an error due to external causes and cannot be retried automatically; in this case, you must troubleshoot the issue manually.

  • For failed LogShipper tasks created within the last two days, the task list shows the external cause of the failure. After fixing the cause, you can retry the failed tasks separately or in batches, in the console or through the API/SDK (see the sketches below).

Procedure

  1. Log on to the Log Service console.

  2. On the Project List page, click the project name.

  3. Click LogShipper > OSS in the left-side navigation pane.

  4. Select a Logstore.

  5. View information such as the task start time, task end time, time when logs were received, number of data lines, and task status.

    If a LogShipper task fails, a corresponding error message is displayed in the console. By default, the system retries the task according to the retry policy; you can also retry it manually.
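Task status can also be inspected programmatically. Below is a minimal sketch with the Python SDK, assuming the get_shipper_tasks method and its response accessors (verify the exact signature against your SDK version); the project, Logstore, and shipper names are the hypothetical ones used earlier:

```python
import time
from aliyun.log import LogClient

client = LogClient('cn-shenzhen.log.aliyuncs.com',
                   'your-access-key-id', 'your-access-key-secret')

# List LogShipper tasks from the last two days and print their status.
now = int(time.time())
res = client.get_shipper_tasks('project-a', 'nginx-access-log', 'ship-to-oss',
                               start_time=now - 2 * 86400, end_time=now)
for task in res.get_tasks():
    print(task)  # each task carries an id, a status, and an error message if any
```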

Retry a task

Generally, log data is synchronized to OSS within 30 minutes after being written to the Logstore.

By default, Log Service retries tasks from the last two days with exponential backoff (an annealing policy). The minimum retry interval is 15 minutes: a task that has failed once is retried after 15 minutes, a task that has failed twice after 30 minutes (2 x 15 minutes), and a task that has failed three times after 60 minutes (2 x 30 minutes).

To retry a failed task immediately, click Retry All Failed Tasks in the console, or retry a specific task by using the API/SDK (see the sketch below).
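Below is a sketch of retrying failed tasks through the same Python SDK, assuming the retry_shipper_tasks method and a 'fail' status filter (confirm both against your SDK version):

```python
import time
from aliyun.log import LogClient

client = LogClient('cn-shenzhen.log.aliyuncs.com',
                   'your-access-key-id', 'your-access-key-secret')

# Collect the ids of failed tasks from the last two days...
now = int(time.time())
res = client.get_shipper_tasks('project-a', 'nginx-access-log', 'ship-to-oss',
                               start_time=now - 2 * 86400, end_time=now,
                               status_type='fail')
failed_ids = [task['id'] for task in res.get_tasks()]

# ...and retry them in one batch after the external cause has been fixed.
if failed_ids:
    client.retry_shipper_tasks('project-a', 'nginx-access-log',
                               'ship-to-oss', failed_ids)
```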

Errors of failed tasks

The following are common errors that cause task failures.

| Error message | Error cause | Handling method |
| --- | --- | --- |
| UnAuthorized | No permission. | Make sure that: the OSS user has created a role; the account ID in the role description is correct; the role has been granted permission to write to the OSS bucket; the role-arn is correctly configured. |
| ConfigNotExist | The configuration does not exist. | This error is generally caused by the deletion of a shipping rule. Reconfigure the shipping rule and then retry the task. |
| InvalidOssBucket | The OSS bucket does not exist. | Make sure that: the OSS bucket is in the same region as the Log Service project; the bucket name is correctly configured. |
| InternalServerError | An internal error of Log Service. | Retry the task. |

OSS data storage

You can access the OSS data in the console or by using APIs/SDKs.

To access OSS data in the console, log on to the OSS console, click a bucket name in the left-side navigation pane, and then click Files to view the data shipped from Log Service.

For more information about OSS, see the OSS documentation.

Object address

oss://OSS-BUCKET/OSS-PREFIX/PARTITION-FORMAT_RANDOM-ID

  • Descriptions of path fields

    • OSS-BUCKET and OSS-PREFIX are the OSS bucket name and directory prefix, respectively, both configured by the user.
    • PARTITION-FORMAT is defined as %Y/%m/%d/%H/%M, where %Y, %m, %d, %H, and %M indicate the year, month, day, hour, and minute respectively, obtained by applying strptime-style formatting to the creation time of the LogShipper task in Log Service.
    • RANDOM-ID is a system-generated random number that uniquely identifies a LogShipper task.
  • Directory time

    The OSS data directory is determined by the creation time of the LogShipper tasks. Assume that data is shipped to OSS every five minutes; the LogShipper task created at 2016-06-23 00:00:00 ships the data that was written to Log Service after 2016-06-22 23:55. Therefore, to analyze the complete logs for the full day of 2016-06-22, you need all objects in the 2016/06/22 directory, and you must also check whether the objects from the first 10 minutes of the 2016/06/23/00/ directory contain logs from 2016-06-22 (see the sketch below).
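Below is a sketch of that scan using the OSS Python SDK (oss2), which enumerates both prefixes; the endpoint, credentials, bucket, and prefix values are placeholders, and the directory layout follows the %Y/%m/%d/%H/%M partition format:

```python
import oss2

auth = oss2.Auth('your-access-key-id', 'your-access-key-secret')
bucket = oss2.Bucket(auth, 'https://oss-cn-shenzhen.aliyuncs.com', 'test-bucket')

# All objects for 2016-06-22, plus the 00-hour directory of 2016-06-23,
# whose first ~10 minutes of tasks may still carry logs from 2016-06-22.
prefixes = ['log_ship_oss_example/2016/06/22/',
            'log_ship_oss_example/2016/06/23/00/']
for prefix in prefixes:
    for obj in oss2.ObjectIterator(bucket, prefix=prefix):
        print(obj.key)
```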

Object storage format

Three storage formats are supported when log data is shipped to OSS: JSON, Parquet, and CSV. For details, see JSON, Parquet, and CSV.