Simple Log Service: ORC format

Last Updated: Dec 13, 2023

After logs are shipped from Simple Log Service to Object Storage Service (OSS), the logs can be stored as objects in different file formats. This topic describes the Optimized Row Columnar (ORC) format.

Parameters

If you set Storage Format to orc when you create an OSS data shipping job of the new version, you must configure the parameters shown in the following figure. For more information, see Create an OSS data shipping job (new version).

[Figure: orc]

The following table describes the parameters.

Parameter

Description

Key Name

The names of the log fields that you want to ship to OSS. You can view log fields on the Raw Logs tab of a Logstore. We recommend that you add log field names one by one. The data shipping job organizes ORC data in the same sequence and uses the log field names as the column names of the ORC file.

The log fields that you can ship to OSS include the reserved fields such as __time__, __topic__, and __source__. For more information, see Reserved fields.

In the following cases, a column value in the ORC file is null:

  • The specified log field does not exist in the Logstore.

  • The specified log field fails to be converted from the STRING type to a non-STRING type such as DOUBLE or INT64.

Note
  • Each log field can be configured as an ORC field only once.

  • If a log contains two fields that have the same name, such as request_time, Simple Log Service displays one of the fields as request_time_0. Both fields are still stored as request_time in Simple Log Service. When you configure a shipping rule, you can use only the original field name request_time.

    If a log contains fields that have the same name, Simple Log Service randomly ships the value of one of the fields. We recommend that you do not include fields that have the same name in your logs.

Type

The data type of the specified log field. The following types of data can be stored in the ORC file: STRING, BOOLEAN, INT32, INT64, FLOAT, and DOUBLE.

When logs are shipped from Simple Log Service to OSS, the log fields are converted from the STRING type to a data type that is supported in the ORC file. If a log field fails to be converted, the value of the column is null.
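The conversion rules in the table above can be illustrated with a minimal Python sketch. It is not the shipping job's actual implementation: the names build_row and CONVERTERS are hypothetical, and the accepted BOOLEAN spellings and the integer range checks are assumptions, because this topic does not specify the exact parsing rules. The sketch only mirrors the documented behavior: columns follow the configured order, each field can be configured once, and a missing field or a failed conversion yields null.

```python
from typing import Any, Dict, List, Optional, Tuple

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def _to_int(text: str, lo: int, hi: int) -> int:
    """Parse an integer and reject values outside [lo, hi] (assumed rule)."""
    value = int(text)
    if not lo <= value <= hi:
        raise ValueError("integer out of range")
    return value


def _to_bool(text: str) -> bool:
    """Assumption: only 'true' and 'false' (case-insensitive) are accepted."""
    lowered = text.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise ValueError("not a boolean")


# One converter per data type that the ORC file can store.
CONVERTERS = {
    "STRING": str,
    "BOOLEAN": _to_bool,
    "INT32": lambda s: _to_int(s, INT32_MIN, INT32_MAX),
    "INT64": lambda s: _to_int(s, INT64_MIN, INT64_MAX),
    "FLOAT": float,
    "DOUBLE": float,
}


def build_row(columns: List[Tuple[str, str]], log: Dict[str, str]) -> List[Optional[Any]]:
    """Build one ORC row from a log, in the configured column order."""
    names = [name for name, _ in columns]
    if len(set(names)) != len(names):
        raise ValueError("each log field can be configured only once")

    row: List[Optional[Any]] = []
    for name, type_name in columns:
        raw = log.get(name)
        if raw is None:
            row.append(None)  # the field does not exist in the log
            continue
        try:
            row.append(CONVERTERS[type_name](raw))
        except ValueError:
            row.append(None)  # conversion from STRING failed
    return row
```

For example, with columns [("request_time", "DOUBLE"), ("status", "INT32"), ("__topic__", "STRING"), ("missing", "INT64")] and the log {"request_time": "0.25", "status": "abc", "__topic__": "nginx"}, build_row returns [0.25, None, "nginx", None].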

Sample URLs of OSS objects

After logs are shipped to OSS, the logs are stored in OSS buckets. The following table provides the sample URLs of the OSS objects that store the logs.

Note
  • If you specify an object suffix when you create a data shipping job, the OSS objects use the suffix.

  • If you do not specify an object suffix when you create a data shipping job, the OSS objects use the suffix that is generated based on the compression type.

Compression type | Object suffix | Sample URL
Not compressed | Specified suffix. Example: .suffix | oss://oss-shipper-chengdu/ecs_test/2022/01/26/20/54_1453812893059571256_937.suffix
Not compressed | .orc (no suffix specified) | oss://oss-shipper-chengdu/ecs_test/2022/01/26/20/54_1453812893059571256_937.orc
Snappy | Specified suffix. Example: .suffix | oss://oss-shipper-chengdu/ecs_test/2022/01/26/20/54_1453812893059571256_937.suffix
Snappy | .snappy.orc (no suffix specified) | oss://oss-shipper-chengdu/ecs_test/2022/01/26/20/54_1453812893059571256_937.snappy.orc
Zstandard | Specified suffix. Example: .suffix | oss://oss-shipper-chengdu/ecs_test/2022/01/26/20/54_1453812893059571256_937.suffix
Zstandard | .zst.orc (no suffix specified) | oss://oss-shipper-chengdu/ecs_test/2022/01/26/20/54_1453812893059571256_937.zst.orc

You can download the OSS object to your computer and use ORC tools to open the object.
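The suffix rule can be summarized in a short Python sketch. The function name object_suffix and the keys "none", "snappy", and "zstd" are hypothetical labels chosen for illustration; only the suffix values come from this topic.

```python
from typing import Optional

# Default suffix per compression type, used when no object suffix is specified.
# The dictionary keys are illustrative labels, not console or API values.
DEFAULT_SUFFIX = {
    "none": ".orc",
    "snappy": ".snappy.orc",
    "zstd": ".zst.orc",
}


def object_suffix(compression: str, custom_suffix: Optional[str] = None) -> str:
    """Return the suffix that the OSS object would use."""
    if custom_suffix:
        # A suffix specified on the shipping job always takes effect.
        return custom_suffix
    return DEFAULT_SUFFIX[compression]
```

For example, object_suffix("zstd") returns ".zst.orc", and object_suffix("snappy", ".suffix") returns ".suffix".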

Data consumption

  • You can consume data that is shipped to OSS by using E-MapReduce, Spark, or Hive. For more information, see LanguageManual DDL in the Apache Hive documentation.

  • You can also consume data by using inspection tools.

    You can use ORC tools to view the metadata of ORC files and read their data. To verify the consumption result, you can download orc-tools-1.7.2-uber.jar from the Maven repository.

    • View metadata

      • Run the following command:

        java -jar ~/Downloads/orc-tools-1.7.2-uber.jar meta -p file.orc
      • Output:

        Processing data file /Users/xx/file.orc [length: 200779]
        Structure for /Users/xx/file.orc
        File Version: 0.12 with ORC_CPP_ORIGINAL by ORC C++ 1.7.2
        Rows: 124022
        Compression: ZSTD
        Compression size: 65536
        Calendar: Julian/Gregorian
        Type: struct<bucket:string,bucket_region:string>
        
        Stripe Statistics:
          Stripe 1:
            Column 0: count: 124022 hasNull: false
            Column 1: count: 124022 hasNull: false min: bucket0 max: sls-training-data sum: 1468133
            Column 2: count: 0 hasNull: true
        
        File Statistics:
          Column 0: count: 124022 hasNull: false
          Column 1: count: 124022 hasNull: false min: bucket0 max: sls-training-data sum: 1468133
          Column 2: count: 0 hasNull: true
        
        Stripes:
          Stripe: offset: 3 data: 199856 rows: 124022 tail: 97 index: 578
            Stream: column 0 section ROW_INDEX start: 3 length 102
            Stream: column 1 section ROW_INDEX start: 105 length 367
            Stream: column 2 section ROW_INDEX start: 472 length 109
            Stream: column 0 section PRESENT start: 581 length 25
            Stream: column 1 section PRESENT start: 606 length 25
            Stream: column 1 section LENGTH start: 631 length 38989
            Stream: column 1 section DATA start: 39620 length 160794
            Stream: column 2 section PRESENT start: 200414 length 23
            Stream: column 2 section LENGTH start: 200437 length 0
            Stream: column 2 section DATA start: 200437 length 0
            Encoding column 0: DIRECT
            Encoding column 1: DIRECT_V2
            Encoding column 2: DIRECT_V2
        
        File length: 200779 bytes
        Padding length: 0 bytes
        Padding ratio: 0%
    • Read data

      • Run the following command:

        java -jar ~/Downloads/orc-tools-1.7.2-uber.jar data -n 5 file.orc
      • Output:

        Processing data file /Users/xx/file.orc [length: 200779]
        {"bucket":"bucket3","bucket_region":"cn-hangzhou"}
        {"bucket":"bucket3","bucket_region":"cn-hangzhou"}
        {"bucket":"bucket4","bucket_region":"cn-hangzhou"}
        {"bucket":"dashboard-bucket","bucket_region":"cn-hangzhou"}
        {"bucket":"bucket2","bucket_region":null}

    For more information, run the java -jar orc-tools-1.7.2-uber.jar command or see the ORC tools documentation.
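To check the output of the data command programmatically, a small Python sketch such as the following may help. It assumes that the tool prints one JSON object per row, as in the sample output above, and that any other line (such as the "Processing data file" banner) can be ignored. The function names parse_orc_data_output and count_nulls are hypothetical.

```python
import json
from typing import Any, Dict, List


def parse_orc_data_output(text: str) -> List[Dict[str, Any]]:
    """Parse the text printed by the ORC tools data command into row dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip the banner line and blank lines
        rows.append(json.loads(line))
    return rows


def count_nulls(rows: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count null values per column, for example to spot failed conversions."""
    counts: Dict[str, int] = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + (1 if value is None else 0)
    return counts
```

Applied to the five sample rows above, count_nulls reports 0 nulls for bucket and 1 null for bucket_region.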