After logs are shipped from Log Service to Object Storage Service (OSS), the logs can be stored in different formats. This topic describes the Parquet format.

Parameters

The following figure shows the parameters that you must configure if you specify parquet for Storage Format in a shipping rule. For more information, see Configure a data shipping rule. (Figure: Parquet Fields)

The following table describes the parameters.

Parameter Description
Key Name The log field that you want to ship to OSS. You can view log fields on the Raw Logs tab of a Logstore. We recommend that you add log fields one by one. The fields are stored as columns in the Parquet file in the order in which you add them, and the field names are used as the column names. The log fields that you can ship to OSS include the fields in the log content and reserved fields such as __time__, __topic__, and __source__. For more information about reserved fields, see Reserved fields. The value of a column in the Parquet file is null in the following scenarios:
  • The log field does not exist in the log.
  • The field value fails to be converted from the string type to a non-string type, such as DOUBLE or INT64.
Note
  • A log field can be added to Parquet Fields only once.
  • If a log contains two fields that have the same name, such as request_time, Log Service displays one of the fields as request_time_0. Both fields are still stored as request_time in Log Service. When you configure a shipping rule, you can use only the original field name request_time. In this case, Log Service ships the value of one of the fields at random. We recommend that you do not include fields that have the same name in your logs.

Type The data type of the specified log field. A Parquet file supports the following data types: STRING, BOOLEAN, INT32, INT64, FLOAT, and DOUBLE.

When logs are shipped from Log Service to OSS, each log field is converted from the string type to the data type that you specify for the Parquet file. If the conversion fails, the value of the corresponding column is null.
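The conversion behavior described above can be sketched in Python. The converter table and the to_parquet_row helper below are illustrative assumptions, not Log Service internals; they only model the documented rule that a missing field or a failed conversion yields a null column value:

```python
# Sketch of the string-to-Parquet-type conversion described above.
# Values that are absent or fail to convert become None (a null column value).
# The converter table and helper are illustrative, not Log Service internals.

CONVERTERS = {
    "STRING": str,
    "BOOLEAN": lambda s: {"true": True, "false": False}[s.lower()],
    "INT32": int,
    "INT64": int,
    "FLOAT": float,
    "DOUBLE": float,
}

def to_parquet_row(log, schema):
    """Convert one log (a dict of strings) to a row that follows `schema`."""
    row = {}
    for field, parquet_type in schema:
        value = log.get(field)
        if value is None:
            row[field] = None          # field missing in the log -> null
            continue
        try:
            row[field] = CONVERTERS[parquet_type](value)
        except (ValueError, KeyError):
            row[field] = None          # conversion failed -> null
    return row

schema = [("status", "INT64"), ("seq", "DOUBLE"), ("method", "STRING")]
log = {"status": "200", "seq": "1667821", "method": "POST"}
print(to_parquet_row(log, schema))
# {'status': 200, 'seq': 1667821.0, 'method': 'POST'}
print(to_parquet_row({"status": "OK"}, schema))
# {'status': None, 'seq': None, 'method': None}
```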

URLs of files in OSS

After logs are shipped to OSS, the logs are stored in OSS buckets. The following table provides examples of the URLs of the files that store the logs.

Compression type File extension URL example
Not compressed .parquet oss://oss-shipper-shenzhen/ecs_test/2016/01/26/20/54_1453812893059571256_937.parquet
Snappy .snappy.parquet oss://oss-shipper-shenzhen/ecs_test/2016/01/26/20/54_1453812893059571256_937.snappy.parquet

In both cases, you can download the OSS file to your computer and open it by using the parquet-tools utility. For more information about the parquet-tools utility, visit parquet-tools.
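The example URLs above follow a bucket/prefix/year/month/day/hour layout. A small Python sketch that splits such a URL into its parts; the pattern is inferred from the sample URLs, and the actual layout depends on the prefix and partition format that you configure in the shipping rule:

```python
# Parse an OSS object URL like the examples above into its components.
# The bucket/prefix/yyyy/mm/dd/hh layout is inferred from the sample URLs;
# the real layout depends on the prefix configured in your shipping rule.
import re

URL_PATTERN = re.compile(
    r"oss://(?P<bucket>[^/]+)/(?P<prefix>.+)/"
    r"(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/(?P<hour>\d{2})/"
    r"(?P<name>[^/]+?)(?P<ext>(\.snappy)?\.parquet)$"
)

def parse_shipped_url(url):
    m = URL_PATTERN.match(url)
    if m is None:
        raise ValueError(f"unrecognized OSS object URL: {url}")
    info = m.groupdict()
    info["compressed"] = info["ext"] == ".snappy.parquet"
    return info

info = parse_shipped_url(
    "oss://oss-shipper-shenzhen/ecs_test/2016/01/26/20/"
    "54_1453812893059571256_937.snappy.parquet"
)
print(info["bucket"], info["hour"], info["compressed"])
# oss-shipper-shenzhen 20 True
```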

Data consumption

  • You can consume data that is shipped to OSS by using E-MapReduce, Spark, or Hive. For more information, see LanguageManual DDL.
  • You can also consume data by using inspection tools.
    You can use the parquet-tools utility provided by the open source community to inspect Parquet files, view the schema of data in the files, and read the data. You can compile the utility yourself or download the parquet-tools-1.6.0rc3-SNAPSHOT package that Log Service provides.
    • View the schema of data in a Parquet file
      $ java -jar parquet-tools-1.6.0rc3-SNAPSHOT.jar schema -d 00_1490803532136470439_124353.snappy.parquet | head -n 30
      message schema {
        optional int32 __time__;
        optional binary ip;
        optional binary __source__;
        optional binary method;
        optional binary __topic__;
        optional double seq;
        optional int64 status;
        optional binary time;
        optional binary url;
        optional boolean ua;
      }
      creator: parquet-cpp version 1.0.0
      file schema: schema
      --------------------------------------------------------------------------------
      __time__: OPTIONAL INT32 R:0 D:1
      ip: OPTIONAL BINARY R:0 D:1
      .......
    • View all data in a Parquet file
      $ java -jar parquet-tools-1.6.0rc3-SNAPSHOT.jar head -n 2 00_1490803532136470439_124353.snappy.parquet
      __time__ = 1490803230
      ip = 10.200.98.220
      __source__ = *.*.*.*
      method = POST
      __topic__ =
      seq = 1667821.0
      status = 200
      time = 30/Mar/2017:00:00:30 +0800
      url = /PutData?Category=YunOsAccountOpLog&AccessKeyId=*************&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=********************************* HTTP/1.1
      __time__ = 1490803230
      ip = 10.200.98.220
      __source__ = *.*.*.*
      method = POST
      __topic__ =
      seq = 1667822.0
      status = 200
      time = 30/Mar/2017:00:00:30 +0800
      url = /PutData?Category=YunOsAccountOpLog&AccessKeyId=*************&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=********************************* HTTP/1.1

    You can run the java -jar parquet-tools-1.6.0rc3-SNAPSHOT.jar -h command to view more information about the parquet-tools utility.