This topic describes the parameters that you can set when you ship logs from Log Service to Object Storage Service (OSS) and store log data in the Parquet format.

Parquet keys

  • Data types

    The Parquet storage format supports six data types: string, Boolean, Int32, Int64, float, and double.

    During log shipping from Log Service to OSS, each value is converted from the string type to the Parquet type that you specify. If a value cannot be converted, it is set to null.

  • Columns
    After you specify the key names and their target data types, data is shipped from Log Service to OSS with the columns in the order of the specified keys. The key names in Log Service are used as the column names of the Parquet data. The value of a column is set to null if either of the following conditions is met:
    • The key name does not exist in Log Service data.
    • The value of the key fails to be converted from the string type to a non-string type, for example, double or Int64.
    Note: The keys that you add in the Parquet Keys section must be unique.
  • Reserved keys
    During log shipping from Log Service to OSS, you can use the reserved keys that are listed in the following table.
    Reserved key    Description
    __time__        The Unix timestamp of a log. The value represents the number of seconds that have elapsed since the epoch time January 1, 1970, 00:00:00 UTC. This value is calculated based on the time key of the log.
    __topic__       The topic of a log.
    __source__      The client IP address of the log source.
    Note: Select these reserved keys as needed when you ship data to OSS and store the data in the Parquet format. For example, to include the log topic, set the key name to __topic__ and the data type to string.
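
The conversion and column rules above can be sketched in Python. This is only an illustration, not the implementation used by Log Service: the function names convert_value and build_row are hypothetical, and the accepted Boolean spellings and the integer range checks are assumptions, because this topic does not specify them.

```python
from typing import List, Optional, Tuple, Union

ParquetValue = Optional[Union[str, bool, int, float]]

# Assumption: an integer outside the range of the target type counts as a failed conversion.
INT_RANGES = {"int32": (-2**31, 2**31 - 1), "int64": (-2**63, 2**63 - 1)}


def convert_value(value: Optional[str], parquet_type: str) -> ParquetValue:
    """Convert a string log value to the target type; return None (null) on failure."""
    if value is None:  # the key does not exist in the log
        return None
    try:
        if parquet_type == "string":
            return value
        if parquet_type == "boolean":
            # Assumption: only "true" and "false" (case-insensitive) are accepted.
            return {"true": True, "false": False}[value.strip().lower()]
        if parquet_type in INT_RANGES:
            number = int(value)
            low, high = INT_RANGES[parquet_type]
            return number if low <= number <= high else None
        if parquet_type in ("float", "double"):
            return float(value)
    except (KeyError, ValueError):
        return None  # the value cannot be converted
    raise ValueError(f"unsupported Parquet type: {parquet_type}")


def build_row(log: dict, keys: List[Tuple[str, str]]) -> List[ParquetValue]:
    """Return the column values in the order of the configured (key name, type) pairs."""
    return [convert_value(log.get(name), ptype) for name, ptype in keys]


# Hypothetical key configuration and log entry, modeled on the sample data in this topic.
keys = [("__time__", "int32"), ("status", "int64"), ("seq", "double"),
        ("method", "string"), ("missing_key", "string")]
log = {"__time__": "1490803230", "status": "200", "seq": "1667821.0", "method": "POST"}
row = build_row(log, keys)
```

In this sketch, missing_key yields null because the key is absent from the log, and a value such as "abc" configured as Int64 also yields null.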

OSS storage addresses

Compression type    File extension     Example
Uncompressed        .parquet           oss://oss-shipper-shenzhen/ecs_test/2016/01/26/20/54_1453812893059571256_937.parquet
SNAPPY              .snappy.parquet    oss://oss-shipper-shenzhen/ecs_test/2016/01/26/20/54_1453812893059571256_937.snappy.parquet
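
A consumer that lists shipped objects can infer the compression type from the file extension shown in the table above. The following Python sketch assumes only the two extensions listed; the function name compression_type is hypothetical.

```python
def compression_type(oss_path: str) -> str:
    """Map the file extension of a shipped object to its compression type."""
    # Check the longer extension first, because ".snappy.parquet" also ends with ".parquet".
    if oss_path.endswith(".snappy.parquet"):
        return "SNAPPY"
    if oss_path.endswith(".parquet"):
        return "Uncompressed"
    raise ValueError(f"not a Parquet object: {oss_path}")


# Example paths taken from the table above.
plain = "oss://oss-shipper-shenzhen/ecs_test/2016/01/26/20/54_1453812893059571256_937.parquet"
snappy = "oss://oss-shipper-shenzhen/ecs_test/2016/01/26/20/54_1453812893059571256_937.snappy.parquet"
```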

Data consumption

  • You can use E-MapReduce, Spark, and Hive to consume data. For more information, see LanguageManual DDL in the Apache Hive documentation.
  • You can use inspection tools to consume data.
    You can use parquet-tools to inspect Parquet files, view their schemas, and read their data. You can compile the tool yourself or download the version provided by Log Service.
    • To view the schema of a Parquet file, run the following command:
      $ java -jar parquet-tools-1.6.0rc3-SNAPSHOT.jar schema -d 00_1490803532136470439_124353.snappy.parquet | head -n 30
      message schema {
        optional int32 __time__;
        optional binary ip;
        optional binary __source__;
        optional binary method;
        optional binary __topic__;
        optional double seq;
        optional int64 status;
        optional binary time;
        optional binary url;
        optional boolean ua;
      }
      creator: parquet-cpp version 1.0.0
      file schema: schema
      --------------------------------------------------------------------------------
      __time__: OPTIONAL INT32 R:0 D:1
      ip: OPTIONAL BINARY R:0 D:1
      ...
    • To view the content of a Parquet file, run the following command:
      $ java -jar parquet-tools-1.6.0rc3-SNAPSHOT.jar head -n 2 00_1490803532136470439_124353.snappy.parquet
      __time__ = 1490803230
      ip = 10.200.98.220
      __source__ = *.*.*.*
      method = POST
      __topic__ =
      seq = 1667821.0
      status = 200
      time = 30/Mar/2017:00:00:30 +0800
      url = /PutData?Category=YunOsAccountOpLog&AccessKeyId=*************&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=********************************* HTTP/1.1
      __time__ = 1490803230
      ip = 10.200.98.220
      __source__ = *.*.*.*
      method = POST
      __topic__ =
      seq = 1667822.0
      status = 200
      time = 30/Mar/2017:00:00:30 +0800
      url = /PutData?Category=YunOsAccountOpLog&AccessKeyId=*************&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=********************************* HTTP/1.1

    For more information, run the java -jar parquet-tools-1.6.0rc3-SNAPSHOT.jar -h command.