This topic describes how to ship data from Log Service to OSS and store the data in the Parquet format.
Configuration parameters
You can set the storage format of data that is shipped to OSS. The following figure shows how to set the storage format to Parquet. For more information, see Configure a data shipping rule.

The following table describes the configuration parameters in the preceding figure.
Parameter | Description |
---|---|
Key Name | The name of the log field that you want to ship to OSS. You can view log fields on the Raw Logs tab of a Logstore, or enter the names of the fields that you want to ship in the Key Name column. Fields are stored in the Parquet format in the order in which you enter their names, and the field names become the column names in OSS. You can ship fields in the log content as well as reserved fields such as __time__, __topic__, and __source__. For more information about reserved fields, see Reserved fields. The value of a field in the Parquet format is null in the following two scenarios: Note: The keys that you enter in the Parquet Keys field must be unique. |
Type | The Parquet storage format supports six data types: string, Boolean, Int32, Int64, float, and double. Log fields are converted from the string type to a data type that the Parquet storage format supports. If the data type of a log field fails to be converted, the value of the log field is null. |
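
The conversion rule described in the table can be sketched in Python. This is only an illustration of the documented behavior (string input, six target types, null on failure), not the code that Log Service runs. The function names `convert_field` and `convert_row`, the accepted Boolean spellings, the integer range checks, and the treatment of a missing field as null are assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple, Union

ParquetValue = Union[str, bool, int, float, None]

INT32_RANGE = (-2**31, 2**31 - 1)
INT64_RANGE = (-2**63, 2**63 - 1)


def convert_field(raw: Optional[str], parquet_type: str) -> ParquetValue:
    """Convert a string log field to the requested type; return None on failure."""
    if raw is None:
        # Assumption: a field that is absent from the log is stored as null.
        return None
    kind = parquet_type.lower()
    if kind == "string":
        return raw
    if kind == "boolean":
        # Assumption: only "true" and "false" (case-insensitive) are accepted.
        lowered = raw.strip().lower()
        if lowered == "true":
            return True
        if lowered == "false":
            return False
        return None
    try:
        if kind in ("int32", "int64"):
            low, high = INT32_RANGE if kind == "int32" else INT64_RANGE
            value = int(raw)
            # Assumption: out-of-range integers count as a failed conversion.
            return value if low <= value <= high else None
        if kind in ("float", "double"):
            return float(raw)
    except ValueError:
        # The string is not a valid number, so the value becomes null.
        return None
    raise ValueError(f"unsupported Parquet type: {parquet_type}")


def convert_row(log: Dict[str, str], columns: List[Tuple[str, str]]) -> List[ParquetValue]:
    """Return the values of one log in the order in which the columns were entered."""
    return [convert_field(log.get(key), parquet_type) for key, parquet_type in columns]
```

For example, `convert_row({"status": "200"}, [("status", "int64"), ("seq", "double")])` yields the status as an integer and a null for the missing `seq` field.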
Directories in OSS buckets
The following table lists the directories in OSS buckets that store data shipped from Log Service.
Compressed | File extension | Example | Description |
---|---|---|---|
No | .parquet | oss://oss-shipper-shenzhen/ecs_test/2016/01/26/20/54_1453812893059571256_937.parquet | After you download the file from the OSS bucket to your local server, you can use the parquet-tools utility to open the file. For more information about the parquet-tools utility, visit parquet-tools. |
Yes (compressed by using Snappy) | .snappy.parquet | oss://oss-shipper-shenzhen/ecs_test/2016/01/26/20/54_1453812893059571256_937.snappy.parquet | After you download the file from the OSS bucket to your local server, you can use the parquet-tools utility to open the file. For more information about the parquet-tools utility, visit parquet-tools. |
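
When you process a listing of shipped files, the file extension is enough to tell the two cases in the table apart. The following Python sketch classifies a path by its extension only. The helper name `describe_shipped_object` and the shape of the returned dictionary are hypothetical, and the sketch does not interpret the other parts of the path.

```python
from typing import Dict, Optional


def describe_shipped_object(object_path: str) -> Dict[str, Optional[str]]:
    """Classify a shipped file as plain or Snappy-compressed Parquet by its extension."""
    file_name = object_path.rsplit("/", 1)[-1]
    # Check the longer extension first, because it also ends with ".parquet".
    if file_name.endswith(".snappy.parquet"):
        return {"file": file_name, "compression": "snappy"}
    if file_name.endswith(".parquet"):
        return {"file": file_name, "compression": None}
    raise ValueError(f"not a Parquet file: {object_path}")
```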
Data consumption
- You can use E-MapReduce, Spark, and Hive to consume data. For more information, visit LanguageManual DDL.
- You can use inspection tools to consume data.
The parquet-tools utility can be used to inspect Parquet files, view the schema of the data stored in the files, and read the data. You can compile the utility or download the parquet-tools-1.6.0rc3-SNAPSHOT utility that Log Service provides to consume data.
- To view the schema of the data stored in a Parquet file, use the following sample code:

      $ java -jar parquet-tools-1.6.0rc3-SNAPSHOT.jar schema -d 00_1490803532136470439_124353.snappy.parquet | head -n 30
      message schema {
        optional int32 __time__;
        optional binary ip;
        optional binary __source__;
        optional binary method;
        optional binary __topic__;
        optional double seq;
        optional int64 status;
        optional binary time;
        optional binary url;
        optional boolean ua;
      }
      creator: parquet-cpp version 1.0.0
      file schema: schema
      --------------------------------------------------------------------------------
      __time__: OPTIONAL INT32 R:0 D:1
      ip: OPTIONAL BINARY R:0 D:1
      .......

- To view the data stored in a Parquet file, use the following sample code:

      $ java -jar parquet-tools-1.6.0rc3-SNAPSHOT.jar head -n 2 00_1490803532136470439_124353.snappy.parquet
      __time__ = 1490803230
      ip = 10.200.98.220
      __source__ = *.*.*.*
      method = POST
      __topic__ =
      seq = 1667821.0
      status = 200
      time = 30/Mar/2017:00:00:30 +0800
      url = /PutData?Category=YunOsAccountOpLog&AccessKeyId=*************&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=********************************* HTTP/1.1

      __time__ = 1490803230
      ip = 10.200.98.220
      __source__ = *.*.*.*
      method = POST
      __topic__ =
      seq = 1667822.0
      status = 200
      time = 30/Mar/2017:00:00:30 +0800
      url = /PutData?Category=YunOsAccountOpLog&AccessKeyId=*************&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=********************************* HTTP/1.1

For more information, run the java -jar parquet-tools-1.6.0rc3-SNAPSHOT.jar -h command.
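
If you inspect many files, you can script the two parquet-tools commands shown in this section. The following Python sketch builds the same argument lists and runs them with the standard `subprocess` module. It assumes that `java` is on the PATH and that the JAR file is in the working directory. The function names are hypothetical, and only the `schema -d` and `head -n` options that appear in this topic are used.

```python
import subprocess
from typing import List

# The JAR name used in this topic; adjust the value if your build differs.
PARQUET_TOOLS_JAR = "parquet-tools-1.6.0rc3-SNAPSHOT.jar"


def schema_command(parquet_file: str, jar: str = PARQUET_TOOLS_JAR) -> List[str]:
    """Build the command that prints the detailed schema of a Parquet file."""
    return ["java", "-jar", jar, "schema", "-d", parquet_file]


def head_command(parquet_file: str, records: int = 2, jar: str = PARQUET_TOOLS_JAR) -> List[str]:
    """Build the command that prints the first records of a Parquet file."""
    if records < 1:
        raise ValueError("records must be at least 1")
    return ["java", "-jar", jar, "head", "-n", str(records), parquet_file]


def run_parquet_tools(command: List[str]) -> str:
    """Run a parquet-tools command and return its standard output as text."""
    completed = subprocess.run(command, check=True, capture_output=True, text=True)
    return completed.stdout
```

For example, `run_parquet_tools(head_command("00_1490803532136470439_124353.snappy.parquet"))` returns the same text as the `head -n 2` command in this section, provided that the file exists locally.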