Concept and configuration information

Last Updated: Mar 27, 2018

LogHub Shipper Service for Table Store, referred to here as Shipper Service, cleans and converts your log data in Log Service and writes it to designated Table Store tables in real time. This addresses the difficulty of querying JSON formatted log data in Log Service (see Log groups for more information) and lets you query and analyze logs quickly.

Shipper Service is published to the Alibaba Cloud Container Hub as a Docker image, and runs on your ECS instances through Container Service.

Example

Assume that log data in the Log Service is formatted as follows:

  {"__time__":1453809242,"__topic__":"","__source__":"10.170.148.237","ip":"10.200.98.220","time":"26/Jan/2016:19:54:02 +0800","url":"POST /PutData?Category=YunOsAccountOpLog&AccessKeyId=U0UjpekFQOVJW45A&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=pD12XYLmGxKQ%2Bmkd6x7hAgQ7b1c%3D HTTP/1.1","status":"200","user-agent":"aliyun-sdk-java"}

The data is written into a Table Store data table with the primary keys ip and time. The format is as follows:

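| ip            | time                       | source         | status | user-agent      | url            |
|---------------|----------------------------|----------------|--------|-----------------|----------------|
| 10.200.98.220 | 26/Jan/2016:19:54:02 +0800 | 10.170.148.237 | 200    | aliyun-sdk-java | POST /PutData… |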

In Table Store, you can then query the historical data of a given IP address within a selected time period.

Shipper Service provides flexible data mapping rules, so you can configure correspondences between log data fields and attribute columns in data tables to more easily convert the data.
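The mapping in the example above can be sketched in a few lines of Python. This is only an illustration of the idea, not the service's implementation: the rename of __source__ to source is inferred from the table shown above, dropping __time__ and __topic__ is an assumption, and the long url field is left out of the sample for brevity.

```python
import json

# Log record from the example above, without the long "url" field.
raw_log = (
    '{"__time__":1453809242,"__topic__":"","__source__":"10.170.148.237",'
    '"ip":"10.200.98.220","time":"26/Jan/2016:19:54:02 +0800",'
    '"status":"200","user-agent":"aliyun-sdk-java"}'
)

PRIMARY_KEYS = ["ip", "time"]               # primary key columns of the target table
FIELD_TO_COLUMN = {"__source__": "source"}  # assumed rename, inferred from the example table
IGNORED_FIELDS = {"__time__", "__topic__"}  # assumption: not shown in the example table


def log_to_row(log):
    """Split one log record into (primary_key, attribute_columns)."""
    primary_key = {name: log[name] for name in PRIMARY_KEYS}
    attributes = {}
    for field, value in log.items():
        if field in PRIMARY_KEYS or field in IGNORED_FIELDS:
            continue
        attributes[FIELD_TO_COLUMN.get(field, field)] = value
    return primary_key, attributes


primary_key, attributes = log_to_row(json.loads(raw_log))
print(primary_key)
print(attributes)
```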

Associated products

To understand how Shipper Service works, you need to know about the following Table Store tables:

Data tables

The target table stores log data that has been cleaned and converted.

Note the following points when using data tables:

  • You must manually create the target tables. Shipper Service does not create target tables.

  • If both Log Service and Table Store are available, the latency between a log entering Log Service and being written to Table Store is on the order of hundreds of milliseconds.

  • If Table Store is unavailable, Shipper Service waits for a short period (no more than 500 ms) before retrying.

  • Shipper Service periodically records persistent breakpoints.

  • If Shipper Service is unavailable (for example, while it is being upgraded), it resumes processing logs from the most recent breakpoint after the service is restored.

  • We recommend that different logs in the same Logstore correspond to different rows in the target table. This keeps the target table eventually consistent if a retry occurs.

  • Shipper Service writes data using Table Store’s UpdateRow. Therefore, multiple Shipper Service instances can use the same target table. In this situation, we recommend that these instances do not write to the same attribute columns.
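The last two recommendations can be illustrated with a small in-memory simulation. The sketch below does not call the Table Store SDK; `update_row` is a hypothetical stand-in that only mimics update semantics (create the row if it is absent, overwrite only the columns supplied), and the second log record and the `region` column are made up for the example.

```python
# In-memory stand-in for a target table: primary key tuple -> attribute columns.
table = {}


def update_row(table, primary_key, columns):
    """Hypothetical stand-in for an update: create the row if needed, overwrite given columns."""
    table.setdefault(primary_key, {}).update(columns)


# Two logs that map to two different rows (different "time" primary key values).
batch = [
    (("10.200.98.220", "26/Jan/2016:19:54:02 +0800"), {"status": "200"}),
    (("10.200.98.220", "26/Jan/2016:19:54:03 +0800"), {"status": "404"}),
]

for primary_key, columns in batch:
    update_row(table, primary_key, columns)
snapshot = {pk: dict(cols) for pk, cols in table.items()}

# Simulate a retry after a failure: the same batch is written a second time.
for primary_key, columns in batch:
    update_row(table, primary_key, columns)
replay_changed_nothing = table == snapshot

# A second instance writing a different attribute column leaves "status" untouched.
update_row(table, batch[0][0], {"region": "hypothetical-value"})
print(replay_changed_nothing, table[batch[0][0]])
```

Because each log has its own row, replaying the batch rewrites the same values and the table ends in the same state. If two logs shared a row, the order of a replay could decide which value survives.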

Status table

Shipper Service relies on the status tables you create in Table Store to report certain information.

When using status tables, the following points must be considered:

  • Multiple Shipper Service instances can use the same status table.

  • If no errors are reported, each Shipper Service container adds a record to the status table every 5 minutes.

  • If errors are reported, but Table Store is available, each Shipper Service container immediately adds a record to the status table.

  • We recommend that you set a TTL of one or two days on the status table so that only recent data is retained.

  • A status table has four primary key columns:

    • project_logstore: String (Str) type. Your Log Service project and Logstore, separated by |.

    • shard: Integer (Int) type. The Log Service shard number.

    • target_table: Str type. The name of the target table in Table Store.

    • timestamp: Int type. The time this record was added to the status table. UNIX time, measured in milliseconds.

  • Attribute columns record the data import status for each cycle. In any row of a status table, all attribute columns are optional and may not exist. The attribute columns are:

    • shipper_id: Str type. The Shipper Service container ID, which is the name of the container host.

    • error_code: Str type. The error code defined by Table Store. If no errors are reported, this attribute column does not exist.

    • error_message: Str type. The specific error message returned by Table Store. If no errors are reported, this attribute column does not exist.

    • failed_sample: Str type. A log that caused an error, formatted as a JSON string.

    • __time__: Int type. This is the maximum value of the __time__ field among the logs this Shipper Service container has successfully written to Table Store since the last time it updated the status table.

    • row_count: Int type. This is the number of logs (rows) this Shipper Service container has successfully written to Table Store since the last time it updated the status table.

    • cu_count: Int type. This is the number of Write capacity units this Shipper Service container has consumed since the last time it updated the status table.

    • skip_count: Int type. This is the number of logs (rows) this Shipper Service container has discarded since the last time it updated the status table.

    • skip_sample: Str type. This is one of the logs the Shipper Service container discarded since the last time it updated the status table, formatted as a JSON string. The log of the container itself records each discarded log and the reason it was discarded.
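To make the layout of a status record concrete, the following sketch assembles the four primary key columns and a set of optional attribute columns. It is an illustration only: `make_status_row` is a hypothetical helper, and the project, Logstore, table, and container names are placeholders.

```python
import time


def make_status_row(project, logstore, shard, target_table, now_ms=None, **attributes):
    """Return (primary_key, attribute_columns) for one status table record."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # UNIX time in milliseconds
    primary_key = [
        ("project_logstore", f"{project}|{logstore}"),  # project and Logstore joined by |
        ("shard", shard),
        ("target_table", target_table),
        ("timestamp", now_ms),
    ]
    # Attribute columns are optional: anything without a value is simply left out.
    columns = {name: value for name, value in attributes.items() if value is not None}
    return primary_key, columns


pk, cols = make_status_row(
    "my-project", "my-logstore", 0, "my-target-table",  # placeholder names
    now_ms=1453809300000,
    shipper_id="container-host-1",                       # placeholder container host name
    __time__=1453809242,
    row_count=120,
    cu_count=120,
    skip_count=0,
    error_code=None,     # no error reported, so these columns do not exist
    error_message=None,
)
print(pk)
print(cols)
```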

Configuration

When you create a container, you must provide certain environment variables to it for Shipper Service. The following points must be considered:

  • access_key_id and access_key_secret: These are the Access Key ID and Access Key Secret of the Alibaba Cloud account used by the Shipper Service for authentication.

  • loghub: This is the configuration of the Log Service instance required by the Shipper Service, formatted as a JSON object. It includes:

    • endpoint

    • logstore

    • consumer_group

  • tablestore: This is the configuration of the Table Store instance required by the Shipper Service, formatted as a JSON object. It includes:

    • endpoint

    • instance: The instance name.

    • target_table: The name of the target table, which must already exist in this instance.

    • status_table: The name of the status table, which must already exist in this instance.

  • exclusive_columns: The attribute column blacklist, formatted as a JSON array of JSON strings.

    If this item is set, the fields listed are not written to the target table as attribute columns. For example, if the primary key of the target table is "A", exclusive_columns is set to ["B", "C"], and the log contains three fields A, B, and D, the log is written to the target table as one row with the primary key A and the attribute column D. C does not exist in the log, so there is nothing to write. B exists in the log, but it is not written because it is listed in exclusive_columns.

  • transform: The simple conversion, formatted as a JSON object. Each key in this variable is the name of a column written to the target table (it can be a primary key column). Each value is a simple conversion expression defined by Shipper Service:

    • A log field is an expression.

    • An unsigned integer is an expression.

    • A string in double quotes is an expression. The string can contain the escape sequences \" and \\.

    • ( func arg... ) is also an expression. There can be zero or more spaces or tabs before and after the parentheses. There must be at least one space between func and the first parameter, and between consecutive parameters. Each parameter must itself be an expression. The following functions are supported:

      • ->int: Converts a string to an integer. It has two parameters. The first is the base, which can be 2 to 36. The second is the string to be converted. When the base is greater than 10, the letters a to z (case insensitive) represent the values 10 to 35.

      • ->bool: Converts a string to a Bool value. It has one parameter, the string to be converted. "true" corresponds to true and "false" corresponds to false. An error is reported for any other string.

      • crc32: Calculates the CRC32 of a string and outputs the result as an Int value. It has one parameter, which is the string to be calculated.
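As a rough illustration of the settings listed above, the sketch below assembles the environment variables in Python and prints them as NAME=VALUE lines. All values are placeholders, the choice of excluded columns and transform entries is only an example, and how the variables are passed to the container depends on your Container Service setup.

```python
import json

# Placeholder values throughout; replace them with your own settings.
env = {
    "access_key_id": "<your-access-key-id>",
    "access_key_secret": "<your-access-key-secret>",
    "loghub": json.dumps({
        "endpoint": "<log-service-endpoint>",
        "logstore": "<logstore-name>",
        "consumer_group": "<consumer-group-name>",
    }),
    "tablestore": json.dumps({
        "endpoint": "<table-store-endpoint>",
        "instance": "<instance-name>",
        "target_table": "<target-table-name>",  # must already exist
        "status_table": "<status-table-name>",  # must already exist
    }),
    # Example only: do not copy __source__ as-is, because it is written as "source" below.
    "exclusive_columns": json.dumps(["__source__"]),
    # Example only: rename __source__ and store status as an integer.
    "transform": json.dumps({
        "source": "__source__",
        "status": "(->int 10 status)",
    }),
}

for name, value in env.items():
    print(f"{name}={value}")
```

Encoding the structured settings with `json.dumps` avoids hand-written quoting mistakes, which matter here because loghub, tablestore, exclusive_columns, and transform must each be valid JSON.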

If a log field used by a conversion expression is missing, or an error occurs during conversion, the column corresponding to the key is not written. If an error occurs, the container’s log records detailed error information.
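The expression rules for transform can be sketched as a small evaluator. This is a minimal illustration written from the description above, not the service's actual parser: the function names are hypothetical, and Python's `int()` is more permissive than the description (it also accepts signs, surrounding whitespace, and underscores).

```python
import re
import zlib

# Tokens: parentheses, double-quoted strings (with \" and \\ escapes), bare words.
TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\["\\])*"|[^\s()"]+')


def parse(tokens):
    """Turn a token list into a nested structure; a Python list is a function call."""
    token = tokens.pop(0)
    if token == ")":
        raise ValueError("unexpected ')'")
    if token != "(":
        return token
    items = []
    while tokens and tokens[0] != ")":
        items.append(parse(tokens))
    if not tokens:
        raise ValueError("missing ')'")
    tokens.pop(0)  # consume ')'
    return items


def evaluate(node, log):
    """Evaluate a parsed expression against one log record."""
    if isinstance(node, list):
        func, args = node[0], [evaluate(arg, log) for arg in node[1:]]
        if func == "->int":
            base, text = args
            return int(text, base)
        if func == "->bool":
            (text,) = args
            if text not in ("true", "false"):
                raise ValueError(f"not a Bool string: {text!r}")
            return text == "true"
        if func == "crc32":
            (text,) = args
            return zlib.crc32(text.encode("utf-8"))
        raise ValueError(f"unknown function: {func!r}")
    if node.startswith('"'):
        return re.sub(r'\\(["\\])', r"\1", node[1:-1])  # undo \" and \\ escapes
    if re.fullmatch(r"[0-9]+", node):
        return int(node)                                # unsigned integer
    return log[node]                                    # log field; KeyError if missing


def apply_transform(transform, log):
    """Evaluate every expression; a column whose expression fails is left out."""
    columns = {}
    for column, expression in transform.items():
        try:
            tokens = TOKEN.findall(expression)
            tree = parse(tokens)
            if tokens:
                raise ValueError("unexpected trailing tokens")
            columns[column] = evaluate(tree, log)
        except (KeyError, ValueError, TypeError, IndexError) as err:
            print(f"column {column!r} not written: {err!r}")
    return columns


log = {"__source__": "10.170.148.237", "status": "200", "user-agent": "aliyun-sdk-java"}
transform = {
    "source": "__source__",
    "status": "(->int 10 status)",
    "agent_crc": "( crc32 user-agent )",
    "broken": "(->bool status)",           # "200" is neither "true" nor "false"
    "absent": "(->int 10 no_such_field)",  # field does not exist in the log
}
columns = apply_transform(transform, log)
print(columns)
```

In this sketch the `broken` and `absent` columns are missing from the result, which mirrors the rule that a failed or unresolvable expression leaves its column out of the row.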

Shipper Service has only one cleaning rule: if a primary key column is missing, the log is discarded.
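The cleaning rule and the exclusive_columns blacklist can be combined in one short sketch. It reuses the A, B, C, D example given for exclusive_columns; the field values are made up, `build_row` is a hypothetical helper, and for simplicity it looks for primary key columns only among the raw log fields and ignores transform.

```python
def build_row(log, primary_keys, exclusive_columns):
    """Return (primary_key, attribute_columns), or None if the log must be discarded."""
    # Cleaning rule: a log without every primary key column is discarded.
    if any(name not in log for name in primary_keys):
        return None
    primary_key = {name: log[name] for name in primary_keys}
    attributes = {
        field: value
        for field, value in log.items()
        if field not in primary_keys and field not in exclusive_columns
    }
    return primary_key, attributes


# Primary key "A", blacklist ["B", "C"], log with fields A, B, and D.
kept = build_row({"A": "a-value", "B": "b-value", "D": "d-value"}, ["A"], ["B", "C"])
print(kept)

# A log without the primary key column A is discarded.
discarded = build_row({"B": "b-value", "D": "d-value"}, ["A"], ["B", "C"])
print(discarded)
```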
