Concept and configuration information

Last Updated: Sep 01, 2017

LogHub Shipper Service for Table Store (Shipper Service for short) cleans and converts log data from Log Service and writes it to designated Table Store tables in real time. It handles JSON-formatted log data (refer to Log groups for more information) so that you can quickly query and analyze logs.

Shipper Service is published to the Alibaba Cloud Container Hub as a Docker image and runs on your ECS instances through Container Service.

Example

Assume that log data in the Log Service is formatted as follows:

  {"__time__":1453809242,"__topic__":"","__source__":"10.170.148.237","ip":"10.200.98.220","time":"26/Jan/2016:19:54:02 +0800","url":"POST /PutData?Category=YunOsAccountOpLog&AccessKeyId=U0UjpekFQOVJW45A&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=pD12XYLmGxKQ%2Bmkd6x7hAgQ7b1c%3D HTTP/1.1","status":"200","user-agent":"aliyun-sdk-java"}

This log is written to a Table Store data table whose Primary Key columns are ip and time. The resulting row looks as follows:

ip            | time                       | source         | status | user-agent      | url
10.200.98.220 | 26/Jan/2016:19:54:02 +0800 | 10.170.148.237 | 200    | aliyun-sdk-java | POST /PutData…

In Table Store, historical data from a certain IP address can be searched based on a selected time period.

Shipper Service provides flexible data mapping rules, so you can configure correspondences between log data fields and attribute columns in data tables to more easily convert the data.
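The correspondence between log fields and table columns in the example above can be sketched in Python. This is only an illustration: COLUMN_MAP and to_row are hypothetical names, the url value is shortened, and the code is not the Shipper Service implementation.

```python
# Sample log from the example above; the url value is shortened for this sketch.
log = {
    "__time__": 1453809242,
    "__topic__": "",
    "__source__": "10.170.148.237",
    "ip": "10.200.98.220",
    "time": "26/Jan/2016:19:54:02 +0800",
    "url": "POST /PutData",
    "status": "200",
    "user-agent": "aliyun-sdk-java",
}

# Hypothetical mapping: target table column -> log field.
COLUMN_MAP = {
    "ip": "ip",
    "time": "time",
    "source": "__source__",
    "status": "status",
    "user-agent": "user-agent",
    "url": "url",
}
PRIMARY_KEYS = ("ip", "time")


def to_row(log, column_map, primary_keys):
    """Split a log into (primary key columns, attribute columns)."""
    # Copy only the mapped fields that are present in the log.
    columns = {col: log[field] for col, field in column_map.items() if field in log}
    # Primary key columns keep their declared order; the rest are attributes.
    primary_key = [(name, columns.pop(name)) for name in primary_keys]
    return primary_key, columns


primary_key, attributes = to_row(log, COLUMN_MAP, PRIMARY_KEYS)
print(primary_key)
print(attributes)
```

Fields that are not mapped, such as __time__ and __topic__ here, simply do not appear in the resulting row.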

Associated products

The following information helps you better understand how Shipper Service works:

Data tables

The target table stores log data that have been cleaned and converted.

When using data tables the following points should be considered:

  • You must manually create the target tables. Shipper Service does not create these tables.

  • If both Log Service and Table Store are available, the latency between a log entering Log Service and being written to Table Store is a few hundred milliseconds.

  • If Table Store is unavailable, the Shipper Service will wait for a period of time (no more than 500 ms) and try again.

  • The Shipper Service regularly records persistent breakpoints.

  • If the Shipper Service is unavailable (for example, when being upgraded), after the service is restored, it will continue to process logs from the most recent breakpoint.

  • We generally recommend that different logs in the same Logstore correspond to different rows in the target table. This maintains the eventual consistency of the target table if a retry occurs.

  • The Shipper Service writes data using Table Store's UpdateRow operation. Therefore, multiple Shipper Service instances can use the same target table. In this situation, we generally recommend that these instances do not write to the same attribute columns.
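The last two recommendations can be illustrated with a small in-memory model. FakeTable is a hypothetical stand-in for a Table Store table, and the sketch assumes that an update sets the given attribute columns and leaves the other columns of the row untouched. It does not call the Table Store SDK.

```python
class FakeTable:
    """In-memory stand-in for a target table (hypothetical, for illustration only)."""

    def __init__(self):
        self.rows = {}

    def update_row(self, primary_key, columns):
        # Assumed update semantics: create the row if needed, set the given
        # columns, and leave all other columns of that row unchanged.
        self.rows.setdefault(primary_key, {}).update(columns)


table = FakeTable()
pk = ("10.200.98.220", "26/Jan/2016:19:54:02 +0800")

# First write, then a retry of the same log: the row ends up identical.
table.update_row(pk, {"status": "200"})
table.update_row(pk, {"status": "200"})

# A second instance writing a different attribute column does not clash.
# The column name "region" and its value are made up for this sketch.
table.update_row(pk, {"region": "cn-hangzhou"})

print(table.rows)
```

Because each log maps to its own row and each instance owns its own columns, replaying a write after a failure does not change the final content of the table.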

Status table

Shipper Service relies on the status tables you create in Table Store to report certain information.

When using status tables the following points should be considered:

  • Multiple Shipper Service instances can use the same status table.

  • If no errors are reported, each Shipper Service container will add a record to the status table at intervals of 5 minutes.

  • If errors are reported, but Table Store is available, each Shipper Service container immediately adds a record to the status table.

  • We recommend setting a TTL of one or two days on the status table so that only recent data is retained.

  • A status table has four Primary Key columns:

    1. project_logstore: String (Str) type. Your Log Service project and Logstore, separated by "|".

    2. shard: Integer (Int) type. The Log Service shard number.

    3. target_table: Str type. The name of the target table in Table Store.

    4. timestamp: Int type. The time this record was added to the status table, as UNIX time in milliseconds.

  • Attribute columns record the data import status for each cycle. In any row of a status table, all attribute columns are optional and might not exist. The attribute columns include:

    • shipper_id: Str type. The Shipper Service container ID, which is the host name of the container.

    • error_code: Str type. Identifies errors using Table Store Error Codes. If no errors are reported, this attribute column will not exist.

    • error_message: Str type. Information on specific errors returned by Table Store. If there are no errors, this column will not exist.

    • failed_sample: Str type. A sample log that caused the error, formatted as a JSON string.

    • __time__: Int type. This is the maximum value of the __time__ field among the logs this Shipper Service container has successfully written to Table Store since the last time it updated the status table.

    • row_count: Int type. This is the number of logs (rows) this Shipper Service container has successfully written to Table Store since the last time it updated the status table.

    • cu_count: Int type. This is the number of Write Service Capacity Units this Shipper Service container has consumed since the last time it updated the status table.

    • skip_count: Int type. This is the number of logs (rows) this Shipper Service container has discarded since the last time it updated the status table.

    • skip_sample: Str type. This is one of the logs the Shipper Service container discarded since the last time it updated the status table, formatted as a JSON string. The log of the container itself records each discarded log and the reason it was discarded.
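The status table layout described above can be sketched as follows. The helper names status_primary_key and summarize, the project, Logstore, and table names, and the sample rows are all made up for illustration. The point is how the "|" separator and the optional attribute columns are handled.

```python
import time


def status_primary_key(project, logstore, shard, target_table, now_ms=None):
    """Build the four Primary Key columns of a status table row."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # UNIX time in milliseconds
    return [
        ("project_logstore", f"{project}|{logstore}"),
        ("shard", shard),
        ("target_table", target_table),
        ("timestamp", now_ms),
    ]


def summarize(status_rows):
    """Sum the counters and collect error codes; any attribute column may be absent."""
    totals = {"row_count": 0, "skip_count": 0, "cu_count": 0}
    errors = []
    for row in status_rows:
        for name in totals:
            totals[name] += row.get(name, 0)
        if "error_code" in row:
            errors.append(row["error_code"])
    return totals, errors


key = status_primary_key("my-project", "my-logstore", 0, "access_log", 1453809242000)

# Made-up attribute columns of two status rows: one healthy cycle, one with an error.
rows = [
    {"shipper_id": "host-1", "row_count": 120, "cu_count": 130},
    {"shipper_id": "host-1", "error_code": "OTSParameterInvalid", "skip_count": 2},
]
totals, errors = summarize(rows)
print(key)
print(totals, errors)
```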

Configuration

When you create a container, you must provide the following environment variables to it for Shipper Service:

  • access_key_id and access_key_secret: These are the Access Key ID and Access Key Secret of the Alibaba Cloud account used by the Shipper Service for authentication.

  • loghub: This is the configuration of the Log Service instance required by the Shipper Service, formatted as a JSON object. It includes:

    • endpoint

    • logstore

    • consumer_group

  • tablestore: This is the configuration of the Table Store instance required by the Shipper Service, formatted as a JSON object. It includes:

    • endpoint

    • instance: The instance name.

    • target_table: The name of the data table that must already exist under this instance.

    • status_table: The name of the status table that must already exist under this instance.

  • exclusive_columns: The attribute column blacklist, formatted as a JSON array of JSON strings.

    If this item is set, the listed fields are not written to the target table as attribute columns. For example, if the Primary Key of the target table is "A", exclusive_columns is set to ["B", "C"], and the log contains three fields (A, B, and D), the log is written to the target table as one row with the Primary Key A and the attribute column D. C does not exist in the log, so it cannot be written. B does exist, but it is not written because it is listed in exclusive_columns.

  • transform: The simple conversion, formatted as a JSON object. Each key in this object is the name of a column written to the target table (it can be a Primary Key column). Each value is a simple conversion expression defined by the Shipper Service:

    • A log field name is an expression.

    • An unsigned integer is an expression.

    • A string in double quotes is an expression. The string can contain the escape sequences '\"' and '\\'.

    • ( func arg... ) is also an expression. There can be zero or more spaces or tabs before and after the parentheses. There must be at least one space between func and the first parameter, and between parameters. Each parameter must itself be an expression. The following functions are supported:

      • ->int: Converts a string to an integer. There are two parameters. The first is the base, which can be 2 to 36. The second is the string to be converted. When the base is higher than 10, the letters a to z (case insensitive) represent the values 10 to 35.

      • ->bool: Converts a string to a Bool value. There is one parameter, the string to be converted. "true" corresponds to true and "false" corresponds to false. Any other string results in an error.

      • crc32: Calculates the crc32 for a string and outputs the result as an Int value. There is one parameter, which is the string to be calculated.
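The expression rules above can be sketched as a small Python evaluator. This is an illustrative reading of the rules, not the Shipper Service parser. tokenize, evaluate, and apply_transform are hypothetical names, zlib.crc32 is assumed to match the crc32 variant the service uses, and the tokenizer is more lenient about spaces than the rules require.

```python
import re
import zlib

# One token: "(", ")", a double-quoted string (escapes \" and \\), or a bare word.
TOKEN = re.compile(r'\s*(?:(\()|(\))|"((?:\\["\\]|[^"\\])*)"|([^\s()"]+))')


def tokenize(text):
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"bad expression at offset {pos}")
        pos = m.end()
        if m.group(1):
            tokens.append(("(", None))
        elif m.group(2):
            tokens.append((")", None))
        elif m.group(3) is not None:
            tokens.append(("str", re.sub(r'\\(["\\])', r"\1", m.group(3))))
        else:
            tokens.append(("word", m.group(4)))
    return tokens


def to_int(base, text):
    if not 2 <= base <= 36:
        raise ValueError("base must be 2 to 36")
    return int(text, base)


def to_bool(text):
    if text not in ("true", "false"):
        raise ValueError(f"not a bool: {text!r}")
    return text == "true"


FUNCS = {"->int": to_int, "->bool": to_bool,
         "crc32": lambda text: zlib.crc32(text.encode("utf-8"))}


def evaluate(tokens, log):
    """Evaluate the expression at the front of tokens; return (value, remaining tokens)."""
    (kind, text), rest = tokens[0], tokens[1:]
    if kind == "str":
        return text, rest
    if kind == "word":
        # Unsigned integer literal, otherwise a log field name.
        return (int(text), rest) if re.fullmatch(r"[0-9]+", text) else (log[text], rest)
    if kind == "(":
        func, rest, args = rest[0][1], rest[1:], []
        while rest[0][0] != ")":
            value, rest = evaluate(rest, log)
            args.append(value)
        return FUNCS[func](*args), rest[1:]
    raise ValueError("unexpected ')'")


def apply_transform(transform, log):
    row = {}
    for column, expr in transform.items():
        try:
            value, rest = evaluate(tokenize(expr), log)
            if rest:
                raise ValueError("unexpected trailing tokens")
            row[column] = value
        except (KeyError, ValueError, IndexError, TypeError) as err:
            print(f"column {column!r} not written: {err!r}")
    return row


log = {"status": "200", "ok": "true", "__source__": "10.170.148.237"}
transform = {
    "status": "(->int 10 status)",
    "ok": "( ->bool ok )",
    "source": "__source__",
    "label": r'"raw \"log\""',
    "missing": "(->int 10 no_such_field)",
}
row = apply_transform(transform, log)
print(row)
```

In this sketch a missing field or a failed conversion only drops the affected column and prints a message. The other columns of the row are still produced.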

If a referenced log field is missing or an error occurs during conversion, the column corresponding to the key is not written. If an error occurs, the container's log records detailed error information.

The Shipper Service has only one cleaning rule: if a Primary Key column is missing, the log is discarded.
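The exclusive_columns example and the cleaning rule can be sketched together in Python. The env dictionary stands in for the container's environment variables. Its endpoint, instance, and table values are placeholders invented for this sketch, and build_row is a hypothetical helper, not part of Shipper Service.

```python
import json

# Stand-in for the container environment; all values are made-up placeholders.
env = {
    "tablestore": json.dumps({
        "endpoint": "https://example-instance.example.com",
        "instance": "example-instance",
        "target_table": "access_log",
        "status_table": "shipper_status",
    }),
    "exclusive_columns": '["B", "C"]',
}

tablestore = json.loads(env["tablestore"])            # JSON object
exclusive_columns = json.loads(env["exclusive_columns"])  # JSON array of strings


def build_row(log, primary_keys, exclusive_columns):
    """Return (primary key, attribute columns), or None if the log is discarded."""
    # Cleaning rule: a log without every Primary Key column is discarded.
    if any(name not in log for name in primary_keys):
        return None
    primary_key = {name: log[name] for name in primary_keys}
    # Blacklisted fields are never written as attribute columns.
    attributes = {
        name: value
        for name, value in log.items()
        if name not in primary_keys and name not in exclusive_columns
    }
    return primary_key, attributes


written = build_row({"A": "a", "B": "b", "D": "d"}, ["A"], exclusive_columns)
discarded = build_row({"B": "b", "D": "d"}, ["A"], exclusive_columns)
print(tablestore["target_table"], written, discarded)
```

The first log yields a row with Primary Key A and attribute column D only. The second log has no A field, so nothing is written for it.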

Configuration changes

  1. Log on to the Container Service console.

  2. Select Services to go to the services page.

  3. Find the Shipper Service and click Update in the Action bar to the right.

  4. Change your desired configuration, and then click Update.
