User manual

Last Updated: Jul 28, 2017

LogHub Shipper for TableStore (Shipper Service) writes users’ data from the Log Service to their tables in Table Store after simple cleaning and conversion.

Officially provided by Table Store, this service is published as a Docker image on Alibaba Cloud Container Hub and runs on your ECS instances through Container Service.

Service introduction

In the Log Service, log data are stored in JSON format, and log groups serve as the basic units for reading and writing data. As a result, logs cannot be quickly queried or analyzed against specific criteria (for example, log data from an app over the past 12 hours).

LogHub Shipper for TableStore performs structured conversion on the log data in the Log Service and then writes the data to data tables in Table Store in real time. This provides an accurate and high-performance real-time online service.

Data example

Assume that log data in the Log Service is formatted as follows:

  {"__time__":1453809242,"__topic__":"","__source__":"10.170.148.237","ip":"10.200.98.220","time":"26/Jan/2016:19:54:02 +0800","url":"POST /PutData?Category=YunOsAccountOpLog&AccessKeyId=U0UjpekFQOVJW45A&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=pD12XYLmGxKQ%2Bmkd6x7hAgQ7b1c%3D HTTP/1.1","status":"200","user-agent":"aliyun-sdk-java"}

These data are written into a Table Store data table with the Primary Keys ip and time. The format is as follows:

ip            | time                       | source         | status | user-agent      | url
--------------+----------------------------+----------------+--------+-----------------+---------------
10.200.98.220 | 26/Jan/2016:19:54:02 +0800 | 10.170.148.237 | 200    | aliyun-sdk-java | POST /PutData…

In Table Store, it is easy to precisely search for historical data from a certain IP address based on a selected time period.
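To make the mapping concrete, here is a minimal Python sketch of how the example log above becomes the example row. This is illustrative only: the real shipper applies configurable mapping rules, and the choice to map __source__ to a source column and drop __time__/__topic__ is taken from the example, not a fixed default.

```python
# The example log record from the Log Service (URL shortened).
log = {
    "__time__": 1453809242,
    "__topic__": "",
    "__source__": "10.170.148.237",
    "ip": "10.200.98.220",
    "time": "26/Jan/2016:19:54:02 +0800",
    "url": "POST /PutData...",
    "status": "200",
    "user-agent": "aliyun-sdk-java",
}

PRIMARY_KEYS = ("ip", "time")

# Primary Key columns come straight from the corresponding log fields.
primary_key = {k: log[k] for k in PRIMARY_KEYS}

# Attribute columns: __source__ becomes "source"; the internal
# __time__ and __topic__ fields are not shipped in this example.
attributes = {
    "source": log["__source__"],
    "status": log["status"],
    "user-agent": log["user-agent"],
    "url": log["url"],
}
```

With the row keyed by (ip, time), a range query on time under a fixed ip gives exactly the "history of one IP address over a time period" lookup described above.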

The Shipper Service provides flexible data mapping rules, so you can configure correspondences between log data fields and attribute columns in data tables and easily convert the data.

Basic concepts

The Shipper Service is designed to connect two products in terms of functionality and is deployed on two other products. Therefore, to understand the Shipper Service, you must first understand its related products.

Data tables

The target table stores your log data that have been cleaned and converted.

When using data tables, you must pay attention to the following points:

  • You must manually create the target tables, as the Shipper Service does not create tables on its own.

  • If the Log Service and Table Store are both available, the latency between a log entering the Log Service and being written to Table Store is on the order of a few hundred milliseconds.

  • When Table Store is unavailable, the Shipper Service will wait for a period of time (no more than 500 ms) and try again.

  • The Shipper Service regularly records persistent breakpoints.

  • If the Shipper Service is unavailable (for example, when being upgraded), after the service is restored, it will continue to process logs from the most recent breakpoint.

  • It is generally recommended that different logs in the same Logstore correspond to different rows in the target table. This preserves the eventual consistency of the target table when retries occur.

  • The Shipper Service writes data using Table Store’s UpdateRow operation, so multiple Shipper Service instances can share the same target table. In this situation, we generally recommend that these instances not write to the same attribute columns.
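The reason several instances can share one target table is that UpdateRow merges only the columns carried by each write rather than replacing the whole row. A toy Python model of that semantics (not the actual Table Store API):

```python
# A toy model of UpdateRow semantics: each write merges its attribute
# columns into the existing row instead of replacing the whole row.
table = {}  # row key -> {column: value}

def update_row(primary_key, columns):
    row = table.setdefault(primary_key, {})
    row.update(columns)  # only the supplied columns are overwritten

# Two shipper instances writing disjoint attribute columns of the
# same row do not clobber each other, regardless of write order.
update_row(("10.200.98.220", "t1"), {"status": "200"})
update_row(("10.200.98.220", "t1"), {"url": "POST /PutData"})
```

If two instances did write the same attribute column, the last write would win, which is why the recommendation above is to keep their column sets disjoint.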

Status table

The Shipper Service relies on a status table that you create in Table Store to record its operating status.

When using status tables you must pay attention to the following points:

  • Multiple Shipper Service instances can use the same status table.

  • When no errors are reported, each Shipper Service container will add a record to the status table at intervals of 5 minutes.

  • When errors are reported (but Table Store is still available), each Shipper Service container immediately adds a record to the status table.

  • We suggest setting a TTL of a few days for the status table, so as to only retain recent data.

The status table has four Primary Key columns:

  • project_logstore: Str type. Your Log Service project and Logstore names, separated by "|".

  • shard: Int type. The Log Service shard number.

  • target_table: Str type. The name of the target table you have stored in Table Store.

  • timestamp: Int type. The time this record was added to the status table. UNIX time, measured in milliseconds.

In addition, a number of attribute columns record the data import status for each cycle. In any row of the status table, all attribute columns are optional and may be absent.

  • shipper_id: Str type. The Shipper Service container ID. At present, this is the name of the container host.

  • error_code: Str type. Identifies errors using Table Store Error Codes. If no errors are reported, this attribute column will not exist.

  • error_message: Str type. Information on specific errors returned by Table Store. If there are no errors, this column will not exist.

  • failed_sample: Str type. The error log, formatted as a JSON string.

  • __time__: Int type. This is the maximum value of the __time__ field from the logs this Shipper Service container has successfully written to Table Store since the last time it updated the status table.

  • row_count: Int type. This is the number of logs (rows) this Shipper Service container has successfully written to Table Store since the last time it updated the status table.

  • cu_count: Int type. This is the number of Write Service Capacity Units this Shipper Service container has consumed since the last time it updated the status table.

  • skip_count: Int type. This is the number of logs (rows) this Shipper Service container has cleared since the last time it updated the status table.

  • skip_sample: Str type. This is one of the logs the Shipper Service container discarded since the last time it updated the status table, formatted as a JSON string. The container’s own log records each discarded log and the reason it was discarded.
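Putting the columns together, a status record written by a healthy container might look like the following Python sketch. All values are made up for illustration; note that the error_code and error_message columns are absent when no error is reported.

```python
# An illustrative status-table record (all values are made up).
status_record = {
    # Primary Key columns
    "project_logstore": "my-project|my-logstore",  # separated by "|"
    "shard": 0,
    "target_table": "access_log",
    "timestamp": 1453809242000,  # UNIX time in milliseconds
    # Attribute columns (all optional; error_* absent when healthy)
    "shipper_id": "shipper-host-01",
    "__time__": 1453809242,
    "row_count": 12000,
    "cu_count": 24000,
    "skip_count": 3,
}
```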

Configuration

When the container is created, the following environment variables must be provided to it:

  • access_key_id and access_key_secret: These are the Access Key ID and Access Key Secret of the Alibaba Cloud account used by the Shipper Service.

  • loghub: This is the configuration of the Log Service instance required by the Shipper Service, formatted as a JSON object. Includes:

    • endpoint

    • logstore

    • consumer_group

  • tablestore: This is the configuration of the Table Store instance required by the Shipper Service, formatted as a JSON object. Includes:

    • endpoint

    • instance: The instance name.

    • target_table: The name of the data table. This table must already exist under this instance.

    • status_table: The name of the status table. This table must already exist under this instance.

  • exclusive_columns: The attribute-column blacklist, formatted as a JSON array of JSON strings.

    If this item is set, the listed fields will not be written to the target table as attribute columns. For example, if the Primary Key of the target table is "A", exclusive_columns is set to ["B", "C"], and the log contains the three fields A, B, and D, the log appears in the target table as one row with the Primary Key A and the attribute column D. C does not exist in the log, so it cannot be written; B does exist, but is not written because it is listed in exclusive_columns.

  • transform: The simple conversion, formatted as a JSON object. Each key is a column name to write to the target table (it can be a Primary Key column). Each value is a simple conversion expression defined by the Shipper Service:

    • A log field name is a valid expression.

    • An unsigned integer is a valid expression.

    • A string in double quotes is a valid expression. The string can contain the escape sequences '\"' and '\\'.

    • ( func arg... ) is also a valid expression. There can be zero or more spaces/tabs before and after the parentheses. There must be at least one space between func and the first parameter, and between parameters. Each parameter must itself be a valid expression. At present, the following functions are supported:

      • ->int: Converts a string to an integer. There are two parameters: the first is the base, which can be 2 to 36; the second is the string to be converted. For bases higher than 10, the letters a-z (case insensitive) represent the values 10 to 35.

      • ->bool: Converts a string to a Bool value. There is one parameter, the string to be converted. “true” corresponds to true and “false” corresponds to false. Errors will be reported for other types of strings.

      • crc32: Calculates the crc32 for a string and outputs the result as an Int value. There is one parameter, which is the string to be calculated.

If a field used by an expression is missing from the log, or an error occurs during conversion, the column corresponding to that key is omitted. When an error occurs, the container’s log records detailed error information.

The Shipper Service has only one cleaning rule: if any Primary Key column is missing, the log is discarded.
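The exclusive_columns, transform, and cleaning rules above can be modeled with a short Python sketch. This illustrates the documented semantics only, not the shipper’s implementation; in particular, transform expressions are shown pre-parsed as Python callables rather than as expression text.

```python
import zlib

def apply_func(name, args):
    """Evaluate one of the three supported transform functions."""
    if name == "->int":
        base, s = args
        return int(s, base)  # base may be 2 to 36
    if name == "->bool":
        (s,) = args
        if s == "true":
            return True
        if s == "false":
            return False
        raise ValueError("not a boolean string: %r" % s)
    if name == "crc32":
        (s,) = args
        return zlib.crc32(s.encode())
    raise ValueError("unknown function: %s" % name)

def build_row(log, primary_keys, exclusive_columns=(), transform=None):
    """Turn one log (a dict) into a row dict, or None if it is cleaned."""
    row = {}
    # Cleaning rule: a log missing any Primary Key column is discarded.
    for pk in primary_keys:
        if pk not in log:
            return None
        row[pk] = log[pk]
    # Remaining fields become attribute columns, except blacklisted ones.
    for field, value in log.items():
        if field in primary_keys or field in exclusive_columns:
            continue
        row[field] = value
    # Transforms: on a missing field or a conversion error, the column
    # corresponding to the key is simply omitted.
    for column, expr in (transform or {}).items():
        try:
            row[column] = expr(log)
        except Exception:
            pass
    return row

# The exclusive_columns example: PK "A", blacklist ["B", "C"], log
# fields A, B, D -> one row with Primary Key A and attribute D only.
row = build_row({"A": "a1", "B": "b1", "D": "255"},
                primary_keys=["A"], exclusive_columns=["B", "C"],
                transform={"d_int": lambda l: apply_func("->int", [16, l["D"]])})
```

Here build_row returns {"A": "a1", "D": "255", "d_int": 597}: B is blacklisted, and the added d_int column holds "255" read as a base-16 integer.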

Configuration changes

The procedure is as follows:

  1. Log on to the Container Service console.

  2. Select Services to go to the services page.

  3. Find the Shipper Service and click Update in the Action bar to the right, to go to the modification page.

  4. Click Update when you have changed the configuration.

Image upgrades

The procedure is as follows:

  1. Log on to the Container Service console.

  2. Select Applications to go to the applications page.

  3. Find the Shipper Service and click Redeploy in the operations bar to the right.

  4. Click OK.

Scale up

There are two effective methods to scale up the service. If you need to frequently scale up and down, you might consider using Auto Scaling and Resource Orchestration.

  • Adding nodes: Go to the “Clusters” page on the Container Service console and find the Shipper Service’s cluster. Click More > Expand to add Pay-As-You-Go ECS instances. Or, click More > Add existing instance to add an existing ECS instance to the cluster.

  • Adding Shipper Service Containers: Simply adding nodes to the cluster will not add a new Shipper Service container. To do so, you must change the Shipper Service configuration. The procedure is as follows:

    1. Log on to the Container Service console.

    2. Select Services to go to the services page.

    3. Find the Shipper Service and click Update in the Action bar to the right, to go to the modification page.

    4. Modify the container quantity and then click Update.

Scale down

This process is the reverse of the scale-up process.

Notice: The removed nodes are not automatically released.
