Consume metering logs to generate bills - Simple Log Service

One of the key benefits of cloud services is the pay-as-you-go billing method: you pay only for what you use and do not need to reserve resources in advance. To support this billing method, cloud services must meter usage and generate bills from it. This topic describes a metering and billing solution that is developed based on Simple Log Service and used by various cloud services. You can use this solution to process hundreds of billions of metering logs per day.

Typical scenarios of billing based on metering logs

An electric power company receives a log at an interval of 10 seconds. The log records the power consumption, peak power consumption, and average power consumption for each user ID within the 10 seconds. Then, the company generates bills for users on an hourly, daily, or monthly basis.
A carrier receives logs from a base station at an interval of 10 seconds. The logs record the services used by a mobile number within the 10 seconds. The services include Internet access, voice calls, text messages, and voice over Internet Protocol (VoIP) calls. The logs also record the traffic and duration of each service. The backend billing program calculates the fees that are generated during the interval.
A weather forecast API service charges user requests based on different factors, such as the API type, city, query type, and size of the query result.

Requirements and challenges

The metering and billing solution must meet the following basic requirements:

Accuracy and reliability: Computing results must be accurate.
Flexibility: Data can be supplemented. For example, if some data is not pushed in time, the fees can be calculated again after the data is supplemented.
Timeliness: Services can be billed in seconds. If an account has overdue payments, services can be immediately stopped.

Additional requirements:

Bill correction: If real-time billing fails, bills can be generated based on metering logs.
Details query: You can view the consumption details.

Two major challenges:

Increase in data size: The data size continues to increase as the number of users and calls continues to grow. The challenge is to keep the architecture scalable as the data size grows.
Fault tolerance: The billing program may have bugs. The challenge is to keep the metering data independent of the billing program.

The metering and billing solution described in this topic is developed by Alibaba Cloud based on Simple Log Service. This solution has been in stable operation for years without any error or latency issues.

System architecture

The following list describes how LogHub of Simple Log Service works for the metering and billing solution:

Collects metering logs in real time and writes metering logs to the metering program. LogHub supports more than 50 methods to collect and write metering logs.
Allows the metering program to consume LogHub data incrementally at regular intervals. Then, the metering program can compute data in the memory to generate billing data.
(Optional) Creates indexes for metering logs to support detail queries.
(Optional) Ships metering logs to Object Storage Service (OSS) for offline storage. This way, you can check accounts and collect statistics on a T+1 basis.

The following list describes the internal logic of the metering program:

Calls the GetCursor operation to obtain the cursors that mark the start and end of a specified time period in LogHub, such as 10:00 to 11:00.
Calls the PullLogs operation to consume data in the specified time period.
Collects statistics and computes data in the memory, and generates billing data.
You can specify a time period that meets your business requirements. For example, you can specify a time period of 1 minute or 10 seconds.
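
The three steps above can be sketched as follows. This is a minimal illustration, not the actual metering program: `InMemoryShard` is a hypothetical in-memory stand-in for one LogHub shard, and its `get_cursor` and `pull_logs` methods only mimic the roles of the GetCursor and PullLogs operations (here a cursor is simply a list index). The log fields and timestamps are also assumptions made for the example.

```python
from bisect import bisect_left
from collections import defaultdict


class InMemoryShard:
    """Hypothetical stand-in for one LogHub shard; a cursor is a list index."""

    def __init__(self, logs):
        # Logs are (receive_time, user_id, in_flow) tuples, kept sorted by time.
        self.logs = sorted(logs)
        self.times = [log[0] for log in self.logs]

    def get_cursor(self, timestamp):
        # Position of the first log received at or after `timestamp`.
        return bisect_left(self.times, timestamp)

    def pull_logs(self, cursor, end_cursor, count):
        # Return at most `count` logs in [cursor, end_cursor) and the next cursor.
        stop = min(cursor + count, end_cursor)
        return self.logs[cursor:stop], stop


def meter_window(shard, start_time, end_time, batch_size=2):
    """Aggregate inbound traffic per user for logs in [start_time, end_time)."""
    cursor = shard.get_cursor(start_time)
    end_cursor = shard.get_cursor(end_time)
    usage = defaultdict(int)
    while cursor < end_cursor:
        batch, cursor = shard.pull_logs(cursor, end_cursor, batch_size)
        for _, user_id, in_flow in batch:
            usage[user_id] += in_flow
    return dict(usage)


# Times are seconds since midnight: 36000 is 10:00 and 39600 is 11:00.
shard = InMemoryShard([
    (36000, "u1", 100),
    (36010, "u2", 50),
    (37000, "u1", 25),
    (39600, "u1", 999),  # received at 11:00, so it belongs to the next window
])
hourly_usage = meter_window(shard, start_time=36000, end_time=39600)
print(hourly_usage)
```

Because the window is defined only by its two cursors, the same function can be rerun for any earlier period when data needs to be recomputed.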

The following list analyzes the performance of the metering and billing solution:

The analysis is based on the following conditions: one billion metering logs are generated on a daily basis, the size of each log is 200 bytes, and the total data size per day is 200 GB.
By default, SDKs or agents support data compression, at a compression ratio of at least 5:1. In this case, metering logs of 40 GB need to be stored per day, and metering logs of 1.6 GB need to be stored per hour.
LogHub reads a maximum of 1,000 packets at a time. The maximum size of the packet is 5 MB. On a 1-Gigabit network, the metering logs of 1.6 GB can be read within two seconds.
The metering logs of 1.6 GB that are generated within one hour can be consumed to generate the billing data within five seconds. This includes the time that is used to collect statistics and compute data in the memory.
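
The data volumes in the first two items can be reproduced with a few lines of arithmetic. The sketch below uses only the figures stated above (one billion logs per day, 200 bytes per log, and a 5:1 compression ratio) and assumes decimal units (1 GB = 10^9 bytes). The hourly figure comes out at about 1.67 GB, which the text rounds to 1.6 GB.

```python
LOGS_PER_DAY = 1_000_000_000   # one billion metering logs per day
BYTES_PER_LOG = 200            # size of each log
COMPRESSION_RATIO = 5          # compression ratio of at least 5:1
GB = 10**9                     # decimal gigabyte (assumption)

raw_gb_per_day = LOGS_PER_DAY * BYTES_PER_LOG / GB
compressed_gb_per_day = raw_gb_per_day / COMPRESSION_RATIO
compressed_gb_per_hour = compressed_gb_per_day / 24

print(f"raw per day:         {raw_gb_per_day:.0f} GB")
print(f"compressed per day:  {compressed_gb_per_day:.0f} GB")
print(f"compressed per hour: {compressed_gb_per_hour:.2f} GB")
```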

Bills generated based on metering logs

Metering logs record billable items. The backend billing program computes billable items based on specific rules to generate billing data. For example, the following raw access log records the usage of a project:

microtime:1457517269818107 Method:PostLogStoreLogs Status:200 Source:203.0.113.10 ClientIP:198.51.100.10 Latency:1968 InFlow:1409 NetFlow:474 OutFlow:0 UserId:44 AliUid:1264425845****** ProjectName:app-myapplication ProjectId:573 LogStore:perf UserAgent:ali-sls-logtail APIVersion:0.5.0 RequestId:56DFF2D58B3D939D691323C7

The metering program reads the raw log and generates usage data from various dimensions based on specific rules. The following figure shows the generated usage data, which includes the inbound traffic, number of times that data is used, and outbound traffic.
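
As a rough illustration of this step, the following sketch parses the raw access log shown above and accumulates usage per account and project. The choice of dimensions (`AliUid` and `ProjectName`) and of counters (inbound traffic, outbound traffic, and request count) is an assumption made for the example; the actual rules depend on the billing program.

```python
from collections import defaultdict

RAW_LOG = (
    "microtime:1457517269818107 Method:PostLogStoreLogs Status:200 "
    "Source:203.0.113.10 ClientIP:198.51.100.10 Latency:1968 InFlow:1409 "
    "NetFlow:474 OutFlow:0 UserId:44 AliUid:1264425845****** "
    "ProjectName:app-myapplication ProjectId:573 LogStore:perf "
    "UserAgent:ali-sls-logtail APIVersion:0.5.0 "
    "RequestId:56DFF2D58B3D939D691323C7"
)


def parse_log(line):
    """Split a space-separated line of key:value tokens into a dict."""
    # Split on the first colon only, so values may themselves contain colons.
    return dict(token.split(":", 1) for token in line.split())


def accumulate(usage, record):
    """Add one parsed log to the per-(account, project) usage counters."""
    key = (record["AliUid"], record["ProjectName"])
    usage[key]["inflow"] += int(record["InFlow"])
    usage[key]["outflow"] += int(record["OutFlow"])
    usage[key]["requests"] += 1


usage = defaultdict(lambda: {"inflow": 0, "outflow": 0, "requests": 0})
record = parse_log(RAW_LOG)
accumulate(usage, record)

for (ali_uid, project), counters in usage.items():
    print(ali_uid, project, counters)
```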

Solution to scenarios with a large data size

In some billing scenarios, such as billing of carriers or IoT, a large number of metering logs are generated. For example, 10 trillion logs are generated per day, and the total data size is 2 PB per day. After compression, the size of data that is generated within one hour is 16 TB. On a 10-GE network, it takes 1,600 seconds to read all data that is generated within one hour. This performance is not suitable for the billing requirements of this scenario.

Limit the size of generated metering data
Modify the program, such as NGINX, that generates metering logs. The program aggregates metering logs in the memory first and dumps the aggregated metering logs every minute. This way, the data size depends on the number of users rather than on the number of requests. For example, NGINX serves 1,000 users and writes one 200-byte aggregated log per user per minute. The size of metering logs that are generated within one hour is calculated by using the following formula: 1,000 × 200 bytes × 60 = 12,000,000 bytes, which is about 12 MB. After the metering data is compressed at a ratio of 5:1, the data size is only about 2.4 MB.
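
A minimal sketch of this kind of in-memory aggregation is shown below. It is not NGINX code: `MinuteAggregator`, its field names, and the `emit` callback are hypothetical names introduced for illustration. The point is that each one-minute window produces one record per active user, no matter how many requests that user made.

```python
from collections import defaultdict


class MinuteAggregator:
    """Accumulates per-user usage in memory and emits one record per window."""

    def __init__(self, emit, flush_interval=60):
        self.emit = emit                      # callback that writes a metering log
        self.flush_interval = flush_interval  # window length in seconds
        self.window_start = None
        self.totals = defaultdict(lambda: {"requests": 0, "in_flow": 0})

    def _align(self, timestamp):
        # Start of the window that contains `timestamp`.
        return timestamp - timestamp % self.flush_interval

    def record(self, timestamp, user_id, in_flow):
        if self.window_start is None:
            self.window_start = self._align(timestamp)
        # A request from a later window closes the current one first.
        if timestamp >= self.window_start + self.flush_interval:
            self.flush()
            self.window_start = self._align(timestamp)
        entry = self.totals[user_id]
        entry["requests"] += 1
        entry["in_flow"] += in_flow

    def flush(self):
        for user_id, entry in sorted(self.totals.items()):
            self.emit({"window_start": self.window_start,
                       "user_id": user_id, **entry})
        self.totals.clear()


emitted = []
aggregator = MinuteAggregator(emit=emitted.append)
for ts, user, size in [(0, "u1", 100), (10, "u1", 50), (20, "u2", 70), (61, "u1", 30)]:
    aggregator.record(ts, user, size)
aggregator.flush()  # dump whatever is left in the last window

for log in emitted:
    print(log)
```

In this run, four requests collapse into three aggregated metering logs. A production version would also flush on a timer so that an idle window is still dumped on schedule.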
Parallelize metering log processing
In LogHub, each Logstore can contain several shards. For example, a Logstore contains three shards, and three metering programs are assigned, one for each shard. To ensure that the metering data of a user is always processed by the same metering program, use a hash function to map users to shards by user ID. For example, the metering data of users in the Xihu District of Hangzhou is written to Shard 1, and the metering data of users in the Shangcheng District of Hangzhou is written to Shard 2. In this case, the backend metering programs can process the metering data of different shards in parallel.
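
One way to map user IDs to shards is sketched below. The use of MD5 and a modulo over the shard count is an assumption chosen for illustration, and `shard_for_user` is a hypothetical helper, not a LogHub API. What matters is that the hash is stable across processes and restarts, so a given user always lands on the same shard.

```python
import hashlib
from collections import defaultdict


def shard_for_user(user_id, shard_count):
    """Map a user ID to a shard index in [0, shard_count) with a stable hash."""
    # hashlib is used instead of the built-in hash(), which is randomized
    # per process for strings and would break the stable mapping.
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


SHARD_COUNT = 3
logs = [("user-a", 100), ("user-b", 50), ("user-a", 25), ("user-c", 70)]

# Group metering logs by target shard; each group can be handled by its
# own metering program in parallel.
by_shard = defaultdict(list)
for user_id, in_flow in logs:
    by_shard[shard_for_user(user_id, SHARD_COUNT)].append((user_id, in_flow))

for shard_id in sorted(by_shard):
    print(shard_id, by_shard[shard_id])
```

Note that a plain modulo remaps most users when the shard count changes, so any per-user state held by a metering program would have to be rebuilt after resharding.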

FAQ

How do I supplement data?
In LogHub, you can set the lifecycle of each Logstore in the range of 1 to 365 days. If the billing program needs to consume data again, the billing program can compute data by period within the lifecycle of a Logstore.
What do I do if metering logs are scattered across multiple servers?
1. Use Logtail to collect the logs in real time from each server.
2. Use the custom IDs of servers to define a dynamic machine group for auto scaling.
How do I query consumption details?
You can create indexes for LogHub data to support real-time query and statistical analysis. The following statement is an example of a search statement:
```
Inflow>300000 and Method=Post* and Status in [200 300]
```
You can also add an analytic statement after the search statement.
```
Inflow>300000 and Method=Post* and Status in [200 300] | select max(Inflow) as s, ProjectName group by ProjectName order by s desc
```
How do I store logs and check accounts on a T+1 basis?
Simple Log Service can ship LogHub data to other systems. You can customize partitions and storage formats to store logs in OSS. Then, you can use E-MapReduce, Hadoop, Hive, Presto, or Spark to compute log data.