The biggest advantage of cloud services is the pay-as-you-go model, which removes the need to reserve resources in advance. As a consequence, every cloud product needs billing. This document describes a billing method based on Log Service that processes hundreds of billions of logs every day and is used by many cloud products.
The metering log records billing-related items. The backend billing module calculates the results based on the billing items and rules, and generates the final bill. For example, the following original access log records the use of a project:
microtime:1457517269818107 Method:PostLogStoreLogs Status:200 Source:10.145.6.81 ClientIP:22.214.171.124 Latency:1968 InFlow:1409 NetFlow:474 OutFlow:0 UserId:44 AliUid:1264425845278179 ProjectName:app-myapplication ProjectId:573 LogStore:perf UserAgent:ali-sls-logtail APIVersion:0.5.0 RequestId:56DFF2D58B3D939D691323C7
The billing program reads the original logs and, following the billing rules, generates usage data (such as traffic, call count, and outbound traffic) along each dimension. Similar metering and billing scenarios exist in other industries:
- For electric power companies: Every ten seconds, a log recording each user ID's power consumption, peak value, and average value over that interval is generated and sent to the company. Users are then billed hourly, daily, and monthly.
- For Internet service providers: Every ten seconds, the base station reports each user's activities (Internet access, phone calls, SMS messages, and VoIP), consumed traffic, duration, and other information, and the backend billing service calculates the fees incurred during that period.
- For weather forecast API services: User requests are billed based on the type of API called, the user's city, the query type, and the result size.
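The first step in any of these scenarios is turning a raw metering log into billing items. A minimal sketch of such a parser, using the field names from the sample access log above (the parsing logic itself is illustrative, not the actual billing program):

```python
def parse_metering_log(line):
    """Parse a space-separated key:value metering log line into a dict."""
    fields = {}
    for token in line.split():
        key, _, value = token.partition(":")
        fields[key] = value
    return fields

# Abbreviated version of the sample access log shown earlier.
sample = ("microtime:1457517269818107 Method:PostLogStoreLogs Status:200 "
          "Source:10.145.6.81 InFlow:1409 OutFlow:0 UserId:44 "
          "ProjectName:app-myapplication LogStore:perf")

log = parse_metering_log(sample)

# Billing items derived from one raw log: inbound traffic, outbound
# traffic, and one API call, attributed to the project.
usage = {
    "project": log["ProjectName"],
    "inflow_bytes": int(log["InFlow"]),
    "outflow_bytes": int(log["OutFlow"]),
    "call_count": 1,
}
```

The backend billing module would then apply per-product pricing rules to records of this shape.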
The billing system has the following requirements:
- Accurate and reliable: The billing result must be precise.
- Flexible: The system supports backfilling. For example, when some data was not pushed in time, it can be corrected by recalculation.
- Real-time: The system supports billing at second-level granularity, for example to quickly suspend service for outstanding payments.
- Bill correction: When real-time billing produces errors, the bill can be corrected during reconciliation.
- Details query: Users can view their own consumption details.
The system also faces two challenges:
- Increasing data volume: As users and calls grow, the data volume expands, and keeping the system architecture automatically scalable becomes a challenge.
- Error tolerance: The billing program may contain bugs, and guaranteeing that the metering data stays isolated from the billing program becomes another challenge.
The billing method described here, developed by Alibaba Cloud on top of Log Service, has been running online reliably for several years without miscalculation or latency issues, and serves as a reference for unit pricing.
The following uses the LogHub feature of Alibaba Cloud Log Service as an example:
Use LogHub to collect metering logs in real time and connect them to the metering program: LogHub supports more than 30 APIs and access methods, making metering logs easy to collect.
The metering program regularly consumes the incremental data in LogHub, calculates the results, and generates billing data in the memory.
(Optional) Index queries on the metering logs can be configured for detailed data queries.
(Optional) The metering logs can be shipped to OSS and MaxCompute for offline storage, T+1 reconciliation, and statistics.
Internal structure of the real-time metering program:
Select the logs within a period (for example, 10:00 to 11:00) using the GetCursor feature of the LogHub read API.
Consume the data of this period using the PullLogs API.
Aggregate and calculate the data in memory and generate the billing data.
The same logic applies when the billing period is shortened to one minute or ten seconds.
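The three steps above can be sketched as follows. `FakeLogHub` is an in-memory stand-in for a LogHub shard; its `get_cursor` and `pull_logs` methods mirror the roles of the GetCursor and PullLogs APIs, but all names and signatures here are illustrative, not the actual SDK:

```python
import bisect
from collections import defaultdict

class FakeLogHub:
    """In-memory stand-in for one LogHub shard: logs sorted by timestamp."""
    def __init__(self, logs):
        # logs: list of (timestamp, user_id, inflow_bytes)
        self.logs = sorted(logs)

    def get_cursor(self, ts):
        # cf. GetCursor: map a timestamp to a read position
        return bisect.bisect_left(self.logs, (ts,))

    def pull_logs(self, cursor, count):
        # cf. PullLogs: read a batch of logs starting at a cursor
        batch = self.logs[cursor:cursor + count]
        return batch, cursor + len(batch)

def bill_period(hub, start_ts, end_ts, batch=100):
    """Consume logs in [start_ts, end_ts) and aggregate usage per user in memory."""
    cursor = hub.get_cursor(start_ts)
    end_cursor = hub.get_cursor(end_ts)
    usage = defaultdict(int)
    while cursor < end_cursor:
        batch_logs, cursor = hub.pull_logs(cursor, min(batch, end_cursor - cursor))
        for ts, user, inflow in batch_logs:
            usage[user] += inflow          # accumulate billing data in memory
    return dict(usage)
```

Shortening the billing period to one minute or ten seconds only changes the `start_ts`/`end_ts` arguments, not the consumption logic.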
- Assume that one billion metering logs are introduced per day with each log containing 200 bytes, and the total data volume is 200 GB.
- The LogHub SDK and Agent compress data by default, so the volume actually stored is 40 GB (a compression ratio of at least 5x is typical), and the hourly volume is 40/24 ≈ 1.7 GB.
- The LogHub read API can fetch up to 1,000 packs per call (each limited to 5 MB), so the full hourly data can be read within two seconds over a Gigabit network.
- The metering logs for a full hour can therefore be read and processed within five seconds, including the time for accumulation and calculation in memory.
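The sizing above is simple arithmetic and can be checked directly (the 5x compression ratio is the assumption stated above):

```python
logs_per_day = 1_000_000_000          # one billion metering logs per day
bytes_per_log = 200

raw_gb = logs_per_day * bytes_per_log / 1e9   # 200 GB per day, uncompressed
compressed_gb = raw_gb / 5                    # 40 GB at a 5x compression ratio
hourly_gb = compressed_gb / 24                # ~1.7 GB of compressed data per hour
```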
In some billing scenarios (such as ISP and IoT scenarios), the volume of metering logs is extremely large: 10 trillion metering logs amount to 2 PB of data per day, or about 16 TB of compressed data to read per hour. That takes 1,600 seconds even on a 10-Gigabit network, so quick bill generation cannot be implemented this way.
To solve this, modify the metering log generation program (for example, Nginx) to aggregate in memory and dump the aggregated metering results once per minute. The data volume then depends only on the total number of users: assuming 1,000,000 active users during the period, the hourly data volume is 1,000,000 × 200 × 60 = 12 GB (2.4 GB after 5x compression).
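The producer-side aggregation described above can be sketched as a per-user accumulator that flushes at each minute boundary (a minimal sketch; in Nginx this logic would live inside the request-handling module, and the field names are illustrative):

```python
from collections import defaultdict

class MinuteAggregator:
    """Accumulate per-user usage in memory; dump one record per user per minute."""
    def __init__(self):
        self.window = None                 # the minute currently being accumulated
        self.usage = defaultdict(int)

    def record(self, ts, user_id, inflow):
        """Add one request's usage; return flushed records when a minute ends."""
        minute = ts // 60
        flushed = None
        if self.window is not None and minute != self.window:
            flushed = self.flush()         # minute boundary crossed: dump aggregates
        self.window = minute
        self.usage[user_id] += inflow
        return flushed

    def flush(self):
        """Emit one aggregated metering record per user, then reset."""
        out = [{"minute": self.window, "user": u, "inflow": v}
               for u, v in self.usage.items()]
        self.usage.clear()
        return out
```

Instead of one log per request, LogHub then receives one aggregated record per user per minute, which is what makes the volume proportional to the user count.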
Each Logstore in LogHub can be assigned a different number of shards; in this case, three shards and three metering consumer programs are used. To guarantee that a single user's metering data is always processed by the same consumer program, the user ID can be hashed to a fixed shard. For example, data for users in the West Lake District of Hangzhou can be hashed to Shard 1 while data for the Shangcheng District is hashed to Shard 2. This allows the backend metering programs to scale out horizontally.
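Routing a user's logs to a fixed shard only needs a stable hash of the user ID. A sketch (LogHub's actual routing uses a partition-key hash on write; `md5` here is just one stable, machine-independent choice):

```python
import hashlib

def shard_for_user(user_id, shard_count):
    """Map a user ID to a constant shard so one consumer sees all its logs."""
    # md5 is stable across processes and machines, unlike Python's built-in hash().
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count
```

Because the mapping is deterministic, adding more consumers only requires assigning each consumer a disjoint set of shards.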
Each Logstore in LogHub can be configured with a lifecycle (1 to 365 days). If the billing program needs to consume the data again, it can recalculate over any time period within that lifecycle.
- Collect the logs in real time with the Logtail agent.
- Define a dynamic machine group with machine identification to support auto scaling.
Once indexing is enabled for the data in LogHub, real-time query and analysis become available. For example, the following query finds Post requests with inflow above 300,000 and a status between 200 and 300:
Inflow>300000 and Method=Post* and Status in [200 300]
You can also perform statistical analysis on top of the query:
Inflow>300000 and Method=Post* and Status in [200 300] | select max(Inflow) as s, ProjectName group by ProjectName order by s desc
The data shipping feature of Log Service supports storing LogHub data in OSS or MaxCompute with custom partitions and custom storage formats; the shipped data can then be analyzed with E-MapReduce, MaxCompute, HybridDB, Hadoop, Hive, Presto, or Spark.