An assumption during log processing is that data is not perfect. Due to the gap between the raw data and the final results, the raw data must be cleansed, converted, and sorted using methods like ETL (Extract Transformation Load) to get the final results.
“I Want Take-away” is an e-commerce website with a platform involving users, restaurants, and couriers. Users can place their take-away orders using the website, App, WeChat, or Alipay. Once order is received, a merchant starts preparing the ordered dish item, and the nearest couriers are auto-notified. Then, one of the couriers accepts the order and delivers it to the buyer.
The operation team has two jobs:
- Determine couriers’ locations and assign orders location.
- Understand how coupons and cash are used and distribute coupons locations as part of interactive operations.
GPS data (X and Y) is reported once a minute by the courier’s app in the following format:
2016-06-26 19:00:15 ID:10015 DeviceID:EXX12345678 Network:4G GPS-X:10.30.339 GPS-Y:126.96.36.199 Status:Delivering
The data feed records the reporting time, courier ID, network in use, device serial number, and coordinates (GPS-X and GPS-Y). The longitude and latitude given by GPS are accurate, but the operation team actually does not need such accurate data to understand the current status statistics for each area. Therefore, it is necessary to transform the raw data and convert the coordinates into readable fields like city, region, ZIP code.
(GPS-X,GPS-Y) --> (GPS-X, GPS-Y, City, District, ZipCode)
This is a typical ETL requirement. Use the LogHub function to create two LogStores (PositionLog), and the transformed LogStore (EnhancedLog). Run the ETL application (for example, Spark Streaming, Storm, or Consumer Library enabled in a container) to subscribe to the real-time PositionLog, convert the coordinates, and then write the EnhancedLog. Carry out real-time computing operations for visualization, or create an index to query the EnhancedLog data model warehouse.
We recommend that you use the following architecture for the entire process:
- Each courier’s location is shown on the app and their GPS locations are reported every minute and written into the first LogStore (PositionLog).
- We recommend that you use LogHub Android/IOS SDK and MAN (mobile analytics) for accessing the mobile device log.
- Use the real-time application to subscribe to the real-time PositionLog data, and write the processed data into EnhancedLog LogStore.
- Process the enhanced log, for example, visualization of computed log data. Recommendations:
- LogHub accessible to StreamCompute
- LogShipper posts (OSS, E-MapReduce, Table Store, and MaxCompute)
- LogSearch: order query etc.
The Payment Service receives a payment request which includes the payment account, payment method, amount, and coupon.
- Part of the sensitive information must be desensitized.
- Two types of information, coupon and cash, must be stripped from the payment information.
The entire process is as follows:
Create four LogStores for raw data (PayRecords), desensitized data (Cleaned), cash order (Cash), and coupon (Coupon) respectively. The application uses Log4J Appender to write the order data into the raw data LogStore (PayRecords).
The desensitization application consumes the PayRecords LogStore in real time and writes the desensitized data into the Cleaned LogStore after stripping off the account-related information.
The traffic delivery application consumes the Cleaned LogStore in real time, and according to the business logics saves the two types of information, coupon and cash, into the corresponding LogStore (Cash and Coupon) for subsequent processing.