Log Service allows you to collect and consume data from more than 30 data sources in real time by using LogHub.

In most cases, data is collected by using the following two methods. This topic describes how to use the streaming import feature of LogHub to collect data in real time.

Method | Advantage | Disadvantage | Example
Batch import | High throughput and applicable to historical data | Non-real-time | Import data by using FTP, upload data from OSS, ship hard drives, and export SQL-based data to LogHub
Streaming import | What you see is what you get (WYSIWYG) and applicable to real-time data | High requirements on collectors | Upload data by using HTTP, collect data by using LogHub, and collect data from IoT Platform and Message Queue

Scenario

"I Want Takeaway" (IWT) is an e-commerce website platform that involves users, restaurants, and delivery people. A user can order food from a restaurant on this platform by using the website, mobile app, WeChat, or Alipay. After the restaurant confirms the order, the order is automatically sent to the nearest delivery person. Then, the delivery person confirms the order and delivers the food to the user.

Operational requirements

The following issues are identified in the operational process:

  • User acquisition: The restaurant has invested lots of money in advertisements in different marketing channels, such as webpages and WeChat. These advertisements have gained some users for IWT. However, the advertising effect of the marketing channels is difficult to evaluate.
  • User satisfaction: Users often complain about slow deliveries. However, IWT cannot identify the bottlenecks in the delivery process. Therefore, IWT cannot optimize the process to eliminate the bottlenecks.
  • User engagement: IWT often organizes promotional activities, but these events do not yield the expected results.
  • Scheduling: IWT wants to help restaurants prepare food for peak hours and dispatch more delivery people to specific areas.
  • Customer service: IWT wants to analyze the causes for failed orders. For example, IWT wants to check whether the failed order is caused by user operations or system errors.

Data collection challenges

When IWT collects scattered data and stores the data in a centralized manner, IWT may encounter the following challenges:

  • Multiple channels: advertisers and street promotions (leaflets).
  • Multiple terminals: webpages, official social media accounts, mobile phones, desktop browsers, and mobile browsers.
  • Heterogeneous networks: Virtual Private Cloud (VPC), self-managed data center, and Elastic Compute Service (ECS).
  • Various development languages: Java for the core system, NGINX for frontend servers, and C++ for the backend payment system.
  • Multiple devices: x86 device and ARM-based device.

To collect and analyze the scattered data and store the data in a centralized manner, IWT can use LogHub instead of traditional solutions. This significantly reduces the collection workload.

Manage data in a centralized manner

  1. Create a project to manage log data. For example, create a project named myorder.
  2. Create a Logstore to store logs collected from each data source.
    • Create a Logstore named wechat-server to store access logs of WeChat servers.
    • Create a Logstore named wechat-app to store app logs of WeChat servers.
    • Create a Logstore named wechat-error to store WeChat error logs.
    • Create a Logstore named alipay-server to store access logs of Alipay servers.
    • Create a Logstore named alipay-app to store app logs of Alipay servers.
    • Create a Logstore named deliver-app to store logs related to the delivery status.
    • Create a Logstore named deliver-error to store delivery error logs.
    • Create a Logstore named web-click to store logs generated by using HTML5 web pages.
    • Create a Logstore named server-access to store server-side access logs.
    • Create a Logstore named server-app to store application logs.
    • Create a Logstore named coupon to store application coupon logs.
    • Create a Logstore named pay to store payment logs.
    • Create a Logstore named order to store order logs.
  3. To cleanse raw data and perform extract, transform, load (ETL) operations, you can create other Logstores to store intermediate results.
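
The project and Logstore layout above can be scripted rather than created by hand. The following Python sketch assumes a client object that exposes create_project and create_logstore methods. The Log Service SDK for Python is assumed to offer comparable calls, so check its reference before use. RecordingClient is a hypothetical stand-in that lets the sketch run without credentials.

```python
# Logstore names taken from the list above.
LOGSTORES = [
    "wechat-server", "wechat-app", "wechat-error",
    "alipay-server", "alipay-app",
    "deliver-app", "deliver-error",
    "web-click", "server-access", "server-app",
    "coupon", "pay", "order",
]


def provision(client, project, logstores, description=""):
    """Create one project, then one Logstore per data source."""
    client.create_project(project, description)
    for name in logstores:
        client.create_logstore(project, name)


class RecordingClient:
    """Hypothetical stand-in for a real SDK client; it only records calls."""

    def __init__(self):
        self.calls = []

    def create_project(self, project_name, project_des):
        self.calls.append(("project", project_name))

    def create_logstore(self, project_name, logstore_name):
        self.calls.append(("logstore", project_name, logstore_name))


if __name__ == "__main__":
    client = RecordingClient()
    provision(client, "myorder", LOGSTORES)
    for call in client.calls:
        print(call)
```

Keeping the Logstore names in one list makes it easy to add Logstores for intermediate ETL results later.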

Collect promotion logs

IWT can use the following two methods to acquire new users:

  • Offer coupons when users register on the website.
  • Offer coupons on the following channels:
    • QR codes on leaflets
    • QR codes on webpages

Implementation

IWT can use the following server address and generate QR codes for leaflets and webpages. If a user scans the QR code to register, IWT can identify the source channel and record the channel information in logs.

http://examplewebsite/login?source=10012&ref=kd4b
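
As a minimal sketch of this approach, the following Python code builds a registration URL for a given channel and reads the channel information back on the server side. The function names are hypothetical, and the host examplewebsite is the placeholder used in the sample URL.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://examplewebsite/login"  # placeholder host from the sample URL


def registration_url(source, ref=""):
    """Build the URL that is encoded into the QR code for one channel."""
    return BASE_URL + "?" + urlencode({"source": source, "ref": ref})


def channel_from_url(url):
    """Extract (source, ref) from a registration URL on the server side."""
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return params.get("source", [""])[0], params.get("ref", [""])[0]


if __name__ == "__main__":
    url = registration_url(10012, "kd4b")
    print(url)
    print(channel_from_url(url))
```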
            

If the server receives a request, the following log is generated:

2016-06-20 19:00:00 e41234ab342ef034,102345,5k4d,467890
            

The log contains the following fields:

  • time: the time when the user registers on the website.
  • session: the current session of the browser. The session is used to track user behavior.
  • source: the source channel. For example, Promotion A is labeled as 10001, leaflets are labeled as 10002, and elevator advertisements are labeled as 10003.
  • ref: the referrer account. This field is empty if no referrer exists.
  • params: other parameters.
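
The following Python sketch shows one way to split such a log line into the fields listed above. It assumes that the timestamp is followed by a space and that the remaining fields are comma-separated in the order session, source, ref, params. The function name is hypothetical.

```python
from datetime import datetime

# Assumed order of the comma-separated fields after the timestamp.
FIELDS = ("session", "source", "ref", "params")


def parse_promotion_log(line):
    """Parse 'YYYY-MM-DD HH:MM:SS session,source,ref,params' into a dict."""
    date_part, time_part, payload = line.strip().split(" ", 2)
    # Limit the split so that commas inside params are preserved.
    values = payload.split(",", len(FIELDS) - 1)
    if len(values) != len(FIELDS):
        raise ValueError("unexpected number of fields: %r" % line)
    record = dict(zip(FIELDS, values))
    record["time"] = datetime.strptime(
        date_part + " " + time_part, "%Y-%m-%d %H:%M:%S"
    )
    return record


if __name__ == "__main__":
    sample = "2016-06-20 19:00:00 e41234ab342ef034,102345,5k4d,467890"
    print(parse_promotion_log(sample))
```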

Collection methods:

  • Use Logtail to collect logs that are saved on disk. For more information, see Logtail overview.
  • Collect logs by using SDKs. For more information, see Overview.

Collect server-side data

If a user uses an Alipay or WeChat official account to register on the website, the following four types of log data are generated:

  • NGINX or Apache access logs

    NGINX or Apache access logs are used to monitor and analyze data in real time.

    10.1.168.193 - - [01/Mar/2012:16:12:07 +0800] "GET /Send?AccessKeyId=8225105404 HTTP/1.1" 200 5 "-" "Mozilla/5.0 (X11; Linux i686 on x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
                        
  • NGINX or Apache error logs
    2016/04/18 18:59:01 [error] 26671#0: *20949999 connect() to unix:/tmp/fastcgi.socket failed (111: Connection refused) while connecting to upstream, client: 10.101.1.1, server: , request: "POST /logstores/test_log HTTP/1.1", upstream: "fastcgi://unix:/tmp/fastcgi.socket:", host: "ali-tianchi-log.cn-hangzhou-devcommon-intranet.sls.aliyuncs.com"
  • Application layer logs

    Application layer logs record log fields that involve time, location, result, latency, and request method. Log fields that record other details are located at the end of the log entry.

    {
        "time":"2016-08-31 14:00:04",
        "localAddress":"10.178.93.88:0",
        "methodName":"load",
        "param":["31851502"],
        "result":....
        "serviceName":"com.example",
        "startTime":1472623203994,
        "success":true,
        "traceInfo":"88_1472621445126_1092"
    }              
  • Application layer error logs

    Application layer error logs record the error details, such as the time, code blocks, error codes, and error causes.

    2016/04/18 18:59:01 :/var/www/html/SCMC/routes/example.php:329 [thread:1] errorcode:20045 message:extractFuncDetail failed: account_hsf_service_log
                        

Implementation

  • Write logs to a local file. Then, use Logtail to parse the logs with regular expressions and upload them to a specified Logstore.
  • To collect logs generated in a Docker container, integrate the container with Log Service.
  • To collect logs from Java programs without writing the logs to disk, use Log4j Appender. To process large amounts of data with high concurrency, use the Alibaba Cloud Log Producer library for Java.
  • Use SDKs to collect logs for C#, Python, Java, PHP, and C.
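
To illustrate the regular-expression parsing mentioned in the first method, the following Python sketch splits the sample NGINX access log into named fields. The pattern and the field names are assumptions chosen for this example. They are not an expression that Logtail ships with.

```python
import re

# Assumed pattern for the access log format shown in the sample above.
ACCESS_LOG = re.compile(
    r'(?P<remote_addr>\S+) \S+ (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<body_bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_access_log(line):
    """Return a dict of fields, or None if the line does not match."""
    match = ACCESS_LOG.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        '10.1.168.193 - - [01/Mar/2012:16:12:07 +0800] '
        '"GET /Send?AccessKeyId=8225105404 HTTP/1.1" 200 5 "-" '
        '"Mozilla/5.0 (X11; Linux i686 on x86_64; rv:10.0.2) '
        'Gecko/20100101 Firefox/10.0.2"'
    )
    print(parse_access_log(sample))
```

Returning None for lines that do not match lets the caller decide whether to drop them or route them to a separate Logstore for inspection.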

Collect end-user logs

  • Mobile devices: Use the SDK for iOS or Android, or Mobile Analytics (MAN) to collect logs from mobile devices.
  • ARM-based devices: Cross-compile the native C code for the ARM platform.
  • Merchant-side devices: Use SDKs to collect logs from x86 devices. For ARM-based devices, cross-compile the native C code.

Collect user behavior data from desktops or mobile websites

The user behavior can be divided into the following two types:

  • User behavior that interacts with backend servers: place an order, log on to the website, and log off the website.
  • User behavior that does not interact with backend servers: requests that are processed at the frontend. For example, scroll or close a page.

Implementation

  • To collect the data of user behavior that interacts with backend servers, see the methods to collect server-side data.
  • To collect the data of user behavior that does not interact with backend servers, use Tracking Pixel or JS Library.
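
A tracking pixel reports frontend events by requesting a tiny image whose query string carries the event data. The following Python sketch builds such a URL. The collector address, the event parameter, and the field names are hypothetical. The actual path and parameters are defined by the web tracking documentation of Log Service.

```python
from urllib.parse import urlencode


def tracking_pixel_url(collector_url, event, **fields):
    """Build the image URL that a page requests to report one event."""
    query = urlencode({"event": event, **fields})
    return collector_url + "?" + query


if __name__ == "__main__":
    # Hypothetical collector address; replace it with a real tracking endpoint.
    url = tracking_pixel_url(
        "http://collector.example/track.gif", "scroll", page="/menu", depth=80
    )
    print(url)
```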

Collect server-side O&M logs

Examples:

  • Syslog
    Aug 31 11:07:24 zhouqi-mac WeChat[9676]: setupHotkeyListenning event NSEvent: type=KeyDown loc=(0,703) time=115959.8 flags=0 win=0x0 winNum=7041 ctxt=0x0 chars="u" unmodchars="u" repeat=0 keyCode=32                  
  • Application debug logs
    __FILE__:build/release64/sls/shennong_worker/ShardDataIndexManager.cpp
    __LEVEL__:WARNING
    __LINE__:238
    __THREAD__:31502
    offset:816103453552
    saved_cursor:1469780553885742676
    seek count:62900
    seek data redo
    log:pangu://localcluster/redo_data/41/example/2016_08_30/250_1472555483
    user_cursor:1469780553885689973
                        
  • Trace logs
    [2013-07-13 10:28:12.772518]    [DEBUG] [26064]  __TRACE_ID__:661353951201    __item__:[Class:Function]_end__  request_id:1734117   user_id:124 context:.....

Implementation

For more information, see the methods to collect server-side data.
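
As with access logs, O&M logs are usually parsed with a regular expression before analysis. The following Python sketch splits the sample Syslog line into fields. The pattern and the field names are assumptions for illustration only.

```python
import re

# Assumed layout: "Mon DD HH:MM:SS host program[pid]: message"
SYSLOG = re.compile(
    r'^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) '
    r'(?P<host>\S+) '
    r'(?P<program>[^\[\s:]+)(?:\[(?P<pid>\d+)\])?: '
    r'(?P<message>.*)$'
)


def parse_syslog(line):
    """Return a dict of fields, or None if the line does not match."""
    match = SYSLOG.match(line.rstrip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "Aug 31 11:07:24 zhouqi-mac WeChat[9676]: "
        "setupHotkeyListenning event NSEvent: type=KeyDown"
    )
    print(parse_syslog(sample))
```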

Collect data in different network environments

LogHub provides endpoints for different Alibaba Cloud regions. Each region allows access from the following networks:

  • Internal network (classic network): Recommended. LogHub can access other Alibaba Cloud services in the same region over a reliable link.
  • Internet (classic network): LogHub can be accessed over the Internet without restrictions. The transmission speed depends on the link quality. To ensure secure data transmission, we recommend that you use HTTPS.
  • Private network (VPC): LogHub can access other Alibaba Cloud services in the same region over a VPC.
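
A collector typically selects its endpoint based on the network in which it runs. The following Python sketch shows this selection. The host name patterns (region.log.aliyuncs.com for the Internet and region-intranet.log.aliyuncs.com for the internal network and VPC) are assumptions, so verify them against the endpoint list of Log Service before use. The function names are hypothetical.

```python
def loghub_endpoint(region, network="internal"):
    """Return an assumed endpoint host name for a region and network type."""
    if network == "internet":
        return region + ".log.aliyuncs.com"
    if network in ("internal", "vpc"):
        # Assumption: the classic internal network and VPC share one host name.
        return region + "-intranet.log.aliyuncs.com"
    raise ValueError("unknown network type: %s" % network)


def loghub_url(region, network="internal"):
    """Prefix the host with HTTPS, which is recommended for Internet access."""
    return "https://" + loghub_endpoint(region, network)


if __name__ == "__main__":
    for network in ("internal", "internet", "vpc"):
        print(network, loghub_url("cn-hangzhou", network))
```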