
Collect public network data

Last Updated: May 08, 2018

Some scenarios require collecting data from sources on the public network (for example, mobile clients, HTML webpages, PCs, servers, hardware devices, and cameras) for real-time processing.

In a traditional architecture, this is achieved by combining front-end servers with Kafka. Log Service can now replace that architecture with a solution that is more reliable, cost-effective, elastic, and secure.

Scenarios

On the public network, data is collected from mobile clients, external servers, webpages, and various devices. Once collected, the data is consumed by real-time computing, data warehouses, and other data applications.


Solution 1: Front-end server + Kafka

Kafka does not provide a RESTful interface and is typically used inside a cluster. Therefore, you generally need to set up Nginx servers as a public network proxy, and then use LogStash or an API to write the data that arrives through Nginx into message middleware such as Kafka.

The required infrastructure is as follows:

| Device | Quantity | Configuration | Function | Price |
|---|---|---|---|---|
| ECS server | 2 units | 1 core, 2 GB | Front-end host, load balancing, and mutual backup | 22.26 USD per unit per month |
| Load balancer | 1 unit | Standard | Pay-As-You-Go instance | 3.6 USD per month (lease) + 0.078 USD per GB (data traffic) |
| Kafka/ZK | 3 units | 1 core, 2 GB | Data write and processing | 22.26 USD per unit per month |

Solution 2: Use LogHub

Use the mobile SDK, Logtail, or Web Tracking JS to write data directly to the LogHub endpoint.

The required infrastructure is as follows:

| Device | Function | Price |
|---|---|---|
| LogHub | Real-time data collection | < 0.003125 USD per GB |
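To illustrate what writing directly to the LogHub endpoint can look like from a webpage or lightweight client, the following minimal sketch builds a tracking URL that a client could request with a plain HTTP GET. The URL layout (`/logstores/<logstore>/track`), the `APIVersion` value, and the project, endpoint, and Logstore names are assumptions used for illustration; check them against the Log Service API reference before relying on them.

```python
from urllib.parse import urlencode


def build_tracking_url(project, endpoint, logstore, fields, api_version="0.6.0"):
    """Build a GET URL that carries one log entry as query parameters.

    Assumption: the path and the APIVersion parameter follow the Web
    tracking interface of Log Service; all argument values are hypothetical.
    """
    # APIVersion goes first, followed by the log fields in insertion order.
    query = urlencode({"APIVersion": api_version, **fields})
    # HTTPS is used so that the log data is encrypted in transit.
    return f"https://{project}.{endpoint}/logstores/{logstore}/track?{query}"


# Hypothetical project, region endpoint, and Logstore names.
url = build_tracking_url(
    "my-project",
    "cn-hangzhou.log.aliyuncs.com",
    "my-logstore",
    {"event": "page_view", "device": "mobile"},
)
print(url)
```

No front-end proxy is involved in this sketch: the client sends the request straight to the service endpoint, which is the main structural difference from Solution 1.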

Scenario comparison

Scenario 1: Up to 10 GB of data is collected each day, which generates around one million write requests. The 10 GB in this example is the compressed size, so the actual data volume ranges from 50 GB to 100 GB.

  Solution 1:
  --------------
  Load balancer (lease): 0.005 * 24 * 30 = 3.6 USD
  Load balancer (traffic): 0.078 * 10 * 30 = 23.4 USD
  ECS cost: 22.26 * 2 = 44.52 USD
  Kafka ECS: Free, if shared with other services
  Total: 71.52 USD per month

  Solution 2:
  --------------
  LogHub traffic: 0.05 * 10 * 30 = 15 USD
  LogHub requests: 0.03 * 1 (assuming 1 million requests per day) * 30 = 0.9 USD
  Total: 15.9 USD per month

Scenario 2: Up to 1 TB of data is collected each day, which generates around 100 million write requests.

  Solution 1:
  --------------
  Load balancer (lease): 0.005 * 24 * 30 = 3.6 USD
  Load balancer (traffic): 0.078 * 1000 * 30 = 2340 USD
  ECS cost: 22.26 * 2 = 44.52 USD
  Kafka ECS: Free, if shared with other services
  Total: 2388.12 USD per month

  Solution 2:
  --------------
  LogHub traffic: 0.045 * 1000 * 30 = 1350 USD (tiered pricing)
  LogHub requests: 0.03 * 100 (assuming 100 million requests per day) * 30 = 90 USD
  Total: 1440 USD per month
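The arithmetic for both scenarios can be reproduced with a short script, which also makes it easy to try other data volumes. This is a minimal sketch: the function and parameter names are illustrative, the unit prices are the ones used in the calculations in this section, a month is taken as 30 days, and the Kafka hosts are assumed to be shared with other services and therefore free.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # the calculations in this section assume a 30-day month


def frontend_kafka_monthly_cost(gb_per_day,
                                slb_lease_per_hour=0.005,
                                slb_traffic_per_gb=0.078,
                                ecs_per_month=22.26,
                                ecs_count=2):
    """Solution 1: load balancer lease + load balancer traffic + front-end ECS.

    Kafka hosts are not counted (assumed shared with other services).
    """
    lease = slb_lease_per_hour * HOURS_PER_DAY * DAYS_PER_MONTH
    traffic = slb_traffic_per_gb * gb_per_day * DAYS_PER_MONTH
    ecs = ecs_per_month * ecs_count
    return round(lease + traffic + ecs, 2)


def loghub_monthly_cost(gb_per_day, million_requests_per_day,
                        traffic_per_gb, per_million_requests=0.03):
    """Solution 2: LogHub traffic fee + LogHub request fee."""
    traffic = traffic_per_gb * gb_per_day * DAYS_PER_MONTH
    requests = per_million_requests * million_requests_per_day * DAYS_PER_MONTH
    return round(traffic + requests, 2)


# Scenario 1: 10 GB and 1 million requests per day, 0.05 USD per GB.
print(frontend_kafka_monthly_cost(10), loghub_monthly_cost(10, 1, 0.05))

# Scenario 2: 1 TB (1000 GB) and 100 million requests per day,
# 0.045 USD per GB under tiered pricing.
print(frontend_kafka_monthly_cost(1000), loghub_monthly_cost(1000, 100, 0.045))
```

In Solution 1 the load balancer traffic fee dominates as volume grows, while the lease and ECS costs stay fixed; in Solution 2 both components scale with usage.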

Comparison of solutions

The two preceding scenarios show that LogHub can collect data from the public network at a competitive cost. Furthermore, Solution 2 outperforms Solution 1 in the following aspects:

  • Auto scaling: handles traffic ranging from MB to PB per day, adjustable as needed
  • Abundant permission control options: use ACL to control read and write permissions
  • HTTPS compatibility: encrypted transmission
  • Log delivery at no cost: ship data to a data warehouse without additional development
  • Detailed data monitoring: gain more information about your business
  • A rich set of SDKs and interfaces to upstream and downstream systems: complete downstream interfaces, as with Kafka, and deep integration with Alibaba Cloud and open-source products