Collection - embedded IoT

Last Updated: Jan 30, 2018

The Internet of Things (IoT) is growing rapidly. More and more devices are entering daily life, such as intelligent routers, TV dongles, Tmall Genie, and robot vacuum cleaners, bringing the convenience of intelligence. Gartner predicts that 20 billion intelligent devices will be in service by the end of 2020, which gives a glimpse of the size of this market. Embedded development, as practiced in the traditional software field, faces great challenges with IoT devices. Besides being numerous and widely distributed, IoT devices are difficult to debug and have restricted hardware, so traditional device log solutions cannot fully meet the demands.

Based on years of development experience in Logtail and the characteristics of IoT devices, the Log Service team has customized a log data collection solution for IoT devices: C-Producer.

Embedded development requirements

IoT/embedded engineers need solid development knowledge and experience, as well as the ability to manage, monitor, and diagnose these “black boxes”. Embedded development mainly has the following requirements:

  • Data collection: How can data be collected in real time from millions or tens of millions of devices distributed around the world?
  • Debugging: How can one solution meet the requirements of both online data collection and real-time debugging during development?
  • Online diagnosis: When an error occurs in an online device, how can the device be located quickly and the context that caused the error be checked?
  • Monitoring: How many devices are online now? How is the working status distributed? How are the devices distributed geographically? How does a device send alarms in real time when an error occurs?
  • Real-time data analysis: How can the data generated by devices be integrated with real-time computing and big data warehouses to build user profiles?

Major challenges in the IoT field

When looking for solutions to the preceding issues, we find that approaches from the traditional software field do not carry over to the IoT field. The major challenges come from the following characteristics of IoT devices:

  • Large numbers of devices: In the traditional Operation & Maintenance (O&M) field, a company managing ten thousand servers counts as a large one. For IoT devices, however, a hundred thousand online devices is only an entry-level scale.
  • Wide distribution: Deployed hardware devices are usually distributed all over the country or even the world.
  • Black box: Devices are difficult to log on to and debug, and their status is mostly unknown.
  • Restricted resources: To keep costs down, IoT device hardware is relatively restricted (for example, 32 MB of total memory), so approaches designed for PCs often do not work in the IoT field.

C-Producer: A log data collection solution customized by Log Service

Logtail, the client of Log Service, has millions of deployments on X86 servers. In addition, Log Service provides other collection solutions:

  • Mobile SDK: Data collection on Android/iOS platforms, with tens of millions of daily active users (DAU).
  • Web Tracking (JS): Similar to Baidu Tongji and Google Analytics, it uses a lightweight collection mode that does not require a signature.

In the IoT field, based on years of development experience in Logtail and the characteristics of IoT devices in terms of CPU, memory, disk, network, and application mode, we have developed a log data collection solution for IoT devices: C-Producer.

Characteristics of C-Producer

C-Producer Library offers the same stability and convenience as Logtail, and can be considered a “lightweight Logtail”. Although it lacks the real-time configuration management feature of Logtail, C-Producer Library has 70% of the features of Logtail, including:

  • Multi-tenant concept: C-Producer can process various logs (such as Metric, DebugLog, and ErrorLog) according to their priorities. Multiple clients can be configured, and each client can be separately configured with its own collection priority and target project/Logstore.
  • Context query: Logs generated by the same client are in the same context, and the relevant logs before and after a log can be viewed.
  • Concurrent sending and breakpoint transmission: An upper limit can be set for the cache. When the limit is exceeded, writing a log fails.
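
The cache limit described in the last item can be pictured with a small sketch. This is not the C-Producer API: the `log_cache` type and its functions are hypothetical names chosen for illustration, and the sketch only assumes that a log is rejected when accepting it would push the cache past its configured limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-client cache accounting; not the C-Producer API. */
typedef struct {
    size_t max_bytes;  /* configured upper limit of the cache */
    size_t used_bytes; /* bytes currently waiting to be sent */
    size_t dropped;    /* logs rejected because the cache was full */
} log_cache;

static void log_cache_init(log_cache *c, size_t max_bytes) {
    assert(c != NULL);
    c->max_bytes = max_bytes;
    c->used_bytes = 0;
    c->dropped = 0;
}

/* Try to reserve room for one log. Returns false (the write fails)
 * when accepting the log would exceed the configured limit. */
static bool log_cache_try_add(log_cache *c, size_t log_bytes) {
    assert(c != NULL);
    if (log_bytes > c->max_bytes - c->used_bytes) {
        c->dropped++;
        return false;
    }
    c->used_bytes += log_bytes;
    return true;
}

/* Give the space back after a batch has been sent successfully. */
static void log_cache_release(log_cache *c, size_t sent_bytes) {
    assert(c != NULL && sent_bytes <= c->used_bytes);
    c->used_bytes -= sent_bytes;
}
```

Failing the write instead of blocking keeps the worker thread responsive on a device with little memory, at the cost of losing logs while the network is down and the cache is full.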

In addition, C-Producer provides the following features specific to IoT devices, including:

  • Local debugging: Log content can be output locally. Rotation, log quantity, and rotation size can be set.
  • Fine-grained resource control: Different cache upper limits and aggregation modes can be set for different types of data/logs.
  • Log cache compression: Data that failed to be sent can be compressed in the cache to reduce memory usage on the device.

Advantages

As a custom solution for IoT devices, C-Producer has obvious advantages in the following aspects:

  • Highly concurrent writing on the client: With a configurable sending thread pool, hundreds of thousands of logs can be written every second. For more information, see Performance test in this article.
  • Low resource consumption: Writing 200 thousand logs every second consumes only 70% of the CPU. On low-performance hardware (for example, Raspberry Pi), generating 100 logs every second has almost no effect on resources.
  • No on-disk copies of client logs: Data is sent directly to the server over the network after being generated.
  • Client computing separated from I/O logic: Logs are output asynchronously, without blocking the worker thread.
  • Multiple priorities: Different clients can be configured with different priorities so that high-priority logs are sent first.
  • Local debugging: Local debugging can be enabled so that you can test applications locally when the network is unavailable.
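
As an illustration of the priority rule in the list above, the following minimal sketch picks which client a sender thread would serve next. The `client_queue` type, the `pick_next_client` function, and the convention that a larger number means a more urgent client are all assumptions made for this example. They do not describe how C-Producer schedules its queues internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of one client's send queue; names are illustrative. */
typedef struct {
    const char *name;
    int priority;        /* assumption: larger value = more urgent */
    size_t pending_logs; /* logs waiting to be sent */
} client_queue;

/* Return the index of the client to send next: the highest-priority
 * client that has pending logs, or -1 when nothing is pending.
 * On equal priority, the client that appears first wins. */
static int pick_next_client(const client_queue *clients, size_t count) {
    int best = -1;
    assert(clients != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (clients[i].pending_logs == 0) {
            continue; /* nothing to send for this client */
        }
        if (best < 0 || clients[i].priority > clients[best].priority) {
            best = (int)i;
        }
    }
    return best;
}
```

With an error client at priority 3 and a metric client at priority 1, the error client is drained first even if the metric client has far more logs queued.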

In the preceding scenarios, C-Producer Library simplifies application development. You do not have to consider log collection details or worry about the impact of log collection on your business operation, which makes data collection significantly easier.

To make the differences clear, the following table compares the C-Producer solution with other embedded collection solutions:

Type | Item | C-Producer | Other solutions
Programming | Platform | Mobile + Embedded | Mobile-based
Programming | Context | Supported | Not supported
Programming | Multiple logs | Supported | Not supported (one type of logs)
Programming | Custom format | Supported | Not supported (several limited fields are provided)
Programming | Priority | Supported | Not supported
Programming | Environment parameter | Configurable | Configurable
Stability | Concurrency | High | Medium
Stability | Compression algorithm | LZ4 (balance between efficiency and performance) + GZIP | Optimized
Stability | Low resource consumption | Optimized | Medium
Transmission | Breakpoint transmission | Supported | Not supported by default. Secondary development is required.
Transmission | Access point | 8 (China) + 8 (worldwide) | Hangzhou
Debugging | Local log | Supported | Supported in manual mode
Debugging | Parameter configuration | Supported | Not supported
Real-time performance | Visible on the server | 1 second (99.9%), 3 seconds (Max) | 1-2 hours
Real-time performance | Custom processing | 15+ interconnection modes | Custom real-time and offline solution

C-Producer + Log Service solution

C-Producer integrates with Alibaba Cloud Log Service to form a complete set of log solutions for IoT devices.

  • Large scale
    • Supports real-time data writing from hundreds of millions of clients.
    • Supports petabytes of data every day.
  • High speed
    • Fast data collection: Data can be consumed as soon as it is written.
    • Quick query: A complex query statement (with five conditions) can process billions of data entries within one second.
    • Rapid analysis: A complex analysis statement (aggregation by five dimensions with a GroupBy condition) can aggregate and analyze hundreds of millions of data entries within one second.
  • Wide interconnection
    • Seamlessly integrated with various Alibaba Cloud products.
    • Compatible with various open-source storage, computing, and visual systems.

Download and use

Download address: GitHub

One application can create multiple producers, and each producer can include multiple clients. Each client can be independently configured with its target address, log level, whether local debugging is enabled, cache size, custom identifier, and topic.
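
The producer/client hierarchy can be pictured as plain data. The sketch below is only a model of the relationship described above, not the configuration API of the library: every type, field, and function name (`client_config`, `producer_config`, `find_client`) is hypothetical, and the values in the usage note are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical log levels for this sketch. */
typedef enum { LEVEL_DEBUG, LEVEL_INFO, LEVEL_WARN, LEVEL_ERROR } log_level;

/* One client: each field mirrors a per-client setting named in the text. */
typedef struct {
    const char *identifier; /* custom identifier */
    const char *endpoint;   /* target address: endpoint */
    const char *project;    /* target address: project */
    const char *logstore;   /* target address: Logstore */
    const char *topic;
    log_level min_level;    /* log level */
    bool local_debug;       /* whether local debugging is enabled */
    size_t cache_bytes;     /* cache size */
} client_config;

/* One producer groups several independently configured clients. */
typedef struct {
    const char *name;
    const client_config *clients;
    size_t client_count;
} producer_config;

/* Look up a client by its custom identifier; NULL when not found. */
static const client_config *find_client(const producer_config *p,
                                        const char *identifier) {
    assert(p != NULL && identifier != NULL);
    for (size_t i = 0; i < p->client_count; i++) {
        if (strcmp(p->clients[i].identifier, identifier) == 0) {
            return &p->clients[i];
        }
    }
    return NULL;
}
```

For example, a producer could hold a "metrics" client with a 1 MB cache and local debugging off next to an "errors" client with a smaller cache and local debugging on, each pointing at a different Logstore.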

Performance test

Environment configuration

  • High-performance scenario: Traditional X86 servers.
  • Low-performance scenario: Raspberry Pi (low power consumption environment).

The configurations are respectively as follows:

C-Producer configuration

  • ARM (Raspberry Pi)
    • Cache: 10 MB
    • Aggregation time: 3 seconds (Data is packaged and sent as soon as any one of the three thresholds is reached: aggregation time, aggregation data package size, or aggregation log quantity.)
    • Aggregation data package size: 1 MB
    • Aggregation log quantity: 1,000
    • Sending thread: 1
    • Custom tag: 5
  • X86
    • Cache: 10 MB
    • Aggregation time: 3 seconds (Data is packaged and sent as soon as any one of the three thresholds is reached: aggregation time, aggregation data package size, or aggregation log quantity.)
    • Aggregation data package size: 3 MB
    • Aggregation log quantity: 4,096
    • Sending thread: 4
    • Custom tag: 5
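
The aggregation rule in the configuration above (send when any one of the three thresholds is reached) can be written as a single predicate. This is a minimal sketch with hypothetical names (`agg_limits`, `agg_batch`, `should_flush`). It assumes that an empty batch is never sent and that 1 MB means 1024 × 1024 bytes. Neither assumption is confirmed by the article.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Thresholds taken from a configuration such as the ones listed above. */
typedef struct {
    unsigned max_age_ms; /* aggregation time */
    size_t max_bytes;    /* aggregation data package size */
    size_t max_logs;     /* aggregation log quantity */
} agg_limits;

/* State of the batch currently being aggregated. */
typedef struct {
    unsigned age_ms; /* time since the first log entered the batch */
    size_t bytes;    /* accumulated payload size */
    size_t logs;     /* number of logs in the batch */
} agg_batch;

/* A batch is packaged and sent as soon as ANY threshold is reached.
 * Assumption: an empty batch is never sent, however old it is. */
static bool should_flush(const agg_limits *lim, const agg_batch *b) {
    assert(lim != NULL && b != NULL);
    if (b->logs == 0) {
        return false;
    }
    return b->age_ms >= lim->max_age_ms ||
           b->bytes >= lim->max_bytes ||
           b->logs >= lim->max_logs;
}
```

With the Raspberry Pi settings (3 seconds, 1 MB, 1,000 logs), a device that emits only a few logs per second is flushed by the time threshold, while a burst of logs is flushed by the quantity or size threshold first.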

Sample log

  1. The total data volume is approximately 600 bytes for 10 key-value pairs.
  2. The total data volume is approximately 350 bytes for 9 key-value pairs.

  __source__: 11.164.233.187
  __tag__:1: 2
  __tag__:5: 6
  __tag__:a: b
  __tag__:c: d
  __tag__:tag_key: tag_value
  __topic__: topic_test
  _file_: /disk1/workspace/tools/aliyun-log-c-sdk/sample/log_producer_sample.c
  _function_: log_producer_post_logs
  _level_: LOG_PRODUCER_LEVEL_WARN
  _line_: 248
  _thread_: 40978304

  • LogHub: Real-time log collection and consumption
  • Search/Analytics: Query and real-time analysis
  • Interconnection: Grafana and JDBC/SQL92
  • Visualized: dashboard and report functions

Test results

Test results on X86 platform

  • The sending speed of C-Producer can reach 90 MB/s. Uploading 200 thousand logs every second consumes only 70% of the CPU and 140 MB of memory.
  • When the sending speed on the server is 200 entries every second, data sending has almost no impact on the CPU (CPU usage drops to less than 0.01%).
  • The average time the client thread takes to send an entry of data (export a log) is 1.2 μs.

Test results on Raspberry Pi platform

  • In the Raspberry Pi test, the CPU frequency is only 600 MHz, so the performance is approximately 1/10 that of the server. The highest sending speed is 20,000 logs every second.
  • When the sending speed of the Raspberry Pi is 20 entries every second, data sending has almost no impact on the CPU (CPU usage drops to less than 0.01%).
  • The average time the client thread takes to send an entry of data (export a log) is about 12 μs (the Raspberry Pi is connected to the shared network of a PC over USB).
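
The figures reported for the two platforms can be cross-checked with simple arithmetic. The helpers below are illustrative only (the function names are made up for this example), and the calculation assumes that 1 MB means 10^6 bytes.

```c
#include <assert.h>

/* Average payload per log implied by a throughput figure. */
static double bytes_per_log(double bytes_per_sec, double logs_per_sec) {
    assert(logs_per_sec > 0.0);
    return bytes_per_sec / logs_per_sec;
}

/* Seconds of calling-thread time spent on logging per wall-clock second,
 * given the log rate and the average cost of one call in microseconds. */
static double caller_seconds_per_second(double logs_per_sec,
                                        double micros_per_log) {
    assert(logs_per_sec >= 0.0 && micros_per_log >= 0.0);
    return logs_per_sec * micros_per_log / 1e6;
}
```

On X86, `bytes_per_log(90e6, 200000.0)` gives 450 bytes per log, which sits between the two sample log sizes (350 and 600 bytes). At peak rate, the calling threads spend about 0.24 seconds per second inside the log call on both platforms: 200,000 logs × 1.2 μs on X86 and 20,000 logs × 12 μs on the Raspberry Pi.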

For more typical scenarios of Log Service, see Best practice.
