This topic describes how to use the custom event monitoring feature of CloudMonitor to monitor service exceptions and generate alerts when certain conditions are met.

Background

Services occasionally encounter exceptions while running. Some exceptions can be resolved automatically through retries, but others cannot, and serious exceptions may even interrupt your business. Therefore, you need to monitor service exceptions and receive alerts when certain conditions are met. The traditional solution is to generate logs and collect them into a dedicated system such as ELK, a stack made up of three open-source projects: Elasticsearch, Logstash, and Kibana. Such an open-source solution often consists of multiple complex, distributed components, which demands strong technical skills and incurs high costs. To address this, CloudMonitor provides the custom event monitoring feature, which monitors service exceptions and is easy to use.

Preparations

The custom event monitoring feature allows you to report events by using the Java SDK or API. This topic uses the Java SDK as an example to describe how to report exception events.

  1. Add the Maven dependency.
    <dependency>
        <groupId>com.aliyun.openservices</groupId>
        <artifactId>aliyun-cms</artifactId>
        <version>0.1.2</version>
    </dependency>
  2. Initialize the SDK.
    // The application group ID. Events can be classified by application group. You can query the application group ID on the Application Groups page in the CloudMonitor console.
    CMSClientInit.groupId = 118L;
    // The endpoint for reporting events. Currently, a public endpoint is provided. Replace accesskey and secretkey with your AccessKey ID and AccessKey secret, which are used for authentication.
    CMSClient cmsClient = new CMSClient("https://metrichub-cms-cn-hangzhou.aliyuncs.com", accesskey, secretkey);
  3. Specify whether to report events asynchronously.

    By default, events are reported synchronously. Synchronous reporting has clear advantages: the code is simple to write, and every event is reported to CloudMonitor without data loss.

    However, synchronous reporting also has drawbacks. The reporting code is embedded in your business code, so if network conditions between your system and CloudMonitor degrade, the reporting call may fail or take a long time, which can affect your business. In addition, some scenarios do not require every event to be reported. In these cases, you can report events asynchronously: write each event to an event queue of the LinkedBlockingQueue type, and use an executor of the ScheduledExecutorService type to periodically report the queued events in batches.

    // CMSClient, CustomEvent, CustomEventUploadRequest, CustomEventUploadRequestBuilder, and CustomEventUploadResponse
    // come from the aliyun-cms SDK. EventEntry is a simple holder for an event name and content.
    // Usage: create the reporter with the client from step 2, call start() once, then call put(name, content) wherever an event occurs.
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Executors;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class EventReporter implements Runnable {
        private static final Logger logger = LoggerFactory.getLogger(EventReporter.class);
        // The client created in step 2.
        private final CMSClient cmsClient;
        // Initialize the event queue (at most 10,000 pending events) and the executor.
        private final LinkedBlockingQueue<EventEntry> eventQueue = new LinkedBlockingQueue<EventEntry>(10000);
        private final ScheduledExecutorService schedule = Executors.newSingleThreadScheduledExecutor();

        public EventReporter(CMSClient cmsClient) {
            this.cmsClient = cmsClient;
        }

        // Start asynchronous reporting: the executor calls run() every second, starting 1 second from now. You can change the interval as required.
        public void start() {
            schedule.scheduleAtFixedRate(this, 1, 1, TimeUnit.SECONDS);
        }

        // Add an event to the event queue.
        // An event consists of a name and content. The name identifies the event, and the content provides the event details. The full event content can be searched.
        public void put(String name, String content) {
            EventEntry event = new EventEntry(name, content);
            // The following code discards new events when the event queue is full. You can modify the policy as required.
            boolean added = eventQueue.offer(event);
            if (!added) {
                logger.warn("The following event is discarded because the event queue is full: {}", event);
            }
        }

        // Each run reports one batch. While more than 500 events remain in the queue, it keeps reporting batches to drain the backlog.
        @Override
        public void run() {
            do {
                batchPut();
            } while (this.eventQueue.size() > 500);
        }

        private void batchPut() {
            // Take up to 99 events from the event queue and report them in one request.
            List<CustomEvent> events = new ArrayList<CustomEvent>();
            for (int i = 0; i < 99; i++) {
                EventEntry e = this.eventQueue.poll();
                if (e == null) {
                    break;
                }
                events.add(CustomEvent.builder().setContent(e.getContent()).setName(e.getName()).build());
            }
            if (events.isEmpty()) {
                return;
            }
            // Report the batch to CloudMonitor. Neither this code nor the SDK retries a failed report. Add a retry policy if you need highly reliable event reporting.
            try {
                CustomEventUploadRequestBuilder builder = CustomEventUploadRequest.builder();
                builder.setEventList(events);
                CustomEventUploadResponse response = cmsClient.putCustomEvent(builder.build());
                if (!"200".equals(response.getErrorCode())) {
                    logger.warn("An error occurs during event reporting: msg: {}, rid: {}", response.getErrorMsg(), response.getRequestId());
                }
            } catch (Exception e1) {
                logger.error("An exception occurs during event reporting", e1);
            }
        }
    }
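The put method above discards the newest event when the queue is full. If you would rather keep the most recent events, a drop-oldest policy is one alternative. The following is a minimal sketch under that assumption; QueuePolicies and offerDropOldest are hypothetical names used only for illustration, and the sketch assumes a bounded queue such as the LinkedBlockingQueue<EventEntry>(10000) used above.

```java
import java.util.concurrent.BlockingQueue;

// Hypothetical helper for illustration; not part of the aliyun-cms SDK or the JDK.
public final class QueuePolicies {
    private QueuePolicies() {}

    // Adds element to a bounded queue. While the queue is full, removes the head
    // (the oldest element) to make room. Returns the number of elements dropped.
    // Assumes the queue can hold at least one element; otherwise the loop never ends.
    public static <E> int offerDropOldest(BlockingQueue<E> queue, E element) {
        int dropped = 0;
        while (!queue.offer(element)) {
            // The queue is full: discard the oldest event, then try again.
            if (queue.poll() != null) {
                dropped++;
            }
        }
        return dropped;
    }
}
```

In put, you could call QueuePolicies.offerDropOldest(eventQueue, event) instead of eventQueue.offer(event) and log a warning when the returned count is greater than zero. Dropping the oldest events favors fresh information during a burst, at the cost of losing the events that triggered it.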
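The batchPut method above does not retry failed reports. If you need more reliable reporting, one option is to retry the upload with exponential backoff. The following is a minimal sketch, assuming a hypothetical helper class named RetryHelper that is not part of the aliyun-cms SDK; the attempt count and delays you pass in are up to you.

```java
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; not part of the aliyun-cms SDK.
public final class RetryHelper {
    private RetryHelper() {}

    // Calls action up to maxAttempts times. After each failed attempt except the last,
    // waits initialDelayMillis, then twice as long, and so on (exponential backoff).
    // If every attempt fails, the exception from the last attempt is rethrown.
    public static <T> T callWithRetry(Callable<T> action, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delayMillis = initialDelayMillis;
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                    delayMillis *= 2; // Double the wait before the next attempt.
                }
            }
        }
        throw lastFailure;
    }
}
```

In batchPut, you could wrap the upload as RetryHelper.callWithRetry(() -> cmsClient.putCustomEvent(builder.build()), 3, 200). The helper retries only when the call throws, so a response with a non-200 error code is not retried unless you throw an exception for it inside the lambda. Because batchPut runs on a single scheduler thread, keep the number of attempts and the delays small so that retries do not stall the queue.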

Demos for reporting exception events

  • Demo 1: monitor HTTP controller exceptions

    This demo monitors abnormal HTTP requests and generates an alert if the number of abnormal HTTP requests within one minute exceeds a specified threshold. Your application intercepts HTTP requests by using a Spring interceptor or a servlet filter and reports each abnormal request to CloudMonitor as an event. CloudMonitor then generates alerts based on the alert rules that you configure.

    The demo code is as follows:

    // Each event should contain enough information to help you search for and locate issues. This demo stores the event information in a map and converts the map to a JSON string that is used as the event content.
    // e is the exception caught by your interceptor or filter. JsonUtils stands for any JSON serializer, and ExceptionUtils can be, for example, the Apache Commons Lang ExceptionUtils class.
    Map<String, String> eventContent = new HashMap<String, String>();
    eventContent.put("method", "GET");  // The HTTP request method.
    eventContent.put("path", "/users"); // The HTTP request path.
    eventContent.put("exception", e.getClass().getName()); // The exception class name, which is used for searching.
    eventContent.put("error", e.getMessage()); // The error message.
    eventContent.put("stack_trace", ExceptionUtils.getStackTrace(e)); // The exception stack trace, which is used for troubleshooting.
    // Report the event by using the asynchronous put method described earlier. Because this method has no retry policy, a small number of events may occasionally be lost, which is acceptable for monitoring unexpected HTTP exceptions.
    put("http_error", JsonUtils.toJson(eventContent));
  • Demo 2: monitor background scheduled tasks and message consumption

    Besides HTTP requests, many other scenarios can benefit from event monitoring. For example, you can report events when background tasks or message consumption fail, so that you receive alerts as soon as an exception occurs.

    // Collect the information about a message consumption failure. msg is the consumed message, and e is the exception thrown during consumption.
    Map<String, String> eventContent = new HashMap<String, String>();
    eventContent.put("cid", consumerId);  // The consumer ID.
    eventContent.put("mid", msg.getMsgId()); // The message ID.
    eventContent.put("topic", msg.getTopic()); // The message topic.
    eventContent.put("body", body); // The message body.
    eventContent.put("reconsume_times", String.valueOf(msg.getReconsumeTimes())); // The number of times the message has been redelivered after consumption failures.
    eventContent.put("exception", e.getClass().getName()); // The exception class name.
    eventContent.put("error", e.getMessage()); // The error message.
    eventContent.put("stack_trace", ExceptionUtils.getStackTrace(e)); // The exception stack trace.
    // Report the event.
    put("metaq_error", JsonUtils.toJson(eventContent));

    After the events are reported, you can view them in the CloudMonitor console and configure alert rules for message consumption exceptions.

  • Demo 3: record key events

    You can also use the custom event monitoring feature to record key events without generating alerts, so that you can review them later. Examples include key business operations, password changes, order changes, and unusual logons.
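Demos 1 and 2 rely on JsonUtils.toJson and ExceptionUtils.getStackTrace, which this topic does not define. If you prefer not to add a JSON library, the following is a minimal JDK-only sketch that builds the same kind of event content as Demo 1. EventContent and its methods are hypothetical names; the serializer handles only flat maps of strings with non-null keys and is not a general-purpose JSON library.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical JDK-only stand-in for JsonUtils.toJson and ExceptionUtils.getStackTrace.
public final class EventContent {
    private EventContent() {}

    // Returns the full stack trace of t, as printed by printStackTrace.
    public static String stackTrace(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }

    // Serializes a flat map to a JSON object. Keys must be non-null; null values become JSON null.
    public static String toJson(Map<String, String> map) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> entry : map.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            appendString(sb, entry.getKey());
            sb.append(':');
            if (entry.getValue() == null) {
                sb.append("null");
            } else {
                appendString(sb, entry.getValue());
            }
        }
        return sb.append('}').toString();
    }

    // Appends s as a quoted JSON string, escaping quotes, backslashes, and control characters.
    private static void appendString(StringBuilder sb, String s) {
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        sb.append('"');
    }

    // Builds the content of an HTTP error event with the same fields as Demo 1.
    public static String errorEvent(String method, String path, Throwable e) {
        Map<String, String> content = new LinkedHashMap<String, String>();
        content.put("method", method);
        content.put("path", path);
        content.put("exception", e.getClass().getName());
        content.put("error", e.getMessage());
        content.put("stack_trace", stackTrace(e));
        return toJson(content);
    }
}
```

For example, a servlet filter could report put("http_error", EventContent.errorEvent(request.getMethod(), request.getRequestURI(), e)). A LinkedHashMap keeps the fields in insertion order, which makes the event content easier to read in the console.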