Enable control plane log collection and alarms to improve system stability

Background

The control plane of the service mesh pushes mesh rules to the Sidecar proxies and gateways on the data plane. If the rules configured by the user conflict with one another, the push fails, and the proxy or gateway does not receive the latest configuration. A running proxy or gateway keeps working with the configuration it has already received, but once its Pod is restarted, the Sidecar proxy or gateway is likely to fail to start. In real customer scenarios, gateways and proxies often become unavailable because of this kind of misconfiguration. Enabling log alarms on the control plane is therefore essential for finding and solving these problems in time.

ASM supports control plane log collection and log alarms, for example collecting the logs related to the configuration that the ASM control plane pushes to data plane Sidecars. This article describes how to enable log collection and log alarms on the control plane, walks through an alarm notification example, and provides a reference scheme for handling alarms.

Enable control plane log collection

1. Log in to the ASM console.

2. In the left navigation bar, select Service Mesh > Mesh Management.

3. On the mesh management page, find the instance to be configured, and click the name of the instance or click Manage in the operation column.

4. In the left navigation bar of the mesh details page, select Mesh Instance > Basic Information.

5. On the Basic Information page, click Start on the right side of control plane log collection.

6. If log collection is being enabled for the first time, click Enable. In the dialog box for enabling control plane logs, select New Project or Use Existing Project, and then click OK. If you create a new project, you can use the default project name or enter a custom one. Afterwards, on the Basic Information page, click View Log on the right side of control plane log collection to view the detailed control plane logs on the Project page.

7. If log collection was enabled and then disabled earlier, enable it again and confirm. The project that was specified last time is selected automatically.

Enable control plane log alarm

Note that control plane log collection must be enabled before the control plane log alarm is enabled; otherwise the alarm function cannot be used.

When an xDS request sent by the control plane is rejected by the data plane, the data plane synchronization failure alarm is triggered. The Sidecar proxy or ASM gateway on your data plane then cannot get the latest configuration. There are two possible situations:

• If the data plane Sidecar has received a successful configuration push before, it keeps the configuration from the last successful push.

• If the data plane Sidecar has never received a successful configuration push, it has no configuration at all, which means that the node may have no listeners and cannot handle any requests or routing rules.
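
The two situations can be illustrated with a toy model. The sketch below is not ASM or Envoy code; the `Sidecar` class, its methods, and the `valid` flag are hypothetical names introduced only to show that a rejected push leaves a previously configured Sidecar on its last accepted configuration, while a Sidecar with no earlier successful push ends up with nothing to serve.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sidecar:
    """Toy model of a data plane proxy (hypothetical, for illustration only)."""

    active_config: Optional[dict] = None  # last configuration that was accepted

    def receive_push(self, config: dict) -> bool:
        """Apply a pushed configuration; return False when the push is rejected."""
        if not config.get("valid", False):
            # Rejected push: the active configuration is left untouched.
            return False
        self.active_config = config
        return True

    def can_serve(self) -> bool:
        # Without any accepted configuration there are no listeners or routes.
        return self.active_config is not None


# Situation 1: a Sidecar that already accepted v1 rejects a conflicting v2.
warm = Sidecar()
warm.receive_push({"version": "v1", "valid": True})
warm_accepted = warm.receive_push({"version": "v2", "valid": False})

# Situation 2: a freshly started Sidecar only ever sees the conflicting v2.
cold = Sidecar()
cold_accepted = cold.receive_push({"version": "v2", "valid": False})
```

In this model `warm` continues to serve with version `v1`, whereas `cold` has no configuration, which mirrors why a Pod restart after a bad rule is saved can leave a proxy or gateway unable to start.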

1. Log in to the ASM console.

2. In the left navigation bar, select Service Mesh > Mesh Management.

3. On the mesh management page, find the instance to be configured, and click the name of the instance or click Manage in the operation column.

4. In the left navigation bar of the mesh details page, select Mesh Instance > Basic Information.

5. On the Basic Information page, click Alarm Settings on the right side of control plane log collection.

6. In the control plane log alarm settings dialog box, select an action policy, either the built-in ASM service mesh action policy (recommended) or a custom action policy, and then click Enable Alarm. An action policy defines what happens when an alarm is triggered. You can create and edit action policies in the SLS project. For details, see Create Action Policy.

7. Click OK in the important prompt dialog box.

Configure alarm notification recipients

The Alarm Management Center is the unified, intelligent alarm operation and maintenance platform of SLS. In Global Configuration > Notification Policy > Action Policy, you can find "SLS service gateway built-in action policy" and click Modify to view its notification recipients, notification template, and other settings.

1. Open the Alarm Management Center in the SLS console. On the SLS console home page, find the log applications at the top of the page, click "View more log applications", and then click "Alarm Management Center" on the page that appears.

2. On the top right of the page, click Global Configuration.

3. In the left menu, select User Management > User Group Management, click Modify on the right side of the built-in SLS service mesh user group, and add the contacts who should receive notifications when an alarm is generated.

4. Confirm that the intended recipients appear in the list of added members.

Example of triggering alarm notification

1. Log in to the ASM console.

2. In the left navigation bar, select Service Mesh > Mesh Management.

3. On the mesh management page, find the instance to be configured, and click the name of the instance or click Manage in the operation column.

4. In the left navigation bar of the mesh details page, select Traffic Management Center > Gateway Rules, and then click Create with YAML on the right.

5. Define a gateway rule as follows, and then click Create.

A. Select the namespace. This example uses the default namespace.

B. In the text box, define the gateway rule. Refer to the following YAML definition:

6. In the left navigation bar of the mesh details page, select Mesh Instance > Basic Information.

7. On the Basic Information page, click View Log on the right side of control plane log collection.

8. In the Log Service console, search for 'ACK ERROR' to view the matching log entries.

9. If the email address of the alarm recipient is configured correctly, you receive an email similar to the following:

Alternatively, configure a DingTalk robot so that the alarm is delivered to a DingTalk group, similar to the following:
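
Returning to step 8, the same keyword match can also be applied offline to exported log lines. The following is a minimal sketch; the function name `find_rejected_pushes` and the sample lines are hypothetical placeholders, not real ASM control plane output, and only the `ACK ERROR` keyword comes from the step above.

```python
from typing import Iterable, List, Tuple


def find_rejected_pushes(
    lines: Iterable[str], marker: str = "ACK ERROR"
) -> List[Tuple[int, str]]:
    """Return (line_number, text) pairs for every line that contains marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if marker in line
    ]


# Hypothetical placeholder lines; real control plane logs look different.
sample_log = [
    "info push accepted proxy=demo-a",
    "warn ACK ERROR proxy=demo-gateway detail=placeholder",
    "info push accepted proxy=demo-b",
]

hits = find_rejected_pushes(sample_log)
```

Each hit keeps its line number, so the matching entry can be located again in the exported file when comparing it with the alarm notification.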

Reference scheme for alarm processing

The following lists common data plane synchronization failure error messages and handling suggestions. If the error message you encounter is not in the table below, submit a ticket.
