Log Service allows you to collect metric data from hosts by using Logtail. The metric data includes CPU, memory, load, disk, and network data. This topic describes how to create a Logtail configuration in the Log Service console to collect metric data from hosts.

Prerequisites

Logtail V0.16.40 or later is installed on a Linux server. For more information, see Install Logtail on a Linux server.
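
Before you continue, you may want to confirm that the installed Logtail meets this minimum version. The following Python sketch only compares two dotted version strings. It does not show how to read the installed version from the server, and the helper names parse_version and meets_minimum are hypothetical. It assumes that every version component is a plain integer, such as 0.16.40.

```python
from typing import Tuple

# Minimum Logtail version stated in the prerequisites.
MIN_LOGTAIL_VERSION = "0.16.40"


def parse_version(version: str) -> Tuple[int, ...]:
    """Convert '0.16.40' or 'V0.16.40' into (0, 16, 40).

    Assumes every dot-separated component is a plain integer.
    """
    return tuple(int(part) for part in version.strip().lstrip("Vv").split("."))


def meets_minimum(installed: str, minimum: str = MIN_LOGTAIL_VERSION) -> bool:
    """Compare numerically so that 0.16.100 counts as newer than 0.16.40."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # Hypothetical version strings; replace them with the version on your server.
    for candidate in ("0.16.36", "0.16.40", "0.16.100", "V1.0.28"):
        print(candidate, meets_minimum(candidate))
```

The comparison uses integer tuples instead of plain strings, because a string comparison would wrongly treat 0.16.100 as older than 0.16.40.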

Limits

  • Windows servers are not supported.
  • Metric data about GPUs and hardware status cannot be collected.

Procedure

  1. Log on to the Log Service console.
  2. In the Import Data section, click Host Monitoring Data.
  3. In the Specify Logstore step, select the project and the Metricstore that you want to use. Then, click Next.
    You can also click Create Now to create a project or a Metricstore. For more information, see Create a project and Create a Metricstore.
  4. In the Create Machine Group step, create a machine group.
    • If a machine group is available, click Using Existing Machine Groups.
    • If no machine groups are available, perform the following steps to create a machine group. In this example, an Elastic Compute Service (ECS) instance is used.
      1. Select the ECS instance on which you want to install Logtail. For more information, see Install Logtail on ECS instances.
        If Logtail is already installed on the ECS instance, click Complete Installation.
        Note If you want to collect data from servers in self-managed clusters or servers on third-party clouds, you must manually install Logtail. For more information, see Install Logtail on a Linux server.
      2. After Logtail is installed, click Complete Installation.
      3. Create a machine group.

        For more information about how to create a machine group, see Create an IP address-based machine group or Create a custom ID-based machine group.

  5. Select the machine group and move it from the Source Server Groups section to the Applied Server Groups section. Then, click Next.
    Notice If you apply a machine group immediately after it is created, the heartbeat status of the machine group may be FAIL because the machine group is not yet connected to Log Service. In this case, click Automatic Retry. If the issue persists, see What do I do if no heartbeat connections are detected on Logtail?
  6. In the Specify Data Source step, configure the Config Name and Plug-in Config parameters.
    The inputs parameter is required and specifies the data source of the Logtail configuration. Configure it based on your data source.
    Note You can configure only one type of data source in inputs.
    {
        "inputs": [
            {
                "detail": {
                    "IntervalMs": 30000
                },
                "type": "metric_system_v2"
            }
        ]
    }
    | Parameter | Data type | Required | Description |
    | --- | --- | --- | --- |
    | type | string | Yes | The type of the data source. Set the value to metric_system_v2. |
    | IntervalMs | int | Yes | The interval between two consecutive collections, specified in the detail object. Unit: milliseconds. The value must be greater than or equal to 5000. We recommend that you set the value to 30000. |
  7. Click Next.
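
If you keep Logtail configurations in scripts rather than typing them into the console, the following minimal Python sketch builds the Plug-in Config from step 6 and rejects IntervalMs values below the documented minimum of 5000. The function name build_plugin_config is hypothetical and is not part of any Log Service SDK; the script only prints JSON that you can paste into the Plug-in Config field.

```python
import json

# Limits taken from the parameter table in step 6.
MIN_INTERVAL_MS = 5000
RECOMMENDED_INTERVAL_MS = 30000


def build_plugin_config(interval_ms: int = RECOMMENDED_INTERVAL_MS) -> dict:
    """Return the Plug-in Config for host metric collection (hypothetical helper).

    Raises ValueError if interval_ms is not an int of at least 5000.
    """
    if not isinstance(interval_ms, int) or interval_ms < MIN_INTERVAL_MS:
        raise ValueError(
            f"IntervalMs must be an int >= {MIN_INTERVAL_MS}, got {interval_ms!r}"
        )
    # Only one type of data source is allowed in inputs.
    return {
        "inputs": [
            {
                "detail": {"IntervalMs": interval_ms},
                "type": "metric_system_v2",
            }
        ]
    }


if __name__ == "__main__":
    # Print the JSON to paste into the Plug-in Config field.
    print(json.dumps(build_plugin_config(), indent=4))
```

Validating the interval before you paste the configuration catches values that are too small, such as 1000, before they reach the console.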

Metrics

The following tables describe the collected metrics, grouped into CPU, memory, disk, network, TCP, and system metrics.
  • CPU-related metrics
    | Metric | Description | Unit | Example value |
    | --- | --- | --- | --- |
    | cpu_count | The number of CPU cores. | N/A | 2.0 |
    | cpu_util | The CPU utilization, calculated as 100% minus the sum of the idle, wait (iowait), and steal percentages. | Percent (%) | 7.68 |
    | cpu_guest_util | The guest counter of Linux. This counter indicates the percentage of the time that the CPU spends running virtual CPUs for guest operating systems. | Percent (%) | 0.0 |
    | cpu_guestnice_util | The guest_nice counter of Linux. This counter indicates the percentage of the time that the CPU spends running niced (low-priority) virtual CPUs for guest operating systems. | Percent (%) | 0.0 |
    | cpu_irq_util | The irq counter of Linux. This counter indicates the percentage of the time that the CPU spends serving hardware interrupts. | Percent (%) | 0.0 |
    | cpu_nice_util | The nice counter of Linux. This counter indicates the percentage of the time that the CPU spends on user-mode processes with a niced (low) priority. | Percent (%) | 0.0 |
    | cpu_softirq_util | The softirq counter of Linux. This counter indicates the percentage of the time that the CPU spends serving software interrupts. | Percent (%) | 0.06 |
    | cpu_steal_util | The steal counter of Linux. This counter indicates the percentage of the time that is spent in other operating systems when the host runs in a virtualized environment. | Percent (%) | 0.0 |
    | cpu_sys_util | The system counter of Linux. This counter indicates the percentage of the time that the CPU spends in kernel mode. | Percent (%) | 2.77 |
    | cpu_user_util | The user counter of Linux. This counter indicates the percentage of the time that the CPU spends on user-mode processes of normal priority. | Percent (%) | 4.84 |
    | cpu_wait_util | The iowait counter of Linux. This counter indicates the percentage of the time that the CPU is idle while outstanding disk I/O requests exist. | Percent (%) | 0.11 |
  • Memory-related metrics
    | Metric | Description | Unit | Example value |
    | --- | --- | --- | --- |
    | mem_util | The memory usage, that is, the percentage of the total memory that is used. | Percent (%) | 51.03 |
    | mem_cache | The amount of memory that is allocated but unused. | byte | 3566386668.0 |
    | mem_free | The amount of unused memory. | byte | 177350084.0 |
    | mem_available | The amount of available memory. | byte | 3699885553.0 |
    | mem_used | The amount of used memory. | byte | 4041510463.0 |
    | mem_swap_util | The swap usage, that is, the percentage of the swap space that is used. | Percent (%) | 0.0 |
    | mem_total | The total amount of memory. | byte | 7919128576.0 |
  • Disk-related metrics
    | Metric | Description | Unit | Example value |
    | --- | --- | --- | --- |
    | disk_rbps | The amount of data that is read from the disk per second. | byte/s | 8376.81 |
    | disk_wbps | The amount of data that is written to the disk per second. | byte/s | 247633.58 |
    | disk_riops | The number of read operations completed on the disk per second. | count/s | 0.22 |
    | disk_wiops | The number of write operations completed on the disk per second. | count/s | 43.39 |
    | disk_rlatency | The average read latency. | ms | 2.83 |
    | disk_wlatency | The average write latency. | ms | 2.15 |
    | disk_util | The I/O utilization of the disk. | Percent (%) | 0.27 |
    | disk_space_usage | The percentage of used disk space. | Percent (%) | 9.12 |
    | disk_inode_usage | The percentage of used index nodes (inodes). | Percent (%) | 1.18 |
    | disk_space_used | The amount of used disk space. | byte | 11068512238.59 |
    | disk_space_total | The total amount of disk space. | byte | 126692061184.0 |
    | disk_inode_total | The total number of inodes. | N/A | 7864320.0 |
    | disk_inode_used | The number of used inodes. | N/A | 93054.78 |
  • Network-related metrics
    | Metric | Description | Unit | Example value |
    | --- | --- | --- | --- |
    | net_drop_util | The percentage of dropped packets among all packets. | Percent (%) | 0.0 |
    | net_err_util | The percentage of error packets among all packets. | Percent (%) | 0.0 |
    | net_in | The amount of data that is received per second. | byte/s | 8440.91 |
    | net_in_pkt | The number of packets that are received per second. | count/s | 40.83 |
    | net_out | The amount of data that is sent per second. | byte/s | 12446.53 |
    | net_out_pkt | The number of packets that are sent per second. | count/s | 39.95 |
  • TCP-related metrics
    | Metric | Description | Unit | Example value |
    | --- | --- | --- | --- |
    | protocol_tcp_established | The number of TCP connections in the ESTABLISHED state. | N/A | 205.0 |
    | protocol_tcp_insegs | The number of received TCP segments. | N/A | 4654.0 |
    | protocol_tcp_outsegs | The number of sent TCP segments. | N/A | 4870.0 |
    | protocol_tcp_retran_segs | The number of retransmitted TCP segments. | N/A | 0.0 |
    | protocol_tcp_retran_util | The percentage of retransmitted segments among sent segments. | Percent (%) | 0.0 |
  • System-related metrics
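    | Metric | Description | Unit | Example value |
    | --- | --- | --- | --- |
    | system_boot_time | The time when the system started, expressed as a UNIX timestamp. | s | 1578461935.0 |
    | system_load1 | The average system load over the last 1 minute. | N/A | 0.58 |
    | system_load5 | The average system load over the last 5 minutes. | N/A | 0.68 |
    | system_load15 | The average system load over the last 15 minutes. | N/A | 0.60 |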
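
To sanity-check values that you see in the Metricstore, the following Python sketch reproduces a few relationships from the tables above: cpu_util as 100% minus the idle, iowait, and steal shares of two Linux /proc/stat samples, percentage metrics such as mem_util and disk_inode_usage as simple ratios, and system_boot_time as a UNIX timestamp. This is an illustration under these assumptions, not the way Logtail computes the metrics internally. The /proc/stat lines are hypothetical, and the helper names parse_cpu_line, cpu_util, and percent are made up for this example.

```python
from datetime import datetime, timezone

# Field order of the aggregate "cpu" line in Linux /proc/stat.
# guest and guest_nice are skipped because Linux already counts them
# in user and nice, so adding them again would double-count that time.
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse 'cpu  user nice system idle iowait irq softirq steal ...' into a dict."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(CPU_FIELDS, (int(v) for v in parts[1:1 + len(CPU_FIELDS)])))


def cpu_util(prev, curr):
    """100% minus the idle, iowait, and steal shares of the elapsed CPU time."""
    delta = {name: curr[name] - prev[name] for name in CPU_FIELDS}
    total = sum(delta.values())
    if total == 0:
        return 0.0
    not_busy = delta["idle"] + delta["iowait"] + delta["steal"]
    return 100.0 * (total - not_busy) / total


def percent(part, whole):
    """Generic ratio, for example mem_used / mem_total or inode_used / inode_total."""
    return 100.0 * part / whole


if __name__ == "__main__":
    # Two hypothetical /proc/stat samples taken one collection interval apart.
    prev = parse_cpu_line("cpu  1000 0 500 8000 10 0 5 0 0 0")
    curr = parse_cpu_line("cpu  1484 0 777 17222 21 0 11 0 0 0")
    print(f"cpu_util: {cpu_util(prev, curr):.2f}")  # cpu_util: 7.67

    # Example values from the memory and disk tables.
    print(f"mem_util: {percent(4041510463.0, 7919128576.0):.2f}")  # mem_util: 51.03
    print(f"disk_inode_usage: {percent(93054.78, 7864320.0):.2f}")  # disk_inode_usage: 1.18

    # system_boot_time from the system table, converted to UTC.
    print(datetime.fromtimestamp(1578461935, tz=timezone.utc).isoformat())
    # 2020-01-08T05:38:55+00:00
```

The mem_util and disk_inode_usage results match the example values in the tables, which is consistent with these metrics being simple used-to-total ratios.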

What to do next

  • Query and analysis

    After metric data is collected, you can query and analyze the data on the query and analysis page of the Metricstore. For more information, see Query and analyze time series data.

  • Visualization on Log Service
    Log Service automatically creates a host monitoring dashboard in the project. In the dashboard, you can view query and analysis results, configure alerts, and perform other operations.
  • Visualization on Grafana

    Log Service provides a Grafana dashboard template for host metric data. You can view query and analysis results on a Grafana dashboard. For more information, see Use Prometheus to collect Kubernetes metric data.