This document describes how to collect server and process metrics from self-managed servers or servers from other cloud providers. The collected metrics are sent to Managed Service for Prometheus and integrated with the AI-powered diagnostics service of Cloud Monitor 2.0. An Alibaba Cloud ECS instance that belongs to a different Alibaba Cloud account can also be treated as a self-managed server.
Prerequisites
Applicable region: China (Hangzhou).
Operating system: Linux (supports distributions such as RHEL, CentOS, Debian, and Ubuntu).
System architecture: amd64 (x86_64).
Root permissions are required.
The systemd service manager is required.
Public network access is required to download binary files.
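The prerequisites above can be checked on the target server before you start. The following is a minimal sketch, not an official tool: the function names `is_supported_arch` and `check_prerequisites` are hypothetical, and the check for public network access is left out because it depends on your environment.

```shell
#!/usr/bin/env bash
# Pre-flight check for the prerequisites listed above (illustrative sketch).

# Succeeds only for architecture names that correspond to amd64.
is_supported_arch() {
  case "$1" in
    x86_64|amd64) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints one line per unmet prerequisite; prints nothing if all checks pass.
check_prerequisites() {
  [ "$(uname -s)" = "Linux" ] || echo "unsupported OS: $(uname -s)"
  is_supported_arch "$(uname -m)" || echo "unsupported architecture: $(uname -m)"
  [ "$(id -u)" -eq 0 ] || echo "not running as root"
  command -v systemctl >/dev/null 2>&1 || echo "systemctl not found (systemd is required)"
  return 0
}
```

Call `check_prerequisites` on the server and resolve any line it prints before you continue with the procedure.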
Procedure
Step 1: Install LoongCollector
Download the installation package. Run the download command on your server. In the sample code, replace ${region_id} with cn-hangzhou. To speed up the package download, replace ${region_id} with the ID of the region where your ECS instance is located. For more information, see Regions.
wget https://aliyun-observability-release-${region_id}.oss-${region_id}.aliyuncs.com/loongcollector/linux64/latest/loongcollector.sh -O loongcollector.sh
Choose a transmission method and run the installation command. Replace ${region_id} with the ID of the region where your Project is located. For more information, see Regions.
Public network: This method is suitable for most scenarios, such as cross-region deployments, servers from other cloud providers, or self-managed servers. However, this method is subject to bandwidth limitations and potential instability. For host monitoring, version 3.2.2 or later is required.
chmod +x loongcollector.sh; ./loongcollector.sh install ${region_id}-internet -v "3.2.2"
Transfer acceleration: This method is used for cross-region data transfer, such as from the Chinese mainland to regions outside China. It uses Alibaba Cloud CDN to improve performance and avoid the high latency and instability associated with the public network. However, this method incurs additional traffic charges.
Before you run the installation command, enable the cross-domain log transfer acceleration feature for your Project.
chmod +x loongcollector.sh; ./loongcollector.sh install ${region_id}-acceleration
Check the service status. Run the following command. If the output is "loongcollector is running", the service started successfully.
sudo /etc/init.d/loongcollectord status
Configure the user ID. The user ID file contains the ID of the Alibaba Cloud account to which the Project belongs. This ID is used to grant the account permission to access and collect logs from the server.
Log on to the Simple Log Service console. Hover over your profile picture in the upper-right corner. In the panel that appears, view and copy the account ID. Make sure to copy the ID of the root account.
On the server where LoongCollector is installed, create a user ID file named after the root account ID.
touch /etc/ilogtail/users/{Alibaba_Cloud_account_ID} # If the /etc/ilogtail/users directory does not exist, create it manually. The user ID file only needs a filename and does not require a file extension.
Configure a custom machine group ID.
On the server, write the custom string user-defined-test-1 to the custom ID file. This string will be used in Step 3: Configure data collection. A custom ID lets you configure data collection tasks in batches. We recommend that you set a unified ID for global monitoring. If you want to separate monitoring by business unit, you can set the ID to a business-specific name.
# Write the custom string to the specified file. If the directory does not exist, create it manually. The file path and name are fixed by Simple Log Service and cannot be customized.
echo "user-defined-test-1" > /etc/ilogtail/user_defined_id
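The file-related parts of Step 1 can be scripted. The sketch below assumes the paths and URL pattern shown above. The function names and the optional third argument (a root prefix for dry runs in a scratch directory) are hypothetical helpers and are not part of LoongCollector.

```shell
#!/usr/bin/env bash
# Helpers for Step 1 (illustrative sketch).

# Builds the download URL from the pattern shown in the download step.
loongcollector_url() {
  local region="$1"
  printf 'https://aliyun-observability-release-%s.oss-%s.aliyuncs.com/loongcollector/linux64/latest/loongcollector.sh\n' \
    "$region" "$region"
}

# Creates the user ID file and the custom machine group ID file.
# $1: Alibaba Cloud account ID
# $2: custom ID string
# $3: optional root prefix (hypothetical; leave empty to write under /etc)
write_identity_files() {
  local account_id="$1" custom_id="$2" prefix="${3:-}"
  # The directory may not exist yet, so create it first.
  mkdir -p "$prefix/etc/ilogtail/users" || return 1
  # The user ID file is empty; only its name matters.
  touch "$prefix/etc/ilogtail/users/$account_id" || return 1
  printf '%s\n' "$custom_id" > "$prefix/etc/ilogtail/user_defined_id"
}
```

For example, `write_identity_files "$ACCOUNT_ID" user-defined-test-1` run as root writes both files in place, where `ACCOUNT_ID` is a placeholder for the root account ID copied from the console.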
Step 2: Install and uninstall exporters
Install the host metric generator CMS NodeExporter and the process metric generator CMS ProcessExporter.
Installation requirements:
Permission requirements: The script must run with root permissions.
Port conflicts: Before installation, the script checks if the required ports are in use. If a port is in use, the installation fails and exits.
Automatic restart: The service is configured to restart automatically. If a process exits unexpectedly, it restarts automatically.
Resource limits: By default, the service is limited to 30% CPU usage and 256 MB of memory.
Version management: Different versions are installed in separate directories. Multiple versions can coexist, but only one version can be active at a time.
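Because the installation exits when a required port is in use, it can help to check the default ports (19100 and 19256) yourself first. The sketch below parses `ss -tln` output. It is an independent check and does not reproduce how the installation script detects conflicts, which this document does not describe. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Port pre-check for the exporter defaults (illustrative sketch).

# Reads `ss -tln` output on stdin and succeeds if the given port
# appears as a local listening port. The first line is the header.
port_listed() {
  awk -v port="$1" '
    NR > 1 { n = split($4, parts, ":"); if (parts[n] == port) found = 1 }
    END { exit(found ? 0 : 1) }
  '
}

# Prints each default exporter port that is already in use.
busy_exporter_ports() {
  local snapshot port
  snapshot="$(ss -tln)" || return 1
  for port in 19100 19256; do
    if printf '%s\n' "$snapshot" | port_listed "$port"; then
      echo "$port"
    fi
  done
}
```

If `busy_exporter_ports` prints nothing, neither default port is occupied.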
Run the following command to download and install:
# Install
curl -sSL https://cms-agent-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/exporter/install.sh -o /tmp/install.sh && sudo bash /tmp/install.sh
If you cannot run sudo bash ... and receive an error message such as "user admin is not allowed to execute '/usr/bin/bash' as root", contact your administrator to obtain the required permissions. You can also switch to the root user and run the command directly:
# Install from the public network
curl -sSL https://cms-agent-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/exporter/install-0.0.6.sh -o /tmp/install.sh && bash /tmp/install.sh
# Install from the Alibaba Cloud internal network
curl -sSL https://cms-agent-cn-hangzhou.oss-cn-hangzhou-internal.aliyuncs.com/exporter/install.sh -o /tmp/install.sh && bash /tmp/install.sh --internal
# Uninstall
curl -sSL https://cms-agent-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/exporter/uninstall-0.0.6.sh -o /tmp/uninstall.sh && bash /tmp/uninstall.sh
Optional parameters for the installation script
Parameter | Type | Functionality
--node-exporter-only | bool | Host metric generator CMS NodeExporter
--process-exporter-only | bool | Process metric generator CMS ProcessExporter
--dcgm-exporter-only | bool | GPU metric generator CMS DcgmExporter
--public | bool | Use the public network (default).
--internal | bool | Use the internal network.
Environment variable configuration
You can use environment variables to customize installation parameters:
Environment variable | Description | Default value
NODE_EXPORTER_VERSION | NodeExporter version | 1.6.1
PROCESS_EXPORTER_VERSION | ProcessExporter version | 0.1
NODE_LISTEN_PORT | NodeExporter listening port | 19100
PROCESS_LISTEN_PORT | ProcessExporter listening port | 19256
REGION_ID | Alibaba Cloud region ID | cn-hangzhou
NETWORK_TYPE | Network type (internal/public) | public
MEMORY_LIMIT | Service memory limit | 256M
GOMAXPROCS | Maximum number of CPU cores for the Go program | 1
COLLECT_TEXTFILE_DIR | Textfile collector directory (optional) | -
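To see which values an installation would use, you can resolve each variable against its documented default. The sketch below uses the shell's `${VAR:-default}` expansion. The function name `effective_settings` is hypothetical, and how the installation script reads these variables internally is an assumption. COLLECT_TEXTFILE_DIR is omitted because it has no default.

```shell
#!/usr/bin/env bash
# Resolves the documented environment variables (illustrative sketch).

# Prints the caller's value for each variable if it is set,
# otherwise the default value from the table above.
effective_settings() {
  printf 'NODE_EXPORTER_VERSION=%s\n' "${NODE_EXPORTER_VERSION:-1.6.1}"
  printf 'PROCESS_EXPORTER_VERSION=%s\n' "${PROCESS_EXPORTER_VERSION:-0.1}"
  printf 'NODE_LISTEN_PORT=%s\n' "${NODE_LISTEN_PORT:-19100}"
  printf 'PROCESS_LISTEN_PORT=%s\n' "${PROCESS_LISTEN_PORT:-19256}"
  printf 'REGION_ID=%s\n' "${REGION_ID:-cn-hangzhou}"
  printf 'NETWORK_TYPE=%s\n' "${NETWORK_TYPE:-public}"
  printf 'MEMORY_LIMIT=%s\n' "${MEMORY_LIMIT:-256M}"
  printf 'GOMAXPROCS=%s\n' "${GOMAXPROCS:-1}"
}

# Example (assumes your sudo policy allows passing variables on the command line):
#   sudo NODE_LISTEN_PORT=19101 PROCESS_LISTEN_PORT=19257 bash /tmp/install.sh
```

Passing the variables as arguments to `sudo` rather than setting them before it matters because sudo typically resets the caller's environment.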
Installation file paths
NodeExporter: /usr/share/cms-integration/addons/cms-node-exporter/{version}/.
ProcessExporter: /usr/share/cms-integration/addons/cms-process-exporter/{version}/.
Service files: /etc/systemd/system/cms-node-exporter.service and /etc/systemd/system/cms-process-exporter.service.
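Because each version is installed in its own {version} directory, listing the subdirectories of an exporter's installation root shows which versions are present. A minimal sketch follows. The function name is hypothetical, and it does not determine which version is currently active.

```shell
#!/usr/bin/env bash
# Lists installed exporter versions (illustrative sketch).

# Prints the name of each version directory under the given root, one per line.
# Fails if the root directory does not exist.
list_installed_versions() {
  local root="$1" dir
  [ -d "$root" ] || return 1
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue   # skips the unexpanded glob when the root is empty
    basename "$dir"
  done
}
```

For example, `list_installed_versions /usr/share/cms-integration/addons/cms-node-exporter` prints one line per installed NodeExporter version.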
Verify the installation
After the installation is successful, verify it as follows:
# Check the service status
systemctl is-active cms-node-exporter
systemctl is-active cms-process-exporter
# Check the listening ports
ss -tlnp | grep 19100 # NodeExporter
ss -tlnp | grep 19256 # ProcessExporter
# Access the metrics endpoints
curl http://localhost:19100/metrics # NodeExporter
curl http://localhost:19256/metrics # ProcessExporter
Service management
After installation, the service starts automatically and is configured to start on boot. Use the following commands to manage the service:
# View the service status
systemctl status cms-node-exporter
systemctl status cms-process-exporter
# Start the service
systemctl start cms-node-exporter
systemctl start cms-process-exporter
# Stop the service
systemctl stop cms-node-exporter
systemctl stop cms-process-exporter
# Restart the service
systemctl restart cms-node-exporter
systemctl restart cms-process-exporter
# View the service logs
journalctl -u cms-node-exporter -f
journalctl -u cms-process-exporter -f
Uninstall the metric generators
The uninstallation script performs the following operations:
Stops the running services.
Disables the services and prevents them from starting on boot.
Deletes the systemd service files.
Reloads the systemd daemon.
Deletes the installation directories.
Deletes the parent directory if it is empty.
# Download the uninstallation script
curl -sSL https://cms-agent-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/exporter/uninstall.sh -o uninstall.sh
# Uninstall all exporters
sudo bash ./uninstall.sh
# Uninstall only NodeExporter
sudo bash ./uninstall.sh --node-exporter-only
# Uninstall a specific version (pass the variable through sudo so that it is not dropped)
sudo NODE_EXPORTER_VERSION=1.6.1 bash ./uninstall.sh --node-exporter-only
GPU monitoring with DCGM Exporter
To install the DCGM Exporter, use the same installation script as for host monitoring and specify the required parameters. The following requirements must be met:
Only Linux amd64 hosts are supported.
Only NVIDIA GPU cards are supported. The drivers must be installed first.
# Install from the public network
curl -sSL https://cms-agent-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/exporter/install-0.0.6.sh -o /tmp/install.sh && bash /tmp/install.sh --dcgm-exporter-only
# Install from the Alibaba Cloud internal network
curl -sSL https://cms-agent-cn-hangzhou.oss-cn-hangzhou-internal.aliyuncs.com/exporter/install-0.0.6.sh -o /tmp/install.sh && bash /tmp/install.sh --internal --dcgm-exporter-only
# Uninstall
curl -sSL https://cms-agent-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/exporter/uninstall-0.0.6.sh -o /tmp/uninstall.sh && bash /tmp/uninstall.sh --dcgm-exporter-only
Step 3: Configure data collection
Log on to the Cloud Monitor 2.0 console. Go to or create a Workspace in the China (Hangzhou) region. In the navigation bar, click Access Center, and then select Host Monitoring (Non-ECS Hosts).
A machine on which LoongCollector is successfully installed generates infra.server entity data. In the Host Monitoring (Non-ECS Hosts) component integration process, after you select or create an integration policy, select the scope of entities for integration. You can use one of the following two methods:
Use tags for service discovery. The following two types of tags are supported for self-managed hosts:
Key: loongcollector_version, Value: The version number of LoongCollector, such as 3.2.2.
Key: user_defined_id, Value: The custom ID configured on the machine.
Directly select target machines by searching for and selecting their hostnames or IP addresses.
Configure data collection by completing the form as instructed on the page. Host monitoring and process monitoring are enabled by default. DCGM monitoring is disabled by default but can be enabled as needed. By default, the collection ports are the same as those specified in the installation script. If you changed the default ports, you must also update them here.
You can inject additional metric labels. Use the key-value pattern to specify labels that you want to permanently add to the metrics.
After the integration is complete, the Integration Center automatically maintains the list of LoongCollector machine groups.
Step 4: View data
Log on to the Cloud Monitor 2.0 console and go to the Workspace that you configured in Step 3: Configure data collection.
In the navigation bar, click Entity Explorer. Under Other Entities, select Infrastructure: Host to view the hosts that are integrated with the current Workspace. Click any entity to view its data on the associated dashboard.
In the Access Center, click Integration Management to view information about the integration policy. On the integration policy page, you can find the associated Prometheus data source. Click the data source to go to the Prometheus application, where you can directly query metric data.
Troubleshooting
Exporter installation fails due to a port conflict
If the installation fails because a port is in use:
Check which process is using the port:
lsof -i :19100
lsof -i :19256
Use a different port. Pass the variable through sudo so that it reaches the installation script:
sudo NODE_LISTEN_PORT=19101 bash /tmp/install.sh
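Choosing the replacement port can be automated. The sketch below picks the first port at or above a starting value that is not in a list of busy ports. The function name `first_free_port` is hypothetical, and the commented usage assumes that the installation script is at /tmp/install.sh and that your sudo policy allows passing variables on the command line.

```shell
#!/usr/bin/env bash
# Picks an unused port for an exporter (illustrative sketch).

# $1: first candidate port; remaining arguments: ports that are already in use.
# Prints the first candidate that is not in the busy list.
first_free_port() {
  local candidate="$1"
  shift
  while [ "$candidate" -le 65535 ]; do
    case " $* " in
      *" $candidate "*) candidate=$((candidate + 1)) ;;
      *) echo "$candidate"; return 0 ;;
    esac
  done
  return 1
}

# Example usage:
#   busy="$(ss -tln | awk 'NR > 1 { n = split($4, a, ":"); print a[n] }')"
#   port="$(first_free_port 19101 $busy)"   # unquoted on purpose: one argument per port
#   sudo NODE_LISTEN_PORT="$port" bash /tmp/install.sh
```

If you install on a non-default port, remember to enter the same port in the data collection form in Step 3.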
Exporter installation fails due to a download error
If the download fails:
Check your network connectivity.
Confirm that the region ID is correct.
Try switching the network type (internal/public).
Check your firewall settings.
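The connectivity and network-type checks above can be combined into a quick probe. The sketch below uses only the two installation script URLs that appear in this document. The function names are hypothetical, and a HEAD request that succeeds shows only that the endpoint is reachable, not that the installation will succeed.

```shell
#!/usr/bin/env bash
# Probes the exporter download endpoints (illustrative sketch).

# Prints the installation script URL for a network type (public or internal).
exporter_install_url() {
  case "${1:-public}" in
    public)   echo "https://cms-agent-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/exporter/install.sh" ;;
    internal) echo "https://cms-agent-cn-hangzhou.oss-cn-hangzhou-internal.aliyuncs.com/exporter/install.sh" ;;
    *) echo "unknown network type: $1" >&2; return 1 ;;
  esac
}

# Sends a HEAD request to each endpoint and reports whether it responded.
probe_exporter_endpoints() {
  local net url
  for net in public internal; do
    url="$(exporter_install_url "$net")"
    if curl -sSfI --max-time 10 "$url" >/dev/null 2>&1; then
      echo "$net: reachable"
    else
      echo "$net: unreachable"
    fi
  done
}
```

If only the internal endpoint is reachable, run the installation with --internal. If neither is reachable, check DNS resolution and firewall rules.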
Exporter service fails to start
If the service fails to start, you can perform the following checks:
View the service logs:
journalctl -u cms-node-exporter -n 50
journalctl -u cms-process-exporter -n 50
Check the permissions of the binary files:
ls -l /usr/share/cms-integration/addons/cms-node-exporter/*/bin/node_exporter
ls -l /usr/share/cms-integration/addons/cms-process-exporter/*/bin/process-exporter
Manual test run:
/usr/share/cms-integration/addons/cms-node-exporter/*/bin/node_exporter --version
/usr/share/cms-integration/addons/cms-process-exporter/*/bin/process-exporter --version