This topic describes how to use the Splunk add-on for Log Service. You can use the add-on to send log data from Log Service to Splunk.
Implementation
- Create consumer groups by using Splunk data inputs and use the consumer groups to consume log data from Log Service in real time.
- Splunk forwarders use the Splunk private protocol or HTTP Event Collector (HEC) to forward the log data to Splunk indexers.

How it works

- A data input is a consumer that consumes log data.
- A consumer group contains multiple consumers. Each consumer in a consumer group consumes different log entries that are stored in a Logstore.
- Each Logstore contains multiple shards.
- Each shard can be allocated to only one consumer.
- Each consumer can consume data from multiple shards.
- The name of a consumer consists of the name of the consumer group to which the consumer belongs, the hostname, the process name, and the protocol that is used to send Splunk events. This naming convention ensures that each consumer name in a consumer group is unique.
For more information, see Use consumer groups to consume log data.
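You can inspect this model outside of Splunk by querying the shards and consumer groups of a Logstore. The following is a minimal sketch that uses the aliyun-log-python-sdk, assuming placeholder values for the endpoint, AccessKey pair, project, and Logstore.
```python
# Minimal sketch: inspect the shards and consumer groups of a Logstore.
# Requires: pip install aliyun-log-python-sdk
# The endpoint, credentials, project, and Logstore below are placeholders.
from aliyun.log import LogClient

client = LogClient("cn-beijing.log.aliyuncs.com",
                   "<your AccessKey ID>", "<your AccessKey secret>")

project = "<Project name>"
logstore = "<Logstore name>"

# Each Logstore contains multiple shards; within one consumer group,
# each shard is allocated to at most one consumer.
client.list_shards(project, logstore).log_print()

# List the consumer groups that consume this Logstore.
client.list_consumer_group(project, logstore).log_print()
```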
Preparations
- Obtain an AccessKey pair that is used to access Log Service.
You can use the AccessKey pair of a Resource Access Management (RAM) user to access a Log Service project. For more information, see AccessKey pair and Configure an AccessKey pair for a RAM user to access a source Logstore and a destination Logstore.
You can use the permission assistant feature to grant permissions to a RAM user. For more information, see Use the permission assistant to grant permissions. The following example shows a common permission policy that is configured for a RAM user.
Note <Project name> specifies the name of the destination project in Log Service. <Logstore name> specifies the name of the destination Logstore. Replace them with actual values. You can use the wildcard character (*) to specify multiple projects and Logstores.
{
  "Version": "1",
  "Statement": [
    {
      "Action": [
        "log:ListShards",
        "log:GetCursorOrData",
        "log:GetConsumerGroupCheckPoint",
        "log:UpdateConsumerGroup",
        "log:ConsumerGroupHeartBeat",
        "log:ConsumerGroupUpdateCheckPoint",
        "log:ListConsumerGroup",
        "log:CreateConsumerGroup"
      ],
      "Resource": [
        "acs:log:*:*:project/<Project name>/logstore/<Logstore name>",
        "acs:log:*:*:project/<Project name>/logstore/<Logstore name>/*"
      ],
      "Effect": "Allow"
    }
  ]
}
- Check the version of Splunk and the operating system on which Splunk runs.
- Make sure the latest version of the add-on is used.
- Make sure that the operating system is Linux, macOS, or Windows.
- Make sure that the version of Splunk heavy forwarders is 8.0 or later and the version of Splunk indexers is 7.0 or later.
- Configure HEC on Splunk. For more information, see Configure HTTP Event Collector on Splunk Enterprise.
If you use HEC to send events to Splunk indexers, make sure that HEC is configured on Splunk. If you use the Splunk private protocol to send events to Splunk indexers, skip this step.
Note Before you can use HEC, you must create one or more Event Collector tokens. Do not enable the indexer acknowledgment feature when you create an Event Collector token. You can verify the HEC configuration with the sketch after this list.
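The following minimal sketch posts a test event to the standard HEC endpoint (/services/collector) by using the Python requests library. The host, port, and token are placeholders; verify=False is acceptable only for a self-signed test certificate.
```python
# Minimal sketch: confirm that HEC is enabled and that the token is valid.
# The URL and token below are placeholders.
import requests

HEC_URL = "https://127.0.0.1:8088/services/collector"  # default HEC port is 8088
HEC_TOKEN = "<your Event Collector token>"

resp = requests.post(
    HEC_URL,
    headers={"Authorization": f"Splunk {HEC_TOKEN}"},
    json={"event": "HEC connectivity test", "sourcetype": "manual"},
    verify=False,  # only for self-signed test certificates
    timeout=10,
)
# A healthy HEC returns 200 and {"text":"Success","code":0}.
print(resp.status_code, resp.text)
```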
Install the add-on
- Method 1
- Click the gear icon next to Apps.
- On the Apps page, click Find More Apps.
- On the Browse More Apps page, search for Alibaba Cloud Log Service Add-on for Splunk, and click Install.
- After the add-on is installed, restart Splunk as prompted.
- Method 2
- Click the gear icon next to Apps.
- On the Apps page, click Install app from file.
- On the Upload app page, select the TGZ file of the add-on from your local host, and click Upload.
You can download the TGZ file by clicking App Search Results on the Alibaba Cloud Log Service Add-on for Splunk page.
- Click Install.
- After the add-on is installed, restart Splunk as prompted.
Configure the Splunk add-on
If Splunk does not run on an Elastic Compute Service (ECS) instance, you can use the AccessKey pair of your Alibaba Cloud account to access Log Service. To configure the Splunk add-on, create a data input and configure its parameters. For information about the parameters, see Table 1.
Operations
- Query data
Make sure that the data input is in the Enabled state. On the Splunk web interface, click Search & Reporting. On the App: Search & Reporting page, query the logs that are sent to Splunk.
- Query Log Service operational logs
- To query Log Service INFO logs, enter the following search statement in the search bar:
index="_internal" | search "SLS info"
- To query Log Service ERROR logs, enter the following search statement in the search bar:
index="_internal" | search "error"
Performance and security
- Performance
The performance of the add-on and the available data transmission bandwidth depend on the following factors:
- Endpoint: You can access Log Service by using a public endpoint, a classic network endpoint, a VPC endpoint, or a global acceleration endpoint. In most cases, we recommend that you use a classic network endpoint or a VPC endpoint. For more information, see Endpoints.
- Bandwidth: the bandwidth of data transmission between Log Service and Splunk heavy forwarders and between Splunk heavy forwarders and indexers.
- Processing capability of Splunk indexers: the capabilities of indexers to receive data from Splunk heavy forwarders.
- Number of shards: A higher number of shards in a Logstore indicates a higher data transmission capability. Plan the number of shards based on the speed at which raw logs are generated. For more information, see Manage shards.
- Number of Splunk data inputs: A higher number of data inputs in the consumer group that is configured for a Logstore indicates a higher throughput.
Note The number of shards in a Logstore limits the maximum concurrency at which the Logstore can be consumed.
- Number of CPU cores and memory resources occupied by Splunk heavy forwarders: In most cases, one Splunk data input consumes 1 GB to 2 GB of memory and one CPU core.
If sufficient memory and CPU resources are allocated, one Splunk data input can consume log data at 1 MB/s to 2 MB/s. Plan the number of shards based on the speed at which raw logs are generated.
For example, if logs are received in a Logstore at a speed of 10 MB per second, you must create at least 10 shards in the Logstore and configure 10 data inputs in the Splunk add-on. If you deploy the Splunk add-on on a single server, the server must have 10 idle CPU cores and 12 GB of available memory resources.
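The preceding example can be restated as a back-of-envelope calculation. The following sketch only encodes the per-input estimates from this section, which are rough guidelines rather than guarantees: about one CPU core, 1 GB to 2 GB of memory, and 1 MB/s to 2 MB/s of throughput per data input.
```python
# Back-of-envelope sizing based on the estimates in this section.
import math

def size_for(ingest_mb_per_s: float, per_input_mb_per_s: float = 1.0) -> dict:
    """Return a rough resource plan for a given log ingestion rate."""
    inputs_needed = math.ceil(ingest_mb_per_s / per_input_mb_per_s)
    return {
        "shards": inputs_needed,        # at least one shard per data input
        "data_inputs": inputs_needed,
        "cpu_cores": inputs_needed,     # ~1 core per data input
        "memory_gb": (inputs_needed * 1, inputs_needed * 2),  # 1-2 GB per input
    }

# 10 MB/s of raw logs -> 10 shards, 10 data inputs, 10 cores, 10-20 GB of memory.
print(size_for(10))
```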
- High availability
A consumer group stores checkpoints on the server. When a consumer stops consuming data, another consumer continues to consume data from the last checkpoint. You can create Splunk data inputs on multiple servers. If a server stops running or is damaged, a Splunk data input on another server continues to consume data from the last checkpoint. You can also launch more Splunk data inputs than the number of shards on multiple servers. This allows data to be consumed from the last checkpoint if an exception occurs.
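Because checkpoints are stored on the server, you can inspect them at any time to see how far each shard has been consumed. The following is a minimal sketch that uses the aliyun-log-python-sdk with placeholder values.
```python
# Minimal sketch: read the server-side checkpoints of a consumer group.
# All names below are placeholders.
from aliyun.log import LogClient

client = LogClient("cn-beijing.log.aliyuncs.com",
                   "<your AccessKey ID>", "<your AccessKey secret>")

# A consumer that takes over a shard resumes from these checkpoints,
# so they also show the consumption progress of each shard.
client.get_check_point("<Project name>", "<Logstore name>",
                       "<consumer group name>").log_print()
```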
- HTTPS-based data transmission
- Log Service
To use HTTPS to encrypt the data that is transmitted between your program and Log Service, make sure that the endpoint is prefixed by https://, for example, https://cn-beijing.log.aliyuncs.com.
The server certificate for *.aliyuncs.com is issued by GlobalSign. By default, most Linux and Windows servers are preconfigured to trust this certificate. If a server does not trust this certificate, see Install a trusted root CA or self-signed certificate. A quick way to test the trust from Python is sketched after this list.
- Splunk
To use HTTPS-based HEC, enable the SSL feature when you enable HEC in the Global Settings dialog box. For more information, see Configure HTTP Event Collector on Splunk Enterprise.
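The following sketch shows one way to run the trust check mentioned above. Any HTTPS request validates the server certificate chain, so requests raises an SSLError if the *.aliyuncs.com certificate is not trusted; the endpoint is a placeholder.
```python
# Minimal sketch: check whether this host trusts the Log Service certificate.
import requests

try:
    # The endpoint is a placeholder; the TLS handshake alone performs the check.
    requests.get("https://cn-beijing.log.aliyuncs.com", timeout=5)
    print("Certificate chain is trusted.")
except requests.exceptions.SSLError as err:
    print("Certificate is not trusted:", err)
```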
- AccessKey pair protection
The AccessKey pair that you use to access Log Service and HEC tokens are encrypted and stored in Splunk.
FAQ
- What can I do if a configuration error occurs?
- Check the configurations of the data inputs. For information about configuration parameters, see Table 1.
- Check the configurations of Log Service. Example error: failed to create a consumer group.
- Command:
index="_internal" | search "error"
- Exception logs:
aliyun.log.consumer.exceptions.ClientWorkerException: error occour when create consumer group, errorCode: LogStoreNotExist, errorMessage: logstore xxxx does not exist
- Check whether the number of consumer groups that are configured for a Logstore exceeds the quota.
You can configure a maximum of 20 consumer groups for a Logstore. We recommend that you delete the consumer groups that are no longer required. If more than 20 consumer groups are configured for a Logstore, the ConsumerGroupQuotaExceed error is returned.
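You can list and clean up consumer groups with the aliyun-log-python-sdk, as in the following minimal sketch. All names are placeholders, and the delete call is destructive, so run it only against consumer groups that are no longer required.
```python
# Minimal sketch: list the consumer groups of a Logstore and delete a stale one.
# All names below are placeholders.
from aliyun.log import LogClient

client = LogClient("cn-beijing.log.aliyuncs.com",
                   "<your AccessKey ID>", "<your AccessKey secret>")

project, logstore = "<Project name>", "<Logstore name>"

# A Logstore allows a maximum of 20 consumer groups; list them first.
client.list_consumer_group(project, logstore).log_print()

# Destructive: delete a consumer group that is no longer required.
client.delete_consumer_group(project, logstore, "<stale consumer group name>")
```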
- What can I do if a permission error occurs?
- Check whether you are authorized to access Log Service.
- Command:
index="_internal" | search "error"
- Exception logs:
aliyun.log.consumer.exceptions.ClientWorkerException: error occour when create consumer group, errorCode: SignatureNotMatch, errorMessage: signature J70VwxYH0+W/AciA4BdkuWxK6W8= not match
- Check whether the RAM role of your ECS instance is authorized.
- Command:
index="_internal" | search "error"
- Exception logs:
ECS RAM Role detected in user config, but failed to get ECS RAM credentials. Please check if ECS instance and RAM role 'ECS-Role' are configured appropriately.
ECS-Role is the name of the RAM role that you create. In the actual log, the variable is replaced with the real role name.
- Possible causes:
- Check whether the SLS AccessKey parameter of the data input is configured for the account that uses the RAM role.
- Check whether the RAM role is specified correctly: the Username parameter must be set to ECS_RAM_ROLE, and the Password parameter must be set to the name of the RAM role.
- Check whether the RAM role is assigned to the ECS instance.
- Check whether the trusted entity type of the RAM role is set to Alibaba Cloud Service. Check whether the selected trusted service is ECS.
- Check whether the ECS instance to which the RAM role is assigned is the ECS instance on which Splunk runs. You can confirm the attachment by using the sketch after this list.
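You can confirm the attachment by querying the instance metadata service from the ECS instance, as in the following sketch. The address 100.100.100.200 is the Alibaba Cloud instance metadata service; the printed role name should match the RAM role that you configured, for example ECS-Role.
```python
# Minimal sketch: run on the ECS instance that hosts Splunk to check that a
# RAM role is attached and that temporary credentials can be obtained.
import requests

# Alibaba Cloud ECS instance metadata service.
META = "http://100.100.100.200/latest/meta-data/ram/security-credentials/"

r = requests.get(META, timeout=3)
r.raise_for_status()  # a 404 here means that no RAM role is attached
role = r.text.strip()
print("Attached RAM role:", role)

creds = requests.get(META + role, timeout=3).json()  # temporary STS credentials
print("Credential expiration:", creds.get("Expiration"))
```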
- Check whether you are authorized to access HEC.
- Command:
index="_internal" | search "error"
- Exception logs:
ERROR HttpInputDataHandler - Failed processing http input, token name=n/a, channel=n/a, source_IP=127.0.0.1, reply=4, events_processed=0, http_input_body_size=369 WARNING pid=48412 tid=ThreadPoolExecutor-0_1 file=base_modinput.py:log_warning:302 | SLS info: Failed to write [{"event": "{\"__topic__\": \"topic_test0\", \"__source__\": \"127.0.0.1\", \"__tag__:__client_ip__\": \"10.10.10.10\", \"__tag__:__receive_time__\": \"1584945639\", \"content\": \"goroutine id [0, 1584945637]\", \"content2\": \"num[9], time[2020-03-23 14:40:37|1584945637]\"}", "index": "main", "source": "sls log", "sourcetype": "http of hec", "time": "1584945637"}] remote Splunk server (http://127.0.0.1:8088/services/collector) using hec. Exception: 403 Client Error: Forbidden for url: http://127.0.0.1:8088/services/collector, times: 3
- Possible causes:
- HEC is not configured or started.
- The HEC-related parameters of data inputs are invalid. For example, if you use HTTPS-based HEC, you must enable the SSL feature.
- The indexer acknowledgment feature is enabled.
- What can I do if a consumption delay occurs?
You can view the status of a consumer group in the Log Service console. For more information, see View consumer group status.
Increase the number of shards in the Logstore or create more data inputs in the same consumer group. For more information, see Performance and security.
- What can I do if network jitter occurs?
- Command:
index="_internal" | search "SLS info: Failed to write"
- Exception logs:
WARNING pid=58837 tid=ThreadPoolExecutor-0_0 file=base_modinput.py:log_warning:302 | SLS info: Failed to write [{"event": "{\"__topic__\": \"topic_test0\", \"__source__\": \"127.0.0.1\", \"__tag__:__client_ip__\": \"10.10.10.10\", \"__tag__:__receive_time__\": \"1584951417\", \"content2\": \"num[999], time[2020-03-23 16:16:57|1584951417]\", \"content\": \"goroutine id [0, 1584951315]\"}", "index": "main", "source": "sls log", "sourcetype": "http of hec", "time": "1584951417"}] remote Splunk server (http://127.0.0.1:8088/services/collector) using hec. Exception: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer')), times: 3
Splunk events are automatically retransmitted if network jitter occurs. If the issue persists, contact your network administrator.
- How do I modify the start time of consumption?
Note The SLS cursor start time parameter takes effect only when a consumer group is created for the first time. After that, data is consumed from the last checkpoint.
- On the Input page of the Splunk web interface, disable the related data input.
- Log on to the Log Service console. Find the Logstore from which data is consumed, and delete the related consumer group in the Data Consumption section.
- On the Input page of the Splunk web interface, find the data input and edit it. In the dialog box that appears, modify the SLS cursor start time parameter. Then, restart the data input.