This topic describes how to use the Splunk add-on for Log Service to send log data from Log Service to Splunk.
Implementation
- Create consumer groups by using Splunk data inputs and use the consumer groups to consume log data from Log Service in real time.
- Splunk forwarders forward the log data to Splunk indexers by using the Splunk private protocol or HTTP Event Collector (HEC).

Mechanism

- A data input is a consumer that consumes log data.
- A consumer group consists of multiple consumers. Each consumer in a consumer group consumes different data from a Logstore.
- Each Logstore has multiple shards.
- Each shard can be allocated to only one consumer.
- Each consumer can consume data from multiple shards.
- The name of a consumer consists of the name of the consumer group to which the consumer belongs, the hostname, the process name, and the type of the protocol used to send Splunk events. This naming convention ensures that each consumer name in a consumer group is unique.
For more information, see Use consumer groups to consume log data.
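The allocation rules above can be illustrated with a small sketch. It is not how Log Service assigns shards internally (the service does that on the server side); the round-robin strategy, the `-` separator in the consumer name, and all sample names are assumptions made only for illustration.

```python
def consumer_name(group: str, host: str, process: str, protocol: str) -> str:
    """Join the four parts listed in the text into one name.

    The "-" separator is an assumption; the text only lists the parts.
    """
    return "-".join([group, host, process, protocol])


def assign_shards(shard_ids, consumers):
    """Give every shard exactly one owner; a consumer may own several shards."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    assignment = {name: [] for name in consumers}
    for index, shard_id in enumerate(sorted(shard_ids)):
        owner = consumers[index % len(consumers)]  # hypothetical round-robin
        assignment[owner].append(shard_id)
    return assignment


# Hypothetical consumer group with two data inputs on two hosts.
consumers = [
    consumer_name("splunk_group", "host-a", "input-1", "hec"),
    consumer_name("splunk_group", "host-b", "input-1", "hec"),
]
assignment = assign_shards(range(5), consumers)
for name, shards in assignment.items():
    print(name, shards)
```

With five shards and two consumers, each shard has a single owner and each consumer reads from more than one shard, which matches the rules in the list.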
Before you begin
- Obtain an AccessKey pair that is used to access Log Service.
You can use the AccessKey pair of a RAM user to access a Log Service project. For more information, see AccessKey pair or Configure an AccessKey pair for a RAM user to access a source Logstore and a destination Logstore.
You can use the permission assistant feature to grant permissions to a RAM user. For more information, see Use the permission assistant to grant permissions. The following example shows a common permission policy configured for a RAM user.
Note <Project name> specifies the name of the target project in Log Service. <Logstore name> specifies the name of the target Logstore. Replace the values based on your business scenarios. You can use the wildcard character (*) to specify multiple projects and Logstores.
{
  "Version": "1",
  "Statement": [
    {
      "Action": [
        "log:ListShards",
        "log:GetCursorOrData",
        "log:GetConsumerGroupCheckPoint",
        "log:UpdateConsumerGroup",
        "log:ConsumerGroupHeartBeat",
        "log:ConsumerGroupUpdateCheckPoint",
        "log:ListConsumerGroup",
        "log:CreateConsumerGroup"
      ],
      "Resource": [
        "acs:log:*:*:project/<Project name>/logstore/<Logstore name>",
        "acs:log:*:*:project/<Project name>/logstore/<Logstore name>/*"
      ],
      "Effect": "Allow"
    }
  ]
}
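If you manage several projects or Logstores, filling in the placeholders by hand is error-prone. The following sketch generates the policy shown above for a given project and Logstore. The function name `build_policy` and the sample names `my-project` and `my-logstore` are hypothetical; the actions and resource patterns are copied from the example policy.

```python
import json

# Actions taken from the example policy in this topic.
CONSUMER_GROUP_ACTIONS = [
    "log:ListShards",
    "log:GetCursorOrData",
    "log:GetConsumerGroupCheckPoint",
    "log:UpdateConsumerGroup",
    "log:ConsumerGroupHeartBeat",
    "log:ConsumerGroupUpdateCheckPoint",
    "log:ListConsumerGroup",
    "log:CreateConsumerGroup",
]


def build_policy(project: str, logstore: str) -> dict:
    """Fill the <Project name> and <Logstore name> placeholders.

    Pass "*" for either argument to match multiple projects or Logstores.
    """
    base = f"acs:log:*:*:project/{project}/logstore/{logstore}"
    return {
        "Version": "1",
        "Statement": [
            {
                "Action": list(CONSUMER_GROUP_ACTIONS),
                "Resource": [base, base + "/*"],
                "Effect": "Allow",
            }
        ],
    }


policy = build_policy("my-project", "my-logstore")  # hypothetical names
print(json.dumps(policy, indent=2))
```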
- Check the version of Splunk and the operating system on which Splunk runs.
- Make sure the latest version of the add-on is used.
- Make sure that the operating system is Linux, macOS, or Windows.
- Make sure that the version of Splunk heavy forwarders is 8.0 or later and the version of Splunk indexers is 7.0 or later.
- Configure HEC on Splunk. For more information, see Configure HTTP Event Collector on Splunk Enterprise.
If you use HEC to send events to Splunk indexers, make sure that HEC is configured on Splunk. If you use the Splunk private protocol to send events to Splunk indexers, skip this step.
Note You must create one or more Event Collector tokens before you can use HEC. The indexer acknowledgment feature cannot be enabled when you create an Event Collector token.
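To check an Event Collector token independently of the add-on, you can send a test event to HEC yourself. The sketch below only builds the request with the Python standard library; it is not the code the add-on uses. The host, port, and token are placeholders, and the `index`, `source`, and `sourcetype` defaults are borrowed from the sample log output in the FAQ of this topic.

```python
import json
import urllib.request


def build_hec_request(base_url, token, event, index="main",
                      source="sls log", sourcetype="http of hec"):
    """Build a POST request for the HEC /services/collector endpoint."""
    payload = {
        "event": event,
        "index": index,
        "source": source,
        "sourcetype": sourcetype,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector",
        data=json.dumps(payload).encode("utf-8"),
        # HEC expects the token in an "Authorization: Splunk <token>" header.
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder URL and token; replace them with your own values.
request = build_hec_request(
    "https://127.0.0.1:8088",
    "00000000-0000-0000-0000-000000000000",
    {"content": "hello from a test client"},
)
print(request.get_method(), request.full_url)

# To actually send the event, uncomment the next line:
# urllib.request.urlopen(request, timeout=10)
```

A 403 response to such a request usually points to a token or HEC configuration problem rather than to the add-on.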
Install the Splunk add-on
- Method 1
- Click the icon that opens the Apps page.
- On the Apps page, click Find More Apps.
- On the Browse More Apps page, search for Alibaba Cloud Log Service Add-on for Splunk, and click Install.
- After the installation is complete, restart Splunk as prompted.
- Method 2
- Click the icon that opens the Apps page.
- On the Apps page, click Install app from file.
- On the Upload app page, select the target .tgz file from your local host, and click Upload.
You can click App Search Results and download the target .tgz file on the Alibaba Cloud Log Service Add-on for Splunk page.
- Click Install.
- After the installation is complete, restart Splunk as prompted.
Configure the Splunk add-on
Operations
- Query data
Make sure that the data input is in the Enabled state. On the Splunk web interface, click Search & Reporting. On the App: Search & Reporting page, query audit logs that are sent to Splunk.
- Query Log Service operational logs
- Enter
index="_internal" | search "SLS info"
in the search bar to query Log Service INFO logs.
- Enter
index="_internal" | search "error"
in the search bar to query Log Service ERROR logs.
Performance and security
- Performance
The performance of the add-on and data transmission bandwidth depend on the following factors:
- Endpoint: You can access Log Service by using a public network endpoint, a classic network endpoint, a virtual private cloud (VPC) endpoint, or a global acceleration-based public network endpoint. In most cases, we recommend that you use a classic network endpoint or a VPC endpoint. For more information, see Endpoints.
- Bandwidth: the bandwidth of data transmission between Log Service and Splunk heavy forwarders and between Splunk heavy forwarders and indexers.
- Processing capability of Splunk indexers: the capabilities of indexers to receive data from Splunk heavy forwarders.
- Number of shards: A higher number of shards in a Logstore indicates a higher data transmission capability. You must decide the number of shards in a Logstore based on the receiving rate of raw logs. For more information, see Manage shards.
- Number of Splunk data inputs: A higher number of data inputs in a consumer group that is configured for a Logstore indicates a higher throughput.
Note The number of shards in a Logstore affects the concurrent consumption of the Logstore.
- Number of CPU cores and memory resources occupied by Splunk heavy forwarders: In most cases, one Splunk data input consumes 1 GB to 2 GB of memory resources and 1 CPU core.
If sufficient memory and CPU resources are allocated, one Splunk data input can consume log data at a rate of 1 MB to 2 MB per second.
For example, if logs are received in a Logstore at a rate of 10 MB per second, you must create at least 10 shards in the Logstore and configure 10 data inputs in the Splunk add-on. If you deploy the Splunk add-on on a single server, the server must have 10 idle CPU cores and 12 GB of available memory resources.
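The sizing example above can be turned into a quick estimate. The sketch below assumes the conservative end of the stated range (1 MB per second, 1 CPU core, and 1 GB to 2 GB of memory per data input) and one shard per data input. The function name `size_deployment` is hypothetical, and the result is a rough planning figure, not a guarantee.

```python
import math


def size_deployment(ingest_mb_per_s, per_input_mb_per_s=1.0):
    """Estimate data inputs, shards, CPU cores, and memory for an ingest rate."""
    inputs = math.ceil(ingest_mb_per_s / per_input_mb_per_s)
    return {
        "data_inputs": inputs,
        "min_shards": inputs,                     # one shard per data input
        "cpu_cores": inputs,                      # 1 core per data input
        "memory_gb_range": (inputs, inputs * 2),  # 1 GB to 2 GB per data input
    }


plan = size_deployment(10)  # logs arrive at 10 MB per second
print(plan)
```

For 10 MB per second this yields 10 data inputs, 10 shards, 10 CPU cores, and between 10 GB and 20 GB of memory. The 12 GB figure in the example above falls inside that range.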
- High availability
A consumer group stores checkpoints on the server. When a consumer stops consuming data, another consumer continues to consume data from the last checkpoint. You can create Splunk data inputs on multiple servers. If a server stops running or is damaged, a Splunk data input on another server continues to consume data from the last checkpoint. You can also launch more Splunk data inputs than the number of shards on multiple servers. This allows data to be consumed from the last checkpoint if an exception occurs.
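The failover behavior can be pictured with a toy model. In the sketch below, an in-memory `CheckpointStore` class stands in for the checkpoints that Log Service keeps on the server, and a list of strings stands in for the data in one shard. Both are assumptions for illustration and do not reflect the actual add-on or SDK interfaces.

```python
class CheckpointStore:
    """Stand-in for the server-side checkpoints of a consumer group."""

    def __init__(self):
        self._checkpoints = {}

    def save(self, shard_id, position):
        self._checkpoints[shard_id] = position

    def load(self, shard_id):
        # A shard without a checkpoint is read from the beginning.
        return self._checkpoints.get(shard_id, 0)


def consume(store, shard_id, log_lines, limit=None):
    """Read from the last checkpoint and save progress after each line."""
    start = store.load(shard_id)
    end = len(log_lines) if limit is None else min(len(log_lines), start + limit)
    consumed = []
    for position in range(start, end):
        consumed.append(log_lines[position])
        store.save(shard_id, position + 1)
    return consumed


store = CheckpointStore()
shard_logs = ["log-0", "log-1", "log-2", "log-3", "log-4"]

# The data input on server A reads two lines, then server A stops.
first = consume(store, 0, shard_logs, limit=2)
# A data input on server B takes over the shard and resumes.
second = consume(store, 0, shard_logs)
print(first, second)
```

Because progress is stored outside the consumer, the second data input picks up at `log-2` instead of starting over.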
- HTTPS-based data transmission
- Log Service
To use HTTPS to encrypt the data transmitted between your program and Log Service, you must set the prefix of the endpoint to https://, for example, https://cn-beijing.log.aliyuncs.com.
The server certificate *.aliyuncs.com is issued by GlobalSign. By default, most Linux and Windows servers are preconfigured to trust this certificate. If the server does not trust this certificate, see Install a trusted root CA or self-signed certificate.
- Splunk
To use HTTPS-based HEC, you must enable the SSL feature when you enable HEC in the Global Settings dialog box. For more information, see Configure HTTP Event Collector on Splunk Enterprise.
- AccessKey pair protection
Both the AccessKey pair that you use to access Log Service and the HEC tokens are encrypted and stored in Splunk.
FAQ
- What can I do if a configuration error occurs?
- Check the configurations of the data inputs. For information about configuration parameters, see Table 1.
- Check the configurations of Log Service. Example error: failed to create a consumer group.
- Command:
index="_internal" | search "error"
- Exception logs:
aliyun.log.consumer.exceptions.ClientWorkerException: error occour when create consumer group, errorCode: LogStoreNotExist, errorMessage: logstore xxxx does not exist
- Check whether the number of consumer groups configured for a Logstore exceeds the quota.
You can configure a maximum of 20 consumer groups for a Logstore. We recommend that you delete unnecessary consumer groups. If more than 20 consumer groups are configured for a Logstore, the ConsumerGroupQuotaExceed error is returned.
- What do I do if a permission error occurs?
- Check whether you are authorized to access Log Service.
- Command:
index="_internal" | search "error"
- Exception logs:
aliyun.log.consumer.exceptions.ClientWorkerException: error occour when create consumer group, errorCode: SignatureNotMatch, errorMessage: signature J70VwxYH0+W/AciA4BdkuWxK6W8= not match
- Check whether you are authorized to access HEC.
- Command:
index="_internal" | search "error"
- Exception logs:
ERROR HttpInputDataHandler - Failed processing http input, token name=n/a, channel=n/a, source_IP=127.0.0.1, reply=4, events_processed=0, http_input_body_size=369 WARNING pid=48412 tid=ThreadPoolExecutor-0_1 file=base_modinput.py:log_warning:302 | SLS info: Failed to write [{"event": "{\"__topic__\": \"topic_test0\", \"__source__\": \"127.0.0.1\", \"__tag__:__client_ip__\": \"10.10.10.10\", \"__tag__:__receive_time__\": \"1584945639\", \"content\": \"goroutine id [0, 1584945637]\", \"content2\": \"num[9], time[2020-03-23 14:40:37|1584945637]\"}", "index": "main", "source": "sls log", "sourcetype": "http of hec", "time": "1584945637"}] remote Splunk server (http://127.0.0.1:8088/services/collector) using hec. Exception: 403 Client Error: Forbidden for url: http://127.0.0.1:8088/services/collector, times: 3
- Possible causes
- HEC is not configured or started.
- The HEC-related parameters of the data input are invalid. For example, if you use HTTPS-based HEC, you must enable the SSL feature.
- The indexer acknowledgment feature is enabled for the Event Collector token. This feature must be disabled.
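When many error lines are returned, it can help to pull out the `errorCode` field and map it to the checks described in this FAQ. The sketch below does this with a regular expression. The `diagnose` function and the remediation wording are illustrative assumptions, and only the three error codes mentioned in this topic are covered.

```python
import re

# Error codes mentioned in this topic, mapped to the related check.
REMEDIATION = {
    "LogStoreNotExist": "Check the project and Logstore configured for the data input.",
    "ConsumerGroupQuotaExceed": "Delete unnecessary consumer groups (maximum of 20 per Logstore).",
    "SignatureNotMatch": "Check whether the AccessKey pair is authorized to access Log Service.",
}

ERROR_CODE = re.compile(r"errorCode:\s*(\w+)")


def diagnose(log_line):
    """Return (error_code, suggested_check) for one exception log line."""
    match = ERROR_CODE.search(log_line)
    if match is None:
        return None, "No errorCode field found in this line."
    code = match.group(1)
    return code, REMEDIATION.get(code, "Unrecognized error code.")


line = (
    "aliyun.log.consumer.exceptions.ClientWorkerException: error occour when "
    "create consumer group, errorCode: LogStoreNotExist, errorMessage: "
    "logstore xxxx does not exist"
)
code, suggestion = diagnose(line)
print(code, "->", suggestion)
```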
- What do I do if a consumption delay occurs?
You can view the status of consumer groups in the Log Service console. For more information, see View consumer group status.
Increase the number of shards in the Logstore or create more data inputs in the same consumer group. For more information, see Performance and security.
- What do I do if network jitters occur?
- Command:
index="_internal" | search "SLS info: Failed to write"
- Exception logs:
WARNING pid=58837 tid=ThreadPoolExecutor-0_0 file=base_modinput.py:log_warning:302 | SLS info: Failed to write [{"event": "{\"__topic__\": \"topic_test0\", \"__source__\": \"127.0.0.1\", \"__tag__:__client_ip__\": \"10.10.10.10\", \"__tag__:__receive_time__\": \"1584951417\", \"content2\": \"num[999], time[2020-03-23 16:16:57|1584951417]\", \"content\": \"goroutine id [0, 1584951315]\"}", "index": "main", "source": "sls log", "sourcetype": "http of hec", "time": "1584951417"}] remote Splunk server (http://127.0.0.1:8088/services/collector) using hec. Exception: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer')), times: 3
Splunk events are automatically retransmitted if network jitters occur. If the problem persists, contact your network administrator for troubleshooting.
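The retransmission idea can be sketched as a retry loop with a growing delay. This is a generic pattern, not the add-on's actual retry logic. The three attempts mirror the `times: 3` value in the sample log, while the backoff delays and the `send_with_retry` helper are assumptions.

```python
import time


def send_with_retry(send, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send() until it succeeds or the attempts are used up."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except ConnectionError as error:
            last_error = error
            if attempt < attempts:
                # Wait 0.5 s, 1 s, 2 s, ... between attempts (assumed values).
                sleep(base_delay * 2 ** (attempt - 1))
    raise last_error


# Simulated sender that fails twice with a reset connection, then succeeds.
calls = {"count": 0}


def flaky_send():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionResetError(54, "Connection reset by peer")
    return "ok"


result = send_with_retry(flaky_send, sleep=lambda seconds: None)
print(result, calls["count"])
```

If every attempt fails, the last connection error is raised, which corresponds to the point where the issue should be handed to a network administrator.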
- Modify the start time of data consumption
Note The SLS cursor start time parameter takes effect only when the consumer group is created for the first time. After that, data is consumed from the last checkpoint.
- On the Input page of the Splunk Web UI, disable the target data input.
- Log on to the Log Service console. Find the Logstore from which data is consumed, and delete the consumer group under Data Consumption.
- On the Input page of the Splunk Web UI, find the target data input, and choose . In the dialog box that appears, modify the SLS cursor start time parameter. Restart the data input.
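The reason the consumer group must be deleted first can be expressed as a simple rule: an existing checkpoint always takes precedence over the SLS cursor start time. The sketch below models that rule only. The function name, the return format, and the sample values are hypothetical.

```python
def resolve_start_position(checkpoint, cursor_start_time):
    """Decide where consumption starts for a data input."""
    if checkpoint is not None:
        # The consumer group already exists: the start time is ignored.
        return ("checkpoint", checkpoint)
    # No consumer group yet (or it was deleted): the start time applies.
    return ("cursor_start_time", cursor_start_time)


# Existing consumer group: the new start time has no effect.
before_delete = resolve_start_position("checkpoint-123", "2020-03-23 00:00:00")
# After the consumer group is deleted, the new start time is used.
after_delete = resolve_start_position(None, "2020-03-23 00:00:00")
print(before_delete, after_delete)
```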