Alibaba Cloud Log Service Add-on for Splunk collects logs from Alibaba Cloud Log Service (SLS) and sends them to Splunk.

Architecture

The main features are as follows.
  • Splunk data inputs create SLS consumer groups to pull logs, which enables real-time log consumption from SLS.
  • Logs can be forwarded from the data inputs to Splunk indexers over either of two protocols: the Splunk private protocol or Splunk HTTP Event Collector (HEC).
(Figure: architecture of the add-on)

Mechanism

(Figure: consumer group mechanism)
  • Each data input creates one consumer which belongs to one consumer group.
  • A consumer group consists of multiple consumers. The consumers in a consumer group consume data from a logstore.
  • A Logstore has multiple shards.
    • Each shard can be allocated to one consumer.
    • One consumer can consume data in multiple shards.
  • The name of each consumer is unique within a consumer group, because it combines the consumer group name, hostname, process ID, and event protocol.

For more information about SLS consumer groups, see Use consumer groups to consume logs.
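
To make the mechanism concrete, the following is a minimal sketch of one consumer in a consumer group, based on the aliyun-log-python-sdk that the add-on's consumer is built on. All endpoint, credential, and name values are placeholders.

  from aliyun.log.consumer import (ConsumerProcessorBase, ConsumerWorker,
                                   CursorPosition, LogHubConfig)

  class PrintProcessor(ConsumerProcessorBase):
      # Called with each batch of log groups fetched from the shards allocated to this consumer.
      def process(self, log_groups, check_point_tracker):
          for log_group in log_groups.LogGroups:
              for log in log_group.Logs:
                  print({content.Key: content.Value for content in log.Contents})
          check_point_tracker.save_check_point(True)  # persist the checkpoint on the SLS server

  # Placeholder values; consumers that share the same consumer group name split the shards.
  option = LogHubConfig('cn-huhehaote.log.aliyuncs.com', '<AccessKey ID>', '<AccessKey Secret>',
                        '<Project name>', '<Logstore name>',
                        '<consumer group name>', '<unique consumer name>',
                        cursor_position=CursorPosition.BEGIN_CURSOR)
  ConsumerWorker(PrintProcessor, consumer_option=option).start()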

Preparations

  • Get an AccessKey for SLS.

    You need to get an AccessKey for SLS project access in Alibaba Cloud RAM. For more information, see AccessKey and Configure an AccessKey pair.

    You can also use the Log Service permission assistant in the Log Service console for this purpose. A typical RAM policy is shown below:
    Note Replace <Project name> with the name of the Log Service project and <Logstore name> with the name of the source logstore. Both can contain the wildcard character * for fuzzy matching.
    {
      "Version": "1",
      "Statement": [
        {
          "Action": [
            "log:ListShards",
            "log:GetCursorOrData",
            "log:GetConsumerGroupCheckPoint",
            "log:UpdateConsumerGroup",
            "log:ConsumerGroupHeartBeat",
            "log:ConsumerGroupUpdateCheckPoint",
            "log:ListConsumerGroup",
            "log:CreateConsumerGroup"
          ],
          "Resource": [
            "acs:log:*:*:project/<Project name>/logstore/<Logstore name>",
            "acs:log:*:*:project/<Project name>/logstore/<Logstore name>/*"
          ],
          "Effect": "Allow"
        }
      ]
    }
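
    Once the policy is attached, you can confirm that the AccessKey pair reaches the logstore with a quick sanity check. This is a minimal sketch using the aliyun-log-python-sdk; the endpoint, project, and logstore values are placeholders:

    from aliyun.log import LogClient

    # Placeholder endpoint and credentials; log:ListShards must be allowed by the policy above.
    client = LogClient('cn-huhehaote.log.aliyuncs.com', '<AccessKey ID>', '<AccessKey Secret>')
    shards = client.list_shards('<Project name>', '<Logstore name>').get_shards_info()
    print('shard IDs:', [shard['shardID'] for shard in shards])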
  • Splunk version and operating system check
    • Make sure to use the latest add-on version.
    • Operating system: Linux, macOS, or Windows.
    • Splunk version: Splunk heavy forwarder 8.0+, Splunk indexer 7.0+.
  • Configure HTTP Event Collector on Splunk Enterprise.
    If you use HEC to ingest events into Splunk, make sure HEC is enabled. Skip this step if you use the Splunk heavy forwarder private protocol directly.
    Note Enabling indexer acknowledgment is currently not supported, so leave it disabled when you create an Event Collector token.
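
    Before configuring the add-on, you can verify that HEC accepts events. This is a minimal sketch; the host and token are placeholders, and you should switch to https:// if SSL is enabled in the HEC global settings:

    import requests

    # Placeholder host and token; HEC listens on port 8088 by default.
    resp = requests.post('http://<HEC host>:8088/services/collector',
                         headers={'Authorization': 'Splunk <HEC token>'},
                         json={'event': 'HEC connectivity check', 'sourcetype': 'manual'})
    print(resp.status_code, resp.text)  # expect 200 and {"text":"Success","code":0}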

Installation of Add-on

There are two ways to install the add-on from the Splunk Web UI.
  • Method 1
    1. Click the gear icon next to Apps.
    2. Click Browse more apps.
    3. Search for Alibaba Cloud Log Service Add-on for Splunk, find the add-on, and click Install.
    4. After the installation is complete, restart the Splunk service as prompted.
  • Method 2
    1. Click the gear icon next to Apps.
    2. Click Install app from file.
    3. Select the Upgrade app check box, and click Upload.

      Upload the .tgz file that you downloaded from App Search Results.

    4. After the installation is complete, restart the Splunk service as prompted.

Configuration for Add-on

  1. Through Splunk Web UI, select the app Alibaba Cloud Log Service Add-on for Splunk.
  2. Global account settings.
    Select Configuration > Account. On the Account tab, you can set the SLS AccessKey.
    Note The username and password configured in the global account settings correspond to the AccessKey ID and AccessKey Secret.
  3. Log level settings.
    Select Configuration > Logging. On the Logging tab, you can set the log level for the add-on.
  4. Add data input.
    1. Click Inputs.
    2. Click Create New Input to create a new data input.
      Table 1. Data input parameters
      • name (Required, String): The unique name for the data input.
      • Interval (Required, Integer): Time in seconds after which the Splunk data input process is restarted when it exits unexpectedly. Default value: 10 seconds.
      • index (Required, String): The Splunk index.
      • SLS AccessKey (Required, String): The AccessKey, used as a pair of an AccessKey ID and an AccessKey Secret. Set it to the account name configured in the global account settings.
        Note The username and password configured in the global account settings correspond to the AccessKey ID and AccessKey Secret.
      • SLS endpoint (Required, String): The SLS service endpoint, for example, cn-huhehaote.log.aliyuncs.com or https://cn-huhehaote.log.aliyuncs.com. For more information, see Service endpoint.
      • SLS project (Required, String): The project in Log Service. For more information, see Manage a project.
      • SLS logstore (Required, String): The logstore in Log Service. For more information, see Manage a Logstore.
      • SLS consumer group (Required, String): The name of the consumer group used to consume the logstore. To scale out, configure multiple data inputs with the same consumer group name. For more information, see Use consumer groups to consume logs.
      • SLS cursor start time (Required, String): The start time from which data is consumed: "begin", "end", or a time in ISO format (for example, 2018-12-26 0:0:0+8:00). This parameter takes effect only when the consumer group is created for the first time; afterwards, consumption resumes from the saved checkpoint.
        Note The time refers to the time at which logs arrive at Log Service.
      • SLS heartbeat interval (Required, Integer): The heartbeat interval in seconds between the consumer and the SLS server. Default value: 60 seconds.
      • SLS data fetch interval (Required, Integer): The interval in seconds between data fetches. If data does not arrive frequently, do not set this value too small. Default value: 1 second.
      • Topic filter (Optional, String): A topic filter string with ";" as the separator. Logs whose topic is in this filter string are not sent to Splunk. For example, "TopicA;TopicB" means that logs with topic "TopicA" or "TopicB" are ignored.
      • Unfolded fields (Optional, JSON): A JSON string that maps a topic to a field list, in the form {"topicA": ["field_nameA1", "field_nameA2", ...], "topicB": ["field_nameB1", "field_nameB2", ...], ...}. For example, {"actiontrail_audit_event": ["event"]} means that if the topic of a log is "actiontrail_audit_event", the value of the field "event" is unfolded from string to JSON. (See the sketch after this table.)
      • Event source (Optional, String): The source of an event in Splunk.
      • Event source type (Optional, String): The source type of an event in Splunk.
      • Event retry times (Optional, Integer): The number of retries; 0 means infinite retransmission. Default value: 0.
      • Event protocol (Required): The protocol used to send events to Splunk: HTTP for HEC, HTTPS for HEC, or Private protocol. If the private protocol is selected, the parameters below can be ignored.
      • HEC host (Required when Event protocol is HEC, String): The host of HEC. For more information, see Set up and use HTTP Event Collector in Splunk Web.
      • HEC port (Required when Event protocol is HEC, Integer): The port of HEC.
      • HEC token (Required when Event protocol is HEC, String): The HEC token. For more information, see HEC token.
      • HEC timeout (Required when Event protocol is HEC, Integer): The HEC timeout in seconds. Default value: 120 seconds.
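
      To illustrate how Topic filter and Unfolded fields behave for a single log, here is an illustrative sketch; it is not the add-on's actual code:

      import json

      topic_filter = 'TopicA;TopicB'.split(';')                 # topics to skip
      unfolded_fields = {'actiontrail_audit_event': ['event']}  # topic -> fields to unfold

      def transform(log):
          topic = log.get('__topic__', '')
          if topic in topic_filter:
              return None                              # ignored, not sent to Splunk
          for field in unfolded_fields.get(topic, []):
              if field in log:
                  log[field] = json.loads(log[field])  # unfold the string into JSON
          return log

      print(transform({'__topic__': 'actiontrail_audit_event', 'event': '{"name": "test"}'}))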

Get started

  • Search data
    First, enable the data inputs configured in the previous step. In the Splunk Web UI, click Search & Reporting to open the App: Search & Reporting page, where you can see the logs collected from Alibaba Cloud Log Service.
    (Figure: search results)
  • Internal log
    • Use the command index="_internal" | search "SLS info" to view status information about SLS consumption.
    • Use the command index="_internal" | search "error" to view run-time errors.

Sizing and security guide

  • Performance Metrics

    The performance and throughput of data transmission are highly affected by several factors, including:

    • SLS endpoint: whether you use a service endpoint for the public network, the classic network, a VPC, or Internet-based global acceleration. If the network permits, the classic network and VPC endpoints normally give the best performance. For more information, see Service endpoint.
    • Bandwidth: the network bandwidth between SLS and the Splunk heavy forwarder hosting the add-on, and between the Splunk heavy forwarder and the indexers.
    • Capability of Splunk indexers: the receiving speed of the Splunk indexers.
    • Shard count: the more shards an SLS logstore has, the higher its data transmission capability. You can specify the number of shards. For more information, see Manage a shard.
    • Count of configured data inputs: the more data inputs configured with the same consumer group name for one logstore, the higher the throughput.
      Note The number of concurrently running consumers is limited by the number of shards of the SLS logstore.
    • Memory and CPU cores of the Splunk heavy forwarders: normally, a single Splunk data input uses 1 to 2 GB of memory and up to one CPU core.

    When the above conditions are met, a single Splunk data input creates one consumer that consumes raw logs at 1 to 2 MB/s. Determine the number of shards based on the generation rate of raw logs.

    For example, if a logstore produces raw logs at 10 MB/s, split the logstore into at least 10 shards and configure 10 data inputs with the same settings for the add-on. In single-machine deployment mode, the machine should have 10 free CPU cores and 12 GB of memory, as in the sizing sketch below.
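
    The rule of thumb above can be written as a small back-of-the-envelope helper; the 1 to 2 MB/s per-consumer rate and the per-input resource costs are the figures from this section:

    import math

    raw_rate_mb_s = 10        # raw log production speed of the logstore (MB/s)
    per_consumer_mb_s = 1     # conservative consumption speed per data input (MB/s)

    inputs_needed = math.ceil(raw_rate_mb_s / per_consumer_mb_s)  # also the minimum shard count
    print(f'shards and data inputs: {inputs_needed}')
    print(f'CPU cores: {inputs_needed}')             # up to one core per input
    print(f'memory: {inputs_needed * 1.2:.0f} GB')   # roughly 1 to 2 GB per input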

  • High availability

    A consumer group stores checkpoints on the server. When the data consumption process of one consumer stops, another consumer automatically takes over the process and continues the process from the checkpoint of the last consumption. You can create Splunk data inputs on different servers. If a server stops or is damaged, a Splunk data input on another server can take over the consumption process and continue the process from the checkpoint. To have sufficient consumers, you can create more Splunk data inputs than the number of shards on different servers.

  • HTTPS
    • Log Service

      To use HTTPS to encrypt the data transmitted between the add-on and Log Service, you must set the prefix of the service endpoint to https://, for example, https://cn-beijing.log.aliyuncs.com.

      The server certificate for *.aliyuncs.com is issued by GlobalSign. Most Linux and Windows servers are preconfigured to trust this certificate by default. If a server does not trust it, you can visit the following website to download and install a valid certificate: Install a trusted root CA or self-signed certificate.

    • Splunk

      To have HEC listen and communicate over HTTPS, click the Enable SSL checkbox in Splunk HEC Global Settings. For more information, see Configure HTTP Event Collector on Splunk Enterprise.

  • AccessKey storage protection

    Key information, such as the AccessKey of SLS and the HEC token of Splunk, is stored in Splunk confidential storage to prevent unexpected leakage.

FAQ

  • Configuration error
    • Basic configuration validations run when you add or modify data inputs in the web console. For details, see Table 1.
    • SLS configuration errors, for example, failure to create a consumer group.
      • Command: index="_internal" | search "error"
      • Exception logs:
        aliyun.log.consumer.exceptions.ClientWorkerException: 
        error occour when create consumer group, 
        errorCode: LogStoreNotExist, 
        errorMessage: logstore xxxx does not exist
      • ConsumerGroupQuotaExceed

        You can configure up to 20 consumer groups for each logstore in SLS; the ConsumerGroupQuotaExceed error is reported when this quota is exceeded. We recommend that you log on to the Log Service console in advance to view the status of the consumer groups. If the remaining quota is insufficient, delete the consumer groups that you no longer need.
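
        A hedged sketch for checking the quota with the aliyun-log-python-sdk (the same check can be done in the console; all values are placeholders):

        from aliyun.log import LogClient

        client = LogClient('cn-huhehaote.log.aliyuncs.com', '<AccessKey ID>', '<AccessKey Secret>')
        groups = client.list_consumer_group('<Project name>', '<Logstore name>').get_consumer_groups()
        print(len(groups), 'of 20 consumer groups used')
        # Free up quota by deleting a group that is no longer needed:
        # client.delete_consumer_group('<Project name>', '<Logstore name>', '<unused group>')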

  • Permission errors
    • No permission to access Alibaba Cloud Log Service
      • Command: index="_internal" | search "error"
      • Exception logs:
        aliyun.log.consumer.exceptions.ClientWorkerException: 
        error occour when create consumer group, 
        errorCode: SignatureNotMatch, 
        errorMessage: signature J70VwxYH0+W/AciA4BdkuWxK6W8= not match
    • No permission to access HEC
      • Command: index="_internal" | search "error"
      • Exception logs:
        ERROR HttpInputDataHandler - Failed processing http input, token name=n/a, channel=n/a, source_IP=127.0.0.1, reply=4, events_processed=0, http_input_body_size=369
        
        WARNING pid=48412 tid=ThreadPoolExecutor-0_1 file=base_modinput.py:log_warning:302 | 
        SLS info: Failed to write [{"event": "{\"__topic__\": \"topic_test0\", \"__source__\": \"127.0.0.1\", \"__tag__:__client_ip__\": \"10.10.10.10\", \"__tag__:__receive_time__\": \"1584945639\", \"content\": \"goroutine id [0, 1584945637]\", \"content2\": \"num[9], time[2020-03-23 14:40:37|1584945637]\"}", "index": "main", "source": "sls log", "sourcetype": "http of hec", "time": "1584945637"}] remote Splunk server (http://127.0.0.1:8088/services/collector) using hec. 
        Exception: 403 Client Error: Forbidden for url: http://127.0.0.1:8088/services/collector, times: 3
      • Possible causes:
        • HEC is not configured or is disabled.
        • The basic configuration and global settings for HEC are incorrect. For example, if HTTPS is used, SSL must be enabled.
        • Indexer acknowledgment is enabled for the Event Collector token; it must be disabled.
  • Consumption delay
    • If the service log feature is enabled, you can do the following:
      • You can log on to the Log Service console to view the status of a consumer group. For more information, see Service log dashboards.
      • You can use CloudMonitor to view latency associated with consumer groups and to configure alerts. For more information, see Configure an alert.
    • If the service log feature is not enabled, you can view the status of the consumer group in the Log Service console. For more information, see View the status of a consumer group.

    Refer to Performance Metrics to resolve the problem, for example, by splitting the logstore into more shards or creating more data inputs under the same consumer group.

  • Network jitter
    • Command: index="_internal" | search "SLS info: Failed to write"
    • Exception logs:
      WARNING pid=58837 tid=ThreadPoolExecutor-0_0 file=base_modinput.py:log_warning:302 |
      SLS info: Failed to write [{"event": "{\"__topic__\": \"topic_test0\", \"__source__\": \"127.0.0.1\", \"__tag__:__client_ip__\": \"10.10.10.10\", \"__tag__:__receive_time__\": \"1584951417\", \"content2\": \"num[999], time[2020-03-23 16:16:57|1584951417]\", \"content\": \"goroutine id [0, 1584951315]\"}", "index": "main", "source": "sls log", "sourcetype": "http of hec", "time": "1584951417"}] remote Splunk server (http://127.0.0.1:8088/services/collector) using hec. 
      Exception: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer')), times: 3

    Normally, the add-on retries automatically. If the problem persists, contact your network administrator.