This topic describes how to use the Splunk add-on for Log Service to send log data from Log Service to Splunk.

Implementation

The following list describes how the Splunk add-on ships log data:
  • Splunk data inputs create consumer groups and use them to consume log data from Log Service in real time.
  • Splunk forwarders use the Splunk private protocol or HTTP Event Collector (HEC) to forward the log data to Splunk indexers.
Note The Splunk add-on is used only to collect log data. You must install the add-on on Splunk heavy forwarders. However, you do not need to install the add-on on Splunk indexers or search heads.
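When HEC is used, each log entry is wrapped in a Splunk HEC event before it is sent to an indexer. The add-on's warning logs in the FAQ of this topic show the payload fields (event, index, source, sourcetype, and time) and the collector URL http://127.0.0.1:8088/services/collector. The following Python sketch builds an equivalent request with the standard library. The function name, the placeholder token, and the default field values are assumptions taken from those sample logs, not the add-on's actual code; the `Authorization: Splunk <token>` header is the standard HEC authentication scheme.

```python
import json
import urllib.request


def build_hec_request(log: dict, log_time: int, hec_url: str, token: str,
                      index: str = "main", source: str = "sls log",
                      sourcetype: str = "http of hec") -> urllib.request.Request:
    """Wrap one Log Service log entry in a Splunk HEC event (illustrative sketch).

    The defaults for index, source, and sourcetype mirror the sample logs in the
    FAQ; in the add-on they come from the data input settings.
    """
    payload = {
        # The log entry is serialized as a JSON string, as in the sample logs.
        "event": json.dumps(log),
        "index": index,
        "source": source,
        "sourcetype": sourcetype,
        # Epoch seconds of the log entry.
        "time": str(log_time),
    }
    return urllib.request.Request(
        hec_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # HEC authenticates requests with an Event Collector token.
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_hec_request(
        {"__topic__": "topic_test0", "content": "hello"},
        log_time=1584945637,
        hec_url="http://127.0.0.1:8088/services/collector",
        token="<your-hec-token>",  # hypothetical placeholder
    )
    print(req.get_method(), req.full_url)
    # To send it, a running HEC endpoint is required:
    # urllib.request.urlopen(req, timeout=120)
```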

Mechanism

  • A data input is a consumer that consumes log data.
  • A consumer group consists of multiple consumers. Each consumer in a consumer group consumes different log entries from a Logstore.
  • Each Logstore has multiple shards.
    • Each shard can be allocated to only one consumer.
    • Each consumer can consume data from multiple shards.
  • The name of each consumer is composed of the name of the consumer group to which the consumer belongs, the hostname, the process name, and the protocol that is used to send Splunk events. This naming convention ensures that each consumer name in a consumer group is unique.
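The allocation rules above can be illustrated with a toy model: each shard has exactly one owner, a consumer may own several shards, and consumer names combine the group name, hostname, process name, and protocol. The name format and the round-robin policy in this sketch are assumptions for illustration only; in practice, shard assignment is handled by the Log Service consumer group mechanism, not by code like this.

```python
from typing import Dict, List


def consumer_name(group: str, host: str, process: str, protocol: str) -> str:
    """Hypothetical name format combining the parts listed above."""
    return f"{group}-{host}-{process}-{protocol}"


def assign_shards(shards: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Round-robin shards over consumers.

    Every shard gets exactly one owner; a consumer may own several shards.
    Consumers beyond the number of shards receive no shard.
    """
    if not consumers:
        raise ValueError("at least one consumer is required")
    assignment: Dict[str, List[int]] = {name: [] for name in consumers}
    for i, shard in enumerate(sorted(shards)):
        assignment[consumers[i % len(consumers)]].append(shard)
    return assignment


if __name__ == "__main__":
    names = [consumer_name("splunk-cg", host, "p1", "hec") for host in ("hf-1", "hf-2")]
    print(assign_shards([0, 1, 2, 3, 4], names))
    # {'splunk-cg-hf-1-p1-hec': [0, 2, 4], 'splunk-cg-hf-2-p1-hec': [1, 3]}
```

Because names include the hostname and process, two data inputs on different heavy forwarders never collide within the same consumer group.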

For more information, see Use consumer groups to consume log data.

Preparations

  • Obtain an AccessKey pair that is used to access Log Service.

    You can use the AccessKey pair of a RAM user to access a Log Service project. For more information, see AccessKey pair or Configure an AccessKey pair for a RAM user to access a source Logstore and a destination Logstore.

    You can use the permission assistant feature to grant permissions to a RAM user. For more information, see Use the permission assistant to grant permissions. The following example shows a common permission policy that is configured for a RAM user.
    Note <Project name> specifies the name of the Log Service project from which data is consumed, and <Logstore name> specifies the name of the Logstore from which data is consumed. Replace them with your actual values. You can use the wildcard character (*) to specify multiple projects and Logstores.
    {
      "Version": "1",
      "Statement": [
        {
          "Action": [
            "log:ListShards",
            "log:GetCursorOrData",
            "log:GetConsumerGroupCheckPoint",
            "log:UpdateConsumerGroup",
            "log:ConsumerGroupHeartBeat",
            "log:ConsumerGroupUpdateCheckPoint",
            "log:ListConsumerGroup",
            "log:CreateConsumerGroup"
          ],
          "Resource": [
            "acs:log:*:*:project/<Project name>/logstore/<Logstore name>",
            "acs:log:*:*:project/<Project name>/logstore/<Logstore name>/*"
          ],
          "Effect": "Allow"
        }
      ]
    }
  • Check the version of Splunk and the operating system on which Splunk runs.
    • Make sure the latest version of the add-on is used.
    • Make sure that the operating system is Linux, macOS, or Windows.
    • Make sure that the version of Splunk heavy forwarders is 8.0 or later and the version of Splunk indexers is 7.0 or later.
  • Configure HEC on Splunk. For more information, see Configure HTTP Event Collector on Splunk Enterprise.
    If you use HEC to send events to Splunk indexers, make sure that HEC is configured on Splunk. If you use the Splunk private protocol to send events to Splunk indexers, skip this step.
    Note You must create one or more Event Collector tokens before you can use HEC. Do not enable the indexer acknowledgment feature when you create an Event Collector token.
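If you manage several projects and Logstores, you can generate the permission policy shown above instead of editing it by hand. The following Python sketch reproduces that example policy for a given project and Logstore name; the function and constant names are hypothetical, and the action list is copied from the example policy.

```python
import json

# Actions copied from the example policy in this topic.
CONSUMER_GROUP_ACTIONS = [
    "log:ListShards",
    "log:GetCursorOrData",
    "log:GetConsumerGroupCheckPoint",
    "log:UpdateConsumerGroup",
    "log:ConsumerGroupHeartBeat",
    "log:ConsumerGroupUpdateCheckPoint",
    "log:ListConsumerGroup",
    "log:CreateConsumerGroup",
]


def consumer_group_policy(project: str, logstore: str) -> dict:
    """Return the example RAM policy for one project and Logstore.

    Pass "*" for project or logstore to use the wildcard character.
    """
    logstore_arn = f"acs:log:*:*:project/{project}/logstore/{logstore}"
    return {
        "Version": "1",
        "Statement": [
            {
                "Action": list(CONSUMER_GROUP_ACTIONS),
                # The "/*" resource matches resources under the Logstore,
                # for example, consumer groups.
                "Resource": [logstore_arn, logstore_arn + "/*"],
                "Effect": "Allow",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(consumer_group_policy("my-project", "my-logstore"), indent=2))
```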

Install the add-on

You can log on to the Splunk web interface and use one of the following methods to install the add-on.
Note The Splunk add-on is used only to collect log data. You must install the add-on on Splunk heavy forwarders. However, you do not need to install the add-on on Splunk indexers or search heads.
  • Method 1
    1. Click the Splunk-004 icon.
    2. On the Apps page, click Find More Apps.
    3. On the Browse More Apps page, search for Alibaba Cloud Log Service Add-on for Splunk, and then click Install.
    4. After the installation is complete, restart Splunk as prompted.
  • Method 2
    1. Click the Splunk-004 icon.
    2. On the Apps page, click Install app from file.
    3. On the Upload app page, select the TGZ file of the add-on on your computer, and then click Upload.

      To obtain the TGZ file, click App Search Results and download it from the Alibaba Cloud Log Service Add-on for Splunk page.

    4. Click Install.
    5. After the installation is complete, restart Splunk as prompted.

Configure the Splunk add-on

  1. On the Splunk web interface, click Alibaba Cloud Log Service Add-on for Splunk.
  2. Configure an account.
    On the page that appears, click Configuration, and then click the Account tab. Click Add. In the Add Account dialog box, enter the AccessKey pair that you use to access Log Service.
    Note You must enter an AccessKey ID in the Username field and the related AccessKey secret in the Password field.
  3. Configure the severity level of Splunk add-on logs.
    Click Configuration, and then click the Logging tab. Select a severity level from the Log level drop-down list.
  4. Create a data input.
    1. Click Inputs to open the Inputs page.
    2. Click Create New Input. In the Add sls_datainput dialog box, set the parameters of the data input.
      Table 1. Parameters
      | Parameter | Required | Description | Example value |
      |---|---|---|---|
      | Name | Yes | The unique name of the data input. Data type: string. | N/A |
      | Interval | Yes | The interval after which the data input restarts if it exits. Unit: seconds. Data type: integer. | Default value: 10 |
      | Indexing | Yes | The Splunk index. Data type: string. | N/A |
      | SLS AccessKey | Yes | The Alibaba Cloud AccessKey pair, which consists of an AccessKey ID and an AccessKey secret. Data type: string. Note: In the account, the AccessKey ID is entered in the Username field and the AccessKey secret in the Password field. | The AccessKey pair that you entered when you configured the account |
      | SLS endpoint | Yes | The endpoint of Log Service. Data type: string. For more information, see Endpoints. | cn-huhehaote.log.aliyuncs.com or https://cn-huhehaote.log.aliyuncs.com |
      | SLS project | Yes | The name of the Log Service project. Data type: string. For more information, see Manage a project. | N/A |
      | SLS logstore | Yes | The name of the Log Service Logstore. Data type: string. For more information, see Manage a Logstore. | N/A |
      | SLS consumer group | Yes | The name of the consumer group. Data type: string. To consume data from the same Logstore by using multiple data inputs, specify the same consumer group name for all of them. For more information, see Use consumer groups to consume log data. | N/A |
      | SLS cursor start time | Yes | The start time of log consumption. Data type: string. This parameter takes effect only when the consumer group is created for the first time. Afterward, data is consumed from the last checkpoint. Note: The start time refers to the time at which logs are received. Valid values: begin, end, or a time in the ISO 8601 format. | 2018-12-26 0:0:0+8:00 |
      | SLS heartbeat interval | Yes | The interval at which the consumer sends heartbeats to the server. Unit: seconds. Data type: integer. | Default value: 60 |
      | SLS data fetch interval | Yes | The interval at which logs are pulled from Log Service. Unit: seconds. Data type: integer. If logs are received at a low rate, we recommend that you do not set this parameter to a small value. | Default value: 1 |
      | Topic filter | No | Filters log entries by topic. Separate multiple topics with semicolons (;). Data type: string. Log entries whose topic matches are not sent to Splunk. | TopicA;TopicB: log entries whose topic is TopicA or TopicB are dropped. |
      | Unfolded fields | No | Maps the fields in a log entry to the topic of the log entry. Format: {"topicA": ["field_nameA1", "field_nameA2", ...], "topicB": ["field_nameB1", "field_nameB2", ...], ...}. Data type: string. | {"actiontrail_audit_event": ["event"]}: the event field is mapped to the log topic actiontrail_audit_event in the JSON format. |
      | Event source | No | The source of Splunk events. Data type: string. | N/A |
      | Event source type | No | The source type of Splunk events. Data type: string. | N/A |
      | Event retry times | No | The number of retries for sending Splunk events. Data type: integer. | Default value: 0, which indicates unlimited retries. |
      | Event protocol | Yes | The protocol used to send Splunk events to Splunk indexers. If you select Private protocol, you do not need to set the HEC parameters that follow. | HTTP for HEC, HTTPS for HEC, or Private protocol |
      | HEC host | Yes | The HEC host. Valid only if you use HEC to send Splunk events. Data type: string. For more information, see Set up and use HTTP Event Collector in Splunk Web. | N/A |
      | HEC port | Yes | The HEC port. Valid only if you use HEC to send Splunk events. Data type: integer. | N/A |
      | HEC token | Yes | The HEC token. Valid only if you use HEC to send Splunk events. Data type: string. For more information, see HEC token. | N/A |
      | HEC timeout | Yes | The HEC timeout period. Valid only if you use HEC to send Splunk events. Unit: seconds. Data type: integer. | Default value: 120 |
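Before you create a data input, you may want to sanity-check its values. The following Python sketch encodes a few rules from Table 1: the topic filter is a semicolon-separated list whose matching log entries are dropped, the cursor start time must be set, the HEC host, port, and token are needed only when the event protocol is HEC, and unfolded fields must be a JSON object that maps topics to lists of field names. The dictionary keys reuse the parameter names from Table 1, but the functions are hypothetical and do not reflect how the add-on validates its inputs internally.

```python
import json
from typing import Dict, List, Set

HEC_PROTOCOLS = {"HTTP for HEC", "HTTPS for HEC"}
# HEC timeout is omitted because it has a default value (120).
REQUIRED_HEC_FIELDS = ("HEC host", "HEC port", "HEC token")


def parse_topic_filter(value: str) -> Set[str]:
    """'TopicA;TopicB' -> {'TopicA', 'TopicB'}; empty items are ignored."""
    return {topic.strip() for topic in value.split(";") if topic.strip()}


def is_dropped(log: Dict[str, str], topic_filter: str) -> bool:
    """Log entries whose __topic__ matches the filter are not sent to Splunk."""
    return log.get("__topic__", "") in parse_topic_filter(topic_filter)


def check_input(cfg: Dict[str, str]) -> List[str]:
    """Return a list of problems found in a data input configuration."""
    problems: List[str] = []
    if not cfg.get("SLS cursor start time"):
        # Valid values are begin, end, or a time; the time format is not checked here.
        problems.append("SLS cursor start time is required")
    if cfg.get("Event protocol") in HEC_PROTOCOLS:
        for field in REQUIRED_HEC_FIELDS:
            if not cfg.get(field):
                problems.append(f"{field} is required when using HEC")
        port = cfg.get("HEC port", "")
        if port and not port.isdigit():
            problems.append("HEC port must be an integer")
    unfolded = cfg.get("Unfolded fields")
    if unfolded:
        try:
            mapping = json.loads(unfolded)
        except json.JSONDecodeError:
            problems.append("Unfolded fields must be valid JSON")
        else:
            if not isinstance(mapping, dict) or not all(
                isinstance(fields, list) for fields in mapping.values()
            ):
                problems.append("Unfolded fields must map each topic to a list of field names")
    return problems


if __name__ == "__main__":
    print(is_dropped({"__topic__": "TopicA"}, "TopicA;TopicB"))  # True
    print(check_input({"SLS cursor start time": "begin",
                       "Event protocol": "HTTPS for HEC",
                       "HEC host": "splunk.example.com"}))
```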

Operations

  • Query data
    Make sure that the data input is in the Enabled state. On the Splunk web interface, click Search & Reporting. On the App: Search & Reporting page, query the logs that are sent to Splunk.
  • Query Log Service operational logs
    • Enter index="_internal" | search "SLS info" in the search bar to query Log Service INFO logs.
    • Enter index="_internal" | search "error" in the search bar to query Log Service ERROR logs.
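The same searches can also be run outside Splunk Web through the Splunk REST API. The following Python sketch only builds a request for the `services/search/jobs/export` endpoint on the Splunk management port (8089 by default) with basic authentication. The host name and credentials are placeholders, and you should verify the endpoint and the authentication method against the REST API reference for your Splunk version.

```python
import base64
import urllib.parse
import urllib.request

INFO_SEARCH = 'index="_internal" | search "SLS info"'
ERROR_SEARCH = 'index="_internal" | search "error"'


def build_export_request(base_url: str, spl: str, user: str, password: str) -> urllib.request.Request:
    """Build a POST request for the Splunk REST endpoint services/search/jobs/export."""
    query = spl.strip()
    # Searches submitted through the REST API must start with a command,
    # so prepend "search" unless the query already starts with it or with "|".
    if not (query.startswith("search ") or query.startswith("|")):
        query = "search " + query
    form = urllib.parse.urlencode({"search": query, "output_mode": "json"}).encode("utf-8")
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + "/services/search/jobs/export",
        data=form,
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_export_request("https://splunk.example.com:8089", ERROR_SEARCH, "admin", "<password>")
    print(req.full_url)
    # urllib.request.urlopen(req) would stream the results as JSON; a self-signed
    # management certificate may require an ssl.SSLContext that trusts it.
```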

Performance and security

  • Performance

    The performance of the add-on and data transmission bandwidth depend on the following factors:

    • Endpoint: You can access Log Service by using an endpoint of the Internet, classic network, virtual private clouds (VPCs), or the global acceleration-based Internet. In most cases, we recommend that you use a classic network endpoint or a VPC endpoint. For more information, see Endpoints.
    • Bandwidth: the bandwidth of data transmission between Log Service and Splunk heavy forwarders and between Splunk heavy forwarders and indexers.
    • Processing capability of Splunk indexers: the rate at which indexers can receive data from Splunk heavy forwarders.
    • Number of shards: The more shards a Logstore has, the higher its data transmission capability. Determine the number of shards based on the speed at which raw logs are generated. For more information, see Manage shards.
    • Number of Splunk data inputs: The more data inputs in the consumer group of a Logstore, the higher the throughput.
      Note Each shard can be allocated to only one consumer, so the number of shards in a Logstore limits how many data inputs can consume the Logstore concurrently.
    • Number of CPU cores and memory resources occupied by Splunk heavy forwarders: In most cases, one Splunk data input consumes 1 GB to 2 GB of memory resources and 1 CPU core.

    If sufficient memory and CPU resources are allocated, one Splunk data input can consume log data at a speed of 1 MB to 2 MB per second.

    For example, if logs are received in a Logstore at a speed of 10 MB per second, you must create at least 10 shards in the Logstore and configure 10 data inputs in the Splunk add-on. If you deploy the Splunk add-on on a single server, the server must have 10 idle CPU cores and 12 GB of available memory resources.

  • High availability

    A consumer group stores checkpoints on the server. When a consumer stops consuming data, another consumer continues to consume data from the last checkpoint. Therefore, you can create Splunk data inputs on multiple servers: if a server stops running or is damaged, a Splunk data input on another server continues to consume data from the last checkpoint. You can also create more Splunk data inputs than there are shards, spread across multiple servers. Because each shard is allocated to only one consumer, the extra data inputs remain on standby and take over shards from the last checkpoint if an exception occurs.

  • HTTPS-based data transmission
    • Log Service

      To use HTTPS to encrypt the data transmitted between the add-on and Log Service, make sure that the endpoint starts with https://, for example, https://cn-beijing.log.aliyuncs.com.

      The server certificate *.aliyuncs.com is issued by GlobalSign. By default, most Linux and Windows servers are preconfigured to trust this certificate. If the server does not trust this certificate, see Install a trusted root CA or self-signed certificate.

    • Splunk

      To use HTTPS-based HEC, you must enable the SSL feature when you enable HEC in the Global Settings dialog box. For more information, see Configure HTTP Event Collector on Splunk Enterprise.

  • AccessKey pair protection

    The AccessKey pair that you use to access Log Service and HEC tokens are encrypted and stored in Splunk.
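The sizing guidance in the Performance section can be turned into a rough calculation. This Python sketch assumes the conservative end of the stated ranges (1 MB per second per data input), 1 CPU core and 1 GB to 2 GB of memory per data input, and one data input per shard. Actual throughput also depends on the endpoint, bandwidth, and indexer capacity listed above, so treat the result as a starting point; the names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Sizing:
    shards: int
    data_inputs: int
    cpu_cores: int
    memory_gb_min: int
    memory_gb_max: int


def estimate(ingest_mb_per_s: float, per_input_mb_per_s: float = 1.0) -> Sizing:
    """Rough sizing from the figures in this topic.

    Assumes one data input per shard, 1 CPU core and 1 GB to 2 GB of memory per
    data input, and a per-input rate at the low end (1 MB/s) unless overridden.
    """
    if ingest_mb_per_s <= 0 or per_input_mb_per_s <= 0:
        raise ValueError("rates must be positive")
    n = math.ceil(ingest_mb_per_s / per_input_mb_per_s)
    return Sizing(shards=n, data_inputs=n, cpu_cores=n,
                  memory_gb_min=n * 1, memory_gb_max=n * 2)


def active_consumers(data_inputs: int, shards: int) -> int:
    """Each shard has one consumer, so data inputs beyond the shard count stay on standby."""
    return min(data_inputs, shards)


if __name__ == "__main__":
    print(estimate(10))
    # Sizing(shards=10, data_inputs=10, cpu_cores=10, memory_gb_min=10, memory_gb_max=20)
```

For the 10 MB per second example, this yields 10 shards, 10 data inputs, and 10 CPU cores, with 10 GB to 20 GB of memory; the 12 GB cited in that example falls inside this range.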

FAQ

  • What can I do if a configuration error occurs?
    • Check the configurations of the data inputs. For information about configuration parameters, see Table 1.
    • Check the configurations of Log Service. Example error: failed to create a consumer group.
      • Command: index="_internal" | search "error"
      • Exception logs:
        aliyun.log.consumer.exceptions.ClientWorkerException: 
        error occour when create consumer group, 
        errorCode: LogStoreNotExist, 
        errorMessage: logstore xxxx does not exist
      • Check whether the number of consumer groups configured for a Logstore exceeds the quota.

        You can configure a maximum of 20 consumer groups for a Logstore. We recommend that you delete the consumer groups that are no longer required. If you attempt to create a consumer group for a Logstore that already has 20 consumer groups, the ConsumerGroupQuotaExceed error is returned.

  • What can I do if a permission error occurs?
    • Check whether you are authorized to access Log Service.
      • Command: index="_internal" | search "error"
      • Exception logs:
        aliyun.log.consumer.exceptions.ClientWorkerException: 
        error occour when create consumer group, 
        errorCode: SignatureNotMatch, 
        errorMessage: signature J70VwxYH0+W/AciA4BdkuWxK6W8= not match
    • Check whether you are authorized to access HEC.
      • Command: index="_internal" | search "error"
      • Exception logs:
        ERROR HttpInputDataHandler - Failed processing http input, token name=n/a, channel=n/a, source_IP=127.0.0.1, reply=4, events_processed=0, http_input_body_size=369
        
        WARNING pid=48412 tid=ThreadPoolExecutor-0_1 file=base_modinput.py:log_warning:302 | 
        SLS info: Failed to write [{"event": "{\"__topic__\": \"topic_test0\", \"__source__\": \"127.0.0.1\", \"__tag__:__client_ip__\": \"10.10.10.10\", \"__tag__:__receive_time__\": \"1584945639\", \"content\": \"goroutine id [0, 1584945637]\", \"content2\": \"num[9], time[2020-03-23 14:40:37|1584945637]\"}", "index": "main", "source": "sls log", "sourcetype": "http of hec", "time": "1584945637"}] remote Splunk server (http://127.0.0.1:8088/services/collector) using hec. 
        Exception: 403 Client Error: Forbidden for url: http://127.0.0.1:8088/services/collector, times: 3
      • Potential cause
        • HEC is not configured or started.
        • The HEC-related parameters of the data input are invalid. For example, the HEC token is incorrect, or SSL is not enabled for HEC when you use HTTPS-based HEC.
        • The indexer acknowledgment feature is enabled for the Event Collector token.
  • What can I do if a consumption delay occurs?

    You can view the status of a consumer group in the Log Service console. For more information, see View consumer group status.

    Increase the number of shards in the Logstore or create more data inputs in the same consumer group. For more information, see Performance and security.

  • What can I do if network jitter occurs?
    • Command: index="_internal" | search "SLS info: Failed to write"
    • Exception logs:
      WARNING pid=58837 tid=ThreadPoolExecutor-0_0 file=base_modinput.py:log_warning:302 |
      SLS info: Failed to write [{"event": "{\"__topic__\": \"topic_test0\", \"__source__\": \"127.0.0.1\", \"__tag__:__client_ip__\": \"10.10.10.10\", \"__tag__:__receive_time__\": \"1584951417\", \"content2\": \"num[999], time[2020-03-23 16:16:57|1584951417]\", \"content\": \"goroutine id [0, 1584951315]\"}", "index": "main", "source": "sls log", "sourcetype": "http of hec", "time": "1584951417"}] remote Splunk server (http://127.0.0.1:8088/services/collector) using hec. 
      Exception: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer')), times: 3

    Splunk events are automatically resent when network jitter occurs. If the issue persists, contact your network administrator.

  • How do I modify the start time of consumption?
    Note The SLS cursor start time parameter takes effect only when a consumer group is created for the first time. Afterward, data is consumed from the last checkpoint.
    1. On the Inputs page of the Splunk web interface, disable the data input.
    2. Log on to the Log Service console. Find the Logstore from which data is consumed, and delete the related consumer group in the Data Consumption section.
    3. On the Inputs page of the Splunk web interface, find the data input and choose Actions > Edit. In the dialog box that appears, modify the SLS cursor start time parameter. Then, enable the data input again.
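When you triage the add-on's internal logs, it can help to map the errors shown in the FAQ to the corresponding checks. The following Python sketch scans a log excerpt for the errorCode values that appear in this topic (LogStoreNotExist, SignatureNotMatch, and ConsumerGroupQuotaExceed), a 403 response from HEC, and a reset connection, and returns a hint. The helper and its hint table are hypothetical, built only from this FAQ rather than from an exhaustive list of Log Service error codes, and the sketch assumes that a ConsumerGroupQuotaExceed error is logged in the same `errorCode:` format as the other samples.

```python
import re
from typing import Optional

# Hints summarized from the FAQ in this topic; not an exhaustive error list.
ERROR_CODE_HINTS = {
    "LogStoreNotExist": "Check the SLS endpoint, SLS project, and SLS logstore parameters of the data input.",
    "SignatureNotMatch": "Check the AccessKey ID and AccessKey secret configured in the account.",
    "ConsumerGroupQuotaExceed": "Delete consumer groups that are no longer required (at most 20 per Logstore).",
}
ERROR_CODE_PATTERN = re.compile(r"errorCode:\s*(\w+)")


def diagnose(log_text: str) -> Optional[str]:
    """Return a troubleshooting hint for an add-on log excerpt, or None."""
    match = ERROR_CODE_PATTERN.search(log_text)
    if match and match.group(1) in ERROR_CODE_HINTS:
        return ERROR_CODE_HINTS[match.group(1)]
    if "403 Client Error: Forbidden" in log_text:
        return ("Check that HEC is enabled, that the HEC parameters and token of the data input "
                "are correct, and that indexer acknowledgment is disabled for the token.")
    if "Connection reset by peer" in log_text:
        return "Network jitter: events are resent automatically; contact your network administrator if it persists."
    return None


if __name__ == "__main__":
    sample = ("aliyun.log.consumer.exceptions.ClientWorkerException: "
              "error occour when create consumer group, errorCode: LogStoreNotExist, "
              "errorMessage: logstore xxxx does not exist")
    print(diagnose(sample))
```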