
Elasticsearch: Using Indexing Service to manage data streams

Last Updated: Jun 18, 2025

By using Alibaba Cloud Elasticsearch 7.10 Enhanced Edition with Indexing Service, you can offload index writes to the cloud and pay for them as you go based on traffic, without reserving resources for peak write throughput. This lets you analyze massive time series logs at very low cost. This topic describes how to manage data streams and analyze logs in log scenarios based on Indexing Service.

Background information

In complex business scenarios, large fleets of servers, physical machines, Docker containers, mobile devices, and IoT sensors produce scattered, diverse, and large-scale metrics and logs. On top of these system-level metrics and logs, there is often large-scale business data, such as user behavior and vehicle trajectories. If you encounter write performance bottlenecks with massive time series and log data, you can use the Indexing Service of Alibaba Cloud Elasticsearch 7.10 Enhanced Edition. The feature is built on a read/write splitting architecture and a serverless model that bills writes on a pay-as-you-go basis, hosting writes for your Elasticsearch cluster in the cloud at low cost.

In Alibaba Cloud Elasticsearch 7.10 Enhanced Edition with Indexing Service, we recommend that you manage data with data streams. A data stream stores append-only time series data across multiple backing indices while exposing a single named resource for requests. Based on the associated index templates and rollover policies, hosting can also be canceled automatically, which cleans up cloud-hosted data and optimizes costs. Data streams are particularly suitable for logs, events, metrics, and other continuously generated data. In addition, you can use index lifecycle management (ILM) to periodically manage backing indices and reduce costs and overhead.

An Elasticsearch cluster can contain both data stream (Data Stream) and standalone index (Index) objects. Except for system indices, which are not hosted, all indices have cloud hosting enabled by default. Standalone indices support create, delete, update, and query operations, but you must manually cancel cloud hosting before you perform these operations. To help you manage cloud-hosted indices with data streams, the Alibaba Cloud Elasticsearch console provides the data stream management, index management, and index template modules for one-stop data stream management in a web interface.

Scenarios

This topic demonstrates how to write collected nginx service log data to an Alibaba Cloud Elasticsearch 7.10 Enhanced Edition instance with Indexing Service, and implement log data analysis and retrieval through data stream management and index lifecycle management.

Notes

  • Because data stream writes depend on the @timestamp time field, make sure that the data you write contains an @timestamp field; otherwise, writes to the data stream fail. If the source data has no @timestamp field, you can use an ingest pipeline that reads the _ingest.timestamp metadata value to add the @timestamp field.

  • Indexing Service provides a Serverless protection mechanism. Before using it, see Limits to optimize configurations in advance and avoid non-compliant situations during use.

  • When an Indexing Service Enhanced Edition instance synchronizes data with the user cluster, it relies on the apack/cube/metadata/sync task (you can view it with the GET _cat/tasks?v command). Do not manually clear this task. If it is cleared, run the POST /_cube/meta/sync command to restore it as soon as possible; otherwise, business writes are affected.
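The first note above mentions using an ingest pipeline to supply @timestamp. The following sketch shows such a pipeline body as a Python dict; the pipeline name add-timestamp is an assumption, and the body uses the standard Elasticsearch set processor with the _ingest.timestamp metadata value:

```python
# Illustrative ingest pipeline body (standard Elasticsearch "set" processor).
# The pipeline name "add-timestamp" is an assumption; register the body with
# PUT _ingest/pipeline/add-timestamp and reference the pipeline at write time.
add_timestamp_pipeline = {
    "description": "Copy the ingest timestamp into @timestamp when the source has none",
    "processors": [
        {
            "set": {
                "field": "@timestamp",
                "value": "{{_ingest.timestamp}}",
                # Keep an existing @timestamp if the document already has one.
                "override": False,
            }
        }
    ],
}
```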

Procedure

  1. Step 1: Create an Indexing Service instance

    Create an Alibaba Cloud Elasticsearch 7.10 Enhanced Edition instance with Indexing Service.

  2. Step 2: Create an index template

    Before using data streams, you need to create an index template to configure the structure of the data stream backing indices.

  3. Step 3: Create a data stream

    Create a data stream and write data to it.

  4. Step 4: Manage hosted indices

    Manage cloud hosting for data streams or independent indices.

  5. Step 5: View cluster information

    On the node visualization page, view the total write traffic and the total number of write-hosted indices for the current day.

  6. Step 6: Analyze logs

    In the Kibana console, view real-time log streams and real-time data metrics based on data stream management implemented with Indexing Service.

Step 1: Create an Indexing Service instance

Purchase a 7.10 Enhanced Edition instance and enable the advanced feature Indexing Service. For the procedure, see Create an Alibaba Cloud Elasticsearch instance.

Note

After you enable Indexing Service, the write serverless module is billed on a pay-as-you-go basis based on the actual write traffic and hosted storage space. For more information, see Alibaba Cloud ES billing.

Step 2: Create an index template

Note

If your business performs frequent put mapping operations, we recommend that you define index templates before you write data. This reduces the impact of put mapping operations on cluster stability and avoids consuming large amounts of computing resources in the hosting service.

  1. Log on to the Alibaba Cloud Elasticsearch console.

  2. Navigate to the desired cluster.

    1. In the top navigation bar, select the resource group to which the cluster belongs and the region where the cluster resides.

    2. On the Elasticsearch Clusters page, find the cluster and click its ID.

  3. In the left-side navigation pane, choose Configuration And Management > Index Management Hub.

  4. Click the Index Template Management tab.

  5. Click Create Index Template.

  6. Optional: In the Create Index Template panel, configure the index lifecycle policy.

    Note

    If you do not need to manage the lifecycle policy for data stream backing indices, click Skip This Step.

    The following table describes some parameters. For parameters not mentioned, see the specific instructions on the page.

    Index Lifecycle Policy (example value: Create A New Index Lifecycle Policy)

    • Create A New Index Lifecycle Policy: create a new index lifecycle policy.

      Note: In the Indexing Service architecture, custom freeze is not supported in the index lifecycle.

    • Select An Existing Index Lifecycle Policy: if the cluster already has a policy that fits your business, select it from the drop-down list.

    Policy Name (example value: nginx_policy)

    If you create a new index lifecycle policy, enter a custom name. If you select an existing policy, choose it from the drop-down list.

    Duration After Which Hosting Is Canceled (example value: 3 days)

    Hosting is canceled after 3 days by default. Evaluate the time to cancel hosting based on your business scenario.

    Delete Time (example value: 7 days)

    The number of days an index is retained before it is automatically deleted.

    Sample commands for this step:

    {
      "policy": {
        "phases": {
          "hot": {
            "min_age": "0s",
            "actions": {
              "cube_unfollow": {
                "max_age": "3d",
                "force_merge": true,
                "force": false,
                "read_only": true
              },
              "rollover": {
                "max_size": "30gb",
                "max_age": "1d",
                "max_docs": 10000
              },
              "set_priority": {
                "priority": 1000
              }
            }
          },
          "delete": {
            "min_age": "7d",
            "actions": {
              "delete": {
                "delete_searchable_snapshot": true
              }
            }
          }
        }
      }
    }

    The index lifecycle policy above triggers a rollover to generate a new backing index when the hosted index meets any of the following conditions, and the original index is automatically deleted 7 days later:

    • The number of documents written reaches 10,000 (the max_docs value in the sample policy).

    • The index size reaches 30 GB.

    • The index has existed for 1 day since creation.
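As a rough illustration of how these rollover conditions combine (meeting any one of them is sufficient), the following sketch mirrors the max_docs, max_size, and max_age values from the sample policy; the function and parameter names are invented for illustration:

```python
def should_rollover(doc_count, size_gb, age_days,
                    max_docs=10_000, max_size_gb=30, max_age_days=1):
    """Return True when any rollover condition from the sample policy is met."""
    return (doc_count >= max_docs
            or size_gb >= max_size_gb
            or age_days >= max_age_days)
```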

  7. Click Save And Next to configure the index template information.

    Template Name (example value: nginx_telplate)

    The name of the template.

    Index Pattern (example value: nginx-*)

    The index pattern. Wildcard (*) expressions match data stream and index names. Spaces and the characters \/?"<>| are not allowed.

    Create Data Stream (example value: Enabled)

    Enables data stream mode. If this is not enabled, the index pattern cannot generate a data stream. For more information, see Data stream.

    Priority (example value: 100)

    The template priority. A larger value indicates a higher priority.

    Index Lifecycle Policy (example value: nginx_policy)

    Only one index lifecycle policy can be referenced.

    Content Template Configuration (example value: the following Settings configuration)

    {
       "index.number_of_replicas": "1",
       "index.number_of_shards": "6",
       "index.refresh_interval": "5s"
    }

    Configures the index settings, mappings, and aliases.

    Important
    • Each document written to a data stream requires an @timestamp field. We recommend that you specify a mapping for the @timestamp field in the index template. If you do not, Elasticsearch maps the field as a date or date_nanos field.

    • The configuration format strictly follows the Elastic official configuration.

    Sample command values for this step:

    PUT /_index_template/nginx_telplate
    {
      "index_patterns": [ "nginx-*" ],
      "data_stream": { },
      "template": {
        "settings": {
          "index.number_of_replicas": "1",
          "index.number_of_shards": "6",
          "index.refresh_interval": "5s",
          "index.lifecycle.name": "nginx_policy",
          "index.apack.cube.following_index": true
        }
      },
      "priority": 100
    }
    Important
    • When creating a template using commands, make sure to set index.apack.cube.following_index to true.

    • The index.refresh_interval parameter of the cloud-hosted cluster already uses optimal defaults, and manual configuration does not take effect. If you need a manual index.refresh_interval setting to take effect, cancel cloud hosting first.

  8. Click OK. The template you created will be displayed in the index template list.

Step 3: Create a data stream

  1. On the Index Management Hub page, click the Data Stream Management tab.

  2. Click Create Data Stream.

  3. In the Create Data Stream panel, click Preview Existing Index Templates, and enter a data stream name that matches the corresponding index template.

    Sample command values for this step:

    PUT /_data_stream/nginx-log
    Important
    • Before creating a data stream, there must be an index template that the data stream can match, which includes mappings and settings for configuring the backing indices of the data stream.

    • Data stream names can end with a hyphen (-) but cannot contain wildcards (*).
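A minimal client-side check for the naming rules above (the wildcard restriction from the note, plus the characters the index pattern disallows) can be sketched as follows. This is a simplified illustration, not the full Elasticsearch name validation, and the function name is invented:

```python
def is_acceptable_stream_name(name):
    """Simplified check: rejects the wildcard '*', spaces, and \\ / ? " < > | ."""
    forbidden = set('*\\/?"<>| ')
    return bool(name) and not any(ch in forbidden for ch in name)
```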

  4. Click OK. The system will automatically generate the data stream and backing index.

    After each data stream is successfully created, a backing index with a unified format is automatically generated, as follows:

    .ds-<data-stream>-<yyyy.MM.dd>-<generation>

    .ds: the hidden index name prefix. Names of backing indices generated by data streams start with .ds by default.

    <data-stream>: the data stream name.

    <yyyy.MM.dd>: the date when the backing index was created.

    <generation>: a six-digit, zero-padded integer that starts from 000001 and increments for each data stream. Backing indices with larger generation values contain newer data.
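The naming scheme above can be sketched as a small formatting function (illustrative only; Elasticsearch generates these names itself):

```python
from datetime import date

def backing_index_name(stream, created, generation):
    """Format a backing index name: .ds-<data-stream>-<yyyy.MM.dd>-<generation>."""
    return f".ds-{stream}-{created.strftime('%Y.%m.%d')}-{generation:06d}"

# e.g. backing_index_name("nginx-log", date(2021, 4, 26), 4)
#   -> ".ds-nginx-log-2021.04.26-000004"
```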

  5. Write data.

    Each document that you write must include the @timestamp field; otherwise, the write fails. In this scenario, a Filebeat + Kafka + Logstash architecture collects logs and writes them to the Elasticsearch instance, and the @timestamp field is generated automatically during collection. Sample command:

    POST /nginx-log/_doc/
    {
      "@timestamp": "2099-03-07T11:04:05.000Z",
      "user": {
        "id": "vlb44hny"
      },
      "message": "Login attempt failed"
    }
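Once documents are flowing in, they can be retrieved through the data stream name with a standard range query on @timestamp. The following request body is an illustrative sketch (the 15-minute window and result size are arbitrary examples); it would be sent as GET /nginx-log/_search:

```python
# Illustrative query body for GET /nginx-log/_search: fetch recent log
# documents, newest first. Window size and result size are example values.
recent_logs_query = {
    "query": {"range": {"@timestamp": {"gte": "now-15m"}}},
    "sort": [{"@timestamp": "desc"}],
    "size": 50,
}
```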

Step 4: Manage hosted indices

  1. On the Index Management Hub page, click the Index Management tab to view indices in the cloud hosting state.

    View Only Hosted Indices

    By default, the system displays all indices in the cluster (excluding system indices). After you select View Only Hosted Indices, only indices that are being hosted are displayed, so you can quickly locate hosted data.

    Total Size Of Cloud-hosted Indices

    The total size, at the current moment, of indices whose writes are hosted in the cloud.

    Important: This is a real-time value, not the historical total size of indices.

    Number Of Indices

    The total number, at the current moment, of indices whose writes are hosted in the cloud. This value is the real-time value in the current system.

    Important: This is a real-time value, not the historical total number of indices.

    Write Hosting Status

    • Enabled: cloud write hosting is enabled for the index. This is the default.

    • Disabled: cloud write hosting is canceled for the index. You can disable hosting manually, but you cannot re-enable it afterwards.

    Note:
    • When you manually disable cloud write hosting for an index, data is written directly to the user cluster. Before you disable hosting, confirm whether data is still being written to the index and check the load of the user cluster; otherwise, the user cluster may become overloaded.

    • Indexing Service charges based on the total size of hosted indices and the amount of data written. We recommend that you use data streams and ILM rollover policies to optimize cloud hosting space.

    • In the Indexing Service scenario, a hosted index is not compatible with the shrink action in ILM. We recommend that you configure shrink when the index is no longer hosted. For more information, see ILM-shrink.

    • While a standalone index is write-hosted, its data is fully stored in the Indexing Service, which increases cloud hosting costs. Evaluate whether to manually disable write hosting based on your business scenario (for example, whether data is still being written to the index).

    Note

    Because the nginx-log data stream is configured with an index rollover policy, only the most recently generated backing index (in this scenario, .ds-nginx-log-2021.04.26-000004) is kept in the cloud hosting service, and older backing indices automatically have cloud hosting disabled.

  2. Cancel index hosting.

    Standalone indices and indices without rollover policies remain in the cloud hosting service indefinitely, so you must disable hosting manually. After you disable hosting, the Write Hosting Status of the index changes to Disabled.

    Important
    • After canceling cloud hosting, the cloud Indexing Service write hosting feature cannot be re-enabled.

    • An Elasticsearch cluster can contain both data streams (Data Stream) and independent index (Index) objects. Except for system indices that are not hosted, all other indices have hosting enabled by default.

    1. In the Index Management tab, click the Write Hosting Status toggle switch that shows Enabled in the column to the right of the corresponding index.

    2. In the Cancel Hosting dialog box, click OK.

      Sample command for this step:

      POST /.ds-nginx-log-xxx/_cube/unfollow

Step 5: View cluster information

  1. Go to the node visualization page to view real-time write traffic and data volume information for Indexing Service.

  2. In the Indexing Service section, click Total Write Traffic For The Day to view the curve chart of Average Hourly Write Throughput.

    Note

    The total write traffic monitoring of Indexing Service is a static trend chart that is updated at hourly intervals rather than in real time, with a maximum display delay of 1 hour. For example, the total traffic written between 14:00 and 14:59 appears at the 14:00 point on the monitoring page only after 15:10.
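The delay described in the note is simple arithmetic: the point for hour H becomes available only after (H+1):10. A sketch (the function name is invented for illustration):

```python
from datetime import datetime, timedelta

def point_available_after(hour_start):
    """Traffic written in [H:00, H:59] appears at the H point only after (H+1):10."""
    return hour_start + timedelta(hours=1, minutes=10)
```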

  3. Click View Monitoring Details to jump to Grafana monitoring for more detailed monitoring data.

  4. On the Indexing Service page, click Total Hosted Data Volume to view the Total Hosted Data Volume For The Day.

    Note

    The total hosted data volume monitoring of Indexing Service is a static trend chart that is updated at hourly intervals rather than in real time, with a maximum display delay of 1 hour. For example, the total data volume written between 14:00 and 14:59 appears at the 14:00 point on the monitoring page only after 15:10.

Step 6: Analyze logs

  1. Log on to the Kibana console of your Elasticsearch cluster and go to the homepage of the Kibana console as prompted.

    For more information about how to log on to the Kibana console, see Log on to the Kibana console.

    Note

    In this example, an Elasticsearch V7.10.0 cluster is used. Operations on clusters of other versions may differ. The actual operations in the console prevail.

  2. Create an index pattern.

    1. Click Go to Kibana in the upper-left corner.

    2. In the left-side navigation pane, choose Management > Stack Management.

    3. On the Stack Management page, in the Kibana section, click Index Patterns.

    4. Click Create index pattern.

    5. On the Create index pattern page, enter the index pattern name in the Index pattern name text box.


      Note

      The index pattern name can be a data stream name or a backing index name.

  3. Configure Settings.

    1. Click Go to Kibana in the upper-left corner.

    2. In the left-side navigation pane, choose Observability > Logs.

    3. On the Logs page, click the Settings tab.

    4. In the Log indices text box, enter the data stream name.

      This topic uses the data stream name nginx-log as an example. The default values of the other fields meet the data stream requirements and do not need to be modified.

    5. In the lower-right corner, click Apply.

  4. Get real-time log stream data.

    1. On the Logs page, click the Stream tab.

    2. On the right side of the page, click Stream live.

    3. On the Stream tab, view the real-time data stream obtained.


  5. Get real-time data metrics.

    1. Click Go to Kibana in the upper-left corner.

    2. In the left-side navigation pane, choose Kibana > Discover.

    3. On the Discover page, select the corresponding index to obtain real-time data metrics for that index.


For more Kibana log analysis features, see Kibana Guide.

FAQ

Q: Will configuring write parameters such as refresh and merge for write-hosted indices in the Indexing Service instance take effect?

A: No. Write-hosted indices in an Indexing Service instance already use default write parameter configurations, and user-side settings do not take effect. The default write parameter configurations are as follows:

"index.merge.policy.max_merged_segment" : "1024mb",
"index.refresh_interval" : "3s",
"index.translog.durability" : "async",
"index.translog.flush_threshold_size" : "2gb",
"index.translog.sync_interval" : "100s"