Elasticsearch: Use Monstache to synchronize data from MongoDB to Elasticsearch in real time

Last Updated: Mar 26, 2026

Monstache synchronizes data from ApsaraDB for MongoDB to Alibaba Cloud Elasticsearch in real time by tailing MongoDB oplogs. This tutorial walks you through a complete setup, using a movie dataset to demonstrate full-sync, incremental sync, and Kibana-based data analysis.

Besides tailing oplogs, Monstache supports MongoDB change streams and aggregation pipelines, and it can synchronize data from MongoDB to recent versions of Elasticsearch. For more information about Monstache features, see Features.

Prerequisites

Before you begin, ensure that you have:

  • An Alibaba Cloud account with permissions to create ECS instances, ApsaraDB for MongoDB instances, and Elasticsearch clusters

  • Basic familiarity with Linux command-line operations

How it works

Monstache uses the MongoDB oplog as an event source. Every insert, update, and delete in MongoDB is recorded in the oplog; Monstache tails the oplog and propagates changes to Elasticsearch in near real time. Because the oplog is a replica set feature, your MongoDB instance must be a replica set or sharded cluster instance — standalone instances are not supported.
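
The event flow described above can be pictured with a toy replay loop. The sketch below is not Monstache code. It assumes bash 4 or later, uses an associative array named index as a stand-in for an Elasticsearch index, and treats an update as a whole-document replacement. The op codes i, u, and d are the ones MongoDB records in the oplog; the event list itself is made up for illustration.

```shell
#!/usr/bin/env bash
# Toy model of oplog-driven replication (illustration only, not Monstache code).
declare -A index   # hypothetical stand-in for an Elasticsearch index: _id -> document

apply_event() {
  # $1 = oplog op code (i = insert, u = update, d = delete), $2 = _id, $3 = document
  local op=$1 id=$2 doc=${3:-}
  case "$op" in
    i|u) index[$id]=$doc ;;      # inserts and updates write the document
    d)   unset "index[$id]" ;;   # deletes remove it
  esac
}

# Replay events in the order they were written to the log.
while IFS='|' read -r op id doc; do
  apply_event "$op" "$id" "$doc"
done <<'EOF'
i|11003|Beauty
i|11004|Heroic Programmers
u|11003|Beautiful Programmers
d|11004|
EOF

echo "${#index[@]} ${index[11003]}"   # prints: 1 Beautiful Programmers
```

Because events are applied in log order, the final state of the target matches the final state of the source, which is why a consumer that stops can resume from the last position it processed.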

Step 1: Create the required resources

Create the following resources in the same virtual private cloud (VPC). Placing all three in the same VPC ensures data is transmitted over the internal network securely and at high speed.

  1. Create an Elasticsearch cluster. During creation, enable the Auto Indexing feature. An Elasticsearch V6.7 Standard Edition cluster is used in this tutorial. For details, see Create an Alibaba Cloud Elasticsearch cluster and Configure the YML file.

  2. Create an ApsaraDB for MongoDB replica set instance. An ApsaraDB for MongoDB V4.2 replica set instance is used in this tutorial. Prepare your test data after creation — the following figure shows part of the movie dataset used as an example. For details, see Quick start for replica set instances.

    Important

    The ApsaraDB for MongoDB instance must be a replica set instance or sharded cluster instance. Monstache uses the oplog as its event source, which is only available on these instance types.

    Test data

  3. Create an Elastic Compute Service (ECS) instance. The ECS instance hosts Monstache and must run Linux. For details, see Create an instance by using the wizard.

Note

You must ensure that the version of Monstache you install is compatible with your ApsaraDB for MongoDB instance and Elasticsearch cluster versions. For version compatibility information, see Monstache version.

Step 2: Install Monstache

Install Monstache on the ECS instance by building from source. Before you install Monstache, make sure that you have configured Go environment variables.

  1. Log on to the ECS instance. For details, see Connect to a Linux instance by using a password or key.

    Note

    A regular (non-root) user is used in this example.

  2. Download and extract Go.

    wget https://dl.google.com/go/go1.14.4.linux-amd64.tar.gz
    tar -xzf go1.14.4.linux-amd64.tar.gz
  3. Configure Go environment variables. Open ~/.bash_profile:

    vim ~/.bash_profile

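    Add the following lines. GOROOT must point to the directory where Go was extracted; /home/test1/go assumes that you ran the previous step in the home directory of a user named test1. GOPROXY points to the Alibaba Cloud Go module proxy, which improves download speed.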

    export GOROOT=/home/test1/go
    export GOPATH=/home/go/
    export PATH=$PATH:$GOROOT/bin:$GOPATH/bin
    export GOPROXY=https://mirrors.aliyun.com/goproxy/

    Apply the changes:

    source ~/.bash_profile
  4. Clone the Monstache repository.

    Note

    If the error git: command not found appears, install git first: sudo yum install -y git.

    git clone https://github.com/rwynn/monstache.git
  5. Switch to the rel5 branch and install.

    cd monstache
    git checkout rel5
    sudo go install
  6. Verify the installation.

    monstache -v

    Expected output:

    5.5.5
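
If go install or monstache -v reports that a command cannot be found, one likely cause is that the environment variables from step 3 do not match where Go was actually extracted. The following is a minimal sketch of a check, assuming only the GOROOT and PATH variables set earlier; the function name check_go_env is hypothetical.

```shell
#!/usr/bin/env bash
# Sanity check for the Go environment variables (check_go_env is a hypothetical helper).
check_go_env() {
  # GOROOT must contain the go binary that was extracted from the tarball.
  if [ ! -x "${GOROOT:-}/bin/go" ]; then
    echo "go binary not found under GOROOT='${GOROOT:-}'" >&2
    return 1
  fi
  # PATH must include $GOROOT/bin so that the shell can resolve `go`.
  case ":$PATH:" in
    *":$GOROOT/bin:"*) ;;
    *) echo "$GOROOT/bin is not on PATH" >&2; return 1 ;;
  esac
  echo "ok"
}

# Usage, after `source ~/.bash_profile`:
#   check_go_env && go version
```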

Step 3: Configure and start data synchronization

Monstache uses TOML for configuration. In this tutorial, data is synchronized from the hotmovies and col collections in the mydb database.

  1. In the monstache directory, create a configuration file.

    vim config.toml
  2. Add the following configuration. Replace the placeholder values with your actual endpoints and credentials.

    # connection settings
    mongo-url = "mongodb://<your_mongodb_user>:<your_mongodb_password>@dds-bp1aadcc629******.mongodb.rds.aliyuncs.com:3717"
    elasticsearch-urls = ["http://es-cn-mp91kzb8m00******.elasticsearch.aliyuncs.com:9200"]
    
    # collections to sync (full-sync on startup, then tail oplogs)
    direct-read-namespaces = ["mydb.hotmovies","mydb.col"]
    
    # to use MongoDB change streams instead of oplog tailing (requires MongoDB 3.6+):
    #change-stream-namespaces = ["mydb.col"]
    
    # filter to specific collections (oplog listener only, does not trigger a full-sync):
    #namespace-regex = '^mydb\.col$'
    
    # Elasticsearch credentials
    # For production use, create a dedicated account instead of using the default elastic account.
    # Assign only the permissions the account needs. See Use the RBAC mechanism provided by
    # Elasticsearch X-Pack to implement access control.
    elasticsearch-user = "elastic"
    elasticsearch-password = "<your_es_password>"
    
    # number of concurrent goroutines pushing documents to Elasticsearch
    elasticsearch-max-conns = 4
    
    # propagate collection and database deletions to Elasticsearch
    dropped-collections = true
    dropped-databases = true
    
    # save sync progress to monstache.monstache so sync can resume after a restart
    resume = true
    resume-strategy = 0
    
    # enable debug logging (logs all requests to Elasticsearch)
    verbose = true
    
    # high availability mode: processes sharing the same cluster-name cooperate
    cluster-name = 'es-cn-mp91kzb8m00******'
    
    # index mappings: override the default database.collection index name
    [[mapping]]
    namespace = "mydb.hotmovies"
    index = "hotmovies"
    type = "movies"
    
    [[mapping]]
    namespace = "mydb.col"
    index = "mydbcol"
    type = "collection"

    Key parameters:

    | Parameter | Description |
    |---|---|
    | mongo-url | Connection string for the primary node of your ApsaraDB for MongoDB instance. Get it from the instance details page in the ApsaraDB for MongoDB console. Before connecting, add the ECS instance's private IP address to the MongoDB instance whitelist. See Configure a whitelist for a sharded cluster instance. |
    | elasticsearch-urls | Internal endpoint of your Elasticsearch cluster in the format http://<endpoint>:9200. Get it from the Basic Information page of your cluster. See View the basic information of a cluster. |
    | direct-read-namespaces | Collections to copy from MongoDB on startup (full-sync), specified as database.collection. See direct-read-namespaces. |
    | change-stream-namespaces | Use MongoDB change streams instead of oplog tailing. When configured, oplog tailing is disabled. Requires MongoDB 3.6+. See change-stream-namespaces. |
    | namespace-regex | Regular expression to filter which collections Monstache listens to. This is a filter on the change event listener only; it does not trigger a full-sync. |
    | elasticsearch-user | Username for Elasticsearch authentication. Default is elastic. |
    | elasticsearch-password | Password for the Elasticsearch user. If forgotten, reset it. See Reset the access password for an Elasticsearch cluster. |
    | elasticsearch-max-conns | Number of concurrent goroutines writing to Elasticsearch. Default is 4. |
    | dropped-collections | When true (default), deletes the mapped Elasticsearch index when a MongoDB collection is dropped. |
    | dropped-databases | When true (default), deletes mapped Elasticsearch indexes when a MongoDB database is dropped. |
    | resume | When true, saves oplog timestamps to monstache.monstache so sync can resume after a restart without data loss. Automatically set to true when cluster-name is configured. See resume. |
    | resume-strategy | Resume strategy (valid only when resume is true). 0 uses timestamps. See resume-strategy. |
    | verbose | When true, enables debug logging including Elasticsearch request traces. Default is false. |
    | cluster-name | Enables high availability mode. Monstache processes sharing the same cluster-name coordinate with each other. See cluster-name. |
    | mapping | Overrides the default index name (which is database.collection). See Index Mapping. |

    Note

    Monstache supports many more configuration parameters. For advanced scenarios such as script-based transformation, GridFS indexing, or complex filtering, see Monstache config and Advanced.

  3. Start Monstache.

    monstache -f config.toml

    The -f flag loads the specified configuration file. Because verbose = true is set in the configuration, Monstache logs all Elasticsearch request traces.
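
Before starting Monstache, it can help to confirm that no placeholder from the sample configuration was left in place. The following is a minimal sketch that assumes placeholders follow the <your_...> form used in this tutorial; the function name unreplaced_placeholders is hypothetical, and the check does not cover the masked endpoint names.

```shell
#!/usr/bin/env bash
# Pre-start check for leftover placeholders (unreplaced_placeholders is a hypothetical helper).
unreplaced_placeholders() {
  # Prints "line_number:line" for each non-comment line of file $1 that still
  # contains a <your_...> placeholder. Exit status is 0 if any were found and
  # 1 if none were found (standard grep semantics).
  # [^#]* cannot cross a '#', so placeholders inside comments are ignored.
  grep -n '^[^#]*<your_[a-z_]*>' "$1"
}

# Usage, from the monstache directory:
#   if unreplaced_placeholders config.toml; then
#     echo "Replace the placeholders listed above before starting Monstache." >&2
#   else
#     monstache -f config.toml
#   fi
```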

Step 4: Verify data synchronization

Use the Data Management (DMS) console for MongoDB queries and the Kibana console for Elasticsearch queries.

Check document counts after full-sync

Run the following queries to confirm the same document count appears in both systems.

MongoDB:

db.hotmovies.find().count()

Expected output:

[
10000
]

Elasticsearch:

GET hotmovies/_count

Expected output:

{
  "count" : 10000,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  }
}
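
The comparison above can also be scripted from the ECS instance. The following is a minimal sketch, assuming the _count response has the shape shown above; the helper names es_count and counts_match are hypothetical, and parsing with sed rather than a JSON tool is a simplification that avoids extra dependencies.

```shell
#!/usr/bin/env bash
# Compare a MongoDB document count with an Elasticsearch _count response
# (es_count and counts_match are hypothetical helpers).
es_count() {
  # Reads a _count response on stdin and prints the value of its "count" field.
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

counts_match() {
  # $1 = document count reported by MongoDB, $2 = raw _count JSON from Elasticsearch
  [ "$(printf '%s' "$2" | es_count)" = "$1" ]
}

# Usage (endpoint and password are placeholders):
#   es_json=$(curl -s -u "elastic:<your_es_password>" "http://<endpoint>:9200/hotmovies/_count")
#   counts_match 10000 "$es_json" && echo "counts match"
```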

Test insert synchronization

Insert two documents in MongoDB:

db.hotmovies.insert({id: 11003,title: "Beauty",overview: "How a group of IT women with high IQ become outstanding",original_language:"cn",release_date:"2020-06-17",popularity:67.654,vote_count:65487,vote_average:9.9})
db.hotmovies.insert({id: 11004,title: "Heroic Programmers",overview: "How a group of IT men with high IQ become outstanding",original_language:"cn",release_date:"2020-06-15",popularity:77.654,vote_count:85487,vote_average:11.9})

Query Elasticsearch to confirm the documents were synced:

GET hotmovies/_search
{
  "query": {
    "bool": {
      "should": [
        {"term":{"id":"11003"}},
        {"term":{"id":"11004"}}
      ]
    }
  }
}
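
Because synchronization is near real time rather than instantaneous, a query issued immediately after a write may run before the change has reached Elasticsearch. When scripting these checks, a small retry loop avoids false failures. The following is a minimal sketch; retry is a hypothetical helper, and doc_exists in the usage comment stands for whatever command you use to test for the document.

```shell
#!/usr/bin/env bash
# Generic retry helper for sync checks (retry is a hypothetical helper).
retry() {
  # retry <attempts> <delay_seconds> <command> [args...]
  # Runs the command until it succeeds or the attempts are used up.
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage (doc_exists is a hypothetical command that succeeds once the document is searchable):
#   retry 10 1 doc_exists 11003 || echo "document 11003 not found after 10 attempts" >&2
```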

Test update synchronization

Update a document in MongoDB:

db.hotmovies.update({'title':'Beauty'},{$set:{'title':'Beautiful Programmers'}})

Query Elasticsearch to confirm the update:

GET hotmovies/_search
{
  "query": {
    "match": {
      "id":"11003"
    }
  }
}

Test delete synchronization

Remove the documents from MongoDB:

db.hotmovies.remove({id: 11003})
db.hotmovies.remove({id: 11004})

Query Elasticsearch to confirm the documents are gone:

GET hotmovies/_search
{
  "query": {
    "bool": {
      "should": [
        {"term":{"id":"11003"}},
        {"term":{"id":"11004"}}
      ]
    }
  }
}

Step 5: Analyze data in Kibana

Note

This tutorial uses Kibana V6.7.0. Navigation may differ in other versions.

  1. Log on to the Kibana console. For details, see Log on to the Kibana console.

  2. Create an index pattern.

    1. In the left navigation pane, click Management.

    2. In the Kibana section, click Index Patterns.

    3. Click Create index pattern.

    4. Set Index pattern and click Next step.

    5. Set Time Filter field name to I don't want to use the Time Filter.

    6. Click Create index pattern.


  3. Create a pie chart for the top 10 popular movies.

    1. In the left navigation pane, click Visualize.

    2. Click + next to the search box.

    3. In the New Visualization dialog box, click Pie.

    4. Click the hotmovies index pattern.

    5. Configure the Metrics and Buckets sections as shown in the pie chart configuration figure.

    6. Click the Run icon to apply the configuration. The pie chart is then displayed.

FAQ

After enabling high availability and increasing concurrency, data loss occurs. What should I do?

Check whether the Elasticsearch cluster is healthy first. If the cluster is in an abnormal state, refer to the Elasticsearch FAQ to diagnose and resolve cluster-level issues, then lower elasticsearch-max-conns and monitor for further data loss.

If the cluster is healthy, the issue is likely in Monstache. Check the Monstache documentation for known issues and configuration guidance.
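
The first check in the answer above, whether the cluster is healthy, can be done from the command line with the Elasticsearch cluster health API (GET _cluster/health), whose response includes a status field of green, yellow, or red. The following is a minimal sketch; the helper names cluster_status and is_healthy are hypothetical, and treating only green as healthy is an assumption you may want to relax.

```shell
#!/usr/bin/env bash
# Read the status from a _cluster/health response (cluster_status and is_healthy are hypothetical helpers).
cluster_status() {
  # Reads a _cluster/health response on stdin and prints its "status" value.
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

is_healthy() {
  # Succeeds only when the status read from stdin is green (assumption: yellow counts as unhealthy).
  [ "$(cluster_status)" = "green" ]
}

# Usage (endpoint and password are placeholders):
#   curl -s -u "elastic:<your_es_password>" "http://<endpoint>:9200/_cluster/health" | is_healthy \
#     || echo "cluster is not green; resolve cluster issues before tuning elasticsearch-max-conns" >&2
```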