This topic describes how to use solr-to-es, a tool provided by a third-party community, to migrate documents from Solr nodes to an Alibaba Cloud Elasticsearch index.

Preparation

Before you migrate the data, prepare the following environment:

  • Prepare an on-premises Solr environment. In this example, Solr V5.0.0 is used. If you want to use other Solr versions, run a compatibility test first.
  • Install Python. The Python version must be V3.0 or higher. In this example, Python V3.6.2 is used.
  • Create an Alibaba Cloud Elasticsearch instance. The Elasticsearch version must be V6.x. In this example, Elasticsearch V6.3.2 is used.
    Notice The solr-to-es tool used in this example only supports Alibaba Cloud Elasticsearch V6.x. If you want to use other versions, run a compatibility test first.
  • Create an ECS instance. In this example, Alibaba Cloud Elastic Compute Service (ECS) CentOS V7.3 is used. If you want to use other operating systems or versions, run a compatibility test first.
  • Install Pysolr, the Python Solr client. The client version must be V3.3.3 or higher, but lower than V4.0.
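
The version requirements above can be checked with a short script before you install the tool. The following is a minimal sketch, not part of solr-to-es: the helper names parse_version, pysolr_version_ok, and python_version_ok are hypothetical, and reading the version from the __version__ attribute of pysolr is an assumption that you should verify against your installed package.

```python
import sys


def parse_version(text):
    """Turn a string such as '3.3.3' into (3, 3, 3).

    Parsing stops at the first component that does not start with a digit,
    and non-numeric suffixes such as 'rc1' are ignored.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def pysolr_version_ok(text):
    """Return True if the version is V3.3.3 or higher, but lower than V4.0."""
    version = parse_version(text)
    return (3, 3, 3) <= version < (4, 0)


def python_version_ok(version_info=sys.version_info):
    """Return True if the interpreter is Python V3.0 or higher."""
    return tuple(version_info[:2]) >= (3, 0)


if __name__ == "__main__":
    print("Python OK:", python_version_ok())
    try:
        import pysolr
    except ImportError:
        print("pysolr is not installed")
    else:
        # Assumption: the installed pysolr exposes __version__ as a string
        # or as a tuple of numbers. Verify this for your version.
        found = getattr(pysolr, "__version__", "")
        if isinstance(found, (tuple, list)):
            found = ".".join(str(part) for part in found)
        print("pysolr OK:", pysolr_version_ok(str(found)))
```

The check only covers Python and Pysolr. The Solr, Elasticsearch, and operating system versions still need to be confirmed separately.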

Install solr-to-es

  1. Click the download link to download solr-to-es.
  2. Go to the directory that contains setup.py, and run the python setup.py install command to install solr-to-es.
  3. After solr-to-es is installed, use a command in the following format to migrate documents.
    python __main__.py <solr_url>:8983/solr/my_core/select http://<username>:<password>@<elasticsearch_url>:9200 elasticsearch_index doc_type
    Table 1. Parameters
    Parameter Description
    <solr_url> The complete endpoint of your Solr cluster. Example: http://116.62.**.**.
    my_core The name of the Solr Core that contains the documents to be migrated. Replace it with the name of your own Solr Core.
    <username> The username of your Alibaba Cloud Elasticsearch instance. The default username is elastic.
    <password> The password of your Alibaba Cloud Elasticsearch instance. Typically, the password is specified when you create the instance.
    <elasticsearch_url> The public or internal network endpoint of your Alibaba Cloud Elasticsearch instance. You can check the endpoint information on the Basic Information page of the instance.
    elasticsearch_index The Elasticsearch index to which the documents are migrated.
    doc_type The document type used in the index.
    Notice If you are using an environment different from the one described in this topic, you can try to run the following command to migrate documents. For more information, see solr-to-es.
    solr-to-es [-h] [--solr-query SOLR_QUERY] [--solr-fields COMMA_SEP_FIELDS]
               [--rows-per-page ROWS_PER_PAGE] [--es-timeout ES_TIMEOUT]
               solr_url elasticsearch_url elasticsearch_index doc_type

    If you use the preceding command in the environment described in this topic, the -bash: solr-to-es.py: command not found error is returned.
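
The command format in step 3 can also be assembled programmatically, which helps when the password contains characters such as @, :, or / that would otherwise break the URL. The following is a minimal sketch under stated assumptions: the function names build_solr_select_url, build_es_url, and build_command are hypothetical, the host names in the usage section are placeholders, and whether the tool decodes percent-encoded credentials correctly should be verified in your environment.

```python
import shlex
from urllib.parse import quote, urlencode


def build_solr_select_url(solr_url, core, params=None, port=8983):
    """Build <solr_url>:8983/solr/<core>/select with optional query parameters.

    params is a list of (name, value) pairs. The asterisk is kept unescaped
    so that q=*:* is rendered as q=*%3A*.
    """
    url = "{}:{}/solr/{}/select".format(solr_url.rstrip("/"), port, core)
    if params:
        url += "?" + urlencode(params, safe="*")
    return url


def build_es_url(host, username, password, port=9200):
    """Build http://<username>:<password>@<host>:9200 with encoded credentials."""
    return "http://{}:{}@{}:{}".format(
        quote(username, safe=""), quote(password, safe=""), host, port
    )


def build_command(solr_select_url, es_url, index, doc_type):
    """Return the argument list for the python __main__.py invocation."""
    return ["python", "__main__.py", solr_select_url, es_url, index, doc_type]


if __name__ == "__main__":
    # Placeholder values for illustration only. Replace them with your own.
    solr = build_solr_select_url("http://solr.example.com", "my_core")
    es = build_es_url("es.example.com", "elastic", "p@ss:w/rd")
    command = build_command(solr, es, "elasticsearch_index", "doc_type")
    # Print a shell-safe version of the command instead of running it.
    print(" ".join(shlex.quote(arg) for arg in command))
```

The argument list returned by build_command can be passed to subprocess.run from the directory that contains __main__.py if you prefer to start the migration from a script.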

Procedure

The following steps query all documents in the my_core Solr Core and write them to an index on the Elasticsearch instance. The name of the index is elasticsearch_index and the type of the index is doc_type.

  1. In the Solr environment, enter the solr-to-es-master/solr_to_es folder.
  2. Run the following command:
    python __main__.py 'http://116.62.**.**:8983/solr/my_core/select?q=*%3A*&wt=json&indent=true' 'http://elastic:<password>@es-cn-so4lwf40ubsrf****.public.elasticsearch.aliyuncs.com:9200' elasticsearch_index doc_type
    Parameter Description
    q The query, written in the standard Solr query syntax. This parameter is required. Operators are supported. The value *%3A* is the URL-encoded form of *:*, which matches all documents.
    wt The format of the returned data. Valid values include json, xml, python, ruby, and csv.
    indent Specifies whether to use indentation to make the returned content more readable. Default value: false.

    For more information about other parameters, see Table 1.

  3. Log on to the Alibaba Cloud Elasticsearch console, and then log on to the Kibana console.
  4. On the Dev Tools page of the Kibana console, run the following command on the Console tab to verify that the elasticsearch_index index is created on the Elasticsearch instance.
    GET _cat/indices?v
  5. Run the following command to query details about the migrated documents.
    GET /elasticsearch_index/doc_type/_search
    If the request is successful, the following result is returned:
    {
      "took" : 12,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : 2,
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "elasticsearch_index",
            "_type" : "doc_type",
            "_id" : "Tz8WNW4BwRjcQciJ****",
            "_score" : 1.0,
            "_source" : {
              "id" : "2",
              "title" : [
                "test"
              ],
              "_version_" : 1648195017403006976
            }
          },
          {
            "_index" : "elasticsearch_index",
            "_type" : "doc_type",
            "_id" : "Tj8WNW4BwRjcQciJ****",
            "_score" : 1.0,
            "_source" : {
              "id" : "1",
              "title" : [
                "change.me"
              ],
              "_version_" : 1648195007391203328
            }
          }
        ]
      }
    }
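
The search result above can also be checked in a script, for example to compare the number of migrated documents with the number you expect from Solr. The following is a minimal sketch that only inspects a response that has already been retrieved. The function names summarize_search_response and migration_looks_complete are hypothetical, SAMPLE_RESPONSE is a trimmed copy of the result shown above, and the expected document count is a value you must supply yourself.

```python
# Trimmed copy of the search result shown in this topic.
SAMPLE_RESPONSE = {
    "took": 12,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
    "hits": {
        "total": 2,
        "max_score": 1.0,
        "hits": [
            {"_source": {"id": "2", "title": ["test"]}},
            {"_source": {"id": "1", "title": ["change.me"]}},
        ],
    },
}


def summarize_search_response(response):
    """Extract the fields that matter for a migration check.

    Assumes the V6.x response shape shown in this topic, where
    hits.total is an integer.
    """
    hits = response["hits"]
    # Only the hits on the returned page are listed, so solr_ids can be
    # shorter than total when the index holds many documents.
    solr_ids = sorted(
        hit["_source"]["id"]
        for hit in hits["hits"]
        if "id" in hit.get("_source", {})
    )
    return {
        "total": hits["total"],
        "solr_ids": solr_ids,
        "failed_shards": response["_shards"]["failed"],
        "timed_out": response["timed_out"],
    }


def migration_looks_complete(summary, expected_total):
    """Return True if no shard failed and the document count matches."""
    return (
        not summary["timed_out"]
        and summary["failed_shards"] == 0
        and summary["total"] == expected_total
    )


if __name__ == "__main__":
    summary = summarize_search_response(SAMPLE_RESPONSE)
    print(summary)
    # 2 is the number of documents in the example Solr Core.
    print("Complete:", migration_looks_complete(summary, expected_total=2))
```

A matching count does not prove that every field was migrated correctly, so it is still worth inspecting a few documents manually.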