Elasticsearch is a Lucene-based, distributed, real-time search and analytics engine released as an open source product under the Apache License. As a mainstream search engine for enterprises, it provides a distributed system that allows you to store, query, and analyze large amounts of data in near real time. Elasticsearch is often used as the underlying engine for applications that require complex queries and high performance.

Data Lake Analytics (DLA) allows you to access Elasticsearch and use standard SQL statements to read data from Elasticsearch. This way, you can run joint queries across data in Elasticsearch and data from other data sources.

Note DLA can analyze data in Elasticsearch only in some regions. You can contact DLA technical support through DingTalk to check whether this feature is supported in your region.

Before you use DLA to read data from or write data to Elasticsearch, you must prepare test data in Elasticsearch by performing the following steps:

  1. Create an Elasticsearch cluster.

    Note
    • The DLA instance and the Elasticsearch cluster must be in the same region. Otherwise, you cannot perform the operations described in this topic.
    • DLA supports only Elasticsearch V6.3 or later.
  2. Set IP address whitelists.

  3. Connect to the Elasticsearch cluster.
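
    For example, after you connect to the cluster from the Kibana console or a similar tool, you can send the following request to check whether the cluster is reachable and healthy. This request is only a quick sketch to verify connectivity and is not required for the subsequent steps:

     GET /_cluster/health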

  4. Create an index.

    PUT /product_info
    {
      "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 1
      },
      "mappings": {
        "products": {
          "properties": {
            "productName": {"type": "text", "analyzer": "ik_smart"},
            "annual_rate": {"type": "keyword"},
            "describe": {"type": "text", "analyzer": "ik_smart"}
          }
        }
      }
    }
  5. Write data.
    POST /product_info/products/_bulk
    {"index":{}}
    {"productName":"Wealth management product A","annual_rate":"3.2200%","describe":"180-day wealth management product. Minimum investment of USD 20,000. Low risk and stable returns. You can choose whether to receive push messages when returns are credited to your account."}
    {"index":{}}
    {"productName":"Wealth management product B","annual_rate":"3.1100%","describe":"90-day wealth management product. Minimum investment of USD 10,000. Daily message push at scheduled time when returns are credited to your account."}
    {"index":{}}
    {"productName":"Wealth management product C","annual_rate":"3.3500%","describe":"270-day wealth management product. Minimum investment of USD 40,000. Immediate message push on a daily basis when returns are credited to your account."}
    {"index":{}}
    {"productName":"Wealth management product D","annual_rate":"3.1200%","describe":"90-day wealth management product. Minimum investment of USD 12,000. Daily message push when returns are credited to your account."}
    {"index":{}}
    {"productName":"Wealth management product E","annual_rate":"3.0100%","describe":"30-day wealth management product. Minimum investment of USD 8,000. Daily message push when returns are credited to your account."}
    {"index":{}}
    {"productName":"Wealth management product F","annual_rate":"2.7500%","describe":"3-day popular wealth management product. No service fees. Minimum investment of USD 500. Notification through text messages when returns are credited to your account."}
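
    To verify that the documents are indexed, you can run a search request against the index. The following request is a sketch; it matches documents whose describe field contains the term "message":

     GET /product_info/products/_search
     {
       "query": {
         "match": { "describe": "message" }
       }
     }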

Procedure

  1. Connect DLA to Elasticsearch.

  2. Execute the following SQL statement to create an Elasticsearch schema in DLA:

    CREATE DATABASE `my_es_db`
    WITH DBPROPERTIES (
      catalog = 'elasticsearch',
      location = 'http://****.elasticsearch.aliyuncs.com:9200',
      user = '******',
      password = '******',
      VPC_ID = 'vpc-*****',
      INSTANCE_ID = 'es-*****'
    )

    Parameters:

    • catalog: the type of the schema that you want to create. Set the value to elasticsearch to create an Elasticsearch schema.
    • location: the endpoint of Elasticsearch, which consists of the protocol, domain name, and port number.
    • user: the username used to access Elasticsearch.
    • password: the password used to access Elasticsearch.
    • VPC_ID: the ID of the VPC where the Elasticsearch cluster resides.
    • INSTANCE_ID: the ID of the Elasticsearch cluster.
  3. Execute the following SQL statement to create the product_info table in the my_es_db schema:

    CREATE EXTERNAL TABLE `product_info` (
      `productname` string NULL COMMENT '',
      `annual_rate` string NULL COMMENT '',
      `describe` string NULL COMMENT ''
    )
    TBLPROPERTIES (
      TABLE_MAPPING = 'product_info',
      COLUMN_MAPPING = 'productname,productName;'
    );

    The field name productName contains uppercase letters, but column names in DLA are case-insensitive. Therefore, you must use COLUMN_MAPPING to map the lowercase column name in DLA to the actual field name in Elasticsearch.

    After the table is created, you can execute the SELECT statement to read data from Elasticsearch.

     SELECT * FROM product_info;
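
    You can also filter the data with standard SQL. Note that annual_rate is stored as a keyword string, so comparisons on it are lexicographic; the following sketch works only because all sample values above share the same fixed format. It returns the products whose annual rate exceeds 3.2000%:

     SELECT productname, annual_rate
     FROM product_info
     WHERE annual_rate > '3.2000%';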