Data Lake Analytics (DLA) can access Elasticsearch and use standard SQL statements to read data from Elasticsearch. This way, you can perform joint queries on data in Elasticsearch and data from other data sources. This topic describes how to use DLA to read data from and write data to Elasticsearch.

Prerequisites

Before you use DLA to read data from or write data to Elasticsearch, make sure that the following prerequisites are met:
  1. Create an Alibaba Cloud Elasticsearch cluster.
    Note
    • The DLA instance and Elasticsearch cluster are deployed in the same region. If they are not deployed in the same region, the operations in this topic cannot be performed.
    • DLA supports only Elasticsearch V6.3 and later.
  2. Configure a whitelist to access an Elasticsearch cluster over the Internet or a VPC.
  3. Access an Elasticsearch cluster.
  4. Create an index.
  5. Create a document and insert data.

Background information

Elasticsearch is a Lucene-based, distributed, real-time search and analytics engine. It is an open source service released under the Apache License. Elasticsearch is the mainstream search engine for enterprises. It provides a distributed system for you to store, query, and analyze large amounts of datasets in near-real time. Elasticsearch is used as a basic engine or technology to support complex queries and high-performance applications.

Procedure

  1. Create an Elasticsearch schema.
    1. Log on to the DLA console.
    2. In the left-side navigation pane, choose Serverless Presto > SQL access point.
    3. On the SQL access point page, click Log on in DMS in the Public network section. You can also find your virtual private cloud (VPC) and click Log on in DMS in the Actions column. Then, execute the following SQL statement to create a schema.
      You can also use a MySQL client or program code to connect to DLA. Then, execute the following SQL statement to create a schema:
      ​CREATE DATABASE 'my_es_db'
       WITH DBPROPERTIES (
       catalog = 'elasticsearch',
       location = 'http://****.elasticsearch.aliyuncs.com:9200',
       user = '******',
       password = '******',
       VPC_ID = 'vpc-*****',
       INSTANCE_ID = 'es-*****'
      )
      Parameter Description
      catalog The schema that you want to create. The value elasticsearch indicates the Elasticsearch schema.
      location The endpoint of the Elasticsearch cluster, which consists of three parts: IP address, protocol, and port number. You can log on to the Elasticsearch console and obtain the endpoint of the Elasticsearch cluster in the Basic Information section on the cluster details page.
      Notice When you specify this parameter, you cannot append a forward slash (/) to the parameter value.
      user The username that is used to access the Elasticsearch cluster.
      password The password that is used to access the Elasticsearch cluster.
      VPC_ID The ID of the VPC where the Elasticsearch cluster resides.
      INSTANCE_ID The ID of the Elasticsearch cluster.
  2. Create an Elasticsearch external table.
    Execute the following SQL statement to create the product_info external table in the my_es_db schema:
    CREATE EXTERNAL TABLE 'product_info' (
     `productname` string NULL COMMENT '',
     `annual_rate` string NULL COMMENT '',
     `describe` string NULL COMMENT ''
     )
     TBLPROPERTIES (
     TABLE_MAPPING = 'product_info',
     COLUMN_MAPPING = 'productname,productName; '
     );
    Note
    • By default, the name of the external table is the same as the name of the index that is mapped to the external table in the Elasticsearch cluster. If they are not the same,use the TABLE_MAPPING property to specify the name of the index that is mapped to the external table.
    • productName contains an uppercase letter, and column names in DLA are not case-sensitive. Therefore, you must use COLUMN_MAPPING to add column mappings.
  3. Read data from the Elasticsearch external table.
    Execute the following SELECT statement to read data from the product_info table in DLA:
     select * from product_info;