Data Lake Analytics (DLA) can access Elasticsearch and use standard SQL statements to read data from Elasticsearch. This way, you can perform joint queries on data in Elasticsearch and data from other data sources. This topic describes how to use DLA to read data from and write data to Elasticsearch.
- Create an Alibaba Cloud Elasticsearch cluster.
- The DLA instance and Elasticsearch cluster are deployed in the same region. If they are not deployed in the same region, the operations in this topic cannot be performed.
- DLA supports only Elasticsearch V6.3 and later.
- Configure a whitelist to access an Elasticsearch cluster over the Internet or a VPC.
- Access an Elasticsearch cluster.
- Create an index.
- Create a document and insert data.
Elasticsearch is a Lucene-based, distributed, real-time search and analytics engine. It is an open source service released under the Apache License. Elasticsearch is the mainstream search engine for enterprises. It provides a distributed system for you to store, query, and analyze large amounts of datasets in near-real time. Elasticsearch is used as a basic engine or technology to support complex queries and high-performance applications.
- Create an Elasticsearch schema.
- Log on to the DLA console.
- In the left-side navigation pane, choose .
- On the SQL access point page, click Log on in DMS in the Public network section. You can also find your virtual private cloud (VPC)
and click Log on in DMS in the Actions column. Then, execute the following SQL statement
to create a schema. You can also use a MySQL client or program code to connect to DLA. Then, execute the following SQL statement to create a schema:
CREATE DATABASE 'my_es_db' WITH DBPROPERTIES ( catalog = 'elasticsearch', location = 'http://****.elasticsearch.aliyuncs.com:9200', user = '******', password = '******', VPC_ID = 'vpc-*****', INSTANCE_ID = 'es-*****' )
Parameter Description catalog The schema that you want to create. The value elasticsearch indicates the Elasticsearch schema. location The endpoint of the Elasticsearch cluster, which consists of three parts: IP address, protocol, and port number. You can log on to the Elasticsearch console and obtain the endpoint of the Elasticsearch cluster in the Basic Information section on the cluster details page.Notice When you specify this parameter, you cannot append a forward slash (/) to the parameter value. user The username that is used to access the Elasticsearch cluster. password The password that is used to access the Elasticsearch cluster. VPC_ID The ID of the VPC where the Elasticsearch cluster resides. INSTANCE_ID The ID of the Elasticsearch cluster.
- Create an Elasticsearch external table. Execute the following SQL statement to create the
product_infoexternal table in the
CREATE EXTERNAL TABLE 'product_info' ( `productname` string NULL COMMENT '', `annual_rate` string NULL COMMENT '', `describe` string NULL COMMENT '' ) TBLPROPERTIES ( TABLE_MAPPING = 'product_info', COLUMN_MAPPING = 'productname,productName; ' );Note
- By default, the name of the external table is the same as the name of the index that
is mapped to the external table in the Elasticsearch cluster. If they are not the
TABLE_MAPPINGproperty to specify the name of the index that is mapped to the external table.
productNamecontains an uppercase letter, and column names in DLA are not case-sensitive. Therefore, you must use
COLUMN_MAPPINGto add column mappings.
- By default, the name of the external table is the same as the name of the index that is mapped to the external table in the Elasticsearch cluster. If they are not the same,use the
- Read data from the Elasticsearch external table. Execute the following
SELECTstatement to read data from the
product_infotable in DLA:
select * from product_info;