Data Lake Analytics (DLA) can access Elasticsearch and use standard SQL statements
to read data from Elasticsearch. This way, you can perform joint queries on data in
Elasticsearch and data from other data sources. This topic describes how to use DLA
to read data from and write data to Elasticsearch.
Background information
Elasticsearch is a Lucene-based, distributed, real-time search and analytics engine.
It is an open source service released under the Apache License. Elasticsearch is the
mainstream search engine for enterprises. It provides a distributed system for you
to store, query, and analyze large amounts of datasets in near-real time. Elasticsearch
is used as a basic engine or technology to support complex queries and high-performance
applications.
Procedure
- Create an Elasticsearch schema.
- Log on to the DLA console.
- In the left-side navigation pane, choose .
- On the SQL access point page, click Log on in DMS in the Public network section. You can also find your virtual private cloud (VPC)
and click Log on in DMS in the Actions column. Then, execute the following SQL statement
to create a schema.
You can also use a MySQL client or program code to connect to DLA. Then, execute the
following SQL statement to create a schema:
​CREATE DATABASE 'my_es_db'
WITH DBPROPERTIES (
catalog = 'elasticsearch',
location = 'http://****.elasticsearch.aliyuncs.com:9200',
user = '******',
password = '******',
VPC_ID = 'vpc-*****',
INSTANCE_ID = 'es-*****'
)
Parameter |
Description |
catalog |
The schema that you want to create. The value elasticsearch indicates the Elasticsearch
schema.
|
location |
The endpoint of the Elasticsearch cluster, which consists of three parts: IP address,
protocol, and port number. You can log on to the Elasticsearch console and obtain the endpoint of the Elasticsearch cluster in the Basic Information section on the cluster details page.
Notice When you specify this parameter, you cannot append a forward slash (/) to the parameter value.
|
user |
The username that is used to access the Elasticsearch cluster. |
password |
The password that is used to access the Elasticsearch cluster. |
VPC_ID |
The ID of the VPC where the Elasticsearch cluster resides. |
INSTANCE_ID |
The ID of the Elasticsearch cluster. |
- Create an Elasticsearch external table.
Execute the following SQL statement to create the
product_info
external table in the
my_es_db
schema:
CREATE EXTERNAL TABLE 'product_info' (
`productname` string NULL COMMENT '',
`annual_rate` string NULL COMMENT '',
`describe` string NULL COMMENT ''
)
TBLPROPERTIES (
TABLE_MAPPING = 'product_info',
COLUMN_MAPPING = 'productname,productName; '
);
Note
- By default, the name of the external table is the same as the name of the index that
is mapped to the external table in the Elasticsearch cluster. If they are not the
same,use the
TABLE_MAPPING
property to specify the name of the index that is mapped to the external table.
productName
contains an uppercase letter, and column names in DLA are not case-sensitive. Therefore,
you must use COLUMN_MAPPING
to add column mappings.
- Read data from the Elasticsearch external table.
Execute the following
SELECT
statement to read data from the
product_info
table in DLA:
select * from product_info;