
Enable Hive to Write and Read Data from Alibaba Cloud Elasticsearch using ES-Hadoop


Elasticsearch and Hadoop are widely used technologies for data storage, processing, and analytics. Combined on Alibaba Cloud, they cover both large-scale batch processing and fast search over the results. In this guide, we'll use ES-Hadoop to enable Hive to write data to and read data from Alibaba Cloud Elasticsearch.

Integrating Hive with Alibaba Cloud Elasticsearch

Elasticsearch-Hadoop (ES-Hadoop) is an open-source connector that bridges Elasticsearch and the Hadoop ecosystem. With it, Hadoop components such as Hive can treat an Elasticsearch index as both a data source and a data sink, so data prepared in Hadoop becomes searchable in Elasticsearch and indexed data can be queried from Hive.

Before you start this integration, make sure you have an Alibaba Cloud account and are familiar with the Alibaba Cloud Elasticsearch service. Let's walk through the setup.

Prerequisites

  • A running Alibaba Cloud Elasticsearch cluster.
  • An E-MapReduce (EMR) cluster within the same VPC.

Procedure

Step 1: Prepare Your Environment

Disable Auto Indexing in your Elasticsearch cluster so that indexes are created only with the mappings you define explicitly, not inferred from the first documents written. Then create an index with the required mappings. Consider the following example:

PUT company
{
  "mappings": {
    "_doc": {
      "properties": {
        "id": {"type": "long"},
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "birth": {"type": "text"},
        "addr": {"type": "text"}
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1
    }
  }
}

Create an EMR cluster in the same VPC as your Elasticsearch setup to ensure seamless connectivity and data transfer.
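As a rough pre-flight check for Step 1, the sketch below shows one way to confirm from an EMR node that the Elasticsearch cluster is reachable and that the company index exists. The variables `ES_ENDPOINT`, `ES_PORT`, `ES_USER`, and `ES_PASSWORD` and all function names are assumptions made for illustration; the default endpoint is the same placeholder used elsewhere in this guide. It also prints a cluster-settings body that turns off `action.auto_create_index`, the standard Elasticsearch setting for automatic index creation. A managed cluster may expose this as a console switch instead and may not permit the API call, so treat that part as an alternative, not as the required method.

```shell
#!/usr/bin/env bash
# Pre-flight helpers for Step 1. All names here are illustrative assumptions.
ES_ENDPOINT="${ES_ENDPOINT:-http://es-cn-xxxxxx.elasticsearch.aliyuncs.com}"
ES_PORT="${ES_PORT:-9200}"

# Build the URL for a path on the cluster, e.g. es_url company/_mapping
es_url() {
  printf '%s:%s/%s\n' "${ES_ENDPOINT%/}" "$ES_PORT" "${1#/}"
}

# JSON body that disables automatic index creation cluster-wide.
auto_create_off_body() {
  printf '%s\n' '{"persistent":{"action.auto_create_index":"false"}}'
}

# Return 0 if the index exists (HEAD request answers 200), non-zero otherwise.
index_exists() {
  local code
  code="$(curl -s -o /dev/null -w '%{http_code}' -I \
    -u "${ES_USER}:${ES_PASSWORD}" "$(es_url "$1")")"
  [ "$code" = "200" ]
}

# Example usage on an EMR node (requires network access to the cluster):
#   curl -s -u "$ES_USER:$ES_PASSWORD" -H 'Content-Type: application/json' \
#        -X PUT "$(es_url _cluster/settings)" -d "$(auto_create_off_body)"
#   index_exists company && echo "company index is ready"
```

If `index_exists company` fails from the EMR node but succeeds elsewhere, the two clusters are probably not in the same VPC or the endpoint is not whitelisted.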

Step 2: Upload the ES-Hadoop JAR

Obtain the compatible ES-Hadoop package and upload it to HDFS:

hadoop fs -mkdir /tmp/hadoop-es
hadoop fs -put elasticsearch-hadoop-hive-x.x.x.jar /tmp/hadoop-es

Replace x.x.x with the correct version number corresponding to your Elasticsearch version.
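The upload can be made repeatable with a small wrapper. This is a sketch and not part of the original procedure: the function names and the `HDFS_DIR` and `ES_VERSION` variables are hypothetical, and the version must be set to match your own cluster. It uses `hadoop fs -mkdir -p` so that rerunning it does not fail when the directory already exists, and `hadoop fs -put -f` so that a previous upload is overwritten.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for Step 2; names are illustrative.
HDFS_DIR="${HDFS_DIR:-/tmp/hadoop-es}"

# Print the Hive connector JAR file name for a given version string.
es_hadoop_jar() {
  printf 'elasticsearch-hadoop-hive-%s.jar\n' "$1"
}

# Upload the local JAR to HDFS; safe to run more than once.
upload_es_hadoop_jar() {
  local jar
  jar="$(es_hadoop_jar "$1")"
  if [ ! -f "$jar" ]; then
    echo "missing local file: $jar" >&2
    return 1
  fi
  hadoop fs -mkdir -p "$HDFS_DIR" &&
    hadoop fs -put -f "$jar" "$HDFS_DIR/" &&
    hadoop fs -ls "$HDFS_DIR"
}

# Example usage (ES_VERSION is a placeholder you must set yourself):
#   upload_es_hadoop_jar "$ES_VERSION"
```

Checking for the local file first gives a clearer error than the one `hadoop fs -put` reports when the JAR name and the downloaded version do not match.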

Step 3: Create a Hive External Table

Set up a Hive external table and map its fields to the Elasticsearch index fields:

add jar hdfs:///tmp/hadoop-es/elasticsearch-hadoop-hive-x.x.x.jar;

CREATE EXTERNAL TABLE IF NOT EXISTS company (
    id BIGINT,
    name STRING,
    birth STRING,
    addr STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
    'es.nodes' = 'http://es-cn-xxxxxx.elasticsearch.aliyuncs.com',
    'es.port' = '9200',
    'es.net.ssl' = 'true',
    'es.nodes.wan.only' = 'true',
    ...
);

Step 4: Write and Read Data

Write data to the index using HiveSQL:

INSERT INTO TABLE company VALUES (1, "zhangsan", "1990-01-01", "No.969, WenyiXi Rd, Yuhang, Hangzhou");

Read data from the index:

SELECT * FROM company;
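To confirm from the Elasticsearch side that the row written through Hive actually reached the index, a search request can be sent directly to the cluster. The sketch below assumes hypothetical `ES_ENDPOINT`, `ES_PORT`, `ES_USER`, and `ES_PASSWORD` variables and hypothetical function names, and it looks up the document by the id value used in the INSERT statement above.

```shell
#!/usr/bin/env bash
# Hypothetical verification helpers for Step 4; names are illustrative.
ES_ENDPOINT="${ES_ENDPOINT:-http://es-cn-xxxxxx.elasticsearch.aliyuncs.com}"
ES_PORT="${ES_PORT:-9200}"

# Build a term query that matches one value of the numeric "id" field.
id_query() {
  printf '{"query":{"term":{"id":%d}}}\n' "$1"
}

# Search the given index for the given id and print the raw JSON response.
search_by_id() {
  curl -s -u "${ES_USER}:${ES_PASSWORD}" \
    -H 'Content-Type: application/json' \
    -X POST "${ES_ENDPOINT%/}:${ES_PORT}/$1/_search" \
    -d "$(id_query "$2")"
}

# Example usage (requires network access to the cluster):
#   search_by_id company 1
```

A response whose hits contain the name and addr values from the INSERT statement indicates that the write path through ES-Hadoop is working; an empty hit list right after the insert may only mean the index has not refreshed yet.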

With the external table in place, ES-Hadoop translates Hive statements on the company table into writes to and reads from the Elasticsearch index, so existing HiveSQL workflows can populate and query the index directly.


Conclusion

Integrating Hive with Alibaba Cloud Elasticsearch through ES-Hadoop gives you a direct path from data stored in Hadoop to searchable, near-real-time analytics in Elasticsearch. Alibaba Cloud provides a scalable managed platform for the Elasticsearch side of this setup, and together Elasticsearch, Hadoop, and Hive form a practical framework for analyzing large datasets.
Ready to start with Elasticsearch on Alibaba Cloud? Explore our Cloud solutions and services to take the first step.



Data Geek
