E-MapReduce:Overview

Last Updated: Feb 28, 2024

Kudu is a distributed, scalable, column-oriented data storage system that enables fast analytics on rapidly changing data.

Scenarios

Kudu is suitable for the following scenarios:

  • Near real-time computing

  • Time series data

  • Predictive modeling

  • Large volumes of historical data

    In most cases, a production environment holds a large volume of historical data. This data can be stored in Hadoop Distributed File System (HDFS), a relational database management system (RDBMS), or Kudu. If you only need to access or query the historical data, you can use Impala to do so and do not need to migrate the data to Kudu.

Components

Kudu consists of the following components:

  • Master servers: manage metadata. The metadata includes information about the tablet servers and the tablets that they host. Master servers use the Raft algorithm to achieve high availability (HA).

  • Tablet servers: store tablets. Each tablet has multiple replicas, which ensure high availability by using the Raft algorithm.
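Both components rely on Raft, which makes progress only while a majority of replicas is reachable. The following minimal sketch shows the resulting arithmetic. The function names are hypothetical and are not part of any Kudu API, and the replica counts are example values only.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a Raft majority (quorum)."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Number of replicas that can be lost while a majority still remains."""
    return replicas - majority(replicas)


# Example replica counts (assumed values, not taken from a real cluster).
for n in (3, 4, 5):
    print(f"replicas={n} majority={majority(n)} tolerated_failures={tolerated_failures(n)}")
```

In this sketch, an even replica count tolerates no more failures than the next lower odd count, which is why odd replica counts are the usual choice for Raft groups.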

Terms

Term | Description
--- | ---
master server | Manages metadata of the entire cluster. The metadata includes tablet server information, table information, tablet information, and other metadata-related information.
tablet server | Stores and provides tablets for clients.
column-oriented storage | Kudu is a column-oriented data storage system. Data in the same column is stored in adjacent locations in the underlying storage.
table | Kudu stores data in tables. A table has a schema and a globally ordered primary key. A table can be split into multiple segments that are called tablets.
tablet | A contiguous segment of a table. A specific tablet is replicated on multiple tablet servers. One of these replicas is considered to be the leader tablet.
Raft | A consensus algorithm that is used to ensure high availability of master servers and data consistency among tablet replicas.
catalog table | The central location for metadata in Kudu. The catalog table stores information about tables and tablets.
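
The relationship between the table, tablet, and column-oriented storage terms can be illustrated with a toy model. This is a conceptual sketch in plain Python, not Kudu's actual storage engine or client API. The rows, column names, split keys, and helper functions are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical table: the first field is the primary key.
COLUMNS = ("id", "host", "value")
ROWS = [
    (17, "host-b", 0.42),
    (3, "host-a", 0.10),
    (25, "host-c", 0.77),
    (9, "host-a", 0.31),
]


def split_into_tablets(rows, split_keys):
    """Place rows, ordered by primary key, into contiguous key ranges."""
    tablets = [[] for _ in range(len(split_keys) + 1)]
    for row in sorted(rows, key=lambda r: r[0]):
        # bisect_right finds which key range the primary key falls into.
        tablets[bisect_right(split_keys, row[0])].append(row)
    return tablets


def to_columns(rows):
    """Pivot rows so that the values of each column are stored together."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


# Split keys 10 and 20 produce three ranges: id < 10, 10 <= id < 20, id >= 20.
tablets = split_into_tablets(ROWS, split_keys=[10, 20])
columnar = [to_columns(tablet) for tablet in tablets]

for index, tablet in enumerate(columnar):
    print(index, tablet)
```

Each toy tablet holds one contiguous range of primary keys, and within a tablet the values of a column sit next to each other, so a query that reads only one column does not need to touch the others.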