
ApsaraDB for SelectDB:Hive data source

Last Updated: Mar 28, 2026

ApsaraDB for SelectDB supports federated queries against external data sources through catalogs. This topic explains how to connect a Hive cluster to SelectDB using a Hive catalog so you can query Hive data directly without moving it.

Use cases

  • Query acceleration: Run SelectDB's distributed query engine directly against Hive data, with no ETL required.

  • Data integration: Read Hive data and load it into SelectDB internal tables by using INSERT INTO statements.

Supported configurations

  • Hive versions: Hive 1, Hive 2, Hive 3

  • Table types: managed tables, external tables, and some Hive views

  • Metadata formats: Hive, Iceberg, and Hudi metadata stored in a Hive metastore

  • Access mode: read-only. SelectDB cannot write to external catalogs.

  • Storage backends: HDFS (non-HA and HA) and Object Storage Service (OSS)

Prerequisites

Before you begin, make sure that:

  • An ApsaraDB for SelectDB instance is created and running.

  • A Hive cluster with a running Hive metastore is available.

  • The SelectDB instance can reach the Hive metastore and the underlying storage (HDFS or OSS) over the network.

Sample environment

The walkthrough in this topic uses the following setup. Substitute your own values wherever these parameters appear.

  • High availability (HA) mode: Non-HA

  • Hive cluster storage: HDFS

  • Source database: test_db

  • Source table: test_t

Prepare the source data

  1. Log on to the Hive cluster you want to query.

  2. Create a database named test_db.

    CREATE DATABASE IF NOT EXISTS test_db;
  3. Create a table named test_t.

    CREATE TABLE IF NOT EXISTS test_t (
        id   INT,
        name STRING,
        age  INT
    );
  4. Insert sample data.

    INSERT INTO TABLE test_t VALUES
        (1, 'Alice',   25),
        (2, 'Bob',     30),
        (3, 'Charlie', 35),
        (4, 'David',   40),
        (5, 'Eve',     45);
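Optionally, verify on the Hive side that the sample rows are in place before connecting SelectDB:

```sql
-- Run on the Hive client
SELECT COUNT(*) FROM test_db.test_t;  -- expected: 5
```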

Connect a Hive cluster to SelectDB

Step 1: Connect to an ApsaraDB for SelectDB instance

Connect to your SelectDB instance using a MySQL client. See Connect to an ApsaraDB for SelectDB instance by using a MySQL client.

Step 2: Create a Hive catalog

Create an external catalog that maps to your Hive cluster. SelectDB automatically discovers the databases and tables in the catalog and maps Hive column types to SelectDB types. For the full type mapping table, see Column data type mappings.

Select the configuration that matches your storage backend.

HDFS-based Hive (non-HA)

Use this configuration when your HDFS has a single active NameNode.

CREATE CATALOG hive_catalog PROPERTIES (
    'type'                = 'hms',
    'hive.metastore.uris' = 'thrift://<metastore-host>:9083'
);

Example:

CREATE CATALOG hive_catalog PROPERTIES (
    'type'                = 'hms',
    'hive.metastore.uris' = 'thrift://master-1-1.c-7fa25a1a****.cn-hangzhou.emr.aliyuncs.com:9083'
);

Parameters:

  • type (required): The catalog type. Set the value to hms.

  • hive.metastore.uris (required): The Hive metastore URI, in the format thrift://<host>:<port>. The default port is 9083. Run SET hive.metastore.uris; on a Hive client to obtain the URI.

HDFS-based Hive (HA)

Use this configuration when your HDFS uses high availability with multiple NameNodes.

CREATE CATALOG <catalog_name> PROPERTIES (
    'type'                                                = 'hms',
    'hive.metastore.uris'                                 = 'thrift://<metastore-host>:9083',
    'hadoop.username'                                     = '<hadoop-username>',
    'dfs.nameservices'                                    = '<nameservice-id>',
    'dfs.ha.namenodes.<nameservice-id>'                   = '<nn1>,<nn2>',
    'dfs.namenode.rpc-address.<nameservice-id>.<nn1>'     = '<host1>:<port1>',
    'dfs.namenode.rpc-address.<nameservice-id>.<nn2>'     = '<host2>:<port2>',
    'dfs.client.failover.proxy.provider.<nameservice-id>' = '<dfs.client.failover.proxy.provider.your-nameservice>'
);

Parameters:

  • type (required): The catalog type. Set the value to hms.

  • hive.metastore.uris (required): The Hive metastore URI, in the format thrift://<host>:<port>. The default port is 9083.

  • hadoop.username (optional): The username used to access HDFS.

  • dfs.nameservices (optional): The nameservice ID. Must match the value in hdfs-site.xml.

  • dfs.ha.namenodes.<nameservice-id> (optional): The NameNode IDs. Must match the values in hdfs-site.xml.

  • dfs.namenode.rpc-address.<nameservice-id>.<namenode-id> (optional): The remote procedure call (RPC) address of each NameNode. Must match the values in hdfs-site.xml.

  • dfs.client.failover.proxy.provider.<nameservice-id> (optional): The Java class that HDFS clients use to fail over to the active NameNode. In most cases, use org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.
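As an illustration, a filled-in HA configuration might look like the following. The nameservice ID (hadoop-cluster), NameNode IDs (nn1, nn2), hostnames, and port 8020 are placeholder values for this example; replace them with the values from your own hdfs-site.xml.

```sql
CREATE CATALOG hive_catalog_ha PROPERTIES (
    'type'                                              = 'hms',
    'hive.metastore.uris'                               = 'thrift://master-1-1.c-7fa25a1a****.cn-hangzhou.emr.aliyuncs.com:9083',
    'hadoop.username'                                   = 'hadoop',
    'dfs.nameservices'                                  = 'hadoop-cluster',
    'dfs.ha.namenodes.hadoop-cluster'                   = 'nn1,nn2',
    'dfs.namenode.rpc-address.hadoop-cluster.nn1'       = 'master-1-1.c-7fa25a1a****.cn-hangzhou.emr.aliyuncs.com:8020',
    'dfs.namenode.rpc-address.hadoop-cluster.nn2'       = 'master-1-2.c-7fa25a1a****.cn-hangzhou.emr.aliyuncs.com:8020',
    'dfs.client.failover.proxy.provider.hadoop-cluster' = 'org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
```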

OSS-based Hive

Use this configuration when Hive data is stored in Object Storage Service (OSS). The syntax is identical to the HDFS variant; only the storage parameters differ.

CREATE CATALOG hive_catalog PROPERTIES (
    'type'                = 'hms',
    'hive.metastore.uris' = 'thrift://172.0.0.1:9083',
    'oss.endpoint'        = 'oss-cn-beijing.aliyuncs.com',
    'oss.access_key'      = '<your-access-key-id>',
    'oss.secret_key'      = '<your-access-key-secret>'
);

Parameters:

  • type (required): The catalog type. Set the value to hms.

  • hive.metastore.uris (required): The Hive metastore URI, in the format thrift://<host>:<port>.

  • oss.endpoint (required): The OSS endpoint for the region where your data is stored. See Regions and endpoints.

  • oss.access_key (required): The AccessKey ID used to access OSS.

  • oss.secret_key (required): The AccessKey secret used to access OSS.

Step 3: Verify the catalog

Confirm the catalog was created successfully.

SHOW CATALOGS;

Expected output:

+--------------+--------------+----------+-----------+-------------------------+---------------------+------------------------+
| CatalogId    | CatalogName  | Type     | IsCurrent | CreateTime              | LastUpdateTime      | Comment                |
+--------------+--------------+----------+-----------+-------------------------+---------------------+------------------------+
| 436009309195 | hive_catalog | hms      |           | 2024-07-19 17:09:08.058 | 2024-07-19 18:04:37 |                        |
|            0 | internal     | internal | yes       | UNRECORDED              | NULL                | Doris internal catalog |
+--------------+--------------+----------+-----------+-------------------------+---------------------+------------------------+
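To inspect the properties of a specific catalog, you can also run the following (supported in recent SelectDB/Doris versions):

```sql
SHOW CREATE CATALOG hive_catalog;
```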

Step 4: Query Hive data

After connecting, SelectDB defaults to the internal catalog. Switch to the Hive catalog to access Hive data.

Both query styles below are equivalent — use whichever fits your workflow.

-- Style 1: Switch catalog, then use database and table
SWITCH hive_catalog;
USE test_db;
SELECT * FROM test_t;

-- Style 2: Use the fully qualified table name (no switch required)
SELECT * FROM hive_catalog.test_db.test_t;

To explore the catalog before running queries:

SWITCH hive_catalog;
SHOW DATABASES;   -- list all databases in the Hive catalog
USE test_db;
SHOW TABLES;      -- list all tables in test_db

After switching to the external catalog, you navigate databases and tables the same way as in the internal catalog.

Migrate data from Hive to SelectDB

After the Hive catalog is connected, use INSERT INTO statements to migrate historical Hive data into SelectDB internal tables. See Import data by using INSERT INTO statements.
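As a sketch of this workflow, the statements below create an internal table and copy the sample data into it. The database name hive_import and the table schema are illustrative; tune the key, bucket, and replication settings for your workload.

```sql
-- Run in SelectDB
SWITCH internal;
CREATE DATABASE IF NOT EXISTS hive_import;

-- Illustrative internal table schema
CREATE TABLE IF NOT EXISTS hive_import.test_t (
    id   INT,
    name VARCHAR(100),
    age  INT
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 10;

-- Copy historical data from the Hive catalog into the internal table
INSERT INTO hive_import.test_t
SELECT id, name, age FROM hive_catalog.test_db.test_t;
```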

Column data type mappings

SelectDB automatically maps Hive column types to SelectDB types when you create a catalog. The mappings below apply to Hive, Iceberg, and Hudi metadata stored in a Hive metastore.

Complex types can be nested. For example:

  • array<map<string, int>>

  • map<string, array<int>>

  • struct<col1: array<int>, col2: map<int, date>>
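For reference, a Hive table that uses such nested columns could be declared as follows. The table name nested_demo is hypothetical; run the statement on the Hive side.

```sql
CREATE TABLE IF NOT EXISTS nested_demo (
    tags    array<map<string, int>>,
    attrs   map<string, array<int>>,
    profile struct<col1: array<int>, col2: map<int, date>>
);
```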

Hive metastore type → SelectDB type:

  • boolean → boolean

  • tinyint → tinyint

  • smallint → smallint

  • int → int

  • bigint → bigint

  • float → float

  • double → double

  • char → char

  • varchar → varchar

  • decimal → decimal

  • date → date

  • timestamp → datetime

  • array<type> → array<type> (supports nesting, for example array<map<string, int>>)

  • map<KeyType, ValueType> → map<KeyType, ValueType> (supports nesting, for example map<string, array<int>>)

  • struct<col1: Type1, col2: Type2, ...> → struct<col1: Type1, col2: Type2, ...> (supports nesting, for example struct<col1: array<int>, col2: map<int, date>>)

  • Other types → unsupported
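To see the types that a specific table's columns were mapped to, you can describe the table from SelectDB:

```sql
SWITCH hive_catalog;
USE test_db;
DESC test_t;  -- shows the SelectDB types mapped from the Hive schema
```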

What's next