All Products
Search
Document Center

Hologres:Access data lake data in OSS using Hive Metastore (Beta)

Last Updated:Feb 04, 2026

Starting with V2.2, Hologres supports accessing data lake data in OSS using Hive Metastore (HMS). If you use an EMR cluster to build a data lake on OSS, you can configure Hologres to accelerate read and write operations for OSS and OSS-HDFS data.

Prerequisites

  • You have activated OSS. For more information, see Quick Start.

  • You have created an EMR data lake cluster and created test data. For more information, see Create a cluster. The EMR cluster must meet the following conditions:

    • The Hive version is 3.1.3 or later.

    • Kerberos identity authentication is disabled.

    • For Metadata, you have selected Self-managed RDS or Built-in MySQL.

  • You have purchased a Hologres instance, enabled data lake acceleration, and created a database. For more information, see Purchase a Hologres instance and Create a database.

    Note

    To enable data lake acceleration, go to the Hologres instance list. In the Actions column of the target instance, click Data Lake Acceleration and confirm your choice.

  • A network connection is established.

    Submit a network connection request. After your request is received, the Alibaba Cloud Hologres support team will contact you to help you complete the following steps to establish a network connection:

    Log on to the VPC console to create a reverse endpoint. For more information, see Access Alibaba Cloud services. For the Endpoint Service parameter, select Other Endpoint Services and enter the name of the endpoint service in the region where the EMR cluster is located. The endpoint service names for each region are as follows.

    Region

    Endpoint service name

    Beijing

    com.aliyuncs.privatelink.cn-beijing.epsrv-2zeokrydzjd6kx3cbwmb

    Shanghai

    com.aliyuncs.privatelink.cn-shanghai.epsrv-uf61fvlfwta7f7dv9n3x

    Zhangjiakou

    com.aliyuncs.privatelink.cn-zhangjiakou.epsrv-8vbno4k4wwvys0eg2swp

    Note
    • If an endpoint service name is not available for your region, the Hologres team will create one for you and provide the name after you submit the network connection request.

    • A virtual private cloud (VPC) is an isolated network environment built on Alibaba Cloud. VPCs are logically isolated from each other and from classic networks, and cannot communicate by default. Because the Hologres service was created before VPCs, it is deployed in a classic network. Therefore, you must configure a reverse endpoint to establish a network connection.

    • The current network configuration uses an IP address for the connection. If the IP address of the EMR cluster changes, you must reconfigure the connection.

Limitations

  • Read-only secondary Hologres instances do not support the data lake acceleration feature.

  • Operations such as UPDATE, DELETE, or TRUNCATE are not supported on foreign tables.

  • Mapping foreign tables from HMS using Auto Load is not supported.

  • Hive clusters with Kerberos identity authentication enabled are not supported.

Procedure

  1. Execute a SQL command to create the extension.

    Creating an extension requires superuser permissions. This operation affects the entire database and needs to be performed only once per database.

    CREATE EXTENSION IF NOT EXISTS hive_fdw;
  2. Create a foreign server based on hive_fdw and configure the endpoint information.

    CREATE SERVER IF NOT EXISTS <server_name> FOREIGN DATA WRAPPER hive_fdw 
    OPTIONS (
       hive_metastore_uris 'thrift://<IP address of Hive Metastore>:<port number>',
       oss_endpoint 'oss-<nation>-<region>-internal.aliyuncs.com | <bucket>.oss-<nation>-<region>.oss-dls.aliyuncs.com' 
    );

    Parameter

    Required

    Description

    Example

    server_name

    Yes

    A custom name for the foreign server.

    hive_server

    hive_metastore_uris

    Yes

    The URI of the Hive Metastore. The format is thrift://<IP address of Hive Metastore>:<port number>. The default port number is 9083.

    Note

    Log on to the E-MapReduce console. In the Actions column of the target cluster, click Node Management. On the Node Management tab, find the Internal IP of the master node. This IP address is the IP address of the Hive metastore.

    thrift://172.16.0.250:9083

    oss_endpoint

    Yes

    The endpoint of OSS. Choose one based on your business needs:

    • Native OSS storage: For better access performance, use the internal endpoint of OSS.

    • OSS-HDFS storage: Only internal network access is supported.

    Note

    Log on to the OSS console. Go to the overview page of the bucket. In the Access Ports area, find the OSS endpoint address.

    • OSS

      oss-cn-shanghai-internal.aliyuncs.com
    • OSS-HDFS

      <bucket_name>.cn-beijing.oss-dls.aliyuncs.com
  3. (Optional) Create a user mapping.

    Hologres lets you use the CREATE USER MAPPING command to map a user to a specific foreign server. For example, the owner of a foreign server can use CREATE USER MAPPING to allow a Resource Access Management (RAM) user (123xxx) to access external data in OSS. For more information about CREATE USER MAPPING, see the PostgreSQL documentation.

    CREATE USER mapping FOR <account> server <server_name> options
    (
        dlf_access_id '<yourAccessKeyId>', 
        dlf_access_key '<yourAccessKeySecret>',
        oss_access_id '<yourAccessKeyId>', 
        oss_access_key '<yourAccessKeySecret>'
    );

    The following is an example:

    -- Create a user mapping for the current user
    CREATE USER mapping FOR current_user server <server_name> options
    (
        dlf_access_id 'yourAccessKeyId', 
        dlf_access_key 'yourAccessKeySecret',
        oss_access_id 'yourAccessKeyId', 
        oss_access_key 'yourAccessKeySecret'
    );
    
    -- Create a user mapping for RAM user 123xxx
    CREATE USER mapping FOR "p4_123xxx" server <server_name> options
    (
        dlf_access_id 'yourAccessKeyId', 
        dlf_access_key 'yourAccessKeySecret',
        oss_access_id 'yourAccessKeyId', 
        oss_access_key 'yourAccessKeySecret'
    );
    
    -- Delete the user mapping
    Drop USER MAPPING FOR CURRENT_USER server <server_name>;
    Drop USER MAPPING FOR "p4_123xxx" server <server_name>;
  4. Create a foreign table.

    Hologres supports the following commands to create foreign tables:

    • CREATE FOREIGN TABLE: Creates a single foreign table. This command lets you customize the table by specifying a subset of columns. It is suitable for scenarios where you need to create only a few foreign tables or do not need to map all fields.

    • IMPORT FOREIGN SCHEMA: Creates foreign tables in batches. This command is suitable for scenarios where you need to create multiple foreign tables or perform batch mapping from an external data source.

    Note
    • Hologres supports reading partitioned tables in OSS. The supported data types for partition keys are TEXT, VARCHAR, and INT. When you use CREATE FOREIGN TABLE, you can define the partition fields as regular fields because this command only maps the schema without storing data. When you use IMPORT FOREIGN SCHEMA, you do not need to define the table fields because the system automatically handles the field mapping.

    • If a table in the external schema has the same name as an existing internal table in Hologres, IMPORT FOREIGN SCHEMA skips the creation of that foreign table. In this case, use CREATE FOREIGN TABLE to define a unique name for the foreign table.

    -- Using CREATE FOREIGN TABLE
    CREATE FOREIGN TABLE <holo_schema_name>.<table_name>
    (
      { column_name data_type }
      [, ... ]
      ] )
    )
    SERVER <hive_server_name>
    OPTIONS
    (
      schema_name '<ext_db_name>',
      table_name '<ext_table_name>'
    );
    
    
    -- Using IMPORT FOREIGN SCHEMA
    IMPORT FOREIGN SCHEMA <ext_db_name> 
    [
      { limit TO | EXCEPT } 
      ( table_name [, ...] ) 
    ]
    FROM server <hive_server_name>
    INTO <holo_schema_name> 
    options(
      if_table_exist 'update',
      if_unsupported_type 'error'
            );
  5. Query the foreign table.

    After the foreign table is created, you can directly query it to read data from OSS.

    • Non-partitioned table

      SELECT * FROM <holo_schema>.<hive_table>;
    • Partitioned table

      SELECT * FROM <holo_schema>.<hive_partition_table> WHERE <partition_key> = '<partition_value>';