
Realtime Compute for Apache Flink:Manage Hive catalogs

Last Updated: Mar 26, 2026

A Hive catalog connects Realtime Compute for Apache Flink to your Hive metastore or Alibaba Cloud Data Lake Formation (DLF), so you can query and write Hive tables without declaring DDL statements for each table. Tables registered in a Hive catalog are available as source tables, result tables, or dimension tables in both streaming and batch deployments.
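
For example, once a Hive catalog is registered, any of its tables can be referenced directly by a fully qualified name. The following is a minimal sketch for illustration only; my_hive_catalog, my_db, and orders are placeholder names, not objects created in this topic:

-- Query a Hive table through the catalog, with no CREATE TABLE statement required.
SELECT *
FROM `my_hive_catalog`.`my_db`.`orders`;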

This topic describes the prerequisites and limitations, how to configure Hive metadata, and how to create, use, view, and drop a Hive catalog.

Prerequisites

Before you begin, make sure that you have:

If you use a Hive metastore:

  • The Hive metastore service is running. To start it, run hive --service metastore. To verify it is listening on the default port (9083), run netstat -ln | grep 9083. If you configured a different port in hive-site.xml, replace 9083 with that port.

  • A whitelist configured for the Hive metastore service, with the CIDR blocks of Realtime Compute for Apache Flink added. For steps, see Configure a whitelist and Add a security group rule.

If you use Alibaba Cloud DLF:

  • DLF activated in your Alibaba Cloud account.

Limitations

Limitation | Details
Supported metastore types | Self-managed Hive metastores
Supported Hive versions | 2.0.0–2.3.9 and 3.1.0–3.1.3
Legacy Hive versions (1.X, 2.1.X, 2.2.X) | Not supported in Apache Flink 1.16 or later; requires Ververica Runtime (VVR) 6.X
Non-Hive tables in DLF-backed catalogs | Requires VVR 8.0.6 or later
Writing to OSS-HDFS via Hive catalog | Requires VVR 8.0.6 or later

Configure Hive metadata

Before creating a Hive catalog, connect your Hadoop cluster to the virtual private cloud (VPC) where Realtime Compute for Apache Flink runs, then upload your configuration files to Object Storage Service (OSS).

Step 1: Connect your Hadoop cluster to the VPC

Use Alibaba Cloud DNS PrivateZone to resolve Hadoop hostnames within the VPC. For setup instructions, see Resolver.

Step 2: Configure the metastore in hive-site.xml

Choose the metastore type that matches your setup.

Hive metastore

Verify that hive.metastore.uris in hive-site.xml points to the internal or public IP address of your Hive host:

<property>
    <name>hive.metastore.uris</name>
    <value>thrift://xx.yy.zz.mm:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
If you set hive.metastore.uris to a hostname instead of an IP address, configure Alibaba Cloud DNS PrivateZone to resolve it. Without DNS resolution, Ververica Platform (VVP) returns an UnknownHostException when connecting to the metastore. See Add a DNS record to a private zone.

Alibaba Cloud DLF

Important

If hive-site.xml contains a dlf.catalog.akMode entry, delete it before proceeding. Leaving it in place prevents the Hive catalog from accessing DLF.

Add the following properties to hive-site.xml:

<property>
  <name>hive.imetastoreclient.factory.class</name>
  <value>com.aliyun.datalake.metastore.hive2.DlfMetaStoreClientFactory</value>
</property>
<property>
  <name>dlf.catalog.uid</name>
  <value>${YOUR_DLF_CATALOG_UID}</value>
</property>
<property>
  <name>dlf.catalog.endpoint</name>
  <value>${YOUR_DLF_ENDPOINT}</value>
</property>
<property>
  <name>dlf.catalog.region</name>
  <value>${YOUR_DLF_CATALOG_REGION}</value>
</property>
<property>
  <name>dlf.catalog.accessKeyId</name>
  <value>${YOUR_ACCESS_KEY_ID}</value>
</property>
<property>
  <name>dlf.catalog.accessKeySecret</name>
  <value>${YOUR_ACCESS_KEY_SECRET}</value>
</property>

Parameter | Required | Description
dlf.catalog.uid | Yes | Your Alibaba Cloud account ID. Find it on the Security Settings page.
dlf.catalog.endpoint | Yes | DLF service endpoint. Use the VPC endpoint for your region, for example dlf-vpc.cn-hangzhou.aliyuncs.com. For all endpoints, see Supported regions and endpoints. For cross-VPC access, see How does Realtime Compute for Apache Flink access a service across VPCs?
dlf.catalog.region | Yes | Region ID where DLF is activated. Must match the region of dlf.catalog.endpoint. See Supported regions and endpoints.
dlf.catalog.accessKeyId | Yes | AccessKey ID of your Alibaba Cloud account. See Obtain an AccessKey pair.
dlf.catalog.accessKeySecret | Yes | AccessKey secret of your Alibaba Cloud account. See Obtain an AccessKey pair.

Step 3: Configure table storage in hive-site.xml

Hive catalogs support two storage backends for table data: OSS and OSS-HDFS.

OSS

Add the following properties to hive-site.xml:

<property>
  <name>fs.oss.impl.disable.cache</name>
  <value>true</value>
</property>
<property>
  <name>fs.oss.impl</name>
  <value>org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem</value>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>${YOUR_OSS_WAREHOUSE_DIR}</value>
</property>
<property>
  <name>fs.oss.endpoint</name>
  <value>${YOUR_OSS_ENDPOINT}</value>
</property>
<property>
  <name>fs.oss.accessKeyId</name>
  <value>${YOUR_ACCESS_KEY_ID}</value>
</property>
<property>
  <name>fs.oss.accessKeySecret</name>
  <value>${YOUR_ACCESS_KEY_SECRET}</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>oss://${YOUR_OSS_BUCKET_DOMIN}</value>
</property>

Parameter | Required | Description
hive.metastore.warehouse.dir | Yes | Directory where table data is stored.
fs.oss.endpoint | Yes | OSS endpoint. See Regions and endpoints.
fs.oss.accessKeyId | Yes | AccessKey ID. See Obtain an AccessKey pair.
fs.oss.accessKeySecret | Yes | AccessKey secret. See Obtain an AccessKey pair.
fs.defaultFS | Yes | Default file system for table data. Set to the oss:// address of the target OSS bucket, matching the oss://${YOUR_OSS_BUCKET_DOMIN} value in the preceding example.

OSS-HDFS

  1. Add the following properties to hive-site.xml:

    <property>
      <name>fs.jindo.impl</name>
      <value>com.aliyun.jindodata.jindo.JindoFileSystem</value>
    </property>
    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>${YOUR_OSS_WAREHOUSE_DIR}</value>
    </property>
    <property>
      <name>fs.oss.endpoint</name>
      <value>${YOUR_OSS_ENDPOINT}</value>
    </property>
    <property>
      <name>fs.oss.accessKeyId</name>
      <value>${YOUR_ACCESS_KEY_ID}</value>
    </property>
    <property>
      <name>fs.oss.accessKeySecret</name>
      <value>${YOUR_ACCESS_KEY_SECRET}</value>
    </property>
    <property>
      <name>fs.defaultFS</name>
      <value>oss://${YOUR_OSS_HDFS_BUCKET_DOMIN}</value>
    </property>

    Parameter | Required | Description
    hive.metastore.warehouse.dir | Yes | Directory where table data is stored.
    fs.oss.endpoint | Yes | OSS endpoint. See Regions and endpoints.
    fs.oss.accessKeyId | Yes | AccessKey ID. See Obtain an AccessKey pair.
    fs.oss.accessKeySecret | Yes | AccessKey secret. See Obtain an AccessKey pair.
    fs.defaultFS | Yes | Default file system for table data. Set to the HDFS endpoint of the target bucket, for example oss://oss-hdfs-bucket.cn-hangzhou.oss-dls.aliyuncs.com/.
  2. (Optional) If you plan to read Parquet-format Hive tables stored in OSS-HDFS, add the following configuration to Realtime Compute for Apache Flink:

    fs.oss.jindo.accessKeyId: ${YOUR_ACCESS_KEY_ID}
    fs.oss.jindo.accessKeySecret: ${YOUR_ACCESS_KEY_SECRET}
    fs.oss.jindo.endpoint: ${YOUR_JINDO_ENDPOINT}
    fs.oss.jindo.buckets: ${YOUR_JINDO_BUCKETS}

    For parameter details, see Write data to OSS-HDFS.

If your data is already stored in Realtime Compute for Apache Flink, skip Steps 2–3 and go directly to Create a Hive catalog.

Step 4: Upload configuration files to OSS

  1. Log on to the OSS console and click Buckets in the left-side navigation pane.

  2. Click the name of the bucket used by your Realtime Compute for Apache Flink workspace.

  3. Create the folder ${hms} at the path oss://${bucket}/artifacts/namespaces/${ns}/. For instructions, see Create directories.

    Realtime Compute for Apache Flink automatically creates the /artifacts/namespaces/${ns}/ directory when a workspace is provisioned. If the directory does not appear in the OSS console, upload any file on the Artifacts page in the development console to trigger its creation.
    Variable | Description
    ${bucket} | The bucket used by your Realtime Compute for Apache Flink workspace
    ${ns} | The name of the workspace for which you are creating the Hive catalog
    ${hms} | The folder name. Use the same name as the Hive catalog you will create
  4. Inside oss://${bucket}/artifacts/namespaces/${ns}/${hms}/, create two subdirectories:

    • hive-conf-dir/ — stores hive-site.xml

    • hadoop-conf-dir/ — stores Hadoop configuration files

    After creating both directories, go to Files > Projects in the OSS console to verify the structure and copy the OSS URLs.

  5. Upload hive-site.xml to hive-conf-dir/. See Upload objects.

  6. Upload the following files to hadoop-conf-dir/. See Upload objects.

    • hive-site.xml

    • core-site.xml

    • hdfs-site.xml

    • mapred-site.xml

    • Any other required files, such as compressed packages used by Hive deployments

Create a Hive catalog

After configuring Hive metadata, create a Hive catalog from the console or by running an SQL statement. The console method is recommended.

Important

After creating a Hive catalog, you cannot modify its configuration. To change any parameter, drop the catalog and create it again.

Create a Hive catalog from the console

  1. Log on to the Realtime Compute for Apache Flink console. Find the workspace and click Console in the Actions column.

  2. Click Catalogs.

  3. On the Catalog List page, click Create Catalog. In the dialog box, go to the Built-in Catalog tab, select Hive in the Choose Catalog Type step, and click Next.

  4. In the Configure Catalog step, set the following parameters:

    Parameter | Required | Description
    catalog name | Yes | Name of the Hive catalog
    hive-version | Yes | Version of the Hive metastore. Supported: 2.0.0–2.3.9, 3.1.0–3.1.3. Map your installed version: 2.0.X or 2.1.X → 2.2.0; 2.2.X → 2.2.0; 2.3.X → 2.3.6; 3.1.X → 3.1.2
    default-database | Yes | Name of the default database
    hive-conf-dir | Yes | OSS path to the directory containing hive-site.xml, created in Step 4. For fully managed storage, upload the file as prompted
    hadoop-conf-dir | Yes | OSS path to the directory containing Hadoop configuration files, created in Step 4. For fully managed storage, upload the files as prompted
    hive-kerberos | No | Enable Kerberos authentication and associate a registered Kerberized Hive cluster and a Kerberos principal. See Register a Kerberized Hive cluster
  5. Click Confirm.

  6. In the Catalogs pane on the left side of the Catalog List page, verify the new catalog appears.

Create a Hive catalog by running an SQL statement

  1. On the Scripts page, run the following statement:

    CREATE CATALOG ${HMS Name} WITH (
        'type' = 'hive',
        'default-database' = 'default',
        'hive-version' = '<hive-version>',
        'hive-conf-dir' = '<hive-conf-dir>',
        'hadoop-conf-dir' = '<hadoop-conf-dir>'
    );

    Parameter | Required | Description
    ${HMS Name} | Yes | Name of the Hive catalog
    type | Yes | Connector type. Set to hive
    default-database | Yes | Name of the default database
    hive-version | Yes | Hive metastore version. Supported: 2.0.0–2.3.9, 3.1.0–3.1.3. Map your installed version: 2.0.X or 2.1.X → 2.2.0; 2.2.X → 2.2.0; 2.3.X → 2.3.6; 3.1.X → 3.1.2
    hive-conf-dir | Yes | OSS path to the directory containing hive-site.xml. See Configure Hive metadata
    hadoop-conf-dir | Yes | OSS path to the directory containing Hadoop configuration files. See Configure Hive metadata
  2. Select the CREATE CATALOG statement and click Run.

After the catalog is created, tables in it are available in drafts as source tables, result tables, or dimension tables without DDL declarations. Table names use the format ${hive-catalog-name}.${hive-db-name}.${hive-table-name}.
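
To confirm that the catalog is reachable, you can run a few statements on the Scripts page. A minimal sketch, assuming my_hive_catalog stands for the name you chose for ${HMS Name}:

-- Switch to the new catalog; the current database becomes the catalog's default database.
USE CATALOG `my_hive_catalog`;
-- List the databases and the tables in the current database.
SHOW DATABASES;
SHOW TABLES;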

Use a Hive catalog

All examples below use a catalog named flinkexporthive and a database named flinkhive.

Create a Hive table

UI method

  1. Log on to the Realtime Compute for Apache Flink console, find the workspace, and click Console in the Actions column.

  2. Click Catalogs.

  3. On the Catalog List page, find the catalog and click View in the Actions column.

  4. Find the target database and click View in the Actions column.

  5. Click Create Table.

  6. On the Built-in tab of the Create Table dialog box, select a table type from the Connection Type drop-down list, select a connector type, and click Next.

  7. Enter the table creation statement and configure the parameters:

    CREATE TABLE `${catalog_name}`.`${db_name}`.`${table_name}` (
      id INT,
      name STRING
    ) WITH (
      'connector' = 'hive'
    );
  8. Click OK.

SQL command method

On the Scripts page, run the CREATE TABLE statement, then select it and click Run.

Example — create flink_hive_test in the flinkhive database under flinkexporthive:

-- Create a table named flink_hive_test in the flinkhive database under the flinkexporthive catalog.
CREATE TABLE `flinkexporthive`.`flinkhive`.`flink_hive_test` (
  id INT,
  name STRING
) WITH (
  'connector' = 'hive'
);
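
After the statement completes, you can confirm that the table was registered in the metastore. A short check, assuming the example catalog and database above:

-- List the tables in the flinkhive database to confirm that flink_hive_test exists.
SHOW TABLES FROM `flinkexporthive`.`flinkhive`;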

Modify a Hive table

On the Scripts page, run the ALTER TABLE statement:

-- Add a column to the Hive table.
ALTER TABLE `${catalog_name}`.`${db_name}`.`${table_name}`
ADD ${column_name} ${column_type};

-- Drop a column from the Hive table.
ALTER TABLE `${catalog_name}`.`${db_name}`.`${table_name}`
DROP ${column_name};

Example — add and then remove the color column from flink_hive_test:

-- Add the color field to the Hive table.
ALTER TABLE `flinkexporthive`.`flinkhive`.`flink_hive_test`
ADD color STRING;

-- Drop the color field from the Hive table.
ALTER TABLE `flinkexporthive`.`flinkhive`.`flink_hive_test`
DROP color;
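
A quick way to verify either change is to describe the table and inspect its current columns:

-- Show the current schema of the table.
DESCRIBE `flinkexporthive`.`flinkhive`.`flink_hive_test`;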

Read data from a Hive table

INSERT INTO ${other_sink_table}
SELECT ...
FROM `${catalog_name}`.`${db_name}`.`${table_name}`;
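
For example, the following sketch reads the flink_hive_test table created earlier and sends the rows to a print sink; the print_sink table is defined here purely for illustration:

-- Temporary sink that prints each row to the TaskManager logs (for illustration only).
CREATE TEMPORARY TABLE print_sink (
  id INT,
  name STRING
) WITH (
  'connector' = 'print'
);

-- Read from the Hive table and write the rows to the sink.
INSERT INTO print_sink
SELECT id, name
FROM `flinkexporthive`.`flinkhive`.`flink_hive_test`;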

Write data to a Hive table

INSERT INTO `${catalog_name}`.`${db_name}`.`${table_name}`
SELECT ...
FROM ${other_source_table};
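
For example, a minimal sketch that writes a few literal rows into the flink_hive_test table created earlier:

-- Insert sample rows into the Hive table.
INSERT INTO `flinkexporthive`.`flinkhive`.`flink_hive_test`
VALUES (1, 'Alice'), (2, 'Bob');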

Drop a Hive table

UI method

  1. Log on to the Realtime Compute for Apache Flink console, find the workspace, and click Console in the Actions column.

  2. Click Catalogs.

  3. In the Catalogs pane, navigate to the target table under the target database and catalog.

  4. On the table details page, click Delete in the Actions column.

  5. In the confirmation dialog, click OK.

SQL command method

On the Scripts page, run:

-- Drop the Hive table.
DROP TABLE `${catalog_name}`.`${db_name}`.`${table_name}`;

Example:

-- Drop the Hive table.
DROP TABLE `flinkexporthive`.`flinkhive`.`flink_hive_test`;

View a Hive catalog

  1. Log on to the Realtime Compute for Apache Flink console, find the workspace, and click Console in the Actions column.

  2. Click Catalogs.

  3. On the Catalog List page, check the Name and Type columns for the catalog.

To view the databases and tables in the catalog, click View in the Actions column.

Drop a Hive catalog

Warning

Dropping a Hive catalog does not affect deployments that are already running. However, it does affect drafts that have not yet been published and deployments that must be suspended and then resumed. Drop a catalog only when necessary.

Drop a Hive catalog from the console

  1. Log on to the Realtime Compute for Apache Flink console, find the workspace, and click Console in the Actions column.

  2. Click Catalogs.

  3. On the Catalog List page, find the catalog and click Delete in the Actions column.

  4. In the confirmation dialog, click Delete.

  5. Verify the catalog no longer appears in the Catalogs pane.

Drop a Hive catalog by running an SQL statement

  1. On the Scripts page, run:

    DROP CATALOG ${HMS Name};

    ${HMS Name} is the catalog name shown in the development console of Realtime Compute for Apache Flink.

  2. Right-click the statement and select Run from the shortcut menu.

  3. Verify the catalog no longer appears in the Catalogs pane.