
Dataphin: Configure Hive Input Component

Last Updated: Mar 05, 2026

The Hive input component reads data from Hive data sources. To synchronize data from Hive to another data source, first configure the Hive input component to read from the source, and then configure the target data source for synchronization. This topic describes how to configure the Hive input component.

Limits

The Hive input component supports the orc, parquet, text, rc, seq, and iceberg data formats (the iceberg format is supported only for Hive compute sources and E-MapReduce 5.x data sources). Transactional tables in the ORC format and Kudu tables are not supported.

Note

To integrate data from a Kudu table, use the Impala input component. For more information, see Configure Impala Input Component.

Prerequisites

  • A Hive data source has been established. For more information, see Create Hive Data Source.

  • To configure the Hive input component properties, your account must have read-through permission on the data source. If you do not have this permission, request it first. For more information, see Request Data Source Permissions.

Procedure

  1. On the Dataphin home page, select Development > Data Integration from the top menu bar.

  2. In the top menu bar of the integration page, select a project (in Dev-Prod mode, you must also select an environment).

  3. In the left-side navigation pane, click Batch Pipeline. In the Batch Pipeline list, select the offline pipeline that you want to develop to open its configuration page.

  4. Click Component Library in the upper-right corner of the page to open the Component Library panel.

  5. In the Component Library panel's left-side navigation pane, select Input. Then, from the right-hand list of input components, locate the Hive component and drag it onto the canvas.

  6. Click the icon on the Hive input component card to open the Hive Input Configuration dialog box.

  7. Configure the parameters in the Hive Input Configuration dialog box.

    Step Name

    This is the name of the Hive input component. Dataphin automatically generates the step name, but you can modify it according to the business scenario. The naming convention is as follows:

    • It can only contain Chinese characters, letters, underscores (_), and numbers.

    • It cannot exceed 64 characters.
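As a hedged illustration, the naming rule above can be checked with a short Python sketch (the regular expression and helper name are assumptions for illustration, not part of Dataphin):

```python
import re

# Illustrative only: a regular expression encoding the documented naming rule.
# \u4e00-\u9fff covers common Chinese characters; the helper name is an assumption.
STEP_NAME_RE = re.compile(r"^[\u4e00-\u9fffA-Za-z0-9_]{1,64}$")

def is_valid_step_name(name: str) -> bool:
    """Return True if the step name uses only Chinese characters, letters,
    underscores, and digits, and is at most 64 characters long."""
    return STEP_NAME_RE.fullmatch(name) is not None

print(is_valid_step_name("hive_input_01"))  # → True
print(is_valid_step_name("bad name!"))      # → False
```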

    Datasource

    The data source drop-down list displays all Hive-type data sources, including those for which you have read-through permission and those for which you do not. Click the copy icon to copy the name of the current data source.

    • For data sources without read-through permission, click Request next to the data source to request the permission. For more information, see Request Data Source Permissions.

    • If you do not have a Hive-type data source, click Create Data Source to create a data source. For more information, see Create Hive Data Source.

    Table

    Select the source table for data synchronization. Click the copy icon to copy the name of the currently selected table.

    Note

    When the selected table is a Hudi table or a Paimon table, only partition configuration is supported.

    Partition

    Supports reading static partitions or range partitions. Examples of static partitions are ds=20230101 and ds1=2023,ds2=01. An example of a range partition is /*query*/ds >=20230101 and ds <= 20230107.

    Note

    When the selected table is a Hudi table or a Paimon table, range partitions are not supported.
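As a rough sketch of what a range-partition filter such as ds >= 20230101 and ds <= 20230107 selects, the following Python snippet (illustrative only; the helper and sample partition list are assumptions, not Dataphin code) keeps only the partitions that fall inside the range:

```python
# Illustrative only: how a range filter over the ds partition column selects
# partitions. The helper name and sample data are assumptions.

def select_partitions(partitions, lower, upper):
    """Keep partitions whose ds value falls within [lower, upper],
    comparing the values as integers."""
    return [p for p in partitions if lower <= int(p["ds"]) <= upper]

parts = [{"ds": "20221231"}, {"ds": "20230101"}, {"ds": "20230104"}, {"ds": "20230108"}]
print(select_partitions(parts, 20230101, 20230107))
# → [{'ds': '20230101'}, {'ds': '20230104'}]
```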

    When Partition Does Not Exist

    You can choose the following policies to handle scenarios where the specified partition does not exist:

    • Fail The Task: Terminate the task and mark it as failed.

    • Succeed The Task Without Writing Data: The task runs successfully without writing data to the target table.

    File Encoding

    Select the encoding used to read files stored in Hive. Supported encodings are UTF-8 and GBK.

    NULL Value Replacement

    This option applies only to source tables that use the textfile data storage format. Enter the string that you want to replace with NULL. For example, if you enter \N, the system replaces the \N string with NULL.

    Compression Format

    Optional. If the file is compressed, select the corresponding compression format so that Dataphin can decompress it. The default format for ORC tables is zlib; to use a different decompression format, you must specify it. Tables in other formats have no default. Supported compression formats are zlib, hadoop-snappy, lz4, and none.

    Field Separator

    The field separator is usually specified when the table is created, for example, with a ROW FORMAT DELIMITED FIELDS TERMINATED BY statement. Enter the field separator for the table. If you leave this blank, Dataphin uses \u0001 as the default separator.
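The Field Separator and NULL Value Replacement settings described above can be illustrated with a small Python sketch that parses one raw textfile row (the helper name and sample data are assumptions for illustration, not Dataphin code):

```python
# Illustrative only: splitting a raw Hive textfile row on the default \u0001
# separator and mapping the "\N" placeholder to NULL (None). Both values are
# configurable, as described above.

FIELD_SEPARATOR = "\u0001"
NULL_PLACEHOLDER = "\\N"  # the two literal characters: backslash and N

def parse_row(line: str):
    """Split one raw line into fields, turning the NULL placeholder into None."""
    return [None if field == NULL_PLACEHOLDER else field
            for field in line.rstrip("\n").split(FIELD_SEPARATOR)]

raw = "1001\u0001alice\u0001\\N\n"
print(parse_row(raw))  # → ['1001', 'alice', None]
```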

    Output Fields

    The output fields area displays all fields returned by the selected table and filter conditions. If you do not need to pass certain fields to downstream components, you can delete them:

    Note

    When the compute engine is Hadoop, the output fields of the Hive input component support viewing field classifications. Other compute engines do not support this.

    • Single Field Deletion Scenario: To delete a small number of fields, click the delete icon in the operation column next to each unneeded field.

    • Batch Field Deletion Scenario: To delete many fields, click Field Management. In the Field Management dialog box, select the fields to remove, click the shift-left icon to move them from the selected input fields to the unselected input fields, and then click Confirm to complete the batch deletion.

  8. Click Confirm to complete the property configuration of the Hive input component.