Dataphin: Configure an Impala input component

Last Updated: Mar 05, 2026

An Impala input component retrieves data from an Impala data source. To sync data from an Impala data source to another data source, first configure the Impala input component to read from the source. Then configure the target data source for the sync. This topic describes how to configure an Impala input component.

Prerequisites

Procedure

  1. On the Dataphin homepage, in the top menu bar, choose Develop > Data Integration.

  2. On the Data Integration page, select a Project. In Dev-Prod mode, also select an environment.

  3. In the left navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the offline pipeline that you want to develop. The pipeline configuration page opens.

  4. In the upper-right corner of the page, click Component Library to open the Component Library panel.

  5. In the left navigation pane of the Component Library panel, click Input. In the input component list on the right, locate the Impala component and drag it onto the canvas.

  6. Click the icon on the Impala input component card to open the Impala Input Configuration dialog box.

  7. In the Impala Input Configuration dialog box, configure the parameters.

    Parameter

    Description

    Step name

    The name of the Impala input component. Dataphin generates a step name automatically. You can change it based on your business scenario. Use the following naming convention:

    • Use only Chinese characters, letters, underscores (_), and digits.

    • Use no more than 64 characters.

    Data source

    The drop-down list shows all Impala data sources in Dataphin, including data sources for which you have sync-read permission and those for which you do not. Click the copy icon to copy the current data source name.

    Source table count

    Select the number of source tables. Options are Single table and Multiple tables:

    • Single table: Use this option when you sync business data from one source table to one target table.

    • Multiple tables: Use this option when you sync business data from multiple source tables to one target table. When writing data from multiple tables to one table, Dataphin uses the union algorithm.

    Table matching method

    You can select only Generic rule.

    Note

    This parameter is available only when you select Multiple tables for Source table count.

    Table

    Select the source table:

    • If you selected Single table for Source table count, enter a keyword to search for a table name. Click the copy icon to copy the name of the selected table.

    • If you selected Multiple tables for Source table count, add tables as follows:

      1. In the input box, enter an expression to filter tables with the same structure.

        Supported formats include enumeration, regex-like patterns, and combinations of both. For example: table_[001-100];table_102.

      2. Click Exact match. In the Confirm match details dialog box, review the list of matched tables.

      3. Click Confirm.
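As a rough illustration of how an expression such as table_[001-100];table_102 can resolve to a list of table names, the sketch below expands it in Python. It is not Dataphin's implementation. It assumes that a semicolon separates items, that a bracketed range is inclusive, and that numbers are zero-padded to the width of the range start. The function name expand_table_expression is hypothetical.

```python
import re


def expand_table_expression(expr: str) -> list[str]:
    """Expand an expression like 'table_[001-100];table_102' into table names.

    Assumptions (not confirmed by the Dataphin documentation):
    - ';' separates independent items
    - '[start-end]' is an inclusive numeric range
    - numbers are zero-padded to the width of 'start'
    """
    names = []
    for part in expr.split(";"):
        part = part.strip()
        if not part:
            continue
        match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", part)
        if match is None:
            # Plain enumeration item: use the name as written.
            names.append(part)
            continue
        prefix, start, end, suffix = match.groups()
        width = len(start)
        for i in range(int(start), int(end) + 1):
            names.append(f"{prefix}{i:0{width}d}{suffix}")
    return names


tables = expand_table_expression("table_[001-100];table_102")
```

Under these assumptions the example expression yields table_001 through table_100 plus table_102, which is the list you would expect to review in the Confirm match details dialog box.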

    Shard key

    Select a column with an integer data type from the source table as the shard key. We recommend using a primary key or an indexed column as the shard key. During data reading, Dataphin partitions the data by the shard key field to enable concurrent reads. This improves sync efficiency.
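To show why an integer shard key enables concurrent reads, the following sketch splits a key range into contiguous sub-ranges and builds one query per sub-range. Dataphin's actual splitting strategy is not described in this topic, so treat this as one common approach. The function names, the table name orders, and the column name id are hypothetical.

```python
def split_key_range(min_key: int, max_key: int, num_splits: int) -> list[tuple[int, int]]:
    """Split the inclusive range [min_key, max_key] into contiguous inclusive sub-ranges."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    if max_key < min_key:
        raise ValueError("max_key must not be smaller than min_key")
    total = max_key - min_key + 1
    num_splits = min(num_splits, total)  # never create empty sub-ranges
    base, extra = divmod(total, num_splits)
    ranges = []
    start = min_key
    for i in range(num_splits):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first splits
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def build_split_queries(table: str, shard_key: str, ranges: list[tuple[int, int]]) -> list[str]:
    """Build one SELECT per sub-range; each query could be run by a separate reader."""
    return [
        f"SELECT * FROM {table} WHERE {shard_key} BETWEEN {lo} AND {hi}"
        for lo, hi in ranges
    ]


ranges = split_key_range(1, 10, 3)
queries = build_split_queries("orders", "id", ranges)
```

Because the sub-ranges do not overlap and together cover the whole key range, each row is read exactly once. An evenly distributed key, such as a primary key, keeps the sub-ranges similar in size.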

    Batch read size

    The number of records to read at a time. To reduce interactions with the data source, improve I/O efficiency, and lower network latency, set a batch read size such as 1024 instead of reading records one by one.
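The effect of a batch read size can be sketched with the standard Python DB-API call fetchmany. The example uses an in-memory SQLite database as a stand-in for Impala so that it runs anywhere. The table orders and the helper read_in_batches are hypothetical, and this is not how Dataphin reads data internally.

```python
import sqlite3


def read_in_batches(cursor, batch_size=1024):
    """Yield lists of rows, fetching at most batch_size rows per call."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


# In-memory SQLite database used only as a stand-in data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (id, amount) VALUES (?, ?)",
    [(i, i * 1.5) for i in range(1, 2501)],
)

cur = conn.execute("SELECT id, amount FROM orders ORDER BY id")
batch_sizes = [len(batch) for batch in read_in_batches(cur, 1024)]
conn.close()
```

With 2,500 rows and a batch size of 1024, the reader makes three fetch calls instead of 2,500 single-row fetches.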

    Input filter

    Set conditions to filter the data to extract. Configure as follows:

    • Use a static field to extract specific data. For example: ds=20210101.

    • Use a variable parameter to extract part of the data. For example: ds=${bizdate}.
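The difference between a static filter and a variable filter can be illustrated with a small substitution sketch. It assumes that ${bizdate} resolves to the day before the run date in yyyymmdd format. That is a common scheduling convention, but this topic does not confirm it. The function render_filter is hypothetical.

```python
from datetime import date, timedelta
from string import Template


def render_filter(filter_expr: str, run_date: date) -> str:
    """Replace ${bizdate} in a filter expression.

    Assumption: bizdate is the day before run_date, formatted as yyyymmdd.
    """
    bizdate = (run_date - timedelta(days=1)).strftime("%Y%m%d")
    return Template(filter_expr).substitute(bizdate=bizdate)


static_filter = render_filter("ds=20210101", date(2021, 1, 2))
variable_filter = render_filter("ds=${bizdate}", date(2021, 1, 2))
```

A static filter always extracts the same partition, whereas a variable filter extracts a different partition on each scheduled run.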

    Output fields

    The Output fields section lists all fields from the selected table, after the input filter is applied. To exclude fields from downstream components, delete them:

    • Delete one field at a time: Click the delete icon in the Actions column of the field that you want to remove.

    • Delete multiple fields at once: Click Field management. In the Field management dialog box, select multiple fields. Click the left-shift icon to move the selected input fields to the unselected list. Click OK to complete the bulk deletion.

  8. Click Confirm to complete the configuration of the Impala input component.