Configure the Impala input component

The Impala input component reads data from Impala data sources. To synchronize data from Impala to another data source, first configure the Impala input component to read from the source, and then configure the target data source. This topic describes how to configure the Impala input component.

Prerequisites

An Impala data source has been created. For more information, see Create an Impala data source.
The account used to configure the Impala input component must have read-through permission on the data source. If you do not have this permission, request it for the data source. For more information, see Request, renew, and return data source permissions.

Procedure

On the Dataphin home page, select Development > Data Integration from the top menu bar.
On the integration page, select a project from the top menu bar. In Dev-Prod mode, you must also select an environment.
In the left-side navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the offline pipeline you want to develop to open its configuration page.
Click Component Library in the upper-right corner of the page to open the Component Library panel.
In the Component Library panel's left-side navigation pane, select Input. Locate the Impala component within the right-side list of input components and drag it onto the canvas.
Click the icon in the Impala input component card to open the Impala Input Configuration dialog box.

In the Impala Input Configuration dialog box, configure the following parameters.

Parameter	Description
Step Name	The name of the Impala input component. Dataphin generates a step name automatically, and you can change it to suit your business scenario. The name can contain only Chinese characters, letters, underscores (_), and digits, and cannot exceed 64 characters.
Data Source	The drop-down list displays all Impala data sources in the current Dataphin instance, including those for which you do not have read-through permission. Click the icon to copy the name of the current data source. If you do not have read-through permission for a data source, you can request read permission for it. For more information, see Request, renew, and return data source permissions. If no Impala data source exists yet, click Create to create one. For more information, see Create an Impala data source.
Source Table Quantity	Select Single Table or Multiple Tables. Single Table: Suitable when business data from one table is synchronized to one target table. Multiple Tables: Suitable when business data from multiple tables is synchronized to the same target table. When data from multiple tables is written to the same target table, the rows are combined by using a union.
Table	Select the source table. If Source Table Quantity is set to Single Table, enter a table name keyword to search for the table. Click the icon to copy the name of the selected table. If Source Table Quantity is set to Multiple Tables, add tables as follows: In the input box, enter a table expression to filter tables that have the same structure. The expression can use the enumeration form, a regex-like form, or a mix of both. For example, `table_[001-100];table_102`. Click Precise Search to view the matched tables in the Confirm Match Details dialog box, and then click Confirm.
Shard Key	A column of an integer data type in the source table that is used as the shard key. We recommend that you use the primary key or an indexed column. During reads, the data is split into shards based on the shard key so that the shards can be read concurrently, which improves synchronization efficiency.
Batch Read Count	The number of records read in each batch. Reading from the source database in batches (for example, 1024 records at a time) instead of one record at a time reduces the number of round trips to the data source, improves I/O efficiency, and reduces the impact of network latency.
Input Filter	The filter condition used to extract data. You can specify a static value to extract the matching data, such as `ds=20210101`, or use a variable parameter to extract a portion of the data, such as `ds=${bizdate}`.
Output Fields	The Output Fields area displays all fields returned by the selected table and filter condition. If you do not need to pass certain fields to downstream components, delete them. To delete a small number of fields, click the icon in the Operation column of each field. To delete many fields at once, click Field Management, select the fields in the Field Management dialog box, click the shift left icon to move the selected input fields to the unselected input fields, and then click Confirm.
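
To make the Multiple Tables expression in the Table parameter more concrete, the following Python sketch expands an expression such as `table_[001-100];table_102` into explicit table names. It is only an illustration: the function name `expand_table_expression` is hypothetical, and the grammar it assumes (semicolon-separated items, each either a literal name or a zero-padded numeric range in square brackets) is inferred from the example above, not taken from Dataphin's actual matching rules.

```python
import re

# Assumed grammar (not confirmed by Dataphin): one "[start-end]" numeric range per item.
_RANGE = re.compile(r"^(?P<prefix>.*)\[(?P<start>\d+)-(?P<end>\d+)\](?P<suffix>.*)$")


def expand_table_expression(expression: str) -> list[str]:
    """Expand an expression like 'table_[001-100];table_102' into table names."""
    names = []
    for part in expression.split(";"):
        part = part.strip()
        if not part:
            continue
        match = _RANGE.match(part)
        if match is None:
            # Enumeration form: the item is a literal table name.
            names.append(part)
            continue
        start, end = match["start"], match["end"]
        width = len(start)  # preserve zero padding such as 001
        for i in range(int(start), int(end) + 1):
            names.append(f"{match['prefix']}{str(i).zfill(width)}{match['suffix']}")
    return names


tables = expand_table_expression("table_[001-100];table_102")
print(len(tables), tables[0], tables[99], tables[-1])
```

Under these assumptions, the example expression covers `table_001` through `table_100` plus `table_102`. The Confirm Match Details dialog box remains the authoritative list of matched tables.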

Click Confirm to finalize the property configuration for the Impala Input Component.
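
The Shard Key and Input Filter parameters can be pictured together with a small Python sketch. It splits an integer key range into contiguous shards and combines each shard with a filter whose `${bizdate}` variable has been replaced. This is a conceptual model only: the source does not describe how Dataphin implements sharding or variable substitution, and the table name `orders`, the column `id`, and both function names are hypothetical.

```python
from string import Template


def split_shard_ranges(min_key: int, max_key: int, shards: int) -> list[tuple[int, int]]:
    """Split the inclusive range [min_key, max_key] into contiguous sub-ranges."""
    if shards < 1 or max_key < min_key:
        raise ValueError("need shards >= 1 and max_key >= min_key")
    total = max_key - min_key + 1
    shards = min(shards, total)  # never create empty shards
    base, extra = divmod(total, shards)
    ranges = []
    lower = min_key
    for i in range(shards):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first shards
        ranges.append((lower, lower + size - 1))
        lower += size
    return ranges


def build_shard_queries(table, shard_key, input_filter, params, min_key, max_key, shards):
    """Render the filter variables, then emit one query per shard range."""
    rendered = Template(input_filter).substitute(params)  # raises KeyError if a variable is missing
    return [
        f"SELECT * FROM {table} WHERE ({rendered}) AND {shard_key} BETWEEN {lo} AND {hi}"
        for lo, hi in split_shard_ranges(min_key, max_key, shards)
    ]


queries = build_shard_queries(
    table="orders",            # hypothetical table
    shard_key="id",            # hypothetical integer column
    input_filter="ds=${bizdate}",
    params={"bizdate": "20210101"},
    min_key=1,
    max_key=10,
    shards=3,
)
for query in queries:
    print(query)
```

Each generated query covers a disjoint key range, so in principle the queries could run concurrently without reading the same row twice. This is also why an indexed integer column is a sensible choice for the shard key.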