Write Data to Impala Using the Output Component - Dataphin

The Impala output component writes data to an Impala data source. When you synchronize data from another data source to an Impala data source, configure the Impala output component after you set up the source data source information. This topic describes how to configure the component.

Prerequisites

An Impala data source has been created. For more information, see Create an Impala data source.
The account configuring the Impala output component properties must possess write-through permission for the data source. If you lack this permission, you need to request it for the data source. For more information, see Request, renew, and return data source permissions.

Procedure

On the Dataphin home page, select Development > Data Integration from the top menu bar.
In the top menu bar of the Data Integration page, select a Project. In Dev-Prod mode, you must also select an Environment.
In the navigation pane on the left, click Batch Pipeline, and in the Batch Pipeline list, click the batch pipeline you want to develop to open its configuration page.
Click Component Library in the upper right corner of the page to open the Component Library panel.
In the Component Library panel's left-side navigation pane, select Output, find the Impala component in the list on the right, and drag it to the canvas.
Click and drag the icon from the target input, transform, or flow component to connect it to the Impala output component.
Click the icon on the Impala output component card to open the Impala Output Configuration dialog box.

In the Impala Output Configuration dialog box, configure the parameters.

Parameter		Description
Basic settings	Step name	The name of the Impala output component. Dataphin generates the step name automatically, and you can change it to suit your business scenario. The name can contain only Chinese characters, letters, underscores (_), and numbers, and cannot exceed 64 characters.
	Data source	The drop-down list displays all Impala data sources, including those for which you do not have write-through permission. For a data source without write-through permission, click Request next to the data source to request the permission. For more information, see Request data source permissions. If you do not have an Impala data source, click Create Data Source to create one. For more information, see Create an Impala data source.
	Table	Select the target table for the output data. Click the icon to copy the name of the currently selected table.
	Loading policy	Impala supports only the append policy. The overwrite policy is not supported. Under the append policy, a dirty data error is reported when a primary key or constraint violation occurs.
	Batch write data volume	The maximum amount of data written in a single batch. The default is 32 MB. You can also set Batch Write Count. A batch is written as soon as either limit is reached.
	Batch write count	The default is 2048 entries. Data synchronization writes in batches, controlled by two parameters: Batch Write Count and Batch Write Data Volume. When the accumulated data reaches either limit, the batch is considered full and is immediately written to the target in a single operation. We recommend that you set the batch write data volume to 32 MB. Adjust the batch write count according to the actual size of a single record. It is usually set to a large value to take full advantage of batch writing. For example, if a single record is about 1 KB and the batch write data volume is set to 16 MB, set the batch write count to a value greater than 16 MB divided by 1 KB (that is, greater than 16384 entries), such as 20000 entries. With this configuration, the data volume limit triggers each write: a batch is written every time the accumulated data reaches 16 MB.
Field mapping	Input field	Displays the input fields based on the output of the upstream component.
	Output field	Displays the output fields. Click Field Management to select output fields. Click the icon to move the Selected Input Fields to Unselected Input Fields. Click the icon to move the Unselected Input Fields to Selected Input Fields.
	Mapping relationship	Manually map the upstream input fields to the fields of the target table. Quick Mapping provides Row Mapping and Name Mapping. Name Mapping: maps fields that have the same field name. Row Mapping: maps fields by position when the field names of the source table and target table differ. Only fields in the same row are mapped.
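
As a rough illustration of the batch write strategy described for Batch write count, the following Python sketch buffers records and writes a batch as soon as either limit is reached. It is not Dataphin's implementation: the BatchBuffer class, its write callback, and measuring record size with len() are assumptions made for this example. The defaults mirror the documented 32 MB and 2048 entries, and the usage reproduces the 1 KB record example.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MB = 1024 * 1024


@dataclass
class BatchBuffer:
    """Hypothetical buffer that flushes when either limit is reached."""
    write: Callable[[List[bytes]], None]   # assumed callback that writes one batch
    max_bytes: int = 32 * MB               # batch write data volume
    max_count: int = 2048                  # batch write count
    _rows: List[bytes] = field(default_factory=list, init=False)
    _size: int = field(default=0, init=False)

    def add(self, row: bytes) -> None:
        self._rows.append(row)
        self._size += len(row)
        # Whichever limit is hit first triggers the write.
        if self._size >= self.max_bytes or len(self._rows) >= self.max_count:
            self.flush()

    def flush(self) -> None:
        if self._rows:
            self.write(self._rows)
            self._rows = []
            self._size = 0


# Records of about 1 KB, data volume limit 16 MB, count limit 20000.
batches: List[int] = []
buf = BatchBuffer(write=lambda rows: batches.append(len(rows)),
                  max_bytes=16 * MB, max_count=20000)
row = b"x" * 1024
for _ in range(20000):
    buf.add(row)
buf.flush()          # write whatever is left at the end of the run
print(batches)       # [16384, 3616]
```

Because 16384 records of 1 KB reach 16 MB before the count reaches 20000, the data volume limit triggers the first write. With the default limits and the same record size, the count limit of 2048 would be reached first.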

Click OK to complete the property configuration of the Impala output component.
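
The difference between the two Quick Mapping modes in the Mapping relationship row can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and sample field names are hypothetical, and exact, case-sensitive name matching is assumed because the source does not say how names are compared.

```python
from typing import Dict, List


def name_mapping(input_fields: List[str], output_fields: List[str]) -> Dict[str, str]:
    """Map each input field to the output field with the same name (assumed exact match)."""
    targets = set(output_fields)
    return {name: name for name in input_fields if name in targets}


def row_mapping(input_fields: List[str], output_fields: List[str]) -> Dict[str, str]:
    """Map fields by position: only fields in the same row are paired."""
    return dict(zip(input_fields, output_fields))


# Hypothetical upstream fields and target table fields.
source = ["id", "name", "ts"]
target = ["id", "user_name", "ts"]

print(name_mapping(source, target))  # {'id': 'id', 'ts': 'ts'}
print(row_mapping(source, target))   # {'id': 'id', 'name': 'user_name', 'ts': 'ts'}
```

Name mapping leaves name unmapped because the target table has no field with that name, while row mapping pairs it with user_name because both sit in the second row.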