Hive Writer writes data to Hadoop Distributed File System (HDFS) and loads the data to Hive. This topic describes how Hive Writer works, the parameters that are supported by Hive Writer, and how to configure Hive Writer by using the codeless user interface (UI) and code editor.
Background information
- Hive stores processed data in HDFS.
- Hive uses MapReduce programs to analyze data at the underlying layer.
- Hive runs MapReduce programs on Yarn.
How it works
Hive Writer connects to a Hive metastore and obtains the storage path, format, and column delimiter of the HDFS file to which you want to write data. Then, Hive Writer writes data to the HDFS file and loads the data in the HDFS file to the destination Hive table by using Java Database Connectivity (JDBC).
The underlying logic of Hive Writer is the same as that of HDFS Writer. You can set parameters for HDFS Writer in the parameters of Hive Writer. Data Integration transparently transmits the configured parameters to HDFS Writer.
Parameters
Parameter | Description | Required | Default value |
---|---|---|---|
datasource | The name of the data source. It must be the same as the name of the added data source. | Yes | N/A |
column | The names of the columns to which you want to write data, such as "column": ["id", "name"] .
|
Yes | N/A |
table | The name of the Hive table to which you want to write data.
Note The name is case-sensitive.
|
Yes | N/A |
partition | The partition information in the Hive table.
|
No | N/A |
writeMode | The mode in which Hive Writer loads data to the Hive table. After data is written
to an HDFS file, Hive Writer executes the LOAD DATA INPATH (overwrite) INTO TABLE statement to load data to the Hive table.
This parameter specifies the mode in which Hive Writer loads data from the HDFS file
to the Hive table. Valid values:
Note Set this parameter with caution. Pay attention to the destination directory and the
value of this parameter to prevent data from being unexpectedly deleted.
This parameter and the hiveConfig parameter must be used in pairs. |
Yes | N/A |
hiveConfig | The extended parameters for Hive. Valid values: hiveCommand, jdbcUrl, username, and password.
|
Yes | N/A |
Configure Hive Writer by using the codeless UI
On the DataStudio page, double-click the sync node that you created. On the node configuration tab that appears, set the parameters for the node. For more information, see Configure a sync node by using the codeless UI.
- Configure the source and destination.
Configure the source and destination data sources for the sync node.
Parameter Description Connection The name of the data source to which you want to write data. This parameter is equivalent to the datasource parameter that is described in the preceding section. Table The name of the table to which you want to write data. This parameter is equivalent to the table parameter that is described in the preceding section. Partition Key Column The partition to which Hive Writer writes data. You must specify the lowest-level partition. Hive Writer can write data only to a single partition. Write Mode The write mode. This parameter is equivalent to the writeMode parameter that is described in the preceding section. parquet shchema The schema of the Hive table of the parquet type. The following example shows the configurations:
This parameter is required only if the Hive table at the underlying layer is of the parquet type.message tmp{ REQUIRED BINARY id; REQUIRED BINARY name; REQUIRED BINARY cyctime1; }
- Configure field mappings. This operation is equivalent to setting the column parameter that is described in the preceding section. Fields in the source on the
left have a one-to-one mapping with fields in the destination on the right.
Operation Description Map Fields with the Same Name Click Map Fields with the Same Name to establish mappings between fields with the same name. The data types of the fields must match. Map Fields in the Same Line Click Map Fields in the Same Line to establish mappings between fields in the same row. The data types of the fields must match. Delete All Mappings Click Delete All Mappings to remove the mappings that are established. Auto Layout Click Auto Layout. Then, the system automatically sorts the fields based on specific rules. - Configure channel control policies.
Parameter Description Expected Maximum Concurrency The maximum number of parallel threads that the sync node uses to read data from the source or write data to the destination. You can configure the parallelism for the sync node on the codeless UI. Bandwidth Throttling Specifies whether to enable bandwidth throttling. You can enable bandwidth throttling and specify a maximum transmission rate to prevent heavy read workloads on the source. We recommend that you enable bandwidth throttling and set the maximum transmission rate to an appropriate value based on the configurations of the source. Dirty Data Records Allowed The maximum number of dirty data records that are allowed.
Configure Hive Writer by using the code editor
{
"type": "job",
"steps": [
{
"stepType": "hive",
"parameter": {
},
"name": "Reader",
"category": "reader"
},
{
"stepType": "hive",
"parameter": {
"partition": "year=a,month=b,day=c", // The partition information of the Hive table.
"datasource": "hive_ha_shanghai", // The name of the data source.
"table": "partitiontable2", // The Hive table to which you want to write data.
"column": [ // The names of the columns to which you want to write data.
"id",
"name",
"age"
],
"writeMode": "append" // The mode in which Hive Writer loads data to the Hive table.
},
"name": "Writer",
"category": "writer"
}
],
"version": "2.0",
"order": {
"hops": [
{
"from": "Reader",
"to": "Writer"
}
]
},
"setting": {
"errorLimit": {
"record": ""
},
"speed": {
"throttle":true,// Specifies whether to enable bandwidth throttling. A value of false indicates that bandwidth throttling is disabled, and a value of true indicates that bandwidth throttling is enabled. The mbps parameter takes effect only if the throttle parameter is set to true.
"concurrent": 2, // The maximum number of parallel threads.
"mbps":"12"// The maximum transmission rate.
}
}
}
Hive versions supported by Hive Writer
0.8.0
0.8.1
0.9.0
0.10.0
0.11.0
0.12.0
0.13.0
0.13.1
0.14.0
1.0.0
1.0.1
1.1.0
1.1.1
1.2.0
1.2.1
1.2.2
2.0.0
2.0.1
2.1.0
2.1.1
2.2.0
2.3.0
2.3.1
2.3.2
2.3.3
2.3.4
2.3.5
2.3.6
2.3.7
3.0.0
3.1.0
3.1.1
3.1.2
0.8.1-cdh4.0.0
0.8.1-cdh4.0.1
0.9.0-cdh4.1.0
0.9.0-cdh4.1.1
0.9.0-cdh4.1.2
0.9.0-cdh4.1.3
0.9.0-cdh4.1.4
0.9.0-cdh4.1.5
0.10.0-cdh4.2.0
0.10.0-cdh4.2.1
0.10.0-cdh4.2.2
0.10.0-cdh4.3.0
0.10.0-cdh4.3.1
0.10.0-cdh4.3.2
0.10.0-cdh4.4.0
0.10.0-cdh4.5.0
0.10.0-cdh4.5.0.1
0.10.0-cdh4.5.0.2
0.10.0-cdh4.6.0
0.10.0-cdh4.7.0
0.10.0-cdh4.7.1
0.12.0-cdh5.0.0
0.12.0-cdh5.0.1
0.12.0-cdh5.0.2
0.12.0-cdh5.0.3
0.12.0-cdh5.0.4
0.12.0-cdh5.0.5
0.12.0-cdh5.0.6
0.12.0-cdh5.1.0
0.12.0-cdh5.1.2
0.12.0-cdh5.1.3
0.12.0-cdh5.1.4
0.12.0-cdh5.1.5
0.13.1-cdh5.2.0
0.13.1-cdh5.2.1
0.13.1-cdh5.2.2
0.13.1-cdh5.2.3
0.13.1-cdh5.2.4
0.13.1-cdh5.2.5
0.13.1-cdh5.2.6
0.13.1-cdh5.3.0
0.13.1-cdh5.3.1
0.13.1-cdh5.3.2
0.13.1-cdh5.3.3
0.13.1-cdh5.3.4
0.13.1-cdh5.3.5
0.13.1-cdh5.3.6
0.13.1-cdh5.3.8
0.13.1-cdh5.3.9
0.13.1-cdh5.3.10
1.1.0-cdh5.3.6
1.1.0-cdh5.4.0
1.1.0-cdh5.4.1
1.1.0-cdh5.4.2
1.1.0-cdh5.4.3
1.1.0-cdh5.4.4
1.1.0-cdh5.4.5
1.1.0-cdh5.4.7
1.1.0-cdh5.4.8
1.1.0-cdh5.4.9
1.1.0-cdh5.4.10
1.1.0-cdh5.4.11
1.1.0-cdh5.5.0
1.1.0-cdh5.5.1
1.1.0-cdh5.5.2
1.1.0-cdh5.5.4
1.1.0-cdh5.5.5
1.1.0-cdh5.5.6
1.1.0-cdh5.6.0
1.1.0-cdh5.6.1
1.1.0-cdh5.7.0
1.1.0-cdh5.7.1
1.1.0-cdh5.7.2
1.1.0-cdh5.7.3
1.1.0-cdh5.7.4
1.1.0-cdh5.7.5
1.1.0-cdh5.7.6
1.1.0-cdh5.8.0
1.1.0-cdh5.8.2
1.1.0-cdh5.8.3
1.1.0-cdh5.8.4
1.1.0-cdh5.8.5
1.1.0-cdh5.9.0
1.1.0-cdh5.9.1
1.1.0-cdh5.9.2
1.1.0-cdh5.9.3
1.1.0-cdh5.10.0
1.1.0-cdh5.10.1
1.1.0-cdh5.10.2
1.1.0-cdh5.11.0
1.1.0-cdh5.11.1
1.1.0-cdh5.11.2
1.1.0-cdh5.12.0
1.1.0-cdh5.12.1
1.1.0-cdh5.12.2
1.1.0-cdh5.13.0
1.1.0-cdh5.13.1
1.1.0-cdh5.13.2
1.1.0-cdh5.13.3
1.1.0-cdh5.14.0
1.1.0-cdh5.14.2
1.1.0-cdh5.14.4
1.1.0-cdh5.15.0
1.1.0-cdh5.16.0
1.1.0-cdh5.16.2
1.1.0-cdh5.16.99
2.1.1-cdh6.1.1
2.1.1-cdh6.2.0
2.1.1-cdh6.2.1
2.1.1-cdh6.3.0
2.1.1-cdh6.3.1
2.1.1-cdh6.3.2
2.1.1-cdh6.3.3