Hive Reader allows you to read data from Hive. This topic describes how Hive Reader works, the parameters that are supported by Hive Reader, and how to configure Hive Reader.
Background information
- Hive stores processed data in Hadoop Distributed File System (HDFS).
- Hive uses MapReduce programs to analyze data at the underlying layer.
- Hive runs MapReduce programs on Yarn.
How it works
Hive Reader reads data from a Hive table in one of the following modes. A minimal configuration sketch for each mode follows this list.
- Read data based on HDFS files
Hive Reader connects to Hive Metastore and obtains the storage path, format, and column delimiter of the HDFS file that corresponds to your Hive table. Then, Hive Reader reads data directly from the HDFS file.
In this mode, the underlying logic of Hive Reader is the same as that of HDFS Reader. You can configure the parameters of HDFS Reader in the parameters of Hive Reader, and Data Integration transparently transmits the configured parameters to HDFS Reader.
- Read data based on Hive JDBC
Hive Reader connects to HiveServer2 by using Hive JDBC to read data. In this mode, you can specify a WHERE clause to filter data or execute custom SQL statements to read data.
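The mode is selected by the readMode parameter that is described in the Parameters section. The following minimal sketch shows only the reader step for each mode; the connection name, table name, and query are placeholders, and complete job configurations are provided in the Code editor mode section.
```json
// Read data based on HDFS files: the HDFS path, file format, and column delimiter
// are resolved from Hive Metastore, and the file is then read directly.
{
    "stepType": "hive",
    "parameter": {
        "datasource": "my_hive_connection", // placeholder connection name
        "table": "my_table",                // placeholder table name
        "readMode": "hdfs"
    }
}

// Read data based on Hive JDBC: the query is executed through HiveServer2.
{
    "stepType": "hive",
    "parameter": {
        "datasource": "my_hive_connection", // placeholder connection name
        "readMode": "jdbc",
        "querySql": "select id, name from my_table where pt = '20230101'" // placeholder query
    }
}
```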
Parameters
Parameter | Description | Required | Default value |
---|---|---|---|
datasource | The name of the connection. It must be the same as the name of a created connection. | Yes | N/A |
table | The name of the Hive table to be synchronized. Note: The name is case-sensitive. | Yes | N/A |
readMode | The mode in which data is read. Set this parameter to hdfs to read data based on HDFS files, or to jdbc to read data based on Hive JDBC. | No | N/A |
partition | The partitions in the Hive table, for example, pt1=a,pt2=b,pt3=c. | No | N/A |
column | The columns to be read. Example: "column": ["id", "name"]. | Yes | N/A |
querySql | If Hive Reader reads data based on Hive JDBC, you can set the querySql parameter to the SQL statement that is used to read data. | No | N/A |
where | If Hive Reader reads data based on Hive JDBC, you can specify a WHERE clause to filter data. | No | N/A |
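For example, in the Hive JDBC-based mode you can combine the column and where parameters instead of providing a complete querySql statement. The following reader-parameter sketch is only illustrative; the connection name, table name, and filter condition are placeholders:
```json
{
    "stepType": "hive",
    "parameter": {
        "datasource": "my_hive_connection", // placeholder connection name
        "table": "part_table_1",
        "readMode": "jdbc",                 // read data based on Hive JDBC
        "column": ["id", "name", "age"],    // the columns to be read
        "where": "pt2 = 'B'"                // WHERE clause that filters the data
    }
}
```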
Codeless UI mode
On the DataStudio page, double-click a sync node and configure the node on the node configuration tab on the right. For more information, see Create a sync node by using the codeless UI.
- Configure the connections.
Configure the connections to the source and destination data stores for the sync node.
Parameter | Description |
---|---|
Connection | The datasource parameter in the preceding parameter description. Select the name of a connection that you have configured in DataWorks. |
Table | The table parameter in the preceding parameter description. |
Read Hive method | The readMode parameter in the preceding parameter description. Valid values: Read data based on HDFS file and Read data based on Hive Jdbc (supports data condition filtering). |
Partition Key Column | The partition parameter in the preceding parameter description. Note: This parameter is displayed only when you set the Read Hive method parameter to Read data based on HDFS file. |
Data filter | The filter condition for the data to be synchronized. Filtering based on the limit keyword is not supported. The SQL syntax is determined by the selected connection. Note: This parameter is displayed only when you set the Read Hive method parameter to Read data based on Hive Jdbc (supports data condition filtering). |
Data query | The querySql parameter in the preceding parameter description. Note: This parameter is displayed only when you set the Read Hive method parameter to Read data based on Hive Jdbc (supports data condition filtering). |
- Configure field mapping. It is equivalent to setting the column parameter in the preceding parameter description.
Fields in the source table on the left have a one-to-one mapping with fields in the destination table on the right. You can click Add to add a field. To remove a field, move the pointer over the field and click the Remove icon.
Parameter | Description |
---|---|
Map Fields with the Same Name | Click Map Fields with the Same Name to establish mappings between fields with the same name. The data types of the fields must match. |
Map Fields in the Same Line | Click Map Fields in the Same Line to establish mappings between fields in the same row. The data types of the fields must match. |
Delete All Mappings | Click Delete All Mappings to remove the mappings that have been established. |
Auto Layout | You can sort the fields based on specified rules. |
Change Fields | You can manually edit fields in the source table. Each field occupies a row. The first and the last blank rows are included, whereas other blank rows are ignored. |
Add | Click Add to add a field. |
Take note of the following rules when you add a field:
- You can enter constants. Each constant must be enclosed in single quotation marks (' '), such as 'abc' and '123'.
- You can use scheduling parameters, such as ${bizdate}.
- You can enter functions that are supported by relational databases, such as now() and count(1).
- Fields that cannot be parsed are indicated by Unidentified.
- You can enter the name of a partition key column to synchronize the partition key column.
- Configure channel control policies.
Parameter | Description |
---|---|
Expected Maximum Concurrency | The maximum number of concurrent threads that the sync node uses to read data from the source data store and write data to the destination data store. You can configure the concurrency for the sync node on the codeless user interface (UI). |
Bandwidth Throttling | Specifies whether to enable bandwidth throttling. You can enable bandwidth throttling and set a maximum transmission rate to avoid heavy read workloads on the source data store. We recommend that you enable bandwidth throttling and set the maximum transmission rate to an appropriate value based on the configurations of the source data store. |
Dirty Data Records Allowed | The maximum number of dirty data records that are allowed. |
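In the code editor, these channel control policies correspond to the setting block of the job configuration, as shown in the examples in the next section. The following is a minimal sketch; the dirty-data value is illustrative:
```json
"setting": {
    "errorLimit": {
        "record": "10"      // Dirty Data Records Allowed (illustrative value)
    },
    "speed": {
        "concurrent": 2,    // Expected Maximum Concurrency
        "throttle": false   // false indicates that bandwidth throttling is disabled
    }
}
```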
Code editor mode
- Read data based on HDFS files
{ "type": "job", "steps": [ { "stepType": "hive", "parameter": { "partition": "pt1=a,pt2=b,pt3=c", // The partitions in the Hive table. "datasource": "hive_not_ha_****", // The name of the connection. "column": [ // The columns to be read. "id", "pt2", "pt1" ], "readMode": "hdfs", // The mode in which data is read. "table": "part_table_1" }, "name": "Reader", "category": "reader" }, { "stepType": "hive", "parameter": { }, "name": "Writer", "category": "writer" } ], "version": "2.0", "order": { "hops": [ { "from": "Reader", "to": "Writer" } ] }, "setting": { "errorLimit": { "record": "" // The maximum number of dirty data records that are allowed. }, "speed": { "concurrent": 2, // The maximum number of concurrent threads. "throttle": false } } }
- Read data based on Hive JDBC
{ "type": "job", "steps": [ { "stepType": "hive", "parameter": { "querySql": "select id,name,age from part_table_1 where pt2='B'", "datasource": "hive_not_ha_****", // The name of the connection. "column": [ // The columns to be read. "id", "name", "age" ], "where": "", "table": "part_table_1", "readMode": "jdbc" // The mode in which data is read. }, "name": "Reader", "category": "reader" }, { "stepType": "hive", "parameter": { }, "name": "Writer", "category": "writer" } ], "version": "2.0", "order": { "hops": [ { "from": "Reader", "to": "Writer" } ] }, "setting": { "errorLimit": { "record": "" }, "speed": { "throttle": false, "concurrent": 2 // The maximum number of concurrent threads. } } }
Configure the resource group for Data Integration

Supported versions
0.8.0
0.8.1
0.9.0
0.10.0
0.11.0
0.12.0
0.13.0
0.13.1
0.14.0
1.0.0
1.0.1
1.1.0
1.1.1
1.2.0
1.2.1
1.2.2
2.0.0
2.0.1
2.1.0
2.1.1
2.2.0
2.3.0
2.3.1
2.3.2
2.3.3
2.3.4
2.3.5
2.3.6
2.3.7
3.0.0
3.1.0
3.1.1
3.1.2
0.8.1-cdh4.0.0
0.8.1-cdh4.0.1
0.9.0-cdh4.1.0
0.9.0-cdh4.1.1
0.9.0-cdh4.1.2
0.9.0-cdh4.1.3
0.9.0-cdh4.1.4
0.9.0-cdh4.1.5
0.10.0-cdh4.2.0
0.10.0-cdh4.2.1
0.10.0-cdh4.2.2
0.10.0-cdh4.3.0
0.10.0-cdh4.3.1
0.10.0-cdh4.3.2
0.10.0-cdh4.4.0
0.10.0-cdh4.5.0
0.10.0-cdh4.5.0.1
0.10.0-cdh4.5.0.2
0.10.0-cdh4.6.0
0.10.0-cdh4.7.0
0.10.0-cdh4.7.1
0.12.0-cdh5.0.0
0.12.0-cdh5.0.1
0.12.0-cdh5.0.2
0.12.0-cdh5.0.3
0.12.0-cdh5.0.4
0.12.0-cdh5.0.5
0.12.0-cdh5.0.6
0.12.0-cdh5.1.0
0.12.0-cdh5.1.2
0.12.0-cdh5.1.3
0.12.0-cdh5.1.4
0.12.0-cdh5.1.5
0.13.1-cdh5.2.0
0.13.1-cdh5.2.1
0.13.1-cdh5.2.2
0.13.1-cdh5.2.3
0.13.1-cdh5.2.4
0.13.1-cdh5.2.5
0.13.1-cdh5.2.6
0.13.1-cdh5.3.0
0.13.1-cdh5.3.1
0.13.1-cdh5.3.2
0.13.1-cdh5.3.3
0.13.1-cdh5.3.4
0.13.1-cdh5.3.5
0.13.1-cdh5.3.6
0.13.1-cdh5.3.8
0.13.1-cdh5.3.9
0.13.1-cdh5.3.10
1.1.0-cdh5.3.6
1.1.0-cdh5.4.0
1.1.0-cdh5.4.1
1.1.0-cdh5.4.2
1.1.0-cdh5.4.3
1.1.0-cdh5.4.4
1.1.0-cdh5.4.5
1.1.0-cdh5.4.7
1.1.0-cdh5.4.8
1.1.0-cdh5.4.9
1.1.0-cdh5.4.10
1.1.0-cdh5.4.11
1.1.0-cdh5.5.0
1.1.0-cdh5.5.1
1.1.0-cdh5.5.2
1.1.0-cdh5.5.4
1.1.0-cdh5.5.5
1.1.0-cdh5.5.6
1.1.0-cdh5.6.0
1.1.0-cdh5.6.1
1.1.0-cdh5.7.0
1.1.0-cdh5.7.1
1.1.0-cdh5.7.2
1.1.0-cdh5.7.3
1.1.0-cdh5.7.4
1.1.0-cdh5.7.5
1.1.0-cdh5.7.6
1.1.0-cdh5.8.0
1.1.0-cdh5.8.2
1.1.0-cdh5.8.3
1.1.0-cdh5.8.4
1.1.0-cdh5.8.5
1.1.0-cdh5.9.0
1.1.0-cdh5.9.1
1.1.0-cdh5.9.2
1.1.0-cdh5.9.3
1.1.0-cdh5.10.0
1.1.0-cdh5.10.1
1.1.0-cdh5.10.2
1.1.0-cdh5.11.0
1.1.0-cdh5.11.1
1.1.0-cdh5.11.2
1.1.0-cdh5.12.0
1.1.0-cdh5.12.1
1.1.0-cdh5.12.2
1.1.0-cdh5.13.0
1.1.0-cdh5.13.1
1.1.0-cdh5.13.2
1.1.0-cdh5.13.3
1.1.0-cdh5.14.0
1.1.0-cdh5.14.2
1.1.0-cdh5.14.4
1.1.0-cdh5.15.0
1.1.0-cdh5.16.0
1.1.0-cdh5.16.2
1.1.0-cdh5.16.99
2.1.1-cdh6.1.1
2.1.1-cdh6.2.0
2.1.1-cdh6.2.1
2.1.1-cdh6.3.0
2.1.1-cdh6.3.1
2.1.1-cdh6.3.2
2.1.1-cdh6.3.3