This topic describes how Hive Reader works, the supported parameters, and how to configure it by using the code editor.

Hive is a Hadoop-based data warehouse tool used to process large volumes of structured log data. Hive maps structured data files to tables and allows you to run SQL statements to query the data in those tables.

Essentially, Hive converts Hive Query Language (HQL) or SQL statements to MapReduce programs.
  • Hive stores processed data in the Hadoop Distributed File System (HDFS).
  • Hive uses MapReduce programs to analyze data at the underlying layer.
  • Hive runs MapReduce programs on YARN.
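
For example, the following HQL sketch shows how a Hive table maps to files in HDFS and how a query against it becomes a MapReduce job. The table name, storage location, and delimiter here are hypothetical.

CREATE EXTERNAL TABLE ods_user_info (
    id   BIGINT,
    name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hive/warehouse/ods_user_info'; -- The table data lives in HDFS under this path.

-- Hive compiles this statement into a MapReduce job that scans the
-- files under the table location and runs the job on YARN.
SELECT id, name FROM ods_user_info WHERE id IS NOT NULL;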

How it works

Hive Reader connects to a Hive metadatabase, parses the configuration to obtain the storage path, file format, and column delimiter of the HDFS file that corresponds to the source Hive table, and then reads data from that HDFS file.

The underlying logic of Hive Reader is the same as that of HDFS Reader. You can configure parameters of HDFS Reader in the parameters of Hive Reader. Data Integration transparently transmits the configured parameters to HDFS Reader.
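
Conceptually, the metadata lookup resembles the following query against a MySQL-hosted metastore. The DBS, TBLS, SDS, and SERDE_PARAMS tables are part of the standard Hive metastore schema, but the database and table names below are placeholders, and the actual queries that Data Integration issues are internal.

SELECT s.LOCATION,     -- The HDFS storage path of the table.
       s.INPUT_FORMAT, -- The file format, such as TextInputFormat.
       p.PARAM_VALUE   -- The column delimiter.
FROM DBS d
JOIN TBLS t ON t.DB_ID = d.DB_ID
JOIN SDS s ON s.SD_ID = t.SD_ID
LEFT JOIN SERDE_PARAMS p
       ON p.SERDE_ID = s.SERDE_ID
      AND p.PARAM_KEY = 'field.delim'
WHERE d.NAME = 'default'            -- Placeholder database name.
  AND t.TBL_NAME = 'ods_user_info'; -- Placeholder table name.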

Parameters

jdbcUrl
  The Java Database Connectivity (JDBC) URL for connecting to the Hive metadatabase, for example, jdbc:mysql://host:port/database. Currently, Hive Reader can access only Hive metadatabases of the MySQL type.
  Note Make sure that the sync node can connect to the Hive metadatabase over the network and has access permissions on the metadatabase.
  Required: Yes. Default value: None.

username
  The username for connecting to the Hive metadatabase.
  Required: Yes. Default value: None.

password
  The password for connecting to the Hive metadatabase.
  Required: Yes. Default value: None.

column
  The columns to read. Example: "column": ["id", "name"].
  • Column pruning is supported: you can select and export only some of the columns.
  • Column reordering is supported: you can export the columns in an order different from that specified in the table schema.
  • Constants are supported.
  • The column parameter must explicitly specify the set of columns to synchronize and cannot be left empty.
  Required: Yes. Default value: None.

table
  The name of the source Hive table.
  Note The name is case-sensitive.
  Required: Yes. Default value: None.

partition
  The partition in the source Hive table.
  • This parameter is required for a partitioned Hive table. The sync node reads data from the partition that this parameter specifies.
  • This parameter is not required for a non-partitioned table.
  Required: No. Default value: None.
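
For reference, a minimal parameter fragment for reading a single partition of a partitioned table might look as follows. The table name, partition key pt, and partition value are hypothetical; the full node configuration is shown in the next section.

"table": "ods_user_info",   // A hypothetical partitioned table.
"partition": "pt=20230101", // Read data from this partition only.
"column": ["id", "name"]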

Configure Hive Reader by using the codeless UI

Currently, the codeless user interface (UI) is not supported for Hive Reader.

Configure Hive Reader by using the code editor

In the following code, a node is configured to read data from a Hive table.
{
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    },
    "setting": {
        "errorLimit": {
            "record": "0" // The maximum number of dirty data records allowed.
        },
        "speed": {
            "concurrent": 1, // The maximum number of concurrent threads.
            "throttle": false // Specifies whether to enable bandwidth throttling. A value of false indicates that the bandwidth is not throttled. A value of true indicates that the bandwidth is throttled. The maximum transmission rate takes effect only if you set this parameter to true.
        }
    },
    "steps": [
        {
            "category": "reader",
            "name": "Reader",
            "parameter": {
                "username": "", // The username for connecting to the Hive metadatabase.
                "password": "", // The password for connecting to the Hive metadatabase.
                "jdbcUrl": "jdbc:mysql://host:port/database", // The JDBC URL for connecting to the Hive metadatabase.
                "table": "", // The name of the source Hive table.
                "partition": "", // The partition in the source Hive table. Required for a partitioned Hive table.
                "column": [
                    "id",
                    "name"
                ]
            },
            "stepType": "hive" // The reader type.
        },
        {
            "category": "writer",
            "name": "Writer",
            "parameter": {},
            "stepType": "stream" // The writer type.
        }
    ],
    "type": "job",
    "version": "2.0" // The version number.
}