This topic describes how to create and run a Hive job in an E-MapReduce (EMR) cluster.
Use Hive to process OSS data
CREATE EXTERNAL TABLE eusers (
userid INT)
LOCATION 'oss://emr/users';
Examples
- Example 1
- Write the following script, save it as hiveSample1.sql, and then upload it to OSS.
For more information, see Upload objects.
USE DEFAULT; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.stats.autogather=false; DROP TABLE emrusers; CREATE EXTERNAL TABLE emrusers ( userid INT, movieid INT, rating INT, unixtime STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE LOCATION 'oss://${bucket}/yourpath'; SELECT COUNT(*) FROM emrusers; SELECT * from emrusers limit 100; SELECT movieid,count(userid) as usercount from emrusers group by movieid order by usercount desc limit 50;
- Prepare test data.
You can download the following test data and upload it to the destination OSS directory.
Test data: Public test data
- Create a job.
Create a Hive job in the EMR console. For more information, see Configure a Hive job.
Content of the job:-f ossref://${bucket}/yourpath/hiveSample1.sql
In this example,
${bucket}
indicates your OSS bucket, andyourpath
indicates a path in the bucket. Replace yourpath with the path where the Hive script is stored. - Run the job.
Click Run to run the job. You can associate the job with an existing cluster. You can also enable the system to automatically create a cluster and associate the job with the cluster.
- Write the following script, save it as hiveSample1.sql, and then upload it to OSS.
- Example 2
Use scan in HiBench as an example.
- Write the following script, save it as scan.hive, and then upload it to OSS:
USE DEFAULT; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set mapreduce.job.maps=12; set mapreduce.job.reduces=6; set hive.stats.autogather=false; DROP TABLE uservisits; CREATE EXTERNAL TABLE uservisits (sourceIP STRING,destURL STRING,visitDate STRING,adRevenue DOUBLE,userAgent STRING,countryCode STRING,languageCode STRING,searchWord STRING,duration INT ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS SEQUENCEFILE LOCATION 'oss://${bucket}/sample-data/hive/Scan/Input/uservisits';
For example, you can upload the file to oss://emr/jars/.
- Prepare test data.
You can download the following test data and upload it to the destination OSS directory.
Test data: uservisits.
- Create a Hive job in the EMR console. For more information, see Configure a Hive job.
- Run the job.
Click Run to run the job. You can associate the job with an existing cluster. You can also enable the system to automatically create a cluster and associate the job with the cluster.
- Write the following script, save it as scan.hive, and then upload it to OSS: