E-MapReduce (EMR) provides a Hive environment. You can use Hive to create tables and perform operations on the tables and the data in them.
Prerequisites
- A project is created. For more information, see Manage projects.
- A Hive SQL script, for example, uservisits_aggre_hdfs.hive, is uploaded to a path in OSS, such as oss://path/to/.
Content of uservisits_aggre_hdfs.hive:
-- Work in the default database.
USE DEFAULT;

-- Source table over the HiBench input data. By default, dropping an
-- external table removes only its metadata, not the files under LOCATION.
DROP TABLE uservisits;
CREATE EXTERNAL TABLE IF NOT EXISTS uservisits (
  sourceIP STRING,
  destURL STRING,
  visitDate STRING,
  adRevenue DOUBLE,
  userAgent STRING,
  countryCode STRING,
  languageCode STRING,
  searchWord STRING,
  duration INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS SEQUENCEFILE
LOCATION '/HiBench/Aggregation/Input/uservisits';

-- Output table that receives the aggregated result.
DROP TABLE uservisits_aggre;
CREATE EXTERNAL TABLE IF NOT EXISTS uservisits_aggre (
  sourceIP STRING,
  sumAdRevenue DOUBLE
)
STORED AS SEQUENCEFILE
LOCATION '/HiBench/Aggregation/Output/uservisits_aggre';

-- Total ad revenue per source IP.
INSERT OVERWRITE TABLE uservisits_aggre
SELECT sourceIP, SUM(adRevenue)
FROM uservisits
GROUP BY sourceIP;
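After the job has run, a query along the following lines could be used to spot-check the aggregated output. This is only a sketch and is not part of uservisits_aggre_hdfs.hive: the table and column names come from the script above, while the descending order and the limit of 10 rows are arbitrary choices made for illustration.

```sql
-- Spot-check sketch (not part of the original script):
-- list the source IPs with the highest total ad revenue.
-- The ordering and the LIMIT value are illustrative assumptions.
SELECT sourceIP, sumAdRevenue
FROM uservisits_aggre
ORDER BY sumAdRevenue DESC
LIMIT 10;
```

If the query returns no rows, it may be worth checking that input data actually exists under /HiBench/Aggregation/Input/uservisits before rerunning the job.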
Procedure
- Go to the Data Platform tab.
  - Log on to the Alibaba Cloud EMR console by using your Alibaba Cloud account.
  - In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
  - Click the Data Platform tab.
  - In the Projects section, find your project and click Edit Job in the Actions column.
- Create a Hive job.
- Edit job content.