When you create a cluster in E-MapReduce, a Hive environment is provided by default. You can use Hive directly to create tables and operate on your data. The steps are as follows:
Prepare the Hive script in advance, for example:
DROP TABLE uservisits;
CREATE EXTERNAL TABLE IF NOT EXISTS uservisits (sourceIP STRING, destURL STRING, visitDate STRING, adRevenue DOUBLE, userAgent STRING, countryCode STRING, languageCode STRING, searchWord STRING, duration INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS SEQUENCEFILE LOCATION '/HiBench/Aggregation/Input/uservisits';
DROP TABLE uservisits_aggre;
CREATE EXTERNAL TABLE IF NOT EXISTS uservisits_aggre (sourceIP STRING, sumAdRevenue DOUBLE) STORED AS SEQUENCEFILE LOCATION '…';
INSERT OVERWRITE TABLE uservisits_aggre SELECT sourceIP, SUM(adRevenue) FROM uservisits GROUP BY sourceIP;
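To make the result of the final INSERT OVERWRITE statement concrete, the following Python sketch computes the same aggregation in memory: it groups records by sourceIP and sums adRevenue. It is only an illustration under stated assumptions: it treats each record as a plain comma-delimited text line in the column order of the uservisits table (the real input is stored as SequenceFile), and the function name aggregate_ad_revenue and the sample rows are hypothetical.

```python
# In-memory sketch of: SELECT sourceIP, SUM(adRevenue) FROM uservisits GROUP BY sourceIP
# Assumption: each record is a comma-delimited text line in uservisits column order.
# The helper name and the sample rows are hypothetical, for illustration only.
from collections import defaultdict

COLUMNS = [
    "sourceIP", "destURL", "visitDate", "adRevenue", "userAgent",
    "countryCode", "languageCode", "searchWord", "duration",
]


def aggregate_ad_revenue(lines):
    """Return {sourceIP: total adRevenue}, mirroring GROUP BY sourceIP."""
    totals = defaultdict(float)
    for line in lines:
        # Split on ',' just as FIELDS TERMINATED BY ',' does, then name the fields.
        row = dict(zip(COLUMNS, line.split(",")))
        totals[row["sourceIP"]] += float(row["adRevenue"])
    return dict(totals)


sample = [
    "10.0.0.1,a.com,2015-01-01,0.5,Mozilla,CN,zh,hive,10",
    "10.0.0.2,b.com,2015-01-02,1.25,Mozilla,US,en,spark,20",
    "10.0.0.1,c.com,2015-01-03,2.0,Chrome,CN,zh,oss,30",
]
print(aggregate_ad_revenue(sample))  # {'10.0.0.1': 2.5, '10.0.0.2': 1.25}
```

Each key of the result corresponds to one row that the query writes into uservisits_aggre (sourceIP, sumAdRevenue). The sketch ignores details such as how Hive's SUM skips NULL values.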
Save this script to a file, such as “uservisits_aggre_hdfs.hive”, and upload it to OSS (for example, to oss://path/to/uservisits_aggre_hdfs.hive).
Log on to the Alibaba Cloud E-MapReduce console and open the Job List page.
Click Create a job in the upper right corner to enter the job creation page.
Enter the job name.
Select Hive as the job type to create a Hive job. Jobs of this type are submitted in the background with the following command:
hive [user provided parameters]
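As a rough illustration of the `hive [user provided parameters]` form, the Python sketch below appends the text from the Parameters box to the hive command. The helper build_hive_command is hypothetical, and splitting the parameters with shell-style quoting (shlex) is an assumption rather than documented E-MapReduce behavior. The example string uses Hive's -f option, which runs the statements in a script file; the ossref:// prefix it carries is explained in the Parameters step that follows.

```python
# Hypothetical sketch: turn the Parameters box text into an argv for the hive CLI.
# Assumption: parameters are split with shell-style quoting; the real submission
# logic inside E-MapReduce is not documented here.
import shlex


def build_hive_command(user_parameters):
    """Return the argument list for `hive [user provided parameters]`."""
    return ["hive"] + shlex.split(user_parameters)


cmd = build_hive_command("-f ossref://path/to/uservisits_aggre_hdfs.hive")
print(cmd)  # ['hive', '-f', 'ossref://path/to/uservisits_aggre_hdfs.hive']
```

Building a list instead of one command string keeps each parameter as a separate argument, so quoted values (for example, `-e "SELECT 1"`) are passed through intact.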
In the Parameters box, enter the arguments that follow the hive command. For example, to run a Hive script that has been uploaded to OSS, enter:
-f ossref://path/to/uservisits_aggre_hdfs.hive
You can also click Select OSS path to browse OSS and select the script; the system then fills in the absolute OSS path of the Hive script automatically. Change the script's prefix to “ossref” (click Switch resource type) so that E-MapReduce downloads the file correctly.
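Clicking Switch resource type does this for you in the console. The Python sketch below only shows what the resulting path looks like, assuming the switch changes just the scheme from oss:// to ossref:// and leaves the rest of the path untouched. The helper name to_ossref is hypothetical.

```python
# Hypothetical helper: show the ossref:// form of an OSS path.
# Assumption: only the scheme changes; bucket and object path stay as-is.
OSS_PREFIX = "oss://"
OSSREF_PREFIX = "ossref://"


def to_ossref(path):
    """Rewrite an oss:// path to its ossref:// form."""
    if path.startswith(OSSREF_PREFIX):
        return path  # already switched, nothing to do
    if not path.startswith(OSS_PREFIX):
        raise ValueError(f"not an OSS path: {path!r}")
    return OSSREF_PREFIX + path[len(OSS_PREFIX):]


print(to_ossref("oss://path/to/uservisits_aggre_hdfs.hive"))
# ossref://path/to/uservisits_aggre_hdfs.hive
```

The check for ossref:// comes first so that applying the helper twice is harmless, and non-OSS paths are rejected instead of being silently rewritten.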
Select the policy for failed operations.
Click OK to complete the Hive job definition.