This topic provides answers to some frequently asked questions about Hive.

What do I do if jobs are in the waiting state for a long period of time?

Perform the following steps to identify the issue:
  1. Go to the Public Connect Strings page of the E-MapReduce (EMR) console and click the link in the Connect String column that corresponds to YARN UI.
  2. Click the ID of an application.
  3. Click the link in the Tracking URL field.
    Multiple jobs are in the waiting state.
  4. In the left-side navigation pane, click Scheduler.
    Check whether the resources in the queue are fully occupied or whether the current job is taking a long time to run. If the queue does not have sufficient resources, move the jobs that are in the waiting state from the current queue to an idle queue. If the current job is taking a long time to run, optimize the code.
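
The decision in step 4 can be sketched in Python. This is a minimal illustration, not an EMR tool. The function names `pick_idle_queue` and `move_command`, the 90% threshold, and the queue names are hypothetical, and the usage percentages are assumed to be read from the Scheduler page by hand. The command that the sketch builds relies on the `-movetoqueue` option of the `yarn application` CLI. Newer Hadoop releases may prefer a different option, so check `yarn application -help` on your cluster before you run it.

```python
from typing import Dict, List, Optional


def pick_idle_queue(used_pct: Dict[str, float], current: str,
                    busy_pct: float = 90.0) -> Optional[str]:
    """Return the least-used other queue if the current queue is saturated.

    used_pct maps a queue name to its used capacity in percent.
    busy_pct is an assumed cutoff for "fully occupied", not a YARN default.
    """
    if used_pct[current] < busy_pct:
        # The queue still has headroom, so the slow job itself is the suspect.
        return None
    candidates = {q: u for q, u in used_pct.items()
                  if q != current and u < busy_pct}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


def move_command(app_id: str, queue: str) -> List[str]:
    """Build (but do not run) the YARN CLI call that moves an application."""
    return ["yarn", "application", "-movetoqueue", app_id, "-queue", queue]
```

For example, with `{"default": 100.0, "etl": 35.0}` and `default` as the current queue, `pick_idle_queue` returns `etl`. The list returned by `move_command` can then be passed to `subprocess.run` on a node where the `yarn` client is installed.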

What do I do if small files are read in the map stage?

Perform the following steps to check whether small files are read in the map stage:
  1. Go to the Public Connect Strings page of the EMR console and click the link in the Connect String column that corresponds to YARN UI.
  2. Click the ID of an application.
    On the Map tasks page, you can view the size of the data that is read in each map task. For example, the size of the data that is read may be as small as two bytes. If most map tasks read only a small amount of data, merge the small files.

    You can also view more information in the log of each map task.
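
The check described above can be sketched in Python, assuming that you have copied the bytes read by each map task from the Map tasks page into a list. The function names, the 16 MiB cutoff for a "small" input, and the 50% majority threshold are hypothetical values chosen for illustration. Adjust them to your block size and workload.

```python
from typing import Sequence

# Assumed cutoff for a "small" map input; not a Hive or YARN default.
SMALL_INPUT_BYTES = 16 * 1024 * 1024


def small_input_ratio(bytes_read: Sequence[int],
                      small_bytes: int = SMALL_INPUT_BYTES) -> float:
    """Fraction of map tasks whose input is smaller than small_bytes."""
    if not bytes_read:
        return 0.0
    small = sum(1 for size in bytes_read if size < small_bytes)
    return small / len(bytes_read)


def should_merge(bytes_read: Sequence[int],
                 small_bytes: int = SMALL_INPUT_BYTES,
                 majority: float = 0.5) -> bool:
    """True when more than `majority` of the map tasks read small inputs."""
    return small_input_ratio(bytes_read, small_bytes) > majority
```

For a job in which three of four map tasks read only a few bytes each, `should_merge` returns `True`, which corresponds to the case in which the small files should be merged.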

What do I do if reduce tasks consume a large amount of time?

Perform the following steps to check whether the issue is caused by a data skew:
  1. Go to the Public Connect Strings page of the EMR console and click the link in the Connect String column that corresponds to YARN UI.
  2. Click the ID of an application.
  3. On the Reduce tasks page, sort the reduce tasks by completion time in descending order and find the top reduce tasks that take the longest to run.
  4. Click the name of a top reduce task.
  5. In the left-side navigation pane of the task details page, click Counters.
    View the values of the Reduce input records and Reduce shuffle bytes metrics for the current reduce task. If both values are significantly greater than the values of the same metrics in other reduce tasks, a data skew has occurred.
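
The comparison in the last step can be sketched in Python, assuming that you have collected the two counter values for each reduce task from the Counters page. The function name `skewed_tasks`, the task IDs, and the factor of 5 over the median are hypothetical. The source states only that the values of a skewed task are greater than those of other tasks, so treat the factor as a tunable assumption.

```python
from statistics import median
from typing import Dict, List, Tuple


def skewed_tasks(counters: Dict[str, Tuple[int, int]],
                 factor: float = 5.0) -> List[str]:
    """Return the IDs of reduce tasks whose counters both exceed the median.

    counters maps a task ID to (Reduce input records, Reduce shuffle bytes).
    factor is an assumed multiplier over the median, not a Hive setting.
    """
    if not counters:
        return []
    record_median = median(records for records, _ in counters.values())
    bytes_median = median(shuffled for _, shuffled in counters.values())
    return sorted(
        task
        for task, (records, shuffled) in counters.items()
        if records > factor * record_median
        and shuffled > factor * bytes_median
    )
```

A task is reported only when both metrics stand out, which mirrors the condition in the step above. A task with many records but a normal shuffle size is not flagged.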