E-MapReduce (EMR) supports ad hoc queries, which are intended for data scientists and data analysts. You can execute SQL statements to perform ad hoc queries. When you run an ad hoc query job, relevant logs and query results appear in the lower part of the job page. This topic describes how to create, configure, run, and lock a job on the Ad Hoc Queries page in the EMR console.

Background information

On the Ad Hoc Queries page, you can create, configure, run, and lock jobs.

Prerequisites

A project is created. For more information, see Manage projects.

Create a job

  1. Go to the Data Platform tab.
    1. Log on to the Alibaba Cloud EMR console by using your Alibaba Cloud account.
    2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
    3. Click the Data Platform tab.
  2. In the Projects section, find the project that you created and click Edit Job in the Actions column.
  3. Create a job for an ad hoc query.
    1. On the left side of the page, click the search_temp icon.
    2. In the Ad Hoc Queries pane on the left, right-click the folder in which you want to create a job and select Create Job.
      Note You can also right-click the folder and select Create Subfolder, Rename Folder, or Delete Folder to perform the required operation.
    3. In the Create Interactive Job dialog box, specify Name and Description, and then select a job type from the Job Type drop-down list.
      EMR supports ad hoc queries based on Shell, Spark SQL, Spark Shell, and Hive SQL.
      Notice After the job is created, you cannot change the type of the job.
    4. Click OK.

Configure a job

For more information about how to develop and configure each type of job, see Jobs. This section describes how to configure the parameters of a job on the Basic Settings, Advanced Settings, Shared Libraries, and Alert Settings tabs in the Job Settings panel.

  1. In the upper-right corner of the job page, click Job Settings.
  2. In the Job Settings panel, configure the parameters on the Basic Settings tab.
    Section Parameter and description
    Job Overview
    • Name: the name of the job.
    • Job Type: the type of the job.
    • Description: the description of the job. You can click Edit on the right side of this parameter to modify the description.
    Resources The resources that are required to run the job, such as JAR packages and user-defined functions (UDFs). Click the Plus sign icon on the right to add resources.

    Upload the resources to Object Storage Service (OSS) first. Then, you can add them to the job.

    Configuration Parameters The variables you want to reference in the job script. You can reference a variable in your job script in the format of ${Variable name}.

    Click the Plus sign icon on the right side to add a variable in the key-value pair format. You can select Password to hide the value based on your business requirements. The key indicates the name of the variable. The value indicates the value of the variable. In addition, you can configure a time variable based on the start time of scheduling. For more information, see Configure job time and date.
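The substitution described above behaves like a simple templated replacement: before the script runs, each ${Variable name} placeholder is replaced with the configured value. The following local sketch mimics that behavior; the variable name bizdate and its value are made-up examples, not EMR defaults, and the sed call only approximates the platform's substitution:

```shell
#!/bin/sh
# Illustrative local simulation of how a configured key-value pair
# (key: bizdate, value: 2024-01-01) could be expanded in a job script.

# A job script that references a configuration parameter:
script='echo "Processing partition ${bizdate}"'

# EMR replaces ${bizdate} with the value configured on the Basic Settings
# tab. Locally, the same effect can be approximated with sed:
bizdate="2024-01-01"
rendered=$(printf '%s' "$script" | sed "s/\${bizdate}/${bizdate}/g")

eval "$rendered"   # prints: Processing partition 2024-01-01
```

Note that the placeholder is resolved before execution, so the job script itself does not need bizdate to exist as a shell variable at run time.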

  3. Click the Advanced Settings tab and configure the parameters.
    Section Parameter and description
    Mode
    • Job Submission Node: the mode in which the job is submitted. For more information, see Job submission modes. Valid values:
      • Worker Node: The job is submitted to YARN by using a launcher, and YARN allocates resources to run the job.
      • Header/Gateway Node: The job runs as a process on the allocated node.
    • Estimated Maximum Duration: the estimated maximum running duration of the job. Valid values: 0 to 10800. Unit: seconds.
    Environment Variables The environment variables that are used to run the job. You can also export environment variables from the job script.
    • Example 1: Configure a Shell job with the code echo ${ENV_ABC}. If you set the ENV_ABC variable to 12345, a value of 12345 is returned after you run the echo command.
    • Example 2: Configure a Shell job with the code java -jar abc.jar. Content of the abc.jar package:
      public static void main(String[] args) { System.out.println(System.getenv("ENV_ABC")); }
      If you set the ENV_ABC variable to 12345, a value of 12345 is returned after you run the job. The effect of setting the ENV_ABC variable in the Environment Variables section is equivalent to running the following script:
      export ENV_ABC=12345
      java -jar abc.jar
    Scheduling Parameters The parameters used to schedule the job, including Queue, Memory (MB), vCores, Priority, and Run By. If you do not configure these parameters, the default settings of the Hadoop cluster are used.
    Note The Memory (MB) parameter specifies the memory quota for the launcher.
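  The two examples above rely on the same mechanism: each variable configured in the Environment Variables section is exported before the job command starts, so both the job script and any child process it launches can read it. A minimal local sketch, reusing the ENV_ABC value from the examples (the sh -c call stands in for java -jar abc.jar):

```shell
#!/bin/sh
# Simulates setting ENV_ABC to 12345 in the Environment Variables section:
# the variable is exported before the job command runs.

export ENV_ABC=12345

# Example 1: the job script reads the variable directly.
echo "${ENV_ABC}"          # prints: 12345

# Example 2: a child process reads the variable from its inherited
# environment, the same way a JVM launched with java -jar would.
sh -c 'echo "$ENV_ABC"'    # prints: 12345
```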
  4. Click the Shared Libraries tab.
    In the Dependent Libraries section, specify Libraries.

    The execution of some jobs depends on library files for specific data sources. EMR publishes these libraries to the repository of the scheduling center as dependent libraries. When you create such a job, you must specify the dependent libraries that it requires. To specify a dependent library, enter its reference string, such as sharedlibs:streamingsql:datasources-bundle:2.0.0.

  5. Click the Alert Settings tab and configure the alert parameters.
    Parameter Description
    Execution Failed Specifies whether to send a notification to an alert contact group or a DingTalk alert group if the job fails.
    Action on Startup Timeout Specifies whether to send a notification to an alert contact group or a DingTalk alert group if the job startup times out.
    Action on Execution Timeout Specifies whether to send a notification to an alert contact group or a DingTalk alert group if the job execution times out.

Run a job

  1. Run the job that you created.
    1. On the job page, click Run in the upper-right corner to run the job.
    2. In the Run Job dialog box, select a resource group and the cluster that you created.
    3. Click OK.
  2. View operational logs.
    1. After you run the job, you can view the operational logs on the Log tab in the lower part of the job page.
    2. Click the Records tab to view the execution records of the job instance.
    3. Click Details in the Action column of a job instance to go to the Scheduling Center tab. On this tab, you can view the details about the job instance.

Lock a job

When you edit a job, you can click Lock in the upper-right corner of the job page to lock the job. This way, only the account you use can edit the job. Other members in the project can edit this job only after the job is unlocked.

Note Only the RAM user that locks the job and the Alibaba Cloud account can unlock the job.