This topic describes how to create a Hive SQL offline computing task in Dataphin.
Background information
You can use Hive SQL computing tasks to process existing data and generate new data that meets your business requirements.
Procedure
In the top menu bar of the Dataphin homepage, choose Development > Data Development.
On the Develop page, select a project from the top menu bar. In Dev-Prod mode, you also need to select an environment.
In the navigation pane on the left, choose Data Processing > Script Task. In the Script Task list, click the icon and select Hive SQL. In the Create Hive SQL Task dialog box, configure the following parameters.
Task Name
Enter a name for the offline computing task.
The name can be up to 256 characters long. It cannot contain vertical bars (|), forward slashes (/), backslashes (\), colons (:), question marks (?), angle brackets (<>), asterisks (*), or double quotation marks (").
Schedule Type
Select a schedule type for the task. The Schedule Type can be one of the following:
Recurring Task: The task runs automatically on a recurring schedule.
Manual Task: The task runs only when you manually trigger it.
Select Directory
Select the directory to store the task.
If the required directory does not exist, create a new folder as follows:
Above the task list on the left, click the icon to open the Create Folder dialog box.
In the Create Folder dialog box, enter a Name for the folder and select a location in Select Directory as needed.
Click OK.
Use Template
Turn on the Use Template switch to use a code template. If you turn on this switch, you must also select a Template and Template Version.
Using a code template helps improve development efficiency. The code in a template task is read-only. You only need to configure the template parameters. For more information, see Create an offline computing template.
Description
Enter a brief description of the task. The description can be up to 1,000 characters long.
Click OK.
In the code editor for the Hive SQL task, write the code for the offline computing task. Then, click Precompile above the code editor to check the syntax of your Hive SQL code.
After the code passes precompilation, click Run above the code editor to execute the code and verify the results.
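For reference, the following is a minimal sketch of what the task code might look like. The database and table names (demo_db.orders_src and demo_db.orders_daily_sum) are hypothetical placeholders, not objects that exist in your project.

-- Minimal sketch of a Hive SQL offline computing task.
-- demo_db.orders_src and demo_db.orders_daily_sum are hypothetical tables.

-- Create the target table if it does not exist yet.
CREATE TABLE IF NOT EXISTS demo_db.orders_daily_sum (
    order_date   STRING,
    order_count  BIGINT,
    total_amount DOUBLE
);

-- Aggregate the source data and overwrite the target table.
INSERT OVERWRITE TABLE demo_db.orders_daily_sum
SELECT
    order_date,
    COUNT(1)    AS order_count,
    SUM(amount) AS total_amount
FROM demo_db.orders_src
GROUP BY order_date;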
In the sidebar, click Property to configure the task properties. The properties include Basic Information, Runtime Parameter, Scheduling Property (for recurring tasks), Schedule Dependency (for recurring tasks), Runtime Configuration, and Resource Configuration.
Basic Information
Configure basic information for the task, such as its name, owner, and description. For more information, see Configure basic information for a task.
Runtime Parameter
If your task uses parameter variables, you can assign values to them in this section. When the node is scheduled, the parameter variables are automatically replaced with their assigned values, as shown in the sketch after this property list. For more information, see Configure and use node parameters.
Scheduling Property (for recurring tasks)
If the schedule type of the offline computing task is Recurring Task, you must configure its scheduling properties in addition to the Basic Information. For more information, see Configure scheduling properties.
Schedule Dependency (for recurring tasks)
If the schedule type of the offline computing task is Recurring Task, you must configure its schedule dependencies in addition to the Basic Information. For more information, see Configure schedule dependencies.
Runtime Configuration
Configure a task-level runtime timeout and a retry policy for failed runs, as needed. If you do not configure these settings, the task inherits the default settings from the tenant. For more information, see Configure runtime settings for a computing task.
Resource Configuration
Configure a scheduling resource group for the current computing task. When the task is scheduled, it uses the resource quota of this resource group. For more information, see Configure resources for a computing task.
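As an illustration of how runtime parameters are used in task code, the following sketch assumes a parameter variable named bizdate is defined in the Runtime Parameter section and is replaced with the data timestamp of each scheduled instance, using the ${parameter} substitution syntax described in Configure and use node parameters. The table names are hypothetical, and the target table is assumed to be partitioned by ds.

-- Sketch of using a parameter variable in task code.
-- Assumes a parameter named bizdate is configured in Runtime Parameter;
-- demo_db.orders_src and demo_db.orders_daily_sum are hypothetical tables.
INSERT OVERWRITE TABLE demo_db.orders_daily_sum PARTITION (ds = '${bizdate}')
SELECT
    order_date,
    COUNT(1)    AS order_count,
    SUM(amount) AS total_amount
FROM demo_db.orders_src
WHERE ds = '${bizdate}'
GROUP BY order_date;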
Save and submit the task.
Click the save icon above the code editor to save the code.
Click the submit icon above the code editor to submit the code.
On the Submitting Log page, confirm the Submission Content and the results of the Pre-check. Then, add comments. For more information, see Submit an offline computing task.
Click Confirm and Submit.
What to do next
If you are using the Dev-Prod mode, you must publish the task to the production environment from the release list after it is submitted. For more information, see Manage release tasks.
In Basic mode, submitted Hive SQL tasks can be scheduled in the production environment. You can view your published tasks in the Operation Center. For more information, see View and manage script tasks and View and manage one-time tasks.
Appendix: Switch the task type
If you have enabled Impala tasks in your Hadoop compute source, you can switch Hive SQL tasks to Impala SQL tasks. Because Impala performs its computations in memory, Impala SQL tasks provide a better experience for query and analysis. Follow these steps:
In the top menu bar of the Dataphin homepage, choose Development > Data Development.
On the Develop page, select a project from the top menu bar. In Dev-Prod mode, you also need to select an environment.
In the navigation pane on the left, choose Data Processing > Script Task. In the Script Task list, select the target Hive SQL task.
Next to the Hive SQL task, click the icon and select Change Type.
In the Change Type dialog box, select Impala SQL and click OK to switch the task type.