This topic describes how to use the data warehouse development feature of Data Management (DMS) to schedule data analysis tasks on an AnalyticDB for PostgreSQL instance.

Prerequisites

  • DMS is activated.
  • An AnalyticDB for PostgreSQL instance is created. For more information, see Create an instance.
  • The privileged account of the AnalyticDB for PostgreSQL instance is created. For more information, see Create a privileged account.

Background information

AnalyticDB for PostgreSQL (previously known as HybridDB for PostgreSQL) is a popular real-time data warehousing service provided by Alibaba Cloud. It provides a simple, fast, and cost-effective way to build petabyte-scale data warehouses on the cloud.

Register the instance and create a database and schemas

  1. Register the AnalyticDB for PostgreSQL instance with DMS. For more information, see Register an ApsaraDB instance.
    Note Set the Control Mode parameter to Security Collaboration and the Security Rules parameter to adb-pg-default.
  2. In the Database instance section of the left-side navigation pane on the homepage of the DMS console, right-click the AnalyticDB for PostgreSQL instance that you registered and select Database management from the menu that appears.
  3. Create a database.

    In this example, the database is named dms_adbpg_demo_db.

    1. Click Create DB.
    2. Set the Database Name, Character set, and Validation rules parameters.
      Note If an instance is managed in Security Collaboration mode, you must configure a security rule that allows the CREATE DATABASE command to be executed. For more information, see SQL Correct.
    3. Click OK.
    4. Click the Refresh icon above the Database instance section in the left-side navigation pane on the homepage of the DMS console. You can view the created database in the AnalyticDB for PostgreSQL instance.
  4. Create schemas.
    1. Click the dms_adbpg_demo_db database and double-click the default schema named public.
    2. On the SQLConsole tab, enter the following commands and click Execute to create schemas named dms_adbpg_demo_schema and dms_test.
      CREATE SCHEMA dms_adbpg_demo_schema;
      CREATE SCHEMA dms_test;
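
To confirm that both schemas were created, you can run a quick check on the SQLConsole tab. This optional query is a sketch that relies on the standard information_schema views that AnalyticDB for PostgreSQL inherits from PostgreSQL:

  -- optional check: confirm that both schemas exist
  select schema_name
  from information_schema.schemata
  where schema_name in ('dms_adbpg_demo_schema', 'dms_test');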

Apply for permissions

In DMS, you need to apply for access permissions on the AnalyticDB for PostgreSQL instance for your account.

  1. Log on to the DMS console V5.0.
  2. Apply for the Database-Permission and Database-OWNER permissions on the dms_adbpg_demo_schema and dms_test schemas. For more information, see Apply for permissions by submitting a ticket.
    Note When you apply for the Database-Permission permissions, select Query, Export, and Change for the Permission parameter.

Develop a data warehouse

Create a workspace

  1. Log on to the DMS console V5.0.
  2. In the top navigation bar, click DTS. In the left-side navigation pane, choose Data Development > Data Warehouse Development.
  3. Click Create Workspace.
  4. In the Create Workspace dialog box, configure the parameters described in the following table.
    Basic Information
    • Workspace: The name of the workspace. Specify a descriptive name for easy identification.
      Note You can specify a name based on the purpose of the database, such as online analytical processing (OLAP) or data warehousing, or based on the actual business scenario. In this example, the workspace is named dms_adbpg_demo_dw.
    • Mode: The mode of the workspace. Select Standard Mode.
      Note Standard mode is the default mode. In this mode, the test environment is isolated from the online environment. You can develop and debug tasks in the test database, and publish and schedule the tasks in the online database.
    • Description: The description of the workspace. Enter an informative description for easy identification.
    Data warehouse engine
    • Data warehouse engine: The engine of the test database and the online database. In this example, AnalyticDB for PostgreSQL is used.
    • Test Database: The database that you want to use as the test database. Search for and select dms_test.
    • Online Database: The database that you want to use as the online database. Search for and select dms_adbpg_demo_schema.
    Advanced Settings
    • Security Check Free During Trial Run: You can enable or disable this feature. This feature is enabled by default. If this feature is enabled, no security rule-based checks are performed when you test task flows or change table schemas in the test database.
    • O&M Without Approval: You can enable or disable this feature. This feature is enabled by default. If this feature is enabled, you can publish a task flow or change table schemas without triggering the approval process.
  5. Click OK.
    The created workspace is displayed in the Workspace section.

Create a task flow

  1. In the Workspace section, click dms_adbpg_demo_dw.
  2. In the left-side navigation pane of the workspace page, right-click Default Business Scenario and select Create Task Flow.
    Note
    • You can right-click Default Business Scenario and click Rename to rename the default business scenario based on your business requirements.
    • You can also click the + icon in the left-side navigation pane, select Business Scenario, create a business scenario, and then create a task flow in the new business scenario.
  3. In the Create Task Flow dialog box, enter a name and description for the task flow. In this example, the task flow is named dw_task.
  4. Click OK.

Create a table

  1. In the left-side navigation pane of the workspace page, click the Expand icon to the left of dw_task to view the subdirectories of the task flow.
  2. Right-click Table and select New table.
  3. On the New table page, follow the instructions to create the demo1 and demo2 tables.
    Note Make sure that you are logged on to both the test database and the online database before you create a table.

    The following code shows the schemas of the demo1 and demo2 tables:

    -- create tables demo1 and demo2
    -- demo1 holds the raw data; demo2 stores the aggregation results
    create table demo1
    (
      id integer, -- grouping key
      v integer   -- value to aggregate
    );
    create table demo2
    (
      id integer, -- grouping key
      v integer   -- sum of demo1.v for each id
    );
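
    The statements above use the default table distribution. AnalyticDB for PostgreSQL also lets you specify an explicit distribution key when you create a table. The following variant is an optional sketch, shown for illustration only and not required for this tutorial; it assumes that hash distribution on the id column suits your workload:

    -- optional variant: create demo1 with an explicit hash distribution key
    -- (do not run this in addition to the statement above; demo1 already exists)
    create table demo1
    (
      id integer,
      v integer
    )
    distributed by (id);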

Create single instance SQL tasks

  1. In the left-side navigation pane of the workspace page, double-click dw_task to go to the details page of the task flow.
  2. In the Task Type list on the left side of the canvas, drag two Single Instance SQL task nodes to the blank area on the canvas.
  3. Configure a data preparation task.
    1. Click one of the Single Instance SQL task nodes, click the Rename icon, and then set the task name to Preparation.
    2. Double-click the Preparation task node. In the SQL editor, enter the following SQL statements:
      --insert data into demo1
      insert into demo1 values(1, 1);
      insert into demo1 values(1, 19);
      insert into demo1 values(2, 12);
      insert into demo1 values(3, 2);
      insert into demo1 values(3, 4);
      insert into demo1 values(4, 67);
    3. Click Try Run to test the correctness of the code of the task node.
      • If status SUCCEEDED appears in the last line of the logs, the test run is successful.
      • If status FAILED appears in the last line of the logs, the test run fails. In this case, view the node on which the failure occurs and the reason for the failure in the logs. Then, modify the configuration of the node and try again.
  4. Configure a data aggregation task.
    1. Click the other Single Instance SQL task node, click the Rename icon, and then set the task name to Aggregation.
    2. Double-click the Aggregation task node. In the SQL editor, enter the following SQL statements. A query that verifies the expected results appears after this procedure.
      -- run an aggregation against demo1 and insert the results into demo2
      insert into demo2
      select id, sum(v) from demo1 group by id;
    3. Click Try Run to test the correctness of the code of the task node.
      • If status SUCCEEDED appears in the last line of the logs, the test run is successful.
      • If status FAILED appears in the last line of the logs, the test run fails. In this case, view the node on which the failure occurs and the reason for the failure in the logs. Then, modify the configuration of the node and try again.
  5. Return to the details page of the dw_task task flow and connect the two task nodes.
    Note You can use the Auto layout tool in the canvas toolbar to optimize the task flow layout.
  6. Click Try Run to test the correctness of the code of the task flow.
    • If status SUCCEEDED appears in the last line of the logs, the test run is successful.
    • If status FAILED appears in the last line of the logs, the test run fails. In this case, view the node on which the failure occurs and the reason for the failure in the logs. Then, modify the configuration of the node and try again.
  7. In the lower part of the canvas, click Task Flow Information.
  8. In the Scheduling Settings section of the Task Flow Information tab, turn on Enable Scheduling and configure the scheduling cycle.
    Table 1. Scheduling properties
    • Scheduling Type: The scheduling type of the task flow. Valid values:
      • Cyclic scheduling: The task flow is periodically scheduled. For example, the task flow is run once a week.
      • Schedule once: The task flow is run once at a specific point in time. You need to specify only the point in time when the task flow is run.
    • Effective Time: The period during which the scheduling properties take effect. The default time period is from January 1, 1970 to January 1, 9999, which indicates that the scheduling properties permanently take effect.
    • Scheduling Cycle: The scheduling cycle of the task flow. Valid values:
      • Hour: The task flow is run within the hours that you select.
      • Day: The task flow is run at the specified point in time every day.
      • Week: The task flow is run at the specified point in time on the days that you select every week.
      • Month: The task flow is run at the specified point in time on the days that you select every month.
    • Timed Scheduling: The scheduling method of the task flow. DMS provides the following scheduling methods:
      • Scheduling at a specific interval:
        • Starting Time: the time when the task flow starts to run.
        • Intervals: the interval at which the task flow is run. Unit: hours.
        • End Time: the time when the task flow finishes running.
        For example, if you set the Starting Time parameter to 00:00, the Intervals parameter to 6, and the End Time parameter to 20:59, DMS runs the task flow at 00:00, 06:00, 12:00, and 18:00.
      • Scheduling at the specified point in time: You must set the Specified Time parameter. For example, if you select 0Hour and 5Hour, DMS runs the task flow at 00:00 and 05:00.
    • Specified Time:
      • If you set the Scheduling Cycle parameter to Week, you can select one or more days of a week from the drop-down list.
      • If you set the Scheduling Cycle parameter to Month, you can select one or more days of a month from the drop-down list.
    • Specific Point in Time: The point in time of the specified days at which the task flow is run. For example, if you set this parameter to 02:55, DMS runs the task flow at 02:55 on the specified days.
    • Cron Expression: The cron expression that is automatically generated based on the specified scheduling cycle and time settings.
  9. Publish the task flow. For more information, see Publish a task flow.
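
After the task flow has been published and has run at least once, you can verify the aggregation results on the SQLConsole tab of the online database. The following query and expected output are a sketch based on the sample data that the Preparation task inserts:

  -- check the aggregation results produced by the task flow
  select id, v from demo2 order by id;
  -- expected output for the sample data:
  -- id | v
  --  1 | 20  (1 + 19)
  --  2 | 12
  --  3 |  6  (2 + 4)
  --  4 | 67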

View the execution status of the task flow

  1. Log on to the DMS console V5.0.
  2. In the top navigation bar, click DTS. In the left-side navigation pane, choose Data Development > Data Warehouse Development.
  3. In the Workspace section, click dms_adbpg_demo_dw.
  4. In the left-side navigation pane of the workspace page, double-click dw_task to go to the details page of the task flow.
  5. In the upper-right corner of the canvas, click Go to O&M.
    • The upper part of the task flow O&M page displays the basic information about the task flow. You can view the time when the task flow was created, the last time when the task flow was modified, the scheduling properties for the task flow, and whether the task flow is published.
    • Click the Running History tab, and select Scheduling Trigger or Triggered Manually. You can view the records generated each time the task flow was triggered.
      Note
      • Scheduling Trigger: The task flow is automatically run at a specified point in time based on the scheduling properties that you configure.
      • Triggered Manually: The task flow is run after you click Try Run.
      • Find a record and click the Plus icon to the left of the Status column. You can view the detailed run logs of the task flow.
      • Find a record and click Executive History in the Operating column. In the pane that appears, you can view the time when the task flow was run, the user who ran the task flow, and the scheduling type of the task flow.
      • Find a record and select Exits, Rerun, Pause, Restore, or Set Successfully in the Operating column to manage the task flow based on its status.
        Note
        • You can rerun an executed task flow.
        • You can set the status of a failed task flow run to Success.
        • You can stop or pause an ongoing task flow.
    • Click the Published Tasks tab. You can view the version ID of the published task flow, the user who published the task flow, and the time when the task flow was published. You can also view the details and directed acyclic graph (DAG) of the task flow.

      Select two versions and click Version Comparison. You can compare the differences between the two versions of the task flow.

Summary

The data warehouse development feature of DMS offers the following benefits:
  • Task flows are scheduled so that the tasks in a task flow run automatically and periodically. This reduces labor costs.
  • A scheduled task flow generates analysis results in real time and provides quick responses to queries from business applications.
  • Analysis tasks can be scheduled to run during off-peak hours so that the computing resources of the AnalyticDB for PostgreSQL instance are allocated properly.
  • The analysis results of a scheduled task flow can be reused by other data analysis tasks to maximize the resource utilization of the AnalyticDB for PostgreSQL instance.