All Products
Search
Document Center

Develop a data warehouse based on AnalyticDB for PostgreSQL

Last Updated: Jun 30, 2020

AnalyticDB for PostgreSQL is a popular real-time data warehousing service provided by Alibaba Cloud. It helps you build petabyte-level data warehouses on the cloud in a simple, fast, and cost-effective manner.

This topic describes how to use Data Warehouse Developer of Data Management Service (DMS) to develop a data warehouse based on AnalyticDB for PostgreSQL. This tutorial also applies to other databases except NoSQL databases.

Before you begin

Prepare AnalyticDB for PostgreSQL

Prepare an AnalyticDB for PostgreSQL instance. If no AnalyticDB for PostgreSQL instance is available, create one first. Create a database in this instance and create a schema in this database. To create a database and a schema, you can run the following sample code on the AnalyticDB for PostgreSQL client:

  1. -- Create a database.
  2. create DATABASE dms_adbpg_demo_db;
  3. -- Switch to the created database.
  4. \c dms_adbpg_demo_db
  5. -- Create a schema.
  6. create SCHEMA dms_adbpg_demo_schema AUTHORIZATION <user_name>;

Activate DMS

If DMS is activated, go to the DMS console.

If not, activate DMS by following these steps:

Log on to the DMS console. On the page that appears, click Buy Now.
purchase DMS

Specify the billing method, the control mode, and the number of instances that you want to create, and click Buy Now. For more information, see Control modes.
After you activate DMS, you can proceed with subsequent steps.

Procedure

You can connect DMS to AnalyticDB for PostgreSQL in three steps: register a database instance, apply for permissions, and develop a data warehouse. The following sections describe the three steps.

Register a database instance

To register an AnalyticDB for PostgreSQL instance in DMS, follow these steps:

  1. Log on to the DMS console.On the homepage that appears, move the pointer over the plus sign (+) in the upper-left corner and select Add instance.
    Add instance0

  2. In the Add instance dialog box that appears, click ADB-PostgreSQL on the Cloud tab.
    Add instance1

  3. In the Basic Information/Advanced information step, set the following parameters:

    • Instance Area: the region where the AnalyticDB for PostgreSQL instance resides.
    • Instance ID: the ID of your AnalyticDB for PostgreSQL instance.
    • DATABASE: the name of the database that you created. In this example, enter dms_adbpg_demo_db.
    • Database account and Database password: the username and password that you use to log on to the AnalyticDB for PostgreSQL instance.
    • Control Mode: the mode in which the database instance is managed in DMS. In this example, select the Secure Collaboration mode.
    • Security Rules: the security rule that specifies the approval process. In this example, select adb-pg-default.
      Add instance cloud ADB PostgreSQL
  4. Click Test connection to verify that correct information is entered for the AnalyticDB for PostgreSQL instance.

  5. Click Submit. The registered AnalyticDB for PostgreSQL instance appears in the left-side navigation pane. You can search for this instance by instance ID. Right-click the instance and select Edit instance.
    Edit Instance1

  6. In the Edit instance dialog box that appears, you can change the instance name to dms_adbpg_demo in the Advanced information section.
    Edit Instance2

Apply for permissions

After the AnalyticDB for PostgreSQL instance is registered in DMS, you need to apply for access permissions on the AnalyticDB for PostgreSQL instance for your account.

Submit permission applications

To apply for the Database-Permission and Database-OWNER permissions, follow these steps:

  1. In the top navigation bar, choose Permission > Apply Permission > Database-Permission.

  2. On the Permission Application Ticket page that appears, select Database-Permission and enter the name of the schema that you created under Select the databases, tables, or columns on which you want to apply for permissions. In this example, enter dms_adbpg_demo_schema. Click Add to add this schema to the Selected Databases/Tables/Columns section. Then, select Query, Export, and Change for Permission in the Select Permission section. Click Submit to submit the application.
    1

  3. A permission application ticket is created, as shown in the following figure. Record the ticket number and data owner.
    2

  4. In the top navigation bar, choose Permission > Apply Permission > Database-OWNER.

  5. On the Permission Application Ticket page that appears, select Database-OWNER and enter the name of the schema that you created under Select the databases, tables, or columns on which you want to apply for permissions. In this example, enter dms_adbpg_demo_schema. Click Add to add this schema to the Selected Databases/Tables/Columns section. Then, click Submit.

  6. A permission application ticket is created, as shown in the following figure. Record the ticket number and data owner.

Approve permission applications

Data owners have the permission to approve permission applications. To approve a permission application, follow these steps:

  1. Log on to the DMS console. On the homepage, click Pending Tickets in the My Tickets section.
    my tickets

  2. Enter the name of the user who submits the application in the search bar and click Search. Then, click the ticket number.
    apply PermissionTickets

  3. On the Permission Application Ticket Details page that appears, click Approve.
    approve

Develop a data warehouse

After your permission application tickets are approved, you can continue to develop a data warehouse.

Go to Data Warehouse Developer

  1. In the top navigation bar, choose Data Factory > Data Warehouse Developer.
  2. On the page that appears, click Develop Space on the left-side navigation submenu.

Create a warehouse project

  1. On the Develop Space tab, click the plus sign (+) next to Data warehouse.
  2. In the New Warehouse Project dialog box that appears, set Project Name to dms_adbpg_demo_dw. In the Data warehouse engine selection section, select AnalyticDB for PostgreSQL and select dms_adbpg_demo_schema from the Select an existing database drop-down list. Then, click OK.
    new project

Create tables in the data warehouse

  1. In the left-side navigation pane, click the created dms_adbpg_demo_dw warehouse project. Then, choose Warehouse table > Internal table > More > Add internal table.
    add internal table

  2. On the editing page that appears, enter a CREATE TABLE statement and click Execute to create a table.
    create table&Execute

  3. Click Internal table in the left-side navigation pane. You can find the created demo1 and demo2 tables, which will be used in subsequent steps.
    view

You need to create two tables separately in this example: demo1 and demo2. You can use the following sample SQL statements:

  1. --create tables demo1 and demo2
  2. create table demo1
  3. (
  4. id integer,
  5. v integer
  6. );
  7. create table demo2
  8. (
  9. id integer,
  10. v integer
  11. );

Create and run a task flow

Create a task flow

In the left-side navigation pane, click the dms_adbpg_demo_dw warehouse project. Then, click the plus sign (+) next to Task Flow. The New Task Flow dialog box appears.
new flow

In the New Task Flow dialog box, set Task Flow Name to dw_task and click OK. A task flow is created.
new flow1

Create warehouse SQL tasks

On the editing page of the created task flow, drag and drop Warehouse SQL task to the canvas to create an SQL task. Click the SQL task. In the right-side pane, click the Content tab to write SQL statements.
create sql

In this example, you need to create two SQL tasks. One is used to prepare data and the other is used to aggregate data. Enter the following SQL statements for the SQL task used to prepare data on the Content tab:

  1. --insert data into demo1
  2. insert into demo1 values(1, 1);
  3. insert into demo1 values(1, 19);
  4. insert into demo1 values(2, 12);
  5. insert into demo1 values(3, 2);
  6. insert into demo1 values(3, 4);
  7. insert into demo1 values(4, 67);

Enter the following SQL statements for the SQL task used to aggregate data on the Content tab:

  1. --run aggregation operation against demo1 and insert results into demo2
  2. insert into demo2
  3. select id, sum(v) from demo1 group by id;

Then, click the Node Name tab for each SQL task. On the Node Name tab, set the name of the SQL task used to prepare data to Preparation and that of the SQL task used to aggregate data to Aggregation.
rename flow

On the canvas, click the mark on the bottom border of the Preparation task. Then, draw a line with an arrow from the Preparation task to the Aggregation task.
flow1

Run the task flow

After tasks are orchestrated, you can directly run the task flow or configure the scheduling policy to run the task flow at the specified time.**

Directly run the task flow

On the editing page of the task flow, click Try Run in the upper-left corner to run the task flow.
try run

View the running result

On the left-side navigation submenu, click Operation Center. On the page that appears, you can view the running result of the task flow.
centre

Run the task flow at the specified time

  1. On the left-side navigation submenu, click Develop Space to go back to the editing page of the task flow. Click a blank area on the canvas. The right-side pane appears.
  2. Click the Scheduling tab. Turn on Turn on/off, set Trigger type to Cyclic scheduling, and then set Scheduling Cycle and Specific Time.
    flow Scheduling

In this way, the scheduling policy is configured for the task flow. At the specified time, the task flow will be automatically run. You can view the running result on the Operation Center tab.**

Summary

This topic describes how to use DMS Data Warehouse Developer to develop a data warehouse based on AnalyticDB for PostgreSQL. The procedure includes activating DMS, registering an AnalyticDB for PostgreSQL instance, creating a warehouse project, creating and orchestrating tasks in a task flow, and running, managing, and scheduling the task flow. You can gain the following benefits by using task flows in DMS Data Warehouse Developer to perform periodic data analysis in AnalyticDB for PostgreSQL:

  • You need to define a task flow only once. The tasks in the task flow can be run automatically and periodically. This greatly reduces the cost of manual operations.
  • Tasks are run periodically and analysis results are generated in advance. Business applications can query and obtain the results in a timely manner.
  • You can select a time to run a task flow as required. This avoids heavy loads of AnalyticDB for PostgreSQL and takes full use of computing resources in AnalyticDB for PostgreSQL.
  • The results generated when task flows are being run can be reused by other data analysis tasks. This maximizes the resource utilization of AnalyticDB for PostgreSQL.

DMS Data Warehouse Developer provides more data management and data warehouse development features. For more information, see

DMS documentation

.