Build a real-time data warehouse by using Realtime Compute for Apache Flink and Hologres - Realtime Compute for Apache Flink

You can build a real-time data warehouse to implement efficient and scalable real-time data processing and analysis based on the powerful real-time data processing capabilities of Realtime Compute for Apache Flink and the capabilities of Hologres, such as binary logging, hybrid row-column storage, and strong resource isolation. This helps you better cope with the increasing data volume and real-time business requirements. This topic describes how to build a real-time data warehouse by using Realtime Compute for Apache Flink and Hologres.

Background information

As digitalization advances, enterprises have an increasingly strong demand for timeliness of data. Users need to process, store, and analyze data in real time in many business scenarios other than the traditional offline data processing scenarios. For traditional offline data warehousing, periodic scheduling is performed to process data at the following data layers: operational data store (ODS), data warehouse detail (DWD), data warehouse service (DWS), and application data service (ADS). However, the methodology system for real-time data warehousing is unclear. The concept of Streaming Warehouse is used to implement an efficient flow of real-time data between data layers. This can resolve issues that are related to data layering in a real-time data warehouse.

Architecture

Realtime Compute for Apache Flink is a powerful stream compute engine that supports efficient processing of large amounts of real-time data. Hologres is an end-to-end real-time data warehouse that supports real-time data write and update. Real-time data can be queried immediately after it is written to Hologres. Hologres is deeply integrated with Realtime Compute for Apache Flink to provide an integrated real-time data warehousing solution. The following figure shows the architecture of the real-time data warehouse that is built by using Realtime Compute for Apache Flink and Hologres.

Realtime Compute for Apache Flink writes data from a data source to Hologres to form the ODS layer.
Realtime Compute for Apache Flink subscribes to binary log data at the ODS layer for processing and then rewrites the data to Hologres to form the DWD layer.
Realtime Compute for Apache Flink subscribes to binary log data at the DWD layer for computing and then rewrites the data to Hologres to form the DWS layer.
Hologres provides entry points for data queries.

This solution provides the following benefits:

Each layer of data in Hologres can be efficiently updated, corrected, and queried upon data writing. This resolves the issue that data at the intermediate layer is difficult to query, update, and correct in traditional real-time data warehousing solutions.
Each layer of data in Hologres can be independently used to provide external services and reused in an efficient manner to achieve hierarchical reuse of data warehouses.
This solution provides a unified model and uses a simplified architecture. The logic of real-time extract, transform, and load (ETL) operations is implemented based on Flink SQL. Data at the ODS layer, DWD layer, and DWS layer is stored in Hologres. This reduces the architecture complexity and improves data processing efficiency.

This solution relies on three core capabilities of Hologres. The following table describes the core capabilities.

Core capability of Hologres	Description
Binary logging	Hologres supports binary logging to drive Realtime Compute for Apache Flink to perform real-time computing. This allows Realtime Compute for Apache Flink to serve as the upstream of stream computing. For more information about binary logging of Hologres, see Subscribe to Hologres binary logs.
Hybrid row-column storage	Hologres supports hybrid row-column storage. A table stores both row-oriented data and column-oriented data, and the row-oriented data and column-oriented data are strongly consistent. This feature ensures that tables at the intermediate layer can be used as source tables of Realtime Compute for Apache Flink and as dimension tables of Realtime Compute for Apache Flink for point queries by using primary keys and JOIN operations. The tables at the intermediate layer can also be queried by other applications, such as online analytical processing (OLAP) and online services. For more information about hybrid row-column storage of Hologres, see Storage models of tables: row-oriented storage, column-oriented storage, and row-column hybrid storage.
Strong resource isolation	If the load of a Hologres instance is high, the performance of point queries on the tables at the intermediate layer of the instance may be affected. Hologres supports strong resource isolation by configuring read/write splitting for primary and secondary instances (shared storage). This ensures that online services are not affected when Realtime Compute for Apache Flink pulls binary log data from Hologres. For more information, see Configure read/write splitting for primary and secondary instances (shared storage).

Best practices

This example shows how to build a real-time data warehouse for an e-commerce platform. The real-time data warehouse is built to process and cleanse data in real time and query data from upper-layer applications. This way, real-time data is layered and reused to support multiple business scenarios, such as report queries for transaction dashboard data analysis, behavioral data analysis, and user profile tagging, and personalized recommendations.

Build the ODS layer: Real-time ingestion of data in a business database into the data warehouse
MySQL has three business tables: orders, orders_pay, and product_catalog. Realtime Compute for Apache Flink synchronizes data from the tables to Hologres in real time to form the ODS layer.
Build the DWD layer: Real-time wide table
Realtime Compute for Apache Flink combines data in the orders, orders_pay, and product_catalog tables in real time to generate a wide table at the DWD layer.
Build the DWS layer: Real-time metric computing
Binary log data of the wide table is consumed in real time and metric tables are generated at the DWS layer based on event-driven aggregation.

Precautions

Only Realtime Compute for Apache Flink that uses Ververica Runtime (VVR) 6.0.7 or later supports the real-time data warehousing solution.
Only Hologres V1.3 or later supports the real-time data warehousing solution.
The ApsaraDB RDS for MySQL instance and the Hologres instance must reside in the same virtual private cloud (VPC) as Realtime Compute for Apache Flink. If the ApsaraDB RDS for MySQL instance or the Hologres instance does not reside in the same VPC as Realtime Compute for Apache Flink, you must establish a connection between the VPCs or enable Realtime Compute for Apache Flink to access the ApsaraDB RDS for MySQL instance or the Hologres instance over the Internet. For more information, see How does Realtime Compute for Apache Flink access a service across VPCs? and How does Realtime Compute for Apache Flink access the Internet?
If you want to access Realtime Compute for Apache Flink, Hologres, and ApsaraDB RDS for MySQL resources as a RAM user or by using a RAM role, you must make sure that the RAM user or RAM role has the required permissions.

Preparations

Create an ApsaraDB RDS for MySQL instance and prepare a MySQL CDC data source

Create an ApsaraDB RDS for MySQL instance. For more information, see Create an ApsaraDB RDS for MySQL instance.
Create a database and an account for the ApsaraDB RDS for MySQL instance.
Create a database named order_dw and a standard account that has the read and write permissions on the database. For more information, see Create accounts and databases and Manage databases.

Prepare a MySQL CDC data source.

In the upper-right corner of the details page of the desired instance, click Log On to Database.
In the Log on to Database Instance dialog box, configure the Database Account and Database Password parameters and click Login.
After the logon is successful, double-click the order_dw database in the left-side navigation pane to switch the database.

On the SQL Console tab, write the DDL statements that are used to create three business tables in the order_dw database and insert data into the business tables.

CREATE TABLE `orders` (
  order_id bigint not null primary key,
  user_id varchar(50) not null,
  shop_id bigint not null,
  product_id bigint not null,
  buy_fee numeric(20,2) not null,   
  create_time timestamp not null,
  update_time timestamp not null default now(),
  state int not null 
);


CREATE TABLE `orders_pay` (
  pay_id bigint not null primary key,
  order_id bigint not null,
  pay_platform int not null,
  create_time timestamp not null
);


CREATE TABLE `product_catalog` (
  product_id bigint not null primary key,
  catalog_name varchar(50) not null
);

-- Prepare data.
INSERT INTO product_catalog VALUES(1, 'phone_aaa'),(2, 'phone_bbb'),(3, 'phone_ccc'),(4, 'phone_ddd'),(5, 'phone_eee');

INSERT INTO orders VALUES
(100001, 'user_001', 12345, 1, 5000.05, '2023-02-15 16:40:56', '2023-02-15 18:42:56', 1),
(100002, 'user_002', 12346, 2, 4000.04, '2023-02-15 15:40:56', '2023-02-15 18:42:56', 1),
(100003, 'user_003', 12347, 3, 3000.03, '2023-02-15 14:40:56', '2023-02-15 18:42:56', 1),
(100004, 'user_001', 12347, 4, 2000.02, '2023-02-15 13:40:56', '2023-02-15 18:42:56', 1),
(100005, 'user_002', 12348, 5, 1000.01, '2023-02-15 12:40:56', '2023-02-15 18:42:56', 1),
(100006, 'user_001', 12348, 1, 1000.01, '2023-02-15 11:40:56', '2023-02-15 18:42:56', 1),
(100007, 'user_003', 12347, 4, 2000.02, '2023-02-15 10:40:56', '2023-02-15 18:42:56', 1);

INSERT INTO orders_pay VALUES
(2001, 100001, 1, '2023-02-15 17:40:56'),
(2002, 100002, 1, '2023-02-15 17:40:56'),
(2003, 100003, 0, '2023-02-15 17:40:56'),
(2004, 100004, 0, '2023-02-15 17:40:56'),
(2005, 100005, 0, '2023-02-15 18:40:56'),
(2006, 100006, 0, '2023-02-15 18:40:56'),
(2007, 100007, 0, '2023-02-15 18:40:56');

Click Execute. On the page that appears, click Execute.

Create a Hologres instance

Create a Hologres instance. For more information, see Purchase a Hologres instance.
In the HoloWeb console, connect to the Hologres instance, create a database, and grant permissions on the database to a user.
Create a database named order_dw by using the simple permission model (SPM) and assign the admin role to the user. For more information about how to create a database and grant permissions on the database to a user, see Manage databases.
Note
- If you cannot find the required RAM user in the User drop-down list, the RAM user is not added to the current instance as a user. In this case, add the RAM user as a superuser of an instance on the User Management page.
- By default, binary log expansion is enabled in Hologres V2.0 and later. You do not need to manually perform this operation. After you create a database in Hologres V1.3, you must run the create extension hg_binlog command to enable binary log expansion.

Create a Realtime Compute for Apache Flink workspace and catalogs

Create a workspace. For more information, see Activate Realtime Compute for Apache Flink.
Log on to the Realtime Compute for Apache Flink console. Find the workspace that you want to manage and click Console in the Actions column.
Create a session cluster, which provide an execution environment for creating a catalog and a script. For more information, see Step 1: Create a session cluster.

Create a Hologres catalog.

On the Scripts tab of the SQL Editor page in the Realtime Compute for Apache Flink console, copy the following code to the script editor, configure the parameters, select the code that you want to run, and then click Run that appears on the left side of the code.

CREATE CATALOG dw WITH (
  'type' = 'hologres',
  'endpoint' = '<ENDPOINT>', 
  'username' = '<USERNAME>',
  'password' = '${secret_values.ak_holo}',
  'dbname' = 'order_dw',
  'binlog' = 'true', -- You can configure parameters in the WITH clause that are supported by source tables, dimension tables, and result tables when you create a catalog. After you configure the parameters, the parameters are automatically added to the tables in the catalog when you use the tables. 
  'sdkMode' = 'jdbc', -- We recommend that you use the JDBC mode. 
  'cdcmode' = 'true',
  'connectionpoolname' = 'the_conn_pool',
  'ignoredelete' = 'true', -- When you merge data into a wide table, you must set the ignoredelete parameter to true to prevent data retraction. 
  'partial-insert.enabled' = 'true', -- When you merge data into a wide table, you must set the partial-insert.enabled parameter to true to perform data updates in specific columns. 
  'mutateType' = 'insertOrUpdate', -- When you merge data into a wide table, you must set the mutateType parameter to insertOrUpdate to perform data updates in specific columns. 
  'table_property.binlog.level' = 'replica', -- The persistent properties of a Hologres table can also be passed in when a catalog is created. Then, binary logging is enabled by default for all Hologres tables that are created by using the catalog. 
  'table_property.binlog.ttl' = '259200'
);

Configure the following parameters based on your Hologres information.

Parameter	Description	Remarks
endpoint	The endpoint of the Hologres instance.	For more information, see Instance configurations.
username	The AccessKey ID of the Alibaba Cloud account.	The Alibaba Cloud account to which the AccessKey ID belongs must be granted permissions to access all Hologres databases. For more information about Hologres database permissions, see Overview. To prevent password leaks, a key named ak_holo is used as the AccessKey secret to protect your AccessKey pair. For more information, see Manage keys.
password	The AccessKey secret of your Alibaba Cloud account.

Note

When you create a catalog, you can configure the parameters in the WITH clause that are used by default in source tables, dimension tables, and result tables. You can also configure the default parameters for the creation of Hologres physical tables, such as the parameters that start with table_property in the preceding code. For more information, see Manage Hologres catalogs and "Parameters in the WITH clause" in Hologres connector.

Create a MySQL catalog.

On the Scripts tab of the SQL Editor page in the Realtime Compute for Apache Flink console, copy the following code in the script editor, configure the parameters, select the code that you want to run, and then click Run that appears on the left side of the code.

CREATE CATALOG mysqlcatalog WITH(
  'type' = 'mysql',
  'hostname' = '<hostname>',
  'port' = '<port>',
  'username' = '<username>',
  'password' = '${secret_values.mysql_pw}',
  'default-database' = 'order_dw'
);

Configure the parameters based on the information about your MySQL database. The following table describes the parameters.

Parameter	Description
hostname	The IP address or hostname that is used to access the MySQL database.
port	The port number of the MySQL database. Default value: 3306.
username	The username that is used to access the ApsaraDB RDS for MySQL database.
password	The password that is used to access the ApsaraDB RDS for MySQL database. To prevent password leaks, a key named mysql_pw is used as the AccessKey secret to protect your AccessKey pair. For more information, see Manage keys.

Build a real-time data warehouse

Build the ODS layer: Real-time ingestion of data in a business database into the data warehouse

You can execute the CREATE DATABASE AS statement that is related to catalogs to create the ODS layer. In most cases, the ODS layer is used as an event driver for streaming deployments instead of directly performing OLAP operations or providing services such as key-value point queries. Therefore, you can enable binary logging to meet your business requirements. Binary logging is one of the core capabilities of Hologres. The Hologres connector can be used to read full binary log data and then consumes incremental binary log data.

Create a synchronization deployment named ODS to execute the CREATE DATABASE AS statement.
1. In the Realtime Compute for Apache Flink console, create an SQL streaming draft named ODS and copy the following code to the SQL editor:
```
CREATE DATABASE IF NOT EXISTS dw.order_dw   -- The table_property.binlog.level parameter is configured when a catalog is created. Therefore, binary logging is enabled for all tables that are created by using the CREATE DATABASE AS statement. 
AS DATABASE mysqlcatalog.order_dw INCLUDING all tables -- You can select the tables in the upstream database from which data needs to be ingested into the data warehouse. 
/*+ OPTIONS('server-id'='8001-8004') */; -- Configure parameters for the MySQL CDC source table.
```
  Note
  - In this example, data is synchronized to the public schema of the order_dw database by default. You can also synchronize data to a specified schema in the destination Hologres database. For more information, see Use the Hologres catalog that you created as the catalog of the destination store that is used in the CREATE TABLE AS statement. After you specify a schema, the format of the table name changes when you use a catalog. For more information, see Use a Hologres catalog.
  - If the schema of the source table changes, the schema of the result table changes only when data in the source table is changed. Data in the source table is changed if the data is updated in, deleted from, or inserted into the source table.
2. In the upper-right corner of the SQL Editor page, click Deploy to deploy the draft.
3. In the left-side navigation pane, click Deployments. On the Deployments page, find the ODS deployment for the draft that you deployed and click Start in the Actions column. In the Start Job dialog box, click Initial Mode to start the deployment without initial states.

In the HoloWeb console, execute the following statements in the SQL editor to view the data of the three Hologres tables to which data is synchronized from MySQL:

--- Query data in the orders table. 
SELECT * FROM orders;

--- Query data in the orders_pay table. 
SELECT * FROM orders_pay;

--- Query data in the product_catalog table. 
SELECT * FROM product_catalog;

Build the DWD layer: Real-time wide table

The capability of updating specific columns supported by the Hologres connector is used to build the DWD layer. You can use the INSERT statements to express the semantics of updating specific columns in an efficient manner. High-performance point queries based on column-oriented data storage and hybrid row-column data storage of Hologres help you query data of different dimension tables in deployments. Hologres uses a strong resource isolation architecture, which prevents interference among write, read, and analytics deployments.

Create a wide table named dwd_orders at the DWD layer in Hologres by using catalogs of Realtime Compute for Apache Flink.

On the Scripts tab of the SQL Editor page in the Realtime Compute for Apache Flink console, copy the following code to the script editor of the draft, select the code, and then click Run that appears on the left side of the code.

-- When data in different streams is written to the same result table, null values may appear in any column of the table. Therefore, make sure that the fields in the wide table are nullable. 
CREATE TABLE dw.order_dw.dwd_orders (
  order_id bigint not null,
  order_user_id string,
  order_shop_id bigint,
  order_product_id bigint,
  order_product_catalog_name string,
  order_fee numeric(20,2),
  order_create_time timestamp,
  order_update_time timestamp,
  order_state int,
  pay_id bigint,
  pay_platform int comment 'platform 0: phone, 1: pc', 
  pay_create_time timestamp,
  PRIMARY KEY(order_id) NOT ENFORCED
);

-- You can use a catalog to modify the properties of a Hologres physical table. 
ALTER TABLE dw.order_dw.dwd_orders SET (
  'table_property.binlog.ttl' = '604800' -- Change the timeout period of binary log data to one week. 
);

Implement binary logging for the orders and orders_pay tables at the ODS layer.

In the Realtime Compute for Apache Flink console, create an SQL draft named DWD, copy the following code to the SQL editor of the draft, deploy the draft, and then start the deployment for the draft. Execute the following SQL statements to join the orders table with the dimension table product_catalog and write the final result to the wide table dwd_orders. This way, data is written to the wide table in real time.

BEGIN STATEMENT SET;

INSERT INTO dw.order_dw.dwd_orders 
 (
   order_id,
   order_user_id,
   order_shop_id,
   order_product_id,
   order_fee,
   order_create_time,
   order_update_time,
   order_state,
   order_product_catalog_name
 ) SELECT o.*, dim.catalog_name 
   FROM dw.order_dw.orders as o
   LEFT JOIN dw.order_dw.product_catalog FOR SYSTEM_TIME AS OF proctime() AS dim
   ON o.product_id = dim.product_id;

INSERT INTO dw.order_dw.dwd_orders 
  (pay_id, order_id, pay_platform, pay_create_time)
   SELECT * FROM dw.order_dw.orders_pay;

END;

View data of the wide table dwd_orders.
In the HoloWeb console, connect to the Hologres instance and log on to the destination database. Then, execute the following statement in the SQL editor:
```
SELECT * FROM dwd_orders;
```

Build the DWS layer: Real-time metric computing

Create aggregate metric tables named dws_users and dws_shops in Hologres by using catalogs of Realtime Compute for Apache Flink.

-- Create a user-dimension aggregate metric table. 
CREATE TABLE dw.order_dw.dws_users (
  user_id string not null,
  ds string not null,
  paied_buy_fee_sum numeric(20,2) not null comment 'Total amount of payment that is complete on that day',
  primary key(user_id,ds) NOT ENFORCED
);

-- Create a merchant-dimension aggregate metric table. 
CREATE TABLE dw.order_dw.dws_shops (
  shop_id bigint not null,
  ds string not null,
  paied_buy_fee_sum numeric(20,2) not null comment 'Total amount of payment that is complete on that day',
  primary key(shop_id,ds) NOT ENFORCED
);

Consume data in the wide table dw.order_dw.dwd_orders at the DWD layer in real time, perform aggregate computing in Realtime Compute for Apache Flink, and then write the data to the tables at the DWS layer in Hologres.

In the Realtime Compute for Apache Flink console, create an SQL streaming draft named DWS, copy the following code to the SQL editor of the draft, deploy the draft, and then start the deployment for the draft.

BEGIN STATEMENT SET;

INSERT INTO dw.order_dw.dws_users
  SELECT 
    order_user_id,
    DATE_FORMAT (pay_create_time, 'yyyyMMdd') as ds,
    SUM (order_fee)
    FROM dw.order_dw.dwd_orders c
    WHERE pay_id IS NOT NULL AND order_fee IS NOT NULL -- Both order flow data and payment flow data are written to the wide table. 
    GROUP BY order_user_id, DATE_FORMAT (pay_create_time, 'yyyyMMdd');

INSERT INTO dw.order_dw.dws_shops
  SELECT 
    order_shop_id,
    DATE_FORMAT (pay_create_time, 'yyyyMMdd') as ds,
    SUM (order_fee)
   FROM dw.order_dw.dwd_orders c
   WHERE pay_id IS NOT NULL AND order_fee IS NOT NULL -- Both order flow data and payment flow data are written to the wide table. 
   GROUP BY order_shop_id, DATE_FORMAT (pay_create_time, 'yyyyMMdd');
END;

View the aggregation result at the DWS layer. The result is updated in real time based on the changes to input data.
1. In the Hologres console, view the data before update.
  - Query data in the dws_users table.
```
SELECT * FROM dws_users;
```
  - Query data in the dws_shops table.
```
SELECT * FROM dws_shops;
```
2. In the ApsaraDB RDS console, insert a data record into the orders and orders_pay tables in the order_dw database.
```
INSERT INTO orders VALUES
(100008, 'user_003', 12345, 5, 6000.02, '2023-02-15 09:40:56', '2023-02-15 18:42:56', 1);

INSERT INTO orders_pay VALUES
(2008, 100008, 1, '2023-02-15 19:40:56');
```
3. In the Hologres console, view the data after update.
  - dwd_orders table
  - dws_users table
  - dws_shops table

Perform data profiling

The binary logging feature is enabled. Therefore, data profiling can be performed to help you directly view data changes. If you want to perform an ad-hoc query on intermediate results for business data profiling or check the correctness of the final calculation results, you can use each layer of data in this solution to facilitate query of the intermediate data because each layer of data in this solution is persisted.

Data profiling in streaming mode

Create a data profiling streaming draft and start the deployment for the draft.

In the Realtime Compute for Apache Flink console, create an SQL streaming draft named Data-exploration, copy the following code to the SQL editor of the draft, deploy the draft, and then start the deployment for the draft.

-- Perform data profiling in streaming mode. You can create a print table to view the data changes. 
CREATE TEMPORARY TABLE print_sink(
  order_id bigint not null,
  order_user_id string,
  order_shop_id bigint,
  order_product_id bigint,
  order_product_catalog_name string,
  order_fee numeric(20,2),
  order_create_time timestamp,
  order_update_time timestamp,
  order_state int,
  pay_id bigint,
  pay_platform int,
  pay_create_time timestamp,
  PRIMARY KEY(order_id) NOT ENFORCED
) WITH (
  'connector' = 'print'
);

INSERT INTO print_sink SELECT *
FROM dw.order_dw.dwd_orders /*+ OPTIONS('startTime'='2023-02-15 12:00:00') */ -- The startTime parameter specifies the time when binary log data is generated.
WHERE order_user_id = 'user_001';

View the data profiling result.
On the Deployments page, click the name of the deployment whose data you want to query. On the Logs tab of the Deployments page, click the Running Logs tab. On the Running Logs tab, click the Running Task Managers tab and click the value in the Path, ID column. On the Stdout tab, search for logs that are related to user_001.

Data profiling in batch mode
In the Realtime Compute for Apache Flink console, create an SQL streaming draft, copy the following code to the SQL editor, and then click Debug. For more information, see Debug a deployment.
Data profiling in batch mode helps you obtain the final-state data at the current time. The following figure shows the debugging result on the SQL Editor page.
```
SELECT *
FROM dw.order_dw.dwd_orders /*+ OPTIONS('binlog'='false') */ 
WHERE order_user_id = 'user_001' and order_create_time > '2023-02-15 12:00:00'; -- Data profiling in batch mode supports filter pushdown. This improves the execution efficiency of batch deployments.
```

Use a real-time data warehouse

The previous section describes how to build a real-time hierarchical real-time data warehouse in Realtime Compute for Apache Flink based on Realtime Compute for Apache Flink and Streaming Warehouse of Hologres by using catalogs of Realtime Compute for Apache Flink. This section describes simple use scenarios after the data warehouse is built.

Key-value service

Query aggregate metric tables at the DWS layer based on a primary key. Millions of records per second (RPS) are supported.

The following sample code provides an example on how to query the consumption amount of the specified user on the specified date in the HoloWeb console.

-- holo sql
SELECT * FROM dws_users WHERE user_id ='user_001' AND ds = '20230215';

Details query

Perform OLAP analysis on the wide table at the DWD layer.

The following sample code provides an example on how to query the order details of a customer on a specific payment platform in February 2023 in the HoloWeb console.

-- holo sql
SELECT * FROM dwd_orders
WHERE order_create_time >= '2023-02-01 00:00:00'  and order_create_time < '2023-03-01 00:00:00'
AND order_user_id = 'user_001'
AND pay_platform = 0
ORDER BY order_create_time LIMIT 100;

Real-time reports

Display real-time reports based on the data of the wide table at the DWD layer. Hybrid row-column data storage and column-oriented data storage of Hologres provide high OLAP analytics capabilities. Data queries can be responded within seconds.

The following sample code provides an example on how to query the total order volume and total order amount of each category in February 2023 in the HoloWeb console.

-- holo sql
SELECT
  TO_CHAR(order_create_time, 'YYYYMMDD') AS order_create_date,
  order_product_catalog_name,
  COUNT(*),
  SUM(order_fee)
FROM
  dwd_orders
WHERE
  order_create_time >= '2023-02-01 00:00:00'  and order_create_time < '2023-03-01 00:00:00'
GROUP BY
  order_create_date, order_product_catalog_name
ORDER BY
  order_create_date, order_product_catalog_name;

References

For more information about how to get started with real-time data ingestion into data warehouses, see Ingest data into data warehouses in real time.
For more information about how to get started with real-time log data ingestion into data warehouses, see Ingest log data into data warehouses in real time.
For more information about how to build a streaming data lakehouse by using Realtime Compute for Apache Flink and Apache Paimon, see Build a streaming data lakehouse by using Realtime Compute for Apache Flink and Apache Paimon.
For more information about binary logging of Hologres, see Subscribe to Hologres binary logs.
Multiple INSERT INTO statements can be written in one deployment of Realtime Compute for Apache Flink. For more information about the syntax of INSERT INTO, see INSERT INTO statement.
Realtime Compute for Apache Flink supports various connectors. For more information, see Supported connectors.