Query Profile performance diagnostics and optimization use cases - E-MapReduce

This topic describes how to view and interpret a Query Profile. You can use the Query Profile to identify and resolve performance bottlenecks in your StarRocks instance and improve query efficiency.

Query Profile overview

Visualize a Query Profile

StarRocks Manager supports visual analytics for Query Profiles. For more information, see Query Profile introduction.

Identify query bottlenecks

The longer an operator runs, the darker its color. The top three longest-running nodes are highlighted. This helps you easily identify query bottlenecks.

Query Profile optimization use cases

Bitmap index
Bloom filter index
Optimize data skew
Single-table materialized view
Verify that JoinRuntimeFilter is effective
Colocate Join
Verify that bucketing or partition pruning is effective

Bitmap index

A bitmap index is a special type of database index that uses bitmaps. A bitmap is an array of bits, where each bit can be 0 or 1. Each bit corresponds to a row in the data table. The value of the bit depends on the value in the corresponding row.

You can use a Bitmap filter to improve query efficiency for fields with low cardinality and many duplicate values, such as gender.
To check if a query hits a Bitmap filter index, you can view the BitmapIndexFilterRows field in the query's Profile.

Create an index

Create a Bitmap index when you create a table

CREATE TABLE `student_info` (
  `s_stukey` bigint(20) NULL COMMENT "",
  `s_name` varchar(65533) NULL COMMENT "",
  `s_gender` varchar(65533) NULL COMMENT "",
  INDEX index1 (s_gender) USING BITMAP COMMENT 'index1'
) ENGINE=OLAP
DUPLICATE KEY(`s_stukey`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`s_stukey`);

INSERT INTO student_info
VALUES
    (001,'student#000000019','male'),
    (002,'student#000000020','male'),
    (003,'student#000000021','male'),
    (004,'student#000000022','female');

Create a Bitmap index using CREATE INDEX after you create a table

CREATE INDEX index_name ON table_name (column_name) [USING BITMAP] [COMMENT ''];

Check index creation progress

SHOW ALTER TABLE COLUMN [FROM db_name];

View an index

SHOW {INDEX[ES] | KEY[S] } FROM [db_name.]table_name [FROM db_name];

view bitmap

Delete an index

DROP INDEX index_name ON [db_name.]table_name;

Single-column query test

Execute a query to filter the s_gender column.

select * from student_info where s_gender='male';

View the Profile.
Click OLAP_SCAN, select the Node Details tab on the right, and then filter by Bitmap to verify that the Bitmap index is effective.

Bloom filter index

A Bloom filter index quickly checks if a data file might contain the queried data. If the data is not present, the index skips the file to reduce the amount of data scanned. Bloom filters are space-efficient and work well for high-cardinality columns, such as ID columns.

You can create a Bloom filter index on all columns in a Primary Key model or Duplicate Key model. In an Aggregate model or Update model, you can create a Bloom filter index only on dimension columns (Key columns).
You cannot create a Bloom filter index for columns of the TINYINT, FLOAT, DOUBLE, or DECIMAL type.
A Bloom filter index can only improve the query efficiency of queries that contain in or = filter conditions. For example, Select xxx from table where xxx in () and Select xxx from table where column = xxx.
To check if a query hits a Bloom filter index, you can view the BloomFilterFilterRows field in the query's Profile.

Create an index

To create an index, specify bloom_filter_columns in PROPERTIES when you create a table. The following is an example.

CREATE TABLE table1
(
  k1 BIGINT,
  k2 LARGEINT,
  v1 VARCHAR(2048) REPLACE,
  v2 SMALLINT DEFAULT "10"
)
ENGINE = olap
PRIMARY KEY(k1, k2)
DISTRIBUTED BY HASH (k1, k2) BUCKETS 10
PROPERTIES("bloom_filter_columns" = "k1,k2"); -- Separate multiple index columns with commas (,).

View an index

SHOW CREATE TABLE table1;

Modify an index

The following are examples:

Add a Bloom filter index for column v1.

ALTER TABLE table1 SET ("bloom_filter_columns" = "k1,k2,v1");

Remove the Bloom filter index for column k2.

ALTER TABLE table1 SET ("bloom_filter_columns" = "k1");

Delete all Bloom filter indexes for table1.

ALTER TABLE table1 SET ("bloom_filter_columns" = "");

Example test

For this example, use the customer table from TPC-H. Add a Bloom filter index to a column with high cardinality that is not a sort key, such as the c_phone column.
```
ALTER TABLE tpc_h_sf100.customer SET ("bloom_filter_columns" = "c_custkey, c_phone");
```
View the index to confirm that the Bloom filter index has been added.
```
SHOW CREATE TABLE tpc_h_sf100.customer;
```

Execute a query to filter the c_phone column.

select * from tpc_h_sf100.customer where c_phone = "10-334-921-5346";

View the Profile.
Click OLAP_SCAN and then select the Node Details tab on the right. Find the BloomFilterFilterRows metric to confirm that the Bloom filter index is effective.

Optimize data skew

This example uses the `lineitem` table from TPC-H. A key with a non-uniform distribution is selected as the bucketing key to demonstrate a data skew problem. In this example, the values in the l_tag field are not uniformly distributed.

Create the test data. Add a new column to the `lineitem` table in TPC-H to use as the DISTRIBUTED field.

CREATE TABLE `lineitem_tag` (
  `l_orderkey` bigint(20) NULL COMMENT "",
  `l_partkey` bigint(20) NULL COMMENT "",
  `l_suppkey` bigint(20) NULL COMMENT "",
  `l_linenumber` int(11) NULL COMMENT "",
  `l_quantity` double NULL COMMENT "",
  `l_extendedprice` double NULL COMMENT "",
  `l_discount` double NULL COMMENT "",
  `l_tax` double NULL COMMENT "",
  `l_returnflag` varchar(65533) NULL COMMENT "",
  `l_linestatus` varchar(65533) NULL COMMENT "",
  `l_shipdate` date NULL COMMENT "",
  `l_commitdate` date NULL COMMENT "",
  `l_receiptdate` date NULL COMMENT "",
  `l_shipinstruct` varchar(65533) NULL COMMENT "",
  `l_shipmode` varchar(65533) NULL COMMENT "",
  `l_comment` varchar(65533) NULL COMMENT "",
  `l_tag` varchar(65533)  default 'false' COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`l_orderkey`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`l_tag`) BUCKETS 96
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"storage_format" = "DEFAULT",
"enable_persistent_index" = "false"
);

insert into lineitem_tag  select *, 'false' as l_tag  from tpc_h_sf100.lineitem;

Execute a query to perform a full table scan.
```
select count(1) from lineitem_tag;
```
View the Profile.
Click OLAP_SCAN. On the Node tab on the right, compare the SCAN time for MaxTime and MinTime. A difference of several orders of magnitude suggests that data skew has occurred.
Optimize the table schema by redefining the DISTRIBUTED field.
Review the Profile. Compare the SCAN time for MaxTime and MinTime. You can see that the data skew problem is resolved.

Single-table materialized view

A single-table materialized view, also known as a rollup, in StarRocks is a special type of index that you cannot query directly. If your data warehouse runs many complex or repetitive queries, you can create single-table materialized views to accelerate them.

Query test

This example uses the lineitem table from TPC-H. Execute the query.

select l_returnflag,l_linestatus,l_shipmode,sum(l_extendedprice) from lineitem group by l_returnflag,l_linestatus,l_shipmode;

The initial query takes 1115 ms without a materialized view.
View the Profile.
Click OLAP_SCAN. On the Node tab on the right, you can see that the rollup scans the lineitem table.

Create a materialized view

CREATE MATERIALIZED VIEW material_test AS select l_returnflag,l_linestatus ,l_shipmode,sum(l_extendedprice) from lineitem group by l_returnflag,l_linestatus,l_shipmode;

Verify that a query hits the single-table materialized view

You can use the EXPLAIN command to check if the query hits the single-table materialized view.

explain select l_returnflag,l_linestatus ,l_shipmode,sum(l_extendedprice) from lineitem group by l_returnflag,l_linestatus,l_shipmode;

EXPLAIN

View the Profile.
Click OLAP_SCAN. The Node tab on the right shows that the query hits the materialized view, reducing the query time to 0.05 ms.

Verify that JoinRuntimeFilter is effective

When the right table in a join builds a hash table, it also creates a runtime filter. This filter is delivered to the left subtree and pushed down to the scan operator whenever possible. You can view metrics related to JoinRuntimeFilter on the Node Details tab of the scan operator.

This example uses query72.sql from TPC-DS.

select i_item_desc,
       w_warehouse_name,
       d1.d_week_seq,
       sum(case when p_promo_sk is null then 1 else 0 end) no_promo,
       sum(case when p_promo_sk is not null then 1 else 0 end) promo,
       count(*) total_cnt
 from inventory
 join catalog_sales on (cs_item_sk = inv_item_sk)
 join warehouse on (w_warehouse_sk=inv_warehouse_sk)
 join item on (i_item_sk = cs_item_sk)
 join customer_demographics on (cs_bill_cdemo_sk = cd_demo_sk)
 join household_demographics on (cs_bill_hdemo_sk = hd_demo_sk)
 join date_dim d1 on (cs_sold_date_sk = d1.d_date_sk)
 join date_dim d2 on (inv_date_sk = d2.d_date_sk)
 join date_dim d3 on (cs_ship_date_sk = d3.d_date_sk)
 left outer join promotion on (cs_promo_sk=p_promo_sk)
 left outer join catalog_returns on (cr_item_sk = cs_item_sk and cr_order_number = cs_order_number)
 where d1.d_week_seq = d2.d_week_seq
   and inv_quantity_on_hand < cs_quantity
   and d3.d_date > (cast(d1.d_date AS DATE) + interval '5' day)
   and hd_buy_potential = '>10000'
   and d1.d_year = 1999
   and cd_marital_status = 'D'
 group by i_item_desc,w_warehouse_name,d1.d_week_seq
 order by total_cnt desc, i_item_desc, w_warehouse_name, d1.d_week_seq
 limit 100;

View the Profile.
Click OLAP_SCAN. On the Node Details tab on the right, you can see that the scan operator triggered the JoinRuntimeFilter when scanning the inventory table.

Colocate Join

To use the Colocate Join feature in StarRocks, assign tables to a Colocation Group (CG) when you create them. All tables in a CG must follow the same Colocation Group Schema (CGS). This ensures that the data for all tables in the same CG is distributed across the same set of BE nodes. When the join column is the bucketing key, compute nodes perform a local join. This reduces data transfer time between nodes and improves query performance. Because Colocate Join avoids the network data transfer overhead of other joins, such as Shuffle Join and Broadcast Join, it provides better query performance.

Create a colocation table

Note

StarRocks supports Colocate Join operations only for tables within the same database.

CREATE TABLE tbl (k1 int, v1 int sum)
DISTRIBUTED BY HASH(k1)
BUCKETS 8
PROPERTIES(
    "colocate_with" = "group1"
);

Delete a Colocation Group

When the last table in a group is permanently deleted, the group is also automatically deleted. A table is permanently deleted after it is removed from the recycle bin. By default, a table stays in the recycle bin for one day after you run the DROP TABLE command.

View group information

For example, you can run the following command to view group information.

SHOW PROC '/colocation_group';

The following information is returned.

+-------------+--------------+----------+------------+----------------+----------+----------+
| GroupId     | GroupName    | TableIds | BucketsNum | ReplicationNum | DistCols | IsStable |
+-------------+--------------+----------+------------+----------------+----------+----------+
| 11912.11916 | 11912_group1 | 11914    | 8          | 3              | int(11)  | true     |
+-------------+--------------+----------+------------+----------------+----------+----------+

The following table describes the parameters.

Column	Description
GroupId	A unique identifier for the group across the cluster. The first part is the Database ID, and the second part is the Group ID.
GroupName	The full name of the group.
TabletIds	A list of table IDs included in the group.
BucketsNum	The number of buckets.
ReplicationNum	The number of replicas.
DistCols	Distribution columns, which are the bucketing column types.
IsStable	Indicates whether the group is stable.

You can run the following command to view the data distribution of a specific group.

SHOW PROC '/colocation_group/GroupId';

SHOW PROC '/colocation_group/11912.11916';

The following information is returned.

+-------------+---------------------+
| BucketIndex | BackendIds          |
+-------------+---------------------+
| 0           | 10002, 10004, 10003 |
| 1           | 10002, 10004, 10003 |
| 2           | 10002, 10004, 10003 |
| 3           | 10002, 10004, 10003 |
| 4           | 10002, 10004, 10003 |
| 5           | 10002, 10004, 10003 |
| 6           | 10002, 10004, 10003 |
| 7           | 10002, 10004, 10003 |
+-------------+---------------------+
8 rows in set (0.00 sec)

Modify group properties

ALTER TABLE tbl SET ("colocate_with" = "group_name");

This example uses data from TPC-H. Perform the following operations.

use tpc_h_sf100;
ALTER TABLE orders SET ("colocate_with" = "cg_tpc_orders");
ALTER TABLE lineitem SET ("colocate_with" = "cg_tpc_orders");

Run the following query.

select count(1) from orders as o join lineitem as l on o.o_orderkey = l.l_orderkey;

On the Query Plan tab for a query in EMR StarRocks Manager, you can check whether Colocate Join is in effect.
If colocate is displayed as true, Colocate Join is effective.

Verify that bucketing or partition pruning is effective

On the Query Plan tab for a query in EMR StarRocks Manager, you can check the partition or tabletRatio parameter to verify that partition or bucketing pruning is effective.