Creates a materialized view that supports clustering or partitioning based on the data for materialized view scenarios.

Background information

A view is a stored query accessible as a virtual table. Each time you query a view, the query statement is converted into the SQL statement that is used to define the view. For more information about a view, see View-related operations. A materialized view is a special physical table that occupies storage resources to store real data. For more information about the billing rules for materialized views, see Billing rules.

Materialized views are suitable for the following queries:
  • Queries that are in a fixed mode and are frequently executed
  • Queries that involve time-consuming operations, such as JOIN or AGGREGATE
  • Queries that involve only a small portion of data in a table
The following table compares traditional queries and queries based on materialized views.
Item Traditional query Query based on a materialized view
Query statements SQL statements are used to query data.
select empid, deptname  
from emps join depts 
on emps.deptno=depts.deptno 
where hire_date >= '2018-01-01';
Data is queried based on the materialized view that you created.
Sample statement that is used to create a materialized view:
create materialized view mv
as select empid, deptname, hire_date 
from emps join depts 
on emps.deptno=depts.deptno 
where hire_date >= '2016-01-01';
Sampe statement that is used to query data based on the materialized view that you created:
select empid, deptname from mv 
where hire_date >= '2018-01-01';
If the query rewrite feature is enabled for the materialized view, data is directly obtained from the query result that is contained in the materialized view when you execute the first SQL statement in the following code:
select empid, deptname 
from emps join depts 
on emps.deptno=depts.deptno 
where hire_date >= '2018-01-01';
-- The preceding statement is equivalent to the following statement: 
select empid, deptname from mv 
where hire_date >= '2018-01-01';
Query characteristics Queries involve table reading, JOIN operations, and filter operations that are performed by using WHERE clauses. If the source table contains a large amount of data, the query speed is slow. The operation complexity is high and the operation efficiency is low. Queries involve table reading and filter operations. JOIN operations are not involved. MaxCompute matches the optimal materialized view and reads data from the optimal materialized view. This greatly improves query efficiency.

Billing rules

When you use materialized views, you are charged for the following items:
  • Storage

    Materialized views occupy physical storage space. You are charged for the physical storage space that is occupied by materialized views. For more information about the billing rules, see Storage pricing (pay-as-you-go).

  • Computing
    Data is queried when you create, update, and query materialized views, and rewrite queries (if materialized views are valid). These operations consume computing resources and generate computing costs.
    • If the billing method of your MaxCompute project is subscription, you are not charged for data computations.
    • If the billing method of your MaxCompute project is pay-as-you-go, you are charged for data computations based on the complexity of the SQL statement and the amount of input data. For more information about the billing rules, see Billing for standard SQL jobs. Take note of the following points:
      • If you execute an SQL statement to create or update a materialized view, the computing fee depends on the amount of input data and complexity of the SQL statement.
      • If the materialized view is valid, data is read from the materialized view when the query rewrite operation is performed. The amount of input data that is read from the materialized view of the query statement is related to the materialized view and is irrelevant to the source table of the materialized view. If the materialized view is invalid, the query rewrite operation cannot be performed, and data is queried from the source table. The amount of input data for the query statement is related to the source table. For more information about how to query the status of a materialized view, see DESC TABLE/VIEW.
      • Data bloat may occur if a materialized view is generated based on the association of multiple tables. Therefore, the amount of data read from a materialized view may not be absolutely less than the amount of data in the source table. MaxCompute cannot guarantee that data reading from a materialized view costs less than data reading from the source table.

Limits

  • Before you use materialized views, take note of the following limits:
    • Window functions are not supported.
    • User-defined table-valued functions (UDTFs) are not supported.
    • By default, non-deterministic functions, such as user-defined scalar functions (UDFs) and user-defined aggregate functions (UDAFs) are not supported. If you must use non-deterministic functions, execute the set odps.sql.materialized.view.support.nondeterministic.function=true; statement at the session level.
  • Before you create a materialized view, take note of the following limits.
    • The name of the materialized view that you want to create cannot be the same as the name of an existing table, view, or materialized view. You can execute the show tables; statement to view the names of all tables and materialized views in a MaxCompute project.
    • You cannot create a materialized view based on an existing materialized view.
    • You cannot create a materialized view based on an external table.

Usage notes

  • If the query statement based on which you create a materialized view fails to be executed, you cannot create the materialized view.
  • Partition key columns in a materialized view must be derived from a source table. The sequence and number of the columns in the materialized view must be the same as the sequence and number of the columns in the source table. Column names can be different.
  • You must specify comments for all columns, including partition key columns. If you specify comments only for some columns, an error is returned.
  • You can specify both the partitioning and clustering properties for a materialized view. In this case, the data in each partition has the specified clustering property.
  • If the query statement based on which you create a materialized view contains operators that are not supported by the materialized view, an error is returned. For more information about the operators that are supported by materialized views, see Perform a query rewrite operation based on a materialized view.
  • By default, MaxCompute does not allow you to create materialized views by using non-deterministic functions, such as UDFs or UDAFs. If you must use non-deterministic functions, run the set odps.sql.materialized.view.support.nondeterministic.function=true; command at the session level.

Syntax

set odps.sql.materialized.view.initial.partition={"Partition key column": "Column value"};
-- Optional. This statement is used to specify the initial partition to prevent excessive unnecessary partitions from being created. 
create materialized view [if not exists] [<project_name>.]<mv_name> 
[lifecycle <days>]
[(<col_name> [comment <col_comment>], ...)]
[{disable|enable rewrite] 
[partitioned on (<col_name> [, <col_name>, ...]) 
[clustered by|range clustered by (<col_name> [, <col_name>, ...])
     [sorted by (<col_name> [asc | desc] [, <col_name> [asc | desc] ...])]  
           into <number_of_buckets> buckets]   
[tblproperties("compressionstrategy"="normal/high/extreme","enable_auto_substitute"="true")]
as <select_statement>;

Parameters

  • if not exists: optional. If you do not specify if not exists and the materialized view that you want to create already exists, an error is returned.
  • project_name: optional. The name of the MaxCompute project to which the materialized view belongs. If you do not configure this parameter, the current MaxCompute project is used. Log on to the MaxCompute console. In the top navigation bar, select a region and view the name of the MaxCompute project on the Project management tab.
  • mv_name: required. The name of the materialized view that you want to create.
  • days: optional. The lifecycle of the materialized view that you want to create. Unit: days.
  • col_name: optional. The name of a column in the materialized view that you want to create.
  • col_comment: optional. The comment on a column in the materialized view that you want to create.
  • disable|enable rewrite: optional. Specifies whether to disable or enable the query rewrite feature for the materialized view. If you do not configure this parameter, the query rewrite feature is enabled for the materialized view. You can execute the alter materialized view [project_name.]<mv_name> disable rewrite; statement to disable this feature for the materialized view. You can also execute the alter materialized view [project_name.]<mv_name> enable rewrite; statement to enable this feature for the materialized view.
  • partitioned on: optional. The partition key columns in the materialized view that you want to create. If you want to create a partitioned materialized view, you must configure this parameter.
  • clustered by|range clustered by: optional. The shuffle attribute of the materialized view that you want to create. If you want to create a clustered materialized view, you must configure the clustered by or range clustered by parameter.
  • sorted by: optional. The sort attribute of the materialized view that you want to create. If you want to create a clustered materialized view, you must configure this parameter.
  • number_of_buckets: optional. The number of buckets in the materialized view that you want to create. If you want to create a clustered materialized view, you must configure this parameter.
  • tblproperties: optional. compressionstrategy specifies the data storage and compression policy of the materialized view that you want to create. Valid values: normal, high, and extreme. enable_auto_substitute specifies whether to automatically query data from the source table if the materialized view does not contain the partition from which you want to query data. For more information, see Perform a penetration query based on a materialized view.
  • select_statement: required. The SELECT statement. For more information about the syntax of the SELECT statement, see SELECT syntax.

Examples

  • Example 1: Create a materialized view that contains a partition key column named ds. Sample statement:
    create materialized view mv lifecycle 7
    (
      key comment 'unique id',
      ds comment 'date'
    ) 
    partitioned on (ds) 
    as select t1.id, t1.value, t1.ds as ds from t1 join t2 on t1.id = t2.id;
  • Example 2: Create a non-partitioned materialized view that is clustered. Sample statement:
    create materialized view mv lifecycle 7 
    clustered by id sorted by value into 1024 buckets  
    as select t1.id, t1.value, t1.ds as ds from t1 join t2 on t1.id = t2.id;
  • Example 3: Create a partitioned materialized view that is clustered. Sample statement:
    create materialized view mv lifecycle 7 
    partitioned on (ds) 
    clustered by id sorted by value into 1024 buckets 
    as select t1.id, t1.value, t1.ds as ds from t1 join t2 on t1.id = t2.id;
  • Example 4: A page visit table named visit_records contains the page ID, user ID, and visit time of each user. The numbers of visits to different pages need to be queried and analyzed. The following code shows the structure of the visit_records table.
    +------------------------------------------------------------------------------------+
    | Field           | Type       | Label | Comment                                     |
    +------------------------------------------------------------------------------------+
    | page_id         | string     |       |                                             |
    | user_id         | string     |       |                                             |
    | visit_time      | string     |       |                                             |
    +------------------------------------------------------------------------------------+
    You can create a materialized view for the visit_records table to collect the number of visits to each page based on the page ID. Then, you can perform subsequent query operations based on the materialized view.
    1. Execute the following statement to create a materialized view:
      create materialized view count_mv as select page_id, count(*) from visit_records group by page_id;
    2. Execute the following query statement:
      set odps.sql.materialized.view.enable.auto.rewriting=true; 
      select page_id, count(*) from visit_records group by page_id;

      When you execute the query statement, MaxCompute can automatically match the materialized view count_mv and read the aggregated data from count_mv.

    3. Execute the following statement to check whether the query statement matches the materialized view:
      explain select page_id, count(*) from visit_records group by page_id;
      The following result is returned:
      job0 is root job
      
      In Job job0:
      root Tasks: M1
      
      In Task M1:
          Data source: doc_test_dev.count_mv
          TS: doc_test_dev.count_mv
              FS: output: Screen
                  schema:
                    page_id (string)
                    _c1 (bigint)
      
      
      OK
      In the returned result, the value of Data source contains count_mv. This indicates that the materialized view is valid and the query is rewritten.

Perform a query rewrite operation based on a materialized view

The most important feature of materialized views is to perform query rewrite operations on query statements. To perform query rewrite operations on a query statement based on a materialized view, you must add set odps.sql.materialized.view.enable.auto.rewriting=true; before the query statement. If a materialized view is invalid, the materialized view cannot be used for query rewrite operations. In this case, data is queried from the source table, and the query speed is not accelerated.

Note By default, a MaxCompute project can use only its materialized views for query rewrite operations. If you want to perform query rewrite operations on query statements based on the materialized views of other MaxCompute projects, you must add set odps.sql.materialized.view.source.project.white.list=<project_name1>,<project_name2>,<project_name3>; before the query statements to specify the MaxCompute projects.

The following table describes the types of operators that are supported by query rewrite operations of materialized views in MaxCompute and other services.

Operator Category MaxCompute BigQuery Amazon RedShift Hive
FILTER Exact match of expressions Supported Supported Supported Supported
Partial match of expressions Supported Supported Supported Supported
AGGREGATE Single AGGREGATE operation Supported Supported Supported Supported
Multiple AGGREGATE operations Not supported Unsupported Unsupported Not supported
JOIN JOIN operation INNER JOIN Not supported INNER JOIN INNER JOIN
Single JOIN operation Supported Not supported Supported Supported
Multiple JOIN operations Supported Not supported Supported Supported
AGGREGATE+JOIN - Supported Not supported Supported Supported

The query rewrite operations based on a materialized view require that the data in a query statement be obtained from the materialized view. The data includes output columns, the columns required by filter operations, the columns required by aggregate functions, and the columns required by JOIN operations. If the columns that are required in the query statement are not included in the materialized view or are not supported by the aggregate functions, you cannot perform query rewrite operations based on the materialized view. Examples:

  • Rewrite a query statement that contains filter conditions. The following statement creates a materialized view.
    create materialized view mv as select a,b,c from src where a>5;
    Execute a query statement based on the created materialized views. The following table lists the query statements before and after the query rewrite operation.
    Query statement before the query rewrite operation Query statement after the query rewrite operation
    select a,b from src where a>5; select a,b from mv;
    select a, b from src where a=10; select a,b from mv where a=10;
    select a, b from src where a=10 and b=3; select a,b from mv where a=10 and b=3;
    select a, b from src where a>3; (select a,b from src where a>3 and a<=5) union (select a,b from mv);
    select a, b from src where a=10 and d=4; The query rewrite operation fails because Column d does not exist in mv.
    select d, e from src where a=10; The query rewrite operation fails because Column d and Column e do not exist in mv.
    select a, b from src where a=1; The query rewrite operation fails because data that meets the expression of a=1 does not exist in mv.
  • Rewrite a query statement with aggregate functions
    • If the SQL statement of a materialized view has the same aggregate key as the query statement, all aggregate functions can be rewritten. If different aggregate keys are used, only aggregate functions SUM, MIN, and MAX are supported.
      The following statement creates a materialized view.
      create materialized view mv as 
      select a, b, sum(c) as sum, count(d) as cnt from src group by a, b;
      Execute a query statement based on the created materialized view. The following table lists the query statements before and after the query rewrite operation.
      Query statement before the query rewrite operation Query statement after the query rewrite operation
      select a, sum(c) from src group by a; select a, sum(sum) from mv group by a;
      select a, count(d) from src group by a, b; select a, cnt from mv;
      select a, count(b) from (select a, b from src group by a, b) group by a; select a,count(b) from mv group by a;
      select a, count(b) from src group by a; The query rewrite operation fails because Column a and Column b have been aggregated in the materialized view. As a result, Column b cannot be re-aggregated.
      select a, count(c) from src group by a; The query rewrite operation fails because the COUNT function does not support re-aggregation.
    • If an aggregate function includes DISTINCT, the query rewrite operation is supported when the SQL statement of the materialized view has the same aggregate key as the query statement. Otherwise, the query rewrite operation is not supported.
      The following statement creates a materialized view.
      create materialized view mv as 
      select a, b, sum(distinct c) as sum, count(distinct d) as cnt from src group by a, b;
      Execute a query statement based on the created materialized view. The following table lists the query statements before and after the query rewrite operation.
      Query statement before the query rewrite operation Query statement after the query rewrite operation
      select a, count(distinct d) from src group by a, b; select a, cnt from mv;
      select a, count(c) from src group by a, b; The query rewrite operation fails because the COUNT function does not support re-aggregation.
      select a, count(distinct c) from src group by a; The query rewrite operation fails because data in Column a needs to be re-aggregated.
  • Rewrite a query statement that contains a JOIN clause
    • Data input by using JOIN clauses
      The following statements create materialized views.
      create materialized view mv1 as select a, b from j1 where b > 10;
      create materialized view mv2 as select a, b from j2 where b > 10;
      Execute a query statement based on the created materialized views. The following table lists the query statements before and after the query rewrite operation.
      Query statement before the query rewrite operation Query statement after the query rewrite operation
      select j1.a,j1.b,j2.a from (select a,b from j1 where b > 10) j1 join j2 on j1.a=j2.a; select mv1.a, mv1.b, j2.a from mv1 join j2 on mv1.a=j2.a;
      select j1.a,j1.b,j2.a from(select a,b from j1 where b > 10) j1join(select a,b from j2 where b > 10) j2on j1.a=j2.a; select mv1.a,mv1.b,mv2.a from mv1 join mv2 on mv1.a=mv2.a;
    • JOIN clause with filter conditions
      The following statements create materialized views.
      -- Create non-partitioned materialized views. 
      create materialized view mv1 as select j1.a, j1.b from j1 join j2 on j1.a=j2.a;
      create materialized view mv2 as select j1.a, j1.b from j1 join j2 on j1.a=j2.a where j1.a > 10;
      -- Create a partitioned materialized view. 
      create materialized view mv lifecycle 7 partitioned on (ds) as select t1.id, t1.ds as ds from t1 join t2 on t1.id = t2.id;

      Execute a query statement based on the created materialized views. The following table lists the query statements before and after the query rewrite operation.

      Query statement before the query rewrite operation Query statement after the query rewrite operation
      select j1.a,j1.b from j1 join j2 on j1.a=j2.a where j1.a=4; select a, b from mv1 where a=4;
      select j1.a,j1.b from j1 join j2 on j1.a=j2.a where j1.a > 20; select a,b from mv2 where a>20;
      select j1.a,j1.b from j1 join j2 on j1.a=j2.a where j1.a > 5; (select j1.a,j1.b from j1 join j2 on j1.a=j2.a where j1.a > 5 and j1.a <= 10) union select * from mv2;
      select key from t1 join t2 on t1.id= t2.id where t1.ds='20210306'; select key from mv where ds='20210306';
      select key from t1 join t2 on t1.id= t2.id where t1.ds>='20210306'; select key from mv where ds>='20210306';
      select j1.a,j1.b from j1 join j2 on j1.a=j2.a where j2.a=4; The query rewrite operation fails because the materialized view does not have Column j2.a.
    • JOIN clause for joining tables
      The following statement creates a materialized view.
      create materialized view mv as select j1.a, j1.b from j1 join j2 on j1.a=j2.a;
      Execute a query statement based on the created materialized view. The following table lists the query statements before and after the query rewrite operation.
      Query statement before the query rewrite operation Query statement after the query rewrite operation
      select j1.a, j1.b from j1 join j2 join j3 on j1.a=j2.a and j1.a=j3.a; select mv.a, mv.b from mv join j3 on mv.a=j3.a;
      select j1.a, j1.b from j1 join j2 join j3 on j1.a=j2.a and j2.a=j3.a; select mv.a,mv.b from mv join j3 on mv.a=j3.a;
    Note
    • The preceding types of statements can be used together. You can rewrite a query statement that meets rewrite conditions.
    • MaxCompute selects the optimal rewrite rule to execute query statements. If operations are added after a query statement is rewritten, MaxCompute does not select the query statement after the rewrite because this query statement is not optimal.
  • Rewrite a query statement that contains a LEFT JOIN clause
    The following statement creates a materialized view.
    create materialized view mv lifecycle 7(
            user_id,
            job,
            total_amount
    ) as select t1.user_id, t1.job, sum(t2.order_amout) as total amount 
          from user_info as t1 left join sale_order as t2 on t1.user_id=t2.user_id group by t1.user_id;
    Execute a query statement based on the created materialized view. The following table lists the query statements before and after the query rewrite operation.
    Query statement before the query rewrite operation Query statement after the query rewrite operation
    select t1.user_id, sum(t2.order_amout) as total amount from user_info as t1 left join sale_order as t2 on t1.user_id=t2.user_id group by t1.user_id; select user_id, total_amount from mv;
  • Rewrite a query statement that contains a UNION ALL clause
    The following statement creates a materialized view.
    create materialized view mv lifecycle 7(
            user_id,
            tran_amount,
            tran_date
    ) as (select user_id, tran_amount, tran_date from alipay_tran union all
     select user_id, tran_amount, tran_date from unionpay_tran);
    Execute a query statement based on the created materialized view. The following table lists the query statements before and after the query rewrite operation.
    Query statement before the query rewrite operation Query statement after the query rewrite operation
    select user_id, tran_amount from alipay_tran union all select user_id, tran_amount from unionpay_tran; select user_id, total_amount from mv;

Perform a penetration query based on a materialized view

A partitioned materialized view does not contain the data of all partitions in a source partitioned table if only the latest partition data is updated to the partitioned materialized view. If the partition data that you want to query does not exist in a partitioned materialized view, the system performs a penetration query to query data from the source partitioned table. The following figure shows how to perform a penetration query.

Penetration query
To allow a materialized view to support penetration query, you must perform the following configurations:
  • Set odps.optimizer.cbo.rule.filter.black to re at the session level to enable constant folding for partition key columns.
  • When you create a materialized view, add "enable_auto_substitute"="true" to tblproperties.

The following example shows how to perform a penetration query based on a materialized view.

  1. Create a partitioned materialized view that supports penetration query. Sample statements:
    -- Enable constant folding for partition key columns in partitions to ensure that partition data is dynamically changed. 
    set odps.optimizer.cbo.rule.filter.black=re;
    -- Initialize the data of the 20210101 partition. 
    set odps.sql.materialized.view.initial.partition={"dt": "20210101"};
    -- Create a partitioned materialized view that supports penetration query. 
    create materialized view if not exists mv lifecycle 7 
    tblproperties("enable_auto_substitute"="true") 
    partitioned on (dt)
    as select id, name, dt from src;
  2. Query the data of the 20210101 partition in the src table from the mv materialized view. Sample statement:
    select * from mv where dt='20210101';
  3. Query the data of the 20210102 partition in the src table from the mv materialized view. The mv materialized view does not contain the partition data. Therefore, a penetration query is automatically performed to query the partition data from the src table. Sample statement:
    select * from mv where dt = '20210102';
    -- The preceding statement is equivalent to the following statement because the mv materialized view does not contain the data of the 20210102 partition and the partition data needs to be queried from the src table.
    select * from (select id, name, dt from src where dt='20210102') t;
  4. Query the data of partitions 20201230 to 20210102 from the mv materialized view. The mv materialized view does not contain all partition data. Therefore, a penetration query is automatically performed to obtain the data that is not contained in the mv materialized view, and a UNION operation is performed on the obtained data and the data that is queried from the mv materialized view to return the final result.
    select * from mv where dt >= '20201230' and dt<='20210102' and id='5';
    -- The mv materialized view does not contain the data of the 20210102 partition and a penetration query is performed to query the data of the 20210102 partition from the source table. The preceding statement is equivalent to the following statement:
    select * from
    (select id, name, dt from src where dt='20211231' or dt='20210102';
     union all 
     select * from mv where dt='20210101'
    ) t where id = '5';

Related statements

  • ALTER MATERIALIZED VIEW: Updates a materialized view, changes the lifecycle of a materialized view, enables or disables the lifecycle feature for a materialized view, or drops partitions from a materialized view.
  • DESC TABLE/VIEW: Views the information about a materialized view in a MaxCompute project.
  • SELECT MATERIALIZED VIEW: Queries the status of a materialized view.
  • DROP MATERIALIZED VIEW: Drops an existing materialized view.