All Products
Search
Document Center

MaxCompute:Time travel queries and incremental queries

Last Updated:Apr 17, 2024

In MaxCompute, you can perform time travel queries and incremental queries on Transaction Table 2.0 tables. If you perform a time travel query, you can query historical data based on the snapshot of the source table at a historical point in time or of a historical version. If you perform an incremental query, you can query historical incremental data within a historical period of time or between two versions of a source table. This topic describes the usage notes and limits of queries on Transaction Table 2.0 tables.

Syntax

[with <cte>[, ...] ]
select [all | distinct] <select_expr>[, <except_expr>)][, <replace_expr>] ...
       from <table_reference>
       [timestamp | version as of expr]
       [timestamp | version between start_expr and end_expr] 
       [where <where_condition>]
       [group by {<col_list>|rollup(<col_list>)}]
           [having <having_condition>]
       [order by <order_condition>]
       [distribute by <distribute_condition> [sort by <sort_condition>]|[ cluster by <cluster_condition>] ]
       [limit <number>]
       [window <window_clause>]
Note

In most cases, you can execute data query language (DQL) statements to query Transaction Table 2.0 tables in all scenarios. The syntax and limits of the DQL statements basically comply with the syntax and limits of MaxCompute DQL. The only difference is that the syntax of the FROM TABLE clause is optimized in DQL statements for Transaction Table 2.0 tables. The optimized FROM TABLE clause provides a fixed-format expression that allows you to specify a historical point in time or historical version for time travel queries or specify a time range or version range for incremental queries.

Parameters and limits of time travel queries

If you perform a time travel query, you can query historical data based on snapshots of Transaction Table 2.0 source tables at a historical point in time or of a historical version. You can use the [timestamp | version as of expr] syntax to configure time travel queries.

timestamp as of expr

  • Parameters

    • timestamp as of: a fixed syntax format, which is followed by a historical point in time. Historical data that was stored before the historical point in time is queried based on snapshots.

    • expr: a value of the standard TIMESTAMP, DATETIME, or DATE type that is supported by MaxCompute. The value can be in one of the following formats:

      • Date string constant

        The following table provides examples of string constants of the TIMESTAMP, DATETIME, and DATE types.

        Data type

        Example

        timestmap

        '2023-01-01 00:00:00.123'

        datetime

        '2023-01-01 00:00:00'

        date

        '2023-01-01'

      • Built-in time functions of MaxCompute

        current_timestamp() | getDate() + N: current_timestamp() | getDate()

        In the preceding functions, N is expressed in seconds. If N is set to a negative number, the time that is N seconds before the current time is specified. If N is set to a positive number, the time that is N seconds after the current time is specified.

      • Special syntax for Transaction Table 2.0 tables

        get_latest_timestamp(string tablename [, bigint <number>])

        • For cross-project access, the value of the tablename parameter must be in the projectName.tableName format.

        • For a three-layer model, the value of the tablename parameter must be in the projectName.schemaName.tableName format.

        • The number parameter is optional. The default value is 1. This parameter specifies the sequence number of a data operation in reverse chronological order. For example, to obtain the time when the last data operation is committed, set this parameter to 1. Data operations include data modifications that are initiated by users and data sorting and distribution that are initiated by the system. The returned timestamp may be the same even if the number parameter is set to different values.

  • Limits

    • The historical time range for time travel queries is [CreateTableTimestamp, expr]. The expr parameter specifies the time when a DML operation is committed, and the CreateTableTimestamp parameter specifies the time when the table creation operation is committed.

    • If the time specified by the expr parameter is more than N hours ago or is earlier than the time when the Transaction Table 2.0 table is created, an error is returned because specific historical data may not exist. N is specified by the acid.data.retain.hours parameter that is configured during table creation. For example, if the acid.data.retain.hours parameter is set to 72 in hours, and the expr parameter is set to a time more than 72 hours ago, an error is returned.

    • If the time specified by the expr parameter is N hours ago, an error may be returned. N is specified by the acid.data.retain.hours parameter that is configured during table creation. This is because latency occurs when internal systems interact with each other, and a second-level error may occur. Therefore, we recommend that you do not use syntax that is similar to timestamp as of current_timestamp() - acid.data.retain.hours.

version as of expr

  • Parameters

    • version as of: a fixed syntax format, which is followed by a historical data operation version. Historical data that was stored in versions before the historical data operation version is queried based on snapshots.

    • expr: a value of the BIGINT type that is supported by MaxCompute. The value supports the following formats:

      • Constant of the BIGINT type

        Example: 3.

      • Special syntax for Transaction Table 2.0 tables

        get_latest_version(string tablename [, bigint <number>])

        • For cross-project access, the value of the tablename parameter must be in the projectName.tableName format.

        • For a three-layer model, the value of the tablename parameter must be in the projectName.schemaName.tableName format.

        • The number parameter is optional. The default value is 1. This parameter specifies the sequence number of a data operation in reverse chronological order. For example, to obtain the version of the last data operation, set this parameter to 1. Data operations include data modifications that are initiated by users and data sorting and distribution that are initiated by the system. The returned version varies with the value of the number parameter.

  • Limits

    • Each DML operation generates a version that is strictly incremented. You can view information about all DML operations by using the show history for table / partition syntax. You can obtain the operation version from the output.

    • The historical time range for time travel queries is [CreateTableVersion, expr]. The expr parameter specifies the version of a DML operation. The CreateTableVersion parameter specifies the version of the table creation operation. The default value is 1.

    • The system obtains the time when a DML operation is committed based on the version specified by the expr parameter. If the time is more than N hours ago or the version is less than 1, an error is returned. N is specified by the acid.data.retain.hours parameter that is configured during table creation.

    • If the version specified by the expr parameter is later than the version of the last DML operation, an error is returned. We recommend that you obtain the version by using the get_latest_version function.

Parameters and limits of incremental queries

Only Transaction Table 2.0 tables support incremental queries. If you perform an incremental query, you can query historical incremental data within a historical period of time or between two versions of a source table. You can use the [timestamp | version between start_expr and end_expr] syntax to configure incremental queries.

timestamp between start_expr and end_expr

  • Parameters

    • timestamp between and: a fixed syntax format, which specifies a historical time range. Historical incremental data that was generated within the time range is queried.

    • start_expr and end_expr: The usage and limits of the two parameters are consistent with those of the expr parameter in the timestamp as of syntax.

  • Limits

    • The time range specified by (start_expr, end_expr] is a left-open, right-closed interval. The start_expr and end_expr parameters specify the time when DML operations are committed.

    • If the time specified by the start_expr parameter is more than N hours ago or is earlier than the table creation time, an error is returned. N is specified by the acid.data.retain.hours parameter.

    • If the time specified by the end_expr parameter is later than the time when the last DML operation is committed, the query result varies based on the value of the acid.incremental.query.out.of.time.range.enabled parameter:

      • If this parameter is set to the default value false, an error is returned.

      • If this parameter is set to true, all historical incremental data within the time range specified by (start_expr,end_expr] is returned.

        You can execute the ALTER TABLE statement to modify the value of this parameter. Sample statement:

        alter table mf_tt2 set tblproperties("acid.incremental.query.out.of.time.range.enabled"="true");

version between start_expr and end_expr

  • Parameters

    • version between and: a fixed syntax format, which specifies a version range of historical DML operations. Historical incremental data that was generated within the version range is returned.

    • start_expr and end_expr: The usage and limits of the two parameters are consistent with those of the expr parameter in the version as of syntax.

  • Limits

    • The version range specified by (start_expr, end_expr] is a left-open, right-closed interval. The start_expr and end_expr parameters specify the versions of DML operations.

    • The system obtains the time when a DML operation is committed based on the version specified by the start_expr parameter. If the time is more than N hours ago or the version is less than 1, an error is returned. N is specified by the acid.data.retain.hours parameter that is configured for time travel queries.

    • If the version specified by the end_expr parameter is later than the version of the last DML operation, the query result varies based on the value of the acid.incremental.query.out.of.time.range.enabled parameter. If this parameter is set to the default value false, an error is returned. If this parameter is set to true, all historical incremental data that was generated within the version range specified by (start_expr, end_expr] is returned.

Others

  • If multiple rows of records that have the same key are queried, only the latest row is returned. If the latest row is in the DELETE state, this row is filtered out. The feature of querying the update status of data in formats similar to Change Data Capture (CDC) is planned to be provided in the future.

  • You cannot query the historical tables that do not exist. For example, after you perform the drop or rename operation on a table, you cannot query the historical data of the original table.

    If you want to query the historical data in this type of table, you can restore a table and then perform queries.

  • Only upsertable transactional tables support time travel queries and incremental queries.

  • If you want to perform a time travel query and an incremental query on the same table in an SQL, you must set the timestamps or versions for the queries to the same value.

  • If you want to perform queries on a partitioned table, we recommend that you specify a partition for the query. This way, only the specified partition is queried, and less time is consumed.

  • The Multiversion Concurrency Control (MVCC) model is used for concurrent read and write transactions. This ensures that read and write operations are isolated and do not affect each other. The ReadCommitted level is supported.

  • Data that is generated by the compaction operation is not considered new data. Therefore, this type of data is not returned when you perform incremental queries.

Examples

  • Sample data

    -- Create a table. The version of this operation is 1. You can execute the show history for table mf_tt2 statement to query the version.
    create table mf_tt2 (pk bigint not null primary key, 
                      val bigint not null) 
                      partitioned by (dd string, hh string) 
                      tblproperties ("transactional"="true"); 
    -- The version of the following DML operation is 2.
    insert overwrite table mf_tt2 partition(dd='01', hh='01') 
                     values (1, 1), (2, 2), (3, 3);
    -- The version of the following DML operation is 3.
    insert into table mf_tt2 partition(dd='01', hh='01') 
                values (3, 30), (4, 4), (5, 5);
  • Query table data.

    • Query the table creation time, which is used as a reference when you specify a historical point in time.

      desc extended mf_tt2;

      The following result is returned:

      +------------------------------------------------------------------------------------+
      | Owner: ALIYUN$****_doctest@test.aliyunid.com | Project: doc_test_prod                               |
      | TableComment:                                                                      |
      +------------------------------------------------------------------------------------+
      | CreateTime:               2023-06-26 09:31:38                                      |
      | LastDDLTime:              2023-06-26 09:31:38                                      |
      | LastModifiedTime:         2023-06-26 09:32:31                                      |
      +------------------------------------------------------------------------------------+
      | InternalTable: YES      | Size: 8541                                               |
      +------------------------------------------------------------------------------------+
      | Native Columns:                                                                    |
      +------------------------------------------------------------------------------------+
      | Field    | Type   | Label | ExtendedLabel | Nullable | DefaultValue | Comment      |
      +------------------------------------------------------------------------------------+
      | pk       | bigint |       |               | false    | NULL         |              |
      | val      | bigint |       |               | false    | NULL         |              |
      +------------------------------------------------------------------------------------+
      | Partition Columns:                                                                 |
      +------------------------------------------------------------------------------------+
      | dd              | string     |                                                     |
      | hh              | string     |                                                     |
      +------------------------------------------------------------------------------------+
      | Extended Info:                                                                     |
      +------------------------------------------------------------------------------------+
      | TableID:                  bec515a56cc9492c8f906a224c62****                         |
      | IsArchived:               false                                                    |
      | PhysicalSize:             25623                                                    |
      | FileNum:                  9                                                        |
      | StoredAs:                 AliOrc                                                   |
      | CompressionStrategy:      normal                                                   |
      | ClusterType:              hash                                                     |
      | BucketNum:                16                                                       |
      | ClusterColumns:           [pk]                                                     |
      | SortColumns:              [pk ASC]                                                 |
      +------------------------------------------------------------------------------------+
    • Query the version of a historical data operation.

      show history for table mf_tt2 partition(dd='01',hh='01');

      The following result is returned:

      ID = 20230626021756157ghict5k****
      ObjectType      ObjectId                                ObjectName              VERSION(LSN)            Time                    Operation
      PARTITION       4764c8e1cb634a4fb9c21f3fc850****        dd=01/hh=01             0000000000000002        2023-06-26 09:31:56     CREATE
      PARTITION       4764c8e1cb634a4fb9c21f3fc850****        dd=01/hh=01             0000000000000003        2023-06-26 09:32:32     APPEND
  • Examples of time travel queries

    • Query all historical data that was stored before a specific point in time, such as the time specified by a string constant of the DATETIME type.

      select * from mf_tt2 timestamp as of '2023-06-26 09:33:00' 
               where dd = '01' and hh = '01';

      The following result is returned:

      +------------+------------+----+----+
      | pk         | val        | dd | hh |
      +------------+------------+----+----+
      | 1          | 1          | 01 | 01 |
      | 3          | 30         | 01 | 01 |
      | 4          | 4          | 01 | 01 |
      | 5          | 5          | 01 | 01 |
      | 2          | 2          | 01 | 01 |
      +------------+------------+----+----+
    • Query all historical data that was stored before a version specified by a constant.

      select * from mf_tt2 version as of 2 where dd = '01' and hh = '01';

      The following result is returned:

      +------------+------------+----+----+
      | pk         | val        | dd | hh |
      +------------+------------+----+----+
      | 1          | 1          | 01 | 01 |
      | 3          | 3          | 01 | 01 |
      | 2          | 2          | 01 | 01 |
      +------------+------------+----+----+
    • Query all historical data that was stored up to the current time.

      select * from mf_tt2 timestamp as of current_timestamp() where dd = '01' and hh = '01';

      The following result is returned:

      +------------+------------+----+----+
      | pk         | val        | dd | hh |
      +------------+------------+----+----+
      | 1          | 1          | 01 | 01 |
      | 3          | 30         | 01 | 01 |
      | 4          | 4          | 01 | 01 |
      | 5          | 5          | 01 | 01 |
      | 2          | 2          | 01 | 01 |
      +------------+------------+----+----+
    • Query all historical data that was stored up to 10 seconds ago.

      select * from mf_tt2 timestamp as of current_timestamp() - 10 where dd = '01' and hh = '01';

      The following result is returned:

      +------------+------------+----+----+
      | pk         | val        | dd | hh |
      +------------+------------+----+----+
      | 1          | 1          | 01 | 01 |
      | 3          | 30         | 01 | 01 |
      | 4          | 4          | 01 | 01 |
      | 5          | 5          | 01 | 01 |
      | 2          | 2          | 01 | 01 |
      +------------+------------+----+----+
    • Query all historical data that was stored up to the time when the last second operation was committed.

      • Example 1

        select * from mf_tt2 timestamp as of get_latest_timestamp('mf_tt2', 2) where dd = '01' and hh = '01';
      • Example 2

        select * from mf_tt2 version as of get_latest_version('mf_tt2', 2) where dd = '01' and hh = '01';

        The following result is returned:

        +------------+------------+----+----+
        | pk         | val        | dd | hh |
        +------------+------------+----+----+
        | 1          | 1          | 01 | 01 |
        | 3          | 3          | 01 | 01 |
        | 2          | 2          | 01 | 01 |
        +------------+------------+----+----+
  • Examples of incremental queries

    • Query the historical incremental data within a specified period of time. For example, set the start time and end time to string constants of the DATETIME type based on the time when operations are committed.

      select * from mf_tt2 timestamp between '2023-06-26 09:31:40' and '2023-06-26 09:32:00' where dd = '01' and hh = '01';

      The following result is returned:

      +------------+------------+----+----+
      | pk         | val        | dd | hh |
      +------------+------------+----+----+
      | 1          | 1          | 01 | 01 |
      | 3          | 3          | 01 | 01 |
      | 2          | 2          | 01 | 01 |
      +------------+------------+----+----+
    • Query historical incremental data between two specified versions.

      select * from mf_tt2 version between 2 and 3 where dd = '01' and hh = '01';

      The following result is returned:

      +------------+------------+----+----+
      | pk         | val        | dd | hh |
      +------------+------------+----+----+
      | 3          | 30         | 01 | 01 |
      | 4          | 4          | 01 | 01 |
      | 5          | 5          | 01 | 01 |
      +------------+------------+----+----+
    • Query historical incremental data within the most recent 300 seconds.

      • Use the default value false for the acid.incremental.query.out.of.time.range.enabled property.

        select * from mf_tt2 timestamp between current_timestamp() - 301 and current_timestamp() where dd = '01' and hh='01';

        An error is returned:

        FAILED: ODPS-0130071:[0,0] Semantic analysis exception - physical plan generation failed: 
        com.aliyun.odps.meta.exception.MetaException: com.aliyun.odps.meta.exception.MetaException: com.aliyun.odps.metadata.common.MetastoreServerException: 
        Incremental query can't exceed current version. Current version timestamp: 2023-06-26 09:32:32, input timestamp is: 2023-06-26 10:47:55
      • Set the acid.incremental.query.out.of.time.range.enabled property to true.

        alter table mf_tt2 set tblproperties("acid.incremental.query.out.of.time.range.enabled"="true");
      • Perform the query again.

        select * from mf_tt2 timestamp between current_timestamp() - 301 and current_timestamp() where dd = '01' and hh='01';

        The following result is returned:

        +------------+------------+----+----+
        | pk         | val        | dd | hh |
        +------------+------------+----+----+
        +------------+------------+----+----+
    • Query the historical incremental data between the points in time when the last two operations are committed.

      • Sample statement

        select * from mf_tt2 timestamp between get_latest_timestamp('mf_tt2', 3) and get_latest_timestamp('mf_tt2') where dd = '01' and hh = '01';
      • Sample statement

        select * from mf_tt2 version between get_latest_version('mf_tt2', 3) and get_latest_version('mf_tt2') where dd = '01' and hh = '01';
      • Returned result

        +------------+------------+----+----+
        | pk         | val        | dd | hh |
        +------------+------------+----+----+
        | 1          | 1          | 01 | 01 |
        | 3          | 30         | 01 | 01 |
        | 4          | 4          | 01 | 01 |
        | 5          | 5          | 01 | 01 |
        | 2          | 2          | 01 | 01 |
        +------------+------------+----+----+