TairCpc

Last Updated: Jul 15, 2024

TairCpc is a data structure developed based on the compressed probability counting (CPC) sketch. It supports high-performance computing on sampled data with a small memory footprint.

Background information

In real-time decision-making scenarios that involve big data, a real-time computing system processes incoming business logs, an online storage system stores the processing results, and a real-time rule engine or decision-making system then makes decisions based on those results. Typical scenarios include:

  • Prevention and control of credit card fraud: In this scenario, your systems must determine whether a credit card is used in a safe environment and stop suspicious transactions at the earliest opportunity.

  • Prevention and control of ticket scalping: In this scenario, your systems must identify and stop, in real time, activities that use virtual devices and fake IP addresses to undermine platform interests.

In these scenarios, you can use TairCpc to deduplicate real-time data by dimension and store the results in a structured form in Tair. This provides fast access to the data and brings storage and computing together. TairCpc also supports multiple aggregation operations that aggregate data with very low latency, which enables real-time risk control.

Overview

CPC is a high-performance algorithm for estimating the number of distinct values (the cardinality) in a data stream. CPC sketches built from separate blocks of data can be merged, and the merged sketch yields the deduplicated total across all blocks. For more information about CPC, see Back to the Future: an Even More Nearly Optimal Cardinality Estimation Algorithm. CPC achieves the same accuracy as HyperLogLog (HLL) while using about 40% less memory.

TairCpc is built on open source CPC and reduces the error rate to 0.008%, compared with 0.67% for open source CPC and 1.95% for HLL.

Main features

  • Low memory usage, incremental reads and writes, and minimal I/O

  • High-performance and ultra-high-accuracy deduplication

  • Low and stable error rate

Typical scenarios

  • Security systems for banks

  • Flash sales

  • Prevention and control of ticket scalping

Prerequisites

The instance is of one of the following Tair series types:

Note

The latest minor version provides more features and higher stability. We recommend that you update the instance to the latest minor version. For more information, see Update the minor version of an instance. If your instance is a cluster instance or read/write splitting instance, we recommend that you update the proxy nodes in the instance to the latest minor version to ensure that all commands can be run as expected.

Precautions

The TairCpc data that you want to manage is stored on a Tair instance.

Supported commands

Table 1. TairCpc commands

Command

Syntax

Description

CPC.UPDATE

CPC.UPDATE key item [EX|EXAT|PX|PXAT time]

Adds an item to the specified TairCpc key. If the key does not exist, the key is created. If the item already exists in the key, the item is not added.

CPC.ESTIMATE

CPC.ESTIMATE key

Retrieves the cardinality estimate of the specified TairCpc key after deduplication. The estimate is returned as a DOUBLE value, which you can round to the nearest integer.

CPC.UPDATE2EST

CPC.UPDATE2EST key item [EX|EXAT|PX|PXAT time]

Adds an item to the specified TairCpc key and returns the new cardinality estimate of the key after the update. If the key does not exist, the key is created.

CPC.UPDATE2JUD

CPC.UPDATE2JUD key item [EX|EXAT|PX|PXAT time]

Adds an item to the specified TairCpc key and returns the new cardinality estimate of the key after the update, together with the difference between the new and original estimates. The difference is 1 if the item did not already exist in the key and 0 if it did. If the key does not exist, the key is created.

CPC.ARRAY.UPDATE

CPC.ARRAY.UPDATE key timestamp item [EX|EXAT|PX|PXAT time] [SIZE size] [WIN window_length]

Adds an item to the time window that contains the specified timestamp in the specified TairCpc key. If the key does not exist, the key is created. SIZE specifies the number of time windows, and WIN specifies the length of each time window in milliseconds. As data is written, the key retains only the data within a time-window range of SIZE × WIN. Data that falls outside this range is overwritten and deleted. SIZE and WIN take effect only when the key is created.

Note

For example, to count distinct items per minute over the last 10 minutes, set SIZE to 10 (10 time windows) and WIN to 60000 (1 minute per time window). If you then write data for the 11th minute to the key, the data for the first minute is overwritten and deleted.

CPC.ARRAY.ESTIMATE

CPC.ARRAY.ESTIMATE key timestamp

Retrieves the cardinality estimate of the specified TairCpc key within the time window to which the specified timestamp belongs.

CPC.ARRAY.ESTIMATE.RANGE

CPC.ARRAY.ESTIMATE.RANGE key start_time end_time

Retrieves the cardinality estimate of each time window of the specified TairCpc key within the specified time range. The time range is a closed interval, so both start_time and end_time are included.

CPC.ARRAY.ESTIMATE.RANGE.MERGE

CPC.ARRAY.ESTIMATE.RANGE.MERGE key timestamp range

Merges the time windows of the specified TairCpc key from the specified point in time back through N time windows, where N is the value of the range parameter, and returns the cardinality estimate of the merged, deduplicated data.

CPC.ARRAY.UPDATE2EST

CPC.ARRAY.UPDATE2EST key timestamp item [EX|EXAT|PX|PXAT time] [SIZE size] [WIN window_length]

Adds an item to the time window that contains the specified timestamp in the specified TairCpc key and returns the new cardinality estimate of that time window after the update. If the key does not exist, the key is created, and the SIZE and WIN parameters work in the same way as in the CPC.ARRAY.UPDATE command.

CPC.ARRAY.UPDATE2JUD

CPC.ARRAY.UPDATE2JUD key timestamp item [EX|EXAT|PX|PXAT time] [SIZE size] [WIN window_length]

Adds an item to the time window that contains the specified timestamp in the specified TairCpc key and returns the new cardinality estimate of that time window after the update, together with the difference between the new and original estimates. The difference is 1 if the item did not already exist in the time window and 0 if it did. If the key does not exist, the key is created, and the SIZE and WIN parameters work in the same way as in the CPC.ARRAY.UPDATE command.

DEL

DEL key [key ...]

Deletes one or more TairCpc keys.

Note

The following list describes the conventions for the command syntax used in this topic:

  • Uppercase keyword: indicates the command keyword.

  • Italic text: indicates variables.

  • [options]: indicates that the enclosed parameters are optional. Parameters that are not enclosed by brackets must be specified.

  • A|B: indicates that the parameters separated by the vertical bars (|) are mutually exclusive. Only one of the parameters can be specified.

  • ...: indicates that the parameter preceding this symbol can be repeatedly specified.

CPC.UPDATE

Syntax

CPC.UPDATE key item [EX|EXAT|PX|PXAT time]

Time complexity

O(1)

Command description

Adds an item to the specified TairCpc key. If the key does not exist, the key is created. If the item already exists in the key, the item is not added.

Parameter

  • key: the name of the TairCpc key that you want to manage by running this command.

  • item: the item that you want to add.

  • EX: the relative expiration time of the key in seconds. If this parameter is not specified, the key does not expire.

  • EXAT: the absolute expiration time of the key in seconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.

  • PX: the relative expiration time of the key in milliseconds. If this parameter is not specified, the key does not expire.

  • PXAT: the absolute expiration time of the key in milliseconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.
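The command shape above can be assembled on the client side. The following Python sketch is a hypothetical helper (build_cpc_update is not part of any Tair SDK) that builds the argument list and enforces that at most one of EX, EXAT, PX, and PXAT is specified, as the A|B syntax convention requires. Passing the list to a generic command method, such as execute_command in redis-py, is an assumption that you should verify against your client.

```python
from typing import List, Optional


def build_cpc_update(key: str, item: str, *, ex: Optional[int] = None,
                     exat: Optional[int] = None, px: Optional[int] = None,
                     pxat: Optional[int] = None) -> List[str]:
    """Build the arguments of CPC.UPDATE key item [EX|EXAT|PX|PXAT time]."""
    # Keep only the expiration options that were actually supplied.
    chosen = [(name, value)
              for name, value in (("EX", ex), ("EXAT", exat), ("PX", px), ("PXAT", pxat))
              if value is not None]
    # EX, EXAT, PX, and PXAT are mutually exclusive.
    if len(chosen) > 1:
        raise ValueError("specify at most one of EX, EXAT, PX, and PXAT")
    args = ["CPC.UPDATE", key, item]
    for name, value in chosen:  # runs zero times or once
        args += [name, str(value)]
    return args


# Same arguments as the sample command: CPC.UPDATE foo f1 EX 3600
print(build_cpc_update("foo", "f1", ex=3600))  # ['CPC.UPDATE', 'foo', 'f1', 'EX', '3600']
```

If no expiration option is passed, the helper omits it, which matches the documented behavior that the key then does not expire.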

Output

  • If the operation is successful, OK is returned.

  • Otherwise, an error message is returned.

Example

Sample command:

CPC.UPDATE foo f1 EX 3600

Sample output:

OK

CPC.ESTIMATE

Syntax

CPC.ESTIMATE key

Time complexity

O(1)

Command description

Retrieves the cardinality estimate of the specified TairCpc key after deduplication. The estimate is returned as a DOUBLE value, which you can round to the nearest integer.

Parameter

  • key: the name of the TairCpc key.

Output

  • If the operation is successful, the DOUBLE-type estimate is returned.

  • Otherwise, an error message is returned.

Example

Sample command:

CPC.ESTIMATE foo

Sample output:

"19.000027716212127"
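The sample output is a DOUBLE value sent as a string. The following Python sketch shows one way to convert such a reply into an integer count on the client side. parse_estimate is a hypothetical helper, and the assumption that the client returns either str or bytes depends on the client library and its decoding settings.

```python
from typing import Union


def parse_estimate(reply: Union[str, bytes]) -> int:
    """Turn a CPC.ESTIMATE reply (a DOUBLE sent as a string) into an integer count."""
    # Many clients return bulk strings as bytes; decode them first.
    if isinstance(reply, bytes):
        reply = reply.decode("ascii")
    # Drop the decimals by rounding to the nearest integer.
    return round(float(reply))


print(parse_estimate("19.000027716212127"))  # 19
```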

CPC.UPDATE2EST

Syntax

CPC.UPDATE2EST key item [EX|EXAT|PX|PXAT time]

Time complexity

O(1)

Command description

Adds an item to the specified TairCpc key and returns the new cardinality estimate of the key after the update. If the key does not exist, the key is created.

Parameter

  • key: the name of the TairCpc key that you want to manage by running this command.

  • item: the item that you want to add.

  • EX: the relative expiration time of the key in seconds. If this parameter is not specified, the key does not expire.

  • EXAT: the absolute expiration time of the key in seconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.

  • PX: the relative expiration time of the key in milliseconds. If this parameter is not specified, the key does not expire.

  • PXAT: the absolute expiration time of the key in milliseconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.

Output

  • If the operation is successful, the DOUBLE-type estimate after the update is returned.

  • Otherwise, an error message is returned.

Example

Sample command:

CPC.UPDATE2EST foo f3

Sample output:

"3.0000004768373003"

CPC.UPDATE2JUD

Syntax

CPC.UPDATE2JUD key item [EX|EXAT|PX|PXAT time]

Time complexity

O(1)

Command description

Adds an item to the specified TairCpc key and returns the new cardinality estimate of the key after the update, together with the difference between the new and original estimates. The difference is 1 if the item did not already exist in the key and 0 if it did. If the key does not exist, the key is created.

Parameter

  • key: the name of the TairCpc key that you want to manage by running this command.

  • item: the item that you want to add.

  • EX: the relative expiration time of the key in seconds. If this parameter is not specified, the key does not expire.

  • EXAT: the absolute expiration time of the key in seconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.

  • PX: the relative expiration time of the key in milliseconds. If this parameter is not specified, the key does not expire.

  • PXAT: the absolute expiration time of the key in milliseconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.

Output

  • If the operation is successful, the new estimate after the update and the difference between the original and new estimates are returned. The new estimate and the difference are both of the DOUBLE type.

  • Otherwise, an error message is returned.

Example

Sample command:

CPC.UPDATE2JUD foo f20

Sample output:

1) "20.000027716212127"    // The new cardinality estimate of the key after the update is 20. 
2) "1.0000014901183398"    // 20 - 19 = 1
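Because the difference tells you whether the item was new, CPC.UPDATE2JUD can serve as a first-seen check. The following Python sketch parses the two-element reply shown in the sample output into a rounded estimate and a boolean. parse_update2jud is a hypothetical helper; the reply shape is taken from the sample output, and the sample output of CPC.ARRAY.UPDATE2JUD has the same two-element shape.

```python
from typing import Sequence, Tuple, Union


def _to_float(value: Union[str, bytes]) -> float:
    # Clients may return bulk strings as bytes.
    return float(value.decode("ascii") if isinstance(value, bytes) else value)


def parse_update2jud(reply: Sequence[Union[str, bytes]]) -> Tuple[int, bool]:
    """Return (rounded new estimate, whether the item was new) for a CPC.UPDATE2JUD reply."""
    estimate, difference = (_to_float(v) for v in reply)
    # The difference is close to 1 for a new item and close to 0 for a duplicate.
    return round(estimate), round(difference) == 1


print(parse_update2jud(["20.000027716212127", "1.0000014901183398"]))  # (20, True)
```

In a risk-control flow, you might process an event only when the second value is True, so that repeated items are counted once.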

CPC.ARRAY.UPDATE

Syntax

CPC.ARRAY.UPDATE key timestamp item [EX|EXAT|PX|PXAT time] [SIZE size] [WIN window_length]

Time complexity

O(1)

Command description

Adds an item to the time window that contains the specified timestamp in the specified TairCpc key. If the key does not exist, the key is created. SIZE specifies the number of time windows, and WIN specifies the length of each time window in milliseconds. As data is written, the key retains only the data within a time-window range of SIZE × WIN. Data that falls outside this range is overwritten and deleted. SIZE and WIN take effect only when the key is created.

Note

For example, to count distinct items per minute over the last 10 minutes, set SIZE to 10 (10 time windows) and WIN to 60000 (1 minute per time window). If you then write data for the 11th minute to the key, the data for the first minute is overwritten and deleted.
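The following Python sketch illustrates the window arithmetic in the preceding note. It assumes that time windows are aligned to multiples of WIN counted from the UNIX epoch and that the oldest window is discarded as soon as a window SIZE windows newer is written. TairCpc's internal bookkeeping may differ, and the function names are hypothetical.

```python
def window_index(timestamp_ms: int, win_ms: int) -> int:
    """Index of the window that contains timestamp_ms.

    Assumption: windows are aligned to multiples of WIN counted from the UNIX epoch.
    """
    return timestamp_ms // win_ms


def retention_ms(size: int, win_ms: int) -> int:
    """Time-window range kept by the key: SIZE x WIN milliseconds."""
    return size * win_ms


def retained_windows(newest_index: int, size: int) -> range:
    """Indexes of the SIZE windows kept after writing to window newest_index.

    Assumption: older windows are discarded as soon as a newer window is written.
    """
    return range(newest_index - size + 1, newest_index + 1)


# The example from the note: SIZE 10, WIN 60000.
print(retention_ms(10, 60000))  # 600000 (milliseconds, that is, 10 minutes)
# Minutes are numbered from 0, so the 11th minute has index 10.
kept = retained_windows(10, 10)
print(0 in kept)  # False: the first minute is no longer retained
```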

Parameter

  • key: the name of the TairCpc key that you want to manage by running this command.

  • timestamp: the UNIX timestamp, in milliseconds, that determines the time window to which the item is added.

  • item: the item that you want to add.

  • EX: the relative expiration time of the key in seconds. If this parameter is not specified, the key does not expire.

  • EXAT: the absolute expiration time of the key in seconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.

  • PX: the relative expiration time of the key in milliseconds. If this parameter is not specified, the key does not expire.

  • PXAT: the absolute expiration time of the key in milliseconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.

  • SIZE: the number of time windows. Default value: 10. Valid values: 1 to 1000. We recommend that you set this parameter to a value that is less than 120.

  • WIN: the length of each time window. Unit: milliseconds. Default value: 60000. 60000 milliseconds are equal to 1 minute.
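The documented limits on SIZE can be checked before a command is sent. The following Python sketch is a hypothetical helper (build_cpc_array_update is not a Tair SDK function) that builds the arguments of CPC.ARRAY.UPDATE, rejects SIZE values outside 1 to 1000, and warns when SIZE is not less than 120. The expiration options are omitted for brevity, and the rule that WIN must be positive is an assumption because no valid range for WIN is listed above.

```python
import warnings
from typing import List, Optional


def build_cpc_array_update(key: str, timestamp_ms: int, item: str, *,
                           size: Optional[int] = None,
                           win_ms: Optional[int] = None) -> List[str]:
    """Build CPC.ARRAY.UPDATE key timestamp item [SIZE size] [WIN window_length]."""
    args = ["CPC.ARRAY.UPDATE", key, str(timestamp_ms), item]
    if size is not None:
        # Documented range: 1 to 1000; values less than 120 are recommended.
        if not 1 <= size <= 1000:
            raise ValueError("SIZE must be between 1 and 1000")
        if size >= 120:
            warnings.warn("SIZE values less than 120 are recommended")
        args += ["SIZE", str(size)]
    if win_ms is not None:
        # Assumption: WIN must be a positive number of milliseconds.
        if win_ms <= 0:
            raise ValueError("WIN must be positive")
        args += ["WIN", str(win_ms)]
    # SIZE and WIN take effect only if this call creates the key.
    return args


print(build_cpc_array_update("foo", 1645584510000, "f1", size=60, win_ms=10000))
# ['CPC.ARRAY.UPDATE', 'foo', '1645584510000', 'f1', 'SIZE', '60', 'WIN', '10000']
```

Because SIZE and WIN are applied only when the key is created, a check like this is most useful on the first write to a new key.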

Output

  • If the operation is successful, OK is returned.

  • Otherwise, an error message is returned.

Example

Sample command:

CPC.ARRAY.UPDATE foo 1645584510000 f1 SIZE 120 WIN 10000

Sample output:

OK

CPC.ARRAY.ESTIMATE

Syntax

CPC.ARRAY.ESTIMATE key timestamp

Time complexity

O(1)

Command description

Retrieves the cardinality estimate of the specified TairCpc key within the time window to which the specified timestamp belongs.

Parameter

  • key: the name of the TairCpc key that you want to manage by running this command.

  • timestamp: the UNIX timestamp, in milliseconds, that identifies the time window to query.

Output

  • If the operation is successful, the cardinality estimate of the key within the time window is returned.

  • Otherwise, an error message is returned.

Example

Sample command:

CPC.ARRAY.ESTIMATE foo 1645584532000

Sample output:

"2"

CPC.ARRAY.ESTIMATE.RANGE

Syntax

CPC.ARRAY.ESTIMATE.RANGE key start_time end_time

Time complexity

O(1)

Command description

Retrieves the cardinality estimate of each time window of the specified TairCpc key within the specified time range. The time range is a closed interval, so both start_time and end_time are included.

Parameter

  • key: the name of the TairCpc key that you want to manage by running this command.

  • start_time: the beginning of the time range to query. Unit: milliseconds. The value must be a UNIX timestamp.

  • end_time: the end of the time range to query. Unit: milliseconds. The value must be a UNIX timestamp.

Output

  • If the operation is successful, the cardinality estimates of the key across the time windows are returned.

  • Otherwise, an error message is returned.

Example

Sample command:

CPC.ARRAY.ESTIMATE.RANGE foo 1645584510000 1645584550000

Sample output:

1) "2"
2) "0"
3) "1"
4) "0"
5) "0"

CPC.ARRAY.ESTIMATE.RANGE.MERGE

Syntax

CPC.ARRAY.ESTIMATE.RANGE.MERGE key timestamp range

Time complexity

O(1)

Command description

Merges the time windows of the specified TairCpc key from the specified point in time back through N time windows, where N is the value of the range parameter, and returns the cardinality estimate of the merged, deduplicated data.

Parameter

  • key: the name of the TairCpc key that you want to manage by running this command.

  • timestamp: the point in time from which the time windows are counted. Unit: milliseconds. The value must be a UNIX timestamp.

  • range: the number of time windows to query.

Output

  • If the operation is successful, the cardinality estimate of the key after deduplication in the specified time range is returned.

  • Otherwise, an error message is returned.

Example

Sample command:

CPC.ARRAY.ESTIMATE.RANGE.MERGE foo 1645584510000 3

Sample output:

"6"
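A merged estimate is generally not the sum of the per-window estimates, because an item that appears in several windows is counted only once after merging. The following Python sketch illustrates this with exact Python sets as a stand-in. TairCpc merges CPC sketches and returns an estimate rather than an exact count, and the window contents below are hypothetical.

```python
from typing import Iterable, Set


def sum_of_window_counts(windows: Iterable[Set[str]]) -> int:
    """Add up per-window distinct counts; an item seen in several windows counts several times."""
    return sum(len(w) for w in windows)


def merged_distinct_count(windows: Iterable[Set[str]]) -> int:
    """Union the windows first and then count, which removes cross-window duplicates."""
    merged: Set[str] = set()
    for w in windows:
        merged |= w
    return len(merged)


# Hypothetical contents of three consecutive time windows.
windows = [{"ip1", "ip2"}, {"ip2"}, {"ip1", "ip3"}]
print(sum_of_window_counts(windows))   # 5
print(merged_distinct_count(windows))  # 3
```

For questions such as "how many distinct IP addresses did this account use in the last N windows", the merged value is usually the one you want.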

CPC.ARRAY.UPDATE2EST

Syntax

CPC.ARRAY.UPDATE2EST key timestamp item [EX|EXAT|PX|PXAT time] [SIZE size] [WIN window_length]

Time complexity

O(1)

Command description

Adds an item to the time window that contains the specified timestamp in the specified TairCpc key and returns the new cardinality estimate of that time window after the update. If the key does not exist, the key is created, and the SIZE and WIN parameters work in the same way as in the CPC.ARRAY.UPDATE command.

Parameter

  • key: the name of the TairCpc key that you want to manage by running this command.

  • timestamp: the UNIX timestamp, in milliseconds, that determines the time window to which the item is added.

  • item: the item that you want to add.

  • EX: the relative expiration time of the key in seconds. If this parameter is not specified, the key does not expire.

  • EXAT: the absolute expiration time of the key in seconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.

  • PX: the relative expiration time of the key in milliseconds. If this parameter is not specified, the key does not expire.

  • PXAT: the absolute expiration time of the key in milliseconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.

  • SIZE: the number of time windows. Default value: 10. Valid values: 1 to 1000. We recommend that you set this parameter to a value that is less than 120.

  • WIN: the length of each time window. Unit: milliseconds. Default value: 60000. 60000 milliseconds are equal to 1 minute.

Output

  • If the operation is successful, the new cardinality estimate of the time window after the update is returned.

  • Otherwise, an error message is returned.

Example

Sample command:

CPC.ARRAY.UPDATE2EST foo 1645584530000 f3

Sample output:

"3"

CPC.ARRAY.UPDATE2JUD

Syntax

CPC.ARRAY.UPDATE2JUD key timestamp item [EX|EXAT|PX|PXAT time] [SIZE size] [WIN window_length]

Time complexity

O(1)

Command description

Adds an item to the time window that contains the specified timestamp in the specified TairCpc key and returns the new cardinality estimate of that time window after the update, together with the difference between the new and original estimates. The difference is 1 if the item did not already exist in the time window and 0 if it did. If the key does not exist, the key is created, and the SIZE and WIN parameters work in the same way as in the CPC.ARRAY.UPDATE command.

Parameter

  • key: the name of the TairCpc key that you want to manage by running this command.

  • timestamp: the UNIX timestamp, in milliseconds, that determines the time window to which the item is added.

  • item: the item that you want to add.

  • EX: the relative expiration time of the key in seconds. If this parameter is not specified, the key does not expire.

  • EXAT: the absolute expiration time of the key in seconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.

  • PX: the relative expiration time of the key in milliseconds. If this parameter is not specified, the key does not expire.

  • PXAT: the absolute expiration time of the key in milliseconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.

  • SIZE: the number of time windows. Default value: 10. Valid values: 1 to 1000. We recommend that you set this parameter to a value that is less than 120.

  • WIN: the length of each time window. Unit: milliseconds. Default value: 60000. 60000 milliseconds are equal to 1 minute.

Output

  • If the operation is successful, the new estimate within the time window after the update and the difference between the original and new estimates are returned.

  • Otherwise, an error message is returned.

Example

Sample command:

CPC.ARRAY.UPDATE2JUD foo 1645584530000 f7

Sample output:

1) "8"            // The new cardinality estimate of the key after the update is 8. 
2) "1"            // 8 - 7 = 1