TairCpc is a data structure based on the Compressed Probabilistic Counting (CPC) sketch. It lets you run high-performance computations on sampled data while using only a small amount of memory.

Background information

In real-time decision-making scenarios that involve big data, the real-time computing system processes incoming business logs, the online storage system stores the processing results, and then the real-time rule-based or decision-making system makes decisions. Sample scenarios:
  • Prevention and control of credit card fraud: In this scenario, your systems must determine whether a credit card is used in a safe environment and stop suspicious transactions at the earliest opportunity.
  • Prevention and control of ticket scalping: In this scenario, your systems must identify and stop activities in real time that use virtual devices and fake IP addresses to undermine platform interests.

In these scenarios, you can use TairCpc to deduplicate real-time data by dimension and store the results in a structured form in Tair databases. This provides fast data access and integrates storage with computing. TairCpc also supports multiple aggregation operations, so data can be aggregated within nanoseconds to support real-time risk control.

Overview

CPC is a high-performance deduplication algorithm that counts the distinct values in a data stream. It also lets you merge data blocks and deduplicate across them to obtain a total distinct count. For more information about CPC, see Back to the Future: an Even More Nearly Optimal Cardinality Estimation Algorithm. CPC achieves the same accuracy as HyperLogLog (HLL) with about 40% less memory.

TairCpc is developed based on open source CPC and reduces the error rate to 0.008%, compared with 0.67% for open source CPC and 1.95% for HLL.

Main features
  • Low memory usage, incremental reads and writes, and minimal I/O
  • High-performance and ultra-high-accuracy deduplication
  • Reduced stable error rate
Typical scenarios
  • Security systems for banks
  • Flash sales
  • Prevention and control of ticket scalping

Prerequisites

The instance meets one of the following requirements:
  • It is a performance-enhanced instance of the ApsaraDB for Redis Enhanced Edition (Tair) that runs minor version 1.7.20 or later. For more information about performance-enhanced instances, see Performance-enhanced instances.
  • It is a persistent memory-optimized instance of the ApsaraDB for Redis Enhanced Edition (Tair) that runs minor version 1.2.3.3 or later. For more information about persistent memory-optimized instances, see Persistent memory-optimized instances.
Note The latest minor version provides more features and higher stability. We recommend that you update your instance to the latest minor version. For more information, see Update the minor version. If your instance is a cluster instance or read/write splitting instance, we recommend that you update the proxy nodes in the instance to the latest minor version. This ensures that all commands can be run as expected. For more information about cluster instances and read/write splitting instances, see Cluster master-replica instances and Read/write splitting instances.

Precautions

The TairCpc data to be managed is stored on the instance.

Supported commands

Table 1. TairCpc commands
Command Syntax Description
CPC.UPDATE CPC.UPDATE key item [EX|EXAT|PX|PXAT time]

Adds an item to the specified TairCpc key. If the key does not exist, the key is created. If the item already exists in the key, the item is not added.

CPC.ESTIMATE CPC.ESTIMATE key

Retrieves the cardinality estimate of the specified TairCpc key after deduplication. The return value is of the DOUBLE type, but you can ignore the decimals and round it to the nearest integer.

CPC.UPDATE2EST CPC.UPDATE2EST key item [EX|EXAT|PX|PXAT time]

Adds an item to the specified TairCpc key and returns the updated cardinality estimate of the key. If the key does not exist, the key is created.

CPC.UPDATE2JUD CPC.UPDATE2JUD key item [EX|EXAT|PX|PXAT time]

Adds an item to the specified TairCpc key and returns the updated cardinality estimate of the key and the difference between the original and updated estimates. If the item is added and no duplication exists, a difference of 1 is returned. If the item already exists, a difference of 0 is returned. If the key does not exist, the key is created.

CPC.ARRAY.UPDATE CPC.ARRAY.UPDATE key timestamp item [EX|EXAT|PX|PXAT time] [SIZE size] [WIN window_length]
Adds an item to the time window that contains the specified timestamp in the specified TairCpc key. If the key does not exist, the key is created. SIZE specifies the number of time windows, and WIN specifies the length of each time window in milliseconds. As data is written, the key retains only the data that falls within the time-window range, which is calculated by using the following formula: Time-window range = SIZE × WIN. Data that falls outside this range is overwritten and deleted. SIZE and WIN take effect only when the key is created.
Note For example, if you want to calculate the amount of data in the key that was generated per minute during the last 10 minutes, you can set SIZE to 10 (10 time windows) and WIN to 60000 (1 minute for each time window). In this case, if you write the data that was generated during the 11th minute to the key, the data that was generated during the first minute is overwritten and deleted.
CPC.ARRAY.ESTIMATE CPC.ARRAY.ESTIMATE key timestamp

Retrieves the cardinality estimate of a specified TairCpc key within the time window to which the specified timestamp belongs.

CPC.ARRAY.ESTIMATE.RANGE CPC.ARRAY.ESTIMATE.RANGE key start_time end_time

Retrieves the cardinality estimates of the time windows that reside in the specified time range in the specified TairCpc key. The time range is a closed interval.

CPC.ARRAY.ESTIMATE.RANGE.MERGE CPC.ARRAY.ESTIMATE.RANGE.MERGE key timestamp range

Merges the time windows in the range that starts at the specified timestamp and spans the specified length, deduplicates items across those windows, and retrieves the resulting cardinality estimate of the specified TairCpc key.

CPC.ARRAY.UPDATE2EST CPC.ARRAY.UPDATE2EST key timestamp item [EX|EXAT|PX|PXAT time] [SIZE size] [WIN window_length]

Adds an item to the time window to which the specified timestamp belongs in the specified TairCpc key and returns the updated cardinality estimate of the key within the time window. If the key does not exist, the key is created. This command acts like CPC.ARRAY.UPDATE.

CPC.ARRAY.UPDATE2JUD CPC.ARRAY.UPDATE2JUD key timestamp item [EX|EXAT|PX|PXAT time] [SIZE size] [WIN window_length]

Adds an item to the time window to which the specified timestamp belongs in the specified TairCpc key and returns the updated cardinality estimate of the key within the time window and the difference between the original and updated estimates. If the item is added and no duplication exists, a difference of 1 is returned. If the item already exists, a difference of 0 is returned. If the key does not exist, the key is created. This command acts like CPC.ARRAY.UPDATE.

DEL DEL key [key ...]

Deletes one or more TairCpc keys.

Note The following list describes the command syntax conventions used in this topic:
  • Uppercase keyword: the command keyword.
  • Italic: variable information that you supply.
  • [options]: optional parameters. Parameters that are not enclosed in brackets are required.
  • A|B: mutually exclusive parameters. Specify only one of them.
  • ...: indicates that the preceding content can be repeated.

CPC.UPDATE

Item Description
Syntax CPC.UPDATE key item [EX|EXAT|PX|PXAT time]
Time complexity O(1)
Command description

Adds an item to the specified TairCpc key. If the key does not exist, the key is created. If the item already exists in the key, the item is not added.

Parameter
  • key: the name of the TairCpc key that you want to manage by running this command.
  • item: the item that you want to add.
  • EX: the relative expiration time of the key. Unit: seconds. If this parameter is not specified, the key does not expire.
  • EXAT: the absolute expiration time of the key. Unit: seconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.
  • PX: the relative expiration time of the key. Unit: milliseconds. If this parameter is not specified, the key does not expire.
  • PXAT: the absolute expiration time of the key. Unit: milliseconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.
Output
  • If the operation is successful, OK is returned.
  • Otherwise, an error message is returned.
Example

Sample command:

CPC.UPDATE foo f1 EX 3600

Sample output:

OK
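The syntax above can also be assembled programmatically. The following Python sketch is illustrative only: build_cpc_update and cpc_update are hypothetical helper names, and the client argument is assumed to be any object that exposes an execute_command(*args) method, such as a redis-py client connected to a Tair instance.

```python
from typing import List, Optional, Union

Arg = Union[str, int]


def build_cpc_update(key: str, item: str,
                     expire_opt: Optional[str] = None,
                     expire_time: Optional[int] = None) -> List[Arg]:
    """Build the argument list for CPC.UPDATE key item [EX|EXAT|PX|PXAT time]."""
    args: List[Arg] = ["CPC.UPDATE", key, item]
    if expire_opt is not None:
        opt = expire_opt.upper()
        # The four expiration options are mutually exclusive, so accept one at most.
        if opt not in ("EX", "EXAT", "PX", "PXAT"):
            raise ValueError(f"unsupported expiration option: {expire_opt}")
        if expire_time is None:
            raise ValueError("an expiration option requires a time value")
        args += [opt, expire_time]
    return args


def cpc_update(client, key, item, expire_opt=None, expire_time=None):
    """Send CPC.UPDATE through a client that exposes execute_command(*args).

    The client type is an assumption; it is not part of TairCpc itself.
    """
    return client.execute_command(
        *build_cpc_update(key, item, expire_opt, expire_time))


print(build_cpc_update("foo", "f1", "EX", 3600))
# prints ['CPC.UPDATE', 'foo', 'f1', 'EX', 3600]
```

Keeping the argument builder separate from the network call makes the command construction easy to check without a running instance.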

CPC.ESTIMATE

Item Description
Syntax CPC.ESTIMATE key
Time complexity O(1)
Command description

Retrieves the cardinality estimate of the specified TairCpc key after deduplication. The return value is of the DOUBLE type, but you can ignore the decimals and round it to the nearest integer.

Parameter
  • key: the name of the TairCpc key.
Output
  • If the operation is successful, the DOUBLE-typed estimate is returned.
  • Otherwise, an error message is returned.
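Because the estimate is returned as a DOUBLE-typed value, a client usually converts it to an integer count. The following Python sketch shows one way to do this. The function name estimate_to_int is hypothetical, and the sketch assumes that the client delivers the reply as a string, as bytes, or as a float.

```python
from typing import Union


def estimate_to_int(reply: Union[str, bytes, float]) -> int:
    """Round a DOUBLE-typed cardinality estimate to the nearest integer."""
    if isinstance(reply, bytes):
        # Some clients return raw bytes unless response decoding is enabled.
        reply = reply.decode("ascii")
    return round(float(reply))


print(estimate_to_int("19.000027716212127"))  # prints 19
```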
Example

Sample command:

CPC.ESTIMATE foo

Sample output:

"19.000027716212127"

CPC.UPDATE2EST

Item Description
Syntax CPC.UPDATE2EST key item [EX|EXAT|PX|PXAT time]
Time complexity O(1)
Command description

Adds an item to the specified TairCpc key and returns the updated cardinality estimate of the key. If the key does not exist, the key is created.

Parameter
  • key: the name of the TairCpc key that you want to manage by running this command.
  • item: the item that you want to add.
  • EX: the relative expiration time of the key. Unit: seconds. If this parameter is not specified, the key does not expire.
  • EXAT: the absolute expiration time of the key. Unit: seconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.
  • PX: the relative expiration time of the key. Unit: milliseconds. If this parameter is not specified, the key does not expire.
  • PXAT: the absolute expiration time of the key. Unit: milliseconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.
Output
  • If the operation is successful, the DOUBLE-typed updated estimate is returned.
  • Otherwise, an error message is returned.
Example

Sample command:

CPC.UPDATE2EST foo f3

Sample output:

"3.0000004768373003"

CPC.UPDATE2JUD

Item Description
Syntax CPC.UPDATE2JUD key item [EX|EXAT|PX|PXAT time]
Time complexity O(1)
Command description

Adds an item to the specified TairCpc key and returns the updated cardinality estimate of the key and the difference between the original and updated estimates. If the item is added and no duplication exists, a difference of 1 is returned. If the item already exists, a difference of 0 is returned. If the key does not exist, the key is created.

Parameter
  • key: the name of the TairCpc key that you want to manage by running this command.
  • item: the item that you want to add.
  • EX: the relative expiration time of the key. Unit: seconds. If this parameter is not specified, the key does not expire.
  • EXAT: the absolute expiration time of the key. Unit: seconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.
  • PX: the relative expiration time of the key. Unit: milliseconds. If this parameter is not specified, the key does not expire.
  • PXAT: the absolute expiration time of the key. Unit: milliseconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.
Output
  • If the operation is successful, the DOUBLE-typed updated estimate and difference are returned.
  • Otherwise, an error message is returned.
Example

Sample command:

CPC.UPDATE2JUD foo f20

Sample output:

1) "20.000027716212127"    // The updated cardinality estimate of the key is 20. 
2) "1.0000014901183398"    // 20 - 19 = 1
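The two-element reply can be turned into a rounded estimate and a flag that indicates whether the item was new. The following Python sketch is one possible interpretation of the reply and is not part of TairCpc. The function name parse_update2jud is hypothetical, and the sketch assumes that the reply arrives as a sequence of two strings or bytes values in the order shown in the sample output.

```python
from typing import Sequence, Tuple, Union

Reply = Union[str, bytes]


def _to_float(value: Reply) -> float:
    """Decode bytes if needed, then parse the DOUBLE-typed value."""
    if isinstance(value, bytes):
        value = value.decode("ascii")
    return float(value)


def parse_update2jud(reply: Sequence[Reply]) -> Tuple[int, bool]:
    """Return (rounded estimate, True if the item was newly added)."""
    estimate_raw, diff_raw = reply
    estimate = round(_to_float(estimate_raw))
    # A difference near 1 means the item was new; near 0 means a duplicate.
    is_new = round(_to_float(diff_raw)) >= 1
    return estimate, is_new


print(parse_update2jud(["20.000027716212127", "1.0000014901183398"]))
# prints (20, True)
```

A rule engine could use the flag to react only when a previously unseen item appears, for example the first time a device ID is associated with a card number.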

CPC.ARRAY.UPDATE

Item Description
Syntax CPC.ARRAY.UPDATE key timestamp item [EX|EXAT|PX|PXAT time] [SIZE size] [WIN window_length]
Time complexity O(1)
Command description
Adds an item to the time window that contains the specified timestamp in the specified TairCpc key. If the key does not exist, the key is created. SIZE specifies the number of time windows, and WIN specifies the length of each time window in milliseconds. As data is written, the key retains only the data that falls within the time-window range, which is calculated by using the following formula: Time-window range = SIZE × WIN. Data that falls outside this range is overwritten and deleted. SIZE and WIN take effect only when the key is created.
Note For example, if you want to calculate the amount of data in the key that was generated per minute during the last 10 minutes, you can set SIZE to 10 (10 time windows) and WIN to 60000 (1 minute for each time window). In this case, if you write the data that was generated during the 11th minute to the key, the data that was generated during the first minute is overwritten and deleted.
Parameter
  • key: the name of the TairCpc key that you want to manage by running this command.
  • timestamp: the UNIX timestamp. Unit: milliseconds.
  • item: the item that you want to add.
  • EX: the relative expiration time of the key. Unit: seconds. If this parameter is not specified, the key does not expire.
  • EXAT: the absolute expiration time of the key. Unit: seconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.
  • PX: the relative expiration time of the key. Unit: milliseconds. If this parameter is not specified, the key does not expire.
  • PXAT: the absolute expiration time of the key. Unit: milliseconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.
  • SIZE: the number of time windows. Default value: 10. Valid values: 1 to 1000. We recommend that you set this parameter to a value that is less than 120.
  • WIN: the length of each time window. Unit: milliseconds. Default value: 60000. 60000 milliseconds are equal to 1 minute.
Output
  • If the operation is successful, OK is returned.
  • Otherwise, an error message is returned.
Example

Sample command:

CPC.ARRAY.UPDATE foo 1645584510000 f1 SIZE 120 WIN 10000

Sample output:

OK
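The retention behavior controlled by SIZE and WIN can be illustrated with a small model. The following Python sketch is not the TairCpc implementation: it uses exact sets in place of CPC sketches, the class name WindowedDistinct is hypothetical, and it assumes that windows are aligned to multiples of WIN, which this topic does not specify.

```python
from typing import Dict, Set


class WindowedDistinct:
    """Exact-set stand-in for a TairCpc array key (illustration only)."""

    def __init__(self, size: int = 10, win_ms: int = 60000) -> None:
        self.size = size        # number of retained time windows (SIZE)
        self.win_ms = win_ms    # length of each window in milliseconds (WIN)
        self.windows: Dict[int, Set[str]] = {}

    def update(self, timestamp_ms: int, item: str) -> None:
        # Assumption: a window index is the timestamp divided by WIN.
        idx = timestamp_ms // self.win_ms
        self.windows.setdefault(idx, set()).add(item)
        # Keep only the newest SIZE windows; older windows are dropped.
        newest = max(self.windows)
        for old in [i for i in self.windows if i <= newest - self.size]:
            del self.windows[old]

    def estimate(self, timestamp_ms: int) -> int:
        return len(self.windows.get(timestamp_ms // self.win_ms, set()))


w = WindowedDistinct(size=10, win_ms=60000)
for minute in range(10):              # the first through the tenth minute
    w.update(minute * 60000, f"user{minute}")
w.update(10 * 60000, "late")          # the 11th minute
print(w.estimate(0))                  # prints 0: the first minute was dropped
print(w.estimate(10 * 60000))         # prints 1
```

With SIZE set to 10 and WIN set to 60000, the model keeps 10 × 60000 ms = 10 minutes of data, which matches the formula Time-window range = SIZE × WIN.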

CPC.ARRAY.ESTIMATE

Item Description
Syntax CPC.ARRAY.ESTIMATE key timestamp
Time complexity O(1)
Command description

Retrieves the cardinality estimate of a specified TairCpc key within the time window to which the specified timestamp belongs.

Parameter
  • key: the name of the TairCpc key that you want to manage by running this command.
  • timestamp: the UNIX timestamp. Unit: milliseconds.
Output
  • If the operation is successful, the cardinality estimate of the key within the time window is returned.
  • Otherwise, an error message is returned.
Example

Sample command:

CPC.ARRAY.ESTIMATE foo 1645584532000

Sample output:

"2"

CPC.ARRAY.ESTIMATE.RANGE

Item Description
Syntax CPC.ARRAY.ESTIMATE.RANGE key start_time end_time
Time complexity O(1)
Command description

Retrieves the cardinality estimates of the time windows that reside in the specified time range in the specified TairCpc key. The time range is a closed interval.

Parameter
  • key: the name of the TairCpc key that you want to manage by running this command.
  • start_time: the beginning of the time range to query. Unit: milliseconds. The value must be a UNIX timestamp.
  • end_time: the end of the time range to query. Unit: milliseconds. The value must be a UNIX timestamp.
Output
  • If the operation is successful, the cardinality estimates of the key within the time windows are returned.
  • Otherwise, an error message is returned.
Example

Sample command:

CPC.ARRAY.ESTIMATE.RANGE foo 1645584510000 1645584550000

Sample output:

1) "2"
2) "0"
3) "1"
4) "0"
5) "0"
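To make the reply easier to read, each estimate can be paired with the start of its time window. The following Python sketch rests on several assumptions that this topic does not confirm: the estimates are returned in chronological order, start_time falls on a window boundary, and WIN is 10000 as in the CPC.ARRAY.UPDATE example. The function name label_windows is hypothetical.

```python
from typing import List, Sequence, Tuple


def label_windows(start_ms: int, win_ms: int,
                  reply: Sequence[str]) -> List[Tuple[int, int]]:
    """Pair each per-window estimate with the assumed start of its window."""
    return [(start_ms + i * win_ms, round(float(value)))
            for i, value in enumerate(reply)]


# Values taken from the sample command and sample output.
rows = label_windows(1645584510000, 10000, ["2", "0", "1", "0", "0"])
for window_start, estimate in rows:
    print(window_start, estimate)
```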

CPC.ARRAY.ESTIMATE.RANGE.MERGE

Item Description
Syntax CPC.ARRAY.ESTIMATE.RANGE.MERGE key timestamp range
Time complexity O(1)
Command description

Merges the time windows in the range that starts at the specified timestamp and spans the specified length, deduplicates items across those windows, and retrieves the resulting cardinality estimate of the specified TairCpc key.

Parameter
  • key: the name of the TairCpc key that you want to manage by running this command.
  • timestamp: the beginning of the time range to query. Unit: milliseconds. The value must be a UNIX timestamp.
  • range: the length of the time range to query, starting from timestamp. Unit: milliseconds.
Output
  • If the operation is successful, the cardinality estimate of the key after deduplication in the specified time range is returned.
  • Otherwise, an error message is returned.
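A merged estimate differs from the sum of the per-window estimates because an item that appears in several windows is counted only once. The following Python sketch illustrates the difference with exact sets instead of CPC sketches. The window contents are hypothetical and are not related to the sample output in this topic.

```python
from typing import Dict, Set

# Hypothetical per-window items; each key is a window start in milliseconds.
windows: Dict[int, Set[str]] = {
    1645584510000: {"f1", "f2"},
    1645584520000: {"f2", "f3"},
    1645584530000: {"f1", "f4"},
}

per_window = [len(items) for items in windows.values()]
summed = sum(per_window)                      # counts f1 and f2 twice
merged = len(set().union(*windows.values()))  # deduplicates across windows

print(per_window)  # prints [2, 2, 2]
print(summed)      # prints 6
print(merged)      # prints 4
```

Use the per-window results when you need a count for each window, and use the merged result when you need the number of distinct items over the whole range.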
Example

Sample command:

CPC.ARRAY.ESTIMATE.RANGE.MERGE foo 1645584510000 100000

Sample output:

"6"

CPC.ARRAY.UPDATE2EST

Item Description
Syntax CPC.ARRAY.UPDATE2EST key timestamp item [EX|EXAT|PX|PXAT time] [SIZE size] [WIN window_length]
Time complexity O(1)
Command description

Adds an item to the time window to which the specified timestamp belongs in the specified TairCpc key and returns the updated cardinality estimate of the key within the time window. If the key does not exist, the key is created. This command acts like CPC.ARRAY.UPDATE.

Parameter
  • key: the name of the TairCpc key that you want to manage by running this command.
  • timestamp: the UNIX timestamp. Unit: milliseconds.
  • item: the item that you want to add.
  • EX: the relative expiration time of the key. Unit: seconds. If this parameter is not specified, the key does not expire.
  • EXAT: the absolute expiration time of the key. Unit: seconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.
  • PX: the relative expiration time of the key. Unit: milliseconds. If this parameter is not specified, the key does not expire.
  • PXAT: the absolute expiration time of the key. Unit: milliseconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.
  • SIZE: the number of time windows. Default value: 10. Valid values: 1 to 1000. We recommend that you set this parameter to a value that is less than 120.
  • WIN: the length of each time window. Unit: milliseconds. Default value: 60000. 60000 milliseconds are equal to 1 minute.
Output
  • If the operation is successful, the updated cardinality estimate is returned.
  • Otherwise, an error message is returned.
Example

Sample command:

CPC.ARRAY.UPDATE2EST foo 1645584530000 f3

Sample output:

"3"

CPC.ARRAY.UPDATE2JUD

Item Description
Syntax CPC.ARRAY.UPDATE2JUD key timestamp item [EX|EXAT|PX|PXAT time] [SIZE size] [WIN window_length]
Time complexity O(1)
Command description

Adds an item to the time window to which the specified timestamp belongs in the specified TairCpc key and returns the updated cardinality estimate of the key within the time window and the difference between the original and updated estimates. If the item is added and no duplication exists, a difference of 1 is returned. If the item already exists, a difference of 0 is returned. If the key does not exist, the key is created. This command acts like CPC.ARRAY.UPDATE.

Parameter
  • key: the name of the TairCpc key that you want to manage by running this command.
  • timestamp: the UNIX timestamp. Unit: milliseconds.
  • item: the item that you want to add.
  • EX: the relative expiration time of the key. Unit: seconds. If this parameter is not specified, the key does not expire.
  • EXAT: the absolute expiration time of the key. Unit: seconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.
  • PX: the relative expiration time of the key. Unit: milliseconds. If this parameter is not specified, the key does not expire.
  • PXAT: the absolute expiration time of the key. Unit: milliseconds. The value must be a UNIX timestamp. If this parameter is not specified, the key does not expire.
  • SIZE: the number of time windows. Default value: 10. Valid values: 1 to 1000. We recommend that you set this parameter to a value that is less than 120.
  • WIN: the length of each time window. Unit: milliseconds. Default value: 60000. 60000 milliseconds are equal to 1 minute.
Output
  • If the operation is successful, the updated estimate and the difference within the time window are returned.
  • Otherwise, an error message is returned.
Example

Sample command:

CPC.ARRAY.UPDATE2JUD foo 1645584530000 f7

Sample output:

1) "8"            // The updated cardinality estimate of the key is 8. 
2) "1"            // 8 - 7 = 1