TairCpc is a data structure based on the Compressed Probabilistic Counting (CPC) sketch. It lets you run high-performance computations on sampled data while using only a small amount of memory.
Background information
- Prevention and control of credit card fraud: In this scenario, your systems must determine whether a credit card is used in a safe environment and stop suspicious transactions at the earliest opportunity.
- Prevention and control of ticket scalping: In this scenario, your systems must identify and stop activities in real time that use virtual devices and fake IP addresses to undermine platform interests.
In these scenarios, you can use TairCpc to deduplicate real-time data by dimension and store the results as structured data in Tair databases. This provides fast access to the data and integrates storage with computing. TairCpc also supports multiple aggregation operations, so data can be aggregated within nanoseconds to support real-time risk control.
Overview
CPC is a high-performance deduplication algorithm that counts the distinct values in a data stream. It also lets you merge data blocks and deduplicate across them to obtain an overall count. For more information about CPC, see Back to the Future: an Even More Nearly Optimal Cardinality Estimation Algorithm. CPC achieves the same accuracy as HyperLogLog (HLL) with about 40% less memory.
Built on open source CPC, TairCpc reduces the error rate to 0.008%, compared with 0.67% for open source CPC and 1.95% for HLL.
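To put these percentages in perspective, the following sketch converts each quoted error rate into an approximate absolute deviation for a given true cardinality. It assumes that the quoted figures are relative errors on the estimate, which the text does not state explicitly. `expected_deviation` is a hypothetical helper, and the cardinality of one million is an arbitrary illustration.

```python
# Error rates quoted above, in percent. Treating them as relative errors
# on the estimate is an assumption made for this illustration.
ERROR_RATE_PERCENT = {"TairCpc": 0.008, "open source CPC": 0.67, "HLL": 1.95}


def expected_deviation(true_count: int, error_percent: float) -> int:
    """Approximate absolute deviation for a given relative error rate."""
    return round(true_count * error_percent / 100)


if __name__ == "__main__":
    for name, percent in ERROR_RATE_PERCENT.items():
        deviation = expected_deviation(1_000_000, percent)
        print(f"{name}: about +/-{deviation} out of 1,000,000 distinct values")
```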
Main features
- Low memory usage, incremental reads and writes, and minimal I/O
- High-performance and ultra-high-accuracy deduplication
- Reduced stable error rate
Typical scenarios
- Security systems for banks
- Flash sales
- Prevention and control of ticket scalping
Prerequisites
- For more information about performance-enhanced instances, see DRAM-based instances.
- The instance on which TairCpc data is stored is a persistent memory-optimized instance of ApsaraDB for Redis Enhanced Edition (Tair) that runs minor version 1.2.3.3 or later. For more information about persistent memory-optimized instances, see Persistent memory-optimized instances.
Precautions
The TairCpc data that you want to manage is stored on an ApsaraDB for Redis Enhanced Edition (Tair) instance.
Supported commands
Command | Syntax | Description |
CPC.UPDATE | CPC.UPDATE key item [EX|EXAT|PX|PXAT time] | Adds an item to the specified TairCpc key. If the key does not exist, the key is created. If the item already exists in the key, the item is not added. |
CPC.ESTIMATE | CPC.ESTIMATE key | Retrieves the cardinality estimate of the specified TairCpc key after deduplication. The return value is of the DOUBLE type, but you can ignore the decimals and round it to the nearest integer. |
CPC.UPDATE2EST | CPC.UPDATE2EST key item [EX|EXAT|PX|PXAT time] | Adds an item to the specified TairCpc key and returns the updated cardinality estimate of the key. If the key does not exist, the key is created. |
CPC.UPDATE2JUD | CPC.UPDATE2JUD key item [EX|EXAT|PX|PXAT time] | Adds an item to the specified TairCpc key and returns the updated cardinality estimate of the key and the difference between the original and updated estimates. If the item is added and no duplication exists, a difference of 1 is returned. If the item already exists, a difference of 0 is returned. If the key does not exist, the key is created. |
CPC.ARRAY.UPDATE | CPC.ARRAY.UPDATE key timestamp item [EX|EXAT|PX|PXAT time] [SIZE size] [WIN window_length] | Adds an item to the time window to which the specified timestamp belongs in the specified TairCpc key. If the key does not exist, the key is created. SIZE specifies the number of time windows, and WIN specifies the length of each time window in milliseconds. As data streams into the key, the key keeps only the data that falls within the time-window range, which is calculated by using the following formula: Time-window range = SIZE × WIN. Data outside this range is overwritten and deleted. SIZE and WIN take effect only when the key is created. Note: For example, to count the data generated per minute during the last 10 minutes, set SIZE to 10 (10 time windows) and WIN to 60000 (1 minute for each time window). If you then write data generated during the 11th minute to the key, the data generated during the first minute is overwritten and deleted. |
CPC.ARRAY.ESTIMATE | CPC.ARRAY.ESTIMATE key timestamp | Retrieves the cardinality estimate of a specified TairCpc key within the time window to which the specified timestamp belongs. |
CPC.ARRAY.ESTIMATE.RANGE | CPC.ARRAY.ESTIMATE.RANGE key start_time end_time | Retrieves the cardinality estimates of the time windows that reside in the specified time range in the specified TairCpc key. The time range is a closed interval. |
CPC.ARRAY.ESTIMATE.RANGE.MERGE | CPC.ARRAY.ESTIMATE.RANGE.MERGE key timestamp range | Retrieves the cardinality estimate of the specified TairCpc key after merging and deduplication from a specific point in time to the Nth time window backwards. N is the value of the range parameter. |
CPC.ARRAY.UPDATE2EST | CPC.ARRAY.UPDATE2EST key timestamp item [EX|EXAT|PX|PXAT time] [SIZE size] [WIN window_length] | Adds an item to the time window to which the specified timestamp belongs in the specified TairCpc key and returns the updated cardinality estimate of the key within the time window. If the key does not exist, the key is created. This command acts like CPC.ARRAY.UPDATE. |
CPC.ARRAY.UPDATE2JUD | CPC.ARRAY.UPDATE2JUD key timestamp item [EX|EXAT|PX|PXAT time] [SIZE size] [WIN window_length] | Adds an item to the time window to which the specified timestamp belongs in the specified TairCpc key and returns the updated cardinality estimate of the key within the time window and the difference between the original and updated estimates. If the item is added and no duplication exists, a difference of 1 is returned. If the item already exists, a difference of 0 is returned. If the key does not exist, the key is created. This command acts like CPC.ARRAY.UPDATE. |
DEL | DEL key [key ...] | Deletes one or more TairCpc keys. |
Note: The command syntax in this topic uses the following conventions:
- Uppercase keyword: the command keyword.
- Italic: words in italic indicate variable information that you supply.
- [options]: optional parameters. Parameters that are not enclosed in brackets are required.
- A|B: specifies that these parameters are mutually exclusive. Select one of two or more parameters.
- ...: specifies to repeat the preceding content.
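As a sketch of how these syntax conventions translate into a client call, the helper below builds the argument list for `CPC.UPDATE key item [EX|EXAT|PX|PXAT time]`. `build_cpc_update` and its parameter names are hypothetical and not part of any Tair client. Because this page does not spell out what each expiry keyword means, the helper only validates the shape of the option.

```python
from typing import List, Optional

# The mutually exclusive keywords from "[EX|EXAT|PX|PXAT time]".
EXPIRY_KEYWORDS = ("EX", "EXAT", "PX", "PXAT")


def build_cpc_update(key: str, item: str,
                     expiry_keyword: Optional[str] = None,
                     time: Optional[int] = None) -> List[str]:
    """Build the argument list for CPC.UPDATE (hypothetical helper)."""
    args = ["CPC.UPDATE", key, item]      # required parameters, in order
    if expiry_keyword is None:
        if time is not None:
            raise ValueError("time requires one of EX, EXAT, PX, PXAT")
        return args                        # the bracketed option is omitted
    keyword = expiry_keyword.upper()
    if keyword not in EXPIRY_KEYWORDS:
        raise ValueError(f"unknown expiry keyword: {expiry_keyword}")
    if time is None:
        raise ValueError(f"{keyword} requires a time value")
    # Only one keyword can be passed, so A|B exclusivity holds by construction.
    return args + [keyword, str(time)]
```

The resulting list can be passed to any client that accepts raw command arguments, for example `client.execute_command(*args)` in redis-py.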
CPC.UPDATE
Item | Description |
Syntax | CPC.UPDATE key item [EX|EXAT|PX|PXAT time] |
Time complexity | O(1) |
Command description | Adds an item to the specified TairCpc key. If the key does not exist, the key is created. If the item already exists in the key, the item is not added. |
Parameter |
|
Output |
|
Example | Sample command:
Sample output:
|
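The following is a minimal sketch of issuing CPC.UPDATE from Python. It assumes a client object that exposes `execute_command(*args)`, as redis-py's `Redis` class does. `RecordingClient` is a stand-in that only records commands so that the sketch runs without a Tair instance, and the key and item names are made up for illustration.

```python
from typing import Any, List, Tuple


class RecordingClient:
    """Stand-in for a real client: records commands instead of sending them."""

    def __init__(self) -> None:
        self.sent: List[Tuple[Any, ...]] = []

    def execute_command(self, *args: Any) -> None:
        self.sent.append(args)


def add_items(client: Any, key: str, items: List[str]) -> None:
    """Issue one CPC.UPDATE per item; resending a duplicate item is harmless."""
    for item in items:
        client.execute_command("CPC.UPDATE", key, item)


if __name__ == "__main__":
    client = RecordingClient()
    # Hypothetical key and items: devices seen for one credit card.
    add_items(client, "card:1234:devices", ["dev-a", "dev-b", "dev-a"])
    print(client.sent)
```

Because an item that already exists in the key is not added again, callers do not need to track which items they have already sent.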
CPC.ESTIMATE
Item | Description |
Syntax | CPC.ESTIMATE key |
Time complexity | O(1) |
Command description | Retrieves the cardinality estimate of the specified TairCpc key after deduplication. The return value is of the DOUBLE type, but you can ignore the decimals and round it to the nearest integer. |
Parameter |
|
Output |
|
Example | Sample command:
Sample output:
|
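Because the estimate is returned as a DOUBLE, client code usually converts it to an integer count. The helper below is one way to do that. `estimate_to_count` is a hypothetical name, and the assumption that the reply arrives as bytes, a string, or a float depends on the client library in use.

```python
from typing import Union


def estimate_to_count(reply: Union[bytes, str, float]) -> int:
    """Convert a CPC.ESTIMATE reply (a DOUBLE) to the nearest integer."""
    if isinstance(reply, bytes):       # some clients return raw bytes
        reply = reply.decode("ascii")
    # Python's round() resolves exact .5 ties to the even integer.
    return round(float(reply))
```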
CPC.UPDATE2EST
Item | Description |
Syntax | CPC.UPDATE2EST key item [EX|EXAT|PX|PXAT time] |
Time complexity | O(1) |
Command description | Adds an item to the specified TairCpc key and returns the updated cardinality estimate of the key. If the key does not exist, the key is created. |
Parameter |
|
Output |
|
Example | Sample command:
Sample output:
|
CPC.UPDATE2JUD
Item | Description |
Syntax | CPC.UPDATE2JUD key item [EX|EXAT|PX|PXAT time] |
Time complexity | O(1) |
Command description | Adds an item to the specified TairCpc key and returns the updated cardinality estimate of the key and the difference between the original and updated estimates. If the item is added and no duplication exists, a difference of 1 is returned. If the item already exists, a difference of 0 is returned. If the key does not exist, the key is created. |
Parameter |
|
Output |
|
Example | Sample command:
Sample output:
|
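The return contract of CPC.UPDATE2JUD can be illustrated with an exact, set-based stand-in. This is not how TairCpc stores data: a real key holds a compact sketch and returns approximate values, whereas the hypothetical `ExactJudge` class below uses a Python set purely to show when the difference is 1 and when it is 0.

```python
from typing import Dict, Set, Tuple


class ExactJudge:
    """Exact, set-based stand-in for the CPC.UPDATE2JUD return contract."""

    def __init__(self) -> None:
        self._keys: Dict[str, Set[str]] = {}

    def update2jud(self, key: str, item: str) -> Tuple[int, int]:
        seen = self._keys.setdefault(key, set())   # key is created if missing
        before = len(seen)
        seen.add(item)                              # duplicates are ignored
        after = len(seen)
        return after, after - before                # (estimate, difference)
```

In a risk-control check, a difference of 0 indicates that the item, such as a device ID, has already been seen for that key, and a difference of 1 indicates a first occurrence.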
CPC.ARRAY.UPDATE
Item | Description |
Syntax | CPC.ARRAY.UPDATE key timestamp item [EX|EXAT|PX|PXAT time] [SIZE size] [WIN window_length] |
Time complexity | O(1) |
Command description | Adds an item to the time window to which the specified timestamp belongs in the specified TairCpc key. If the key does not exist, the key is created. SIZE specifies the number of time windows, and WIN specifies the length of each time window in milliseconds. As data streams into the key, the key keeps only the data that falls within the time-window range, which is calculated by using the following formula: Time-window range = SIZE × WIN. Data outside this range is overwritten and deleted. SIZE and WIN take effect only when the key is created. Note: For example, to count the data generated per minute during the last 10 minutes, set SIZE to 10 (10 time windows) and WIN to 60000 (1 minute for each time window). If you then write data generated during the 11th minute to the key, the data generated during the first minute is overwritten and deleted. |
Parameter |
|
Output |
|
Example | Sample command:
Sample output:
|
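The SIZE and WIN behavior described above can be modeled with a small exact stand-in. The `WindowedCounter` class is hypothetical and uses Python sets instead of sketches. It also assumes that windows are aligned to multiples of WIN (window index = timestamp // WIN), which this page does not specify.

```python
from typing import Dict, Set


class WindowedCounter:
    """Exact stand-in for a windowed TairCpc key (sets instead of sketches)."""

    def __init__(self, size: int, win_ms: int) -> None:
        self.size = size          # SIZE: number of time windows kept
        self.win_ms = win_ms      # WIN: length of each window in milliseconds
        self.windows: Dict[int, Set[str]] = {}

    def update(self, timestamp_ms: int, item: str) -> None:
        index = timestamp_ms // self.win_ms        # assumed window alignment
        self.windows.setdefault(index, set()).add(item)
        newest = max(self.windows)
        # Keep only the SIZE most recent windows (range = SIZE x WIN).
        for old in [i for i in self.windows if i <= newest - self.size]:
            del self.windows[old]

    def estimate(self, timestamp_ms: int) -> int:
        """Distinct items in the window that holds timestamp_ms."""
        return len(self.windows.get(timestamp_ms // self.win_ms, set()))
```

With `size=10` and `win_ms=60000`, writing an item with a timestamp in the 11th minute drops the window for the first minute, which matches the example in the command description.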
CPC.ARRAY.ESTIMATE
Item | Description |
Syntax | CPC.ARRAY.ESTIMATE key timestamp |
Time complexity | O(1) |
Command description | Retrieves the cardinality estimate of a specified TairCpc key within the time window to which the specified timestamp belongs. |
Parameter |
|
Output |
|
Example | Sample command:
Sample output:
|
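Any timestamp that falls in the same time window yields the same estimate, so it can help to compute which window a timestamp belongs to. The helpers below sketch that arithmetic under the assumption that windows are aligned to multiples of the window length. Both function names are hypothetical.

```python
from typing import Tuple


def window_bounds(timestamp_ms: int, win_ms: int) -> Tuple[int, int]:
    """Return the [start, end) bounds in ms of the window holding timestamp_ms."""
    start = (timestamp_ms // win_ms) * win_ms
    return start, start + win_ms


def same_window(ts_a: int, ts_b: int, win_ms: int) -> bool:
    """True if both timestamps would be answered from the same window."""
    return ts_a // win_ms == ts_b // win_ms
```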
CPC.ARRAY.ESTIMATE.RANGE
Item | Description |
Syntax | CPC.ARRAY.ESTIMATE.RANGE key start_time end_time |
Time complexity | O(1) |
Command description | Retrieves the cardinality estimates of the time windows that reside in the specified time range in the specified TairCpc key. The time range is a closed interval. |
Parameter |
|
Output |
|
Example | Sample command:
Sample output:
|
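The closed-interval behavior can be sketched with exact per-window sets. In the hypothetical `estimate_range` function below, the window that contains the end time is included in the result. The alignment of windows to multiples of the window length and the reporting of empty windows as 0 are assumptions, not documented behavior.

```python
from typing import Dict, List, Set


def estimate_range(windows: Dict[int, Set[str]], win_ms: int,
                   start_ms: int, end_ms: int) -> List[int]:
    """Per-window distinct counts for windows touched by [start_ms, end_ms]."""
    first = start_ms // win_ms
    last = end_ms // win_ms        # closed interval: end's window is included
    return [len(windows.get(i, set())) for i in range(first, last + 1)]
```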
CPC.ARRAY.ESTIMATE.RANGE.MERGE
Item | Description |
Syntax | CPC.ARRAY.ESTIMATE.RANGE.MERGE key timestamp range |
Time complexity | O(1) |
Command description | Retrieves the cardinality estimate of the specified TairCpc key after merging and deduplication from a specific point in time to the Nth time window backwards. N is the value of the range parameter. |
Parameter |
|
Output |
|
Example | Sample command:
Sample output:
|
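Merging matters because an item that appears in several time windows must be counted once. The sketch below contrasts a merged count with a naive sum of per-window counts, using exact Python sets as a stand-in for sketches. The function names and the explicit list of window indexes are hypothetical; the actual command selects the windows from the timestamp and range parameters.

```python
from typing import Dict, Iterable, Set


def merged_estimate(windows: Dict[int, Set[str]], indexes: Iterable[int]) -> int:
    """Distinct count across several windows after merging them."""
    merged: Set[str] = set()
    for i in indexes:
        merged |= windows.get(i, set())   # union removes cross-window duplicates
    return len(merged)


def summed_estimate(windows: Dict[int, Set[str]], indexes: Iterable[int]) -> int:
    """Naive sum of per-window counts; overcounts items seen in several windows."""
    return sum(len(windows.get(i, set())) for i in indexes)
```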
CPC.ARRAY.UPDATE2EST
Item | Description |
Syntax | CPC.ARRAY.UPDATE2EST key timestamp item [EX|EXAT|PX|PXAT time] [SIZE size] [WIN window_length] |
Time complexity | O(1) |
Command description | Adds an item to the time window to which the specified timestamp belongs in the specified TairCpc key and returns the updated cardinality estimate of the key within the time window. If the key does not exist, the key is created. This command acts like CPC.ARRAY.UPDATE. |
Parameter |
|
Output |
|
Example | Sample command:
Sample output:
|
CPC.ARRAY.UPDATE2JUD
Item | Description |
Syntax | CPC.ARRAY.UPDATE2JUD key timestamp item [EX|EXAT|PX|PXAT time] [SIZE size] [WIN window_length] |
Time complexity | O(1) |
Command description | Adds an item to the time window to which the specified timestamp belongs in the specified TairCpc key and returns the updated cardinality estimate of the key within the time window and the difference between the original and updated estimates. If the item is added and no duplication exists, a difference of 1 is returned. If the item already exists, a difference of 0 is returned. If the key does not exist, the key is created. This command acts like CPC.ARRAY.UPDATE. |
Parameter |
|
Output |
|
Example | Sample command:
Sample output:
|
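Because the judgement is made within a single time window, the same item can count as new again once a later window starts. The hypothetical `array_update2jud` function below models this with exact sets, assuming that windows are aligned to multiples of the window length. A real key returns approximate values from a sketch.

```python
from typing import Dict, Set, Tuple


def array_update2jud(windows: Dict[int, Set[str]], win_ms: int,
                     timestamp_ms: int, item: str) -> Tuple[int, int]:
    """Add item to its window; return (window estimate, difference)."""
    seen = windows.setdefault(timestamp_ms // win_ms, set())
    before = len(seen)
    seen.add(item)                     # duplicates within the window are ignored
    return len(seen), len(seen) - before
```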