[MySQL]Principles of pt-table-checksum
Created#More Posted time:Sep 1, 2016 15:47 PM
MySQL: MySQL 5.6.27
OS: centos 6.6
tool: pt-table-checksum 2.2.15
What can it do
The most popular MySQL master-slave data comparison tool and the best data consistency checking tool, with no rivals
How to use it
./pt-table-checksum -hxx -P 3306 -u backup -p backup --no-check-binlog-format --databases=xx_db,yy_db,zz_db --no-check-replication-filters
How to find inconsistencies
On * slave, run:
SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
master_cnt <> this_cnt
OR master_crc <> this_crc
OR ISNULL(master_crc) <> ISNULL(this_crc))
GROUP BY db, tbl;
The general principles are available on the internet. Here I will illustrate some key points.
Suppose we have one library, one table and 100 records
When pt-table-checksum performs a comparison, it compares chunk by chunk instead of record by record.
So we can divide the 100 records into 10 chunks, with each chunk containing 10 records.
The comparison progresses one chunk after another until the comparison for all chunks are completed. So we can describe the principle with the following chunk as an example.
1. master> /*!50108 SET @@binlog_format := 'STATEMENT'*/ Set binlog-format as statement
2. master> SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ This is the isolation level of RR. RR features static data so no lock is needed.
3. master> checksums table: Use REPLACE INTO select to set this_cnt, this_crc (pass to the slave. This is actually setting the cnt,crc for each chunk of the slave). The algorithm is from: COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(primary key ) AS UNSIGNED)), 10, 16)), 0)
4. slave> When synchronization completes, start to run REPLACE INTO select and set cnt,crc for each chunk of the slave.
5. master> checksums table: update master_cnt, master_crc. This is to set the cnt, crc for each chunk of the master
6. slave> When synchronization completes, start to run update master_cnt,master_crc and set cnt,crc for each chunk of the master.
The comparison is complete. Next, we only need to locate the master_cnt <> this_cnt or OR master_crc <> this_crc record in the checksums table.
• What are the application scenarios for pt-table-checksum? Will it be used on the master database of the production environment?
1. If you are not so demanding on the consistency between the master and slave servers, data consistency can be basically achieved after you set ROW mode on the master server.
2. Although pt-table-checksum is intelligent, it introduces some issues to the server. Usually it is not used on the master server unless it is the only option.
3. For data consistency checks, we usually make comparisons on two slave servers. If the two slave servers are okay, the consistency is regarded as okay.
• It is said that the tool only works on statement format. If the server adopts the row format, can the consistency check work? Why?
1. If it is in row mode, the pt-table-checksum will report errors, but it can be solved by adding --no-check-binlog-format.
2. This tool automatically sets row-format=statement, so you don’t need to worry about that. Even if the row mode is set, the consistency check still works.
• How does it compare records on the master and slave databases. Aren’t master records updating all the time? Will it implement a lock?
1. For detailed principles, see the analysis above.
2. Indeed, the master records keep updating but there are no locks. The RR isolation level feature can ensure that data in the current transactions will not be changed.
• The inconsistency of the table structure of master and slave databases cannot be detected for the moment.
master: table xx( A int,B char)
slave: table xx( A int)
pt-table-checksum cannot check B field.
• Risk 1
Here the SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ is set and the session won’t be finished until the comparison is complete.
What does it mean? It means that if the session is not finished, the transaction undo records will not be purged, and the undo records will keep increasing, that is, the ibdata keeps increasing.
Solution: run the command in non-peak periods
• Risk 2
The RR isolation level ensures the consistency between select CRC values. What if the RR is damaged?
Right. Such a comparison is vulnerable to DDL damage.
Solution: DDL time window and cycle can be customized manually.
[Gordon edited the post at Sep 2, 2016 17:10 PM]