InnoDB personalized backup
Blanche, Engineer
Posted: Sep 1, 2016 11:25 AM

Exploring Xtrabackup principles
Why Xtrabackup
• Commonly used backup tools [image omitted]
• Performance [image omitted]
• Logical backup vs physical backup
Advantages of logical backup: small footprint, safe
Disadvantages of logical backup: slow recovery
Advantages of physical backup: fast backup and recovery
Disadvantages of physical backup: large footprint
• Make decisions based on business scenarios
For data sizes similar to Facebook's (PB or EB scale), logical backup is recommended out of cost considerations.
For data sizes at the TB level, physical backup is recommended.
I use physical backup too, because logical backup is vulnerable to DDL damage.
Just think about it: if there are frequent DDL operations at night, and the backup tasks cannot be rescheduled at will, that introduces a huge maintenance cost.
• Physical backup: InnoDB vs MyISAM
With the MyISAM engine, you can copy files at will, as long as FLUSH TABLES has been executed.
InnoDB copies files through Xtrabackup.
Before MySQL 5.6 was released, I preferred the MyISAM engine for backup, because of its fast backup and recovery and the convenience of copying files at will.
Since MySQL 5.6, the arrival of transportable tablespaces has made individual tablespace recovery easy, so I prefer InnoDB for backup (a sketch follows this list).
One Achilles' heel of MyISAM: if you want to recover the table to an InnoDB table, you need to change the table engine - the beginning of pain.
With InnoDB there is no need to switch engines back and forth, and MyISAM will be deprecated in later MySQL versions anyway.
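For reference, here is a minimal sketch of single-table recovery with 5.6 transportable tablespaces; the table name t is a placeholder:

-- on the source instance: quiesce the table so its files can be copied safely
FLUSH TABLES t FOR EXPORT;
-- (copy t.ibd and the generated t.cfg somewhere safe)
UNLOCK TABLES;
-- on the target instance: create the same table definition, then swap the tablespace in
ALTER TABLE t DISCARD TABLESPACE;
-- (copy t.ibd and t.cfg into the target database directory)
ALTER TABLE t IMPORT TABLESPACE;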
What is a consistent backup
Simply put, it means the backed up data is from the same time point.



Basic concepts of Percona Xtrabackup
innobackupex is a script written in Perl. For convenience, it wraps Xtrabackup, so innobackupex is what is generally used for backups.
Xtrabackup itself only backs up InnoDB tables; everything else is handled by innobackupex.
In the newer versions (2.4 and 2.3.xx), innobackupex has been rewritten in C and is a symlink to xtrabackup.
Backup principles of Xtrabackup for 5.6
1. innobackupex calls xtrabackup with --suspend-at-end
2. xtrabackup starts to copy the InnoDB data files and the shared tablespace
3. After xtrabackup finishes copying the InnoDB files, it creates the file xtrabackup_suspended_2
4. Once innobackupex detects the xtrabackup_suspended_2 file, it takes the locks:
   a. FLUSH NO_WRITE_TO_BINLOG TABLES
   b. FLUSH TABLES WITH READ LOCK
5. innobackupex checks which features the db server supports, such as GTID, backup locks and changed page bitmaps
6. It then copies the non-InnoDB files and the .frm table structure files
7. After all the files are backed up, the redo log copying is stopped
8. Next, the locks are released:
   a. FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS
   b. UNLOCK TABLES
9. The xtrabackup_suspended_2 file is deleted and the tools exit.
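To make the flow concrete, a minimal full-backup invocation might look like this; the user, password and the timestamped directory name are placeholders:

# take the backup (innobackupex creates a timestamped subdirectory)
innobackupex --user=backup --password=xxxx /data/backup/
# "prepare" it: replay the copied redo log so the files reach the FTWRL consistency point
innobackupex --apply-log /data/backup/2016-09-01_11-25-00/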
[Flowchart of the backup flow omitted]
From the above flowchart, we can see that the time point of FLUSH TABLES WITH READ LOCK [FTWRL] is the backup's recovery time point, that is, the consistency point.
The data files themselves are inconsistent while they are being copied; MySQL ensures final consistency using its crash recovery mechanism and the copied redo logs.
Now suppose a scenario: InnoDB is flushing a 16 KB page, and when only 4 KB has been written, the copy thread copies the page. Haven't we manufactured a partial write?
A partial write cannot be repaired by applying redo, because the redo log is physiological: it assumes the page it applies to is intact. So what should we do?
Don't worry. The underlying pread and pwrite operations are serialized against each other, so the copy never sees a half-written page; partial writes only occur on server outages.
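If you still worry about torn pages in a copied file, one way to check, sketched here with an assumed backup file path, is to validate page checksums offline with innochecksum:

# verify every page checksum in the copied tablespace file
innochecksum /data/backup/sbtest/sbtest1.ibd
# any reported mismatch means a page was torn during the copy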
How to back up production environments
Objective of backup: Back up the master or slave data and transmit the data to a server with a high-capacity disk to keep the data for several days.
Question 1: How to transmit the data to other servers?
1. Xtrabackup has taken this issue into consideration: it can stream the backup with xbstream + ssh. But this has a fatal weakness: it does not support resumable transfer, so if the network goes down mid-backup, it is a tragedy.
2. Another solution: use Xtrabackup to back up to a local directory first, then transmit the data to the remote server using rsync [which supports resumable transfer]. The weakness here: half of the local disk space has to be set aside for the backup, and the data is copied twice. (Sketches of both follow.)
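Sketches of the two solutions; host names and paths are placeholders:

# solution 1: stream the backup to the remote server with xbstream + ssh (not resumable)
innobackupex --stream=xbstream ./ | ssh backup-host "xbstream -x -C /data/backup/"

# solution 2: back up locally, then rsync to the remote server (resumable, but 2x disk and 2x copy)
innobackupex /data/backup/
rsync -avP /data/backup/2016-09-01_11-25-00/ backup-host:/data/backup/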
So we have to look for a third solution: can we copy data to the remote server directly?
• If the files are on the master server, direct copying won't work, because the redo logs are constantly changing and rotating.


• What if the data is on a slave server, or on etl & backup servers? It is easy to see that Xtrabackup works by recording redo changes while it copies. If we stop the redo log from growing by executing STOP SLAVE, doesn't the backup principle then match Xtrabackup's? (A sketch follows below.)



One benefit of this is that you don't waste half of the disk space and don't have to transmit the data twice.
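A minimal sketch of this direct-copy approach on a slave; the datadir path and host are placeholders:

# stop replication so the redo log stops growing
mysql -e "STOP SLAVE;"
# wait until the dirty pages are flushed (see the LSN checks later in this post),
# then copy the datadir straight to the remote server, resumably
rsync -avP /data/mysql/ backup-host:/data/backup/20160901/
mysql -e "START SLAVE;"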
Secrets you don’t know
[Flowchart omitted]
From the above flowchart, we can find an important clue: our physical backup does not include the binlog [important].
So it is agonizing: for point-in-time recovery, the binlogs have to be covered separately.
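One workaround, sketched here under the assumption that the server is 5.6 or later, is to stream the binlogs separately with mysqlbinlog's remote raw mode; host, user and the starting binlog file are placeholders:

# continuously pull raw binlog files from the server onto the backup host
mysqlbinlog --read-from-remote-server --host=db-host --user=backup --password=xxxx \
            --raw --stop-never tjtx-101-141.000018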


For MySQL 5.1 & 5.5, Xtrabackup for 5.1 or 5.5 can be used; as long as innodb_flush_log_at_trx_commit=1, it works.
For MySQL 5.6, if Xtrabackup for 5.1 or 5.5 is used, transactions may be lost even with innodb_flush_log_at_trx_commit=1.
For MySQL 5.6, use Xtrabackup for 5.6 and no transactions will be lost.
Backup by directly copying InnoDB
Now let's get to the point: can we achieve free file copying, like with MyISAM?
The answer is yes.
• Method 1: follow MyISAM's example.
MyISAM allows free copying because, after a table flush, all of its files have been synchronized to disk with fsync. Copying at that moment is safe.
InnoDB is not like that: flushing a table does not force the redo log, the data pages, or ibdata to be flushed.
So we only need to solve these three problems:
redo: solved by innodb_flush_log_at_trx_commit=1.
data pages: solved by the InnoDB checkpoint.
ibdata contains the data dictionary, the change buffer, the doublewrite buffer and the undo logs of InnoDB tables; none of these can be forcefully flushed for the moment.
So we execute STOP SLAVE to at least eliminate data loss. It is okay if the ibdata contents are not fully flushed: they can be repaired automatically by the crash recovery mechanism.
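Putting Method 1 together, a sketch; setting innodb_max_dirty_pages_pct to 0 is just one way to push the checkpoint along, and the paths are placeholders:

# 1. make sure redo is flushed at every commit
mysql -e "SET GLOBAL innodb_flush_log_at_trx_commit = 1;"
# 2. push InnoDB to flush its dirty pages, then stop replication
mysql -e "SET GLOBAL innodb_max_dirty_pages_pct = 0;"
mysql -e "STOP SLAVE;"
# 3. once the LSN / Modified db pages checks below say the flush is done, copy the files
rsync -avP /data/mysql/ backup-host:/data/backup/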
After executing STOP SLAVE and copying the files, when you start MySQL on the copy for recovery, you may see the following messages:
2016-05-26 13:47:24 15818 [Note] InnoDB: The log sequence numbers 2446418 and 2446418 in ibdata files do not match the log sequence number 2446945 in the ib_logfiles!
2016-05-26 13:47:24 15818 [Note] InnoDB: Database was not shutdown normally!
2016-05-26 13:47:24 15818 [Note] InnoDB: Starting crash recovery.
2016-05-26 13:47:24 15818 [Note] InnoDB: Reading tablespace information from the .ibd files...
2016-05-26 13:47:24 15818 [Note] InnoDB: Restoring possible half-written data pages
2016-05-26 13:47:24 15818 [Note] InnoDB: from the doublewrite buffer...
InnoDB: Last MySQL binlog file position 0 506, file name tjtx-101-141.000018
2016-05-26 13:47:25 15818 [Note] InnoDB: 128 rollback segment(s) are active.
2016-05-26 13:47:25 15818 [Note] InnoDB: Waiting for purge to start
2016-05-26 13:47:26 15818 [Note] InnoDB: 5.6.27 started; log sequence number 2446945
2016-05-26 13:47:26 15818 [Note] Recovering after a crash using /data/mysql.bin/tjtx-101-141
2016-05-26 13:47:26 15818 [Note] Starting crash recovery...
2016-05-26 13:47:26 15818 [Note] Crash recovery finished.


When Crash recovery finished is displayed, the backup is okay.
But there is another question: what exactly are these LSNs?
Log sequence number 2446945  -- the current LSN in the redo log buffer, recorded in memory
Log flushed up to   2446945  -- the LSN flushed to the redo log files, recorded in every log block of the redo log
Pages flushed up to 2446945  -- the newest modification LSN of the last page in the flush list [last checkpoint newest modification], recorded in every page header
Last checkpoint at  2446945  -- the oldest modification LSN of the last page in the flush list [last checkpoint oldest modification], recorded at a fixed position in the redo log header, where it is stored in two alternating slots
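These four lines come from the LOG section of SHOW ENGINE INNODB STATUS, e.g.:

# print the LOG section, which contains the four LSN lines
mysql -e "SHOW ENGINE INNODB STATUS\G" | grep -A 5 "^LOG"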


So what does "The log sequence numbers xx in ibdata files" in the error log mean?
Run a test: kill -9 mysqld.
From it we learn that any unclean shutdown of MySQL triggers this message, even on an instance with no data at all.
So my guess is that, while MySQL runs, the LSN recorded in the ibdata files lags behind the LSN in the redo logs, and the comparison is used to detect unclean shutdowns.
In any case, it does not impede our online file copying for InnoDB.
Through the test above, we know that after executing STOP SLAVE and waiting a while for the dirty pages to be flushed, we can assume all data has been flushed to disk, and the situation is no different from MyISAM.
But another question emerges: how can we determine that the dirty pages have been fully flushed?
Check the four LSNs. When all four are identical, the flush must be complete.
Log sequence number 2446945
Log flushed up to   2446945
Pages flushed up to 2446945
Last checkpoint at  2446945


That's right: when all 4 LSNs are identical, all the dirty pages have certainly been flushed.
However, the reverse does not hold: even when the 4 LSNs are not identical, the dirty pages may still have been fully flushed. For example:
Log sequence number 2446945
Log flushed up to   2446945
Pages flushed up to 2446945
Last checkpoint at  2446747 --different


Let me explain:
Last checkpoint at is the oldest modification LSN of the last dirty page.
Pages flushed up to is the newest modification LSN of the last dirty page.
If that last dirty page was modified several times before being flushed, its oldest modification LSN lags behind its newest modification LSN.
So the two values are not consistent, yet the dirty page has indeed been flushed.

So instead you can check the Modified db pages value, which is the number of dirty pages still sitting in the flush list. If it is 0, all the dirty pages in the flush list have been flushed.

If an instance has 2 buffer pool instances, we need to check that both Modified db pages values are 0.

---BUFFER POOL 0
Buffer pool size   655359
Free buffers       1000
Database pages     636541
Old database pages 234953
Modified db pages  0  --Indicating the dirty pages in this buffer pool have been fully refreshed
Pending reads 0


---BUFFER POOL 1
Buffer pool size   655359
Free buffers       1000
Database pages     636426
Old database pages 234910
Modified db pages  0 -- Indicating the dirty pages in this buffer pool have been fully refreshed
Pending reads 0

So there are no dirty pages in the flush list.
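A quick way to check all buffer pool instances at once:

# every line must read 0 before the files are copied
mysql -e "SHOW ENGINE INNODB STATUS\G" | grep "Modified db pages"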

• Method 2
What if the dirty pages are not fully flushed after STOP SLAVE, and the LSNs are not consistent - can we still copy the files?
The answer is, of course, yes.
Inconsistent LSNs simply mean the data has to be restored from the redo logs during crash recovery. It just takes a little longer, with no impact on the big picture.
Important
No matter which method you use, if you want to copy files directly via the STOP SLAVE method, it'd be best to run:
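Presumably this refers to the FTWRL we saw at the consistency point in the backup flow, held while the files are copied:

FLUSH TABLES WITH READ LOCK;
-- (copy the files)
UNLOCK TABLES;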
In this way, it is no different from Xtrabackup.