[MySQL] MySQL practice: optimizing big fields

By Blanche. Posted: Oct 9, 2016, 16:56.
Background
We have a production table with 100 million rows of data and an astonishing physical size of 1.2 TB. On analysis, we found it contains many big fields: ten varchar columns and one text column. A table this big is a headache for O&M: how do you run DDL on it? How do you recover it? How do you back it up?
Basics: the InnoDB on-disk storage format
Hierarchy: database > tablespaces > pages > rows > columns
The physical storage structure of InnoDB


The logical storage structure of InnoDB


The storage structure of InnoDB page
Page type
Data page (B-tree node)
Undo page (undo log page)
System page
Transaction data page (transaction system page)
Insert buffer bitmap page (insert buffer page)
Uncompressed BLOB page
Compressed BLOB page
Page size
16 KB by default (all examples below assume the default 16 KB page size unless otherwise specified).
A page must hold at least two rows of records; otherwise the structure is no longer a B+tree but degenerates into a linked list.
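
You can confirm the page size on a running instance (illustrative output, assuming the 16 KB default):

root:test> show variables like 'innodb_page_size';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| innodb_page_size | 16384 |
+------------------+-------+
1 row in set (0.00 sec)
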
Structure diagram


The storage structure of InnoDB row
Overview of InnoDB row file formats


Row structure in the COMPACT format


Differences between the COMPRESSED/DYNAMIC and COMPACT formats


String field types
CHAR(N) vs. VARCHAR(N)
Whether the column is CHAR or VARCHAR, under the COMPACT row format a NULL value occupies no storage space.
With multi-byte character sets, the actual row storage of CHAR and VARCHAR differs little.
Regardless of the character set, CHAR pads any unused length with 0x20 (the space character).
Our specification imposes no particular requirement on choosing between CHAR and VARCHAR.
VARCHAR(N): 255 vs. 256
When the actual stored length exceeds 255 bytes, the length prefix in the variable-length field list needs two bytes instead of one, i.e., each such row grows by one byte. Measurement shows the total storage increase is small and the performance impact low, but it is still worth keeping lengths within 255 bytes where possible.
Maximum limits of VARCHAR(N) and CHAR(N)
The maximum for CHAR is N <= 255.
The maximum for VARCHAR is N <= 65535. Note that the official documentation counts this limit in bytes, applied to the sum of all columns in a row, whereas the N in VARCHAR(N) counts characters.
Testing shows that 65535 is not actually attainable for a single column; 65532 is (row overhead consumes the difference).
[MySQL 5.6.27]

* The maximum limit of char is: N<=255.
root:test> create table test( a char(65535))charset=latin1 engine=innodb;
ERROR 1074 (42000): Column length too big for column 'a' (max = 255); use BLOB or TEXT instead.

* After testing, 65535 is not the real limit; 65532 is.
root:test> create table test( a varchar(65535))charset=latin1 engine=innodb;
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead. Please check the manual. You have to change some columns to TEXT or BLOBs

root:test> create table test( a varchar(65532))charset=latin1 engine=innodb;
Query OK, 0 rows affected (0.00 sec)

* The official 65535 limit is counted in bytes and applies to the sum of all columns in a row:

root:test> create table test_1( a varchar(30000),b varchar(30000),c varchar(5535))charset=latin1 engine=innodb;
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead. Please check the manual. You have to change some columns to TEXT or BLOBs

* The N in varchar(N) counts characters, not bytes. With utf8 (up to 3 bytes per character), varchar(50000) could require 150,000 bytes, which exceeds the 65535-byte row limit, so MySQL converts the column to MEDIUMTEXT:

root:test> create table test_1( a varchar(50000))charset=utf8 engine=innodb;
Query OK, 0 rows affected, 1 warning (0.00 sec)

root:test> show warnings;
+-------+------+--------------------------------------------+
| Level | Code | Message                                    |
+-------+------+--------------------------------------------+
| Note  | 1246 | Converting column 'a' from VARCHAR to TEXT |
+-------+------+--------------------------------------------+
1 row in set (0.00 sec)

root:test> show create table test_1;
+--------+-------------------------------------------------------------------------------+
| Table  | Create Table                                                                  |
+--------+-------------------------------------------------------------------------------+
| test_1 | CREATE TABLE `test_1` (
  `a` mediumtext
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+--------+-------------------------------------------------------------------------------+
1 row in set (0.00 sec)
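
For contrast, a minimal sketch (hypothetical table name test_2): with latin1 at one byte per character, the same N fits within the 65535-byte row limit and the column stays a VARCHAR:

root:test> create table test_2( a varchar(50000))charset=latin1 engine=innodb;
Query OK, 0 rows affected (0.00 sec)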


Off-page: row overflow
• Why does the off-page concept exist?
Suppose a table contains a varchar(30000) column, while an InnoDB page holds only 16,384 bytes. Where does the rest of the data go? That is the role of off-page storage: the overflowing data is moved out of the row into separate overflow pages, and the row keeps a pointer to it.
• How to identify off-pages?
You can inspect a tablespace's page types with py_innodb_page_info.py, a tool written by Jiang Chengyao. In the output below, the 243,453 uncompressed BLOB pages are the off-pages:
[root()@xx script]# python py_innodb_page_info.py t.ibd

Total number of page: 537344:
Insert Buffer Bitmap: 33
Freshly Allocated Page: 74040
File Segment inode: 1
B-tree Node: 219784
File Space Header: 1
Expand the description page: 32
Uncompressed BLOB page: 243453


• What is the impact of off-pages?
Overflowed data is no longer stored inside the B+tree.
Each overflowed value gets exclusive use of its uncompressed BLOB pages, which are not shared between rows; this is why the tablespace keeps growing.
In the test below, each row inserted into t_long is only 8 bytes longer than its t_short counterpart, yet the final file is 2 to 3 times the size.
* Table structure

root:test> show create table t_long;
+--------+---------------------------------------------------------------------------------------------------------+
| Table  | Create Table                                                                                            |
+--------+---------------------------------------------------------------------------------------------------------+
| t_long | CREATE TABLE `t_long` (
  `id` int(11) DEFAULT NULL,
  `col1` text
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+--------+---------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

root:test> show create table t_short;
+---------+----------------------------------------------------------------------------------------------------------+
| Table   | Create Table                                                                                             |
+---------+----------------------------------------------------------------------------------------------------------+
| t_short | CREATE TABLE `t_short` (
  `id` int(11) DEFAULT NULL,
  `col1` text
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+---------+----------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)


* Test case (Perl; the setup variables were not shown in the original script, so their values here are assumed)

my $cmd         = "mysql -uroot test";   # assumed client invocation
my $table_short = "t_short";
my $table_long  = "t_long";

foreach my $num (1 .. 48849) {
        # t_short rows stay just under the ~8 KB overflow threshold;
        # t_long rows exceed it by 8 bytes
        my $sql_1 = "insert into $table_short select $num,repeat('a',8090)";
        my $sql_2 = "insert into $table_long select $num,repeat('a',8098)";
        `$cmd -e " $sql_1 "`;
        `$cmd -e " $sql_2 "`;
}


* Final number of records

root:test> select count(*) from t_short;
+----------+
| count(*) |
+----------+
|    48849 |
+----------+
1 row in set (0.03 sec)

root:test> select count(*) from t_long;
+----------+
| count(*) |
+----------+
|    48849 |
+----------+
1 row in set (0.02 sec)


* Comparison of page types

[root()@xx script]# python py_innodb_page_info.py /data/mysql_data/test/t_short.ibd
Total number of page: 25344:
Insert Buffer Bitmap: 2
Freshly Allocated Page: 887
File Segment inode: 1
B-tree Node: 24452
File Space Header: 1
Expand the description page: 1



[root()@xx script]# python py_innodb_page_info.py /data/mysql_data/test/t_long.ibd
Total number of page: 60160:
Insert Buffer Bitmap: 4
Freshly Allocated Page: 8582
File Segment inode: 1
B-tree Node: 2720
File Space Header: 1
Expand the description page: 3
Uncompressed BLOB page: 48849


* Comparison of final size

[root()@xx test]# du -sh * | grep 'long\|short' | grep ibd
941M    t_long.ibd
397M    t_short.ibd

* Conclusion

t_short at around 400 MB is expected: roughly 8 KB x 48,849 rows ≈ 400 MB.

t_long wastes space because each of its 48,849 rows gets exclusive use of an uncompressed BLOB page (16 KB each) on top of its B-tree footprint.


• Under what circumstances does off-page storage occur?
Principle: whenever the total data in a row exceeds about 8 KB, off-page storage occurs (half of a 16 KB page, since a page must hold at least two rows).
So for varchar(9000), or varchar(3000) + varchar(3000) + varchar(3000): off-page storage occurs once the actual stored length exceeds about 8 KB.
So for blob and text: if the actual data in the row exceeds about 8 KB, it overflows; if it stays under 8 KB, it does not. Not every blob or text column ends up off-page. A minimal sketch of the threshold follows this list.
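
A minimal sketch, assuming a hypothetical table t_thresh and the default 16 KB page (thresholds mirror the 8090/8098 test above; the result can be confirmed with py_innodb_page_info.py). The 8,000-byte value stays in the B-tree page, while the 8,100-byte value goes to an uncompressed BLOB page:

root:test> create table t_thresh( id int, col1 text)charset=latin1 engine=innodb;
Query OK, 0 rows affected (0.00 sec)

root:test> insert into t_thresh select 1,repeat('a',8000);
Query OK, 1 row affected (0.00 sec)

root:test> insert into t_thresh select 2,repeat('a',8100);
Query OK, 1 row affected (0.00 sec)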
• Multiple big columns vs. one big column of the same total size
Multiple big columns cause multiple off-page overflows. In the comparison below, a single 21,000-byte value fits in two 16 KB BLOB pages, while three 7,000-byte values trigger three separate overflows, one BLOB page per column, even though the total data size is the same.
root:test> show create table t_3_col;
+---------+--------------------------------------+
| Table   | Create Table                         |
+---------+--------------------------------------+
| t_3_col | CREATE TABLE `t_3_col` (
  `id` int(11) DEFAULT NULL,
  `col1` varchar(7000) DEFAULT NULL,
  `col2` varchar(7000) DEFAULT NULL,
  `col3` varchar(7000) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+---------+--------------------------------------+
1 row in set (0.00 sec)

root:test> show create table t_1_col;
+---------+---------------------------------------------------------------------------------------------------------------------------------+
| Table   | Create Table                                                                                                                    |
+---------+---------------------------------------------------------------------------------------------------------------------------------+
| t_1_col | CREATE TABLE `t_1_col` (
  `id` int(11) DEFAULT NULL,
  `col1` varchar(21000) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+---------+---------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

root:test> insert into t_1_col(col1) select repeat('a',21000);
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

root:test> insert into t_3_col(col1,col2,col3) select repeat('a',7000),repeat('a',7000),repeat('a',7000);
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0


[root()@xx script]# python py_innodb_page_info.py /data/mysql_data/test/t_1_col.ibd
Total number of page: 6:
Insert Buffer Bitmap: 1
Uncompressed BLOB Page: 2
File Space Header: 1
B-tree Node: 1
File Segment inode: 1

[root()@xx script]# python py_innodb_page_info.py /data/mysql_data/test/t_3_col.ibd
Total number of page: 7:
Insert Buffer Bitmap: 1
Uncompressed BLOB Page: 3
File Space Header: 1
B-tree Node: 1
File Segment inode: 1


How to optimize big fields
If there are multiple big fields, serialize them and store them in a single column to avoid multiple off-page overflows.
Split text and other big fields out of the main table and either a) store them in a key-value store, or b) store them in a separate child table and compress them (see the sketch below).
Keep each row of data under 8 KB.
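
A minimal sketch of option (b), with hypothetical table names. ROW_FORMAT=COMPRESSED assumes innodb_file_per_table=ON and innodb_file_format=Barracuda; the COMPRESS()/UNCOMPRESS() step could equally be done in the application:

root:test> create table article( id int primary key, title varchar(200)) engine=innodb;
Query OK, 0 rows affected (0.00 sec)

root:test> create table article_body( article_id int primary key, body mediumblob) engine=innodb row_format=compressed key_block_size=8;
Query OK, 0 rows affected (0.01 sec)

root:test> insert into article_body values (1, compress(repeat('a',50000)));
Query OK, 1 row affected (0.00 sec)

root:test> select length(uncompress(body)) from article_body where article_id=1;
+--------------------------+
| length(uncompress(body)) |
+--------------------------+
|                    50000 |
+--------------------------+
1 row in set (0.00 sec)

The hot main table stays small, and the big field no longer weighs on DDL, backup, and recovery of the main table.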