This topic describes how to select or create an optimal index to improve the read performance of an ApsaraDB RDS for MySQL instance.

Background information

When you use an RDS instance, the instance may be faulty due to SQL performance issues. Index issues such as the lack of indexes, implicit conversion, and inappropriate indexing are a common type of SQL performance issue.

  • Lack of indexes: If you want to use SQL statements to access a table that is not indexed, the RDS instance performs a full table scan. If the amount of data in the table is large, the SQL statements are executed at low speeds and the connections to the RDS instance are consumed. When the maximum number of connections that can be established to the RDS instance is reached, new requests from the application are rejected and errors occur.
  • Implicit conversion: If the data type that is specified in the SQL statements is different from the data type of the fields in the table, indexing fails. In this case, the SQL statements are executed at low speeds and pile up. As a result, the number of connections that are established to the RDS instance may exceed the maximum number of connections that are allowed.

Precautions

  • If you use the like keyword and add a percent sign (%) as a prefix, indexing fails.
  • In most cases, an index is not created on a column that contains null values. Columns that contain null values are excluded from the columns that are eligible for indexing.
  • If you use the or keyword and the field to the left or right of the or keyword is not indexed, indexing fails.
  • If you use the != operator, indexing fails. In this case, the indexed range is uncertain, and the RDS instance automatically performs full table scans. As a result, indexes cannot help significantly improve query performance.
  • Do not perform operations on indexed fields.
  • When you use a composite index to improve query performance, you must follow the leftmost prefix matching principle and use the first field of the composite index. Otherwise, indexing fails. Make sure that the order of fields is consistent with the order of indexes.
  • To prevent implicit conversions, make sure that the data type that is specified in the SQL statements is consistent with the data type of fields in the table.

Usage notes

  • Create indexes on fields that are frequently queried but are not frequently updated by add, delete, and modify operations.
  • Use indexed fields directly after ORDER BY and GROUP BY statements.
  • Make sure that the number of indexes that are created on a table does not exceed 6.
  • Make sure that the length of an indexed field is fixed and short.
  • Make sure that the number of duplicate fields among indexed fields is small.
  • Create indexes on fields of high filterability.

Example 1: Optimization for a table that has no indexes

  1. Execute the show create table customers; statement to view the schema of the table.
     CREATE TABLE `customers` (
       `cust_id` int(11) NOT NULL AUTO_INCREMENT,
       `cust_name` char(50) NOT NULL,
       `cust_address` char(50) DEFAULT NULL,
       `cust_city` char(50) DEFAULT NULL,
       `cust_state` char(5) DEFAULT NULL,
       `cust_zip` char(10) DEFAULT NULL,
       `cust_country` char(50) DEFAULT NULL,
       `cust_contact` char(50) DEFAULT NULL,
       `cust_email` char(255) DEFAULT NULL,
       PRIMARY KEY (`cust_id`),
     ) ENGINE=InnoDB AUTO_INCREMENT=10006 DEFAULT CHARSET=utf8
  2. Execute the explain select * from customers where cust_zip = '44444' limit 0,1 \G; statement to view the execution plan of the SQL statement that is used for the data query.
                id: 1
       select_type: SIMPLE
             table: customers
              type: ALL
     possible_keys: NULL
               key: NULL
           key_len: NULL
               ref: NULL
              rows: 505560
             Extra: Using where
    Note In the execution plan, the value of the type parameter is ALL. This means that the RDS instance performs a full table scan, in which total of 505,560 rows of data are scanned. This significantly deteriorates database performance.
  3. Execute the alter table customers add index idx_cus(cust_zip); statement to create an index.
  4. Re-execute the explain select * from customers where cust_zip = '44444' limit 0,1 \G; statement to view the new execution plan of the SQL statement that is used for the data query.
                id: 1
       select_type: SIMPLE
             table: customers
              type: ref
     possible_keys: idx_cus
               key: idx_cus
           key_len: 31
               ref: const
              rows: 4555
             Extra: Using index condition
    Note In the new execution plan, the value of the type parameter is ref. This means that the RDS instance performs index-based equivalent queries or equi-joins between tables. A total of 4,555 rows of data are scanned. This significantly improves database performance.

Example 2: Optimization for a table that has no indexes

  1. Execute the show create table customers; statement to view the schema of the table.
     CREATE TABLE `customers` (
       `cust_id` int(11) NOT NULL AUTO_INCREMENT,
       `cust_name` char(50) NOT NULL,
       `cust_address` char(50) DEFAULT NULL,
       `cust_city` char(50) DEFAULT NULL,
       `cust_state` char(5) DEFAULT NULL,
       `cust_zip` char(10) DEFAULT NULL,
       `cust_country` char(50) DEFAULT NULL,
       `cust_contact` char(50) DEFAULT NULL,
       `cust_email` char(255) DEFAULT NULL,
       PRIMARY KEY (`cust_id`),
     ) ENGINE=InnoDB AUTO_INCREMENT=10006 DEFAULT CHARSET=utf8
  2. Execute the explain select cust_id,cust_name,cust_zip from customers where cust_zip = '42222'order by cust_zip,cust_name\G; statement to view the execution plan of the SQL statement that is used for the data query.
                id: 1
       select_type: SIMPLE
             table: customers
              type: ALL
     possible_keys: NULL
               key: NULL
           key_len: NULL
               ref: NULL
              rows: 505560
             Extra: Using filesort
  3. Execute the alter table customers add index idx_cu_zip_name(cust_zip,cust_name); statement to create an index.
  4. Re-execute the explain select cust_id,cust_name,cust_zip from customers where cust_zip = '42222'order by cust_zip,cust_name\G; statement to view the new execution plan of the SQL statement that is used for the data query.
                id: 1
       select_type: SIMPLE
             table: customers
              type: ref
     possible_keys: idx_cu_zip_name
               key: idx_cu_zip_name
           key_len: 31
               ref: const
              rows: 4555
             Extra: Using where; Using index

Example 1: Optimization for a table on which implicit conversions occur

  1. Execute the show create table customers; statement to view the schema of the table.
     CREATE TABLE `customers` (
       `cust_id` int(11) NOT NULL AUTO_INCREMENT,
       `cust_name` char(50) NOT NULL,
       `cust_address` char(50) DEFAULT NULL,
       `cust_city` char(50) DEFAULT NULL,
       `cust_state` char(5) DEFAULT NULL,
       `cust_zip` char(10) DEFAULT NULL,
       `cust_country` char(50) DEFAULT NULL,
       `cust_contact` char(50) DEFAULT NULL,
       `cust_email` char(255) DEFAULT NULL,
       PRIMARY KEY (`cust_id`),
     ) ENGINE=InnoDB AUTO_INCREMENT=10006 DEFAULT CHARSET=utf8
  2. Execute the explain select * from customers where cust_zip = 44444 limit 0,1 \G; statement to view the execution plan of the SQL statement that is used for the data query.
              id: 1
      select_type: SIMPLE
            table: customers
             type: ALL
    possible_keys: idx_cus
              key: NULL
          key_len: NULL
              ref: NULL
             rows: 505560
            Extra: Using where
  3. Execute the show warnings; statement to view the warning message that is displayed after the previous statement is executed.
    Warning: Cannot use range access on index 'idx_cus' due to type or collation conversion on field 'cust_zip'
    Note The preceding warning message indicates that the data type of the cust_zip field is STRING but the data type that is specified in the SQL statement is NUMBER. As a result, an implicit conversion is performed, and indexes cannot be used.
  4. Use one of the following methods to prevent implicit conversions:
    • Change the data type of the cust_zip field to NUMBER.
    • Change the data type that is specified in the SQL statement to STRING.

Example 2: Optimization for two tables on which implicit conversions occur

  1. Execute the show create table customers1; statement and the show create table customers2; statement to view the schemata of the table named customers 1 and the table named customers 2.
     CREATE TABLE `customers1` (
       `cust_id` varchar(10) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
       `cust_name` char(50) NOT NULL,
       KEY `idx_cu_id` (`cust_id`)
     ) ENGINE=InnoDB DEFAULT CHARSET=utf8
    CREATE TABLE `customers2` (
       `cust_id` varchar(10) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
       `cust_name` char(50) NOT NULL,
       KEY `idx_cu_id` (`cust_id`)
     ) ENGINE=InnoDB DEFAULT CHARSET=utf8
  2. Execute the explain select customers1.* from customers2 left join customers1 on customers1.cust_id=customers2.cust_id where customers2.cust_id='x'\G; statement to view the execution plan of the SQL statement that is used for the data query.
     *************************** 1. row ***************************
                id: 1
       select_type: SIMPLE
             table: customers2
              type: ref
     possible_keys: idx_cu_id
               key: idx_cu_id
           key_len: 33
               ref: const
              rows: 1
             Extra: Using where; Using index
     *************************** 2. row ***************************
                id: 1
       select_type: SIMPLE
             table: customers1
              type: ALL
     possible_keys: NULL
               key: NULL
           key_len: NULL
               ref: NULL
              rows: 1
             Extra: Using where; Using join buffer (Block Nested Loop)
    Note The preceding results show that the data types that are specified in the character set of the cust_id field in two tables are different. As a result, an implicit conversion occurred, and indexes cannot be used.
  3. Execute the alter table customers1 modify column cust_id varchar(10) COLLATE utf8_bin; statement to modify the data type of the character set of the cust_id field in the table named customers 1 to UTF-8. After the modification, the data types of the character set of the cust_id field in two tables are consistent.
    Note If you execute this statement, the data type of the character set of the cust_id field in the table named customers 2 is modified to UTF-8.
  4. Re-execute the explain select customers1.* from customers2 left join customers1 on customers1.cust_id=customers2.cust_id where customers2.cust_id='x'\G; statement to view the new execution plan of the SQL statement that is used for the data query.
     *************************** 1. row ***************************
                id: 1
       select_type: SIMPLE
             table: customers2
              type: ref
     possible_keys: idx_cu_id
               key: idx_cu_id
           key_len: 33
               ref: const
              rows: 1
             Extra: Using where; Using index
     *************************** 2. row ***************************
                id: 1
       select_type: SIMPLE
             table: customers1
              type: ref
     possible_keys: idx_cu_id
               key: idx_cu_id
           key_len: 33
               ref: const
              rows: 1
             Extra: Using where
    Note After the data types are consistent, the index is used and improves database performance.