edit-icon download-icon

Select and create an optimal index for faster data access

Last Updated: Mar 31, 2018

Faults caused by SQL problems emerge frequently in a database. Indexing problems are the most frequently occurring SQL problems. Common index problems include no index, implicit conversion, and improper indexing.

When no index is created for a database table referenced by a SQL statement, a full table scan is performed. If the table has a large data volume, the scan task is slow. In this case, database connections are so occupied that the specified maximum number of connections are soon reached. As a result, new requests are denied, and faults emerge.

Implicit conversions may cause an index to fail when the value passed into the conditions in a SQL query is inconsistent with the field data definitions. Here are a few examples of common implicit conversions. The CHAR data type is defined in the table schema for the field referenced in the SQL query, but a numeric field value is passed into the SQL query. Or the case-sensitive collation is defined for fields, but the case-sensitivity definitions for joined table fields may differ in case of multi-table joins. Implicit conversions may cause an index to fail, thereby resulting in the foregoing situation in which the SQL query runs slowly and database connections become drained.

Indexing policies and optimization

Create indexes

  • Create an index on a field that is frequently referenced in queries, but not frequently updated through operations like adding, deleting, or modifying the field data.
  • Use a field directly after “order by” and “group by”, which must be an indexed field.
  • Create up to six indexes in a table.
  • Make sure that the values in an indexed field have a fixed length and are short.
  • Make sure that duplicate values are not too many in an indexed field.
  • Create an index on a highly filterable field.

Index usage notes

  • When the keyword “like” is used, the index may become invalid if the prefix “%” is added.
  • Columns with null values are automatically excluded from candidate columns for indexing, because an index is usually not created on a column containing null values.
  • When the keyword “or” is used, if a field to either side of “or” is not indexed, the index becomes invalid for the other field, even if it is indexed.
  • When the operator “!=” is used, indexes are not used. This is because the indexing efficiency is low and the data range is uncertain in this case. Instead, a full table scan is performed.
  • Do not perform computation in indexed fields.
  • In case of a composite index, the “Leftmost Prefixing” principle must be followed. The first indexed field must be used when you perform a query. Otherwise, the index becomes invalid. Also try to keep the field sequence consistent with the index order.
  • Avoid implicit conversions by keeping the defined data type consistent with the data type passed into the SQL query.

No-index cases

No-index case 1

  1. View the table schema.

    1. mysql> show create table customers;
    1. CREATE TABLE `customers` (
    2. `cust_id` int(11) NOT NULL AUTO_INCREMENT,
    3. `cust_name` char(50) NOT NULL,
    4. `cust_address` char(50) DEFAULT NULL,
    5. `cust_city` char(50) DEFAULT NULL,
    6. `cust_state` char(5) DEFAULT NULL,
    7. `cust_zip` char(10) DEFAULT NULL,
    8. `cust_country` char(50) DEFAULT NULL,
    9. `cust_contact` char(50) DEFAULT NULL,
    10. `cust_email` char(255) DEFAULT NULL,
    11. PRIMARY KEY (`cust_id`),
    12. ) ENGINE=InnoDB AUTO_INCREMENT=10006 DEFAULT CHARSET=utf8
  2. Run the following statement.

    1. mysql> select * from customers where cust_zip = '44444' limit 0,1 \G;
  3. The execution plan is as follows.

    1. mysql> explain select * from customers where cust_zip = '44444' limit 0,1 \G;
    1. id: 1
    2. select_type: SIMPLE
    3. table: customers
    4. type: ALL
    5. possible_keys: NULL
    6. key: NULL
    7. key_len: NULL
    8. ref: NULL
    9. rows: 505560
    10. Extra: Using where

    In the execution plan, the value for “type” is “ALL”, which means that a full table scan is to be performed. 505,560 rows of data have to be covered for a single scan, which is a drain on performance. The following describes how to optimize queries.

  4. Add an index.

    1. mysql> alter table customers add index idx_cus(cust_zip);
  5. The execution plan is as follows.

    1. mysql> explain select * from customers where cust_zip = '44444' limit 0,1 \G;
    1. id: 1
    2. select_type: SIMPLE
    3. table: customers
    4. type: ref
    5. possible_keys: idx_cus
    6. key: idx_cus
    7. key_len: 31
    8. ref: const
    9. rows: 4555
    10. Extra: Using index condition

    In the execution plan, the value for “type” is “ref”, which means index-based equi-join queries or equi-joins of tables.

No-index case 2

  1. The table schema is the same as in the foregoing case. Run the following statement.

    1. mysql> select cust_id,cust_name,cust_zip from customers where cust_zip = '42222'order by cust_zip,cust_name;
  2. The execution plan is as follows.

    1. mysql> explain select cust_id,cust_name,cust_zip from customers where cust_zip = '42222'order by cust_zip,cust_name\G;
    1. id: 1
    2. select_type: SIMPLE
    3. table: customers
    4. type: ALL
    5. possible_keys: NULL
    6. key: NULL
    7. key_len: NULL
    8. ref: NULL
    9. rows: 505560
    10. Extra: Using filesort
  3. Add an index.

    1. mysql> alter table customers add index idx_cu_zip_name(cust_zip,cust_name);
  4. The execution plan is as follows.

    1. mysql> explain select cust_id,cust_name,cust_zip from customers where cust_zip = '42222'order by cust_zip,cust_name\G;
    1. id: 1
    2. select_type: SIMPLE
    3. table: customers
    4. type: ref
    5. possible_keys: idx_cu_zip_name
    6. key: idx_cu_zip_name
    7. key_len: 31
    8. ref: const
    9. rows: 4555
    10. Extra: Using where; Using index
    • Use a field after “order by”, which must be an indexed field.

Implicit conversions case

Implicit conversions case 1

  1. mysql> explain select * from customers where cust_zip = 44444 limit 0,1 \G;
  1. id: 1
  2. select_type: SIMPLE
  3. table: customers
  4. type: ALL
  5. possible_keys: idx_cus
  6. key: NULL
  7. key_len: NULL
  8. ref: NULL
  9. rows: 505560
  10. Extra: Using where
  1. mysql> show warnings;
  2. Warning: Cannot use range access on index 'idx_cus' due to type or collation conversion on field 'cust_zip'

In the foregoing case, the CHAR data type is defined in the table schema for the “cust_zip” field. However, a numeric value is passed into the SQL query for the field. This results in implicit conversions and causes the index to fail.

Solutions:

  • Change the data type for the “cust_zip” field to numeric.

  • Change the data type passed into the SQL query from CHAR to numeric.

Implicit conversions case 2

  1. View the table schema.

    1. mysql> show create table customers1;
    1. CREATE TABLE `customers1` (
    2. `cust_id` varchar(10) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
    3. `cust_name` char(50) NOT NULL,
    4. KEY `idx_cu_id` (`cust_id`)
    5. ) ENGINE=InnoDB DEFAULT CHARSET=utf8
    6. mysql> show create table customers2;
    7. CREATE TABLE `customers2` (
    8. `cust_id` varchar(10) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
    9. `cust_name` char(50) NOT NULL,
    10. KEY `idx_cu_id` (`cust_id`)
    11. ) ENGINE=InnoDB DEFAULT CHARSET=utf8
  2. Run the following statement.

    1. mysql> select customers1.* from customers2 left join customers1 on customers1.cust_id=customers2.cust_id where customers2.cust_id='x';
  3. The execution plan is as follows.

    1. mysql> explain select customers1.* from customers2 left join customers1 on customers1.cust_id=customers2.cust_id where customers2.cust_id='x'\G;
    1. *************************** 1. row ***************************
    2. id: 1
    3. select_type: SIMPLE
    4. table: customers2
    5. type: ref
    6. possible_keys: idx_cu_id
    7. key: idx_cu_id
    8. key_len: 33
    9. ref: const
    10. rows: 1
    11. Extra: Using where; Using index
    1. *************************** 2. row ***************************
    2. id: 1
    3. select_type: SIMPLE
    4. table: customers1
    5. type: ALL
    6. possible_keys: NULL
    7. key: NULL
    8. key_len: NULL
    9. ref: NULL
    10. rows: 1
    11. Extra: Using where; Using join buffer (Block Nested Loop)
  4. Modify “COLLATE”.

    1. mysql> alter table customers1 modify column cust_id varchar(10) COLLATE utf8_bin ;
  5. The execution plan is as follows.

    1. mysql> explain select cust_id,cust_name,cust_zip from customers where cust_zip = '42222'order by cust_zip,cust_name\G;
    1. id: 1
    2. select_type: SIMPLE
    3. table: customers2
    4. type: ref
    5. possible_keys: idx_cu_id
    6. key: idx_cu_id
    7. key_len: 33
    8. ref: const
    9. rows: 1
    10. Extra: Using where; Using index
    1. id: 1
    2. select_type: SIMPLE
    3. table: customers1
    4. type: ref
    5. possible_keys: idx_cu_id
    6. key: idx_cu_id
    7. key_len: 33
    8. ref: const
    9. rows: 1
    10. Extra: Using where

    After the COLLATE attributes for fields become consistent, indexes are used in the execution plan. Therefore, it is important to make sure that the collate attributes are consistently defined for table fields.

Conclusion

When an index is used, the SQL execution plan can be viewed using “explain”. It helps to determine whether an index is used and whether implicit conversions occur, so that an appropriate index is created. Exercise caution when creating an index as it is a complex process.

Thank you! We've received your feedback.