## 概述

• IO代价的评估方式请参见
• CPU代价的评估方式请参见

## 统计信息

• 高频值

表示常见值，例如在表t1中，a字段大小是1~100，其中1~10的值占据了95%，1-10的值就称为高频值。高频值用于等值查询，进行评估选择性。

• 直方图

表示数据值的分布情况，例如在表t1中，a字段大小是1~100，可以分为4个桶，1~25的值有30个，26~50的值有20个，51~75的值有25个，76~100的值有25个。

• 相关系数

表示某一列的物理顺序和逻辑顺序的相关性，相关性越高，走索引扫描离散块扫描代价越低。

• 其他统计信息
• 唯一值个数
• Null值比率
• 表的行数
• 表的页面数

## 选择率

• 无条件查询
``````EXPLAIN SELECT * FROM tenk1;

QUERY PLAN
-------------------------------------------------------------
Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)

SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1';

relpages | reltuples
----------+-----------
358 |     10000``````
• 范围查询
``````EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 1000;

QUERY PLAN
--------------------------------------------------------------------------------
Bitmap Heap Scan on tenk1  (cost=24.06..394.64 rows=1007 width=244)
Recheck Cond: (unique1 < 1000)
->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..23.80 rows=1007 width=0)
Index Cond: (unique1 < 1000)``````
• 范围查询计算公式
``````SELECT histogram_bounds FROM pg_stats
WHERE tablename='tenk1' AND attname='unique1';

histogram_bounds
------------------------------------------------------
{0,993,1997,3050,4040,5036,5957,7057,8029,9016,9995}

selectivity = (1 + (1000 - bucket[2].min)/(bucket[2].max - bucket[2].min))/num_buckets
= (1 + (1000 - 993)/(1997 - 993))/10
= 0.100697

rows = rel_cardinality * selectivity
= 10000 * 0.100697
= 1007  (rounding off)``````
• 等值查询
``````EXPLAIN SELECT * FROM tenk1 WHERE stringu1 = 'CRAAAA';

QUERY PLAN
----------------------------------------------------------
Seq Scan on tenk1  (cost=0.00..483.00 rows=30 width=244)
Filter: (stringu1 = 'CRAAAA'::name)``````
• 等值查询计算公式
``````SELECT null_frac, n_distinct, most_common_vals, most_common_freqs FROM pg_stats
WHERE tablename='tenk1' AND attname='stringu1';

null_frac         | 0
n_distinct        | 676
most_common_vals|{EJAAAA,BBAAAA,CRAAAA,FCAAAA,FEAAAA,GSAAAA,JOAAAA,MCAAAA,NAAAAA,WGAAAA}
most_common_freqs | {0.00333333,0.003,0.003,0.003,0.003,0.003,0.003,0.003,0.003,0.003}

selectivity = mcf[3]  = 0.003

rows = 10000 * 0.003 = 30

## 备注：如果值不在most_common_vals里面，计算公式为selectivity = (1 - sum(mvf))/(num_distinct - num_mcv)``````