如何发现问题

  • 自上而下,梳理痛点

    从上向下梳理计划,查看时间到底花在了什么算子上面,然后针对具体算子深入分析。

  • 查看代价,对比行数

    查看比较代价估算的异常(特别小或特别大),对比估算行数和实际执行行数,找到代价估算和行数估算的问题。

  • 耗时算子,尽量避免

    AP场景很少需要NestLoop、Sort+GroupByAgg,遇到它们需要谨慎。

  • 具体算子,是否合理
    • Motion:是否有不必要的Motion?是否可以优化分布键?是否可以使用复制表?
    • Join:内外表顺序是否合理?
    • Scan:是否可以使用索引?是否可以使用分区表?
  • 内存信息,调整参数

    查看下盘情况,分析后适当调整statement_mem参数。

举例分析

Update (cost=0.00..1274.11 rows=1 width=1) (actual time=995096.707..1517097.191 rows=245136 loops=1)
  Executor Memory: 1kB  Segments: 96  Max: 1kB (segment 0)
  ->  Partition Selector for t2 (cost=0.00..1274.04 rows=1 width=842) (actual time=995096.480..1514408.806 rows=245136 loops=1)
    ->  Redistribute Motion 96:96 (slice2; segments: 96) (cost=0.00..1274.04 rows=1 width=838) (actual time=995096.440..1513830.155 rows=245136 loops=1)
          Hash Key: t2.c1, t2.c2
      ->  Split (cost=0.00..1274.04 rows=1 width=838) (actual time=995080.103..1496878.037 rows=245136 loops=1)
            Executor Memory: 1kB  Segments: 96  Max: 1kB (segment 0)
        ->  Hash Join (cost=0.00..1274.04 rows=1 width=1484) (actual time=995080.071..1496625.817 rows=122568 loops=1)
              Hash Cond: ((t1.c1)::text = (t2.c1)::text)
              Executor Memory: 33535270kB  Segments: 96  Max: 349326kB (segment 33)
              work_mem: 33535270kB  Segments: 96  Max: 349326kB (segment 33)  Workfile: (96 spilling)
              Work_mem wanted: 26684983K bytes avg, 26684983K bytes max (seg0) to lessen workfile I/O affecting 96 workers.
          ->  Seq Scan on t1 (cost=0.00..672.28 rows=121412 width=736) (actual time=672.771..1039.167 rows=122568 loops=1)
                Filter: ((t1.c2 = '2019-05-17'::date) AND ((t1.c3)::text = '0'::text))
          ->  Hash (cost=431.00..431.00 rows=1 width=762) (actual time=994417.443..994417.443 rows=34583155 loops=1)
            ->  Broadcast Motion 96:96 (slice1; segments: 96) (cost=0.00..431.00 rows=1 width=762) (actual time=25.562..912862.203 rows=34583155 loops=1)
              ->  Sequence (cost=0.00..431.00 rows=1 width=762) (actual time=34.475..4822.173 rows=361460 loops=1)
                ->  Partition Selector for t2 (dynamic scan id: 1) (cost=10.00..100.00 rows=2 width=4) (never executed)
                      Partitions selected: 27 (out of 27)
                ->  Dynamic Seq Scan on t2 (dynamic scan id: 1) (cost=0.00..431.00 rows=1 width=762) (actual time=34.441..4680.938 rows=361460 loops=1)
                      Partitions scanned:  Avg 27.0 (out of 27) x 96 workers.  Max 27 parts (seg0).

上面是explain analyze一个SQL语句输出的执行计划,您可以按照之前的指导来定位问题:

  1. 先自上而下梳理耗时算子,通过查看每个算子的actual time可以看到hash join执行时间很长。
  2. 从hash join算子往下梳理,发现内表有下盘操作,如下“work_mem: 33535270kB Segments: 96 Max: 349326kB (segment 33) Workfile: (96 spilling) Work_mem wanted: 26684983K bytes avg, 26684983K bytes max (seg0) to lessen workfile I/O affecting 96 workers.”
  3. hash join的内表在build phase时有广播操作,如下“Broadcast Motion 96:96 (slice1; segments: 96) (cost=0.00..431.00 rows=1 width=762) (actual time=25.562..912862.203 rows=34583155 loops=1)”,从这里可以看到优化器预估的表t1的行数为1,与实际验证不符。
  4. 基于以上诊断可以得出,由于没有及时收集表t1的统计信息,导致优化器认为t1为一张小表,从而在hash join时将t1广播到各个节点,并且以t1作为内表构建hash table导致t1下盘,最终导致整个SQL执行耗时长。

以上问题的解决方案是重新收集一遍t1统计数据:analyze t1;