首页 » ORACLE 9i-23ai » Oracle Column Group Extended Statistics列组扩展统计信息

Oracle Column Group Extended Statistics列组扩展统计信息

扩展统计信息(也称为列组扩展)是 Oracle 11g 中引入的重要统计信息改进之一。虽然 Oracle Cost Based Optimizer 能够获得正确的单列选择性估计,但它无法计算出查询谓词中存在的两个或多个相关列的联合的基数。为这种列的联合计算的列组扩展旨在帮助 CBO 弄清楚这种列的相关性,以便获得准确的估计。但在某些情况下,CBO 拒绝使用列组扩展。在Oracle中(10g以后)当您看到计划步骤的实际基数与估计基数相差很大(50% 或更多)并且与该计划步骤相关的谓词包含来自同一表的多个列(e.g., “WHERE t1.col1 = 5 AND t1.col2 = 10”).可以考虑使用 Column Group列组来改进优化器的基数估计。今天刚好同事遇到一个10053是看到的CG未使用的问题,简单整理一下。

什么是Column Group Extended Statistics

可以通过与普通列相同的数据字典视图查看列组上的扩展统计信息。列组被分配一个虚拟列名,以字符串“SYS_STU”开头。您可以在定义扩展统计时查看分配给列组的虚拟名称,也可以稍后通过USER_STAT_EXTENSIONS查看。假设列OBJECT_TYPE和列CREATED之间存在相关性,并且要求您在它们上面创建一个列组。

SQL> SELECT dbms_stats.create_extended_stats(USER,'OBJ_PART','(OBJECT_TYPE,CREATED)') NEW_VIRT_COL
  2  FROM dual;

NEW_VIRT_COL                                                                    
------------------------------                                                  
SYS_STUX091_SOBFDAM7#HSOG24#6G                                                  
 
SQL> EXEC dbms_stats.gather_table_stats(USER,'OBJ_PART',METHOD_OPT=>'FOR ALL COLUMNS SIZE SKEWONLY');

PL/SQL procedure successfully completed.
 
SQL> SELECT extension_name, extension
  2  FROM user_stat_extensions
  3  WHERE table_name='OBJ_PART';

EXTENSION_NAME                 EXTENSION                                        
------------------------------ -------------------------                        
SYS_STUX091_SOBFDAM7#HSOG24#6G ("OBJECT_TYPE","CREATED")  

Now that stats have been gathered on the column group, you may view them just as you would any other column.
SQL> SELECT column_name, num_distinct, num_nulls
  2  FROM user_tab_col_statistics
  3  WHERE table_name='OBJ_PART'
  4  ORDER BY 1;

COLUMN_NAME                      NUM_DISTINCT  NUM_NULLS                        
-------------------------------- ------------ ----------                        
CREATED                                  1678          0                        
LAST_DDL_TIME                            1747          1                        
OBJECT_NAME                             46516          0                        
OBJECT_TYPE                                47          0                        
OWNER                                      39          0                        
STATUS                                      2          0                        
SYS_STUX091_SOBFDAM7#HSOG24#6G           4048          0 

如何删除列组和统计信息

可参考修改SQL,删除一些不需要的CG。

BEGIN
FOR c IN (
   SELECT owner, table_name, extension_name
   FROM dba_stat_extensions
   WHERE table_name=''
   AND owner=''
   AND creator='SYSTEM'
   AND droppable='YES'
)
LOOP
   dbms_output.put_line('Deleting col stats for ' || c.extension_name);
   dbms_stats.delete_column_stats(c.owner, c.table_name, c.extension_name);
END LOOP;
END;
/

BEGIN
FOR rec IN (
   SELECT owner, table_name, DBMS_LOB.substr(extension, 3000) extension
   FROM dba_stat_extensions
   WHERE table_name=''
   AND owner=''
   AND creator='SYSTEM'
   AND droppable='YES'
)
LOOP
   dbms_output.put_line('Dropping extended stats for ' || rec.extension);
   dbms_stats.drop_extended_stats( rec.owner, rec.table_name, rec.extension );
END LOOP;
END;
/

EXEC dbms_stats.set_table_prefs('','','AUTO_STAT_EXTENSIONS','OFF');

测试样例

DROP TABLE cust_tab;

CREATE TABLE cust_tab (country_id number, state_id number, city_id number, customer_id number, vc_padding varchar2(200));

DECLARE
max_country number := 5;
max_state number := 10;
max_city number := 20;
reps number := 100;
cust_number number := 1;

BEGIN

FOR r IN 1..reps LOOP
FOR j IN 1..max_country LOOP    -- Country
FOR k IN 1..max_state LOOP -- State
FOR l IN 1..max_city LOOP -- City

INSERT INTO cust_tab (country_id, state_id, city_id, customer_id, vc_padding) values (j,k+((j-1)*max_state),l+((k-1)*max_city),cust_number, rpad('x',100));
cust_number := cust_number + 1;

END LOOP;
END LOOP;
END LOOP;
END LOOP;

END;
/

COMMIT;

这个表有100 000 行,用于下同的说明,所有列distinct values:

COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------
COUNTRY_ID                                5
STATE_ID                                 50
CITY_ID                                 200
CUSTOMER_ID                          100000
VC_PADDING                                1

哪几种情况不会使用CG?

a. 谓词多列的NDV与CG NDV哪个列小,用哪个
如 country_id = 1 和 state_id =1;
1. 每列的各个 NDV 的乘积(因为我们对谓词进行 AND 运算)为:NDV(country_id) * NDV(state_id) = 5 * 50 = 250
2. CG列组的 NDV 为 1000。CBO 使用两个可用 NDV 值中最小的一个,因此它使用 NDV = 250。
3. CBO计算cardinality:
comp. card = original cardinality * 1/NDV

b. CG 列没有histogram,而列上有
CG列上某些列存在直方图会导致忽略CG,除非 CG 本身有可用的直方图。

c. 带有倾斜数据的列,直方图和 CG 哪个更好?
当相关列的数据倾斜时,哪个可以提供更准确的估计基数,CG 还是直方图?答案是直方图通常比没有直方图的 CG 更准确,但有直方图的 CG 比这两种方法都准确得多。

d. 当有多个CG可用时,选择1个相关度好的CG
如 country_id = 1 和 state_id =1;

EXTENSION                                NUM_DISTINCT HISTOGRAM       NUM_BUCKETS                                       

---------------------------------------- ------------ --------------- -----------                                       

("COUNTRY_ID","STATE_ID","CITY_ID")              1000 NONE                      1                                       

("COUNTRY_ID","STATE_ID")                          50 FREQUENCY                50

看10053

SINGLE TABLE ACCESS PATH                                                                  
  Single Table Cardinality Estimation for CUST_TAB[CUST_TAB]                              
  Column (#7):                                                                            
    NewDensity:0.010000, OldDensity:0.000005 BktCnt:100000, PopBktCnt:100000, PopValCnt:50
  ColGroup (#2, VC) SYS_STUDZ9G6#P63YYW9XLPD$LHLU3                                        
    Col#: 1 2 3    CorStregth: 50.00                                                      
  ColGroup (#1, VC) SYS_STU_27#NWM_8TL3FTLIK77G19A                                        
    Col#: 1 2    CorStregth: 5.00                                                         
  ColGroup Usage:: PredCnt: 2  Matches Full: #1  Partial:  Sel: 0.0200                    
  Table: CUST_TAB  Alias: CUST_TAB                                                                                                        

相关强correlation strength (“CorStrength”)排序.相关强度越高,CG 中列之间的相关性越强。在这种情况下,存在精确匹配,这决定了将使用哪个 CG。

The correlation strength = Product of column NDVs in the CG / NDV of the CG

相关性强度越高(值越小),CG中各列之间的相关性越强。在这种情况下,有一个精确的匹配,这决定了将使用哪个CG。可以看到这里写着“Matches Full: #1”。“#1”指的是CG“ColGroup (#1, VC)”。CG的选择性显示在“Sel: 0.0200”处。这用于计算基数为:100000 * 0.02 = 2 000行.

e.使用多列索引作为 CG
如果不存在列组,并且没有收集与索引列匹配的相等谓词的直方图,CBO 将使用多列索引。如下

select count(*)
from cust_tab
where country_id = 1 and state_id =1 and city_id = 1 ;
COUNT(*)
--------
   20100
----------------------------------------------------------------------------------
| Id | Operation        | Name         | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT  |              | 1    | 10    | 1 (0)      | 00:00:01 |
| 1 | SORT AGGREGATE    |              | 1    | 10    |            |          |
|* 2 | INDEX RANGE SCAN | CUST_TAB_IDX | 120  | 1200  | 1 (0)      | 00:00:01 |
----------------------------------------------------------------------------------

10053 trace

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
Table: CUST_TAB  Alias: CUST_TAB
#Rows: 120000  #Blks:  2170  AvgRowLen:  117.00
Index Stats::
Index: CUST_TAB_IDX  Col#: 1 2 4
LVLS: 1  #LB: 342  #DK: 1000  LB/K: 1.00  DB/K: 97.00  CLUF: 97926.00
Access path analysis for CUST_TABSINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for CUST_TAB[CUST_TAB]
ColGroup (#1, Index) CUST_TAB_IDX
Col#: 1 2 4    CorStregth: 50.00
ColGroup Usage:: PredCnt: 3  Matches Full: #1  Partial:  Sel: 0.0010
Table: CUST_TAB  Alias: CUST_TAB
Card: Original: 120000.000000  Rounded: 120  Computed: 120.00  Non Adjusted: 120.00

从 10.2.0.4 开始,CBO 将尽可能以这种方式使用复合索引。ColGroup (#1, Index)并不是CG,而是复合索引.
注:

当使用当CG是谓词的子集时使用它,并使用附加的非相等操作符时
如“where country_id = 1 and state_id =1 and city_id < 5;”

sel(city_id < 5) = (value – minValue) / (maxValue – minValue) = (200 – 5) / (200 -1) = 0.02
sel = sel(CG #3) * sel(city_id < 5) = 0.02 * 0.02 = 0.0004
cardinality = sel * 100000 = 0.0004 * 100 000 = 40

f. 参数_optimizer_extended_stats_usage_control 设置为 224
当sql中使用大量的In (xxx,xxx)可有会导致SQL PARSE时间变长甚至到几十分钟,函数kkestGetMCSelInlist ,是个已知bug 16825679,会建议配置该参数,设置参数后,只有在where子句中的in-list的情况扩展统计信息不会被使用.

_optimizer_extended_stats_usage_control=default
ColGroup Usage:: PredCnt: 4  Matches Full: #1  Partial:

_optimizer_extended_stats_usage_control=224
ColGroup Usage:: PredCnt: 4  Matches Full:  Partial:  ==or  no lines with colgroup

案例 因为柱状图没有使用CG

 
create table t_ext_stat
  ( dvpk_id    number(10) not null
  , vpk_id     number(10) not null
  , layer_code varchar2(1 char) not null
  , dvpk_day   date not null
  , cre_date   date not null
  , cre_usr    varchar2(40 char) not null
  , mod_date   date not null
  , mod_usr    varchar2(40 char) not null
 );
 
create unique index t_ext_uk_i on t_ext_stat(vpk_id, layer_code, dvpk_day);

select
  count(1)
from
  t_ext_stat
where
  vpk_id = 63148
and
  layer_code = 'R';
 
 COUNT(1)
----------
 338

layer_code数据分布较为倾斜。layer_code e有4个不同值,其中R (400,087) and S (380,069) ,
vpk_id

vpk_id有近5000个不同值

SQL> select
       vpk_id
      ,count(1)
    from
      t_ext_stat
    group by
      vpk_id
    order by 2 desc;

    VPK_ID   COUNT(1)
---------- ----------
     62866       1456
     62953       1456
     63528       1456
     63526       1456
     63518       1456
     62947       1456
     62850       1456
     62849       1456
     62851       1456
     62954       1456
     64362       1452
     64538       1424
     64483       1358
….
     63207          1
     63021          1
     62972          1

4947 rows selected.

创建CG,使用直方图收集所有表列t_ext_stat统计信息,包括新创建的虚拟列

SQL> SELECT
         dbms_stats.create_extended_stats
         (ownname   => user
         ,tabname   => 't_ext_stat'
         ,extension =>'(vpk_id,layer_code)'
         )
    FROM dual;

BEGIN
 
dbms_stats.gather_table_stats
 (user
 ,'t_ext_stat'
 ,method_opt => 'for all columns size auto'
 ,cascade => true
 ,no_invalidate => false
 );
END;
/

检查收集的列统计信息

SQL> SELECT
       column_name
      ,num_distinct
      ,density
      ,histogram
    FROM
       user_tab_col_statistics
    WHERE
       table_name = 'T_EXT_STAT'
    AND
      column_name in ('VPK_ID','LAYER_CODE','SYS_STUMVIRBZA6_$QWEX6DE2NGQA1');
 
COLUMN_NAME                    NUM_DISTINCT    DENSITY HISTOGRAM
------------------------------ ------------ ---------- ---------------
SYS_STUMVIRBZA6_$QWEX6DE2NGQA1         4967 .000201329  NONE
LAYER_CODE                                4  6.2471E-07 FREQUENCY
VPK_ID                                 2862 .000349406  NONE

SQL> select
        to_char(SYS_STUMVIRBZA6_$QWEX6DE2NGQA1) extension
       ,count(1)
     from
       t_ext_stat
     group by
       SYS_STUMVIRBZA6_$QWEX6DE2NGQA1
      order by 2 desc;
 
EXTENSION               COUNT(1)
--------------------- ----------
10113707817839868275         728
6437420856234749785          728
6264201076174478674          728
7804673458963442057          728
2433504440213765306          728
6976215179539283979          728
493591537539092624           728
 
6710977030485345437            1
18158393637293365880           1
5275318825200713603            1
13895660777899711317           1

正如预期的那样,已在 layer_code 列上识别出偏斜,因此在其上收集了频率直方图以指示此偏度。尽管如此,有两点似乎值得一提:
1.既然其中一个列组扩展有一个直方图,为什么扩展本身也没有被标识为倾斜列呢?
2.在扩展上没有直方图,而形成扩展的列之一上没有直方图的特定情况下,会发生什么情况?
扩展在其数据分散中不会表现出偏斜。事实上,该扩展有 10,078 个不同的值,其中最受欢迎的值出现了 728 次,而不太受欢迎的值只出现了一次.

如果构成扩展的其中一列具有 histogram,而扩展本身没有 histogram,则 Optimizer 将不会使用该扩展。正是上面的#B不使用CG的情况。

select
   count(1)
from
   t_ext_stat
where vpk_id = 63148
and layer_code = 'R';
 
COUNT(1)
----------
338
 
SQL_ID  d26ra17afbfyh, child number 0
-------------------------------------
-------------------------------------------------------------------
| Id  | Operation         | Name       | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |      1 |        |      1 |
|   1 |  SORT AGGREGATE   |            |      1 |      1 |      1 |
|*  2 |   INDEX RANGE SCAN| T_EXT_UK_I |      1 |    142 |    338 |
-------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("VPK_ID"=63148 AND "LAYER_CODE"='R')

10053 trace

Access path analysis for T_EXT_STAT
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T_EXT_STAT[T_EXT_STAT]
SPD: Directive valid: dirid = 17542990197075222359, state = 5, flags = 1, loc = 1 {EC(98564)[2, 3]}
SPD: Return code in qosdDSDirSetup: EXISTS, estType = TABLE
Column (#2): VPK_ID(NUMBER)
  AvgLen: 5 NDV: 2862 Nulls: 0 Density: 0.000000 Min: 0.000000 Max: 62849.000000
Column (#3):
   NewDensity:0.002043, OldDensity:0.000001 BktCnt:5873.000000, PopBktCnt:5873.000000, PopValCnt:4, NDV:4
 
Column (#3): LAYER_CODE(VARCHAR2)
    AvgLen: 2 NDV: 4 Nulls: 0 Density: 0.000000
    Histogram: Freq  #Bkts: 4  UncompBkts: 5873  EndPtVals: 4  ActualVal: no
 
Column (#9): SYS_STUMVIRBZA6_$QWEX6DE2NGQA1(NUMBER)
    AvgLen: 12 NDV: 4967 Nulls: 0 Density: 0.000000 Min: 0.000000 Max: 1980066.000000
ColGroup (#2, Index) T_EXT_UK_I
    Col#: 2 3 4    CorStregth: -1.00
ColGroup (#1, VC) SYS_STUMVIRBZA6_$QWEX6DE2NGQA1
    Col#: 2 3    CorStregth: 2.30
 
ColGroup Usage:: PredCnt: 2  Matches Full:  Partial:
Table: T_EXT_STAT  Alias: T_EXT_STAT
Card: Original: 803809.000000  Rounded: 142  Computed: 141.74  Non Adjusted: 141.74

使用以下公式

E-rows = num_rows(t_ext_stat) * 1/(NDV(SYS_STUMVIRBZA6_$QWEX6DE2NGQA1))
E-rows = 803809 * 1/(4967) = 161.829877

另一个显示 Optimizer 未使用扩展名的线索可以在上面的 10053 跟踪文件中看到,也可以通过以下几行看到:

1
ColGroup Usage:: PredCnt: 2  Matches FullPartial:

其中 Matches Full 和 Partial 信息为空。如Christian Antognini 在这篇文章在早期版本10g时存在bug.有一个修复程序,我们可以通过设置来使用 Oracle 扩展来构建 Oracle:

SQL> @fix 6972291

SESSION_ID      BUGNO      VALUE SQL_FEATURE                                                      DESCRIPTION                                                      OPTIMIZER_FEATURE_ENABLE       EVENT IS_DEFAULT     CON_ID
---------- ---------- ---------- ---------------------------------------------------------------- ---------------------------------------------------------------- ------------------------- ---------- ---------- ----------
      1231    6972291          0 QKSFM_CBO_6972291                                                Choose column group ndv over histograms of individual cols                                          0          1          1

SQL> @fix "column group"

SESSION_ID      BUGNO      VALUE SQL_FEATURE                                                      DESCRIPTION                                                      OPTIMIZER_FEATURE_ENABLE       EVENT IS_DEFAULT     CON_ID
---------- ---------- ---------- ---------------------------------------------------------------- ---------------------------------------------------------------- ------------------------- ---------- ---------- ----------
      1231    6972291          0 QKSFM_CBO_6972291                                                Choose column group ndv over histograms of individual cols                                          0          1          1
      1231   11689541          1 QKSFM_CARDINALITY_11689541                                       column group not null selectivity greater than individual column 11.2.0.4                           0          1          1
      1231   13329748          1 QKSFM_CARDINALITY_13329748                                       adjust correlated subquery cardinality based on column groups    11.2.0.4                           0          1          1
      1231   12909121          1 QKSFM_STATS_12909121                                             Record column groups for unanalyzed tables                       8.0.0                              0          1          1
      1231   13263455          1 QKSFM_CARDINALITY_13263455                                       allow MIN/MAX(column) in column group join selectivity           11.2.0.4                           0          1          1
      1231   14723910          1 QKSFM_CARDINALITY_14723910                                       limit multi column group selectivity due to NDV of inlist column 11.2.0.4                           0          1          1
      1231   20265690          1 QKSFM_CARDINALITY_20265690                                       recognize column group with virtual column column                12.2.0.1                           0          1          1
      1231   31821701          1 QKSFM_CARDINALITY_31821701                                       fix outer join selectivity of column group stats                 21.1.0                             0          1          1
      1231   32061341          1 QKSFM_CBO_32061341                                               do not use empty indexes for column groups                       23.1.0                             0          1          1
      1231   27982637          1 QKSFM_CARDINALITY_27982637                                       fix column group not null sel when one of the columns is all nul 23.1.0                             0          1          1

10 rows selected.


SQL> alter session set "_fix_control"="6972291:ON";
 
SQL> alter session set events '10053 trace name context forever, level 1';
 
SQL> select
      count(1)
    from
     t_ext_stat
    where
      vpk_id = 63148
    and
      layer_code = 'R';
 
  COUNT(1)
----------
       338

下面是对应的执行计划(新估计 162)以及 10053 trace 文件中与扩展相关的部分

============
Plan Table
============
---------------------------------------+-----------------------------------+
| Id  | Operation          | Name      | Rows  | Bytes | Cost  | Time      |
---------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT   |           |       |       |     3 |           |
| 1   |  SORT AGGREGATE    |           |     1 |     7 |       |           |
| 2   |   INDEX RANGE SCAN | T_EXT_UK_I|   162 |  1134 |     3 |  00:00:01 |
---------------------------------------+-----------------------------------+
Predicate Information:
----------------------
2 - access("VPK_ID"=63148 AND "LAYER_CODE"='R')
 
=====================================
Access path analysis for T_EXT_STAT
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T_EXT_STAT[T_EXT_STAT]
SPD: Directive valid: dirid = 17542990197075222359, state = 5, flags = 1, loc = 1 {EC(98564)[2, 3]}
SPD: Return code in qosdDSDirSetup: EXISTS, estType = TABLE
  Column (#2): VPK_ID(NUMBER)
    AvgLen: 5 NDV: 2899 Nulls: 0 Density: 0.000000 Min: 0.000000 Max: 62849.000000
  Column (#3):
    NewDensity:0.001753, OldDensity:0.000001 BktCnt:6275.000000, PopBktCnt:6275.000000, PopValCnt:4, NDV:4
  Column (#3): LAYER_CODE(VARCHAR2)
    AvgLen: 2 NDV: 4 Nulls: 0 Density: 0.000000
    Histogram: Freq  #Bkts: 4  UncompBkts: 6275  EndPtVals: 4  ActualVal: no
  Column (#9): SYS_STUMVIRBZA6_$QWEX6DE2NGQA1(NUMBER)
    AvgLen: 12 NDV: 4985 Nulls: 0 Density: 0.000000 Min: 0.000000 Max: 1980066.000000
  ColGroup (#2, Index) T_EXT_UK_I
    Col#: 2 3 4    CorStregth: -1.00
  ColGroup (#1, VC) SYS_STUMVIRBZA6_$QWEX6DE2NGQA1
    Col#: 2 3    CorStregth: 2.33
  ColGroup Usage:: PredCnt: 2  Matches Full: #1  Partial:  Sel: 0.0002
  Table: T_EXT_STAT  Alias: T_EXT_STAT
    Card: Original: 806857.000000  Rounded: 162  Computed: 161.86  Non Adjusted: 161.86

我们可以注意到,这一次,CBO 使用扩展来计算其行估计,因为 162 来自以下公式:

E-rows = num_rows(t_ext_stat) * 1/(NDV(SYS_STUMVIRBZA6_$QWEX6DE2NGQA1))
E-rows = 806857* 1/(4985) = 161.856971 --> rounded to 162

注意, layer_code 列中删除 histogram也可以使用上CG.

exec dbms_stats.gather_table_stats(user ,'t_ext_stat', method_opt => 'for all columns size 1');
-------------------------------------------------------------------
| Id  | Operation         | Name       | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |      1 |        |      1 |
|   1 |  SORT AGGREGATE   |            |      1 |      1 |      1 |
|*  2 |   INDEX RANGE SCAN| T_EXT_UK_I |      1 |    154 |    338 |
-------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("VPK_ID"=63148 AND "LAYER_CODE"='R')
 
 Column (#2): VPK_ID(NUMBER)
    AvgLen: 5 NDV: 2982 Nulls: 0 Density: 0.000000 Min: 0.000000 Max: 62849.000000
  Column (#3): LAYER_CODE(VARCHAR2)
    AvgLen: 2 NDV: 4 Nulls: 0 Density: 0.000000
  Column (#9): SYS_STUMVIRBZA6_$QWEX6DE2NGQA1(NUMBER)
    AvgLen: 12 NDV: 5238 Nulls: 0 Density: 0.000000
  ColGroup (#2, Index) T_EXT_UK_I
    Col#: 2 3 4    CorStregth: -1.00
  ColGroup (#1, VC) SYS_STUMVIRBZA6_$QWEX6DE2NGQA1
    Col#: 2 3    CorStregth: 2.28
  ColGroup Usage:: PredCnt: 2  Matches Full: #1  Partial:  Sel: 0.0002
  Table: T_EXT_STAT  Alias: T_EXT_STAT
    Card: Original: 807515.000000  Rounded: 154  Computed: 154.16  Non Adjusted: 154.16

E-rows = num_rows(t_ext_stat) * 1/(NDV(SYS_STUMVIRBZA6_$QWEX6DE2NGQA1))
E-rows = 807515 * 1/(5238) = 154.164758 --> rounded to 154

Note:
尽管使用了扩展,但估计值并不如预期(154 而不是 338)。对这种差异的解释可能来自layer_code和vpk_id之间存在的非常弱的相关性强度 (CorStregth: 2.30)

Correlation strength => measure of how correlated the columns in a column group are.  Its calculated as:

Product of individual NDVs / NDV for the column group.  A value of 1 means columns aren’t correlated at all.
e.g., for 3 columns with NDVs of 5, 50, and 200 and an NDV of 1000 for their CG, you get:
5*50*200/1000 = 50000/1000 = 50

小结
列组的新扩展统计特性可以极大地提高优化器估计基数的能力,进而在表的列具有依赖值时生成更准确的执行计划。此外,为具有倾斜数据的列组创建直方图将确保尽可能获得最准确的基数估计(除了使用动态抽样之外,动态抽样甚至更准确,但由于抽样导致的高解析时间通常不切实际)。当您打算使用扩展统计数据时,请小心收集直方图。如果组合中的一列具有直方图,则扩展无需显示倾斜。在这种情况下,Oracle 将绕过扩展。

References
How to Drop System-Created Column Groups and their Statistics (Doc ID 2255331.1),
MultiColumn/Column Group Statistics – Additional Examples (Doc ID 872406.1)

Extended Statistics Part I : histogram effect

打赏

, ,

目前这篇文章还没有评论(Rss)

我要评论