Skip to content

Commit 88b0dac

Browse files
authored
Column Groups (oracle-samples#165)
* security update * security update * security update * SPA helper example * Explain version script * Column groups * parameter typo * gather ext added
1 parent 3ed03e9 commit 88b0dac

37 files changed

+1754
-0
lines changed

optimizer/column_groups/README.md

Lines changed: 65 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,65 @@
1+
# Introduction
2+
3+
How to detect the potential need for column groups (and create them automatically).
4+
5+
These scripts are aimed and detecting the need for column groups to support a specific query rather than an entire workload. Nevertheless, if a table has columns with correlated values, and a column group is created, then this has the potential to help many queries.
6+
7+
There are multiple approaches proposed here:
8+
9+
1. Use EXPLAIN PLAN to parse a test query while column usage seeding is enabled
10+
2. Use columnm usage seeding via a SQL tuning set
11+
3. Query all rows (or a sample of rows) for a given table and look for column value correlations
12+
13+
# Overheads
14+
15+
Checking for column value correlation for a table by scanning rows can be time-consuming (the "corr" scripts). You should expect this approach to take a long time on large tables. The runtime can be reduced by sampling a proportion of table rows. For this reason, a sample size parameter is provided (it is a decimal value >0 and <100).
16+
17+
Column usage seeding has a small overhead and in some of the "cg" examples, a system-wide setting is used for a few seconds. Therefore, you should not use it on a production system running under high load. Also, bear in mind that scanning a large percentage of rows in a large table will have some overhead too (the "corr" scripts). For this reason, you may want to start with a small sample and build up.
18+
19+
# Demos
20+
21+
There are four demos:
22+
<pre>
23+
SQL> -- Use column usage tracking to identify useful column groups
24+
SQL> @run_test1
25+
SQL> -- Scan tables and look for column value correlation
26+
SQL> @run_test2
27+
SQL> -- Use column usage tracking to identify useful column groups - for queries in a SQL tuning set
28+
SQL> @run_test3
29+
SQL> -- Introspect a SQL tuning set and scan the tables accessed ny the STS queries to look for column value correlations
30+
SQL> @run_test4
31+
</pre>
32+
33+
Take a look at "t4.sql" because it shows how you can "hide" statistics until you are ready to expose them to the workload. In this way you can make changes and test your queries before implementation.
34+
35+
A script called "gather_ext.sql" is provided to demonstrate how to gather stats on extended statistics if stats have never been gathered.
36+
37+
The demos make use of easy-to-use utility scripts. Here are some examples
38+
<pre>
39+
SQL> -- Immediately create column groups that will benefit SQL ID "7kjpawwbyh1bz" (query must be in cursor cache) - uses column usage tracking
40+
SQL> @cg_from_sqlid 7kjpawwbyh1bz y
41+
SQL> -- Immediately create column groups for SQL statements in a SQL tuning set - uses column usage tracking
42+
SQL> @@cg_from_sts name_of_sql_tuning_set y
43+
SQL> -- Immediately create column groups that will benefit SQL ID "7kjpawwbyh1bz" by sampling 100% of rows in tables accessed by query. The query must be in cursor cache.
44+
SQL> @@corr_from_sqlid 7kjpawwbyh1bz 100 y
45+
SQL> -- Immediately create column groups for correlated columns on table current_user.TAB1 by sampling 10% of rows
46+
SQL> @@corr_from_table user tab1 10 y
47+
SQL> -- Immediately create column groups for correlated columns on tables accessed in the SQL tuning set "my_sql_tuning_set" - sample 10% of rows
48+
SQL> @@corr_from_sts my_sql_tuning_set 10 y
49+
SQL> -- Output a runnable SQL script that can be used to create column groups for correlated columns on tables accessed in the SQL tuning set "my_sql_tuning_set" - sample 50% of rows
50+
SQL> @@corr_from_sts my_sql_tuning_set 50 n
51+
</pre>
52+
53+
# Limitations
54+
55+
The "sqlid" scripts rely on "execute immediate explain plan..." and this will not work for queries that exceed the maximum VARCHAR2 length. For cases like this, capture the relevant query in a SQL tuning set and then use the "sts" scripts provided. Check out "load_sqlset.sql" for an example.
56+
57+
58+
# Disclaimer
59+
60+
<br/>-- These scripts are provided for educational purposes only.
61+
<br/>-- They are NOT supported by Oracle World Wide Technical Support.
62+
<br/>-- The scripts have been tested and they appear to work as intended.
63+
<br/>-- You should always run scripts on a test instance.
64+
65+
Lines changed: 52 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,52 @@
1+
--
2+
-- Creates column groups for a parsed SQL statement
3+
-- where the plan is available in the PLAN_TABLE
4+
-- Parameter:
5+
-- Y/N where Y - will create the column groups immediately
6+
-- N - will print the column group creation script only
7+
--
8+
set long 100000
9+
var create_now varchar2(1)
10+
11+
exec dbms_stats.flush_database_monitoring_info;
12+
exec dbms_lock.sleep(2)
13+
14+
exec select decode(nvl(upper('&1'),'N'),'N','N','Y') into :create_now from dual;
15+
16+
set serveroutput on
17+
declare
18+
r clob;
19+
cursor c1 is
20+
select distinct statement_id,object_name,object_owner
21+
from plan_table
22+
where object_type = 'TABLE'
23+
and timestamp = (select max(timestamp) from plan_table)
24+
order by object_name;
25+
begin
26+
for rec in c1
27+
loop
28+
r := dbms_stats.report_col_usage(rec.object_owner,rec.object_name) ;
29+
dbms_output.put_line('-- ===========================================================');
30+
dbms_output.put_line('-- Table Name : '||rec.object_name);
31+
dbms_output.put_line('/*');
32+
dbms_output.put_line(r);
33+
dbms_output.put_line('*/');
34+
if :create_now = 'Y'
35+
then
36+
select dbms_stats.create_extended_stats(rec.object_owner,rec.object_name) into r from dual;
37+
dbms_output.put_line(r);
38+
else
39+
dbms_output.put_line('select dbms_stats.create_extended_stats('''||rec.object_owner||''','''||rec.object_name||''') es from dual;');
40+
end if;
41+
end loop;
42+
dbms_output.put_line('-- === Stats need to be regathered on the following tables');
43+
for rec in c1
44+
loop
45+
-- In theory, we could go ahead and gather stats here rather than just reporting the need.
46+
-- In addition, we could choose to create the new stats unpublished if we wanted
47+
-- to temporariy hide the change from the workload.
48+
dbms_output.put_line('exec dbms_stats.gather_table_stats('''||rec.object_owner||''','''||rec.object_name||''')');
49+
end loop;
50+
end;
51+
/
52+
set serveroutput off
Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
--
2+
-- For a given SQL ID, parse and create column groups
3+
-- Parameters:
4+
-- SQL ID
5+
-- Y/N - where Y - create column groups immediately
6+
-- N - spool a reation script to verify and run later
7+
--
8+
set long 100000
9+
10+
--
11+
-- This assumes that the parse will complete within 10 seconds
12+
--
13+
exec dbms_stats.seed_col_usage(null,null,10)
14+
exec dbms_lock.sleep(1)
15+
--
16+
-- Parse the relevant SQL statement
17+
--
18+
@@explain &1
19+
--
20+
-- Create the column groups
21+
--
22+
@@cg_from_plan &2
Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,53 @@
1+
--
2+
-- Creates column groups for a parsed SQL statement
3+
-- where the plan is available in the PLAN_TABLE
4+
-- Parameter:
5+
-- Y/N where Y - will create the column groups immediately
6+
-- N - will print the column group creation script only
7+
--
8+
set long 100000
9+
set feedback off
10+
var create_now varchar2(1)
11+
var sqlset varchar2(100)
12+
13+
exec select '&1' into :sqlset from dual;
14+
exec select decode(nvl(upper('&2'),'N'),'N','N','Y') into :create_now from dual;
15+
16+
set serveroutput on
17+
declare
18+
time_limit_sec number := 30;
19+
cursor c1 is
20+
select distinct object_owner owner, object_name table_name
21+
from user_sqlset_plans
22+
where object_type = 'TABLE'
23+
and sqlset_name = :sqlset
24+
order by object_owner,object_name;
25+
r clob;
26+
begin
27+
dbms_stats.seed_col_usage(:sqlset,user,time_limit_sec);
28+
dbms_stats.flush_database_monitoring_info;
29+
30+
for rec in c1
31+
loop
32+
r := dbms_stats.report_col_usage(rec.owner,rec.table_name) ;
33+
dbms_output.put_line('-- ===========================================================');
34+
dbms_output.put_line('-- Table Name : '||rec.table_name);
35+
dbms_output.put_line('/*');
36+
dbms_output.put_line(r);
37+
dbms_output.put_line('*/');
38+
if :create_now = 'Y'
39+
then
40+
select dbms_stats.create_extended_stats(rec.owner,rec.table_name) into r from dual;
41+
dbms_output.put_line('Extension created: '||r);
42+
else
43+
dbms_output.put_line('select dbms_stats.create_extended_stats('''||rec.owner||''','''||rec.table_name||''') es from dual;');
44+
end if;
45+
end loop;
46+
dbms_output.put_line('-- === Stats need to be regathered on the following tables');
47+
for rec in c1
48+
loop
49+
dbms_output.put_line('exec dbms_stats.gather_table_stats('''||rec.owner||''','''||rec.table_name||''')');
50+
end loop;
51+
end;
52+
/
53+
set serveroutput off

optimizer/column_groups/cols.sql

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,2 @@
1+
select table_name,column_name,num_distinct from user_tab_col_statistics columns where table_name like '%CORR%'
2+
/
Lines changed: 155 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,155 @@
1+
--
2+
--
3+
-- Scan tables to look for column values that correlate
4+
-- Only 2-column combinations are checked
5+
-- Correlation is set to 80% - an arbitary figure
6+
-- Data types limited
7+
-- Only columns with shorter strings compared
8+
-- Columns checked must have a 'similar' number of distinct values (NDVs must not differ by 2X)
9+
-- A sample of rows can be used to speed up execution time - which can be substantial
10+
--
11+
-- Parameters:
12+
-- Sample percentage
13+
-- Y/N where Y - will create the column groups immediately
14+
-- N - will print the column group creation script only
15+
--
16+
var create_now varchar2(1)
17+
set echo off
18+
column tab_owner format a20
19+
column tab_name format a20
20+
set linesize 250
21+
set trims on
22+
set pagesize 10000
23+
set feedback off
24+
25+
var tabname varchar2(100)
26+
var ownname varchar2(100)
27+
var samp number
28+
29+
exec select '&1' into :samp from dual;
30+
exec select decode(nvl(upper('&2'),'N'),'N','N','Y') into :create_now from dual;
31+
32+
set serveroutput on
33+
34+
--
35+
-- Look for column value correlation
36+
--
37+
declare
38+
--
39+
-- Columns must correlate >0.8 to get a column group (this value is chosen arbitrarily and can be adjusted)
40+
--
41+
minimum_correlation number(6,5) := 0.8;
42+
43+
cname1 varchar2(200);
44+
cname2 varchar2(200);
45+
46+
cursor tabsc is
47+
select distinct object_name,object_owner
48+
from plan_table
49+
where object_type = 'TABLE'
50+
and timestamp = (select max(timestamp) from plan_table)
51+
order by object_name;
52+
53+
cursor extc is
54+
select count(*)
55+
from dba_stat_extensions
56+
where table_name = :tabname
57+
and owner = :ownname
58+
and extension like '%'||cname1||'%'
59+
and extension like '%'||cname2||'%';
60+
61+
cursor ext is
62+
select extension_name,extension,rownum r
63+
from dba_stat_extensions
64+
where table_name = :tabname
65+
and owner = :ownname;
66+
--
67+
-- To reduce the number of column combinations checked, we will only check
68+
-- column pairs that have similar NDV - so some NULL cases will be missed.
69+
-- There is also an assumption that longer strings are rarely used in comparison
70+
--
71+
cursor c1 is
72+
with w as (
73+
select column_name, num_distinct
74+
from dba_tab_columns
75+
where table_name = :tabname
76+
and owner = :ownname
77+
and num_distinct is not null
78+
and num_distinct > 0
79+
and ( data_type in ('DATE','NUMBER')
80+
or (data_type = 'CHAR' and data_length <= 20)
81+
or (data_type = 'VARCHAR2' and data_length <= 20)
82+
or (data_type like 'TIMESTAMP%')))
83+
select t1.column_name c1, t2.column_name c2
84+
from w t1, w t2 /* , (select num_rows from dba_tables where owner = :ownname and table_name = :tabname) t */
85+
where t1.column_name > t2.column_name
86+
and greatest(t1.num_distinct,t2.num_distinct)/least(t1.num_distinct,t2.num_distinct)<2 /* Similar number of distinct values */
87+
--and t1.num_distinct < t.num_rows/10 /* Perhaps eliminate sequenced columns? */
88+
order by t1.column_name;
89+
c number(6,5);
90+
n number;
91+
num_ext number;
92+
r clob;
93+
begin
94+
if :samp>=100
95+
then
96+
:samp := 99.9999;
97+
end if;
98+
dbms_output.put_line('column es format a100');
99+
100+
for tabs in tabsc
101+
loop
102+
:tabname := tabs.object_name;
103+
:ownname := tabs.object_owner;
104+
105+
dbms_output.put_line('-- ');
106+
dbms_output.put_line('-- ******* '||:tabname||' *******');
107+
execute immediate 'select /*+ FULL */ count(*) from "'||:ownname||'"."'||:tabname||'" sample('||:samp||') ' into n;
108+
dbms_output.put_line('-- ');
109+
dbms_output.put_line('-- Row sample size (approx): '||n);
110+
dbms_output.put_line('-- ');
111+
dbms_output.put_line(' ');
112+
for x in ext
113+
loop
114+
dbms_output.put_line('-- Existing extension '||x.extension||' '||x.extension_name);
115+
end loop;
116+
dbms_output.put_line(' ');
117+
118+
for x in c1
119+
loop
120+
execute immediate 'select corr(ora_hash("'||x.c1||'"),ora_hash("'||x.c2||'")) from "'||:ownname||'"."'||:tabname||'" sample('||:samp||')' into c;
121+
if (c is not null and c > minimum_correlation)
122+
then
123+
dbms_output.put('-- '||x.c1 || ',' || x.c2 ||': good correlation = '||c);
124+
cname1 := x.c1;
125+
cname2 := x.c2;
126+
open extc;
127+
fetch extc into num_ext;
128+
close extc;
129+
if (num_ext>0)
130+
then
131+
dbms_output.put_line(' SKIPPING (covered already)');
132+
else
133+
dbms_output.put_line(' ');
134+
if :create_now = 'Y'
135+
then
136+
select dbms_stats.create_extended_stats(:ownname,:tabname,'("'||x.c1||'","'||x.c2||'")') into r from dual;
137+
dbms_output.put_line('Extension created: ' || r);
138+
else
139+
dbms_output.put_line('select dbms_stats.create_extended_stats(''"'||:ownname||'"'',''"'||:tabname||'"'',''("'||x.c1||'","'||x.c2||'")'') es from dual;');
140+
end if;
141+
end if;
142+
else
143+
if c is not null
144+
then
145+
dbms_output.put_line('-- '||x.c1 || ',' || x.c2 ||': poor correlation = '||c);
146+
else
147+
dbms_output.put_line('-- '||x.c1 || ',' || x.c2 ||': NULL correlation');
148+
end if;
149+
end if;
150+
end loop;
151+
end loop;
152+
end;
153+
/
154+
155+
set serveroutput off
Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
1+
--
2+
-- For a given SQL ID, parse and create column groups
3+
-- Parameters:
4+
-- SQL ID
5+
-- Table sample percentage
6+
-- Y/N - Yes to create column groups immediately
7+
--
8+
set long 100000
9+
10+
--
11+
-- Parse the relevant SQL statement
12+
--
13+
@@explain &1
14+
--
15+
-- Create the column groups
16+
--
17+
@@corr_from_plan &2 &3

0 commit comments

Comments
 (0)