Using Basic Database Functionality For Data Warehousing
Purpose
In this tutorial, you use basic Oracle Database 10g functionality to query and improve
performance in a data warehouse.
Time to Complete
Approximately 1 hour
Topics
This tutorial covers the following topics:
Overview
Prerequisites
Examining the Differences Between B-tree and Bitmap Indexes for Data
Warehousing
Improving Query Performance with Oracle Advanced Partitioning
Cleanup
Summary
Overview
Oracle Database 10g is the leading relational database for data warehousing and the first
complete Business Intelligence platform. It addresses not only the core requirements of
performance, scalability, and manageability, but also back-end, data-centric functionality
such as ETL processing, data analysis (OLAP), and data mining.
Oracle Database 10g Release 2 is its robust and enhanced successor, providing
significant enhancements to every facet of Oracle's relational capabilities and extending
Oracle's lead in providing the most complete Business Intelligence platform.
Back to Topic List
Prerequisites
Before starting this tutorial, you should:
1. Perform the Installing Oracle Database 10g on Windows tutorial.
2. Download and unzip bdf.zip into your working directory (for example, c:\wkdir).
PROMPT let's disable the usage of TEMP TABLES to show simple star first
ALTER SESSION SET star_transformation_enabled=TEMP_DISABLE;
show parameters star_transformation
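For orientation, a typical star query in the SH schema joins the sales fact table with several dimension tables and filters on dimension attributes. The following sketch is modeled on the standard SH example; the exact statements used in this tutorial are in the show_star*.sql scripts:

```sql
-- Illustrative star query (column values are assumptions): the fact table
-- sales is joined to three dimensions, each carrying a selective filter
SELECT ch.channel_class, c.cust_city, t.calendar_quarter_desc,
       SUM(s.amount_sold) AS sales_amount
FROM   sales s, times t, customers c, channels ch
WHERE  s.time_id    = t.time_id
AND    s.cust_id    = c.cust_id
AND    s.channel_id = ch.channel_id
AND    c.cust_state_province = 'CA'
AND    ch.channel_desc = 'Internet'
AND    t.calendar_quarter_desc IN ('1999-01','1999-02')
GROUP BY ch.channel_class, c.cust_city, t.calendar_quarter_desc;
```

With star_transformation_enabled set, the optimizer can rewrite such a query to drive the fact-table access through bitmap indexes on the foreign key columns.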
2. This step shows the basic mechanism of Oracle's star query transformation with
TEMP table transformation.
From a SQL*Plus session logged on to the SH schema, run show_star2.sql, or copy the
following SQL statements into your SQL*Plus session:
@c:\wkdir\show_star2.sql
From a SQL*Plus session logged on to the SH schema, run show_star3.sql, or copy the
following SQL statements into your SQL*Plus session.
@c:\wkdir\show_star3.sql
4. Next, you issue a typical star query without any SQL processing optimization.
From a SQL*Plus session logged on to the SH schema, run show_star4.sql, or copy the
following SQL statements into your SQL*Plus session:
@c:\wkdir\show_star4.sql
2. Look at the execution plan of your query again and compare it with the original star
transformation plan in the previous section. For the sake of simplicity, the usage of temp tables
for star transformation is disabled.
@c:\wkdir\plan_bj_idx.sql
2. If you have not completed the High Speed Data Loading and Rolling Window Operations
tutorial yet, you need to create an external table to be able to load some test data. From a
SQL*Plus session logged on to the SH schema, run create_ext_tab_for_bdf.sql, or
copy the following SQL statements into your SQL*Plus session:
@c:\wkdir\create_ext_tab_for_bdf.sql
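As a rough sketch of what such an external table definition typically looks like (the directory object, file name, and column list here are assumptions; the actual definition is in create_ext_tab_for_bdf.sql):

```sql
-- Hypothetical external table over a flat file, read via ORACLE_LOADER
CREATE TABLE sales_delta_ext
( prod_id        NUMBER,
  cust_id        NUMBER,
  time_id        DATE,
  channel_id     NUMBER,
  promo_id       NUMBER,
  quantity_sold  NUMBER,
  amount_sold    NUMBER )
ORGANIZATION EXTERNAL
( TYPE oracle_loader
  DEFAULT DIRECTORY data_dir            -- assumed directory object
  ACCESS PARAMETERS ( FIELDS TERMINATED BY '|' )
  LOCATION ('sales_delta.dat') )        -- assumed file name
REJECT LIMIT UNLIMITED;
```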
set timing on
CREATE INDEX sales_prod_local_bix
ON sales_delta (prod_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE INDEX sales_cust_local_bix
ON sales_delta (cust_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE INDEX sales_time_local_bix
ON sales_delta (time_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE INDEX sales_channel_local_bix
ON sales_delta (channel_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE INDEX sales_promo_local_bix
ON sales_delta (promo_id)
NOLOGGING COMPUTE STATISTICS ;
Depending on your system, creating the B-tree indexes takes approximately 3 to 7 seconds.
5. Use SQL to store the sizes in a table for a direct comparison. To do this, select the actual
segment sizes of the B-tree indexes from the data dictionary.
@c:\wkdir\cr_compare_tab.sql
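A minimal sketch of such a comparison table, assuming it is filled from the data dictionary (the table and column names below are assumptions; the actual definition is in cr_compare_tab.sql):

```sql
-- Hypothetical comparison table seeded with the B-tree index sizes
CREATE TABLE compare_idx_sizes AS
SELECT segment_name,
       'BTREE' AS index_type,
       bytes/1024/1024 AS size_mb
FROM   user_segments
WHERE  segment_name LIKE 'SALES%BIX';
```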
set timing on
CREATE BITMAP INDEX sales_prod_local_bix
ON sales_delta (prod_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_cust_local_bix
ON sales_delta (cust_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_time_local_bix
ON sales_delta (time_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_channel_local_bix
ON sales_delta (channel_id)
NOLOGGING COMPUTE STATISTICS ;
CREATE BITMAP INDEX sales_promo_local_bix
ON sales_delta (promo_id)
NOLOGGING COMPUTE STATISTICS ;
Note: The creation of the bitmap indexes is much faster than the creation of the equivalent B-
tree indexes.
8. Fill the previously created comparison table so that you can use SQL to investigate the
differences between B-tree and bitmap indexes.
@c:\wkdir\fill_comp_table.sql
COMMIT;
9. Compare the different sizes of those indexes. To show all the facts, run the script
comp_idx1.sql from your SQL*Plus session.
@c:\wkdir\comp_idx1.sql
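The side-by-side comparison could be expressed along these lines (the table and column names are assumptions carried over from the hypothetical comparison table above; comp_idx1.sql contains the actual query):

```sql
-- Pivot the recorded sizes so each index shows its B-tree and bitmap size
SELECT segment_name,
       MAX(DECODE(index_type, 'BTREE',  size_mb)) AS btree_mb,
       MAX(DECODE(index_type, 'BITMAP', size_mb)) AS bitmap_mb
FROM   compare_idx_sizes
GROUP  BY segment_name
ORDER  BY segment_name;
```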
Depending on the cardinality of the indexed column, a bitmap index is normally up to 30 times
smaller than the equivalent B-tree index; the factor can even reach 50 to 60. In a
data warehousing system in the terabyte range, even an average factor of five saves a
tremendous amount of disk space and potential workload. Consider scanning a 250 GB index
versus a 50 GB index.
Furthermore, the creation time is much shorter than that of B-tree indexes.
True bitmap indexes are a crucial component for any data warehousing environment
whose data model uses star or snowflake schemata.
10. Clean up the environment.
@c:\wkdir\cleanup_idx_test.sql
Static partition pruning takes place when the optimizer can eliminate specific
partitions at parse time, for example a query predicate on the partitioning key
column.
Dynamic partition pruning can be as simple as a bind variable replacement at
runtime or as complex as spawning additional recursive SQL to identify the
appropriate partitions. Dynamic partition pruning takes place at query run time.
Generally, advanced pruning takes place when the following criteria are satisfied:
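To make the static case concrete, the following sketch shows the kind of statement where the optimizer can prune at parse time, assuming sales is range partitioned on time_id as in the SH schema (the literals are illustrative):

```sql
-- Literal predicates on the partitioning key allow static pruning:
-- PSTART/PSTOP in the plan show fixed partition numbers
EXPLAIN PLAN FOR
SELECT SUM(amount_sold)
FROM   sales
WHERE  time_id BETWEEN TO_DATE('01-JAN-2000','DD-MON-YYYY')
                   AND TO_DATE('31-MAR-2000','DD-MON-YYYY');

SELECT * FROM TABLE(dbms_xplan.display);
```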
2. To obtain more information about the partition that must be accessed, you can query the data
dictionary with the appropriate partition number.
@c:\wkdir\select_part_name.sql
SELECT partition_name
FROM user_tab_partitions
WHERE table_name='SALES'
AND partition_position=13;
Back to Topic
Dynamic Partition Pruning
The sales fact table of the sample schema is range partitioned on the time_id column.
Joining the sales fact table with the times dimension table over time_id satisfies
the first criterion, so dynamic partition pruning can take place.
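A join of this form, sketched below, can trigger subquery-based dynamic pruning; the predicate on the dimension table is an assumption for illustration, and the exact statement is in plan_dyn_pruning.sql:

```sql
-- The filter is on the dimension table, not on the partitioning key itself,
-- so the qualifying partitions are only known at run time
SELECT t.calendar_month_desc, SUM(s.amount_sold) AS amount
FROM   sales s, times t
WHERE  s.time_id = t.time_id
AND    t.fiscal_year = 2000
GROUP BY t.calendar_month_desc;
```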
1. From a SQL*Plus session logged on to the SH schema, run plan_dyn_pruning.sql, or copy the
following SQL statements into your SQL*Plus session:
@c:\wkdir\plan_dyn_pruning.sql
2. To show the recursive SQL statement that is issued for dynamic partition pruning, run the
following query against the plan_table:
@c:\wkdir\select_other.sql
SELECT other
FROM plan_table
WHERE other IS NOT NULL;
You can see how the Oracle database translates the original predicate condition into an inline
view to obtain the appropriate partition information.
The results tell you not only what kind of dynamic pruning has happened but also HOW it
has taken place. In this example, the dynamic pruning is based on
a SUBQUERY, which can be identified either in the plan (Id 7) or in the PSTART/PSTOP
column; KEY(SQ) is the abbreviation for KEY(SUBQUERY).
Back to Topic
Back to Topic List
Using Partitionwise Joins
Partitionwise joins reduce query response time by minimizing the amount of data
exchanged among parallel execution servers when joins execute in parallel. This
significantly reduces response time and improves the use of both CPU and memory
resources. In Oracle Real Application Cluster environments, partitionwise joins also
avoid or at least limit the data traffic over the interconnect, which is the key to
achieving good scalability for massive join operations. Partitionwise joins can be full or
partial. The Oracle server decides which type of join to use.
Note: This tutorial does not cover all possibilities of partitionwise operations in detail.
It should give you a basic understanding of partitionwise operations, and an
understanding of how to read an execution plan appropriately. For detailed information
about partitionwise joins, see the Data Warehousing Guide.
Steps
1. Create two additional hash-partitioned tables and indexes.
2. Import statistics for those two tables.
3. Examine serial partitionwise joins.
4. Examine parallel partitionwise joins.
5. Clean up.
exec dbms_stats.import_table_stats('sh','customers_hash', -
stattab =>'stat_table', statid =>'HANDS_ON')
3. Examine the statistics for those tables; the imported statistics make the tables appear as if they contain a lot of data.
@c:\wkdir\show_tab_stats.sql
Back to Topic
3. Examine Serial Partitionwise Joins
There are two ways to use serial partitionwise joins. These include:
Serial Nonpartitionwise Joins
Serial Full Partitionwise Joins
Serial Nonpartitionwise Joins
Note: The NOREWRITE hint is used to avoid any interference with possibly existing
materialized views. Alternatively, you can disable query_rewrite for this particular
session, or for the complete instance. This is true for all subsequent statements.
The plan shows full table access for both tables, sales and customers, and a hash join. The
row source PARTITION RANGE ALL is inside the HASH JOIN, which means that the
customers table is joined with all partitions.
Back to Subtopic
Serial Full Partitionwise Joins
A full partitionwise join divides a large join into smaller joins between a pair of
partitions from the two joined tables. To use this feature, you must equipartition both
tables on their join keys by joining your composite (range-hash) partitioned table,
sales_hash, with the hash-partitioned table, customers_hash. Note that those
two tables are equi-partitioned on the cust_id join column.
1. From a SQL*Plus session logged on to the SH schema, run serial_pwj.sql, or copy the
following SQL statements into your SQL*Plus session:
@c:\wkdir\serial_pwj.sql
The plan looks slightly different from the nonpartitioned plan. You see an additional operation,
PARTITION HASH ALL, in the plan. This row source is outside the HASH JOIN. You can
read this operation as specifying how to process the hash join, or, in pseudocode:
FOR LOOP over all partitions of customers_hash
DO
hash-join equivalent partitions
DONE
This full partitionwise join is done in serial.
Back to Subtopic
Back to Topic
4. Examine Parallel Partitionwise Joins
There are three ways to use parallel partitionwise joins. These include:
Nonpartitionwise Joins
Full Partitionwise Joins
Partial Partitionwise Joins
Nonpartitionwise Joins
Now join your range-partitioned table "sales" with the non-partitioned table
"customers."
1. From a SQL*Plus session logged on to the SH schema, run par_nopwj.sql, or copy the
following SQL statements into your SQL*Plus session:
@c:\wkdir\par_nopwj.sql
You can see that the statement is executed in parallel; because no existing physical
partitioning of either table can be used for this query, no partitionwise join takes
place.
Both tables are scanned in parallel. The smaller customers table is broadcast to all slaves
working on the next part of the parallel plan (Id 9), which performs the HASH JOIN
followed by a first HASH GROUP BY. Because all the slaves are working on the complete
result set, it has to be redistributed (HASH based, Id 5) for the final
HASH GROUP BY operation (Id 3).
If the query contains an additional ORDER BY, you see a RANGE-based redistribution
feeding the last SORT GROUP BY (PQ distribution method of Id 5). Redistributing the data
RANGE based optimizes the plan by eliminating a final sort operation: the order is guaranteed
by concatenating the results of the parallel slaves in a specific order (Id 2).
Back to Subtopic
Full Partitionwise Joins
Parallel execution of a full partitionwise join is a straightforward parallelization of the
serial execution. Instead of joining one partition pair at a time, the partition pairs are
joined in parallel by n query servers.
The picture above shows a parallel full partitionwise join between hash-partitioned tables.
In this tutorial, you are joining the hash-partitioned table, customers_hash, with the
composite partitioned table, sales_hash. The hash partitions for the sales_hash
table are composed of a set of 16 subpartitions, one from each range partition.
1. From a SQL*Plus session logged on to the SH schema, run par_fullpwj.sql, or copy the
following SQL statements into your SQL*Plus session:
@c:\wkdir\par_fullpwj.sql
Back to Subtopic
Partial Partitionwise Joins
Unlike full partitionwise joins, partial partitionwise joins require you to partition only
one table on the join key, not both tables. The partitioned table is referred to as the
reference table. The other table may or may not be partitioned. Partial partitionwise
joins are more common than full partitionwise joins and do not require an
equipartitioning of the tables.
To execute a partial partitionwise join, the Oracle database dynamically repartitions the
other table based on the partitioning of the reference table. Once the other table is
repartitioned, the execution is similar to a full partitionwise join.
Partial partitionwise joins are executed only in parallel. The dynamic repartitioning is
essentially a distribution step prior to the join that improves the efficiency of the join
operation. In serial execution, only one process executes the join, so there is no
need, and no benefit, to dynamically redistribute the nonpartitioned table
before the join.
1. From a SQL*Plus session logged on to the SH schema, run par_partpwj.sql, or copy the
following SQL statements into your SQL*Plus session:
@c:\wkdir\par_partpwj.sql
You see that the customers table is dynamically redistributed based on the join key column,
cust_id, in the same way that the sales_hash table is subpartitioned on this column.
This is shown by the fact that the result sets of the parallel table scan of the customers table are
fed to the HASH JOIN operation (Q 1,01 - Id 7), the same slave set that does the parallel
(partitionwise) scan of the composite partitioned table, sales_hash.
The hash partitions for the sales_hash table are composed of a set of 16 subpartitions, one
from each range partition.
Back to Subtopic
Back to Topic
5. Clean up
Before you move to the next topic, you need to clean up the environment. Perform the
following step:
From a SQL*Plus session logged on to the SH schema, run cleanup_mod3.sql, or copy the
following SQL statements into your SQL*Plus session:
@c:\wkdir\cleanup_mod3.sql
You see that, with some exceptions (for example, external tables), all tables are monitored by default. This
is new behavior in Oracle Database 10g. The monitoring is done completely in memory and
does not impose any overhead on the tables; you cannot switch off table monitoring.
2. The user_tab_modifications data dictionary view lists all changes to your table with
monitoring enabled.
@c:\wkdir\show_tab_modifications.sql
Depending on what happened previously on your system, you may see more or fewer entries.
Now you can use a single command to keep statistics up to date, and observe how the timing
and the number of objects for which statistics are gathered change.
3. From a SQL*Plus session logged on to the SH schema, run gather_auto.sql, or copy the
following SQL statements into your SQL*Plus session:
@c:\wkdir\gather_auto.sql
set serveroutput on
declare
list_of_objects dbms_stats.objectTab := dbms_stats.objectTab();
begin
dbms_output.enable(200000);
dbms_stats.gather_schema_stats('SH',options=>'GATHER AUTO',
objlist=>list_of_objects);
for i in 1 ..list_of_objects.count loop
dbms_output.put_line('updated:'||list_of_objects(i).objtype||' '||
list_of_objects(i).objname||' '||list_of_objects(i).partname);
end loop;
end;
/
Depending on the number of objects without statistics that you might have left over from
previous tutorials, this may take up to a couple of minutes.
Note: Only the schema name and the request for automatic statistics gathering are passed to
Oracle. All other parameters are derived internally.
COMMIT;
You are modifying less than 10% of the data, so the threshold for gathering new statistics has
not been reached.
6. Although the database periodically flushes the information about modified objects from the SGA
(and also internally every time DBMS_STATS is invoked), customers often require the
most current information for customer-specific usage. You can flush this
information manually, on demand, by issuing the following procedure:
@c:\wkdir\flush_monitoring.sql
set serveroutput on
declare
list_of_objects dbms_stats.objectTab := dbms_stats.objectTab();
begin
dbms_output.enable(200000);
dbms_stats.gather_schema_stats('SH',options=>'GATHER AUTO',
objlist=>list_of_objects);
for i in 1 ..list_of_objects.count loop
dbms_output.put_line('updated:'||list_of_objects(i).objtype||
' '||list_of_objects(i).objname||' '||list_of_objects(i).partname);
end loop;
end;
/
9. Because the ratio of changed rows is below the 10% threshold, the call runs very quickly and does
not invoke any statistics gathering. The modifications are still recorded.
@c:\wkdir\show_tab_modifications.sql
If you want to set up your own statistics-gathering mechanism based on application-specific
rules, you can do so by using the information in user_tab_modifications. As soon as the
listed objects get new statistics, their entries are deleted from the list of potentially stale
tables.
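Such a custom mechanism could start from a query along these lines; the 10% threshold mirrors the one used by automatic gathering, and the exact filter logic is an assumption for illustration:

```sql
-- Hypothetical staleness check: tables whose tracked modifications exceed
-- 10% of the row count recorded at the last analyze
SELECT m.table_name, m.inserts, m.updates, m.deletes
FROM   user_tab_modifications m, user_tables t
WHERE  m.table_name = t.table_name
AND    t.num_rows > 0
AND    (m.inserts + m.updates + m.deletes) > 0.1 * t.num_rows;
```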
Rem note that the monitoring mechanism is agnostic about the content;
Rem the following statement reverts the previous update
update customers
set cust_credit_limit=cust_credit_limit-1
where country_id =
(select country_id from countries where country_name='France');
commit;
11. Flush the information about monitored tables from the SGA.
@c:\wkdir\flush_monitoring.sql
exec dbms_stats.flush_database_monitoring_info
12. The information about table modifications has now been flushed manually. You see the changes
reflected.
@c:\wkdir\show_tab_modifications.sql
The threshold of 10% modified data is now exceeded. As soon as the automatic gathering of the
statistics is run again, Oracle automatically gathers the appropriate new set of statistics for the
table customers.
Before the statistics are gathered, you update the customers table again, using Oracle's
parallel DML capabilities for nonpartitioned tables. With Oracle Database 10g
Release 2, the restriction of a maximum DOP on a per-segment basis in the presence
of bitmap indexes is lifted.
1. Create a temporary table. From a SQL*Plus session, execute the following script:
@c:\wkdir\cr_cust_dml.sql
DROP TABLE cust_dml;
CREATE TABLE cust_dml PARALLEL AS
SELECT /*+ PARALLEL(c) */ * FROM customers c;
2. Now issue a parallel UPDATE command against the nonpartitioned table cust_dml.
@c:\wkdir\xpdml_on_cust.sql
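The script's parallel UPDATE might be sketched as follows; the hint, degree, and SET clause are assumptions for illustration, and note that parallel DML must first be enabled at the session level:

```sql
-- Parallel DML is disabled per session by default
ALTER SESSION ENABLE PARALLEL DML;

-- Hypothetical parallel update of the nonpartitioned copy
UPDATE /*+ PARALLEL(cust_dml, 4) */ cust_dml
SET    cust_credit_limit = cust_credit_limit + 1;

COMMIT;
```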
Invoke the automated gathering of statistics again. Notice that it takes longer and that new
statistics for the customers table are gathered.
@c:\wkdir\gather_auto.sql
set serveroutput on
declare
list_of_objects dbms_stats.objectTab := dbms_stats.objectTab();
begin
dbms_output.enable(200000);
dbms_stats.gather_schema_stats('SH',options=>'GATHER AUTO',
objlist=>list_of_objects);
for i in 1 ..list_of_objects.count loop
dbms_output.put_line('updated:'||list_of_objects(i).objtype||' '||
list_of_objects(i).objname||' '||list_of_objects(i).partname);
end loop;
end;
/
5. The customers table does not show up as a table with outstanding modifications anymore.
@c:\wkdir\show_tab_modifications.sql
From a SQL*Plus session logged on to the SH schema, execute the following commands:
SET SERVEROUTPUT ON
EXEC dw_handsOn.cleanup_modules
Become familiar with Oracle advanced partitioning mechanisms for improved query
performance
Use automatic gathering of statistics