CS2032 Unit I Notes

Syllabus

DATA WAREHOUSING AND MINING


UNIT-II
DATA WAREHOUSING
Data Warehouse Components, Building a Data Warehouse, Mapping Data Warehouse to a
Multiprocessor Architecture, Data Extraction, Clean up and Transformation Tools, Meta data
Objective
To know about data warehousing components
To know the considerations of building a data warehouse
To know how to map a data warehouse to a multiprocessor architecture
To know about data extraction, clean up and transformation tools
To know details about meta data
Data Warehouse Components
1. Overall Architecture
The data warehousing architecture is based on a relational database management system server
that functions as the central repository for informational data. Typically, the source data for the
warehouse comes from the operational applications. As the data enters the data warehouse, it is
transformed into an integrated structure and format.
2. Data Warehouse Database
The central data warehouse database is a cornerstone of the data warehousing environment.
This database is almost always implemented on a relational database management system (RDBMS).
Some of the approaches needed for the database are:
Parallel relational database designs that require a parallel computing platform
An innovative approach to speed up a traditional RDBMS
Multidimensional databases (MDDBs) that are based on proprietary database technology or
implemented using an RDBMS
3. Sourcing, Acquisition, Clean Up and Transformation Tools
These tools perform all of the conversions, summarizations, key changes, structural changes and
condensations needed to transform disparate data into information that can be used by decision support
tools. This functionality includes:
Removing unwanted data from operational databases
Converting to common data names and definitions
Calculating summaries and derived data
Establishing defaults for missing data
Accommodating changes in source data definitions
These tools have to deal with some significant issues, as follows:
Data heterogeneity: the difference in the way data is defined and used in different
models
Database heterogeneity: DBMSs are very different in data models, data access languages,
data navigation, and so on
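The following Python sketch shows four of the clean up and transformation functions listed above:
- dropping unwanted fields
- converting to common names
- establishing defaults for missing data
- calculating a summary

It is a minimal illustration, not a real tool. The source rows, the name mapping `NAME_MAP`, and the default in `DEFAULTS` are hypothetical.

```python
# Hypothetical rows from two operational systems that name the same fields differently.
SOURCE_ROWS = [
    {"cust_no": "C1", "amt": "120.50", "region": "N", "tmp_flag": "x"},
    {"CustomerID": "C2", "Amount": "80", "region": None},
    {"cust_no": "C1", "amt": "30", "region": "N"},
]

# Assumed mapping from source-specific names to common warehouse names.
NAME_MAP = {
    "cust_no": "customer_id", "CustomerID": "customer_id",
    "amt": "amount", "Amount": "amount",
    "region": "region",
}
DEFAULTS = {"region": "UNKNOWN"}  # assumed default for missing data


def clean(row):
    """Drop unwanted fields, rename to common names, fill defaults, fix types."""
    out = {NAME_MAP[k]: v for k, v in row.items() if k in NAME_MAP}
    for field_name, default in DEFAULTS.items():
        if out.get(field_name) is None:
            out[field_name] = default
    out["amount"] = float(out["amount"])  # convert to a common data type
    return out


def total_by_customer(rows):
    """Calculate a simple summary (derived data): total amount per customer."""
    totals = {}
    for r in rows:
        totals[r["customer_id"]] = totals.get(r["customer_id"], 0.0) + r["amount"]
    return totals


cleaned = [clean(r) for r in SOURCE_ROWS]
print(total_by_customer(cleaned))  # {'C1': 150.5, 'C2': 80.0}
```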
4. Meta data
Meta data is data about data that describes the data warehouse. Meta data can be classified into:
1. Technical metadata:
This contains information about warehouse data for use by warehouse designers and
administrators when carrying out warehouse development and management tasks. Technical metadata
documents include:
Information about data sources
Transformation descriptions
Warehouse object and data structure definitions for data targets
The rules used to perform data clean up and data enhancement
2. Business metadata:
This contains information that gives users an easy-to-understand perspective of the information
stored in the data warehouse. Business metadata documents information about:
Subject areas and information object types, including queries, reports, images, video,
and/or audio clips
Internet home pages
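The sketch below makes the two kinds of meta data concrete. It represents one technical entry and one business entry as Python dataclasses. The class names, fields, and the SALES_FACT example are hypothetical; a real meta data repository would hold many more attributes.

```python
from dataclasses import dataclass, field


@dataclass
class TechnicalMetadata:
    """Entry used by warehouse designers and administrators (illustrative fields)."""
    target_object: str       # warehouse object / data structure definition
    data_source: str         # information about the data source
    transformation: str      # transformation description
    cleanup_rules: list = field(default_factory=list)  # clean up / enhancement rules


@dataclass
class BusinessMetadata:
    """Entry presented to end users in business terms (illustrative fields)."""
    subject_area: str
    object_type: str         # e.g. query, report, image
    business_name: str
    description: str


tech = TechnicalMetadata(
    target_object="SALES_FACT",
    data_source="ORDERS table in the order-entry system",
    transformation="sum of order line amounts per store per day",
    cleanup_rules=["reject rows with a missing store_id"],
)
biz = BusinessMetadata(
    subject_area="Sales",
    object_type="report",
    business_name="Daily store sales",
    description="Total sales amount for each store for each day",
)
print(f"{biz.business_name} is built from {tech.target_object}")
```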
5. Access Tools
Users interact with the data warehouse using front-end tools. These tools are divided into five
groups:
Data query and reporting tools
Application development tools
Executive information system tools
Online analytical processing tools
Data mining tools
1. Query and Reporting Tools
This category is further divided into two groups: reporting tools and managed query tools.
Reporting tools can be divided into production reporting tools and desktop report writers.
Production reporting tools let companies generate regular operational reports
or support high-volume batch jobs, such as calculating and printing paychecks.
Report writers, on the other hand, are inexpensive desktop tools designed for end
users.
Managed query tools shield end users from the complexities of SQL and database structures by
inserting a metalayer between users and the database.
Metalayer
It is the software that provides subject-oriented views of a database and supports point-and-click
creation of SQL.
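A metalayer can be pictured as two things:
- a lookup from business names to physical columns
- a rule for generating SQL from the names the user picks

The sketch below assumes a hypothetical two-table schema (`customers`, `orders`) with hypothetical column names. It handles only a single table or this one join, whereas commercial managed query tools handle arbitrary join paths.

```python
# Hypothetical metalayer: business names mapped to physical (table, column) pairs.
METALAYER = {
    "Customer Name": ("customers", "cust_nm"),
    "Region": ("customers", "rgn_cd"),
    "Order Total": ("orders", "ord_amt"),
}
# Assumed join condition between the two physical tables.
JOINS = {frozenset({"customers", "orders"}): "customers.cust_id = orders.cust_id"}


def build_sql(selected_fields):
    """Generate SQL from the business names a user picked (point-and-click)."""
    columns, tables = [], []
    for name in selected_fields:
        table, column = METALAYER[name]
        columns.append(f'{table}.{column} AS "{name}"')
        if table not in tables:
            tables.append(table)
    sql = f"SELECT {', '.join(columns)} FROM {', '.join(tables)}"
    if len(tables) > 1:  # this sketch only knows the one join path in JOINS
        sql += f" WHERE {JOINS[frozenset(tables)]}"
    return sql


print(build_sql(["Customer Name", "Order Total"]))
# SELECT customers.cust_nm AS "Customer Name", orders.ord_amt AS "Order Total" FROM customers, orders WHERE customers.cust_id = orders.cust_id
```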
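2. Applications: Some analytical requirements call for such a complex set of queries and such
sophisticated data models that business users may find themselves overwhelmed by the need to
become SQL and/or data modeling experts. Application development tools are used to build
applications for such requirements.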
3. OLAP: These tools are based on the concepts of multidimensional databases and allow a
sophisticated user to analyze the data using elaborate, multidimensional, complex views.
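One basic multidimensional operation is the roll-up, which aggregates a measure over the dimensions the user is not interested in. The sketch below uses a small hypothetical sales cube with three dimensions (region, product and quarter), stored as plain Python tuples. Real OLAP tools precompute and index such aggregates rather than scanning the facts each time.

```python
from collections import defaultdict

DIMENSIONS = ("region", "product", "quarter")
# Hypothetical fact rows: one value per dimension, then the measure (sales amount).
FACTS = [
    ("East", "Laptop", "Q1", 100),
    ("East", "Phone", "Q1", 50),
    ("West", "Laptop", "Q1", 70),
    ("East", "Laptop", "Q2", 120),
    ("West", "Phone", "Q2", 40),
]


def roll_up(facts, keep):
    """Sum the measure over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in positions)] += row[-1]
    return dict(totals)


print(roll_up(FACTS, ["region"]))  # {('East',): 270, ('West',): 110}
print(roll_up(FACTS, ["product", "quarter"]))
```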
4. Data Mining
A critical success factor for any business today is its ability to use information
effectively. Data mining is the process of discovering meaningful new correlations, patterns,
and trends by digging into large amounts of data stored in warehouses, using artificial intelligence,
statistical, and mathematical techniques.
In this respect, data mining can reach beyond the capabilities of OLAP, especially
since the major attraction of data mining is its ability to build predictive rather than
retrospective models. Most organizations engage in data mining to:
Discover knowledge
Visualize data
Correct data
5. Data Visualization
Data visualization is not a separate class of tools; rather, it is a method of
presenting the output of the previously mentioned tools in such a way that the entire problem
and/or the solution are clearly visible.
Data visualization goes far beyond simple bar and pie charts. It is a collection of
complex techniques that currently represents an area of intense research.
6. Data Marts
Data marts are presented as an inexpensive alternative to a data warehouse that takes
significantly less time and money to build. A data mart is a subsidiary of a data warehouse of
integrated data and is created for a dedicated set of users.
A data mart is a set of denormalized, summarized, or aggregated data. A data mart is a separate
database server, often on a local area network, serving a dedicated group of users.
There are two types of data marts:
1. Dependent data mart: The data content is sourced from a data warehouse. Dependent data marts
have high value because, no matter how many are deployed and no matter how many different
technologies are used, they all draw on the same integrated warehouse data.
2. Independent data mart: Unfortunately, misleading statements about the simplicity and low
cost of data marts sometimes result in organizations or vendors incorrectly positioning them as
an alternative to the data warehouse. This viewpoint defines independent data marts.
The concept of an independent data mart is a dangerous one. Each group in the enterprise will start to
design its own data marts without integration, and the complex many-to-one problem turns into a
many-to-many sourcing and management nightmare. Scaling independent data marts is also complex.
The business drivers underlying such developments include:
Extremely urgent user requirements
The absence of a budget for a full data warehouse strategy
The absence of a sponsor for an enterprise-wide decision support strategy
The decentralization of business units
The attraction of easy-to-use tools and a mind-sized project
The approach recommended by Ralph Kimball is as follows: for any two data marts in an
enterprise, the common dimensions must conform to the equality and roll-up rule, that is, the shared
dimensions must either be identical or one must be a strict roll-up of the other. In summary, data
marts present two problems: scalability and integration.
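The equality and roll-up rule can be checked mechanically. The sketch assumes two hypothetical dimensions, each stored as a Python dict from member key to attributes:
- a daily date dimension used by one mart
- a monthly dimension used by another

`conforms` accepts them if they are identical. Otherwise it accepts them if every daily member rolls up to a monthly member whose attributes agree with its own. This is one reasonable reading of "strict roll-up", not a definitive test.

```python
# Hypothetical dimensions from two data marts.
DATE_DIM = {  # daily grain (mart A)
    "2024-01-15": {"month": "2024-01", "year": "2024"},
    "2024-02-03": {"month": "2024-02", "year": "2024"},
}
MONTH_DIM = {  # monthly grain (mart B)
    "2024-01": {"year": "2024"},
    "2024-02": {"year": "2024"},
}


def is_strict_rollup(fine, coarse, level):
    """True if every fine member rolls up (via attribute `level`) to a coarse
    member whose attributes agree with the fine member's attributes."""
    for attrs in fine.values():
        parent = coarse.get(attrs.get(level))
        if parent is None:
            return False
        if any(attrs.get(name) != value for name, value in parent.items()):
            return False
    return True


def conforms(dim_a, dim_b, level=None):
    """Equality rule, or roll-up rule along the given attribute."""
    if dim_a == dim_b:
        return True
    return level is not None and is_strict_rollup(dim_a, dim_b, level)


print(conforms(DATE_DIM, MONTH_DIM, level="month"))  # True
```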
7. Data Warehouse Administration and Management
Managing a data warehouse includes:
Security and priority management
Monitoring updates from multiple sources
Data quality checks
Managing and updating meta data
Auditing and reporting data warehouse usage and status
Purging data
Replicating, subsetting, and distributing data
Backup and recovery
Data warehouse storage management
8. Information Delivery System
The information delivery component is used to enable the process of subscribing for data
warehouse information and having it delivered to one or more destinations of choice according to
some user-specified scheduling algorithm. The information delivery system distributes warehouse-stored
data and other information objects to other data warehouses and end-user products.
Building a Data Warehouse
1. Business Considerations: Return on Investment
1. Approach
The subject-oriented nature of the data warehouse determines the scope of the
information in the data warehouse. Organizations embarking on data warehousing
development can choose one of two approaches:
Top-down approach: the organization has developed an enterprise data
model, collected enterprise-wide business requirements, and decided to build an
enterprise data warehouse with subset data marts.
Bottom-up approach: business priorities have resulted in developing
individual data marts, which are then integrated into the enterprise data warehouse.
2. Organizational Issues
The requirements and environments associated with the informational applications of a
data warehouse are different from those of operational applications. Therefore, an organization
will need to employ different development practices than the ones it uses for operational applications.
2. Design Considerations
In general, a data warehouse's design point is to consolidate data from multiple, often
heterogeneous, sources into a query database. The main factors include:
Heterogeneity of data sources, which affects data conversion, quality, and timeliness
Use of historical data, which implies that data may be old
Tendency of the database to grow very large
Data content: Typically, a data warehouse may contain detailed data, but the data is cleaned up
and transformed to fit the warehouse model, and certain transactional attributes of the data are filtered
out. The content and the structure of the data warehouse are reflected in its data model. The data
model is a template for how information will be organized within the integrated data warehouse
framework.
Meta data: Defines the contents and location of the data in the warehouse, the relationship between
the operational databases and the data warehouse, and the business view of the warehouse data that is
accessible by end-user tools. The warehouse design should prevent any direct access to the warehouse
data if it does not use meta data definitions to gain the access.
Data distribution: As data volumes continue to grow, the database size may rapidly
outgrow a single server. Therefore, it becomes necessary to know how the data should be divided
across multiple servers. The data placement and distribution design should consider several options,
including data distribution by subject area, location, or time.
Tools: Data warehouse designers have to be careful not to sacrifice the overall design to fit a
specific tool. Selected tools must be compatible with the given data warehousing environment and
with each other.
Performance considerations: Rapid query processing is a highly desired feature that should be
designed into the data warehouse.
Nine decisions in the design of a data warehouse:
1. Choosing the subject matter
2. Deciding what a fact table represents
3. Identifying and conforming the dimensions
4. Choosing the facts
5. Storing pre-calculations in the fact table
6. Rounding out the dimension tables
7. Choosing the duration of the database
8. The need to track slowly changing dimensions
9. Deciding the query priorities and the query modes
3. Technical Considerations
A number of technical issues are to be considered when designing and implementing a
data warehouse environment. These issues include:
The hardware platform that would house the data warehouse
The database management system that supports the warehouse database
The communication infrastructure that connects the warehouse, data marts, operational
systems, and end users
The hardware platform and software to support the meta data repository
The systems management framework that enables the centralized management and
administration of the entire environment
4. Implementation Considerations
A data warehouse cannot simply be bought and installed; its implementation requires the
integration of many products within a data warehouse. The considerations include:
Access tools
Data extraction, clean up, transformation, and migration
Data placement strategies
Meta data
User sophistication levels: casual users, power users, experts
Mapping Data Warehouse to a Multiprocessor Architecture
1. Relational Database Technology for Data Warehouse
The size of a data warehouse rapidly approaches the point where the search for better
performance and scalability becomes a real necessity. The search is pursuing two goals:
Speed-up: the ability to execute the same request on the same amount of data in less time
Scale-up: the ability to obtain the same performance on the same request as the database size
increases
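The two goals are commonly quantified as ratios of elapsed times. In the formulas below, $T_N(D)$ denotes the elapsed time of a request against a database of size $D$ on a system with $N$ processors; this notation is ours, not the notes'. The scale-up ratio also assumes, beyond what the notes state, that the system grows by the same factor $N$ as the database.

```latex
\begin{aligned}
\text{speed-up}(N) &= \frac{T_1(D)}{T_N(D)} && \text{(linear speed-up when this equals } N\text{)}\\[4pt]
\text{scale-up}(N) &= \frac{T_1(D)}{T_N(N \cdot D)} && \text{(linear scale-up when this equals } 1\text{)}
\end{aligned}
```

For example:
- **Speed-up:** if a query takes 100 s on one processor and 25 s on four processors against the same data, then speed-up(4) = 100/25 = 4, which is linear.
- **Scale-up:** if the same query against a database four times larger still takes 100 s on four processors, then scale-up(4) = 100/100 = 1.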
1. Types of Parallelism
Parallel execution of tasks within SQL statements can be done in either of two ways:
Horizontal parallelism: the database is partitioned across multiple disks, and parallel processing
occurs within a specific task that is performed concurrently on different processors against
different sets of data.
Vertical parallelism: occurs among different tasks; all component query operations are executed
in parallel in a pipelined fashion. In other words, an output from one task becomes an input into
another task as soon as records become available.
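The two forms of parallelism can be illustrated in Python. In this sketch the partitions are hypothetical in-memory lists:
- **Horizontal parallelism:** `ThreadPoolExecutor` runs the same scan on each partition concurrently.
- **Vertical parallelism:** a chain of generators models the pipelined, record-at-a-time hand-off between operations.

Python threads and generators only illustrate the data flow; this is not how a parallel RDBMS actually schedules work on separate processors.

```python
from concurrent.futures import ThreadPoolExecutor

# Horizontal parallelism: the same task (a filtered scan) runs concurrently
# against different partitions of the data. Partitions are hypothetical.
PARTITIONS = [[5, 12, 7, 30], [18, 2, 25], [9, 40, 1]]


def scan_partition(rows, threshold=10):
    """Scan one partition and keep the rows above the threshold."""
    return [r for r in rows if r > threshold]


with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
    partial_results = list(pool.map(scan_partition, PARTITIONS))
horizontal = sorted(r for part in partial_results for r in part)
print(horizontal)  # [12, 18, 25, 30, 40]


# Vertical parallelism, modelled with generators: each operation passes a
# record to the next one as soon as it produces it, instead of waiting for
# the previous operation to finish all of its input.
def scan(rows):
    for r in rows:
        yield r


def select_rows(records, threshold=10):
    for r in records:
        if r > threshold:
            yield r


def derive(records):
    for r in records:
        yield r * 2  # e.g. compute a derived column


print(list(derive(select_rows(scan([5, 12, 7, 30])))))  # [24, 60]
```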
2. Data Partitioning
Data partitioning is a key requirement for effective parallel execution of database operations.
It spreads data from database tables across multiple disks so that I/O operations such as read
and write can be performed in parallel.
Random partitioning includes random data striping across multiple disks on a single server. In
round-robin partitioning, each new record is placed on the next disk assigned to the database.
Intelligent partitioning assumes that the DBMS knows where a specific record is located and does
not waste time searching for it across all disks. This partitioning allows a DBMS to fully
exploit parallel architectures and also enables higher availability.
Intelligent partitioning includes:
Hash partitioning: a hash algorithm is used to calculate the partition number from the partitioning key
Key range partitioning: partitions are based on ranges of the partition key
Schema partitioning: each table is placed on its own disk; useful for small reference tables
User-defined partitioning: tables are partitioned based on user-defined expressions
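Round-robin, hash, and key range partitioning each reduce to a simple placement function. The sketch assumes three disks and hypothetical range bounds. It uses `zlib.crc32` rather than Python's built-in `hash()`, because `hash()` of a string is randomized per process and a partitioning function must return the same partition every time.

```python
import zlib

NUM_DISKS = 3


def round_robin(records, n=NUM_DISKS):
    """Round-robin placement: record i goes to disk i mod n."""
    disks = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        disks[i % n].append(rec)
    return disks


def hash_partition(key, n=NUM_DISKS):
    """Hash partitioning: a deterministic hash of the partitioning key gives the
    partition number, so the DBMS can compute where a row lives."""
    return zlib.crc32(str(key).encode("utf-8")) % n


RANGE_BOUNDS = [1000, 5000]  # assumed: p0 < 1000 <= p1 < 5000 <= p2


def range_partition(key, bounds=RANGE_BOUNDS):
    """Key range partitioning: the first range whose upper bound exceeds the key."""
    for i, upper in enumerate(bounds):
        if key < upper:
            return i
    return len(bounds)


print(round_robin(["r1", "r2", "r3", "r4", "r5"]))  # [['r1', 'r4'], ['r2', 'r5'], ['r3']]
print(range_partition(250), range_partition(4200), range_partition(9000))  # 0 1 2
print(hash_partition("C42") == hash_partition("C42"))  # True: same key, same partition
```

Hash partitioning tends to spread rows evenly and is suited to exact-key lookups. Key range partitioning keeps neighbouring keys together, which helps range queries but can suffer from data skew.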
2. Database Architecture for Parallel Processing
1. Shared-Memory Architecture
Also called the shared-everything style, this is the traditional approach to implementing an RDBMS on
SMP hardware and is simple to implement. The key point of this approach is that a single RDBMS
server can potentially utilize all processors, access all memory, and access the entire
database, thus providing the user with a consistent single system image.
2. Shared-Disk Architecture
It implements the concept of shared ownership of the entire database between RDBMS servers,
each of which is running on a node of a distributed memory system. Each RDBMS server can
read, write, update, and delete records from the same shared database, which requires the
system to implement a form of distributed lock manager (DLM).
Pinging:
In the worst-case scenario, if all nodes are reading and updating the same data, the RDBMS and its
DLM will have to spend a lot of resources synchronizing multiple buffer pools. This problem is
called pinging.
Data skew: uneven distribution of data.
Shared-disk architectures can reduce performance bottlenecks resulting from data skew.
3. Shared-Nothing Architecture
The data is partitioned across many disks, and the DBMS is partitioned across multiple
co-servers, each of which resides on an individual node of the parallel system and has
ownership of its own disk and thus its own database partition.
It offers near-linear scalability, but it requires the DBMS to provide:
Support for function shipping
Parallel join strategies
Support for data repartitioning
Query compilation
Support for database transactions
Support for the single system image of the database environment
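Two of these requirements work together: data repartitioning and parallel join strategies. Suppose two tables are partitioned on different keys. Before they can be joined, their rows are shipped so that equal join keys land on the same node. Each node then joins its own partition independently.

The sketch below simulates two nodes with Python lists. The tables, column positions, and node count are hypothetical.

```python
import zlib
from collections import defaultdict

NODES = 2  # hypothetical shared-nothing system with two nodes


def node_for(key, n=NODES):
    """Deterministic hash, so equal keys from both tables go to the same node."""
    return zlib.crc32(str(key).encode("utf-8")) % n


def repartition(rows, key_index, n=NODES):
    """Ship each row to the node that owns the hash of its join key."""
    nodes = [[] for _ in range(n)]
    for row in rows:
        nodes[node_for(row[key_index], n)].append(row)
    return nodes


def local_hash_join(left, right):
    """Join on column 0 of both inputs using only rows held by one node."""
    build = defaultdict(list)
    for row in left:
        build[row[0]].append(row)
    return [l_row + r_row[1:] for r_row in right for l_row in build.get(r_row[0], [])]


customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
orders = [(1, 250), (3, 90), (1, 40), (4, 10)]

customer_parts = repartition(customers, 0)
order_parts = repartition(orders, 0)
result = []
for node in range(NODES):  # each iteration could run on its own node in parallel
    result.extend(local_hash_join(customer_parts[node], order_parts[node]))
print(sorted(result))  # [(1, 'Ann', 40), (1, 'Ann', 250), (3, 'Cy', 90)]
```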
4. Combined Architecture
A combined architecture supports both forms of parallelism. Interserver parallelism of the
distributed memory architecture means that each query is parallelized across multiple servers,
while intraserver parallelism of the shared memory architecture means that a query is
parallelized within the server.
3. Parallel RDBMS Features
Some of the demands placed on DBMS vendors are:
Scope and techniques of parallel DBMS operations
Optimized implementation
Application transparency
The parallel environment
DBMS management tools
Price/performance
4. Alternative Technologies
In addition to parallel database technology, a number of vendors are working on other solutions
for improving performance in data warehousing environments. These include:
Advanced database indexing products
Specialized RDBMSs designed specifically for data warehousing
Multidimensional databases
5. Parallel DBMS Vendors
1. Oracle
2. Informix
3. IBM
4. Sybase
5. Microsoft
Data Extraction, Clean up and Transformation Tools
1. Tool Requirements
The tools that enable sourcing of the proper data contents and formats from operational
and external data stores into the data warehouse have to perform a number of important
tasks, which include:
Data transformation from one format to another on the basis of possible differences
between the source and target platforms
Data consolidation and integration, which may include combining several source
records into a single record to be loaded into the warehouse
Meta data synchronization and management, and calculation based on the application of
the business rules that force certain transformations
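Data consolidation, the second task above, can be as simple as merging records that share a business key. The sketch assumes two hypothetical source systems (a CRM and a billing system) keyed by `cust_id`. It uses a deliberately naive conflict rule: when two sources supply the same field, the later source wins. Real tools apply configurable business rules instead.

```python
# Hypothetical records for the same customers held in two operational systems.
crm_rows = [{"cust_id": 7, "name": "A. Rao", "phone": "555-0101"}]
billing_rows = [{"cust_id": 7, "balance": 120.0},
                {"cust_id": 8, "balance": 15.5}]


def consolidate(*sources, key="cust_id"):
    """Merge every source record sharing the same key into one warehouse record.
    Later sources overwrite earlier ones for the same field (naive rule)."""
    merged = {}
    for source in sources:
        for row in source:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


for record in consolidate(crm_rows, billing_rows):
    print(record)
# {'cust_id': 7, 'name': 'A. Rao', 'phone': '555-0101', 'balance': 120.0}
# {'cust_id': 8, 'balance': 15.5}
```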
2. Vendor Approaches
The tasks of capturing data from a source data system, cleaning and transforming it, and then
loading the results into a target data system can be carried out either by separate products or
by a single integrated solution. Integrated solutions fall into the following categories:
Code generators
Database data replication tools
Rule-driven dynamic transformation engines
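A code generator and a rule-driven dynamic transformation engine differ in where the transformation logic lives:
- A code generator emits a program from a specification.
- A rule-driven engine keeps the rules as data and interprets them at run time.

The sketch below shows the rule-driven idea. The rule format, field names, and operations are hypothetical and not taken from any vendor product.

```python
# Hypothetical transformation rules kept as data and interpreted at run time.
RULES = [
    {"target": "customer_id", "source": "CUSTNO", "op": "strip"},
    {"target": "amount", "source": "AMT_CENTS", "op": "cents_to_units"},
    {"target": "country", "source": "CTRY", "op": "upper", "default": "UNKNOWN"},
]

OPERATIONS = {
    "strip": lambda v: v.strip(),
    "upper": lambda v: v.upper(),
    "cents_to_units": lambda v: int(v) / 100,
}


def apply_rules(record, rules=RULES):
    """Build one target record by interpreting each rule against the source record."""
    out = {}
    for rule in rules:
        value = record.get(rule["source"])
        if value is None:
            out[rule["target"]] = rule.get("default")  # missing source field
        else:
            out[rule["target"]] = OPERATIONS[rule["op"]](value)
    return out


print(apply_rules({"CUSTNO": " C17 ", "AMT_CENTS": "1999", "CTRY": "in"}))
# {'customer_id': 'C17', 'amount': 19.99, 'country': 'IN'}
```

Changing a transformation here means editing `RULES`, with no program to regenerate and redeploy. That trade-off is the usual argument for rule-driven engines over code generators.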
3. Access to Legacy Data
The middleware strategy is the foundation for tools such as Enterprise/Access from Apertus
Corporation. It organizes services into three layers:
The data layer provides data access and transaction services for management of
corporate data assets.
The process layer provides services to manage automation and support for current
business processes.
The user layer manages user interaction with process and/or data layer services. It
allows the user interfaces to change independently of the underlying business
processes.
Meta data
1. Meta data definition
Meta data is one of the most important aspects of data warehousing. It is data about the data stored in the
warehouse and about its users.
Meta data contains:
The location and description of the warehouse system and data components
Names, definitions, structure, and content of the data warehouse and end-user views
Identification of authoritative data sources
Integration and transformation rules used to populate the data warehouse; these include the
mapping method from operational databases into the warehouse, and the algorithms used to
convert the data
Integration and transformation rules used to deliver data to end-user analytical tools
2. Meta data interchange initiative
MetaData Coalition Introduction
The MetaData Coalition was founded by a group of industry-leading vendors aimed at defining
a tactical set of standard specifications for the access and interchange of meta data between different
software tools. What follows is an overview of Version 1.0 of the MetaData Interchange Specification
(MDIS) initiative taken by the MetaData Coalition.
Goals of the MetaData Interchange Specification Initiative
Situation Analysis
The volatility of our global economy and an increasingly competitive business climate are
driving companies to leverage their existing resources in new, creative, and more effective ways.
Enterprise data, once viewed as merely fodder for the operational systems that ran the day-to-day
mechanics of business, is now being recognized not only as one of these valuable resources but as a
strategic business asset.
However, as the rate of change continues to accelerate, in response to both business pressures
and technological advancement, managing this strategic asset and providing timely, accurate, and
manageable access to enterprise data becomes increasingly critical. This need to find faster, more
comprehensive and efficient ways to provide access to and manage enterprise data has given rise to a
variety of new architectures and approaches, such as data warehouses, distributed client/server
computing, and integrated enterprise-wide applications.
In these new environments, meta data, or the information about the enterprise data, is emerging
as a critical element in effective data management. Vendors as well as users have been quick to
appreciate the value of meta data, but the rapid proliferation of data manipulation and management tools
has resulted in almost as many different "flavors" and treatments of meta data as there are tools.
The Challenge
To enable full-scale enterprise data management, different tools must be able to freely and easily
access, and in some cases manipulate and update, the meta data created by other tools and stored in a
variety of different storage facilities. The only viable mechanism to enable disparate tools from
independent vendors to exchange this meta data is to establish at least a minimum common denominator
of interchange specifications and guidelines with which the different vendors' tools can comply.
Establishing and adhering to a core set of industry meta data interchange specifications will
enable IS managers to select what they perceive as "best of breed" to build the tool infrastructure that
best fits their unique environment needs. In choosing the interchange-compliant tools, they can be
assured of the accurate and efficient exchange of meta data essential to meeting their users' business
information needs.
The MetaData Coalition was established to bring industry vendors and users together to
address a variety of difficult problems and issues with regard to exchanging, sharing, and managing
meta data. This is intended as a coalition of interested parties with a common focus and shared goals,
not a traditional standards body or regulatory group in any way.
Terminology and Basic Assumptions
The MetaData Interchange Specification (MDIS) draws a distinction between:
The Application Metamodel: the tables, etc., used to "hold" the meta data for schemas, etc., for a
particular application; for example, the set of tables used to store meta data in Composer may differ
significantly from those used by the Bachman Data Analyst.
The MetaData Metamodel: the set of objects that the MetaData Interchange Specification can be
used to describe. These represent the information that is common to (i.e., represented by) one or more
classes of tools, such as data discovery tools, data extraction tools, replication tools, user query tools,
database servers, etc. The meta data metamodel should be:
Independent of any application metamodel
Character-based so as to be hardware/platform-independent
Fully qualified so that the definition of each object is uniquely identified
Basic Assumptions
The MetaData Coalition has made the following assumptions:
Because users' information needs are growing more complex, the IS organization would ideally
like the interchange specification to support (to the greatest extent possible) the bidirectional
interchange of meta data so that updates can be made in the most natural place. For example,
the user might initially specify the source-to-target mapping between a legacy database and an
RDBMS target in a CASE tool but, after using a data extraction tool to generate and execute
programs to actually move the data, discover that the mapping was somehow incorrect. The
most natural place to test out the "fix" to this problem is in the context of the data extraction
tool. Once the correction is verified, one updates the metamodel in the CASE tool, rather than
having to go to the CASE tool, change the mapping, and trigger the meta data interchange
between the CASE tool and the data extraction tool before being able to test the new mapping.
Vendors would like to support the MetaData Interchange Specification with a minimum amount
of additional development. In light of these assumptions, the meta data model must be
sufficiently extensible to allow a vendor to store the entire metamodel for any application. In
other words, MDIS should provide mechanisms for extending the meta data model so that
additional (and possibly encrypted) information can be passed. An example of when a vendor
might want encryption is in the case of a tool that generates parameters for invoking some
internal routine. Because these parameters might provide other vendors with information
regarding what is considered a proprietary part of their tool, the vendor may wish to encrypt
these parameters.
If one assumed that all updates to the model occurred in the context of a single tool, e.g., the CASE
tool in the example above, the MDIS would not benefit from "carrying along" any of the tool-specific
meta data. However, as the above example indicates, this assumption is not the "natural" meta data
interchange flow. Consequently, some type of mechanism for providing extensions to the type of
information exchanged by the interchange specification is necessary if one hopes to achieve
bidirectional interchange between vendor applications.
The MetaData Interchange Framework
For Version 1.0, the MetaData Council is recommending the ASCII-based batch approach so
that vendors can implement support for the specification with minimum overhead and the customer
benefits from the availability of meta data interchange as quickly as possible.
ASCII Batch Approach
An ASCII batch approach relies on an ASCII file format that contains the description of the
common meta data components and standardized access requirements that make up the interchange
specification meta data model. In this approach, the entire ASCII file containing the MDIS schema and
access parameters is reloaded whenever a tool accesses the meta data through the specification API.
This approach requires only the addition of a simple import/export function to the tools and
would not require updating the tool in the event of meta data model changes, because the most
up-to-date schema will always be available through the access framework. This minimizes the amount of
retrofitting required to enable tools to remain compliant with the MDIS, because the burden for update
stays primarily within the framework itself.
The MetaData Interchange Specification
There are two basic aspects of the proposed specification:
Those that pertain to the semantics and syntax used to represent the meta data to be exchanged.
These items are those that are typically found in a specifications document.
Those that pertain to some framework in which the specification will be used. This second set
of items consists of two file-based semaphores that are used by the specification's import and export
functions to help the user of the specification control consistency.
Components defining the semantics and syntax of the specification:
The Metamodel
The MetaData Interchange Specification Metamodel describes the entities and relationships that
are used to directly represent meta data in the MDIS. The goal in designing this metamodel is twofold:
To choose the set of entities and relationships that represents the objects that the majority of
tools require
To provide some mechanism for extensibility in the case that some tool requires the
representation of some other type of object
Section 5 of the specification describes the metamodel for Version 1.0 of the MetaData Interchange
Specification. In the specification, the entities that are directly represented by the specification are
referred to as objects in the "public view," while any other meta data stored in the interchange file is
referred to as "private meta data" (i.e., tool-specific meta data).
The Mechanism for Extending the Metamodel
The mechanism chosen to provide extensibility to the specification is analogous to the
"properties" object found in LISP environments: a character field of arbitrary length that consists of a
set of identifiers and a value, where the identifiers are used by the import function of the specification to
locate and identify the private meta data in question and the value is the actual meta data itself. Note
that because some tools may consider their private meta data proprietary, the actual value for this meta
data may be encrypted.
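The properties-style extension field can be sketched as a string holding identifiers and a value. The following are assumptions for illustration; the MDIS text quoted above does not specify this layout:
- the delimiters (`.` between identifiers, `=` before the value)
- the tool names `TOOL_A` and `TOOL_B`
- the helper functions

The value is passed through untouched, so it could just as well be encrypted.

```python
# Hypothetical layout of one private meta data entry: identifiers joined by "."
# then "=" and the value. Identifiers are assumed to contain neither "." nor "=".
def encode_property(identifiers, value):
    return ".".join(identifiers) + "=" + value


def decode_property(text):
    ids, _, value = text.partition("=")  # split at the first "=" only
    return ids.split("."), value


def find_private(entries, prefix):
    """What an import function might do: keep only the entries whose identifiers
    start with `prefix` (e.g. its own tool name). Values are returned unchanged,
    so they may be opaque or encrypted."""
    found = {}
    for text in entries:
        ids, value = decode_property(text)
        if ids[: len(prefix)] == prefix:
            found[".".join(ids)] = value
    return found


entries = [
    encode_property(["TOOL_A", "CUSTOMER_MAP", "buffer_size"], "4096"),
    encode_property(["TOOL_B", "CUSTOMER_MAP", "color"], "blue"),
]
print(find_private(entries, ["TOOL_A"]))  # {'TOOL_A.CUSTOMER_MAP.buffer_size': '4096'}
```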
The MDIS Access Framework
Version 1.0 of the MDIS includes information that will support a bidirectional flow of meta
data while maintaining meta data consistency.
Three types of information are required:
Versioning information in the header of the file containing the meta data
A Tool Profile, which describes what types of data elements a tool directly represents and/or
updates
A Configuration Profile, which describes the "legal flow of meta data." For example, although
source-to-target mapping may be specified in the context of some analysis tool, once that meta
data has been exported to ETI*EXTRACT and the mapping is changed because of errors found
in expected data, one may want to require that all future changes to mapping originate in
ETI*EXTRACT. If the configuration profile is set properly, the import function for
ETI*EXTRACT would raise an error if asked to import a conversion specification from the analysis
tool with a version number greater than the version number of the one originally imported from
the mapping tool.
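In the ETI*EXTRACT example, the configuration-profile rule amounts to a version check performed by the import function. The sketch below is hypothetical. The profile structure, the tool names, and `ConfigurationError` are invented for illustration and are not part of the MDIS.

```python
class ConfigurationError(Exception):
    """Raised when an import would violate the configuration profile."""


# Hypothetical configuration profile: after the hand-off, changes to the
# source-to-target mapping must originate in the extraction tool.
CONFIG_PROFILE = {"source_to_target_mapping": "extract_tool"}


def check_import(object_kind, from_tool, incoming_version, originally_imported_version):
    """Reject a newer version of an object that is now owned by another tool."""
    owner = CONFIG_PROFILE.get(object_kind)
    if owner is not None and from_tool != owner and incoming_version > originally_imported_version:
        raise ConfigurationError(
            f"changes to {object_kind} must originate in {owner}, not {from_tool}"
        )
    return True


# The extraction tool originally imported version 3 of the mapping.
print(check_import("source_to_target_mapping", "analysis_tool", 3, 3))  # True
try:
    check_import("source_to_target_mapping", "analysis_tool", 4, 3)
except ConfigurationError as err:
    print("rejected:", err)
```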
The components of the meta data interchange standard framework are:
The standard meta data model
The standard access framework
Tool profile
The user configuration
3. Meta data repository
The data warehouse architecture framework represents a higher level of abstraction than the
meta data interchange standard framework. By design, the warehouse should prevent any direct
access to the warehouse data that does not use meta data definitions to gain the access.
The framework provides the following benefits:
It provides a comprehensive suite of tools for enterprise-wide meta data management.
It reduces and eliminates information redundancy, inconsistency, and underutilization.
It simplifies management and improves organization, control, and accounting of
information assets.
It increases identification, understanding, coordination, and utilization of enterprise-wide
information assets.
It provides effective data administration tools to better manage corporate information
assets with a full-function data dictionary.
It increases flexibility, control, and reliability of the application development process
and accelerates internal application development.
It leverages investment in legacy systems with the ability to inventory and utilize existing
applications.
It provides a universal relational model for heterogeneous RDBMSs to interact and
share information.
It enforces CASE development standards and eliminates redundancy with the ability
to share and reuse meta data.
4. Meta data Management
A frequently occurring problem in data warehousing is the inability to communicate to the end
user what information resides in the data warehouse and how it can be accessed.
The key to providing users and applications with a roadmap to the information stored in the
warehouse is the meta data.
It defines all data elements and their attributes, data sources and timing, and the rules that govern
data use and data transformation. Meta data needs to be collected as the warehouse is designed and
built, and its management must enforce integrity and control redundancy.
SUMMARY
This unit covers the basic components of data warehousing, including the architecture of
data warehousing, its components, and building a data warehouse.
Data marts are presented as an inexpensive alternative to a data warehouse that takes
significantly less time and money to build. A data mart is a subsidiary of a data warehouse of
integrated data and is created for a dedicated set of users.
The information delivery component is used to enable the process of subscribing for data
warehouse information and having it delivered to one or more destinations of choice according
to some user-specified scheduling algorithm. The information delivery system distributes
warehouse-stored data and other information objects to other data warehouses and end-user products.
KEY TERMS
Database components - Parallelism - Metadata - Shared-memory architecture - Shared-disk
architecture - Shared-nothing architecture - Data clean up and transformation tools
