Data Warehousing: Engr. Madeha Mushtaq Department of Computer Science Iqra National University
LECTURE 3
• Different types of data need to be integrated into the data warehouse.
They can be divided into two categories:
• Structured Data
• Unstructured Data
STRUCTURED VS UNSTRUCTURED DATA
• Structured data:
• Structured data is comprised of clearly defined data types whose pattern makes
them easily searchable.
• Structured data usually resides in relational databases (RDBMS).
• Examples of structured data include fields that store length-delineated data
such as phone numbers, Social Security numbers, or ZIP codes.
STRUCTURED VS UNSTRUCTURED DATA
• Unstructured Data:
• Unstructured data is essentially everything else.
• Unstructured data is not easily searchable.
• Unstructured data has internal structure but is not structured via pre-defined data
models or schema. It may be textual or non-textual, and human- or machine-
generated.
• It may also be stored in a non-relational (NoSQL) database.
• Examples of Unstructured Data: Email messages, videos, photos, audio files.
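The contrast between the two categories can be sketched in Python. This is only an illustration: the field names, patterns, and sample values are assumptions made for the example, not part of any particular warehouse schema.

```python
import re

# Hypothetical patterns for length-delineated structured fields.
FIELD_PATTERNS = {
    "phone": re.compile(r"\d{3}-\d{3}-\d{4}"),
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "zip": re.compile(r"\d{5}"),
}

def is_structured(record):
    """Return True if every expected field is present and fully matches its pattern."""
    return all(
        name in record and pattern.fullmatch(record[name])
        for name, pattern in FIELD_PATTERNS.items()
    )

def find_by_zip(records, zip_code):
    """Structured search: an exact match on a well-defined field."""
    return [r for r in records if r["zip"] == zip_code]

def find_in_text(documents, keyword):
    """Unstructured search: fall back to scanning free text for a keyword."""
    return [d for d in documents if keyword.lower() in d.lower()]

# Illustrative sample data (made up for this sketch).
records = [
    {"phone": "555-123-4567", "ssn": "123-45-6789", "zip": "25000"},
    {"phone": "555-987-6543", "ssn": "987-65-4321", "zip": "44000"},
]
emails = [
    "Hi, my order arrived late. Please call me back.",
    "Thanks for the quick delivery!",
]
```

The structured lookup relies on a field with a known pattern, while the email search has nothing better than a keyword scan, which is why unstructured data is harder to search.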
SEARCHING UNSTRUCTURED DATA
• Once unstructured data has been added to the warehouse, the next big
challenge is searching it.
• Vendors are now providing new search engines to find the information the
user needs from unstructured data.
• Query by image content is an example of a search mechanism for images. Such
a tool lets you pre-index images based on shapes, colors, and textures.
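The idea of pre-indexing images can be sketched as follows. This is a simplified illustration that uses color only (not shapes or textures) and is not the method of any specific product; the function names, the bucket count, and the tiny synthetic images are all assumptions.

```python
from collections import Counter

def color_signature(pixels, levels=4):
    """Quantize each RGB pixel into a coarse bucket and return normalized bucket frequencies."""
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bucket: n / total for bucket, n in counts.items()}

def similarity(sig_a, sig_b):
    """Histogram intersection: 1.0 for identical color distributions, 0.0 for disjoint ones."""
    return sum(min(share, sig_b.get(bucket, 0.0)) for bucket, share in sig_a.items())

def build_index(images):
    """Pre-index: compute the signature once per image, before any query arrives."""
    return {name: color_signature(pixels) for name, pixels in images.items()}

def query_by_content(index, example_pixels, top_k=1):
    """Return the names of the indexed images most similar to the example image."""
    target = color_signature(example_pixels)
    ranked = sorted(index, key=lambda name: similarity(index[name], target), reverse=True)
    return ranked[:top_k]

# Tiny synthetic "images": flat lists of (R, G, B) pixels (made up for this sketch).
images = {
    "sunset": [(250, 120, 30)] * 6 + [(40, 20, 90)] * 2,
    "forest": [(20, 160, 40)] * 7 + [(90, 60, 30)] * 1,
    "ocean": [(10, 80, 200)] * 8,
}
index = build_index(images)
```

A query then supplies an example image instead of keywords, for instance `query_by_content(index, orange_pixels)`, and only the precomputed signatures are compared.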
DATA VISUALIZATION
• A data warehouse (DWH) is considered outdated if it displays results only in
the form of output lists or spreadsheets.
• We need to display results in the form of graphics and charts as well.
• Visualization of data in the result sets boosts the process of analysis for the
user, especially when the user is looking for trends over time.
• Data visualization helps the user to interpret query results quickly and easily.
MAJOR VISUALIZATION TRENDS
• Most data visualizations are in the form of some standard chart type. The numerical
results are converted into a pie chart, a scatter plot, or another chart type. Now the
list of chart types supported by data visualization software has grown much longer.
• Visualizations are no longer static. Dynamic chart types are themselves user
interfaces. Users can now review a result chart, manipulate it, and then see newer
views online.
• Newer visualization software can visualize thousands of result points and complex
data structures.
ADVANCED VISUALIZATION TECHNIQUES
• Advanced Interaction.
• The user simply double-clicks a part of the visualization and then drags
and drops representations of data entities.
• Visual query is the most advanced user interaction feature.
• For example, the user may see the outlying data points in a scatter plot,
then select a few of them with the mouse and ask for a brand new
visualization of just those selected points.
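The scatter plot example above can be sketched without any charting library. The mouse selection is modeled as a rectangle, and the "new visualization" is reduced to the axis limits a fresh chart would use; the function names and the sample points are assumptions for illustration.

```python
def brush_select(points, x_range, y_range):
    """Return the points inside the rectangle the user dragged out with the mouse."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    return [(x, y) for x, y in points if x_min <= x <= x_max and y_min <= y <= y_max]

def view_bounds(points):
    """Axis limits for a fresh chart zoomed to fit the given points (needs at least one point)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), max(xs)), (min(ys), max(ys))

# Hypothetical query result: (units sold, revenue) per store.
result_points = [(10, 100), (12, 118), (11, 109), (13, 131), (40, 90), (42, 95)]

# The user notices two outliers on the right and drags a box around them.
selected = brush_select(result_points, x_range=(35, 45), y_range=(80, 100))
new_view = view_bounds(selected)
```

In a real tool the selected points would be passed back to the chart renderer, so the chart itself acts as the query interface.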
PARALLEL PROCESSING
• A task is divided into smaller units and these smaller units are executed
concurrently.
• We need parallel processing to speed up query processing, data loading, and
index creation.
• Hardware configurations and software techniques go hand in hand to
accomplish parallel processing.
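Dividing a task into smaller units and running them concurrently can be sketched as below. The example uses Python's standard `concurrent.futures` thread pool for simplicity; in CPython, threads do not give true CPU parallelism, so a real warehouse would rely on a parallel database engine or multiple processes. The sales rows and function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n_parts):
    """Split the rows into n_parts roughly equal, contiguous chunks."""
    size, extra = divmod(len(rows), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks

def partial_total(chunk):
    """The smaller unit of work: total sales amount for one chunk."""
    return sum(amount for _, amount in chunk)

def parallel_total(rows, n_workers=4):
    """Run the partial totals concurrently, then combine them into the final answer."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(partial_total, partition(rows, n_workers)))

# Hypothetical fact rows: (sale_id, amount).
sales = [(i, i * 10) for i in range(1, 101)]
```

The same split, process, and combine pattern applies to data loading and index creation: each worker handles one partition, and the results are merged at the end.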
PARALLEL PROCESSING HARDWARE OPTIONS
PARALLEL PROCESSING SOFTWARE IMPLEMENTATION
• Hardware alone is worthless if the operating system and the database
software cannot make use of its parallel features.
• The software must be able to allocate units of a larger task to the
hardware components appropriately.
DATA FUSION
• Data fusion is a technology dealing with the merging of data from disparate
sources.
• The principles and techniques of data fusion technology have a direct application
in data warehousing.
• In present-day warehouses, we tend to collect data in astronomical proportions.
• The more information stored, the more difficult it is to find the right information
at the right time. Data fusion technology is expected to address this problem also.
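Merging data from disparate sources can be sketched in a very reduced form. Real data fusion covers far more than this (for example, resolving conflicting values); the sketch below only matches records on a normalized email address and fills gaps. The source systems, field names, and records are assumptions.

```python
def normalize_email(email):
    """Canonical matching key: trimmed and lower-cased email address."""
    return email.strip().lower()

def fuse(*sources):
    """Merge records that share a key; earlier sources win, later ones fill gaps."""
    fused = {}
    for source in sources:
        for record in source:
            key = normalize_email(record["email"])
            merged = fused.setdefault(key, {"email": key})
            for field, value in record.items():
                # Skip empty values and fields already supplied by an earlier source.
                if field != "email" and value not in (None, "") and field not in merged:
                    merged[field] = value
    return fused

# Hypothetical extracts from two operational systems.
billing = [{"email": "Ali@Example.com ", "name": "Ali Khan", "city": ""}]
support = [
    {"email": "ali@example.com", "name": "A. Khan", "city": "Peshawar"},
    {"email": "sara@example.com", "name": "Sara Ahmed", "city": "Lahore"},
]
customers = fuse(billing, support)
```

Here the billing system supplies the customer's name and the support system fills in the missing city, giving one combined record per customer.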
DATA WAREHOUSING AND ERP
• In the last few years, many businesses have adopted ERP (enterprise
resource planning) application packages.
• A remarkable feature of an ERP package is that it supports practically
every phase of the day-to-day business of an enterprise, from inventory
control to customer billing, from human resources to production
management.
DATA WAREHOUSING AND CRM
• Our data warehouse must hold details of every transaction at every touchpoint
with each customer.
• This means every unit of every sale of every product to every customer must be
gathered in the data warehouse repository.
• Making the data warehouse ready for CRM (customer relationship management)
will increase the data volumes tremendously.
• For customer-related data, cleansing and transformation functions are more
involved and complex.
WEB-ENABLED DATA WAREHOUSE
• Web-enabling the data warehouse means using the Web for information
delivery and integrating the clickstream data from the corporate Web site for
analysis.
• A Web-enabled data warehouse retains the essential functional features of a
traditional data warehouse.
• In addition to the data warehouse repository holding the usual types of
information, the Web house repository contains clickstream data.
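Preparing clickstream data for analysis can be sketched as follows. The log line format (`timestamp session_id url`) is made up for this example; real Web server logs differ and need a matching parser. The function names are assumptions as well.

```python
from collections import Counter, defaultdict

def parse_click(line):
    """Parse one 'timestamp session_id url' line (a made-up format) into a dict."""
    timestamp, session_id, url = line.split()
    return {"timestamp": timestamp, "session": session_id, "url": url}

def page_views(clicks):
    """Count how many times each page was requested."""
    return Counter(click["url"] for click in clicks)

def session_paths(clicks):
    """Group URLs by session, in the order the log lists them."""
    paths = defaultdict(list)
    for click in clicks:
        paths[click["session"]].append(click["url"])
    return dict(paths)

# Hypothetical log extract from the corporate Web site.
log_lines = [
    "2024-01-05T10:00:00 s1 /home",
    "2024-01-05T10:00:07 s1 /products",
    "2024-01-05T10:00:09 s2 /home",
    "2024-01-05T10:01:15 s1 /checkout",
]
clicks = [parse_click(line) for line in log_lines]
```

Once parsed, the clicks can be loaded into the repository alongside the usual data, and measures such as page views or the path each visitor followed become available for analysis.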