Data Management: Data, Databases, and Warehousing: Maestría en Tecnologías de Información

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 54

Maestría en Tecnologías de

Información

Chapter Data Management:


Data, Databases, and
3 Warehousing
Catedratico:
M.C. Anahi Ramírez Hernández

Alumnos:
Set Joatam Adonai Hdez. O
David Luna Cruz

Pachuca Hgo, 31 Mayo de 2008


1
3.1 Data Management: A Critical Succest Factor.
3.2 File Management.
3.3 Databases and Database Management
System.
3.4 Creating Databases.
3.5 Data Warehousing.
3.6 Marketing Databases in Action.
3.7 Web – based Data Management Systems.
3.8 Managerial Issues.

2
3.1. Data Management
Definition
The goal of data management is to provide the
infraestructure to transform raw data into corporate
information of the highest quality.

DATA LIFE CYCLE PROCESS


• New data collection from various sources.
• The data are stored in the database(s).
• The data are preprocessed to fit the format of a data
warehouse or data marts. Take a copy of the needed
data for analysis.
• In the analysis look for patterns and data
interpretations using data analysis tools and intelligent
systems.
Data Life Cycle Process
(continuation)
• Generating of information and knowledge.
• The data are presented to users using
different visualization tools.
• The created knowledge is stored and
used, together with decision support tools
to provide solutions to organizational
problems.
Data Sources
• Organizational Data.- Organization’s internal data
are: people, products, services and process.
• End User Data.- These are data created by
corporate employees. Can reside on the user’s PC
or be placed on departamental databases, etc.
• External Data.- Ranging from commercial
databases to sensors and satellites. Data are
available from CD-ROMs, memory chips, Internet
servers, films, sounds, pictures, etc. Most external
data are irrelevant and must be monitored and
captured to ensure that important data are not
overlooked.
Methods for Collecting
Raw Data
Raw data can be collected:
• Manually.- Surveys, observations y
contributions from experts.
• Electronically.- Using HW and SW for data
storage, comunication, transmission and
presentation. Ej. Clickstream data.
Data Quality and Integrity
• Determines the data’s uselfulness as well as
the quality of the decisions based on the
data.
• Data quality is the cornerstone of effective
business intelligence.
• The different categories of data quality are:
– Standardization.
– Matching
– Verification
– Enhancement.
Document Management
• Is the automated control of electronic
documents, page images, spreadsheets,
voice word processing documents, etc. from
initial creation to final archiving.
• The major tools of document management
are workflow SW, authoring tools, scanners
and databases.
• Document management systems (DMSs)
• Lotus Development Corporation.
3.2. File Management.
Database Components
• A computer system organizes data in a hierarchy
that begins with bits, bytes, words, fields, records,
files and databases.
• A record is a physical description of an entity.
• An entity is a person, place, thing or event.
• Attributes describe the characteristics of an
entity.
• Primary Key.- Is the field that uniquely identifies
the record for be retrieved, updated and sorted.
• Foreign keys.- Provide relationships between two
tables.
Accessing Records from
Computer Files
• Sequential File Organization.- Data records
must be retrieved in the same physical
sequence in which they are stored.
– ISAM (Indexed sequential access method)
• Direct or Random File Organization.- Users
can retrieve records in any sequence,
without regarding to actual physical order on
the storage medium.
– Direct file access method.
Problems from the file
environment.
• Data Redundancy.- Data duplicity in several
files.
• Data Inconsistency.- Actual values across
various copies of the data are not
synchronized.
• Data Insolation.- Difficulty in accessing data
from different applications.
• Data Integrity.- Data values must often meet
integrity constraints. Also ensure that data is
complete.
3.3. Databases and Database
Management Systems
Databases (1/2)
• Is an organized logical grouping of related
files.
• A centralized DB has all the related files in
one physical location.
• Files are not accessible except via the centralized
host computer, where they can be protected
more easily from unauthorized access or
modification. Also recovery from disasters can be
more easily.
• They are vulnerable to a single point of failure.
Databases (2/2)
• A Distributed DB has complete copies of a DB, or
portions of a DB in more than a location. There are
two types:
– Replicated.- Has complete copies of a entire DB in many
locations, primarily to alleviate the single-point-of-failure
problems. Also increase user access responsiveness.
– Partitionated.- Each location has a portion of the entire DB.
Provides the response speed of localizated files without
the need to replicate a changes in multiple locations.
• DBMS: The database management system is the
progam or group of programs that provides access to
a DB. Acts as an interface between application
programs and physical data files. Provides users with
tools to add, delete, mantain, display, print, search,
select, sort and update data.
Capabilites and Advantages
of a DBMS
• Persistence.
• Query hability.
• Concurrency
• Backup and Replication
• Rule enforcement
• Security
• Computation
• Change and access logging
• Automated optimization
• Ej. Microsoft Access, DB2 de IBM, Oracle 13
Views of the DB
• Physical View.- Deals with the actual
physical arrangement and location of data in
the direct access storage device(DASD’s).
• Logical View.- Represents data in a forma
that is meaningful to a user and to the
software programs that process that data.
• While there is one physical view of the data,
there can be an endless number of different
logical views.
DBMS Languages (1/2)
• A DBMS contains four major components:
– Data Model.-Defines the way data are
conceptually structured. (Hierarchical,
Network, relational, object-oriented,
hypermedia and multidimensional models).
DBMS Languages (2/2)
– Data Definition Language (DDL).- Specify the types of
information and structure of the DB. Is the link between the
logical and physical views.
• The schema is the logical description of the entire DB and the
listing of all the data items and the relationships among them.
The subschema, is the specific set of data from the DB that is
required by each application.
– Data Manipulation Language (DML).- Is used together with
a language to manipulate the data in the DB. Provides
users with the ability to retrieve, sort, display, and delete
the contents of a DB. The most popular is SQL (SELECT,
MODIFY, DELETE).
– Data Dictionary.- Stores definitions of data elements and
characteristics such as usage, physical representation,
ownership, authorization and security. A data element
represents a field. Dictionaries are a form of metadata, in
other words, is information about information.
DBMS Benefits
• Improved strategic use of corporate data.
• Reduced complexity of the organization’s information
systems environment.
• Reduced data redundancy and inconsistency.
• Enhanced data integrity.
• Application data independence.
• Improved security.
• Reduced application development and maintenance
costs.
• Improved flexibility of information systems.
• Increased access and availability of data and
information.
3.4. Creating Databases
• The data model represents the
conceptual view of the organization’s data.
(Level 1).
• Entity-relationship diagrams.-
Represent the logical view of how data are
organized. (Level 2).
• Data Tables: Represent the physical view
of the data. (Level 3).
Level 1: Conceptual (1/2)
BUILDING THE DATA MODEL.
• The first step is to develop a conceptual
design for the data.
• Using entity-relationship (E-R)
diagramming techniques.
• The conceptual data model identifies what
the bussiness needs to know about data
objects that are essential to its operation.
Level 1: Conceptual (2/2)
• Objects may represent: people, places,
things, events and intellectual property.
• Attributes detail the type of data that we
want to collect.
• Entities are associated with one another in
relationships. The number of entities
represents the degree of the relationship.
E.g. Binary relationships are relationships of
de degree two.
-Product SKU
-Sales quotes Entity -Product description
-Payrates
-Personnel -Product price
-Schedules -Product quantity
-Employee contact

Relationship

-Product preferences -Vendor name


-Customer contact -Vendor contact
-Shop locations -Vendor location
-Product sales Attributes -Payment terms
-shipping

26
Level 2: Logical
BUILDING THE ENTITY RELATIONSHIP
MODEL.
• A logical model defines all the rules for data
used in the bussiness operation.
• The diagrams force people to define and
document systematically the information that
they will use in the bussiness.
• The attributes at Level 1 become the data
entities at Level 2.
LEVEL 2: CUSTOMER
DIAGRAM
-Product SKU -Store number
-Product name -Store neighborhood
-Product type -Store sales
-Product characteristics -Store customers

-Customer ID -Product SKU


-Customer name -Product name
-Customer address -Product price
-Customer phone -Product cost

28
Level 3: Physical
BUILDING THE DATABASE TABLES.
• We convert the E-R diagrams to database
tables.
• The set of attributes for an entity becomes
the fields in the table.
• The relationships from the E-R diagrams
link the tables through primary and foreign
keys.
LEVEL 3: CUSTOMER DATA TABLES

Product Preferences Table Shop Locations Table


Field Name Type Length Field Name Type Length
Product SKU * Number 12 Store Number * Number 8
Product name Text 20 Store neighborhood Text 20
Product type Text 12 Store sales Currency 8
Product charact. Number 30 Store customers Number 5
Customer ID ** Number 8 Product SKU** Number 12

Customer Contact Table Product Sales Table

Field Name Type Length Field Name Type Length


Customer ID * Number 8 Product SKU * Number 12
Customer name Text 20 Product name Text 20
Customer address Text 30 Product price Currency 8
Customer phone Number 12 Product cost Currency 8
Customer ID ** Number 8

* Primary Key ** Foreign Key


30
3.5 Data Warehose

31
The need for data
warehousing
• Many large and medium – size companies
are using data warehousing.

• It make easier and faster to process,


analyze, and query data.

• Successful key is the effective an efficient


use of data and information.
32
The need for data
warehousing
• This is done not only via transaction processing,
but also through analytical processing.

• Analytical processing involves:


– Business intelligence.
– Data mining.
– Decision support systems (DSSs).
– Enterprise information systems(EISs).
– Web applications.
– Queryng.
– Other end – user activities.

33
The need for data
warehousing
• When data are in several sources is
dificult to conduct an analysis.

• A data warehouse overcomes these


limitations whith:
– Business representation.
– Web – based enviroment.
– Server – based repository (the data
warehouse).
34
The data warehouse
• Is a repository of data that are organized
to be readily acceptable for analytical
processing activities such as:

– Data mining.
– Decision support.
– Querying.
– And other applications.

35
The data warehouse
• Benefits of an enterprise data warehouse(EDW).
• Better business decisions can be made due to
better information.
– Marketing and sales.
– Customer and channel partnet.
– Pricing and contracts.
– Forecesting.
– Financial.
– Supply chain.
– Customer service improvements and order
enhancement.
– Information system and processes.
36
The data warehouse
• Process of buil and using a data warehouse:

Extraction, Data Mart


Transformation, Marketing
Load(ETL)

Supply Chain
Operational Data
Data Store Multidimensional
Systems/Data warehouse
database

Data Mart
Management

37
The data warehouse
• Characteristics of a data warehouse:
– Organization.
– Consistency.
– Time variant.
– Nonvolatile.
– Relational.
– Client / server.
– Web – based.
– Integration.
– Real time.
38
Buiding a data warehouse
• Building and implementing a data
warehouse can present problems.
1. Top Management support the data
warehouse.
2. Users want access to a broand range of
data?
3. Sigle repository or data marts?
4. Users want access and analysis tools?
5. Users persive the information technology.
39
Building a data warehouse
6. Users understand how the data warehouse
solves business problems?
7. Users understand data warehouse
technologies?

40
Building a data warehouse
• Architecture and Tools
– Two – tier
– Three – tier
• Data from data warehouse are processed twice
and deposited in an additional multidimensional
database.

41
Building a data warehouse
• Suitability: data warehousing is most
appropiate for organizations whith:
– Large amounts of data.
– Data is stored in different system.
– Information management is in use.
– Large or diverse customer base.
– Same data differently in different systems.
– Data stored, difficult to decipher.
– Extensive end – user computing is performed.

42
Data Marts
• A data mart is a small warehouse
designed for a strategic business unit
(SBU) or a department.
– Low cost.
– Shorter time for implementation.
– Central control.

43
Data Marts
• There are two major types of data marts:

– Replicated (dependent) data marts.


• In such a case the data mart is addition to the data
warehouse.

– Standalone data marts.


• A company can have one or more independent
data marts without having a data warehouse.

44
Multidimensional Databases
• Are specialized data stores that organize
facts by dimensions, such as geographical
region, product line, salesperson, or time.

• The data in multidimensional databases


are usually preprocessed and stored in
what is called a (multidimensional) data
cube.
45
3.6 Marketing Databases in
Action

46
The marketing transaction
database
• Today requires a new kind of database.
• Personalizing marketing messages in real
time.
• Create new and / or pesonalized products
and services.
• Such a database is called marketing
transaction database(MTD).

47
Implementation examples
• Data mining, warehousing and marketing:

– Bank of America
• Analyzing customers’s banking activities.

– Chicago Tribune.
• Analyze data from Web.

48
3.7 Web – Based Data
Management Systems

49
Web – Based Data
Management Systems

• Are often performed with Web tools.

• Web technologies.

• E - business.

50
Intelligent data warehouse
web based systems

• An intelligent agent can improve the


operation of a data warehouse.

• This application supplements the


monitoring and scanning of external
strategic data.

51
Clickstream data warehouse

• Clickstream data (web logs) occur inside


the Web enviroment, when customers visit
a Web site.

52
3.8 Managerial Issues

53
Managerial Issues
• Cost – benefit issues and justification
• Where to store data physically.
• Legal issues.
• Internal or external?
• Disaster recovery.
• Data security and ethics.
• Paying for use of data
• Privacy
• The legacy data problem.
• Data delivery.
54

You might also like