100% found this document useful (1 vote)
57 views

Data Modeling: Agnivesh Kumar

Agnivesh Kumar discusses data modeling which involves defining a data model to organize data and relationships. The document outlines different types of data modeling including conceptual, enterprise, logical, physical, relational, and dimensional modeling. It also discusses data warehousing, describing its key characteristics and common architecture layers such as the data source, extraction, staging, transformation, storage, presentation and metadata layers. Finally, key differences between databases and data warehouses are highlighted such as their usage for transactions versus analysis and optimization for writes versus reads.

Uploaded by

Agnivesh Kumar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as ODP, PDF, TXT or read online on Scribd
100% found this document useful (1 vote)
57 views

Data Modeling: Agnivesh Kumar

Agnivesh Kumar discusses data modeling which involves defining a data model to organize data and relationships. The document outlines different types of data modeling including conceptual, enterprise, logical, physical, relational, and dimensional modeling. It also discusses data warehousing, describing its key characteristics and common architecture layers such as the data source, extraction, staging, transformation, storage, presentation and metadata layers. Finally, key differences between databases and data warehouses are highlighted such as their usage for transactions versus analysis and optimization for writes versus reads.

Uploaded by

Agnivesh Kumar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as ODP, PDF, TXT or read online on Scribd
You are on page 1/ 21

Data Modeling

Agnivesh Kumar

Define the data model
A data model documents and organizes data, how it is stored and accessed, and the
relationships among different types of data. The model may be abstract or concrete.

Identify the different data components consider raw and processed data, as well as associated
metadata !these are called entities"

Identify the relationships between the different data components !these are called
associations"

Identify anticipated uses of the data !these are called re#uirements", with recognition that data
may be most valuable in the future for unanticipated uses

Identify the strengths and constraints of the technology !hardware and software" that you plan
to use during your pro$ect !this is called a technology assessment phase"

%uild a draft model of the entities and their relations, attempting to &eep the model
independent from any specific uses or technology constraints.

Data Modeling Types

'onceptual Data Modeling

(nterprise Data Modeling

)ogical Data Modeling

*hysical Data Modeling

+elational Data Modeling

Dimensional Data Modeling



'onceptual Data Modeling

A conceptual data model identifies the highestlevel relationships between the different
entities. ,eatures of conceptual data model include-

Includes the important entities and the relationships among them.

.o attribute is specified.

.o primary &ey is specified.



(nterprise Data Model

Development of a common consistent view and understanding of data elements and their
relationships across the enterprise is referred to as enterprise data modeling.

This type of data modeling provides access to information scattered throughout an enterprise
under the control of different divisions or departments with different databases and data
models.

(nterprise data modeling is sometimes called as global business model and the entire
information about the enterprise would be captured in the forms of entities.

/hen a enterprise logical data model is transformed to a physical data model, 01*(+T2*(0
and 01%T*(0 may not be as is. I.e the logical and physical structure of super types and
subtypes may be entirely different.!Means names of tables and columns changes and tables
can brea& for understand the model.


)ogical Data Modeling
A logical data model describes the data in as much detail as possible, without regard to how
they will be physical implemented in the database. ,eatures of a logical data model include-

Includes all entities and relationships among them.

All attributes for each entity are specified.

*rimary &ey for each entity is specified.

,oreign &eys are specified.

.ormalization occurs at this level.



*hysical Data Modeling
*hysical data model represents how the model will be built in the database. A physical
database model shows all table structures, including column name, column data type, column
constraints, primary &ey, foreign &ey, and relationships between tables. ,eatures of a physical
data model include-

0pecification all tables and columns.

,oreign &eys are used to identify relationships between tables.

Denormalization may occur based on user re#uirements.

*hysical considerations may cause the physical data model to be #uite different from the
logical data model.

*hysical data model will be different for different +D%M0. ,or e3ample, data type for a
column may be different between My04), 04) 0erver,5racle,*ostgres etc.


+elational Data Modeling

+elational Data Model is a data model that views the real world as entities and relationships.

(ntities are concepts, real or abstract about which information is collected.

The goal of relational data model is to normalize data and present it in a good normal form.

,ollowing are some of #uestions that arise during development of relational data model,
/hat will be the future scope of the data model6
7ow to normalize data 6
7ow to group attribute and entities6
7ow to connect one entity to other6
7ow to validate data6
7ow to present report6

Dimensional Data Modeling

DM is a logical design techni#ue that see&s to present the data in a standard, intuitive
framewor& that allows for highperformance access. It is inherently dimensional, and it
adheres to a discipline that uses the relational model with some important restrictions. (very
dimensional model is composed of one table with a multipart &ey, called the fact table, and a
set of smaller tables called dimension tables. (ach dimension table has a singlepart primary
&ey that corresponds e3actly to one of the components of the multipart &ey in the fact table.
This characteristic 8starli&e9 structure is often called a star $oin.

A fact table, because it has a multipart primary &ey made up of two or more foreign &eys,
always e3presses a manytomany relationship. The most useful fact tables also contain one
or more numerical measures, or 8facts,9 that occur for the combination of &eys that define
each record.

Dimension tables, by contrast, most often contain descriptive te3tual information. Dimension
attributes are used as the source of most of the interesting constraints in data warehouse
#ueries, and they are virtually always the source of the row headers in the 04) answer set.


*hysical vs )ogical

+elational :s Dimensional

Data /arehouse

A data warehouse is a sub$ectoriented, integrated, timevariant and nonvolatile collection of


data in support of management;s decision ma&ing process.

0ub$ect5riented- A data warehouse can be used to analyze a particular sub$ect area. ,or
e3ample, <sales< can be a particular sub$ect.

Integrated- A data warehouse integrates data from multiple data sources. ,or e3ample, source
A and source % may have different ways of identifying a product, but in a data warehouse,
there will be only a single way of identifying a product.

Time:ariant- 7istorical data is &ept in a data warehouse. ,or e3ample, one can retrieve data
from = months, > months, ?@ months, or even older data from a data warehouse. This
contrasts with a transactions system, where often only the most recent data is &ept. ,or
e3ample, a transaction system may hold the most recent address of a customer, where a data
warehouse can hold all addresses associated with a customer.

.onvolatile- 5nce data is in the data warehouse, it will not change. 0o, historical data in a
data warehouse should never be altered.

A data warehouse is a copy of transaction data specifically structured for #uery and analysis.

Data /arehouse Architecture
All data warehouse systems have the following layers-

Data 0ource )ayer

Data (3traction )ayer

0taging Area

(T) )ayer

Data 0torage )ayer

Data )ogic )ayer

Data *resentation )ayer

Metadata )ayer

0ystem 5perations )ayer




Data 0ource )ayer

This represents the different data sources that feed data into the data warehouse. The data
source can be of any format plain te3t file, relational database, other types of database,
(3cel file, etc., can all act as a data source.

Many different types of data can be a data source-

5perations such as sales data, 7+ data, product data, inventory data, mar&eting data,
systems data.

/eb server logs with user browsing data.

Internal mar&et research data.

Thirdparty data, such as census data, demographics data, or survey data.



Data (3traction )ayer

Data gets pulled from the data source into the data warehouse system. There is li&ely some
minimal data cleansing, but there is unli&ely any ma$or data transformation.
0taging Area

This is where data sits prior to being scrubbed and transformed into a data warehouse A data
mart. 7aving one common area ma&es it easier for subse#uent data processing A integration.
(T) )ayer

This is where data gains its <intelligence<, as logic is applied to transform the data from a
transactional nature to an analytical nature. This layer is also where data cleansing happens.
The (T) design phase is often the most timeconsuming phase in a data warehousing pro$ect,
and an (T) tool is often used in this layer.
Data 0torage )ayer

This is where the transformed and cleansed data sit. %ased on scope and functionality, = types
of entities can be found here- data warehouse, data mart, and operational data store !5D0". In
any given system, you may have $ust one of the three, two of the three, or all three types.

Data )ogic )ayer

This is where business rules are stored. %usiness rules stored here do not affect the underlying
data transformation rules, but do affect what the report loo&s li&e.
Data *resentation )ayer

This refers to the information that reaches the users. This can be in a form of a tabular A
graphical report in a browser, an emailed report that gets automatically generated and sent
everyday, or an alert that warns users of e3ceptions, among others. 1sually an 5)A* tool
andAor a reporting tool is used in this layer.
Metadata )ayer

This is where information about the data stored in the data warehouse system is stored. A
logical data model would be an e3ample of something that;s in the metadata layer. A
metadata tool is often used to manage metadata.
0ystem 5perations )ayer

This layer includes information on how the data warehouse system operates, such as (T) $ob
status, system performance, and user access history.

Database :s Data /arehouse
Database-

1sed for 5nline Transactional *rocessing !5)T*". This records the data from the user for history.

The tables and $oins are comple3 since they are normalized. This is done to reduce redundant data and to
save storage space.

(ntity B +elational modeling techni#ues are used for database design.

5ptimized for write operation.

*erformance is low for analysis #ueries.


Data /arehouse-

1sed for 5nline Analytical *rocessing !5)A*". This reads the historical data for the 1sers for business
decisions.

The Tables and $oins are simple since they are denormalized. This is done to reduce the response time
for analytical #ueries.

Data B Modeling techni#ues are used for the Data /arehouse design.

5ptimized for read operations.

7igh performance for analytical #ueries.

Ceneral Data ,low B !(3- 5nline Insurance +egistration"

You might also like