Data Modeling: Agnivesh Kumar
Data Modeling: Agnivesh Kumar
Agnivesh Kumar
Define the data model
A data model documents and organizes data, how it is stored and accessed, and the
relationships among different types of data. The model may be abstract or concrete.
Identify the different data components consider raw and processed data, as well as associated
metadata !these are called entities"
Identify the relationships between the different data components !these are called
associations"
Identify anticipated uses of the data !these are called re#uirements", with recognition that data
may be most valuable in the future for unanticipated uses
Identify the strengths and constraints of the technology !hardware and software" that you plan
to use during your pro$ect !this is called a technology assessment phase"
%uild a draft model of the entities and their relations, attempting to &eep the model
independent from any specific uses or technology constraints.
Data Modeling Types
A conceptual data model identifies the highestlevel relationships between the different
entities. ,eatures of conceptual data model include-
.o attribute is specified.
Development of a common consistent view and understanding of data elements and their
relationships across the enterprise is referred to as enterprise data modeling.
This type of data modeling provides access to information scattered throughout an enterprise
under the control of different divisions or departments with different databases and data
models.
(nterprise data modeling is sometimes called as global business model and the entire
information about the enterprise would be captured in the forms of entities.
/hen a enterprise logical data model is transformed to a physical data model, 01*(+T2*(0
and 01%T*(0 may not be as is. I.e the logical and physical structure of super types and
subtypes may be entirely different.!Means names of tables and columns changes and tables
can brea& for understand the model.
)ogical Data Modeling
A logical data model describes the data in as much detail as possible, without regard to how
they will be physical implemented in the database. ,eatures of a logical data model include-
*hysical considerations may cause the physical data model to be #uite different from the
logical data model.
*hysical data model will be different for different +D%M0. ,or e3ample, data type for a
column may be different between My04), 04) 0erver,5racle,*ostgres etc.
+elational Data Modeling
+elational Data Model is a data model that views the real world as entities and relationships.
The goal of relational data model is to normalize data and present it in a good normal form.
,ollowing are some of #uestions that arise during development of relational data model,
/hat will be the future scope of the data model6
7ow to normalize data 6
7ow to group attribute and entities6
7ow to connect one entity to other6
7ow to validate data6
7ow to present report6
Dimensional Data Modeling
DM is a logical design techni#ue that see&s to present the data in a standard, intuitive
framewor& that allows for highperformance access. It is inherently dimensional, and it
adheres to a discipline that uses the relational model with some important restrictions. (very
dimensional model is composed of one table with a multipart &ey, called the fact table, and a
set of smaller tables called dimension tables. (ach dimension table has a singlepart primary
&ey that corresponds e3actly to one of the components of the multipart &ey in the fact table.
This characteristic 8starli&e9 structure is often called a star $oin.
A fact table, because it has a multipart primary &ey made up of two or more foreign &eys,
always e3presses a manytomany relationship. The most useful fact tables also contain one
or more numerical measures, or 8facts,9 that occur for the combination of &eys that define
each record.
Dimension tables, by contrast, most often contain descriptive te3tual information. Dimension
attributes are used as the source of most of the interesting constraints in data warehouse
#ueries, and they are virtually always the source of the row headers in the 04) answer set.
*hysical vs )ogical
+elational :s Dimensional
Data /arehouse
0ub$ect5riented- A data warehouse can be used to analyze a particular sub$ect area. ,or
e3ample, <sales< can be a particular sub$ect.
Integrated- A data warehouse integrates data from multiple data sources. ,or e3ample, source
A and source % may have different ways of identifying a product, but in a data warehouse,
there will be only a single way of identifying a product.
Time:ariant- 7istorical data is &ept in a data warehouse. ,or e3ample, one can retrieve data
from = months, > months, ?@ months, or even older data from a data warehouse. This
contrasts with a transactions system, where often only the most recent data is &ept. ,or
e3ample, a transaction system may hold the most recent address of a customer, where a data
warehouse can hold all addresses associated with a customer.
.onvolatile- 5nce data is in the data warehouse, it will not change. 0o, historical data in a
data warehouse should never be altered.
A data warehouse is a copy of transaction data specifically structured for #uery and analysis.
Data /arehouse Architecture
All data warehouse systems have the following layers-
0taging Area
(T) )ayer
Metadata )ayer
This represents the different data sources that feed data into the data warehouse. The data
source can be of any format plain te3t file, relational database, other types of database,
(3cel file, etc., can all act as a data source.
5perations such as sales data, 7+ data, product data, inventory data, mar&eting data,
systems data.
Data gets pulled from the data source into the data warehouse system. There is li&ely some
minimal data cleansing, but there is unli&ely any ma$or data transformation.
0taging Area
This is where data sits prior to being scrubbed and transformed into a data warehouse A data
mart. 7aving one common area ma&es it easier for subse#uent data processing A integration.
(T) )ayer
This is where data gains its <intelligence<, as logic is applied to transform the data from a
transactional nature to an analytical nature. This layer is also where data cleansing happens.
The (T) design phase is often the most timeconsuming phase in a data warehousing pro$ect,
and an (T) tool is often used in this layer.
Data 0torage )ayer
This is where the transformed and cleansed data sit. %ased on scope and functionality, = types
of entities can be found here- data warehouse, data mart, and operational data store !5D0". In
any given system, you may have $ust one of the three, two of the three, or all three types.
Data )ogic )ayer
This is where business rules are stored. %usiness rules stored here do not affect the underlying
data transformation rules, but do affect what the report loo&s li&e.
Data *resentation )ayer
This refers to the information that reaches the users. This can be in a form of a tabular A
graphical report in a browser, an emailed report that gets automatically generated and sent
everyday, or an alert that warns users of e3ceptions, among others. 1sually an 5)A* tool
andAor a reporting tool is used in this layer.
Metadata )ayer
This is where information about the data stored in the data warehouse system is stored. A
logical data model would be an e3ample of something that;s in the metadata layer. A
metadata tool is often used to manage metadata.
0ystem 5perations )ayer
This layer includes information on how the data warehouse system operates, such as (T) $ob
status, system performance, and user access history.
Database :s Data /arehouse
Database-
1sed for 5nline Transactional *rocessing !5)T*". This records the data from the user for history.
The tables and $oins are comple3 since they are normalized. This is done to reduce redundant data and to
save storage space.
1sed for 5nline Analytical *rocessing !5)A*". This reads the historical data for the 1sers for business
decisions.
The Tables and $oins are simple since they are denormalized. This is done to reduce the response time
for analytical #ueries.
Data B Modeling techni#ues are used for the Data /arehouse design.