Module 2
Spatial Data Models
Database Structures – Relational, Object Oriented – Entities – ER diagram – data models – conceptual, logical and physical models – spatial data models – Raster Data Structures – Raster Data Compression – Vector Data Structures – Raster vs Vector Models – TIN and GRID data models.
2. Hierarchical Database
3. Network Database
4. Relational Database (RDBMS)
5. Object-Oriented Database (OODBMS)
6. Object-Relational Database (ORDBMS)
GIS applications require the ability to store, retrieve, query, and analyze spatial data, and RDBMS helps in managing this efficiently through spatial extensions such as PostGIS (PostgreSQL), Oracle Spatial, and Microsoft SQL Server Spatial.
Features of RDBMS in GIS
BCV654 B Module-2
Example:
Cities Table (Attribute Data)
1 City A 500,000
2 City B 200,000
Feature | Advantage | Disadvantage
… Consistency | … using primary keys, foreign keys, and constraints. | … down data insertion and updates.
Data Organization | Stores data in structured tables, making it easy to manage and retrieve. | Fixed schema requires careful planning before designing the database.
Querying & Data Retrieval | Supports SQL for efficient querying, filtering, and manipulation of data. | Complex queries may require optimized indexing for performance improvement.
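To make the primary-key, constraint and SQL-querying points concrete, here is a minimal sketch using Python's built-in sqlite3 module. It uses plain SQLite rather than a spatial extension such as PostGIS, and the table and column names (cities, city_id, name, population) are hypothetical, loosely following the Cities table example.

```python
import sqlite3

# In-memory database; a production GIS would use PostGIS, Oracle Spatial, etc.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE cities (
           city_id    INTEGER PRIMARY KEY,          -- unique identifier
           name       TEXT NOT NULL UNIQUE,         -- constraint: no duplicate names
           population INTEGER CHECK (population >= 0)
       )"""
)
conn.executemany(
    "INSERT INTO cities (city_id, name, population) VALUES (?, ?, ?)",
    [(1, "City A", 500_000), (2, "City B", 200_000)],
)

# The database itself rejects a row that violates the primary-key constraint.
try:
    conn.execute("INSERT INTO cities VALUES (1, 'City C', 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Declarative SQL query: filtering and sorting without hand-written loops.
large_cities = conn.execute(
    "SELECT name FROM cities WHERE population > ? ORDER BY name", (300_000,)
).fetchall()
print(duplicate_rejected, large_cities)  # True [('City A',)]
```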
✔ Object-Oriented Data Storage – Stores spatial data as objects, encapsulating attributes and behaviors.
✔ Complex Data Representation – Supports hierarchical and multi-level relationships for geographic features.
✔ Inheritance & Polymorphism – Allows objects to inherit properties and behaviors from other objects.
✔ Spatial Data Support – Integrates spatial data types such as points, lines, polygons, and 3D models.
✔ Direct Object Manipulation – Objects are stored and retrieved without complex joins, improving performance.
✔ Better Mapping with Programming Languages – Works efficiently with OOP languages like Java, C++, and Python.
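The object-oriented ideas in this list (encapsulation, inheritance, polymorphism) can be sketched with ordinary Python classes. This is not an OODBMS; it only illustrates how a spatial feature might bundle its geometry with its behavior. The class names (Feature, Road, Park) and their methods are hypothetical.

```python
import math

class Feature:
    """Base spatial object: encapsulates an identifier and a list of (x, y) vertices."""
    def __init__(self, fid, coords):
        self.fid = fid
        self.coords = list(coords)

    def describe(self):
        # Polymorphism: each subclass supplies its own measure().
        return f"{type(self).__name__} {self.fid}: {self.measure():.1f}"

    def measure(self):
        raise NotImplementedError

class Road(Feature):
    """A line feature: inherits storage from Feature and measures its length."""
    def __init__(self, fid, coords, road_type):
        super().__init__(fid, coords)
        self.road_type = road_type

    def measure(self):
        return sum(math.dist(a, b) for a, b in zip(self.coords, self.coords[1:]))

class Park(Feature):
    """A polygon feature: measures its area with the shoelace formula."""
    def measure(self):
        pts = self.coords
        s = sum(x1 * y2 - x2 * y1
                for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))
        return abs(s) / 2

features = [Road("R1", [(0, 0), (3, 4), (6, 8)], "highway"),
            Park("P1", [(0, 0), (10, 0), (10, 5), (0, 5)])]
for f in features:
    print(f.describe())   # prints "Road R1: 10.0" then "Park P1: 50.0"
```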
Advantages
Better Integration with Object-Oriented Programming (OOP) | Easily integrates with languages like Java, C++, and Python for GIS applications.
Disadvantages
Less Mature than RDBMS | OODBMS is not as widely adopted in GIS as relational databases.
Complex Query Language | No standard query language like SQL; querying depends on OOP-based languages.
Limited Support in GIS | Fewer commercial GIS tools (e.g., ArcGIS, QGIS) fully …
example, a "Road" entity might have attributes like "Name," "Length," and "Type."
o Relationships: Associations between entities, represented by diamonds or lines. For example, a "City" entity "contains" a "Park" entity.
o Cardinality: Specifies the number of instances of one entity that can be related to another entity. Common types are:
▪ One-to-one (1:1)
▪ One-to-many (1:N)
▪ Many-to-many (M:N)
The above figure is an Entity-Relationship (ER) Diagram designed to represent the relationships between different entities in a database, likely for a GIS (Geographic Information System) application related to land parcels and buildings. Let's break it down:
Entities (Rectangles):
Attributes (Ovals):
• Land Parcel:
• Building:
• Occupant:
Relationships (Diamonds):
• Contains: This represents the relationship between "Land Parcel" and "Building". It
indicates that a land parcel contains a building.
• Has: This represents the relationship between "Building" and "Occupant". It indicates
that a building has an occupant.
Interpretation:
relationship).
4. A Building can have one or more Occupants (as indicated by the "Has" relationship).
5. Each Occupant has a unique ID, a name, and potentially other personal information.
• Cardinality: (Implicit) While not explicitly shown with notation like 1:1, 1:N, or M:N,
we can infer that:
Purpose:
This ER diagram serves as a blueprint for designing a database to store and manage information
about land parcels, buildings, and occupants. It helps to ensure data consistency, integrity, and
efficient retrieval of information.
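One way to turn this ER diagram into tables is sketched below with Python's sqlite3 module, assuming one-to-many cardinality for both "Contains" (one parcel, many buildings) and "Has" (one building, many occupants). The table and column names are hypothetical, since the diagram's attributes are not listed in full here; each relationship becomes a foreign key on the "many" side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE land_parcel (
    parcel_id INTEGER PRIMARY KEY,
    area_sq_m REAL
);
CREATE TABLE building (
    building_id INTEGER PRIMARY KEY,
    parcel_id   INTEGER NOT NULL REFERENCES land_parcel(parcel_id),  -- "Contains" (1:N)
    use_type    TEXT
);
CREATE TABLE occupant (
    occupant_id INTEGER PRIMARY KEY,
    building_id INTEGER NOT NULL REFERENCES building(building_id),   -- "Has" (1:N)
    name        TEXT
);
""")
conn.execute("INSERT INTO land_parcel VALUES (10, 1200.0)")
conn.execute("INSERT INTO building VALUES (100, 10, 'residential')")
conn.executemany("INSERT INTO occupant VALUES (?, 100, ?)", [(1, "Asha"), (2, "Ravi")])

# Integrity: a building on a non-existent parcel is rejected.
try:
    conn.execute("INSERT INTO building VALUES (101, 99, 'shop')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Join along both relationships: who occupies buildings on parcel 10?
rows = conn.execute("""
    SELECT o.name FROM occupant o
    JOIN building b ON b.building_id = o.building_id
    WHERE b.parcel_id = 10 ORDER BY o.name
""").fetchall()
print(orphan_rejected, rows)  # True [('Asha',), ('Ravi',)]
```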
Grid-based GIS spatial data can be stored, manipulated, analysed, and referenced using any one of three models: the GRID model, the IMGRID model and the MAP model. All of these models use grid cell values, their attributes, coverages and corresponding legends. The models were developed over time as requirements changed. Depending on the application of interest, the availability of software and other related information, any one of them can be selected for a particular GIS project.
GRID MODEL
The first and foremost model for the representation of raster data is the GRID model; the method of storing, manipulating, and analysing grid-based data was first conceptualised with this model. Burrough (1983) adopted this approach because each of the early GIS systems used this model.
The figure below illustrates the GRID model. In this method, each grid cell is referenced and addressed individually and is associated with identically positioned grid cells in all other coverages, rather like a vertical column of grid cells, each dealing with a separate theme. Comparisons between coverages are therefore performed on a single column at a time. For example, to compare soil attributes in one coverage with vegetation attributes in a second coverage and land use/land cover attributes in a third, each X and Y location must be examined individually. So a soil grid cell at location X10-Y10 will be compared with its vegetation counterpart and with the land use/land cover cell of the third layer at location X10-Y10. You might envision this by imagining a geological core in which each rock type lies directly on top of the next; to get a picture of the entire study area, a large number of cores must be put together.
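The cell-by-cell ("vertical column") comparison of the GRID model can be sketched as follows. The three tiny coverages and their codes are invented for illustration; the point is that each X, Y location is visited individually and its values are read from every coverage at the same position.

```python
# Three hypothetical 3x3 coverages (themes) sharing the same grid framework.
soil       = [[1, 1, 2], [1, 2, 2], [3, 3, 2]]   # soil class codes
vegetation = [[5, 5, 5], [6, 5, 6], [6, 6, 6]]   # vegetation class codes
land_use   = [[9, 8, 8], [9, 9, 8], [7, 7, 8]]   # land use/land cover codes

coverages = {"soil": soil, "vegetation": vegetation, "land_use": land_use}

def column_at(x, y):
    """Return the 'vertical column' of values for one cell across all coverages."""
    return {name: grid[y][x] for name, grid in coverages.items()}

def find_cells(soil_code, veg_code):
    """Examine every X, Y location individually, one column at a time."""
    hits = []
    for y in range(len(soil)):
        for x in range(len(soil[0])):
            col = column_at(x, y)
            if col["soil"] == soil_code and col["vegetation"] == veg_code:
                hits.append((x, y, col["land_use"]))
    return hits

print(column_at(1, 1))   # {'soil': 2, 'vegetation': 5, 'land_use': 9}
print(find_cells(2, 6))
```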
The advantage of this model is that computational comparison of multiple themes or coverages for each grid cell location is relatively easy. This is a reasonable approach and has proven successful. The main disadvantage is that it limits the efficient examination of relationships between themes to one-to-one relationships within the spatial framework. In other words, it is inconvenient to compare groups in one coverage with groups in another coverage, because each grid cell location must be addressed individually. The second disadvantage is that it requires more storage space for the cell data, and its representation is vertical rather than horizontal, whereas a horizontal representation would more closely resemble our notion of a map.
IMGRID MODEL
With a slight modification of the checkerboard analogy, the second basic raster data model, the IMGRID data model, can be illustrated as in the figure below. This model was also used in early GIS systems (Burrough, 1983). Let us assume that the red squares on the checkerboard map represent a single attribute rather than a whole theme. We can then use the number 1 (red squares) to represent water and 0 (black squares) to indicate the absence of water. How can we represent a thematic map of land use that contains, say, four categories, namely recreation, agriculture, industry, and residences? Each of these four attributes would have to be separated into an individual layer. One layer would stand for agriculture only, with 1s and 0s representing the presence or absence of this activity in each grid cell. Recreation, industry, and residences would be represented in the same way, with each variable referenced directly.
The IMGRID system has two major advantages. First, we have a contiguous object that more closely resembles how we think about a map: the primary storage object is a two-dimensional array of numbers, rather than a column of numbers for different themes. Second, we reduce the values that must be contained in each coverage to 0s and 1s. This certainly simplifies computations and eliminates the need for map legends. A third advantage is that, since each variable is uniquely identified, a single attribute value can be assigned to a single grid cell. Suppose, however, that a given grid cell is partly agriculture and partly recreation, and each of these land use attributes is stored in a separate layer. In such a case, we may encounter difficulties when creating the final thematic coverage if multiple values occur in individual cells. To avoid such problems, we must ensure that each grid cell has only a single value for each variable.
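The IMGRID idea of one binary (0/1) layer per attribute can be sketched by splitting a categorical land-use grid into separate presence/absence layers. The grid values below are hypothetical. The final check enforces the rule that each cell carries a single value, i.e. exactly one layer is 1 at every cell.

```python
# Hypothetical land-use grid using category names directly.
land_use = [
    ["agriculture", "agriculture", "recreation"],
    ["industry",    "agriculture", "recreation"],
    ["residence",   "residence",   "industry"],
]
categories = ["recreation", "agriculture", "industry", "residence"]

def to_binary_layers(grid, cats):
    """Return one 2-D array of 0s and 1s per category (IMGRID style)."""
    return {c: [[1 if v == c else 0 for v in row] for row in grid] for c in cats}

def single_valued(layers):
    """True if every cell is 1 in exactly one layer (no mixed cells)."""
    grids = list(layers.values())
    rows, cols = len(grids[0]), len(grids[0][0])
    return all(sum(g[r][c] for g in grids) == 1
               for r in range(rows) for c in range(cols))

layers = to_binary_layers(land_use, categories)
for row in layers["agriculture"]:
    print(row)                   # [1, 1, 0] then [0, 1, 0] then [0, 0, 0]
print(single_valued(layers))     # True
```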
MAP MODEL
The third raster GIS model, the Map Analysis Package (MAP) model developed by C. Dana Tomlin (Burrough, 1983), formally integrates the advantages of the two raster data structures described above. In this data model, each thematic coverage is recorded and accessed separately by map name or title. This is accomplished by recording each variable, or mapping unit, of the coverage's theme as a separate number code or label, which can be accessed individually when the coverage is retrieved. The label corresponds to a portion of the legend and has its own symbol assigned to it. In this way, it is easy to operate on individual grid cells and on groups of similar grid cells, and changing a value requires rewriting only a single number per mapping unit, which simplifies computation. The major overall improvement is that the MAP method allows ready manipulation of the data in a many-to-one relationship between attribute values and sets of grid cells.
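A minimal sketch of the MAP model's many-to-one organisation, assuming a coverage is stored under its map name as a grid of mapping-unit labels plus a legend that attaches a name and a value to each label. The structure and names (coverages, cells_of, value_grid) are hypothetical. Changing the value of a mapping unit touches one legend entry, not every cell.

```python
# Hypothetical MAP-style coverage: each cell holds a mapping-unit label,
# and the legend attaches a name and a value to each label.
coverages = {
    "land_use": {
        "cells": [[1, 1, 2],
                  [1, 3, 2],
                  [3, 3, 2]],
        "legend": {1: {"name": "forest", "value": 10},
                   2: {"name": "cropland", "value": 20},
                   3: {"name": "urban", "value": 30}},
    }
}

def cells_of(coverage, label):
    """All grid cells (row, col) belonging to one mapping unit."""
    return [(r, c) for r, row in enumerate(coverage["cells"])
            for c, v in enumerate(row) if v == label]

def value_grid(coverage):
    """Resolve each cell's label through the legend (many cells -> one value)."""
    legend = coverage["legend"]
    return [[legend[v]["value"] for v in row] for row in coverage["cells"]]

lu = coverages["land_use"]          # coverages are accessed by map name
print(cells_of(lu, 2))              # [(0, 2), (1, 2), (2, 2)]
lu["legend"][2]["value"] = 25       # one number changes for the whole unit
print(value_grid(lu))               # [[10, 10, 25], [10, 30, 25], [30, 30, 25]]
```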
The MAP data model runs on almost all kinds of computer systems, from its original mainframe version to Macintosh, PC and modern UNIX-based workstation versions. It can be used as a teaching version of GIS because it is very flexible, and it has also become a major module in commercial GIS packages like ARC/INFO.
Although raster GIS systems have traditionally been developed to allow single attributes to be stored individually for each grid cell, some have evolved to include direct links to existing database management systems. This approach extends the utility of raster GIS by minimising the number of coverages and storing multiple variables for each grid cell in each coverage. Such extensions to the raster data model have also allowed direct linkage to existing vector-based GIS systems, so that data can be converted back and forth between raster and vector. The user can then operate with the advantages of both data structures. The conversion process is often quite transparent, allowing the user to perform the needed analyses without concern for the original data structure. This feature is particularly important because it strengthens the relationship between GIS software and traditional digital image processing software, which is used to manipulate grid cell-based, remotely sensed data. Many software systems already have both sets of capabilities, and more are likely to follow. Together with the linkage to existing statistical packages, we are rapidly approaching systems that operate with a superset of …
Raster data structure refers to the way raster data are stored so that they can be processed by the computer. Raster data are stored as a matrix, with cell values written into the file by rows and columns. The different types of raster data structures are:
One of the earliest digital formats used for satellite data is the band interleaved by pixel (BIP) format. This format treats each pixel as a separate storage unit: the brightness values of all bands for a pixel are stored one after another. It is practical to use if all bands in an image are to be used. The figure shows how data for a four-band image are recorded sequentially on computer tape in BIP format: all four band values for one pixel are written to the tape before the values for the next pixel. Any given pixel on the tape therefore contains its values for all four bands written directly in sequence. In order to read all four bands of the image, all four panels must be pieced together to form the entire scene.
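The BIP ordering can be expressed as an offset formula. The sketch below assumes one value per band per pixel and 0-based (row, col, band) indices; the tiny 2×2, four-band image and the function name bip_offset are hypothetical.

```python
ROWS, COLS, BANDS = 2, 2, 4

def bip_offset(row, col, band):
    """Position of a value in a BIP stream: all bands of a pixel are adjacent."""
    return (row * COLS + col) * BANDS + band

# image[band][row][col]; each value spells out its 1-based band, row and column,
# e.g. 312 = band 3, row 1, column 2.
image = [[[100 * (b + 1) + 10 * (r + 1) + (c + 1) for c in range(COLS)]
          for r in range(ROWS)] for b in range(BANDS)]

# Write the stream pixel by pixel, emitting every band before moving on.
bip = [image[b][r][c] for r in range(ROWS) for c in range(COLS) for b in range(BANDS)]

print(bip[:4])   # [111, 211, 311, 411]: pixel (0, 0) in bands 1 to 4
```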
Just as the BIP format treats each pixel of data as a separate unit, the band interleaved by line (BIL) format stores data by lines. The figure shows how data for a four-band image are recorded sequentially on computer tape in BIL format: each line is recorded in all four bands before the next line is recorded. Like the BIP format, it is useful if all bands of the imagery are to be used in the analysis. If some bands are not of interest, the format is inefficient when the data are on tape, since it is necessary to read serially past unwanted data.
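For BIL, a similar offset formula places the band index between the row and the column: a whole image line is written for band 1, then the same line for band 2, and so on. The 2×3, four-band example and the name bil_offset are hypothetical.

```python
ROWS, COLS, BANDS = 2, 3, 4

def bil_offset(row, col, band):
    """Position in a BIL stream: one full line per band, bands cycling per line."""
    return (row * BANDS + band) * COLS + col

# image[band][row][col] with self-describing values (1-based band, row, column digits)
image = [[[100 * (b + 1) + 10 * (r + 1) + (c + 1) for c in range(COLS)]
          for r in range(ROWS)] for b in range(BANDS)]

# Line 1 in band 1, line 1 in band 2, ..., then line 2 in band 1, ...
bil = [image[b][r][c] for r in range(ROWS) for b in range(BANDS) for c in range(COLS)]

print(bil[:6])   # [111, 112, 113, 211, 212, 213]
```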
The band sequential (BSQ) format requires that all data for a single band covering the entire scene be written as one file. Thus, if an analyst wanted to extract the area in the center of a scene in four bands, it would be necessary to read into this location in four separate files to extract the desired information. Many researchers like this format because it is not necessary to read serially past unwanted information if certain bands are of no value, especially when the data are on a number of different tapes. Random-access optical disk technology, however, makes this serial-access argument obsolete.
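In BSQ each band is its own contiguous block, so extracting a window means reading into the same location in each band file, and unwanted bands are simply never read. The sketch below keeps each band as a separate Python list standing in for a file; the sizes and the window function are hypothetical.

```python
ROWS, COLS, BANDS = 4, 4, 4

# Each band is stored as its own flat, row-major "file".
bsq_files = [[100 * (b + 1) + 10 * r + c for r in range(ROWS) for c in range(COLS)]
             for b in range(BANDS)]

def bsq_offset(row, col):
    """Offset inside a single band file (row-major)."""
    return row * COLS + col

def window(row0, col0, height, width, bands=range(BANDS)):
    """Extract a window by reading into the same location in each chosen band file."""
    return {b + 1: [[bsq_files[b][bsq_offset(r, c)] for c in range(col0, col0 + width)]
                    for r in range(row0, row0 + height)]
            for b in bands}

center = window(1, 1, 2, 2)                   # 2x2 centre of the scene, all four bands
print(center[1])                              # [[111, 112], [121, 122]]
only_band_4 = window(1, 1, 2, 2, bands=[3])   # other band files are never touched
```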
Satellite Imagery
Remotely sensed satellite data are recorded in raster format. The pixel value in a satellite image represents light energy reflected or emitted from the earth’s surface. By analyzing pixel values, an image processing system can extract a variety of themes from satellite images, such as land use and land cover, hydrography, water quality and others. Satellite images can be displayed in black and white or in color. Satellite images can also simulate color photographs if they have pixel values from the red, green and blue spectral bands. The image looks like a color photograph if bands 3, 2 and 1 are assigned to red, green and blue respectively, and like a color infrared photograph if bands 4, 3 and 2 are assigned to red, green and blue respectively.
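Assigning bands to display channels is a per-pixel lookup. The sketch below assumes a multiband image held as a dict of band number to 2-D list and builds (R, G, B) tuples; the band combinations follow the text (3-2-1 for a color photograph look, 4-3-2 for color infrared), while the band roles and pixel values are made up for illustration.

```python
# Hypothetical 2x2 image, 8-bit brightness values per band.
bands = {
    1: [[40, 42], [38, 90]],     # assumed blue
    2: [[60, 65], [55, 95]],     # assumed green
    3: [[50, 52], [48, 100]],    # assumed red
    4: [[180, 175], [20, 110]],  # assumed near infrared
}

def composite(bands, red, green, blue):
    """Return rows of (R, G, B) tuples using the chosen band for each channel."""
    r_b, g_b, b_b = bands[red], bands[green], bands[blue]
    return [[(r_b[i][j], g_b[i][j], b_b[i][j]) for j in range(len(r_b[0]))]
            for i in range(len(r_b))]

true_color = composite(bands, 3, 2, 1)    # looks like a color photograph
false_color = composite(bands, 4, 3, 2)   # looks like a color infrared photograph
print(true_color[0][0], false_color[0][0])   # (50, 60, 40) (180, 50, 60)
```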
Graphics Files
In this type of raster data, we can include maps, photographs and images that are stored as digital graphic files. Popular raster graphic file formats include GIF (Graphics Interchange Format), TIFF (Tagged Image File Format) and JPEG (Joint Photographic Experts Group).
Digital Elevation Models
A digital elevation model (DEM) consists of an array of uniformly spaced elevation data. A DEM is point based, but it can easily be converted to raster data by placing each elevation point at the center of a cell.
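Converting point-based DEM data to a raster, as described above, amounts to placing each elevation value in the cell whose center it falls on. The sketch assumes the points lie on a regular spacing that matches the cell size, with a hypothetical origin and sample values; real DEM formats carry this georeferencing in their headers.

```python
def points_to_grid(points, x_origin, y_top, cell_size, n_rows, n_cols, nodata=None):
    """Place each (x, y, z) point in the cell whose center coincides with it.

    Cell (row, col) has its center at
    (x_origin + (col + 0.5) * cell_size, y_top - (row + 0.5) * cell_size).
    """
    grid = [[nodata] * n_cols for _ in range(n_rows)]
    for x, y, z in points:
        col = round((x - x_origin) / cell_size - 0.5)
        row = round((y_top - y) / cell_size - 0.5)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            grid[row][col] = z
    return grid

# Hypothetical elevation points spaced 30 m apart (x, y, elevation in metres).
pts = [(15, 45, 120.0), (45, 45, 125.5), (15, 15, 118.2), (45, 15, 121.0)]
dem = points_to_grid(pts, x_origin=0, y_top=60, cell_size=30, n_rows=2, n_cols=2)
print(dem)   # [[120.0, 125.5], [118.2, 121.0]]
```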
Digital Orthophotos
A digital orthophoto quad (DOQ) is a digitized image prepared from aerial photographs or other remotely sensed data, in which the displacement caused by camera tilt and terrain relief has been removed. A digital orthophoto is georeferenced and can be registered with topographic and other maps.
Raster data compression techniques aim to reduce storage space and improve efficiency by
encoding information using fewer bits than the original, employing either lossless or lossy
methods. Common techniques include (a) run-length codes, (b) raster chain codes, (c) block
codes, and (d) the unique structure called quadtrees.
Run-Length Coding
The first method of compacting raster data is called run-length coding. In raster data, each grid cell has a numerical value corresponding to a category of data on the map, and these values must be entered (generally typed) into the computer.
For example, for a map of 500 × 500 grid cells, 250,000 numbers have to be typed into the computer. As you begin typing, you will quickly see patterns emerging from the data that present opportunities for reducing your workload. Specifically, there are long strings of the same number in each row. Think how much time you could save if, for a given row, you could just tell the computer that starting at column 8 all the numbers are 1s, representing some map variable, until you get to column 56, and that from column 57 the numbers are 2s until the end of the row. Indeed, you could also save a great deal of space by simply giving the starting and ending points of each string and the value that should be stored for that string. This method of storing the data is called run-length coding.
0,10
0,10
0,4, 1,4, 0,2
0,4, 1,4, 0,2
0,2, 1,6, 0,2
0,2, 1,6, 0,2
0,2, 1,6, 0,2
0,2, 1,6, 0,2
0,10
0,10
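The encoded rows above are (value, run length) pairs for a 10 × 10 grid, so each row must total 10 cells. A minimal encoder and decoder in that (value, count) form is sketched below; storing start and end columns instead, as in the prose, would be an equivalent variant.

```python
from itertools import groupby

def rle_encode(row):
    """Collapse a row of cell values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(row)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into a row of cell values."""
    return [value for value, count in pairs for _ in range(count)]

row = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
print(rle_encode(row))   # [(0, 4), (1, 4), (0, 2)], i.e. "0,4, 1,4, 0,2"

# Rebuild the whole 10 x 10 example from its encoded rows.
encoded = ([[(0, 10)]] * 2 + [[(0, 4), (1, 4), (0, 2)]] * 2
           + [[(0, 2), (1, 6), (0, 2)]] * 4 + [[(0, 10)]] * 2)
grid = [rle_decode(r) for r in encoded]
stored = sum(2 * len(r) for r in encoded)
print(len(grid), stored)   # 10 44: 44 numbers stored instead of 100
```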
Chain coding defines the outer boundary of an area by relative moves from a start point. The boundary is stored as a sequence of moves that finishes back at the start point. During encoding, each direction is stored as an integer; for example, the value 0 is north and 1 is east. The disadvantage is that boundaries are difficult to modify and edit, for example when merging or inserting them: a local modification changes the overall structure, which is inefficient. Moreover, because chain code stores the boundary of each area as a unit, the boundaries of adjacent areas are stored repeatedly, resulting in redundancy.
For example, we start at position (5,2). From here we define the border using cardinal directions and the number of movements:
• E3, S4, W1,
• E1, E2, E3
• S1, S2, S3, S4
• S1, W1, N1,
• W1
• W1, N3, E1,
• S1, W1, N1
• W1, N3, E1
• N1
• N1
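A chain code can be decoded by walking from the start point and applying each move. The sketch below follows the text's convention of 0 = north and 1 = east, and assumes 2 = south and 3 = west to complete the set. It uses (x, y) coordinates with y increasing northward and a hypothetical rectangular boundary, and checks that the path returns to its start.

```python
# Direction codes: 0 = N and 1 = E as in the text; 2 = S and 3 = W are assumed.
STEPS = {0: (0, 1), 1: (1, 0), 2: (0, -1), 3: (-1, 0)}
CODES = {"N": 0, "E": 1, "S": 2, "W": 3}

def expand(moves):
    """Turn compact moves like 'E3' into one direction code per unit step."""
    return [CODES[m[0]] for m in moves for _ in range(int(m[1:]))]

def decode(start, codes):
    """Follow the chain from the start point; return every boundary vertex visited."""
    x, y = start
    path = [(x, y)]
    for code in codes:
        dx, dy = STEPS[code]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path

codes = expand(["E3", "S2", "W3", "N2"])   # hypothetical 3 x 2 rectangle
path = decode((5, 2), codes)
print(codes)                 # [1, 1, 1, 2, 2, 3, 3, 3, 0, 0]
print(path[-1] == path[0])   # True: the chain is closed
```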
Block Codes
The third method of storing grid-based data or reducing storage is block coding. The block code method is a modification of run-length coding. Instead of giving starting and ending points plus a grid cell code, we select a square group of cells, assign it a starting point (the centre or a corner), pick a grid cell value, and tell the computer how wide the square of grid cells is, in number of cells. Block coding is therefore also called a two-dimensional run-length code. Each square group of grid cells, including individual grid cells, can be stored in this way with a minimum set of numbers. Block coding is a very effective method of reducing the storage space for most thematically layered digital data in a GIS.
With respect to the previous example: instead of storing 64 grid cells, all it takes is just 7 blocks. Block coding requires one 3×3 block, two 2×2 blocks, and four 1×1 cell blocks to encode this raster image.
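Block coding can be sketched as a greedy scan: at each cell not yet covered, grow the largest homogeneous square anchored at that cell's top-left corner and record (row, col, size, value). This is one simple strategy, assuming the top-left corner as each block's starting point; it does not necessarily find the minimum number of blocks, and the example grid is hypothetical.

```python
def block_encode(grid):
    """Greedy block coding: returns (row, col, size, value) for square blocks."""
    rows, cols = len(grid), len(grid[0])
    covered = [[False] * cols for _ in range(rows)]
    blocks = []
    for r in range(rows):
        for c in range(cols):
            if covered[r][c]:
                continue
            value, size = grid[r][c], 1
            # Grow the square while it stays in bounds, homogeneous and uncovered.
            while r + size < rows and c + size < cols and all(
                grid[i][j] == value and not covered[i][j]
                for i in range(r, r + size + 1) for j in range(c, c + size + 1)
            ):
                size += 1
            for i in range(r, r + size):
                for j in range(c, c + size):
                    covered[i][j] = True
            blocks.append((r, c, size, value))
    return blocks

def block_decode(blocks, rows, cols):
    """Rebuild the grid by filling each square block with its value."""
    grid = [[None] * cols for _ in range(rows)]
    for r, c, size, value in blocks:
        for i in range(r, r + size):
            for j in range(c, c + size):
                grid[i][j] = value
    return grid

grid = [[1, 1, 1, 0],
        [1, 1, 1, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 0]]
blocks = block_encode(grid)
print(len(blocks), blocks[0])   # 8 (0, 0, 3, 1): one 3x3 block plus seven single cells
```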
Quadtrees
The final method of compact storage is a rather difficult approach. Still, at least one commercial system, the Spatial Analysis System (SPANS) from Tydac, and one experimental system, called Quilt, are based on this scheme. Like block codes, quadtrees operate on square groups of cells. Here the entire map is successively divided into uniform square groups of grid cells with the same attribute value. Starting with the entire map as the entry point, the map is divided into four quadrants (NW, NE, SW, and SE). If any of these quadrants is homogeneous, containing grid cells with the same value, that quadrant is stored and no further subdivision is necessary. Each remaining quadrant is further divided into four quadrants, again NW, NE, SW, and SE, and each quadrant is examined for homogeneity. All homogeneous quadrants are again stored, and each of the remaining quadrants is further divided and tested in the same way until the entire map is stored as square groups of cells, each with the same attribute value. In the quadtree structure, the smallest unit of representation is a single grid cell.
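The recursive NW/NE/SW/SE subdivision described above can be sketched as a small recursive function. It assumes a square grid whose side is a power of two, and represents a homogeneous quadrant by its single value and a mixed quadrant by a dict of its four children; the example grid is hypothetical.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Region quadtree for a square 2^n x 2^n grid.

    Returns a leaf value if the quadrant is homogeneous, otherwise a dict with
    the four child quadrants NW, NE, SW, SE.
    """
    if size is None:
        size = len(grid)
    first = grid[row][col]
    if all(grid[r][c] == first
           for r in range(row, row + size) for c in range(col, col + size)):
        return first                      # homogeneous: store it, stop subdividing
    half = size // 2
    return {
        "NW": build_quadtree(grid, row, col, half),
        "NE": build_quadtree(grid, row, col + half, half),
        "SW": build_quadtree(grid, row + half, col, half),
        "SE": build_quadtree(grid, row + half, col + half, half),
    }

def count_leaves(node):
    """Number of stored square groups (leaves) in the tree."""
    if isinstance(node, dict):
        return sum(count_leaves(child) for child in node.values())
    return 1

grid = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [1, 1, 0, 1]]
tree = build_quadtree(grid)
print(count_leaves(tree))   # 7 leaves instead of 16 cells
```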
One of the advantages of the raster model is that each cell can be subdivided into smaller cells of the same shape and orientation. This unique feature of the raster data model has produced a range of innovative data storage and data reduction methods based on the quadtree, which works on the principle of recursively subdividing space. The most popular of these is the area or region quadtree. The area quadtree works by recursively subdividing the cells of a raster image into quads (quarters). The subdivision continues until each cell in the image can be classed as having the spatial entity either present or absent within the bounds of its geographical domain. The number of subdivisions required to represent an entity is a trade-off between the complexity of the feature and the dimensions of the smallest grid cell.
The quadtree principle is illustrated in the figure, where the division of the image region is based mainly on the resolution of the system, that is, the minimum mappable unit. Systems based on quadtrees are therefore called variable resolution systems, because they can operate at any level of quadtree subdivision. Thus users can decide how fine the resolution needs to be for various manipulations and applications. In addition, because of the compactness of storage from this method, a very large database, perhaps of continental or even global scale, can be stored in a single system.
Vector data structures represent geographic space in an intuitive way reminiscent of the familiar analog map. Geographic space is represented by the spatial location of items, whose attributes are stored in another file for later access. As with the raster spatial data model, there are many potential vector data models that can be used to store the geometric representation of entities in the computer.
A point is the simplest spatial entity that can be represented in the vector world with topology. To be topologically correct, a point must be referenced to a geographical reference system that locates it with respect to other spatial entities. To have topology, a line entity must consist of an ordered set of points (known as an arc, segment, or chain) with defined start and end points (nodes). Knowing the start and end points gives a line its direction. To create topologically correct area entities, we need the data about the points and lines used in their construction and knowledge of how these are connected to define the boundary. Combining points gives a line entity, and combining points and line segments forms an area entity. The two basic types of vector data models are (i) the spaghetti model and (ii) the topological model.
Spaghetti Model
The simplest vector data structure that can be used to reproduce a geographical image in the computer is a file containing (x, y) coordinate pairs that represent the locations of individual point features. The figure below is essentially a one-for-one translation of the graphical image or map, which is also termed the conceptual model.
Let us consider a conceptual model of an analog map in which each graphic object is shown in the figure. Each graphic object can be represented with a piece of spaghetti. Each piece of spaghetti acts as a single …
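In a spaghetti structure every feature is simply its own list of coordinates, with no record of what it touches. The sketch below, using two hypothetical adjacent parcels, shows the consequence: the shared boundary is stored twice, and adjacency can only be discovered by comparing coordinates.

```python
# Each polygon is an independent closed list of (x, y) pairs: pure "spaghetti".
spaghetti = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2), (2, 0)],
}

def edges(coords):
    """Boundary segments of a polygon, ignoring direction."""
    return {frozenset(seg) for seg in zip(coords, coords[1:])}

stored_points = sum(len(c) for c in spaghetti.values())
shared = edges(spaghetti["A"]) & edges(spaghetti["B"])
print(stored_points)   # 10 coordinate pairs stored
print(shared)          # the edge (2, 0)-(2, 2) is held in both polygons
```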
Topological Models:
In order to use the data manipulation and analysis subsystem more efficiently and obtain the desired results, and to allow advanced analytical techniques on GIS data and its systematic study in any project area, a great deal of explicit spatial information must be created. The topological data model incorporates solutions to some of the operations frequently used in advanced GIS analytical techniques. This is done by explicitly recording adjacency information in the basic logical entity of topological data structures, the line, which begins and ends when it contacts or intersects another line, or when there is a change in the direction of the line.
Each line then has two sets of numbers: a pair of coordinates and an associated node number. A node is the intersection of two or more lines, and its number is used to refer to any line to which it is connected. In addition, each line segment, called a link, has its own identification number that is used as a pointer to the nodes that represent its beginning and end. These links also have identification codes that relate polygon numbers, showing which two polygons are adjacent to each other along the link's length. In fact, the left and right polygons are also stored explicitly, so that even this tedious step is eliminated. This design feature allows the computer to know the actual relationships among all its graphical parts and so identify the spatial relationships contained in an analog map document.
Fundamentally, the topological models available in GIS ensure (a) that no node or line segment is duplicated, (b) that line segments and nodes can be referenced to more than one polygon, and (c) that all polygons can be adequately represented. The figure below shows one possible topological data structure for the vector representation. To understand the topological vector data structure, let us consider a network with 8 nodes, encoded as n1 to n8.
The links joining these nodes are encoded as l1 to l14, and the polygons created by these line segments/links are coded as A1 to A8. The creation of this structure for complex area features is carried out in a series of stages. Burrough (1986) identifies these stages as identifying a boundary network of arcs (the envelope polygon), checking polygons for closure, and linking arcs into polygons. The area of each polygon can then be calculated and a unique identification number attached. This identifier allows nonspatial information to be linked to a specific polygon.
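A minimal arc-node sketch of the topological structure: each link records its from-node, to-node and left and right polygons, so adjacency is read straight from the link table instead of being computed from geometry. The identifiers are hypothetical (two polygons A1 and A2 plus an outside polygon "O"), not the n1 to n8 / l1 to l14 network of the figure. Unlike the spaghetti case, the shared boundary l3 is stored once.

```python
from collections import namedtuple

Link = namedtuple("Link", "link_id from_node to_node left right")

# Hypothetical network: two squares side by side sharing link l3.
nodes = {"n1": (0, 0), "n2": (2, 0), "n3": (2, 2), "n4": (0, 2),
         "n5": (4, 0), "n6": (4, 2)}
links = [
    Link("l1", "n1", "n2", "A1", "O"),    # bottom of A1 (A1 lies to the left)
    Link("l2", "n4", "n1", "A1", "O"),    # left side of A1
    Link("l3", "n2", "n3", "A1", "A2"),   # shared boundary, stored once
    Link("l4", "n3", "n4", "A1", "O"),    # top of A1
    Link("l5", "n2", "n5", "A2", "O"),    # bottom of A2
    Link("l6", "n5", "n6", "A2", "O"),    # right side of A2
    Link("l7", "n6", "n3", "A2", "O"),    # top of A2
]

def neighbours(polygon):
    """Polygons adjacent to `polygon`, read directly from left/right codes."""
    out = set()
    for link in links:
        if polygon in (link.left, link.right):
            out.add(link.right if link.left == polygon else link.left)
    out.discard("O")                      # ignore the outside "universe" polygon
    return out

def links_at(node):
    """Links connected to a node (node-link topology)."""
    return [l.link_id for l in links if node in (l.from_node, l.to_node)]

print(neighbours("A1"))   # {'A2'}
print(links_at("n2"))     # ['l1', 'l3', 'l5']
```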
Feature | Raster | Vector
Resolution | Depends on pixel size; finer resolution means more detail but larger file size. | Independent of resolution; maintains high accuracy at different scales.
Spatial Accuracy | Lower accuracy, as data is generalized to fit within a pixel. | Higher accuracy, as features are defined with precise coordinates.
Data Processing Speed | Faster for overlay and mathematical operations (e.g., elevation analysis, NDVI). | Slower for complex spatial analysis but efficient for queries and topology.
Storage Format | Common formats: GeoTIFF (.tif), JPEG2000 (.jp2), GRID. | Common formats: Shapefile (.shp), GeoJSON (.geojson), KML (.kml).
Analysis Type | Good for spatial analysis like terrain modeling, land use classification, … | Best for network analysis, topology-based operations, and …
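The accuracy row of this table can be illustrated by rasterizing a vector polygon: a cell is marked as inside if its center falls inside the polygon (an even-odd ray-casting test), so the exact boundary is generalized to whole pixels and the area estimate depends on cell size. The triangle and cell sizes below are hypothetical.

```python
def point_in_polygon(x, y, poly):
    """Even-odd ray casting: count edge crossings of a ray going in the +x direction."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize(poly, cell, width, height):
    """Return a grid of 0/1 cells, testing each cell's center point."""
    rows, cols = int(height / cell), int(width / cell)
    return [[1 if point_in_polygon((c + 0.5) * cell, (r + 0.5) * cell, poly) else 0
             for c in range(cols)] for r in range(rows)]

def shoelace_area(poly):
    """Exact area of the vector polygon."""
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2)
                   in zip(poly, poly[1:] + poly[:1]))) / 2

triangle = [(0, 0), (10, 0), (0, 10)]          # vector: exact area 50
for cell in (5, 2, 1):
    grid = rasterize(triangle, cell, 10, 10)
    raster_area = sum(map(sum, grid)) * cell * cell
    print(cell, shoelace_area(triangle), raster_area)   # raster area 25, 40, 45
```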
Feature | TIN | GRID
… | … sparser in flat areas). | … variation since the resolution is fixed.
Storage Size | Smaller file size since data is stored only where needed. | Larger file size since all cells are stored, even in flat areas.
Resolution | Variable resolution; adapts to terrain complexity. | Fixed resolution; same grid size everywhere.
Computational Efficiency | Faster for terrain modeling and 3D visualization due to adaptive sampling. | More efficient for grid-based spatial analysis like hydrology and land use classification.
Surface Representation | Creates a network of triangles that can represent terrain in 3D. | Creates a grid-based 2.5D surface, less smooth than TIN.
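On a TIN, the elevation at an arbitrary location is commonly estimated by linear interpolation on the plane of the triangle that contains it, which can be written with barycentric weights. The triangle vertices and elevations below are hypothetical; a full TIN would first have to locate the containing triangle among many.

```python
def barycentric(p, a, b, c):
    """Weights (wa, wb, wc) of point p relative to triangle a, b, c (2-D)."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    wa = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    wb = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return wa, wb, 1.0 - wa - wb

def tin_elevation(p, tri):
    """Linear interpolation of z on a triangle given as [(x, y, z), ...]."""
    (xa, ya, za), (xb, yb, zb), (xc, yc, zc) = tri
    wa, wb, wc = barycentric(p, (xa, ya), (xb, yb), (xc, yc))
    if min(wa, wb, wc) < -1e-12:
        raise ValueError("point lies outside this triangle")
    return wa * za + wb * zb + wc * zc

# Hypothetical TIN facet: vertices with elevations in metres.
facet = [(0.0, 0.0, 100.0), (30.0, 0.0, 130.0), (0.0, 30.0, 160.0)]
print(round(tin_elevation((10.0, 10.0), facet), 6))   # 130.0
```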
******