GIS Lecture Notes-1
GIS stands for 'Geographical (Geospatial) Information System'.
Geography is the study of the Earth's surface and climate, and is the founding
science of GIS.
Geographic refers to Earth’s surface and near surface while Spatial refers to
any space (more general).
Geographical / Geospatial information is:
o information about places on the earth’s surface
o knowledge about “what is where when” (NB: time is important)
Geography furnishes information about the Earth and distinguishes how
features upon the Earth correlate with one another
o Example: basic geographic study involves how climate and landforms
interrelate with inhabitants, soil, and vegetation.
Data are intimately concerned with the properties of such objects and hold
attributes that can be associated with other types of geodata.
Geography, as with other subjects, calls for the use of ICT to gain access
to additional information sources and to assist in handling, presenting and
analyzing spatial information.
Data representing the real world can be stored and processed so that they can
be presented later in simplified forms to suit specific needs.
Definition of GIS
A (computer) system for capturing, storing, checking, integrating,
manipulating, analyzing, and displaying data which are spatially referenced to
the Earth. This is normally considered to involve a spatially referenced
computer database and appropriate application software.
GIS can also be defined as an activity by which people:
o Measure aspects of geographic phenomena and processes;
o Represent these measurements, usually in the form of computer
database, to emphasize spatial themes, entities, and relationships;
o Operate upon these representations to produce more measurements
and to discover new relationships by integrating disparate sources; and
o Transform these representations to conform to other entities and
The sources of spatial data could be maps, field surveys, censuses, aerial
photographs and satellite imagery.
The ability of GIS to combine spatial data from different sources and non-
spatial data (attribute data) distinguishes it from other data processing
The data sets vary in format, accuracy, level of detail, reference etc.
If you use a computer or a cell phone, you have probably already used a GIS
in some form without even realising it. Maybe it was a map on a web site,
Google Earth, an information booth or your cell phone telling you where you
General thinking of a GIS
o Maps, photos taken from aircraft, satellite images, etc.
o Generally: where, what is there, why, etc.
o How can these be represented digitally in the form of zeros and ones?
o If we can express the contents of a map or image in digital form, the
power of the computer opens an enormous range of possibilities for
communication, analysis, modeling, and accurate decision making.
Simply: In GIS, geographic data are transformed into geographic information.
Geographic data begin as raw positional feature data holding attributes.
These data are then overlaid with complementary and/or contrasting data sets,
which form coincident relationships.
Data and relationships are analyzed, geoprocessed, and presented as
geographic information products
Related Disciplines
Cartography and Computer Assisted drafting: computers offer the same
advantages to cartographers that word-processing software offers writers.
Automated techniques are now the rule rather than exception in cartographic
Photogrammetry and Remote Sensing: Aerial photogrammetry, a well-established
technique for cartographic production and geographic analysis, is
now complemented by the use of 'remotely sensed' information gathered by
satellites in outer space. ICT has made both readily available and far easier
to use.
Spatial Statistics: Statistical analysis and modeling of spatial patterns and
processes have long relied on computer technology. Advances in information
technology have made these techniques more widely accessible and have
allowed models to expand in complexity and scale to provide more accurate
depictions of real-world processes
Geographic Information Systems: these systems allow geographers to
collate and analyze information far more readily than is possible with traditional
research techniques.
Benefits of GIS
• improves/enhances the effects of physical/environmental growth
• better management of resources
• adding new value-added services
• perform analysis on spatial and non spatial components
• fast recall of data
• ability for complex analysis
• recalling of non spatial data through object location
• display of information in a different light/view
• multiple scenario in planning can be performed easily
• geospatial data maintained in a standard format
• easy to revise and update
• possible to share and exchange data
• time and cost saved
• leads to better decision making
The data used in a GIS normally has a geographical aspect to it. Take a look
at the following example
The longitude and latitude columns hold geographical data. The disease and
date columns hold non-geographical data. A common feature of GIS is that
they allow you to associate information (non-geographical data) with places
(geographical data).
There are many different ways in which data can be locationally referenced.
The following types of data are handled by a GIS:
o Point: Addresses, elevation spot heights, locations of malls, banks,
cities, volcanoes, etc.
o Line: Contours, geological faults, streets, highways, rivers, etc.
o Areas: Forests, climatic zones, lakes, soil types, land use, nations,
counties, etc.
o Networks: Streets, highways, rivers (which are directed networks, an
extra complication!)
o Tessellations: Census districts, postal codes, electoral boundaries. (A
tessellation completely divides a region into non-overlapping areas.)
o Overlapping regions: Newspaper circulation areas, telephone
Vector data in layers
Most GIS applications group vector features into layers. Each layer
represents a common feature.
Features in a layer have the same geometry type (e.g. they will all be
points) and the same kinds of attributes (e.g. information about what species a
tree is for a trees layer). For example if you have recorded the positions of all
the footpaths in your school, they will usually be stored together on the
computer hard disk and shown in the GIS as a single layer. This is convenient
because it allows you to hide or show all of the features for that layer in your
GIS application with a single mouse click.
Much greater precision in the definition of objects is possible by defining the
geometric extent of the regions in which they occur. This means that one can
draw far better maps with vector data than with raster data.
Much less space (volume) is required to store all the information, since empty
space on the map can be ignored.
It gives a full topology
Fast retrieval
Fast conversion
Topology between the geometric objects must be explicitly defined, though it
can be done quite efficiently. The file structures required are more complex
than the raster data files, and layer overlay operations can be very complex to
Spatial variability can be represented, using a Triangulated Irregular Network,
but it is still not as effective as the use of regularly gridded data, and
mathematical operations, such as derivatives, on layers or between two or
more layers are all but impossible to perform.
Difficult in overlay
Difficult in updating
Expensive data capture
Very widely used in such fields as computer cartography, analysis of networks,
municipal databases that contain descriptions of building footprints, streets,
Common GIS packages that are vector-oriented include ARC/GIS and
Attribute Data
A feature has a geometry (which determines if it is a point, polyline or
polygon) and attributes (which describe the feature).
Maps come to life when colour and different symbols are used to help you to
tell one type of feature from the next. On the map on the left, you cannot tell
the difference between rivers, roads and contours. On the map on the right,
however, it is much easier to see the different features (figure below).
House Example - house feature is a polygon (based on the floor plan of the
house), the attributes are roof colour, whether there is a balcony, and the year
the house was built. Note that attributes don't have to be visible things – they
can describe things we know about the feature such as the year it was built.
In a GIS Application, we can represent this feature type in a houses polygon
layer, and the attributes in an attribute table.
The GIS application links the attribute records with the feature geometry so
that you can find records in the table by selecting features on the map, and
find features on the map by selecting records in the table.
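The link between feature geometry and attribute records can be sketched in a few lines of Python. This is only an illustration: the `Feature` class, the `houses` layer and the attribute names (`roof_colour`, `balcony`, `year_built`) are hypothetical and do not correspond to any particular GIS package.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """One vector feature: a geometry plus a record of attributes."""
    fid: int
    geometry_type: str            # "point", "polyline" or "polygon"
    coordinates: list             # list of (x, y) vertices
    attributes: dict = field(default_factory=dict)


# Hypothetical houses layer: every feature has the same geometry type
# and the same kinds of attributes.
houses = [
    Feature(1, "polygon", [(0, 0), (10, 0), (10, 8), (0, 8)],
            {"roof_colour": "red", "balcony": True, "year_built": 1998}),
    Feature(2, "polygon", [(20, 0), (32, 0), (32, 9), (20, 9)],
            {"roof_colour": "grey", "balcony": False, "year_built": 2005}),
]


def attribute_table(layer):
    """Map -> table: one attribute record per feature, keyed by feature id."""
    return {f.fid: f.attributes for f in layer}


def select_by_attribute(layer, key, value):
    """Table -> map: find the features whose attribute matches a value."""
    return [f for f in layer if f.attributes.get(key) == value]


table = attribute_table(houses)
print(table[1]["year_built"])
print([f.fid for f in select_by_attribute(houses, "balcony", False)])
```

The shared feature id plays the role of the link: it lets the application move from a selected geometry to its record and back.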
• There are many satellites circling the earth and the photographs they take are
a kind of raster data that can be viewed in a GIS.
One important difference between raster and vector data is that if you zoom in
too much on a raster image, it will start to appear 'blocky' (see illustration
below). In fact these blocks are the individual cells of the data grid that makes
up the raster image.
Layer overlays are really simple, since all layers are defined with the same
grid over the region.
Topology is implicitly defined, since the location of each cell relative to all the
others can be easily found.
Easy to overlay and model
Suitable for 3D display
Integration of image data
Automated data capture
Large data volume
If you want to increase the resolution (that is, decrease the cell size) by a
factor of two, the data set size will quadruple! In order to reduce this problem,
various compression techniques, such as quadtrees and run-length encoding,
are employed.
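Both points can be checked with a small sketch: halving the cell size quadruples the number of cells, and run-length encoding shortens rows that contain long runs of equal values. The function names and the sample row are illustrative assumptions, not part of any GIS package.

```python
def cell_count(extent_x, extent_y, cell_size):
    """Number of cells needed to cover a rectangular extent (integer units)."""
    return (extent_x // cell_size) * (extent_y // cell_size)


def run_length_encode(row):
    """Compress one raster row into [value, run_length] pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return runs


def run_length_decode(runs):
    """Rebuild the original row from [value, run_length] pairs."""
    row = []
    for value, length in runs:
        row.extend([value] * length)
    return row


# A 1000 x 1000 unit region at 10-unit cells, then at 5-unit cells.
print(cell_count(1000, 1000, 10), cell_count(1000, 1000, 5))

row = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]   # hypothetical land/water row
print(run_length_encode(row))
```

Run-length encoding only pays off when neighbouring cells often share a value; a noisy row can end up longer after encoding.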
Resolution is also problematic because the discretization process has an
effect analogous to rounding of numbers, but in a spatial sense -- that is, what
you see in the raster image is usually larger or smaller than the real-world
equivalent. Objects smaller than one cell may not appear at all!
Low precision
Difficult in network analysis
Slow conversion
All satellite and aerial photograph data come in raster form. Each pixel
represents the amount of light received by the sensor at a particular
wavelength at the location.
All satellites collect data from more than one wavelength, so a particular
satellite pass will create an instant multilayer raster map of an area, as well as
business for the data storage industry.
Common GIS packages using the raster model are GRASS and IDRISI.
Raster data are best used for representing variables that vary continuously in
space, such as elevations.
Georeferencing is the process of defining exactly where on the earth's surface
an image or raster dataset was created.
This positional information is stored with the digital version of the aerial photo.
When the GIS application opens the photo, it uses the positional information to
ensure that the photo appears in the correct place on the map.
Normally this positional information consists of a coordinate for the top left
pixel in the image, the size of each pixel in the X direction, the size of each
pixel in the Y direction, and the amount (if any) by which the image is rotated.
With these few pieces of information, the GIS application can ensure that
raster data are displayed in the correct place.
The georeferencing information for a raster is often provided in a small text file
accompanying the raster.
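As a rough sketch of how these few numbers place a raster on the map, the function below converts a pixel position to a map coordinate. The parameter names are hypothetical, and the rotation convention (counter-clockwise, about the top-left corner) is an assumption made for the example; real formats define their own conventions.

```python
import math


def pixel_to_map(col, row, origin_x, origin_y, pixel_w, pixel_h, rotation_deg=0.0):
    """Map coordinate of the top-left corner of pixel (col, row).

    origin_x, origin_y: map coordinate of the top-left corner of the image.
    pixel_w, pixel_h:   pixel size in map units (both given as positive numbers).
    rotation_deg:       assumed counter-clockwise rotation of the image.
    """
    # Offsets in the unrotated image; rows count downwards, so y decreases.
    dx = col * pixel_w
    dy = -row * pixel_h
    t = math.radians(rotation_deg)
    x = origin_x + dx * math.cos(t) - dy * math.sin(t)
    y = origin_y + dx * math.sin(t) + dy * math.cos(t)
    return x, y


# Unrotated image with 10-unit pixels whose top-left corner is at (1000, 2000).
print(pixel_to_map(3, 2, 1000.0, 2000.0, 10.0, 10.0))
```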
Concepts of Georeferencing
The need to integrate and combine data sets acquired using different
techniques and having different references necessitates referencing to one system
to enable effective manipulation of such data.
Georeferencing involves definitions, physical/geometric constructs, and the
tools required to describe the geometry and motions of objects near or on the
earth's surface.
The map legend in most cases contains the following information:
o Name of the local vertical datum, e.g. Tide Gauge Mombasa
o Name of the local horizontal datum, e.g. Potsdam
o Name of the reference ellipsoid and fundamental point, e.g. Bessel
o Types of co-ordinates associated with map grid lines, e.g. geographic
co-ordinates, plane co-ordinates
o Map projection, e.g. Universal Transverse Mercator
o Map scale, e.g. 1:25,000
o Transformation parameters e.g from global datum to a horizontal local
Sometimes raster data are created from vector data because the data owners
want to share the data in an easy to use format. For example, a company with
road, rail, cadastral and other vector datasets may choose to generate a raster
version of these datasets so that employees can view these datasets in a web
browser. This is normally only useful if the attributes that users need to be
aware of can be represented on the map with labels or symbology.
Spatial resolution
The size of pixels in a raster determines its spatial resolution.
Spatial resolution becomes apparent when you look at an image at a small
scale and then zoom in to a large scale (it appears ‘blocky’).
Several factors determine the spatial resolution of an image.
o For remote sensing data, spatial resolution is usually determined by the
capabilities of the sensor used to take an image. For example SPOT5
satellites can take images where each pixel is 10m x 10m. Other
satellites, for example MODIS take images only at 500m x 500m per
pixel. In aerial photography, pixel sizes of 50cm x 50cm are not
o Images with a pixel size covering a small area are called 'high
resolution' images because it is possible to make out a high degree of
detail in the image. Images with a pixel size covering a large area are
called 'low resolution' images because the amount of detail the
images show is low.
o In raster data that is computed by spatial analysis, the spatial density of
information used to create the raster will usually determine the spatial
resolution. For example if you want to create a high resolution average
rainfall map, you would ideally need many weather stations in close
proximity to each other.
Spectral Resolution
o When a colour photograph is taken with a digital camera (or cellphone
camera), the camera uses electronic sensors to detect red, green and blue
light. When the picture is displayed on a screen or printed out, the red, green
and blue (RGB) information is combined to show the image. This RGB
information (digital format) is stored in separate colour bands. RGB is the
visible spectrum.
Raster images can contain one or more bands, each covering the same
spatial area, but containing different information.
When raster data contains bands from different parts of the electromagnetic
spectrum (including the non-visible parts of the spectrum), they are called
multi-spectral images.
Other forms of data acquisition platforms include:
o Multi-stage – use of multiple platforms and multiple altitudes. These include
space-borne sensors (geostationary orbit – 36,000 km, and near-polar
orbit – 600 to 1,000 km)
o Multi-temporal – use of different dates and times. These are airborne
high altitude (3 to 10 km)
o Multi-sensor – use of different sensors (airborne low altitude – 300 m
to 3 km)
o Multi-spectral – use of different spectral bands. Use of airborne sensors
(ultra-light airplane – 100 to 300 m) and ground observations
(close-range remote sensing – 1 to 5 m, and sensing in situ)
In GIS, recording the non-visible parts of the spectrum can be very useful. For
example, measuring infra-red light can be useful in identifying water bodies.
Each band in the image is like a separate layer. The GIS will combine three of
the bands and show them as red, green and blue so that the human eye can
see them. The number of bands in a raster image is referred to as its spectral
resolution.
Images with a single band are called grayscale images. With grayscale
images, you can apply false colouring to make the differences in values in the
pixels more obvious. Images with false colouring applied are often referred to
as pseudocolour images.
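False colouring of a single band can be sketched as a linear colour ramp. The blue-to-red ramp, the 0 to 255 value range and the sample band below are assumptions chosen for illustration; GIS applications offer many other ramps.

```python
def pseudocolour(value, vmin=0, vmax=255, low=(0, 0, 255), high=(255, 0, 0)):
    """Map one grayscale value to an (R, G, B) colour on a linear ramp."""
    if vmax == vmin:
        return low
    t = (value - vmin) / (vmax - vmin)
    t = max(0.0, min(1.0, t))          # clamp values outside the range
    return tuple(round(l + t * (h - l)) for l, h in zip(low, high))


band = [[0, 128], [255, 64]]           # hypothetical single-band raster
coloured = [[pseudocolour(v) for v in row] for row in band]
print(coloured[0][0], coloured[1][0])
```

Low values come out blue and high values red, which makes small differences in pixel values easier to see than in shades of grey.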
Raster images can consume a large amount of storage space.
Real world
Raster and vector representation
Object-oriented Models
Also called semantic models, object-oriented models organize geographic
objects into different classes, on both a general level and to more specific
The more specific classes inherit certain properties from their "parent" class.
For example, a class called "wetland" could be a parent class of "bog",
"marsh", "swamp", and "lake". Each of the subclasses would inherit properties
such as area, perimeter, and streams that drain into it, from the parent class.
All data pertaining to each object are encapsulated within the definition of the
object, which protects them better from external tampering.
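The wetland example can be written as a small class hierarchy. This is a sketch only: the class and attribute names are hypothetical, and Python's read-only property is used here as one simple way to illustrate encapsulation.

```python
class Wetland:
    """Parent class holding the properties shared by all wetlands."""

    def __init__(self, name, area_ha, perimeter_km, inflow_streams=()):
        self.name = name
        self._area_ha = area_ha            # kept internal; read via the property
        self.perimeter_km = perimeter_km
        self.inflow_streams = list(inflow_streams)

    @property
    def area_ha(self):
        """Read-only access, so callers cannot overwrite the area directly."""
        return self._area_ha

    def describe(self):
        return f"{type(self).__name__} {self.name}: {self.area_ha} ha"


class Bog(Wetland):
    """Inherits every property of Wetland unchanged."""


class Marsh(Wetland):
    def __init__(self, name, area_ha, perimeter_km, inflow_streams=(), vegetation="reeds"):
        super().__init__(name, area_ha, perimeter_km, inflow_streams)
        self.vegetation = vegetation       # property specific to this subclass


marsh = Marsh("Site A", 12.5, 3.0, ["Stream 1"])
print(marsh.describe())
```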
Objects are a more natural way of looking at spatial data and are easier to
They are quite complicated to set up, and the theory behind them is rather
difficult for the novice to get a grip on.
It expresses the spatial relationships between connecting or adjacent vector
features (points, polylines and polygons) in a GIS.
Topological or topology-based data are useful for detecting and correcting
digitising errors (e.g. two lines in a roads vector layer that do not meet
perfectly at an intersection).
Topology is necessary for carrying out some types of spatial
analysis, such as network analysis.
Topology Errors
Includes errors such as slivers, overshoots and undershoots.
Topological errors break the relationship between features.
These errors need to be fixed in order to be able to analyse vector data with
procedures like network analysis (e.g. finding the best route across a road
network) or measurement (e.g. finding out the length of a river).
With topology errors, even when accurate measurement tools are used, the
results will be incorrect.
Topology can be used to detect and correct digitizing errors.
GIS applications provide topological tools, such as tools for topological editing
and prevention of overlaps.
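A very simplified check for one kind of topology error is sketched below: it reports line endpoints that come close to each other without actually meeting. It compares endpoints only, so it is not a full topology checker, and the `roads` data and tolerance value are hypothetical.

```python
import math


def find_near_misses(lines, tolerance):
    """Return (line_a, line_b, gap) for endpoints that almost, but not quite, meet."""
    endpoints = []
    for idx, line in enumerate(lines):
        endpoints.append((idx, line[0]))
        endpoints.append((idx, line[-1]))

    problems = []
    for i in range(len(endpoints)):
        for j in range(i + 1, len(endpoints)):
            (line_a, p), (line_b, q) = endpoints[i], endpoints[j]
            if line_a == line_b:
                continue                      # ignore the two ends of one line
            gap = math.dist(p, q)
            if 0 < gap <= tolerance:          # gap of 0 means a clean join
                problems.append((line_a, line_b, round(gap, 3)))
    return problems


roads = [
    [(0.0, 0.0), (10.0, 0.0)],
    [(10.2, 0.0), (10.2, 8.0)],   # stops 0.2 units short of road 0
    [(0.0, 0.0), (0.0, 5.0)],     # joins road 0 exactly
]
print(find_near_misses(roads, tolerance=0.5))
```

A topological editing tool would typically go one step further and snap such endpoints together.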
Buffering usually creates two areas: one area that is within a specified
distance to selected real world features and the other area that is beyond.
The area that is within the specified distance is called the buffer zone.
A buffer zone is any area that serves the purpose of keeping real world
features distant from one another. Buffer zones are often set up to protect the
environment, protect residential and commercial zones from industrial
accidents or natural disasters, or to prevent violence.
Common buffer zones may be greenbelts between residential and
commercial areas, border zones between countries, noise protection zones
around airports, or pollution protection zones along rivers.
In GIS, buffer zones are always represented as vector polygons enclosing
other polygon, line or point features
Variations in buffering:
The buffer distance or buffer size can vary according to numerical values
provided in the vector layer attribute table for each feature.
The numerical values have to be defined in map units according to the
Coordinate Reference System (CRS) used with the data. For example, the
width of a buffer zone along the banks of a river can vary depending on the
intensity of the adjacent land use. For intensive cultivation the buffer distance
may be bigger than for organic farming.
Buffers around polyline features, such as rivers or roads, do not have to be on
both sides of the lines.
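The idea of a buffer distance taken from the attribute table can be sketched without building the buffer polygon itself: a location lies in the buffer zone if its distance to the line is no more than that feature's buffer distance. The `rivers` records and the `buffer_m` attribute are hypothetical, and coordinates are assumed to be in metres in a projected CRS.

```python
import math


def distance_to_segment(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Position of the closest point along the segment, clamped to its ends.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def in_buffer(p, polyline, buffer_distance):
    """True if p lies within buffer_distance of any segment of the polyline."""
    return any(distance_to_segment(p, a, b) <= buffer_distance
               for a, b in zip(polyline, polyline[1:]))


rivers = [
    {"name": "river_a", "line": [(0, 0), (100, 0)], "buffer_m": 30},
    {"name": "river_b", "line": [(0, 200), (100, 200)], "buffer_m": 60},
]

for river, point in zip(rivers, [(50, 40), (50, 240)]):
    print(river["name"], in_buffer(point, river["line"], river["buffer_m"]))
```

Both test points are 40 m from their river, but only the second falls inside a buffer zone because that feature carries the larger distance.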
Buffer zones with dissolved (left) and with intact boundaries (right) showing
Spatial Overlay
Is a process that allows you to identify the relationships between two polygon
features that share all or part of the same area.
The output vector layer is a combination of the input features (rectangle and
circle) information (see Illustration below).
Spatial overlay with two input vector layers (a_input = rectangle, b_input = circle).
The resulting vector layer is displayed green.
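An intersection overlay can be sketched for the simplest possible case. Note the simplification: both inputs here are axis-aligned rectangles given as bounding boxes, not the rectangle and circle of the illustration, and the attribute names are hypothetical.

```python
def intersect_rectangles(a, b):
    """Intersection of two rectangles (xmin, ymin, xmax, ymax), or None."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin >= xmax or ymin >= ymax:
        return None                      # no shared area
    return (xmin, ymin, xmax, ymax)


def overlay(feature_a, feature_b):
    """Output feature: shared area plus the attributes of both inputs."""
    shared = intersect_rectangles(feature_a["bbox"], feature_b["bbox"])
    if shared is None:
        return None
    attributes = {**feature_a["attributes"], **feature_b["attributes"]}
    return {"bbox": shared, "attributes": attributes}


a_input = {"bbox": (0, 0, 10, 10), "attributes": {"land_use": "forest"}}
b_input = {"bbox": (5, 5, 15, 15), "attributes": {"soil": "clay"}}
print(overlay(a_input, b_input))
```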
Spatial interpolation
Is the process of using points with known values to estimate values at other
unknown points.
For example, to make a precipitation (rainfall) map for your country, you will
not find enough evenly spread weather stations to cover the entire region.
Spatial interpolation can estimate the temperatures at locations without
recorded data by using known temperature readings at nearby weather stations.
The following illustration shows a temperature map interpolated from South
African weather stations:
This type of interpolated surface is often called a statistical surface
Other types of data that can be computed using interpolation include elevation
data, precipitation, snow accumulation, water table and population density.
There are many interpolation methods. Two widely used interpolation
methods are Inverse Distance Weighting (IDW) and Triangulated Irregular
Networks (TIN).
The greater the weighting coefficient, the less the effect points will have if they
are far from the unknown point. As the coefficient increases, the value of the
unknown point approaches the value of the nearest observational point.
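A minimal IDW sketch, assuming the common form in which each sample is weighted by 1 / distance^power; the function name, the `power` parameter (the weighting coefficient) and the sample values are illustrative.

```python
import math


def idw(x, y, samples, power=2.0):
    """Estimate the value at (x, y) from samples given as (xi, yi, value)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for xi, yi, value in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0:
            return value                 # exactly on a sample point
        w = 1.0 / d ** power             # nearer samples get larger weights
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


samples = [(0, 0, 10.0), (10, 0, 20.0)]  # hypothetical station readings
for power in (1, 2, 8):
    print(power, idw(2, 0, samples, power=power))
```

As the power grows, the estimate at (2, 0) moves towards 10.0, the value of the nearest sample, which is the behaviour described above.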
Some disadvantages of IDW interpolation method include:
o The quality of the interpolation result can decrease, if the distribution of
sample data points is uneven.
o Maximum and minimum values in the interpolated surface can only
occur at sample data points. This often results in small peaks and pits
around the sample data points (illustration above).
Delaunay triangulation with circumcircles around the red sample data. The resulting
interpolated TIN surface created from elevation vector points is shown on the right.
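Once the triangles exist, the value inside any one triangle is a linear blend of its three corner values. The sketch below shows only that step, using barycentric weights; it does not build the Delaunay triangulation, and the vertex values are hypothetical.

```python
def interpolate_in_triangle(p, a, b, c):
    """Linear interpolation of z at p=(x, y) inside the triangle a, b, c.

    a, b, c are (x, y, z) vertices. Returns None if p lies outside.
    """
    x, y = p
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle: vertices are collinear")
    # Barycentric weights: each is 1 at its own vertex and 0 on the far edge.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None
    return w1 * z1 + w2 * z2 + w3 * z3


a, b, c = (0, 0, 100.0), (10, 0, 200.0), (0, 10, 300.0)
print(interpolate_in_triangle((5, 5), a, b, c))
```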
There is no single interpolation method that can be applied to all situations. Selection
of a particular interpolation method should depend upon the sample data, the type of
surfaces to be generated and tolerance of estimation errors.
o Direct data acquisition – use of survey techniques and remotely sensed
o Indirect data acquisition – previously collected data can be used and
could be in the form of paper maps or available digital data.
Some advances in primary (direct) data acquisition
o Electronic Plane Surveying Systems
o Mobile Mapping Systems
o Airborne Laser Scanner Systems
o Airborne three Line Scanners
Data Acquisition methods
Manual digitizing
The human operator follows features using a mouse device, tracing the points
that constitute point features, line features or area features, and storing
co-ordinate locations relative to some pre-defined control points.
These control points orient the digitizing feature.
There are two ways of manual digitizing, namely:
o On-tablet digitizing – placing the manuscript on a special tablet and using
a tablet mouse to follow the features of interest and recording their
o On-screen digitizing – a scanned map or image is displayed
on the screen and, using a normal mouse, the features of interest are
picked and their locations recorded
There are two modes of point recording:
o Point mode – the computer records point locations when the operator prompts it.
o Stream mode – continuously records point locations as they are being
Typical digitizing Errors
o Undershoots (gaps)
o Overshoots
o Spikes
o Duplicates (duplicate lines)
o Disconnections
o Tracing errors
Data Editing
Preprocessing is done before data are used in order to remove errors or blunders.
For images, this may involve enhancement, post-classification analysis etc.
For vector data, it involves editing of overshoots, undershoots, duplicate lines,
attribute data etc.
The target areas of data corrections are basically geometric, topological and
Remote Sensing
Acquiring information about the earth’s surface without being in contact with it.
Done by sensing and recording reflected or emitted energy and processing,
analysing, and applying that information
Remote sensing Process
o Energy source (sun) – to provide electromagnetic energy to the target
of interest.
o Radiation and the atmosphere – energy travels from source to target
through the atmosphere (some interaction takes place)
o Interaction with target – Interaction takes place between target and
the radiation and depends on properties of both.
o Recording of energy by Sensor – energy emitted by the target is
recorded by the sensor (on satellite or airplane)
o Transmission, Reception and Processing – energy recorded by the
sensor is transmitted (in electronic form) to a receiving and processing
station where data is processed to an image (hardcopy or digital)
o Application – information extracted from the imagery is applied in
order to understand the target, reveal some information, or assist in
solving some particular problem.
Different types of remote sensing exist:
o Visible and reflective Infra Red RS
o Thermal RS
o Microwave RS
They depend on the electromagnetic radiation
Various optical satellite sensors include (number of channels in brackets):
Landsat (7), SPOT (4), RS-1C/1D (4), NOAA AVHRR (6), MODIS (36)
Spectral Reflectance
Given a certain surface composed of a certain material, the energy reaching this
surface is called irradiance whereas the energy reflected by the surface is
called radiance. Both are expressed in W/m2.
Preprocessing of RS Data
Sometimes called image restoration and rectification
Intended to correct for sensor- and platform-specific radiometric and geometric
distortions of data
Radiometric corrections:
o Due to variations in scene illumination and viewing geometry,
atmospheric conditions, and sensor noise and response
o Will vary depending on the specific sensor and platform used to acquire the
data and the conditions during data acquisition
Geometric corrections:
o Due to several factors including: the perspective of the sensor optics,
the motion of the scanning system, the motion of the platform, the
platform altitude, attitude, and velocity; the terrain relief; and the
curvature and rotation of the earth.
Image enhancement
Is conversion of the image quality to a better and more understandable level
for feature extraction or image interpretation.
o Contrast manipulation
o Spatial feature manipulation – these sharpen the appearance of fine
detail in an image
o Multi-image manipulation – the objective is to reduce the dimensionality of
the bands, thus concentrating the information from the original
data into the least number of new components
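Contrast manipulation, the first item above, can be illustrated with a linear contrast stretch, one common technique among several. The function name and the sample band are assumptions for the example.

```python
def linear_stretch(band, out_min=0, out_max=255):
    """Rescale pixel values so the darkest maps to out_min and the brightest to out_max."""
    flat = [v for row in band for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[out_min for _ in row] for row in band]   # flat image: nothing to stretch
    scale = (out_max - out_min) / (hi - lo)
    return [[round(out_min + (v - lo) * scale) for v in row] for row in band]


# Hypothetical low-contrast band using only the values 100 to 151.
band = [[100, 110], [125, 151]]
print(linear_stretch(band))
```

After stretching, the same four pixels span the full 0 to 255 display range, so differences between them are easier to see.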
Database Models
o Data organized in a tree structure
o The links between records use Parent-Child relationship
o Only supports 1:m mappings
o Any record can link to any other
o Child can have more than one parent
o Supports m:n mappings
The ones available for GIS are Relational, Object Oriented and Object-
Relational models
Relational model
o Data organised in tables (rows – records, and columns – data fields)
o The tables are related using links called relationships
o Relational databases need to be normalized in order to reduce
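A small sketch of the relational model using Python's built-in sqlite3 module. The `parcel` and `building` tables and their contents are hypothetical; the point is that the two tables are related through the shared `parcel_id` column, so land use is stored once per parcel and not repeated for every building.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parcel (
        parcel_id INTEGER PRIMARY KEY,
        land_use  TEXT NOT NULL
    );
    CREATE TABLE building (
        building_id INTEGER PRIMARY KEY,
        parcel_id   INTEGER NOT NULL REFERENCES parcel(parcel_id),
        year_built  INTEGER
    );
    INSERT INTO parcel VALUES (1, 'residential'), (2, 'commercial');
    INSERT INTO building VALUES (10, 1, 1998), (11, 1, 2005), (12, 2, 2011);
""")

# Follow the relationship: which buildings stand on residential parcels?
rows = conn.execute("""
    SELECT b.building_id, p.land_use
    FROM building AS b
    JOIN parcel AS p ON p.parcel_id = b.parcel_id
    WHERE p.land_use = 'residential'
    ORDER BY b.building_id
""").fetchall()
print(rows)
```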
Object oriented model
o OO model initially designed to address some of the weaknesses of
relational DBMS
Inability to store complete objects
Were never designed with rich data types such as geographic
objects, sound, video
Poor performance for many types of geographical query
Was difficult to extend RDBMS to support geographic data types
and processing functions
o An object is a self-contained package of information describing the
characteristics and capabilities of the entity under study
o An interaction between two objects is called a relationship
o In a geographic object data model, real world is modeled as a collection
of objects and their relationships
o A class is a collection of objects of the same type. Important from the
implementation point of view (template for objects)
o Object data models are good for modelling geographic systems
because they support:
Encapsulation – each object packages together a description of
its state (properties) and behaviour (methods or operations
performed on the object)
Inheritance – ability to reuse some or all of the characteristics of
an object in another.
Polymorphism – Each object has its specific implementation for
operations like draw, create, delete
o OODBMS stores data persistently (semi-permanently on disk and other
media) and provides object oriented query tools
o Examples of commercial OODBMS developed
Gemstone/S object server from Gemstone Systems Inc.
ObjectivityDB from Objectivity Inc
ObjectStore from Progress Software
Versant from Versant Object Technology Cor.
o OODBMS has not proved to be commercially successful because of the
massive installed base of RDBMSs. Furthermore, the most important OODBMS
capabilities have been added to the standard RDBMS to create hybrid
object-relational DBMS.
o Example of OODBMS
Advantages of OODBMS
o Not necessary to know the inner workings of an object
o Allows complex representation of the real world
o Supports generalization, aggregation and association
o Maintains the history in the database
o Integrates with simulation modelling techniques
o Multiple simultaneous updating
o Intuitive feel of objects
o Suits complex data relationships
o Fewer bugs and lower maintenance costs with GIS
o Ensures high level of data integrity
o Complex models more difficult to design
o Import and exchange with other types is difficult
o Some applications may not access an object oriented model
o Slow to execute
o Difficult description of the natural world
o Special languages are required for OODBMS
Object-Relational model
o It is extended by software that incorporates object oriented behaviour
but the data is not encapsulated
o Database information is still in tables but some columns may contain
richer data types called abstract data types
o Can be thought of as an RDBMS engine with a framework for handling
objects, and these can be managed and stored together as an
integrated whole
o Examples of RDBMS include:
Informix Dynamic Server
Microsoft SQL Server
o Advantages
Fast execution
Uniform repository of geographic data
More accurate data entry and editing
High data integrity
Ability to work with more intuitive data objects
Simultaneous data editing
Less need for programming applications to model complex
o Disadvantages
Compromise between object-oriented and relational database
No data encapsulation
Limited support for object relationships
Difficult to model complex relationships
o Query language – a language to handle geographic types (e.g. points,
lines and polygons) and functions (e.g select that touch with each
o Indexing services – the standard uni-dimensional DBMS is extended to
support multi-dimensional (x,y,z coordinate) geographic data
Integration of CD-R and CD-W technologies to allow data access to and from
the banks of media using multiple drives within the same physical unit
ODBC (Open Database Connectivity) – interface that allows applications to
access data from DBMS
o It defines low-level set of calls which allow client applications and server
applications to exchange instructions and share data without need to
know anything about each other.
o ODBC defines the following:
A library of ODBC calls that allow applications to connect to
DBMS, execute SQL statements, and retrieve results
Standard to connect and log on to a DBMS
Standard representation of data types
Data Integrity – Information from GIS is meant to support and reduce
uncertainty in decision making.
o No organization can be sufficient in geo-spatial data, thus sharing is
o Hence, information about the data being shared should be provided in
addition to location and format so that the user knows the usability of
the data
Data Quality – this is the ability to fulfill a given and implied requirement, i.e. fitness
for use. Within the framework of geo-spatial data, causes of error result from:
o Attribute errors, i.e. classification and labelling
o Positional errors in terms of location and height
o Lineage, i.e. history of data
o Temporal accuracy
o Completeness
o Logical consistency
Error management – different types of errors used to be dealt with by different
o Positional and height errors – surveyors and photogrammetrists
o Geometric and semantic errors – cartographers
o Equipment introduces error during observation
Human errors during observation and booking
Instrumental errors due to imperfect adjustment of the equipment
Error as a result of natural variations in phenomena under
o Error propagation treatment in geo-spatial data
Testing accuracy of each state by measurement against the real
Modelling error propagation either analytically or by simulation
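The two approaches to modelling error propagation can be compared in a short sketch. It assumes the simplest case, in which independent, zero-mean, normally distributed errors add together, so the analytical result is the root of the sum of squared standard deviations. The error sizes (0.3 and 0.4 units), the trial count and the seed are hypothetical.

```python
import math
import random
import statistics


def combined_sigma(sigmas):
    """Analytical propagation: independent errors add in quadrature."""
    return math.sqrt(sum(s * s for s in sigmas))


def simulated_sigma(sigmas, trials=20000, seed=42):
    """Simulation: draw random errors many times and measure their spread."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    totals = [sum(rng.gauss(0.0, s) for s in sigmas) for _ in range(trials)]
    return statistics.pstdev(totals)


# E.g. a digitizing error of 0.3 units combined with a survey error of 0.4 units.
sources = [0.3, 0.4]
print(combined_sigma(sources))
print(simulated_sigma(sources))
```

The simulated value should land close to the analytical one. Simulation becomes the more practical option when the operations applied to the data are too complex for a closed formula.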