Lesson 4. Spatial Data Input and Editing

GIS 205 – GIS and Remote Sensing

Spatial Data Input and Editing
Collecting data and creating a GIS database is a time-consuming but important
task. There are many sources of geographic data and many ways to enter that data
into a GIS. A data pool can be generated by either data capture or data transfer.
The data sources are divided into the following two main classes: (1) primary data
and (2) secondary data.

Learning Outcomes
Upon completion of this lesson, the students will be able to:
1. Differentiate primary from secondary data.
2. Discuss the different data errors.

Please refer to the attached activity.
Activity No. 6 Working with Raster Data
Activity No. 7 Determination of Flood Prone Areas

1. What are the common problems faced when obtaining data from secondary sources?
2. What are the three main types of data error?
3. Will a repeated generalisation make the boundary of a polygon more precise?

A. Primary Data

It involves direct measurement of objects and phenomena. Given below is a
partial list of primary data:

• Remote sensing data capture: Remote sensing refers to the technique of
deriving information about objects without coming into physical contact
with them. The information is derived from measurements of the amount
of electromagnetic (EM) radiation reflected, emitted or scattered from the
objects under observation. The response is measured/captured by
sensors deployed in the air or in space. Remote sensing data is often described
in terms of spatial, spectral and temporal resolution.
o Spatial resolution: It refers to the size of the smallest object that can be
resolved and is usually expressed as the pixel size.
o Spectral resolution: It refers to the wavelength bands of the EM spectrum
in which the response of the objects is captured.
o Temporal resolution: It refers to the frequency with which data is
captured for the same area.

Aerial photographic data is as important as remote sensing data for a GIS
project. Though aerial photographs and remote sensing images are
technically similar, they have a few differences as well. The most notable
difference is that aerial photographs were traditionally captured using analog
optical cameras and then rasterized by scanning a film negative. Nowadays
digital cameras are used for aerial photography. Aerial
photographs are suitable for surveying and mapping projects.

Both satellite images and aerial photographs can provide stereo imagery
from overlapping pairs of images, i.e. they can be used to generate a
three-dimensional model of the earth’s surface. Other advantages include global
coverage and repetitive monitoring, which make these datasets useful for
large-area projects and short-duration events.

• Surveying: Ground surveying is based on the principle of determining the
3D location of a point with the help of angles and distances measured from
other known points. A survey starts from a benchmark position. The location
of every surveyed point is relative to other points. Traditional surveying
involves the use of transits, theodolites, chains and tapes for angle and
distance measurement. These days, electro-optical devices called total
stations measure both angles and distances to an accuracy of about 1 mm.
Surveying is a time- and resource-consuming activity but is the best way of
obtaining accurate geographical data.

Since it is neither practical nor worthwhile to observe the value
of a variable at every point throughout the study area, we adopt the strategy
of sampling. Using sampling, we measure a subset of the features in the area
that best captures the spatial variation of the attribute of interest over the
study area. The following five pattern options may be considered for sampling:

a. Simple random
This method ensures that all parts of the project area
have an equal chance of being sampled. The project area
is divided into a grid with numbered coordinates. A
random site is picked by selecting coordinate pairs
from a random number table and plotting those on the project
area map. Each random site is a sample point.

Figure 25. Simple random

Advantage:
• Operator bias is minimal.

Disadvantages:
• Classes with small areas may be inadequately sampled or missed entirely.
• Some of the sample points may be inaccessible on the ground.

b. Stratified random
It maintains randomness and at the same time
overcomes the chance of an uneven distribution of
points among the map classes. A specific number of
sample points is assigned to each class according
to its size and significance for the project. Within a
class, the random sites are generated in the same way
as in the simple random pattern.

Figure 26. Stratified random pattern

c. Systematic
It arranges sample points at equidistant intervals
thus forming a grid. Orientation of the grid is chosen

Figure 27. Systematic pattern

Disadvantage:
• Randomness is not achieved because the position of every sample point is
determined by the choice of the starting point.

d. Systematic unaligned
It divides the project area into a grid and assigns
the positions of sample points randomly within the grid cells.

Figure 28. Systematic unaligned

Advantages:
• All parts of the project area are sampled.
• Randomness is maintained within the grid cells.

e. Clustered
In this method, nodal points are the centers for
clusters of sample points. The nodal locations are
selected randomly, stratified by classes, or by
identification of accessible sites.

Figure 29. Clustered pattern

Advantages:
• In terrain with poor access, the operator can make the most of accessible sites.
• Field time is greatly reduced as fewer sites are to be visited.
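
The sampling patterns described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the project area is taken to be a simple rectangle, the function names are hypothetical, and the stratified helper shares points among classes by area alone (the text also mentions significance, which is not modelled here).

```python
import random

def simple_random(n, width, height, rng):
    """n points, each drawn uniformly over the whole project area."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def systematic(spacing, width, height, start=(0.0, 0.0)):
    """Points at equal intervals; every position follows from the starting point."""
    points = []
    y = start[1]
    while y <= height:
        x = start[0]
        while x <= width:
            points.append((x, y))
            x += spacing
        y += spacing
    return points

def systematic_unaligned(cell, width, height, rng):
    """One randomly placed point inside every grid cell."""
    points = []
    for r in range(int(height // cell)):
        for c in range(int(width // cell)):
            points.append((c * cell + rng.uniform(0, cell),
                           r * cell + rng.uniform(0, cell)))
    return points

def stratified_allocation(class_areas, n):
    """Share n points among map classes in proportion to area (at least 1 each).
    Assumption: size only; rounding means the counts may not sum exactly to n."""
    total = sum(class_areas.values())
    return {name: max(1, round(n * area / total))
            for name, area in class_areas.items()}

rng = random.Random(42)  # fixed seed so the example is repeatable
random_pts = simple_random(20, 100.0, 100.0, rng)
grid_pts = systematic(25.0, 100.0, 100.0)
unaligned_pts = systematic_unaligned(25.0, 100.0, 100.0, rng)
allocation = stratified_allocation(
    {"forest": 60.0, "water": 10.0, "urban": 30.0}, 20)
```

Within each class, the stratified pattern would then call `simple_random` with the allocated number of points, restricted to that class's extent.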


• GPS (Global Positioning System): GPS is a constellation of around 27 NAVSTAR
satellites orbiting the earth at a height of about 12,500 miles (20,200 km). It was
originally funded by the US Department of Defense for military purposes. In the
year 2000 the deliberate degradation of the civilian signal (Selective
Availability) was switched off, which made accurate positioning available to
civilians as well. GPS works on the principle of distance and time. The GPS
satellites transmit radio signals that indicate their exact position in space.
The receiver measures the time taken by each signal to reach it and converts
that time into a distance. Distances from three or more satellites allow the
position of the receiver on the earth’s surface to be fixed (strictly this is
trilateration, though it is often described as triangulation). As soon as the
signal from a fourth satellite is received, elevation information is also derived.
GPS has led to the development of hundreds of applications affecting every aspect
of modern day-to-day life. Farming, mining, construction, logistics,
communication, power etc. are some of the sectors that have started depending
heavily on GPS.
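
The distance-and-time principle can be illustrated with a small Python sketch. It is a deliberately simplified, flat (2D) version with hypothetical function names and made-up reference points; a real GPS receiver works in three dimensions and also has to solve for its own clock error, which is not shown here.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def distance_from_travel_time(seconds):
    """Distance covered by a radio signal in the given travel time."""
    return SPEED_OF_LIGHT * seconds

def trilaterate_2d(p1, d1, p2, d2, p3, d3):
    """Position of a receiver from three known points and the distances to them.

    Subtracting the first circle equation from the other two gives two
    linear equations in x and y, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("reference points are collinear; no unique position")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y

# A signal that takes about 0.0674 s has travelled roughly 20,200 km.
satellite_range_m = distance_from_travel_time(0.0674)

# Hypothetical check: distances measured from three known points to (3, 4).
true_position = (3.0, 4.0)
stations = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
ranges = [math.dist(true_position, s) for s in stations]
estimated = trilaterate_2d(stations[0], ranges[0],
                           stations[1], ranges[1],
                           stations[2], ranges[2])
```

With error-free distances the estimate coincides with the true position; with noisy distances the three circles no longer meet in one point, which is one reason more satellites improve the fix.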

B. Secondary Data

Secondary data refers to the data obtained from maps, hardcopy documents etc.
Some of the methods to capture secondary data are as follows:
• Scanned data: A scanner is used to convert an analog source map or
document into a digital image by scanning successive lines across the map or
document and recording the amount of light reflected from the data source.
Documents such as building plans, CAD drawings, images and maps are
scanned prior to vectorization. Scanning helps in reducing wear and tear,
improves access and provides integrated storage.
There are three different types of scanner that are widely used:
o Flatbed scanner
o Rotating drum scanner
o Large format feed scanner

A flatbed scanner is a small PC peripheral and is comparatively inaccurate.
Rotating drum scanners are accurate but tend to be slow and expensive.
Large format feed scanners are the most suitable type for inputting GIS data as
they are comparatively cheap, quick and accurate.

• Digitization: Digitizing is the process of interpreting and converting paper
map or image data to vector digital data.

I. Heads down digitization

Digitizers are used to capture data from hardcopy maps. Heads down
digitization is done on a digitizing table using a hand-held cursor known as a
puck. The position of the cursor or puck is detected when it is passed over a
table inlaid with a fine mesh of wires. The function of a digitizer is to input
correctly the coordinates of the points and the lines. Digitization can be done
in two modes:
1) Point mode: In this mode, digitization is started by placing a point that
marks the beginning of the feature to be digitized and after that more
points are added to trace the particular feature (line or a polygon).
The number of points to be added to trace the feature and the space
interval between two consecutive points are decided by the operator.
2) Stream mode: In stream digitizing, the cursor is placed at the
beginning of the feature, a command is then sent to the computer to
place the points at either equal or unequal intervals as per the
position of the cursor moving over the image of the feature.
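
The difference between the two modes can be made concrete with a short Python sketch. In point mode the operator simply supplies the vertices; in stream mode the software decides. The sketch below assumes one common variant of stream mode, where a vertex is recorded each time the cursor has moved at least a set distance; the function name and the sample cursor track are hypothetical.

```python
import math

def stream_digitize(cursor_track, min_spacing):
    """Record a vertex each time the cursor has moved at least min_spacing."""
    if not cursor_track:
        return []
    vertices = [cursor_track[0]]  # the feature starts where the cursor starts
    for x, y in cursor_track[1:]:
        last_x, last_y = vertices[-1]
        if math.hypot(x - last_x, y - last_y) >= min_spacing:
            vertices.append((x, y))
    # Keep the final cursor position so the end of the feature is not lost.
    if vertices[-1] != cursor_track[-1]:
        vertices.append(cursor_track[-1])
    return vertices

# Hypothetical cursor positions sampled every 0.5 map units along a straight line.
track = [(i * 0.5, 0.0) for i in range(21)]
vertices = stream_digitize(track, 2.0)
```

A smaller `min_spacing` follows curves more faithfully but produces more points to store and edit.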

II. Heads-up digitization

This method uses a scanned copy of the map or image, and digitization is
done on the screen of the computer monitor. The scanned map is displayed
vertically and can be viewed without bending the head down; the method is
therefore called heads-up digitization. Semi-automatic and automatic methods of
digitizing require post-processing but save a lot of time and resources
compared to the manual method; they are described in the following section.

Vectorization is the process of converting a raster image into a vector image. It is
a faster way of creating the vector data from raster data. Automatic vectorization
is performed in either batch or interactive mode. Batch vectorization takes one
raster file and converts it into vector objects in a single operation.
Post-vectorization editing is required to remove the errors. In interactive
vectorization, software is used to automate digitizing. The operator snaps the
cursor to a pixel and indicates the direction in which the line is to be
digitized. The software then automatically digitizes the line. The operator can
set various parameters such as the density of points, whether to pause at
junctions for the operator’s intervention or to trace in a specific direction,
etc. Though the process involves some labor, it produces high quality data and
greater productivity than manual digitization.

Photogrammetry: It is the science of making measurements from aerial
photographs and images. Apart from 2D measurement from a single
photograph, photogrammetry is also used for making 3D measurements from
models made using stereo pairs of photographs. To make a 3D model, there is
typically about 60% overlap along each flight line (forward overlap) and about
30% overlap between adjacent flight lines (side overlap). The
measurements from overlapping pairs of photographs are captured using
stereoplotters. These build a model and allow 3D measurements to be captured,
edited, stored and plotted. One can extract vector objects from the 3D model in
a way similar to the digitization discussed above.
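
As a rough worked example of what the overlap figures imply for flight planning, the Python sketch below computes how far apart successive exposures and adjacent flight lines can be. The function name and the 2,000 m ground footprint are assumptions made for illustration only.

```python
def flight_plan_spacing(footprint_along_m, footprint_across_m,
                        forward_overlap_pct=60, side_overlap_pct=30):
    """Spacing between exposures and between flight lines for given overlaps.

    footprint_along_m / footprint_across_m: ground coverage of one photograph
    along and across the flight direction, in metres.
    """
    exposure_spacing = footprint_along_m * (100 - forward_overlap_pct) / 100
    line_spacing = footprint_across_m * (100 - side_overlap_pct) / 100
    return exposure_spacing, line_spacing

# Hypothetical photograph covering 2,000 m x 2,000 m on the ground.
exposure_spacing, line_spacing = flight_plan_spacing(2000.0, 2000.0)
```

With these assumed values a new photograph is needed every 800 m along the line, and flight lines are flown 1,400 m apart.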

Obtaining data from external sources: Creating the same dataset multiple times
for the same area is a time- and resource-intensive process. One can often import
data from data repositories instead. Some of these are freely available while
others are available at a price. The internet is a convenient way to search for
geographic data, as it gives information about geographic data catalogs and
vendors. National agencies of a state/country also disseminate geographic data
through their web portals or, on request, through other digital media.

C. Data Editing

Errors affect the quality of GIS data. Once the data is collected and prepared
for visualization and analysis, it must be checked for errors.
Burrough (1986) divided the sources of error into the following categories:
1. Common sources of error
2. Errors resulting from original measurements
3. Errors arising through processing

Common sources of error

• Old data sources: The data sources used for a GIS project may be too old
to use. Data collected in the past may no longer be acceptable for current use.

• Lack of data: The data for a given area may be incomplete or entirely
lacking. For example, the land-use map for border regions may not be available.

• Map scale: The details shown on a map depend on the scale used. Maps
or data at a scale appropriate to the level of detail required must be used
for the project. Use of the wrong scale would make the analysis erroneous.

• Observation: A high density of observations in an area increases the
reliability of the data. Insufficient observations may not provide the level of
resolution required for adequate spatial analysis as expected from the

Errors resulting from original measurements

• Positional accuracy: Representing the correct positions of geographic
features on a map depends upon the data being used. Biased field work,
improper digitization and scanning errors result in inaccuracies in the GIS data.

• Content accuracy: Maps must be labeled correctly. Incorrect labeling
can introduce errors which may go unnoticed by the user. Any omission
from the map or spatial database may result in inaccurate analysis.

Errors arising through processing

• Numerical errors: Different computers have different capabilities for
mathematical operations. Computer processing errors occur in rounding-off
operations and are subject to the inherent limits of number manipulation by
the processor.

• Topological errors: Data is subject to variation. Errors such as dangles,
slivers, overlaps etc. are commonly found in GIS data layers.
o Dangle: An arc is said to be a dangling arc if either it is not connected
to another arc properly (undershoot) or it is digitized past its
intersection with another arc (overshoot).
o Sliver: It refers to the narrow gap (or overlap) created between two
adjacent polygons when snapping is not used while creating those polygons.

These errors can be corrected using the constraints or rules that are
defined for the layers. Topology rules define the permissible spatial
relationships between features. To learn the rules, read Topology Rules.

• Digitizing and geocoding: Many errors arise at the time of digitization,
geocoding, overlaying or rasterizing. The errors associated with damaged
source maps and the errors made while digitizing can be corrected by comparing
original maps with digitized versions.
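
The numerical (rounding) errors mentioned in this list are easy to reproduce. The short Python sketch below is only an illustration: the coordinate value is hypothetical, and storing it as a 32-bit float stands in for any software or file format that keeps coordinates in single precision.

```python
import math
import struct

# Adding 0.1 ten times does not give exactly 1.0 in binary floating point.
total = sum([0.1] * 10)
exactly_one = (total == 1.0)
close_to_one = math.isclose(total, 1.0)  # comparison with a tolerance

def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", value))[0]

# Hypothetical projected coordinate in metres.
easting = 500000.123456
stored = to_float32(easting)
loss_m = abs(stored - easting)  # positional error introduced by storage alone
```

Here `exactly_one` is false while `close_to_one` is true, which is why coordinates are normally compared with a tolerance rather than for exact equality. For the single-precision case, the stored easting differs from the original by a small fraction of a metre before any processing has been done.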

Raster data editing is concerned with correcting the specific contents of raster
images rather than their general geometric characteristics. The objective of the
editing is to produce an image suitable for raster geoprocessing. The following
editing functions are most commonly used for raster data editing:

• Filling holes and gaps: To fill holes and gaps that appear in the raster image

• Edge smoothing: To remove or fill single-pixel irregularities in the foreground
pixels and background pixels along lines

• Deskewing: To rotate the image by a small angle so that it is aligned
orthogonally to the x and y axes of the computer screen

• Filtering: To remove speckles or the random high or low valued pixels in the image

• Clipping and delete: To create a subset of an image or to remove unwanted
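
Two of the functions listed above, filtering and clipping, can be sketched in plain Python on a raster held as a list of rows. The 3 x 3 median filter is one common way to remove speckles, chosen here as an assumption; the function names and the sample image are hypothetical.

```python
from statistics import median

def median_filter_3x3(grid):
    """Replace each interior pixel by the median of its 3 x 3 neighbourhood.
    Border pixels are left unchanged in this simple sketch."""
    rows, cols = len(grid), len(grid[0])
    result = [row[:] for row in grid]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [grid[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            result[r][c] = median(window)
    return result

def clip(grid, row_start, row_stop, col_start, col_stop):
    """Return the sub-image covering the given row and column ranges."""
    return [row[col_start:col_stop] for row in grid[row_start:row_stop]]

# Hypothetical 5 x 5 image with a single bright speckle in the centre.
image = [
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 255, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
]
cleaned = median_filter_3x3(image)
subset = clip(image, 1, 4, 1, 4)
```

The median is used rather than the mean because a single extreme pixel does not pull the result towards its own value.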


Vector data editing is a post-digitizing process that ensures that the data is
free from errors. It checks that:

• Lines intersect properly without any undershoots or overshoots

• Nodes are created at all points where lines intersect

• All polygons are closed and each of them contains a label point

• Topology of the layer is built
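
Two of these checks can be sketched in Python. The sketch assumes that arcs are stored as lists of (x, y) vertices and that connected arcs share exactly the same end coordinates; real GIS software compares coordinates within a snapping tolerance. The function names and sample arcs are hypothetical.

```python
from collections import Counter

def find_dangling_nodes(arcs):
    """Return arc end points used by only one arc end (candidate dangles)."""
    counts = Counter()
    for arc in arcs:
        counts[arc[0]] += 1    # start node
        counts[arc[-1]] += 1   # end node
    return sorted(node for node, uses in counts.items() if uses == 1)

def is_closed(ring):
    """A polygon boundary is closed when its first and last vertices coincide."""
    return len(ring) >= 4 and ring[0] == ring[-1]

# Three arcs forming a triangle, plus a spur that overshoots to (8, 0).
arcs = [
    [(0, 0), (5, 0)],
    [(5, 0), (5, 5)],
    [(5, 5), (0, 0)],
    [(5, 0), (8, 0)],
]
dangles = find_dangling_nodes(arcs)

closed_ring = [(0, 0), (5, 0), (5, 5), (0, 0)]
open_ring = [(0, 0), (5, 0), (5, 5), (0, 1)]
```

In this example only the free end of the spur is reported as a dangle, and only the first ring passes the closure check.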

You have finished with the common sources of spatial data and the common errors
in mapping with GIS. Always remember that in mapping, data coming from a
reliable source is a PRIORITY AND REQUIREMENT. In the next lesson, we will
understand the importance of setting the correct coordinate reference system to
get a correct measurement in our maps. If you are ready, then let’s go!

