INL120 Theme 4 - 2024


Theme 4

INL 120
ORGANIZING INFORMATION
DIGITAL OBJECTS

Presented by Ms Christina Firkins

2024/08/20
Semester Test 1
Date: 23 August 2024 (Friday)
Scope: Theme 1 till 3 and practical class notes 1
Types: MCQs, Short Questions & Long Questions
Time: 17:30-19:00 (90 minutes)
Total: 50 marks
Venue: Thuto 1-1: A – Mac
Thuto 1-2: Mad – Z
Learning outcomes
After completing this theme, you should be able to:
• Understand the techniques used in capturing
information objects such as Document Image Processing
(DIP).
• Understand the various types of scanners used in
creating digital objects.
• Understand the types of technologies used in DIP.

What is DIP? (Document Image Processing)
A system for storing and retrieving information in the form of
bitmapped images of paper document input with a scanner,
rather than in the form of text or numeric files.

• Includes the online storage, retrieval and management of
electronic images of documents.
• The main method of capturing images is by scanning paper
documents.
• It is also the ability to feed documents through a scanner
whereby an electronic image of the document is created
that can be handled as ordinary computer files.
Why is DIP needed?
Preservation
• Preserves objects in such a way that they remain accessible and
can be used.
• Digitising benefits valuable, unique or fragile materials:
physical handling threatens historic collections, and a digital
copy prevents the originals from being damaged when used.
• Nowadays we get digital archives – a place where only
information objects in digital form are kept.
Why is DIP needed?
Access
• Researchers can consult digitised information objects by means of
networks.
• More than one user can use the same information object.
• Note: users may use the object, but not change or overwrite it
(version control).
• Applications concerning use and manipulation:
o Search
o Sort
o Merge
o Statistical examination
• I can create my own database of paintings by my favourite artists
because the paintings have been digitised and are available on the
Internet.
Advantages of DIP
• Created as representations of other information objects
• Accurate and compact storage or archiving - Document
imaging systems replace large paper-intensive operations.
• Fast and accurate retrieval
• Integration with PCs and networks
• Unique characteristics that other formats of information
objects may not have:
o Dynamic
o Mutable (changeable)
o Non-linear
Advantages of DIP (cont.)
• DIP takes more memory than purely electronic data
processing, but it readily incorporates signatures, drawings,
and photographs.
• It can also be more familiar to users without computer
training.

Strategic advantages
• Improves quality of service.
• Reduces response time.
• Improves management of documents.
• Reduces cost.
Disadvantages of DIP
• Scanning still requires manual handling.
• Face-up contact-free scanning is preferable.
• Digital cameras and overhead scanners are better.
• Effect of lighting on fragile material.
• Image quality may be lower than that of the original.
Technology problems
• Costs of technology may be a problem (in smaller institutions).
• Need to customise equipment.
• Level of participation by conservators and user resistance.
• Unrealistic expectations.
• Resulting document may have limited search capabilities.
Capturing document images through scanning
Optical scanners provide the means to capture the image of
paper documents.

The following need to be taken into consideration:
• Optical resolution: expressed in “pixels per inch” (PPI).
• Bit depth/colour depth: higher bit depths capture more
information and therefore impact on overall size of the image.
• Optical density: measures the “brightest bright” and the
“darkest dark” that a piece of equipment can capture.
• Speed and connectivity: important factor to consider for
efficiency.
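
To make the link between resolution, bit depth and file size concrete, here is a minimal Python sketch. It assumes an uncompressed image with no file header or row padding, and the 8 x 10 inch print is a hypothetical example, not one taken from the slides.

```python
def scan_size(width_in, height_in, ppi, bit_depth):
    """Return (pixel_width, pixel_height, uncompressed_bytes) for a scan."""
    px_w = round(width_in * ppi)   # pixels across = inches x pixels per inch
    px_h = round(height_in * ppi)
    total_bits = px_w * px_h * bit_depth
    return px_w, px_h, total_bits // 8


# Hypothetical example: an 8 x 10 inch photo print scanned at 300 PPI.
for label, depth in [("bitonal", 1), ("8-bit greyscale", 8), ("24-bit colour", 24)]:
    w, h, size = scan_size(8, 10, 300, depth)
    print(f"{label}: {w} x {h} pixels, about {size / 1_000_000:.1f} MB uncompressed")
```

The same page grows from 0.9 MB to 21.6 MB as the bit depth rises from 1 to 24, which is why bit depth has to be weighed against storage.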
Types of scanners
1. Flatbed
• Flatbed scanners are the most common.
• Are used for scanning pieces of paper, objects, photo
prints, and other opaque items.
• Some scanning software comes with simple editing tools,
or you can edit your image by yourself.
2. iPhone Scanner
• Thanks to a handful of iOS apps, you can scan whatever
paper documents you might need easy access to and store
them on your iPhone, in the cloud, or on your computer
with just a few taps.
• There are applications that will help you quickly scan and
organize documents.
• The scanning process with all of them is nearly identical;
you take a photo of your document, the application tries to
identify the outline of the document, and it's saved within
the application.
Some examples include:
• TurboScan
• Scanner Pro
• Genius Scan
• CamScanner Free
• Prizmo
3. Digital Cameras
• Digital cameras combine a scanner with camera optics to
form a versatile tool that can produce superior quality
images.
• Though slower and more difficult to use than flatbed
scanners, digital cameras are adaptable to a wide array of
documents and objects.
• Most fragile materials can be safely captured, though the
need to provide external lighting means that light damage
may be a concern.
4. Slide Scanner
• Least understood class of digitization equipment.
• Uses software designed to interpret the film image and convert
the digital data into an image file that resembles the final
original image.
• It therefore converts a colour or black-and-white negative into
an image file.
• Can be slow.
5. Drum Scanner
• Drum scanners work differently from flatbed scanners.
Flatbed scanners rely on light from a source hitting a
microchip called a CCD (Charge-Coupled Device).
• The drum scanner, instead of a CCD, has a photomultiplier
tube, which is a type of vacuum tube that is highly sensitive
to light.
• A beam of light, which can be focused quite small, is then
moved across the image and the photomultiplier tube
picks up the reflection.
• Drum scanners can generally scan any type of film.
• Drum scanners are large, very expensive, and very difficult
to use.
6. Wide Format Scanner
• Some can handle paper sizes up to 100" long and 42" wide,
and thick documents up to ½" (13 mm).
• Can recognize all colors and all gradations to enable
outstandingly sharp grayscale images.
• Provides scan-to-net and copying operations.

Examples of use:
• Scanning Blueprints: capture the finest technical details.
• Scanning Artwork & Posters: capture quality highlight and
shadow detail.
• Map & GIS Scans: create high resolution & dimensionally
accurate images.
From scanning to digitization
• A variety of processing steps follow scanning. Such
procedures may occur at any point in the digitization chain,
from immediately after scanning to just prior to delivery to
end-users.
• These may be customized modifications that affect only
certain files, or mass, automated processing of all files
(batch processing).
• They may be one-time operations or done repeatedly on
an as-needed basis.
File/image processing operations
• Editing: Touch-up and enhancement tools can be built into the
scanning software, or separate image-editing tools (e.g. Adobe
Photoshop, Corel Photo-Paint, ImageMagick) can be
utilized.
• Compression: Sometimes carried out by dedicated scanner
firmware or dedicated hardware in the computer.
• Compression can also be a software-only operation though
dedicated hardware is faster and should be considered
when creating very large files or very large numbers of
files.
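
As an illustration of software-only compression, the sketch below compresses a synthetic page with Python's standard zlib module. The slides do not name a specific algorithm; zlib (DEFLATE, the lossless method also used inside PNG) is an assumption chosen for illustration, and the "page" is a hypothetical stand-in for a real scan.

```python
import zlib

# Hypothetical stand-in for a scanned page: 8-bit greyscale, mostly white
# paper (255) with a few dark text lines (0). Real scans are noisier and
# compress less well.
WIDTH, HEIGHT = 600, 800
rows = []
for y in range(HEIGHT):
    is_text_line = (y % 40) < 4
    rows.append(bytes(0 if is_text_line and x % 7 < 4 else 255 for x in range(WIDTH)))
raw = b"".join(rows)

compressed = zlib.compress(raw, 9)      # 9 = strongest (slowest) compression level
restored = zlib.decompress(compressed)  # lossless: we get the exact bytes back

print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")
print("lossless round trip:", restored == raw)
```

Because the compression is lossless, the restored bytes are identical to the original; only the storage size changes.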
File/image processing operations (cont.)
• Scaling: Some scans captured at high resolution will not be
suitable for on-screen display. Scaling (resolution reduction
through bit disposal) is often necessary in order to create
images for Web delivery.
• Metadata creation: Addition of text that helps describe or
organize an image for retrieval.
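
Scaling as described above can be sketched in a few lines of Python. This version assumes a greyscale image held as a list of rows and reduces resolution by averaging each block of pixels; real tools offer several resampling methods, and the tiny 4 x 4 "scan" is hypothetical.

```python
def downscale(pixels, factor):
    """Reduce resolution by averaging each factor x factor block of pixels.

    `pixels` is a list of rows of 0-255 greyscale values; leftover rows or
    columns that do not fill a whole block are dropped.
    """
    out_h = len(pixels) // factor
    out_w = len(pixels[0]) // factor
    result = []
    for by in range(out_h):
        row = []
        for bx in range(out_w):
            block = [
                pixels[by * factor + dy][bx * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) // len(block))
        result.append(row)
    return result


# Hypothetical 4 x 4 greyscale scan reduced to 2 x 2.
scan = [
    [0,   0,   255, 255],
    [0,   0,   255, 255],
    [100, 100, 50,  50],
    [100, 100, 50,  50],
]
print(downscale(scan, 2))  # [[0, 255], [100, 50]]
```

Halving the resolution in each direction leaves a quarter of the pixels, which is usually the point when preparing images for Web delivery.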
File/image processing operations (cont.)
• File format conversion: The original scan may not be in a
format suitable for all intended uses, thus requiring
conversion.
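
A small example of file format conversion, assuming the simple Netpbm greyscale formats (chosen here only because they are easy to read; the slides do not mention them). The sketch converts a plain-text PGM ("P2") into the more compact binary PGM ("P5") and ignores comment lines, which real PGM files may contain.

```python
def pgm_ascii_to_binary(text):
    """Convert a plain-text PGM (P2) image into binary PGM (P5) bytes."""
    tokens = text.split()
    if tokens[0] != "P2":
        raise ValueError("not a plain (ASCII) PGM file")
    width, height, maxval = (int(t) for t in tokens[1:4])
    values = [int(t) for t in tokens[4:]]
    if len(values) != width * height:
        raise ValueError("pixel count does not match the header")
    if maxval > 255:
        raise ValueError("this sketch only handles 8-bit images")
    header = f"P5\n{width} {height}\n{maxval}\n".encode("ascii")
    return header + bytes(values)  # one byte per pixel instead of text digits


# Hypothetical 2 x 2 image.
ascii_pgm = "P2\n2 2\n255\n0 255\n128 64\n"
binary_pgm = pgm_ascii_to_binary(ascii_pgm)
print(len(ascii_pgm), "characters as text ->", len(binary_pgm), "bytes as binary")
```

The pixel values are unchanged; only the way they are stored differs, which is what distinguishes format conversion from editing or scaling.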

Document conversion technologies include:


• OCR
• IWR
• ICR
What is OCR?
• Optical Character Recognition, or OCR, is a technology
that enables you to convert different types of documents,
such as scanned paper documents, PDF files or images
captured by a digital camera into editable and searchable
data.
• When a text document is scanned into the computer, it is
turned into a bitmap, which is a picture of the text.
• OCR software analyzes the light and dark areas of the
bitmap in order to identify each alphabetic letter and
numeric digit.
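
The idea of analysing light and dark areas can be illustrated with a toy template matcher. This is a deliberately simplified sketch, not how a production OCR engine works: the 3 x 5 templates and the noisy glyph are hypothetical, and the "recognition" is just counting pixels that agree.

```python
# Hypothetical 3 x 5 templates: "#" is a dark pixel, "." is a light pixel.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
    "0": ["###", "#.#", "#.#", "#.#", "###"],
}


def recognize(glyph):
    """Return the template character whose pixels agree best with `glyph`."""
    def score(template):
        # Count positions where the glyph and the template have the same pixel.
        return sum(
            g == t
            for glyph_row, template_row in zip(glyph, template)
            for g, t in zip(glyph_row, template_row)
        )
    return max(TEMPLATES, key=lambda ch: score(TEMPLATES[ch]))


# A scanned "T" with one pixel of damage (top-right corner missing).
noisy_t = ["##.", ".#.", ".#.", ".#.", ".#."]
print(recognize(noisy_t))  # T
```

Note how close "I" and "T" are in this scheme: a little more damage to the bitmap would be enough to produce a wrong answer.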
OCR…
• OCR is an instance of off-line character recognition, where
the system recognizes the fixed static shape of the
character.
• The program tries to recognize as many characters as
possible but sometimes mistakes are made.
• OCR systems work well with most 20th century and 19th
century typefaces.
• With earlier material or poor reproductions of any
typeface, obstacles are encountered.
OCR…
• Anything that disrupts the integrity of the letter’s shape,
such as breaks in letters or ornate italics, can be a
potential cause of error:
• The numeral 1 is sometimes read as a lower case l or
as !
• d is read as cl
• m is read as in
• e is read as c
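
One common way to catch such errors is to check OCR output against a word list and try undoing known confusions. The sketch below is a minimal illustration under stated assumptions: the confusion pairs come from the list above, while the lexicon, the sample words and the function names are hypothetical.

```python
# Confusion pairs from the list above: (what OCR printed, what was meant).
CONFUSIONS = [("cl", "d"), ("in", "m"), ("c", "e"), ("l", "1"), ("!", "1")]


def candidates(word):
    """Yield every variant of `word` with one confusion undone."""
    for seen, meant in CONFUSIONS:
        start = word.find(seen)
        while start != -1:
            yield word[:start] + meant + word[start + len(seen):]
            start = word.find(seen, start + 1)


def correct(word, lexicon):
    """Return `word` if known, else the first candidate found in `lexicon`."""
    if word in lexicon:
        return word
    for variant in candidates(word):
        if variant in lexicon:
            return variant
    return word  # no confident fix; leave it for a human to review


# Hypothetical lexicon and OCR output.
lexicon = {"document", "scanned", "time", "the"}
print([correct(w, lexicon) for w in ["clocument", "scannecl", "tiine", "thc"]])
# ['document', 'scanned', 'time', 'the']
```

Words that cannot be repaired are returned unchanged, so they can still be flagged for manual proofreading.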
OCR…
• OCR (which is static) is sometimes confused with on-line
character recognition which recognizes the dynamic
motion during handwriting.
• On-line character recognition is also referred to by other
terms such as:
• Dynamic character recognition, real-time character
recognition, and/or
• Intelligent Character Recognition (ICR)
Intelligent Character Recognition (ICR)
• ICR is an advanced form of optical character recognition
(OCR). More specifically, it is a handwriting recognition
system that allows fonts and different styles of handwriting
to be learned by a computer during processing, improving
accuracy and recognition levels.
• It allows data capture software to automatically read
information from all types of documents - even
handwritten ones.
Intelligent Character Recognition (cont.)
• Online handwriting recognition involves the automatic
conversion of text as it is written on a special digitizer or
PDA, where a sensor picks up the pen-tip movements as
well as pen-up/pen-down switching.
• That kind of data is known as digital ink and can be
regarded as a dynamic representation of handwriting.
• The obtained signal is converted into letter codes which are
usable within computer and text-processing applications.
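
Digital ink can be pictured as a list of strokes, each a series of timed pen positions. The sketch below shows one possible representation; the event format (kind, x, y, time in milliseconds) and the class name are assumptions made for illustration, not a real digitizer API.

```python
from dataclasses import dataclass, field


@dataclass
class Stroke:
    """One continuous pen movement between pen-down and pen-up."""
    points: list = field(default_factory=list)  # (x, y, t_ms) samples


def events_to_strokes(events):
    """Group raw digitizer events into strokes.

    Each event is a hypothetical tuple (kind, x, y, t_ms), where kind is
    "down", "move" or "up".
    """
    strokes, current = [], None
    for kind, x, y, t_ms in events:
        if kind == "down":
            current = Stroke([(x, y, t_ms)])   # pen touches the surface
        elif current is not None:
            current.points.append((x, y, t_ms))
            if kind == "up":                   # pen lifted: stroke is complete
                strokes.append(current)
                current = None
    return strokes


# A letter "t" written as two strokes: a vertical bar, then a crossbar.
events = [
    ("down", 10, 0, 0), ("move", 10, 15, 40), ("up", 10, 30, 80),
    ("down", 2, 10, 300), ("move", 10, 10, 330), ("up", 18, 10, 360),
]
ink = events_to_strokes(events)
print(len(ink), "strokes;", [len(s.points) for s in ink], "points each")
```

The stroke order and timing kept here are exactly the dynamic information that a scanned image of the same handwriting no longer contains.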
Intelligent Character Recognition (cont.)
• The elements of an on-line handwriting recognition
interface typically include:
• A pen or stylus for the user to write with.
• A touch sensitive surface, which may be integrated with, or
adjacent to, an output display.
• A software application which interprets the movements of
the stylus across the writing surface, translating the
resulting curves into digital text.
• These methods cannot be used in software that scans
paper documents, so accurate recognition of hand-printed
documents is still largely a problem.
Intelligent Word Recognition (IWR)
• IWR recognizes entire handwritten words or phrases,
rather than working character by character as its
predecessor, OCR, does.
• IWR can recognize and extract not only hand-printed
information, but cursive handwriting as well.
• ICR recognizes on the character level, whereas IWR
works with full words or phrases.
Intelligent Word Recognition (cont.)
• IWR is not meant to replace conventional ICR and OCR
systems but is optimized for processing real world
documents that contain mostly free-form, hard to
recognize data fields that are inherently unsuitable for
ICR.
• This means that the highest and best use of IWR is to
eliminate a high percentage of the manual entry of
handwritten data and run-on hand print fields on
documents that otherwise could be keyed only by
humans.
https://www.thecrowleycompany.com/ocr-icr-iwr-omg-get-scanned-text/

Additional reading
OCR? ICR? IWR? OMG!
Get the Most from Your
Scanned Text
Link to article:
https://www.thecrowleycompany.com/ocr-icr-iwr-omg-get-scanned-text/
