6. Text Processing and Pattern Searching
CHAPTER 6
Introduction
•Apart from numeric computation, computers are also well equipped to process textual information.
•The computational strategies and algorithms used for text differ significantly from those used for numerical data.
•Text processing broadly covers:
◦ Manipulation and movement of characters
◦ Searching for patterns or words.
6.1 Text Line Length Adjustment
•Word-oriented approach: process the text one word at a time, so that no word is split across two output lines.
•How should punctuation marks be handled, and how can the paragraph indentation of the original text be preserved?
•Words are generally separated by one or more spaces and/or newline characters.
•Punctuation that directly follows the end of a word need not be distinguished from the word itself.
•Read the input character-by-character.
•If the current character is neither a space nor an end-of-line, it can simply be added to the current word array.
•As each word is built, its characters must be counted.
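A minimal Python sketch of this character-by-character scan follows. The function name `read_words` is hypothetical; it treats only the space and the end-of-line character `'\n'` as separators, so punctuation stays attached to its word, and it keeps an explicit `wordcnt` while each word is built.

```python
def read_words(text):
    """Yield (word, wordcnt) pairs by scanning `text` one character at a time.

    Hypothetical helper: a space or '\n' ends the current word; every other
    character, punctuation included, is appended to the current word array.
    """
    word, wordcnt = [], 0
    for ch in text:
        if ch != " " and ch != "\n":
            word.append(ch)        # neither space nor end-of-line: part of the word
            wordcnt += 1           # character count for the word being built
        elif wordcnt > 0:          # a separator ends a non-empty word
            yield "".join(word), wordcnt
            word, wordcnt = [], 0
    if wordcnt > 0:                # flush a final word with no trailing separator
        yield "".join(word), wordcnt


print(list(read_words("Hello,  world.\nBye!")))
# [('Hello,', 6), ('world.', 6), ('Bye!', 4)]
```

Runs of several spaces produce no empty words, because a separator only emits a word when `wordcnt` is non-zero.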
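•Use two counters, wordcnt and linecnt, to track the length of the current word and the length of the current output line.
•If adding the current word does not exceed the line-length limit:
◦ The current word can be written out directly, and the next word can then be accepted as input.
•Else:
◦ Start a new output line, write the current word at its beginning, and reset linecnt to wordcnt.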
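The fill step can be sketched as below, under stated assumptions: `fill_lines` is a hypothetical name, words arrive already separated, one space is counted between words on the same line, and when a word does not fit, the current line is closed and the word starts a new one. A word longer than the limit is placed on a line by itself rather than split.

```python
def fill_lines(words, limit):
    """Greedily pack words into lines of at most `limit` characters.

    linecnt is the length of the line being built; wordcnt is the length of
    the current word. Assumption: words are never split, so an over-long
    word gets a line of its own.
    """
    lines, current, linecnt = [], [], 0
    for word in words:
        wordcnt = len(word)
        # one separating space is needed unless the line is still empty
        needed = wordcnt if linecnt == 0 else linecnt + 1 + wordcnt
        if needed <= limit or linecnt == 0:
            current.append(word)              # limit not exceeded: write the word
            linecnt = needed
        else:
            lines.append(" ".join(current))   # limit exceeded: close this line
            current, linecnt = [word], wordcnt
    if current:
        lines.append(" ".join(current))       # flush the last partial line
    return lines


print(fill_lines("the quick brown fox jumps over the lazy dog".split(), 15))
# ['the quick brown', 'fox jumps over', 'the lazy dog']
```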
•How can multiple blanks embedded in the text be distinguished from the start of a new paragraph?
◦ Detect a new paragraph with a condition tracked by a flag: an end-of-line followed by a space.
•Assume that words in an input line are separated by exactly one space and that there are no trailing spaces at the end of the line.
•Assume that any punctuation in the text either directly
precedes or directly follows words with no intervening
spaces.
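Under these assumptions, the flag-based paragraph test can be sketched as follows. `split_paragraphs` and the flag name `eol_seen` are hypothetical; the flag records whether the previous character was an end-of-line, and a space read while it is set starts a new paragraph.

```python
def split_paragraphs(text):
    """Group the words of `text` into paragraphs.

    Sketch assumptions: words are separated by one space, lines have no
    trailing spaces, and each paragraph begins with an indented line, so an
    end-of-line followed by a space marks a new paragraph.
    """
    paragraphs = [[]]            # each paragraph is a list of words
    word = []                    # characters of the word being built
    eol_seen = False             # flag: previous character was an end-of-line
    for ch in text:
        if ch == " " and eol_seen and paragraphs[-1]:
            paragraphs.append([])            # EOL then space: new paragraph
        if ch == " " or ch == "\n":
            if word:                         # a separator ends the current word
                paragraphs[-1].append("".join(word))
                word = []
        else:
            word.append(ch)
        eol_seen = ch == "\n"
    if word:                                 # flush the final word
        paragraphs[-1].append("".join(word))
    return [p for p in paragraphs if p]


text = "  This is the first\nparagraph.\n  Second one\nends here."
print(split_paragraphs(text))
# [['This', 'is', 'the', 'first', 'paragraph.'], ['Second', 'one', 'ends', 'here.']]
```

Only the first space after an end-of-line triggers a new paragraph; the flag is cleared immediately, so the remaining indentation spaces are ignored.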
6.2 Left and Right Justification of Text
•Four situations are possible:
•1) The line is already the correct length (no processing is required).
•2) The number of spaces needed to expand the line to the required length equals the number of spaces already present in the line.
•3) The number of spaces to be added is greater than the number of spaces already in the line.
•4) The number of spaces to be added is less than the number of spaces already in the line.
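The four cases can be told apart by comparing two counts, as in this small sketch. `classify_line` is a hypothetical helper; it assumes single spaces between words, no leading or trailing spaces, and a line no longer than the target width.

```python
def classify_line(line, width):
    """Return which of the four justification cases (1-4) applies to `line`."""
    to_add = width - len(line)      # spaces needed to reach the target width
    present = line.count(" ")       # spaces already in the line (one per gap)
    if to_add < 0:
        raise ValueError("line is longer than the target width")
    if to_add == 0:
        return 1                    # already the correct length
    if to_add == present:
        return 2                    # exactly one extra space per existing space
    if to_add > present:
        return 3                    # more spaces to add than already exist
    return 4                        # fewer spaces to add than already exist


for w in (5, 7, 9, 6):
    print(w, classify_line("a b c", w))
```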
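•Example: evenly distribute 10 extra spaces across a line that originally has 7 spaces. Since 10 = 7 × 1 + 3, every gap receives one extra space and three of the gaps receive a second one.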
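One way to carry out the even distribution is sketched below; the helper names `extra_per_gap` and `justify` are hypothetical. Integer division gives every gap the same base number of extra spaces, and the remainder is handed out one per gap. The slide does not say which gaps receive the remainder; this sketch simply uses the leftmost ones.

```python
def extra_per_gap(extra, gaps):
    """Split `extra` spaces as evenly as possible over `gaps` word gaps.

    Every gap gets extra // gaps spaces; the remainder goes one apiece to
    the leftmost gaps (an arbitrary choice made for this sketch).
    """
    each, leftover = divmod(extra, gaps)
    return [each + 1 if i < leftover else each for i in range(gaps)]


def justify(line, width):
    """Pad a single-spaced line to exactly `width` characters."""
    words = line.split(" ")
    gaps = len(words) - 1
    extra = width - len(line)
    if extra <= 0 or gaps == 0:
        return line                  # already long enough, or no gap to widen
    pads = extra_per_gap(extra, gaps)
    out = []
    for word, pad in zip(words, pads):       # zip stops before the last word
        out.append(word + " " * (1 + pad))   # original space plus the extra
    out.append(words[-1])
    return "".join(out)


print(extra_per_gap(10, 7))            # [2, 2, 2, 1, 1, 1, 1]
print(justify("a b c d e f g h", 25))  # a   b   c   d  e  f  g  h
```

With 10 extra spaces and 7 gaps, `divmod(10, 7)` is `(1, 3)`, so three gaps grow by two spaces and four grow by one. The same arithmetic covers cases 2 to 4 without separate branches.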