Unit 01 - PART 2
Note that eof retains its use as a marker for the end of the entire input. Any eof that
appears other than at the end of a buffer means that the input is at an end.
Code to advance forward pointer:
Procedure LookAheadwithSentinel
begin
    forward := forward + 1;
    if forward↑ = eof then
    begin
        if forward at end of first half then
        begin
            reload second half;
            forward := forward + 1
        end
        else if forward at end of second half then
        begin
            reload first half;
            move forward to beginning of first half
        end
        else
            terminate lexical analysis
    end if
end Procedure LookAheadwithSentinel
Advantages
Most of the time, the procedure performs only one test per advance: whether the forward pointer points to an eof.
It performs additional tests only when the forward pointer reaches the end of a buffer half or the end of the input.
Since this happens only once per buffer half, the average number of tests per input character is very close to 1.
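The procedure and the claim about the number of tests can be tried out with a minimal Python sketch. It is an illustration under stated assumptions, not a definitive implementation: the names SentinelBuffer, advance, scan, and tests are hypothetical, the NUL character stands in for eof and is assumed never to occur in the source text, and the half size n is arbitrary. Unlike the pseudocode, the sketch also looks at the character found right after a reload, because the input may end exactly on a half boundary.

```python
import io

EOF = "\0"  # stand-in for the eof sentinel; assumed absent from the source text


class SentinelBuffer:
    """Two buffer halves of n characters, each followed by an eof sentinel."""

    def __init__(self, stream, n=1024):
        self.stream = stream
        self.n = n
        # Layout: [half 1: n chars][eof][half 2: n chars][eof]
        self.buf = [EOF] * (2 * n + 2)
        self.tests = 0      # comparisons performed while advancing
        self._reload(0)     # fill the first half before scanning starts
        self.forward = -1   # the first advance() lands on index 0

    def _reload(self, start):
        chunk = self.stream.read(self.n)
        for i in range(self.n):
            # A short read leaves eof right after the last real character.
            self.buf[start + i] = chunk[i] if i < len(chunk) else EOF

    def advance(self):
        """Advance forward; return the character there, or None at end of input."""
        self.forward += 1
        self.tests += 1
        if self.buf[self.forward] != EOF:
            return self.buf[self.forward]       # common case: a single test
        self.tests += 1
        if self.forward == self.n:              # end of first half
            self._reload(self.n + 1)
            self.forward += 1
        else:
            self.tests += 1
            if self.forward == 2 * self.n + 1:  # end of second half
                self._reload(0)
                self.forward = 0
            else:
                return None                     # eof inside a half: input is done
        # The input may end exactly on a half boundary, so check the new character.
        self.tests += 1
        ch = self.buf[self.forward]
        return None if ch == EOF else ch


def scan(text, n=1024):
    """Read text through the buffer; return (characters seen, tests performed)."""
    buf = SentinelBuffer(io.StringIO(text), n)
    chars = []
    ch = buf.advance()
    while ch is not None:
        chars.append(ch)
        ch = buf.advance()
    return "".join(chars), buf.tests
```

Calling scan on a long input and dividing the returned test count by the input length gives a ratio only slightly above 1, since the extra comparisons occur once per buffer half rather than once per character.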
Topic
Specification of Tokens
Introduction
Regular expressions are used to specify the patterns of tokens. Each pattern matches a set of
strings, and when an input string matches the pattern described by some regular expression,
the corresponding token is recognized.
The specification of tokens rests on three notions: 1) strings, 2) languages, 3) regular expressions.
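As a small illustration of a pattern matching a set of strings, the sketch below uses Python's re module. The token names NUMBER and ID, their patterns, and the helper classify are assumptions made for this example only; they are not taken from any particular language definition.

```python
import re

# Hypothetical token patterns, chosen only for illustration.
TOKEN_PATTERNS = [
    ("NUMBER", r"[0-9]+"),                   # one or more digits
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),       # letter or underscore, then letters, digits, underscores
]


def classify(string):
    """Return the name of the first pattern matching the whole string, else None."""
    for name, pattern in TOKEN_PATTERNS:
        if re.fullmatch(pattern, string):
            return name
    return None
```

Under these assumed patterns, classify("count1") yields "ID", classify("42") yields "NUMBER", and classify("4x") yields None because neither pattern matches the whole string.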
Strings And Languages
An alphabet or character class is a finite set of symbols. Typical examples of symbols are
letters, digits, and punctuation characters.
A string over an alphabet is a finite sequence of symbols drawn from that alphabet.
A language is any countable set of strings over some fixed alphabet.
The length of a string S, usually written as |S|, is the number of occurrences of symbols
in S. For example, banana is a string of length six.
The empty string, denoted ε, is the string of length zero.
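The definitions above map directly onto ordinary Python values. The sketch below is only an illustration: the binary alphabet, the sample language, and the helper is_string_over are assumptions chosen for the example.

```python
ALPHABET = {"0", "1"}   # a finite set of symbols (the binary alphabet)
EMPTY = ""              # the empty string ε


def is_string_over(s, alphabet):
    """True if every symbol of s is drawn from the given alphabet."""
    return all(symbol in alphabet for symbol in s)


# A language is a set of strings over a fixed alphabet; this one is finite.
LANGUAGE = {EMPTY, "0", "01", "0110"}

# |S| corresponds to len(S): it counts occurrences of symbols, repeats included.
lengths = {s: len(s) for s in LANGUAGE}
```

Note that the empty string counts as a string over every alphabet, since it contains no symbol that could fall outside it.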