2023 Survey of Programming Languages CSC 301
LECTURE NOTE
COMPUTER LANGUAGES
Introduction
A computer can only do what a programmer asks it to do. To perform a particular task, the
programmer writes a sequence of instructions, called a program. An instruction is a command given to
the computer to perform a certain specified operation on the given data.
Since we know only human languages and the computer knows only machine language,
we need some medium through which we can communicate with the computer so that we
can complete our desired task. That medium is language.
Languages
Languages are tools humans can use to communicate with the hardware of a computer
system. Each language has a systematic method of using its symbols. In
English, this method is given by the rules of grammar.
Similarly, the symbols of a particular computer language must also be used according to a set
of rules, which are known as the “syntax” of that language.
Computer languages can be classified into three broad categories: machine language,
assembly language and high level language.
MACHINE LANGUAGE
Machine language is the only language a computer can execute directly; its instructions are
written as strings of binary digits. Each machine language instruction has two parts:
The 1st part is the operation code, which tells the computer what function is to be
performed.
The 2nd part is the operand, which tells the computer where to find and store the data
to be manipulated.
So each instruction tells the computer what operation to perform and the length and
location of the data fields which are involved in the operation.
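As a rough illustration, the sketch below (in Python) splits a hypothetical 16-bit instruction word into a 4-bit operation code and a 12-bit operand address; the field widths and the opcode value are assumptions chosen for illustration, not taken from any particular machine.

# A hypothetical 16-bit machine instruction: the high 4 bits hold the
# operation code, the low 12 bits hold the operand address.
instruction = 0b0001_000000101010   # opcode 1 (say, LOAD), operand address 42

opcode = (instruction >> 12) & 0xF  # what operation to perform
operand = instruction & 0xFFF       # where to find or store the data

print(opcode, operand)              # prints: 1 42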
Advantages of Machine Language
Programs can be executed immediately upon completion because they do not
require any translation.
The programmer has complete control over the performance of the hardware.
ASSEMBLY LANGUAGE
Assembly language is a language which allows instructions and storage locations to be
represented by letters and symbols instead of numbers.
A program written in an assembly language is called an assembly language program
or symbolic program. Assembly language was introduced in 1952.
Machine language was tedious to code and errors were expected to arise in bulk.
To solve these problems mnemonic codes and symbolic addresses were
developed.
It allows the use of alphanumeric mnemonic codes instead of numeric codes for the
instructions in the instruction set, for example using ADD instead of 1110 (binary) or
14 (decimal) to add.
Storage locations are represented in the form of alphanumeric addresses
instead of numeric addresses.
The format of an assembly language instruction is similar to that of a machine language
instruction: an operation part and an operand part.
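To make the correspondence concrete, the following toy sketch (in Python) shows how an assembler might map mnemonic operation codes and symbolic addresses to numeric machine code; the opcode numbers and symbol addresses here are invented for illustration.

# A toy illustration of assembly translation: mnemonic operation codes and
# symbolic addresses are replaced by numbers (all values are hypothetical).
OPCODES = {"LOAD": 1, "STORE": 2, "ADD": 14}        # ADD is 14, i.e. 1110 in binary
SYMBOLS = {"PRICE": 100, "TAX": 101, "TOTAL": 102}  # symbolic storage addresses

program = [("LOAD", "PRICE"), ("ADD", "TAX"), ("STORE", "TOTAL")]

for mnemonic, symbol in program:
    # one assembly instruction maps onto exactly one machine instruction
    word = (OPCODES[mnemonic] << 12) | SYMBOLS[symbol]
    print(f"{mnemonic:5} {symbol:6} -> {word:016b}")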
Compiler
Compiler is a special program (translator) which translates high level programs into
machine codes.
Advantages:
Machine independent.
Easier to learn, use and understand.
Easier to correct errors.
Easier to maintain.
Less time and effort required.
Easily relocatable.
Low program preparation cost.
Fewer errors.
Disadvantages:
Less flexible.
Lower efficiency.
Requires more time and storage space.
Assembler
A computer can directly execute only machine language programs, so an assembly
language program must be converted into its equivalent machine language program
before it can be executed. This translation is done with the help of a translator program
which is known as an assembler.
An assembler is a special program (translator) which translates symbolic operation codes
into machine codes, and symbolic addresses into actual machine addresses.
The input to the assembler is the assembly language program (source
program) and the output is the machine language program (object program). The assembler
translates each assembly language instruction into an equivalent machine language
instruction. There is a one-to-one correspondence between the assembly language
instructions of the source program and the machine language instructions of its equivalent
object program. In the case of an assembly language program, the computer not only has to
run the program but must also first run the assembler program to translate the original
assembly language program into machine language. So the computer has to
spend more time in getting the desired answer.
Compiler
A computer can directly execute only machine language programs, so a high level
language program must be converted into its equivalent machine language program
before it can be executed. This translation is done with the help of a translator program
which is known as a compiler.
A compiler is a translator program which translates a high level language program into an
equivalent machine language program. The input to the compiler is the high level language
program (source program) and the output is the machine language program (object program).
High level language instructions are macro instructions.
The compiler translates each high level language instruction into a set of machine
language instructions rather than a single machine language instruction.
There is a one-to-many correspondence between the high level language instructions
of the source program and the instructions of the equivalent object program.
During translation the source program is only translated, not executed.
A compiler can translate only those source programs which are written in the
language for which the compiler is designed.
A compiler can also detect and indicate syntax errors during the compilation
process, but it is not able to detect logical errors.
Interpreter
An interpreter is another type of translator which is used for translating programs written
in high level languages.
It takes one statement of a high level language, translates it into machine language and
immediately executes the resulting machine language instructions.
The main difference between a compiler and an interpreter is that a compiler
translates the entire program before execution but takes no part in its execution,
while an interpreter translates and executes the program one statement at a time.
The input to an interpreter is a source program and the output is the
result of executing that program.
An interpreter translates and executes a high level language program
statement-by-statement.
A program statement is reinterpreted every time it is encountered during
program execution.
The main advantage of an interpreter is that it makes it easier and faster to
correct programs.
The main disadvantage is that an interpreter is slower than a compiler when running
a finished program.
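A minimal sketch of this statement-by-statement behaviour, in Python, assuming a made-up three-word command language (the SET, ADD and PRINT statements are invented for illustration):

# Each line of source is translated and executed immediately, one at a time,
# in the spirit of pure interpretation (the tiny language here is hypothetical).
def interpret(source, memory):
    for line in source.splitlines():       # take one statement at a time
        parts = line.split()
        if parts[0] == "SET":              # SET x 5  -> store 5 in x
            memory[parts[1]] = int(parts[2])
        elif parts[0] == "ADD":            # ADD x 3  -> add 3 to x
            memory[parts[1]] += int(parts[2])
        elif parts[0] == "PRINT":          # PRINT x  -> show x
            print(parts[1], "=", memory[parts[1]])

interpret("SET x 5\nADD x 3\nPRINT x", {})   # prints: x = 8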
CONCEPT OF PROGRAMMING LANGUAGES
Most computer programming languages were inspired by or built upon concepts from previous
computer programming languages. Today, while older languages still serve as a strong
foundation for new ones, newer computer programming languages make programmers’ work
simpler.
In the beginning, Charles Babbage’s difference engine could only be made to execute tasks by
changing the gears which executed the calculations. Thus, the earliest form of a computer
language was physical motion. Eventually, physical motion was replaced by electrical signals
when the US Government built the Electronic Numerical Integrator and Computer (ENIAC), completed in
1946. It followed many of the same principles of Babbage’s engine and hence, could only be
“programmed” by presetting switches and rewiring the entire system for each new “program”
or calculation. This process proved to be very tedious.
A programming language is a series of symbols that serves as a bridge allowing humans to translate
their thoughts into instructions computers can understand. Humans and machines process information
differently, and programming languages are the key to bridging the gap between people and
computers.
The five generations of computers are characterized by the technology of their processing
mechanisms: vacuum tubes (first generation), transistors (second), integrated circuits (third),
microprocessors (fourth) and artificial-intelligence-oriented systems (fifth).
Advantages of high level languages:
1. Use of English-like words makes them human-understandable languages.
2. Fewer lines of code as compared to machine and assembly language.
3. The same code can be copied to another machine and executed on that machine by
using a compiler specific to that machine.
Disadvantages:
1. A compiler or interpreter is needed.
2. Different compilers are needed for different machines.
INFLUENCE OF LANGUAGE DESIGN
INTRODUCTION
The primary influences on language design are computer architecture and software design
methodologies. Languages have been developed around the prevalent computer architecture, known
as the Von Neumann architecture. Pedagogy is another influence: some languages have better
“pedagogy” than others. That is, they are intrinsically easier to teach and to learn, they have better
textbooks, they are implemented in better program development environments, and they are widely
known and used by the best programmers in an application area.
Computer Architecture
In 1945 the computer scientist John von Neumann described a design architecture for an electronic digital
computer with subdivisions of a central arithmetic part, a central control part, a memory to store both
data and instructions, external storage, and input and output mechanisms. John Von Neumann
introduced the idea of the stored program which is used to keep programmed instructions, as well as its
data, in read-write, random-access memory (RAM). Previously data and programs were stored in
separate memories. Von Neumann realized that data and programs are indistinguishable and can,
therefore, use the same memory. On a large scale, the ability to treat instructions as data is what makes
assemblers, compilers and other automated programming tools possible. One can "write programs
which write programs". This led to the introduction of compilers which accepted high level language
source code as input and produced binary code as output.
According to Von Neumann Architecture, the basic function performed by a computer is the execution
of a program. A program is a set of machine instructions. An instruction is a form of control code, which
supplies the information about an operation and the data on which the operation is to be performed.
The Von Neumann architecture uses a single processor which follows a linear sequence of fetch-decode-
execute. In order to do this, the processor has to use some special registers, which are discrete memory
locations with special purposes attached. These are:
The program counter keeps track of where to find the next instruction so that a copy of the
instruction can be placed in the current instruction register. Sometimes the program counter is
called the Sequence Control Register (SCR) as it controls the sequence in which instructions are
executed.
The current instruction register holds the instruction that is to be executed. The memory
address register is used to hold the memory address that contains either the next piece of data
or an instruction that is to be used.
The memory data register acts like a buffer and holds anything that is copied from the memory
ready for the processor to use it.
The central processor contains the arithmetic-logic unit (also known as the arithmetic unit) and
the control unit.
The arithmetic-logic unit (ALU) is where data is processed. This involves arithmetic and logical
operations. Arithmetic operations are those that add and subtract numbers, and so on. Logical
operations involve comparing binary patterns and making decisions.
The control unit fetches instructions from memory, decodes them and synchronises the
operations before sending signals to other parts of the computer.
The accumulator is in the arithmetic unit, the program counter and the instruction registers are
in the control unit and the memory data register and memory address register are in the
processor.
An index register is a microprocessor register used for modifying operand addresses during the
run of a program, typically for doing vector/array operations. Index registers are used for a
special kind of indirect addressing (covered in 3.5 (i)) where an immediate constant (i.e. which is
part of the instruction itself) is added to the contents of the index register to form the address
to the actual operand or data.
The fetch-decode-execute cycle proceeds as follows:
1. Load the address that is in the program counter (PC) into the memory address register (MAR).
2. Increment the PC by 1.
3. Load the instruction that is in the memory address given by the MAR into the memory data
register (MDR).
4. Load the instruction that is now in the MDR into the current instruction register (CIR).
5. Decode the instruction that is in the CIR.
6. If the instruction is a jump instruction, then
a. Load the address part of the instruction into the PC.
b. Reset by going to step 1.
7. Execute the instruction.
8. Reset by going to step 1.
NOTE
Steps 1 to 4 are the fetch part of the cycle. Steps 5, 6a and 7 are the execute part of the cycle and steps
6b and 8 are the reset part.
Step 1 simply places the address of the next instruction into the memory address register so that the
control unit can fetch the instruction from the right part of the memory. The program counter is then
incremented by 1 so that it contains the address of the next instruction, assuming that the instructions
are in consecutive locations. The memory data register is used whenever anything is to go from the
central processing unit to main memory, or vice versa. Thus the next instruction is copied from memory
into the MDR and is then copied into the current instruction register. Now that the instruction has been
fetched the control unit can decode it and decide what has to be done. This is the execute part of the
cycle. If it is an arithmetic instruction, this can be executed and the cycle restarted as the PC contains
the address of the next instruction in order. However, if the instruction involves jumping to an
instruction that is not the next one in order, the PC has to be loaded with the address of the instruction
that is to be executed next. This address is in the address part of the current instruction, hence the
address part is loaded into the PC before the cycle is reset and starts all over again.
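The toy simulation below (in Python) walks through this cycle on a made-up instruction set; the opcodes LOAD, ADD, STORE, JUMP and HALT and the memory layout are assumptions chosen purely to illustrate the registers at work.

# A toy von Neumann machine: instructions and data share one memory, and the
# processor repeats fetch-decode-execute using PC, MAR, MDR, CIR and ACC.
memory = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("STORE", 12), 3: ("HALT", 0),
          10: 7, 11: 5, 12: 0}          # addresses 10-12 hold data
pc, acc = 0, 0
while True:
    mar = pc                  # step 1: copy PC into MAR
    pc += 1                   # step 2: increment PC
    mdr = memory[mar]         # step 3: fetch the instruction into MDR
    cir = mdr                 # step 4: copy MDR into CIR
    op, addr = cir            # step 5: decode the instruction
    if op == "JUMP":          # step 6: load the address part into PC
        pc = addr
        continue              # and reset to step 1
    if op == "HALT":
        break
    if op == "LOAD":          # step 7: execute
        acc = memory[addr]
    elif op == "ADD":
        acc += memory[addr]
    elif op == "STORE":
        memory[addr] = acc
print(memory[12])             # prints 12, i.e. 7 + 5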
The major influences on language design have been machine architecture and software design
methodologies. The Von Neumann architecture is a theoretical design for a stored-program computer
that serves as the basis for almost all modern computers.
LANGUAGE PARADIGM
Introduction
When programs are developed to solve real-life problems like inventory management, payroll
processing, student admissions, examination result processing, etc. they tend to be huge and complex.
The approach to analyzing such complex problems, planning for software development and controlling
the development process is called programming methodology. New software development
methodologies (e.g. object Oriented Software Development) led to new paradigms in programming and
by extension, to new programming languages. A programming paradigm is a pattern of problem-solving
thought that underlies a particular genre of programs and languages. A programming paradigm is also
the set of concepts to which the methodology of a programming language adheres.
Language Paradigm
Paradigm is a model or world view. Paradigms are important because they define a programming
language and how it works. A great way to think about a paradigm is as a set of ideas that a
programming language can use to perform tasks in terms of machine-code at a much higher level. These
different approaches can be better in some cases, and worse in others. A great rule of thumb when
exploring paradigms is to understand what they are good at. While it is true that most modern
programming languages are general-purpose and can do just about anything, it might be more difficult
to develop a game, for example, in a functional language than an object-oriented language. Many
people classify languages into these main paradigms:
Object-oriented programming languages
Object-oriented programming languages treat data, together with the procedures that manipulate it, as
objects. Data structures and their appropriate manipulation processes are packed together to form a
syntactical unit. Here the solution revolves around entities or objects that are part of the problem. The
solution deals with how to store data related to the entities, how the entities behave and how they
interact with each other to give a cohesive solution.
Example − If we have to develop a payroll management system, we will have entities like employees,
salary structure, leave rules, etc. around which the solution must be built. E.g. SIMULA 67, SMALLTALK,
C++, Java, Python, C#, Perl, Lisp or EIFFEL.
Scripting Language
A scripting language or script language is a programming language for a runtime system that automates
the execution of tasks that would otherwise be performed individually by a human operator. Scripting
languages are usually interpreted at runtime rather than compiled. Scripting languages are a popular
family of programming languages that allow frequent tasks to be performed quickly. Early scripting
languages were generally used for niche applications – and as “glue languages” for combining existing
systems. With the rise of the World Wide Web, a range of scripting languages emerged for use on web
servers. Since scripting languages simplify the processing of text, they are ideally suited to the dynamic
generation of HTML pages. Users can learn to code in scripting languages quickly; not much
knowledge of web technology is required. A scripting language is highly efficient with the limited
number of data structures and variables it uses, and it helps in adding visualization interfaces and
combinations to web pages. There are different libraries which are part of different scripting
languages. They help in creating new applications in web browsers and are different from normal
programming languages. Examples are Node.js, JavaScript, Ruby, Python, Perl, bash, PHP etc.
Software developers may choose one or a combination of more than one of these methodologies to
develop software. Note that in each of the methodologies discussed, the problem has to be broken down
into smaller units. To do this, developers use either of the following two approaches:
Top-down approach: The problem is broken down into smaller units, which may be further broken
down into even smaller units. Each unit is called a module. Each module is a self-sufficient unit that has
everything necessary to perform its task. For example, you can follow this modular approach to
create different modules while developing a payroll processing program.
Bottom-up approach: In the bottom-up approach, system design starts with the lowest level of
components, which are then interconnected to get higher level components. This process continues till a
hierarchy of all system components is generated. However, in real-life scenarios it is very difficult to know
all the lowest level components at the outset, so the bottom-up approach is used only for very simple
problems, such as the components of a calculator program.
LANGUAGE TRADE-OFFS DESIGN
Introduction
Programming paradigms, like software architecture, have trade-offs. In fact, many of the same methods
for comparing architectural designs apply just as well to language design. Conceptual design involves a
series of trade-off decisions among significant parameters such as operating speeds, memory size,
power, and I/O bandwidth - to obtain a compromise design which best meets the performance
requirements.
Important trade-off factors include:
Reliability: this takes into account the time required for malfunction detection and
reconfiguration or repair.
Expandability: measures the computer system’s ability to conveniently accommodate increased
requirements by higher speed or by physical expansion without the cost of a major redesign.
Modularity is a desirable method for providing expandability and should be incorporated
whenever feasible.
Programmability: there should be a balance between programming simplicity and hardware
complexity to prevent the cost of programming from becoming overwhelming. The degree of
software sophistication and the availability of support software should be considered during the
design.
Maintainability: should not be neglected when designing the computer, repair should be readily
accomplished during ground operation.
Compatibility: this should be developed between computer and interfaces, software, power
levels to facilitate programming.
Adaptability: is defined as the ability of the system to meet a wide range of functional
requirements without requiring physical modifications.
Availability: the probability that the computer is operating satisfactorily at a given time. It is
closely related to reliability.
Development status and cost: these are complex management factors which can have significant
effects on the design as well. They require the estimation of a number of items such as the
extent of off-the-shelf hardware use, design risks in developing new equipment using advanced
technologies, and potential progress in the state of the art during design and development.
PROGRAMMING LANGUAGE IMPLEMENTATION
Introduction
A programming language implementation is a system for executing computer programs. There are four
approaches or methods to programming language implementation, which are:
Compilation
Interpretation
Hybrid
Just-in-time
Implementation Methods
Each implementation method is explained below.
Compilation
Programs can be translated into machine language, which can be executed directly on the
computer. This method is called a compiler implementation and has the advantage of very fast
program execution, once the translation process is complete. The language that a compiler
translates is called the source language. The process of compilation and program execution
takes place in several phases, the most important of which are listed below.
Compilation Process
- Lexical analysis converts characters in the source program into lexical units (e.g.
identifiers, operators, keywords); see the sketch after this list.
- Syntactic analysis transforms lexical units into parse trees which represent the
syntactic structure of the program.
- Semantic analysis checks for errors hard to detect during syntactic analysis and
generates intermediate code.
- Code generation: machine code is generated.
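As an illustration of the first phase, here is a minimal lexical analyser sketched in Python; the token categories and regular expressions are assumptions for a small expression language, not those of any particular compiler.

import re

# Group characters of the source text into lexical units (tokens).
TOKEN_SPEC = [("NUMBER", r"\d+"),           # numeric literals
              ("IDENT", r"[A-Za-z_]\w*"),   # identifiers and keywords
              ("OP", r"[=+\-*/()]"),        # operators and punctuation
              ("SKIP", r"\s+")]             # whitespace, discarded
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    for match in TOKEN_RE.finditer(source):
        if match.lastgroup != "SKIP":
            yield (match.lastgroup, match.group())

print(list(tokenize("total = price + 2")))
# [('IDENT', 'total'), ('OP', '='), ('IDENT', 'price'), ('OP', '+'), ('NUMBER', '2')]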
Interpretation
Pure interpretation lies at the opposite end (from compilation) of implementation methods. With this
approach, programs are interpreted by another program called an interpreter, with no translation
whatever. The interpreter program acts as a software simulation of a machine whose fetch-execute
cycle deals with high-level language program statements rather than machine instructions. The phases
of pure interpretation are listed below.
Phases of Interpretation
- Programs are interpreted by another program (an interpreter)
- Easier implementation of programs (run-time errors can easily and immediately be
displayed).
- Slower execution (10 to 100 times slower than compiled programs)
- Often requires more memory space
Hybrid
Some language implementation systems are a compromise between compilers and pure interpreters;
they translate high-level language programs to an intermediate language designed to allow easy
interpretation. This method is faster than pure interpretation because the source language statements
are decoded only once. Such implementations are called hybrid implementation systems.
Instead of translating intermediate language code to machine code, it simply interprets the intermediate
code.
- Programs are translated into an intermediate language for easy interpretation. This involves a
compromise between compilers and pure interpreters: a high level program is translated to an
intermediate language that allows easy interpretation.
- Hybrid implementation is faster than pure interpretation. Examples of the implementation occur in
Perl and Java.
• Perl programs are partially compiled to detect errors before interpretation.
• Initial implementations of Java were hybrid. The intermediate form, byte code, provides
portability to any machine that has a byte code interpreter and a run-time system (together,
these are called the Java Virtual Machine).
Just-in-time
A just-in-time implementation initially translates programs to an intermediate language, then compiles
the intermediate language of each subprogram into machine code the first time the subprogram is
called.
- Machine code version is kept for subsequent calls. Just-in-time systems are widely used for Java
programs
Language design and implementation are intimately related to one another. Obviously an
implementation must conform to the rules of the language. At the same time, a language designer must
consider how easy or difficult it will be to implement various features, and what sort of performance is
likely to result for programs that use those features. Language implementations are commonly
differentiated into those based on interpretation and those based on compilation. However, the
difference between these approaches is fuzzy, and most implementations include a bit of each. As a
general rule, a language is compiled if execution is preceded by a translation step that fully analyzes
both the structure and meaning of the program, and produces an equivalent program in a significantly
different form.
The major methods of implementing programming languages are compilation, pure interpretation, and
hybrid implementation. Programming environments have become important parts of software
development systems, in which the language is just one of the components. These implementation methods
will acquaint you with the fundamental ideas surrounding the design and implementation of high-level
programming languages.
LANGUAGE DESCRIPTION / IMPLEMENTATION
Introduction
The study of programming languages, like the study of natural languages, can be divided into
examinations of syntax and semantics. The syntax of a programming language is the form of its
expressions, statements, and program units. Its semantics is the meaning of those expressions,
statements, and program units.
Consider, for example, a while statement of the form while (boolean_expr) statement. The semantics of
this statement form is that when the current value of the Boolean expression is true,
the embedded statement is executed. Otherwise, control continues after the while construct. Then
control implicitly returns to the Boolean expression to repeat the process. Although they are often
separated for discussion purposes, syntax and semantics are closely related. In a well designed
programming language, semantics should follow directly from syntax; that is, the appearance of a
statement should strongly suggest what the statement is meant to accomplish.
Syntactic Analysis
Syntax is the set of rules that define what the various combinations of symbols mean. This tells the
computer how to read the code. Syntax refers to a concept in writing code dealing with a very specific
set of words and a very specific order to those words when we give the computer instructions. This
order and this strict structure is what enables us to communicate effectively with a computer. Syntax is
to code what grammar is to English or any other language. A big difference, though, is that computers are
really exacting in how we structure that grammar or our syntax. This syntax is why we call programming
coding. Even amongst all the different languages that are out there, each programming language uses
different words in a different structure in how we give it information to get the computer to follow our
instructions.
a proper associated derivation tree or not. The syntax of a programming language can be interpreted
using the following formal and informal techniques:
Lexical syntax for defining the rules for basic symbols involving identifiers, literals, punctuators
and operators.
Concrete syntax specifies the real representation of the programs with the help of lexical
symbols like its alphabet.
Abstract syntax conveys only the vital program information.
The Syntax of a programming language is used to signify the structure of programs without considering
their meaning. It basically emphasizes the structure, layout of a program with their appearance. It
involves a collection of rules which validates the sequence of symbols and instruction used in a program.
In general, languages can be formally defined in two distinct ways: by recognition and by generation.
Language Recognizer
The syntax analysis part of a compiler is a recognizer for the language the compiler translates. In this
role, the recognizer need not test all possible strings of characters from some set to determine whether
each is in the language. Rather, it need only determine whether given programs are in the language. In
effect then, the syntax analyzer determines whether the given programs are syntactically correct. The
syntax analyzer is also known as a parser, as discussed below. A language recognizer is like a
filter, separating legal sentences from those that are incorrectly formed.
Language Generator
A language generator is a device that can be used to generate the sentences of a language. At first
glance, a generator seems to be a device of limited usefulness as a language descriptor. However,
people prefer certain forms of generators over recognizers because they can more easily read and
understand them. By contrast, the syntax-checking portion of a compiler (a language recognizer) is
not as useful a language description for a programmer because it can be used only in trial-and-error
mode. For example, to
determine the correct syntax of a particular statement using a compiler, the programmer can only
submit a speculated version and note whether the compiler accepts it. On the other hand, it is often
possible to determine whether the syntax of a particular statement is correct by comparing it with the
structure of the generator. There is a close connection between formal generation and recognition
devices for the same language, which led to the study of formal languages.
Parsing
In linguistics, parsing is the process of analyzing a text, made of a sequence of tokens (for example,
words), to determine its grammatical structure with respect to a given (more or less) formal grammar.
Parsing can also be used as a linguistic term, especially in reference to how phrases are divided up in
garden path sentences.
Parse Trees
One of the most attractive features of grammars is that they naturally describe
the hierarchical syntactic structure of the sentences of the languages they
define. These hierarchical structures are called parse trees. For example, the
parse tree below shows the structure of the assignment statement being
derived. Every internal node of a parse tree is labeled with a nonterminal
symbol; every leaf is labeled with a terminal symbol. Every subtree of a parse
tree describes one instance of an abstraction in the sentence. For example: A
parse tree for the simple statement
A = B * (A + C)
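One possible sketch of that parse tree, written as nested Python tuples; the node labels assign, mul, add and ident are invented names for the nonterminals.

# The hierarchical structure of A = B * (A + C): inner nodes are
# nonterminals, leaves are the terminal symbols of the statement.
parse_tree = ("assign",
              ("ident", "A"),
              ("mul",
               ("ident", "B"),
               ("add",
                ("ident", "A"),
                ("ident", "C"))))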
Parser
In computing, a parser is one of the components in an interpreter or compiler,
which checks for correct syntax and builds a data structure (often some kind of
parse tree, abstract syntax tree or other hierarchical structure) implicit in the
input tokens. The parser often uses a separate lexical analyzer to create tokens
from the sequence of input characters. Parsers may be programmed by hand or
may be (semi-)automatically generated (in some programming languages) by a
tool.
Types of parser
The task of the parser is essentially to determine if and how the input can be
derived from the start symbol of the grammar. This can be done in essentially
two ways:
Top-down parsing: Top-down parsing can be viewed as an attempt to find
leftmost derivations of an input-stream by searching for parse trees using a
top-down expansion of the given formal grammar rules. Tokens are
consumed from left to right. Inclusive choice is used to accommodate
ambiguity by expanding all alternative right-hand-sides of grammar rules.
Examples include: recursive descent parsers, LL parsers (Left-to-right,
Leftmost derivation), and so on.
Bottom-up parsing: A parser can start with the input and attempt to rewrite
it to the start symbol. Intuitively, the parser attempts to locate the most
basic elements, then the elements containing these, and so on. LR parsers
are examples of bottom-up parsers. Another term used for this type of
parser is Shift-Reduce parsing.
This is the process of recognizing an utterance (a string in natural languages) by
breaking it down to a set of symbols and analyzing each one against the
grammar of the language.
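As a concrete sketch of top-down parsing, the following recursive-descent parser (in Python) handles a small assumed expression grammar; the grammar and the tuple-based tree shape are illustrative choices, not a fixed standard.

# A recursive-descent (top-down) parser for the toy grammar
#   expr   -> term   (("+" | "-") term)*
#   term   -> factor (("*" | "/") factor)*
#   factor -> NUMBER | IDENT | "(" expr ")"
# Each grammar rule becomes one function; tokens are consumed left to right.
def parse(tokens):
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None
    def eat(expected=None):
        nonlocal pos
        if expected is not None and peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {peek()!r}")
        tok = tokens[pos]
        pos += 1
        return tok
    def factor():
        if peek() == "(":
            eat("(")
            node = expr()
            eat(")")
            return node
        if peek() is None:
            raise SyntaxError("unexpected end of input")
        return eat()                       # a NUMBER or IDENT leaf
    def term():
        node = factor()
        while peek() in ("*", "/"):
            node = (eat(), node, factor()) # operator node with two subtrees
        return node
    def expr():
        node = term()
        while peek() in ("+", "-"):
            node = (eat(), node, term())
        return node
    tree = expr()
    if peek() is not None:
        raise SyntaxError("unexpected trailing input")
    return tree

print(parse(["B", "*", "(", "A", "+", "C", ")"]))
# ('*', 'B', ('+', 'A', 'C'))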
Most languages have the meanings of their utterances structured according to
their syntax—a practice known as compositional semantics. As a result, the first
step to describing the meaning of an utterance in language is to break it down
part by part and look at its analyzed form (known as its parse tree in computer
science, and as its deep structure in generative grammar) as discussed earlier.
Syntactic Ambiguity
Syntactic ambiguity is a property of sentences which may be reasonably interpreted in more than one
way, or reasonably interpreted to mean more than one thing. Ambiguity may or may not involve one
word having two parts of speech or homonyms. Syntactic ambiguity arises not from the range of
meanings of single words, but from the relationship between the words and clauses of a sentence, and
the sentence structure implied thereby. When a reader can reasonably interpret the same sentence as
having more than one possible structure, the text is equivocal and meets the definition of syntactic
ambiguity.
Operator Precedence
When several operations occur in an expression, each part is evaluated and resolved in a predetermined
order called operator precedence. Parentheses can be used to override the order of precedence and
force some parts of an expression to be evaluated before other parts. Operations within parentheses
are always performed before those outside. Within parentheses, however, normal operator precedence
is maintained. When expressions contain operators from more than one category, arithmetic operators
are evaluated first, comparison operators are evaluated next, and logical operators are evaluated last.
Comparison operators all have equal precedence; that is, they are evaluated in the left-to-right order in
which they appear. Arithmetic and logical operators are evaluated in a fixed order of precedence.
When multiplication and division occur together in an expression, each operation is evaluated as it
occurs from left to right. Likewise, when addition and subtraction occur together in an expression, each
operation is evaluated in order of appearance from left to right. The string concatenation operator (&)
is not an arithmetic operator, but in precedence it does fall after all arithmetic operators and before all
comparison operators. The Is operator is an object reference comparison operator. It does not compare
objects or their values; it checks only to determine if two object references refer to the same object.
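A quick demonstration of these rules, using Python's own operator precedence as the example language:

# Multiplication binds tighter than addition; parentheses override; operators
# of equal precedence are evaluated left to right; comparisons come after
# arithmetic and before logical operators.
print(2 + 3 * 4)            # 14: 3 * 4 is evaluated first
print((2 + 3) * 4)          # 20: the parentheses force the addition first
print(10 - 4 - 3)           # 3: equal precedence, evaluated left to right
print(1 + 2 < 4 and True)   # True: arithmetic, then comparison, then logic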
Syntax analysis is another phase of the compiler design process in which the given input string is
checked for the confirmation of rules and structure of the formal grammar. It analyses the syntactical
structure and checks if the given input is in the correct syntax of the programming language or not.
Semantic Analysis
The term semantics in a programming language refers to the relationship between the syntax and the
model of computation. Semantics emphasizes the interpretation of a program so that the programmer
can understand it in an easy way and predict the outcome of program execution. An approach known as
syntax-directed semantics is used to map syntactical constructs to the computational model with the
help of a function.
Semantic analysis serves to check that a program is semantically correct. There are the following
styles of semantics.
Operational
Operational semantics determines the meaning of a program in terms of the calculation steps
that make up an idealized execution. Some definitions use structural operational semantics, in
which intermediate states are described on the basis of the language itself; others use an
abstract machine or more ad-hoc mathematical constructions. By an operational semantics of
a programming language, one usually understands a set of rules describing how its
expressions, statements, programs, etc. are evaluated or executed. These rules tell how a
possible implementation of the programming language should work, and it is not difficult to
produce an implementation of an interpreter for a language in any programming language
simply by following and translating the operational semantics of the language.
Denotational
Denotational semantics determines the meaning of a program as elements of abstract
mathematical structures, e.g. by regarding programming language constructs as
mathematical functions.
Axiomatic or logical
Axiomatic semantics defines the meaning of a program indirectly, by providing logical axioms
that characterize the program's properties. Compare with specification and verification.
Semantic Analyzer
The semantic analyzer uses the syntax tree and the symbol table to check whether the given program is
semantically consistent with the language definition. It gathers type information and stores it in either
the syntax tree or the symbol table. This type information is subsequently used by the compiler during
intermediate code generation.
Semantic Errors
Some of the semantic errors that the semantic analyzer is expected to recognize:
Type mismatch
Undeclared variable
Reserved identifier misuse.
Multiple declaration of variable in a scope.
Accessing an out of scope variable.
Actual and formal parameter mismatch.
Checking that control structures are used in a proper manner (for example, no break
statement outside a loop). A sketch of two such checks follows.
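The sketch below (in Python) shows how a semantic analyzer might flag a type mismatch and an undeclared variable; the dictionary-based symbol table and the check_assignment helper are hypothetical simplifications.

# A toy semantic check over an assumed symbol table mapping names to types.
symbol_table = {"s": "int", "name": "string"}

def check_assignment(var, value_type):
    if var not in symbol_table:              # undeclared variable
        return f"error: '{var}' is not declared"
    if symbol_table[var] != value_type:      # type mismatch
        return f"error: cannot assign {value_type} to {symbol_table[var]} '{var}'"
    return "ok"

print(check_assignment("s", "string"))   # type mismatch, like int s = "Seven"
print(check_assignment("t", "int"))      # undeclared variable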
Semantic Issues of Variables, Nature of Names and Special Words in Programming Languages
Variables
Variables in programming describe how data is represented, which can range from very simple values
to complex ones. The value they contain can change depending on conditions. When creating a
variable, we also need to declare the data type it contains. This is because the program will use different
types of data in different ways. Programming languages define data types differently. Data can hold a
very simple value, like a person's age, or something very complex, like a student's track record of
performance over a whole year. A variable is a symbolic name given to some known or unknown
quantity or information, for the purpose of allowing the name to be used independently of the
information it represents. Compilers have to replace variables' symbolic names with the actual locations
of the data. While the variable name, type, and location generally remain fixed, the data stored in the
location may be altered during program execution.
For example, almost all languages differentiate between ‘integers’ (or whole numbers, eg 12), ‘non-
integers’ (numbers with decimals, eg 0.24), and ‘characters’ (letters of the alphabet or words). In
programming languages, we can distinguish between different type levels which from the user's point of
view form a hierarchy of complexity, i.e. each level allows new data types or operations of greater
complexity.
Elementary level: Elementary (sometimes also called basic or simple) types, such as integers,
reals, booleans, and characters, are supported by nearly every programming language. Data
objects of these types can be manipulated by well-known operators, like +, - , *, or /, on the
programming level. It is the task of the compiler to translate the operators onto the correct
machine instructions, e.g. fixed-point and floating-point operations.
Structured level: Most high level programming languages allow the definition of structured
types which are based on simple types. We distinguish between static and dynamic structures.
Static structures are arrays, records, and sets, while dynamic structures are a bit more
complicated, since they are recursively defined and may vary in size and shape during the
execution of a program. Lists and trees are dynamic structures.
Abstract level: Programmer-defined abstract data types are sets of data objects with declared
operations on these data objects. The implementation or internal representation of abstract
data types is hidden from the users of these types to avoid uncontrolled manipulation of the
data objects (i.e. the concept of encapsulation).
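A small sketch of such an abstract data type in Python; the Stack type and its operations are a standard textbook example, used here as an assumed illustration of encapsulation.

# An abstract data type: the internal representation (a Python list) is
# hidden behind the declared operations push, pop and is_empty.
class Stack:
    def __init__(self):
        self._items = []        # internal representation, not for direct use
    def push(self, item):
        self._items.append(item)
    def pop(self):
        return self._items.pop()
    def is_empty(self):
        return not self._items

s = Stack()
s.push(1)
s.push(2)
print(s.pop())   # 2; callers never manipulate the underlying list directly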
Naming conventions
Unlike their mathematical counterparts, programming variables and constants commonly take multiple-
character names, e.g. COST or total. Single-character names are most commonly used only for auxiliary
variables; for instance, i, j, k for array index variables. Some naming conventions are enforced at the
language level as part of the language syntax and involve the format of valid identifiers. In almost all
languages, variable names cannot start with a digit (0-9) and cannot contain whitespace characters.
Whether, which, and when punctuation marks are permitted in variable names varies from language to
language; many languages only permit the underscore (_) in variable names and forbid all other
punctuation. In some programming languages, specific (often punctuation) characters (known as sigils)
are prefixed or appended to variable identifiers to indicate the variable's type. Case-sensitivity of
variable names also varies between languages and some languages require the use of a certain case in
naming certain entities; most modern languages are case-sensitive; some older languages are not. Some
languages reserve certain forms of variable names for their own internal use; in many languages, names
beginning with 2 underscores ("__") often fall under this category.
Binding
Binding describes how a variable is created and used (or "bound") by and within the given program, and,
possibly, by other programs as well. There are two types of binding: dynamic and static.
Dynamic Binding
Dynamic binding (also known as dynamic dispatch) is the process of mapping a message to a
specific sequence of code (a method) at run time. This is done to support the cases where the
appropriate method cannot be determined at compile time. It occurs during execution, and can
change during execution of the program; a sketch follows below.
Static Binding
Static binding occurs before run time and remains unchanged throughout program execution.
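A minimal sketch of dynamic binding in Python; the Animal, Dog and Cat classes are the usual illustrative example, not part of any particular system.

# Which speak() runs is decided at run time from the object's actual class,
# not at compile time: the message is bound to a method only when it is sent.
class Animal:
    def speak(self):
        return "..."

class Dog(Animal):
    def speak(self):
        return "woof"

class Cat(Animal):
    def speak(self):
        return "meow"

for pet in (Dog(), Cat()):
    print(pet.speak())      # woof, then meow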
Scope
The scope of a variable describes where in a program's text, the variable may be used, while the extent
(or lifetime) describes when in a program's execution a variable has a (meaningful) value. Scope is a
lexical aspect of a variable. Most languages define a specific scope for each variable (as well as any other
named entity), which may differ within a given program. The scope of a variable is the portion of the
program code for which the variable's name has meaning and for which the variable is said to be
"visible". It is also of two type; static and dynamic scope.
Static Scope
The static scope of a variable is the most immediately enclosing block, excluding any enclosed
blocks where the variable has been re-declared. The static scope of a variable in a program can
be determined by simply studying the text of the program. Static scope is not affected by the
order in which procedures are called during the execution of the program.
Dynamic Scope
The dynamic scope of a variable extends to all the procedures called thereafter during program
execution, until the first procedure to be called that re-declares the variable.
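The sketch below illustrates static (lexical) scope using Python, which is statically scoped; the variable names are arbitrary.

# Under static scope, the x that inner() sees is fixed by the program text
# (the nearest enclosing block), not by who calls inner() at run time.
x = "global"

def outer():
    x = "outer"
    def inner():
        return x    # resolved to the x in the enclosing outer() block
    return inner()

print(outer())   # outer
print(x)         # global

Under dynamic scope, inner would instead see whichever x was most recently bound on the chain of active calls.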
Referencing
The referencing environment is the collection of variables which can be referenced. In a statically
scoped language, one can only reference the variables in the static referencing environment. A function
in a statically scoped language does have dynamic ancestors (i.e. its callers), but cannot reference any
variables declared in those ancestors.
1. Syntax concerns the form or structure of a program, while semantics concerns its meaning.
2. Syntactic errors are handled at compile time. By contrast, semantic errors are difficult to find
and are encountered at run time.
3. For example, in C++ a variable “s” declared as “int s;” must be initialized with an integer
value. If we instead initialize it with “Seven”, the declaration and initialization are
syntactically correct but semantically incorrect, because “Seven” does not represent an
integer.
4. Every syntactic representation should have a distinctive meaning, and every semantic
component is associated with a syntactic representation.