Compiler Design Lab
Vision :
"To be a world class university through education, innovation and research for the service of humanity."
Mission :
To provide environment-friendly, reasonable and sustainable solutions for local and global needs.
Vision and Mission of Department
Vision
To be a world class university through education, innovation and research for the service of
humanity.
Mission
To develop human resources with sound knowledge and intellectual capability for innovation, research and excellence in the field of Information Technology.
To provide a conducive academic environment for achieving world class standards in teaching and cutting-edge research in the field of Information Technology.
To inculcate professional ethics and social responsibility in students and faculty to contribute to the overall development of society.
To facilitate the development of academia-industry collaborations and societal outreach programmes.
OBJECTIVE OF THIS LABORATORY
Course Objectives:
To give students a sound knowledge base in compiler design and its applications.
To familiarize students with the different phases of the compilation process, i.e. lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization and target code generation.
Course Outcomes:
Reference Books:
CONTENTS
1. Implement a DFA in C++ and conduct a membership test for two strings
2. Create tokens for the high-level statement a = b + 30;
3. Implement a lexical analyzer for source code stored in an input file
4. Parse the string w = id+id*id using the recursive descent parsing algorithm
5. Convert an NFA to a DFA using the subset construction method
6. Find the FOLLOW sets for the productions of a grammar
7. Find the FIRST sets for the productions of a grammar
8. Construct an LL(1) parsing table and parse an input string
9. Implement a desk calculator using Lex and Yacc
10. Study a current Compiler Construction (CC) conference paper
EXPERIMENT 1
AIM:
To implement a DFA in C++ and conduct a membership test for two
strings.
THEORY:
In a DFA, for each input symbol one can determine the state to which the
machine will move; hence it is called a deterministic automaton. As it has a
finite number of states, the machine is also called a deterministic finite
machine or deterministic finite automaton.
Formal Definition of a DFA
A DFA can be represented by a 5-tuple (Q, ∑, δ, q0, F) where −
• Q is a finite set of states.
• ∑ is a finite set of symbols called the alphabet.
• δ is the transition function, δ: Q × ∑ → Q.
• q0 is the initial state from where any input is processed (q0 ∈ Q).
• F is the set of final (accepting) states (F ⊆ Q).
STEP 1:
Let us consider a simple deterministic finite automaton for the language of
all strings having more than one character, where all characters belong to the
set {0,1}.
STEP 2:
We can write a regular expression for the above language as
re = (0+1).(0+1).(0+1)*
DFA STATE DIAGRAM for L = (0+1).(0+1).(0+1)*, ∑ = {0,1}
Above is the DFA for the given language, which accepts only those words
that consist of more than one character.
STEP 3:
Construct the DFA for the above regular expression (its state diagram is shown above).
STEP 4:
The following is the state table for the above DFA:
δ:
Present State    New State on 0    New State on 1
q0               q1                q1
q1               qf                qf
qf               qf                qf
STEP 5:
Hence we perform the membership test for the following two strings:
a. 0110
b. 1
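Before the full program, here is a compact table-driven alternative (a sketch of ours, not part of the original manual) that encodes the state table above directly as an array; states 0, 1 and 2 stand for q0, q1 and qf, and the sample string is one of the two test inputs:

#include <iostream>
#include <string>
using namespace std;

int main()
{
    // delta[state][symbol] transcribes the state table: q0=0, q1=1, qf=2
    int delta[3][2] = { {1, 1}, {2, 2}, {2, 2} };
    string inp = "0110";              // try "1" to see a rejection
    int state = 0;                    // start in q0
    for (char c : inp)
        state = delta[state][c - '0'];
    cout << (state == 2 ? "Yes" : "No") << endl;   // qf is the only final state
}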
CODE:
#include <iostream>
#include <string>
using namespace std;
int main()
{
string inp;
cout<<"Enter String : ";
cin>>inp;
int state = 0;   // number of input symbols read, capped at 2
int flag = 0;    // set to 1 once more than one symbol has been read
int pos = 0;
while(pos < (int)inp.size()){
    if((inp[pos] == '1' || inp[pos] == '0') && state < 2)
        state++;
    if(state == 2)
        flag = 1;
    pos++;
}
cout<<"----Membership Test Result----\n";
cout<<"String Belongs to the Language : ";
if (flag==1)
cout<<"Yes";
else
cout<<"No";
cout<<endl;
}
OUTPUT SNAPSHOT:
Hence, as expected, the output shows that 0110 belongs to the given language.
Hence, as expected, the output shows that 1 does not belong to the given language.
RESULT & DISCUSSION:
As expected from the C++ code, the membership test for both the test strings
passed.
The automaton takes a finite sequence of 0s and 1s as input. For each state,
there is a transition arrow leading out to a next state for both 0 and 1. Upon
reading a symbol, a DFA jumps deterministically from one state to another by
following the transition arrow.
For example, if the automaton is currently in state q0 and the current input
symbol is 1, then it deterministically jumps to state q1.
For example, a DFA can model software that decides whether or not online
user input such as an email address is valid.
DFAs recognize exactly the set of regular languages, which are, among other
things, useful for doing lexical analysis and pattern matching.
EXPERIMENT 2
AIM:
To write a program that creates tokens for the high-level statement
a = b + 30;
THEORY:
Lexical analysis is the first phase of a compiler. It takes the modified source code
from language preprocessors that are written in the form of sentences. The
lexical analyzer breaks these syntaxes into a series of tokens, by removing any
whitespace or comments in the source code.
If the lexical analyzer finds a token invalid, it generates an error. The lexical
analyzer works closely with the syntax analyzer. It reads character streams from
the source code, checks for legal tokens, and passes the data to the syntax
analyzer when it demands.
ALGORITHM :
1. Scan the string character by character.
2. If the symbol encountered is a letter (a-z or A-Z), add the lexeme to the map,
give it a suitable id and increment the static row number.
3. If the symbol encountered is an operator or a punctuation mark, push it to the map.
4. Traverse the string and print the corresponding values from the map
(a hand-worked example of the expected token stream follows this list).
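Worked by hand on the statement from the aim (entered as a=b+30; without spaces, since the program below reads a single word with cin), the algorithm yields a token stream along these lines; this is our derivation from the code, not a captured run:

a=b+30;   →   <id1,1> <=> <id2,2> <+> <30> <;>

Identifiers are emitted as <idN, symbol-table entry>; operators, numbers and punctuation marks are emitted as their own lexemes.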
CODE :
#include<iostream>
#include<string>
#include<map>
#include <sstream>
using namespace std;
// [A page break in the original dropped the token structure and the opening
//  of ifOperator(); they are reconstructed here from how the rest of the
//  program uses them.]
struct tokenate
{
    string id;
    int entry;
};
int ifOperator(int i,string s)
{
if(s[i]=='+')
return 1;
else if(s[i]=='-')
return 1;
else if(s[i]=='*')
return 1;
else if(s[i]=='/')
return 1;
if(s[i]=='=' || s[i]=='>' || s[i]=='<')
{
if(s[i+1]=='=')
return 2;
else
return 1;
}
return 0;
}
int ifPunctuation(int i,string s)
{
if(s[i]==' ')
return 1;
else if(s[i]==';')
return 1;
else if(s[i]=='(')
return 1;
else if(s[i]==')')
return 1;
return 0;
}
int ifIdentifier(int i,string s)
{
int l=1;
while(!ifOperator(i+1,s) && !ifPunctuation(i+1,s))
{
//cout<<"in identifier "<<s[i]<<" "<<ifOperator(i,s)<<" "<<ifPunctuation(i,s)<<endl;
l++;
i++;
}
return l;
}
bool isDigit(int i,string s)
{
if(s[i]=='0' || s[i]=='1' || s[i]=='2' || s[i]=='3' || s[i]=='4' || s[i]=='5' || s[i]=='6' || s[i]=='7' || s[i]=='8' ||
s[i]=='9')
return true;
return false;
}
int ifNumber(int i,string s)
{
int j=i,l=0;
while(isDigit(j,s))
{
l++;
j++;
}
return l;
}
int main()
{
/*compiler design*/
string s,k;
int i=0,l=0,j=0,index=0;
cin>>s;
map<int, string> umap;
while(s[i]!='\0')
{
if(ifOperator(i,s))
{
l=ifOperator(i,s);
j=1;
//cout<<"operator "<<l<<endl;
}
else if(ifNumber(i,s))
{
l=ifNumber(i,s);
j=1;
//cout<<"Number "<<l<<endl;
}
else if(ifPunctuation(i,s))
{
l=ifPunctuation(i,s);
//cout<<"Punctuation "<<l<<endl;
j=1;
}
else if(ifIdentifier(i,s))
{
l=ifIdentifier(i,s);
//cout<<"identifier "<<l<<endl;
j=2;
}
k=s.substr(i,l);
if(j==2)
{
index++;
tokenate* p=new tokenate();
ostringstream str1;
str1 << index;
string temp = str1.str();
p->id="id"+temp;
p->entry=index;
umap[index]=k;
cout<<"<"<<p->id<<","<<p->entry<<"> ";
}
else if(j==1)
{
tokenate* m= new tokenate();
m->id=k;
m->entry=-1;
cout<<"<"<<m->id<<"> ";
}
i=i+l;
l=0;
}
cout<<endl;
return 0;
}
OUTPUT :
EXPERIMENT 3
AIM:
Implement a lexical analyzer for a given source code stored in an
input file.
THEORY:
In computer science, lexical analysis, lexing or tokenization is the process of
converting a sequence of characters (such as in a computer program or web
page) into a sequence of tokens (strings with an assigned and thus identified
meaning).
If the lexical analyzer finds a token invalid, it generates an error. The lexical
analyzer works closely with the syntax analyzer. It reads character streams
from the source code, checks for legal tokens, and passes the data to the
syntax analyzer when it demands.
PROCEDURE:
STEP 1: Create a table of token classes:

Token    Name      Value
WS       -         -
<        relop     LT
>        relop     GT
<=       relop     LE
>=       relop     GE
<>       relop     NE
if       if        -
then     then      -
else     else      -
id       id        position in symbol table
number   number    position in symbol table
STEP 2: Read the input stream of characters from an external .txt file till EOF (End
of File) and create tokens corresponding to the token table given above.
STEP 3: Ignore spaces and comments, and output the set of tokens
identified while parsing.
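The relop rows of the token table correspond to the classic transition diagram for relational operators. As a self-contained illustration (the function name and structure here are our own choices, not the manual's), a recognizer for just those rows can be sketched as:

#include <iostream>
#include <string>
using namespace std;

// Returns the attribute value (LT, LE, NE, GT, GE) for the relational
// operator at the front of s, per the token table above; `consumed`
// reports how many characters the lexeme used.
string relop(const string& s, int& consumed)
{
    consumed = 0;
    if (s.empty() || (s[0] != '<' && s[0] != '>')) return "";
    if (s[0] == '<') {
        if (s.size() > 1 && s[1] == '=') { consumed = 2; return "LE"; }
        if (s.size() > 1 && s[1] == '>') { consumed = 2; return "NE"; }
        consumed = 1; return "LT";
    }
    if (s.size() > 1 && s[1] == '=') { consumed = 2; return "GE"; }
    consumed = 1; return "GT";
}

int main()
{
    int n;
    for (string op : {"<", "<=", "<>", ">", ">="})
        cout << op << " : relop " << relop(op, n) << "\n";
}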
CODE :
#include<stdio.h>
#include<ctype.h>
#include<string.h>
#include<stdlib.h>
void keyw(char *p);
int i=0,id=0,kw=0,num=0,op=0;
char keys[32][10]={"auto","break","case","char","const","continue","default",
"do","double","else","enum","extern","float","for","goto",
"if","int","long","register","return","short","signed",
"sizeof","static","struct","switch","typedef","union",
"unsigned","void","volatile","while"};
int main()
{
int ch; /* int, not char, so the comparison against EOF below is reliable */
char str[25],seps[120]=" \t\n,;(){}[]#\"<>";
char oper[]="!%^&*-+=~|.<>/?";
int j;
char fname[50];
FILE *f1;
//clrscr();
printf("enter file path (drive:\\fold\\filename)\n");
scanf("%s",fname);
f1 = fopen(fname,"r");
//f1 = fopen("Input","r");
if(f1==NULL)
{
printf("file not found");
exit(0);
}
while((ch=fgetc(f1))!=EOF)
{
for(j=0;j<=14;j++)
{
if(ch==oper[j])
{
printf("%c is an operator\n",ch);
op++;
str[i]='\0';
keyw(str);
}
}
for(j=0;j<=14;j++)
{
if(i==-1)
break;
if(ch==seps[j])
{
if(ch=='#')
{
while(ch!='>')
{
printf("%c",ch);
ch=fgetc(f1);
}
printf("%c is a header file\n",ch);
i=-1;
break;
}
if(ch=='"')
{
do
{
ch=fgetc(f1);
printf("%c",ch);
}while(ch!='"');
printf("\b is an argument\n");
i=-1;
break;
}
str[i]='\0';
keyw(str);
}
}
if(i!=-1)
{
str[i]=ch;
i++;
}
else
i=0;
}
printf("Keywords: %d\nIdentifiers: %d\nOperators: %d\nNumbers: %d\n",kw,id,op,num);
//getch();
}
void keyw(char *p)
{
int k,flag=0;
for(k=0;k<=31;k++)
{
if(strcmp(keys[k],p)==0)
{
printf("%s is a keyword\n",p);
kw++;
flag=1;
break;
}
}
if(flag==0)
{
if(isdigit(p[0]))
{
printf("%s is a number\n",p);
num++;
}
else
{
//if(p[0]!=13&&p[0]!=10)
if(p[0]!='\0')
{
printf("%s is an identifier\n",p);
id++;
}
}
}
i=-1;
}
OUTPUT:
1. program.txt
2. CODE Output
RESULT & DISCUSSION:
The code parses the stream of characters from an external text file
and classifies each lexeme according to the token classes given in the
token table.
EXPERIMENT 4
AIM:
For the given grammar G of a fictitious language L, parse the input string
w = id+id*id using the recursive descent parsing algorithm.
THEORY:
Recursive descent is a top-down parsing technique that constructs the parse
tree from the top while the input is read from left to right. It uses procedures
for every terminal and non-terminal entity. This parsing technique
recursively parses the input to build a parse tree, which may or may not
require back-tracking. The grammar associated with it (if not left
factored) cannot avoid back-tracking.
Algorithm:
1. Start from the start symbol.
2. For each non-terminal, say S:
   a. Call Procedure(S).
   b. For i = 1 to k:
      i. Choose a production S->A1A2….Ak.
      ii. If Ai is a non-terminal, call Procedure(Ai).
      iii. Else, if Ai is a terminal:
         • If Ai matches the current input symbol, break out of the current
           loop over Ai and advance the pointer of the current input symbol.
         • Else display “Syntax Error”.
(A minimal worked sketch follows this algorithm.)
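To make the procedure concrete before the generic program below, here is a minimal sketch (grammar layout and names are illustrative choices of ours, not the manual's) of a recursive-descent parser for the left-factored expression grammar E -> T E', E' -> + T E' | e, T -> F T', T' -> * F T' | e, F -> id, which accepts the aim's string w = id+id*id:

#include <iostream>
#include <string>
using namespace std;

string w;            // input token string; "id" is kept as literal characters
size_t pos = 0;      // current input pointer

bool match(const string& tok) {
    if (w.compare(pos, tok.size(), tok) == 0) { pos += tok.size(); return true; }
    return false;
}
bool F()  { return match("id"); }                               // F  -> id
bool Tp() { if (match("*")) return F() && Tp(); return true; }  // T' -> * F T' | e
bool T()  { return F() && Tp(); }                               // T  -> F T'
bool Ep() { if (match("+")) return T() && Ep(); return true; }  // E' -> + T E' | e
bool E()  { return T() && Ep(); }                               // E  -> T E'

int main() {
    w = "id+id*id";
    cout << ((E() && pos == w.size()) ? "Accepted" : "Syntax Error") << endl;
}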
CODE:
//#include<bits/stdc++.h>
#include<map>
#include<iostream>
#include<vector>
#include<string>
using namespace std;
// [The opening of update() was lost at a page break in the original manual.
//  From the surviving tail below, it evidently scans the production string s,
//  records the left-hand-side symbol in `word`, and walks the right-hand side
//  with indices i and j, pushing each alternative into prod[word].]
}
prod[word].push_back(s.substr(i));
j = i;
while(i < s.length() && s[i] != '+' && s[i] != '*' && s[i] != '(' && s[i] != ')'){
i++;
}
if(i == s.length()){
terminals[s.substr(j)] = 1;
}
return;
}
bool recursiveDescent(string inp,string cur,map<string,vector<string> > &prod,map<string,int>
&terminals){
int op = 0;
string word,c;
cout<<"in "<<cur<<endl;
if(cur == inp){
return true;
}
for(int i = 0;i <= cur.length();i++){
if(i<cur.length() && cur[i] == ' '){
op = i;
}
else if(i == cur.length() || cur[i] == '+' || cur[i] == '*' || cur[i] == '(' || cur[i] == ')'){
if(terminals[word] && inp.substr(op,i-op) == word){
word = "";
continue;
}
for(int j = 0;j < prod[word].size();j++){
if(op == 0)
op = -1;
if(recursiveDescent(inp,cur.substr(0,op+1) + prod[word][j] +
cur.substr(i),prod,terminals)){
return true;
}
}
word = "";
op = i;
}
else{
word += cur[i];
}
}
return false;
}
int main(){
int i,j,n;
string inp,start,productions[100];
map<string,vector<string> > prod;
map<string,int> terminals;
cout<<"Enter no. of productions"<<endl;
cin>>n;
cin.ignore();
cout<<"Enter Productions"<<endl;
for(i = 0;i < n;i++){
getline(cin,productions[i]);
update(productions[i],prod,terminals);
}
cout<<"Enter the string to check"<<endl;
cin>>inp;
cout<<"Enter the starting variable"<<endl;
cin>>start;
for(i = 0;i < prod[start].size();i++){
if(recursiveDescent(inp,prod[start][i],prod,terminals)){
cout<<"Accepted"<<endl;
return 0;
}
}
cout<<"Not Accepted"<<endl;
return 0;
}
OUTPUT:
EXPERIMENT 5
AIM:
To convert NFA to DFA using Subset construction method
THEORY:
The two main algorithms used in the conversion are:
Subset Construction: each DFA state corresponds to a set of NFA states. Starting
from the ε-closure of the NFA start state, the DFA transition from a state set S
on input a is ε-closure(move(S, a)); newly generated sets are added until no new
set appears. A DFA state is marked final if its set contains an NFA final state.
Computing ε-closure: the ε-closure of a set of states is the set of all states
reachable from them using ε-transitions alone; it is computed with a stack
(worklist) of states still to be expanded.
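As a compact illustration of the ε-closure step, independent of the full program below (the adjacency-list representation here is a stand-in of ours for the manual's transition structures):

#include <iostream>
#include <set>
#include <stack>
#include <vector>
using namespace std;

// eps[s] lists the states reachable from s by a single epsilon-move.
set<int> epsClosure(const vector<vector<int>>& eps, const set<int>& states)
{
    set<int> closure = states;
    stack<int> work;                        // worklist of states to expand
    for (int s : states) work.push(s);
    while (!work.empty()) {
        int s = work.top(); work.pop();
        for (int t : eps[s])
            if (closure.insert(t).second)   // state seen for the first time
                work.push(t);
    }
    return closure;
}

int main()
{
    vector<vector<int>> eps = { {1}, {2}, {} };          // 0 -e-> 1 -e-> 2
    for (int s : epsClosure(eps, {0})) cout << s << " "; // prints: 0 1 2
    cout << endl;
}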
Input NFA:
CODE:
#include <stdio.h>
#include <string.h>
#define STATES 50
struct Dstate
{
char name;
char StateString[STATES+1];
char trans[10];
int is_final;
}Dstates[50];
struct tran
{
char sym;
int tostates[50];
int notran;
};
struct state
{
int no;
struct tran tranlist[50];
};
int stackA[100],stackB[100],C[100],Cptr=-1,Aptr=-1,Bptr=-1;
struct state States[STATES];
char temp[STATES+1],inp[10];
int nos,noi,nof,j,k,nods=-1;
void pushA(int z)
{
stackA[++Aptr]=z;
}
void pushB(int z)
{
stackB[++Bptr]=z;
}
int popA()
{
return stackA[Aptr--];
}
void copy(int i)
{
char temp[STATES+1]=" ";
int k=0;
Bptr=-1;
strcpy(temp,Dstates[i].StateString);
while(temp[k]!='\0')
{
pushB(temp[k]-'0');
k++;
}
}
int popB()
{
return stackB[Bptr--];
}
int peekB()
{
return stackB[Bptr];
}
int peekA()
{
return stackA[Aptr];
}
int seek(int arr[],int ptr,int s)
{
int i;
for(i=0;i<=ptr;i++)
{
if(s==arr[i])
return 1;
}
return 0;
}
void sort()
{
int i,j,temp;
for(i=0;i<Bptr;i++)
{
for(j=0;j<(Bptr-i);j++)
{
if(stackB[j]>stackB[j+1])
{
temp=stackB[j];
stackB[j]=stackB[j+1];
stackB[j+1]=temp;
}
}
}
}
void tostring()
{
int i=0;
sort();
for(i=0;i<=Bptr;i++)
{
temp[i]=stackB[i]+'0';
}
temp[i]='\0';
}
void display_DTran()
{
int i,j;
printf("\n\t\t DFA Transition Table ");
printf("\n\t\t -------------------- ");
printf("\nStates\tString\tInputs\n ");
for(i=0;i<noi;i++)
{
printf("\t%c",inp[i]);
}
printf("\n \t----------");
for(i=0;i<nods;i++)
{
if(Dstates[i].is_final==0)
printf("\n%c",Dstates[i].name);
else
printf("\n*%c",Dstates[i].name);
printf("\t%s",Dstates[i].StateString);
for(j=0;j<noi;j++)
{
printf("\t%c",Dstates[i].trans[j]);
}
}
printf("\n");
}
void move(int st,int j)
{
int ctr=0;
while(ctr<States[st].tranlist[j].notran)
{
pushA(States[st].tranlist[j].tostates[ctr++]);
}
}
void lambda_closure(int st)
{
int ctr=0,in_state=st,curst=st,chk;
while(Aptr!=-1)
{
curst=popA();
ctr=0;
in_state=curst;
while(ctr<=States[curst].tranlist[noi].notran)
{
chk=seek(stackB,Bptr,in_state);
if(chk==0)
pushB(in_state);
in_state=States[curst].tranlist[noi].tostates[ctr++];
chk=seek(stackA,Aptr,in_state);
if(chk==0 && ctr<=States[curst].tranlist[noi].notran)
pushA(in_state);
}
}
}
int main()
{
int final[20],start,fin=0,i;
char c,ans,st[20];
printf("\nEnter no. of states in NFA : ");
scanf("%d",&nos);
for(i=0;i<nos;i++)
{
States[i].no=i;
}
printf("\nEnter the start state : ");
scanf("%d",&start);
printf("Enter the no. of final states : ");
scanf("%d",&nof);
printf("\nEnter the final states : \n");
for(i=0;i<nof;i++)
scanf("%d",&final[i]);
printf("\nEnter the no. of input symbols : ");
scanf("%d",&noi);
c=getchar();
printf("\nEnter the input symbols : \n ");
for(i=0;i<noi;i++)
{
scanf("%c",&inp[i]);
c=getchar();
}
inp[i]='e';
printf("\nEnter the transitions : (-1 to stop)\n");
for(i=0;i<nos;i++)
{
for(j=0;j<=noi;j++)
{
States[i].tranlist[j].sym=inp[j];
k=0;
ans='y';
while(ans=='y')
{
printf("move(%d,%c) : ",i,inp[j]);
scanf("%d",&States[i].tranlist[j].tostates[k++]);
if(States[i].tranlist[j].tostates[k-1]==-1)
{
k--;ans='n';
break;
}
}
States[i].tranlist[j].notran=k;
}
}
//Conversion
i=0;nods=0;fin=0;
pushA(start);
lambda_closure(peekA());
tostring();
Dstates[nods].name='A';
nods++;
strcpy(Dstates[0].StateString,temp);
while(i<nods)
{
for(j=0;j<noi;j++)
{
fin=0;
copy(i);
while(Bptr!=-1)
{
move(popB(),j);
}
while(Aptr!=-1)
lambda_closure(peekA());
tostring();
for(k=0;k<nods;k++)
{
if((strcmp(temp,Dstates[k].StateString)==0))
{
Dstates[i].trans[j]=Dstates[k].name;
break;
}
}
if(k==nods)
{
nods++;
for(k=0;k<nof;k++)
{
fin=seek(stackB,Bptr,final[k]);
if(fin==1)
{
Dstates[nods-1].is_final=1;
break;
}
}
strcpy(Dstates[nods-1].StateString,temp);
Dstates[nods-1].name='A'+nods-1;
Dstates[i].trans[j]=Dstates[nods-1].name;
}
}
i++;
}
display_DTran();
}
OUTPUT :
RESULT & DISCUSSION:
The code converts the given NFA to a DFA using the subset construction
method and produces the DFA transition table as output.
EXPERIMENT 6
AIM: To find FOLLOW set for given production of grammar.
THEORY:
FOLLOW(X) is the set of terminals that can appear immediately to the right
of the non-terminal X in some sentential form. It is computed with three rules:
(1) $ is placed in FOLLOW(S) for the start symbol S; (2) for a production
A → αBβ, everything in FIRST(β) except ε is placed in FOLLOW(B); (3) for
A → αB, or A → αBβ where FIRST(β) contains ε, everything in FOLLOW(A) is
placed in FOLLOW(B). For example, in the grammar S → (S) | ε,
FOLLOW(S) = { ), $ }.
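Since the body of find_follow() is missing from the surviving pages below, the following self-contained sketch (grammar, names and representation are our illustrative choices, not the manual's) shows the same computation as a fixed-point iteration over rules (1)-(3) for the example grammar above; rule (2) is shown only for the case where the next symbol is a terminal:

#include <cctype>
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>
using namespace std;

int main()
{
    // Grammar: S -> (S) | e, with 'e' standing for epsilon and
    // uppercase letters standing for non-terminals.
    vector<pair<char,string>> prods = { {'S', "(S)"}, {'S', "e"} };
    map<char, set<char>> follow;
    follow['S'].insert('$');                    // rule (1): $ in FOLLOW(start)
    bool changed = true;
    while (changed) {                           // iterate to a fixed point
        changed = false;
        for (auto& pr : prods) {
            const string& rhs = pr.second;
            for (size_t i = 0; i < rhs.size(); i++) {
                if (!isupper(rhs[i])) continue; // FOLLOW applies to non-terminals
                if (i + 1 < rhs.size()) {
                    // rule (2), terminal-next case: add the terminal itself
                    if (!isupper(rhs[i+1]) && rhs[i+1] != 'e')
                        changed |= follow[rhs[i]].insert(rhs[i+1]).second;
                } else {
                    // rule (3): B at the end of A -> aB inherits FOLLOW(A)
                    for (char t : follow[pr.first])
                        changed |= follow[rhs[i]].insert(t).second;
                }
            }
        }
    }
    cout << "FOLLOW(S) = { ";
    for (char t : follow['S']) cout << t << " ";
    cout << "}" << endl;                        // prints: { $ ) }
}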
CODE:
#include<stdio.h>
#include<ctype.h>
#include<string.h>
int limit, x = 0;
char production[10][10], array[10];
void find_follow(char ch);
void Array_Manipulation(char ch);
int main()
{
int count;
char option, ch;
printf("\nEnter Total Number of Productions:\t");
scanf("%d", &limit);
for(count = 0; count < limit; count++)
{
printf("\nValue of Production Number [%d]:\t", count + 1);
scanf("%s", production[count]);
}
do
{
x = 0;
printf("\nEnter production Value to Find Follow:\t");
scanf(" %c", &ch);
find_follow(ch);
printf("\nFollow Value of %c:\t{ ", ch);
for(count = 0; count < x; count++)
{
printf("%c ", array[count]);
}
printf("}\n");
printf("To Continue, Press Y:\t");
scanf(" %c", &option);
}while(option == 'y' || option == 'Y');
return 0;
}
/* [The definitions of find_follow() and Array_Manipulation() fell on pages
   missing from the source of this manual and are not reproduced here.] */
INPUT :
OUTPUT:
RESULT & DISCUSSION:
The FOLLOW set for each non-terminal is:
1. FOLLOW (E) = { ) }
2. FOLLOW (D) = { $, ) }
3. FOLLOW (T) = { +, $, ) }
4. FOLLOW (S) = { +, $, ) }
5. FOLLOW (F) = { *, ), $ }
• These sets are further used for predictive parsing as part of the syntax
analysis stage. By using the FIRST and FOLLOW sets, we can find the
probable production to be used for further parsing of the input lexemes,
making this a better alternative than the recursive descent algorithm.
EXPERIMENT 7
AIM: To find FIRST set for given production of grammar.
THEORY:
We saw the need for backtracking in the recursive descent algorithm, which is
a complex process to implement. There is an easier way to sort out this
problem:
If the compiler knew in advance the “first character of the string produced
when a production rule is applied”, then by comparing it with the current
character or token of the input string, it could wisely decide which production
rule to apply. FIRST(α) is exactly this set: the terminals that can begin a
string derived from α (plus ε if α can derive the empty string).
CODE:
#include<iostream>
#include<cstdio>
#include<cctype>   /* for isupper(), used below */
#include<algorithm>
#include<vector>
using namespace std;
char production[25][25];
int limit;
void Array_Manipulation(char array[], char value) {
int temp;
for(temp = 0; array[temp] != '\0'; temp++)
{
if(array[temp] == value) {
return; }
}
array[temp] = value; array[temp + 1] = '\0';
}
void Find_First(char* array, char ch) {
int count, j, k;
char temporary_result[20]; int x;
temporary_result[0] = '\0'; array[0] = '\0'; if(!(isupper(ch)))
{
Array_Manipulation(array, ch);
return ; }
for(count = 0; count < limit; count++) {
if(production[count][0] == ch) {
if(production[count][2] == '$') {
Array_Manipulation(array, '$'); }
else {
j = 2;
while(production[count][j] != '\0') {
x = 0;
Find_First(temporary_result, production[count][j]); for(k = 0; temporary_result[k] != '\0';
k++)
{
Array_Manipulation(array,temporary_result[k]); }
for(k = 0; temporary_result[k] != '\0'; k++) {
if(temporary_result[k] == '$') {
x = 1;
break; }
} if(!x) {
break; }
j++; }
}}
}
return; }
int main() {
    char option;
    char ch;
    char array[25];
    cout<<"Enter Total Number of Productions: ";
    cin>>limit;
    // [the loop header below was lost at a page break and is reconstructed]
    for(int count = 0; count < limit; count++) {
        cout<<"Value of Production Number "<<count + 1<<" :";
        cin>>production[count];
    }
    vector<char> prod;
    for(int count = 0; count < limit; count++) {
        if(find(prod.begin(), prod.end(), production[count][0]) == prod.end()) {
            prod.push_back(production[count][0]);
            Find_First(array, production[count][0]);
            cout<<"FIRST SET for "<<production[count][0]<<" is: ";
            int i;
            cout<<"{";
            for(i = 0; array[i+1] != '\0'; i++)
                cout<<array[i]<<", ";
            cout<<array[i]<<"}\n";
        }
    }
    return 0;
}
OUTPUT:
These sets are further used for predictive parsing as part of the syntax
analysis stage. By using the FIRST and FOLLOW sets, we can find the probable
production to be used for further parsing of the input lexemes, making this a
better alternative than the recursive descent algorithm. For productions whose
FIRST set contains ε, the FOLLOW set is used to complete the parsing table.
EXPERIMENT 8
AIM:
To construct an LL(1) predictive parsing table for a given grammar and parse
an input string with it.
THEORY:
An LL(1) parser is a table-driven predictive parser: it scans the input from
Left to right, produces a Leftmost derivation and uses 1 symbol of lookahead.
The parsing table M maps a (non-terminal, terminal) pair to a production: for
each production A → α, M[A, a] = A → α for every terminal a in FIRST(α), and,
if FIRST(α) contains ε, also for every a in FOLLOW(A).
ALGORITHM:
1. Compute the FIRST and FOLLOW sets of the grammar and fill the parsing
   table M as above.
2. Push $ and then the start symbol onto a stack.
3. While the input is not exhausted: if the stack top is a terminal matching
   the current input symbol, pop it and advance the input; if the top is a
   non-terminal A and the current symbol is a, replace A by the right-hand
   side of M[A, a], pushed in reverse; if no such entry exists, reject the
   string.
4. Accept when the stack top and the input are both $.
CODE:
#include <iostream>
#include <map>
#include <set>
#include <vector>
#include <string>
#include <stack>
#include <algorithm>
using namespace std;
// [Several helper routines (isTerminal(), set_add(), first(), the opening of
//  follow(), and the table-construction loop) fell on missing pages of the
//  source; only their closing fragments appear below.]
}
if(j==p.size()) continue;
else if(j==p.size()-1){
if(p[0]!=c) set_add(output, follow(productions,p[0]));
continue;
}
j++;
output.insert('e');
while(output.find('e')!=output.end()){
output.erase('e');
if(j!=p.size()) set_add(output, first(p[j++]));
else break;
}
}
return output;
}
set<char> f = first(p.substr(2,p.size()-1));
set<char>::iterator fi = f.begin();
bool contains_e = true;
while(fi!=f.end()){
if(*fi != 'e'){
contains_e = false;
table[*it][*fi] = p.substr(2,p.size()-1);
}
fi++;
}
if(contains_e){
f = first(p.substr(2,p.size()-1));
fi = f.begin();
contains_e = true;
while(fi!=f.end()){
if(*fi != 'e'){
contains_e = false;
table[*it][*fi] = p.substr(2,p.size()-1);
}
fi++;
}
if(contains_e){
table[*it]['$'] = p.substr(2,p.size()-1);
}
}
}
it++;
}
stack<char> s;
s.push('$');
s.push('S');
input.push_back('$');
while(input[la]!='$'){
if(isTerminal(s.top())){
if(s.top() == input[la]){
s.pop();
la++;
}
else
return false;
}
else{
char var = s.top();
string p = table[var][input[la]];
//string p = table(var, input[la]);
if(p.size()==0) return false;
reverse(p.begin(), p.end());
s.pop();
for(int i=0; i<p.size(); i++){
s.push(p[i]);
}
}
}
if(s.top()=='$') return true;
else return false;
}
int main(){
vector<string> productions;
productions.push_back("S~i=T-A");
productions.push_back("T~n");
productions.push_back("A~n");
string input;
cin>>input;  // [the read of the input string was lost at a page break]
// [the declaration of `table` and the code that fills it from the
//  productions were likewise lost here]
bool b = LL(input,table);
if(b)
cout<<"The entered string CAN be parsed"<<endl;
else
cout<<"The entered string CANNOT be parsed"<<endl;
return 0;
}
OUTPUT:
Input string: “id=num-num”
Input string: “id=num-num-num”
RESULT:
The given algorithm correctly creates an LL(1) parsing table for the following
grammar:
1. S -> id=T-A
2. T -> num
3. A -> num
The program also parses an input string and outputs the answer as yes or no.
DISCUSSION:
The program covers both stages of LL(1) parsing, i.e. :
1. Creating the LL(1) parsing table.
2. Parse the string.
It correctly uses the FIRST and FOLLOW sets to build the parsing table.
However, the function that computes the FIRST set can be improved further to
make the program work with all grammars.
EXPERIMENT 9
AIM :
To implement a simple desk calculator using Lex and Yacc.
THEORY :
Yacc is officially known as a "parser". Its job is to analyse the structure of the
input stream and operate on the "big picture". In the course of its normal
work, the parser also verifies that the input is syntactically sound. Consider
again the example of a C compiler. In the C language, a word can be a function
name or a variable, depending on whether it is followed by a ( or a =. There
should be exactly one } for each { in the program. YACC stands for "Yet Another
Compiler Compiler". This is because this kind of analysis of text files is normally
associated with writing compilers.
1. Process the yacc grammar file using the -d optional flag (which informs
   the yacc command to create a file that defines the tokens used, in
   addition to the C language source code): yacc -d calc.yacc
2. Use the ls command to verify that the following files were created:
   y.tab.c: the C language source file that the yacc command created for the parser
   y.tab.h: a header file containing define statements for the tokens used by the parser
3. Process the lex specification file: lex calc.lex
4. Use the ls command to verify that the following file was created:
   lex.yy.c: the C language source file that the lex command created for the lexical analyzer
5. Compile and link the two C language source files: cc y.tab.c lex.yy.c
6. Use the ls command to verify that the following files were created:
   y.tab.o: the object file for the y.tab.c source file
   lex.yy.o: the object file for the lex.yy.c source file
   a.out: the executable program file
CODE :
calc.yacc
%{
#include<stdio.h>
int regs[26];
int base;
%}
%start list
%union { int a; }
%token DIGIT LETTER /* [restored: needed so y.tab.h defines the tokens] */
%left '|'
%left '&'
%left '+' '-'
%left '*' '/' '%'
%left UMINUS /*supplies precedence for unary minus */
list: /*empty */
|
list stat '\n'
|
list error '\n'
{
yyerrok;
}
;
stat: expr
{
printf("%d\n",$1);
}
|
LETTER '=' expr
{
regs[$1.a] = $3.a;
}
/* [A page break in the original dropped the end of the stat rule and the
   head of the expr rule; the section below is reconstructed from the classic
   calculator example that this program follows.] */
	;
expr:	'(' expr ')'
	{
		$$.a = $2.a;
	}
	|
	expr '*' expr
	{
		$$.a = $1.a * $3.a;
	}
	|
	expr '/' expr
	{
		$$.a = $1.a / $3.a;
	}
	|
	expr '%' expr
	{
		$$.a = $1.a % $3.a;
	}
	|
	expr '+' expr
	{
		$$.a = $1.a + $3.a;
	}
	|
	expr '-' expr
	{
		$$.a = $1.a - $3.a;
	}
	|
	expr '&' expr
	{
		$$.a = $1.a & $3.a;
	}
	|
	expr '|' expr
	{
		$$.a = $1.a | $3.a;
	}
	|
	'-' expr %prec UMINUS
	{
		$$.a = - $2.a;
	}
|
LETTER
{
$$.a = regs[$1.a];
}
|
number
;
number: DIGIT
{
$$ = $1;
base = ($1.a==0) ? 8 : 10;
} |
number DIGIT
{
$$.a = base * $1.a + $2.a;
}
;
%%
main()
{
return(yyparse());
}
yyerror(s)
char *s;
{
fprintf(stderr, "%s\n",s);
}
yywrap()
{
return(1);
}
calc.lex
%{
#include <stdio.h>
#include "y.tab.h"
int c;
%}
%%
"" ;
[a-z] {
c = yytext[0];
yylval.a = c - 'a';
return(LETTER);
}
[0-9] {
c = yytext[0];
yylval.a = c - '0';
return(DIGIT);
}
[^a-z0-9\b] {
c = yytext[0];
return(c);
}
%%
OUTPUT :
Discussion :
The commands above generate several files, such as lex.yy.c and y.tab.c.
The calculator outputs are correct, as expected and shown above.
EXPERIMENT 10
AIM :
To study and go through a current CC (Compiler Construction) conference
paper.
Published In :
CC 2017 Proceedings of the 26th International Conference on Compiler
Construction
INTRODUCTION:
PAPER GOALS :
Classic symbolic execution techniques applied to object-oriented programs do
not generate such polymorphism-related constraints. While they are capable of
exploring different targets of conditional branches, where constraints on
symbolic values are introduced for different branch conditions, the
state-of-the-art techniques [16, …] cannot explore different targets of method
invocations, which depend on the types rather than the values of receiver
objects. This limitation has largely limited their ability to generate test
cases effectively for object-oriented programs.
To address this problem, we propose object-oriented symbolic execution (OO-SE),
which leverages classic symbolic execution and extends it with symbolic types.
A symbolic type represents all possible concrete types of a receiver object,
thereby enabling its mutation in test case generation. During symbolic
execution, constraints on symbolic types are added to the path condition, and
type-related operations such as casting, type testing (e.g., instanceof in
Java) and virtual invocation are tracked symbolically. As a result, we have
enriched the expressiveness of path conditions by considering not only the
constraints on symbolic values, as before, but also the constraints on symbolic
types, as in this work.
Figure 1 depicts a test case generation framework, built by exploiting symbolic
types, for object-oriented libraries. Our approach also applies to applications
when their individual parts are separately tested. When testing an
object-oriented library, the following three steps are repeated until a certain
goal has been met:
1. First, a concrete test input for an API method M = c m(t x) {. . .} in the
library is generated and a harness program is synthesized to invoke the API
with the test input. When testing method m, we can assume symbolic inputs for
its formal parameters as well as the formal parameters of the constructor
called on its corresponding
receiver object. Let x be such a formal parameter. Two cases are considered.
If x is of a primitive type, then x has a symbolic (input) value as before. If
x is of a reference type, then x has a symbolic (input) type T, or more
precisely, the (input) object o referenced by x has a symbolic type T. Note
that o may be of any subtype of the declared type d of x. We write l_T → o_c
to represent the fact that o, which is constructed with a concrete type c, has
a symbolic type T. Here, l_T is a reference to o, such that x = l_T. Thus, the
program is executed with object o being of the concrete type c and with
constraints on its symbolic type T being collected whenever the reference l_T
to o is used in type-related operations, such as type casting, instanceof and
method invocations.
2. Second, the program is executed symbolically by keeping track of a path condition,
which comprises constraints on not only symbolic values but also symbolic types at
run time.
3. The path condition is mutated and passed to a constraint solver to
generate a new test input.
We have developed a test case generation framework for object-oriented
libraries based on OO-SE. To test an API method in a library, this framework
automatically generates different test cases and their associated harness
programs for the API.
We have implemented our test case generation framework in Java. Our
experimental results on OpenJDK show that symbolic types are not only
effective in improving test coverage but also essential in finding bugs and
security vulnerabilities.
EXAMPLE:
Suppose we would like to test the API InterfaceAddress.equals(). To trigger
the error, it is essential to change the type of the object referenced by the
field address of its receiver object and the type of the object referenced by
its formal parameter obj.
Figure 3 shows the symbolic execution tree with different execution paths for
InterfaceAddress.equals(). At line 1, if obj instanceof InterfaceAddress
fails, the execution returns at line 2. Otherwise, the execution continues by
taking a different path from line 3. When invoking
address.equals(cmp.address) at line 4, we may execute one of the three target
methods, InetAddress.equals(), Inet4Address.equals() and Inet6Address.equals(),
denoted m1, m2, and m3, respectively. Note that m1, i.e.,
InetAddress.equals(), always returns false (line 10). Thus, line 7, where the
null pointer error occurs, can be reached only if either m2 or m3 is invoked,
requiring the object referenced by field address to be of type Inet4Address
or Inet6Address.
Figure 2&3
EVALUATION / RESULTS:
The objective of our evaluation is to demonstrate that our approach (denoted
OO-SE) is superior to the state-of-the-art (denoted CLASSIC-DSE) for testing
object-oriented libraries. CLASSIC-DSE stands for a classic dynamic symbolic
execution technique [16, 22, 54] that executes all virtual method invocations
concretely. By exploring method invocation targets with symbolic types
systematically, OO-SE can outperform CLASSIC-DSE in terms of both the test
coverage achieved and the bug-finding ability demonstrated.
We compare CLASSIC-DSE and OO-SE by using the five packages from OpenJDK 7
Update 6, as shown in Table 2, where the packages are listed in ascending
order of the number of basic blocks possessed. The five packages are selected
because, in testing them, we did not observe any path divergence, and both
CLASSIC-DSE and OO-SE are able to mutate all path conditions until termination
within 24 hours. The other packages in OpenJDK either take too long to make
any progress in symbolic execution or frequently encounter path divergences
due to invocations of native methods. As a result, we do not report here the
results on these packages, because their coverage data vary across different
runs, and this non-determinism may lead to invalid conclusions.
For each package evaluated, as listed in Table 2, we give the number of
uncommented lines of code as reported by the SLOCCount tool [61]. Blank,
comment or whitespace-only lines are not considered. In addition, we also give
the number of basic blocks and the number of virtual calls in a package. In
java.beans, there is one reflective call to java.lang.reflect.Method.invoke(),
which is analyzed symbolically by OO-SE (as discussed in Section 4) but only
concretely by CLASSIC-DSE.
Figure 8 compares CLASSIC-DSE and OO-SE in terms of their efficiency per block
of code. In this experiment, we limit the depth of the symbolic reasoning to a
fixed number of 20 basic blocks, which provides a reasonable comparison
between the two tools. The run time per block varies little across the
packages (1–3.5 seconds per block). There are no significant differences
between CLASSIC-DSE and OO-SE. The high run time per block of code for
java.applet can be attributed to its relatively small size, such that a small
overhead has a relatively high impact. Otherwise, there appears to be a small
tendency for the run time per block to increase with the size of the package
(i.e., the number of blocks in the package).
HIGHER COVERAGE
With symbolic types, OO-SE can automatically test OpenJDK more effectively
than CLASSIC-DSE. For the five packages tested, the coverage improvements
range from 12.8% for java.sql to 28.7% for java.rmi, with an average of 24.3%.
In particular, OO-SE has significantly improved on CLASSIC-DSE in testing
java.rmi by lifting its coverage from 54.2% to 69.8%. For all the five
packages, except java.text, OO-SE is able to achieve a coverage of more than
60%. Some blocks are not covered due to exception handling and conditional
branches that do not depend on symbolic inputs. In the current implementation,
OO-SE does not generate constraints for exceptions; hence, most exception
handling blocks are not covered. In addition, OO-SE neither reasons
symbolically about string operations nor symbolically represents values
returning from native methods. As a result, just like CLASSIC-DSE, OO-SE
cannot effectively explore different targets of a conditional branch that
depends on values with no symbolic representations.
Figure 10 compares CLASSIC-DSE and OO-SE in terms of the number of path
conditions, i.e., test cases mutated from another one per block of code. In
this experiment, we also limit the depth of the symbolic reasoning to a fixed
number of 20 blocks. Note that the first test input used for bootstrapping the
symbolic execution of each public API method is not counted. For java.sql and
java.text, OO-SE has produced a large number of mutated path conditions, i.e.,
new test cases. For java.applet, CLASSIC-DSE, without exploiting symbolic
types, has failed to produce any new test case. In the case of java.sql,
java.rmi and java.text, OO-SE has produced significantly more test cases than
CLASSIC-DSE. With java.applet ignored, OO-SE has increased the number of
mutated test cases per block produced by CLASSIC-DSE by 84.14%. However, this
increase does not seem to have any correlation with the number of virtual
invocations available in these packages.
LIMITATIONS:
At this stage, we have not incorporated performance-oriented optimizations
into our tool to improve its efficiency. For example, sophisticated path
selection algorithms can be used to achieve a desired level of test coverage
more quickly. In addition, adding method identity constraints to the path
condition ([CALL]) enriches the runtime information with calling contexts,
bringing in new optimization opportunities. How to better make use of such
calling-context information to guide path selection, especially call target
selection, is worth separate investigation.
Presently, our harness synthesizer instantiates an object of a given class
type by invoking the public constructors in the class. We associate a symbolic
representation with a field of a reference type in an instantiated object if
the field is set in the constructor or can be set by a setter method.
Sometimes, a field cannot be set directly, requiring its owner object to be at
a specific state. To further improve test coverage, we may need to synthesize
different method invocation sequences to instantiate an object with different
states.
CITATIONS:
2 citations
1. Sora Bae, Joonyoung Park, Sukyoung Ryu, "Partition-based coverage metrics
and type-guided search in concolic testing for JavaScript applications",
Proceedings of the 5th International FME Workshop on Formal Methods in
Software Engineering, May 20–28, 2017, Buenos Aires, Argentina.