Compiler Design Lab


DEPARTMENT OF INFORMATION TECHNOLOGY

COMPILER DESIGN (CD)


LABORATORY MANUAL

DELHI TECHNOLOGICAL UNIVERSITY


(Formerly Delhi College of Engineering)
Shahbad Daulatpur, Bawana Road, Delhi-110042
Vision and Mission of the University

Vision:

"To be a world class university through education, innovation and research for the service of humanity."

Mission:

1. To establish centres of excellence in emerging areas of science, engineering, technology, management and allied areas.

2. To foster an ecosystem for incubation, product development, transfer of technology and entrepreneurship.

3. To create an environment of collaboration, experimentation, imagination and creativity.

4. To develop human potential with analytical abilities, ethics and integrity.

5. To provide environment-friendly, reasonable and sustainable solutions for local & global needs.

Vision and Mission of Department

Vision
To be a world class university through education, innovation and research for the service of
humanity.

Mission

• To develop human resources with sound knowledge and intellectual capability for innovation, research and excellence in the field of Information Technology.
• To provide a conducive academic environment for achieving world class standards in teaching and cutting-edge research in the field of Information Technology.
• To inculcate professional ethics and social responsibility in students and faculty to contribute to the overall development of society.
• To facilitate the development of academia-industry collaborations and societal outreach programmes.

OBJECTIVE OF THIS LABORATORY

Course Objectives:

• To enlighten the student with a knowledge base in compiler design and its applications.
• To familiarize the students with the different phases of the compilation process, i.e. lexical analysis, syntax analysis, semantic analysis, code generation and optimization, and target code generation and optimization.

Course Outcomes:

The student should demonstrate a working understanding of the process of lexical
analysis, parsing and other compiler design aspects, implement the initial phases
of the compilation process, and parse an input string to detect lexical and
syntactic errors.

Evaluation scheme: CWS (Continuous Evaluation): 25 marks

Reference Books:

1. Aho, Ullman & Sethi, “Compiler Design”, Addison Wesley, ISBN 81-7808-046-X, 2004.
2. D. M. Dhamdhere, “Compiler Construction – Principles & Practice”, Macmillan India, ISBN 0333904060, 2000.

CONTENTS

1. To implement a DFA in C++ and conduct a membership test for two strings.
2. Write a program for creating tokens for the high-level program a = b + 30;
3. Implement a lexical analyzer for a given source code stored in an input file.
4. For the given grammar G of a fictitious language L, parse the input string w=id+id*id using the Recursive Descent Parsing algorithm.
5. To convert an NFA to a DFA using the subset construction method.
6. To find the FOLLOW set for given productions of a grammar.
7. To find the FIRST set for given productions of a grammar.
8. Write a program to parse the string id = num-num using an LL(1) predictive parser.
9. To create a program to obtain outputs using Lex and Yacc.
10. To implement / go through a current CC (Compiler Construction) conference paper.
EXPERIMENT 1
AIM:
To implement a DFA in C++ and conduct a membership test for two
strings.

THEORY:
In a DFA, for each input symbol, one can determine the state to which the
machine will move. Hence, it is called a Deterministic Automaton. As it has a
finite number of states, the machine is called a Deterministic Finite Machine
or a Deterministic Finite Automaton.
Formal Definition of a DFA
A DFA can be represented by a 5-tuple (Q, ∑, δ, q0, F) where −
• Q is a finite set of states.
• ∑ is a finite set of symbols called the alphabet.
• δ is the transition function where δ: Q × ∑ → Q
• q0 is the initial state from where any input is processed (q0 ∈ Q).

• F is a set of final state/states of Q (F ⊆ Q).

A DFA is represented by a digraph called a state diagram.


• The vertices represent the states.
• The arcs labelled with an input alphabet show the transitions.
• The initial state is denoted by an empty single incoming arc.
• The final state is indicated by double circles.

STEP 1:
Let us consider a simple Deterministic Finite Automaton for the language of
all strings having more than one character, where all characters belong to the
set {0,1}.

STEP 2:
We can write a regular expression for the above language as
re = (0+1).(0+1).(0+1)*

DFA STATE DIAGRAM for L=(0+1).(0+1).(0+1)* , ∑={0,1}

Above is the DFA for the given language, which accepts only those words
which consist of more than one character.

STEP 3:

Hence the machine is

M = ( {q0,q1,qf} , {0,1} , δ , q0 , {qf} )

where δ is given by the state table below.

STEP 4:
The following is the state table for the above DFA:

δ:
Present State   Input 0   Input 1
q0              q1        q1
q1              qf        qf
qf              qf        qf

STEP 5:
Hence we perform the membership test for the following two strings:
a. 0110
b. 1
CODE:

#include <iostream>
#include <string>
using namespace std;

int main()

{
string inp;
cout<<"Enter String : ";
cin>>inp;
int state;

// final output as Yes or No.


int flag = 0;

// initial state is 0 , further states are 1,2,3...


state = 0;

int pos=0;
while(inp[pos]!='\0'){
if((inp[pos] == '1' || inp[pos] == '0') && state<2 )
state++;
if (state == 2)
flag = 1;
pos++;
}
cout<<"----Membership Test Result----\n";
cout<<"String Belongs to the Language : ";
if (flag==1)
cout<<"Yes";
else
cout<<"No";
cout<<endl;
}

OUTPUT SNAPSHOT:

a. Membership Test for String: 0110

Hence as expected the output is that 0110 belongs to the given language.

b. Membership Test for String: 1

Hence as expected the output is that 1 does not belong to the given language.

RESULT & DISCUSSION:

As expected from the C++ code, the membership tests for both test strings
produced the correct results.

The automaton takes a finite sequence of 0s and 1s as input. For each state,
there is a transition arrow leading out to a next state for both 0 and 1. Upon
reading a symbol, a DFA jumps deterministically from one state to another by
following the transition arrow.
For example, if the automaton is currently in state q0 and the current input
symbol is 1, then it deterministically jumps to state q1.

A DFA has a start state (denoted graphically by an arrow coming in from


nowhere) where computations begin, and a set of accept states (denoted
graphically by a double circle) which help define when a computation is
successful.

A DFA is defined as an abstract mathematical concept, but is often


implemented in hardware and software for solving various specific problems.

For example, a DFA can model software that decides whether or not online
user input such as email addresses are valid.
DFAs recognize exactly the set of regular languages, which are, among other
things, useful for doing lexical analysis and pattern matching.

EXPERIMENT 2
AIM:
Write a program for creating tokens for the high level program a =
b + 30;

THEORY:
Lexical analysis is the first phase of a compiler. It takes the modified source code
from language preprocessors, written in the form of sentences. The lexical
analyzer breaks this text into a series of tokens, removing any whitespace and
comments in the source code.
If the lexical analyzer finds a token invalid, it generates an error. The lexical
analyzer works closely with the syntax analyzer. It reads character streams from
the source code, checks for legal tokens, and passes the data to the syntax
analyzer when demanded.

ALGORITHM:
1. Scan the string character by character.
2. If the symbol encountered is from a-z or A-Z, add the symbol to the map,
giving it a suitable id, and increment the static row number.
3. If the symbol encountered is an operator or a punctuation symbol,
push it to the map.
4. Traverse the string and print the respective values from the map.

CODE :

#include<iostream>
#include<string>
#include<map>
#include <sstream>

using namespace std;


class tokenate{
public:
string id;
int entry;
};
int ifOperator(int i,string s)
{

if(s[i]=='+')
return 1;
else if(s[i]=='-')
return 1;
else if(s[i]=='*')
return 1;
else if(s[i]=='/')
return 1;
if(s[i]=='=' || s[i]=='>' || s[i]=='<')
{
if(s[i+1]=='=')
return 2;
else
return 1;
}
return 0;
}
int ifPunctuation(int i,string s)
{
if(s[i]==' ')
return 1;
else if(s[i]==';')
return 1;
else if(s[i]=='(')
return 1;
else if(s[i]==')')
return 1;
return 0;
}
int ifIdentifier(int i,string s)
{
int l=1;
while(!ifOperator(i+1,s) && !ifPunctuation(i+1,s))
{
//cout<<"in identifier "<<s[i]<<" "<<ifOperator(i,s)<<" "<<ifPunctuation(i,s)<<endl;
l++;
i++;
}
return l;
}
bool isDigit(int i,string s)
{
if(s[i]=='0' || s[i]=='1' || s[i]=='2' || s[i]=='3' || s[i]=='4' || s[i]=='5' || s[i]=='6' || s[i]=='7' || s[i]=='8' ||
s[i]=='9')
return true;
return false;
}
int ifNumber(int i,string s)
{
int j=i,l=0;
while(isDigit(j,s))

{
l++;
j++;
}
return l;
}
int main()
{
/*compiler design*/
string s,k;
int i=0,l=0,j=0,index=0;
cin>>s;
map<int, string> umap;
while(s[i]!='\0')
{
if(ifOperator(i,s))
{
l=ifOperator(i,s);
j=1;
//cout<<"operator "<<l<<endl;
}
else if(ifNumber(i,s))
{
l=ifNumber(i,s);
j=1;
//cout<<"Number "<<l<<endl;
}
else if(ifPunctuation(i,s))
{
l=ifPunctuation(i,s);
//cout<<"Punctuation "<<l<<endl;
j=1;
}
else if(ifIdentifier(i,s))
{
l=ifIdentifier(i,s);
//cout<<"identifier "<<l<<endl;
j=2;
}
k=s.substr(i,l);
if(j==2)
{
index++;
tokenate* p=new tokenate();
ostringstream str1;
str1 << index;
string temp = str1.str();
p->id="id"+temp;
p->entry=index;
umap[index]=k;
cout<<"<"<<p->id<<","<<p->entry<<"> ";

}
else if(j==1)
{
tokenate* m= new tokenate();
m->id=k;
m->entry=-1;
cout<<"<"<<m->id<<"> ";
}

i=i+l;
l=0;
}
cout<<endl;
return 0;
}

OUTPUT :

RESULT & DISCUSSION:

The code parses the given input statement a=b+30 and outputs all the
tokens, with variables assigned IDs from the symbol table.

We create classes for different character categories, such as operators and
punctuation, and then tokenize each statement of the high-level program.

EXPERIMENT 3
AIM:
Implement a lexical analyzer for a given source code stored in an
input file.

THEORY:
In computer science, lexical analysis, lexing or tokenization is the process of
converting a sequence of characters (such as in a computer program or web
page) into a sequence of tokens (strings with an assigned and thus identified
meaning).
If the lexical analyzer finds a token invalid, it generates an error. The lexical
analyzer works closely with the syntax analyzer. It reads character streams
from the source code, checks for legal tokens, and passes the data to the
syntax analyzer when it demands.

PROCEDURE:
STEP 1: Create a table of token classes

Lexeme   Token Name   Attribute Value
WS       -            -
<        relop        LT
>        relop        GT
<=       relop        LE
>=       relop        GE
<>       relop        NE
if       if           -
then     then         -
else     else         -
id       id           position number
number   number       position number

STEP 2: Read the input stream of characters from an external .txt file till EOF (End
of File) and create tokens corresponding to the token table given above.

STEP 3: Ignore spaces and comments and output the set of tokens
identified while parsing.

CODE :
#include<stdio.h>
#include<ctype.h>
#include<string.h>
#include<stdlib.h>
void keyw(char *p);
int i=0,id=0,kw=0,num=0,op=0;
char keys[32][10]={"auto","break","case","char","const","continue","default",
"do","double","else","enum","extern","float","for","goto",
"if","int","long","register","return","short","signed",
"sizeof","static","struct","switch","typedef","union",
"unsigned","void","volatile","while"};
int main()
{
int ch; /* int, not char, so that EOF can be distinguished from valid characters */
char str[25],seps[120]=" \t\n,;(){}[]#\"<>";
char oper[]="!%^&*-+=~|.<>/?";
int j;
char fname[50];
FILE *f1;
//clrscr();
printf("enter file path (drive:\\fold\\filename)\n");
scanf("%s",fname);
f1 = fopen(fname,"r");
//f1 = fopen("Input","r");
if(f1==NULL)
{
printf("file not found");
exit(0);
}
while((ch=fgetc(f1))!=EOF)
{
for(j=0;j<=14;j++)
{
if(ch==oper[j])
{
printf("%c is an operator\n",ch);
op++;
str[i]='\0';
keyw(str);
}
}
for(j=0;j<=14;j++)
{

if(i==-1)
break;
if(ch==seps[j])
{
if(ch=='#')
{
while(ch!='>')
{
printf("%c",ch);
ch=fgetc(f1);
}
printf("%c is a header file\n",ch);
i=-1;
break;
}
if(ch=='"')
{
do
{
ch=fgetc(f1);
printf("%c",ch);
}while(ch!='"');
printf("\b is an argument\n");
i=-1;
break;
}
str[i]='\0';
keyw(str);
}
}
if(i!=-1)
{
str[i]=ch;
i++;
}
else
i=0;
}
printf("Keywords: %d\nIdentifiers: %d\nOperators: %d\nNumbers: %d\n",kw,id,op,num);
//getch();
}
void keyw(char *p)
{
int k,flag=0;
for(k=0;k<=31;k++)
{
if(strcmp(keys[k],p)==0)
{
printf("%s is a keyword\n",p);
kw++;
flag=1;

break;
}
}
if(flag==0)
{
if(isdigit(p[0]))
{
printf("%s is a number\n",p);
num++;
}
else
{
//if(p[0]!=13&&p[0]!=10)
if(p[0]!='\0')
{
printf("%s is an identifier\n",p);
id++;
}
}
}
i=-1;
}

OUTPUT:

1. program.txt

2. CODE Output

RESULT & DISCUSSION:
The code parses the stream of characters from an external text file
and classifies tokens according to the token classes given in the
token table.

In general, a lexical analyser performs the following tasks in a compiler:

• Reads the source program, scans the input characters, groups them into
lexemes and produces tokens as output.
• Enters the identified tokens into the symbol table.
• Strips out white space and comments from the source program.
• Correlates error messages with the source program, i.e., displays each
error message with its occurrence by specifying the line number.

EXPERIMENT 4
AIM:
For the given grammar G of a fictitious language L, parse the input string
w=id+id*id using the Recursive Descent Parsing algorithm.

THEORY:
Recursive descent is a top-down parsing technique that constructs the parse
tree from the top while the input is read from left to right. It uses a
procedure for every terminal and non-terminal entity. This parsing technique
recursively parses the input to build a parse tree, which may or may not
require backtracking; if the grammar is not left factored, backtracking
cannot be avoided.

Algorithm:
1. Start from the start symbol.
2. For each non-terminal, say S:
   a. Call Procedure(S)
   b. for i = 1 to k
      i. choose production S->A1A2….Ak
      ii. if Ai is a non-terminal, then call Procedure(Ai)
      iii. else if Ai is a terminal
         • if Ai matches the current input symbol, then break from the
           current loop over Ai and advance the pointer of the current
           input symbol
         • else display “Syntax Error”

CODE:
//#include<bits/stdc++.h>
#include<map>
#include<iostream>
#include<vector>
#include<string>

using namespace std;


void update(string &s,map<string,vector<string> > &prod,map<string,int> &terminals){
int i = 0,j;
string word;
while(i < s.length() && s[i] != '='){
if(s[i] != ' ')
word += s[i];
i++;
}
i++;
while(i < s.length() && s[i] == ' '){
i++;

}
prod[word].push_back(s.substr(i));
j = i;
while(i < s.length() && s[i] != '+' && s[i] != '*' && s[i] != '(' && s[i] != ')'){
i++;
}
if(i == s.length()){
terminals[s.substr(j)] = 1;
}
return;
}
bool recursiveDescent(string inp,string cur,map<string,vector<string> > &prod,map<string,int>
&terminals){
int op = 0;
string word,c;
cout<<"in "<<cur<<endl;
if(cur == inp){
return true;
}
for(int i = 0;i <= cur.length();i++){
if(i<cur.length() && cur[i] == ' '){
op = i;
}
else if(i == cur.length() || cur[i] == '+' || cur[i] == '*' || cur[i] == '(' || cur[i] == ')'){
if(terminals[word] && inp.substr(op,i-op) == word){
word = "";
continue;
}
for(int j = 0;j < prod[word].size();j++){
if(op == 0)
op = -1;
if(recursiveDescent(inp,cur.substr(0,op+1) + prod[word][j] +
cur.substr(i),prod,terminals)){
return true;
}
}
word = "";
op = i;
}
else{
word += cur[i];
}
}

return false;
}
int main(){
int i,j,n;
string inp,start,productions[100];
map<string,vector<string> > prod;
map<string,int> terminals;

cout<<"Enter no. of productions"<<endl;
cin>>n;
cin.ignore();
cout<<"Enter Productions"<<endl;
for(i = 0;i < n;i++){
getline(cin,productions[i]);
update(productions[i],prod,terminals);
}
cout<<"Enter the string to check"<<endl;
cin>>inp;
cout<<"Enter the starting variable"<<endl;
cin>>start;
for(i = 0;i < prod[start].size();i++){
if(recursiveDescent(inp,prod[start][i],prod,terminals)){
cout<<"Accepted"<<endl;
return 0;
}
}
cout<<"Not Accepted"<<endl;
return 0;
}


OUTPUT:

LIMITATIONS IN THE ABOVE IMPLEMENTATION:

• The “=” sign is used to specify productions, so it cannot itself appear as a
symbol within an input string.

RESULT & DISCUSSION:

The code parses the given input string id+id*id, i.e. it accepts the input
string under the grammar supplied as input to the program above.

Recursive descent parsing requires the grammar to be unambiguous and
free of left recursion.

EXPERIMENT 5
AIM:
To convert NFA to DFA using Subset construction method
THEORY:
Two procedures drive the conversion: the subset construction itself, which
builds each DFA state as a set of NFA states, and the computation of
ε-closures, which collects every NFA state reachable through ε-transitions
alone. (The figures giving the two algorithms and the input NFA are omitted
here.)
CODE:
#include <stdio.h>
#include <string.h>
#define STATES 50
struct Dstate
{
char name;
char StateString[STATES+1];
char trans[10];
int is_final;
}Dstates[50];
struct tran
{
char sym;
int tostates[50];
int notran;
};
struct state
{
int no;
struct tran tranlist[50];
};
int stackA[100],stackB[100],C[100],Cptr=-1,Aptr=-1,Bptr=-1;
struct state States[STATES];
char temp[STATES+1],inp[10];
int nos,noi,nof,j,k,nods=-1;
void pushA(int z)
{
stackA[++Aptr]=z;

}
void pushB(int z)
{
stackB[++Bptr]=z;
}
int popA()
{
return stackA[Aptr--];
}
void copy(int i)
{
char temp[STATES+1]=" ";
int k=0;
Bptr=-1;
strcpy(temp,Dstates[i].StateString);
while(temp[k]!='\0')
{
pushB(temp[k]-'0');
k++;
}
}
int popB()
{
return stackB[Bptr--];
}
int peekB()
{
return stackB[Bptr];
}
int peekA()
{
return stackA[Aptr];
}
int seek(int arr[],int ptr,int s)
{
int i;
for(i=0;i<=ptr;i++)
{
if(s==arr[i])
return 1;

}
return 0;
}
void sort()
{
int i,j,temp;
for(i=0;i<Bptr;i++)
{
for(j=0;j<(Bptr-i);j++)
{
if(stackB[j]>stackB[j+1])
{
temp=stackB[j];
stackB[j]=stackB[j+1];
stackB[j+1]=temp;
}
}
}
}
void tostring()
{
int i=0;
sort();
for(i=0;i<=Bptr;i++)
{
temp[i]=stackB[i]+'0';
}
temp[i]='\0';
}
void display_DTran()
{
int i,j;
printf("\n\t\t DFA Transition Table ");
printf("\n\t\t -------------------- ");
printf("\nStates\tString\tInputs\n ");
for(i=0;i<noi;i++)
{
printf("\t%c",inp[i]);
}
printf("\n \t----------");

for(i=0;i<nods;i++)
{

if(Dstates[i].is_final==0)
printf("\n%c",Dstates[i].name);
else
printf("\n*%c",Dstates[i].name);

printf("\t%s",Dstates[i].StateString);
for(j=0;j<noi;j++)
{
printf("\t%c",Dstates[i].trans[j]);
}
}
printf("\n");
}
void move(int st,int j)
{
int ctr=0;
while(ctr<States[st].tranlist[j].notran)
{
pushA(States[st].tranlist[j].tostates[ctr++]);
}
}
void lambda_closure(int st)
{
int ctr=0,in_state=st,curst=st,chk;
while(Aptr!=-1)
{
curst=popA();
ctr=0;
in_state=curst;
while(ctr<=States[curst].tranlist[noi].notran)
{
chk=seek(stackB,Bptr,in_state);
if(chk==0)
pushB(in_state);
in_state=States[curst].tranlist[noi].tostates[ctr++];
chk=seek(stackA,Aptr,in_state);
if(chk==0 && ctr<=States[curst].tranlist[noi].notran)

pushA(in_state);
}
}
}
int main()
{
int final[20],start,fin=0,i;
char c,ans,st[20];
printf("\nEnter no. of states in NFA : ");
scanf("%d",&nos);
for(i=0;i<nos;i++)
{
States[i].no=i;
}
printf("\nEnter the start state : ");
scanf("%d",&start);
printf("Enter the no. of final states : ");
scanf("%d",&nof);
printf("\nEnter the final states : \n");
for(i=0;i<nof;i++)
scanf("%d",&final[i]);
printf("\nEnter the no. of input symbols : ");
scanf("%d",&noi);
c=getchar();
printf("\nEnter the input symbols : \n ");
for(i=0;i<noi;i++)
{
scanf("%c",&inp[i]);
c=getchar();
}
inp[i]='e';
printf("\nEnter the transitions : (-1 to stop)\n");
for(i=0;i<nos;i++)
{
for(j=0;j<=noi;j++)
{
States[i].tranlist[j].sym=inp[j];
k=0;
ans='y';
while(ans=='y')

{
printf("move(%d,%c) : ",i,inp[j]);
scanf("%d",&States[i].tranlist[j].tostates[k++]);
if(States[i].tranlist[j].tostates[k-1]==-1)
{
k--;ans='n';
break;
}
}
States[i].tranlist[j].notran=k;
}
}
//Conversion
i=0;nods=0;fin=0;
pushA(start);
lambda_closure(peekA());
tostring();
Dstates[nods].name='A';
nods++;
strcpy(Dstates[0].StateString,temp);
while(i<nods)
{
for(j=0;j<noi;j++)
{
fin=0;
copy(i);
while(Bptr!=-1)
{
move(popB(),j);
}
while(Aptr!=-1)
lambda_closure(peekA());
tostring();
for(k=0;k<nods;k++)
{
if((strcmp(temp,Dstates[k].StateString)==0))
{
Dstates[i].trans[j]=Dstates[k].name;
break;
}

}
if(k==nods)
{
nods++;
for(k=0;k<nof;k++)
{
fin=seek(stackB,Bptr,final[k]);
if(fin==1)
{
Dstates[nods-1].is_final=1;
break;
}
}
strcpy(Dstates[nods-1].StateString,temp);
Dstates[nods-1].name='A'+nods-1;
Dstates[i].trans[j]=Dstates[nods-1].name;
}
}
i++;
}
display_DTran();
}

OUTPUT :

RESULT & DISCUSSION:
The code converts the given NFA to a DFA using the subset construction
method and produces the DFA transition table as output.

The subset construction method computes ε-closures to produce a DFA
for the given NFA.

EXPERIMENT 6
AIM: To find FOLLOW set for given production of grammar.

THEORY:
FOLLOW(X) is the set of terminals that can appear immediately to the right
of the non-terminal X in some sentential form.

Apply the following rules to find the FOLLOW set:

1) FOLLOW(S) = { $ } // where S is the starting non-terminal
2) If A -> pBq is a production, where p, B and q are any grammar symbols,
then everything in FIRST(q) except ε is in FOLLOW(B).
3) If A -> pB is a production, then everything in FOLLOW(A) is in FOLLOW(B).
4) If A -> pBq is a production and FIRST(q) contains ε,
then FOLLOW(B) contains { FIRST(q) – ε } U FOLLOW(A).

The productions of grammar used are-


1. E=TD
2. D=+TD|$
3. T=FS
4. S=*FS|$
5. F=(E)|a

Here E, D, T, S and F are non-terminals while a is terminal and $ represents


epsilon (ε).

CODE:
#include<stdio.h>
#include<ctype.h>
#include<string.h>

int limit, x = 0;
char production[10][10], array[10];

void find_first(char ch);

void find_follow(char ch);
void Array_Manipulation(char ch);

int main()
{
int count;
char option, ch;
printf("\nEnter Total Number of Productions:\t");
scanf("%d", &limit);
for(count = 0; count < limit; count++)
{
printf("\nValue of Production Number [%d]:\t", count + 1);
scanf("%s", production[count]);
}
do
{
x = 0;
printf("\nEnter production Value to Find Follow:\t");
scanf(" %c", &ch);
find_follow(ch);
printf("\nFollow Value of %c:\t{ ", ch);
for(count = 0; count < x; count++)
{
printf("%c ", array[count]);
}
printf("}\n");
printf("To Continue, Press Y:\t");
scanf(" %c", &option);
}while(option == 'y' || option == 'Y');
return 0;
}

void find_follow(char ch)


{
int i, j, length;
if(production[0][0] == ch)
{
Array_Manipulation('$');
}
for(i = 0; i < limit; i++)
{
length = strlen(production[i]); /* length of the current production */
for(j = 2; j < length; j++)
{
if(production[i][j] == ch)
{
if(production[i][j + 1] != '\0')
{
find_first(production[i][j + 1]);
}
if(production[i][j + 1] == '\0' && ch != production[i][0])
{
find_follow(production[i][0]);
}
}
}
}
}

void find_first(char ch)


{
int i, k;
if(!(isupper(ch)))
{
Array_Manipulation(ch);
}
for(k = 0; k < limit; k++)
{
if(production[k][0] == ch)
{
if(production[k][2] == '$')
{
find_follow(production[k][0]);
}
else if(islower(production[k][2]))
{
Array_Manipulation(production[k][2]);
}
else
{
find_first(production[k][2]);
}
}
}
}

void Array_Manipulation(char ch)


{
int count;
for(count = 0; count <= x; count++)
{
if(array[count] == ch)
{
return;
}
}
array[x++] = ch;
}

INPUT :

OUTPUT:

RESULT & DISCUSSION:
The FOLLOW set for each non-terminal is-
1. FOLLOW (E) = { $, ) }
2. FOLLOW (D) = { $, ) }
3. FOLLOW (T) = { +, $, ) }
4. FOLLOW (S) = { +, $, ) }
5. FOLLOW (F) = { *, +, $, ) }
• These sets are further used for predictive parsing as a part of the syntax
analyzer stage. By using the FIRST and FOLLOW sets, we can find the
probable production to be used for further parsing of the input lexemes,
making it a better alternative than the Recursive Descent Algorithm.

• For sets with ε, FOLLOW is used to find the parsing table.

EXPERIMENT 7
AIM: To find FIRST set for given production of grammar.

THEORY:
We saw the need for backtracking in the Recursive Descent Algorithm, which is
a complex process to implement. There is an easier way to sort out this
problem:

If the compiler knew in advance the first character of the string produced
when a production rule is applied, then by comparing it to the current
character or token in the input string, it could wisely decide which
production rule to apply.

Apply following rules to find FIRST set:


1. If X is terminal, FIRST(X) = {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a non-terminal, and X → Y1 Y2 … Yk is a production, and ε is in all of
FIRST(Y1), …, FIRST(Yk), then add ε to FIRST(X).
4. If X is a non-terminal, and X → Y1 Y2 … Yk is a production, then add a to
FIRST(X) if for some i, a is in FIRST(Yi), and ε is in all of FIRST(Y1), …, FIRST(Yi-1).

The productions of grammar used are


1. E=TD
2. D=+TD|$
3. T=FS
4. S=*FS|$
5. F=(E)|a

Here E, D, T, S and F are non-terminals while a is terminal and $ represents


epsilon (ϵ).

CODE:
#include<iostream>
#include<cstdio>
#include<cctype>
#include<algorithm>
#include<vector>
using namespace std;
char production[25][25];

int limit;
void Array_Manipulation(char array[], char value) {
    int temp;
    for(temp = 0; array[temp] != '\0'; temp++) {
        if(array[temp] == value) {
            return;
        }
    }
    array[temp] = value;
    array[temp + 1] = '\0';
}
void Find_First(char* array, char ch) {
    int count, j, k;
    char temporary_result[20];
    int x;
    temporary_result[0] = '\0';
    array[0] = '\0';
    if(!(isupper(ch))) {
        Array_Manipulation(array, ch);
        return;
    }
    for(count = 0; count < limit; count++) {
        if(production[count][0] == ch) {
            if(production[count][2] == '$') {
                Array_Manipulation(array, '$');
            }
            else {
                j = 2;
                while(production[count][j] != '\0') {
                    x = 0;
                    Find_First(temporary_result, production[count][j]);
                    for(k = 0; temporary_result[k] != '\0'; k++) {
                        Array_Manipulation(array, temporary_result[k]);
                    }
                    for(k = 0; temporary_result[k] != '\0'; k++) {
                        if(temporary_result[k] == '$') {
                            x = 1;
                            break;
                        }
                    }
                    if(!x) {
                        break;
                    }
                    j++;
                }
            }
        }
    }
    return;
}

int main() {
    char option;
    char ch;
    char array[25];
    cout<<"Enter Total Number of Productions: ";
    cin>>limit;
    for(int count = 0; count < limit; count++) {
        cout<<"Value of Production Number "<<count + 1<<" :";
        cin>>production[count];
    }
    vector<char> prod;
    for(int count = 0; count < limit; count++) {
        if(find(prod.begin(), prod.end(), production[count][0]) == prod.end()) {
            prod.push_back(production[count][0]);
            Find_First(array, production[count][0]);
            cout<<"FIRST SET for "<<production[count][0]<<" is: ";
            int i;
            cout<<"{";
            for(i = 0; array[i+1] != '\0'; i++)
                cout<<array[i]<<", ";
            cout<<array[i]<<"}\n";
        }
    }
    return 0;
}

OUTPUT:

RESULT & DISCUSSION:


The FIRST set for each non-terminal is-

FIRST (E) = {(, a}

FIRST (D) = {+, $}

FIRST (T) = {(, a}

FIRST (S) = {*, $}

FIRST (F) = {(, a}

These sets are further used for predictive parsing as a part of the syntax
analyzer stage. By using the FIRST and FOLLOW sets, we can find the
probable production to be used for further parsing of the input lexemes,
making it a better alternative than the Recursive Descent Algorithm. For
sets with ε, FOLLOW is used to find the parsing table.

EXPERIMENT 8
AIM:

Write a program to parse the string id = num-num using an LL(1)


predictive parser.

THEORY:

In computer science, an LL parser is a top-down parser for a subset of
context-free languages. It parses the input from Left to right, performing a
Leftmost derivation of the sentence.

The first L means the input string is processed from left to right. The
second L means the derivation will be a leftmost derivation (the leftmost
variable is replaced at each step). An LL parser is called an LL(k) parser
if it uses k tokens of lookahead when parsing a sentence.

LL grammars, particularly LL(1) grammars, are of great practical interest,
as parsers for these grammars are easy to construct, and many computer
languages are designed to be LL(1) for this reason. LL parsers are
table-based parsers, similar to LR parsers. LL grammars can also be parsed
by recursive descent parsers.

ALGORITHM:

For every production A -> α in the grammar:
1. If α can derive a string starting with a (i.e., for all a in FIRST(α)),
Table[A, a] = A -> α
2. If α can derive the empty string ε, then, for all b that can follow a
string derived from A (i.e., for all b in FOLLOW(A)),
Table[A, b] = A -> α

CODE:
#include <iostream>
#include <map>
#include <set>
#include <vector>
#include <string>
#include <stack>
#include <algorithm>
using namespace std;

bool isTerminal(char c){


return(c=='e' || c=='i' || c=='n' || c=='=' || c=='-');
}

void set_add(set<char> &a, set<char> b){


set<char>::iterator it = b.begin();
while(it!=b.end()){
a.insert(*it);
it++;
}
}

set<char> first(string s){


set<char> t;
if(isTerminal(s[0])){
set<char> t;
t.insert(s[0]);
return t;
}
return t;
}

set<char> first(char c){


set<char> t;
if(isTerminal(c)){
set<char> t;
t.insert(c);
return t;
}
return t;
}

set<char> follow(vector<string> productions, char c){


//cout<<"Follow of "<<c<<" called!"<<endl;
set<char> output;
if(c=='S') output.insert('$');
for(int i=0; i<productions.size(); i++){
string p = productions[i];
int j;
for(j=2; j<p.size(); j++){
if(p[j]==c) break;

}
if(j==p.size()) continue;
else if(j==p.size()-1){
if(p[0]!=c) set_add(output, follow(productions,p[0]));
continue;
}
j++;
output.insert('e');
while(output.find('e')!=output.end()){
output.erase('e');
if(j!=p.size()) set_add(output, first(p[j++]));
else break;
}
}
return output;
}

set<char> get_vars(vector<string> productions){


set<char> output;
for(int i=0; i<productions.size(); i++){
string p = productions[i];
for(int j=0; j<p.size(); j++){
if(!isTerminal(p[j])){
output.insert(p[j]);
}
}
}
return output;
}

void make_table(map<char, map<char,string> > &table, vector<string> productions){


set<char> vars = get_vars(productions);
set<char>::iterator it = vars.begin();
while(it!=vars.end()){
for(int i=0; i<productions.size(); i++){
string p = productions[i];
if(p[0]!=*it) continue;

set<char> f = first(p.substr(2,p.size()-1));
set<char>::iterator fi = f.begin();
bool contains_e = true;

while(fi!=f.end()){
if(*fi != 'e'){
contains_e = false;
table[*it][*fi] = p.substr(2,p.size()-1);
}
fi++;
}

if(contains_e){

f = first(p.substr(2,p.size()-1));
fi = f.begin();
contains_e = true;

while(fi!=f.end()){
if(*fi != 'e'){
contains_e = false;
table[*it][*fi] = p.substr(2,p.size()-1);
}
fi++;
}

if(contains_e){
table[*it]['$'] = p.substr(2,p.size()-1);
}
}
}
it++;
}

cout<<"The parsing table formed for grammar is:"<<endl;


map<char, map<char,string> >::iterator i = table.begin();
while(i!=table.end()){
char v = (*i).first;
map<char,string> m = (*i).second;
map<char, string>::iterator im = m.begin();
while(im!=m.end()){
cout<<"table["<<(*i).first<<"]["<<(*im).first<<"] = "<<(*im).second<<endl;
im++;
}
cout<<endl;
i++;
}
}

bool LL(string input, map<char, map<char,string> > table){


int la = 0;

stack<char> s;
s.push('$');
s.push('S');

input.push_back('$');

while(input[la]!='$'){
if(isTerminal(s.top())){
if(s.top() == input[la]){
s.pop();
la++;
}

else
return false;
}
else{
char var = s.top();
string p = table[var][input[la]];
//string p = table(var, input[la]);
if(p.size()==0) return false;
reverse(p.begin(), p.end());
s.pop();
for(int i=0; i<p.size(); i++){
s.push(p[i]);
}
}
}
if(s.top()=='$') return true;
else return false;
}

int main(){
    vector<string> productions;
    productions.push_back("S~i=T-A");
    productions.push_back("T~n");
    productions.push_back("A~n");
    string input;

    map<char, map<char,string> > table;

    make_table(table, productions);

    cout << "Enter input string" << endl;
    cin >> input;

    bool b = LL(input, table);
    if(b)
        cout << "The entered string CAN be parsed" << endl;
    else
        cout << "The entered string CANNOT be parsed" << endl;

    return 0;
}

OUTPUT:
Input string: “id=num-num”

Input string: “id=num-num-num”

RESULT:
The given algorithm correctly creates an LL(1) parsing table for the following
grammar:
1. S -> id=T-A
2. T -> num
3. A -> num
The program also parses an input string and reports whether it can be parsed.

DISCUSSION:
The program covers both stages of LL(1) parsing, i.e.:
1. Creating the LL(1) parsing table.
2. Parsing the input string.
It correctly uses the FIRST and FOLLOW sets to build the parsing table.
However, the function to compute the FIRST set can be improved further to
make the program work with all grammars.

EXPERIMENT 9
AIM :

To create a program using Lex and Yacc and observe its output.

THEORY :

We create a desk-calculator program that performs addition, subtraction,
multiplication, and division operations. This calculator program also allows you
to assign values to variables (each designated by a single, lowercase letter)
and then use the variables in calculations.

Lex is officially known as a "Lexical Analyser". Its main job is to break up an
input stream into more usable elements, or, in other words, to identify the
"interesting bits" in a text file.

Yacc is officially known as a "parser". Its job is to analyse the structure of the
input stream and operate on the "big picture". In the course of its normal
work, the parser also verifies that the input is syntactically sound. Consider
again the example of a C compiler. In the C language, a word can be a function
name or a variable, depending on whether it is followed by a ( or a =. There
should be exactly one } for each { in the program. YACC stands for "Yet Another
Compiler Compiler". This is because this kind of analysis of text files is normally
associated with writing compilers.

Running the Program :-

To create the desk calculator example program, do the following:

1. Process the yacc grammar file using the -d optional flag (which informs
   the yacc command to create a file that defines the tokens used, in
   addition to the C language source code): yacc -d calc.yacc

2. Use the ls command to verify that the following files were created:
   y.tab.c - the C language source file that the yacc command created for
   the parser; y.tab.h - a header file containing define statements for the
   tokens used by the parser

3. Process the lex specification file: lex calc.lex

4. Use the ls command to verify that the following file was created:
   lex.yy.c - the C language source file that the lex command created for
   the lexical analyzer

5. Compile and link the two C language source files: cc y.tab.c lex.yy.c

6. Use the ls command to verify that the following files were created:
   y.tab.o - the object file for the y.tab.c source file; lex.yy.o - the object
   file for the lex.yy.c source file; a.out - the executable program file

7. To run the program directly from the a.out file, type:

   $ a.out

CODE :

calc.yacc

%{
#include<stdio.h>

int regs[26];
int base;

%}

%start list

%union { int a; }

%token DIGIT LETTER

%left '|'
%left '&'
%left '+' '-'
%left '*' '/' '%'
%left UMINUS /*supplies precedence for unary minus */

%% /* beginning of rules section */

list: /* empty */
|
list stat '\n'
|
list error '\n'
{
yyerrok;
}
;
stat: expr
{
printf("%d\n", $1.a);
}
|
LETTER '=' expr
{
regs[$1.a] = $3.a;
}
;

expr: '(' expr ')'
{
$$.a = $2.a;
}
|
expr '*' expr
{

$$.a = $1.a * $3.a;


}
|
expr '/' expr
{
$$.a = $1.a / $3.a;
}
|
expr '%' expr
{
$$.a = $1.a % $3.a;
}
|
expr '+' expr
{
$$.a = $1.a + $3.a;
}
|
expr '-' expr
{
$$.a = $1.a - $3.a;
}
|
expr '&' expr
{
$$.a = $1.a & $3.a;
}
|
expr '|' expr
{
$$.a = $1.a | $3.a;
}
|

'-' expr %prec UMINUS


{
$$.a = -$2.a;
}

|
LETTER
{
$$.a = regs[$1.a];
}

|
number
;

number: DIGIT
{
$$.a = $1.a;
base = ($1.a == 0) ? 8 : 10; /* a leading 0 selects octal */
}
|
number DIGIT
{
$$.a = base * $1.a + $2.a;
}
;

%%
int main()
{
return yyparse();
}

int yyerror(char *s)
{
fprintf(stderr, "%s\n", s);
return 0;
}

int yywrap()
{
return 1;
}

calc.lex

%{

#include <stdio.h>
#include "y.tab.h"
int c;
%}
%%
[ \t] ; /* skip blanks and tabs */
[a-z] {
c = yytext[0];
yylval.a = c - 'a';
return LETTER;
}
[0-9] {
c = yytext[0];
yylval.a = c - '0';
return DIGIT;
}
[^a-z0-9\b] {
c = yytext[0];
return c;
}
%%

OUTPUT :

Discussion :

The commands generate several files, such as lex.yy.c and y.tab.c. The
calculator outputs are correct, as shown above.

EXPERIMENT 10
AIM :
To study a recent CC (Compiler Construction) conference paper.

Sample Paper: Dynamic Symbolic Execution for Polymorphism


Authors: Lian Li, Yi Lu, Jingling Xue

Published In :
CC 2017 Proceedings of the 26th International Conference on Compiler
Construction

INTRODUCTION:

Symbolic execution is an important program analysis technique that provides auxiliary
execution semantics to execute programs with symbolic rather than concrete values. There
has been much recent interest in symbolic execution for automatic test case generation and
security vulnerability detection, resulting in various tools being deployed in academia and
industry. Nevertheless, (subtype or dynamic) polymorphism of object-oriented programs has
been neglected: existing symbolic execution techniques can explore different targets of
conditional branches but not different targets of method invocations. The authors address the
problem of how this polymorphism can be expressed in a symbolic execution framework.
They propose the notion of symbolic types, which make object types symbolic. With symbolic
types, various targets of a method invocation can be explored systematically by mutating
the type of the receiver object of the method during automatic test case generation. To the
best of the authors' knowledge, this is the first attempt to address polymorphism in symbolic
execution. Mutation of method invocation targets is critical for effectively testing object-
oriented programs, especially libraries. Their experimental results show that symbolic types
are significantly more effective than existing symbolic execution techniques in achieving test
coverage and finding bugs and security vulnerabilities in OpenJDK.

PAPER GOALS :

This paper addresses (subtype or dynamic) polymorphism in symbolic execution. When
analysing object-oriented programs symbolically, it is critical to explore different targets of
method invocations, i.e., the targets that depend on the types of their receiver objects.
Mutating a constraint related to the type of a receiver object should therefore generate a
set of new constraints encompassing all other possible types of the receiver object. However,
such mutation has been overlooked. Existing symbolic execution techniques for object-
oriented programs do not generate such polymorphism-related constraints. While being
capable of exploring different targets of conditional branches, where constraints on symbolic
values are introduced for different branch conditions, the state-of-the-art techniques [16, 22, 54]
cannot explore different targets of method invocations, which depend on the types rather
than values of receiver objects. This limitation has largely restricted their ability to generate
test cases effectively for object-oriented programs.
To address this problem, They propose object-oriented symbolic execution (OO-SE), which
leverages classic symbolic execution and extends it with symbolic types. A symbolic type
represents all possible concrete types of a receiver object, thereby enabling its mutation in
test case generation. During symbolic execution, constraints on symbolic types are added to
the path condition and type- related operations such as casting, type testing (e.g., instanceof
in JavaTM) and virtual invocation are tracked symbolically. As a result, They have enriched
the expressiveness of path conditions, by considering not only the constraints on symbolic
values as before but also the constraints on symbolic types as in this work.

Figure 1 depicts a test case generation framework, built by ex- ploiting symbolic types, for
object-oriented libraries. Our approach applies also to applications when their individual
parts are sepa- rately tested. When testing an object-oriented library, the following three
steps are repeated until a certain goal has been met:

1. First, a concrete test input for an API method M = c m(t x) {. . .} in the library is
   generated and a harness program is synthesized to invoke the API with the test input.
   When testing method m, symbolic inputs can be assumed for its formal parameters
   as well as the formal parameters of the constructor called on its corresponding
   receiver object. Let x be such a formal parameter. Two cases are considered. If x is
   of a primitive type, then x has a symbolic (input) value as before. If x is of a reference
   type, then x has a symbolic (input) type T, or more precisely, the (input) object o
   referenced by x has a symbolic type T. Note that o may be of any subtype of the
   declared type d of x, denoted T <: d. The authors write l_T -> o_c to represent the fact that o,
   which is constructed with a concrete type c, has a symbolic type T. Here, l_T is a
   reference to o, such that x = l_T. Thus, the program is executed with object o being of
   the concrete type c and with constraints on its symbolic type T being collected
   whenever the reference l_T to o is used in type-related operations, such as type casting,
   instanceof, and method invocations.
2. Second, the program is executed symbolically by keeping track of a path condition,
   which comprises constraints on not only symbolic values but also symbolic types at
   run time.
3. Third, the path condition is mutated and passed to a constraint solver to generate a
   new test input.
The authors have developed a test case generation framework for object-oriented libraries based
on OO-SE. To test an API method in a library, this framework automatically generates different
test cases and their associated harness programs for the API.
They have implemented the test case generation framework in Java. Experimental results
on OpenJDK show that symbolic types are not only effective in improving test coverage but
also essential in finding bugs and security vulnerabilities.

EXAMPLE:

Suppose we would like to test the API InterfaceAddress.equals(). To trigger the error, it is
essential to change the type of the object referenced by the field address of its receiver object
and the type of the object referenced by its formal parameter obj.

Figure 3 shows the symbolic execution tree with different execution paths for
InterfaceAddress.equals(). At line 1, if obj instanceof InterfaceAddress fails, the execution
returns at line 2. Otherwise, the execution continues by taking a different path from line 3.
When invoking address.equals(cmp.address) at line 4, one of three target methods may
execute: InetAddress.equals(), Inet4Address.equals() and Inet6Address.equals(),
denoted m1, m2, and m3, respectively. Note that m1, i.e., InetAddress.equals(), always
returns false (line 10). Thus, line 7, where the null pointer error occurs, can be reached only
if either m2 or m3 is invoked, requiring the object referenced by field address to be of type
Inet4Address or Inet6Address.

Figures 2 and 3

EVALUATION / RESULTS:

The objective of the evaluation is to demonstrate that the authors' approach (denoted OO-SE) is
superior to the state of the art (denoted CLASSIC-DSE) for testing object-oriented libraries.
CLASSIC-DSE stands for a classic dynamic symbolic execution technique [16, 22, 54] that
executes all virtual method invocations concretely. By exploring method invocation targets
with symbolic types systematically, OO-SE can outperform CLASSIC-DSE in terms of both test
coverage achieved and bug-finding ability demonstrated.

The authors compare CLASSIC-DSE and OO-SE using five packages from OpenJDK 7
Update 6, as shown in Table 2, where the packages are listed in ascending order of the
number of basic blocks. The five packages were selected because, in testing them,
the authors did not observe any path divergence, and both CLASSIC-DSE and OO-SE were
able to mutate all path conditions until termination within 24 hours. The other packages in
OpenJDK either take too long to make any progress in symbolic execution or frequently
encounter path divergences due to invocations of native methods. As a result, results on
these packages are not reported, because their coverage data vary across different
runs, and this non-determinism may lead to invalid conclusions.

For each package evaluated, as listed in Table 2, the authors give the number of uncommented lines
of code as reported by the SLOCCount tool [61]. Blank, comment or whitespace-only lines
are not considered. In addition, they also give the number of basic blocks and the number of
virtual calls in each package. In java.beans, there is one reflective call to java.lang.reflect.Method.
invoke(), which is analyzed symbolically by OO-SE (as discussed in Section 4) but only
concretely by CLASSIC-DSE.

LOWER RUN TIME:

Figure 8 compares CLASSIC-DSE and OO-SE in terms of their efficiency per block of code. In
this experiment, the authors limit the depth of the symbolic reasoning to a fixed number of 20 basic
blocks, which provides a reasonable comparison between the two tools. The run time per
block varies little across the packages: 1-3.5 seconds per block. There are no significant
differences between CLASSIC-DSE and OO-SE. The high run time per block of code for
java.applet can be attributed to its relatively small size, such that a small overhead has a
relatively high impact. Otherwise, there appears to be a small tendency for the run time per
block to increase with the size of the package (i.e., the number of blocks in the package).

HIGHER COVERAGE

With symbolic types, OO-SE can automatically test OpenJDK more effectively than
CLASSIC-DSE. For the five packages tested, the coverage improvements range from 12.8% for
java.sql to 28.7% for java.rmi, with an average of 24.3%. In particular, OO-SE has significantly
improved on CLASSIC-DSE in testing java.rmi by lifting its coverage from 54.2% to 69.8%. For all
five packages except java.text, OO-SE is able to achieve a coverage of more than 60%. Some
blocks are not covered due to exception handling and conditional branches that do not
depend on symbolic inputs. In the current implementation, OO-SE does not generate
constraints for exceptions. Hence, most exception handling blocks are not covered. In
addition, OO-SE neither reasons symbolically about string operations nor represents
symbolically the values returned from native methods. As a result, just like CLASSIC-DSE, OO-SE
cannot effectively explore different targets of a conditional branch that depends on
values with no symbolic representations.

MORE TEST CASES:

Figure 10 compares CLASSIC-DSE and OO-SE in terms of the number of path conditions, i.e.,
test cases mutated from another one, per block of code. In this experiment, the authors also limit
the depth of the symbolic reasoning to a fixed number of 20 blocks. Note that the first test
input used for bootstrapping the symbolic execution of each public API method is not
counted. For java.sql and java.text, OO-SE has produced a large number of mutated path
conditions, i.e., new test cases. For java.applet, CLASSIC-DSE, without exploiting symbolic
types, has failed to produce any new test case. In the case of java.sql, java.rmi and java.text,
OO-SE has produced significantly more test cases than CLASSIC-DSE. With java.applet ignored,
OO-SE has increased the number of mutated test cases per block produced by CLASSIC-DSE
by 84.14%. However, this increase does not seem to have any correlation with the number
of virtual invocations available in these packages.

LIMITATIONS:

At this stage, the authors have not incorporated performance-oriented
optimizations into their tool to improve its efficiency. For example, sophisticated path selection
algorithms can be used to achieve a desired level of test coverage more quickly. In addition, adding
method identity constraints to the path condition ([CALL]) enriches the runtime information
with calling contexts, bringing in new optimization opportunities. How to better make use of
such calling context information to guide path selection, especially call target selection, is
worth separate investigation.

Presently, the harness synthesizer instantiates an object of a given class type by invoking the
public constructors in the class. It associates a symbolic representation with a field of a
reference type in an instantiated object if the field is set in the constructor or can be set by a
setter method. Sometimes, a field cannot be set directly, requiring its owner object to be in
a specific state. To further improve test coverage, different method invocation sequences
may need to be synthesized to instantiate an object with different states.

CITATIONS:

2 citations:
1. Sora Bae, Joonyoung Park, Sukyoung Ryu. Partition-based coverage metrics and type-
   guided search in concolic testing for JavaScript applications. Proceedings of the 5th
   International FME Workshop on Formal Methods in Software Engineering, May 20-28,
   2017, Buenos Aires, Argentina.

2. Neville Grech, George Fourtounis, Adrian Francalanza, Yannis Smaragdakis. Heaps
   don't lie: countering unsoundness with heap snapshots. Proceedings of the ACM on
   Programming Languages, v.1 n.OOPSLA, October 2017.
