Comparison of Different Types of Parser and Parsing Techniques
I. INTRODUCTION
A tool that helps us get an overview of a software system
must somehow translate that system into a model. This
translation is the challenging step: someone must write a
parser that translates the software system into the model
the tool supports. The maintainers of such tools must therefore
provide a parser for every programming language they want
to support. The number of languages is not the only
problem, because a language itself also evolves. A parser that
works with a specific version or dialect may no longer work
with the next version.
A parser does two things while processing its input:
1. Split the input into tokens.
2. Find the hierarchical structure of the input.
Figure 2: Overview of the parsing process.
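The two steps can be illustrated with a small sketch. It is only an illustration under stated assumptions: the arithmetic grammar, the function names tokenize and parse, and the nested-tuple tree format are hypothetical and do not come from any tool discussed in this paper.

```python
import re


def tokenize(text):
    """Step 1: split the input into tokens (integers and one-character symbols)."""
    return re.findall(r"\d+|\S", text)


def parse(tokens):
    """Step 2: recover the hierarchical structure as nested tuples.

    Hypothetical grammar, used only for this sketch:
        expr   := term (('+' | '-') term)*
        term   := factor (('*' | '/') factor)*
        factor := NUMBER | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        token = advance()
        if token is not None and token.isdecimal():
            return int(token)
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


tokens = tokenize("1 + 2 * (3 - 4)")
print(tokens)         # flat token list produced by step 1
print(parse(tokens))  # nested tree produced by step 2
```

For the input above, step 1 yields a flat list of nine tokens, and step 2 yields the tree ('+', 1, ('*', 2, ('-', 3, 4))), in which operator precedence and parentheses are made explicit by nesting.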
a) Fuzzy Parsing
Most reengineering frameworks use a form of fuzzy parsing in
order to support more programming languages or more
dialects of the same programming language. The goal of a
fuzzy parser is the extraction of a partial source code model
based on a syntactical analysis. The key idea of fuzzy parsing
is that certain terminals serve as anchors. The parser skips all
input until it finds an anchor terminal and then attempts
context-free analysis using a production that starts with the
anchor terminal it found.
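A minimal sketch of this idea follows. The anchor terminals ('class' and 'function'), the two productions, the sample input and the name fuzzy_parse are all assumptions made for illustration; a real reengineering framework would use anchors and productions suited to its target languages.

```python
import re

# Hypothetical anchor terminals and the productions that start with them:
#   class_decl := 'class' IDENT
#   func_decl  := 'function' IDENT '('
ANCHORS = {"class", "function"}
IDENT = re.compile(r"[A-Za-z_]\w*")

# Hypothetical input containing constructs the parser does not understand.
SAMPLE = """
// header text in a dialect the parser does not know
class Shape {
    function area() { return 0; }
    @@ unknown construct @@
}
function main(args) { x = class; }
"""


def fuzzy_parse(source):
    """Return a partial model: a list of (kind, name) pairs found in source."""
    tokens = re.findall(r"[A-Za-z_]\w*|\S", source)
    model = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token not in ANCHORS:
            i += 1  # not an anchor: skip this input without analysing it
            continue
        # Anchor found: attempt the production that starts with it.
        name = tokens[i + 1] if i + 1 < len(tokens) else ""
        if not IDENT.fullmatch(name):
            i += 1  # production failed: go back to skipping
            continue
        if token == "class":
            model.append(("class", name))
            i += 2
        elif i + 2 < len(tokens) and tokens[i + 2] == "(":
            model.append(("function", name))
            i += 3
        else:
            i += 1  # 'function' not followed by IDENT '(': keep skipping
    return model


print(fuzzy_parse(SAMPLE))
```

The sketch extracts the class Shape and the functions area and main while ignoring the unknown construct and the stray 'class' keyword that is not followed by an identifier. The model is partial by design: everything between anchors is discarded.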
Figure 1: The way from source code to a model.

Manuscript received February 20, 2015.
Mr. Amitesh Saxena, Ph.D. Scholar, Pacific University, Udaipur
Mrs. Snehlata Kothari, H.O.D. IT Department, Pacific University, Udaipur

b) Island Grammars
With island grammars we get tolerant parsers. An island
grammar is a grammar that consists of detailed productions
describing certain constructs of interest (the islands) and
liberal productions that catch the remainder (the water). By
varying the amount and details in productions for the
constructs of interest, we can trade off accuracy,
completeness and development speed. There are some
different versions of island grammars known besides the one
that we just defined [MOON 01]. Leon Moonen speaks of the
following:
Lake grammar: When we start with a complete grammar of a
language and extend it with a number of liberal productions
(water), we get a lake grammar. Such a grammar is useful
when we want to allow arbitrary embedded code in the
program we want to process.
Islands with lakes: This is a mix of productions for islands
and water. We can specify nested constructs as islands with
lakes.
Lakes with islands: This is another mix of productions for
islands and water.

III. MARKUP LANGUAGES USED IN PARSING

A. Standard Generalized Markup Language (SGML)
SGML deals with the structural markup of electronic documents.
A basic SGML document consists of a document type declaration
(DOCTYPE) that names the DTD, one of several top-level elements
(otherwise known as tags or markups), paragraphs and text.
The top-level element should be a <book>, <chapter>,
<article>, or <sect1>, depending on the type of document you
are writing. We will be using <article> for our documents.
Here is an example of a simple SGML document.

<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
<article>
<sect1 id="introduction"><title>Hello world introduction</title>
<para>
Hello world!
</para>
</sect1>
</article>

Any license text placed in such a document must be commented
out using <!-- and -->. This is important; if you forget it, you
will get all kinds of errors when you run the file through the
SGML parser. The commented text is not visible in the built
document, because the parser treats it as a comment (which it is)
and drops it from the final parsed document.

B. Hyper Text Markup Language (HTML)
HTML is not a programming language, but rather a markup
language. If you already know XML, HTML will be easy to
learn. It is best learned in short, regular practice sessions of
15 minutes to an hour a day, with breaks to let the information
settle in, rather than in one sitting. HTML is a comparatively
young language, dating from the early 1990s.
HTML is a markup language for describing web documents
(web pages). HTML stands for Hyper Text Markup Language.
A markup language is a set of markup tags. HTML documents
are described by HTML tags, and each HTML tag describes
different document content.

C. Extensible Markup Language (XML)
Extensible Markup Language (XML) is a markup
language that defines a set of rules for encoding documents in
a format that is both human-readable and machine-readable.
It is defined in the XML 1.0 Specification produced by
the W3C, and several other related specifications, all
free open standards. The design goals of XML emphasize
simplicity, generality, and usability over the Internet. It is a
textual data format with strong support via Unicode for
different human languages. Although the design of XML
focuses on documents, it is widely used for the representation
of arbitrary data structures, for example in web services.
Many application programming interfaces (APIs) have been
developed to aid software developers with processing XML
data, and several schema systems exist to aid in the definition
of XML-based languages.

XML declaration
XML documents may begin by declaring some information
about themselves, as in the following example:
<?xml version="1.0" encoding="UTF-8"?>

XML is used for structuring data. Structured data
includes things like spreadsheets, address books,
configuration parameters, financial transactions, and
technical drawings. XML is a set of rules (you may also think
of them as guidelines or conventions) for designing text
formats that let you structure your data. XML is not a
programming language, and you don't have to be a
programmer to use it or learn it. XML makes it easy for a
computer to generate data, read data, and ensure that the data
structure is unambiguous. XML avoids common pitfalls in
language design: it is extensible, platform-independent, and it
supports internationalization and localization. XML is
fully Unicode-compliant.
XML has come into common use for the interchange of data
over the Internet. IETF RFC 7303 gives rules for the
construction of Internet Media Types for use when sending
XML. It also defines the media
types application/xml and text/xml, which say only that the
data are in XML, and nothing about the data's semantics. The
use of text/xml has been criticized as a potential source of
encoding problems and it has been suggested that it should be
deprecated. RFC 7303 also recommends that XML-based
languages be given media types ending in +xml; for
example image/svg+xml for SVG. Further guidelines for the
use of XML in a networked context may be found in RFC
3470, also known as IETF BCP 70, a document which covers
many aspects of designing and deploying an XML-based
language.

XML parsers
Oracle provides XML parsers for Java, C, C++, and PL/SQL.
This paper considers the parser for Java only. Each of these
parsers is a standalone XML component that parses an XML
document (and possibly also a standalone document type
definition (DTD) or XML Schema) so that it can be
processed by your application. Applications that use the Java
parser are accordingly written in Java.
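To make concrete what an XML parser hands over to an application, the following sketch parses a small document and reads data from the resulting tree. It is an illustration only and does not use the Oracle parser: it assumes Python's standard-library xml.etree.ElementTree module, and the address-book document and the names read_contacts and is_well_formed are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical address-book document, used only for illustration.
DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8"?>
<addressbook>
  <contact id="1"><name>Ada</name><city>London</city></contact>
  <contact id="2"><name>Linus</name><city>Helsinki</city></contact>
</addressbook>"""


def read_contacts(xml_bytes):
    """Parse the document and return one dictionary per <contact> element."""
    root = ET.fromstring(xml_bytes)  # raises ET.ParseError if not well-formed
    return [
        {
            "id": contact.get("id"),
            "name": contact.findtext("name"),
            "city": contact.findtext("city"),
        }
        for contact in root.findall("contact")
    ]


def is_well_formed(xml_bytes):
    """Return True if the parser accepts the input, False otherwise."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


for entry in read_contacts(DOCUMENT):
    print(entry)
print(is_well_formed(b"<a><b></a>"))  # mismatched tags are rejected
```

The application never touches the raw text: it works on the element tree that the parser builds, and a document that is not well-formed is rejected before any processing takes place.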
International Journal of Engineering and Technical Research (IJETR)
ISSN: 2321-0869, Volume-3, Issue-2, February 2015
A parser is one of the components in an interpreter or compiler.
It checks for correct syntax and builds a data structure (often
some kind of parse tree, abstract syntax tree or other
hierarchical structure) implicit in the input tokens.
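As one concrete instance of such a component, the sketch below uses the parser of the Python interpreter itself, which is exposed through the standard-library ast module. The choice of Python and the helper name build_tree are assumptions made for illustration; the same two duties (syntax checking and tree building) apply to the parsers of other compilers and interpreters.

```python
import ast


def build_tree(expression):
    """Check the syntax of an expression and return its AST, or None if invalid."""
    try:
        return ast.parse(expression, mode="eval")
    except SyntaxError:
        return None


tree = build_tree("1 + 2 * 3")
print(ast.dump(tree.body))    # textual view of the abstract syntax tree
print(build_tree("(1 + 2"))   # syntactically invalid input gives None
```

In the tree for 1 + 2 * 3, the multiplication appears as the right child of the addition, so the hierarchy that is only implicit in the token sequence becomes explicit in the data structure.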
V. CONCLUSION
We are studying parsing techniques in order to find the best
technique for different operating systems. We have presented the
working model of our research. We use descriptive statistics
along with a 1x3 factorial ANOVA technique, and for the
comparison the mean, SD, z-test and t-test have been computed for
data analysis.
REFERENCES