XML DOM and SAX Parsers


Introduction to parsers

- The word "parser" comes from compilers. In a compiler, the parser is the module that reads the program text and interprets its structure according to the grammar of the programming language.
- In XML, a parser is a software component that sits between the application and the XML files.
- It reads a text-formatted XML file or stream and converts it into a form the application can manipulate: a document tree or a series of events, as described below.

Well-formedness and validity

- Well-formed documents respect XML's syntactic rules (for example, a single root element and properly nested, matching tags).
- Valid documents not only respect the syntactic rules but also conform to a structure described in a DTD.

Validating vs. non-validating parsers

- Both kinds of parser enforce the syntactic (well-formedness) rules.
- Only validating parsers know how to validate documents against their DTDs.

Tree-based parsers

- These map an XML document into an internal tree structure and then allow the application to navigate that tree.
- Ideal for browsers, editors, and XSL processors.

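The following is a minimal sketch of the tree-based approach, written for illustration only. `buildTree` and the `{ name, children, text }` node shape are hypothetical. The tokenizer handles only a tiny subset of XML: start and end tags without attributes, plus text. It shows the key idea: the parser builds the whole tree first, and the application navigates it afterwards.

```javascript
// Toy tree builder (hypothetical, illustration only). It handles <tag>, </tag>,
// an XML declaration and plain text. It does NOT handle attributes, comments,
// CDATA, entities or self-closing tags, so it is not a real XML parser.
function buildTree(xml) {
  const root = { name: "#document", children: [], text: "" };
  const stack = [root]; // currently open elements, innermost last
  // A token is an XML declaration, a start or end tag, or a run of text.
  const token = /<\?[^?]*\?>|<(\/?)([A-Za-z_][\w.-]*)>|([^<]+)/g;
  let m;
  while ((m = token.exec(xml)) !== null) {
    const top = stack[stack.length - 1];
    if (m[2] !== undefined) {
      if (m[1] === "/") {
        // End tag: it must close the element that is currently open.
        if (top.name !== m[2]) throw new Error("mismatched </" + m[2] + ">");
        stack.pop();
      } else {
        // Start tag: create a child node and make it the open element.
        const el = { name: m[2], children: [], text: "" };
        top.children.push(el);
        stack.push(el);
      }
    } else if (m[3] !== undefined && m[3].trim() !== "") {
      top.text += m[3].trim(); // keep only non-whitespace text
    }
  }
  if (stack.length !== 1) throw new Error("unclosed element");
  return root.children[0]; // the document (root) element
}
```

Once the tree exists, the application can navigate it freely. For example, `buildTree(xml).children.map(p => p.children[0].text)` on the product list used later in this document returns the four product names.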
Event-based parsers

- An event-based API reports parsing events (such as the start and end of elements) directly to the application through callbacks.
- The application implements handlers to deal with the different events.

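Event-based vs. tree-based parsers

- Tree-based parsers are generally used for small documents, because the whole tree has to be held in memory.
- Event-based parsers are generally used for large documents, because they do not need to keep the whole document in memory.
- Tree-based parsers are generally easier to program with.
- Event-based parsers are more complex to use: the programmer has to keep track of the parsing state (for example, which element is currently open) inside the event handlers.
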
What is DOM?

- The Document Object Model (DOM) is an application programming interface (API) for HTML and XML documents.
- It defines the logical structure of documents and the way a document is accessed and manipulated.

Properties of DOM

- Programmers can build documents, navigate their structure, and add, modify, or delete elements and content.
- It provides a standard programming interface that can be used in a wide variety of environments and applications.
- Structural isomorphism: any two DOM implementations that represent the same document produce the same structure model.

DOM identifies

- The interfaces and objects used to represent and manipulate a document.
- The semantics of these interfaces and objects, including both behavior and attributes.
- The relationships and collaborations among these interfaces and objects.

What DOM is not

- The Document Object Model is not a binary specification.
- The Document Object Model is not a way of persisting objects to XML or HTML.
- The Document Object Model does not define "the true inner semantics" of XML or HTML.
- The Document Object Model is not a set of data structures; it is an object model that specifies interfaces.
- The Document Object Model is not a competitor to the Component Object Model (COM).

DOM at work

<?xml version="1.0"?>
<products>
  <product>
    <name>XML Editor</name>
    <price>499.00</price>
  </product>
  <product>
    <name>DTD Editor</name>
    <price>199.00</price>
  </product>
  <product>
    <name>XML Book</name>
    <price>19.99</price>
  </product>
  <product>
    <name>XML Training</name>
    <price>699.00</price>
  </product>
</products>

DOM levels: Level 0

- DOM Level 0 is a mix of the document functionality of Netscape Navigator 3.0 and Microsoft Internet Explorer 3.0.

DOM levels: DOM Level 1

- It contains functionality for document navigation and manipulation, i.e. functions for creating, deleting, and changing elements and their attributes.

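The creation and deletion functions mentioned above correspond to DOM Level 1 Core methods such as `createElement`, `createTextNode`, `appendChild`, `getElementsByTagName` and `removeChild`. Below is a sketch that edits the product list shown earlier. `addProduct` and `removeProduct` are hypothetical helpers. `doc` is assumed to be an already parsed DOM `Document`, for instance one returned by a browser's `DOMParser`.

```javascript
// Hypothetical helpers for the <products> document. `doc` is assumed to be a
// parsed DOM Document; only DOM Level 1 Core methods are used.

// Appends <product><name>..</name><price>..</price></product> to the root.
function addProduct(doc, name, price) {
  const product = doc.createElement("product");
  for (const [tag, value] of [["name", name], ["price", price]]) {
    const el = doc.createElement(tag);
    el.appendChild(doc.createTextNode(value)); // element text is a child Text node
    product.appendChild(el);
  }
  return doc.documentElement.appendChild(product); // returns the appended node
}

// Removes the first <product> whose <name> text equals `name`.
// Returns the removed element, or null if there is no such product.
function removeProduct(doc, name) {
  const products = doc.getElementsByTagName("product");
  for (let i = 0; i < products.length; i++) {
    const nameEl = products[i].getElementsByTagName("name")[0];
    if (nameEl && nameEl.firstChild && nameEl.firstChild.data === name) {
      return products[i].parentNode.removeChild(products[i]);
    }
  }
  return null;
}
```

The functions return as soon as they have modified the tree. This matters because `getElementsByTagName` returns a live list in browsers, and iterating over it after a removal would shift the indexes.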
DOM Level 1 limitations

DOM Level 1 does not provide:
- a structure model for the DTD's internal and external subsets;
- validation against a schema;
- control for rendering documents via style sheets;
- access control;
- thread safety;
- events.

DOM levels: DOM Level 2

- Adds a style sheet object model and defines functionality for manipulating the style information attached to a document.
- Enables traversal of the document.
- Defines an event model.
- Provides support for XML namespaces.

DOM levels: DOM Level 3

- Document loading and saving, as well as content models (such as DTDs and schemas) with document validation support.
- Document views and formatting, key events, and event groups.

An Application of DOM

<HTML>
  <HEAD>
    <TITLE>Currency Conversion</TITLE>
    <SCRIPT LANGUAGE="JavaScript" SRC="conversion.js"></SCRIPT>
  </HEAD>
  <BODY>
    <CENTER>
      <FORM ID="controls">
        File: <INPUT TYPE="TEXT" NAME="fname" VALUE="prices.xml">
        Rate: <INPUT TYPE="TEXT" NAME="rate" VALUE="0.95274" SIZE="4"><BR>
        <INPUT TYPE="BUTTON" VALUE="Convert" ONCLICK="convert(controls,xml)">
        <INPUT TYPE="BUTTON" VALUE="Clear" ONCLICK="output.value=''"><BR>
        <TEXTAREA NAME="output" ROWS="10" COLS="50" READONLY> </TEXTAREA>
      </FORM>
      <xml id="xml"></xml>
    </CENTER>
  </BODY>
</HTML>

- <xml id="xml"></xml> defines an XML island.
- XML islands, an Internet Explorer feature, are a mechanism for embedding XML in HTML documents.
- In this case, the XML island is used to access Internet Explorer's XML parser; the price list is loaded into the island.

- The "Convert" button in the HTML file calls the JavaScript function convert(), which is the conversion routine.
- convert() accepts two parameters: the form and the XML island.

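// conversion.js, loaded by the page through
// <SCRIPT LANGUAGE="JavaScript" SRC="conversion.js"></SCRIPT>

// Called by the "Convert" button with the form and the XML island.
function convert(form, xmldocument)
{
  var fname = form.fname.value,
      output = form.output,
      rate = form.rate.value;
  output.value = "";
  // "doc" rather than "document" avoids shadowing the page's global document.
  var doc = parse(fname, xmldocument);
  if (doc == null)
    return;                       // parse error already reported by parse()
  searchPrice(doc.documentElement, output, rate);
}

// Loads the file synchronously into the island (Internet Explorer API) and
// returns the Document, or null if the XML could not be parsed.
function parse(uri, xmldocument)
{
  xmldocument.async = false;
  xmldocument.load(uri);
  if (xmldocument.parseError.errorCode != 0)
  {
    alert(xmldocument.parseError.reason);
    return null;
  }
  return xmldocument;
}

// Visits every element recursively and converts the text of each <price>.
function searchPrice(node, output, rate)
{
  if (node.nodeType == 1)         // 1 = element node
  {
    if (node.nodeName == "price")
      output.value += (getText(node) * rate).toFixed(2) + "\n";
    var children = node.childNodes;
    for (var i = 0; i < children.length; i++)
      searchPrice(children.item(i), output, rate);
  }
}

// Returns the text of an element whose first child is a text node.
function getText(node)
{
  return node.firstChild.data;
}
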
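XML data islands and the synchronous `load()` method are Internet Explorer features that current browsers do not support. Below is a sketch of the same recursive walk written against the standard DOM, assuming a browser `DOMParser`. `convertPrices` is a hypothetical name. Instead of writing to the textarea, it returns the converted prices rounded to two decimals.

```javascript
// Standard-DOM version of the price walk (hypothetical name). `node` is any DOM
// Node; only nodeType, nodeName, childNodes and textContent are used.
function convertPrices(node, rate, out = []) {
  if (node.nodeType === 1) { // 1 === Node.ELEMENT_NODE
    if (node.nodeName === "price") {
      out.push((parseFloat(node.textContent) * rate).toFixed(2));
    }
    // Recurse into every child, as searchPrice() does.
    for (let i = 0; i < node.childNodes.length; i++) {
      convertPrices(node.childNodes[i], rate, out);
    }
  }
  return out;
}

// Browser usage (assumed): parse the XML text, then walk from the root element.
// const doc = new DOMParser().parseFromString(xmlText, "application/xml");
// if (doc.getElementsByTagName("parsererror").length > 0) { /* report error */ }
// console.log(convertPrices(doc.documentElement, 0.95274).join("\n"));
```

Returning an array keeps the tree walk separate from the user interface, so the same function can fill a textarea, a table, or a test.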
The main properties of a DOM Node object are:
- nodeType is a code representing the type of the node (for example, 1 for an element and 3 for a text node).
- parentNode is the parent (if any) of the current Node object.
- childNodes is the list of children of the current Node object.
- firstChild is the node's first child.
- lastChild is the node's last child.
- previousSibling is the node immediately preceding the current one.
- nextSibling is the node immediately following the current one.
- attributes is the list of attributes if the current node is an element; for other node types it is null.

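The `firstChild` and `nextSibling` properties alone are enough to visit every node. Below is a sketch of a depth-first traversal in document order. `walk` and its `visit` callback are hypothetical names. `doc` in the usage comment is assumed to be a parsed DOM `Document`.

```javascript
// Depth-first, document-order traversal using only firstChild and nextSibling.
// Calls visit(node, depth) for the start node and every descendant.
function walk(node, visit, depth = 0) {
  visit(node, depth);
  for (let child = node.firstChild; child !== null; child = child.nextSibling) {
    walk(child, visit, depth + 1);
  }
}

// Example (browser, assumed): print an indented outline of element names.
// walk(doc.documentElement, (n, d) => {
//   if (n.nodeType === 1) console.log("  ".repeat(d) + n.nodeName);
// });
```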
- The parse() function loads the price list into the XML island and returns its Document object.
- The function searchPrice() tests whether the current node is an element and, if it is a <price> element, appends the converted price to the output.
- searchPrice() visits each node by recursively calling itself for all children of the current node.

What is SAX?

- SAX (the Simple API for XML) is an event-based interface between an XML parser and an application.
- The parser tells the application what is in the document by notifying it of a stream of parsing events.
- The application then processes those events to act on the data.

SAX History

- SAX 1.0 was released on May 11, 1998.
- SAX is a common, event-based API for parsing XML documents, developed as a collaborative project of the members of the XML-DEV mailing list under the leadership of David Megginson.

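Why SAX?

- For applications that are not XML-centric, an object-based interface is less appealing: such applications usually keep their data in their own structures, so building a complete XML object tree first is extra work.
- Efficiency: SAX is lower-level than object-based interfaces.
- An event-based interface consumes fewer resources than an object-based one, because it does not build an in-memory tree.
- With an event-based interface, the application can start processing the document while the parser is still reading it.
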
Limitations of SAX

- With SAX, it is not possible to navigate back and forth through the document as you can with a DOM.
- The application must explicitly buffer the events (and text) it is interested in.

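SAX parsers may deliver the text of one element in several `characters` calls. A handler that wants the content of, say, every `<price>` element therefore has to keep its own state and buffer. The sketch below shows such a handler. Its method names mirror the SAX `ContentHandler` callbacks (`startElement`, `endElement`, `characters`), but the simplified one-argument signatures and the `PriceCollector` class are assumptions. The events are fed in by hand rather than produced by a real parser.

```javascript
// A SAX-style handler (hypothetical) that buffers the text of each <price>.
class PriceCollector {
  constructor() {
    this.inPrice = false; // are we currently inside a <price> element?
    this.buffer = "";     // text collected so far for the current <price>
    this.prices = [];     // completed values, in document order
  }
  startElement(name) {
    if (name === "price") { this.inPrice = true; this.buffer = ""; }
  }
  characters(text) {
    if (this.inPrice) this.buffer += text; // text may arrive in several chunks
  }
  endElement(name) {
    if (name === "price") {
      this.prices.push(parseFloat(this.buffer));
      this.inPrice = false;
    }
  }
}

// Feed a hand-written event stream, as a parser would, into the handler.
const handler = new PriceCollector();
const events = [
  ["startElement", "product"], ["startElement", "price"],
  ["characters", "499."], ["characters", "00"], // one value split in two events
  ["endElement", "price"], ["endElement", "product"],
];
for (const [type, arg] of events) handler[type](arg);
console.log(handler.prices); // [ 499 ]
```

With a DOM, the same value could be read directly from the tree at any time. With SAX, the `inPrice` flag and `buffer` are the application's responsibility.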
SAX API

- Parser events are similar to user-interface events such as ONCLICK (in a browser) or AWT events (in Java).
- Events alert the application that something happened, so that the application can react.

SAX reports events for:
- element opening tags;
- element closing tags;
- content of elements;
- entities;
- parsing errors.

SAX Example

<?xml version="1.0"?>
<doc>
  <para>Hello, world!</para>
</doc>

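The parser reports this document to the application as the following sequence of events (whitespace between elements omitted):
- start document
- start element: doc
- start element: para
- characters: Hello, world!
- end element: para
- end element: doc
- end document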
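To make the event sequence concrete, here is a toy event-based parser written for illustration only; it is not a SAX implementation. `parseEvents` and its simplified callback signatures are assumptions modelled loosely on SAX's `ContentHandler` (`startDocument`, `startElement`, `characters`, `endElement`, `endDocument`). Like the list above, it skips whitespace-only text. A real SAX parser would report that whitespace as well, through `characters` or `ignorableWhitespace`.

```javascript
// Toy event-based parser (hypothetical) for a tiny XML subset: no attributes,
// comments, CDATA, entities or self-closing tags. It reports what it finds
// through callbacks on `handler`, in the spirit of SAX.
function parseEvents(xml, handler) {
  const token = /<\?[^?]*\?>|<(\/?)([A-Za-z_][\w.-]*)>|([^<]+)/g;
  handler.startDocument();
  let m;
  while ((m = token.exec(xml)) !== null) {
    if (m[2] !== undefined) {
      if (m[1] === "/") handler.endElement(m[2]);
      else handler.startElement(m[2]);
    } else if (m[3] !== undefined && m[3].trim() !== "") {
      handler.characters(m[3].trim()); // whitespace-only text is skipped
    }
  }
  handler.endDocument();
}

// Record each event in the same wording as the list above.
const log = [];
parseEvents('<?xml version="1.0"?>\n<doc>\n<para>Hello, world!</para>\n</doc>', {
  startDocument: () => log.push("start document"),
  startElement: (name) => log.push("start element: " + name),
  characters: (text) => log.push("characters: " + text),
  endElement: (name) => log.push("end element: " + name),
  endDocument: () => log.push("end document"),
});
console.log(log.join("\n")); // prints the seven events listed above, one per line
```

The parser never builds a tree: each callback fires as soon as the corresponding piece of input has been read, and the handler decides what, if anything, to keep.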
