Why Is XML So Important?


What is XML?

"XML is a cross-platform, software- and hardware-independent tool for transmitting
information"
XML is a W3C Recommendation. It stands for Extensible Markup Language. It is a markup
language, much like HTML, used to describe data. In XML, tags are not predefined. Users
define their own tags, and can describe the document structure with a Document Type
Definition (DTD) or an XML Schema. Hence XML is self-descriptive. There is nothing special
about XML: it is just plain text with the addition of some XML tags enclosed in angle brackets,
so an XML document can be read in any simple text editor.

Why Is XML So Important?


There are a number of reasons that contribute to XML's increasing acceptance. A few of them
are:
Plain Text
In XML it is easy to create and edit files with anything from a standard text editor to a visual
development environment. XML also provides scalability for anything from small configuration
files to a company-wide data repository.
Data Identification
The markup tags in XML documents identify the information and break the data up into parts.
For example, a search program can pick out the messages sent to particular people without
reading the rest of each message. Because the different parts of the information are identified,
different applications can use them in different ways.
Stylability
When display matters, the stylesheet standard, XSL (an advanced feature of XML), lets you
dictate how to portray the data instead of relying on conventional designs (as with HTML).
XML itself is style-free, so different stylesheets can produce output in PostScript, TeX, PDF,
or some new format that hasn't even been invented yet. A user can use a single XML document
to display data in diverse formats like
• a plain text file
• an XHTML file
• a WML (Wireless Markup Language) document suitable for display on a PDA
• an Adobe PDF document suitable for hard copy
• a VML (Voice Markup Language) dialog for a voicemail information system
• an SVG (Scalable Vector Graphic) document that draws pictures of thermometers and water
containers
Universally Processed
An XML file must be well-formed, that is, it must follow the XML syntax rules; otherwise the
XML parser won't be able to read the data. It can additionally be valid, which means that it
abides by the restrictions of a DTD or a Schema. XML is a vendor-neutral standard, so a user
can choose among several XML parsers to process XML data.
Hierarchical Approach
XML documents benefit from their hierarchical structure. Hierarchical document structures
are, in general, faster to access. They are also easier to rearrange, because each piece is
delimited. This makes XML files easy to modify and maintain.
Inline Reusability
XML documents can be composed of separate entities. XML entities can be included "in line" in
an XML document, and the included sections look like a normal part of the document. A user
can single-source a section so that an edit to it is reflected everywhere the section is used, and
yet the composed document looks like a one-piece document.
How Can You Use XML?
A Few Applications of XML
Although countless applications use XML, here are a few examples of applications that make
use of this technology.
Refined search results - With XML-specific tags, search engines can give users more refined
search results. A search engine seeks the term in the tags, rather than the entire document, giving
the user more precise results.
EDI Transactions - XML has made electronic data interchange (EDI) transactions accessible to
a broader set of users. XML allows data to be exchanged, regardless of the computing systems
or accounting applications being used.
Cell Phones - XML data is sent to some cell phones, where the phone software formats it, as
specified by the software designer, to display text and images and even play sounds.
File Converters - Many applications have been written to convert existing documents into the
XML standard. An example is a PDF to XML converter.
VoiceXML - Converts XML documents into an audio format so that a user can listen to an XML
document.
...and many more.
In the 1970s, Charles Goldfarb, Ed Mosher and Ray Lorie invented GML at IBM. GML
was a way of marking up technical documents with structural tags. The initials
stood for Goldfarb, Mosher and Lorie.
Goldfarb coined the term "markup language" to make better use of the initials, and GML later
grew into the Standard Generalized Markup Language (SGML).
In 1986, SGML was adopted by the ISO as an international standard for defining markup
languages.
SGML is just a specification for defining markup languages. It is the mother of markup
languages such as HTML, XML, XHTML and WML. It was used to create other languages,
including HTML, which is very popular for its use on the web.
HTML was created by Tim Berners-Lee in 1991.
SGML is very effective but complex, whereas HTML is very easy but limited to a fixed set of
tags. This situation raised the need for a language that was as effective as SGML and at the
same time as simple as HTML. This gap has now been filled by XML.
The development of XML started in 1996, when Jon Bosak of Sun Microsystems and his team
began work on a project for remoulding SGML. They took the best of SGML and produced
something powerful, but much simpler to use.
The World Wide Web Consortium also contributed to the creation and development of the
standard for XML. The specification for XML was laid down in just 26 pages, compared to the
500+ page specification that defines SGML.

Comparing XML with HTML

The Main Differences Between XML and HTML


XML is designed to carry data.
XML describes and focuses on the data, while HTML displays data and focuses on how it
looks. HTML is all about displaying information; XML is all about describing information.
XML is a widely used tool for data manipulation and data transmission. It is used to store
data in files and to share data between diverse applications. Unlike an HTML document, where
data and display logic are held in the same file, an XML document holds only data. Different
presentation logic can be applied to display the XML data in the required format. This makes
XML well suited to exchanging information.
XML is Free and Extensible
XML tags are not predefined. The user must "invent" his own tags.
The tags used to mark up HTML documents and the structure of HTML documents are
predefined. The author of HTML documents can only use tags that are defined in the HTML
standard (like <p>, <h1>, etc.).
XML allows the user to define his own tags and document structure.
XML Tags are Case Sensitive
Unlike HTML, XML tags are case sensitive. In HTML the following will work, but in XML it is
incorrect:
<Message>This is incorrect</message>

In XML, opening and closing tags must therefore be written with the same case:
<message>This is correct</message>
XML Elements Must be Properly Nested
Improper nesting of tags makes no sense to XML.
In HTML some elements can be improperly nested within each other like this:
<b><i>This text is bold and italic</b></i>
In XML all elements must be properly nested within each other like this:
<b><i>This text is bold and italic</i></b>
XML is a Complement to HTML
It is important to understand that XML is not a replacement for HTML. In Web development it
is most likely that XML will be used to describe the data, while HTML will be used to format
and display the same data.

XML Syntax Rules

The syntax rules for XML are simple and strict, and they are easy to learn and use.
Because of this, creating software that can read and manipulate XML is relatively easy. XML
enables a user to create his own tags.
Note - XML documents use a self-describing and simple syntax

Let's develop a simple XML document :


<?xml version="1.0" encoding="ISO-8859-1"?>
<E-mail>
<To>Rohan</To>
<From>Amit</From>
<Subject>Surprise....</Subject>
<Body>Be ready for a cruise...i will catch u
tonight</Body>
</E-mail>

The XML declaration: always the first line in the XML document.
The XML declaration should always be included. It defines the XML version and the character
encoding used in the document. In this case the document conforms to the 1.0 specification of
XML and uses the ISO-8859-1 (Latin-1/West European) character set.
<?xml version="1.0" encoding="ISO-8859-1"?>

Root Element: The next line defines the first element of the document. It is called the root
element:
<E-mail>

Child Elements: The next 4 lines describe the four child elements of the root (To, From, Subject
and Body).
<To>Rohan</To>
<From>Amit</From>
<Subject>Surprise....</Subject>
<Body>Be ready for a cruise...i will catch u
tonight</Body>

And finally the last line defines the end of the root element .
</E-mail>

From this example you can tell that the XML document contains an e-mail to Rohan from
Amit. Don't you agree that XML is quite self-descriptive?
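As a rough illustration of how a program reads this document, the sketch below parses it with Python's standard xml.etree.ElementTree module. The choice of Python and the variable names (EMAIL_XML, fields) are assumptions made for this example; the tutorial itself does not prescribe a parser.

```python
import xml.etree.ElementTree as ET

# The sample document from above, passed as bytes so that the declared
# ISO-8859-1 encoding is honoured by the parser.
EMAIL_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<E-mail>
<To>Rohan</To>
<From>Amit</From>
<Subject>Surprise....</Subject>
<Body>Be ready for a cruise...i will catch u tonight</Body>
</E-mail>"""

root = ET.fromstring(EMAIL_XML)                      # the root element
fields = {child.tag: child.text for child in root}   # child elements, in document order

print(root.tag)
for tag, text in fields.items():
    print(f"{tag}: {text}")
```

The tag names alone are enough to tell the program which piece of text is the recipient, the sender, the subject and the body.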
Now let's discuss its syntax-rules which are very simple to learn.
All XML elements must have a closing tag
In XML all the elements must have a closing tag like this:
<To>Rohan</To>
<From>Amit</From>

XML tags are case sensitive


XML tags are case sensitive. The tag <To> is different from the tag <to>. Hence the opening
and closing tags must be written with the same case:
<To>Rohan</To>
<to>Rohan</to>

XML Elements Must be Properly Nested


Improper nesting of tags makes no sense to XML. In XML all elements must be properly
nested within each other like this in a logical order:
<b><i>Hi , how are you.....</i></b>

XML Documents Must Have a Root Element


All XML documents must contain a single tag pair to define a root element. All other elements
must be written within this root element. Any element can have sub-elements, called child
elements. Sub-elements must be correctly nested within their parent element:
<root>
<child>
<subchild>.....</subchild>
</child>
</root>

Always Quote the XML Attribute Values


In XML, attribute values must always be quoted. XML elements can have attributes in
name/value pairs, just like in HTML. Look at the two XML documents below.
The error in the first document is that the version and date attributes are not
quoted:

<?xml version=1.0 encoding="ISO-8859-1"?>
<E-mail date=12/11/2002/>

The second document is correct:
<?xml version="1.0" encoding="ISO-8859-1"?>
<E-mail date="12/11/2002"/>

With XML, White Space is Preserved


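With XML, the white space in a document is preserved; unlike HTML, runs of spaces are not
collapsed into a single space.
So a sentence with several spaces between words, like this: Hello     How are you, will be
passed on with its spaces intact:
Hello     How are you,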
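A small sketch of this behaviour, assuming Python's standard xml.etree.ElementTree parser (the element name Body is borrowed from the e-mail example; the variable names are made up for the illustration):

```python
import xml.etree.ElementTree as ET

message = ET.fromstring("<Body>Hello     How are you</Body>")

preserved = message.text                  # text exactly as written in the document
collapsed = " ".join(preserved.split())   # roughly what an HTML renderer would display

print(repr(preserved))
print(repr(collapsed))
```

The parser hands the five spaces through unchanged; collapsing them is a decision left to the application.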

Comments in XML
The syntax for writing comments in XML is similar to that of HTML.
<!-- This is a comment -->

XML elements can have attributes in the start tag, just like HTML. Attributes are used to provide
additional information about elements. Attributes often provide information that is not a part
of the data. In the example below, the file type is irrelevant to the data, but important to the
software that wants to manipulate the element:
<file type="gif">roseindia.gif</file>
Use the quote styles: "red" or 'red'
Attribute values must always be enclosed in quotes. Either single or double quotes can be
used (the element names item and company below are only placeholders), e.g.
<item color="red">

or like this:
<item color='red'>
Note: If the attribute value itself contains double quotes, enclose it in single quotes (or
escape the inner quotes as &quot;), like in this example:
<company name='Rose "India" Net'>

Note: If the attribute value itself contains single quotes, enclose it in double quotes (or
escape the inner quotes as &apos;), like in this example:
<company name="Rose 'India' Net">
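The quoting rules can be checked with a parser. The following sketch assumes Python's standard xml.etree.ElementTree module; the element name company and the variable names are placeholders chosen for this illustration.

```python
import xml.etree.ElementTree as ET

# Single quotes outside, double quotes inside (and the other way round).
single = ET.fromstring("""<company name='Rose "India" Net'/>""")
double = ET.fromstring("""<company name="Rose 'India' Net"/>""")
# Alternative: keep double quotes outside and escape the inner ones.
escaped = ET.fromstring("""<company name="Rose &quot;India&quot; Net"/>""")

# An unquoted attribute value is not well-formed and is rejected.
try:
    ET.fromstring("<company name=RoseIndia/>")
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False

print(single.get("name"), "|", double.get("name"), "|", unquoted_ok)
```

Whichever quoting style is used in the file, the program receives only the attribute value itself.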

Use of Elements vs. Attributes


If you start using attributes as containers for XML data, you might end up with documents
that are difficult both to maintain and to manipulate. So use elements to describe the data,
and use attributes only to provide information that is not relevant to the reader. Only
metadata (data about data) should be stored as attributes; the data itself should be stored
as elements.
This is not the way to use attributes:
<?xml version="1.0" encoding="ISO-8859-1"?>
<E-mail To="Rohan" From="Amit" Subject="Surprise....">
<Body>Be ready for a cruise...i will catch u tonight</Body>
</E-mail>
Try to avoid using attributes in situations like these.
Attribute values cause a number of problems. They are not easily expandable and cannot
contain multiple values. They are not easy to test against a Document Type Definition and
cannot describe structure. They are also more difficult to manipulate from program code.
Here is an example demonstrating how elements can be used instead of attributes. The following
three XML documents contain exactly the same information. A date attribute is used in the first,
a date element is used in the second, and an expanded date element is used in the third:
<?xml version="1.0" encoding="ISO-8859-1"?>
<E-mail date="15/05/07">
<To>Rohan</To>
<From>Amit</From>
<Subject>Surprise....</Subject>
<Body>Be ready for a cruise...i will catch u
tonight</Body>
</E-mail>
The first XML document contains the date as an attribute, which cannot be extended further.
Using the date as an element, as in the second document, makes it more flexible.
<?xml version="1.0" encoding="ISO-8859-1"?>
<E-mail>
<date>15/05/07</date>
<To>Rohan</To>
<From>Amit</From>
<Subject>Surprise....</Subject>
<Body>Be ready for a cruise...i will catch u
tonight</Body>
</E-mail>
The second XML document can be extended further:
<?xml version="1.0" encoding="ISO-8859-1"?>
<E-mail>
<date>
<day>15</day>
<month>05</month>
<year>07</year>
</date>
<To>Rohan</To>
<From>Amit</From>
<Subject>Surprise....</Subject>
<Body>Be ready for a cruise...i will catch u
tonight</Body>
</E-mail>

XML Validation

XML with correct syntax is Well Formed XML.
XML validated against a DTD or a Schema is Valid XML.
Well Formed XML Documents
A "Well Formed" XML document is a document that conforms to the XML syntax rules that
were described in the previous sections:
• XML documents must have a root element
• XML elements must have a closing tag
• XML tags are case sensitive
• XML elements must be properly nested
• XML attribute values must always be quoted

<?xml version="1.0" encoding="ISO-8859-1"?>
<E-mail>
<To>Rohan</To>
<From>Amit</From>
<Subject>Surprise....</Subject>
<Body>Be ready for a cruise...i will catch u tonight</Body>
</E-mail>
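A minimal sketch of a well-formedness check, assuming Python's standard xml.etree.ElementTree parser: a document is treated as well formed if the parser accepts it. The helper name is_well_formed and the sample strings are made up for this illustration; each broken sample violates one of the rules listed above.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

checks = {
    "good": "<E-mail><To>Rohan</To></E-mail>",
    "two roots": "<To>Rohan</To><From>Amit</From>",
    "no closing tag": "<E-mail><To>Rohan</E-mail>",
    "case mismatch": "<Message>This is incorrect</message>",
    "bad nesting": "<b><i>bold and italic</b></i>",
    "unquoted attribute": "<E-mail date=12/11/2002/>",
}
results = {name: is_well_formed(doc) for name, doc in checks.items()}

for name, ok in results.items():
    print(f"{name}: {'well formed' if ok else 'NOT well formed'}")
```

Note that this only checks syntax. It says nothing about validity against a DTD or a Schema.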

Valid XML Documents:

A "Valid" XML document is a "Well Formed" XML document which also conforms to the
rules of a Document Type Definition (DTD) or an XML Schema.
The following XML document is validated against a DTD; notice the DOCTYPE declaration on
the second line, whose name must match the root element.
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE E-mail SYSTEM "InternalE-mail.dtd">
<E-mail>
<To>Rohan</To>
<From>Amit</From>
<Subject>Surprise....</Subject>
<Body>Be ready for a cruise...i will catch u tonight</Body>
</E-mail>

XML DTD

A DTD defines the legal building blocks of an XML document. It defines the document
structure with a list of legal elements.
XML Schema
XML Schema is an XML-based alternative to DTD, supported by the W3C.

DTD:Document Type Definition

A Document Type Definition (DTD) defines the legal building blocks of an XML document. It
defines the document structure with a list of legal elements and attributes.
A DTD can be declared inside an XML document, or as an external reference.
Internal DTD
If the DTD is defined inside the XML document, it should be wrapped in a DOCTYPE
definition with the following syntax:
<!DOCTYPE root-element [element-declarations]>

Example of an XML document with an internal DTD: E-mail.xml

<?xml version="1.0"?>
<!DOCTYPE E-mail [
<!ELEMENT E-mail (To,From,Subject,Body)>
<!ELEMENT To (#PCDATA)>
<!ELEMENT From (#PCDATA)>
<!ELEMENT Subject (#PCDATA)>
<!ELEMENT Body (#PCDATA)>
]>
<E-mail>
<To>Rohan</To>
<From>Amit</From>
<Subject>Surprise....</Subject>
<Body>Be ready for a cruise...i will catch u tonight</Body>
</E-mail>

Open the file E-mail.xml in a web browser to see how it is displayed.

External DTD
If the DTD is defined in an external file, it should be referenced from a DOCTYPE definition
with the following syntax:
<!DOCTYPE root-element SYSTEM "filename">

This is the same XML document as above (but with an external DTD): E-mail.xml
<?xml version="1.0"?>
<!DOCTYPE E-mail SYSTEM "E-mail.dtd">
<E-mail>
<To>Rohan</To>
<From>Amit</From>
<Subject>Surprise....</Subject>
<Body>Be ready for a cruise...i will
catch u tonight</Body>
</E-mail>

And this is the file "E-mail.dtd", which contains the following DTD:
<!ELEMENT E-mail (To,From,Subject,Body)>
<!ELEMENT To (#PCDATA)>
<!ELEMENT From (#PCDATA)>
<!ELEMENT Subject (#PCDATA)>
<!ELEMENT Body (#PCDATA)>

Open the file E-mail.xml in a web browser to see how it is displayed.

Importance of a DTD?
• With a DTD, an XML file carries a description of its own format.
• With a DTD, independent groups of people can agree to use a standard DTD
for interchanging data.
• A user application can use a standard DTD to verify that the data it receives
from the outside world is valid.
• A user can also use a DTD to verify his own data.

DTD - XML Constituent

The constituent components of XML DTD documents.

DTDs are made up of the following building blocks:
• Elements
• Attributes
• Entities
• PCDATA
• CDATA
A brief explanation of each building block:
Elements
Elements are the main constituent components of XML documents.
Elements can contain text, other elements, or be empty, e.g.
<To>Rohan</To>
<From>Amit</From>

Attributes

Attributes provide extra information about elements.


Attributes are always placed inside the opening tag of an element. Attributes always come in
name/value pairs. The following "img" element has additional information about a source file:
<img src="computer.gif" />
The name of the element is "img". The name of the attribute is "src". The value of the attribute is
"computer.gif". Since the element itself is empty it is closed by a " /".
Entities:
Entities are expanded when a document is parsed by an XML parser. Some characters have a
special meaning in XML, like the less-than sign (<), which marks the start of an XML tag, and
the greater-than sign (>), which marks the end of an XML tag.
The following entities are predefined in XML:

Entity reference    Character
&lt;                <
&gt;                >
&amp;               &
&quot;              "
&apos;              '

PCDATA:
PCDATA means parsed character data. It can be thought of as the character data (text) found
between the start tag and the end tag of an XML element.
PCDATA is text that will be parsed by a parser. The text is checked by the parser for entities
and markup.
Tags inside the text will be treated as markup and entities will be expanded. However, parsed
character data should not contain any &, <, or > characters. These should be represented by the
&amp;, &lt; and &gt; entities, respectively.
CDATA:
CDATA is character data that will NOT be parsed by a parser. Tags inside the text will
NOT be treated as markup and entities will not be expanded.
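The difference can be seen in document content, where unparsed text is written as a CDATA section (<![CDATA[ ... ]]>). The sketch below assumes Python's standard xml.etree.ElementTree parser; the sample strings and variable names are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Parsed character data: entity references are expanded by the parser.
parsed = ET.fromstring("<Body>5 &lt; 7 &amp; 7 &gt; 5</Body>")

# CDATA section: tags and entity references are passed through as plain text.
literal = ET.fromstring("<Body><![CDATA[<b>5 < 7</b> &amp;]]></Body>")

# A raw "<" in parsed character data is an error.
try:
    ET.fromstring("<Body>5 < 7</Body>")
    raw_ok = True
except ET.ParseError:
    raw_ok = False

print(parsed.text)
print(literal.text)
print(raw_ok)
```

Inside the CDATA section the <b> tag does not create a child element, and &amp; stays as five literal characters.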

DTD-Elements

In a DTD, elements are declared with an ELEMENT declaration.


Declaring Elements : syntax
In a DTD, XML elements are declared with the following syntax:
<!ELEMENT element-name category>
or
<!ELEMENT element-name (element-content)>

Empty Elements

Empty elements are declared with the category keyword EMPTY (without parentheses):
<!ELEMENT element-name EMPTY>
DTD Example: <!ELEMENT br EMPTY>
In XML document:
<br />

Elements with Parsed Character Data


Elements with only parsed character data are declared with #PCDATA inside the parentheses:
<!ELEMENT element-name (#PCDATA)>
DTD Example :
<!ELEMENT To (#PCDATA)>
<!ELEMENT From (#PCDATA)>

Elements with Any Content

Elements declared with the category keyword ANY can contain any combination of parsable
data:
<!ELEMENT element-name ANY>
DTD Example:
<!ELEMENT E-mail ANY>

Elements with Children (sequences)


Elements with one or more children are declared with the names of the child elements inside
the parentheses:
<!ELEMENT element-name (child1)>
or
<!ELEMENT element-name (child1,child2,...)>
DTD Example:
<!ELEMENT E-mail (To,From,Subject,Body)>

When children are declared in a sequence separated by commas, the children must appear in the
same sequence in the document. In a full declaration, the children must also be declared.
Children can have children of their own. The full declaration of the "E-mail" element is:
<!ELEMENT E-mail (To,From,Subject,Body)>
<!ELEMENT To (#PCDATA)>
<!ELEMENT From (#PCDATA)>
<!ELEMENT Subject (#PCDATA)>
<!ELEMENT Body (#PCDATA)>

Declaring Only One Occurrence of an Element


<!ELEMENT element-name (child-name)>
DTD Example:
<!ELEMENT color (Fill-Red)>

The example above declares that the child element "Fill-Red" must occur once, and only once,
inside the "color" element.
Declaring Minimum One Occurrence of an Element
<!ELEMENT element-name (child-name+)>
DTD Example:
<!ELEMENT color (Fill-Red+)>

The '+' sign in the example above declares that the child element "Fill-Red" must occur one or
more times inside the "color" element.
Declaring Zero or More Occurrences of an Element
<!ELEMENT element-name (child-name*)>
DTD Example:
<!ELEMENT color (Fill-Red*)>

The '*' sign in the example above declares that the child element "Fill-Red" can occur zero or
more times inside the "color" element.
Declaring Zero or One Occurrence of an Element
<!ELEMENT element-name (child-name?)>
DTD Example:
<!ELEMENT color (Fill-Red?)>
The '?' sign in the example above declares that the child element "Fill-Red" can occur zero or
one time inside the "color" element.
Declaring either/or Content
DTD Example:
<!ELEMENT E-mail (To,From,Subject,(Message|Body))>

The example above declares that the "E-mail" element must contain a "To" element, a "From"
element, a "Subject" element, and either a "Message" or a "Body" element.
Declaring Mixed Content
DTD Example:
<!ELEMENT E-mail (#PCDATA|To|From|Subject|Body)*>

The example above declares that the "E-mail" element can contain zero or more occurrences of
parsed character data, "To", "From", "Subject", or "Body" elements.
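The sequence, either/or and occurrence rules above behave much like a regular expression over the list of child element names. The sketch below illustrates that idea in Python; it is not a DTD validator. The helper names model_to_regex and children_match are made up for this illustration, and the sketch assumes element-only content models (it does not handle #PCDATA, EMPTY or ANY).

```python
import re
import xml.etree.ElementTree as ET

NAME = re.compile(r"[A-Za-z_][\w.\-]*")

def model_to_regex(model):
    """Turn a content model such as "(To,From,Subject,Body)" into a regex."""
    compact = model.replace(" ", "")
    # Each child name becomes a group matching "name " (the name plus one space).
    body = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + " )", compact)
    # A comma means "followed by", which is plain concatenation in a regex.
    # "|", "+", "*", "?" and parentheses mean the same thing in both notations.
    return re.compile(body.replace(",", ""))

def children_match(element, model):
    """Check the names of the element's children against the content model."""
    names = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(names) is not None

email = ET.fromstring(
    "<E-mail><To>Rohan</To><From>Amit</From>"
    "<Subject>Surprise....</Subject><Body>Be ready</Body></E-mail>"
)
swapped = ET.fromstring(
    "<E-mail><From>Amit</From><To>Rohan</To>"
    "<Subject>Surprise....</Subject><Body>Be ready</Body></E-mail>"
)
colors = ET.fromstring("<color><Fill-Red/><Fill-Red/></color>")
no_color = ET.fromstring("<color/>")

print(children_match(email, "(To,From,Subject,(Message|Body))"))
print(children_match(swapped, "(To,From,Subject,Body)"))
print(children_match(colors, "(Fill-Red+)"))
print(children_match(no_color, "(Fill-Red*)"))
```

A real validating parser performs this kind of check for every element, using the declarations in the DTD.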

DTD-Attributes

In a DTD, attributes are declared with an ATTLIST declaration.

Declaring Attributes
The ATTLIST declaration names the element that carries the attribute, followed by the
attribute name, the attribute type, and the default value. An attribute declaration has the
following syntax:

<!ATTLIST element-name attribute-name attribute-type default-value>
DTD example:
<!ATTLIST reciept type CDATA "check">
XML example:
<reciept type="check" />

Attribute-type
The attribute-type can be one of the following:

Type            Description
CDATA           The value is character data
(en1|en2|..)    The value must be one from an enumerated list
ID              The value is a unique id
IDREF           The value is the id of another element
IDREFS          The value is a list of other ids
NMTOKEN         The value is a valid XML name
NMTOKENS        The value is a list of valid XML names
ENTITY          The value is an entity
ENTITIES        The value is a list of entities
NOTATION        The value is a name of a notation
xml:            The value is a predefined xml value

Default-value
The default-value can be one of the following:

Value           Explanation
value           The default value of the attribute
#REQUIRED       The attribute is required
#IMPLIED        The attribute is not required
#FIXED value    The attribute value is fixed

A Default Attribute Value


DTD Example:
<!ELEMENT Scale EMPTY>
<!ATTLIST Scale length CDATA "0">
In the example above, the DTD defines a "Scale" element to be empty with a "length" attribute
of type CDATA. If no length is specified, it has a default value of 0.
Valid XML:
<Scale length="100" />

REQUIRED
Syntax
<!ATTLIST element-name attribute-name attribute-type #REQUIRED>
DTD Example
<!ATTLIST person number CDATA #REQUIRED>

Valid XML:
<person number="5677" />

Invalid XML:
<person />
Use the #REQUIRED keyword if you don't have an option for a default value, but still want to
force the attribute to be present.
IMPLIED
Syntax
<!ATTLIST element-name attribute-name attribute-type #IMPLIED>
DTD Example
<!ATTLIST emergency no. CDATA #IMPLIED>

Valid XML:
<emergency no.="555-667788" />

Valid XML:
<emergency/>
Use the #IMPLIED keyword if you don't want to force the author to include an attribute, and you
don't have an option for a default value.
FIXED
Syntax
<!ATTLIST element-name attribute-name attribute-type #FIXED "value">
DTD Example
<!ATTLIST Client company CDATA #FIXED "RoseIndia">

Valid XML:
<Client company="RoseIndia" />

Invalid XML:
<Client company="LotusIndia" />
The attribute name was missing from this example; "company" is used here only for illustration.
Use the #FIXED keyword when you want an attribute to have a fixed value without allowing the
author to change it. If an author includes another value, the XML parser will return an error.
Enumerated Attribute Values
Syntax
<!ATTLIST element-name attribute-name (en1|en2|..) default-value>
DTD Example
<!ATTLIST reciept type (check|cash) "cash">

XML example:
<reciept type="check" />
or
<reciept type="cash" />

Use enumerated attribute values when you want the attribute value to be one of a fixed set of
legal values.
DTD-Entities

Entities are variables used to define shortcuts to standard text or special characters. Entity
references are references to entities. Entities can be declared internally or externally.
Internal Entity Declaration
Syntax
<!ENTITY entity-name "entity-value">
DTD Example:
<!ENTITY name "Amit">
<!ENTITY company "RoseIndia">

XML example:
<Profile>&name;&company;</Profile>
Note: An entity reference has three parts: an ampersand (&), an entity name, and a semicolon (;).
An External Entity Declaration
Syntax
<!ENTITY entity-name SYSTEM "URI/URL">

DTD Example:
<!ENTITY name SYSTEM "http://www.roseindia.net/entities.dtd">
<!ENTITY company SYSTEM "http://www.roseindia.net/entities.dtd">

XML example:
<Profile>&name;&company;</Profile>
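For internal entities, the expansion can be observed directly. The sketch below assumes Python's standard xml.etree.ElementTree parser, which expands entities declared in an internal DTD; the document is the Profile example from above with a space added between the two references. External entities are not covered by this sketch.

```python
import xml.etree.ElementTree as ET

PROFILE_XML = """<!DOCTYPE Profile [
<!ENTITY name "Amit">
<!ENTITY company "RoseIndia">
]>
<Profile>&name; &company;</Profile>"""

profile = ET.fromstring(PROFILE_XML)
print(profile.text)   # the entity references are replaced by their values

# Referring to an entity that was never declared is an error.
try:
    ET.fromstring("<Profile>&nobody;</Profile>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)
```

The application never sees &name; or &company;, only the replacement text.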

Introduction to XML Schema

In this tutorial you will learn how to read and create XML Schemas, why XML Schemas are
more powerful than DTDs, and how to use them in your application.
XML Schema is a W3C Standard. It is an XML-based alternative to DTDs. It describes the
structure of an XML document. The XML Schema language is also referred to as XML Schema
Definition (XSD).
We think that very soon XML Schemas will be used in most Web applications as a
replacement for DTDs. Here are some reasons:
• XML Schemas are extensible to future additions
• XML Schemas are richer and more powerful than DTDs
• XML Schemas are written in XML and support data types and namespaces.
What is an XML Schema?
XML Schema is used to define the legal building blocks of an XML document, just like a DTD.
An XML Schema defines the building blocks, such as elements, sub-elements and attributes,
that may appear in an XML document. It defines the data types of elements and attributes,
along with the order and number of occurrences of child elements. It defines whether an
element is empty or can include text. It also defines default and fixed values for elements
and attributes.
Why Use XML Schemas?
XML Schemas are much more powerful than DTDs.
Features of XML Schemas :
XML Schemas Support Data Types
One of the greatest strengths of XML Schemas is its support for data types. With support for data
types:
• It is easier to describe allowable document content
• It is easier to validate the correctness of data
• It is easier to work with data from a database
• It is easier to define data facets (restrictions on data)
• It is easier to define data patterns (data formats)
• It is easier to convert data between different data types
XML Schemas use XML Syntax
Another great strength about XML Schemas is that they are written in XML. Simple XML
editors are used to edit the Schema files. Even the same XML parsers can be used to parse the
Schema files.
XML Schemas are Extensible
XML Schemas are extensible, because they are written in XML. So a user can reuse a Schema in
other Schemas and can also refer to multiple schemas in the same document. A user can also
create his own data types derived from the standard types.
Well-Formed Alone is not Enough
A well-formed XML document is a document that conforms to the XML syntax rules. Even if
documents are well-formed they can still contain errors, and those errors can have serious
consequences.
Think of the following situation: you order 5 gross of laser printers, instead of 5 laser printers.
With XML Schemas, most of these errors can be caught by your validating software.
XML Schemas Secure Reliable Data Communication
When sending data from a sender to a receiver, it is essential that both parties have the same
"expectations" about the content. With XML Schemas, the sender can describe the data in a way
that the receiver will understand. A date like "03-11-2004" will, in some countries, be
interpreted as 3 November and in other countries as 11 March. However, an XML element with a
data type like this: <date type="date">2004-03-11</date> ensures a mutual understanding of the
content, because the XML data type "date" requires the format "YYYY-MM-DD".
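The ambiguity is easy to reproduce in code. The sketch below uses Python's standard datetime module as a stand-in for two receivers with different local conventions; it is only an illustration and does not involve an XML Schema processor.

```python
from datetime import date, datetime

ambiguous = "03-11-2004"

# Two receivers with different conventions read the same string differently.
day_first = datetime.strptime(ambiguous, "%d-%m-%Y").date()
month_first = datetime.strptime(ambiguous, "%m-%d-%Y").date()

# The YYYY-MM-DD form required by the "date" data type has a single reading.
unambiguous = date.fromisoformat("2004-03-11")

print(day_first, month_first, unambiguous)
```

With an agreed data type, sender and receiver no longer have to guess which convention the other side used.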

Designing XML Schema

XML documents can have a reference to a DTD or to an XML Schema.


A Simple XML Document

Look at this simple XML document called "E-mail.xml":


<?xml version="1.0"?>
<E-mail>
<To>Rohan</To>
<From>Amit</From>
<Subject>Surprise....</Subject>
<Body>Be ready for a cruise...</Body>
</E-mail>

XML Schema

The following example is a XML Schema file called "E-mail.xsd" that defines the elements of
the XML document above ("E-mail.xml"):
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.roseindia.net"
xmlns="http://www.roseindia.net"
elementFormDefault="qualified">
<xs:element name="E-mail">
<xs:complexType>
<xs:sequence>
<xs:element name="To" type="xs:string"/>
<xs:element name="From" type="xs:string"/>
<xs:element name="Subject" type="xs:string"/>
<xs:element name="Body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>

We will discuss the building blocks of this schema later in this section.
Adding a reference to the XML Schema
Now this XML document (E-mail.xml) has a reference to the XML Schema declared above
(E-mail.xsd):
<?xml version="1.0"?>
<E-mail
xmlns="http://www.roseindia.net"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.roseindia.net E-mail.xsd">

<To>Rohan</To>
<From>Amit</From>
<Subject>Surprise....</Subject>
<Body>Be ready for a cruise...</Body>
</E-mail>

In the above XML document, xmlns declares the XML namespaces (discussed later in this
section).

Save E-mail.xml and E-mail.xsd in the same location, then open the file E-mail.xml in a web
browser to see how it is displayed.

Let's briefly discuss the concept of XML Namespaces


XML Namespaces provide a mechanism to avoid element name conflicts.
Name Conflicts: Since element names in XML are not predefined, a name conflict occurs
when two different documents use the same element name for different things.
We solve name conflicts by using a prefix with an element name: with prefixes, the two kinds
of elements get different names. Instead of using prefixes alone, we add an xmlns attribute
to the start tag, which binds the prefix to a namespace and so gives the element a qualified
name.
The XML Namespace (xmlns) Attribute: The XML namespace attribute is placed in the start
tag of an element and has the following syntax:
xmlns:namespace-prefix="namespaceURI"

Example 1(taken from E-mail.xml ) :


<E-mail
xmlns="http://www.roseindia.net"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.roseindia.net E-mail.xsd">

Example 2(taken from E-mail.xsd ) :


<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.roseindia.net"
xmlns="http://www.roseindia.net"
elementFormDefault="qualified">

When a namespace is declared in the start tag of an element, all child elements with the same
prefix are associated with the same namespace. In E-mail.xsd, "xs" is the prefix declared in
the start tag of the schema element, so all the child elements carry the xs prefix, e.g.
<xs:element name="E-mail">
<xs:complexType>
<xs:sequence>
<xs:element name="To" type="xs:string"/>
<xs:element name="From" type="xs:string"/>
<xs:element name="Subject" type="xs:string"/>
<xs:element name="Body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
Note that the address used to identify the namespace is not used by the parser to look up
information. The only purpose is to give the namespace a unique name. However, very often
companies use the namespace as a pointer to a real Web page containing information about the
namespace.
Here a Uniform Resource Identifier (URI) is a string of characters which identifies an Internet
Resource.
Default Namespaces : Defining a default namespace for an element saves us from using
prefixes in all the child elements. It has the following syntax:
xmlns="namespaceURI"

In the following example the child elements (To, From, Subject, Body) carry no prefixes,
because they inherit the default namespace declared on E-mail:
<?xml version="1.0"?>
<E-mail
xmlns="http://www.roseindia.net"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.roseindia.net E-mail.xsd">

<To>Rohan</To>
<From>Amit</From>
<Subject>Surprise....</Subject>
<Body>Be ready for a cruise...</Body>
</E-mail>
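
As a rough illustration of how a default namespace reaches the child elements, the sketch below parses a shortened copy of the E-mail document with a namespace-aware DOM parser and prints the namespace URI of each element. The class name DefaultNamespaceDemo and the helper namespaceOf are made up for this illustration; only the JAXP and DOM calls are standard.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DefaultNamespaceDemo {

    // The E-mail document from the text, shortened to two child elements.
    static final String XML =
        "<E-mail xmlns=\"http://www.roseindia.net\">"
        + "<To>Rohan</To><From>Amit</From>"
        + "</E-mail>";

    // Hypothetical helper: returns the namespace URI of the first element
    // with the given local name, whatever namespace it belongs to.
    static String namespaceOf(String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // JAXP factories are not namespace-aware unless asked to be.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            Element element =
                    (Element) doc.getElementsByTagNameNS("*", localName).item(0);
            return element.getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample document", e);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"E-mail", "To", "From"}) {
            System.out.println(name + " -> " + namespaceOf(name));
        }
    }
}
```

Although To and From have no xmlns attribute and no prefix of their own, the parser reports the same namespace URI for them as for E-mail.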

Building blocks of an XML Schema


XSD - The <schema> Element
The <schema> element is the root element of every XML Schema:
<?xml version="1.0"?>
<xs:schema>
...
...
</xs:schema>

The <schema> element may contain some attributes like...


<?xml version="1.0"?>
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.roseindia.net"
xmlns="http://www.roseindia.net"
elementFormDefault="qualified">
...
...
</xs:schema>

The following code:


xmlns:xs="http://www.w3.org/2001/XMLSchema"
indicates that the elements and data types used in the schema come from the
"http://www.w3.org/2001/XMLSchema" namespace. It also specifies that the elements and data
types that come from the "http://www.w3.org/2001/XMLSchema" namespace should be prefixed
with xs:
This code segment
targetNamespace="http://www.roseindia.net"

indicates that the elements defined by this schema (E-mail, To, From, Subject, Body.) come from
the "http://www.roseindia.net" namespace.
This fragment:
xmlns="http://www.roseindia.net"
indicates that the default namespace is "http://www.roseindia.net".
This fragment:
elementFormDefault="qualified"

indicates that any elements used by the XML instance document which were declared in this
schema must be namespace-qualified.
Referencing a Schema in an XML Document
This XML document (E-mail.xml) has a reference to an XML Schema (E-mail.xsd).
<?xml version="1.0"?>
<E-mail
xmlns="http://www.roseindia.net"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.roseindia.net E-mail.xsd">

<To>Rohan</To>
<From>Amit</From>
<Subject>Surprise....</Subject>
<Body>Be ready for a cruise...</Body>
</E-mail>

The following fragment:


xmlns="http://www.roseindia.net"
specifies the default namespace declaration. This declaration tells the schema-validator that all
the elements used in this XML document are declared in the "http://www.roseindia.net"
namespace.
Once you have the XML Schema Instance namespace available:
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

you can use the schemaLocation attribute. This attribute has two values. The first value is the
namespace to use. The second value is the location of the XML schema to use for that
namespace:
xsi:schemaLocation="http://www.roseindia.net E-mail.xsd"
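
The schemaLocation attribute is only a hint, and a program can also hand the schema to a validator directly. The following is a minimal sketch of that approach using the standard javax.xml.validation package. The class name EmailSchemaCheck and the helper isValid are assumptions made for this illustration, and the schema is embedded as a string (copied from E-mail.xsd) so that the example is self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class EmailSchemaCheck {

    // Same declarations as E-mail.xsd, written with single quotes for brevity.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://www.roseindia.net'"
        + " xmlns='http://www.roseindia.net'"
        + " elementFormDefault='qualified'>"
        + "<xs:element name='E-mail'><xs:complexType><xs:sequence>"
        + "<xs:element name='To' type='xs:string'/>"
        + "<xs:element name='From' type='xs:string'/>"
        + "<xs:element name='Subject' type='xs:string'/>"
        + "<xs:element name='Body' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Hypothetical helper: true if the document validates against XSD.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, a validation error is thrown as a SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<E-mail xmlns='http://www.roseindia.net'>"
            + "<To>Rohan</To><From>Amit</From>"
            + "<Subject>Surprise....</Subject><Body>Be ready for a cruise...</Body>"
            + "</E-mail>";
        // Same content, but without the default namespace declaration.
        String unqualified = good.replace(" xmlns='http://www.roseindia.net'", "");
        System.out.println("qualified document valid:   " + isValid(good));
        System.out.println("unqualified document valid: " + isValid(unqualified));
    }
}
```

The second document fails because the schema declares its elements in the "http://www.roseindia.net" namespace with elementFormDefault="qualified", so elements without that namespace do not match any declaration.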

XSD Simple Elements

XML Schemas define the elements of XML files.

A simple element is an XML element that contains only text. It cannot contain other elements
or attributes. The text, however, can be of many different types: one of the types built into the
XML Schema definition (boolean, string, date, etc.) or a custom type that the user defines.
Restrictions (facets) can also be added to a data type in order to limit its content.
Defining a Simple Element
The syntax for a simple element is:
<xs:element name="aaa" type="bbb"/>

where aaa is the name of the element and bbb is the data type of the element.

XML Schema has a lot of built-in data types. The most common types are:
• xs:string
• xs:decimal
• xs:integer
• xs:boolean
• xs:date
• xs:time
Example:
A few XML elements:
<name>Rahul</name>
<age>15</age>
<currentdate>2007-05-15</currentdate>

The corresponding simple element definitions:


<xs:element name="name" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
<xs:element name="currentdate" type="xs:date"/>

Default Values for Simple Elements

Simple elements may have either a default value or a fixed value specified. A default value is
automatically assigned to the element when no other value is specified. In the following
example the default value is "orange":
<xs:element name="fruit" type="xs:string" default="orange"/>
Fixed Values for Simple Elements
A fixed value is also automatically assigned to the element, and no other value can be specified
for it.
In the following example the fixed value is "apple":
<xs:element name="fruit" type="xs:string" fixed="apple"/>

XSD Complex Elements:

A complex element contains other elements or attributes.


What is a Complex Element?
It is an XML element that contains other elements and/or attributes. There are four kinds of complex elements:
• empty elements
• elements that contain only other elements
• elements that contain only text
• elements that contain both other elements and text
Note: Each of these elements may contain attributes as well!
Examples of Complex Elements
A complex empty XML element, "employee":
<employee eid="1234"/>
A complex XML element, "employee", which contains only other elements:
<employee>
<firstname>Amit</firstname>
<lastname>Gupta</lastname>
</employee>
A complex XML element, "employee", which contains only text:
<employee type="category">Programmer</employee>
A complex XML element, "event", which contains both elements and text:
<event>
It occurred on <date lang="norwegian">15.05.07</date> ....
</event>

Defining a Complex Element:

Look at this complex XML element, "employee", which contains only other elements:
<employee>
<firstname>Amit</firstname>
<lastname>Gupta</lastname>
</employee>

We can define a complex element in an XML Schema in two different ways:


1. "employee" element can be declared directly by naming the element, like this:
<xs:element name="employee">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>

With the method described above, only the "employee" element can use the specified complex type.
Note that the child elements, "firstname" and "lastname", are surrounded by the <sequence>
indicator. This means that the child elements must appear in the same order as they are
declared.
2. The "employee" element can have a type attribute referring to the name of the complex type to
use:
<xs:element name="employee" type="personinfo"/>
<xs:complexType name="personinfo">
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
Using the method described above, several elements can refer to the same complex type, like
this:
<xs:element name="employee" type="personinfo"/>
<xs:element name="employer" type="personinfo"/>
<xs:element name="teammember" type="personinfo"/>
<xs:complexType name="personinfo">
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>

XSD Attributes

All attributes are declared as simple types.

What is an Attribute?
Simple elements cannot have attributes. If an element has attributes, it is considered to be of a
complex type. But the attribute itself is always declared as a simple type.
Defining an Attribute
The syntax for defining an attribute is:
<xs:attribute name="aaa" type="bbb"/>

where aaa is the name of the attribute and bbb specifies the data type of the
attribute.

XML Schema has a lot of built-in data types. The most common types are:
• xs:string
• xs:decimal
• xs:integer
• xs:boolean
• xs:date
• xs:time
Example:
Here is an XML element with an attribute:
<name lang="EN">Rahul</name>
And here is the corresponding attribute definition:
<xs:attribute name="lang" type="xs:string"/>

Default Values for Attributes


Attributes may have either a default value or a fixed value specified. A default value is
automatically assigned to the attribute when no other value is specified. In the following
example the default value is "EN":
<xs:attribute name="lang" type="xs:string" default="EN"/>
Fixed Values for Attributes
A fixed value is also automatically assigned to the attribute, and no other value can be specified
for it. In the following example the fixed value is "EN":
<xs:attribute name="lang" type="xs:string" fixed="EN"/>

Optional and Required Attributes

Attributes are optional by default. To specify that the attribute is required, use the "use" attribute:
<xs:attribute name="lang" type="xs:string" use="required"/>

Restrictions on Content
When an XML element or attribute has a data type defined, the type puts restrictions on the
element's or attribute's content. If an XML element is of type "xs:date" and contains a string like
"Hello", the element will not validate. With XML Schemas, the user can also add his own
restrictions to XML elements and attributes. These restrictions are called facets.
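
The sketch below shows both kinds of restriction at work, using the standard javax.xml.validation package. It assumes a small schema invented for this illustration: the currentdate element uses the built-in xs:date type, and the age element restricts xs:integer with the minInclusive and maxInclusive facets (the bounds 0 and 120 are arbitrary). The class name TypeCheckDemo and the helper isValid are likewise hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class TypeCheckDemo {

    // Illustrative schema: one built-in type and one type limited by facets.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='currentdate' type='xs:date'/>"
        + "<xs:element name='age'>"
        + "<xs:simpleType><xs:restriction base='xs:integer'>"
        + "<xs:minInclusive value='0'/><xs:maxInclusive value='120'/>"
        + "</xs:restriction></xs:simpleType>"
        + "</xs:element>"
        + "</xs:schema>";

    // Hypothetical helper: true if the document validates against XSD.
    static boolean isValid(String xml) {
        try {
            Validator validator = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)))
                    .newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the content violates its declared type or a facet
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<currentdate>2007-05-15</currentdate>",
            "<currentdate>Hello</currentdate>",
            "<age>15</age>",
            "<age>150</age>"
        };
        for (String xml : samples) {
            System.out.println(xml + " -> " + (isValid(xml) ? "valid" : "invalid"));
        }
    }
}
```

Here "Hello" is rejected by the data type itself, while 150 is a perfectly good xs:integer that is rejected only because of the added facet.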

XML Related Technologies: An overview

Below is a list of XML-related technologies.


DTD (Document Type Definition) is used to define the legal elements in an XML document.
XSD (XML Schema) is an XML-based alternative to DTDs.
XHTML (Extensible HTML) is a stricter and cleaner version of HTML.
XSL (Extensible Style Sheet Language) - XSL consists of three parts: XSLT - a language for
transforming XML documents, XPath - a language for navigating in XML documents, and XSL-
FO - a language for formatting XML documents.
XSLT (XSL Transformations) is used to transform XML documents into other XML formats,
like XHTML.
XML DOM (XML Document Object Model) defines a standard way for accessing and
manipulating XML documents.
XPath is a language for navigating in XML documents.
XSL-FO (Extensible Style Sheet Language Formatting Objects) is an XML based markup
language describing the formatting of XML data for output to screen, paper or other media.
XLink (XML Linking Language) is a language for creating hyperlinks in XML documents.
XPointer (XML Pointer Language) allows the XLink hyperlinks to point to more specific parts
in the XML document.
XForms (XML Forms) uses XML to define form data.
XQuery (XML Query Language) is designed to query XML data.
SOAP (Simple Object Access Protocol) is an XML-based protocol to let applications exchange
information over HTTP.
WSDL (Web Services Description Language) is an XML-based language for describing web
services.
RDF (Resource Description Framework) is an XML-based language for describing web
resources.
RSS (Really Simple Syndication) is a format for syndicating news and the content of news-like
sites.
WAP (Wireless Application Protocol) was designed to show internet contents on wireless
clients, like mobile phones.
SMIL (Synchronized Multimedia Integration Language) is a language for describing
audiovisual presentations.
SVG (Scalable Vector Graphics) defines graphics in XML format.

XML Parsers

An XML parser is used to read, update, create and manipulate an XML document.

Parsing XML Documents


To manipulate an XML document, an XML parser is needed. The parser loads the document into
the computer's memory. Once the document is loaded, its data can be manipulated through the
parser's API.
We will soon discuss the APIs and parsers for accessing XML documents in serial access
mode (SAX) and in random access mode (DOM). The specifications used to ensure the validity of
XML documents are DTDs and Schemas.
DOM: Document Object Model
The XML Document Object Model (XML DOM) defines a standard way to access and
manipulate XML documents using any programming language (and a parser for that language).
The DOM presents an XML document as a tree-structure (a node tree), with the elements,
attributes, and text defined as nodes. DOM provides access to the information stored in your
XML document as a hierarchical object model.
The DOM converts an XML document into a collection of objects arranged in a tree
structure, which can then be manipulated in any way. The textual information in the XML document
is turned into tree nodes, and a user can traverse any part of the object tree at any time.
This makes it easy to modify data, to remove it, or to insert new data. This mechanism is also
known as the random access protocol.
DOM is most useful when the document is small. DOM reads the entire XML structure and
holds the object tree in memory, so it is much more CPU and memory intensive than SAX. The DOM
is best suited for interactive applications, because the entire object model is present in memory,
where it can be accessed and manipulated by the user.
SAX: Simple API for XML
This API was developed collaboratively on the XML-DEV mailing list, rather than being a
product of the W3C.
SAX (Simple API for XML), like DOM, gives access to the information stored in XML
documents using any programming language (and a parser for that language).
This standard API works in serial access mode to parse XML documents. Compared with its
competitors, it is a very fast mechanism for reading and writing XML data. SAX tells the
application what is in the document through a stream of parsing events. The
application then processes those events to act on the data.
SAX is also called an event-driven protocol, because the application registers a handler
whose callback methods are invoked whenever an event is generated. An event is generated
when the parser encounters a new XML tag, encounters an error, or has anything else to
report. SAX is memory-efficient to a great extent.
SAX is very useful when the document is large.
DOM reads the entire XML structure and holds the object tree in memory, so it is much more
CPU and memory intensive. For that reason, the SAX API is preferred for server-side
applications and data filters that do not require an in-memory representation of the data.

An Overview of the XML-APIs

Here we have listed the major Java APIs for XML. We will concentrate mostly on JAXP
in the coming tutorials.
JAXP: Java API for XML Processing

This API provides a common interface for creating and using the standard
SAX, DOM, and XSLT APIs in Java, independent of the vendor's implementation.

JAXB: Java Architecture for XML Binding

This standard defines a mechanism for writing out Java objects as XML
(marshalling) and for creating Java objects from such structures
(unmarshalling).

The JAXP APIs

JAXP does not do any processing itself; instead, it provides a mechanism to obtain parsed XML
documents from SAX and DOM parsers. JAXP acts as a pluggability layer: providers that support
the standard specifications for DOM, SAX and XSLT can be plugged in, and the application can
specify which provider to use.
Overview of the main JAXP API Packages

The libraries that define the needed JAXP APIs are:

Package               Description
javax.xml.parsers     The JAXP APIs, which provide a common interface for different vendors' SAX and DOM parsers.
org.w3c.dom           Defines the Document class (a DOM) along with the classes for all of the components of a DOM.
org.xml.sax           Defines the basic SAX APIs.
javax.xml.transform   Defines the XSLT APIs that let you transform XML into other forms.

The SAX API is defined in the org.xml.sax package. The "Simple API for XML" (SAX) is the
event-driven, serial-access mechanism that does element-by-element processing. The API at
this level reads XML from, and writes it to, a data repository or the Web.
The DOM API is defined in the org.w3c.dom package. The DOM API is generally easier to use.
It provides a tree structure of objects, and it is used to manipulate the hierarchy of
application objects that the tree encapsulates.
The XSLT APIs are defined in the javax.xml.transform package. The XSLT APIs let you
convert XML data into other forms.
javax.xml.parsers: Description
Provides classes to process XML documents and supports two types of pluggable parsers,
i.e. SAX and DOM. The following classes are defined in the javax.xml.parsers package:

Class                    Description
DocumentBuilder          Defines the API to obtain DOM Document instances from an XML document.
DocumentBuilderFactory   Defines a factory API that enables applications to obtain a parser that produces DOM object trees from XML documents.
SAXParser                Defines the API that wraps an XMLReader implementation class.
SAXParserFactory         Defines a factory API that enables applications to configure and obtain a SAX based parser to parse XML documents.

This package contains two vendor-neutral factory classes: SAXParserFactory (builds a
SAXParser) and DocumentBuilderFactory (builds a DocumentBuilder).
The DocumentBuilder in turn creates a DOM-compliant Document object.
The factory APIs make it possible to plug in an XML implementation provided by any vendor
without changing the source code. The implementation obtained depends on the setting of the
system properties javax.xml.parsers.SAXParserFactory and
javax.xml.parsers.DocumentBuilderFactory. The default values (unless overridden at
runtime) point to the reference implementation.
JAXP 1.4 supports:
SAX 2.0

The Simple API for XML (SAX) specification provides an event-based mechanism for
parsing XML documents. Various interfaces are defined in JAXP to handle different
kinds of events. SAX 2.0 supports namespaces and custom event filters.

DOM Core Level II

The Document Object Model (DOM) specification provides mechanisms to build and
traverse a tree-based representation of an XML document.

DOM Level I provided core mechanisms for traversing a tree and adding,
deleting, and updating content.

DOM Level II provides support for events, namespaces, etc.

XSLT 1.0

The Extensible Stylesheet Language Transformations (XSLT) specification defines
various mechanisms to transform one XML document into another.

The Simple API for XML (SAX) APIs

The SAX Packages: The SAX parser is defined in the following packages.

Package               Description
org.xml.sax           Defines the SAX interfaces. The name "org.xml" is the package prefix that was settled on by the group that defined the SAX API.
org.xml.sax.ext       Defines SAX extensions that are used when doing more sophisticated SAX processing, for example, to process a document type definition (DTD) or to see the detailed syntax for a file.
org.xml.sax.helpers   Contains helper classes that make it easier to use SAX, for example, by defining a default handler that has null methods for all of the interfaces, so you only need to override the ones you actually want to implement.
javax.xml.parsers     Defines the SAXParserFactory class, which returns the SAXParser. Also defines exception classes for reporting errors.

javax.xml.parsers Package: the main classes needed here

SAXParser          Defines the API that wraps an XMLReader implementation class.
SAXParserFactory   Defines a factory API that enables applications to configure and obtain a SAX based parser to parse XML documents.

org.xml.sax Package: a few interfaces

ContentHandler   Receive notification of the logical content of a document.
DTDHandler       Receive notification of basic DTD-related events.
EntityResolver   Basic interface for resolving entities.
ErrorHandler     Basic interface for SAX error handlers.

org.xml.sax.helpers Package: the class needed here

DefaultHandler   Default base class for SAX2 event handlers.

Understanding SAX Parser


First, create an instance of the SAXParserFactory class, which generates an
instance of the parser. This parser wraps an XMLReader object. When the parser's parse()
method is invoked, the reader invokes one of several callback methods (implemented in the
application). These callback methods are defined by the interfaces ContentHandler,
ErrorHandler, DTDHandler, and EntityResolver.

Brief description of the key SAX APIs:

SAXParserFactory

A SAXParserFactory object creates an instance of the parser determined by
the system property javax.xml.parsers.SAXParserFactory.

SAXParser

The SAXParser class defines several kinds of parse() methods. Generally, an
XML data source and a DefaultHandler object are passed to the parser. The
parser processes the XML and invokes the appropriate methods on the handler
object.

XMLReader

The SAXParser wraps an XMLReader (which you can obtain with SAXParser's
getXMLReader() and configure). It is the XMLReader which carries on the
conversation with the SAX event handlers you define.

DefaultHandler

Not shown in the diagram, a DefaultHandler implements the ContentHandler,
ErrorHandler, DTDHandler, and EntityResolver interfaces (with null methods).
You override only the ones you're interested in.

ContentHandler

Methods like startDocument, endDocument, startElement, and endElement are
invoked when an XML tag is recognized. This interface also defines the methods
characters and processingInstruction, which are invoked when the parser
encounters the text in an XML element or an inline processing instruction,
respectively.

ErrorHandler

The methods error, fatalError, and warning are invoked in response to various
parsing errors. The default error handler throws an exception for fatal errors
and ignores other errors (including validation errors). To ensure correct
handling, you'll need to supply your own error handler to the parser.

DTDHandler

Defines methods you will rarely call. Used while processing a DTD to
recognize and act on declarations for an unparsed entity.

EntityResolver

The resolveEntity method is invoked when the parser needs to identify the
data referenced by a URI.
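
To make the callback flow concrete, here is a small sketch that obtains a SAXParser from the factory and passes it a DefaultHandler that overrides three ContentHandler methods. The class name SaxEventsDemo and the helper events are invented for this illustration, and the "employee" document is borrowed from the schema section above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventsDemo {

    // Hypothetical helper: parses the XML and returns the events in the
    // order in which the parser reported them.
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();

        // DefaultHandler supplies empty methods; only three are overridden here.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                log.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may deliver text in several chunks; whitespace-only
                // chunks are skipped to keep the log short.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text:" + text);
                }
            }
        };

        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return log;
    }

    public static void main(String[] args) {
        String xml = "<employee><firstname>Amit</firstname>"
                   + "<lastname>Gupta</lastname></employee>";
        for (String event : events(xml)) {
            System.out.println(event);
        }
    }
}
```

Nothing is kept in memory except what the handler chooses to record, which is why this style suits large documents.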

The Document Object Model (DOM) APIs


The DOM Packages

The Document Object Model implementation is defined in the following packages:

Package             Description
org.w3c.dom         This package defines the DOM programming interfaces for XML documents, as per the specifications defined by the W3C.
javax.xml.parsers   This package defines the DocumentBuilderFactory class and the DocumentBuilder class. The DocumentBuilder class returns an object that implements the W3C Document interface. The factory that creates the builder is determined by the javax.xml.parsers.DocumentBuilderFactory system property. This package also defines the ParserConfigurationException class to deal with errors.

The diagram here shows the JAXP APIs used to process an XML document with the DOM parser:
The javax.xml.parsers.DocumentBuilderFactory class creates an instance of
DocumentBuilder, and the DocumentBuilder produces a Document (a DOM) that
conforms to the DOM specification. The builder obtained at run time is determined by the
system property javax.xml.parsers.DocumentBuilderFactory, which selects the factory
implementation that produces the builder. The platform's default value for this system
property can be overridden from the command line.

The DocumentBuilder newDocument() method can also be used. It creates an empty Document
that implements the org.w3c.dom.Document interface.
Alternatively, one of the builder's parse methods can be used to create a Document from
existing XML data. The result is a DOM tree like that shown in the diagram.
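
The two routes to a Document can be sketched as follows. The class name DomBuilderDemo and its helper methods are assumptions made for this illustration; the factory, builder and DOM calls are the standard JAXP ones.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomBuilderDemo {

    // The factory implementation is picked up from the platform configuration.
    static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException("no DOM parser available", e);
        }
    }

    // Route 1: start from an empty Document and add the nodes by hand.
    static Document buildFromScratch() {
        Document doc = newBuilder().newDocument();
        Element employee = doc.createElement("employee");
        Element firstname = doc.createElement("firstname");
        firstname.appendChild(doc.createTextNode("Amit"));
        employee.appendChild(firstname);
        doc.appendChild(employee); // becomes the document (root) element
        return doc;
    }

    // Route 2: let the builder create the tree from existing XML data.
    static Document parse(String xml) {
        try {
            return newBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        Document built = buildFromScratch();
        Document parsed = parse("<employee><firstname>Amit</firstname></employee>");
        System.out.println("built root:  " + built.getDocumentElement().getNodeName());
        System.out.println("parsed root: " + parsed.getDocumentElement().getNodeName());
        System.out.println("parsed text: "
                + parsed.getElementsByTagName("firstname").item(0).getTextContent());
    }
}
```

Either way the application ends up with the same kind of org.w3c.dom.Document tree, so the code that traverses it does not need to know how it was produced.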

The Extensible Stylesheet Language Transformations (XSLT) APIs

The XSLT Packages


The XSLT APIs are defined in the following packages:

Package                      Description
javax.xml.transform          Defines the TransformerFactory and Transformer classes. These classes are used to get an object for doing transformations. After creating a transformer object, its transform() method is invoked. This method is given an input (source) and an output (result).
javax.xml.transform.dom      Defines classes used to create input and output objects from a DOM.
javax.xml.transform.sax      Defines classes used to create input from a SAX parser and output objects from a SAX event handler.
javax.xml.transform.stream   Defines classes used to create input and output objects from an I/O stream.

The diagram shows the working of the XSLT APIs.

A TransformerFactory object is instantiated and used to create a Transformer. The source
(input) object acts as the input to the transformation process. This object is created from a
SAX reader, from a DOM, or from an input stream.
The output (result) object is the result of this transformation process. This object can be a SAX
event handler, a DOM, or an output stream. A Transformer is created from a set of transformation
instructions. If it is created without any specific instructions, then the transformer object simply
copies the source to the result.
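
The copying behaviour of a transformer created without instructions can be seen in the short sketch below, which uses a stream source and a stream result. The class name IdentityTransformDemo and the helper copy are hypothetical; OMIT_XML_DECLARATION is set only so that the output is easy to compare with the input.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class IdentityTransformDemo {

    // Hypothetical helper: runs the XML through a transformer that was
    // created without a stylesheet and returns what it wrote.
    static String copy(String xml) {
        try {
            // No stylesheet argument: the transformer copies source to result.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<employee><firstname>Amit</firstname></employee>";
        System.out.println(copy(xml));
    }
}
```

Swapping the StreamSource for a DOMSource, or the StreamResult for a DOMResult, gives the other source and result combinations described above; passing a stylesheet Source to newTransformer() would turn the copy into a real transformation.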

Creates a New DOM Parse Tree

This example describes a method to create a new DOM tree. The methods used for
building the tree are described below:
Element root = doc.createElement("Company"): creates an Element node.
doc.appendChild(root): adds the node after the last child node of the document.
Element root = doc.getDocumentElement(): allows direct access to the root of the DOM
document.
The XML file read by the program (Document6.xml) is:
<?xml version="1.0" encoding="UTF-8"?>
<!--
Document : Document6.xml
Created on : 10 July, 2008, 5:20 PM
Author : girish
Description:
Purpose of the document follows.
-->
<root>
</root>
Parsetree.java:-
/*
 * Program that creates a new DOM parse tree
 * Parsetree.java
 * Author:-RoseIndia Team
 * Date:-10-Jun-2008
 */

import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;

public class Parsetree {

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
        builderFactory.setValidating(false);
        builderFactory.setNamespaceAware(true);
        builderFactory.setIgnoringElementContentWhitespace(true);
        Document doc = builderFactory.newDocumentBuilder().parse(new File("Document6.xml"));
        new Parsetree().buildTree(doc);
    }

    public void buildTree(Document doc) {
        // A document may have only one root element, so the placeholder
        // <root> read from the file is removed before the new root is appended.
        doc.removeChild(doc.getDocumentElement());

        Element root = doc.createElement("Company");
        doc.appendChild(root);

        // First child: <Level>SoftwareDevelopment</Level>
        Element child = doc.createElement("Level");
        child.appendChild(doc.createTextNode("SoftwareDevelopment"));
        root.appendChild(child);

        // Second child: <Location>Rohini</Location>
        child = doc.createElement("Location");
        child.appendChild(doc.createTextNode("Rohini"));
        root.appendChild(child);

        // Read the root back from the document to confirm it was attached.
        Element attachedRoot = doc.getDocumentElement();
        System.out.print("Name of the root created is:- " + attachedRoot.getNodeName());
    }
}

Output of the program:-

Name of the root created is:- Company

Creates element node, attribute node, comment node, processing instruction and a CDATA section

This example shows how to create an Element node, Comment node, Attribute node,
ProcessingInstruction node and CDATASection node in a DOM document. JAXP (Java API for XML
Processing) is an interface which provides parsing of XML documents. Here the
DocumentBuilderFactory is used to create new DOM parsers. Some of the methods used in the
code given below are:
Element root = doc.createElement("Company"): creates an element node; "Company" is the
name of the element node.
doc.appendChild(root): adds the node after the last child node of the document.
Comment node = doc.createComment("Comment for company"): creates a Comment node;
"Comment for company" is the text of the comment.
CDATASection cdata = doc.createCDATASection("Roseindia <, >, .net rohini"): creates a
CDATASection node; the string "Roseindia <, >, .net rohini" is the data for the node.
The XML file read by the program (abc.xml) is:
<?xml version="1.0" encoding="UTF-8"?>
<root></root>

CreatesElementnode.java:-
import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;

public class CreatesElementnode {

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
        Document doc = builderFactory.newDocumentBuilder().parse(new File("abc.xml"));
        new CreatesElementnode().createNodes(doc);
    }

    public void createNodes(Document doc) {
        // A document may have only one root element, so the placeholder
        // <root> read from abc.xml is removed before "Company" is appended.
        doc.removeChild(doc.getDocumentElement());

        // Create an element node.
        Element root = doc.createElement("Company");
        doc.appendChild(root);
        System.out.println("Element node created is: "
                + doc.getDocumentElement().getNodeName());

        // Create a comment node.
        Comment node = doc.createComment("Comment for company");
        root.appendChild(node);
        System.out.println("Comment node created is: " + root.getFirstChild());

        // Create a CDATA section node.
        CDATASection cdata = doc.createCDATASection("Roseindia <, >, .net rohini");
        root.appendChild(cdata);
        System.out.println("CData node created is: "
                + root.getFirstChild().getNextSibling());

        // Create a processing instruction node.
        ProcessingInstruction pi = doc.createProcessingInstruction("Roseindia", "Rohini");
        root.appendChild(pi);
        System.out.println("ProcessingInstruction node created is: " + root.getLastChild());
    }
}

Output of the program:-


Element node created is: Company
Comment node created is: [#comment: Comment for company]
CData node created is: [#cdata-section: Roseindia <, >, .net rohini]
ProcessingInstruction node created is: [Roseindia: Rohini]

Adding an Attribute in DOM Document

This example shows how to add an attribute in a DOM document. JAXP (Java API for
XML Processing) is an interface which provides parsing of XML documents. Here the
DocumentBuilderFactory is used to create new DOM parsers. Some of the methods
used in the code given below for adding an attribute are:
getDocumentElement(): allows direct access to the root node of the document.
root.getAttribute("Id"): retrieves the value of the Id attribute.
root.setAttribute("Id", "05MC34"): sets the Id attribute to 05MC34.
The XML file read by the program (Document3.xml) is:
<?xml version="1.0" encoding="UTF-8"?>
<Author Type='Bible' Id='Rose-78' Issue='1995'>
</Author>
Addingattribute.java :-

/*
 * Program that adds an attribute to an element
 * Addingattribute.java
 * Author:-RoseIndia Team
 * Date:-09-Jun-2008
 */

import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;

public class Addingattribute {

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document doc = factory.newDocumentBuilder().parse(new File("Document3.xml"));
        new Addingattribute().addAttribute(doc);
    }

    public void addAttribute(Document doc) {
        Element root = doc.getDocumentElement();
        System.out.print("Attributes of Id before Adding is: ");
        System.out.println(root.getAttribute("Id"));
        // Sets the Id attribute on the root element; an existing value is overwritten.
        root.setAttribute("Id", "05MC34");
        System.out.print("Attributes of Id after Adding is: ");
        System.out.println(root.getAttribute("Id"));
    }
}

Output of the program:-


Attributes of Id before Adding is: Rose-78
Attributes of Id after Adding is: 05MC34

Reading XML from a File

This example shows how to read the contents of an XML file through a DOM document. JAXP (Java
API for XML Processing) is an API that provides parsing of XML documents. The
javax.xml.parsers package is imported to provide the classes for processing XML documents.
Here the DocumentBuilderFactory is used to create new DOM parsers. Some of the methods used
for reading XML from a file are described below:-
File f = new File("Document2.xml"):- creates a File object for the XML file that is to be loaded.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance():- obtains a
DocumentBuilderFactory, which is used to create new DOM parsers.
Element root = doc.getDocumentElement():- gives direct access to the root element of the DOM
document.
NodeList list = doc.getElementsByTagName("Employee"):- returns all Employee elements in
document order. NodeList is an interface that provides an ordered collection of nodes. Nodes
are accessed from the NodeList by their index number, starting at 0.
NodeList nodelist = element.getElementsByTagName("name"):- returns a list of the descendant
elements of element that have the given tag name, i.e. "name".
The XML file read by the program (Document2.xml) is:-

<?xml version="1.0" encoding="UTF-8"?>
<Company>
    <Employee>
        <name Girish="Gi">Roseindia.net</name>
    </Employee>
    <Employee>
        <name Komal="Ko">newsTrack</name>
    </Employee>
    <Employee>
        <name Mahendra="Rose">Girish Tewari</name>
    </Employee>
</Company>

readxmlfromafile.java

/*
 * @Program to load properties from XML file.
 * readxmlfromafile.java
 * Author:-RoseIndia Team
 * Date:-10-Jun-2008
 */

import java.io.*;
import org.w3c.dom.*;
import javax.xml.parsers.*;

public class readxmlfromafile {

    public static void main(String[] args) throws Exception {
        File f = new File("Document2.xml");
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(f);
        new readxmlfromafile().read(doc);
    }

    public void read(Document doc) {
        // Root element of the document (not used further in this example)
        Element root = doc.getDocumentElement();
        // All Employee elements, in document order
        NodeList list = doc.getElementsByTagName("Employee");
        for (int i = 0; i < list.getLength(); i++) {
            Node node = list.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                Element element = (Element) node;
                // First name element below this Employee
                NodeList nodelist = element.getElementsByTagName("name");
                Element element1 = (Element) nodelist.item(0);
                // The first child of name is its text node
                NodeList fstNm = element1.getChildNodes();
                System.out.println("Name : " + (fstNm.item(0)).getNodeValue());
            }
        }
    }
}

Output of the program:-

Name : Roseindia.net
Name : newsTrack
Name : Girish Tewari
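
The loop over the NodeList can also be written with getTextContent(), which returns all the
text below an element in one call. The following is a minimal sketch under stated
assumptions: the XML is passed in as a string instead of being read from Document2.xml, and
the class ReadNamesSketch and its method employeeNames() are hypothetical names used only for
this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class ReadNamesSketch {

    // Returns the trimmed text of the first <name> element of every <Employee>.
    static List<String> employeeNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList employees = doc.getElementsByTagName("Employee");
            for (int i = 0; i < employees.getLength(); i++) {
                // getElementsByTagName() only returns elements, so the cast is safe
                Element employee = (Element) employees.item(i);
                NodeList nameNodes = employee.getElementsByTagName("name");
                if (nameNodes.getLength() == 0) {
                    continue; // skip employees without a <name> child
                }
                // getTextContent() concatenates all text below the element
                names.add(nameNodes.item(0).getTextContent().trim());
            }
        } catch (Exception e) {
            // Wrapped so that callers do not need a throws clause
            throw new IllegalStateException("Could not read XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<Company>"
                + "<Employee><name>Roseindia.net</name></Employee>"
                + "<Employee><name>newsTrack</name></Employee>"
                + "<Employee></Employee>"
                + "</Company>";
        for (String name : employeeNames(xml)) {
            System.out.println("Name : " + name);
        }
    }
}
```

Checking getLength() before calling item(0) avoids a NullPointerException when an Employee
element has no name child, a case the listing above does not guard against.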

Transforming an XML File with XSL
This example shows how to transform an XML file with an XSL stylesheet. Besides parsing, JAXP
(Java API for XML Processing) provides a transformation API in the javax.xml.transform
package. Here a TransformerFactory is used to compile the stylesheet and to create the
Transformer that applies it. Some of the methods used in the code given below for the
transformation are:-
TransformerFactory factory = TransformerFactory.newInstance():- TransformerFactory is
a class that is used to create Transformer objects. A TransformerFactory instance can be used
to create Transformer and Templates objects.
Templates template = factory.newTemplates(new StreamSource(new
FileInputStream(xslFilename))):- creates a Templates object. Templates is an interface that
represents a compiled stylesheet. It may be used multiple times in a given session to create
Transformer objects.
The XSL stylesheet used by the program (newstylesheet1.xsl) is:-

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="html" indent="yes"/>
    <xsl:template match="girish">
        <html>
            <head>
                <title>Girish</title>
            </head>
            <body>
                <xsl:apply-templates/>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="roseindia">
        <xsl:value-of select="@key"/>=
        <xsl:value-of select="@value"/>
        <br></br>
    </xsl:template>
</xsl:stylesheet>

The XML file used as input by the program (t.xml) is:-

<?xml version="1.0" encoding="UTF-8"?>
<girish>
    <entry key="key1" value="value1" />
    <entry key="key2" />
</girish>

XMLtoXSL.java

/*
 * @Program that Transforms an XML File with XSL
 * XMLtoXSL.java
 * Author:-RoseIndia Team
 * Date:-23-July-2008
 */

import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;

public class XMLtoXSL {

    public static void main(String[] args) throws Exception {
        // Arguments: input file, output file, stylesheet
        new XMLtoXSL().xsl("t.xml", "gt.xml", "newstylesheet1.xsl");
    }

    public void xsl(String inputFilename, String outputFilename,
            String xslFilename) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Compile the stylesheet; the Templates object can be reused
        Templates template = factory.newTemplates(new StreamSource(
                new FileInputStream(xslFilename)));
        Transformer xformer = template.newTransformer();
        Source source = new StreamSource(new FileInputStream(inputFilename));
        Result result = new StreamResult(new FileOutputStream(outputFilename));
        // Apply the stylesheet to the input and write the result to the output file
        xformer.transform(source, result);
    }
}

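Output of the program (written to gt.xml):-

<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Girish</title>
</head>
<body>
</body>
</html>

The body is empty because the input file contains entry elements, while the second template
of the stylesheet matches roseindia. No template therefore produces output for the entries.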
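
To see a template actually fire, the transformation can be run on strings held in memory. The
following is a minimal sketch, not the original program: it assumes a stylesheet whose
template matches entry and which uses the text output method, and the class XslSketch, the
constant STYLESHEET and the method transform() are hypothetical names introduced for this
illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, for illustration only.
class XslSketch {

    // Assumed stylesheet: prints "key=value;" for every entry element.
    static final String STYLESHEET =
            "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='girish'><xsl:apply-templates select='entry'/></xsl:template>"
          + "<xsl:template match='entry'>"
          + "<xsl:value-of select='@key'/>=<xsl:value-of select='@value'/>;"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the result as a string.
    static String transform(String xml, String xsl) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            // Compile the stylesheet once; Templates can create many Transformers
            Templates templates = factory.newTemplates(
                    new StreamSource(new StringReader(xsl)));
            Transformer transformer = templates.newTransformer();

            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrapped so that callers do not need a throws clause
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<girish>"
                + "<entry key='key1' value='value1'/>"
                + "<entry key='key2'/>"
                + "</girish>";
        // prints key1=value1;key2=;
        System.out.println(transform(xml, STYLESHEET));
    }
}
```

The second entry has no value attribute, so the expression @value selects nothing and
xsl:value-of writes an empty string for it.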

JDOM: Java DOM

An alternative to DOM that creates a tree of objects from an XML structure. The resulting
tree is much easier to use, and it can be created from an XML structure without a
compilation step.

DOM4J

Although it is not on the JCP standards track, DOM4J is an open-source, object-oriented
alternative to DOM that is in many ways ahead of JDOM in terms of implemented features. It
is an excellent alternative for Java developers who need to manipulate XML-based data.

JAXM: Java API for XML Messaging

The JAXM API defines a mechanism for exchanging asynchronous XML-based messages between
applications. ("Asynchronous" means "send it - forget it".)

JAX-RPC: Java API for XML-based Remote Procedure Calls

The JAX-RPC API defines a mechanism for exchanging synchronous XML-based messages between
applications. ("Synchronous" means "send a message and wait for the reply".)

JAXR: Java API for XML Registries

The JAXR API provides a mechanism for publishing available services in an external
registry, and for consulting the registry to find those services.

Reading XML Data from a Stream

This example shows how to read XML data into a DOM document and write it out to a character
stream. JAXP (Java API for XML Processing) is an API that provides parsing of XML documents.
Here the DocumentBuilderFactory is used to create new DOM parsers, and a Transformer writes
the parsed document to a StringWriter. Some of the methods used in the code given below for
reading XML data are:-
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance():- creates a
DocumentBuilderFactory. DocumentBuilderFactory is a class that enables an application to
obtain a parser for building DOM trees from XML documents.
DocumentBuilder builder = factory.newDocumentBuilder():- creates a DocumentBuilder object
with the help of the DocumentBuilderFactory.
transformer.transform(source, result):- processes the source tree and writes it to the output
result.
The XML file read by the program (Document4.xml) is:-
<?xml version="1.0" encoding="UTF-8"?>
<Company>
    <Employee Id="Rose-2345">
        <CompanyName>RoseIndia.net</CompanyName>
        <City>Haldwani</City>>
        <name>Girish Tewari</name>
        <Phoneno>1234567890</Phoneno>
        <Doj>May 2008</Doj>
    </Employee>
    <Employee Id="Rose-2346">
        <CompanyName>RoseIndia.net</CompanyName>
        <City>Lucknow</City>
        <name>Mahendra Singh</name>
        <Phoneno>123652314</Phoneno>
        <Doj>May 2008</Doj>
    </Employee>>
</Company>

ReadingXmlDataFromStream.java

/*
 * @Program To Read XML Data from a Stream.
 * ReadingXmlDataFromStream.java
 * Author:-RoseIndia Team
 * Date:-17-July-2008
 */
import java.io.*;
import org.w3c.dom.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;

public class ReadingXmlDataFromStream {

    public static void main(String[] args) throws Exception {
        File xmlfile = new File("Document4.xml");
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(xmlfile);
        new ReadingXmlDataFromStream().readdata(doc);
    }

    public void readdata(Document doc) throws Exception {
        // A Transformer without a stylesheet copies the source to the result unchanged
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");

        // Write the DOM tree to an in-memory character stream
        StringWriter writer = new StringWriter();
        StreamResult result = new StreamResult(writer);
        DOMSource source = new DOMSource(doc);
        transformer.transform(source, result);
        String string = writer.toString();
        System.out.println(string);
    }
}

Output of the program:-

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Company>
    <Employee Id="Rose-2345">
        <CompanyName>RoseIndia.net</CompanyName>
        <City>Haldwani</City>&gt;
        <name>Girish Tewari</name>
        <Phoneno>1234567890</Phoneno>
        <Doj>May 2008</Doj>
    </Employee>
    <Employee Id="Rose-2346">
        <CompanyName>RoseIndia.net</CompanyName>
        <City>Lucknow</City>
        <name>Mahendra Singh</name>
        <Phoneno>123652314</Phoneno>
        <Doj>May 2008</Doj>
    </Employee>&gt;
</Company>

The extra > characters after </City> and after the second </Employee> in the input file are
ordinary character data, so the serializer writes them as &gt;.
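
The program above parses a File. DocumentBuilder.parse() also accepts an InputStream, so the
same steps work for data that arrives over a stream. The following is a minimal sketch,
assuming a ByteArrayInputStream as a stand-in for a real stream such as a socket or an HTTP
response. The class StreamSketch and its methods are hypothetical names introduced for this
illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper class, for illustration only.
class StreamSketch {

    // Builds a DOM Document from any InputStream that delivers XML.
    static Document parse(InputStream in) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
        } catch (Exception e) {
            // Wrapped so that callers do not need a throws clause
            throw new IllegalStateException("Could not parse stream", e);
        }
    }

    // Writes the DOM tree to a string, without the XML declaration.
    static String serialize(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Company><Employee Id=\"Rose-2345\">"
                + "<City>Haldwani</City></Employee></Company>";
        // Stand-in for a real stream (assumption made for this sketch)
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));

        Document doc = parse(in);
        System.out.println("Root element: " + doc.getDocumentElement().getTagName());
        System.out.println(serialize(doc));
    }
}
```

Passing bytes rather than characters lets the parser detect the encoding from the XML
declaration, which is usually the safer choice for data received from outside the program.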
