Introduction To XML
An XML document is a plain text file with the extension .xml. XML was designed to
transport and store data, whereas HTML was designed to display data.
What is XML?
<note date="12/11/2007">
<to>Mahesh</to>
<from>Ramesh</from>
</note>
Attribute values must always be quoted. Written as <note date=12/11/2007>, the start tag
above would be an error, because the value of the date attribute is not quoted.
Entity References
Some characters have a special meaning in XML.
If you place a character like "<" inside an XML element, it will generate an error because
the parser interprets it as the start of a new element.
This will generate an XML error:
<message>if salary < 1000 then</message>
To avoid this error, replace the "<" character with an entity reference:
<message>if salary &lt; 1000 then</message>
There are 5 predefined entity references in XML:
&lt;    <   less than
&gt;    >   greater than
&amp;   &   ampersand
&apos;  '   apostrophe
&quot;  "   quotation mark
Note: Only the characters "<" and "&" are strictly illegal in XML. The greater than
character is legal, but it is a good habit to replace it.
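The escaping rule above can be tried out with a short Python sketch. It assumes Python's standard library (xml.sax.saxutils.escape and xml.etree.ElementTree); the variable names are illustrative only and do not come from the original text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "if salary < 1000 then"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape(raw_text)
document = "<message>" + escaped + "</message>"

# The parser expands the entity reference back to the original character.
parsed_text = ET.fromstring(document).text

# The unescaped version is rejected by the parser.
try:
    ET.fromstring("<message>" + raw_text + "</message>")
    unescaped_ok = True
except ET.ParseError:
    unescaped_ok = False

print(document)
print(parsed_text)
print(unescaped_ok)
```

Escaping at the point where the text is written into the document keeps the stored data unchanged: the reader of the file gets back exactly the characters that went in.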
Comments in XML
The syntax for writing comments in XML is similar to that of HTML.
<!-- This is a comment -->
XML Attributes
From HTML you will remember this: <img src="computer.gif">. The "src" attribute
provides additional information about the <img> element.
In HTML (and in XML) attributes provide additional information about
elements:
<img src="computer.gif">
<a href="demo.asp">
Attributes often provide information that is not a part of the data. In
the example below, the file type is irrelevant to the data, but important
to the software that wants to manipulate the element:
<file type="gif">computer.gif</file>
If the attribute value itself contains double quotes you can use single
quotes, like in this example:
<dept name='IIPS "DAVV" Indore'>
or you can use character entities:
<dept name="IIPS &quot;DAVV&quot; Indore">
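As a quick check that both quoting styles carry the same attribute value, here is a minimal Python sketch using the standard library. The dept elements are written as self-closing tags so that each string is a complete document; that change, and the variable names, are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Style 1: single quotes around a value that contains double quotes.
single_quoted = ET.fromstring("<dept name='IIPS \"DAVV\" Indore'/>")

# Style 2: double quotes around the value, inner quotes written as &quot;.
entity_quoted = ET.fromstring('<dept name="IIPS &quot;DAVV&quot; Indore"/>')

name_a = single_quoted.get("name")
name_b = entity_quoted.get("name")
print(name_a == name_b)

# quoteattr() chooses a safe quoting style for a given value.
quoted = quoteattr('IIPS "DAVV" Indore')
print(quoted)
```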
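The same information can be stored in an attribute or in a child element:
<person sex="female">
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
<person>
<sex>female</sex>
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>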
In the first example sex is an attribute. In the last, sex is an element. Both examples
provide the same information.
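The difference shows up mainly in how a program reads the value. The following Python sketch, assuming the standard xml.etree.ElementTree module and illustrative variable names, reads sex once from an attribute and once from a child element.

```python
import xml.etree.ElementTree as ET

# Variant 1: sex stored as an attribute of person.
with_attribute = ET.fromstring(
    '<person sex="female">'
    "<firstname>Anna</firstname><lastname>Smith</lastname>"
    "</person>"
)

# Variant 2: sex stored as a child element of person.
with_element = ET.fromstring(
    "<person><sex>female</sex>"
    "<firstname>Anna</firstname><lastname>Smith</lastname>"
    "</person>"
)

# Attributes are read with get(), child element text with findtext().
sex_from_attribute = with_attribute.get("sex")
sex_from_element = with_element.findtext("sex")

print(sex_from_attribute, sex_from_element)
```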
There are no rules about when to use attributes and when to use elements. Attributes are
handy in HTML. In XML my advice is to avoid them. Use elements instead.
XML Validation
XML with correct syntax is "Well Formed" XML.
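One way to test for well-formedness is simply to hand the text to a parser and see whether it is accepted. The sketch below assumes Python's xml.etree.ElementTree; is_well_formed is a hypothetical helper name chosen for this example, not a library function.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as XML (hypothetical helper)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Quoted attribute value and properly closed elements.
quoted = '<note date="12/11/2007"><to>Mahesh</to><from>Ramesh</from></note>'
# Same document, but the attribute value is not quoted.
unquoted = '<note date=12/11/2007><to>Mahesh</to><from>Ramesh</from></note>'
# The closing </note> tag is missing.
unclosed = "<note><to>Mahesh</to>"

print(is_well_formed(quoted))
print(is_well_formed(unquoted))
print(is_well_formed(unclosed))
```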
A "Valid" XML document is a "Well Formed" XML document, which also
conforms to the rules of a Document Type Definition (DTD):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note SYSTEM "Note.dtd">
<note>
<to>Mahesh</to>
<from>Ramesh</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The DOCTYPE declaration in the example above is a reference to an external DTD file,
Note.dtd. The content of such a file is shown in the External DTD section below.
XML DTD
The purpose of a DTD is to define the structure of an XML document. It defines the
document structure with a list of legal elements. A DTD can be declared inline in your
XML document, or as an external reference.
Internal DTD
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Mahesh</to>
<from>Ramesh</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
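Python's standard xml.etree.ElementTree parser reads a document with an internal DTD but does not validate against it. The sketch below therefore checks the content model (to,from,heading,body) by hand. It is only an illustration of what the DTD demands, not a DTD validator; matches_note_model and EXPECTED_CHILDREN are hypothetical names introduced for the example.

```python
import xml.etree.ElementTree as ET

DTD = """<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
"""

complete = DTD + (
    "<note><to>Mahesh</to><from>Ramesh</from>"
    "<heading>Reminder</heading>"
    "<body>Don't forget me this weekend!</body></note>"
)
# Well formed, but the heading element required by the DTD is missing.
incomplete = DTD + (
    "<note><to>Mahesh</to><from>Ramesh</from>"
    "<body>Don't forget me this weekend!</body></note>"
)

# Hand-written stand-in for the content model (to,from,heading,body).
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]


def matches_note_model(root):
    """Check tag name and child order by hand (hypothetical helper)."""
    return root.tag == "note" and [c.tag for c in root] == EXPECTED_CHILDREN


complete_root = ET.fromstring(complete)
incomplete_root = ET.fromstring(incomplete)  # parses without complaint

print(matches_note_model(complete_root))
print(matches_note_model(incomplete_root))
```

Both documents parse, which shows the gap between "well formed" and "valid": only a validating parser, or a manual check like the one above, notices the missing heading.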
External DTD
An external DTD file such as Note.dtd contains only the declarations, without the
<!DOCTYPE note [ ... ]> wrapper:
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
PCDATA
PCDATA means parsed character data. Think of character data as the text found
between the start tag and the end tag of an XML element.
PCDATA is text that will be parsed by a parser. Tags inside the text will be treated as
markup and entities will be expanded.
CDATA
CDATA also means character data. CDATA is text that will NOT be parsed by a parser.
Tags inside the text will NOT be treated as markup and entities will not be expanded.
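The contrast between parsed and unparsed character data can be observed directly. This Python sketch assumes the standard xml.etree.ElementTree module and uses a CDATA section, written <![CDATA[ ... ]]>, to mark text that should not be parsed; the message contents are made up for the example.

```python
import xml.etree.ElementTree as ET

# PCDATA: the entity is expanded and <b> becomes a child element.
pcdata = ET.fromstring(
    "<message>if salary &lt; 1000 then <b>bonus</b></message>"
)

# CDATA section: the same kind of characters are kept as literal text.
cdata = ET.fromstring(
    "<message><![CDATA[if salary < 1000 &amp; <b>bonus</b>]]></message>"
)

pcdata_text = pcdata.text
pcdata_children = [child.tag for child in pcdata]
cdata_text = cdata.text
cdata_children = [child.tag for child in cdata]

print(repr(pcdata_text), pcdata_children)
print(repr(cdata_text), cdata_children)
```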