© 2001 First Hop
Java and XML
Janne Kalliola
Director, Product Development
First Hop Ltd
Java and XML
About XML
Basic XML manipulation
DOM
SAX
Transforming and outputting XML documents
XSLT
XSL-FO
New generation technologies
Java XML Bindings
Java XML Messaging
JAX-RPC
Case – configuration information
About XML
Extensible Markup Language (XML) is a W3C
recommendation for representing information in
electronic form
XML is a meta language; it does not provide any
semantics, only the basic syntax and conformance
tests
everything else can be decided by the provider of
the XML document
There are dozens of initiatives to create various
languages over XML
MathML
XHTML
SOAP
Using XML
In the beginning, XML was thought of strictly as a textual
document
the representation of the document structure was
a file containing elements and body text
Nowadays XML is used programmatically, thus
documents can also be
data structures
events
streams
In this presentation, the word "document" is used to
cover all of these aspects
XML as Objects
XML documents can be manipulated in object-oriented
software using the Document Object Model
(DOM) interfaces
DOM is a W3C (www.w3.org) recommendation
programming language and implementation
independent
interfaces available for several languages
including Java, C++, Python, and IDL
DOM is also used in browsers to manipulate
HTML content
The basic idea in DOM is to read XML documents into
trees and manipulate the tree
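The read-into-a-tree idea can be sketched with the JAXP DOM interfaces. This is a minimal illustration, not code from the presentation: the class name DomTreeSketch and the <deck>/<slide> vocabulary are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class: reads XML into a DOM tree and edits the tree.
final class DomTreeSketch {

    /** Parses XML text into an in-memory DOM tree. */
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Manipulates the tree: appends a <slide title="..."/> to the root element. */
    static void addSlide(Document doc, String title) {
        Element slide = doc.createElement("slide");
        slide.setAttribute("title", title);
        doc.getDocumentElement().appendChild(slide);
    }

    /** Counts all <slide> elements in the tree. */
    static int countSlides(Document doc) {
        return doc.getElementsByTagName("slide").getLength();
    }

    public static void main(String[] args) {
        Document doc = parse("<deck><slide title=\"About XML\"/></deck>");
        addSlide(doc, "DOM");
        NodeList slides = doc.getElementsByTagName("slide");
        for (int i = 0; i < slides.getLength(); i++) {
            System.out.println(((Element) slides.item(i)).getAttribute("title"));
        }
    }
}
```

The whole document lives in memory once parse returns, which is what makes the tree easy to edit and also what makes DOM costly for large inputs.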
DOM Levels and Implementations
DOM currently has three levels:
1. basic manipulation interfaces
2. event-based XML document manipulation, tree
traversal and style sheets (CSS)
3. XML Schemas, XPath
There are several implementations available, such as
JAXP from Sun Microsystems, and Xerces from Apache
Foundation
the implementations are usually called XML
parsers
they contain implementations of the DOM
interfaces and supporting code
support for the different levels varies from one
implementation to another
Event-based Manipulation
If the XML documents are large or only a part of the
document should be read in, DOM can consume too
much memory and resources
the solution is to read the XML document as a
series of events
every XML element triggers an event
the program reading in the document catches the
events and reacts accordingly
the document is read element by element and
thus no large data structures are created
The event-based manipulation is done with SAX
(Simple API for XML)
the SAX implementations are usually bundled
with DOM implementations
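A minimal sketch of the event-based style, using the SAX classes that ship with JAXP. The class name SaxCountSketch and the idea of counting element names are illustrative assumptions; the point is that the handler reacts to each start-of-element event and no document tree is built.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: counts elements by name while streaming.
final class SaxCountSketch {

    /** Returns how many times each element name occurs in the document. */
    static Map<String, Integer> countElements(String xml) {
        final Map<String, Integer> counts = new LinkedHashMap<>();

        // The handler is called back once per event; only the counters are kept.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                counts.merge(qName, 1, Integer::sum);
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countElements("<log><entry/><entry/><entry/></log>"));
    }
}
```

Memory use here depends on the number of distinct element names, not on the size of the document.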
Transforming XML
If the source XML document is outside the system
using the document, the source document may be in
a format that is not optimal or usable for the system
The document has to be transformed into another
XML syntax
the transformation is hard to program with
normal programming languages
instead, a special programming language,
Extensible Stylesheet Language Transformations (XSLT),
has been created to solve this problem
XSLT
XSLT is a rule-based language
the XSLT syntax is based on XML, i.e. XSLT programs
(stylesheets) are XML documents
an XSLT stylesheet contains rules to transform elements,
attributes and subdocuments into another form
The XSLT stylesheet is run inside an XSLT processor
the processor gets the source document and the
stylesheet as input and produces another XML
document as output
the processor reads in part of the source document
and tries to find matching rules
if a rule is found, it is executed and the rule
produces a part of the output document
there are built-in default rules for parts that no rule matches
the process goes on recursively until the source
document has been completely processed
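The rule matching described above can be tried out with the JAXP transformation interfaces (javax.xml.transform). This is a sketch under assumptions: the class name XsltSketch, the <people>/<person> source format and the two-rule stylesheet are invented for illustration. No rule matches the document root, so a built-in default rule carries processing down to the <people> rule.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class: runs a small stylesheet over a source document.
final class XsltSketch {

    // Two rules: one for the <people> element, one for each <person>.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"people\">"
      + "<names><xsl:apply-templates select=\"person\"/></names>"
      + "</xsl:template>"
      + "<xsl:template match=\"person\">"
      + "<name><xsl:value-of select=\"@first\"/></name>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Feeds source document and stylesheet to the processor, returns the output. */
    static String transform(String sourceXml, String stylesheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(sourceXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String source = "<people><person first=\"Ada\"/><person first=\"Linus\"/></people>";
        System.out.println(transform(source, STYLESHEET));
    }
}
```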
XSL Formatting Objects
XSL-FO is an XML-based language to describe layout
information
the XSL-FO document contains instructions to
render itself, like DTP program documents
the document is rendered using an XSL-FO
processor
the processor outputs a document in some
format such as PDF or PostScript
XSL-FO is a young recommendation, finalised in
October 2001
programmatic support is still at beta level
Outputting Human-readable
Documents
If the XML document has to be presented to a person,
there are two basic ways to proceed:
convert the document to XHTML and show the
output document with a WWW browser
transformation is done using XSLT and the
output is readily available for presentation
convert the document to XSL-FO and format the
document to some desired output format
transformation is done using XSLT and the
output is rendered to the final output format
by an XSL-FO processor
Transformations in Java
Both XSLT and XSL-FO can be used inside Java
programs
XSLT interfaces are included in JAXP
XSLT processors are available for example from
IBM, Sun Microsystems and Apache Foundation
XSL-FO support is currently still in the beta phase
the best Java-based processor is FOP, provided by
Apache Foundation, xml.apache.org/fop
FOP provides output as PDF, PS, PCL, SVG
FOP can also be used to render the
documents inside a Java program, using the
supplied AWT component
New Generation Technologies
The first uses of XML in Java were very document-
oriented
reading in and writing out documents
using XML in WWW applications for content
presentation, or as a configuration file format
transforming documents to other forms
New APIs are geared towards using XML in a
programmatic manner
the programmers are shielded from the bare XML
documents
APIs are provided instead and XML formats are
kept in the background
Java XML Bindings
If XML documents contain only computer readable
information, DOM and SAX are cumbersome
technologies to read the documents in
the semantics have to be provided by the software
itself
Sun Microsystems has proposed a solution for the
problem
Java XML Bindings (JAXB) is an API and collection
of tools to automate mappings between XML
documents and Java classes
JAXB is currently in development, for more
information:
http://java.sun.com/xml/jaxb/
JAXB Mechanisms
JAXB provides a compiler that creates Java classes
from XML DTDs (document type definition)
XML Schema support will be available shortly
DTD is converted to a Binding Schema that is
used to create the classes
Generated classes contain error and validity checks as
stated in DTDs
Classes can both read in and generate XML
documents
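To give a feel for what a bound class offers, here is a hand-written sketch. It does not use the JAXB API and is not output of the JAXB compiler; the class name ServerBinding, the <server host="..." port="..."/> format and the port range check are assumptions made for illustration. It shows the three properties listed above: validity checks, reading XML in, and generating XML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical hand-written stand-in for a generated binding class.
final class ServerBinding {
    final String host;
    final int port;

    /** Validity checks of the kind a schema would otherwise express. */
    ServerBinding(String host, int port) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host is required");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        this.host = host;
        this.port = port;
    }

    /** Reads a <server host="..." port="..."/> document into an object. */
    static ServerBinding fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            if (!"server".equals(root.getTagName())) {
                throw new IllegalArgumentException("expected <server> root element");
            }
            return new ServerBinding(root.getAttribute("host"),
                                     Integer.parseInt(root.getAttribute("port")));
        } catch (IllegalArgumentException e) {
            throw e; // validity problems, including a non-numeric port
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable XML document", e);
        }
    }

    /** Generates the XML form of this object. */
    String toXml() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("server");
            root.setAttribute("host", host);
            root.setAttribute("port", Integer.toString(port));
            doc.appendChild(root);
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialise", e);
        }
    }
}
```

Application code then works with host and port as ordinary fields and never touches the XML directly.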
JAXB Benefits & Problems
Easier programming model
some parts of program logic are written with a
description language
Smaller footprint for XML parsing
no need to keep redundant information in the
memory during parsing
Program logic is divided into two locations
there are some versioning problems, when the
DTDs change
Automatic generation may produce poor code
Java XML Messaging
Java XML Messaging (JAXM) is a set of APIs that
enable sending and receiving XML formatted
messages
implements Simple Object Access Protocol
(SOAP) 1.1 with attachments
JAXM is used to exchange XML business documents
over the Internet
Transportation is usually done over HTTP
FTP and SMTP can be used, too
messages can be both synchronous and
asynchronous (with or without
acknowledgements)
one message can be sent to several recipients
For more information: http://java.sun.com/xml/jaxm/
JAXM Benefits
Helps developers concentrate on the core
features of the program
messaging details are left to JAXM components
several transports are provided for exchanging
messages
Messages can be exchanged with non-Java
applications, too
JAX-RPC
Java API for XML-Based Remote Procedure Calls
enables Java applications to communicate with other
programs using an RPC mechanism
the API is based on SOAP 1.1
Usually, JAX-RPC is used in a client program to
connect to a remote server
the client initiates a remote procedure call on the
server
the server defines a set of calls that are available
to the clients (the remote API of the server)
JAX-RPC is thus similar to RMI (remote method
invocation) or CORBA (common object request broker
architecture)
JAX-RPC Functionality
The remote procedure call is represented using an
XML based protocol (for instance SOAP 1.1)
The server can define, describe and export a web
service as an RPC based service
the service is described using Web Services
Description Language (WSDL)
XML-based specification to describe a service as
a set of endpoints that operate on messages
WSDL is a World Wide Web Consortium
(W3C) specification
For more information:
http://java.sun.com/xml/jaxrpc/
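To make "the remote procedure call is represented using an XML based protocol" concrete, the sketch below builds a SOAP 1.1 style request by hand with DOM. It deliberately does not use the JAX-RPC or JAXM APIs, which hide this step from the programmer. The class name SoapRpcSketch, the m: prefix, the service namespace urn:example:quotes and the getQuote call are hypothetical; only the SOAP 1.1 envelope namespace is taken from the SOAP specification.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example class: an RPC call expressed as a SOAP 1.1 envelope.
final class SoapRpcSketch {

    /** Namespace of the SOAP 1.1 envelope. */
    static final String SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    /**
     * Builds a tree of this shape:
     *   SOAP-ENV:Envelope
     *     SOAP-ENV:Body
     *       m:<method>          (in the service's own namespace)
     *         <param>value</param> ...
     */
    static Document buildCall(String serviceNs, String method,
                              Map<String, String> params) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }

        Element envelope = doc.createElementNS(SOAP_ENV_NS, "SOAP-ENV:Envelope");
        doc.appendChild(envelope);
        Element body = doc.createElementNS(SOAP_ENV_NS, "SOAP-ENV:Body");
        envelope.appendChild(body);

        // The procedure name becomes an element; each argument a child element.
        Element call = doc.createElementNS(serviceNs, "m:" + method);
        body.appendChild(call);
        for (Map.Entry<String, String> param : params.entrySet()) {
            Element arg = doc.createElementNS(null, param.getKey());
            arg.setTextContent(param.getValue());
            call.appendChild(arg);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("symbol", "FHOP"); // made-up argument
        Document request = buildCall("urn:example:quotes", "getQuote", params);
        System.out.println(request.getDocumentElement().getNodeName());
    }
}
```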
Java XML Packs
Sun Microsystems has collected all the available APIs
into a Java XML Pack
the pack contains a set of interoperable Java XML
technologies
Sun Microsystems releases a new version of the
pack quarterly
http://java.sun.com/xml/javaxmlpack.html
Packs should not be confused with JAXP (Java API for
XML Processing)
contains DOM and SAX parsers and XSLT
processor
JAXP forms the base of Java XML Pack
Case – Description
First Hop products' configuration information is stored
in an XML file
the file is read in by the main controller of the
product
the configurations are grouped by the
components of the product
eases maintenance of the configuration
the contents of the file are split into several
parts and passed to the appropriate components
every component gets only its own
configuration
configurations can be embedded, if a
component uses another component
internally
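The splitting step might look roughly like the sketch below. The actual First Hop file format is not shown in the presentation, so the <configuration>/<component name="..."> layout, the class name ConfigSplitSketch and the component names are all assumptions. Each top-level component receives only its own subtree, and an embedded component's configuration stays inside the subtree of the component that uses it.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class: a main controller splitting one config file.
final class ConfigSplitSketch {

    /** Reads the whole configuration file into a DOM tree. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration", e);
        }
    }

    /**
     * Maps each top-level component name to its own subtree.
     * Only direct children of the root are split out; nested
     * <component> elements travel with their parent.
     */
    static Map<String, Element> splitByComponent(Document config) {
        Map<String, Element> parts = new LinkedHashMap<>();
        Node child = config.getDocumentElement().getFirstChild();
        for (; child != null; child = child.getNextSibling()) {
            if (child instanceof Element && "component".equals(child.getNodeName())) {
                Element component = (Element) child;
                parts.put(component.getAttribute("name"), component);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String xml =
            "<configuration>"
          + "<component name=\"router\"><timeout>30</timeout>"
          + "<component name=\"cache\"><size>128</size></component>"
          + "</component>"
          + "<component name=\"billing\"><currency>EUR</currency></component>"
          + "</configuration>";
        System.out.println(splitByComponent(parse(xml)).keySet());
    }
}
```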
Case – Benefits
The configuration is kept in a single file
reduces number of files to be manipulated
The configuration is flexible
the syntax of the component configuration is
specified by the component
the configurations can be nested, if required
The configuration is automatically checked
if the syntax of the configuration is wrong, the DOM
parser raises errors
no need to create own configuration parsers
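A sketch of how the automatic check surfaces in code, assuming a JAXP DOM parser. The class name ConfigCheckSketch and the sample input are made up. Note the assumption about scope: with default parser settings this detects only malformed XML (for example a missing end tag); checking against a DTD would require turning validation on.

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: reports the parser's own syntax errors.
final class ConfigCheckSketch {

    /** Returns a description of the first syntax error, or empty if none. */
    static Optional<String> findSyntaxError(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return Optional.empty();
        } catch (SAXParseException e) {
            // The parser reports where the problem is; no custom parser code needed.
            return Optional.of("line " + e.getLineNumber() + ": " + e.getMessage());
        } catch (Exception e) {
            return Optional.of(String.valueOf(e.getMessage()));
        }
    }

    public static void main(String[] args) {
        // The <component> element is never closed.
        String broken = "<configuration><component></configuration>";
        System.out.println(findSyntaxError(broken).orElse("no syntax errors"));
    }
}
```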
Summary
There are two levels of XML manipulation
the first one is based on the document aspects of
XML
older standards, more implementations
generic solutions
the second one is based on programmatic use of
XML
new standards, usually one or a few
implementations
some are still in a draft phase
specific solutions
Java and XML provide a wide range of possibilities for
application programmers
Questions & Comments?
For more information about First Hop:
www.firsthop.com
For information about XML:
http://java.sun.com/xml/
http://www.w3.org/
http://www.xml.com/
You can reach me at janne.kalliola@firsthop.com