XML Schemas: Problems With DTDs
XML Schemas
Element Specification
Elements are declared using an element named xs:element
with an attribute that gives the name of the element being
defined.
The type of the content of the new element can be specified
by another attribute or by the content of the xs:element
definition.
Element declarations can be one of two sorts.
Simple Type
Content of these elements can be text only.
Examples
<xs:element name="item" type="xs:string"/>
<xs:element name="price" type="xs:decimal"/>
The values xs:string and xs:decimal are two of the 44 simple
types predefined in the XML Schema language.
Complex Type
Element content can contain other elements or the element
can have attributes (or both).
Example
<xs:element name="location">
<xs:complexType>
<xs:sequence>
<xs:element name="city" type="xs:string"/>
<xs:element name="state" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
The element xs:sequence is one of several ways to combine
elements in the content.
Corresponding DTD: <!ELEMENT location (city, state)>
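The attributes minOccurs and maxOccurs bound the number of
times an element may occur.
minOccurs="0"            // default = 1
maxOccurs="5"            // default = 1; may not be less than minOccurs
maxOccurs="unbounded"    // no upper limit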
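As a rough illustration of how the occurrence bounds are read, the following Python sketch checks a count of elements against minOccurs and maxOccurs. The function name occurs_ok is hypothetical; both bounds default to 1, as in XML Schema.

```python
import math

def occurs_ok(count, min_occurs=1, max_occurs=1):
    """Return True if count lies between min_occurs and max_occurs.

    max_occurs is an int or the string "unbounded".
    """
    upper = math.inf if max_occurs == "unbounded" else max_occurs
    return min_occurs <= count <= upper

print(occurs_ok(1))                                          # True
print(occurs_ok(0, min_occurs=0))                            # True
print(occurs_ok(6, max_occurs=5))                            # False
print(occurs_ok(100, min_occurs=0, max_occurs="unbounded"))  # True
```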
File: phone.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="phoneNumbers">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="entries">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="entry" minOccurs="0"
                          maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="name">
                      <xs:complexType>
                        <xs:sequence>
                          <xs:element name="first" type="xs:string"/>
                          <xs:element name="middle" type="xs:string"
                                      minOccurs="0"/>
                          <xs:element name="last" type="xs:string"/>
                        </xs:sequence>
                      </xs:complexType>
                    </xs:element>
                    <xs:element name="phone" type="xs:string"/>
                    <xs:element name="city" type="xs:string"
                                minOccurs="0"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Notes on phone.xsd
Each element of a complex type is followed immediately by
the definition of that type in the content of its xs:element
element.
For example
<xs:element name="name">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="first" type="xs:string"/>
      <xs:element name="middle" type="xs:string"
                  minOccurs="0"/>
      <xs:element name="last" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
The type of the name element is a complex type without a
name; it is an anonymous type.
Writing XML Schema following this strategy of using
anonymous types leads to very deep indentation.
File: phoneT.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="nameType">
    <xs:sequence>
      <xs:element name="first" type="xs:string"/>
      <xs:element name="middle" type="xs:string" minOccurs="0"/>
      <xs:element name="last" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="entryType">
    <xs:sequence>
      <xs:element name="name" type="nameType"/>
      <xs:element name="phone" type="xs:string"/>
      <xs:element name="city" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="entriesType">
    <xs:sequence>
      <xs:element name="entry" type="entryType"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="phoneType">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="entries" type="entriesType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="phoneNumbers" type="phoneType"/>
</xs:schema>
Validation
Many tools are available to validate an XML document against
an XML Schema specification.
Most common XML parsers can be configured to perform the
validation as a document is parsed.
Our first example uses the XML parser in the command-line tool
xmllint.
To validate phone2.xml with the XML Schema in the file
phone.xsd, enter:
% xmllint --schema phone.xsd phone2.xml
If the document is valid, it is parsed and printed, followed by a
message "phone2.xml validates".
If you do not want to see the XML document printed, enter:
% xmllint --noout --schema phoneT.xsd phone2.xml
If invalid, all errors are reported, but the document is still
parsed if it is well-formed.
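The same xmllint commands can be driven from a script. The Python sketch below only assembles the command line shown above and, optionally, runs it. The function names are hypothetical, and it is an assumption that xmllint is installed, is on the PATH, and exits with a nonzero status when validation fails.

```python
import subprocess

def xmllint_command(schema, document, quiet=True):
    """Build the xmllint argument list for schema validation."""
    cmd = ["xmllint"]
    if quiet:
        cmd.append("--noout")      # do not print the parsed document
    cmd += ["--schema", schema, document]
    return cmd

def validate(schema, document):
    """Run xmllint; return (ok, messages). Assumes xmllint is on the PATH."""
    result = subprocess.run(xmllint_command(schema, document),
                            capture_output=True, text=True)
    return result.returncode == 0, result.stderr

print(" ".join(xmllint_command("phone.xsd", "phone2.xml")))
```

Calling validate("phone.xsd", "bp.xml") would then be expected to return False together with the error messages.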
Example: bp.xml
Remove a last element and a phone element from phone2.xml.
In addition, provide a duplicate first element.
% xmllint --schema phone.xsd bp.xml
After the parsed XML document we get:
bp.xml:6: element name: Schemas validity error : Element
'name' [CT local]: The element content is not valid.
bp.xml:21: element name: Schemas validity error : Element
'name' [CT local]: The element content is not valid.
bp.xml:29: element entry: Schemas validity error : Element
'entry' [CT local]: The element content is not valid.
bp.xml fails to validate
File: elems.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="root">
<xs:complexType>
<xs:sequence>
<xs:element name="one" maxOccurs="unbounded">
<xs:complexType/>
</xs:element>
<xs:choice maxOccurs="unbounded">
<xs:element name="two">
<xs:complexType/>
</xs:element>
<xs:element name="three">
<xs:complexType/>
</xs:element>
</xs:choice>
<xs:element name="four" minOccurs="0"
maxOccurs="unbounded">
<xs:complexType/>
</xs:element>
<xs:sequence maxOccurs="unbounded">
<xs:element name="five" minOccurs="0"
maxOccurs="unbounded">
<xs:complexType/>
</xs:element>
<xs:element name="six">
<xs:complexType/>
</xs:element>
</xs:sequence>
<xs:choice minOccurs="0">
<xs:element name="one">
<xs:complexType/>
</xs:element>
<xs:element name="two">
<xs:complexType/>
</xs:element>
</xs:choice>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
This XML schema contains a lot of redundancy.
By using a complex type to stand for the type of an empty
element, we can reduce the clutter.
File: ele.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType name="emptyType"/>
<xs:element name="root">
<xs:complexType>
<xs:sequence>
<xs:element name="one" type="emptyType"
maxOccurs="unbounded"/>
<xs:choice maxOccurs="unbounded">
<xs:element name="two" type="emptyType"/>
<xs:element name="three" type="emptyType"/>
</xs:choice>
<xs:element name="four" type="emptyType"
minOccurs="0" maxOccurs="unbounded"/>
<xs:sequence maxOccurs="unbounded">
<xs:element name="five" type="emptyType"
minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="six" type="emptyType"/>
</xs:sequence>
<xs:choice minOccurs="0">
<xs:element name="one" type="emptyType" />
<xs:element name="two" type="emptyType"/>
</xs:choice>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
The XML Schema still contains redundancy since multiple
instances of the elements one and two are defined with the
same types.
Referencing Elements
Elements that are defined at the top level inside of the
xs:schema element are visible throughout the document and
can be referenced inside any element definition by using the
ref attribute in place of the name attribute.
This referencing mechanism can be used to reduce the kind of
redundancy we have just seen, and in addition it provides a
way to organize an XML Schema definition so that it is easier
to read.
References can be made to simple type elements as well as
complex type elements.
The following example shows the use of references.
File: eleref.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType name="emptyType"/>
<xs:element name="one" type="emptyType"/>
<xs:element name="two" type="emptyType"/>
<xs:element name="three" type="emptyType"/>
<xs:element name="four" type="emptyType"/>
<xs:element name="five" type="emptyType"/>
<xs:element name="six" type="emptyType"/>
<xs:element name="root">
<xs:complexType>
<xs:sequence>
<xs:element ref="one" maxOccurs="unbounded"/>
<xs:choice maxOccurs="unbounded">
<xs:element ref="two"/>
<xs:element ref="three"/>
</xs:choice>
<xs:element ref="four" minOccurs="0"
maxOccurs="unbounded"/>
<xs:sequence maxOccurs="unbounded">
<xs:element ref="five" minOccurs="0"
maxOccurs="unbounded"/>
<xs:element ref="six"/>
</xs:sequence>
<xs:choice minOccurs="0">
<xs:element ref="one"/>
<xs:element ref="two"/>
</xs:choice>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
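To make the lookup rule concrete, here is a small Python sketch that checks that every ref in a schema names a top-level element. The schema text is a shortened, hypothetical variant of eleref.xsd, and the function name unresolved_refs is an assumption; a real schema processor does considerably more.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

schema_text = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="emptyType"/>
  <xs:element name="one" type="emptyType"/>
  <xs:element name="two" type="emptyType"/>
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="one" maxOccurs="unbounded"/>
        <xs:element ref="two" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

def unresolved_refs(schema_root):
    """Return the ref values that do not name a top-level element."""
    # findall with a plain tag looks only at direct children of xs:schema.
    top_level = {e.get("name") for e in schema_root.findall(XS + "element")}
    # iter visits every xs:element at any depth.
    refs = [e.get("ref") for e in schema_root.iter(XS + "element")
            if e.get("ref")]
    return [r for r in refs if r not in top_level]

print(unresolved_refs(ET.fromstring(schema_text)))   # []
```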
Grouping Elements
References can be made to blocks of XSD code as well as to
element definitions.
First we collect a section of well-formed code using the XML
Schema element xs:group with an attribute that gives the
group a name.
Then at positions in the definition where we wish to insert the
code, we provide an xs:group element with a ref attribute that
specifies the group definition.
File: products.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:group name="productGroup">
<xs:sequence>
<xs:choice>
<xs:element name="productCode" type="xs:string"/>
<xs:element name="stockNum" type="xs:string"/>
</xs:choice>
<xs:element name="price" type="xs:decimal"/>
</xs:sequence>
</xs:group>
:
<xs:complexType name="exportType">
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:group ref="productGroup"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="localGoodsType">
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:group ref="productGroup"/>
</xs:sequence>
</xs:complexType>
:
</xs:schema>
Summary of Strategies
We have three ways to organize an XML Schema definition.
1. Define the type of each element (other than those with a
   predefined XSD type) using an anonymous type defined
   as the content of the element definition.
2. Define a series of named complex types (later simple
   types also) at the top level of the XML Schema document
   and use those names to indicate the types to be used for
   the elements.
3. Define a series of elements and groups of code at the
   top level of the Schema definition and then refer to those
   element definitions using the attribute ref when specifying
   the descendants of the root node of the XML document
   type being defined.
Many XML Schema definitions use a combination of these
three techniques.
Mixed Content
Mixed content refers to the situation where an element has
both text and elements in its content.
Because of the sub-elements, the element being defined
must be a complex type.
To allow mixed content in an element definition, simply add an
attribute mixed to the xs:complexType starting tag that
asserts:
mixed="true"
This attribute has a default value of "false".
Example Fragment
DTD Specification
<!ELEMENT narrative (#PCDATA | bold | italics | underline)*>
<!ELEMENT bold (#PCDATA)>
<!ELEMENT italics (#PCDATA)>
<!ELEMENT underline (#PCDATA)>
XML Schema Specification
<xs:element name="narrative">
  <xs:complexType mixed="true">
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="bold" type="xs:string"/>
      <xs:element name="italics" type="xs:string"/>
      <xs:element name="underline" type="xs:string"/>
    </xs:choice>
  </xs:complexType>
</xs:element>
The following element can be validated relative to the previous
XSD fragment.
<narrative>
Higher beings from <italics>outer space</italics> may
not want to tell us the <underline>secrets of life
</underline>because we're not ready. But maybe they'll
change their tune after a little <bold>torture</bold>.
<italics>Jack Handey</italics>
</narrative>
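To see what mixed content looks like to a program, the Python sketch below walks a shortened copy of the narrative element with the standard ElementTree module. ElementTree keeps the text before the first child in .text and the text after each child in that child's .tail. The helper name mixed_pieces is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = """<narrative>Higher beings from <italics>outer space</italics> may
not want to tell us the <underline>secrets of life</underline> because
we're not ready.</narrative>"""

def mixed_pieces(elem):
    """Return the text runs and child elements of elem in document order."""
    pieces = []
    if elem.text:
        pieces.append(("text", elem.text))
    for child in elem:
        pieces.append((child.tag, child.text or ""))
        if child.tail:
            pieces.append(("text", child.tail))
    return pieces

for kind, value in mixed_pieces(ET.fromstring(doc)):
    print(kind, repr(value))
```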
Attribute Specifications
Attributes are defined using the element xs:attribute with its
own attributes, name and type.
The xs:attribute element must lie inside a complex type
specification, but the type of the new attribute (its value) must
be a simple type. Attribute values may not contain elements
or other attributes.
<xs:element name="bookTitle">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="author" type="xs:string"/>
        <xs:attribute name="isbn" type="xs:string"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
Attributes of xs:attribute
  name
  type
  use
    "required"
    "optional" (the default)
    "prohibited"
  default
  fixed
  ref (rarely used)
A general entity
<xs:element name="myEmail" type="xs:string"
fixed="slonnegr@cs.uiowa.edu"/>
Now use <myEmail/> in an XML document as a kind of
entity reference.
Restrictions
To restrict an existing complex type, first replicate the original
definitions of elements and attributes, and then tighten
relevant constraints in the copied model.
Kinds of Restrictions
Change minOccurs or maxOccurs to be more restrictive.
Omit an element (by setting maxOccurs="0") or an attribute
(by setting use="prohibited") when the item was previously
optional.
Set a default value of an element or an attribute that
previously had none.
Set a fixed value for an element or attribute that previously
had none.
Observe that any instance that conforms to the restricted
schema will also conform to the original base model.
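For the occurrence constraints, this observation amounts to a containment check: the restricted range must lie inside the base range. A minimal Python sketch follows, with hypothetical function names and (minOccurs, maxOccurs) pairs as plain tuples.

```python
import math

def upper(max_occurs):
    """Map "unbounded" to infinity so that bounds can be compared."""
    return math.inf if max_occurs == "unbounded" else max_occurs

def is_valid_occurrence_restriction(base, restricted):
    """True if the restricted (min, max) range lies within the base range."""
    base_min, base_max = base
    r_min, r_max = restricted
    return base_min <= r_min and upper(r_max) <= upper(base_max)

print(is_valid_occurrence_restriction((0, 3), (1, 1)))            # True
print(is_valid_occurrence_restriction((0, 1), (0, 0)))            # True
print(is_valid_occurrence_restriction((0, "unbounded"), (0, 2)))  # True
print(is_valid_occurrence_restriction((1, 1), (0, 1)))            # False
```

The last call fails because lowering minOccurs would admit instances that the base type rejects.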
Example: Restriction
This XML Schema allows a sequence of part items followed by
a sequence of rpart items.
Each part and rpart item consists of a sequence of elements,
partNum, partName, description, source, and use, as well as
two attributes.
The definition of the rpart elements has restrictions on the
elements and attributes of the part elements.
File: restrict.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="parts">
<xs:complexType>
<xs:sequence>
<xs:element name="part" type="partType"
maxOccurs="5"/>
<xs:element name="rpart" type="restrictedPartType"
maxOccurs="5"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:complexType name="partType">
<xs:sequence>
<xs:element name="partNum" type="xs:string"/>
<xs:element name="partName" type="xs:string"
minOccurs="0" maxOccurs="3"/>
<xs:element name="description" type="xs:string"
minOccurs="0"/>
<xs:element name="source" type="xs:string" minOccurs="0"
maxOccurs="unbounded"/>
<xs:element name="use" type="xs:string" minOccurs="0"
maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="productCode" type="xs:string"/>
<xs:attribute name="date" type="xs:date"/>
</xs:complexType>
<xs:complexType name="restrictedPartType">
<xs:complexContent>
<xs:restriction base="partType">
<xs:sequence>
<xs:element name="partNum" type="xs:string"/>
<xs:element name="partName" type="xs:string"
minOccurs="1" maxOccurs="1"/>
<xs:element name="description" type="xs:string"
minOccurs="0" maxOccurs="0"/>
<xs:element name="source" type="xs:string"
minOccurs="0" maxOccurs="1"
default="Hills"/>
<xs:element name="use" type="xs:string"
minOccurs="0" maxOccurs="2"/>
</xs:sequence>
<xs:attribute name="productCode" type="xs:string"/>
<xs:attribute name="date" type="xs:date"
use="prohibited"/>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
</xs:schema>
File: restrict.xml
<?xml version="1.0"?>
<parts>
<part productCode="p454">
<partNum>pq1245</partNum>
<partName>Widget</partName>
<description>A strange thing</description>
<source>Hills</source>
<source>Lone Tree</source>
<use>extraction</use>
</part>
<part productCode="p184" date="2005-02-23">
<partNum>ts8765</partNum>
<partName>Dohickey</partName>
<partName>Dingbat</partName>
<description>A stranger thing</description>
<source>Tiffin</source>
<source>Iowa City</source>
<source>Coralville</source>
<use>production</use>
<use>exclusion</use>
</part>
<rpart>
<partNum>ak9255</partNum>
<partName>Thingamabobber</partName>
<source>Coralville</source>
<use>deletion</use>
</rpart>
<rpart productCode="p095">
<partNum>do7752</partNum>
<partName>Whatsit</partName>
<source>Iowa City</source>
</rpart>
<rpart productCode="p885">
<partNum>yy4396</partNum>
<partName>Wingding</partName>
<use>insertion</use>
<use>accumulation</use>
</rpart>
</parts>
Validation
% xmllint --noout --schema restrict.xsd restrict.xml
restrict.xml validates
The first time I tried this validation, it returned incorrect error
messages, which indicated that xmllint did not understand
attributes in a restricted complex type.
Since then a new version of xmllint must have been installed
because it works now.
Extensions
An extension of a complex type always takes the form of new
components appended to the end of the existing model.
An implied sequence element encloses both models so as to
enforce the rule that all models have a single topmost
grouping construct.
Neither the original data type nor the new one may include the
xs:all element because it must always be at the top of a
content model.
Both parts must agree on allowing mixed content.
File: expand.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType name="partType">
<!-- Same as in restrict.xsd -->
</xs:complexType>
<xs:complexType name="expandedPartType">
<xs:complexContent>
<xs:extension base="partType">
<xs:sequence>
<xs:element name="color" type="xs:string"/>
<xs:element name="supplier" type="xs:string"
minOccurs="0" maxOccurs="5"/>
</xs:sequence>
<xs:attribute name="newAtt" type="xs:string"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:element name="parts">
<xs:complexType>
<xs:sequence>
<xs:element name="part" type="partType"
maxOccurs="5"/>
<xs:element name="epart" type="expandedPartType"
maxOccurs="5"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
File: expand.xml
<?xml version="1.0"?>
<parts>
<part productCode="p454">
<partNum>pq1245</partNum>
<partName>Widget</partName>
<description>A strange thing</description>
<source>Hills</source>
<source>Lone Tree</source>
<use>extraction</use>
</part>
<epart newAtt="att value">
<partNum>ak9255</partNum>
<partName>Thingamabobber</partName>
<partName>Dingbat</partName>
<description>A stranger thing</description>
<source>Tiffin</source>
<source>Iowa City</source>
<source>Coralville</source>
<use>deletion</use>
<color>puce</color>
</epart>
</parts>
Simple Types
XML Schema Definitions provide a rich set of predefined
primitive types along with a mechanism to customize these
types to create an accurate specification of XML documents.
The predefined types can be classified into several groups.
Numeric
Date and time
XML types
String
Boolean
URIs
Binary data
Numeric Types
The number types are listed in the table below, showing the
range of values for each of the integer types.
Type                     Range of Values
xs:byte                  -128 to 127
xs:short                 -32,768 to 32,767
xs:int                   -2,147,483,648 to 2,147,483,647
xs:long                  -9,223,372,036,854,775,808 to
                         9,223,372,036,854,775,807
xs:unsignedByte          0 to 255
xs:unsignedShort         0 to 65,535
xs:unsignedInt           0 to 4,294,967,295
xs:unsignedLong          0 to 18,446,744,073,709,551,615
xs:integer               -∞ to +∞
xs:positiveInteger       1 to +∞
xs:negativeInteger       -∞ to -1
xs:nonNegativeInteger    0 to +∞
xs:nonPositiveInteger    -∞ to 0
xs:decimal
xs:float
xs:double
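The bounded integer ranges in the table follow from the bit widths of the types (8, 16, 32, and 64 bits). The short Python sketch below recomputes them; the function names are hypothetical.

```python
def signed_range(bits):
    """Range of a two's-complement integer with the given bit width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def unsigned_range(bits):
    """Range of an unsigned integer with the given bit width."""
    return 0, 2 ** bits - 1

for name, bits in [("byte", 8), ("short", 16), ("int", 32), ("long", 64)]:
    print("xs:" + name, signed_range(bits))
    print("xs:unsigned" + name.capitalize(), unsigned_range(bits))
```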
Type             Value Format
xs:date          ccyy-mm-dd
xs:time          hh:mm:ss.ssss
xs:dateTime      ccyy-mm-ddThh:mm:ss.s
xs:gYear         ccyy
xs:gYearMonth    ccyy-mm
xs:gMonth        --mm
xs:gMonthDay     --mm-dd
xs:gDay          ---dd
xs:duration      PnYnMnDTnHnMn.nS
XML Types
To maintain backward compatibility with DTDs, XML Schemas
allow the following XML types.
Type            Description
xs:Name
xs:NCName
xs:QName
xs:NMTOKEN
xs:NMTOKENS
xs:ID           Same as xs:NCName
xs:IDREF        Same as xs:NCName
xs:IDREFS
xs:ENTITY
xs:ENTITIES
xs:language
xs:NOTATION
String Types
In addition to the basic string type that corresponds to
PCDATA, XML Schemas allow two other types that instruct the
XML parser to modify the white space in the string.
Type
Description
xs:string
xs:normalizedString
xs:token
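The white space handling of the two derived string types can be sketched in a few lines of Python. This is only an illustration with hypothetical function names: xs:normalizedString replaces each tab, line feed, and carriage return with a space, and xs:token additionally collapses runs of spaces and trims both ends.

```python
import re

def normalized_string(text):
    """xs:normalizedString: replace tab, line feed, and CR with a space."""
    return re.sub(r"[\t\n\r]", " ", text)

def token(text):
    """xs:token: replace as above, collapse runs of spaces, trim the ends."""
    return re.sub(r" +", " ", normalized_string(text)).strip(" ")

print(repr(normalized_string("a\tb\n c")))   # 'a b  c'
print(repr(token("  a\tb\n c ")))            # 'a b c'
```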
Miscellaneous Types
Here we have the boolean type, a URI type, and two binary
types.
Type               Sample Values
xs:boolean         false, true, 0, 1
xs:anyURI          http://www.cs.uiowa.edu/
                   myDirectory/files/info
xs:hexBinary       87B3EA93C5
xs:base64Binary    jdU7+3hfu/Sm
Copyright 2006 by Ken Slonneger
Base 64 Encoding
A 65-character subset of ASCII is used, enabling 6 bits to be
represented per printable character. The 65th character, =, is
used only for padding.
The encoding process translates 24-bit groups of input bits
into output strings of four encoded characters.
Proceeding from left to right, a 24-bit input group is formed by
concatenating three 8-bit input groups. These 24 bits are then
treated as four 6-bit groups, each of which is translated into a
single character in the base 64 alphabet.
Each 6-bit group can be used as an index into an array of 64
printable characters. The character referenced by the index is
placed in the output string.
Base 64 Alphabet
Value Code    Value Code    Value Code    Value Code
    0 A          17 R          34 i          51 z
    1 B          18 S          35 j          52 0
    2 C          19 T          36 k          53 1
    3 D          20 U          37 l          54 2
    4 E          21 V          38 m          55 3
    5 F          22 W          39 n          56 4
    6 G          23 X          40 o          57 5
    7 H          24 Y          41 p          58 6
    8 I          25 Z          42 q          59 7
    9 J          26 a          43 r          60 8
   10 K          27 b          44 s          61 9
   11 L          28 c          45 t          62 +
   12 M          29 d          46 u          63 /
   13 N          30 e          47 v       (pad) =
   14 O          31 f          48 w
   15 P          32 g          49 x
   16 Q          33 h          50 y
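The encoding of one 24-bit group can be traced with a short Python sketch. It handles exactly three input bytes and leaves out padding of shorter groups; the names ALPHABET and encode_group are hypothetical, and the standard base64 module serves only as a cross-check.

```python
import base64

ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")

def encode_group(three_bytes):
    """Encode exactly three bytes as four base 64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly three bytes")
    bits = int.from_bytes(three_bytes, "big")      # one 24-bit group
    # Split into four 6-bit indices, leftmost group first.
    indices = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)

print(encode_group(b"Man"))                                          # TWFu
print(encode_group(b"Man") == base64.b64encode(b"Man").decode())     # True
```

The bytes of "Man" are 01001101 01100001 01101110, which regroup as 010011 010110 000101 101110, that is, indices 19, 22, 5, and 46.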
Facets
Facets are XML Schema elements whose value attribute
defines the restriction indicated by the facet.
XML Schema has twelve facets for restricting simple types.
Facet                 Applicable To
xs:minInclusive
xs:maxInclusive
xs:minExclusive
xs:maxExclusive
xs:totalDigits        numbers only
xs:fractionDigits     numbers only
xs:length             string types
xs:minLength          string types
xs:maxLength          string types
xs:enumeration        most types
xs:pattern            most types
xs:whiteSpace         most types
Restrictions on Range
A named type for ages
<xs:simpleType name="ageType">
<xs:restriction base="xs:integer">
<xs:minInclusive value="0"/>
<xs:maxInclusive value="120"/>
</xs:restriction>
</xs:simpleType>
An anonymous type for an element
<xs:element name="temperature">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="-40"/>
<xs:maxInclusive value="130"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
A type for afternoon
<xs:simpleType name="pmType">
<xs:restriction base="xs:time">
<xs:minInclusive value="12:00:00"/>
<xs:maxInclusive value="23:59:59"/>
</xs:restriction>
</xs:simpleType>
Restrictions on Digits
A currency type
<xs:simpleType name="currencyType">
<xs:restriction base="xs:decimal">
<xs:fractionDigits value="2"/>
</xs:restriction>
</xs:simpleType>
A counting type
The totalDigits facet specifies the maximum number of digits.
<xs:simpleType name="countingType">
<xs:restriction base="xs:nonNegativeInteger">
<xs:totalDigits value="6"/>
</xs:restriction>
</xs:simpleType>
Restrictions on Length
A password type
<xs:simpleType name="passwordType">
<xs:restriction base="xs:string">
<xs:minLength value="8"/>
<xs:maxLength value="12"/>
</xs:restriction>
</xs:simpleType>
A zipcode type
<xs:simpleType name="zipType">
<xs:restriction base="xs:string">
<xs:length value="5"/>
</xs:restriction>
</xs:simpleType>
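The range and length facets reduce to simple comparisons. The Python sketch below mirrors the ageType and passwordType definitions above; the function names are hypothetical, and the digit facets are left out.

```python
def in_range(value, min_inclusive, max_inclusive):
    """Check xs:minInclusive and xs:maxInclusive together."""
    return min_inclusive <= value <= max_inclusive

def length_ok(text, min_length, max_length):
    """Check xs:minLength and xs:maxLength, counted in characters."""
    return min_length <= len(text) <= max_length

print(in_range(35, 0, 120))            # ageType: True
print(in_range(121, 0, 120))           # ageType: False
print(length_ok("s3cr3tpw", 8, 12))    # passwordType: True
print(length_ok("short", 8, 12))       # passwordType: False
```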
Patterns
A pattern is a string from the language of regular expressions
that defines a set of strings that are said to match the pattern.
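Pattern    Matches
a          The character a
E|F        E or F
EF         E followed by F
E*         E, zero or more times
E+         E, one or more times
E?         E, zero or one time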
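Pattern          Matches
[abc]            a, b, or c (simple class)
[^abc]           Any character except a, b, or c (negation)
[a-zA-Z]         a through z or A through Z (range)
[a-d[m-p]]       a through d, or m through p (union)
[a-z&&[def]]     d, e, or f (intersection)
[a-z&&[^bc]]     a through z, except for b and c (subtraction)
[a-z&&[^k-p]]    a through z, except for k through p (subtraction)
The union and intersection forms follow Java regular expression
syntax. In an XML Schema pattern, subtraction is written
[a-z-[bc]].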
Quantifiers
E{n}       E, exactly n times
E{n,}      E, at least n times
E{n,m}     E, at least n but not more than m times
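Escape    Matches
\d        A digit: [0-9]
\D        A non-digit: [^0-9]
\s        A whitespace character (space, tab, newline, return)
\S        A non-whitespace character: [^\s]
\i        An initial name character
\I        [^\i]
\c        A name character
\C        [^\c]
\w        A word character
\W        [^\w]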
Examples
A phone number type
<xs:simpleType name="phoneType">
<xs:restriction base="xs:string">
<xs:pattern value="\d{3}-\d{4}"/>
</xs:restriction>
</xs:simpleType>
A gender type
<xs:simpleType name="genderType">
<xs:restriction base="xs:string">
<xs:pattern value="male|female"/>
</xs:restriction>
</xs:simpleType>
or
<xs:simpleType name="genderType">
<xs:restriction base="xs:string">
<xs:pattern value="male"/>
<xs:pattern value="female"/>
</xs:restriction>
</xs:simpleType>
A zipcode type
<xs:simpleType name="zipType">
<xs:restriction base="xs:string">
<xs:pattern value="[0-9]{5}-[0-9]{4}"/>
</xs:restriction>
</xs:simpleType>
An auto license type
<xs:simpleType name="licenseType">
<xs:restriction base="xs:string">
<xs:pattern value="[A-Z]{3}\d{3}"/>
</xs:restriction>
</xs:simpleType>
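The example patterns can be tried out with Python's re module, whose syntax agrees with XML Schema patterns for these simple cases. An XML Schema pattern must match the entire value, so the sketch uses re.fullmatch. The PATTERNS table and the function name matches are hypothetical.

```python
import re

PATTERNS = {
    "phoneType": r"\d{3}-\d{4}",
    "genderType": r"male|female",
    "zipType": r"[0-9]{5}-[0-9]{4}",
    "licenseType": r"[A-Z]{3}\d{3}",
}

def matches(type_name, text):
    """True if the whole text matches the pattern of the named type."""
    return re.fullmatch(PATTERNS[type_name], text) is not None

print(matches("phoneType", "354-9876"))        # True
print(matches("phoneType", "319-354-9876"))    # False
print(matches("genderType", "females"))        # False
print(matches("zipType", "52242-1419"))        # True
print(matches("licenseType", "ABC123"))        # True
```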
Lists of Items
A simple type can be defined as a list of items of some other
simple type.
The element xs:list has an attribute itemType for specifying
the kind of components in the list.
Example: A List of Exam Scores
<xs:simpleType name="scoreListType">
<xs:list itemType="xs:nonNegativeInteger"/>
</xs:simpleType>
<xs:element name="scores" type="scoreListType"/>
XML element
<scores>82 96 74 68 80</scores>
Lists of xs:string items are dangerous since a string may have
spaces, and spaces are used to delimit the list.
A list type can be restricted using xs:length, xs:maxLength,
xs:minLength, or xs:enumeration.
<xs:simpleType name="shortScoreListType">
<xs:restriction base="scoreListType">
<xs:maxLength value="3"/>
</xs:restriction>
</xs:simpleType>
Note that simple types defined in this way, including list
types, can serve as the type of an element or of an attribute,
and can be given a name for reuse.
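A list value is split on white space, and the length facets of a list type count items rather than characters. The Python sketch below reads a scoreListType value under those rules; the function name parse_score_list is hypothetical, and only plain ASCII digits are accepted as items.

```python
def parse_score_list(text, max_length=None):
    """Parse a white-space-separated list of non-negative integers."""
    items = text.split()                 # items are delimited by white space
    if max_length is not None and len(items) > max_length:
        raise ValueError("too many items: %d" % len(items))
    scores = []
    for item in items:
        if not (item.isascii() and item.isdigit()):
            raise ValueError("not a non-negative integer: %r" % item)
        scores.append(int(item))
    return scores

print(parse_score_list("82 96 74 68 80"))    # [82, 96, 74, 68, 80]
print(len("New York".split()))               # 2
```

The second line shows why a list of xs:string items is risky: one intended value becomes two list items.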
Unions
A simple type can be defined as the (disjoint) union of two
or more existing simple types.
The element xs:union has an attribute memberTypes whose
value is a space-separated list of simple types that have
already been defined.
Example
Suppose we want to store exam scores, but in some
instances, the grade may not be available.
1. Define a type representing a missing score.
<xs:simpleType name="noScoreType">
<xs:restriction base="xs:string">
<xs:enumeration value="none"/>
</xs:restriction>
</xs:simpleType>
2. Define a union type of integer scores and missing scores.
<xs:simpleType name="scoreOrNoType">
<xs:union memberTypes="xs:integer noScoreType"/>
</xs:simpleType>
3. Define a list of the union type.
<xs:simpleType name="scoreOrNoList">
<xs:list itemType="scoreOrNoType"/>
</xs:simpleType>
4. Define a type whose values can be a list of scores (or
none) or can be a date on which we can expect the grades
to be made available.
<xs:simpleType name="scoresOrDateType">
<xs:union memberTypes="xs:date scoreOrNoList"/>
</xs:simpleType>
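The four steps can be mimicked in Python by trying the member types in turn. This is a rough sketch with hypothetical function names. It recognizes only the plain ccyy-mm-dd form of xs:date, without a time zone, and returns None for a missing score.

```python
import re
from datetime import date

def parse_score_or_none(item):
    """scoreOrNoType: an integer score or the literal "none"."""
    if item == "none":
        return None
    if re.fullmatch(r"[+-]?[0-9]+", item):
        return int(item)
    raise ValueError("not a score or 'none': %r" % item)

def parse_scores_or_date(text):
    """scoresOrDateType: an xs:date, or else a list of scoreOrNoType."""
    text = text.strip()
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text):
        return date.fromisoformat(text)        # first member type: xs:date
    return [parse_score_or_none(item) for item in text.split()]

print(parse_scores_or_date("82 none 74"))    # [82, None, 74]
print(parse_scores_or_date("2006-03-15"))    # 2006-03-15
```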
<entry>
<name gender="male">
<first>Justin</first>
<last>Case</last>
</name>
<phone>354-9876</phone>
<city>Coralville</city>
</entry>
<entry>
<name gender="female">
<first>Pearl</first>
<middle>E.</middle>
<last>Gates</last>
</name>
<phone areaCode="319">335-4582</phone>
<city>North Liberty</city>
</entry>
<entry>
<name gender="female">
<first>Helen</first>
<last>Back</last>
</name>
<phone>337-5967</phone>
</entry>
</entries>
</phoneNumbers>
File: phoneX.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:simpleType name="cityType">
<xs:restriction base="xs:string">
<xs:enumeration value="Iowa City"/>
<xs:enumeration value="Coralville"/>
<xs:enumeration value="North Liberty"/>
<xs:enumeration value="Hills"/>
<xs:enumeration value="Solon"/>
</xs:restriction>
</xs:simpleType>
<xs:complexType name="nameType">
<xs:sequence>
<xs:element name="first" type="xs:string"/>
<xs:element name="middle" type="xs:string"
minOccurs="0"/>
<xs:element name="last" type="xs:string"/>
</xs:sequence>
<xs:attribute name="gender">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="male|female"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:complexType>
<xs:simpleType name="phoneType">
<xs:restriction base="xs:string">
<xs:pattern value="\d{3}-\d{4}"/>
</xs:restriction>
</xs:simpleType>
<xs:complexType name="entryType">
<xs:sequence>
<xs:element name="name" type="nameType"/>
<xs:element name="phone">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="phoneType">
<xs:attribute name="areaCode">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="\d{3}"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
<xs:element name="city" type="cityType"
minOccurs="0"/>
</xs:sequence>
</xs:complexType>
<xs:element name="phoneNumbers">
<xs:complexType>
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="entries">
<xs:complexType>
<xs:sequence>
<xs:element name="entry" type="entryType"
minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
File: SaxCheck.java
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.XMLReader;
import org.xml.sax.SAXException;
import java.io.File;
import java.io.IOException;
public class SaxCheck
{
    static public void main(String [] args)
    {
        SAXParserFactory factory =
            SAXParserFactory.newInstance();
        factory.setValidating(true);
        factory.setNamespaceAware(true);
        try
        {
            SAXParser saxParser = factory.newSAXParser();
            // Select W3C XML Schema as the validation language.
            saxParser.setProperty(
                "http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                "http://www.w3.org/2001/XMLSchema");
            XMLReader xmlReader = saxParser.getXMLReader();
            // MyErrorHandler is a separate class that is not shown here.
            xmlReader.setErrorHandler(new MyErrorHandler());
            xmlReader.parse(new File(args[0]).toURI().toString());
            System.out.println("The XML document is valid.");
        }
        catch (ParserConfigurationException e)
        { System.out.println("Parser configuration error"); }
        catch (SAXException e)
        { System.out.println("Parsing error."); }
        catch (IOException e)
        { System.out.println("IO error."); }
    }
}
Execution
% java SaxCheck phoneX.xml
The XML document is valid.
Now change the first areaCode attribute value to "xyz" and try
again.
% java SaxCheck phoneB.xml
Error:
cvc-pattern-valid: Value 'xyz' is not facet-valid with respect
to pattern '\d{3}' for type 'null'.
Line 11
Column 31
Document file:///mnt/nfs/fileserv/fs3/slonnegr/SaxCheck/phoneB.xml
Parsing error.
Type Hierarchy
anyType
|-- all complex types
|-- anySimpleType
    |-- duration
    |-- dateTime
    |-- time
    |-- date
    |-- gYearMonth
    |-- gYear
    |-- gMonthDay
    |-- gDay
    |-- gMonth
    |-- boolean
    |-- base64Binary
    |-- hexBinary
    |-- float
    |-- double
    |-- anyURI
    |-- QName
    |-- NOTATION
    |-- decimal
    |   |-- integer
    |       |-- nonPositiveInteger
    |       |   |-- negativeInteger
    |       |-- long
    |       |   |-- int
    |       |       |-- short
    |       |           |-- byte
    |       |-- nonNegativeInteger
    |           |-- positiveInteger
    |-- string
        |-- normalizedString
            |-- token
                |-- language
                |-- NMTOKEN
                |-- Name
                    |-- NCName
                        |-- ID
                        |-- IDREF
                        |-- ENTITY