XML Metadata: Dep't of Information Science: INSC2092


Chapter 3

XML and Metadata

Dep’t of Information Science: INSC2092


Lessons outline:
 What is XML?
 What is metadata?
 Types of metadata.
 Function of metadata
 Benefits of metadata
 Metadata lifecycle
 What is RDF?
 What is Dublin core?
 What is FRBR?
2 07/11/2022
XML
What is XML?

 XML stands for eXtensible Markup Language.


 XML is designed to transport and store data.
 XML is important to know, and very easy to learn.
 XML was designed to carry data, not to display data.
 XML tags are not predefined. You must define your own tags.
 XML is designed to be self-descriptive.
 An XML document resides in its own file with an ‘.xml’
extension.

The Difference Between XML and HTML

XML and HTML were designed with different goals:


 XML was designed to transport and store data, with focus on
what data is.
 HTML was designed to display data, with focus on how data
looks.
 XML is not a replacement for HTML.
 HTML is about displaying information, while XML is about
carrying information.


XML Document Example


<?xml version="1.0"?>

<note>

    <to>Tove</to>

    <from>Jani</from>

    <heading>Reminder</heading>

    <body>Don't forget me this weekend!</body>

</note>
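As a minimal sketch of how a program reads such a document, the snippet below parses the note with Python's standard xml.etree.ElementTree module. The choice of Python and the variable names are assumptions made for illustration; the slides do not prescribe a tool.

```python
import xml.etree.ElementTree as ET

# The note document from the slide, held in a string for illustration.
NOTE_XML = """<?xml version="1.0"?>
<note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(NOTE_XML)  # parse the text into an element tree

# Each child element carries one piece of data; the tag name describes it.
note = {child.tag: child.text for child in root}

print(root.tag)
print(note["to"], note["from"], note["heading"])
```

The tags chosen by the author (to, from, heading, body) become the keys of the dictionary, which is what "self-descriptive" means in practice.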


XML is used in many aspects of web development, often to simplify data storage and sharing.

XML Separates Data from HTML:


 If you need to display dynamic data in your HTML document, it will take a
lot of work to edit the HTML each time the data changes.
 With XML, data can be stored in separate XML files. This way you can
concentrate on using HTML/CSS for display and layout, and be sure that
changes in the underlying data will not require any changes to the HTML.
 With a few lines of JavaScript code, you can read an external XML file and
update the data content of your web page.
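The slide mentions JavaScript; the sketch below shows the same idea in Python instead, as a stand-in. The data stays in XML and a fixed HTML layout is filled from it. The cds/cd structure and the render_rows helper are hypothetical names invented for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical data; in practice this would live in its own .xml file.
DATA_XML = """<cds>
    <cd><title>Empire Burlesque</title><price>10.90</price></cd>
    <cd><title>Hide your heart</title><price>9.90</price></cd>
</cds>"""

def render_rows(xml_text):
    """Turn each <cd> record into one HTML table row."""
    root = ET.fromstring(xml_text)
    rows = []
    for cd in root.findall("cd"):
        title = escape(cd.findtext("title", default=""))
        price = escape(cd.findtext("price", default=""))
        rows.append(f"<tr><td>{title}</td><td>{price}</td></tr>")
    return "\n".join(rows)

# The HTML layout stays fixed; only the rows change when the XML changes.
page = f"<table>\n{render_rows(DATA_XML)}\n</table>"
print(page)
```

Adding a third record to the XML changes the output without touching the layout code.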

XML Simplifies Data Sharing:
 In the real world, computer systems and databases contain
data in incompatible formats.
 XML data is stored in plain text format. This provides a
software- and hardware-independent way of storing data.
 This makes it much easier to create data that can be shared
by different applications.


XML Simplifies Data Transport:


 One of the most time-consuming challenges for developers is
to exchange data between incompatible systems over the
Internet.
 Exchanging data as XML greatly reduces this complexity,
since the data can be read by different incompatible
applications.


XML Simplifies Platform Changes:


 Upgrading to new systems (hardware or software platforms) is
always time-consuming. Large amounts of data must be
converted, and incompatible data is often lost.
 XML data is stored in text format. This makes it easier to
expand or upgrade to new operating systems, new
applications, or new browsers, without losing data.


XML Makes Your Data More Available


 Different applications can access your data, not only in
HTML pages, but also from XML data sources.
 With XML, your data can be available to all kinds of
"reading machines" (handheld computers, voice
machines, news feeds, etc.), which also makes it more
accessible to blind people and people with other disabilities.


Main Components of an XML Document:


 Elements: <hello>
 Attributes: <item id="33905">
 Entities: &lt; (<)
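A short sketch of how the three components appear to a parser, assuming Python's standard xml.etree.ElementTree module. The element content ("price &lt; 10") is made up for illustration.

```python
import xml.etree.ElementTree as ET

# One element with an attribute and an entity reference in its text.
item = ET.fromstring('<item id="33905">price &lt; 10</item>')

print(item.tag)        # the element name
print(item.get("id"))  # the attribute value
print(item.text)       # the parser has replaced &lt; with <
```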


XML basic syntax rules:


 XML is case sensitive
 All start tags must have end tags
 Elements must be properly nested
 The XML declaration, if present, must be the first statement
 Every document must contain a root element
 Attribute values must have quotation marks
 Certain characters are reserved for parsing


XML Tags are Case Sensitive:


 The tag <Letter> is different from the tag <letter>.
 Opening and closing tags must be written with the
same case:
 <Message>This is incorrect</message>
<message>This is correct</message>


All start tags must have end tags:


 In HTML, some elements do not have to have a closing tag:
 <p>This is a paragraph.
<br>
 In XML, it is illegal to omit the closing tag. All elements
must have a closing tag:
 <p>This is a paragraph.</p>
<br />


XML Elements Must be Properly Nested:

 In HTML, you might see improperly nested elements:


 <b><i>This text is bold and italic</b></i>
 In XML, all elements must be properly nested within each other:
 <b><i>This text is bold and italic</i></b>
 In the example above, "Properly nested" simply means that since
the <i> element is opened inside the <b> element, it must be
closed inside the <b> element.
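The rules covered so far (case sensitivity, mandatory end tags, proper nesting) can be checked mechanically. The sketch below assumes Python's standard xml.etree.ElementTree parser; is_well_formed is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# The fragments below are taken from the preceding slides.
examples = {
    "mismatched case": "<Message>This is incorrect</message>",
    "missing end tag": "<p>This is a paragraph.",
    "improper nesting": "<b><i>This text is bold and italic</b></i>",
    "proper nesting": "<b><i>This text is bold and italic</i></b>",
}

for label, text in examples.items():
    print(label, "->", is_well_formed(text))
```

Only the last fragment is accepted; an XML parser stops at the first violation instead of guessing what the author meant, which is the main practical difference from lenient HTML parsing.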


XML Documents Must Have a Root Element:


 XML documents must contain one element that is the parent of
all other elements. This element is called the root element.

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>


XML Attribute Values Must be Quoted:


 XML elements can have attributes in name/value pairs just like in HTML.
 In XML, the attribute values must always be quoted.
 Study the two XML documents below. The first one is incorrect, the second
is correct:
<note date=12/11/2007>
  <to>Tove</to>
  <from>Jani</from>
</note>
<note date="12/11/2007">
  <to>Tove</to>
  <from>Jani</from>
</note>
 The error in the first document is that the date attribute in the note element
is not quoted.


Entity References:
 Some characters have a special meaning in XML.
 If you place a character like "<" inside an XML element, it will
generate an error because the parser interprets it as the start
of a new element.
 This will generate an XML error:
 <message>if salary < 1000 then</message>
 To avoid this error, replace the "<" character with an entity
reference:
 <message>if salary &lt; 1000 then</message>
 There are 5 predefined entity references in XML:
    &lt;     <    less than
    &gt;     >    greater than
    &amp;    &    ampersand
    &apos;   '    apostrophe
    &quot;   "    quotation mark
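In practice the replacement is rarely done by hand. As a sketch, assuming Python's standard library, xml.sax.saxutils.escape substitutes the entity references for &, < and > before the text is placed in a document:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if salary < 1000 then"
safe = escape(raw)  # replaces &, < and > with entity references
print(safe)

# Once escaped, the text can be embedded and parsed without error.
message = ET.fromstring(f"<message>{safe}</message>")
print(message.text)  # the parser restores the original characters
```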


Comments in XML:
 The syntax for writing comments in XML is similar
to that of HTML.

<!-- This is a comment -->


Common Errors for Element Naming:


 Do not use white space when creating names for
elements
 Element names cannot begin with a digit, although
names can contain digits
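 Only certain punctuation is allowed in names: underscores, hyphens,
periods, and colons (colons are best avoided because they are reserved
for namespaces). A name cannot begin with a hyphen or a period.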
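The naming rules can be approximated with a regular expression. This is a simplified, ASCII-only sketch: the XML specification also allows many non-ASCII letters, and both NAME_PATTERN and is_valid_name are names invented for this example.

```python
import re

# First character: a letter, underscore, or colon.
# Later characters: letters, digits, underscores, periods, colons, hyphens.
NAME_PATTERN = re.compile(r"[A-Za-z_:][A-Za-z0-9_.:\-]*")

def is_valid_name(name):
    """Check a candidate element name against the simplified pattern."""
    return NAME_PATTERN.fullmatch(name) is not None

for candidate in ["note", "sub child", "2ndItem", "item2", "first-name"]:
    print(candidate, "->", is_valid_name(candidate))
```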

Metadata

What is metadata?

 Metadata is "data about data". It describes the content,
quality, condition, and other characteristics of data, written
in a standard format.
 Metadata helps a person to locate and understand data.
 Metadata provides data history. It describes the Who, What,
When, Where, Why and How of the data.


 Who created and maintains the data?


 What is the content and structure of the data?
 When was the data collected? Published?
 Where is the geographic location? Storage location?
 Why were the data created?
 How were the data produced? Processed? Raw or modeled data?
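One way to picture these six questions is as the fields of a record. The sketch below is an illustration only: the DatasetMetadata class, the missing_fields helper, and every value in the sample record are invented, not taken from any metadata standard.

```python
from dataclasses import dataclass, asdict

@dataclass
class DatasetMetadata:
    """One metadata record answering the six questions for a dataset."""
    who: str    # creator and maintainer
    what: str   # content and structure
    when: str   # collection or publication date
    where: str  # geographic or storage location
    why: str    # purpose of the data
    how: str    # production and processing method

# All values below are invented placeholders.
record = DatasetMetadata(
    who="Example Survey Office",
    what="Table of rainfall readings, one row per station per day",
    when="2021",
    where="Example region; stored on the department file server",
    why="Seasonal planning",
    how="Raw gauge readings, manually entered",
)

def missing_fields(rec):
    """List the questions a record leaves unanswered (empty strings)."""
    return [name for name, value in asdict(rec).items() if not value.strip()]

print(missing_fields(record))
```

A check like missing_fields makes the point of the slide concrete: a record that cannot answer one of the questions is incomplete documentation.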


Who Uses Metadata?


 Metadata users:
 Metadata is useful to data developers, data managers, data
users, and organizations.
 Standardized metadata documentation is searchable, allowing
data developers and users to search for existing data and avoid
data duplication.


 It provides a place for sharing and publicizing data production
efforts. Both reduce workloads and increase efficiency.
 Data users are given the parameters of the dataset so they can
evaluate the data and decide whether it is applicable to a project.
 It allows searching for specific geographic locations and gives
information on data acquisition and transfer.


 In an organization, metadata increases and protects the value of
its investment in data.
 Data production and planned acquisitions can be managed
through metadata.
 Quality control, data restrictions, and permitted uses can be applied
across the organization's entire data holdings.


 Finally, metadata documentation transcends people
and time. The effects of staff turnover and of balancing multiple
projects can be mitigated with metadata, which provides data
permanence and documents institutional knowledge.


Benefits of Metadata:

For Data Developers:


 Avoids duplication
 Shares reliable information
 Publicizes efforts
 Reduces workload
 Documenting data is critical to preserving its usefulness over
time; without proper documentation, no data set is complete.


For Data Users:


 Makes it possible for data users to search, retrieve, and
evaluate data set information both inside and outside
organizations.
 Finding data: determine which data exist for a geographic
location and/or topic
 Applicability: determine if a dataset meets your needs
 Access and Transfer: acquire the dataset you identified
 Data use: how data can be used; if it has restricted use, etc.


For Organizations:

 Organizes and maintains an organization's investment in data


 Provides for the documentation of data processing steps, quality
control, definitions, data uses and restrictions, etc.
 Transcends people and time; offers data permanence and creates
institutional memory
 Saves time, money, frustration.

Types of metadata

There are three main types of metadata:


 Descriptive metadata:- describes a resource for purposes such
as discovery and identification. It can include elements such as
title, abstract, author, and keywords.
 Structural metadata:- indicates how compound objects are put
together, for example, how pages are ordered to form chapters.


 Administrative metadata:- provides information to help manage
a resource, such as when and how it was created, file type and
other technical information, and who can access it.

Two types of administrative metadata:


 Rights management metadata : which deals with intellectual
property rights, and
 Preservation metadata: which contains information needed to
archive and preserve a resource.

Functions of Metadata

 Resource discovery :
 Allowing resources to be found by relevant criteria;
 Identifying resources;
 Bringing similar resources together;
 Distinguishing dissimilar resources;
 Giving location information.

 Organizing e-resources:
 Organizing links to resources based on audience or topic.
 Building these pages dynamically from metadata stored in
databases.


 Facilitating interoperability :
 Using defined metadata schemes, shared transfer protocols,
and crosswalks between schemes, resources across the
network can be searched more seamlessly.
 Cross-system search, e.g., using the Z39.50 protocol to query remote catalogs.
 Metadata harvesting, e.g., using the OAI-PMH protocol.
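A crosswalk between schemes is essentially a mapping table. The sketch below assumes a made-up local schema (book_title, author_name, and so on) and maps it to Dublin Core element names. The CROSSWALK table and the to_dublin_core helper are hypothetical and stand in for what a real crosswalk would define in much more detail.

```python
# Hypothetical local field names mapped to Dublin Core element names.
CROSSWALK = {
    "book_title": "title",
    "author_name": "creator",
    "pub_year": "date",
    "topic": "subject",
}

def to_dublin_core(local_record):
    """Translate a local record to Dublin Core keys; unmapped fields are dropped."""
    return {
        CROSSWALK[field]: value
        for field, value in local_record.items()
        if field in CROSSWALK
    }

local = {"book_title": "Cataloging cultural objects", "pub_year": "2006", "shelf": "B2"}
print(to_dublin_core(local))
```

The shelf field has no counterpart in the target scheme and is lost, which illustrates why crosswalks usually trade some detail for interoperability.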


 Digital identification :
 Elements for standard numbers, e.g., ISBN
 The location of a digital object may also be given using:
 a file name
 a URL
 some persistent identifiers, e.g., PURL (Persistent URL) or
DOI (Digital Object Identifier).
 Metadata elements can also be combined to act as a set of
identifying data, differentiating one object from another for
validation purposes.


 Archiving and preservation:


 Challenges:
 Digital information is fragile and can be corrupted or altered;
 It may become unusable as storage technologies change.
 Metadata is key to ensuring that resources will survive and
continue to be accessible into the future. Archiving and
preservation require special elements:
 To track the lineage of a digital object,
 To detail its physical characteristics, and
 To document its behavior in order to emulate it in future
technologies.

Metadata Life Cycle

 Collection: Identify metadata and capture it into a repository;
automate.
 Maintenance: Put in place processes to synchronize metadata
automatically with changing data architecture; automate.
 Deployment: Provide metadata to users in the right form and
with the right tools; match metadata offered to specific needs of
each audience


 Metadata Collection
 Right metadata at the right time
 Variety of collection strategies

 Sources
 potential sources of data for the data warehouse (DW)
 external data
 data structures
 Data Models: enterprise data model start point
 import from CASE tool
 correlate enterprise and warehouse models

 Metadata Deployment:
 Warehouse developers need:
 physical structure information for data sources
 enterprise data model
 warehouse data model
 concerned with accuracy, completeness and flexibility of
metadata
 Need access to comprehensive impact analysis capabilities
 Need to defend against accuracy & integrity questions

Dublin Core

 Dublin Core is an initiative to create a digital "library card
catalog" for the Web.
 It is made up of metadata elements (data that describes data)
that offer expanded cataloging information and improved
document indexing for search engine programs.

Dublin Core Metadata Elements

 Title (the name given the resource)

 Creator (the person or organization responsible for the content)


 Subject (the topic covered)
 Description (a textual outline of the content)
 Publisher (those responsible for making the resource available)
 Contributor (those who added to the content)


 Date (when the resource was made available)
 Type (a category for the content)
 Format (how the resource is presented)
 Identifier (an unambiguous identifier for the content, such as a URL)
 Source (where the content originally derived from)
 Language (in what language the content is written)
 Relation (how the content relates to other resources, for instance,
if it is a chapter in a book)
 Coverage (the place or time period that the content is about)
 Rights (a link to a copyright notice)
Dublin Core Metadata Initiative
Metadata Definition
The Basic 15 DC Elements:
Title: The name given to the resource by the creator or publisher
Creator: The person responsible for the intellectual content of the resource
Subject: The topic of the resource
Description: A textual description of the content of the resource
Publisher: The entity responsible for making the resource available
Contributor: A person or organization (other than the Creator) who is
responsible for making significant contributions to the
intellectual content of the resource
Date: A date associated with the creation or availability of the resource
Type: The nature or genre of the content of the resource
Format: The physical or digital manifestation of the resource
Identifier: An unambiguous reference that uniquely identifies the
resource within a given context
Source: A reference to a second resource from which the present
resource is derived
Language: The language of the intellectual content of the resource
Relation: A reference to a related resource, and the nature of its
relationship
Coverage: Spatial locations and temporal durations characteristic of
the content of the resource
Rights: Information about rights held in the resource
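Because the element set is small and fixed, a record can be checked against it with a few lines of code. This is a sketch only; DC_ELEMENTS uses the lowercase element names, and unknown_elements and the sample record are invented for illustration.

```python
# The fifteen element names of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

def unknown_elements(record):
    """Return the keys of a record that are not Dublin Core element names."""
    return sorted(set(record) - DC_ELEMENTS)

sample = {"title": "Cataloging cultural objects", "date": "2006", "pages": "396"}
print(unknown_elements(sample))
```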
Three Basic Types of Dublin Core Metadata Elements:

Content        Intellectual Property    Instantiation
Coverage       Contributor              Date
Description    Creator                  Format
Type           Publisher                Identifier
Relation       Rights                   Language
Source
Subject
Title

 The two most common forms of Dublin Core are:


 Simple Dublin Core:
 expresses elements as attribute-value pairs using just the base
metadata elements from the Dublin Core Metadata Element
Set (gives basic information).
 Qualified Dublin Core:
 increases the specificity of metadata by adding information
about encoding schemes, enumerated lists of values, or other
processing clues. It is used when more information is required.

Characteristics of Dublin Core

 Simplicity

 Semantic Interoperability

 International Consensus

 Extensibility

 Metadata Modularity on the Web

Simple Dublin Core
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>Cataloging cultural objects,</dc:title>
<dc:contributor>Baca, Murtha.</dc:contributor>
<dc:contributor>Harpring, Patricia.</dc:contributor>
<dc:subject>Information organization</dc:subject>
<dc:subject>Metadata</dc:subject>
<dc:subject>Cultural property--Documentation</dc:subject>
<dc:subject>CC135.C37 2006</dc:subject>
<dc:subject>363.6</dc:subject>
<dc:date>2006</dc:date>
<dc:format>396 p.</dc:format>
<dc:type>Text</dc:type>
<dc:identifier>ISBN:0838935648</dc:identifier>
<dc:language>en</dc:language>
<dc:publisher>ALA Editions</dc:publisher>
</metadata>
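A record like this one can also be generated programmatically. The sketch below assumes Python's standard xml.etree.ElementTree module and the Dublin Core element namespace http://purl.org/dc/elements/1.1/. The build_record helper is a hypothetical name, and only a few of the fields from the slide are included.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
ET.register_namespace("dc", DC_NS)          # serialize with the dc: prefix

def build_record(fields):
    """Build a <metadata> element from (element name, value) pairs."""
    root = ET.Element("metadata")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root

record = build_record([
    ("title", "Cataloging cultural objects"),
    ("contributor", "Baca, Murtha."),
    ("contributor", "Harpring, Patricia."),
    ("date", "2006"),
    ("language", "en"),
])

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Using a list of pairs instead of a dictionary keeps repeated elements, such as the two contributors, which Simple Dublin Core allows.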
Qualified Dublin Core
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:dcterms="http://purl.org/dc/terms/"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<dc:title xml:lang="en">Cataloging cultural objects.</dc:title>
<dc:contributor>Baca, Murtha.</dc:contributor>
<dc:contributor>Harpring, Patricia.</dc:contributor>
<dc:subject xsi:type="LCSH">Information organization</dc:subject>
<dc:subject xsi:type="LCSH">Metadata</dc:subject>
<dc:subject xsi:type="LCSH">Cultural property--Documentation</dc:subject>
<dc:subject xsi:type="LCC">CC135.C37 2006</dc:subject>
<dc:subject xsi:type="DDC">363.3</dc:subject>
<dc:date xsi:type="W3CDTF">2006</dc:date>
<dcterms:extent>396 p.</dcterms:extent>
<dc:type xsi:type="DCMIType">Text</dc:type>
<dc:identifier xsi:type="URI">ISBN:0838935648</dc:identifier>
<dc:language xsi:type="RFC3066">en</dc:language>
<dc:publisher>ALA Editions</dc:publisher>
<dcterms:audience>Catalogers</dcterms:audience>
</metadata>
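The qualifiers are what a program can act on. As a sketch, assuming the qualifier attribute is xsi:type in the XML Schema instance namespace and using Python's standard xml.etree.ElementTree module, the code below groups subject values by encoding scheme. QDC_XML is a shortened copy of the record with the namespaces declared explicitly.

```python
import xml.etree.ElementTree as ET

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
XSI_TYPE = f"{{{NS['xsi']}}}type"  # fully qualified attribute name

# A shortened version of the record, with namespaces declared.
QDC_XML = f"""<metadata xmlns:dc="{NS['dc']}" xmlns:xsi="{NS['xsi']}">
  <dc:subject xsi:type="LCSH">Metadata</dc:subject>
  <dc:subject xsi:type="LCC">CC135.C37 2006</dc:subject>
  <dc:date xsi:type="W3CDTF">2006</dc:date>
</metadata>"""

root = ET.fromstring(QDC_XML)

# Group subject values by the encoding scheme named in xsi:type.
subjects_by_scheme = {}
for subject in root.findall("dc:subject", NS):
    scheme = subject.get(XSI_TYPE, "unqualified")
    subjects_by_scheme.setdefault(scheme, []).append(subject.text)

print(subjects_by_scheme)
```

With Simple Dublin Core the same values would all be plain subject strings, and a program could not tell a subject heading from a classification number.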
RDF
 RDF (Resource Description Framework) is a W3C standard for
describing Web resources, for example the title, author, modification
date, content, and copyright information of a Web page.
 What is RDF?
 RDF stands for Resource Description Framework.
 RDF is a framework for describing resources on the web.
 RDF is designed to be read and understood by computers.
 RDF is not designed to be displayed to people.
 RDF is commonly written in XML (the RDF/XML syntax).
 RDF is a part of the W3C's Semantic Web Activity.
 RDF is a W3C Recommendation.


 Before using RDF you should have a basic understanding of the
following: HTML, XHTML, XML, and XML Namespaces.
 RDF - Examples of Use
 Describing properties for shopping items, such as price and availability.
 Describing time schedules for web events
 Describing information about web pages (content, author, created and
modified date)
 Describing content and rating for web pictures
 Describing content for search engines
 Describing electronic libraries

RDF Rules:

RDF uses Web identifiers (URIs) to identify resources.


 RDF describes resources with properties and property values.

RDF Resource, Property, and Property Value


 A Resource:- is anything that can have a URI, such as
http://www.w3schools.com/rdf
 A Property:- is a Resource that has a name, such as "author" or
"homepage"

 A Property value:- is the value of a Property, such as "Jan Egil
Refsnes" or "http://www.w3schools.com" (note that a property
value can be another resource)


 The following RDF document could describe the resource
"http://www.w3schools.com/rdf":

<?xml version="1.0"?>
<RDF>
  <Description about="http://www.w3schools.com/rdf">
    <author>Jan Egil Refsnes</author>
    <homepage>http://www.w3schools.com</homepage>
  </Description>
</RDF>

RDF Statements:

The combination of a Resource, a Property, and a Property value forms a
Statement (known as the subject, predicate and object of a Statement).
Let's look at some example statements to get a better understanding:
 Statement: "The author of http://www.w3schools.com/rdf is Jan Egil
Refsnes".
The subject of the statement above is: http://www.w3schools.com/rdf
The predicate is: author
The object is: Jan Egil Refsnes
 Statement:"The homepage of http://www.w3schools.com/rdf is
http://www.w3schools.com".
The subject of the statement above is: http://www.w3schools.com/rdf
The predicate is: homepage
The object is: http://www.w3schools.com
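The two statements above can be written down as subject-predicate-object triples. The sketch below uses plain Python tuples to show the data model only; the Statement class and the describe helper are illustrative names, not an RDF library.

```python
from typing import NamedTuple

class Statement(NamedTuple):
    """An RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    object: str

statements = [
    Statement("http://www.w3schools.com/rdf", "author", "Jan Egil Refsnes"),
    Statement("http://www.w3schools.com/rdf", "homepage", "http://www.w3schools.com"),
]

def describe(resource, triples):
    """Collect the property/value pairs asserted about one resource."""
    return {s.predicate: s.object for s in triples if s.subject == resource}

print(describe("http://www.w3schools.com/rdf", statements))
```

Note that the object of the second statement is itself a URI, so it could appear as the subject of further statements.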


 RDF Example:
 Here are two records from a CD-list:

Title              Artist         Country   Company       Price   Year
Empire Burlesque   Bob Dylan      USA       Columbia      10.90   1985
Hide your heart    Bonnie Tyler   UK        CBS Records   9.90    1988

 Below are a few lines from an RDF document:


<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:cd="http://www.recshop.fake/cd#">
<rdf:Description
rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
  <cd:artist>Bob Dylan</cd:artist>
  <cd:country>USA</cd:country>
  <cd:company>Columbia</cd:company>
  <cd:price>10.90</cd:price>
  <cd:year>1985</cd:year>
</rdf:Description>
<rdf:Description
rdf:about="http://www.recshop.fake/cd/Hide your heart">
  <cd:artist>Bonnie Tyler</cd:artist>
  <cd:country>UK</cd:country>
  <cd:company>CBS Records</cd:company>
  <cd:price>9.90</cd:price>
  <cd:year>1988</cd:year>
</rdf:Description>
</rdf:RDF>

 The first line of the RDF document is the XML declaration. The
XML declaration is followed by the root element of RDF
documents: <rdf:RDF>.
 The xmlns:rdf declaration specifies that elements with the rdf
prefix are from the namespace
"http://www.w3.org/1999/02/22-rdf-syntax-ns#".
 The xmlns:cd declaration specifies that elements with the cd
prefix are from the namespace "http://www.recshop.fake/cd#".
 The <rdf:Description> element contains the description of the
resource identified by the rdf:about attribute.
 The elements: <cd:artist>, <cd:country>, <cd:company>, etc.
are properties of the resource.
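The structure described above can be turned back into statements with an ordinary XML parser. This is a rough sketch using Python's standard xml.etree.ElementTree module; it handles only the simple pattern shown here (rdf:Description with literal-valued child elements), not RDF/XML in general, and extract_triples is a hypothetical helper name. RDF_XML repeats one record from the document in shortened form.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# One record from the document above, shortened to two properties.
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:cd="http://www.recshop.fake/cd#">
  <rdf:Description rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
    <cd:artist>Bob Dylan</cd:artist>
    <cd:year>1985</cd:year>
  </rdf:Description>
</rdf:RDF>"""

def extract_triples(xml_text):
    """Yield (subject, predicate, object) for each property element."""
    root = ET.fromstring(xml_text)
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree writes tags as "{namespace}local"; joining the
            # two parts gives the full property URI.
            namespace, local = prop.tag[1:].split("}")
            yield (subject, namespace + local, prop.text)

triples = list(extract_triples(RDF_XML))
for triple in triples:
    print(triple)
```

The namespace prefix cd: disappears in the result: what identifies the property is the full URI formed from the namespace and the element name.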

FRBR

 FRBR stands for Functional Requirements for Bibliographic
Records.
 FRBR is a conceptual model of the bibliographic universe,
describing the entities in that universe, their attributes, and
the relationships among the entities. These concepts are important
for communicating with other information communities in
the global environment as we together build systems to find,
identify, select, and obtain materials.


 Entity-Relationship Model

 Entities: Group 1, 2, 3
 Relationships
 Attributes
 User Tasks:
 Find
 Identify
 Select
 Obtain
 Navigate


 Objects of Interest (FRBR Entities) to Catalog Users:


 Work (an intellectual/artistic creation)
 Expression (the realization of a work, for example a translation)
 Manifestation (the physical embodiment of an expression of a work)
 Item (a single exemplar of a manifestation)
 Relationships:
 Structural relationships
 Responsibility relationships
 Subject relationships
 Whole/Part relationships
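The four entities form a chain from the abstract to the concrete, which can be sketched as nested data structures. FRBR is a conceptual model and not a data model, so the classes below are only one possible illustration; the class fields, the count_items helper, and all sample values are invented.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Item:
    """A single exemplar of a manifestation."""
    label: str

@dataclass
class Manifestation:
    """The physical embodiment of an expression."""
    label: str
    items: List[Item] = field(default_factory=list)

@dataclass
class Expression:
    """A realization of a work, such as a translation."""
    label: str
    manifestations: List[Manifestation] = field(default_factory=list)

@dataclass
class Work:
    """An intellectual or artistic creation."""
    title: str
    expressions: List[Expression] = field(default_factory=list)

# Invented example data, used only to show the nesting.
work = Work("Example Novel", [
    Expression("Original English text", [
        Manifestation("2006 paperback edition",
                      [Item("Library copy 1"), Item("Library copy 2")]),
    ]),
    Expression("French translation"),
])

def count_items(w):
    """Count every item reachable from a work."""
    return sum(len(m.items) for e in w.expressions for m in e.manifestations)

print(count_items(work))
```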


Implications for Catalogers:


 Consistent cataloging practice.
 Accuracy of data.
 Added entries & details, such as
 Bibliography/Bibliographic note
 Chronological table
 Glossary
 Illustrations/List of illustrations
 Introduction/Foreword
 Maps
 Notes
 Publisher’s note
 Table of contents
 Reproduction of original title page
 Reviews
 Analytics
 Uniform titles
 Authority Work

What is the FRBR model?


 It is a generalized view of the bibliographic universe and is
intended to be independent of any cataloging code or
implementation.
 It’s a conceptual model and is not an application or an
implementation, which makes it difficult for some of us to
understand how it might really be applied to our real world.
 It’s not a data model, it’s not a metadata scheme, it’s not a
system design, but rather a conceptual model that can be used
as the foundation for development of systems.


 The FRBR itself includes a description of the conceptual model
of the bibliographic universe: that is, the entities,
relationships, and attributes (or as we’d call them today, the
metadata) associated with each of the entities and relationships,
and it proposes a national level bibliographic record for all of
the various types of materials.
 It also reminds us of user tasks associated with the
bibliographic resources described in catalogs, bibliographies,
and other bibliographic tools.


FRBR Entity- relationship model:


 Entities
 Relationships
 Attributes

[Diagram: Entity 1 -- Relationships -- Entity 2]
