Distributed Document-Based Systems

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 38

Distributed Document-Based

Systems
Chapter 11
The World Wide Web

Overall organization of the Web.


Document Model (1)
<HTML> <!- Start of HTML document -->
<BODY> <!- Start of the main body
-->
<H1>Hello World/H1> <!- Basic text to be displayed -->
<P> <!- Start of a new paragraph -->
<SCRIPT type = "text/javascript"> <!- identify scripting language -->
document.writeln ("<H1>Hello World</H1>; // Write a line of text
</SCRIPT> <!- End of scripting section
-->
</P> <!- End of paragraph section -->
</BODY> <!- End of main body
-->
</HTML> <!- End of HTML section
-->
A simple Web page embedding a script written in JavaScript.
Document Model (2)
(1) <!ELEMENT article (title, author+,journal)>
(2) <!ELEMENT title (#PCDATA)>
(3) <!ELEMENT author (name, affiliation?)>
(4) <!ELEMENT name (#PCDATA)>
(5) <!ELEMENT affiliation (#PCDATA)>
(6) <!ELEMENT journal (jname, volume, number?, month? pages, year)>
(7) <!ELEMENT jname (#PCDATA)>
(8) <!ELEMENT volume (#PCDATA)>
(9) <!ELEMENT number (#PCDATA)>
(10) <!ELEMENT month (#PCDATA)>
(11) <!ELEMENT pages (#PCDATA)>
(12) <!ELEMENT year (#PCDATA)>

An XML definition for referring to a journal article.


Document Model (3)
(1) <?xml = version "1.0">
(2) <!DOCTYPE article SYSTEM "article.dtd">
(3) <article>
(4) <title> Prudent Engineering Practice for Cryptographic Protocols</title>
(5) <author><name>M. Abadi</name></author>
(6) <author><name>R. Needham</name></author>
(7) <journal>
(8) <jname>IEEE Transactions on Software Engineering</jname>
(9) <volume>22</volume>
(10) <number>12</number>
(11) <month>January</month>
(12) <pages>6 – 15</pages>
(13) <year>1996</year>
(14) </journal>
(15) </article>

An XML document using the XML definitions from previous slide


Document Types
Type Subtype Description
Text Plain Unformatted text
HTML Text including HTML markup commands
XML Text including XML markup commands
Image GIF Still image in GIF format
JPEG Still image in JPEG format
Audio Basic Audio, 8-bit PCM sampled at 8000 Hz
Tone A specific audible tone
Video MPEG Movie in MPEG format
Pointer Representation of a pointer device for presentations
Application Octet-stream An uninterrupted byte sequence
Postscript A printable document in Postscript
PDF A printable document in PDF
Multipart Mixed Independent parts in the specified order
Parallel Parts must be viewed simultaneously

Six top-level MIME types and some common subtypes.


Architectural Overview (1)

The principle of using server-side CGI programs.


Architectural Overview (2)
(1) <HTML>
(2) <BODY>
(3) <P>The current content of <pre>/data/file.txt</PRE>is:</P>
(4) <P>
(5) <SERVER type = "text/javascript");
(6) clientFile = new File("/data/file.txt");
(7) if(clientFile.open("r")){
(8) while (!clientFile.eof())
(9) document.writeln(clientFile.readln());
(10) clientFile.close();
(11) }
(12) </SERVER>
(13) </P>
(14) <P>Thank you for visiting this site.</P>
(15) </BODY>
(16) </HTML>

An HTML document containing a JavaScript to be executed by the server


Architectural Overview (3)

Architectural details of a client and server in the Web.


HTTP Connections

a) Using nonpersistent connections.


b) Using persistent connections
HTTP Methods

Operation Description
Head Request to return the header of a document
Get Request to return a document to the client
Put Request to store a document
Post Provide data that is to be added to a document (collection)
Delete Request to delete a document

Operations supported by HTTP.


HTTP Messages (1)

HTTP request message


HTTP Messages (2)

HTTP response message.


HTTP Messages (3)
Header Source Contents
Accept Client The type of documents the client can handle
Accept-Charset Client The character sets are acceptable for the client

Some HTTP Accept-Encoding Client The document encodings the client can handle
Accept-Language Client The natural language the client can handle
message Authorization Client A list of the client's credentials

headers. WWW-Authenticate
Date
Server
Both
Security challenge the client should respond to
Date and time the message was sent
ETag Server The tags associated with the returned document
Expires Server The time how long the response remains valid
From Client The client's e-mail address
Host Client The TCP address of the document's server
If-Match Client The tags the document should have
If-None-Match Client The tags the document should not have
Tells the server to return a document only if it has been
If-Modified-Since Client
modified since the specified time
Tells the server to return a document only if it has not been
If-Unmodified-Since Client
modified since the specified time
Last-Modified Server The time the returned document was last modified
A document reference to which the client should redirect its
Location Server
request
Referer Client Refers to client's most recently requested document
Upgrade Both The application protocol the sender wants to switch to
Warning Both Information about the status of the data in the message
Clients (1)

Using a plug-in in a Web browser.


Clients (2)

Using a Web proxy when the browser does not speak FTP.
Servers

General organization of the Apache Web server.


Server Clusters (1)

The principle of using a cluster of workstations to implement a Web service.


Server Clusters (2)

(a) The principle of TCP handoff.


Server Clusters (3)

(b) A scalable content-aware cluster of Web servers.


Uniform Resource Locators (1)

Often-used structures for URLs.


b) Using only a DNS name.
c) Combining a DNS name with a port number.
d) combining an IP address with a port number.
Uniform Resource Locators (2)
Name Used for Example

http HTTP http://www.cs.vu.nl:80/globe

ftp FTP ftp://ftp.cs.vu.nl/pup/minx/README

file Local file file:/edu/book/work/chp/11/11

data Inline data data:text/plain;charset=iso-8859-7,%e1%e2%e3

telnet Remote login telnet://flits.cs.vu.nl

tel Telephone tel:+31201234567

modem Modem modem:+31201234567;type=v32

Examples of URLs.
Uniform Resource Names

The general structure of a URN


Web Proxy Caching

The principle of cooperative caching


Server Replication

The principle working of the Akami CDN.


Security (1)

The position of TLS in the Internet protocol stack.


Security (2)

TLS with mutual authentication.


Lotus Notes

The general organization of a Lotus Notes system.


Document Model
Note type Category Description

Document Data A user-oriented document such as a Web page

Structure for creating, editing, and viewing a


Form Design
document
Defines a field shared between a form and
Field Design
subforms

View Design Structure for displaying a collection of documents

ACL Administration Contains an access control list for the database

ReplFormula Administration Describes the replication of the database

Examples of different types of notes.


Processes (1)

The general organization of a Domino server.


Processes (2)

Request handling in a cluster of Domino servers.


Naming

A Notes URL for accessing a database.


Identifiers
Identifier Scope Description
Globally unique identifier assigned to each
Universal ID World
note
Identifier for a note, but includes history
Originator ID World
information
Database ID Server Time-dependent identifier for a database
Identifier of a note relative to a database
Note ID Database
instance
Timestamp used to identify the same copies
Replica ID World
of a database

Some major identifiers in Notes.


Replication
Scheme Description
A replicator task pulls updates in from a target server, and pushes
Pull-push
its own updates to that target as well
A replicator task pulls in updates from a target server, and
Pull-pull
responds to update fetch requests from that target
A replicator task only pushes its own updates to a target server, but
Push-only
does not pull in any updates from the target
A replicator only pulls in updates from a target server, but does not
Pull-only
push any of its own updates to that target

Replication schemes in Notes.


Conflict Resolution

Safely merging two documents with conflicting OIDs.


Authentication: Validating Certificates

Public-key validation in Notes


Access Control
Part Description

Servers ACLs specifying access rights for servers and ports

Workstations Lists specifying execution rights for scripts and such

Databases ACLs specifying permissions for different types of users

Files ACLs used for controlling access by Web clients

Design notes ACLs to control the presentation and such of documents

Documents ACLs to control read and and write access to documents

Parts in Notes subject to access control.


Comparison of Web & Lotus Notes
Issue WWW Notes
Basic model Marked-up text List of text items (note)
Extensions Multimedia, scripts Multimedia, scripts
Storage model File oriented Database oriented
Network comm. HTTP RPC, E-mail
Interprocess comm. Operating sys. dependent Notes Object Services (NOS)
Client process Browser, Editor Browser, Design editor
Client extensions Plug-ins In basic client system
Server process Comparable to file server Comparable to database server
Server extensions Servlets, CGI programs Server tasks
Server clusters Transparent Nontransparent
Naming URNs, URLs URLs, identifiers
Synchronization Mainly local Mainly local
Caching Advanced Not documented
Replication Mirroring, CDNs Lazy
Fault tolerance Reliable comm. & clusters Clusters
Recovery No explicit support Single server
Authentication Mainly TLS Certificate validation
Access control Server dependent Extensive ACLs

You might also like