Database and Data Communication Network Systems
FOREWORD
Database and Data Communication Network Systems: Techniques and Applications is a significant, extremely timely contribution to the emerging field
of networked databases. Edited by Cornelius T. Leondes, a leading author in
the areas of system analysis and design, this trilogy addresses three key topics:
(1) database query, organization, and maintenance; (2) advanced database applications; and (3) data communications and network architectures. This landmark work features 25 authoritative and up-to-date expositions from world-renowned experts in industry, government, and academia.
The two most valuable features of this work are the breadth of material covered and the alignment of the many diverse topics toward a common theme: the integration of database and data communications technologies.
This work provides an extremely valuable reference for researchers and practitioners interested in the analysis, design, and implementation of distributed, networked databases. Collectively, the 25 chapters will assist the reader in building
the necessary background and in acquiring the most advanced tools to effectively engage in the evaluation of existing network and database systems and
in the design/development of new ones. Volume I covers data processing techniques and includes 9 chapters that describe the architectural characteristics
of a modern database system (including object-oriented structure, multilevel
organization, data compression, and security aspects) as well as the most efficient methods to query a database (e.g., data mining, query optimization,
fuzzy query processing, and geometric hashing). Volume II covers database
application techniques and includes 7 chapters that describe challenging [...] the network architecture and how to best match the architecture choices to the target application.
Mario Gerla
Computer Science Department
University of California, Los Angeles
PREFACE
The use of databases and data communication networks (the Internet, local area networks (LANs), and wide area networks (WANs)) is expanding at an almost exponential rate in all areas of human endeavor. This illustrates their increasing importance and, therefore, the strong need for a comprehensive treatment of this broad area in a unique and well-integrated set of volumes by leading contributors from around the world, including 12 countries in addition to the United States.
It is worth noting that this subject is entirely too broad to be treated adequately in a single volume. Hence, the focus of Database and Data Communication Network Systems: Techniques and Applications is broken into three
areas: database processing techniques are covered in Volume 1, database application techniques are covered in Volume 2, and data communication networks
are covered in Volume 3.
The result is that each contribution is a remarkably comprehensive and self-contained treatment of a major area of significance, and collectively they provide a well-rounded treatment of the topics in these volumes. The authors are all to be highly commended for their splendid contributions to this three-volume set, which will provide an important and uniquely comprehensive reference source for students, research workers, practitioners, computer scientists, and others for years to come.
C. T. Leondes
EMERGING DATABASE
SYSTEM ARCHITECTURES
TIMON C. DU
Department of Industrial Engineering, Chung Yuan Christian University, Chung Li, Taiwan 32023; and Department of Decision Sciences and Managerial Economics, The Chinese University of Hong Kong, Shatin, NT, Hong Kong
I. INTRODUCTION
II. HISTORY
III. RELATIONAL DATA MODEL
IV. NEXT GENERATION DATA MODEL
 A. Deductive Data Model
 B. Object-Oriented Database
 C. Distributed Database
 D. Active Database
 E. Other Database Models
V. HYBRID DATABASE TECHNOLOGIES
 A. Deductive and Object-Oriented Database (DOOD)
 B. The Joining of Active Databases and Object-Oriented Databases
VI. FUTURE STUDY RELATED TO DATABASE TECHNOLOGIES
 A. Software Engineering
 B. Artificial Intelligence
 C. Data Mining
 D. User Interfaces
VII. FUTURE DATABASE APPLICATIONS
 A. Data Warehousing
 B. On-line Analytic Processing (OLAP)
 C. Decision-Support System
 D. Engineering and Production Applications
VIII. SUMMARY
REFERENCES
A database is a repository that collects related data, and is different from the
traditional file approach, which is only responsible for data maintenance. Instead, a database should have the characteristics of maintaining persistent data,
preserving a self-describing nature, controlling insulation between programs
and data, supporting multiple views of data, and sharing data among multiple transactions. From the 1970s to the 1990s, the relational database model replaced the hierarchical database model and the network database model in most
application areas. Section II briefly describes the database system in historical perspective, and Section III presents the relational data model. Then other
database models such as the deductive data model, object-oriented data model,
distributed data model, active database, and other databases are discussed
in Section IV. Moreover, Section V presents hybrid systems, e.g., the deductive and object-oriented database (DOOD) and the joining of active databases with object-oriented databases, which integrate the advantages of individual database models. For example, researchers in DOOD have attempted to combine the merits of the deductive database and the object-oriented database.
The future study of database technology will concentrate on design perspectives, knowledge exploration, and system interfaces. These topics are covered
in Section VI. Section VII introduces several important database applications,
such as data warehousing, on-line analytic processing, decision-support system,
and engineering and production applications. Finally, Section VIII presents a
summary.
I. INTRODUCTION
A database system supports databases that are compact, fast, easy to use, and concurrently accessible. Therefore, a database system includes hardware, software, data, and users. A database management system (DBMS) is software that interacts with the operating system and is responsible for creating, operating, and maintaining the data. A DBMS has several responsibilities [19]:
(a) Redundancy control. Duplicated data must be stored many times, leading to duplicated effort, wasted storage, and, more importantly, data inconsistency.
(b) Restriction of unauthorized access. Some data may only be retrieved or updated by specific users. A DBMS should provide a database security and authorization subsystem that maintains data under either (1) a discretionary security mechanism, granting privileges to users, or (2) a mandatory security mechanism, enforcing multilevel security across various security classes.
(c) Database inference. New information may need to be deduced from the data. The deductive database, data mining, and other technologies, explained later, can produce new information from existing data.
(d) Representation of complex relationships among data. The relationships between data reveal a lot of information. Representing relationships well in a database model for efficient usage is an important duty of a DBMS. An entity-relationship (ER) model is normally adopted for data modeling.
(e) Easy and efficient retrieval and update of related data. Since the data are maintained for future usage, the file organization, such as the indexing structure, is critical for easy and efficient retrieval and updating.
(f) Enforcement of integrity constraints. The DBMS provides the capabilities to define and maintain data integrity. The DBMS evaluates the data states and triggers actions if a constraint is violated. In general, the data are maintained in terms of what must be and what must not be. Defining data integrity means that the data must be in certain states. For example, domain constraints specify that the value of an attribute must come from the attribute's domain, key constraints assert that each table in the relational database must have a primary key, and entity integrity constraints affirm that the key value cannot be null. Similarly, the referential integrity constraint maintains data consistency between tables. On the other hand, the DBMS can make sure the data are not in specified states. For example, the basic salary must not be lower than a certain amount, and an employee's salary must not be increased above that of the immediate supervisor. Most of these types of constraints are considered business rules and are preserved by semantic constraints (a minimal code sketch of such constraints appears after this list).
(g) Concurrency control. Concurrency control is a mechanism that allows data to be accessed by many users concurrently while avoiding the lost update problem, the dirty read problem, and the incorrect summary problem [19]. Most DBMSs use a locking mechanism, timestamp protocol, multiversion control, optimistic protocol, or a combination of them to achieve concurrent data access. The higher the degree of concurrency, the heavier the load on the DBMS.
(h) Backup and recovery. A transaction may fail because of a system crash, transmission errors, local errors, concurrency control enforcement, or catastrophes. The recovery process is a mechanism for restoring committed data without losing any of it. Recovery can be done by the DBMS automatically or by restoring from backup copies made by users.
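As a minimal sketch of item (f) above (Python with the built-in sqlite3 module; the table and column names are hypothetical, not taken from the chapter), the schema below declares key, domain, and referential integrity constraints and shows the DBMS rejecting violating inserts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable referential integrity checks

# Key, domain (CHECK), and referential integrity constraints declared in the schema.
conn.executescript("""
CREATE TABLE department (
    dept_no   INTEGER PRIMARY KEY,              -- entity integrity: key cannot be null
    name      TEXT NOT NULL
);
CREATE TABLE employee (
    emp_no    INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    salary    INTEGER CHECK (salary >= 20000),  -- domain/semantic constraint
    dept_no   INTEGER REFERENCES department(dept_no)  -- referential integrity
);
""")

conn.execute("INSERT INTO department VALUES (5, 'Research')")
conn.execute("INSERT INTO employee VALUES (101, 'Franklin', 40000, 5)")

# Violating inserts are rejected by the DBMS rather than silently stored.
try:
    conn.execute("INSERT INTO employee VALUES (102, 'John', 15000, 5)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)   # CHECK constraint failed

try:
    conn.execute("INSERT INTO employee VALUES (103, 'Alicia', 25000, 9)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)   # FOREIGN KEY constraint failed
```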
In order to meet these responsibilities, a DBMS has several components to define (specify the data types, structures, and constraints), construct (store the data itself on some storage medium), and manipulate (query, update, and generate reports from) the data. The components include the data definition language (DDL), the data manipulation language (DML), the data security and integrity subsystem, the data recovery and concurrency mechanism, the query optimization algorithm, and performance monitoring functions.
It is an important job of a DBMS to respond to user requests. A query language is the normal tool for processing a request. Ad hoc query languages are typically structured query language (SQL)-like languages. SQL is a declarative language: it specifies what the user wants instead of how the job is done. The language is easy to use and is composed of several keywords, i.e., select, from, where, group by, having, order by, insert, update, and delete, and most relational DBMSs have adopted it as the query standard. Corresponding to SQL, relational algebra and relational calculus are also well-known query languages. Any query can be written in relational algebra, relational calculus, or SQL, and can be transformed from one to the other. The combination of these three methods provides a strong tool set for theoretical analysis and computing implementations. Furthermore, both relational algebra and relational calculus support query optimization, and SQL is easy to comprehend.
As an example, consider finding which options are unavailable on a 2002 Ford Explorer using the database represented by the ER model. Suppose the relational schema (tables) is

option (CODE, DESCRIPTION, PRICE)
avail_opt (MAKE, MODEL, YEAR, OPTCODE, STD_OPT).
The relational algebra is written as

π_{CODE, DESCRIPTION, PRICE}(option)
  − π_{CODE, DESCRIPTION, PRICE}(σ_{MAKE='Ford' ∧ MODEL='Explorer' ∧ YEAR='2002'}(avail_opt ⋈_{OPTCODE=CODE} option)),

where the two relational tables, option and avail_opt, are joined (⋈) on the attribute OPTCODE = CODE, the result is selected (σ) using the conditions MAKE = 'Ford', MODEL = 'Explorer', and YEAR = '2002', and then projected (π) to leave only the three attributes CODE, DESCRIPTION, and PRICE; subtracting this set of available options from the projection of option leaves the unavailable options. Finally, values for the resulting tuples are returned from the option table.
Similarly, the SQL is written as

select CODE, DESCRIPTION, PRICE
from   option
where  not exists
       ( select *
         from  avail_opt
         where CODE = OPTCODE and
               MAKE = 'Ford' and
               MODEL = 'Explorer' and
               YEAR = '2002' );
The tuple relational calculus is

{ o.CODE, o.DESCRIPTION, o.PRICE | option(o) ∧ ¬(∃a)(avail_opt(a) ∧ a.OPTCODE = o.CODE ∧ a.MAKE = 'Ford' ∧ a.MODEL = 'Explorer' ∧ a.YEAR = '2002') }.
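To make the worked example concrete, the following sketch (Python with the built-in sqlite3 module; the sample rows are invented for illustration) creates the two tables and runs the same not-exists query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE option    (CODE TEXT, DESCRIPTION TEXT, PRICE INTEGER);
CREATE TABLE avail_opt (MAKE TEXT, MODEL TEXT, YEAR TEXT, OPTCODE TEXT, STD_OPT TEXT);

INSERT INTO option VALUES ('SR', 'Sunroof',        800),
                          ('TP', 'Towing package', 350),
                          ('HS', 'Heated seats',   250);

-- Only the sunroof and towing package are offered on the 2002 Ford Explorer.
INSERT INTO avail_opt VALUES ('Ford', 'Explorer', '2002', 'SR', 'N'),
                             ('Ford', 'Explorer', '2002', 'TP', 'Y');
""")

rows = conn.execute("""
    SELECT CODE, DESCRIPTION, PRICE
    FROM option
    WHERE NOT EXISTS
          (SELECT * FROM avail_opt
           WHERE CODE = OPTCODE AND
                 MAKE = 'Ford' AND MODEL = 'Explorer' AND YEAR = '2002')
""").fetchall()

print(rows)   # [('HS', 'Heated seats', 250)] -- the option not available on this vehicle
```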
[FIGURE 1: The three-schema architecture. At the external level, external schemas provide external views for individual users (User 1 ... User n); the conceptual level holds the conceptual schema and conceptual view; and the internal level holds the internal schema describing the stored database. External/conceptual and internal/conceptual mappings connect the levels, and the stored database keeps its own definition (meta-data) together with the stored data.]

[...] design will not affect the database user's perspective. This feature allows the database to be improved.
II. HISTORY
The first electronic computer was invented in 1942 by Dr. John V. Atanasoff, a professor at Iowa State College, and his graduate student Clifford E. Berry for solving physics problems [24]. The computer was named ABC, which stands for "Atanasoff Berry Computer." The computer used an electronic medium and vacuum tubes to perform binary computation. However, the era of the first generation of computers is considered to run from 1951 to 1959, characterized by computers using vacuum tubes; the IBM 701 was the first commercialized computer. The second generation of computers was triggered by the introduction of transistors in 1959. Computers with transistors were compact, more reliable, and cheaper than the vacuum tube computers. Integrated circuits, the most important invention in the history of computers, created the era of the third generation of computers in 1964. In 1971 more circuits could be confined in a unit of space, called very large scale integration (VLSI), and the age of the fourth generation of computers was declared. However, the widespread adoption of information technology followed the announcement of the IBM personal computer (PC) in 1981. From then on, the use of computers grew and the requests for data storage prevailed.
Corresponding to the movement of computer technology, the trend of information systems can also be divided into five generations [19,28] (see also Fig. 2). In the 1950s, the information system was responsible for simple transaction
[FIGURE 2: The computer generations (1GL, 2GL, 3GL, 4GL, and the PC era) matched against the evolution of information systems: data processing, management reporting, decision support, end-user support, and global internetworking.]

[...]
can be categorized into class constraints and operational constraints. The class
constraints include ISA class constraints, disjoint class constraints, property
induction constraints, required property constraints, single-valued property
constraints, and unique property constraints. In the list, both the key constraints and referential integrity are the kernel components for the relational
database. The key constraints assure that all instances of a table are distinct
and the referential integrity constraints maintain consistency among instances
of two tables. For example, if a student in a student class intends to take a
course, we must assure that this course is actually offered in a course class. A
DBMS should maintain the data consistency when data insertion, deletion, and
modification are requested.
To successfully design a relational database, a designer could start from
two approaches: relational normalization or semantic data modeling. The relational normalization process allocates facts based on the dependencies among
attributes. In a well-designed database, two kinds of dependencies exist among
attributes: functional dependency and multivalued dependency. Attribute A is functionally dependent on attribute B if the value of attribute A is determined by the value of attribute B; otherwise, attribute A is multivalued dependent on attribute B.
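As a concrete illustration of functional dependency, the following small sketch (pure Python; the helper name and the sample rows are hypothetical) tests whether one attribute functionally determines another over a set of tuples:

```python
def functionally_determines(rows, b, a):
    """Return True if b -> a holds in rows, i.e., tuples that agree on
    attribute b also agree on attribute a (a is functionally dependent on b)."""
    seen = {}
    for row in rows:
        if row[b] in seen and seen[row[b]] != row[a]:
            return False          # same b value maps to two different a values
        seen[row[b]] = row[a]
    return True

employees = [
    {"emp_no": 1, "dept_no": 5, "dept_name": "Research"},
    {"emp_no": 2, "dept_no": 5, "dept_name": "Research"},
    {"emp_no": 3, "dept_no": 4, "dept_name": "Administration"},
]

print(functionally_determines(employees, "dept_no", "dept_name"))  # True
print(functionally_determines(employees, "dept_name", "emp_no"))   # False
```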
Both the ER model and the extended-ER (EER) model are semantic data
modeling tools. This approach is called the synthesis process. The semantic
data modeling technique starts from small tables. In the graphical representation, the designer identifies the primary key and foreign key of each table,
and uses integrity constraints, e.g., entity integrity constraints and referential
integrity constraints, to maintain the consistency and relationships. The entity
integrity constraints require that no null is allowed in the primary key, while the
referential integrity constraints state the relationships in tables and tuples. The
relationship between tables is represented by the relationship entity and cardinality ratio. This ratio is a constraint that specifies the number of participating
entities. It is recommended that the designer first use the relational normalization process to generate the elementary tables and then revise them using a semantic data model. At the beginning, the designer may be confronted with a large source table that includes many entities with several kinds of attributes that relate to different candidate keys. Structured guidelines were developed to decompose (normalize) a source table into smaller tables so that update anomalies are avoided. Loomis [25] listed the following five steps of normalization (each step results in a level of normal form; a small decomposition sketch follows the list):
1. A table can only have one single value in the same attribute and the
same tuple.
2. Every nonkey attribute has to be fully functionally dependent on the
key attribute or attributes.
3. A table cannot have any nonkey attributes that are functionally
dependent on other nonkey attributes.
4. A table cannot have more than one multivalued dependency.
5. A table cannot be decomposed into smaller tables then rejoined
without losing facts and meaning.
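The decomposition sketch promised above (pure Python; the order table and its columns are hypothetical) illustrates steps 2 and 3: the nonkey attribute customer_name depends only on customer_no, so it is moved to its own table, and rejoining the two smaller tables recovers the original facts (step 5):

```python
# A "large source table": each row repeats the customer name for every order.
orders_flat = [
    {"order_no": 1, "customer_no": 7, "customer_name": "Acme",  "total": 120},
    {"order_no": 2, "customer_no": 7, "customer_name": "Acme",  "total": 300},
    {"order_no": 3, "customer_no": 9, "customer_name": "Brown", "total": 80},
]

# Decompose: customer_name depends only on customer_no, so it gets its own table.
customers = {row["customer_no"]: row["customer_name"] for row in orders_flat}
orders = [{"order_no": r["order_no"], "customer_no": r["customer_no"], "total": r["total"]}
          for r in orders_flat]

# Rejoining the two tables recovers the original facts (a lossless decomposition).
rejoined = [{**o, "customer_name": customers[o["customer_no"]]} for o in orders]
assert all(any(r == f for f in orders_flat) for r in rejoined)
print(customers)
print(orders)
```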
[...]

FIGURE 3. Backward and forward inference mechanism search paths [17]: (a) the search path of backward-chaining inference and (b) the search path of forward-chaining inference.
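Figure 3 contrasts backward- and forward-chaining inference in a deductive database. The following minimal sketch (pure Python; the facts and the rule are invented for illustration) shows forward chaining: a recursive rule is applied to the stored facts until no new facts can be derived:

```python
# Facts: advisor(x, y) means x advises y. Rules: advisee(x, y) if advisor(x, y),
# and advisee(x, z) if advisor(x, y) and advisee(y, z) (a recursive predicate).
advisor = {("James", "Franklin"), ("Franklin", "John"), ("John", "Joyce")}

def forward_chain(advisor):
    advisee = set(advisor)            # base rule
    changed = True
    while changed:                    # keep firing the recursive rule to a fixpoint
        changed = False
        for (x, y) in advisor:
            for (y2, z) in set(advisee):
                if y == y2 and (x, z) not in advisee:
                    advisee.add((x, z))
                    changed = True
    return advisee

print(sorted(forward_chain(advisor)))
# includes derived facts such as ('James', 'Joyce')
```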
[...]
related to object-oriented databases, which include (1) class methods and attributes, (2) multiple inheritance, (3) exceptional instances, (4) class extent,
(5) versions, and (6) object identifier. The data technology topics are (1) composite objects, (2) association, (3) persistence, (4) deletion of instance, (5) integrity
constraints, and (6) object migration. Many of these issues are also relevant
to other database management architectures, such as relational, deductive,
and active databases. Application environment topics discussed include (1) application program interface, (2) transactions, (3) object sharing/concurrency,
(4) backup, recovery, and logging, (5) ad hoc queries, and (6) constraints and triggers.
1. The Object-Oriented Technologies
Object-oriented programming is an accepted approach for large scale system development. The important concepts include class, object, encapsulation,
inheritance, and polymorphism. An object can be any kind of conceptual entity.
Object-oriented technologies build on the concept of classes.
A class represents a group of objects with conceptual similarities. For building a solid class structure, a class maintains private memory areas (called attributes), and the memory is accessed only through some designated channels (each channel is called a method). This mechanism is called encapsulation. The outsider uses messages to communicate through methods to access the private memory. A class can inherit properties from other classes (called class inheritance), thereby forming a class hierarchical structure. A subclass can extend the
information from a superclass (called class exception). By means of the class
inheritance, unnecessary specifications and storages can be avoided. However,
a class can only inherit properties from its immediate superclasses. Fortunately,
more than one superclass is allowed (called multiple inheritance).
The same name can be assigned in the class hierarchy. If the contents of the
class are different, the specification in the subclass can override the one in the
superclass (called overloading or overriding). If two attributes with the same
names but different contents appear in the superclasses in multiple inheritance,
the name conflict can be resolved by assigning a higher priority to the default
superclass or the first superclass.
In object-oriented technologies, both classes and their instances are treated as objects. Although the object-oriented technology appears promising, Banerjee et al. [5] assert that lack of consensus on the object-oriented model is the most important problem to be resolved in object-oriented technologies, a fact borne out by the different interpretations appearing in different object-oriented prototypes.
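The following minimal sketch (Python; the class names are invented for illustration) shows these concepts: encapsulation through a private attribute and an accessor method, multiple inheritance, and overriding:

```python
class PersistentObject:
    def __init__(self):
        self.__oid = id(self)          # encapsulated attribute (name-mangled, "private")

    def oid(self):                     # designated channel (method) to the private memory
        return self.__oid

class Versioned:
    def describe(self):
        return "versioned object"

class DesignPart(PersistentObject, Versioned):   # multiple inheritance
    def __init__(self, name):
        super().__init__()
        self.name = name

    def describe(self):                # overrides the superclass method
        return f"design part {self.name} (oid={self.oid()})"

part = DesignPart("gear")
print(part.describe())                 # subclass behavior overrides Versioned.describe
print(isinstance(part, PersistentObject), isinstance(part, Versioned))  # True True
```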
In designing an object-oriented database, there are several important issues
that should be addressed:
1. Class methods and attributes. A class consists of methods and attributes.
The method is the mechanism to manipulate stored data in an object. It includes
two parts, signature and body. The signature is encapsulated in a class and
describes the name of the method, the name of entities to be manipulated,
and the return entities. The body consists of a sequence of instructions for
implementing the method.
In an object-oriented design, the concept of encapsulation forbids the attributes in a class from being accessed from outside except through the assigned methods. However, the database management system does allow values and definitions of attributes to be read and written in order to increase implementation efficiency. A trade-off between these two conflicting goals must be made in the design phase.
Several recommendations have been proposed for resolving this conflict:
(1) Provide a "system-defined" method for reading and writing the
attributes.
(2) Allow users to decide which attributes and methods can be
modified from outside.
(3) Allow all attributes and methods to be accessed from the outside,
but use the authorization mechanisms to limit the access level.
2. Multiple inheritance. The inheritance property supports object reusability. This is done by inheriting attributes, methods, and messages from superclasses. Furthermore, a subclass can have its own attributes, methods, and
messages. The multiple inheritance focuses on whether a subclass can inherit
properties from more than one superclass. If the system allows multiple inheritance, several conflicts may happen:
(1) The subclass has the same names for attributes and methods within
superclasses that are within different domains. In this case, the
solution is either to warn the user or simply not to inherit the
functions. The overloading property takes advantage of this
conflict by assigning the same function names to different domains.
(2) More than two superclasses have the same function name but with
different domains. In this case, the system needs to maintain rules
for resolving the conflict.
(3) An inheritance structure might have cyclic inheritance
relationships. Therefore, the system needs to detect the cycle
automatically. If the cycle exists when the new inheritance is
assigned, the inheritance assignment should be forbidden.
3. Exceptional instances. Objects are generated through instantiation from
classes and prototype objects. When instantiation from classes occurs, the class
is used as a template to generate objects with the same structure and behaviors. When instantiation from other prototype objects occurs, new objects are
generated from the old objects through modification of their existing attributes
and methods. Modifying the attributes and methods of an object during the
generation process is called exceptional instances. The features of late-binding
and overloading are accomplished through exceptional instantiation.
4. Class extent. The object-oriented systems interpret classes and types
in several different ways. Generally speaking, the purpose for differentiating
types and classes is to avoid assigning object identifiers (OIDs) to some kinds
of objects, i.e., types. A type represents a collection of the same characteristics
(integers, characters, etc.) of a set of objects, while a class is its implementation.
From this point of view, a class encapsulates attributes and methods, and its
instances are objects while the instances of types are values.
5. Versions. Multiple-version is a necessary property in some application
areas, such as the design version in CAD, but not all prototypes allow more
than one version to exist. When multiple-versioned objects are allowed, systems must resolve referential integrity problems.
6. Object identifier. The OID is used to identify objects. There should not
be any two objects with the same OID. A system assigns OIDs according to
how the system manages objects and secondary storage. Two approaches are
used in assigning OIDs:
(1) The system assigns the logical OIDs (surrogate OIDs) and uses a
table to map the OIDs to the physical locations, or
(2) The system assigns persistent OIDs; the OIDs will point directly to
their physical locations.
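A small sketch of approach (1), a table of logical (surrogate) OIDs mapped to physical locations (Python; the names and the notion of a page/slot location are invented for illustration):

```python
import itertools

class OidTable:
    """Maps logical OIDs to physical locations, so objects can move on disk
    without invalidating the references that other objects hold."""
    _next_oid = itertools.count(1)

    def __init__(self):
        self.location_of = {}

    def register(self, page, slot):
        oid = next(self._next_oid)            # surrogate OID, never reused
        self.location_of[oid] = (page, slot)
        return oid

    def move(self, oid, page, slot):
        self.location_of[oid] = (page, slot)  # references to oid stay valid

table = OidTable()
oid = table.register(page=12, slot=3)
table.move(oid, page=40, slot=7)              # relocate the object on storage
print(oid, table.location_of[oid])            # 1 (40, 7)
```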
2. Database Technologies
A database management system is a tool for organizing real world data into
a miniworld database. To create a good database management system, several
important issues need to be considered:
1. Composite objects. A composite object in an object-oriented database
is equivalent to an aggregation relationship in a relational database. Several
component objects can be grouped into a logical entity, a composite object. As
discussed by Bertino and Martino [8], locking, authorization, and physical clustering are achieved efficiently through this process. The relationship between
a composite object and its component objects is described by the composite reference. The component objects are called dependent objects. The structure of dependent objects is used to maintain the integrity constraints. Thus,
the existence of a dependent object depends on the existence of its composite
object.
Composite references can be implemented in two ways: shared and exclusive. A shared composite reference means that a component object can belong
to more than one composite object, while an exclusive composite reference
can only belong to one. The composite reference can also be distinguished as
dependent and independent. This aspect is used to analyze the existence between the composite object and the component object. It is called a dependent
composite reference if the component objects cannot exist when the corresponding composite object is deleted. However, in the independent composite reference, the component objects can exist no matter whether the composite object
exists.
2. Associations. An association is a link, i.e., a relationship, between related
objects. For example, a manager is associated with employees in his research
department through the project relationship. Therefore, the project number
can be an attribute of manager entities that link with employee entities. The
number of entities involved in an association is indicated by a degree value.
The minimum and maximum numbers of associations in which an entity can participate are defined by the cardinality ratio.
The function of association is not explicitly implemented by most object-oriented prototypes. One approach is to use a separate attribute to indicate each participating entity. For example, if an order entity is associated with three other entities, customer, supplier, and product, then three attributes, say to_customer, to_supplier, and to_product, can be added to indicate the associations.
Furthermore, for more efficient reverse traversal, adding reverse reference attributes is necessary. The reverse reference attribute is an additional attribute
in the associated entity and points back to the original entity. The consistency
of such an attribute can be controlled by users or the system.
3. Persistence. The persistent design issue deals with whether the objects
should permanently and automatically exist in the database. There are three
kinds of approaches to deal with the existence of instances:
(1) Persistence is automatically implied when an instance is created.
(2) All the instances created during the execution of a program will be
deleted at the end of execution. If the user wants to retain the
instances in the database, he or she needs to insert the instance into
the collection of objects.
(3) Classes are categorized into temporary objects and permanent
objects. The instances created through the permanent classes are
permanent instances, while those created through temporary
classes are temporary objects.
4. Deletion of instance. The issue of deletion of an instance is related to
persistence. There are two approaches to deleting instances:
(1) Provide a system-defined deletion function. When an instance is
deleted, the reference integrity constraint is violated. The system
can totally ignore the violation or maintain a reference count. The
reference count records the number of references connected to
other objects. An object can be deleted when the count is zero.
(2) An object will be removed automatically when no references are
associated with it.
5. Integrity constraints. Integrity constraints are used to maintain the consistency and correctness of data by domain and referential integrity constraints.
The domain constraints specify the legalization and range of data, while the referential integrity constraints control the existence of referenced entities.
Integrity constraints can be implemented as static or dynamic constraints. Whether the state values of objects are legal is controlled by the static constraints, and the state transitions of objects are controlled by the dynamic constraints. Integrity constraints in an object-oriented database prototype are defined imperatively in methods. This approach is different from that of a relational database. However, the declarative approach of the relational database seems to be superior to the object-oriented database approach.
6. Object migration. The migration of instances through classes is an important function in object-oriented databases. For example, when a student goes
from undergraduate to graduate school, the student instance should be able to
migrate from an undergraduate class to a graduate class without deleting the
instance from one class and recreating it in another.
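As a small illustration of object migration (item 6), Python allows an instance to change its class in place, which mimics migrating an object between classes without deleting and recreating it (the class names are invented for illustration):

```python
class Student:
    def __init__(self, name):
        self.name = name

class UndergraduateStudent(Student):
    def status(self):
        return f"{self.name}: undergraduate"

class GraduateStudent(Student):
    def status(self):
        return f"{self.name}: graduate"

joyce = UndergraduateStudent("Joyce")
print(joyce.status())                 # Joyce: undergraduate

# Migrate the same instance to another class; its identity and attributes survive.
joyce.__class__ = GraduateStudent
print(joyce.status())                 # Joyce: graduate
```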
3. Application Environments
To successfully implement an object-oriented database on different platforms, consideration needs to be given to several issues:
1. Transactions. If a prototype is implemented in a client/server architecture, the maintenance of data transactions becomes important. Using a network to transfer large data files, e.g., bitmap files, can result in a long transaction period, which can create unexpected problems such as system or network crashes during the transaction.
There are two kinds of transactions: short and long. The short transaction is designed for an atomic transaction, one that can either commit or roll back completely. The long transaction has similar behavior but is used for checking out objects from a group database to a personal database with a persistent write or read lock on the source database to decrease the network traffic. Therefore, a long transaction can continue for an extended period.
2. Object sharing/concurrency. Concurrency control is a critical issue in the
multiuser environment. Some prototypes choose an optimistic approach, which
means that no lock is requested when updating the data. However, whether the
new data can overwrite the old data depends on successful commitment. The
other, pessimistic approach requires all transactions to grant locks before doing
any processing. This approach is more conservative but reliable.
The locking strategy is dependent upon the transaction type. As has been
discussed before, there are two kinds of transactions. The short transaction has
four kinds of lock: write, read, updating, and null. The write lock needs to be
granted before updating the objects. If an object is locked by the write lock, it
cannot be read or written by other transactions. The read lock is useful when
the user only wants to read the data of objects. To prevent the data from being changed at the moment of reading, the write lock and updating lock are not allowed if an object has a read lock. The updating lock stands between the
write lock and the read lock and is useful when updating is expected after the
reading. Therefore, any other write and updating locks should be rejected when
an object has an updating lock. The long transaction is designed for checking
out objects from the group database to the personal database. Therefore, the
updating lock is not appropriate in the long transaction. The write lock in the
long transaction is used to lock the object in the source database to prevent other
transactions from checking out the object. Since the data will be changed only
when an object is checked back into the group database, the read lock allows
the object to be checked out by other transactions. The null lock for both short
and long transactions is only a snapshot of the object, not a real lock. When the
null lock is used for checking out an object in the long transaction, the object is
actually being copied from the group transaction to the short transaction with
an assigned new OID. If this object is checked-in back to the source database,
it will be treated as a new object.
3. Backup, recovery, and logging. These are important issues when transferring data in a network. Some prototypes provide transaction logs, but some
do not.
Logging can be used either for backing up values or for recovering from
a crash. For example, logging provides a place for temporarily storing values
when the savepoint is chosen in a short transaction. Logging is important even for long transactions in the following situations:
(1) To maintain the status (e.g., locks, creation/deletion, or updating)
of source copies and duplicated copies when objects are
checked-out or checked-in between a personal database and a
group database.
[...]
advisee(y,x).
In some research, the terminology of constraints, triggers, and rules is confusing. A distinction between constraints and triggers was presented by Gehani
and Jagadish [20]:
1. Constraints are used to ensure the consistency of the states. Triggers do
not concern the consistency of states. They are simply triggered
whenever conditions are true.
[...]
[start-event, end-event].
Therefore, the triggered action can be executed between the start time and end
time. This design alleviates system loading and increases parallelism. Moreover,
a structure for storing the information about triggered actions until they are
executed is provided. The events can come from database operations, temporal
events, or external notifications. The database operations include all operations
on the database such as data definition, data manipulation, and transaction
control. The temporal events are based on the system clock and can be absolute,
relative, or periodic. Any sources other than internal hardware or software
events can be considered as external notification.
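The following minimal sketch (pure Python; all rule, event, and parameter names are invented for illustration) shows the general event-condition-action idea behind such triggers, including holding a triggered action until a later start event arrives:

```python
class EcaRule:
    """An event-condition-action rule whose action may be deferred to a later start event."""
    def __init__(self, event, condition, action, start_event="immediate"):
        self.event, self.condition, self.action, self.start_event = event, condition, action, start_event

class ActiveEngine:
    def __init__(self):
        self.rules, self.pending = [], []

    def notify(self, event, params):
        # Fire any deferred actions whose start event has now arrived.
        for rule in [r for r in self.pending if r.start_event == event]:
            rule.action(params)
            self.pending.remove(rule)
        # Evaluate rules triggered by this event.
        for rule in self.rules:
            if rule.event == event and rule.condition(params):
                if rule.start_event == "immediate":
                    rule.action(params)
                else:
                    self.pending.append(rule)   # hold until its start event occurs

engine = ActiveEngine()
engine.rules.append(EcaRule(
    event="update_salary",
    condition=lambda p: p["new_salary"] > p["supervisor_salary"],
    action=lambda p: print("trigger fired: salary exceeds supervisor's", p)))

engine.notify("update_salary", {"new_salary": 90000, "supervisor_salary": 80000})
```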
[...]
One way is to distinguish between identification and equality. That is, two
objects are said to be identical if all the values and OID are the same. If the values
are the same but with different OIDs, the two objects are said to be equal. As
discussed in the deductive database section, a major advantage of a deductive
database is the recursive predicate. If the equality concept can be accepted (not
necessarily identical), then the previous problems would no longer be an issue.
Actually, the recursive function can help an object-oriented database generate
structured complex objects.
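A small sketch of this identity-versus-equality distinction (Python; the class is invented for illustration): two objects with the same attribute values but different OIDs are equal, while an object is identical only to itself:

```python
class Part:
    def __init__(self, code, price):
        self.code, self.price = code, price

    def __eq__(self, other):          # equality: same values, OIDs may differ
        return isinstance(other, Part) and (self.code, self.price) == (other.code, other.price)

a = Part("SR", 800)
b = Part("SR", 800)

print(a == b)    # True  -- equal: all values match
print(a is b)    # False -- not identical: different object identifiers
print(a is a)    # True  -- identical to itself
```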
A query in both deductive and relational models uses declaration with high-level abstraction. The optimization process and semantics in both models are
clear and lead to relatively simple strategies for producing optimal results as
the relational algebra does. The object-oriented model, however, does not use
declaration. Manipulation and query of data are defined in the methods of a
class. Even though the query in the object-oriented database can be written in
SQL mode, it is still necessary to specify which method is going to be used. Furthermore, complex objects in an object-oriented database include structured
and unstructured complex objects. A structured complex object is defined by
specifying the data types, and the type of inheritance operates as usual. An
unstructured complex object is for binary large objects (e.g., a bitmap file for
graphical representation). However, since the data and its implementation software are stored in another object, an unstructured complex object cannot be
queried in a declarative way. Thus, Ullman's second assertion is valid.
2. Declarative programming. Associated with the deductive database structure is the declarative programming approach, in which data are grouped by properties. Conversely, the object-oriented database architecture uses imperative programming, and data are grouped by objects. Kifer and co-workers [10] presented criteria for classifying future database prototypes. In this classification, the prototypes Pascal-R and GLUE represent imperative and relational approaches. The deductive object-oriented database architecture, such as F-logic, O-logic, and C-logic, stands between the object-oriented and deductive database architectures. It utilizes declarative programming, grouping data by objects.
Based on Kifer's point of view, "pure object-oriented relational languages are not flexible enough; some knowledge is naturally imperative whereas some is naturally declarative." Therefore, even though some problems exist, the DOOD will become important if those problems can be solved, because DOOD can combine the advantages of a deductive database structure, such as explicit specification, high-level abstraction, extendibility, modifiability, and easy comprehension, with the merits of an object-oriented structure, such as the richness of the object structure and the potential for integration. However, Bancilhon in [10] insisted that it is too early to combine object-oriented and deductive database concepts since there are still some difficulties in implementing object-oriented concepts in an object-oriented database itself. He proclaimed that object-oriented database and deductive database systems need to be developed separately rather than abruptly combined. Moreover, in the long term, research efforts should focus on a deductive database approach, since it transfers more work to machines. Ullman agrees with this viewpoint. Furthermore,
[...]
database and dynamic behavior. However, when the knowledge is too complex for analysis by conventional quantitative techniques or when the available data sources are qualitative, inexact, or uncertain, the fuzzy logic controller can provide better performance than conventional approaches [22]. This is done by replacing fuzzy if-then rules with triggers or constraints and is especially useful in dynamic knowledge representation, i.e., operational constraints and triggers. Unfortunately, there is no systematic approach for finding membership functions for triggers and constraints. Fortunately, a neural network utilizes a simple computing algorithm for optimization with adaptation, learning, and fault-tolerance properties. By combining fuzzy logic controllers and neural networks, the triggers and constraints are able to process many kinds of data for an active object-oriented database.
[...]
C. Data Mining
Since more and more data are stored in databases, e.g., data generated from science research, business management, engineering applications, and other sources, finding meaning in the data becomes difficult. Therefore, assistance in finding meaning in data is urgently needed. Data mining technology can discover information from data and has drawn more attention recently. Data mining, also called knowledge discovery, knowledge extraction, data archaeology, data dredging, or data analysis in a database, is a term for the nontrivial extraction of implicit, previously unknown, and potentially useful information from data in a database. The discovered knowledge can be applied to areas such as information management, query processing, decision making, and process control.
The data mining techniques can be classified based on [13]:
1. What kind of database to work on. Normally, the relational data model is the focal point. Data mining technologies for other database models, such as object-oriented, distributed, and deductive databases, require further exploration and may yield surprising results.
2. What kind of knowledge is to be mined. There are several kinds of data mining problems: classification, association, and sequence. Classification problems group the data into clusters, while association problems discover the relationships between data. Sequence problems focus on the order in which data appear [3]. For solving these problems, there are several well-known data mining techniques, such as association rules, characteristic rules, classification rules, discriminant rules, clustering analysis, evolution and deviation analysis, sequence search, and mining path traversal [6]. Beyond these, machine learning approaches are also being adopted, such as neural networks, genetic algorithms, and simulated annealing [7]. The following introduces several techniques:
(a) Association rules. An association rule captures important associations among items, such as that the presence of some items in a transaction implies the presence of other items in the same transaction (a small support/confidence sketch follows this list). Several well-known algorithms of the association rule approach are Apriori [2], DHP (Dynamic Hash Pruning) [6], AIS (Agrawal, Imielinski, and Swami) [2], Parallel Algorithms [3], DMA (Distributed Mining of Association Rules) [14], SETM (Set-Oriented Mining) [21], and PARTITION [34].
(b) Characteristic rules. Characteristic rules are also called data generalization. This is a process of abstracting a large set of relevant data in a database from a low concept level to relatively high ones, since summarizing a large set of data and presenting it at a high concept level is often desirable. Examples of characteristic rules are the data cube approach (OLAP) and the attribute-oriented induction approach. OLAP will be discussed later; attribute-oriented induction takes a data mining query expressed in an SQL-like data mining query language and collects the set of relevant data in a database.
(c) Classification rules. A classification rule technique is a kind of supervised training that finds common properties among a set of objects in a database and classifies them into different classes according to a classification model. Since it is a supervised training algorithm, training sets of data are needed in advance. First, the training data are analyzed using tools such as statistics, machine learning, neural networks, or expert systems to develop an accurate description. The classification variables can be categorical, ranking, interval, or true measure variables. Categorical variables tell to which of several unordered categories a thing belongs. Ranking variables put things in order but do not tell how much larger one thing is than another. Interval variables measure the distance between two observations, and true measure variables, e.g., age, weight, length, and volume, measure from a meaningful zero point.
(d) Clustering analysis. Unlike classification rules, the clustering analysis technique is an unsupervised training. It helps to construct meaningful
partitioning by decomposing a large scale system into smaller components to
simplify design and implementation. Clusters are identified according to some
distance measurement in a large data set. Normally, the process takes a longer
time than the classification rules.
(e) Pattern-based similarity search. A pattern-based similarity search looks for similar patterns in a database. Different similarity measures are normally used, such as the Euclidean distance (the distance between two vectors of the same dimension) and the correlation (the linear correlation between data).
(f) Mining path traversal patterns. Mining path traversal patterns captures user access patterns and paths. The information can be used to analyze user behavior, such as a tendency toward depth-first search, breadth-first search, a top-down approach, or a bottom-up approach.
3. What kind of technique is to be utilized. Data mining can be driven by an autonomous knowledge miner, a data-driven miner, a query-driven miner, or an interactive data miner. Different driving forces trigger different mining techniques, and the database is also used in different ways. The techniques can also be identified according to their underlying data mining approach, such as generalization-based mining, pattern-based mining, statistics- or mathematical-theory-based mining, and integrated approaches.
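The support/confidence sketch promised above (pure Python; the transactions and thresholds are invented, and this is not the Apriori algorithm itself, only the measures it is built on):

```python
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
]

def support(itemset):
    # Fraction of transactions containing every item of the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

# Enumerate rules X -> Y over pairs of items and keep the strong ones.
items = set().union(*transactions)
for x, y in combinations(sorted(items), 2):
    for lhs, rhs in ((x, y), (y, x)):
        sup = support({lhs, rhs})
        conf = sup / support({lhs})
        if sup >= 0.5 and conf >= 0.7:
            print(f"{lhs} -> {rhs}  support={sup:.2f} confidence={conf:.2f}")
```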
D. User Interfaces

The user interface of a database system is another area that can be improved. A database user needs (1) customized languages that easily fit the problem domain and (2) alternative paradigms for accessing the database. In the future, data can be presented in different formats such as three-dimensional visualization, animation, and virtual reality. Therefore, a new concept of the user interface is expected. A natural language interface is one of the choices. The natural language interface demands multilingual support for interaction between people and computers, and would allow complex systems to be accessible to everyone. People would be able to retrieve data and information using a natural language without the need to learn complicated query languages. The new technology would be more flexible and intelligent than is possible with current computer technology. The applications can be divided into two major classes [4]:
1. Text-based applications. The text-based interface uses documented expressions to find appropriate articles on certain topics from a database of texts, extract information from messages on certain topics, translate papers from one language to another, summarize texts for certain purposes, etc.
2. Dialogue-based applications. A dialogue-based application must first resolve the problems of communication between human and machine. The system does not just perform speech recognition; rather, it provides smooth-flowing dialogue, so the system needs to participate actively in order to maintain a natural dialogue. A simple system can use a question-and-answer approach to query a database. A complete system should be able to automate operations such as customer service over the telephone, language control of a machine, and general cooperative problem-solving systems.
VII. FUTURE DATABASE APPLICATIONS

A. Data Warehousing
The business automation of diverse computer systems and the push to increase service quality for a flexible marketplace have created demand for large amounts of data. On the other hand, computer and network technology have improved to support requests on these large data volumes and versatile data types. Also, the user-friendly interface eases the difficulty of using computers and therefore increases the degree of user dependence on computers. Data warehousing technology is the process of bringing together disparate data from throughout an organization for decision-support purposes [7]. The large amount of data comes from several sources, such as (a) automated data input/output devices, i.e., magnetic-ink character recognition, scanners, optical mark recognition, optical character recognition, bar codes, and voice input [11], and (b) data interchange, i.e., electronic banking, point-of-sale (POS) systems, electronic data interchange (EDI), ATM machines, adjustable rate mortgages, just-in-time inventory, credit cards, and overnight deliveries.
The data warehouse stores only data rather than information. Therefore,
for getting the right information at the right time, the data warehousing technology is normally implemented with data analysis techniques and data mining
tools. Data exists on several interdependent levels of abstraction, each of which can help to support different levels of need [7]: (1) the operational data of who, what, where, and when; (2) the summarized data of who, what, where, and when; (3) the database schema of the data: tables, fields, indexes, and types; (4) the logical model and mappings to the physical layout and sources; (5) the business rules of what has been learned from the data; and (6) the relationships between data and how they are applied. When analysis and mining tools are applied, actionable patterns in the data are expected; that is, consistent data are required. In this case, a database management system is the heart of the data warehouse. However,
[...]

[FIGURE 4: The data warehouse can be located in different types of architecture, such as client/server networks, 3-tier networks, and distributed networks: (a) departmental data warehouse, (b) interdepartmental data warehouse, (c) middleware approach, and (d) multitiered approach.]

[...]
(b) Service-specific middleware. This middleware accomplishes a particular client/server type of service. Examples of database-specific middleware are ODBC, IDAPI, DRDA, EDA/SQL, SAG/CLI, and Oracle Glue; of OLTP-specific middleware, Tuxedo's ATMI, Encina's Transactional RPC, and X/Open's TxRPC and XATMI; of groupware-specific middleware, MAPI, VIM, VIC, and Lotus Notes calls; of object-specific middleware, OMG's ORB and Object Services and ODMG-93; and of system-management-specific middleware, SNMP, CMIP, and ORBs [30].
There are several different types of client/server architectures:
(a) File servers. The server is only responsible for managing data as files.
The files are shared across a network.
(b) Database servers. The server uses its own processing power to find the requested data, instead of simply passing all the records to the client and letting it find its own data, as was the case with the file server.
(c) Transaction servers. The client invokes remote procedures that reside on the server with an SQL database engine to accomplish jobs.
[...]
[FIGURE 5: The cube used for OLAP is divided into subcubes along dimensions such as Project (locations Bellaire, Sugarland, Houston), Employee (John, Franklin, Alicia, with salaries), and Department (Research, Administration, Headquarters). Each subcube can contain the calculations, counts, and aggregations of all records that fall on it, e.g., Project = 5, Employee = Franklin, Department = 4, Salary = 40000.]
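The following minimal sketch (pure Python; the sample records are invented for illustration, echoing Figure 5's dimensions) computes a CUBE-style aggregation: every combination of the grouping dimensions, including the all-values roll-up, gets its own aggregated subcube cell:

```python
from itertools import combinations

records = [
    {"project": 5, "department": 5, "employee": "John",     "salary": 30000},
    {"project": 5, "department": 4, "employee": "Franklin", "salary": 40000},
    {"project": 4, "department": 1, "employee": "Alicia",   "salary": 25000},
]

dimensions = ("project", "department", "employee")

# Group-by over every subset of the dimensions, i.e., a CUBE-style aggregation.
cube = {}
for k in range(len(dimensions) + 1):
    for dims in combinations(dimensions, k):
        for r in records:
            key = (dims, tuple(r[d] for d in dims))
            cell = cube.setdefault(key, {"count": 0, "sum_salary": 0})
            cell["count"] += 1
            cell["sum_salary"] += r["salary"]

print(cube[(("project", "department"), (5, 4))])   # {'count': 1, 'sum_salary': 40000}
print(cube[((), ())])                               # grand total over all records
```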
C. Decision-Support System
A decision support system (DSS) improves the performance of users by applying information technology to increase their knowledge level. The system comprises three components: database management software (DBMS),
model base management software (MBMS), and dialogue generation and management software (DGMS) [35]. The MBMS integrates data access and decision models to provide (1) creation (generation) functions, (2) maintenance (update) functions, and (3) manipulation (use) functions. The decision model is embedded in the MBMS of an information system and uses the database as the integration and communication mechanism between strategic models, tactical models, and operational models. The output of the MBMS is directed toward decision makers or intellectual processes. The model base should have the following abilities:
(a) the ability to create new models quickly and easily,
(b) the ability to catalog and maintain a wide range of models, supporting
all levels of management,
(c) the ability to interrelate these models with appropriate linkage
through the database,
(d) the ability to access and integrate model "building blocks", and
(e) the ability to manage the model base with management functions
analogous to database management.
The DGMS has three parts: (1) the action language, (2) the display or presentation language, and (3) the knowledge base (what the user must know). The DGMS uses the knowledge base to interact with the action language, which governs what the user can do, and with the presentation language, which governs what the user can see. In this situation, a good man/machine interface for interactive applications is required. Therefore, the interface should have:
(a) the ability to [...]
(b) the ability to [...]
(c) the ability to [...]
(d) the ability to [...]
The DBMS can use any data model, as previously discussed. The distributed database model can support distributed decision support systems, an active database can make the system more responsive, and the deductive database can produce an intelligent decision support environment.
D. Engineering and Production Applications
Engineering and production applications take a structured approach to integrating production functions. This approach can result in significant savings in both time and cost. The application topics include financial functions, order processing and accounting functions, capacity requirements planning, inventory management, manufacturing performance planning, production data management, production monitoring and control, purchasing, sales forecasting, order processing, product planning, production control, inventory control, and material resource planning (MRP). An engineering application is a tedious and complicated procedure in which designers need to manage a large amount of data for their applications, and the data need to be maintained for specific applications. Engineering design has the characteristics of (1) long duration, (2) complex design objects, (3) hierarchical object architecture, (4) multiple versions, and (5) cooperation. A database management system has been adopted
[...]
enterprise-resource planning (ERP). In order to meet such a diversified environment, distributed and heterogeneous database environments and active databases should be adopted.
VIII. SUMMARY
In this chapter, we presented the techniques and applications of emerging database system architectures. The history of computers, information systems, and database systems is covered. The relational data model, deductive data model, object-oriented data model, distributed data model, active database model, deductive and object-oriented database (DOOD), and the joining of active databases and object-oriented databases are discussed. It is worth noting that different data models satisfy different needs, and future system requirements are diversified. In the future, database performance can be increased through improvements in system engineering, artificial intelligence, data mining, and user interfaces. Many techniques will also enhance the
application of database systems, such as data warehousing, on-line analytic
processing (OLAP), decision-support systems, and engineering and production
applications.
REFERENCES
1. Abiteboul, S. Towards a deductive object-oriented database language. Deductive and Object-Oriented Databases, 453-472, 1990.
2. Agrawal, R., Imielinski, T., and Swami, A. Mining association rules between sets of items in large databases. In Proc. of the 1993 International Conference on Management of Data (SIGMOD-93), pp. 207-216, 1993.
3. Agrawal, R., and Shafer, J. Parallel mining of association rules. IEEE Trans. Knowledge Data Engrg. 8(6): 962-969, 1996.
4. Allen, J. Natural Language Understanding. Benjamin/Cummings, Redwood City, CA, 1995.
5. Banerjee, J. et al. Data model issues for object-oriented applications. In Readings in Database Systems (M. Stonebraker, Ed.), 2nd ed., pp. 802-813. Morgan Kaufmann, San Mateo, CA, 1994.
6. Beeri, C., and Milo, T. A model for active object oriented database. In Proceedings of the 17th International Conference on Very Large Data Bases (Sep.), pp. 337-349, 1991.
7. Berry, M., and Linoff, G. Data Mining Techniques: For Marketing, Sales, and Customer Support. Wiley, New York, 1997.
8. Bertino, E., and Martino, L. Object-Oriented Database Systems: Concepts and Architecture. Addison-Wesley, Reading, MA, 1993.
9. Bochmann, G. Concepts for Distributed Systems Design. Springer-Verlag, New York, 1983.
10. Brodie, M. L., Bancilhon, F., Harris, E., Kifer, M., Masunaga, Y., Sacerdoti, E., and Tanaka, K. Next generation database management systems technology. Deductive and Object-Oriented Databases, 335-346, 1990.
11. Capron, H. L. Computers: Tools for an Information Age, 5th ed. Addison-Wesley, Reading, MA, 1998.
12. Caseau, Y. Constraints in an object-oriented deductive database. Deductive and Object-Oriented Databases, 292-310, 1991.
13. Chen, M., Han, J., and Yu, P. S. Data mining: An overview from a database perspective. IEEE Trans. Knowledge Data Engrg. 8(6): 866-883, 1996.
39
14. Cheung, D. W., Vincent, X, Fu, W., and Fu, Y. Efficient mining of association rules in distributed
databases. IEEE Trans. Knowledge Data Engrg. 8(6): 911-922,1996.
15. Dayal, U. Active database management systems. In Proceedings of the Third International
Conference on Data and Knowledge Base, Jerusalem, pp. 150-170,1988.
16. Dayton, U., and Hsu, M. Organizing long-running activities v^^ith triggers and transactions.
In Readings in Database Systems, (M. Stonebreaker, Ed.), 2nd. ed., pp. 324-334. Morgan
Kaufmann, San Mateo, CA, 1994.
17. Du, T., and Wolfe, P. Overview^ of emerging database architectures. Comput. Indust. Engrg,
32(4): 811-821, 1997.
18. Du, T., and Wolfe, P. An implementation perspective of applying object-oriented database
technologies. HE Trans. 29: 733-742, 1997.
19. Elmasri, R., and Navathe, S. B. Fundamentals of Database Systems, 3rd ed. Addison-Wesley,
Reading, MA, 2000.
20. Gehani, N., and Jagadish, H. V. Ode as an active database: Constraints and triggers. In Proceedings of the 17th International Conference on Very Large Data Bases, pp. 327-336,1991.
21. Houtsma, M., and Swami, A. Set-oriented data mining in relational databases. Data Knowledge
Engrg. 17: 245-262, 1995.
22. Lee, C. C. Fuzzy logic in control systems: Fuzzy logic controllerParts I and II. IEEE Trans.
Systems Man Cybernet. 20 (2): 404-418, 419-435, 1990.
23. Little, D., and Misra, S. Auditing for database integrity. / . Systems Manage. 6-10, 1994.
24. Long, L., and Long, N. Brief Edition of Computers, 5th ed. Prentice-Hall, Upper Saddle River,
NJ, 1998.
25. Loomis, M. The Database Book. Macmillan, New York, 1987.
26. McCarthy, D., and Dayal, U. The architecture of an active data base management system.
In Reading in Database Systems (M. Stonebreaker, Ed.), 2nd ed., pp. 373-382. Morgan
Kaufmann, San Mateo, CA, 1994.
27. Minker, J. Fundations of Deductive Databases and Logic Programming. Morgan Kaufmann,
Los Altos, CA, 1988.
28. O'Brien, J. A. Introduction to Information Systems, 8th ed. Irwin, Chicago, 1997.
29. Orfali, R., and Harkey, D. Client/Server Programming with ]AVA and CORBA. Wiley,
New York, 1997.
30. Orfali, R., Harkey, D., and Edwards, J. Essential Client/Server Survival Guide. Van Nostrand
Reinhold, New York, 1994.
31. Ozsu, T. M. Principles of Distributed Database Systems. Prentice-Hall, Englewood Cliffs, NJ,
1991.
32. Pressman, R. Software Engineering: A Practitioner's Approach, 3rd ed. McGraw-Hill,
New York, 1992.
33. Ramakrishnan, R., and UUman, D. A survey of deductive database systems. / . Logic Programming 125-148, 1995.
34. Savaere, A., Omiecinski, E., and Navathe, S. An efficient algorithm for association rules in large
databases. In Proc. Int'l Conf. Very Large Data Bases, Zurich, pp. 432-444, 1995.
35. Sprague, R., and Watson, H. Decision Support Systems: Putting Theory into Practice, 3rd ed.
Prentice-Hall, Englewood Cliffs, NJ, 1993.
36. Stonebreaker and the Committee for Advances DBMS Function. Third-generation database system manifesto. In Reading in Database System, pp. 932-945. Morgan Kaufmann, San Mateo,
CA, 1994.
37. Ullman, J. A comparison between deductive and object-oriented database systems. In Proceedings Deductive and Object-Oriented Database Conference, pp. 7-24, 1990.
38. Urban, S., and Lim, B. An intelligent framework for active support of database semantics. Int.
J. Expert Systems. 6(1): 1-37, 1993.
39. Wong, L. Inference rules in object oriented programming systems. Deductive and ObjectOriented Database 493-509,1990.
40. Zaniolo, C. Object identity and inheritance in deductive databasesAn evolutionary approach.
Deductive and Object-Oriented Database 7-24,1990.
DATA MINING
DOHEON LEE
Department of BioSystems, Korea Advanced Institute of Science and Technology, Daejon,
Republic of Korea
MYOUNG HO KIM
Department of Computer Science, Korea Advanced Institute of Science and Technology,
Taejon 305-701, Republic of Korea
I. INTRODUCTION 41
A. Definition of Data Mining 41
B. Motivation of Data Mining 42
C. Comparison with Machine Learning 43
D. Steps of the Data Mining Process 45
II. OVERVIEW OF DATA MINING TECHNIQUES 46
A. Classification of Data Mining Techniques 46
B. Requirements for Effective Data Mining 46
III. DATA CHARACTERIZATION 47
A. Top-Down Data Characterization 48
B. Bottom-Up Data Characterization 64
C. Comparison of Top-Down and Bottom-Up Approaches
IV. CLASSIFICATION TECHNIQUES 67
A. Decision Tree Induction 68
B. Artificial Neural Network-Based Classification 70
V. ASSOCIATION RULE DISCOVERY 72
A. Definition of the Association Rule 72
B. Association Rule Discovery Algorithms 72
VI. CONCLUDING REMARKS 74
REFERENCES 75
This chapter introduces the concept and various techniques of data mining. Data mining is defined as the nontrivial extraction of implicit, previously unknown, and potentially useful knowledge from a large volume of actual data. Various techniques, including data characterization, classification, and association rule discovery, are discussed. In particular, the treatment of fuzzy information in data characterization is explained in detail. Finally, the relationship between data mining and data warehousing is briefly addressed.
I. INTRODUCTION
A. Definition of Data Mining
According to the Oxford dictionary, "mining" means the activity of digging for
useful mineral resources such as coal and ores from the ground. In the context
of data mining, "mineral resources" and "the ground" are mapped into "knowledge" and "data," respectively. That is, data mining is the activity of exploring data for useful knowledge. Although there are several precise definitions of data mining in the recent literature, we adopt the definition "the nontrivial extraction of implicit, previously unknown, and potentially useful knowledge from a large volume of actual data" since it represents the essential attributes of data mining comprehensively [1]. Several terms in the literature denote the same or a similar area as data mining, such as knowledge discovery in databases, data archeology, and data visualization, to name a few. However, we adopt the term "data mining" in this chapter since it is currently the most common term.
One typical example of data mining might be as follows. After exploring sales records in the point-of-sale system of a large retail shop, a marketing manager may discover a pattern such as "customers who purchase baby diapers are likely to purchase some bottles of beer." This example is famous among data mining practitioners, since the knowledge could not have been predicted before the data mining was actually performed. Who would expect a relationship between diapers and beer? However, once such a pattern is discovered in actual data, the marketing manager can exploit it to redesign the shelf layout and to decide on a target customer list for advertising new products. He/she is very likely to achieve a competitive advantage over others without such knowledge.
Let us examine each phrase of the definition in detail, with counterexamples, to get more concrete insight into data mining. Firstly, the phrase "implicit" means that information stored explicitly in the database or the system catalog is not the subject. The results of ordinary database queries and schema information such as column names and data types are examples of explicit knowledge. Secondly, the phrase "previously unknown" means that we are not looking for well-known knowledge. Suppose we come to know that "most adult men are taller than five feet" by exploring national health information records. It is rarely worth the effort of exploration since the information is common sense. Thirdly, "potentially useful" implies that the data mining process is driven by application requirements; the cost of data mining should be rewarded by the business benefit. The last phrase, "from a large volume of actual data," distinguishes data mining from experimental machine learning. Since data mining is performed on a large volume of data, it is hard to adopt algorithms whose execution times increase rapidly as the data size grows. In addition, missing or corrupted values in actual data require more sophisticated treatment of the data.
B. Motivation of Data Mining
As a variety of disciplines including business automation, engineering support,
and even scientific experiments have begun to rely on database systems, the
advance of database system technology has become faster, and in turn, the
volume of data stored in databases has increased rapidly in recent days. It has
been estimated that the amount of information in the world doubles every
20 months, and the total number of databases in the world was estimated at
five million in 1989. Earth observation satellites, planned for the 1990s, are
DATA MINING
43
expected to generate one terabyte of data every day and the federally funded
Human Genome project will store thousands of bytes for each of the several
billion genetic bases. The census databases are also typical examples of a large amount of information [1]. As an everyday example, several million sales records are produced in retail stores every day.
What are we supposed to do with this flood of raw data? It is no longer helpful to expose raw records directly to users. If the data is to be understood at all, it will have to be analyzed by computers. Although simple statistical techniques for data analysis were developed long ago, advanced techniques for intelligent data analysis are not yet mature. As a result, there is a growing gap
between data generation and data understanding [1]. At the same time, there
is a growing realization and expectation that data, intelligently analyzed and
presented, will be a valuable resource to be used for a competitive advantage [1].
The necessity of data mining capability is emphasized by this circumstance.
A National Science Foundation workshop on the future of database research
ranked data mining among the most promising research and engineering topics [2]. Decision support systems (DSS), executive information systems (EIS), and strategic information systems (SIS) are typical examples of systems that solicit knowledge through data mining techniques rather than scattered chunks of raw data through simple SQL queries.
C. Comparison with Machine Learning
In the context of machine learning, there are four levels of learning situations
as follows [3].
1. Rote learning: The environment provides the learning entity with precise how-to-do information. Conventional procedural programs written in the C language are typical examples. In such programs, the learning entity, i.e., the processor, is given step-by-step instructions to accomplish the goal. The learning entity does not have to infer anything; it just follows the instructions faithfully.
2. Learning by being told: The environment provides the learning entity
with general information such as rules. For example, suppose that a rule is
given as "if A is a parent of B, and B is a parent of C, then A is a grandparent
of C." When there are facts such as "Tom is a parent of John" and "John is
a parent of Mary," the learning environment is able to deduce a new fact that
"Tom is a grandparent of Mary." Many expert systems and query processors for
deductive databases have adopted this level of learning. This level of learning
is also called deductive learning.
3. Learning from examples: The environment provides the learning entity
with a set of detail facts. Then the learning entity has to infer general rules
representing common properties in the set of given facts. For example, suppose
that we have the facts such as "a pigeon can fly" and "a swallow can fly."
From these facts, the learning entity can induce a rule that "a bird can fly"
based on the knowledge that "a pigeon and a swallow are kinds of birds."
Once the learning entity obtains the knowledge, it can utilize the knowledge
to deduce new facts such as "a sparrow can fly." This level of learning is also
called inductive learning.
Actual data (data mining)             Experimental data (machine learning)
Dynamic data                          Static data
Erroneous data                        Error-free data
Uncertain data                        Exact data
Missing data                          No missing data
Coexistence with irrelevant data      Only relevant data
Immense size of data                  Moderate size of data
While data for machine learning experiments are collected carefully to contain only data relevant to the learning process, actual databases also include irrelevant data. Data mining techniques must identify relevant data before and/or in the middle of the learning process. Since data mining is performed on an immense amount of data, it is unrealistic to directly apply inductive learning algorithms whose execution times increase rapidly as the data size grows.
In consequence, data mining can be regarded as the application of the inductive learning paradigm to an actual environment. Although many conventional inductive learning techniques such as classification and clustering can be adopted as starting points for data mining techniques, we must modify and supplement them for actual environments.
FIGURE 1 (a) and (b) show crisp and fuzzy ISA hierarchies: dotted lines represent partial ISA relationships as strong as the augmented fraction numbers between two incident concept nodes, while solid lines denote complete ISA relationships. (c) depicts a situation where database summarization is to be done (emacs: 120 records, vi: 680 records, word: 25 records).
the fuzzy ISA hierarchy, we can conclude that "editor" programs have been executed most, since "emacs" and "vi," with 120 and 680 records respectively, are known mainly for editing source code, not for writing documents.
1. Representation of the Database Summary
There are many domain concepts with fuzzy boundaries in practice. For example, it is difficult to draw a crisp boundary around documentation programs in a set of programs, since some programs such as "vi" and "emacs" can be thought of as source code editors by some programmers but as word processors by some manual writers. It is more natural and useful to express such domain concepts in terms of fuzzy sets rather than crisp sets [20]. Thus, we use a vector of fuzzy sets to effectively represent a database summary.
A fuzzy set f on a domain D is defined by its membership function μ_f(x), which assigns a fraction between 0 and 1 as the membership degree of each domain value [19]. Since μ_f(x) represents the degree to which an element x belongs to a fuzzy set f, a conventional set is regarded as a special case of a fuzzy set whose membership degrees are either one or zero. Given two fuzzy sets f_1 and f_2 on a domain D, f_1 is called a subset of f_2, denoted f_1 ⊆ f_2, iff ∀x ∈ D, μ_{f_1}(x) ≤ μ_{f_2}(x). In this paper, the special fuzzy set f_ω such that ∀x ∈ D, μ_{f_ω}(x) = 1 is denoted as ω.
Definition 1 (Generalized Record). A generalized record is defined as an m-ary record (f_1, ..., f_m) on an attribute scheme (A_1, ..., A_m), where the f_j's are fuzzy sets and the A_j's are attributes. Given two different generalized records g_1 = (f_11, ..., f_1m) and g_2 = (f_21, ..., f_2m) on the same attribute scheme, g_1 is called a specialization of g_2 iff ∀j, f_1j ⊆ f_2j.
A generalized record (f_1, ..., f_m) on an attribute scheme (A_1, ..., A_m) is interpreted as an assertion that "each record has f_1, ..., f_m for attributes A_1, ..., A_m, respectively." Note that an ordinary database record is also regarded as a generalized record whose fuzzy sets are singleton sets; a singleton set is a set having a unique element. An example of a generalized record with respect to an attribute scheme (PROGRAM, USER) is (editor, programmer). It implies the assertion that "the program is an editor and its user is a programmer"; in other words, "programmers have executed editor programs."
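To make these definitions concrete, here is a minimal Python sketch (illustrative only; the dictionary representation, the OMEGA marker, and the sample user domain are assumptions made for the sketch, not notation from the cited literature) of fuzzy sets, the subset test, and the specialization test between generalized records.

# A fuzzy set on a finite domain: value -> membership degree in [0, 1].
# Values not listed are assumed to have degree 0.0; OMEGA marks the special
# fuzzy set with degree 1.0 everywhere (the "omega" of the text).
OMEGA = "omega"

def degree(fset, x):
    """Membership degree mu_f(x)."""
    if fset == OMEGA:
        return 1.0
    return fset.get(x, 0.0)

def is_subset(f1, f2, domain):
    """f1 is a subset of f2 iff mu_f1(x) <= mu_f2(x) for every x in the domain."""
    return all(degree(f1, x) <= degree(f2, x) for x in domain)

def is_specialization(g1, g2, domains):
    """g1 = (f_11, ..., f_1m) specializes g2 iff every component fuzzy set of g1
    is a subset of the matching component of g2."""
    return all(is_subset(f1, f2, dom) for f1, f2, dom in zip(g1, g2, domains))

# Fuzzy sets taken from the running example of the text.
editor = {"emacs": 1.0, "vi": 1.0}
documentation = {"emacs": 0.1, "vi": 0.3, "word": 1.0}
programs = ["emacs", "vi", "word", "wright"]
users = ["programmer", "writer"]              # illustrative user domain

g_specific = (editor, {"programmer": 1.0})    # (editor, programmer)
g_general = (OMEGA, OMEGA)                    # the most general record (omega, ..., omega)

print(is_subset(editor, documentation, programs))                    # False: mu(vi) is 1.0 > 0.3
print(is_specialization(g_specific, g_general, [programs, users]))   # True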
b. Validity of Generalized Records
SD(g/C) = ( Σ_{t_i ∈ C} ⊗[ μ_{f_1}(t_i.A_1), ..., μ_{f_m}(t_i.A_m) ] ) / |C|,

where ⊗ is a T-norm operator [20], μ_{f_j}(t_i.A_j) denotes the membership degree of the attribute A_j of a record t_i with respect to the fuzzy set f_j, and |C| denotes the cardinality of the collection C. We call generalized records with support degrees higher than a user-given threshold value qualified generalized records. Note that we do not use the term a set of database records, or a database relation, because a collection of database records is allowed to have duplicates. We will denote SD(g/C) as SD(g) for simplicity as long as the data collection C is obvious from the context. T-norm operators are used to obtain conjunctive notions of membership degrees in fuzzy set theory [20]; examples include the MIN and probabilistic product operators. Since the usage of T-norm operators in fuzzy set theory is analogous to that of the product operator in probability theory, the symbol ⊗, which is analogous to ×, is commonly used to denote a specific instance of a T-norm operator.
Since μ_{f_j}(t_i.A_j) denotes the membership degree of the attribute A_j of a record t_i with respect to the fuzzy set f_j, a T-norm value over the membership degrees of all attributes of a record t_i, i.e., ⊗[μ_{f_1}(t_i.A_1), ..., μ_{f_m}(t_i.A_m)], represents how strongly the record t_i supports the assertion of a generalized record (f_1, ..., f_m). As a result, the support degree of a generalized record is the normalized sum of the support strengths of the individual database records. In other words, the support degree is the fraction of supporting database records in the total data collection.
The defined support degree has the following properties:
Boundary conditions: Given a generalized record g,
  0 ≤ SD(g) ≤ 1.
  If all fuzzy sets in g are ω, then SD(g) = 1.
Monotonicity: Given two generalized records g_1 and g_2 on the same attribute scheme,
  SD(g_1) ≤ SD(g_2) if g_1 is a specialization of g_2.
While the boundary conditions are self-evident by definition, the monotonicity property needs some explanation. The following theorem shows the monotonicity property.
Theorem 1 (Monotonicity of the support degree). Given two generalized records g_1 = (f_11, ..., f_1m) and g_2 = (f_21, ..., f_2m) on an attribute scheme (A_1, ..., A_m), if g_1 is a specialization of g_2, then SD(g_1) ≤ SD(g_2).

Proof.
SD(g_1) = ( Σ_{t_i ∈ C} ⊗_{j ∈ {1,...,m}} [ μ_{f_1j}(t_i.A_j) ] ) / |C|
        ≤ ( Σ_{t_i ∈ C} ⊗_{j ∈ {1,...,m}} [ μ_{f_2j}(t_i.A_j) ] ) / |C|   (since ∀ t_i ∈ C and ∀ j, μ_{f_1j}(t_i.A_j) ≤ μ_{f_2j}(t_i.A_j))
        = SD(g_2).
Thus, SD(g_1) ≤ SD(g_2).

TABLE 2
Record   A1   A2   μ_f1(a1)   μ_f2(a2)   ⊗
t1       a    α    1.0        0.3        ⊗(1.0, 0.3) = 0.3
t2       b    β    0.1        1.0        ⊗(0.1, 1.0) = 0.1
t3       c    α    0.1        0.3        ⊗(0.1, 0.3) = 0.1
Let us look at an example of support degree computation. Suppose a generalized record g on an attribute scheme (A1, A2) is (f1, f2), where the fuzzy sets f1 and f2 are {1.0/a, 0.1/b} and {0.3/α, 1.0/β}, respectively. If a data collection C is given as in Table 2, SD(g) is computed as follows: The first record t1 supports g as strongly as 0.3, since its first and second attribute values, a and α, belong to the fuzzy sets f1 and f2 to the degrees 1.0 and 0.3, respectively. Note that we use the MIN operator for the T-norm operation just for illustration throughout this paper. Similarly, both the second and third records support g as strongly as 0.1. As a result, we can say that the generalized record g is supported by 0.3 + 0.1 + 0.1 = 0.5 records out of a total of three records, i.e., 17% of the data collection.
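As a rough illustration of the computation, the following Python sketch evaluates SD(g) with the MIN T-norm over a small data collection; the record values are assumptions chosen only so that the per-record supports come out to 0.3, 0.1, and 0.1, as in the discussion above.

def degree(fset, x):
    """Membership degree mu_f(x); unlisted values have degree 0.0."""
    return fset.get(x, 0.0)

def support_degree(g, collection, tnorm=min):
    """SD(g/C): the T-norm of the membership degrees of each attribute value,
    summed over all records and divided by the collection size |C|."""
    total = sum(tnorm(degree(f, value) for f, value in zip(g, record))
                for record in collection)
    return total / len(collection)

# Fuzzy sets f1 and f2 from the example (alpha/beta written out as strings).
f1 = {"a": 1.0, "b": 0.1}
f2 = {"alpha": 0.3, "beta": 1.0}
g = (f1, f2)

# An illustrative collection of three records on the scheme (A1, A2); the
# attribute values are assumptions chosen to reproduce supports 0.3, 0.1, 0.1.
C = [("a", "alpha"), ("b", "beta"), ("b", "alpha")]

print(round(support_degree(g, C), 2))   # 0.17, i.e., about 17% of the collection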
2. Fuzzy Domain Knowledge
An ISA hierarchy is an acyclic digraph (N, A), where N and A are a set of concept nodes and a set of ISA arrows, respectively. If there is an ISA arrow from a concept node n_1 to another concept node n_2, we say that n_1 ISA n_2; in other words, n_1 is a specialized concept of n_2. While conventional ISA hierarchies have only crisp ISA arrows, fuzzy ISA hierarchies also include fuzzy ISA arrows. The meaning of a fuzzy ISA arrow from n_1 to n_2 can be interpreted as: n_1 is a partially specialized concept of n_2. Without loss of generality, we suppose that the root node of any fuzzy ISA hierarchy is the special fuzzy set ω, and each terminal node is a singleton set whose unique element is an atomic domain value, i.e., a value appearing in actual database records.
TABLE 3  Level-k Fuzzy Sets
Set label        Membership function                Level
engineering                                         3
business                                            3
editor           {1.0/emacs, 1.0/vi}                2
documentation    {0.1/emacs, 0.3/vi, 1.0/word}      2
spreadsheet      {0.1/word, 1.0/wright}             2
emacs            {1.0/emacs}                        1
vi               {1.0/vi}                           1
word             {1.0/word}                         1
wright           {1.0/wright}                       1
In fuzzy set theory [21], the elements of a fuzzy set can themselves be fuzzy sets, rather than atomic domain values. Ordinary fuzzy sets whose elements are atomic values are called level-1 fuzzy sets. Fuzzy sets whose elements are level-(k−1) fuzzy sets are called level-k fuzzy sets. Table 3 depicts some level-k fuzzy sets. If two fuzzy sets have different levels, we cannot directly determine the inclusion relationship between them, since the domains are different. However, the level of a fuzzy set can be either upgraded or downgraded by some fuzzy set-theoretic treatments. Thus, if we want to determine the inclusion relationship between two fuzzy sets with different levels, we must first adjust their levels to the same level by upgrading or downgrading.
Upgrading the level of a fuzzy set is trivial, since a level-k fuzzy set can simply be rewritten as a level-(k+1) singleton set whose unique element is the original level-k fuzzy set. For example, the level-2 fuzzy set editor in Table 3 can be thought of as a level-3 fuzzy set such as {1.0/editor}.
Downgrading the level of a fuzzy set is done by a support fuzzification technique based on the extension principle [20]. Rather than spending a large amount of space explaining support fuzzification precisely, we will explain it by example. Interested readers are recommended to refer to Zadeh's original paper [22].
The transformation procedure of a fuzzy ISA hierarchy to a fuzzy set hierarchy is composed of three steps as follows:
1. Downgrade several levels of fuzzy sets in a fuzzy ISA hierarchy to
level-1 fuzzy sets.
FIGURE 2 (Editor, Documentation, Spread sheet; Emacs, vi, Word, Wright; partial link 0.1)
TABLE 4
           editor   docu   spread   engi   busi   ω
emacs      1.0      0.1    0.0      1.0    0.1    1.0
vi         1.0      0.3    0.0      1.0    0.3    1.0
word       0.0      1.0    0.1      1.0    1.0    1.0
wright     0.0      0.0    1.0      0.8    1.0    1.0
Note. The leftmost column enumerates atomic domain values and the other columns give the membership degrees of the atomic domain values in the fuzzy sets listed in the first row.
FIGURE 3 (Editor, Documentation, Spread sheet; Emacs, vi, Word, Wright; partial link 0.1)
generalized record, i.e., (ω, ..., ω), based on the given fuzzy set hierarchies, to search for more specific generalized records that remain qualified. The specialization is done minimally, in the sense that only one fuzzy set is specialized into its direct subset. By evaluating the support degrees of those specializations with respect to a given collection of database records, qualified generalized records are identified. At this point, human users can interact with the discovery process. They might choose only some qualified generalized records for further consideration if they are not interested in the others or want to trade search completeness for a reduced search cost. The process then minimally specializes only the user-chosen qualified generalized records, and those specializations become the hypotheses for the next phase. After some repetition of such specialization, the process yields a specialization hierarchy of qualified generalized records, i.e., significant database summaries. Figure 4 diagrams the process, and Fig. 5 depicts the steps in detail.
Note that the monotonicity property of the support degree in Theorem 1
guarantees that any specializations of unqualified generalized records cannot be
qualified, and as a result, the process never misses qualified generalized records,
even though it does not specialize unqualified generalized records.
FIGURE 4 The overall discovery process: hypotheses are evaluated against the database, and qualified generalized records become the seeds for the next hypotheses through specialization guided by the fuzzy set hierarchies.

INPUT: (i) a collection of database records C, (ii) fuzzy set hierarchies for the attributes, (iii) a support degree threshold value τ
OUTPUT: a specialization hierarchy of qualified generalized records
(1)  SPGR()
(2)  {
(3)    result = ∅; curr = { <ω, ..., ω> };
(4)    while (curr ≠ ∅) {
(5)      for each t in C
(6)        for each g in curr
(7)          accumulate SD(g);
(8)      for each g in curr
(9)        if SD(g) > τ, then result = result ∪ {g},
(10)       else curr = curr − {g};
(11)     for each g in curr {
(12)       if the USER marks g then specialize(curr, g);
(13)       curr = curr − {g};
(14)     }
(15)   }
(16) }
(17) specialize(set, g = <a1, ..., am>)
(18) {
(19)   for j = 1 to m
(20)     for each direct subset saj of aj
           (in the fuzzy set hierarchy for the jth attribute)
(21)       set = set ∪ { <a1, ..., saj, ..., am> }
(22) }
FIGURE 5 The summary discovery process SPGR().

In Fig. 5, the support degree of each generalized record is computed from Line 5 to Line 7. This is the most time-consuming part of the process except for the user interaction (Line 12), since the process must scan disks to read each database record. In Line 9, qualified generalized records with respect to τ are identified and put into the result. After users choose only interesting generalized records in Line 12, these are minimally specialized in the sub-function specialize(). The process comes back to Line 4 with those specializations to repeat the steps.
Let us consider the efficiency of the process in terms of the number of disk accesses. Because it is hard to estimate how much time it takes to interact with human users, let us assume for the efficiency analysis that users choose all qualified generalized records in Line 12. It is common to analyze the efficiency of most disk-based database applications in terms of disk access costs [23], because the cost of disk accesses is much higher than that of in-memory
operations. The disk access cost actually determines the efficiency of a process
if the number of in-memory operations does not increase exponentially along
with the input size, and no external network-based communication is involved.
The number of generalized records whose support degrees are evaluated in
one specialization phase, i.e., the size of "curr" in Line 6, has nothing to do with
the number of database records. Rather, it is determined by the average fan-outs
of given fuzzy set hierarchies. Empirically, we expect that the system memory
buffer space can hold all those generalized records in one specialization phase.
Then the total number of disk accesses is the number of specialization
phases multiplied by the number of disk accesses to read database records in
C. Let us denote the maximal path length of the fuzzy set hierarchy on the jth attribute as l_j. Since the specialization of generalized records is done attribute by attribute (see Lines 16 to 21), the number of specialization phases cannot be greater than Σ_j (l_j). As a result, the number of disk accesses is no greater than Σ_j (l_j) × p, where p denotes the number of disk pages containing the collection C of database records. Note that, like the size of curr, Σ_j (l_j) is also determined by the given fuzzy set hierarchies, not by the number of database records. Thus, we claim that the average cost of our summary discovery process increases linearly with the number of database records in a collection if we ignore the human user's interaction.
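For readers who want to experiment, here is a compact, non-interactive Python sketch of the discovery loop of Fig. 5 (a simplification in which every qualified record is specialized automatically, skipping the user interaction of Line 12; the toy fuzzy sets, hierarchies, and records at the end are assumptions, not data from Tables 5 and 6).

OMEGA = "omega"

def degree(fuzzy_sets, name, x):
    """Membership degree of x in the named fuzzy set (OMEGA is 1.0 everywhere)."""
    return 1.0 if name == OMEGA else fuzzy_sets[name].get(x, 0.0)

def support_degree(g, collection, fuzzy_sets):
    """SD(g) with the MIN T-norm; g is a tuple of fuzzy-set names."""
    return sum(min(degree(fuzzy_sets, n, v) for n, v in zip(g, rec))
               for rec in collection) / len(collection)

def discover_summaries(collection, fuzzy_sets, hierarchies, tau):
    """Level-wise search for qualified generalized records; hierarchies[j] maps
    a fuzzy-set name of the j-th attribute (or OMEGA) to its direct subsets."""
    result, current = [], [tuple(OMEGA for _ in hierarchies)]
    while current:
        qualified = [g for g in current
                     if support_degree(g, collection, fuzzy_sets) > tau]
        result.extend(qualified)
        # Minimal specialization: replace exactly one component by a direct subset.
        specialized = [g[:j] + (sub,) + g[j + 1:]
                       for g in qualified
                       for j, h in enumerate(hierarchies)
                       for sub in h.get(g[j], [])]
        current = list(dict.fromkeys(specialized))   # drop duplicates reached via different paths
    return result

# A toy instance (names and data are illustrative only).
fuzzy_sets = {"editor": {"emacs": 1.0, "vi": 1.0},
              "developer": {"John": 1.0, "Tom": 1.0}}
hierarchies = [{OMEGA: ["editor"]}, {OMEGA: ["developer"]}]
records = [("emacs", "John"), ("vi", "Tom"), ("word", "Mary")]

print(discover_summaries(records, fuzzy_sets, hierarchies, tau=0.4))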
Let us see the behavior of the algorithm with an example. Suppose that we
have a collection of computer usage records whose attributes are PROGRAM
and USER as shown in Table 5. Given fuzzy ISA hierarchies on attributes,
PROGRAM and USER are supposed to be resolved to fuzzy sets in Table 6
and fuzzy set hierarchies in Fig. 6. The fuzzy sets in Table 6 are represented in
the form of semantic relations [23]. Semantic relations represent several fuzzy
sets on the same domain in the relational form. If the domain is a continuous
TABLE 5
Program   User        Program   User
emacs     John        emacs     Tom
vi        Tom         gcc       John
emacs     Tom         wright    Steve
vi        John        emacs     Jane
word      Kimberly    emacs     Mary
emacs     John        tetris    John
word      Mary        emacs     Jane
emacs     John        ultima    Mary
word      Kimberly    emacs     John
word      Kimberly    emacs     John
emacs     John        ultima    Mary
word      Mary        emacs     Jane
emacs     John        tetris    John
word      Kimberly    emacs     Mary
vi        John        emacs     Jane
emacs     Tom         wright    Steve
vi        Tom         gcc       John
emacs     John        emacs     Tom
TABLE 6 Semantic relations defining the fuzzy sets for PROGRAM_01 (compiler, editor, docu, spread, engi, busi, game, and ω over the atomic values gcc, cc, f77, emacs, vi, word, wright, tetris, and ultima) and USER_01 (prog, writer, seller, develop, market, account, and ω over John, Tom, Mary, Kimberly, Steve, Jane, and Bob).
FIGURE 6 Fuzzy set hierarchies for PROGRAM and USER (PROGRAM_01: Engineering, Compiler, Editor, Business, Docu, Spread, and Game over gcc, cc, f77, Emacs, vi, Word, Wright, Tetris, and Ultima; USER_01 over John, Tom, Mary, Kimberly, Steve, Jane, and Bob). As different users have different ISA relationships between domain concepts in mind, there can be several fuzzy set hierarchies for an attribute domain. Thus, we postfix ".number" to each fuzzy set hierarchy name to denote that it is chosen from among several available ones.
FIGURE 7 <-, -> 1.000; <engineering, -> 0.833; <-, developer> 0.833; <-, marketer> 0.411.
FIGURE 8 (τ = 0.4) <-, -> 1.000; <engineering, -> 0.833; <-, developer> 0.833; <-, marketer> 0.411; <engineering, developer> 0.720; <-, programmer> 0.589.
FIGURE 9 (τ = 0.4) <-, -> 1.000; <engineering, -> 0.833; <-, developer> 0.833; <-, marketer> 0.411; <engineering, developer> 0.720; <-, programmer> 0.589.
FIGURE 10 (τ = 0.4) <-, -> 1.000; <engineering, -> 0.833; <-, developer> 0.833; <-, marketer> 0.411; <engineering, developer> 0.720; <-, programmer> 0.589; <editor, developer> 0.456.
Along with the specificity of generalized records, support degrees also affect the information values. For example, (editor, programmer) with the support degree 0.9 could be regarded as more informative than the same generalized record with the support degree 0.3, because the former seems to give information about 90% of the original collection of database records, while the latter seems to give information about only 30% of them. Strictly speaking, this argument alone may be misleading. Both the specificity and the support degree should be considered simultaneously to obtain a more exact information value, as detailed in the following sections.
a. The Notion of Shannon's Entropy
Now we explain how to obtain |Ω()| and |Ω(g)| in our context. Suppose that a given data collection C has an attribute scheme (A_1, ..., A_m). Let Ψ[j] denote the domain of the jth attribute. Then the set of all possible records that can be composed from the domains, denoted Ψ, becomes Ψ[1] × ... × Ψ[m] under the assumption of ignorance of attribute dependencies. Consequently,

|Ω()| = |Ψ|^{|C|},  (1)
|Ω(g)| = |Ψ_g|^{|C|·SD(g)} · |Ψ_{g'}|^{|C|·(1−SD(g))}.  (2)

Also, Ψ can be thought of as being divided into two parts, denoted Ψ_g and Ψ_{g'}, with respect to g: Ψ_g is the set of records consistent with a generalized record g, and Ψ_{g'} = Ψ − Ψ_g. If we define the coverage degree CV of a generalized record g as |Ψ_g|/|Ψ|, to measure the fraction of Ψ that is consistent with g, then Ψ_g and Ψ_{g'} can be written as

|Ψ_g| = |Ψ|·CV(g),   |Ψ_{g'}| = |Ψ|·(1 − CV(g)).  (3)

The coverage degree will be defined precisely in the next subsection; for the moment, let us assume that it is given. It is obvious that the coverage degree of a generalized record is, by definition, greater than 0. A complete coverage degree, i.e., one, implies that the generalized record is (ω, ..., ω). By definition, Ω((ω, ..., ω)) is the same as Ω().
As depicted in Fig. 11, the database records in C_g and C_{g'} are thought to be selected from Ψ_g and Ψ_{g'}, respectively. Thus, we can formulate |Ω(g)| by (2) and (3) as

|Ω(g)| = (|Ψ|·CV(g))^{|C|·SD(g)} · (|Ψ|·(1 − CV(g)))^{|C|·(1 − SD(g))}.  (4)

INFO(g) = log2[ |Ω()| / |Ω(g)| ]
        = log2[ |Ψ|^{|C|} / { (|Ψ|·CV(g))^{|C|·SD(g)} · (|Ψ|·(1 − CV(g)))^{|C|·(1 − SD(g))} } ]   by (1) and (4)
        = log2[ 1 / { CV(g)^{|C|·SD(g)} · (1 − CV(g))^{|C|·(1 − SD(g))} } ].  (5)

FIGURE 11 Record selection when a qualified generalized record g is known: (a) denotes a case where the coverage degree of g is very low but the support degree is high (low CV and high SD), and (b) denotes the opposite case (i.e., high CV and low SD).
FIGURE 12 Info as a function of CV.
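Equation (5) can be transcribed directly into code. The following sketch (the numeric values are hypothetical) computes INFO(g) from the coverage degree, the support degree, and the collection size alone.

import math

def info(cv, sd, n):
    """INFO(g) = log2( 1 / ( CV**(n*SD) * (1 - CV)**(n*(1 - SD)) ) ),
    rewritten as a sum of logarithms to avoid underflow for large n = |C|.
    Assumes 0 < cv < 1 and 0 <= sd <= 1."""
    return -n * (sd * math.log2(cv) + (1.0 - sd) * math.log2(1.0 - cv))

# Hypothetical values: a generalized record covering 42% of the possible
# records and supported by 90% of a 1000-record collection.
print(round(info(cv=0.42, sd=0.9, n=1000), 1))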
CV(g) = ( Σ_{x_1 ∈ Ψ[1], ..., x_m ∈ Ψ[m]} ⊗[ μ_{f_1}(x_1), ..., μ_{f_m}(x_m) ] ) / ( |Ψ[1]| × ... × |Ψ[m]| ).  (6)

In the example of Fig. 13, the domains are Ψ[1] = {a, b, c} and Ψ[2] = {α, β}, so |Ψ[1]| × |Ψ[2]| = 6, and

Σ_{x_1 ∈ Ψ[1], x_2 ∈ Ψ[2]} ⊗[ μ_{f_11}(x_1), μ_{f_12}(x_2) ] = 2.5.  (7)

Similarly,

Σ_{x_1 ∈ Ψ[1], x_2 ∈ Ψ[2]} ⊗[ μ_{f_21}(x_1), μ_{f_22}(x_2) ] = 1.3.  (8)

By (6) and (7), CV(g_1) = 2.5/6 = 0.42, and by (6) and (8), CV(g_2) = 1.3/6 = 0.22. As a result, we can say that g_1 and g_2 cover Ψ by as much as 42 and 22%, respectively. Figure 13 depicts graphically how these generalized records cover Ψ.
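The coverage computation can be checked mechanically; the Python sketch below uses the MIN T-norm and the fuzzy sets of Fig. 13 and reproduces the sums 2.5 and 1.3 and hence the coverage degrees 0.42 and 0.22.

from itertools import product

def coverage(g, domains):
    """CV(g): the T-norm (here MIN) of membership degrees, summed over every
    possible record in the cross product of the domains, divided by its size."""
    total = sum(min(f.get(x, 0.0) for f, x in zip(g, combo))
                for combo in product(*domains))
    size = 1
    for d in domains:
        size *= len(d)
    return total / size

domain1 = ["a", "b", "c"]          # Psi[1]
domain2 = ["alpha", "beta"]        # Psi[2]

f11 = {"a": 0.1, "b": 1.0, "c": 0.5}
f12 = {"alpha": 0.4, "beta": 1.0}
f21 = {"a": 0.1, "b": 1.0, "c": 0.0}
f22 = {"alpha": 0.1, "beta": 1.0}

print(round(coverage((f11, f12), [domain1, domain2]), 2))  # 0.42 (= 2.5 / 6)
print(round(coverage((f21, f22), [domain1, domain2]), 2))  # 0.22 (= 1.3 / 6)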
FIGURE 13 Coverage of generalized records: (a) is for g_1 = (f_11, f_12), where f_11 = {0.1/a, 1.0/b, 0.5/c} and f_12 = {0.4/α, 1.0/β}; (b) is for g_2 = (f_21, f_22), where f_21 = {0.1/a, 1.0/b, 0.0/c} and f_22 = {0.1/α, 1.0/β}. In each diagram, the whole six-block area represents Ψ, and the shaded area represents Ψ_{g_1} (or Ψ_{g_2}). The density of the shading implies how completely the generalized record covers the corresponding area.

B. Bottom-Up Data Characterization
DBLEARN adopts an attribute-oriented induction method to extract database summaries [9]. In its attribute-oriented induction, each attribute value of a
record is substituted with a more general concept. After one pass of substitution, equivalence classes of generalized records are identified, and each class is regarded as a candidate summary. This bottom-up procedure is repeated until satisfactory summaries are obtained.
Let us examine the procedure with an example. Suppose that we have a data table as in Table 7. Note that the last attribute, "Vote," is added to denote how many records are represented by the record. Initially, it is set to one for each record, since each record represents only itself. In the process of summarization, the "Vote" value increases.
Also suppose that we have ISA hierarchies on the domains "Major," "Birth Place," and "GPA" as in Fig. 14. Firstly, we remove the key attributes, because it is meaningless to summarize such attributes; in Table 7, the first attribute, "Name," is removed since it is the key of the table. Each attribute value in the remaining table is then substituted with its direct generalization in the given ISA hierarchies. This is called attribute-oriented substitution. After the first attribute-oriented substitution, we obtain the results in Table 8.
We can see that the first and the fourth records are equivalent to each other, and the second record is the same as the third. Thus, each pair is merged into a single record as in Table 9.
Note that the "Vote" values for the first and second records in Table 9
are increased to 2. They represent that each of them represent two records in
TABLE 7
Name
Major
Birth place
GPA
Vote
Lee
Kim
Yoon
Park
Choi
Hong
Music
Physics
Math
Painting
Computing
Statistics
Kwangju
Soonchun
Mokpo
Yeosu
Taegu
Suwon
3.4
3.9
3.7
3.4
3.8
3.2
1
1
1
1
1
1
66
Major
Music
Painting
Physics
Math
Computing
Birthplace
Kwangju
Sunchon
GPA
Excellent
Good
Bad
(4.0-3.5)
(3.5-3.0)
(3.0-0.0)
the original table. Commonly, users are interested only in records with high
"Vote" values, since they represent the significant portions of the data table.
Thus, the user must determine the acceptable minimum for the "Vote" values
as the threshold value. If we determine the threshold value as 2 in this small
example, the last two records should be removed from the result as in Table 10.
Since the data table contains only a small number of records and each ISA hierarchy has only a few levels for simple illustration, there is little opportunity to summarize further. However, this substitution-merge process will be repeated several times in real situations, where the data table has a large number of records and each ISA hierarchy has a significant number of levels (a small sketch of the substitute-and-merge loop is given after Tables 8-10 below).
TABLE 8
Major     Birth place   GPA         Vote
Art       Chonnam       Good        1
Science   Chonnam       Excellent   1
Science   Chonnam       Excellent   1
Art       Chonnam       Good        1
Science   Kyungbuk      Excellent   1
Science   Kyunggi       Good        1

TABLE 9
Major     Birth place   GPA         Vote
Art       Chonnam       Good        2
Science   Chonnam       Excellent   2
Science   Kyungbuk      Excellent   1
Science   Kyunggi       Good        1

TABLE 10
Major     Birth place   GPA         Vote
Art       Chonnam       Good        2
Science   Chonnam       Excellent   2
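The substitute-and-merge loop above can be sketched in a few lines of Python (an illustration only: the parent maps below flatten the ISA hierarchies of Fig. 14, and a single substitution pass is shown, mirroring Tables 7-9).

from collections import Counter

# Direct generalizations taken from the ISA hierarchies of Fig. 14
# (key attributes such as Name are assumed to have been removed already).
MAJOR = {"Music": "Art", "Painting": "Art",
         "Physics": "Science", "Math": "Science",
         "Computing": "Science", "Statistics": "Science"}
BIRTH = {"Kwangju": "Chonnam", "Soonchun": "Chonnam", "Mokpo": "Chonnam",
         "Yeosu": "Chonnam", "Taegu": "Kyungbuk", "Suwon": "Kyunggi"}
def gpa_level(gpa):
    return "Excellent" if gpa > 3.5 else "Good" if gpa > 3.0 else "Bad"

records = [("Music", "Kwangju", 3.4), ("Physics", "Soonchun", 3.9),
           ("Math", "Mokpo", 3.7), ("Painting", "Yeosu", 3.4),
           ("Computing", "Taegu", 3.8), ("Statistics", "Suwon", 3.2)]

# One attribute-oriented substitution pass followed by merging equal records;
# the Counter value plays the role of the "Vote" column.
generalized = Counter((MAJOR[m], BIRTH[b], gpa_level(g)) for m, b, g in records)
threshold = 2
for row, vote in generalized.items():
    if vote >= threshold:
        print(row, "Vote =", vote)   # prints the two rows kept in Table 10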
TABLE 11  An Example Database for Classification
A1   A2   A3   Class
a    d    k    1
a    e    r    2
b    f    m    3
b    g    o    3
the class label. From the table, a decision tree can be induced as in Fig. 15.
Each node represented as a rectangle is called a decision node. The terminal
nodes are called result nodes. The classification process begins from the root
node. According to the decision in the decision node, the corresponding edge is
followed to the next decision node. When the edge traversal reaches a terminal
node, the classification result is obtained. For example, suppose that we have
a new record (a, e, k); the decision tree classifies it into class "2."
Note that, given a database, there can be several decision trees. Thus, the issue of decision tree induction is how to construct an effective and efficient decision tree. Effectiveness means how correctly the decision tree classifies new records, and efficiency means how small the representation cost of the decision tree is. There are a variety of algorithms for constructing decision trees. Two
of the most popular go by the acronyms CART and CHAID, which stand for
"classification and regression trees" and "chi-squared automatic interaction
detection," respectively. A newer algorithm C4.5 is gaining popularity and is
now available in several software packages.
CART constructs a binary tree by splitting the records at each node according to a function of a single input field. The first task, therefore, is to decide
which of the descriptive attributes makes the best splitter. The measure used to
evaluate a potential splitter is "diversity." There are several ways of calculating
the index of diversity for a set of records. With all of them, a high index of
diversity indicates that the set contains an even distribution of classes, while a
low index means that members of a single class predominate. The best splitter
is the one that decreases the diversity of the record sets by the greatest amount.
FIGURE 15 A decision tree induced from Table 11.
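The diversity-driven choice of a splitter can be illustrated in a few lines of Python. The sketch below is not CART itself: it simply uses the Gini index as the diversity measure and picks, for the records of Table 11, the attribute whose single split reduces diversity the most.

from collections import Counter

def gini(labels):
    """Gini diversity index: high for an even mix of classes, 0 for a pure set."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_diversity(records, labels, attr):
    """Weighted diversity of the groups obtained by splitting on one attribute."""
    groups = {}
    for rec, lab in zip(records, labels):
        groups.setdefault(rec[attr], []).append(lab)
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in groups.values())

# Records and class labels from Table 11.
records = [("a", "d", "k"), ("a", "e", "r"), ("b", "f", "m"), ("b", "g", "o")]
labels = ["1", "2", "3", "3"]

base = gini(labels)
best = min(range(3), key=lambda a: split_diversity(records, labels, a))
print("best splitter: A%d" % (best + 1),
      "reduction:", round(base - split_diversity(records, labels, best), 3))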
FIGURE 16 (inputs 1-3 weighted by w1, ..., w3 and combined into an output)
FIGURE 17 Several types of artificial neural networks: (a) a simple network, (b) a network with a hidden layer, and (c) a network with multiple outputs.
An artificial neural network commonly has three layers: an input layer, an output layer, and a hidden layer. The first type (a) is a very simple neural network with only input and output layers. The second type (b) also has a hidden layer; the hidden layer makes the network more powerful by enabling it to recognize more patterns. The third type (c) can produce multiple output values.
Classification with artificial neural networks consists of the following steps.
Firstly, we must identify the input and output features of the records: the descriptive attributes become input features and the class label becomes the output feature, and we must encode them so that their values range between 0 and 1. In
the second step, we must set up a network with an appropriate topology. For
example, the number of neurons in the hidden layer must be determined. In
the third step, it is required to train the network on a representative set of
training examples. In the fourth step, it is common to test the performance of
the network with a set of test data. Note that the test data must be strictly
independent from the previous training data. If the result of the test is not
acceptable, we must adjust the network topology and parameters. Lastly, we
apply the network to classify new records.
The training step is one of the most important parts of neural network-based classification; it is the process of setting the best weights on the inputs of each neuron. So far the most common technique is back-propagation [26]. In back-propagation, after the network calculates the output using the existing weights, it calculates the error, i.e., the difference between the calculated result and the expected one. The error is fed back through the network, and the weights are adjusted to minimize the error.
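The "calculate the output, feed the error back, adjust the weights" cycle can be sketched with a single sigmoid neuron trained by gradient descent (a simplification, not full multi-layer back-propagation; the OR data set below stands in for a real, encoded record set).

import math, random

def train(samples, epochs=2000, rate=0.5, seed=0):
    """Adjust the weights of one sigmoid neuron to minimize the squared error."""
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in range(len(samples[0][0]))]
    b = rng.uniform(-1, 1)
    for _ in range(epochs):
        for inputs, target in samples:
            out = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, inputs)) + b)))
            err = out - target                 # difference from the expected output
            grad = err * out * (1.0 - out)     # gradient through the sigmoid
            w = [wi - rate * grad * xi for wi, xi in zip(w, inputs)]
            b -= rate * grad
    return w, b

# Encoded training examples (features and class label scaled into [0, 1]).
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train(data)
for x, t in data:
    out = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
    print(x, t, round(out, 2))   # after training, outputs approach the targets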
Compared to decision tree-based classification, artificial neural networks are reported to be much more powerful, especially in complicated domains. They
TABLE 12
Transaction   Events
101           A, C, D
102           B, C, E
103           A, B, C, E
104           B, E
Figure 18 depicts the first task on the database in Table 12 step by step. Suppose that the given threshold value for the support degree is 40%. Firstly, by scanning the whole database, we produce event sets, each of which contains a single event; the result becomes C1. The attribute labeled "Support" indicates how many transactions in the original database contain the corresponding event set. Since the number of transactions in the database is four and the threshold value is 40%, an event set should have a "Support" value of more than 1. Thus, the fourth event set is removed, as in L1. By combining event sets in L1, we produce event sets, each of which contains exactly two events; the result becomes C2. We count how many transactions in the database contain each event set by scanning the whole database again. Only the event sets with a "Support" value of more than 1 are preserved, as in L2. Again, we produce event sets, each of which contains exactly three events, by combining event sets in L2; this becomes C3. The process is repeated until no more operations are possible. In this example, we obtain the frequent event sets as L1 ∪ L2 ∪ L3.

C1: {A} 2, {B} 3, {C} 3, {D} 1, {E} 3
L1: {A} 2, {B} 3, {C} 3, {E} 3
C2: {A,B} 1, {A,C} 2, {A,E} 1, {B,C} 2, {B,E} 3, {C,E} 2
L2: {A,C} 2, {B,C} 2, {B,E} 3, {C,E} 2
C3: {B,C,E} 2
L3: {B,C,E} 2
FIGURE 18 Candidate event sets (C1-C3) and frequent event sets (L1-L3), each with its Support, for the database in Table 12.
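A compact Python rendition of the frequent-set phase and of the confidence computation may help; it follows the C_k/L_k walkthrough above but is only a sketch, not the optimized algorithms of the cited papers.

from itertools import combinations

def frequent_event_sets(transactions, min_support):
    """Level-wise generation of frequent event sets (the C_k/L_k tables above)."""
    support = {}
    items = sorted({e for t in transactions for e in t})
    candidates = [frozenset([e]) for e in items]
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        support.update(frequent)
        k += 1
        # Combine frequent (k-1)-sets into k-set candidates whose subsets are all frequent.
        merged = {a | b for a in frequent for b in frequent if len(a | b) == k}
        candidates = [c for c in merged
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))]
    return support

transactions = [frozenset("ACD"), frozenset("BCE"), frozenset("ABCE"), frozenset("BE")]
support = frequent_event_sets(transactions, min_support=2)   # 40% of 4 transactions

# Confidence of {B, E} => {C}: support({B, C, E}) / support({B, E}).
conf = support[frozenset("BCE")] / support[frozenset("BE")]
print(sorted(("".join(sorted(s)), n) for s, n in support.items()))
print(round(conf, 2))   # 0.67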
Let us examine the second task, i.e., eliciting association rules with high confidence, in an instance. Since the "Support" values of the event sets {B, C, E} and {B, E} are 2 and 3 respectively, the confidence of the association rule {B, E} => {C} becomes 67%. This means that 67% of the transactions containing B and E also contain the event C.
Several variations of the Apriori algorithm have tried to reduce the execution time by exploiting hashing, database partitioning, and so on [16, 30]. In addition, there are variations for discovering multilevel association rules. For example, it may be unnecessarily detailed to have an association rule such as "if a customer purchases a 160-ml Jones Milk (Product ID: 78-2456), then she is likely to purchase a 200-g Hayes Bread (Product ID: 79-4567)." Instead, it may be more useful just to know that milk customers are apt to also be bread customers. There have been several research efforts to devise effective algorithms to discover such generalized rules as well as detailed ones [28-30].
FIGURE 19 A data warehousing architecture: operational data sources (relational, object-oriented, and legacy databases and file systems) are integrated by the data warehouse builder/manager into the data warehouse, which is described by Metadata.
warehousing system. Since operational databases have been built for their own operational purposes, it is rare that their structures fit together for enterprise-wide decision making. Furthermore, their schemas are quite different from each other, and some information is apt to be duplicated. A data warehouse builder must rearrange and reconcile all the heterogeneity in the disparate data sources. When the content of a data source changes, the change should be propagated to the data warehouse properly; this is the responsibility of the data warehouse manager. Since the Metadata represent the global schema information of the data warehouse, all queries on the data warehouse must refer to the Metadata. Additionally, some systems adopt data marts, which are small-scale data warehouses for specific portions of the decision making.
As indicated in Fig. 19, data mining techniques can show their full strength in data warehousing systems, since the data warehouse already contains proper data sources for valuable data mining. Although online analytical processing (OLAP) currently provides the querying interface for data warehouses, its application coverage is limited. Thus, there have been active efforts to integrate data mining techniques and data warehousing systems.
REFERENCES
1. Frawley, W., Piatetsky-Shapiro, G., and Matheus, C. Knowledge discovery in databases: An
overview. In Knowledge Discovery in Databases, pp. 1-27. AAAI Press, Menlo Park, CA,
1991.
2. Silberschatz, A., Stonebraker, M., and Ullman, J. Database systems: Achievements and opportunities. The Lagunita Report of the NSF Invitational Workshop, TR-90-22, Department of Computer Science, University of Texas at Austin, 1990.
3. Cohen, P., and Feigenbaum, E. The Handbook of Artificial Intelligence, Vol. 3, pp. 411-415.
Kaufmann, Los Altos, CA, 1982.
4. Berry, M. J. A., and Linoff, G. Data Mining Techniques: For Marketing, Sales, and Customer
Support. Wiley, New York, 1997.
5. Cabena, P., Hadjinian, P., Stadler, R., Verhees, J., and Zanasi, A. Discovering Data Mining:
From Concept to Implementation. Prentice-Hall PTR, Englewood Cliffs, NJ, 1998.
6. Lee, D. H., and Kim, M. H. Discovering database summaries through refinements of fuzzy hypotheses. In Proc. 10th Int'l Conf. on Data Engineering, pp. 223-230, 1994.
OBJECT-ORIENTED DATABASE
SYSTEMS
HIROSHI ISHIKAWA
Department of Electronics and Information Engineering, Tokyo Metropolitan University, Tokyo 192-0397, Japan
I. INTRODUCTION 77
II. FUNCTIONALITY 78
A. Data Model 78
B. Database Programming Language 81
III. IMPLEMENTATION 87
A. Data Management Subsystem 88
B. Object Management Subsystem 94
IV. APPLICATIONS 103
A. Introduction 103
B. System Architecture 105
C. Our Approach to Implementation 109
D. Conclusion 118
V. CONCLUSION 119
REFERENCES 120
I. INTRODUCTION
Initially, complex, large-scale database applications such as CAD [15], hypermedia [14], and AI [11, 12] spawned object-oriented databases (OODBs).
Moreover, OODBs have gone beyond them toward advanced applications such
as networked multimedia applications [24]. In this paper, we describe how we
designed and implemented an object-oriented DBMS called Jasmine [13,15,16,
18, 23]. (Note that Jasmine throughout this paper is not a product name but
a prototype code name of Fujitsu Laboratories Ltd.) We also discuss how we
applied Jasmine to advanced multimedia applications to verify its basic validity
and how we extended Jasmine for such applications.
This paper has the following contributions. First, we focus on the impact
of the design of its object-oriented model and language on database implementation technology. We describe what part of traditional relational database
technology we extend to handle object-oriented features such as object identifiers, complex objects, class hierarchies, and methods. We introduce nested
relations to efficiently store and access clustered complex objects. We use
Specialized functions called demons can be attached to attributes. Constraint demons are checked before values are inserted into attributes; the values are inserted only if the demon returns true. If-needed, if-added, if-removed, and if-updated demons are invoked when values are referenced, inserted, deleted, and replaced, respectively. Before and after demons are invoked before and after the procedural attributes they are attached to are invoked. The user can combine these demons to flexibly implement active databases [19, 33]. Unlike other systems, Jasmine allows the user both to specify system-defined integrity constraints, such as mandatory and multiple, and to specify user-defined integrity constraints as demons.
Consider the class PATIENT as an example (see Fig. 1). The keyword Enumerated is followed by the definition of user-supplied enumerated attributes.
The name facet such as Doctor denotes the name of an attribute. The class
facet before the attribute name denotes the range class such as FLOAT of
Height. The value of the attribute of an instance must be an instance of the
range class (see Fig. 2). The domain of the attribute is the class being defined,
PATIENT. The multiple facet denotes that the attribute is multiple-valued such
PATIENT
Db MEDICAL
Super PERSON
Enumerated
DOCTOR Doctor mandatory
STRING Category default outpatient
INTEGER Cardinality common
FLOAT Temperature multiple
constraint {(value > 34.0 && value <43.0)}
FLOAT Weight mandatory constraint {(value > 0.0)}
FLOAT Height constraint {(value > 0.0)}
If-needed
{ int h;
  h = self.Weight;
  return h + 100.0; }
Procedural
STRING date;
{ MEDICAL_CERTIFICATE mc;
mc = <MEDICAL_CERTIFICATE>.instantiate ();
mc.Patientname = self.Name;
mc.Doctorname = self.Doctor.Name;
mc.Diseasename = self.Disease.name;
mc.Date = date;
return mc;}
FIGURE 1 Example of class definition.
FIGURE 2 Example of an instance (MedicalPatient007): Sex male, Age 36, Name James Bond, Address Tokyo, Doctor MedicalDoctor010, Category inpatient, Temperature, Weight 76.0, Height 176.0.
as Temperature. The mandatory facet denotes that the attribute allows no null value, such as Doctor and Weight; the value of a mandatory attribute must be specified at instantiation. The common facet denotes that the attribute value is common to all the instances of the domain class; a common attribute is not necessarily a constant, such as Cardinality. The default facet contains a default value referenced when the attribute value is not yet specified, such as Category of PATIENT. The if-needed demon, invoked if the referenced attribute has a null value, computes a value, such as Height of PATIENT. The keyword
Procedural is followed by the definition of procedural attributes. Procedural attributes such as make-medical-certificate also have facets. The class facet such
as MEDICAL_CERTIFICATE denotes the range class of the procedural result.
A superclass in a class hierarchy is denoted by the system-defined attribute
Super. The superclass, for example, PERSON, includes its subclasses, PATIENT,
as a set. The attributes of the superclass are inherited to the subclass, such as Age
of PATIENT. An attribute can be newly defined in the subclass such as Doctor
of PATIENT. Intrinsic instances of a nonleaf class can represent incomplete
knowledge of the domain. For example, PERSON intrinsic instances directly
denote a set of persons known to be neither a patient nor a doctor.
A class can be divided into disjoint subclasses (see Fig. 3); those subclasses are collectively called a partition, and each subclass is called a member of the partition.
FIGURE 3
as put and delete specified in a query can modify a set of objects. A query can invoke demons, which implement integrity facilities introduced by QBE [38]. The user can specify multiple-valued attributes in a query, and can control the unnesting of multiple values and apply aggregate functions correctly. Multiple-valued attributes are existentially or universally quantified.
The integration of query and programming facilities is another important
feature for advanced applications. First, the user can specify methods in a query
as described above. The user can extend the functionality of the query language
just by defining and specifying a method in a query, without modifying the
query language processor. The user can develop application programs more
compactly without specifying details such as iteration variable declaration and
control structures. Making this type of iteration implicit can increase physical data independence [3] of application programs by allowing the system to
optimize the query expression. Second, the user can also define methods by
specifying a query for them. This can define so-called virtual attributes and increase logical data independence [3] of application programs when applications
evolve. Third, the fact that the user invokes a query from programs is one of
the salient aspects of advanced applications. We introduce set variables to solve
the impedance mismatch problem [3] between the query and programming languages. The set variable has a class defined by an object model as its type and
can contain a set of objects returned by a query as its value. The user can fetch
an object by sending the scan message to the set variable and operate on the
object by sending a message to the object in an object-oriented programming
manner.
Class objects can also be operated set-theoretically for advanced applications. Basic database functions such as transactions, locking, and logging can
be provided through system-defined classes. Multimedia data types and operations are provided by implementing them from system-defined primitive classes
in a bootstrap manner.
Now we describe the syntax and semantics of a query through examples.
The query has the syntax
"["object_expression(s)"]" [where condition] [groupby object_expression(s)],
where the object expression has the form class_name ["." attribute_name ["."
attribute-name]...].
The query expression evaluates to a set of the target objects satisfying the
condition. The elements of the constructed set are objects (OIDs), or values
belonging to the database, or newly constructed tuple values. The result type
is determined by the query. For example, to find the name and address of
outpatients, the user forms a query as
(Query 1) [PATIENT.Name, PATIENT.Address]
where PATIENT.Category == "outpatient".
The tuple operator "[ ] " allows the construction of tuple values, corresponding either to the projection or to the join of relations. Like this example,
the query corresponds to projection only if the target list has the form
[common_object_expression.attribute1, common_object_expression.attribute2, ...].
Immediate objects are compared by ==, !=, >, >=, <, and <=, based on values.
In general, joins are categorized into implicit and explicit joins. Jasmine
supports implicit joins as
(Query 2) PATIENT.Doctor.Name where PATIENT.Name == "James Bond".
This finds the name of doctors who are in charge of James Bond. The
operator "[ ] " can be omitted only if the target list contains only one object
expression. Assuming that C is a class, Ai is an attribute, and Oi is an object, an implicit join denoted by an object expression C.A1. ... .An has the following semantics:
{ On | O0 belongs to C and, for all i = 1, ..., n, either of the following holds:
(1) Oi is equal to Ai of Oi−1 if Ai is single-valued;
(2) Oi belongs to Ai of Oi−1 if Ai is multiple-valued }.
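The set semantics above is easy to mimic outside Jasmine. In the Python sketch below (the toy objects and names are invented for illustration), an attribute path is walked from each selected instance and multiple-valued attributes are flattened, which is essentially the evaluation of an implicit join such as Query 2.

# Toy objects: plain dictionaries standing in for instances; attribute values
# are either a single object/value or a list (a multiple-valued attribute).
dr_no = {"Name": "Dr. No"}
bond = {"Name": "James Bond", "Doctor": dr_no}
moneypenny = {"Name": "Moneypenny", "Doctor": [dr_no]}   # multiple-valued example
PATIENT = [bond, moneypenny]

def implicit_join(instances, path):
    """Evaluate class.A1.....An: follow each attribute in turn, flattening
    multiple-valued attributes, and return the reached objects/values."""
    current = list(instances)
    for attr in path:
        step = []
        for obj in current:
            value = obj.get(attr)
            if value is None:
                continue
            step.extend(value if isinstance(value, list) else [value])
        current = step
    return current

# PATIENT.Doctor.Name where PATIENT.Name == "James Bond"  (cf. Query 2)
selected = [p for p in PATIENT if p["Name"] == "James Bond"]
print(implicit_join(selected, ["Doctor", "Name"]))   # ['Dr. No']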
Nested sets generated by multiple-valued attributes are automatically flattened
unless the user prohibits that. Jasmine can also support explicit joins as
(Query 3) [PATIENT.Name, DOCTOR.Name] where
PATIENT.Age == DOCTOR.Age.
This retrieves pairs of names of patients and doctors who happen to be of
the same age. "[ ] " in this case corresponds to join. Reference objects can also
be compared by = = and != based on OIDs. For example, assume Disease and
Specialty are reference attributes (see Fig. 4):
(Query 4) [PATIENT.Name, DOCTOR.Name] where
PATIENT.Disease == DOCTOR.Specialty.
This query finds the names of patients and doctors who specialize in their
disease.
The object expression with multiple-valued attributes evaluates to a set
of sets. However, multiple-valued attributes are automatically unnested unless
the user specifies the prohibition of unnesting by a special operator described
later. Therefore, the following query retrieves a flattened set of temperatures of
serious patients:
(Query 5) PATIENT.Temperature where PATIENT.Condition == "serious".
A condition on multiple-valued attributes is interpreted as at least one value
satisfying the condition, that is, existentially. Universally quantified multiple
FIGURE 4 (Dept, Doctor)
attributes can also be specified, as described later. The following retrieves the names of patients who ran a temperature of higher than 37.5°C at least once:
(Query 6) PATIENT.Name where PATIENT.Temperature > 37.5.
Any class, leaf or nonleaf, in a generalization lattice can be specified in
a set-oriented query. According to the interpretation of a class, the intrinsic
instances of a nonleaf class and the instances of its subclasses can be retrieved
at the same time. For example, to find persons who live in Kyoto:
(Query 7) PERSON where PERSON.Address == "Kyoto".
This allows a query to be specified compactly, because several queries against subclasses such as PATIENT and DOCTOR can be formulated in a single query.
Objects can be retrieved without precise specification since a general class
can be specified in a query together with an attribute defined in its subclasses.
In general, assuming that C and C' are classes and A is an attribute, C is systematically translated into C' in a query only if the following set is not empty: {C' | C' is a subclass of C and A is defined or inherited by C'}. The original query usually generates multiple queries. Note that Query 7 is a special case where A (e.g., Address) is defined or inherited by all classes in a class hierarchy with C (e.g., PERSON) as its top. For example, to find the names of persons whose disease is a cancer:
(Query 8) PERSON.Name where PERSON.Disease.Name == "cancer"
The class PERSON is automatically specialized to the subclass PATIENT with the attribute Disease defined. In the extreme case, OBJECT can be used in a query. This mechanism fits with how we defined some concepts by differentiating a general concept through specializing attributes. The user can thus formulate a query without knowing the specific subclass, much as in a natural language query.
A condition can be imposed on the categorization attribute of a general class
with a partition. If the specified condition matches some of the categorization
conditions of the partition, the specified class can be specialized to some of
the partition members. In general, assuming that C and C' are classes, C is systematically translated into C' only if the following set is not empty: {C' | C' is a subclass of C and the condition of the query and the categorization condition of C' are not exclusive}. For example, to find infants (i.e., younger than seven):
(Query 9) PERSON where PERSON.Age < 7
The class PERSON is automatically specialized to CHILD with its categorization condition Age < 18.
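As a rough illustration of both translation rules, the Python sketch below (not Jasmine internals; the schema dictionaries and function names are hypothetical) selects the subclasses into which a query on a general class could be translated, first by attribute availability and then by non-exclusive categorization conditions.

# Hypothetical schema: each class lists its own or inherited attributes and,
# optionally, a categorization predicate over an attribute value.
SCHEMA = {
    "PERSON":  {"attrs": {"Name", "Address", "Age"}, "subclasses": {"PATIENT", "DOCTOR", "CHILD"}},
    "PATIENT": {"attrs": {"Name", "Address", "Age", "Disease"}, "subclasses": set()},
    "DOCTOR":  {"attrs": {"Name", "Address", "Age", "Specialty"}, "subclasses": set()},
    "CHILD":   {"attrs": {"Name", "Address", "Age"}, "subclasses": set(),
                "category": lambda age: age < 18},
}

def targets_by_attribute(cls, attr):
    """Subclasses of cls (including cls) that define or inherit attr."""
    candidates = {cls} | SCHEMA[cls]["subclasses"]
    return {c for c in candidates if attr in SCHEMA[c]["attrs"]}

def targets_by_category(cls, query_pred, sample_values):
    """Subclasses whose categorization condition is not exclusive with the
    query condition, tested here crudely on a few sample values."""
    result = set()
    for c in SCHEMA[cls]["subclasses"]:
        cat = SCHEMA[c].get("category")
        if cat is not None and any(cat(v) and query_pred(v) for v in sample_values):
            result.add(c)
    return result

print(targets_by_attribute("PERSON", "Disease"))                      # {'PATIENT'}
print(targets_by_category("PERSON", lambda a: a < 7, range(0, 100)))  # {'CHILD'}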
The user can do operations other than retrieval set-theoretically by using
procedural attributes, which can be specified in any part of an object expression
of a query. The additional parameters of the procedural attribute are given in
parentheses. In general, the object expression has the following form: receiver.method(parameter, parameter, . . . ). A receiver is an object expression and a parameter is an object expression, an object, or a value. The result is also an object, a value, or a set of objects or values. For example, to make and print a copy of serious patients' medical certificates dated February 11, 1999, the user
formulates the query
constitute the object variable. The object variable integrates set-oriented access
of a database system and singleton access of a programming language. The
existence of a multiple option at declaration specifies that the object variable is
a set variable. For example,
PATIENT ps multiple, p;
ps and p are declared as a set variable and an instance variable of PATIENT type, respectively.
In general, the set variable is set to the result set of a set-oriented query at the
right-hand side of a statement. The user can access objects individually by using
the system-defined procedural attributes as
ps = PATIENT where PATIENT.Disease == "cancer";
ps.openscan();
while (p = ps.next())
  . . . ;
ps.closescan();
The procedural attribute next returns an object at each invocation, which
is set to the instance variable p for further use.
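The openscan/next/closescan protocol can be pictured as an ordinary cursor. The following Python sketch of a set variable is only an analogy for how a compiled program might walk a result set; the class and method names are invented for illustration and are not Jasmine's run-time interface.

class SetVariable:
    """Cursor-style access over the result set of a set-oriented query."""
    def __init__(self, result_set):
        self._result = list(result_set)
        self._pos = None

    def openscan(self):
        self._pos = 0

    def next(self):
        if self._pos is None or self._pos >= len(self._result):
            return None                       # end of scan
        obj = self._result[self._pos]
        self._pos += 1
        return obj

    def closescan(self):
        self._pos = None

ps = SetVariable([{"Name": "James Bond", "Disease": "cancer"}])
ps.openscan()
while (p := ps.next()) is not None:
    print(p["Name"])                          # operate on each instance in turn
ps.closescan()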
Procedural attributes can include set-oriented queries. The following attribute of the class DEPARTMENT defines interns who work in a department:
Procedural DOCTOR intern() multiple
{ self.Doctor where self.Doctor.Status == "internship" }.
This can be specified in a query to retrieve interns in the pediatrics department as
DEPARTMENT.intern() where DEPARTMENT.Name == "pediatrics".
We do not provide a special syntax for nesting queries. Instead, nested
queries can be simulated by procedural attributes defined by queries like the
above example. Correlated queries can be formulated explicitly by passing object expressions as parameters to the procedural attributes or implicitly through
the system-defined variable self.
III. IMPLEMENTATION
Relational databases have already accumulated a large amount of implementation technology. We do not think it wise to throw that away and build object-oriented databases from scratch. Relational technology provides basically applicable techniques such as storage structures, access methods, query
optimization, transaction and buffer management, and concurrency control.
Therefore, we take a layered architecture consisting of object management
and data management and use relational technology as data management (see
Fig. 5). However, traditional relational technology has limitations in efficient
support for object-oriented concepts including object identifiers, complex objects, class hierarchies, and methods. We extend relational technology to overcome such limitations. In addition to flat relations, we incorporate nested relations to efficiently store and access clustered complex objects. We support
FIGURE 5 System architecture.
both hash and B-tree indexes to efficiently access objects through object identifiers. In addition to a nested-loop join and a sort-merge join, we provide a
hash join to efficiently process nonclustered complex objects in queries. We
extend query optimization to process object-oriented queries including class
hierarchies and method invocation. Note that such optimization is done not by
the data management subsystem but by the object management subsystem. We
provide user-defined manipulation and predicate functions directly evaluated
on page buffers. Methods are compiled into them and efficiently processed. We
devise object buffering in addition to page buffering and integrate these two schemes to evaluate queries. In a word, our approach is to provide an object-oriented model and language interface to an extensible database kernel [37],
such as GENESIS [1] and EXODUS [2].
Of course, there are alternatives to our extended relational approach to
object-oriented database implementation. A pure relational approach such as
Iris [31] has drawbacks as described above. Another approach, taken by O2 [5], builds on WiSS (the Wisconsin Storage System), which provides record-based, single-relation operators. This makes it difficult to focus on query optimizations
based on set-oriented relational operators. In the extreme case, monolithic architectures could be considered in contrast to our layered approach. This would
be less flexible to further tuning and extension. In this section, we will explain
the function and implementation of the data management subsystem, and storage of objects and implementation of the object manipulation language.
A. Data Management Subsystem
1. Data Structures
Advanced applications of OODBs require a variety of indexes such as
hash and B-tree indexes, clustered and nonclustered indexes, and extended
data dictionaries. Such indexes and data dictionaries are usually implemented
as special data structures in relational database systems because of access efficiency. The conventional approach using special data structures makes the
system less compact and less flexible to future extension. So the data management subsystem as a database kernel supports only relations (sequential, B-tree,
hash, and inner relations) to allow the user of this subsystem to customize data
dictionaries and indexes by using relations.
Only fixed-length and variable-length data are supported as field types of
tuples by the data management subsystem. The data management subsystem
makes no interpretation of field values except for TIDs and inner relations. Any
type of data can be stored such as an array, a list, and a relation. Inner relations
can be implemented as variable-length fields. Inner relations can have other
inner relations as field values, so nested relations can be recursively defined. The
length of a tuple must be less than the page size for efficient access and simple
implementation. The length and number of fields in a tuple are subject to this
limit.
The data management subsystem supports four types of relations as
follows:
1. Sequential relations have pages that are sequentially linked. Tuples are
stored in the order of insertion. The location of inserted tuples is fixed, so an
index can be created on sequential relations.
2. B-tree relations have B-tree structures. Tuples are stored in the leaf pages
in the order specified by user-defined order functions. This allows new access
methods to be assimilated by supplying dedicated comparison and range functions. B-tree relations consist of key fields and nonkey fields. B-tree relations
used as an index on sequential relations consist of several key fields and one TID
field. This corresponds to a nonclustered index. B-tree relations that contain
the whole data can be viewed as relations with a clustered index.
3. Hash relations use a dynamic hashing scheme called linear hashing with
partial expansion [30], an extension to linear hashing. We choose this scheme
because the space required to store data is proportional to the amount of data
and the space utilization ratio is adjustable and high. Hash relations also consist
of key fields and nonkey fields. The hash function is supplied by the user.
4. Inner relations for realizing nested relations are stored in variable-length
fields of tuples. Tuples of inner relations are sequentially inserted. Nested relations can be recursively implemented by storing inner relations into fields of
another inner relation. We provide nest and unnest operations for nested relations in addition to retrieval, insertion, deletion, and update. Retrieved inner
relations can be operated as sequential relations. Update of inner relations can
be done by retrieving inner relations, updating them as sequential relations,
and replacing old ones by new ones. We provide the functions interpreting the
variable-length fields according to the nested relation schemes to operate on
inner relations. Note that a theoretical basis for the nested relational model
was provided by Kitagawa and Kunii [27].
Tuple structures are uniform independently of relation types (see Fig. 6).
The first two bytes of a tuple contain the tuple length. The tuple consists
of fixed and variable parts. Fixed-length fields are stored in the fixed part.
FIGURE 6 Tuple structure: tuple length, fixed part (with offsets to the variable-length fields), and variable part.
Variable-length fields are stored in the variable part. The offsets of the variable-length fields from the top of the tuple are stored in the fixed part. Any data
can be accessed in a constant time, although this tuple structure does not allow
null-value compression. Modification of the variable-length data can be done
without affecting the fixed-length data.
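One way to picture this layout is the byte-packing sketch below; the exact field order, widths, and helper name are assumptions for illustration, not the actual Jasmine tuple format.

import struct

def pack_tuple(fixed_fields, var_fields):
    """Pack fixed-length int fields plus variable-length byte fields.

    Illustrative layout: 2-byte tuple length, fixed int fields,
    2-byte offsets to each variable field, then the variable part.
    """
    fixed = b"".join(struct.pack("<i", f) for f in fixed_fields)
    header_len = 2 + len(fixed) + 2 * len(var_fields)
    offsets, var_part, pos = [], b"", header_len
    for v in var_fields:
        offsets.append(pos)
        var_part += v
        pos += len(v)
    body = fixed + b"".join(struct.pack("<H", o) for o in offsets) + var_part
    return struct.pack("<H", 2 + len(body)) + body

t = pack_tuple([36], [b"James Bond", b"Tokyo"])
# Any field can be reached in constant time: read its offset, then slice.
name_off = struct.unpack_from("<H", t, 2 + 4)[0]
city_off = struct.unpack_from("<H", t, 2 + 4 + 2)[0]
print(t[name_off:city_off])   # b'James Bond'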
TIDs, which can be stored in fixed-length fields, act as pointers to tuples.
A variety of data structures can be implemented by using TIDs. For example,
a nonclustered index can be implemented by defining an index key field and a
TID field in B-tree or hash relations (see Fig. 7).
Access to fields must be efficiently processed since it is a frequent operation.
We provide pointer arrays for field access (see Fig. 8). Each pointer points to
the corresponding field in a tuple on page buffers. Simple tuple structures allow
efficient construction of pointer arrays. One alternative is to copy field values
to different areas. The alternative is good for data protection, but is rather
time-consuming. Field pointer arrays are passed to user-defined functions such
as manipulation and predicate functions for field access.
To efficiently access data, we move as few data as possible and fix tuples in
buffers if possible. Internal sorting uses pointer arrays for tuples to be sorted
(see Fig. 9). Such pointers are moved instead of tuples. Similarly, when a hash
table is created for internal hashing, pointers to tuples are linked instead of
tuples.
FIGURE 7 A nonclustered index: an index relation (Name, TID) over the PATIENT relation (Name, Age).
FIGURE 8 Precomputed join.
2. Hash-Based Processing
Set operations such as set-difference and duplicate elimination require OID-based access. Object-oriented queries usually involve equi-joins based on OIDs. If either of the two joined relations can be loaded into main memory, we can use the hash join method [36]. Even if neither of them can be loaded into main memory, the hash join method generally requires less CPU time and fewer I/O operations than the sort-based method. We adopted the hash-based method for equi-joins and set
operations. Unlike Jasmine, other object-oriented systems such as ORION use
nested-loop and sort-merge joins.
The internal hash join is used when either of two input relations for joins
can be loaded into main memory. Recursion is not used in the internal hash
join. Only one relation is partitioned into subrelations. The other relation is
only scanned tuple by tuple. It is not necessary to load both of the relations
entirely. We describe the outline of the algorithm. Only main memory is used
during processing.
(1) Determine which input relation is to be partitioned. Let the
partitioned input relation be A.
(2) Determine a partition number p and a hash function h.
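Only the first two steps of the outline survive in this excerpt. As a rough illustration of the overall shape of such an in-memory hash join (a sketch, not the actual Jasmine code), consider the following, which builds the hash table on the smaller input and scans the other tuple by tuple:

def internal_hash_join(rel_a, rel_b, key_a, key_b):
    """In-memory equi-join: build a hash table on the smaller relation,
    then scan the other relation and probe the table."""
    if len(rel_a) > len(rel_b):                       # step (1): choose the relation to partition
        rel_a, rel_b = rel_b, rel_a
        key_a, key_b = key_b, key_a
    table = {}                                        # step (2): here Python's hash plays the role of h
    for ta in rel_a:
        table.setdefault(ta[key_a], []).append(ta)    # partition A into hash buckets
    result = []
    for tb in rel_b:                                  # scan B once, probing the table
        for ta in table.get(tb[key_b], []):
            result.append({**ta, **tb})
    return result

patients = [{"Name": "James Bond", "Doctor": "OID7"}]
doctors  = [{"OID": "OID7", "DName": "Dr. No"}]
print(internal_hash_join(patients, doctors, "Doctor", "OID"))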
FIGURE 9 Pointer array for internal sorting.
(1) The selection operator performs the selection of relational algebra. It has three parameters rb, pb, and mb: rb is the data block that specifies the source relation, and pb and mb specify user-defined predicate and manipulation functions, respectively. (2) hjoin(rb1, rb2, mb, hb) performs an equi-join of the relations specified by rb1 and rb2. mb is performed on each pair of tuples that match on the join fields. This operation is based on a hash function specified by hb. (3) join(rb1, rb2, pb, mb) performs a general join of relations rb1 and rb2. (4) tjoin(rb1, rb2, TID, mb) joins each tuple of rb1 with the tuple of rb2 pointed to by its TID field and performs mb on such a pair of tuples. (5) sort(rb1, rb2, ob) sorts rb1 and stores the result into rb2. The order function is specified by ob. (6) unique(rb1, rb2, hb) eliminates duplicates of rb1 and stores the result into rb2. This operation is hash-based. (7) nest(rb1, rb2, fid, hb) generates a nested relation rb2 from a flat relation rb1 with fields specified by fid. This operation is also hash-based. (8) unnest(rb1, rb2, fid) generates a flat relation rb2 from a nested relation rb1.
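For concreteness, here is a small Python sketch of what a hash-based nest operation and its inverse unnest do to flat tuples; dictionaries stand in for tuples, and the field names are hypothetical.

from collections import defaultdict

def nest(relation, group_fields, nested_name):
    """Group tuples by group_fields and collect the remaining fields
    into an inner relation stored under nested_name (hash-based grouping)."""
    groups = defaultdict(list)
    for t in relation:
        key = tuple(t[f] for f in group_fields)
        inner = {k: v for k, v in t.items() if k not in group_fields}
        groups[key].append(inner)
    return [dict(zip(group_fields, key), **{nested_name: inner})
            for key, inner in groups.items()]

def unnest(relation, nested_name):
    """Flatten the inner relation stored under nested_name back into flat tuples."""
    flat = []
    for t in relation:
        outer = {k: v for k, v in t.items() if k != nested_name}
        for inner in t[nested_name]:
            flat.append({**outer, **inner})
    return flat

flat = [{"Name": "James Bond", "Temperature": 37.8},
        {"Name": "James Bond", "Temperature": 36.5}]
nested = nest(flat, ["Name"], "Temperatures")
print(nested)                                   # one tuple carrying an inner relation
print(unnest(nested, "Temperatures") == flat)   # True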
Functions of the tuple layer operate on four types of relations. The operators
are as follows: (1) Scan scans a relation sequentially and finds a tuple satisfying
the specified predicate. (2) Raster scans a relation sequentially fixing scanned
pages on buffers. It is used in internal sorting or making internal hash tables. (3)
Access directly accesses a tuple satisfying the specified predicate. (4) Fetch, (5) Delete, and (6) Update directly access, delete, and update a tuple specified by a given TID, respectively. (7) Insert inserts a tuple or a group of fields. (8) Clear
deletes all tuples. (9) Flac constructs a field pointer array for the specified fields.
The functions of the storage layer include disk I/O, page buffers, transactions, concurrency control, and recovery. Disk I/O management includes allocation and deallocation of subdatabases (segments) and pages. A database
consists of two types of subdatabases. One is a subdatabase that is permanent
and recoverable. The other is a subdatabase which is used as a workspace for
keeping temporary relations, and is only effective in a transaction. This is not
recoverable. Subdatabases are composed of a number of pages.
The storage layer supports variable-length pages sized 4i KB (i = 2, . . . , 8), each consisting of several 4 KB physical pages, which form a virtually contiguous
page on buffers. We use the buddy system for buffer space allocation. The page
length can be specified for each relation because multimedia data and inner
relations may exceed 4 KB.
We use concurrency control based on granularity and two-phase locking.
Deadlock detection is done by examining a cycle in the Wait-For-Graph. One of
the deadlocked transactions in the cycle in the graph is chosen as the victim for
rollback. ORION uses deadlock detection based on timeouts. Our transaction
recovery is based on shadow-paging for simplicity.
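As a minimal illustration of deadlock detection by cycle search in a wait-for graph (a sketch, not the actual Jasmine implementation), a depth-first search such as the following suffices:

def find_cycle(wait_for):
    """wait_for maps a transaction to the set of transactions it waits for.
    Returns one transaction on a cycle (a candidate victim) or None."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in wait_for}

    def dfs(t):
        color[t] = GRAY
        for u in wait_for.get(t, ()):
            if color.get(u, WHITE) == GRAY:       # back edge: cycle found
                return u
            if color.get(u, WHITE) == WHITE:
                found = dfs(u)
                if found is not None:
                    return found
        color[t] = BLACK
        return None

    for t in list(wait_for):
        if color[t] == WHITE:
            victim = dfs(t)
            if victim is not None:
                return victim
    return None

print(find_cycle({"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}}))   # e.g., 'T1'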
B. Object Management Subsystem
1. Object Storage
We efficiently store nested structures of objects by use of nested relations
supported by the data management subsystem unlike other systems. Storage
structures differ between instances and classes. Translation of objects to relations is
automatically done by the system. Information about the translation is held
by classes.
FIGURE 11 Architecture (relational layer, tuple layer, and storage layer).
FIGURE 12 Storage of instances and classes in nested relations (example: a PATIENT instance and the PATIENT class definition).
Since classes and attributes have a fixed set of facets, we store enumerated and procedural attributes in different inner relations and facets in the fields of the inner relations. Procedural attribute (method) definitions are also stored in relations and are retrieved and used during query optimization. The system-defined attributes such as Super are stored in separate fields (see also Fig. 12). Storing heterogeneous classes in one relation makes set-oriented access to them efficient.
2. Set-Oriented Access Support
We compile both set-oriented access and singleton access to do early binding and reduce run-time overhead. The Jasmine compiler is implemented using
one relation. Each scan returns an object by scanning the result relation and then selecting the base instance relation by the OID. At that time, if an object is already in memory, it is used.
Usually, if a selection predicate on a sequential relation can use an index, selection by index is chosen: the B-tree relation for the index is selected by the key condition, the result relation containing TIDs is sorted, and then the result relation is joined with the original sequential relation by using the TIDs. The rest
of the selection condition is evaluated at the same time. For B-tree and hash
relations, if a predicate concerns the key fields, key-based searching is done.
Note that if a whole relation of any type is small enough to be contained within
one page, sequential access is chosen.
If one of two relations being joined is small enough to be contained within
a page and the join key is indexed by the other relation, tuple substitution is
chosen. If one of two relations is contained within a page and no index is provided, nested loop is chosen. Otherwise, hash join is chosen. For B-tree and
hash relations, the join is similarly processed. In case of a join of several relations, the order of join is dynamically determined by the size of the intermediate
result relations. First, we choose the smallest relation and the second smallest
one among relations to be joined. Then we join them to obtain an expectedly
small relation as a result. We add the result to relations to be joined and repeat
this process. This dynamic scheme based on exact relation sizes is expected to
be more efficient than static schemes based on database statistics.
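The dynamic join-ordering heuristic described above can be sketched as a simple greedy loop over actual intermediate sizes; the following Python fragment is illustrative only, and join_fn is an assumed function returning the joined relation.

def greedy_join(relations, join_fn):
    """Repeatedly join the two smallest relations until one remains.

    relations: list of relations (e.g., lists of tuples).
    join_fn:   function taking two relations and returning their join.
    """
    rels = list(relations)
    while len(rels) > 1:
        rels.sort(key=len)                 # exact sizes, known at run time
        smallest, second = rels[0], rels[1]
        joined = join_fn(smallest, second)
        rels = [joined] + rels[2:]         # feed the result back into the pool
    return rels[0]

# Usage sketch: greedy_join([r1, r2, r3], my_hash_join)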
Next, consider the case where object expressions contain procedural attributes. User procedural attributes appearing in the target part are translated
into manipulation functions. Procedural attributes in conditions are translated
into predicate functions. For example, the above query graph generates the
relational operator sequence and the predicate and manipulation functions as
follows:
select(DOCTOR, predicate1, manipulate1);
select(CHILD-PATIENT, predicate2, manipulate2);
if (within-page(tmp1) || within-page(tmp2)) join(tmp1, tmp2, predicate3, manipulate3);
else hjoin(tmp1, tmp2, predicate3, manipulate3, hashfunc);
predicate1(flag, OID, name)
{ if (flag == MAIN)
  { if (name == "Dr. No") return true
    else return false } }
manipulate1(flag, OID)
{ if (flag == PRE) openinsert(tmp1);
  else if (flag == MAIN) insert(tmp1, OID);
  else if (flag == POST) closeinsert(tmp1); }
predicate2(flag, OID, condition, age)
{ if (flag == MAIN)
  { if (age < 18) return true
    else return false } }
are processed from left to right in the object expression. Selection predicates, if any, are evaluated during join processing. Note that there are methods for
precomputing joins. For example, to process the query (DOCTOR.Patient.Age
where DOCTOR.Patient.Age > 30), an index with Age as a key value and the
OID of DOCTOR as a pointer value is created. Other systems such as ORION
use this approach. However, it is rather difficult to maintain such an index
properly.
We describe how to process queries containing nonleaf classes in a class hierarchy. We assume that PATIENT has ADULT-PATIENT and CHILD-PATIENT
as subclasses. Consider the following examples,
(Query 21) PATIENT.Name where PATIENT.Age > 12
and PATIENT.Age < 20
(Query 22) DEPARTMENT.Doctor.Patient.Name where
DEPARTMENT.Name == "pediatrics".
For Query 21, the system generates two subqueries:
result = ADULT-PATIENT.Name where ADULT-PATIENT.Age < 20
result = result + CHILD-PATIENT.Name where CHILD-PATIENT.Age > 12.
The two query results are inserted into the same output relation.
For Query 22, the join of DEPARTMENT and DOCTOR is processed
first. During the join processing, the intermediate output relations are switched
according to the class of the OID for DEPARTMENT.Doctor.Patient. The class
can be determined just by looking at the OID. The pseudo queries are
adult-intermediate = DEPARTMENT.Doctor.Patient
where DEPARTMENT.Name == "pediatrics" and
DEPARTMENT.Doctor.Patient.Class == <ADULT-PATIENT>
child-intermediate = DEPARTMENT.Doctor.Patient where
DEPARTMENT.Name == "pediatrics" and
DEPARTMENT.Doctor.Patient.Class == <CHILD-PATIENT>.
The switching is done during a single-join operation. The code for the
switching is translated into the manipulation function of the join operator.
Then a pair of adult-intermediate and ADULT-PATIENT and a pair of child-intermediate and CHILD-PATIENT are joined, and the results are merged.
As described above, the intermediate result of selection or join operations is
switched to separate relations containing only OIDs relevant to successive joins.
This can establish optimal preconditions for the joins by avoiding unnecessary
search.
Classes (e.g., PERSON, DOCTOR, and PATIENT) in a class hierarchy
share inherited attributes such as Age. Basically there are two methods for
creating indexes on classes in a class hierarchy. One method is to create only
one index on a whole class hierarchy, called a class-hierarchy index. The other is
to create a separate index, called a single-class index, on each class. Jasmine uses
single-class indexes. Other systems such as ORION and O2 use class-hierarchy
indexes. The class-hierarchy index has an advantage that the total size of index
pages and the total number of accessed index pages are smaller than those of
the single-class index. However, it is not always optimal when a class hierarchy
Singleton access is also compiled into C programs, which are compiled and
linked with the run-time support library. First, run-time support will be described. The first access of an object fetches the object from secondary memory
to the page buffer. Then the object is cached in the active object table (AOT),
a dedicated internal hash table for object buffering (see Fig. 13).
The primary role of AOT is to efficiently look up objects. When an instance
is referenced through its OID for the first time, the instance is hashed by its OID
as a hash key. A hash entry (an object descriptor) and an in-memory instance
data structure is created. The hash entry points to the instance data structure.
If the instance is referenced through its OID by other instances resident in
AOT, the OID is mapped to the pointer to the hash entry through the AOT.
The pointer can be cached in the attribute of the referencing instance, since an OID is longer than a physical pointer. Later, the instance can be directly accessed by the pointer without hashing.

FIGURE 13 AOT structure.
Another important role is to maintain the status flags for update of objects.
When an object is newly created or updated, the status flag in the hash entry for
the object is set to create or update. When a transaction is committed, the object with the status create or update is modified or added into the page buffers.
When an object is destroyed, the corresponding in-memory data structure is
deallocated and the status flag is changed to destroy. Later, if the destroyed
object is referenced, the validity of reference is checked and an exception handler is invoked. This can support referential integrity. When a transaction is
committed, the object is destroyed in databases.
When objects fill up AOT, extraneous objects are swapped out. Such an
object is flushed to the page buffers and the in-memory instance data structure
is deallocated and the status flag in the hash entry is set to free. When the
object is referenced again, the object is directly fetched from databases to AOT
through its TID cached in the hash entry.
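A much-simplified picture of the active object table — a hash table keyed by OID whose entries carry a status flag, a cached TID, and a pointer to the in-memory instance — might look like the Python sketch below; the class, field, and method names are illustrative assumptions, not the real data structures.

class ActiveObjectTable:
    """Toy AOT: OID -> {status, tid, in-memory instance (None once swapped out)}."""
    def __init__(self, capacity, fetch_fn):
        self.capacity = capacity          # maximum number of resident instances
        self.fetch_fn = fetch_fn          # loads an instance from the page buffers by TID
        self.entries = {}                 # oid -> {"status", "tid", "instance"}

    def _live_count(self):
        return sum(1 for e in self.entries.values() if e["instance"] is not None)

    def _swap_out_one(self):
        # Drop the in-memory structure of some unmodified object; keep its TID and entry.
        for e in self.entries.values():
            if e["instance"] is not None and e["status"] == "clean":
                e["instance"], e["status"] = None, "free"
                return

    def lookup(self, oid, tid=None):
        e = self.entries.get(oid)
        if e is not None and e["instance"] is not None:
            return e["instance"]                      # already resident: direct access
        use_tid = e["tid"] if e is not None else tid  # re-fetch via the cached TID
        if self._live_count() >= self.capacity:
            self._swap_out_one()
        inst = self.fetch_fn(use_tid)                 # fetch from the page buffers / database
        self.entries[oid] = {"status": "clean", "tid": use_tid, "instance": inst}
        return inst

    def mark_updated(self, oid):
        self.entries[oid]["status"] = "update"        # flushed to page buffers at commit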
The object management subsystem requires AOT, that is, object buffers in
addition to page buffers of the data management subsystem for the following
reason. In general, buffers are directly associated with patterns of access of
objects. Page buffers have structures suitable for access of different instances of
the same class. AOT has structures suitable for access of correlated instances
of different classes. Advanced applications such as CAD have combinations of
two such patterns. This necessitates a dual buffer scheme consisting of page
buffers and object buffers, not a single-buffer scheme, which would contain
unnecessary objects and decrease memory utilization.
The dual buffer approach, however, makes the same object appear in different formats in different buffers at the same time, so we must maintain internal
consistency between two objects denoting the same entity. Currently, we first
write back updated or newly created instances from AOT to page buffers in
query evaluation. Then we evaluate a query against page buffers. An alternative
is to devise different search mechanisms for different buffers and evaluate the
same query on different buffers and integrate the results, which would make
the system less compact.
Basically there are two methods for query evaluation using object buffers
and page buffers as follows.
Single-buffer evaluation method: (1) The instances newly created or updated that are associated with the classes specified by the query are found in the
object buffers. (2) They are flushed from the object buffers to the page buffers.
(3) The query is evaluated against the page buffers.
Dual-buffer evaluation method: (1) The query is evaluated against the object buffers. (2) The same query is evaluated against the page buffers. (3) The
two results are merged into one.
Jasmine adopts the single-buffer evaluation method while ORION adopts a more sophisticated version of the dual-buffer evaluation method. The single-buffer evaluation method needs to transfer objects from the object buffers to the page buffers. However, the single-buffer evaluation method eliminates the need for dual evaluation programs and makes the system small and processing simple, in contrast to the dual-buffer evaluation method. In any case, the combined use of object buffers and page buffers can support the integration of programming and query facilities at an architecture level.
IV. APPLICATIONS
A. Introduction
New multimedia applications emerging on top of information infrastructures
include digital libraries [34] and document warehousing for document management and analysis, which are considered the most promising of such networked
multimedia applications. We need next-generation database systems which enable users to efficiently and flexibly develop and execute networked multimedia
applications.
First, we analyze the characteristics of these networked multimedia applications and discuss the requirements for a multimedia information system
consisting of database, operating system (OS), and network layers and the issues for a multimedia database system. Then we describe our approach based
on OODB and extended with agents, focusing on a general architecture of a
multimedia database system and its implementation.
1. Requirements for a Multimedia Information System
First, we discuss the requirements for a multimedia information system in
general.
1. Multimedia applications must be interactive. So each layer of the system
must allow control of quality of service (QOS) parameters, such as latency, bit
and frame rates, to interactively process multimedia data in real-time.
2. A huge amount of data in forms such as text, video, and images is required for multimedia services, which every layer of the system must efficiently
process. In the database layer, database techniques for efficiently storing and
accessing a large volume of data, which include access methods and clustering,
are required. In the OS layer, techniques such as hierarchical storage systems,
and thread mechanisms are required. In the network layer, network protocols
suitable for multimedia along with the ability to efficiently process such protocols are required.
3. There is heterogeneity in media data such as text and video, and temporal and spatial dependency between them. Users must be able to uniformly
manipulate heterogeneous media data. Users must also be able to structure
heterogeneous media data explicitly by defining links among them, that is,
hypermedia links. Users must also be able to define temporal and spatial relationships among various media data. Such functionality must be provided
by the database layer as multimedia data models. Heterogeneous physical media, such as magneto-optical disks and CD-ROMs, must also be uniformly
accessed. Stream media data, such as audio and video, must be temporally synchronized and processed in real-time essentially by the OS layer. The network
Now we address the issues for a multimedia database system for networked
multimedia applications, which are not comprehensive but mandatory.
1. We must distinguish between logical media data and physical media
data. For example, in networked multimedia applications, multimedia contents
are updated or deleted, or even moved from one server to another. We must
allow users to access such contents independently of physical details such as
locations. We must also allow users to access contents in a uniform fashion
independent of data formats such as MPEG and Motion JPEG. Thus, we must
allow users to flexibly define multimedia views by specifying mappings between
logical and physical data.
2. We must provide a query facility based on keywords, which is a prerequisite for database systems, needless to say. Browsing alone is insufficient
because a large amount of media data take a long time to play.
3. We must also provide a content-based retrieval facility [7]. In networked
multimedia applications, keywords are not always attached to a large amount
of data in advance. Moreover, users should sometimes express a query over
multimedia data by using features, such as colors and motion directions, which
are different from conceptual keywords. So we need content-based retrieval,
which allows an inexact match in contrast to an exact match facilitated by the
keyword-based retrieval facility.
4. We must provide a navigational search facility in addition to the query
facility. Of course, the navigational search can be done through user-specified
hyperlinks among data [14]. However, in large-scale networked applications,
explicit specification of links requires a considerable amount of work. So we
must logically cluster multimedia data based on similarity of keywords and
characteristic data, such as colors, for navigation.
5. We must allow users to select physical storage appropriate for applications and physical clustering if necessary.
6. We must provide parallel, distributed processing in networked multimedia applications. For example, several streams are sometimes required to be
played in parallel. Distributing process burdens, such as special effects of video
streams, among servers is required. Federating a query to several distributed
servers is also required.
7. We must handle program components as first-class objects. Program
components are used to control access to databases and to make database
application development more efficient.
8. We must control QOS in networked multimedia applications. For example, when a single or multiple users require multiple streams to play in parallel,
we cannot guarantee QOS required by users unless we manage resources such
as CPU and network.
B. System Architecture
1. Our Data Model
a. Multimedia
We think that multimedia data are not just static data, but rather compositions of several media data and operations on them. So we provide structural,
temporal, spatial, and control operations as media composition operators, as
described later. In other words, our model has multiple facets and subsumes
existing models, such as object, temporal, spatial, and agent models. Individual
operations are orthogonal to one another. Our model is integrated seamlessly
with existing technologies.
Multimedia systems consist of multimedia databases and applications.
Multimedia databases consist of a set of media data. Media types include texts,
graphics, images, and streams. Stream types include audio, video, and streamed
texts, graphics, and images as well. Multimedia applications consist of a set of
scripts. Basically, a script has an identifier (ID) and temporal and spatial operations on a set of streams with QOS options. A stream has an ID and temporal
and spatial operations on a set of frames. A frame has an ID and temporal and
spatial operations on frame data.
QOS options are parameters given to a QOS controller. QOS types include
latency, jitter, various bit rates and frame rates, resolution, colors, and fonts.
QOS is controlled either by executing specified QOS functions or by retrieving
stored QOS data. The QOS function takes a stream and a time and gives frame
IDs. The QOS data, consisting of times and frame IDs, are stored in advance by obtaining them from a rehearsal run.
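To make the two QOS mechanisms concrete, the sketch below (illustrative Python; the linear QOS function and the rehearsal table are invented examples) maps a playback time to a frame ID either by evaluating a QOS function or by looking up stored QOS data:

def frame_by_function(stream_fps, time_sec):
    """QOS function: compute the frame ID for a given time (e.g., at a constant rate)."""
    return int(stream_fps * time_sec)

# QOS data obtained from a rehearsal run: (time in seconds, frame ID) pairs.
REHEARSAL = [(0.0, 0), (1.0, 8), (2.0, 14), (3.0, 22)]

def frame_by_data(qos_data, time_sec):
    """Stored QOS data: return the frame ID recorded for the latest time <= time_sec."""
    frame = qos_data[0][1]
    for t, f in qos_data:
        if t <= time_sec:
            frame = f
        else:
            break
    return frame

print(frame_by_function(10, 1.5))        # 15
print(frame_by_data(REHEARSAL, 1.5))     # 8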
In order to concretely explain the features of our data model, we consider
the following multimedia application or script, called Script 1, assuming that
there exist multimedia databases containing multiple video streams that have
filmed the same object.
Script1:
(a) Retrieves all video streams that filmed the Prime Minister on
January 17th, 1995.
(b) Selects only parts of the retrieved video streams temporally
overlapping each other.
(c) Arranges the selected parts on the same presentation space (i.e.,
window).
(d) Plays the parts in temporal synchronization.
b. Structural Operations
FIGURE 14 Class definitions for stream media (STREAM, MPEG, and FRAME).
For example, the following query retrieves streams that filmed the Prime
Minister on January 17th, 1995, which realizes Script1(a):
STREAM.Frame from STREAM where STREAM.RealTime = "01171995"
and STREAM.Topic = "the Prime Minister".
c. Temporal and Spatial Operations
Temporal and spatial data are viewed as universal keys common to any
stream media data. Such temporal and spatial relationships structure multimedia data implicitly in contrast to explicit links. We define set-oriented temporal
and spatial operators specifying such relationships, which are analogous to
relational algebra [3].
Although time is one-dimensional and space is three-dimensional, they have
similar characteristics. Real-time is elapsed time taken for recording streams
in the real world. Internal time is time required for a usual play of streams.
External time is time taken for real play of streams by scripts. Usually, real time
is equal to internal time. In the case of high-speed video, real time is shorter
than internal time. External time is specified by providing a magnification level
of internal time. Default magnification is one; that is, external time is equal to
internal time by default. In the case of slow play of streams, external time is
longer than internal time; in the case of fast play, external time is shorter than
internal time. Assuming that S1 and S2 are streams and that P is a predicate on frames of a stream, temporal composition of streams is achieved by temporal operators as follows:
(a) Tintersection(S1, S2) returns the parts of S1 and S2 that temporally
intersect.
(b) Tdifference(S1, S2) returns the part of S1 that does not temporally
intersect with S2.
(c) Tunion(S1, S2) returns S1 and S2 ordered in time with possible
overlaps.
(d) Tselect(S1, P) returns the part of S1 that satisfies P.
(e) Tjoin(S1, S2, P) = Tselect(Tunion(S1, S2), P).
(f) Tproject(S1, Func) returns the result of Func on S1,
where Func is an operation on frames that may include the spatial operators described below.
Note that internal time of a composite stream is the union of external time
of its component streams. Real time of a composite stream is the union of real
time of its component streams. For example, we assume that the query result of Script1(a) is scanned and individually assigned to streams Stream1 and Stream2. To select only the parts of Stream1 and Stream2 that temporally overlap one another, which realizes Script1(b), we have only to execute the expression Tintersection(Stream1, Stream2) based on the internal time. Here we name the selected parts Stream1' and Stream2' for Stream1 and Stream2, respectively. The effect of the expression is presented schematically in Fig. 15.

FIGURE 15 Temporal overlap of Stream1 and Stream2 on the internal time axis.
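Treating internal time as simple intervals, a minimal Python sketch of Tintersection and Tunion (an illustration of the operator semantics, not the system's implementation) could be:

def tintersection(s1, s2):
    """Streams as (start, end) internal-time intervals; return the overlapping parts."""
    start, end = max(s1[0], s2[0]), min(s1[1], s2[1])
    if start >= end:
        return None                    # no temporal overlap
    return (start, end), (start, end)  # Stream1' and Stream2'

def tunion(s1, s2):
    """Return both streams ordered in time (overlaps allowed)."""
    return sorted([s1, s2])

stream1, stream2 = (0.0, 8.0), (5.0, 12.0)
print(tintersection(stream1, stream2))   # ((5.0, 8.0), (5.0, 8.0)) -> Stream1', Stream2'
print(tunion(stream1, stream2))          # [(0.0, 8.0), (5.0, 12.0)]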
Similarly, space is divided into real space, internal space, and external space.
Real space is space occupied by streams in the real world. Internal space is space
typically occupied by streams in presentation. External space is space occupied
by streams during the actual presentation of scripts, is specified by providing
a magnification of internal space, and is equal to internal space by default. Assuming that F1 and F2 are frames and that P is a predicate on pixels of a frame, spatial composition of streams is accomplished by spatial operators as follows:
(a) Sintersection(F1, F2) returns the parts of F1 and F2 that intersect in
space.
(b) Sdifference(F1, F2) returns the part of F1 that does not intersect in
space with F2.
(c) Sunion(F1, F2) returns F1 and F2 merged in space.
(d) Sselect(F1, P) returns the part of F1 that satisfies P.
(e) Sjoin(F1, F2, P) = Sselect(Sunion(F1, F2), P).
(f) Sproject(F1, Func) returns the result of Func on F1,
where Func is an operation on pixels.
Note that the internal space of a composite stream is the union of the external space of its component streams. The real space of a composite stream is the union of the real space of its component streams. For example, to arrange two frames Frame1 of Stream1' and Frame2 of Stream2' on the same window, which realizes Script1(c), we evaluate the expression Sunion(Frame1, Frame2) based on the external space, whose effect is schematically explained in Fig. 16.
FIGURE 16 Spatial union of Frame1 and Frame2 in one window.

d. Control Operators

FIGURE 17 Latency and play time of Stream1' and Stream2' against external time.
FIGURE 18 Layered architecture of the multimedia information system: application services (on-demand services, online shopping and information Q&A systems, document management, concurrent engineering, digital libraries and museums), object management with multidatabase and existing data management (RDB), data management, OS multimedia extension, file systems, OS micro kernel, and network protocol management (ATM, Ethernet, FDDI).
2. Approach to Videos
FIGURE 19 Stream views and their contents.
define new views recursively by combining existing stream views. The system
also chooses representative frames within a scene and abstracts characteristic
data and stores them into databases. Please note here that matching with representative frames can reduce the recall ratio of a content-based query since
characteristic data, such as colors and layouts, change even within a single
scene.
The system also detects moving objects by using motion vectors of MPEG.
The system decreases the number of colors to more accurately recognize moving
objects. The system stores motion directions in addition to figures and colors
associated with moving objects. Of course, the user can retrieve substreams
corresponding to views with specified keywords. The user can further retrieve
substreams containing samples of user-specified colors, figures, and motion
directions. The system allows the users to retrieve video substreams containing
user-specified moving objects without any interference from the background
information because the system distinguishes between the moving objects and
the backgrounds unlike other approaches such as QBIC [7]. Content-based
retrieval is used by both end users and content providers.
Now we illustrate a scenario for content-based query by using scripts.
Scripts allow specification of playback control such as parallel or serial and
of layout of played streams. For example, a script for content-based retrieval is
specified as follows:
Script2:
Set1 = VIEW from VIEW where VIEW.like(Sample1);
On Event Selection by User;
Set2 = Set1 from Set1 selected by User;
Set2.parallel.play;
Here the user specifies a sample, such as Sample1, through a GUI as shown in Fig. 20. A sample figure consists of several parts, like a human body. The system abstracts characteristic data from the user-specified sample. The
system uses the largest part, such as a human trunk, as a search key to a
FIGURE 20 GUI for specifying a sample figure.
FIGURE 21 A script schedule example (View1, View2, and View3 streams played at 5-15 fps over successive playback intervals).
Our database system allows users to acquire newly produced media data via
distributed networks including ATM LANs and the Internet. Moreover, multidatabase functionality is provided to manage metadata (e.g., a directory) of
existing data files or databases and to establish interoperable access to such files
and databases. Our technology includes schema translation between OODB
and RDB [7, 21, 29], uniform WWW gateways to databases, directory management by databases, and HTML (hypertext markup language) page management by databases. We consider digital libraries and on-line publishing as
promising networked text applications.
Now we discuss an approach to text data management, focusing on HTML
page management. First, mapping between text contents and formats such as
HTML, SGML, and ODA is necessary. We resolve such heterogeneity by using
polymorphism, too. Moreover, we need to allow users to acquire texts and
reorganize them for further distribution. To this end, we provide HTML page
management by databases including storage and retrieval. The system abstracts
keywords from texts of collected HTML pages automatically and stores keywords and URL associated with texts into databases. Either HTML texts themselves, or only their file names and URL are stored in databases. In-line images
as components of HTML texts are also stored by databases. The system adds
URL, file names, titles, anchor character strings (i.e., links), and data types (e.g.,
GIF, JPEG) as default keywords. The user can delete or add favorite keywords.
We prefer the recall ratio to the precision ratio in keyword-based retrieval. Relatively addressed links (i.e., URLs) are transformed into absolutely addressed links. The users can retrieve pages or components by a wide variety of keywords and drag and drop retrieved pages and components into work pages to create new home pages in a WYSIWYG fashion (see Fig. 22). Content-based retrieval
of texts is facilitated by using a full text search engine.
Ease of data acquisition through WWW, however, makes the size of collected data unmanageable for the user. Keyword-based retrieval alone is not
sufficient. So we logically cluster texts for similarity-based navigation. The
system automatically abstracts keywords from collected HTML or SGML
texts.

FIGURE 22 HTML editor and keyword management GUI.

Then the system chooses the 100 most frequent keywords contained
by a set of texts and places each text in the information space of 100 axes
ranging from having the corresponding keyword to not having it. The system uses a Self-Organizing Map (SOM) [28] technique to logically cluster a
set of collected texts into the given number of groups in the retrieval space.
The system displays the structured map by using 2-D or 3-D graphics such
as VRML. The user can retrieve texts by navigating a 2-D or 3-D user interface. Ti and Ki in Fig. 23 denote texts and keywords, respectively. The
point is that the users cluster collected texts for their own use. Of course,
content providers can use this technique when they cluster their own texts in
advance.
We briefly describe how we have applied the SOM technique to logical text
clustering. Input patterns to the information space, that is, texts, have normalized characteristic vectors Vi = [vi1, . . . , viM], i = 1, . . . , N. We choose M = 100, that is, the 100 most frequent keywords. N denotes the total number of texts and is 100 for the moment. If the value of vij is 1, input text i has keyword Kj; if the value is 0, the text does not have that keyword.

FIGURE 23 Texts (Ti) and keywords (Ki) mapped onto the clustered retrieval space.

On the other hand,
grid patterns in the two-dimensional retrieval space Pij are assigned appropriate characteristic vectors as initial values Vpij in the information space. For
example, we use 10-by-10 grid patterns with a torus nature. Here we define similarity SIM between characteristic vectors Vi and Vj, distance DIS between
them, length of a vector LEN, and vector operations PLUS, MINUS, DIV as
follows:
SIM(Vi, Vj): sum(vik * vjk), k = 1, . . . , M.
DIS(Vi, Vj): square-root(sum(square(vik - vjk))), k = 1, . . . , M.
LEN(Vi): square-root(sum(square(vik))), k = 1, . . . , M.
PLUS(Vi, Vj): [(vik + vjk)], k = 1, . . . , M.
MINUS(Vi, Vj): [(vik - vjk)], k = 1, . . . , M.
DIV(Vi, C): [(vik/C)], k = 1, . . . , M.
Procedure (1). First, we select an input pattern Vi which has the maximum
similarity with a grid pattern Pij and we move Pij, that is, its characteristic vector
Vpij, closer to Vi in the information space. New Vpij is calculated as
V = PLUS (Vpij, DIV (MINUS (Vi, Vpij), exp (A*R*R)))
Vpij = DIV (V, LEN (V)); normalization,
where A < 1 and R = DIS (Vi, Vpij).
Next we move grid patterns in the neighborhood of Pij (i.e., Pi,j+1, Pi,j-1, Pi+1,j, Pi-1,j, Pi+1,j+1, Pi+1,j-1, Pi-1,j+1, Pi-1,j-1) closer to Vi at that time.
We repeat PROCEDURE (1) until the maximum similarity between grid
patterns and input patterns exceeds a limit. For the moment, we choose A such that the number of repetitions is less than 10. After the termination of PROCEDURE
(1), each input pattern is mapped to its nearest grid patterns in the retrieval
space. We also apply PROCEDURE (1) to keywords, which are represented
as characteristic vectors having only one nonzero element. Thus, we can map
input patterns and keywords to the retrieval space while preserving their topological
relationships in the information space.
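The clustering procedure above translates almost directly into code. The following Python sketch follows the definitions of SIM, DIS, and PROCEDURE (1); the random initialization, the value of A, and the fixed number of repetitions are simplifying assumptions for illustration.

import math, random

M, GRID = 100, 10                        # vector length and 10-by-10 grid
A = 0.5                                  # A < 1, as in the text

def normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def dis(v, w):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(v, w)))

def sim(v, w):
    return sum(x * y for x, y in zip(v, w))

def move_toward(grid_vec, v):
    r = dis(v, grid_vec)
    step = [(x - g) / math.exp(A * r * r) for g, x in zip(grid_vec, v)]
    return normalize([g + s for g, s in zip(grid_vec, step)])

# Grid patterns get random normalized vectors as initial values.
grid = {(i, j): normalize([random.random() for _ in range(M)])
        for i in range(GRID) for j in range(GRID)}
texts = [normalize([random.randint(0, 1) for _ in range(M)]) for _ in range(100)]

for _ in range(10):                      # bounded number of repetitions
    for (i, j), gvec in grid.items():
        v = max(texts, key=lambda t: sim(t, gvec))    # most similar input pattern
        grid[(i, j)] = move_toward(gvec, v)
        for di in (-1, 0, 1):                          # neighborhood on a torus
            for dj in (-1, 0, 1):
                key = ((i + di) % GRID, (j + dj) % GRID)
                grid[key] = move_toward(grid[key], v)

# Map each text to its nearest grid pattern in the retrieval space.
placement = {idx: min(grid, key=lambda k: dis(t, grid[k])) for idx, t in enumerate(texts)}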
Once the clustering by SOM is completed, we map new input patterns to
grid patterns whose vectors are most similar to those of new input patterns
unless the total number of input patterns exceeds a limit. If the total number
exceeds the limit, we recluster all the input patterns. We can avoid too frequent
clustering by implementing the clustering as an agent. The clustering agent
responds to the event that the total number of input patterns exceeds a limit
(N*n, n = 1, 2, . . .).
We can apply the SOM technique to cluster videos and program components, too. If we apply the SOM to a mixture of videos, texts, and program
components, we get hypermedia links among heterogeneous media data, based
on similarity of contents.
Generally, we provide media data management mechanisms which enable
efficient storage and access of large amounts of media data. They enable users
to customize media-specific storage in aspects such as indexing, clustering, and
buffering. We take an object-oriented approach to resolving heterogeneity in
data formats, CODEC, and physical media, used for implementation of logical
media.
Here we describe our physical clustering facility by taking structured texts
or compound documents consisting of simple texts, graphics, images, and
videos. Structured texts such as SGML texts are often accessed according to
component links. The system clusters relevant texts in the same or neighborhood pages to allow efficient retrieval of them. We assume that the user chooses
to cluster texts. Thus, the user specifies how data are clustered. Then the system
actually clusters data according to the user's specification. In the future, we plan to
provide a facility to monitor hot spots of access patterns. Either the system or
the user clusters data based on the result of monitored accesses. We allow the
user to recluster data after heavy updates. In addition to heterogeneous clustering, we allow homogeneous clustering such as all instances of the same class.
We allow subtrees of a whole component tree to be flexibly clustered according to the user's specification by combining homogeneous and heterogeneous
clustering.
To implement physical clustering, we introduce two types of pages. We
usually use a single page to store instances of the same class, that is, for homogeneous clustering. We devise a multiple page to store instances of heterogeneous classes, that is, heterogeneous clustering. A multiple page consists of
single pages. Each of them has its own size and contains instances of its own
class. Multiple pages as a whole are mapped to contiguous space.
4. Approach to Program Components
V. CONCLUSION
First, in this paper we described a prototype object-oriented DBMS called Jasmine, focusing on the implementation of its object-oriented features. Jasmine
shares a lot of functionality with other object-oriented database systems. However, Jasmine has the following features that differentiate it from other systems.
Jasmine provides a powerful query language that allows users to specify complex objects, class hierarchies, and methods in queries. Jasmine optimizes such
object-oriented queries by using hash joins, B-tree and hash indexes, and semantic information. Individual object access is evaluated on object buffers.
Jasmine extends relational database technology. Jasmine provides nested relations to efficiently manage complex objects and provides user-defined functions
evaluated on page buffers to efficiently process method invocation in queries.
Jasmine provides a view facility for schema integration and a constraint management facility including integrity constraints, triggers, and rules. We compare
Jasmine with current commercial object-oriented database systems and research
prototypes as follows.
GemStone [32] originates from the attempt to make Smalltalk-80 programs
databases. The GemStone data model is based on Smalltalk-80 and supports
only single inheritance while Jasmine supports multiple inheritance. In addition
to C, C++, and Smalltalk-80 interfaces, GemStone provides a programming
interface called OPAL. GemStone distinguishes between a class and a collection
of objects. A query expressed by OPAL is formulated against a single collection
of objects. A Jasmine query is formulated against classes, allowing explicit
joins.
ORION [26] supports a variety of functions, such as multiple inheritance,
composite objects, versions, queries, and schema evolution. ORION is built
in Lisp on a secondary storage system that provides facilities for segment and
page management. ORION provides a programming interface to an objectoriented extension of Lisp. A query returns a collection of instances of a single
class while a Jasmine query can generate instances combining more than one
class. Mapping object identifiers to pointers is done by extensible hashing. A
query with attributes of nonleaf classes is processed by use of a class-hierarchy
index unlike Jasmine. ORION evaluates a query against the object and page
buffers and merges the results while Jasmine uses the single-evaluation scheme.
ORION uses sort-merge joins while Jasmine uses hash joins.
In O2 [5], an object contains a value, a list, a set, and a tuple as an attribute value. O2 is used through an object-oriented extension of C called CO2. The query language is defined rather formally. The query retrieves and composes a list, a set, and a tuple. O2 is implemented on top of WiSS (Wisconsin Storage System) in C. WiSS provides persistency, disk management, and concurrency control for flat records. Unlike Jasmine, O2 uses physical identifiers of WiSS records as object identifiers. Like ORION, O2 adopts a dual buffer management scheme. Like Jasmine, O2 uses a hash table to manage in-memory objects, but unlike Jasmine, O2 uses a class-hierarchy index to process queries against nonleaf classes.
In IRIS [31], based on the DAPLEX functional model, properties or methods defined by a class are represented as functions on the class. Functions are
REFERENCES
1. Batory, D. S., Leung, T. Y., and Wise, T. E. Implementation concepts for an extensible data model and data language. ACM Trans. Database Syst. 13(3): 231-262, 1988.
2. Carey, M. J., DeWitt, D. J., and Vandenberg, S. L. A data model and query language for EXODUS. In Proc. of the 1988 ACM SIGMOD Conference, Chicago, IL, June 1988, pp. 413-423. ACM, New York, 1988.
3. Date, C. J. An Introduction to Database Systems, Vol. 1. Addison-Wesley, Reading, MA, 1990.
4. Debloch, S. et al. KRISYS: KBMS support for better CAD systems. In Proc. of the 2nd International Conference on Data and Knowledge Systems for Manufacturing and Engineering, Gaithersburg, MD, Oct. 1989, pp. 172-182. IEEE, Los Alamitos, CA, 1989.
5. Deux, O. et al. The story of O2. IEEE Trans. Knowledge Data Eng. 2(1): 91-108, 1990.
6. Fishman, D. H. et al. IRIS: An object-oriented database management system. ACM Trans. Office Inform. Systems 5(1): 48-69, 1987.
7. Flickner, M. et al. Query by image and video content: The QBIC system. IEEE Computer 28(9): 23-32, 1995.
8. Gibbs, S., Breiteneder, C., and Tsichritzis, D. Data modeling of time-based media. In Proc. of the ACM SIGMOD Conference, May 1994, pp. 91-101.
9. Goldberg, A., and Robson, D. Smalltalk-80: The Language and Its Implementation. Addison-Wesley, Reading, MA, 1983.
10. Hamakawa, R., and Rekimoto, J. Object composition and playback models for handling multimedia data. ACM Multimedia Systems 2: 26-35, 1994.
11. Ishikawa, H., Izumida, Y., Yoshino, T., Hoshiai, T., and Makinouchi, A. A knowledge-based approach to design a portable natural language interface to database systems. In Proc. of the IEEE Data Engineering Conference, pp. 134-143. IEEE, Los Alamitos, CA, 1986.
12. Ishikawa, H., Izumida, Y., Yoshino, T., Hoshiai, T., and Makinouchi, A. KID: Designing a knowledge-based natural language interface. IEEE Expert 2(2): 57-71, 1987.
13. Ishikawa, H., Suzuki, P., and Makinouchi, A. Object-oriented multimedia knowledge base management system: Design and implementation. In Proc. of the 2nd International Symposium on Interoperable Information Systems, Tokyo, Japan, Nov. 1988, pp. 195-202. INTAP, Japan, 1988.
14. Ishikawa, H. An object-oriented knowledge base approach to a next generation of hypermedia system. In Proc. of the 35th IEEE COMPCON Conference, San Francisco, CA, pp. 520-527. IEEE, Los Alamitos, CA, 1990.
15. Ishikawa, H., Izumida, Y., and Kawato, N. An object-oriented database: System and applications. In Proc. of the IEEE Pacific Rim Conf. on Communications, Computers, and Signal Processing, Victoria, B.C., Canada, pp. 288-291. IEEE, Los Alamitos, CA, 1991.
16. Ishikawa, H. The Design and Implementation of an Object-Oriented Database System for Advanced Applications. Ph.D. Thesis, University of Tokyo, 1992.
17. Ishikawa, H. et al. An object-oriented database system and its view mechanism for schema integration. In Proc. of the Second Far-East Workshop on Future Database Systems, Kyoto, Japan, April 1992, pp. 194-200.
18. Ishikawa, H. et al. The design and implementation of an object-oriented multimedia knowledge base management system. ACM Trans. Database Systems 18(1): 1-50, 1993.
19. Ishikawa, H., and Kubota, K. An active object-oriented database: A multi-paradigm approach to constraint management. In Proc. of the 19th VLDB Conference, Dublin, Ireland, Aug. 1993, pp. 467-478. VLDB Endowment.
20. Ishikawa, H. Object-Oriented Database System. Springer-Verlag, Berlin, 1993.
21. Ishikawa, H. et al. A script-based approach to relational and object-oriented database interoperability. In Proc. of Intl. Symposium on Advanced Database Technologies and Their Integration, Oct. 1994.
22. Ishikawa, H. et al. A next-generation industry multimedia database system. In Proc. of IEEE 12th Intl. Conference on Data Engineering, pp. 364-371, 1996.
23. Ishikawa, H. et al. An object-oriented database system Jasmine: Implementation, application, and extension. IEEE Trans. Knowledge Data Eng. 8(2): 285-304, 1996.
24. Ishikawa, H. et al. An extended object-oriented database approach to networked multimedia applications. In Proc. of IEEE 14th Intl. Conference on Data Engineering, pp. 259-266, 1998.
25. Kato, K., Kondo, A., and Ishikawa, H. Multimedia database infoServer: Script and video playback. In Proc. of the 7th Data Engineering Workshop, pp. 109-114, 1996. [In Japanese.]
26. Kim, W. et al. Architecture of the ORION next-generation database system. IEEE Trans. Knowledge Data Eng. 2(1): 109-124, 1990.
27. Kitagawa, H., and Kunii, T. L. The Unnormalized Relational Data Model for Office Form Processor Design. Springer-Verlag, Tokyo, 1989.
28. Kohonen, T. Self-Organizing Maps. Springer-Verlag, Berlin, 1995.
29. Kubota, K., and Ishikawa, H. Structural schema translation in multidatabase system: Jasmine/M. In Proc. of IPSJ Advanced Database Symposium, 1994. [In Japanese.]
30. Larson, P.-A. Linear hashing with partial expansions. In Proc. of the 6th VLDB Conference, Montreal, Canada, 1980, pp. 224-232. ACM, New York, 1980.
31. Lyngbaek, P., and Vianu, V. Mapping a semantic database model to the relational model. In Proc. of the 1987 ACM SIGMOD Conference, San Francisco, CA, 1987, pp. 132-142. ACM, New York, 1987.
32. Maier, D. et al. Development of an object-oriented DBMS. In Proc. of the 1st OOPSLA Conference, Portland, OR, 1986, pp. 472-482. ACM, New York, 1986.
33. Morgenstern, M. Active databases as a paradigm for enhanced computing environments. In Proc. of the 9th VLDB Conference, Florence, Italy, Oct. 1983, pp. 34-42. VLDB Endowment, 1983.
34. Special Issue: Digital Libraries. CACM 38(4): 1995.
35. Stefik, M., and Bobrow, D. G. Object-oriented programming: Themes and variations. AI Magazine 6(4): 40-62, 1986.
36. Yamane, Y. A hash join technique for relational database systems. In Proc. of the Foundation
of Data Organization Conference, Kyoto, Japan, May 1985, pp. 388-398.
37. Yamane, Y. et al. Design and evaluation of a high-speed extended relational database engine,
XRDB. In Proc. of International Symposium on Database Systems for Advanced Applications,
Seoul, Korea, April 1989, pp. 52-60.
38. Zloof, M. Security and integrity within the query-by-example data base management language.
IBM Research Report RC6982, Feb. 1978.
I. INTRODUCTION 124
II. SEMANTIC DISCREPANCY AND SCHEMA CONFLICTS 126
A. Semantic Discrepancy 126
B. Schema Conflicts 127
III. OPTIMIZATION AT THE ALGEBRA LEVEL 130
A. Fundamentals and the Concepts of Lub 131
B. The Structure of a Hyperrelation (R^H) 133
C. Schema Conformation: R^H Schema and the Mapping 134
D. The Hyperrelational Algebra 139
E. A Comparison with Related Works 149
IV. OPTIMIZATION AT THE EXECUTION STRATEGY LEVEL 151
A. Assumptions 154
B. Where Should an Intersite Operation Be Performed? 154
C. Different from the Issues in Traditional Database Systems 156
D. Two Scheduling Strategies Not Involving PDBS 157
E. PDBS Sharing Workload with the MDBS 159
F. The Maximum Merge Scheduling (MMS) Strategy 161
G. Performance Study 165
V. CONCLUSIONS 170
REFERENCES 171
schema at the system level and a sound mathematical basis for data manipulation in
a multidatabase system. To resolve this problem, we present the concept of hyperrelation and use it as a powerful and succinct model for the global level representation of
heterogeneous database schemas. A hyperrelation has the structure of a relation, but its
contents are the schemas of the semantically equivalent local relations in the databases.
With this representation, the metadata of the global database and local databases and
the data of these databases are all representable by using the structure of a relation. The
impact of such a representation is that all the elegant features of relational systems can
be easily extended to multidatabase systems. A hyperrelational algebra is designed accordingly. This algebra is performed at the MDBS level such that query transformation
and optimization is supported on a sound mathematical basis.
Another critical level of optimization is the execution strategy level. The difficulty of optimization at this level is that each participating database system does not own the
information (data) and the mechanism (software) required for converting data of one
database to another to resolve the data-type conflict problems. More importantly, this
confines the processing of an intersite operation (such as a join over two relations of
different databases) to within the MDBS only. The participating database systems are not
able to share the workload of the MDBS in this environment. Hence, how to minimize
the consumption of MDBS resources is an urgent problem. In the second part of this
chapter, we present three scheduling algorithms that are used in an MDBS to reduce
the processing cost of a multidatabase query. A major difference between our strategies
and the past methods is that ours do not require the regeneration of the cost models of the participating databases. Hence, they also minimize the indeterminacy existing in multidatabase query optimization.
INTRODUCTION
A multidatabase system is a system that manages databases as a repository
resource and allows application programs to access this resource in a heterogeneous distributed environment [1, 3, 13, 39, 40, 69, 74, 79]. In such an environment, query processing is very time-consuming as multiple levels of conversions
(of data as well as the query) among heterogeneous systems are required. Query
optimization becomes an especially important task in order to reduce the query
processing time. This issue, however, has attracted far less attention from researchers than it deserves. Looking into those approaches proposed, we see that
the multidatabase query optimization issues were studied from different levels
of a multidatabase system. They include:
Schema level. This is the highest level of optimization and the least-studied
level in multidatabase systems. A query may be expressible in completely different forms in different databases, as schemas of different databases are often
represented in different manners. As a result, selecting a proper site will facilitate the execution of a query. This allows us to achieve query optimization at the
multidatabase schema level. Studying the effect of different schemas (i.e., representations of data) on the cost of query execution is the focus of research at this
level. References [46, 49] are the only works found in the literature proposing
solutions for issues in this category. More research in this area is needed.
FIGURE 1 Semantics (the relationship between a phenomenon and the relations/types that model it).
FIGURE 2
different viewpoints. One models them as Student, while the other models them as Employee. They are not semantically equivalent because the relationships between the types and their referents (Student and Employee) are different.
Semantically equivalent relations need not have the same set of attributes. For example, in Fig. 4 the two relations Student and Stud are semantically equivalent, but their sets of attributes are not equivalent. The address information in Student is represented as street, city, state, but is represented as address in Stud. Also, Student has birthday, whereas Stud has gender.
B. Schema Conflicts
Various types of conflicts can exist between schemas of relations in different
databases. Since value, attribute, and table are the three major components
of a database, the conflicts can be roughly classified into six types: value-versus-table, value-versus-attribute, value-versus-value, attribute-versus-table, attribute-versus-attribute, and table-versus-table conflicts [49]. An X-versus-Y conflict means that an X in one database is represented as a Y in another
database. A multidatabase UNIV-DATABASE will be used as an example for illustration. The database consists of component databases of four universities and
colleges as shown in Fig. 5.
I. Value-versus-Value Conflict
The value-versus-value conflict arises when different types of values are
used in the same type of attributes. This type of conflict includes expression
conflicts, data unit conflicts, and precision conflicts. Suppose that one database uses {Excellent, Good, Fair, Poor, Bad} to represent the score of a student, while {1, 2, 3, 4, 5} is used in another database. We say that there exist expression conflicts between these two databases. A unit conflict exists if in one database the height of a student is measured in centimeters and in another it is in feet. A precision conflict exists if the scores of students fall in the range {0-100} in one database and {0.0-100.0} in another database.
In the literature, these problems are also referred to as the domain mismatch
problems. Many works have been proposed to resolve these problems [20, 45,
55, 56, 75, 78].
FIGURE 3 The same real-world objects modeled with different semantics: as Student in one database and as Employee in the other.
FIGURE 4 The semantically equivalent relations Student (name, street, city, state, birthday) and Stud (name, address, gender).
2. Attribute-versus-Attribute Conflict
This type of conflict occurs when the relations in component databases use
different numbers of attributes to represent the same information. For example, in CDB1 the addresses of the students are stored in one attribute Addr,
while in CDB2 and CDB3 the same information is stored in three attributes
(No, Str, State) and (Str, City, State), respectively. This type of conflict can be
further classified into the following subtypes: one-to-zero, one-to-one, one-to-many, and many-to-many conflicts. A one-to-zero conflict is a missing attribute
conflict, meaning that some attributes in a table do not appear in any form
in the corresponding semantically equivalent table(s) of another database. For
example, both the tables Stud of CDB1 and CS_Stud of CDB2 store data about
students. However, the attribute Gender is missing from the table CS_Stud. A
many-to-many attribute conflict means that two different sets of attributes are used to model the same information in two databases. Actually, the one-to-one and the one-to-many attribute conflicts are only special cases of the many-to-many conflict.
FIGURE 5 The multidatabase UNIV-DATABASE: the component databases CDB1 (Uni. A), CDB2 (College B), CDB3 (Uni. C), and CDB4 (College D), with the relation names and attributes of each (e.g., Stud, Course, Take, and Faculty in CDB1; CS_Stud, Math_Stud, EE_Stud, Fndmntl_Crs, Advncd_Crs, Participate, Faculty, and Spouse in CDB2; Studs, Lecture, Tutorial, Seminar, Indvl_stdy, and Takes in CDB3; ST1-ST4, Courses, and Enroll in CDB4).
3. Attribute-versus-Table Conflict
second dimension in this type of conflict, i.e., the sets of attributes are different (but the numbers of involved tables in the two databases are the same), they are exactly what the attribute-versus-attribute type of conflict is talking about. Hence, all scenarios of table-versus-table conflicts are not new; they have been fully covered in the previous cases.
Let A(a1, a2, . . . , an) and B(b1, b2, . . . , bm) be two semantically equivalent relations, where ai (i = 1, . . . , n) and bj (j = 1, . . . , m) are attributes. We say that A is greater in information capacity than B, denoted by A ⊒ B, if ∀ b ∈ {b1, b2, . . . , bm}, ∃ a ⊆ {a1, a2, . . . , an} such that a and b are corresponding attributes.
Consider the following example. Let Stud(Name, Address, Gender, Age) and Student(Name, Street, City, State, Country, Gender, Birthday) be two (semantically equivalent) relations. The information capacity of Student is greater than that of Stud, since the information that Stud can provide is also provided by Student (i.e., each attribute of Stud has a corresponding attribute in Student), but not vice versa. For semantically inequivalent relations, their information capacities are incomparable (as they describe data of different types).
Note that the statement "∃ a ⊆ {a1, a2, . . . , an}" shows that a can be a subset of the attributes in {a1, a2, . . . , an}. Its corresponding attribute b, however, is only an attribute in {b1, b2, . . . , bm}. According to this definition, the information capacity of an attribute such as {Address} is not equivalent to that of {Street, City, State}, but {Address} ⊑ {Street, City, State} (assuming that all information in Address, such as the country name, is also contained in the representation {Street, City, State}). This is the information capacity between one-to-many type corresponding attributes. As for many-to-many type corresponding attributes, their ordering of information capacity depends on the relationship between the attributes. Suppose that {r1, r2} and {s1, s2, s3} are corresponding attributes.
Case 1. The correspondence between the attributes of these two sets can be decomposed into one-to-one and one-to-many types of correspondence. For example, r1 corresponds to s1 and s2, and r2 corresponds to s3. According to our discussion on corresponding attributes of one-to-many type, we know that {r1, r2} ⊑ {s1, s2, s3}.
Case 2. There do not exist one-to-one/one-to-many correspondences between the attributes. For instance, r1 and r2 represent a semantics that can be further
divided into (t, u) and (v, w), respectively, and s1, s2, and s3 are divisible into (t), (u, v), and (w), respectively. In this case, the information capacities of these two sets of attributes are incomparable. This leads to the definition of the concept of least upper bound to be given shortly.
The task of integrating a set of semantically equivalent relations can be
viewed as defining another relation such that (1) it is semantically equivalent to
the set of relations to be integrated, (2) its information capacity is greater than
the set of relations, and (3) it should not contain any information not provided
by the set of relations. In other words, the integrated relation schema should be
the least upper bound of the schemas in component databases. Based on these
concepts, we define the least upper bound of relations.
DEFINITION 2 (LEAST UPPER BOUND). Let U be the universal set of relations, and A, B, C ∈ U. A and B are semantically equivalent relations. We say that C is the least upper bound of A and B, denoted by C = lub(A, B), if
1. C ⊒ A and C ⊒ B, and
2. ∀ W ∈ U, if W ⊒ A and W ⊒ B, then W ⊒ C.
From the definition, we find some obvious properties as follows.
PROPERTY 1 (MINIMAL INCLUSION). Let U be the universal set of relations, and A, B, C ∈ U. A and B are semantically equivalent relations. If C = lub(A, B), then we have
PROPERTY 2 (IDEMPOTENCE).
PROPERTY 3 (UNIQUENESS). For each set of semantically equivalent relations, their lub is unique.
Proof. Let C and C′ be two lubs of a given set of relations {A, B}. According to the definition of lub, we have C ⊒ A and C ⊒ B, and C′ ⊒ A and C′ ⊒ B. As C is the lub of A and B, any relation (such as C′) satisfying C′ ⊒ A and C′ ⊒ B must also satisfy C′ ⊒ C. Similarly, we can derive that C ⊒ C′ must be true. Therefore, C and C′ are equivalent in information capacity, meaning that the lub of A and B is unique.
The concepts of lub and the properties covering, fineness, and minimality
can be used as a general guideline in the schema integration process in determining the attributes of an entity (relationship) type. In our approach, the local
relations are divided into groups of relations. Within each group the relations
are semantically equivalent. One hyperrelation that is in semantics the lub of
the local relations is used to represent the schemas of the local relations. In this
way, the representation of the schemas of the local relations is in a uniform
133
mannerthe schemas of relations are still relations. Also, because a hyperrelation is the lub of its underlying relations, there will not be any unnecessary
information added as attributes to the hyperrelation. Hence, a global query
issued against the hyperrelations can always be translated into (local) queries
specified on the underlying (local) relations. In the next section, we define the
structure of a hyperrelation.
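To make this integration step concrete, here is a minimal sketch (hypothetical Python, not from the chapter) of how the attribute set of a hyperrelation can be derived as a lub-like union of the local schemas. It assumes a correspondence table that maps a local attribute to the finer global attribute(s) it covers; corresponding attributes are kept only once, so nothing beyond what the local relations provide is added.

def lub_attributes(schemas, correspondence):
    """schemas: {relation_name: [attribute, ...]}.
    correspondence: (relation, attribute) -> tuple of global attribute names it covers."""
    global_attrs = []
    for rel, attrs in schemas.items():
        for a in attrs:
            for g in correspondence.get((rel, a), (a,)):   # unmapped attributes keep their own name
                if g not in global_attrs:
                    global_attrs.append(g)                 # union of corresponding attributes, no duplicates
    return global_attrs

if __name__ == "__main__":
    schemas = {
        "Stud": ["Name", "Address", "Gender", "Age"],
        "Student": ["Name", "Street", "City", "State", "Country", "Gender", "Birthday"],
    }
    # Assumed correspondences: Stud.Address covers the finer Street/City/State,
    # and Stud.Age corresponds to Birthday.
    correspondence = {
        ("Stud", "Address"): ("Street", "City", "State"),
        ("Stud", "Age"): ("Birthday",),
    }
    print(lub_attributes(schemas, correspondence))
    # ['Name', 'Street', 'City', 'State', 'Gender', 'Birthday', 'Country']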
B. The Structure of a Hyperrelation (R")
In relational databases, a relation represents a class of tuples of the same type,
and each tuple describes the characteristics (type) of a real world entity as shown
in Fig. 6a. We extend this concept to define the structure of a hyperrelation.
We consider that a hyperrelation represents a class of relations having the same
semantics. Each entity in a hyperrelation is the schema of an existing relation
in a component database. This concept is exemplified in Fig. 6b.
Formally, we define the structure of a hyperrelation as follows.
DEFINITION 3 (THE HYPERRELATION). A hyperrelation R^H is composed of a set of tuples, each having the schema of a relation in a component database as its data. The relations corresponding to the tuples in an R^H are all semantically equivalent relations, and R^H has the schema R^H(Database, Relation, A1, . . . , An), where R^H(A1, . . . , An) = lub(t1, . . . , tm), t1, . . . , tm are the tuples of R^H, and A1, . . . , An are the attributes mapped from t1, . . . , tm. Database and Relation are two system-defined attributes. The domain of Relation is the set of names of the local relations mapping onto R^H, and the domain of Database is the set of names of the databases to which these local relations belong.
FIGURE 6 (a) A relation Students (StudNo, Name, Tel#) whose tuples describe real-world entities. (b) A hyperrelation tuple whose Database and Relation values are, for example, CDB3 and Studs, and whose remaining data are the schema of that relation.
FIGURE 7 The multidatabase environment: users issue global queries to the multidatabase management system, which maintains the hyperrelations and interacts with the component database systems.
Instead of corresponding to a real-world entity, each tuple in a hyperrelation corresponds to a local relation that is semantically equivalent to the hyperrelation. The design of such a structure is for a representation of the local relations at the global level, so as to facilitate query transformation and processing based on an algebra (to be presented). The user of an MDBS is allowed to inspect the
hyperrelation schemas in order to issue a global query. The environment of
such a multidatabase system is illustrated in Fig. 7, in which the hyperrelations
in the middle represent the global schema of the component databases. Each
global query is issued against the schemas of the hyperrelations.
Note that the hyperrelation is not just a directory or catalog, like those in homogeneous distributed databases that keep track of the attribute information of local relations. A hyperrelation also needs to reflect the conflicts between
relations in a uniform manner, and more importantly, it allows a global query
that accesses data in different expressions to be translated and executed locally
in component databases. We discuss these issues in the following section.
C. Schema Conformation: R" Schema and the Mapping
We discuss here how to determine the hyperrelation schema from (semantically
equivalent) relations of conflicting schemas and how to map them onto a hyperrelation. A combination of more than one type of conflict can be decomposed
into the above basic types and mapped to hyperrelations by following their corresponding mapping mechanisms. We start our description from the simpler (more intuitive) types of conflict to allow the reader to comprehend the whole mapping scheme more easily.
1. Value-versus-Value Conflict
Let R1(a11, . . . , a1r1), R2(a21, . . . , a2r2), . . . , Rm(am1, . . . , amrm) be a set of relations of different databases among which there exists a missing attribute conflict. According to the requirements of lub, the set of attributes of their corresponding hyperrelation must be the union of all attributes of the relations, i.e., {a11, . . . , a1r1} ∪ {a21, . . . , a2r2} ∪ . . . ∪ {am1, . . . , amrm}. In the union, {ap} ∪ {aq} = either {ap} or {aq}, but not both, if they are corresponding attributes. The mapping of Ri to this hyperrelation is straightforward: Ri.aij is the value of the corresponding attribute in R^H (for 1 ≤ i ≤ m, 1 ≤ j ≤ ri) and is a NULL if Ri does not have such an attribute.
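The sketch below (hypothetical Python, continuing the assumptions of the previous sketch) builds hyperrelation tuples from local schemas: the name of each local attribute is written into the column(s) of its corresponding hyperrelation attribute(s), and columns for which a relation has no corresponding attribute are set to NULL (None).

def map_to_hyperrelation(db_name, schemas, correspondence, hr_attrs):
    """Return one hyperrelation tuple (Database, Relation, A1, ..., An) per local relation.
    correspondence: (relation, attribute) -> hyperrelation attribute(s) it maps onto."""
    tuples = []
    for rel, attrs in schemas.items():
        row = {A: None for A in hr_attrs}           # missing attributes become NULL
        for a in attrs:
            for A in correspondence.get((rel, a), (a,)):
                row[A] = a                          # the *name* of the local attribute is the data
        tuples.append((db_name, rel) + tuple(row[A] for A in hr_attrs))
    return tuples

if __name__ == "__main__":
    hr_attrs = ["StudNo", "FN", "LN", "Street", "City", "State", "Gender"]
    schemas = {"Stud": ["S#", "Name", "Addr", "Gender"]}
    correspondence = {
        ("Stud", "S#"): ("StudNo",),
        ("Stud", "Name"): ("FN", "LN"),             # one-to-many: Name covers FN and LN
        ("Stud", "Addr"): ("Street", "City", "State"),
    }
    for t in map_to_hyperrelation("CDB1", schemas, correspondence, hr_attrs):
        print(t)  # ('CDB1', 'Stud', 'S#', 'Name', 'Name', 'Addr', 'Addr', 'Addr', 'Gender')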
One-to-Many Conflict
Let R(r1, . . . , rm) and S(s1, . . . , sn) be semantically equivalent relations of DB1 and DB2, respectively, where ri corresponds to si for 1 ≤ i ≤ m − 1 and rm corresponds to the attributes sm, . . . , sn (a one-to-many conflict). R is mapped to
(DB1, R, r1, . . . , rm−1, rm, . . . , rm)
in R^H; that is, the value of R^H.si is ri, for 1 ≤ i ≤ m − 1, and the values of R^H.si, for m ≤ i ≤ n, are defined as rm. S is mapped to
(DB2, S, s1, . . . , sm, . . . , sn)
in R^H. Both of the mappings strictly follow the attribute correspondence rule mentioned in Section II.
For an example, the student relations of CDB1, CDB2, and CDB3 are
mapped to STUDENT as shown in Fig. 8. The Stud of CDB1 in STUDENT
has Name as the values for FN and LN because of the one-to-many conflict
with the corresponding attributes in CDB2. The address information of these
136
CHIANG LEE
relations are mapped to STUDENT based on the same principle. This mapping is also important for query transformation based on algebra. We will get to that in a later section.
FIGURE 8 The hyperrelation STUDENT (Database, Relation, StudNo, FN, LN, No., Street, City, State, Tel#, Gender, Grade): the student relations Stud (CDB1), CS_Stud, Math_Stud, EE_Stud (CDB2), and Studs (CDB3) are mapped onto it; for example, the Stud tuple has Name as the value of both FN and LN and Addr as the value of No., Street, City, and State, and attributes a relation does not have (e.g., Gender in the CDB2 relations) are NULL.
Many-to-Many Conflict
As for the many-to-many conflict, the basic idea is the same. We illustrate the mapping by using a conflict example involving two attributes in one relation and three attributes in another relation. A general case will simply be an extension of the case presented here. Let r1 and r2 be attributes of R and s1, s2, s3 be attributes of S, and let {r1, r2} be the corresponding attributes of {s1, s2, s3}. Assume that r1 and r2 represent a semantics that can be further divided into (a, b) and (c, d), respectively; i.e., a, b, c, and d are subsemantics of r1 and r2. In S, the same semantics (a), (b, c), and (d) are modeled in s1, s2, and s3, respectively. Then the hyperrelation should choose a, b, c, and d as its attributes since this representation is semantically richer than the other two representations. In other words, R^H has the schema R^H(. . . , a, b, c, d, . . .) (irrelevant attributes are denoted by dots). Based on the mapping principle presented above, r1 and r2 are mapped to (. . . , r1, r1, r2, r2, . . .) in R^H, and s1, s2, and s3 are mapped to (. . . , s1, s2, s2, s3, . . .) in R^H.
3. Attribute-versus-Table Conflict
Consider the Faculty relation of CDB1 and the Faculty and Spouse relations of CDB2. As the Spouse relation is semantically different from the
Faculty relation, there should be two hyperrelations FACULTY and SPOUSE
in the global database.
Formally, let us assume that R(a1, . . . , ap, ap+1) is a local relation of DB1 that semantically corresponds to R1(a1, . . . , ap, f) and R2(f, b1, . . . , bq) of DB2, where R.ai corresponds to (i.e., is the corresponding attribute of) R1.ai, for 1 ≤ i ≤ p, R.ap+1 corresponds to R2.b1, and f is the foreign key (key) attribute of R1 (R2). R and R1 are semantically equivalent relations. In this circumstance, the hyperrelations will have the schemas R1^H(Database, Relation, A1, . . . , Ap, F) and R2^H(Database, Relation, F, B1, . . . , Bq), where R1^H is obtained from the schemas of R and R1, and R2^H from that of R2. F is a key or a foreign key of these hyperrelations to express the referencing information. We define that R1 and R2 are mapped to the tuples
(DB2, R1, a1, . . . , ap, f)
and
(DB2, R2, f, b1, . . . , bq)
in hyperrelations R1^H and R2^H, respectively. The relation R is mapped to two tuples
(DB1, R, a1, . . . , ap, ℓ)
and
(DB1, R, ℓ, ap+1, Null, . . . , Null)
in R1^H and R2^H, respectively, because ap+1 corresponds to b1 and R does not have the other (i.e., B2, . . . , Bq) attributes. ℓ is a special character indicating that relation R does not have such an attribute and that its two tuples can be grouped back into one relation schema through this link (i.e., ℓ). In this manner, a global query issued based on the hyperrelation schema can be correctly translated to local queries expressed in local schemas.
4. Value-versus-Attribute Conflict
Let Ri(ai^ ^^i? ? ^p) and 1^2(^115 ^12? ? ^ih ^2? 5 ^p) be the relations
of databases DBi and DB2, respectively, where a value-versus-attribute conflict exists between the values of Ri.ai and the attributes ^11,^i2)-.- ^^ik
of R25 and Ri.Ui corresponds to Rz.ai (no schema conflict between them)
for 2 < i < p. As [an, au, 5 ^ik) has a finer structure than ^1, the schema
of their corresponding hyperrelation should be R^(Database,
Relation,
Miy Miy . 5 Mh ^2) 5 Ap)y where aij corresponds to Aiy (for 1 < j < k)
and ai to Aj (for 2 <i < p). Mapped to R^, the schema of Ri becomes the
tuple
(DBi, R , a i , a i ,
...,ai,a2,...,ap).
Since R2 has no conflict with R^, the mapping result is simply R2's schema.
For example, the attribute Type of Course in CDB1 is mapped onto the hyperrelation COURSE as shown in Fig. 9, in which the attributes Fundamental
and Advanced of the CDB1 tuple have the value "Type."
5. Value-versus-Table Conflict
Let R(r, a1, . . . , ap) be a relation in the database DB1 that has a value-versus-table conflict with the set of relations R1(a1, . . . , ap), . . . , Rm(a1, . . . , ap) in database DB2. Formally, we say that the domain of the attribute r of R is an index set {r1, . . . , rm} on Ri (i = 1, . . . , m), where ri refers to the domain
FIGURE 9 The hyperrelation COURSE (Database, Relation, CNo, CName, Credits, Hours, Fundamental, Advanced, Prereq): the CDB1 relation Course is mapped onto it, with the value "Type" in both the Fundamental and Advanced columns and NULL where Course has no corresponding attribute.
FIGURE 10 Mapping the CDB4 relations ST1-ST4 onto the hyperrelation STUDENT: each tuple carries an index in its Grade column (e.g., CDB4.ST2.Grade* with CDB4.ST2.Grade -> {sophomore}), since the class of a student is represented by the table it belongs to in CDB4.
FIGURE 11 The hyperrelation STUDENT (Database, Relation, SSN, StudNo, FN, LN, No., Street, City, State, Tel#, Gender, Grade) mapped from Stud (CDB1), CS_Stud, Math_Stud, EE_Stud (CDB2), Studs (CDB3), and ST1-ST4 (CDB4); missing attributes are NULL, and the Grade values of the CDB4 tuples are indexes (e.g., CDB4.ST1.Grade*).
Following the mapping schemes described above, multidatabase relations can be mapped onto their corresponding hyperrelations. Some of the hyperrelations mapped from the relations given in Fig. 5 are shown in Figs. 11, 12, and 13.
D. The Hyperrelational Algebra
In this section, we introduce how the relational operations are extended to
hyperrelational operations and how these operations are performed on a hyperrelation. Analogous to the relational algebra in the relational model, the hyperrelational algebra provides a mathematical foundation for expressing global
queries and the transformation and optimization of the queries.
I. The Hyperrelational Operations
The hyperrelational operations are a direct extension of the relational operations. They are performed on hyperrelations. These operations include H-SELECTION (σ^H), H-PROJECTION (π^H), H-JOIN (⋈^H), H-INTERSECTION (∩^H), H-UNION (∪^H), H-DIFFERENCE (−^H), and H-PRODUCT (×^H).
FIGURE 12 The hyperrelation COURSE (Database, Relation, CNo, CName, Credits, Hours, Fundamental, Advanced, Prereq) mapped from Course (CDB1), Fndmntl_Crs and Advncd_Crs (CDB2), Lecture, Tutorial, Seminar, and Indvl_Stdy (CDB3), and Courses (CDB4); missing attributes are NULL, and some tuples carry indexes such as CDB2.Fndmntl_Crs.Fundamental*.
FIGURE 13 The hyperrelation TAKE (Database, Relation, SSN, SNo, CNo, Score, Grade) mapped from Take (CDB1), Participate (CDB2), Takes (CDB3), and Enroll (CDB4).
(c1, v1, . . . , vm)) × . . . × {(cn)} × r(v1, . . . , vm).
The H-SELECTION Operation

σ^H_SC(R^H) = σ_SC1(*t1) ∪ . . . ∪ σ_SCn(*tn),

where σ and ∪ are the relational operator SELECTION and the set operator UNION, respectively. Assume that SC is a minterm and is expressed as SC = (a_i1 θ1 ω1) ∧ (a_i2 θ2 ω2) ∧ . . . ∧ (a_ij θj ωj), where a_ik is an attribute of R^H, θk ∈ {>, ≥, =, ≠, <, ≤}, and ωk is a constant value, for all k = 1, . . . , j. If the values of the attributes a_i1, . . . , a_ij of the tuple tp (p = 1, . . . , n) are v_i1, . . . , v_ij, respectively, and none of them is a NULL, then SCp = (v_i1 θ1 ω1) ∧ (v_i2 θ2 ω2) ∧ . . . ∧ (v_ij θj ωj). If any of the attribute values is a NULL, then σ_SCp(*tp) = ∅.
The H-SELECTION operation is composed of a number of relational SELECTIONs and UNIONs. An H-SELECTION on a hyperrelation is carried out by applying SELECTIONs on the tuples of the hyperrelation, i.e., the underlying relations from the component databases, and UNIONs of the results from each component database. For instance, given the hyperrelation STUDENT as shown in Fig. 11, the hyperrelational query

σ^H_Grade='freshman'(STUDENT)
SName, Addr, NULL) and (db2, r2, S#, SName, Addr, Sex) be two tuples of the
hyperrelation. If a selection query is to find the information of students whose
gender is female, then none of the data of relation ri should be considered,
as the gender of those students is unknown. Only the relation corresponding
to the second tuple (db2.r2) should be processed. However, if maybe values
are welcome in a particular environment, the hyperrelational algebra will still
work by changing the definition of H-selection to allow selection on uncertain
information.
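As an illustration of how an H-SELECTION can be carried out, the following sketch (hypothetical Python; the relation and attribute names are only examples) compiles the operation into one local SQL selection per hyperrelation tuple, substituting local attribute names for the global ones and skipping tuples in which a referenced attribute is NULL, whose contribution is the empty set.

def compile_h_selection(hyperrelation, condition):
    """hyperrelation: list of dicts {'Database':..., 'Relation':..., attr: local_name or None}.
    condition: list of (global_attr, op, constant) terms joined by AND (a minterm)."""
    local_queries = []
    for t in hyperrelation:
        if any(t.get(attr) is None for attr, _, _ in condition):
            continue                                   # NULL-mapped attribute: contributes the empty set
        preds = " AND ".join(f"{t[attr]} {op} '{const}'" for attr, op, const in condition)
        local_queries.append(
            f"SELECT * FROM {t['Relation']} WHERE {preds}"   # executed at t['Database']
        )
    return local_queries                               # the MDBS (outer-)unions the results

if __name__ == "__main__":
    STUDENT = [
        {"Database": "CDB1", "Relation": "Stud", "Grade": "Class", "Gender": "Gender"},
        {"Database": "CDB2", "Relation": "CS_Stud", "Grade": "Grade", "Gender": None},
    ]
    for q in compile_h_selection(STUDENT, [("Grade", "=", "freshman")]):
        print(q)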
An interesting aspect of this algebra is that, as some relations corresponding to an R^H may not have all the attributes defined in R^H, the issue of union compatibility [28] is involved in the UNION operation in σ^H (refer to the definition). There are two choices in dealing with this problem:
1. Strictly require that all relations to be unioned have the same set of attributes. If any relation does not meet this condition, it is removed from the σ^H expression.
2. Take the outer-union approach, which releases the union compatibility
constraint by allowing relations of different sets of attributes to be
unioned and the resulting relation contains all attributes from the
unioned relations.
We take the second approach for the hyperrelational algebra, because it allows
the user to obtain the largest set of information as the result. If the first choice is
taken, then following the last example a relation in R^ (DB, Rel, Studno, Name,
Address, Gender) that has the schema {dba, r3, S#, SName, NULL, Sex) will
not be processed (because the relations are union incompatible), even though
the sexuality of students is known. To the user, the resultant female students
not including those in rs are an incomplete set of students. The answer of
a multidatabase query is by default to inquire all existing qualifying tuples.
Hence, the second choice above is considered. A formal definition of the outerunion operation can be found in [28]. The column of Address for rs tuples can
simply be filled with NULLS to indicate nonavailability of the information.
There may be a concern that the result of an outer-union of two partially compatible relations having schemas such as Student(SSN, Name, Address, Class) and Teacher(SSN, Name, Address, Rank) will have the schema R(SSN, Name, Address, Rank, Class) (i.e., all attributes of Student and Teacher are included). Hence, an outer-union of multiple relations without guidance will likely make the semantics of the result relation too complex to be clear and precise. In our definition of the hyperrelational algebra, however, this situation will not occur because the union is confined to those relations (tuples) within the same R^H. These relations are semantically equivalent; the relations in the
above example (Student and Teacher) will never be outer-unioned in our case. Also, as the semantics of the relations mapped to an R^H is equivalent to that of the R^H, the result of the outer-union of any of the relations still has the same semantics as R^H. In the following operations, all the "∪" symbols (not including ∪^H) denote an outer-union, unless specified otherwise.
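A minimal sketch of the outer-union used above (hypothetical Python; the formal definition is in [28]): the result schema is the union of the operand schemas, and tuples are padded with NULLs (None) for attributes their source relation does not have.

def outer_union(r, s):
    """r, s: (schema, rows), where schema is a list of attribute names and rows are tuples."""
    r_schema, r_rows = r
    s_schema, s_rows = s
    schema = r_schema + [a for a in s_schema if a not in r_schema]   # union of the two schemas
    def pad(rows, src_schema):
        idx = {a: i for i, a in enumerate(src_schema)}
        return [tuple(row[idx[a]] if a in idx else None for a in schema) for row in rows]
    return schema, pad(r_rows, r_schema) + pad(s_rows, s_schema)

if __name__ == "__main__":
    r2 = (["S#", "SName", "Addr"], [("U01", "J. Smith", "Tainan")])
    r3 = (["S#", "SName", "Sex"], [("U02", "M. White", "F")])
    print(outer_union(r2, r3))
    # (['S#', 'SName', 'Addr', 'Sex'],
    #  [('U01', 'J. Smith', 'Tainan', None), ('U02', 'M. White', None, 'F')])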
The H-PROJECTION Operation

π^H_{PA}(R^H) = π_{PA1}(*t1) ∪ . . . ∪ π_{PAn}(*tn),

where π is the relational operator PROJECTION and {PAi} is the set of attributes of ti corresponding to the attributes {PA}.
Let us take the hyperrelation STUDENT shown in Fig. 11 as an example. The H-PROJECTION operation on STUDENT

π^H_{FN,LN,Tel#}(STUDENT)
employee relation, for instance, with unknown Tel# is still considered a legal
and valid tuple in the relation.
The H-JOIN Operation
The H-JOIN operation is used to select related tuples from two hyperrelations and then performs joins on each pair of the selected tuples. We define the H-JOIN operation as follows.
DEFINITION 7 (H-JOIN). Given two hyperrelations R^H and S^H having the sets of tuples {t1, . . . , tn} and {u1, . . . , um}, respectively, the H-JOIN (⋈^H) on R^H and S^H under the join condition JC is defined as

∪ ((*ti) ⋈_JCij (*uj)),

where ⋈ is the relational JOIN operation. The join condition JC is a Boolean expression specified on attributes from the two hyperrelations R^H and S^H. JCij is the transformation of JC obtained by substituting the attributes of the hyperrelations with the values of the attributes. (*ti) ⋈_JCij (*uj) (i = 1, . . . , n, and j = 1, . . . , m) is an empty set if any of the values of the attributes of ti and uj involved in JCij is a NULL.
For example, the following H-JOIN

STUDENT ⋈^H_StudNo=SNo TAKE

is equal to

(Stud ⋈_S#=S# Take) ∪ (Stud ⋈_S#=SN Participate) ∪ (Stud ⋈_S#=Null Takes) ∪ (Stud ⋈_S#=S# Enroll) ∪ (CS_Stud ⋈_SN=S# Take) ∪ (CS_Stud ⋈_SN=SN Participate) ∪ (CS_Stud ⋈_SN=Null Takes) ∪ (CS_Stud ⋈_SN=S# Enroll) ∪ . . . ∪ (Studs ⋈_SNo=S# Take) ∪ (Studs ⋈_SNo=SN Participate) ∪ (Studs ⋈_SNo=Null Takes) ∪ (Studs ⋈_SNo=S# Enroll) ∪ (ST1 ⋈_S#=S# Take) ∪ (ST1 ⋈_S#=SN Participate) ∪ (ST1 ⋈_S#=Null Takes) ∪ . . .
The database names in this case need not be given since all local relations
happen to have distinct names. Note that the H-JOIN performs a relational
join on every pair of the relations from the two operand hyperrelations. This
definition gives the general formula for a hyperrelational join operation. Note
that two students in different universities (databases) having the same student
number do not necessarily mean the same person (and most likely they are not).
To identify whether two records in different databases refer to the same real-world entity is termed the entity identification problem in the literature [11,
16, 20, 81]. As this problem is beyond the focus of this paper, we simply assume that the entity identification process is done before or after the algebraic
transformation. In an environment in which entity identification is not supported, the joins between relations of different databases become unnecessary, and a hyperrelational join such as

STUDENT ⋈^H_StudNo=SNo TAKE

is reduced to

(Stud ⋈_S#=S# Take) ∪ (CS_Stud ⋈_SN=SN Participate) ∪ (Math_Stud ⋈_SN=SN Participate) ∪ (EE_Stud ⋈_SN=SN Participate) ∪ (Studs ⋈_SNo=Null Takes) ∪ (ST1 ⋈_S#=S# Enroll) ∪ . . . ∪ (ST4 ⋈_S#=S# Enroll).
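The sketch below (hypothetical Python) enumerates the relational joins into which an H-JOIN expands; when entity identification is not supported, only pairs from the same database are kept, which yields reduced expressions like the one above. Pairs whose join attribute is NULL on either side are dropped.

def expand_h_join(R_h, S_h, r_attr, s_attr, entity_identification=True):
    """R_h, S_h: hyperrelations as lists of dicts mapping global attributes to local names (or None)."""
    joins = []
    for t in R_h:
        for u in S_h:
            if t.get(r_attr) is None or u.get(s_attr) is None:
                continue                                    # NULL join attribute: empty result
            if not entity_identification and t["Database"] != u["Database"]:
                continue                                    # cross-database joins dropped
            joins.append((t["Database"], t["Relation"], t[r_attr],
                          u["Database"], u["Relation"], u[s_attr]))
    return joins

if __name__ == "__main__":
    STUDENT = [{"Database": "CDB1", "Relation": "Stud", "StudNo": "S#"},
               {"Database": "CDB2", "Relation": "CS_Stud", "StudNo": "SN"}]
    TAKE = [{"Database": "CDB1", "Relation": "Take", "SNo": "S#"},
            {"Database": "CDB2", "Relation": "Participate", "SNo": "SN"}]
    for j in expand_h_join(STUDENT, TAKE, "StudNo", "SNo", entity_identification=False):
        print(j)   # only (Stud, Take) and (CS_Stud, Participate) remain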
The next group of hyperrelational data manipulation operations are
the mathematical set operations. These operations include H-UNION,
H-DIFFERENCE, H-INTERSECTION, and H-PRODUCT. Note that although
they are all based on the relational outer-union operations, they require that the
operand hyperrelations be union-compatible, analogous to the requirements of
their counterpart in relational algebra. Details are given as follows.
The H-UNION Operation
The H-UNION operation (∪^H) is used to merge tuples by first applying unions to two sets of relations and then applying a union again to the unioned relations.
DEFINITION 8 (H-UNION). Given two union-compatible hyperrelations R^H and S^H having the sets of tuples {t1, . . . , tn} and {u1, . . . , um}, respectively, the H-UNION (∪^H) on R^H and S^H is defined as
is an OUTER-INTERSECTION operation [28]. That is, R ∩ S (R and S are relations) will have all the attributes from R and S. Tuples of the relations having the same key values will be retained. For the same purpose, R − S does not require that R and S be union compatible. We generalize the concept of the relational difference operation and define a new operation, named the G-DIFFERENCE (generalized difference) operation, as follows.
DEFINITION 10 (G-DIFFERENCE). Given two semantically equivalent relations R and S having the schemas R(r1, r2, . . . , rn) and S(s1, s2, . . . , sm), respectively, the G-DIFFERENCE on R and S is defined as
The H-INTERSECTION Operation
DEFINITION 11 (H-INTERSECTION). Given two union-compatible hyperrelations R^H and S^H having the sets of tuples {t1, . . . , tn} and {u1, . . . , um}, the H-INTERSECTION (∩^H) on R^H and S^H is defined as
The H-PRODUCT Operation
∪ . . . ∪ ((*tn) × (*um)),
In the following, we will explore the properties of operations in the hyperrelational algebra. As we have stated in the Introduction, the properties will
serve as a mathematical basis for global query optimization at the algebraic
level.
2. Transformation of Hyperrelational Operations
Based on the above definitions, the hyperrelational operations can be transformed from one form to another. Interestingly, even though the hyperrelations
store the schemas of local relations, all the transformation rules in relational
algebra are still applicable to the hyperrelational operations. In the following,
we give the main theorems on the transformation rules for hyperrelational operations. The proofs of these properties are a direct extension of those given in
[28], and hence are omitted here. Note that in the following attr(X) denotes
the set of attributes involved in the Boolean expression X, if X is a selection
or join condition, and it can also mean the set of attributes of X, if X is a
hyperrelation.
PROPERTY 5 (IDEMPOTENCE OF σ^H, π^H).
σ^H_C R^H = σ^H_C1(σ^H_C2 R^H), if C = C1 ∧ C2,
π^H_A R^H = π^H_A(π^H_A′ R^H), if A ⊆ A′.
PROPERTY 6 (COMMUTATIVITY OF σ^H, π^H).
σ^H_C1(σ^H_C2 R^H) = σ^H_C2(σ^H_C1 R^H),
π^H_A(σ^H_C R^H) = σ^H_C(π^H_A R^H), if attr(C) ⊆ A.
PROPERTY 7 (ASSOCIATIVITY OF ∪^H, ∩^H).
(R^H θ S^H) θ T^H = R^H θ (S^H θ T^H), θ ∈ {∪^H, ∩^H}.
PROPERTY 8 (DISTRIBUTIVITY OF σ^H AND π^H OVER ⋈^H, ×^H, ∪^H, ∩^H).
σ^H_C(R^H ⋈^H S^H) = (σ^H_CRH R^H) ⋈^H (σ^H_CSH S^H), if C = CRH ∧ CSH,
π^H_A(R^H ⋈^H S^H) = (π^H_ARH R^H) ⋈^H (π^H_ASH S^H), if A = ARH ∪ ASH,
σ^H_C(R^H θ S^H) = (σ^H_C R^H) θ (σ^H_C S^H), θ ∈ {∪^H, ∩^H},
where in this property CRH and CSH are the selection conditions involving only R^H and S^H, respectively, and ARH and ASH are the projected attributes belonging to only R^H and S^H, respectively.
PROPERTY 9 (TRANSFORMATION RULE BETWEEN ∪^H, −^H, AND ∩^H).
X ∩^H Y = (X ∪^H Y) −^H ((X −^H Y) ∪^H (Y −^H X)),
where X and Y are two hyperrelations.
In the relational algebra the set of operations {σ, π, ∪, −, ×} is a complete set, i.e., any other relational operation can be expressed as a composition of operations in this set. Similarly, the complete set of the hyperrelational algebraic operations is {σ^H, π^H, ∪^H, −^H, ×^H}. The proof is also similar to that of the relational case.
3. Examples
We give a few examples to show how hyperrelational algebra works for
expressing global queries in a multidatabase environment. The databases used
in these examples are those given in Fig. 5, and the hyperrelations are shown in
Fig. 11 through Fig. 13. SQL will be the query language used in the examples.
Query 1. Find the names of the students who take the database course.
The SQL of this query is as follows:
Select FN, LN
From STUDENT, COURSE, TAKE
Where STUDENT.StudNo=TAKE.SNo
and COURSE.CNo=TAKE.CNo
and COURSE.CName='Database'
In this query, the relations STUDENT, COURSE, and TAKE are the hyperrelations. This global query can also be expressed in hyperrelational algebra as

π^H_FN,LN{σ^H_COURSE.CName='Database'[(STUDENT ⋈^H_StudNo=SNo TAKE) ⋈^H_CNo=CNo COURSE]}.
Query 2. Find the names of the courses taken by only female students or
by only male students.
Select CName
From COURSE C
Where Not Exists
   ( Select *
     From TAKE T, STUDENT S
     Where C.CNo=T.CNo
     and T.SNo=S.StudNo
     and S.Gender='Female')
Union
Select CName
From COURSE C
Where Not Exists
   ( Select *
     From TAKE T, STUDENT S
     Where C.CNo=T.CNo
     and T.SNo=S.StudNo
     and S.Gender='Male')
The algebraic expression of this global query is

{[π^H_CNo COURSE − π^H_CNo(σ^H_Gender='Female' STUDENT ⋈^H TAKE)]}
∪ {[π^H_CNo COURSE − π^H_CNo(σ^H_Gender='Male' STUDENT
(PDBS). As each PDBS is autonomous, the MDBS has no control over the internal optimization decisions within each PDBS. This makes query optimization
at the execution strategy level a difficult task. Despite the importance of the
work, it has so far received little attention. As the query optimizer of each
PDBS is a "black-box" to the MDBS, the main focus of the past research was
on how to regenerate in the MDBS the cost model of the query optimizer of each
PDBS [24]. Du et aL [24] proposed a cahbrating database that is synthetically
created so as to reasonably deduce the coefficients of cost formulas in a cost
model. Works in [27, 29, 61, 68, 86] are other examples of having a similar
approach. The effectiveness of all these schemes, however, heavily relies on the
precision of the regenerated cost models of the PDBSs. Our study reveals that
many factors affect the execution cost and the processing time of a query in a
PDBS, which makes query optimization using this (regenerating cost models)
approach unrealistic. For instance, one of the factors is that some commercial
DBMSs automatically cache a large portion of a recently accessed relation in
memory for some time, while others cache only a very small portion of such data or none at all. For those DBMSs that do cache, a subsequent query, if it happens
to involve the cached relation(s), is processed much faster than the previous
query. The two dramatically different processing times for the same operation
on the same relation could make the regenerated cost model of a PDBS (which
is usually based on those times) significantly deviate from its real execution
time. A global query optimizer using these inaccurate cost models could lead
to making incorrect decisions. An even worse situation is that there is no way
to predict how long such a relation will stay in the cache (since it depends on
many other factors), which makes the regenerated cost model in the MDBS simply untrustworthy.
Another factor that worsens the situation is that on many occasions multiple joins are performed in a PDBS. However, there is no way to predict the execution
order of these join operations. A different execution order of the joins could
lead to a dramatically different execution time. None of the previously proposed
methods covers this problem.
Recently, Du et al. [25] proposed a method for reducing query response
time by using tree balancing. This method is based on an assumption that all
PDBSs have the same processing rate for database operations. As all PDBSs are
equal in terms of processing speed, it becomes unnecessary to estimate the cost
models of the PDBSs in their method. Therefore, the issue of regenerating cost
models in that paper is ignored. In reality, however, the need for cost models for optimization still keeps their method from being realistic.
In this section, we will propose an optimization strategy at a level lower
than that of our previous work but still without the need of regenerating the cost
models of the PDBSs. We propose three query processing strategies for a truly
autonomous multidatabase environment. The first two strategies are based on
traditional optimization concepts and skills and mainly designed for the purpose
of comparing with the third strategy. We assume that all global query results
are eventually collected by the MDBS and sent from the MDBS to the global
users. The first strategy is a first-come-first-serve (FCFS) strategy. The MDBS
performs joins on relations that are received first. In other words, an intersite
join in the MDBS is performed as long as both of its operand relations have
and easier to implement than those complex cost model regeneration methods
proposed in the past.
Finally, we note that the reason for not considering the other two wellknown join algorithms, i.e., the nested-loop algorithm and the hash-based algorithm, in our method is because they are either impossible or inefficient to be
implemented in a MDBS environment. We will clarify this point further also in
a later section.
A. Assumptions
From the information exchange point of view, we can classify the participating
database systems into three categories [24]:
Proprietary DBMSs. The PDBSs can provide all the relevant
information on cost functions and database statistics to the MDBS.
Conforming DBMSs. The PDBSs provide database statistics but are
incapable of divulging the cost functions.
Nonconforming DBMSs. The PDBSs are incapable of divulging either
the database statistics or the cost functions.
Our discussion in the following is dedicated to the nonconforming DBMSs environment. The reasons are, first, that it is the least studied environment up to now; none of the past query optimization methods has been designed for this environment [24, 25, 27, 29, 61, 68, 86]. Second, it is the most realistic environment
considering the fact that almost all DBMSs today either have had or will be
supplied with an accessing interface to the World Wide Web. There are also
vendors utilizing data warehousing and digital library techniques to provide
users with various data sources. All these sources are basically heterogeneous,
same as in the nonconforming DBMSs environment. Both of the other two environments discussed in [24] require a major modification/enhancement to the
core modules of the existing DBMSs.
B. Where Should an Intersite Operation Be Performed?
One difficulty of processing a multidatabase query is that data conflicts may
exist. While performing an intersite operation (such as join, union), the conflicts
of data must first be resolved. This is normally achieved by converting local data
to a certain globally recognized format, such as that specified by the global
schema. The mapping functions (or mapping tables) of heterogeneous data and
the mapping mechanisms (software) all reside in the MDBS. Can the mapping
tables for resolving data discrepancy can be sent to a PDBS to let the PDBS
perform the tasks that are otherwise performed in the MDBS? The answer is
no because not only the mapping tables but also the mechanisms (software) are
needed to execute those conversions. Without the mechanisms, a PDBS is still
unable to convert data of different types. All past research mentioned earlier
ignored this issue in the design of an intersite join optimization strategy.
However, is it feasible to let the MDBS convert the data and the PDBS
perform the intersite joins after data discrepancy is resolved? This issue should
be examined in more detail. Basically, only two query execution schemes are
feasible: (1) PDBSs perform the operations, and the MDBS converts the data,
and (2) the MDBS performs the operations and data conversion. Let us use an
example to illustrate these two schemes. Assume that R ⋈1 S ⋈2 T is a multidatabase query, where R, S, and T belong to distinct databases.
1. PDBSs perform the operations. Assume that the databases containing R, S, and T are PDBS_R, PDBS_S, and PDBS_T, respectively. A strategy of this category contains the following steps:
(a) PDBS_R sends R to the MDBS for data conversion.
(b) The MDBS converts R to a format understandable to PDBS_S.
(c) The MDBS sends R to PDBS_S.
(d) PDBS_S performs R ⋈1 S.
(e) PDBS_S sends the result, letting it be RS, to the MDBS for another conversion.
(f) The MDBS converts RS to a format understandable to PDBS_T.
(g) The MDBS sends RS to PDBS_T.
(h) PDBS_T performs RS ⋈2 T.
(i) PDBS_T sends the result, letting it be RST, to the MDBS for conversion.
(j) The MDBS converts the result and sends it back to the user.
Certainly, we can also send S to PDBS_R to perform the join, or even perform ⋈2 before ⋈1. However, as long as joins are performed in PDBSs, the required communications between the PDBSs and the MDBS as well as the data conversion to be performed in the MDBS are basically the same.
2. The MDBS performs the operations. The same query will be executed in the following steps:
(a) PDBS_R, PDBS_S, and PDBS_T send R, S, and T, respectively, to the MDBS simultaneously.
(b) The MDBS performs conversion on these relations.
(c) The MDBS performs ⋈1 and ⋈2.
(d) The MDBS sends the result to the user.
Obviously, the first scheme involves too much communication between
each PDBS and the MDBS, causing serious overhead on query processing. It
can be aggravated in today's applications that are accessed through the World
Wide Web (WWW), a key feature desired by the users and implemented by most
DBMS vendors in their current/next generation DBMSs. On the other hand, the
second scheme does not require as much communication. In addition, many
joins performed in MDBS can be reduced to simply a merge operation if the
relations (subquery results) sent out from PDBSs are sorted. Hence, the second
scheme is a preferred option. However, the MDBS's workload is heavy in this
strategy. How to reduce query cost becomes crucial to the performance of a
MDBS.
As the MDBS is heavily involved in global query execution, a major concern of optimization is to minimize the consumption of MDBS system resources so that it will not be easily overloaded during query processing. In other
words, the optimization goal at this level should be to minimize the query execution cost so that the MDBS can serve a maximum number of concurrent
users.
Other assumptions made are, first, the autonomy of a PDBS should not be
violated or compromised for any reasons. The MDBS can only interact with a
PDBS and utilize a PDBS's processing power under this restriction. Through this
assumption, a running database system is ensured to be able to participate in a
multidatabase without the need of modifying any part of its system code for the
query optimization purpose. The second assumption is that all local operations
(i.e., involving relations of a single database) of a query must be performed
locally. A multidatabase (global) query after compilation is decomposed into
subqueries, each being processible in a PDBS. For instance, a selection operation
on a local relation should be performed before the local relation is sent to the
MDBS (to participate in an operation involving relations of multiple databases).
The processed result is then transmitted to the MDBS. Certainly, it is possible
that the result size of a local operation (such as a join) becomes larger than its
input size such that it seems to be better to send its input relations, instead of
its result, to the MDBS and let the MDBS perform the (join) operation so as
to reduce communication overhead. We do not consider this approach because
our major concern here is that the MDBS could easily be overloaded if all
locally processible tasks are sent to the MDBS. Since our optimization goal is
to minimize the query execution cost in the MDBS, processing local operations
locally becomes a natural choice.
Query optimization in this environment is also different from query optimization in centralized databases. A key difference is that in a centralized database hash-based join methods are normally considered more efficient than sort-merge join methods [12, 22]. This is because the sorting process has a complexity of O(n log n) for n records, which is much higher than the linear hash process when n is large. Even if a sort-merge join is sometimes used when the distribution of data is highly skewed and hash join becomes less efficient, we do not sort all the base relations before joining them. In the multidatabase environment under study, however, since sorting the subquery results is performed in each PDBS rather than in the MDBS, it is desirable to have them all sorted (on an attribute on which the next join is performed) before they are sent to the MDBS, to alleviate the burden of and facilitate the global joins in the MDBS. The focus becomes to design an optimization algorithm that is able to utilize as many sorted results from PDBSs as possible, a problem still different from that in traditional database systems.
From the above discussion, we see that the past query optimization methods cannot be directly applied to the new environment. New methods must be designed.
FIGURE 14 A join graph.
The GRS strategy is an iterative algorithm that selects in each iteration the join that has the lowest join selectivity factor at the current stage. This algorithm is named so because it produces the least amount of data at every join stage. The details of this strategy are listed in Fig. 16.
A number of criteria could be used to select a join [12, 60] in a greedy multijoin algorithm. Three frequently seen criteria include:
1. min(JS): select a join with the minimum join selectivity factor (producing relatively the least amount of data, or |Ri ⋈ Rj|/(|Ri| * |Rj|) being minimum).
2. min(|Ri| + |Rj|): select a join whose total amount of input data is minimum.
3. min(|Ri| * |Rj| * JS): select a join that produces the least amount of data, i.e., |Ri ⋈ Rj| being minimum.
FCFS strategy:
Input: a join graph G(V, E);
Output: a join sequence Q;
Repeat until |V| = 1
Begin
  Choose Ri ⋈ Rj such that RiRj ∈ E and Ri and Rj have arrived at the MDBS
  (if more than one join can be performed (their operands are ready), an arbitrary one is selected);
  Add Ri ⋈ Rj into Q;
  Update graph G by merging nodes Ri and Rj into Rmin(i,j);
End Begin
End Repeat
FIGURE 15 The First-Come-First-Serve (FCFS) strategy.
GRS strategy:
Input: a join graph G(V, E);
Output: a join sequence Q;
Repeat until |V| = 1
Begin
  Choose Ri ⋈ Rj such that RiRj ∈ E and it has the lowest join selectivity factor at the current stage
  (if more than one such join is available, an arbitrary one is selected);
  Add Ri ⋈ Rj into Q;
  Update graph G by merging nodes Ri and Rj into Rmin(i,j);
End Begin
End Repeat
FIGURE 16 The GRS strategy.
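A compact rendering of the GRS loop of Fig. 16 (hypothetical Python, not the authors' code): the join graph is given as a dictionary of edges with join selectivity factors, the edge with the lowest factor is chosen at each iteration, and its endpoints are merged; when merging creates parallel edges, this sketch simply keeps the smaller selectivity factor.

def grs(edges):
    """edges: {(Ri, Rj): join_selectivity_factor}. Returns the join sequence Q."""
    edges = dict(edges)
    Q = []
    while edges:
        (ri, rj), _ = min(edges.items(), key=lambda e: e[1])   # lowest selectivity factor first
        Q.append((ri, rj))
        merged = min(ri, rj)
        new_edges = {}
        for (a, b), js in edges.items():
            if {a, b} == {ri, rj}:
                continue                                       # the chosen edge disappears
            a = merged if a in (ri, rj) else a                 # merge nodes Ri and Rj
            b = merged if b in (ri, rj) else b
            if a != b:
                key = (min(a, b), max(a, b))
                new_edges[key] = min(js, new_edges.get(key, js))  # keep one edge per node pair
        edges = new_edges
    return Q

if __name__ == "__main__":
    g = {("R0", "R1"): 0.01, ("R1", "R2"): 0.20, ("R2", "R3"): 0.05}
    print(grs(g))   # [('R0', 'R1'), ('R2', 'R3'), ('R0', 'R2')]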
Select * From RA Where RA.A1 < 20
Select * From RA Where 20 < RA.A1 < 40
Select * From RA Where 40 < RA.A1 < 60
Select * From RA Where 60 < RA.A1 < 80
Select * From RA Where 80 < RA.A1

Select *
From RA
Order By RA.A1
The same query but asking for a sorted order on attribute Bl will be sent
to database B. The results obtained from these two databases can then be easily
merged by the MDBS to obtain the join result. This sort-merge approach is
much simpler and more efficient than the hash join approach. As merge is a
simple and linear process, the workload added onto the MDBS is extremely
light.
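The merge step at the MDBS can be pictured with the following hypothetical Python fragment (not from the chapter): each PDBS returns its tuples already sorted on the join attribute, as requested by the Order By queries above, and the MDBS performs a single linear pass. The relation and attribute names follow the example (RA.A1 joined with database B's attribute B1); the data values are invented.

def merge_join(sorted_a, sorted_b, key_a, key_b):
    """Merge two inputs sorted on the join attribute; yield joined tuples."""
    i = j = 0
    while i < len(sorted_a) and j < len(sorted_b):
        ka, kb = sorted_a[i][key_a], sorted_b[j][key_b]
        if ka < kb:
            i += 1
        elif ka > kb:
            j += 1
        else:
            j0 = j                      # remember the start of B's matching group
            while j < len(sorted_b) and sorted_b[j][key_b] == ka:
                yield {**sorted_a[i], **sorted_b[j]}
                j += 1
            i += 1
            # re-scan B's group if the next A tuple carries the same key value
            if i < len(sorted_a) and sorted_a[i][key_a] == ka:
                j = j0

RA = [{"A1": 5, "x": "a"}, {"A1": 7, "x": "b"}]                       # sorted result from database A
RB = [{"B1": 5, "y": "c"}, {"B1": 7, "y": "d"}, {"B1": 9, "y": "e"}]  # sorted result from database B
print(list(merge_join(RA, RB, "A1", "B1")))                           # two joined tuples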
The nested-loop join algorithm is the easiest to implement since it only
requires the MDBS to send the following query to each PDBS to access the
data:
Select *
From RA
The PDBS need not process the relation in any manner. Since it does not fully
utilize the PDBS's processing power, we do not consider it a viable approach.
One may wonder whether it is possible for database A and database B to send one block of data at a time to the MDBS so that the MDBS simply joins these two blocks of data in memory. In this manner, the MDBS need not save the entire relations RA and RB to disk in order to run the nested-loop join algorithm, which helps to reduce the workload of the MDBS. This scheme, however, is infeasible because the required low-level handshaking mechanism violates the autonomy of a PDBS. Thus, it is not implementable in a nonconforming DBMS environment.
The following table summarizes the features of the three join algorithms
implemented in a nonconforming DBMS environment. It is apparent that the
sort-merge join is the most suitable algorithm for joins in a multidatabase
environment, and therefore it is adopted as the global join algorithm in this
research.
Algorithm            Consumed PDBS processing power    Consumed MDBS processing power
Hash join            Very high                         Medium
Sort-merge join      Medium                            Low
Nested-loop join     Low                               Very high
FIGURE 17 One of the maximum matchings found using the maximum matching algorithm.
FIGURE 18
Our maximum merging strategy (MMS) is designed to work for such general
query graphs.
We take a simple and yet effective approach to maximize the number of
matching pairs in order to reduce the complexity of our algorithm. The basic
idea is that we first use the existing maximum matching algorithm to find the
maximum number of relation pairs, and let this graph be G. Then we identify
the relations connected through the same join attribute, such as those enclosed in a dotted oval in Fig. 18. For each set, say C, of these relations, we check whether a benefit
(more matching pairs) is gained if the part of G corresponding to C is replaced
by C. If it is beneficial, then they are replaced by C. If the number of matching
pairs does not change by replacing them with C, then we check whether the total
intermediate result size after the replacement will decrease. If it decreases in size,
then the replacement is still adopted because it helps to reduce the workload
of the MDBS. The same process continues until all such sets are examined. In
order not to miss a large set (i.e., with more nodes), this process should start
from the largest set and proceed in a descending order of the set size (number
of nodes in a set) because selecting one set of matching relations might affect
the selection of a set that is considered later.
Let us illustrate our idea by using the example in Fig. 18. Assume that the maximum matching pairs found by using the maximum matching algorithm are those shown in Fig. 17, but note that their join attributes should be the same as those in Fig. 18. There are three sets of relations, each set of relations being connected through the same attribute. In descending order of the size, they are the set A = {R0, R1, R2, R3, R4}, the set C = {R6, R7, R9, R10}, and the set B = {R4, R5, R6}, in which set B is not circled because eventually it is not selected in our result. For convenience, let us denote the set of relations {Ri, Rj, ..., Rk} that are connected through the same attribute as RiRj...Rk. We start from set A, replace the matching pairs R0R1 and R3R4 by the set of matching relations R0R1R2R3R4, and check whether the number of matching relations in the entire graph increases. Since in this case the number is increased from five to seven, we adopt the new set. Next, we consider the set C. If the matching pairs R5R6, R7R8, and R9R10 are replaced by the set R6R7R9R10, the number of edges is still three. So whether they will be replaced by the new set is determined by the intermediate result size. That is, assuming that the result relation of Ri ⋈ Rj ⋈ ... ⋈ Rk is denoted as Rij...k, and the size of relation R is denoted as |R|, then the total size of the intermediate result relations (for only the part of involved relations R5, R6, R7, R8, R9, R10) before the replacement is |R56| + |R78| + |R910|, and that after the replacement is |R5| + |R67910| + |R8|. If the size becomes smaller, then the replacement is adopted. Otherwise, it is rejected. If C is adopted, then B need not be checked because only one node, R5, is left for B. Otherwise, the set B is examined similarly. We list our algorithm details in the following.
Maximum Merging Strategy
Input: join graph G(V, E);
Output: join strategy Q;
Let |V| be the number of vertices of G, and |E| be the number of edges of G;
We denote the join result of R1 ⋈ R2 ⋈ ... ⋈ Rk as R12...k, and the size of R as |R|;
Use the maximum matching algorithm to find for G a set of matching pairs
SG with the number of matching pairs = M;
Assume that there are I sets of connected vertices. The edges of each set have
the same label (i.e., join attribute) and the number of vertices in each set
is >= 3, where 0 <= I <= ⌊|V|/3⌋;
Each of these sets of vertices is a subgraph of G, and we denote such a subgraph
as Gi(Vi, Ei), where 0 < i <= I;
Let Vi = {Ri1, Ri2, ..., Ri|Vi|};
Sort the Gi's in descending order of size (i.e., G1 is the largest set);
Let Q = {};
For i = 1 to I Do
Begin
G'(V', E') = G(V, E) - Gi(Vi, Ei);
Find maximum matching pairs for G' using the maximum matching
algorithm; let the number of matching pairs be M';
CASE
M < M' + |Ei| :
Begin
G(V, E) = G'(V', E');
M = M' + |Ei|;
Q = Q + {Ri1 Ri2 ... Ri|Vi|};   /* accept Gi */
End
M = M' + |Ei| :
Begin
If the total size of the intermediate result relations decreases after the
replacement, i.e., Σ|Rhk| + Σ|Rfg| > Σ|Rh| + |Ri1 i2 ... i|Vi||
/* RhRk is a matching pair of SG that is broken by the selection of Gi,
with Rh outside Vi and Rk in Vi; RfRg is a matching pair of SG residing in Gi. */
then   /* accept Gi */
G(V, E) = G'(V', E');
M = M' + |Ei|;
Q = Q + {Ri1 Ri2 ... Ri|Vi|};
End
M > M' + |Ei| :
Begin
Do nothing;
End
End-of-For
Find matching pairs of G(V, E) using the maximum matching algorithm and
let SG = {R11R12, R21R22, ..., Rk1Rk2} be the set of matching pairs;
/* G(V, E) might have been updated so that a recalculation is still needed. */
Q = Q + SG;
/* End of algorithm */
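The skeleton of this procedure can be sketched in Python as follows. This is a rough illustration under assumptions: the maximum matching routine of the networkx library stands in for "the existing maximum matching algorithm", and the intermediate-result-size tie-break of the M = M' + |Ei| case is omitted, so a same-attribute set is accepted whenever it does not decrease the number of covered joins.

import networkx as nx

def mms_skeleton(edges):
    """edges: list of (Ri, Rj, join_attribute) triples of the join graph."""
    G = nx.Graph()
    for ri, rj, attr in edges:
        G.add_edge(ri, rj, attr=attr)

    def groups(g):
        # connected vertex sets (>= 3 nodes) whose edges carry the same attribute
        out = []
        for attr in {a for _, _, a in edges}:
            sub = g.edge_subgraph(
                [(u, v) for u, v, d in g.edges(data=True) if d["attr"] == attr])
            out += [c for c in nx.connected_components(sub) if len(c) >= 3]
        return sorted(out, key=len, reverse=True)        # largest set first

    Q = []
    M = len(nx.max_weight_matching(G))                   # unit weights: maximum matching
    for Vi in groups(G):
        Ei = G.subgraph(Vi).number_of_edges()
        Gp = G.copy()
        Gp.remove_nodes_from(Vi)
        Mp = len(nx.max_weight_matching(Gp))
        if M <= Mp + Ei:                 # accepting Gi does not lose matching pairs
            Q.append(tuple(sorted(Vi)))  # these relations are merge-joined together
            G, M = Gp, Mp + Ei
    Q += [tuple(sorted(e)) for e in nx.max_weight_matching(G)]
    return Q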
At the end of this algorithm, the relations that will be merge joined are
determined. The results of these merge joins, however, could be unsorted on
the attribute to be joined next. In general, these later stages of joins could have
either one or two of the input relations being unsorted on the join attribute.
From Step 1 to Step 4 this model creates tree queries. Step 5 is to generate
cyclic queries based on a certain probability.
The Steps for Modeling Join Attributes
We say that two edges are adjacent if they have the same terminal node.
1. Assign different join attributes to all edges. Let all edges be unmarked.
2. Randomly select an edge as the initial edge and denote it as I.
3. The join attribute of an unmarked edge, say Ej, that is adjacent to I may be changed to the same join attribute as I with a probability of 1/4. If Ej's join attribute is changed to be the same as I's join attribute, then mark Ej. After all unmarked edges adjacent to I have gone through this process, mark I.
4. If all edges of the join graph are marked, the protocol stops. Otherwise, arbitrarily select an unmarked edge, let it be the new initial edge I, and go to Step 3.
Through these steps, we generate 1000 join queries (for a given Nr) to test
the performance of the proposed strategies.
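As a concrete illustration of the modeling steps above (an assumed helper, not the authors' generator), the following Python fragment labels the edges of a join graph; edges are (u, v) pairs and two edges are adjacent when they share a terminal node.

import random

def label_join_attributes(edges, p=0.25, rng=random):
    attr = {e: i for i, e in enumerate(edges)}   # Step 1: all attributes different
    marked = set()
    current = rng.choice(edges)                  # Step 2: the initial edge I
    while True:
        for e in edges:
            if e in marked or e == current:
                continue
            if set(e) & set(current) and rng.random() < p:   # Step 3
                attr[e] = attr[current]
                marked.add(e)
        marked.add(current)
        unmarked = [e for e in edges if e not in marked]
        if not unmarked:                         # Step 4: every edge is marked
            return attr
        current = rng.choice(unmarked)

edges = [("R1", "R2"), ("R2", "R3"), ("R3", "R4"), ("R2", "R4")]
print(label_join_attributes(edges))              # edge -> join-attribute id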
2. Parameter Setting
Parameters used in our study and their default values are summarized in
Fig. 20. These values are standard values and have been used in many other
works [37, 62]. Also similar to other research, we assume that the memory
size is large enough to contain at least a pair of corresponding blocks of the
two relations to be joined so as to simplify our simulation task. The execution
cost is measured in the number of disk accesses. CPU cost is ignored since it is
normally much smaller than disk I/O cost.
3. Results and Discussion
In our experiments, the size of each relation received by the MDBS is assumed to be 10^4 tuples. Hence, if JS is equal to 10^-4, the result of a join will have the same size as the source relations, i.e., |R ⋈ S| = |R| = |S|. In order to examine the effect of the join selectivity factor on the performance of the strategies, we divide JS into three ranges.
1. 2 × 10^-5 to 8 × 10^-5: represents a low join selectivity factor. A join in this case will produce a result of only 20-80% of its input size.
2. 8 × 10^-5 to 1.2 × 10^-4: stands for a medium join selectivity factor. The result size of such a join falls between 80 and 120% of the input size.
Symbol    Meaning                                   Value
Nr        The number of relations                   5-30
|R|       The number of tuples of relation R        10^4
Ts        The size of each tuple                    200 bytes
Ps        The size of each page                     4 Kbytes
JS        The join selectivity factor               2 × 10^-5 - 2 × 10^-4

FIGURE 20 Parameters used in the experiments and their default values.
3. 1.2 × 10^-4 to 1.8 × 10^-4: stands for a high join selectivity factor. The result size of such a join grows to more than 120% of the input size, up to 180%.
Low Join Selectivity
We first examine the result when JS is low. Figure 21 shows the performance
ratio of all strategies under a low JS. The performance ratio is obtained by
dividing the average execution cost of the generated queries of a proposed
strategy by that of the FCFS strategy. "X" in the label of the vertical axis of the
figure means one of the three strategies. Hence, the curve for FCFS is always
100%, and X's curve below FCFS's curve means that X is better than FCFS and
above FCFS's curve means X is worse.
As we expected, FCFS is the worst because it does not utilize any knowledge about the join selectivity factors in scheduling the join operations. GRS is better than FCFS by only about 10%. MMS is the best and outperforms the other two consistently by about 50%. In other words, it consumes only about half of the computing power that is required by the other two strategies. This is mainly because that strategy reduces a join to a simple merge process. GRS is only slightly better than FCFS because the greedy scheduling scheme provides some benefit, but this advantage is overshadowed by the large cost of performing joins. The consistent benefit margin of MMS over the others indicates that it is independent of Nr, the number of relations.
FIGURE 21 Performance ratio of the strategies under a low JS versus the number of relations.
Medium Join Selectivity
FIGURE 22 Ratio of X to FCFS versus the number of relations under a medium JS (legend: FCFS, GRS, MMS).
range of the given number of relations in this experiment. Its saving will become
more apparent when the number of relations (Nr) increases outside the range
in this experiment.
High Join Selectivity
Figure 23 gives the result of high JS. In this case, both GRS and MMS
outperform FCFS by a great margin, especially when Nr is large. This is because
the JS is high such that the output size keeps increasing during the joins. Hence,
if the number of joins increases, the benefit over FCFS increases too, enlarging
the difference between FCFS and the other two strategies. Also we note that the
performance of GRS approaches that of MMS at large Nr. This indicates that
GRS is actually a good choice too under the condition that JS is high because
this algorithm is simple, easy to implement, and does not require the PDBSs to
sort their output relations. The only limitation is that it is confined to a high JS.
In summary, the MMS strategy is the best and outperforms the other two
by a significant margin in all circumstances. GRS can be a second choice, in
an environment where implementation complexity is a consideration and the
join selectivity factor is known to be large. We have also conducted other sets
of experiments by varying different parameters. Because of space limitations, we
will present them in our future report.
FIGURE 23 Ratio of X to FCFS versus the number of relations under a high JS (legend: FCFS, GRS, MMS).
V. CONCLUSIONS
In a multidatabase environment, query processing is a time-consuming task, as multiple levels of conversions (of the data as well as the query) are involved. Query optimization in such an environment becomes especially involved. This issue, however, has not attracted much attention from researchers in the past.
In this chapter, we first classify the possible approaches to optimization at different levels. Among these levels, our focus is solely on the algebra level and the execution strategy level. The reasons are that they are the key and necessary tasks of multidatabase query optimization, and that optimization at these two levels is more practical and implementable.
For optimization at the algebra level, the task is to first find a uniform representation of conflicting relation schemas for multidatabase query translation.
Although there has been plenty of research discussing the semantic conformation issues as cited earlier, there is still a lack of a thorough investigation on
the representation and manipulation of data of conflicting schemas. The hyperrelational approach presented here provides support for dealing with these
problems. Key concepts discussed for this algebra are as follows:
1. The notion of the least upper bound for a set of semantically equivalent
relations is defined. Based on this definition, a hyperrelation can be found for
each set of such relations. Instead of being a record keeping attribute values of
an entity, a tuple in a hyperrelation keeps the schema information of a local
relation. We presented in detail how schemas of each type of conflict are mapped
to their corresponding hyperrelation.
2. Extending the notion of relational algebra, we developed a hyperrelational algebra that is performed on hyperrelations. Provided with algebraic
transformation rules, the hyperrelational algebra serves as a sound basis for
multidatabase query translation and optimization.
Having a clear and succinct tabular representation of conflicting local schemas, we believe that SQL can be used as a multidatabase DDL/DML with minimal changes, because the tabular structure of a hyperrelation is not much different from that of a relation. This allows a global user to handle a multidatabase system easily, and it also greatly reduces the effort of implementing a multidatabase system. The view mechanism, a direct application of the query language, can therefore be easily dealt with in the hyperrelational approach.
For optimization at the execution strategy level, we discussed that conflict
resolution on values of relations from different databases is often required in
performing intersite database operations because of the heterogeneity of data
between autonomous databases. Previous research ignored this heterogeneity
in their query optimization algorithms. In this chapter, we discussed this problem and argued that the MDBS, rather than an arbitrary PDBS, must perform
intersite operations because only the MDBS has the required conflict resolution
information and mechanisms. The transformed (and optimized) algebraic expression of a global query (obtained from the previous level of optimization)
is used to find an optimal execution strategy. As the MDBS is the site for executing intersite operations, it can easily become a bottleneck while serving multidatabase users/applications. We presented algorithms that minimize the consumption of system resources so that the MDBS bottleneck problem can be alleviated. The advantages of our methods are (1) knowledge of the cost models of the PDBSs is not needed, so there is no need to regenerate all the PDBSs' cost models in the MDBS (this is, however, required in all previous works), (2) our methods reduce the load of the MDBS and even distribute part of the task (sorting) to the PDBSs, and (3) our methods are much simpler and easier to implement than the complex cost-model regeneration methods proposed before.
Overall, query optimization in a multidatabase system can be achieved from
multiple levels. Optimization techniques at these levels can be combined into a
complete query compilation/transformation/optimization framework and implemented as modules in a multidatabase query optimizer. This integration
framework is especially crucial for new applications such as data warehouses,
web-based data sources, and mobile information systems, because these applications involve a large number of information sources. We expect that the
presented techniques can soon be realized in the next generation information
systems.
REFERENCES
1. A Special Issue on Heterogeneous Database (March, S. Ed.), ACM Comput. Surveys 22(3):
1990.
2. Agarwal, S., Keller, A. M., Wiederhold, G., and Saraswat, K. Flexible relation: An approach for
integrating data from multiple, possibly inconsistent databases. In International Conference on
Data Engineering, March 1995.
3. Ahmed, R., Smedt, P. D., Du, W., Kent, W., Ketabchi, M. A., Litwin, W. A., Rafii, A., and Shan, M. C. The Pegasus heterogeneous multidatabase system. IEEE Computer 24(12):19-27, 1991.
4. Batini, C., and Lenzerini, M. A methodology for data schema integration in the entity relationship model. IEEE Trans. Software Engrg. 10(6):650-664, 1984.
5. Batini, C., Lenzerini, M., and Navathe, S. A comparative analysis of methodologies for database schema integration. ACM Comput. Surveys 18(4):1986.
6. Bregolin, M. Implementation of multidatabase SQL, technical report. University of Houston,
May 1993.
7. Breitbart, R., Olson, P. L., and Thompson, G. R. Database integration in a distributed heterogeneous database system. In Proc. IEEE Data Engineering Conference., pp. 301-310, 1986.
8. Bright, M. W., Hurson, A. R., and Pakzad, S., Automated resolution of semantic heterogeneity
in multidatabases. ACM Trans. Database Systems 19(2):212-253, 1994.
9. Chatterjee, A., and Segev, A. Data manipulation in heterogeneous databases. SIGMOD Record
20(4):1991.
10. Chartrand, G., and Oellermann, O. R. Applied and Algorithmic Graph Theory. McGraw-Hill,
New York, 1993.
11. Chen, A. L. P. Outerjoin optimization in multidatabase systems. In International Symposium
on Databases in Parallel and Distributed Systems, 1990.
12. Chen, M.-S., Yu, P. S., and Wu, K.-L. Optimization of parallel execution for multi-join queries.
IEEE Trans. Knowledge Data Engrg. 8(3):416-428, 1996.
13. A Special Issue on Heterogeneous Distributed Database Systems (Ram, S. Ed.), IEEE Computer
24(12):1991.
14. Czejdo, B., Rusinkiewicz, M., and Embley, D. W. An approach to schema integration and query
formulation in federated database systems. In Proceedings of Third International Conference
on Data Engineering, 477-484,1987.
16. Czejdo, B. and Taylor, M. Integration of database systems using an object-oriented approach.
In International Workshop on Interoperability in Multidatabase Systems, Kyoto, Japan, 1991,
pp. 30-37.
17. Dayal, U. Query processing in a multidatabase system. In Query Processing in Database Systems
(Kim, W., Reiner, D. S., and Batory, D. S. Eds.), pp. 81-108. Springer-Verlag, Berlin, 1984.
18. Dayal, U., and Hwang, H. View definition and generalization for database integration in multidatabase: A system for heterogeneous distributed database. IEEE Trans. Software Engrg.
10(6):628-644, 1984.
19. DeMichiel, L. G. Performing Operations over Mismatched Domain, In IEEE Proceedings of
the Fifth International Conference on Data Engineerings 1989.
20. DeMichiel, L. G. Resolving database incompatibility. IEEE Trans. Knowledge Data Engrg.
1(4):1989.
21. DeMichiel, L. G. Resolving database incompatibility: An approach to performing relational
operations over mismatched domains. IEEE Trans. Knowledge Data Engrg. l(4):485-493,
1989.
22. DeWitt, D. J. et al. Implementation techniques for main memory database systems. In Proc.
ACM SIGMOD Int. Conf on Management of Data, June 1984, pp. 1-8.
23. DeWitt, D. J. et al. The gamma database machine project. IEEE Trans. Knowledge Data Engrg. 2(1):44-62, 1990.
24. Du, W., Krishnamurthy, R., and Shan, M. Query optimization in heterogeneous DBMS. In
Proc. of the 18th VLDB Conference, pp. 277-291, 1992.
25. Du, W., Shan, M., and Dayal, U. Reducing multidatabase query response time by tree balancing.
In Proc. of the ACM SIGMOD Conference, pp. 293-303, 1995.
26. Dwyer, P. A., and Larson, J. A. Some experiences with a distributed database testbed system. Proc. IEEE 75(5):1987.
27. Egyhazy, C. J., Triantis, K. R, and Bhasker, B. A query processing algorithm for a system of
heterogeneous distributed databases. Distrib. Parallel Databases. 4(l):49-79, 1996.
28. Elmasri, R., and Navathe, S. B. Fundamentals of Database Systems. Benjamin/Cummings, Redwood City, CA, 1994.
29. Evrendilek, C., Dogac, A., Nural, S., and Ozcan, F. Multidatabase query optimization. Distrib.
Parallel Databases. 5(1):77-114,1997.
30. Gangopadhyay, D., and Barsalou, T. On the semantic equivalence of heterogeneous representations in multimodel multidatabase systems. SIGMOD Record 20(4): 1991.
31. Geller, J., Perl, Y., and Neuhold, E. J. Structural schema integration in heterogeneous multidatabase systems using the dual model. In Proc. the 1st RIDE-IMS Workshop, Kyoto, Japan,
1991.
32. Graefe, G. Query evaluation techniques for large databases. ACM Comput. Surveys 25(2):1993.
33. Grant, J., Litwin, W., Roussopoulos, N., and Sellis, T. Query languages for relational multidatabases. VLDB Journal 1(1):1993.
34. Heimbigner, D., and McLeod, D. A federated architecture for information management. ACM
Trans. Office Inform. Systems 253-278,1985.
35. Hsiao, D. K., and Kamel, M. N. Heterogeneous database: Proliferations, issues, and solutions.
IEEE Trans. Knowledge Data Engrg. 1(1):1989.
36. Hua, K. A., Lee, C., and Young, H. Data partitioning for multicomputer database systems: A
cell-based approach. Inform. Systems 18(5):329-342, 1993.
37. Hua, K. A., Lee, C., and Hua, C. M. Dynamic load balancing in multicomputer database system
using partition tuning. IEEE Trans. Knowledge Data Engrg. 7(6):968-983,1995.
38. Hwang, H. Y., Dayal, U., and Gouda, M. Using semiouterjoin to process queries in multidatabase systems. In ACM SIGMOD Conference on Management of Data, 1984.
39. Kambayashi, Y., Rusinkiewicz, M., and Sheth, A., (Eds.) Proceedings of the First International
Workshop on Interoperability in Multidatabase Systems, Kyoto, Japan, 1991.
40. Schek, H-J., Sheth, A., and Czejdo, B., (Eds.). Proceedings of the International Workshop on
Interoperability in Multidatabase Systems, Vienna, Austria, 1993.
41. Kamel, M. N., and Kamel, N. N. Federated database management system: Requirements, issues
and solutions. Comput. Commun. 15(4):1992.
42. Kent, W. Solving domain mismatch and schema mismatch problems with an object-oriented
database programming language. In Proceedings of the International Conference on Very Large
Data Base, 1991.
43. Kim, W., and Seo, J. Classify schematic and data heterogeneity in multidatabase systems. IEEE
Computer 24(12):12-17, 1991.
44. Krishnamurthy, R., Litwin, W., and Kent, W. Language features for interoperability of databases
with schematic discrepancies. In Proc. of the SIGMOD Conference, pp. 40-49, 1991.
45. Lee, C., and Tsai, C. L. Strategies for selection from heterogeneous databases. In Proceedings of the Third International Symposium on Database Systems for Advanced Applications, Taejon, Korea, April 1993.
46. Lee, C., Chen, C.-J., and Lu, H. An aspect of query optimization in multidatabase systems. ACM SIGMOD Record 24(3):28-33, 1995.
47. Lee, C., and Chang, Z.-A. Utilizing page-level join index for optimization in parallel join execution. IEEE Trans. Knowledge Data Engrg. 7(6):1995.
48. Lee, C., and Wu, M.-C. A hyperrelational approach to integration and manipulation of data in multidatabase systems. Int. J. Intell. Cooperative Inform. Systems 5(4):395-429, 1996.
49. Lee, C., and Chen, J. R. Reduction of access sites for query optimization in multidatabase systems. IEEE Trans. Knowledge Data Engrg. 9(6):941-955, 1997.
50. Lee, C., Ke, C.-H., and Chen, Y.-H. Minimization of query execution in multidatabase systems. Int. J. Cooperative Inform. Systems, 2002.
51. Lim, E.-R, Srivstav, J., Prabhakar, S., and Richardson, J. Entity identification in database
integration. In Proc. of the International Conference on Data Engineering, pp. 294-301,
1993.
52. Litwin, W. MALPHA: A relational multidatabase manipulation language. In Proceedings of
the International Conference on Data Engineering, April 1984.
53. Litwin, W., and Abdellatif, A. Multidatabase interoperability. IEEE Computer 1986.
54. Litwin, W., and Vigier, P. Dynamic attributes on the multidatabase system MRDSM. In Proc.
of the International Conference on Data Engineering, 1986.
56. Litwin, W., and Abdellatif, A. An overview of the multi-database manipulation language mDSL.
Proc. IEEE 75(5):1987.
57. Litwin, W., Mark, Leo., and Roussopoulos, N. Interoperability of multiple autonomous
databases. ACM Comput. Surveys, 22(3):267-293, 1990.
58. Litwin, W. MSQL: A multidatabase language. Inform. Sci. 1990.
59. Litwin, W., Ketabchi, M., and Krishnamurthy, R. First order normal form for relational
databases and multidatabases. ACM SIGMOD Record 20(4):1991.
60. Lu, H., Shan, M.-C., and Tan, K.-L. Optimization of multi-way join queries. In Proceedings of the 17th International Conference on VLDB, Barcelona, September 1991, pp. 549-560.
61. Lu, H., Ooi, B.-C., and Goh, C.-H. Multidatabase query optimization: Issues and solutions. In Proc. of the Workshop on Interoperability in Multidatabase Systems, pp. 137-143, 1993.
62. Lu, H., Ooi, B.-C., and Tan, K.-L. (Eds.). Query Processing in Parallel Relational Database Systems. IEEE Computer Society Press, Los Alamitos, CA, 1994.
63. Meng, W., Yu, C., Guh, K. C., and Dao, S. Processing multidatabase queries using the fragment and replicate strategy. Technical Report CS-TR-93-16. Department of Computer Science, SUNY at Binghamton, 1993.
64. Meng, W., and Yu, C. Query processing in multidatabase systems. In Modern Database Systems: The Object Model, Interoperability, and Beyond (Kim, W. Ed.), Chap. 27. Addison-Wesley, Reading, MA, 1995.
65. Missier, P., and Rusinkiewicz, M. Extending a multidatabase manipulation language to resolve schema and data conflicts. Technical Report UH-CS-93-10. University of Houston, November 1993.
66. Motro, A. Superviews: Virtual integration of multiple databases. IEEE Trans. Software Engrg.
13(7):785-798, 1987.
67. Navathe, S., Elmasri, R., and Larson, J. Integrating user views in database design. IEEE Computer 1986.
68. Ngu, A. H. H., Yan, L. L., and Wong, L. S. Heterogeneous query optimization using maximal sub-queries. In International Conference on Database Systems for Advanced Applications (DASFAA'93), Taejon, Korea, April 1993, pp. 413-420.
69. A Special Issue on Semantic Issues in Multidatabase Systems, ACM SIGMOD Record
20(4):1991.
70. Rusinkiewicz, M., Elmasri, R., Czejdo, B., Georgakopoulos, D., Karabatis, G., Jamoussi, A., Loa, K., and Li, Y. Query processing in OMNIBASE - A loosely coupled multi-database system. Technical Report UH-CS-88-05. University of Houston, February 1988.
71. Salza, S., Barone, G., and Morzy, T. Distributed query optimization in loosely coupled multidatabase systems. In International Conference on Database Theory, 1994.
72. Savasere, A., Sheth, A., Gala, S. K., Navathe, S. B., and Marcus, H. On applying classification to schema integration. In Proc. First International Workshop on Interoperability in Multidatabase Systems, Kyoto, Japan, April 1991.
73. Sciore, E., Siegel, M., and Rosenthal, A. Using semantic values to facilitate interoperability
among heterogeneous information systems. ACM Trans. Database Systems 19(2):254-290,
1994.
74. Sheth, A. P., and Larson, J. A. Federated database systems for managing distributed, heterogeneous, and autonomous databases. ACM Comput. Surveys 22(3):1990.
75. Siegel, M., and Madnick, S. E. A metadata approach to resolving semantic conflicts. In Proc. of the 17th International Conf. on VLDB, September 1991.
76. Spaccapietra, S., and Parent, C. Conflicts and correspondence assertions in interoperable
databases. SIGMOD Record 20(4):1991.
77. Spaccapietra, S., Parent, C., and Dupont, Y. Model independent assertions for integration of
heterogeneous schemas. VLDB Journal 1, 1992.
78. Suardi, L., Rusinkiewicz, M., and Litwin, W. Execution of extended multidatabase SQL. In IEEE Proceedings of the International Conference on Data Engineering, 1993.
79. Templeton, M., Brill, D., Dao, S. K., Lund, E., Ward, P., Chen, A. L. P., and Macgregor, R. Mermaid - A front-end to distributed heterogeneous databases. Proc. IEEE 75(5):1987.
80. Tresch, M., and Scholl, M. H. Schema transformation processors for federated object-bases. In DASFAA 1993.
81. Wang, Y. R., and Madnick, S. E. The inter-database instance identification problem in integrating autonomous systems. In Proc. of the International Conference on Data Engineering, 1989.
82. Whang, W. K., Chakravarthy, S., and Navathe, S. B. Heterogeneous database: Inferring relationships for merging component schemas, and a query language. Technical Report UF-CIS-TR-92-048. Department of Computer and Information Sciences, University of Florida, 1992.
83. Yang, J., and Papazoglou, M. P. Determining schema interdependencies in object-oriented multidatabase systems. In International DASFAA Symposium, Korea, 1993.
84. Yu, C., Sun, W., Dao, S., and Keirsey, D. Determining relationships among attributes for interoperability of multi-database systems. In Proc. of the Workshop on Interoperability in Multidatabase Systems, 1991.
85. Leon Zhao, J., Segev, A., and Chatterjee, A. A universal relation approach to federated database management. In International Conference on Data Engineering, March 1995.
86. Zhu, Q., and Larson, P. A. A query sampling method for estimating local cost parameters in a multidatabase system. In Proc. of the Int'l Conf. on Data Engineering, pp. 144-153, 1994.
DEVELOPMENT OF MULTILEVEL
SECURE DATABASE SYSTEMS
ELISA BERTINO
Dipartimento di Scienze dell'Informazione, Università di Milano, 20135 Milano, Italy
ELENA FERRARI
Dipartimento di Chimica, Fisica e Matematica, Università dell'Insubria-Como, Italy
I. INTRODUCTION 175
II. ACCESS CONTROL: BASIC CONCEPTS 178
A. Authorization Objects 179
B. Authorization Subjects 179
C. Authorization Privileges 180
III. MANDATORY ACCESS CONTROL 180
A. The Bell and LaPadula Model 180
B. Denning Model 183
IV. MULTILEVEL SECURITY IN RELATIONAL DBMSs 183
A. Multilevel Relational Data Model 184
B. Sea View 185
C. LDV 186
D. Jajodia and Sandhu Model 187
E. The MLR Data Model 187
V. MULTILEVEL SECURITY IN OBJECT DBMSs 188
A. Object Data Model 188
B. SODA Model 189
C. SORION Model 189
D. Millen-Lunt Model 190
E. Jajodia-Kogan Model 190
F. Modeling Multilevel Entities 193
VI. SECURE CONCURRENCY CONTROL 194
A. Architectures 196
B. Secure Concurrency Control Protocols 197
VII. CONCLUSIONS 199
REFERENCES 200
INTRODUCTION
Data protection from unauthorized accesses is becoming more and more crucial
as an increasing number of organizations entrust their data to database systems
[10,33].
FIGURE 1 An example of Trojan Horse.
A. Authorization Objects
Authorization objects are the passive components of a system to which protection from unauthorized accesses should be given. Objects to be considered
depend on the underlying data model. For instance, files and directories are
examples of objects of an operating system, whereas if we consider a relational DBMS, resources to be protected are relations, views, and attributes. With respect to the object dimension we can classify access control mechanisms according to the granularity of access control, that is, according to whether it is possible
to authorize a subject to access only selected components within an object.
Access control models can be further classified according to whether the
set of objects to be protected represents a flat domain or whether the objects
are organized into a hierarchy. In the latter case, the semantics assigned to the
hierarchy greatly depends on the object nature. For instance, consider an object-oriented context. If objects to be protected are classes, the hierarchy represents
the inheritance relations among classes. If objects represent class instances, the
hierarchy reflects the way objects are organized in terms of other objects.
B. Authorization Subjects
Authorization subjects are the entities in the system to which authorizations
are granted. Subjects can be classified into the following categories:
Users, that is, single individuals connecting to the system.
Groups, that is, sets of users.
Roles, that is, named collection of privileges needed to perform specific
activities within the system.
Processes, that is, programs executed on behalf of users.
Note that the above categories are not mutually exclusive. For instance,
a model can support both roles and groups, or both users and processes as
authorization subjects.
Often, both roles and groups are hierarchically organized. The hierarchy
imposed on groups usually reflects the membership of a group to another group.
A nested group inherits the privileges of the groups preceding it in the nesting.
By contrast, the role hierarchy usually reflects the relative position of roles
within an organization. The higher the level of a role in the hierarchy, the higher its position in the organization. Thus, a role has all the privileges of
the roles in a lower position in the hierarchy.
Processes need system resources to carry on their activities. Generally, processes refer to memory addresses, use the CPU, call other processes, and operate on data. All these resources must be protected from unauthorized accesses.
Usually, a process is granted accesses only to essential resources, that is, those
necessary to the completion of its tasks. This limits possible damage deriving
from faults of the protection mechanism.
As far as users are concerned, sometimes it would be useful to specify
access policies based on user qualifications and characteristics, rather than user
identity (for example, a user can be given access to an R rated video, only if
he/she is older than 18 years). This is the case, for instance, of digital library
ac1 > ac2, since {Nuclear} ⊆ {Nuclear, Army}; ac1 > ac3, since TS > C and {Army} ⊆ {Nuclear, Army}. Finally, ac2 and ac3 are incomparable with respect to the dominance relation. Indeed, ac2 does not dominate ac3, since {Army} is not contained in {Nuclear}, and ac3 does not dominate ac2 since TS > C.
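A small Python illustration of the dominance relation follows; the three access classes are reconstructed here as ac1 = (TS, {Nuclear, Army}), ac2 = (TS, {Nuclear}), and ac3 = (C, {Army}), an assumption consistent with the comparisons just given, and an access class is modeled as a security level plus a set of categories.

LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}

def dominates(ac1, ac2):
    """ac = (level, categories); ac1 dominates ac2 iff its level is at least
    ac2's level and its category set contains ac2's categories."""
    (l1, c1), (l2, c2) = ac1, ac2
    return LEVELS[l1] >= LEVELS[l2] and set(c2) <= set(c1)

ac1 = ("TS", {"Nuclear", "Army"})
ac2 = ("TS", {"Nuclear"})
ac3 = ("C", {"Army"})
print(dominates(ac1, ac2), dominates(ac1, ac3))   # True True
print(dominates(ac2, ac3), dominates(ac3, ac2))   # False False: incomparable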
In the Bell and LaPadula model, the state of the system is described by a
4-tuple (A,M,L,G), where:
A is the set of current accesses. Elements of A are triples (s,o,p). If
(s,o,p) belongs to A, it means that subject s is currently exercising
privilege p on object o;
M is the access matrix; that is, it is a matrix containing for each object
and subject in the system the privileges the subject can exercise on
the object;
L is a function that given an object or a subject returns its access class;
G is the current hierarchy of objects; that is, it is a hierarchy denoting
the objects that are currently accessible in the system.
Modifications to the state of the system are caused by requests, which can be of four different types, each being a request issued by a subject. In the following, we consider only access requests, that is, requests of the first type (these are requests for the read, append, write, and execute privileges).
We refer the reader to [3] for the other types of requests.
The answer to a given request is called a decision. Given a state of the system and a request, the result of the request is decided based on a set of security axioms. A system is safe if the requests are processed based on these axioms.
As far as access requests are concerned, the Bell and LaPadula model is governed by the following two axioms:
Simple Security Property. A state (A,M,L,G) satisfies the simple security property if for each element a = (s,o,p) in A one of the following conditions is verified:
1. p = execute or p = append;
2. p = read or p = write, and L(s) ≥ L(o).
*-Property (Star Property). A state (A,M,L,G) satisfies the *-property if for each element a = (s,o,p) in A one of the following conditions is verified:
1. p = execute;
2. p = append and L(s) ≤ L(o);
3. p = write and L(s) = L(o).
For example, a subject with access class (C,{Army}) cannot read objects
with access classes (C,{Navy,NATO}) or (U,{NATO}), since these read operations
would violate the simple security property. Moreover, a subject with access class
(C,{Army,Nuclear}) cannot write an object with access class (U,{Army}), since
this write operation would violate the *-property.
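A hypothetical check of the two axioms for a current access (s, o, p) can be written as follows; access classes are modeled as a level plus a set of categories, as in the sketch above, and read is left permissive under the *-property since read accesses are constrained by the simple security property.

LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}

def dominates(a, b):                    # a, b = (level, categories)
    return LEVELS[a[0]] >= LEVELS[b[0]] and set(b[1]) <= set(a[1])

def simple_security(L, s, o, p):
    return p in ("execute", "append") or \
           (p in ("read", "write") and dominates(L[s], L[o]))

def star_property(L, s, o, p):
    if p == "append":
        return dominates(L[o], L[s])    # L(s) <= L(o)
    if p == "write":
        return L[s] == L[o]
    return True                         # execute (and read, see above)

L = {"subj": ("C", {"Army"}), "obj": ("C", {"Navy", "NATO"})}
print(simple_security(L, "subj", "obj", "read"))   # False: the read is denied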
B. Denning Model
The model by Denning [13] is an extension of the Bell and LaPadula model
presented in the previous section. The main difference between the two models
is that in [13] the concept of security class is introduced, which unifies the
concepts of category and security level of the Bell and LaPadula model. The
Denning model consists of five basic components:
1. a set of objects O, representing the resources to be protected;
2. a set of processes P, which are the active entities of the system,
requiring accesses to the objects;
3. a set of security classes SC;
4. a flow relation, denoted as →;
5. a binary operator ⊕ : SC × SC → SC.
The operator ⊕ receives as input two security classes sc1 and sc2, and returns the security class that should be assigned to the result of any operation that combines information contained in objects whose security classes are sc1 and sc2, respectively. The flow relation specifies the legal information flows. For example, sc1 → sc2 specifies that a flow of information may take place from objects with security class sc1 to objects with security class sc2. Denning proved that the triple (SC, →, ⊕) is a finite lattice under the following hypotheses:
1. SC is a finite set;
2. → is a partial order relation over SC;
3. ⊕ is a total least upper bound operator.
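One concrete instantiation of these components, assuming (as in the Bell and LaPadula discussion above) that a security class is a level plus a set of categories, is sketched below; it is an illustration, not the only possible lattice.

LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}
NAMES = {v: k for k, v in LEVELS.items()}

def can_flow(sc1, sc2):
    """The flow relation sc1 -> sc2: information may only flow upward."""
    return LEVELS[sc1[0]] <= LEVELS[sc2[0]] and set(sc1[1]) <= set(sc2[1])

def combine(sc1, sc2):
    """The (+) operator: the least upper bound of the two classes."""
    level = NAMES[max(LEVELS[sc1[0]], LEVELS[sc2[0]])]
    return (level, set(sc1[1]) | set(sc2[1]))

a = ("C", {"Army"})
b = ("S", {"Nuclear"})
print(can_flow(a, b))   # False: the category sets are not nested
print(combine(a, b))    # ('S', {'Army', 'Nuclear'}), set order may vary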
FIGURE 2 Lattice for Example 3.
to a lower level. Over time this could become a covert channel. Polyinstantiation enables two tuples with the same primary key to exist in a relational
database at different security levels. However, having two tuples with the same
primary key violates the entity integrity property of the standard relational data
model.
Several discussions and debates took place on polyinstantiation in the early
1990s. No consensus was reached. Some argued that polyinstantiation is necessary if we are to design multilevel database systems with higher levels of assurance (see, for example, [14]). Some argued that it is important to maintain the
integrity of the database and that polyinstantiation violates the integrity (see,
for example, [9]). Some used partial polyinstantiation together with security
constraint enforcement in their design (see, for example, [32]). An interesting
operational example showing the disastrous effects of polyinstantiation is given
in the paper by Wiseman [40]. Even among those who support polyinstantiation, there has been much discussion on the correct update semantics to use.
A logic for formalizing concepts in multilevel relations and which supports
polyinstantiation is given in [36].
B. Sea View
SeaView is a multilevel relational data model, developed in the context of the
SeaView project [14]. The SeaView project is a joint project by SRI International
and Gemini Computers, Inc. The project also defined MSQL, an extension of
SQL to handle multilevel data.
The SeaView security model consists of two components: the MAC (Mandatory Access Control) model and the TCB (Trusted Computing Base) model. The MAC model defines the mandatory security policy. Each subject is assigned a readclass and a writeclass. A subject can read an object if
the subject's readclass dominates the access class of the object. A subject can
write into an object if the object's access class dominates the writeclass of the
subject.
The TCB model defines discretionary security and supporting policies for
multilevel relations, views, and integrity constraints, among others. The data
model on which SeaView is based is a multilevel relational data model. Multilevel relations are implemented as views over single level relations, that is,
over relations having a single access class associated with them. Implementing
multilevel relations as virtual relations (or views) allows subjects to issue insert, delete, and update requests on these views. Appropriate algorithms are
then used to map updates on views onto updates on the base relations which
are single level. An advantage of the SeaView approach is that the labels of the
data elements need not be stored.
Each database operation is executed by a single level subject. When a subject
at level L issues a request, a database system subject operating at level L will
process the subject's request. This subject then cannot have read access to objects
not classified at or below the level L.
Polyinstantiation is the mechanism introduced by SeaView to handle cover
stories as well as signaling channels. For example, in a multilevel world, it is
possible to have multiple views of the same entity at different security levels.
In the SeaView model, the two views may be represented, say, by two tuples
with the same primary key, but at different security levels. The primary key
constraint is not violated since in the multilevel relational data model proposed
by SeaView a modified entity integrity property is defined. Additional integrity
properties such as referential integrity property and polyinstantiation integrity
property are also defined in the SeaView model.
C. LDV
In LDV [32], the relational query language SQL is enhanced with constructs
for formulating security assertions. These security assertions serve to imply
sensitivity labels for all atomic values, contexts, and aggregations in a database.
The labeled data are partitioned across security levels, assigned to containers
with dominating security markings or levels, and may only flow upward in level
unless authorized otherwise.
LDV is based on the LOCK security policy, which consists of both a discretionary and a mandatory security policy. The discretionary security policy
regulates the sharing of objects among the various subjects. The mandatory security policy controls the potential interferences among subjects and consists of
a mandatory access control policy and a type enforcement policy. The mandatory access control policy is based on the Bell and LaPadula policy. The type
enforcement policy restricts accesses of subjects to objects based on the domain
of the subject and the type of the object.
Moreover, LDV addresses the problem of updating and querying a multilevel database. The update classification policy addresses the problem of proper
classification of the database data. When the database is updated, the classification level of the data is determined. The data are then inserted into an object
whose level dominates the level of the data. The response classification policy
addresses the problem of proper classification of response to queries. This is
a problem because the response may be built based on the data in many base
relations. In the process of manipulating and combining the data, it is possible
that the data will be used in a manner that reveals higher level information.
The problem becomes more acute when one realizes that the response will be
released into an environment in which many responses may be visible. Thus, the
problem becomes one of aggregation and inference over time as well as across
relations. In the LDV model, a response can only be released if it is placed in
an object whose level dominates the derived level of the response, where the
derived level is the maximum level of any information that can be deduced from
the response by a subject reading this response.
Subjects interact with LDV through a request importer and a request exporter. Access to data as well as metadata is controlled by LOCK. Information
in the database as well as the meta-database is stored in single-level files, i.e.,
LOCK objects. LOCK ensures that these database files may be manipulated
only by subjects executing at the appropriate levels, and in the appropriate
database domains. The three major operations performed by LDV are query,
update, and metadata management. Each of these operations is an interaction
between a non-LDV subject representing a user, and the LDV subjects that
manipulate the database.
However, there can be situations in which the TS subject would prefer the value
at the lowest level, instead of that at the highest level. Thus, in the MLR model
a new semantics for data classified at different levels is proposed, based on the
following principles:
1. The data accepted by a subject at a given security level consist of two
parts: (i) the data classified at his/her level; and (ii) the data
borrowed from lower levels;
2. The data a subject can view are those accepted by subjects at his/her
level and by subjects at lower levels;
3. A tuple with classification attribute c contains all the data accepted
by subjects of level c.
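The following hypothetical fragment illustrates principles 1 and 2 on a small polyinstantiated relation: each tuple carries a classification attribute c, and a subject at a given level views the tuples accepted at its own level and at lower levels. The data values and the totally ordered levels are invented for the example.

ORDER = ["U", "C", "S", "TS"]

employee = [
    {"Emp#": 1110, "salary": "40k", "c": "U"},   # value visible at the low level
    {"Emp#": 1110, "salary": "90k", "c": "S"},   # value accepted at level S
]

def visible(relation, level):
    """Tuples a subject at `level` can view (principle 2)."""
    return [t for t in relation if ORDER.index(t["c"]) <= ORDER.index(level)]

print(visible(employee, "U"))   # only the U tuple
print(visible(employee, "S"))   # both tuples; the subject may prefer either value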
FIGURE 3 The message filter model.
of the sender and receiver object. Several actions are possible. For instance, the
message can be sent unaltered to the receiver object, or it may be rejected.
The latter action is taken when a low-classified object sends a message to a
higher-classified object requesting to read some attributes of the latter.
A third possible action is to send the message to the receiver with the constraint that the invoked method is executed in restricted mode. This means that,
even though the receiver can see the message, the execution of the corresponding method on the receiver should leave the state of the receiver (as well as of
any other object at a level not dominated by the level of the receiver) as it was
before the execution. Thus, the attributes of the receiver (and of any other object at a level not dominated by the level of the receiver) should not be modified
by the method invoked upon receipt of the message.
Figure 3 exemplifies the message filter model. In the graphical representation, objects are denoted by nodes, and messages by oriented arcs from the
sender object to the receiver. Each arc is intercepted by the message filter. An
arc that does not pass through the message filter denotes a rejected message;
an example is the message from object Ok in Fig. 3. An arc that after passing
through the message filter becomes a dashed arc denotes a message to be executed by the receiver in restricted mode; an example is the message from object
Oj in Fig. 3. Finally, an arc that passes through the message filter without any
change denotes a message sent to the receiver, without requiring execution in
restricted mode. An example is the message from object oi in Fig. 3.
Note that information does not necessarily flow every time a message is
sent between objects in that an object acquires information only by modifying
its internal state, that is, by changing the values of some of its attributes. Thus,
no information flow is enacted if the attribute values of an object have not
been modified as part of the method executed in answer to the message. In
such cases, the forward flow is said to be ineffective. Similarly, whenever a
null reply is sent back as reply to a message, the backward information flow is
ineffective.
In an object-oriented system, information flow can take place either (i) when
a message is sent from an object to another or (ii) when an object is created.
In case (i), information flow can be from the sender to the receiver, or vice
versa. The forward flow is from the message sender to the receiver and is carried through the message arguments. The backward flow is from the message
receiver to the sender and is carried through the message reply. In case (ii), information flow is only in the forward direction through the attribute values
with which the newly created object is to be initialized. Those flows are called
direct flows.
Information flow can also be indirect in that a message from an object to another may result in a method execution as part of which a message to a third object is sent. Consider for instance the case of an object oi sending a message gi to another object oj. Suppose that oj does not change its internal state as a result of receiving gi, but instead sends a message gj to a third object ok. Moreover, suppose that the arguments of gj contain information derived from the arguments of gi (e.g., by copying some arguments of gi to gj). If the corresponding method execution in ok results in updating the state of ok, a transfer of information has taken place from oi to ok. Note that the flow from oi to ok has been enacted, even though no message exchange has been performed between oi and ok. Note, moreover, that a flow from oi to ok does not necessarily imply a flow from oi to oj.
The message filter intercepts every message and reply exchanged among
objects to prevent both direct and indirect illegal flows of information. For an
object system to be secure, all flows must be from lower level objects to higher
level objects. To prevent all the illegal flows of information, the message filter
makes use of a special indicator. Such an indicator, denoted in the following
as rlevel, keeps track, for each method invocation t, of the least upper bound
of the levels of all objects encountered in the sequence of method invocations
starting from the object that began the computation and ending with t.
The message filter works as follows. Let o1 and o2 be the sender and receiver object, respectively. Moreover, let t1 denote the method invocation on o1 as part of which the message g1 is sent to o2; t2 denotes the method invocation on o2 performed upon receiving message g1. Two major cases arise depending on whether g1 is a primitive message. Let us first consider the case of nonprimitive messages. The following cases arise:
1. the sender and receiver are at the same level: the message and the reply are allowed to pass;
2. the levels of the sender and receiver are incomparable: the message is blocked and a null reply is returned to method t1;
3. the receiver has a higher level than the sender: the message is passed through; however, the actual reply from t2 is discarded and a null reply is returned to t1. To prevent timing channels the null value is returned before executing t2;
4. the receiver has a lower level than the sender: the message and reply are allowed to pass. t2 is executed in restricted mode; that is, it is restricted from modifying the state of the receiver or creating a new object (i.e., the method invocation is memoryless). Moreover, this restriction is propagated along with further messages sent out by t2 to other objects.
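The decision taken by the message filter for nonprimitive messages can be sketched as follows; levels are modeled through a dominates() helper over (level, categories) pairs, an assumption for the example, and the returned string summarizes how the message is handled.

LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}

def dominates(a, b):                    # a, b = (level, categories)
    return LEVELS[a[0]] >= LEVELS[b[0]] and set(b[1]) <= set(a[1])

def filter_message(sender_class, receiver_class):
    if sender_class == receiver_class:
        return "pass message and reply"                 # case 1
    if not dominates(sender_class, receiver_class) and \
       not dominates(receiver_class, sender_class):
        return "block, return null reply"               # case 2: incomparable
    if dominates(receiver_class, sender_class):
        return "pass, null reply returned before t2"    # case 3: higher receiver
    return "pass, execute t2 in restricted mode"        # case 4: lower receiver

print(filter_message(("C", {"Army"}), ("TS", {"Army"})))   # case 3
print(filter_message(("TS", {"Army"}), ("C", {"Army"})))   # case 4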
of the object are reachable through reference links. The approach is based on
the assumption that the database schema is not protected; i.e., all attribute
definitions are visible to every subject (even if the subject cannot see attribute
values).
Finally, Bertino et al. [6] have recently proposed an approach to model multilevel entities by using composite objects and delegation [17]. The approach
basically consists in mapping each multilevel entity type onto a corresponding set of single-level classes, which are stored in the underlying object DBMS.
Those classes are related by appropriate composite references. Moreover, each
class is equipped with a set of accessor methods, which allow the retrieval of
information from component objects. One important feature of the approach
presented in [6] is that an attribute of a multilevel entity type may take on values
at different security levels. For each level, the application designer may specify
the desired policy with respect to polyinstantiated attribute values. For example, the designer may specify that a low value is a cover story, and thus it should
not be considered at higher levels, or that it should be considered at a higher
level only if no other value is specified for that level. The generation of such class
schema can be however quite difficult. Thus, in [6] a methodology that takes as
input the specification of a set of multilevel entity types organized into aggregation and inheritance hierarchies and returns a set of corresponding single-level
classes is defined. Each class is equipped with the proper composite references
and methods to read and write the attributes of their instances. Algorithms implementing this methodology are also presented in [6]. The single-level object
representation is thus transparent to the users in that the generated single-level
objects provide the same interfaces, i.e., respond to the same messages, as if
multilevel objects were directly supported. The methodology proposed in [6], called the Class Schema Generation (CSG) methodology, consists of four steps to be sequentially executed. These steps are illustrated in Fig. 4. The first phase of the methodology, called the schema completion phase, modifies the input entity-type schema to ease the execution of the subsequent phases. The second
phase is the class generation phase, which generates the single-level classes corresponding to the multilevel entity types received as input. During this phase
only the class names and the inheritance relationships are defined. Attributes
and methods are defined by the two subsequent phases, called attribute generation and method generation phase, respectively.
FIGURE 4 The CSG methodology: entity schema → schema completion → completed entity schema → class generation → partial class schema → attribute generation → partial class schema → method generation → class schema.
information to lower level subjects. Thus, in addition to verifying the Bell and
LaPadula principles, a concurrency control protocol for MLS/DBMSs must be
free of signaling channels.
In the remainder of this section we first survey the most important architectures on which secure concurrency control relies; we then illustrate some of
the most relevant secure concurrency control protocols for MLS/DBMSs.
A. Architectures
Most of the research on secure transaction processing in MLS/DBMSs can be
categorized into two broad categories: one based on a kernelized architecture
and the other based on a replicated architecture. Both architectures rely on the
notion of trusted front end (TFE), which cannot be bypassed. In the kernelized
architecture (illustrated in Fig. 5) a multilevel database is partitioned into a set
of single-level databases, each of which stores data at one particular level. The
TFE component ensures that when a subject submits a query to the system, this
query is submitted to the DBMS with the same security level as the subject. By
contrast, the trusted back end makes sure that the Bell and LaPadula principles
are satisfied. The main drawback of the kernelized architecture is that the query
execution performance greatly degrades when queries access data from multiple
security levels, since data at different levels are stored separately. In contrast, the
main advantage of the kernelized architecture is that the transaction scheduler
can be decomposed into several untrusted schedulers, one for each security level.
FIGURE 5 The kernelized architecture.
FIGURE 6 The replicated architecture.
The replicated architecture, like the kernelized architecture, uses different databases to store data at different levels. For each security level, a different database exists. However, unlike the case of the kernelized architecture,
the database at a security level L contains all the data which are classified either
at level L or at a level lesser than L.
The replicated architecture is illustrated in Fig. 6. In this architecture, when
a transaction at a given level wishes to read data at lower levels, it will be given
the replica of the lower level data stored in the database at its own security
level.
Although, the replicated architecture makes query execution more efficient
with respect to the kernelized architecture, it suffers from several drawbacks.
First, this architectures is not practical for a large number of security levels. The
propagation of updates from lower to higher levels is also a critical issue. For
these reasons, in the following we consider the kernelized architecture only. We
refer the interested reader to [12,16,20] for secure update propagation protocols
in replicated architectures.
B. Secure Concurrency Control Protocols
The problem of secure concurrency control in MLS/DBMSs has been extensively
investigated and several proposals can be found in the literature [18,23-25,31],
Algorithms for concurrency control can be divided into two main categories:
two-phase locking algorithms and timestamp-ordering algorithms.
I 98
To avoid signaling channels several approaches that apply to both the twophase locking and the timestamp-ordering protocols have been proposed. In the
following, we illustrate the most relevant proposals for both these protocols.
A method for avoiding the problem of signaling channel when a two-phase
locking protocol is used is to abort a higher-level transaction having a lock
on lower-level data, whenever a lower-level transaction requires the access to
these data. Clearly, this approach avoids the problem of signaling channels;
however, its main drawback is that it can cause transaction starvation, that
is, it may cause a transaction to always be aborted and never complete its
execution.
McDermott and Jajodia proposed a method for reducing transaction starvation [25]. The idea is that whenever a high transaction must release a lock
because of a write request of a lower-level transaction, it does not abort. Rather,
it enters a queue containing all the high-level transactions waiting for reading
I 99
that data. The main drawback of this approach is, however, that it does not
always produce seriaHzable schedules.
As far as timestamp-ordering protocols are concerned, the main problem
with this approach is that when a high transaction reads lower-level data, it cannot modify the read timestamp of such data since this will result in a write-down
operation (that violates the Bell and LaPadula principles). Several approaches
to eliminate this problem and to avoid the problem of transaction starvation
have been proposed. One of the solutions is to maintain multiple versions of
the same data. When multiple copies of the same data are maintained, the notion of correctness for concurrent transaction executions is restated as follows.
The concurrent execution of transactions is said to be correct when its effect
is equivalent to that of a serial execution of the transactions on a one-copy
database (this property is called one-copy serializability). In the following, we
briefly describe two of the most relevant proposals of timestamp-ordering protocols for secure concurrency controls [18,23]. More details on this topic can
be found in [2].
The approach proposed by Keefe and Tsai [23] uses a variation of the
timestamp-ordering protocol, which differs from the conventional one, in the
way of assigning timestamps to transactions. To avoid the problem of signaling
channels a transaction is assigned a timestamp that is smaller than the timestamps of all the other transactions with a lower security level. The Keefe and
Tsai method ensures secure concurrent execution of transactions, guarantees
one-copy serializability, and avoids the problem of starvation. However, it uses
a multilevel scheduler which, therefore, needs to be trusted.
The approach proposed by Jajodia and Atluri [18] assigns timestamps to
transactions based on their arrival order. Whenever a transaction reads data
from lower levels, it must postpone its committing until all the transactions
from these lower levels with smaller timestamps have committed. The protocol
uses a single-level scheduler that does not need to be trusted. Moreover, it
guarantees secure concurrency control as well as one-copy serializability, and
it avoids the problem of starvation.
VII. CONCLUSIONS
This chapter has provided a fairly comprehensive overview of the developments
in secure multilevel database systems. We have first reviewed the basic concepts
of access control. Then we have discussed the basic principles of mandatory
access control, and we have presented two milestones in the development of
access control models for multilevel DBMSs: the model by Bell and LaPadula
and the model by Denning. Furthermore, we have provided an overview of
the proposals for multilevel access control both in relational and in object
database systems. Next, we have provided details on secure concurrency control
in multilevel database systems by illustrating the architectures that can be used
and the protocols that have been proposed.
Directions in secure database systems will be driven by the developments in
system architectures. Database systems are no longer stand-alone systems. They
are being integrated into various applications such as multimedia, electronic
200
commerce, mobile computing systems, digital libraries, and collaboration systems. Therefore, security issues for all these new generation systems will be very
important. Furthermore, there are many developments on various object technologies such as distributed object systems and components and frameworks.
Security for such systems is being investigated. Eventually, the security policies
of the various subsystems and components must be integrated into policies for
the entire systems. There will be many challenges in formulating policies for
such systems. New technologies such as data mining will help solve security
problems such as intrusion detection and auditing. However, these technologies can also violate the privacy of individuals. This is because adversaries can
now use the mining tools and extract unauthorized information about various
individuals. Migrating legacy databases and applications will continually be a
challenge. Security issues for such operations cannot be overlooked. These new
developments in data, information, and knowledge management will involve
numerous opportunities and challenges for research in database security.
REFERENCES
1. Atluri v., Adam, N., Bertino E., and Ferrari, E. A content-based authorization model for digital
libraries, IEEE Trans. Knowledge Data Eng., in press.
2. Atluri, v., Jajodia, S., and Bertino, E. Transaction processing in multilevel secure databases
with kernelized architectures. IEEE Trans. Knowledge Data Eng. 9(5):697-708,1997.
3. Bell, D., and LaPadula. L. Secure computer systems: Unified exposition and multics interpretation. Technical Report ESD-TR-75-306, Hanscom Air Force Base, Bedford, MA, 1975.
4. Bertino. E. Data security. Data Knowledge Eng. 25(1~2):199-216, 1998.
5. Bertino, E., Catania, B., and Vinai A. Transaction modeling and architectures. In Encyclopedia
of Computer Science and Technology. Marcel Dekker, New York, 2000.
6. Bertino, E., Ferrari, E., and Samarati, P. Mandatory security and object-oriented systems: A
multilevel entity model and its mapping onto a single-level object model. Theory Practice Object
Systems 4(4):l-22, 1998.
7. Bertino, E., and Martino, L. Object-Oriented Database Systems: Concepts and Architectures.
Addison-Wesley, Reading, MA, 1993.
8. Boulahia-Cuppens, N., Cuppens, E, Gabillon, A., and Yazdanian, K. Decomposition of
multilevel objects in an object-oriented database. In Computer SecurityESORICS 94
(D. GoUmann, Ed.), Lecture Notes on Computer Science 875, Springer-Verlag, BerUn, 1994.
9. Burns, R. Referential secrecy. In Proc. of the IEEE Symposium on Security and Privacy,
Oakland, CA, May 1990.
10. Castano, S., Fugini, M. G., Martella, G., and Samarati, R Database Security. Addison-Wesley,
Reading, MA, 1995.
11. Chen E, and Sandhu, R. S. The semantics and expressive power of the MLR data model. In
Proc. of the IEEE Symposium on Security and Privacy, Oakland, CA, May 1995.
12. Costich, O. Transaction processing using an untrusted scheduler in a multilevel database with
replicated architecture. Database Security V: Status and Prospects. North-Holland, Amsterdam,
1992.
13. Denning. D. E. Cryptography and Data Security. Addison-Wesley, Reading, MA, 1982.
14. Denning, D. E., and Lunt, T. A multilevel relational data model. In Proc. of the IEEE Symposium
on Security and Privacy, Oakland, CA, April 1987.
15. Goguen, J., and Messeguer, J., Noninterference security policy. In Proc. of the IEEE Symposim
on Security and Privacy, Oakland, CA, April 1982.
16. Kang, I. E., and Keefe, T. F. On transaction processing for multilevel secure replicated databases.
In Proc. European Symposyum In Research in Computer Security (ESORICS 92), 1992.
17. Kim, W , Bertino, E., and Garza, J. F. Composite object revisited. In Proc. ACM Sigmod International Conference on Management of Data, Portland, OR, 1991.
201
18. Jajodia, S., and Atluri, V., Alternative correctness criteria for concurrent execution of transactions in multilevel secure databases. In Proc. of the IEEE Symposium on Security and Privacy,
Oakland, CA, 1992.
19. Jajodia, S., and Kogan, B. Integrating an object-oriented data model with multilevel security.
In Froc. of the IEEE Symposium on Security and Privacy, Oakland, CA, 1990.
20. Jajodia, S., and Kogan, B. Transaction processing in multilevel-secure databases using replicated
architeture. In Proc. of the IEEE Symposium on Security and Privacy, Oakland, CA, 1990.
21. Jajodia, S., and Sandhu, R. S. Toward a multilevel secure relational data model. In Proc. ACM
Sigmod International Conference on Management of Data, Denver, CO, May 1991.
22. Keefe, T , Tsai, T. W., and Thuraisingham, B. SODAA secure object-oriented database system.
Comput. Security 8(6): 1989.
23. Keefe, T , and Tsai, T. W. Multiversion concurrency control for multilevel secure database
systems. In Proc. of the IEEE Symposium on Security and Privacy, Oakland, CA, 1990.
24. Lamport, L. Concurrent reading and writing. Commun. ACM 20(11):806-811,1977.
25. McDermott, J., and Jajodia, S. Orange locking: Channel free database concurrency control via
locking. In Database Security VJ: Status and Prospects. North-Holland, Amsterdam, 1993.
26. Millen, J., and Lunt, T. Security for knowledge-based systems. In Proc. of the IEEE Symposium
on Security and Privacy, Oakland, CA, 1992.
27. Millen, J., and Lunt, T. Security for object-oriented database systems. In Proc. of the IEEE
Symposium on Security and Privacy, Oakland, CA, 1992.
28. Morgenstern, M. Security and inference in multilevel database and knowledge base systems. In
Proc. of the ACM Sigmod International Conference on Management of Data, San Francisco,
CA, 1987.
29. Morgenstern, M. A security model for multilevel objects with bidirectional relationships. In
Proc. of the 4th IFIP 11.3 Working Conference in Database Security, Halifax, England, 1990.
30. Rosenthal, A., Herndon, W., Graubart, R., Thuraisingham, B. Security for object-oriented
systems. In Proc. of the IFIP 11.3 Working Conf. on Database Security, Hildesheim, August
1994.
31. Reed, D. R, and Kanodia, R. K. Synchronization with event counts and sequencers. Commun.
ACM 22(5):115-123, 1979.
32. Stachour, R, and Thuraisingham, M. B. Design of LDVA multilevel secure database management system. IEEE Trans. Knowledge Data Eng. 2(2):1990.
33. Summers, R. C. Secure Computing: Threats and Safeguard. McGraw-Hill, New York, 1997.
34. TDI, Trusted database interpretation. Department of Defense Document, 1991.
35. Thuraisingham, B. Mandatory security in object-oriented database management systems. In
Proc. of the ACM Conference on Object-oriented Programming Systems, Languages and
Applications (OOPSLA), New Orleans, LA, 1989.
36. Thuraisingham, B., NTML. A nonmonotonic types multilevel logic for secure databases. In
Proc. of the Computer Security Foundations Workshop, Franconia, NH, June 1991.
37. Thuraisingham, B. A tutorial in secure database systems, MITRE Technical Report, June 1992.
38. Thuraisingham, B. Multilevel security for distributed heterogeneous and federated databases.
Comput. Security 14, 1994.
39. Winslett, M., Ching, N., Jones, V., and Slepchin, I. Using digital credentials on the World-Wide
Web./. Comput. Security 5, 1997.
40. Wiseman, S. On the problem of security in databases. In Proc of the IFIP 11.3 Conference on
Database Security, Monterey, CA, September 1989.
41.Ullman, J. Principles of Database and Knowledge-Base Systems, Vol. 1. Computer Science
Press, Rockville, MD, 1988.
HSIN-HORNG CHEN
Department of Computer and Information Science, National Chiao Tung University, Hsinchu,
Taiwan, Republic of China
I. INTRODUCTION 203
II. FUZZY SET THEORY 205
III. FUZZY QUERY TRANSLATION BASED ON THE a-CUTS
OPERATIONS OF FUZZY NUMBERS 207
IV. FUZZY QUERY TRANSLATION IN THE DISTRIBUTED
RELATIONAL DATABASES ENVIRONMENT 214
V DATA ESTIMATION IN THE DISTRIBUTED RELATIONAL
DATABASES ENVIRONMENT 217
VI. CONCLUSIONS 231
REFERENCES 231
This chapter presents a fuzzy query translation method based on the a-cuts operations
of fuzzy numbers to translate fuzzy queries into precise queries in the distributed relational databases environment. We also present a method for estimating incomplete data
when the relations stored in a failed server failed to access in the distributed relational
databases environment. We have implemented a system to translate fuzzy SQL queries to
precise queries in the distributed relational databases environment. The proposed methods allow the users to deal with fuzzy information retrieval in a more flexible manner
in the distributed relational databases environment.
INTRODUCTION
*%^%
203
204
information for the enterprise. There are many types of database systems on
the commercial market. Relational database systems are most widely used in
the enterprise. Existing relational database systems only provide precise query
operations. They cannot deal with imprecise queries. For example, if a user
would only like to know whose salaries are high in his company, he cannot get
the answer from the existing database systems because the query condition
"high" for salary is unknown for the traditional relational database systems.
Moreover, users cannot actually know what data they need. If they submit
queries in current relational database systems, sometimes they cannot get the
right answers. Thus, how to provide a user-friendly query mechanism or relaxed query condition to let the user get the required answer is more and more
important.
Sometimes a company is distributed logically into divisions. Thus, a distributed system enables the structure of the database to mirror the structure of
the company, where local data can be kept locally and the remote data can be
accessed when necessary by means of computer networks. It may happen that
one of the distributed databases stored in a failed server fails to access when
queries are submitted from the end users. In this case, we need a mechanism
for deriving the incomplete information which is nearly equal to the failed data
before the failed server has recovered to the normal state.
Since Zadeh proposed the theory of fuzzy sets in 1965 [33], some researchers have investigated the application of fuzzy set theory for query translations for relational databases systems. In [9], Chen et al. present techniques
of fuzzy query translation for relational database systems. In [31], Yeh and
Chen present a method for fuzzy query processing using automatic clustering
techniques. In [4], Chang and Ke present a database skeleton and introduce its
application to fuzzy query translation. In [5], Chang and Ke present a method
for translation of fuzzy queries for relational database systems. In [1], Bosc
et al. propose an extension of DBMS querying capabilities in order to allow
fuzzy queries against a usual database. In [14], Hou and Chen apply the fuzzy
set theory to the structural query language of relational database systems. In
[23], Nakajima and Senoh investigate the operations of fuzzy data in fuzzy SQL
language. In [35], Zemankova proposes a fuzzy intelligent information system.
In [16], Kacprzyk et al. developed a "human-consistent" database querying
system based on fuzzy logic with linguistic quantifiers.
In this chapter, we propose a method based on the a-cuts operations
for translating fuzzy queries into precise queries in a distributed relational
databases environment. We also propose a method for estimating incomplete data when one of the relations failed to access in the distributed relational databases environment. Based on the proposed methods, we also implement a system for translating fuzzy SQL queries into precise queries in
the distributed relational databases environment. The proposed methods allow the users to access the distributed relational databases in a more flexible
manner.
This paper is organized as follows. In Section II, we briefly review some
basic concepts of fuzzy set theory from [33]. In Section III, we briefly review
the fuzzy query translation method for relational database systems from [8]. In
Section FV, we introduce a method for fuzzy query translation in the distributed
205
A={(ui,fiA(ui))\ui
GU},
(1)
where /x^, MA: U -^[0, 1], is the membership function of the fuzzy set A, and
/XA(W/) is the membership grade in which the element Uj belongs to the fuzzy
set A.
If the universe of discourse U is a finite set, U = {^i, wij ? ^n}? then the
fuzzy set A can be expressed as
n
i=l
= M A ( W I ) M + ^'A{U2)/U2
- } - + fMA(Un)/Un,
(2)
where " + " means "union" and the symbol "/" means the separator.
If the universe of discourse U is an infinite set, then the fuzzy set A can be
expressed as
-i
I^AM/U,
ueU.
(3)
There are three basic operations between fuzzy sets, i.e., intersection, union,
and complement. Let A and B be two fuzzy sets of the universe of discourse
U, [7 = {wi, W2,..., w}, and let /IA and /JLB be the membership functions of the
fuzzy sets A and B, respectively, where
IMA-.U^
[0, 1 ] ,
MB : U ^
[0,1],
A= {{ui,fiA{ui))\i
e U],
B = {(MMB(.))|M, e
17}.
206
The three basic operations of the fuzzy sets are reviewed from [33] as
follows.
DEFINITION
2. Let AnB
defined as
IJ^AnB(ui) = mm(fjiAM,fMB(ui)),
DEFINITION
(4)
DEFINITION
"iui e U,
Vw,- G U.
(5)
Wui e U.
(6)
"ix e U.
(7)
e U.
(8)
(9)
The dilation operator can be used to approximate the effect of the linguistic
modifier More or Less, Thus, for any fuzzy set A,
More or Less A = A^'^ = DIL(A).
(10)
207
1.0
CON(A)
0.5
0
m m
FIGURE I
(H)
HA(U)
1.0
0.5
0
FIGURE 2
DIL(A)
208
a
FIGURE 3
We also can express the trapezoidal fuzzy number shown in Fig. 3 by the
membership function shown as
(
/XM(^
b-a'
<^3fcj^3 d) =
1,
X d
^ c
d'
a < X < b,
b < X < c,
(12)
c < X < d.
(attributes)
(relations)
(query conditions)
RSV = (retrieval threshold value).
M^)
U/a
FIGURE 4
Cu2 d
(14)
209
where
(attributes): List the attributes to be projected.
(relations): Identify the tables where attributes will be projected and possibly
will be joined.
(query condition): Include the conditions for tuple/row selection within a
single table or between tables implicitly joined. It may contain either
standard query conditions or fuzzy query conditions.
(retrieval threshold value): A retrieval threshold value between 0 and 1.0.
A
T
A= C
RSV = a.
(15)
(16)
Step 3: If M(C)a = [^, w^j, then the original query can be translated into
the statements
SELECT
FROM
WHERE
WITH
A
T
A > Ua AND A < Ufe
RSV= a.
(17)
eU}.
(18)
Step 2: Apply the semantic rules to find the meaning M(C) of the
composite fuzzy term C as
M(C) = {(ui,nc(ui))\ui
eU).
(19)
e [0,1]}.
(20)
2 I0
Step 4: If M(C)a = [^, w^], then the original query can be translated into
the statements
SELECT
FROM
WHERE
WITH
A
T
(21)
,^
(22)
(23)
Mm(fici(ui),
l^C2(Ui),...,fMCn(Ui)),Ui
U).
e [0,1]}.
(24)
Step 4: If M(C)a = [M^, Ub], then the original query can be translated into
the statements
SELECT
FROM
WHERE
WITH
A
T
A>Ua AND A < Ufe
RSV = a.
(25)
eUll<j<n.
(26)
I Mc(wi) = Max(/xci(w,),
f^C2M,"',t^CnM),Ui
e U).
(27)
211
(28)
Step 4: If M{C)oi = [w^, w^], then the original query can be translated into
the statements
SELECT
FROM
WHERE
WITH
A
T
A > u^ AND A < u^
RSV = ot.
(29)
A
T
A > u^ AND A < u^ OR A > u, AND A < u^
RSV = a.
(30)
Grades of
Membership
1.0
Salary
10000
20000
30000
40000
F I G U R E 5 Membership functions of the fuzzy terms "low," "medium," and "high" for the linguistic
variable "SALARY."
212
EXAMPLE
ID, SALARY
EMPLOYEE
SALARY = medium OR high
RSV = 0.95.
In the condition part of the above fuzzy SQL query, we can see that it is
a compound fuzzy query, where "medium" and "high" are the fuzzy terms of
the hnguistic variable "SALARY." The meanings of "medium" and "high" for
the attribute SALARY are
M(high) = (50000, 60000, 70000, 70000),
M(medium) = (25000, 35000,45000,55000),
and the retrieval threshold value 0.95 is defined in the "WITH" clause. Thus,
we can perform the union operation on M(high) and M(medium) to get M(C)
as shown in Fig. 6.
After performing the 0.95-cut operation on M(C), we can get two intervals.
We can calculate each interval by formula (13), respectively, shown as
^M(medium) , 095 = 0.95 * (35000 - 25000) + 25000 = 34500,
^M(medium)_, 095 = ^'^^ * (45000 - 55000) + 55000 = 45500,
WM(high) 1 095 = 0.95 * (60000 - 50000) + 50000 = 59500,
^M(high)ro95 = 0.95 * (70000 - 70000) + 70000 = 70000.
Then, we can translate the user's original fuzzy query into the statements
SELECT
FROM
ID, SALARY
EMPLOYEE
Grades of
Membership.
* Salary
10000
FIGURE 6
20000
213
WHERE
WITH
EXAMPLE
In the condition part of the above fuzzy SQL query, we can see that it
is a composited fuzzy query; "high" is a fuzzy term of the Hnguistic variable
"SALARY," and "very" is a linguistic hedge to modify the fuzzy term "high".
The meaning of "high" is shown as
M(high) = (50000, 60000, 70000, 70000).
After we perform the Unguistic hedge "very" on the fuzzy term "high", we
can obtain the meaning M(very high) of the fuzzy term "very high" as shown
in Fig. 7.
After performing 0.95-cut on M(very high), we can obtain an interval
[59746, 70000] shown in Fig. 7. Then we can translate the user's original
fuzzy query into the statement
SELECT
FROM
WHERE
WITH
Grades of
Membership
597461
10000
20000
30000
40000
SOOOO
60000
70000
Salary
214
215
Application
ODBC Driver
ODBC Driver
ODBC Driver
FT J
Data Source
FIGURE 8
Data Source
Data Source
216
Q
A
Fuzzy SQL statements
n
Membership j
Function
Library
SQL-Type Database
Management Environment
Remote
Database
System
FIGURE 9
Rule Base
Remote
Database
System
performed by this component. When the fuzzy SQL translator passes the precise
SQL statement to this component, it will take some necessary operations to deal
with this query statement. For example, if there are some databases residing in
remote sites, this component maintains the communication links between the
remote databases and the local program to ensure the successful access to the
databases.
4. Remote database connectors: These components serve as the communication devices between the local program and the remote database servers.
These remote database connectors are database dependent; i.e., each type of
database has its specific driver. Through these drivers, the program will not
necessarily decide what kind of database system it will deal with. Therefore,
the methods for accessing databases are totally transparent to the program.
The architecture of this system is based on the client/server database operating environment. This system resides at the user's local site as a client. The
databases can be located either in the local site or in the remote site. These two
database servers act as the server sites. In our implementation, we distribute
the environment to a system and two databases as shown in Fig. 10.
217
Database Server B
Database Server A
Local Computer
F I G U R E 10 The environment of the implementation.
(31)
where each value of uij in the fuzzy relation matrix V is defined by an experienced database administrator.
Vl
V2
Vl
Ui2
Uln
V2
U21
1
:
U2n
:
Vn
Unl
FIGURE I I
Un2
Vn
J
1
218
1
1
Ph.D.
Master
Bachelor
Ph.D.
Master
Bachelor
1.0
0.7
0.4
0.7
1.0
0.6
0.4
0.6
1.0
1 ^
1
1
1
1
1
II
Ph.D.
Master
Bachelor
Ph.D.
Master
Bachelor
Ph.D.
Master
Bachelor
Y
Ph.D.
Ph.D.
Ph.D.
Master
Master
Master
Bachelor
Bachelor
Bachelor
CD(X,Y)
1.0
0.7
0.4
0.7
1.0
0.6
0.4
0.6
1.0
219
Rulel:
Rule 2:
F I G U R E 14
Suppose these two relations are distributed into two different locations. Then
we can create a view called EMP by the SQL statements
CREATE
SELECT
FROM
WHERE
WITH
VIEW EMP AS
ID, DEGREE, EXPERIENCE, SALARY
EMP J N F O , EMP_SALARY
EMPJNFO.ID = EMP_SALARY.ID.
RSV = 0.90.
We know that the results of the relation EMPLOYEE and the view EMP will
be identical. Now, suppose the relation EMP_SALARY failed to access when
the query is submitted as
SELECT
FROM
WHERE
WITH
*
EMP
ID = " S 1 " .
RSV = 0.90.
F I G U R E 15
220
Relation EMPLOYEE
^^
SI
S2
S3
S4
S5
S6
S7
S8
S9
SIO
Sll
S12
S13
S14
S15
S16
S17
S18
S19
S20
S21
S22
DEGREE
Ph.D.
Master
Bachelor
Ph.D.
Master
Bachelor
Bachelor
Ph.D.
Ph.D.
Bachelor
Master
Master
Master
Ph.D.
Bachelor
Master
Bachelor
Master
Master
Ph.D.
Master
Ph.D.
EXPERIENCE
7.2
2.0
7.0
1.2
7.5
1.5
2.3
2.0
3.8
3.5
3.5
3.6
10.0
5.0
5.0
6.2
0.5
7.2
6.5
7.8
8.1
8.5
r^
si
s2
s3
s4
s5
s6
s7
s8
s9
slO
sll
sl2
si 3
sl4
sl5
sl6
sl7
sl8
sl9
II
s2Q
SALARY
63000
37000
40000
47000
53000
26000
29000
50000
54000
35000
40000
41000
68000
57000
36000
50000
23000
55000
51000
ll
65000
SALARY
II
63000
37000
40000
47000
53000
26000
29000
50000
54000
35000
40000
41000
68000
57000
36000
50000
23000
55000
51000
65000
64000
70000
221
1 si ^^
s2
s3
s4
s5
s6
s7
s8
s9
slO
sll
sl2
sl3
sl4
sl5
sl6
sl7
si 8
sl9
s20
s21
1 s22
DEGREE
Ph.D.
Master
Bachelor
Ph.D.
Master
Bachelor
Bachelor
Ph.D.
Ph.D.
Bachelor
Master
Master
Master
Ph.D.
Bachelor
Master
Bachelor
Master
Master
Ph.D.
Master
PhX>.
EXPERIENCE
7.2
2.0
7.0
1.2
7.5
1.5
2.3
2.0
3.8
3.5
3.5
3.6
10.0
5.0
5.0
6.2
0.5
7.2
6.5
7.8
8.1
8.5
In this case, the query cannot be processed correctly. Thus, we must propose a
method for estimating the attribute values in a failed relation. The basic idea of
this method is that the failed tuple compares to each rule in the rule base shown
in Fig. 15 to see which is the closest rule. Then we can estimate the failed tuples
by their closeness degrees.
Assume that relation Ri contains attributes Ai, A 2 , . . . , A^ and assume that
relation R2 contains attributes JBI and B2 Furthermore, assume that attribute
Ai and Bi are the primary keys of jRi and R2 respectively, and Ai and Bi are in
the same domain. If we perform a join operation on Ri and R2 with a condition
"Ri.Ai = Ri-Bi," then we can derive a new relation named R3 where relation
R3 contains attributes Ai, A2 . . . , A^, and B2 If attribute B2 is a failed attribute,
then we can estimate attribute B2 by the following method.
Case 1. First, we scan each tuple Ti in relation R3. If attribute B2 is in a
numerical domain. We can compute the closeness degree between tuple 7/ and
the chosen rule according to the rules that determine attribute B2. If rule TJ is
the closest to 7J, then we pick that rule as the base rule, where rule Vj is shown as
IF Ai = A ; i ( W = Wfi)AND A2 = Aj2(W= w/yi) A N D . . . AND
An = Ajn( W = Wjn) THEN B2 = Nj
222
= ^
(32)
. \
Rdomsimir
...^ , ..
(33)
j-Ak][Ti.Ak]
(34)
(35)
After we compute all closeness degrees between tuple TJ and all rules ry, we
choose the closest rule to 7/. Suppose the following rule is the closest one to
tuple Ti:
IFAi = Afi(W=
(36)
Repeat each tuple Ti until all failed attributes have been estimated.
Case 2. If the failed attribute Bi is in a nonnumerical domain, we can also
compute the closeness degree between tuple 7J and the chosen rule according
to the rules that determine attribute Bi. If rule r/ is the closest to 7J, then we
pick that rule as the base rule, where Rule TJ is shown as
IFAi = Aji{W=Wji)
AND Ai = Aj2{W=Wj2)
AND...AND
A = Ay(W =/,)
THEN B2 =
223
(37)
. \
. . ^ , ..
(38)
Rdomamirj.Ak][Ti.Ak]
(39)
(40)
After we compute the all closeness degrees between tuple TJ and all rules, we
choose the closest rule to 7]. Suppose the following rule is the closest one to
tuple 7J:
IFAi = Ayi( W = Wji) AND Ai = Ay2( W = Wj2) A N D . . . AND
An = Ay( W = Wjn) THEN B2 = Wy.
Then the estimated attribute value of Bi is Wy, where
Ti.Bi = Wi
(41)
Repeat each tuple 7] until all failed attributes have been estimated.
EXAMPLE 3. Consider the relation EMPLOYEE shown in Fig. 16. We
can vertically partition this relation into two relations EMPJNFO and
EMP_SALARY as shown in Fig. 17 and Fig. 18.
Then we can join these two relations into the original relation EMPLOYEE
with the SQL statements
SELECT
FROM
WHERE
WITH
Suppose these two relations are distributed in two different locations. Then we
create a view called EMP by the SQL statements
CREATE
SELECT
VIEW EMP AS
ID, DEGREE, EXPERIENCE, SALARY
224
11 ^si
DEGREE
Ph.D.
1 EXPERIENCE
7.2
SALARY
?????
FROM
WHERE
WITH
EMPJNFO, EMP_SALARY
EMPJNFO.ID = EMP_SALARY.ID.
RSV = 0.90.
We can see that the resuhs of the relation EMPLOYEE and the view EMP
are identical. Now suppose the relation EMP_SALARY failed to access when a
query is submitted shown as
SELECT
FROM
WHERE
WITH
*
EMP
ID = "S1".
RSV = 0.90.
In this case, the query cannot be processed correctly. Thus, we must estimate
the attributes in the failed relation. According to the problem described above,
when the database system that stores the relation EMP_SALARY failed, the
query submitted by the user cannot be executed. In this case, the join result
may be like the failed result shown in Fig. 19.
Now, we will estimate all the data in the relation EMP_SALARY. We scan
every tuple in the relation EMP J N F O . First, we pick the first tuple in the
relation EMPJNFO. From the rules for the attribute "SALARY," we know
that the attribute "DEGREE" and the attribute "EXPERIENCE" determine the
attribute "SALARY." Thus, we calculate the closeness degree between the first
tuple of the relation EMP.SALARY to each rule for the attribute "SALARY."
For example, we choose the first rule (i.e.. Rule 1) in the rule base shown in
Fig. 15 as
IF DEGREE = "Master" (W= 0.6) AND EXPERIENCE = 3.6(W=
0.4)
CDdegree(Master, Ph.D.) =
1
"07
= 1.428571428571.
225
From Rule 1 shown in Fig. 15, we can see that that the weights for the
attributes "DEGREE" and "EXPERIENCE" are 0.6 and 0.4, respectively. Then
we calculate the total closeness degree between the first tuple of the relation
EMPJNFO and Rule 1 using formula (35):
CD(Rulel, EMRTi) = 1.428571428571 * 0.6 + 2 * 0.4 = 1.6571428.
Again, we calculate the closeness degree between the first tuple of the relation
EMPJNFO and the rest rules in the rule base for the attribute "SALARY." We
use the same method described above to calculate each CD value:
CD(Rule2, EMRTi) = 1 * 0.6 + 0.9230769230769 * 0.4 = 0.969230769,
CD(Rule3, EMRTi) = 2.5 * 0.65 + 1.44 * 0.35 = 2.129.
In this case, we choose the closest rule to estimate the value of the attribute
"SALARY," say. Rule 2. By formula (36), we can estimate the first tuple of the
relation of the failed attribute "SALARY" as
EMRTi.SALARY = 0.969230769 * 65000 = 63000.
After scanning the rest of the tuples in the relation to estimate the values for
the failed attribute "SALARY," we can get the estimated relation shown in
Fig. 20.
Compared to the original relation, the estimated values and the estimation
errors are shown in Fig. 21.
From Fig. 2 1 , we can see that the average estimated error between the
original value and the estimated value is 0.061.
EXAMPLE 4. Consider the relation EMPLOYEE shown in Fig. 16. We
can vertically partition this relation into two relations EMPJNFO and
EMP_SALARY as shown in Fig. 17 and Fig. 18. Then we can join these two
relations into the original relation EMPLOYEE with the SQL statements
SELECT
FROM
WHERE
WITH
Suppose these two relations are distributed in two different locations. Then
we create a view called EMP by the SQL statements
CREATE
SELECT
FROM
WHERE
VIEW EMP AS
ID, DEGREE, EXPERIENCE, SALARY
EMPJNFO, EMP_SALARY
EMPJNFO.ID = EMP_SALARY.ID.
226
1 ^^
DEGREE
si
s2
Ph.D.
Master
s3
s4
Bachelor
s5
sl7
Bachelor
sl8
sl9
s20
s21
Master
Master
Ph.D.
Master
32219
41000
64533
55666
35999
51866
0.5
7.2
24659
55200
52866
65000
6.5
7.8
8.1
58200
Ph.D.
8.5
67333
DEGREE
si
s2
s3
s4
s5
s6
s7
sg
s9
slO
sll
sl2
sl3
sl4
sl5
sl6
sl7
sis
sl9
s20
s21
Ph.D.
Master
Bachelor
Ph.D.
Master
Bachelor
Bachelor
Ph.D.
Ph.D.
Bachelor
Master
Master
Master
Ph.D.
Bachelor
Master
Bachelor
Master
Master
PkD.
Master
Ph.D.
s22 1
56200
27179
29195
39861
48061
40544
Master
1^
F I G U R E 21
36216
7.5
1.5
2.3
2.0
3.8
3.5
3.5
3.6
10.0
5.0
5.0
6.2
Bachelor
Bachelor
Ph.D.
Ph.D.
Master
Bachelor
Master
Master
Ph.D.
Bachelor
F I G U R E 20
II
7.2
Ph.D.
Master
II
(ESTIMATED)
63000
33711
46648
2.0
7.0
1.2
s6
s7
s8
s9
sll
slO
sl2
sl3
sl4
sl5
sl6
1 s22
SALARY
EXPERIENCE
EXPERIENCE
7.2
2.0
7.0
1.2
7.5
1.5
2.3
2.0
3.8
3.5
3.5
3.6
10.0
5.0
5.0
6.2
0.5
7.2
6.5
7.8
8.1
8.5
SALARY
(ORIGINAL)
63000
37000
40000
47000
53000
26000
29000
50000
54000
35000
40000
41000
68000
57000
36000
50000
23000
55000
51000
65000
64000
70000
SALARY
(ESTIMATED)
63000
33711
46648
36216
56200
27179
29195
39861
48061
32219
40544
41000
64533
55666
35999
51866
24659
55200
52866
65000
58200
67333
ESTIMATED ||
ERROR
1
+0.00
-0.09
+0.17
-0.23
+0.06
+0.05
+0.01
-0.20
-0.11
1
-0.08
+0.01
+0.00
-0.05
-0.02
-0.00
+0.04
+0.07
+0.00
+0.04
+0.00
-0.09
-004
II
227
We can see that the results of the relation EMPLOYEE and the view EMP
are identical. Now suppose the relation EMP_SALARY failed to access when a
query is submitted as
SELECT
FROM
WHERE
WITH
*
EMP
ID = " S 1 " .
RSV = 0.90.
In this case, the query cannot be processed correctly. Thus, we must estimate
the attributes in the failed relation. When the database system that stores the
relation EMP J N F O failed, the query submitted previously cannot be processed.
In this case, the join result may be like the failed result shown in Fig. 22.
We can see that the attribute "DEGREE" failed in the relation EMP. Therefore, we must estimate the values in the attribute "DEGREE" in the relation
EMP. We scan every tuple in the relation EMP, and pick the first tuple in the
relation EMP. From the rules for the attribute "DEGREE," we know that the attribute "EXPERIENCE" and the attribute "SALARY" determine the attribute
"DEGREE". Thus, we calculate the closeness degree between the first tuple of
the relation EMP and each rule. Suppose we define the rules for estimating the
attribute "DEGREE" as shown in Fig. 23.
We choose the first rule in the rule base shown in Fig. 23 as
IF EXPERIENCE = 5.0(W=
Relation EMP
1 ^si
s2
s3
s4
s5
s6
s7
s8
s9
slO
sll
sl2
sl3
sl4
sl5
sl6
sl7
sl8
sl9
s20
s21
s22
F I G U R E 22
DEGREE
???
???
???
???
???
???
???
???
???
???
???
???
???
???
???
???
???
???
???
???
???
???
EXPERIENCE
7.2
2.0
7.0
1.2
7.5
1.5
2.3
2.0
3.8
3.5
3.5
3.6
10.0
5.0
5.0
6.2
0.5
7.2
6.5
7.8
8.1
8.5
SALARY
63000
37000
40000
47000
53000
26000
29000
50000
54000
35000
40000
41000
68000
57000
36000
50000
23000
55000
51000
65000
64000
70000
228
IF EXPERIENCE = 5.0 (W = 0.2) AND SALARY = 57000 (W =0.8) THEN DEGREE = Ph.D.
IF EXPERIENCE = 3.5 (W = 0.2) AND SALARY = 40000 (W =0.8) THEN DEGREE = MASTER
IF EXPERIENCE = 2.3 (W = 0.2) AND SALARY = 29000 (W =0.8) THEN DEGREE = Bachelor
I H I
F I G U R E 23
CDsaiary(57000, 63000) = ^ ^
= 1.105263158.
From Rule 1 of the rule base shown in Fig. 23, we can see that the weights
for the attributes "EXPERIENCE" and "SALARY" are 0.2 and 0.8, respectively. Then we calculate the total closeness degree between the first tuple of
the relation EMP and Rule 1 using formula (40):
CD(Rulel, EMRTi) = 1.44 * 0.2 -h 1.105263158 * 0.8 = 1.088.
Again, we can calculate the closeness degree between the first tuple of the
relation EMP and the rest rules in the rule base for the attribute "DEGREE."
We can use the same method described above to calculate each CD value:
CD(Rule2, EMRTi) = 2.05714 * 0.2 + 1.575 * 0.8 = 1.2114,
CD(Rule3, EMRTi) = 3.13043 * 0.2 + 2.172 * 0.8 = 2.2261.
In this case, we choose the closest rule to estimate the value of the attribute
"DEGREE," i.e.. Rule 1. By formula (41), we can estimate the failed attribute
"DEGREE" as
EMRTi .DEGREE = Rulel.Degree = Ph.D.
After scanning the rest of the tuples in the relation to estimate the values for
the failed attributes "DEGREE," we can get the estimated relation shown in
Fig. 24.
The estimated values and the estimated errors are compared to the original
relation in Fig. 25.
The architecture of the implemented system based on the proposed methods is shown in Fig. 26. There are five parts of components in the implemented
system, i.e., fuzzy SQL statements, fuzzy SQL translator, SQL-type database
management environment, remote database connectors, and failed data
generator.
The components of fuzzy query statements, fuzzy SQL translator, SQLtype database management environment, and remote database connector have
already been described in Section IV, while the failed data generator is based
229
Relation E VIP
1
11
1
111
1
11
11
1
11
1
11
11
1
||1
^
^^^
^
^
^
^
^^
^
^
^^
^^
^
^
^^^
^^
^^
^
^
^^"^
^
^
^
^^^
^
^^^
^^
^
^
^s22^
F I G U R E 24
DEGREE (ESTIMATED)
Ph.D.
SALARY
63000
37000
40000
47000
53000
26000
29000
50000
54000
35000
40000
41000
68000
57000
36000
50000
23000
55000
51000
65000
64000
70000
EXPERffiNCE
7.2
2.0
7.0
1.2
7.5
1.5
2.3
2.0
3.8
3.5
3.5
3.6
10.0
5.0
5.0
6.2
0.5
7.2
6.5
7.8
8.1
8.5
Bachelor
Bachelor
Bachelor
Bachelor
Master
Bachelor
Master
Master
Ph.D.
Ph.D.
Bachelor
Master
Bachelor
Master
Master
PhD.
Ph.D.
Ph.D.
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
on the method presented in this section. Again, we briefly describe these components as follows:
1. Fuzzy SQL statements: It contains the SQL statements that are issued
by the users.
2. Fuzzy SQL translator: It parses the query statements issued from the
component fuzzy SQL statement and performs the syntax checking and the
ID
si
s2
s3
s4
s5
s6
s7
s8
s9
slO
sll
sl2
sl3
sl4
sl5
sl6
sl7
sl8
sl9
s20
s21
s22
F I G U R E 25
DEGREE
(ORIGINAL)
Ph.D.
Master
Bachelor
Ph.D.
Master
Bachelor
Bachelor
Ph.D.
Ph.D.
Bachelor
Master
Master
Master
Ph.D.
Bachelor
Master
Bachelor
Master
Master
Ph.D.
Master
Ph.D.
EXPERIENCE
7.2
2.0
7.0
1.2
7.5
1.5
2.3
2.0
3.8
3.5
3.5
3.6
10.0
5.0
5.0
6.2
0.5
7.2
6.5
7.8
8.1
8.5
SALARY
63000
37000
40000
47000
53000
26000
29000
50000
54000
35000
40000
41000
68000
57000
36000
50000
23000
55000
51000
65000
64000
70000
DEGREE
(ESTIMATED)
Ph.D.
Bachelor
Master
Bachelor
Master
Bachelor
Bachelor
Bachelor
Master
Bachelor
Master
Master
Ph.D.
Ph.D.
Bachelor
Master
Bachelor
Master
Master
Ph.D.
Ph.D.
Ph.D.
II
||
||
||
11
||
||
||
|
|
|
|
|
|
230
n:
ir
IT
n
SQL-Type Database
Management
Environment
n:
n
Failed Data
Generator
'Membership/
Function I
Library \
Rule Base
Remote Database
Connector
Remote Database
Connector
Remote
Database
System
Remote
Database
System
F I G U R E 2 6 Fuzzy query translation in the distributed relational databases environment including the
failed data generator.
23 I
Yl. CONCLUSIONS
We have presented a method for deaUng with fuzzy query translation based
on the a-cuts operations of trapezoidal fuzzy numbers for the distributed relational databases environment. It allows the users to access the information
in the distributed relational databases environment in a more flexible manner. It also allows the users to submit fuzzy SQL queries and to translate the
fuzzy SQL queries into standard SQL queries. We also present a method for
estimating incomplete data when one of the relations failed to access in the
distributed relational databases environment. Based on the proposed methods,
we have implemented a system on a Pentium PC by using the Borland Delphi
version 3.0 developing environment to translate fuzzy SQL queries into precise SQL queries. It is transparent to the user who submits fuzzy queries to
the relational database system, and it also provides a method for null values
estimation when the queried relations failed to access in the distributed relational databases environment. The proposed methods allow the users to deal
with fuzzy information retrieval in a more flexible manner in the distributed
relational databases environment.
REFERENCES
1. Bosc, P., Galibourg, M., and Hamon, G. Fuzzy querying with SQL: Extensions and implementation aspects. Fuzzy Sets Systems 28(3): 333-349,1988.
2. Bosc, P., and Pivert, O. SQLf: A relational database language for fuzzy querying. IEEE Trans.
Fuzzy Systems 3(1): 1-17, 1995.
3. Codd, E. F. Normalized data base structure: a brief tutorial. In Proceedings of 1971 ACM
SIGFIDET Workshop on DATA Descriptiony Access, and Control, San Diego, CA, November
1971.
4. Chang, S. K., and Ke, J. S. Database skeleton and its application to fuzzy query translation.
IEEE Trans, Software Engrg. 4(1): 31-43,1978.
5. Chang, S. K., and Ke, J. S. Translation of fuzzy queries for relational database systems IEEE
Trans. Pattern Anal. Mach. Intell. 1(3): 281-294, 1979.
6. Chen, H. H., and Chen, S. M. Fuzzy query translation for information retrieval in the distributed
relational database environment. In Proceedings of the 6th National Conference on Science and
Technology of National Defense, Taoyuan, Taiwan, Republic of China, vol. 2, pp. 433-439,
1997.
7. Chen, S. M. Using fuzzy reasoning techniques for fault diagnosis of the J-85 jet engines. In
Proceedings of the Third National Conference on Science and Technology of National Defense,
Taoyuan, Taiwan, Republic of China, vol. 1, pp. 29-34, 1994.
8. Chen, S. M., and Jong, W. T. Fuzzy query translation for relational database systems. IEEE
Trans. Systems Man Cybern et. Part B 27(4): 714-721, 1997.
9. Chen, S. M., Ke, J. S., and Chang, J. F. Techniques of fuzzy query translation for database
systems. In Proceedings ofl 986 International Computer Symposium, Tainan, Taiwan, Republic
of China, vol. 3, pp. 1281-1290,1986.
10. Date, C. J. An Introduction to Database Systems, 6th ed. Addison-Wesley, Reading, MA, 1995.
11. Grant, J. Null values in a relational data base. Inform. Process. Lett. 6(5): 156-157,1977.
12. Huemer, C , Happel, G., and Vieweg, S. Migration in object-oriented database systems - a
practical approach. SoftwarePractice Exper. 25:1065-1096,1995.
13. Jeng, B., and Liang, T. Fuzzy indexing and retrieval in case-based systems. In Proceedings of
1993 Pan Pasific Conference on Information Systems, Taiwan, Republic of China, pp. 258-266,
1993.
232
I. INTRODUCTION 233
11. FUNDAMENTALS OF DATA COMPRESSION
A. Information Theoretical Background 236
B. Models 239
III. STATISTICAL CODING 243
A. Shannon-Fano Coding 244
B. Huffman Coding 245
C. Redundancy of Huffman Codes 248
D. Arithmetic Coding 250
E. Adaptive Techniques 254
IV. DICTIONARY CODING 255
A. Methods Using a Static Dictionary 256
B. Adaptive Methods 265
V. UNIVERSAL CODING 269
VI. SPECIAL METHODS 271
A. Run Length Encoding 271
B. Algorithms Based on List Update 272
C. Special Codes and Data Types 273
VII. CONCLUSIONS 273
REFERENCES 273
235
INTRODUCTION
Despite continuing improvements in storage and transmission technology the
rapidly growing amount of information stored on and transmitting between
computers has increased the need for text compression. A huge amount of
information can be stored on a single CD but sometimes this is not enough
to avoid multivolume applications. It is known that a simple ad hoc method
can compress an English text to around 70% of its original size, and the best
techniques have a compression ratio about 2 5 % . In case of Braille the used
compression processes can reduce the size of books by 20%. For computers.
%'%'%
233
234
DATA COMPRESSION
235
we must use the same model. If we use a fixed model for different texts then
the transmission may become inefficient because the model is not appropriate
for the text. Such a problem can occur if we use a Morse code handling numeric data only. There are different ways to maintain a model in a compression
process:
In a static modeling the encoder and decoder use a fixed model, regardless
of the text to be encoded.
In a semiadaptive scheme before sending the text the encoder checks it
and prepares a "codebook." Then he transmits the codebook, followed by the
coded text. The decoder first gets this codebook and then uses it to decode
the text. This technique has two disadvantages: firstly, the encoder must see the
entire text before transmitting it, and secondly, we must send the codebook for
each text, which could take a substantial amount of time. So, two passes are
required for encoding.
If we use an adaptive model then the encoder sends a text character by
character. By agreement, when a part of the text (we can refer to it as "word")
has been transmittedand also receivedtwice, then both the decoder and
encoder add it to the codebook. In the future that word can be transmitted
using the new code. So, if we use adaptive coding, then the code of a particular
character (or word) is based on the text already transmitted.
There is another possibility to distinguish different techniques. Certain classical statistical methods use static models, and they transmit texts character by
character. Unfortunately, sometimes we do not know the whole text in advance, so we need one of the other techniques to perform a compression. On
the other hand, there are such methods that use the so-called set of dictionary
coding techniques. They use astatic or adaptivedictionary in which they
correspond different parts of the source text to othercodedsequences. In
the following firstly we overview the information theoretical backgrounds and
the basic definitions of the data compression. Section III deals with the different
statistical coding methods. Some dictionary coding techniques are considered in
Section IV. In Section V we discuss some universal coding methods, and further
special techniques are mentioned in Section VI.
In case of lossless compression the later decompressed data must be identical to the original source code. In this paper we deal only with lossless compression methods, so we always assume that our aim is the exact reconstruction.
Usually this kind of method can be used in the cases of database, text, and other
236
DATA COMPRESSION
237
Shannon proved that the function of entropy satisfies the following requirements, and he also demonstrated that it is the only function which does
so:
H(f) is a continuous function of the probabilities p i , . . . , p.
In the case of uniform distributions if i > 2 then H(Fn^) > H(P2) If Si and S2 are two stages with probability pi and p2 respectively, and
F^ and F^ are the sources belonging to the stages, then
H(Fn) = pH(Fji) + p2H(F^).
From the above definition it is clear that more likely messages that appear
with greater probability contain less informations; i.e., the more surprising the
message, the more information content it has.
Now, to measure the goodness of code K we introduce the following.
DEFINITION 8. For the source F the expression L(K) = E/Li Pi^i ^^ ^^^
cost of the code Kn-
238
is pi, we can get the desired entropy using the definition Hi(Fn) = log ^. Now,
making n subsequent decisions with probabiHties p i , . . . , p we get that the
average entropy of the earUer processed decisions is H(Fi) Y^i^x PiHiFn) =
Jl^=i pi log x-5 which is the overall entropy.
Now, performing a compression step by step we will get an "entropy-like"
coding process if after having compressed the text for each individual choice
of an arbitrary string (or a letter) its compressed length ti = log ^ . So, the only
question remaining is to decide whether this result can be beaten.
DEFINITION 9. A uniquely decipherable code K^ is optimal for the source
f, if for any uniquely decipherable code iC for Fn, L(Kn) > L(K^)'
Our aim is to find such a code, which is optimal, i.e., has a minimal cost,
considering the given probabilities. (Of course we would like to assure the
recoverability using uniquely decipherable codes.) The next theorems present
that we can answer the above raised question negatively.
THEOREM 2 [1]. For an arbitrary uniquely decipherable code K^ for Fn,
the relation L(Kn) > H(P) holds,
Abramson also proved that there exists a type of coding that produces a
very good compression. In [1] he introduced the following.
DEFINITION 10. We say that a code-word ai = coicoi" - coi. is a prefix of
another ofy = co^co^- -- (o'l if // < // and there exists a /s < //, such that cok = co^f^,
iik<ls. h code K is a prefix, if no code-word is a real prefix of another.
The prefix codes have great importance, since it is easy to see that if we use
a prefix code, then the decoding process can be done uniquely.
THEOREM
THEOREM
code.
By Theorem 4 there exists an optimal prefix code for a given source F^. The
followingso-called Noiseless Coding Theoremgives an interesting upper
bound on the cost of such a code.
THEOREM 5 [60]. Let K^ be the optimal prefix code for a given source FnThen the relation L(K^) < H(Fn)-\-1 holds.
From Theorems 2 and 5 it follows that the cost of an optimal prefix code
is very close to the entropy of the given source. So, this theorem shows us that
an entropy-like coding process gives us an optimal coding method. So, the next
question is in hand: can we produce such a code?
Because of the above theorem, as a first step of our investigations we can
concentrate on the existence of prefix codes. Taking into account the following
theorem we can get two important results: On the one hand, the constraint
gives us an exact bound for constructing a prefix code. On the other hand, if
we have a code in our hand we can decide easily whether it is a prefix one.
DATA COMPRESSION
239
^2-^'<l.
(1)
i=l
B. Models
As we saw earlier, the entropy of a given text depends on the probabilities of
the symbols being coded. Similarly, the average length of the compressed text
(or as we called it, the cost of a code) is also a function of the probabilities. It is
also obvious that we can state certain models to compute the probabilities, and
in different models these values would be different. Since the entropy serves a
lower bound for the average length of the compressed text it is important to
use the adequate model in our process.
The fundamental role of a model is to supply probabilities for the message.
In a modelusuallywe do not calculate the probabilities in one step. Instead,
they are built up incrementally, starting at the beginning of the text, and processing through it character by character. Depending on this criterion we can
distinguish between static and adaptive models. In the following we will show
the most popular models used in connection with the data compression.
I. Static Models
One of the simplest models for counting the probabilities is what allocates
a fixed probability to each character irrespective of its position in the text. To
do this we need to know all characters that may occur in a message. It is so in
the case of different natural languages. A collection of American English text is
known as the Brown corpus, and it has been widely used in investigating different language statistics. It based on 500 separate 2000-word samples of natural
language text representing a wide range of styles and authors. The alphabet of
the corpus contains 94 symbols. The first 15 most frequent characters is shown
by Table 1. By analyzing a given English text one can estimate for example the
probability of the word "ofor" (here the symbol "o" means the space character)
as follows: One can find the probabilities of each character in Table 1, which
are o = 0.1741, f = 0.0176, o = 0.059, and r = 0.0474. So the probability of
the entire text-fragment is
0.1741 X 0.0176 X 0.059 x 0.0474 = 4.515 x 10"^ = 2-1^'^^^
240
I^H
TABLE I
Letter
Prob. 1(%)
Letter
Prob. {%)
Letter
Prob. (%)
17.41
5.51
3.19
9.76
5.50
3.05
7.01
4.94
2.30
6.15
4.77
2.10
5.90
4.15
1.87
Ref. 10. Reprinted by permission of Pearson Education, Inc., Upper Saddle River, NJ.
SO the optimum coding of the string in hand is 14.435 bits in this simple model.
In this model we did not take into account the preceding characters while
having to calculate the probabilities. In this case we refer to the model as an
order-0 model. It is easy to recognize that the probability of a character will
differ depending on the symbols that precede it. So, we can generalize the above
idea: we will say that a model is an order-k model if it considers k preceding
characters while determining the probability of the next symbol. If we want to
use an order-)t model we will have some difficulties at the beginning of the text:
there will be an insufficient number of characters to supply our formula. In that
case some ad hoc solution is used: either we assume that the text is primed with
a ^-length sequence of default characters or we use an order-0 model for the
first k 1 characters.
Models of up to order 11 have been reported in the book of [10]. It is a
natural feeling that we must use a high-order model to get good estimations for
the probabilities. This is true but it is also easy to see that the number of possible
contexts increases exponentially with the order of the model. Thus larger and
larger samples are needed to make the estimates and so, we may need a huge
amount of memory to store them. This will destroy all of the advantages of any
compressing procedure.
Markov Models
DATA COMPRESSION
24 I
path through the model, and the probability of any string can be computed by
multiplying the probabilities out of each state.
Markov models are very useful for modeling natural languages, and they
are widely used for this purpose.
Some of the state models are nonergodic in the sense that in these models
there are states from which parts of the system are permanently inaccessible.
All state models used in connection with natural languages are ergodic.
Grammar Models
For those types of strings which represent a part of any language with strong
syntactic rules, the above models are not usable. For example, in computer algebra (like MAPLE or MATHEMATICA) if the parentheses are balanced in a
formula, the next character may not be a")". Similarly, for those of computer
languages where the delimiter is a special character (e.g., "." or ";") this character may not occur "inside" a statement. For the above type of strings with
theoreticallyinfinite nesting possibilities, the so-called grammar models are
adequate. In a grammar model we have well-defined productions (e.g., conditional or unconditional jump statements and subroutine calling sequences), and
we can decide the probabilities of each subproduction in a given production.
This gives us a multilevel hierarchy, containing in each level a discrete probability distribution. Now, if we have a string to be modeled by a grammar model
it is parsed according to the grammar, and its probability may be computed by
multiplying together the probability of each production that is used.
The use of grammarsprincipally for compressing Pascal programshas
been explored by Katajainen etaL [39] and Cameron [14]. Compressing strings
from a formalcomputerlanguage, this type of model is very successful, but
it is almost usable for natural languages. It follows from pragmatic reasons that
it is almost impossible to find a grammar for natural languages: their syntactic
rules are very complicated and are not modelable "by hand." Constructing
them mechanically from samples is also impossible because we cannot decide
the exact boundary of the given language. It is worth mentioning that some
heuristics are useful for modeling natural languages.
The Problem of Conditioning Classes
p'[Si) p^SiSi)
p'{K)
P'[si)
p'(SiS2 . . .S)
p(SiS2 . . .S)
p'[SiS2...Sn-\)p'{SxS2...Sn)'
242
GALAMBOS A N D BEKESI
/?(S/|SiS2...S/_i),
p'(SiS2...Si-i]
. . .5,-1)),
DATA COMPRESSION
243
We remind the reader that in the case of semi-adaptive models the model
must be sent with the message, and coding the model requires approximately
Kq\o%n bits, so about the same number of bits will be used as for a fully
adaptive model. Intuitively, it is also expected that for an adaptive model a
"well-suited" message will be excellent compressible. However, what is the
worst-case situation?
THEOREM 8 [10]. There is an adaptive model where the compressed string
will be at most (q + l ) / 2 log n bits longer than the original string.
After this surprising result one can say that an adaptive model may not
be a good choice for processing a compression. Fortunately, there is another
theorem that states that for any text there exists a static model that can expand
it arbitrarily. So the consequence is nice:
COROLLARY 9. There is a message where an adaptive code will do asymptotically better than the static model.
244
are also constant^where the two lengths are not necessarily the samewe say
that the coding is block-to-block type.
EXAMPLE
Code words
Oil
010
110
111
b
c
Code words
ah
b
cbd
dc
11
010
101
oil
245
DATA COMPRESSION
e:0.35
a:0.25
d:0.15
b:0.14
ciO.ll
FIGURE I
00
01
10
no
11
111
Symbols
Probabilities
c
0.11
b
0.14
d
0.15
a
0.25
e
0.35
246
FIGURE 2
Substitute these symbols with a set, whose probabiUty is the sum of the
two probabiHties. We order a bit of value 0 to the code of the first
symbol and a bit of value 1 to the other.
For these newly constrained sets repeat the previous three steps until we
get a list containing only one element.
Using the above algorithm the following Huffman code can be constructed
for Example 3. The symbols and the sets are represented by a tree, which is a
usual technique in the implementations of the method (see Fig. 2). The code
words for Example 3 are
Symbols
Codewords
a
01
b
101
c
100
d
00
e
11
The next theorem helps to decide whether a binary prefix code is a Huffman
code.
THEOREM 10 [29]. A binary prefix code is a Huffman code if and only if
the code tree has the sibling property.
DATA COMPRESSION
247
bound on the maximum length of a binary Huffman code and on the sum of
the length of all code words. In the following we present some of his results.
DEFINITION 12. The external path length of a Huffman tree is the sum of
all path lengths from the root to the leaves.
THEOREM 11 [13]. Assume that we have a source alphabet consisting of
n> 1 symbols. Denote the probabilities of the symbols by pi^... ^ pn and suppose that pi < pi^i for all 1 <i <n 1, Denote the length of the longest code
word of the corresponding Huffman code by L. Then
"{[^^^^(^^)
L < min
,n-l
where ^ = ^ ^ .
COROLLARY
most
111
^^1_^
mm { log(b
n,
{n +
l)(n-l)
Because of the construction none of the assigned code words are a prefix
of another in the Huffman code. So the derived code is a prefix and Theorem 3
holds. As a consequence decoding can be done easily: we search for the first
matching code word in the compressed data, then we substitute it with the
corresponding symbol, and we continue this process until the end of the compressed file.
The main problem with decoding is that storing the Huffman tree may
require large memory. In the original paper the array data structure was used
to implement the complete binary tree. Later Hashemian [34] presented an efficient decoding algorithm using a special array data structure. In the following
we present a memory-efficient array data structure to represent the Huffman
tree. This was published by K.-L. Chung in [19]. The memory requirement
of this representation is 2n-3 where n is the size of the basic alphabet. Based
on this implementation a fast decoding is possible. The following algorithm
creates the array structure H:
Traverse the Huffman tree in a preorder way. For each left edge record
the number of edges and leaf nodes L in the subtree of that edge. Assign the
+ L + 1 value to the node whose left edge has been investigated.
Traverse again the Huffman tree in a preorder way. At each time emit
the assigned value when a left edge is encountered, or a " 1 " when a right edge
is encountered. Emit the source symbol when a leaf edge is encountered.
Save the ordered emitted values in H.
EXAMPLE
Now we present the Chung decoding algorithm [19]. The algorithm uses two
pointers. One pointer points to the Huffman code C, the other to the array H.
The pointers are denoted by cp and ap, respectively. In the code len(C) denotes
248
the length of the corresponding Huffman code C. Using the construction of the
H array, the algorithm moves the pointers in such a way that if cp points to
the end of a code word in C then ap points exactly to the corresponding source
symbol in the H array. Using this process, decoding the next symbol can be
done easily outputting H[ap], The pseudo-code looks like this:
ap := 1; cp := 1;
while cp < len(C) do
begin
if H[c:/7] = 0then
begin
cp := cp-{-1;
ap:=ap-{-l;
end
else
begin
cp:=cp-\-l;
ap:=H[ap]-{-l;
if not eof (C) then
begin
iiH[cp] = 1 then
begin
cp := cp + 1;
ap:=H[ap]-^l;
end
else
begin
cp := cp-\-1;
ap\=ap^-l\
end
end
end
Output H[ap]
end
There are other approaches in the practice: in [50] a VLSI implementation
of Huffman encoding and decoding was considered. Teng [62] presents a
0(log()^) parallel algorithm for constructing a Huffman tree with n leaves.
Some years later in [46] an off-line construction was investigated for those
types of Huffman codes where the depth of leaves is limited.
C. Redundancy of Huffman Codes
It is well known that the Huffman code is an optimal prefix code; i.e., its cost
is minimal among the prefix codes of the given alphabet and probabilities.
The question is how close is the code to the entropy of the given source? The
difference between the entropy and the cost of the Huffman code is
characterized by its redundancy. Given that each character in an alphabet must
occupy an integral number of bits in the encoding, Huffman coding achieves a
DATA COMPRESSION
249
source. Then
R<pi-\-cr,
where a = 1 log2 e + log2(log2 e) ^ 0.086.
THEOREM 14 [29]. Let pi be the probability of the most likely letter in a
source. If pi > 0.5, then
R<2-K(PI)-PU
source. If8>
source. Then
R<f
l + ^-MPi)
- 13 - (3 + 3log3)pi - K(3px) if
where 6 ^ 0.3138. The above bounds are tight.
ifl<Pi<e
e<pi<l,
250
then
f3-/c(/7i)-(l-/7i)(21og3-f)
R< \
4-7(l+log7)pi-/c(7pi)
[
l-K(pt)-(l-pi)A2i-i
if ^<
pi< y
ify^p^^^
ifl>4,
where y ^ O.lAll, Aj = minu^jeWj {H(Wj) Wj] and WjJ > 2 is the set of
positive real numbers w\, w^^..., w/y that satisfy wi > wi > -- > Wj > ^
along with J2h ^z? = 1- The above bounds are tight.
The above theorems cover all the cases when w^e know^ the largest symbol
probability. There are some other results on the redundancy of Huffman codes
w^ith some other assumptions. For example we assume that we know the largest
and the smallest probabilities. Details can be found in [15,17]. In addition to
the redundancy, another similar notion was introduced in the literature. This
is the maximum data expansion.
DEFINITION 14. Assume we are given a code K = {ofi,..., Qf} and denote
h^.. .Jn as the lengths of the ofi,..., a code words. Let the probabilities of
the characters of the alphabet be p i , . . . , /?. The maximum data expansion 6
(K) of the code K is defined as
S(K)^
J2
(/,-log)p,.
{/|/,>log}
(KH)
of a Huffman
1.39.
251
DATA COMPRESSION
the length of the interval more significantly, giving more bits to the representation, while the symbols with higher probability decrease less the length of the
interval.
The algorithm works in the following way:
Let the starting interval be [0,1].
Divide the corresponding interval due to the probabilities of the
symbols.
Consider the next symbol of the text and take the corresponding
subinterval as the next interval.
Repeat the previous two steps until we reach the end of the text. An
arbitrary number from the last interval can represent the text.
EXAMPLE 5. Consider again the symbols and probabilities given in Example 3. Suppose that the input string is deec. The change of the intervals is the
following:
first
after d
after e
after e
after c
[0,1)
[0.60,0.75)
[0.60,0.6525)
[0.60,0.618375)
[0.61635375,0.618375)
after e
0.6525
after e
0.618375-
after c
0.618375-1^
b
d
ceo-"
FIGURE 3
0.60-"
0.61635375-
252
Language
Huffman
Arithmetic
English
Finnish
French
German
Hebrew
Itahan
Portuguese
Spanish
Russian
Enghsh-2
Hebrew-2
4.1854
4.0448
4.0003
4.1475
4.2851
4.0000
4.0100
4.0469
4.4704
7.4446
8.0370
4.1603
4.0134
4.0376
4.1129
4.249
3.9725
3.9864
4.0230
4.4425
7.4158
8.0085
0.6
0.8
0.9
0.8
0.8
0.7
0.6
0.6
0.6
0.4
0.4
bounds show that this is not always true. So for a given example, it is a natural
question as to whether arithmetic coding is superior to Huffman coding. Bookstein and Klein investigated this problem by comparing the two methods from
several points of view [12]. They stated the most important advantages and disadvantages of arithmetic coding relative to Huffman codes. The advantages are
optimality, efficient encoding if the alphabet or its characteristics are changing
over the file, and simple extensibility to infinite alphabet. The disadvantages
are slowness, complexity, and only small savings in realistic situations. Some
other comparisons were based on the size of the alphabet. For large alphabets
it can be stated that in general the probability of the most frequent character is close to 0, so the cost of the Huffman code is close to the entropy and
the difference between the two methods is not significant. Bookstein and Klein
compared the algorithms for natural language alphabets. Here we present their
results in Table 2. The first column contains the name of the language, the next
two columns give the average code-word lengths for the two methods, and the
fourth gives the increase of the Huffman value over the value for arithmetic
coding in percent.
Unfortunately, Huffman coding can work poorly for small alphabets. For
example, in the pathological case when we have a binary alphabet with probabilities 8 and 16. Then the length of each code word is 1 bit with Huffman
coding. Arithmetic coding will reach the entropy, which is s logi ^ -\- (1 e) log2
j3^. The entropy tends to 0 if e -> 0. It shows that Huffman coding is a poor
choice in such cases. It is also stated in [12] that arithmetic coding gives better
compression in case of inaccurate probabilities. Time comparisons present that
arithmetic coding consumes much more time than Huffman coding.
Some variants of arithmetic coding appeared in the literature too. A modification was proposed by Teuhola and Raita [63]. They found some disadvantages of arithmetic coding, like nonpartial decodability, vulnerability, i.e., strong
effect of small (for example, one-bit) errors, etc. To avoid these problems, they
253
DATA COMPRESSION
first
63-1
c(56-63)
^48-55)-d(38-47)
after d
47
c(47)
b(46)
d(44-45)
after c
63-1
c(56-63)
b(48-55)
d(38-47)
a(22-37)
a(22-37)
e(0-21)
e(0-21)
0FIGURE 4
38 -^
Coding steps of Example 3 using the fixed-length arithmetic algorithm.
The main difference between this fixed-length arithmetic coding and the
original one is that in this case some collisions happen, i.e., some symbols can
fall into the same range after the division, if the original range is narrow. Because
this collision information is not utilized, the algorithm would not be optimal.
To avoid this problem, the modified version of the algorithm use this information. This means that if a collision happens, both the encoder and the decoder
know the symbols that are taking part in the collision. In the next step the
algorithm reduces the alphabet to these symbols. This can be done, because in
the case of a collision the next symbol is always restricted to one of the symbols
of the collisions. The main disadvantage of this algorithm is that the successive
code words are not independent and not separately decodable. The experimental results in [63] showed that these variants of arithmetic coding are rather
practical and their redundancy is very small for some longer code words.
At the end of this section we refer back to the modeling step. Those models
that have been introduced are suitable for supplying probabilities to such types
of encoder like the Huffman code or the Arithmetic code. In the later case a
coder is able to compress strings in that number of bits that are indicated by the
entropy of the model being used, and this is the minimum size possible for the
model chosen. Unfortunately, in the case of Huffman coding, this lower bound
can be achieved only under special conditions. This implies ^he importance of
254
DATA COMPRESSION
255
stay the same, so we do not need to reorganize the Huffman tree. Different
aging techniques are considered in [10]. We may have certain problems with
this technique:
The larger the time constant for aging, the slower the adaptation of the
model, which may result in a better estimate for slowly varying
statistics, but it can be irrelevant for rapidly varying statistics.
Rescaling can create fractional parts, and the incremental update tree
algorithm cannot handle these type of counts. So we need either to
round or to truncate them to the nearest integer, which can change the
structure of the Huffman tree dramatically.
Improvements and generalizations were considered in [20,43,64]. Finally,
Lelewer and Hirschberg [47] summarized the improvements proposed by Vitter
[64]. Weyland and Pucket [67] considered an adaptive technique for Huffmanlike trees for compression of a Gaussian source of fixed and floating point
numbers.
At the end of this subsection we must mention that the key of this technique
is always the recreation of the Huffman tree. Since it may be time-consuming
the resulting algorithm could be unsatisfactory for on-line coding.
2. Adaptive Arithmetic Coding
Using the Arithmetic coding we have counts for each symbol, and so we can
count the total as well. Normalizing the counts we can get the relative frequencies of the symbols. It is easy to see that such a model can be updated simply by
incrementing the count of the subsequent character. The only problem is the implementation. We must distinguish techniques developed for binary alphabets
from those effective for large alphabets. For binary alphabets the first method
was developed by Rissanen and Langton [56]. Efficient implementations for
adaptive arithmetic coding both for binary and large alphabets are discussed
in [10]. Further implementation can be find in [36,51] for large alphabets. It
is interesting that there are experimental results for measuring the efficiency
of some implementations; for example in [56] it has been pointed out that the
method examined method has a very good efficiency (about 98.5%). In [70]
the authors gave an implementation for the adaptive arithmetic coding, and a
detailed discussion of its performance was given.
256
GALAMBOS A N D BEKESI
dictionary
S=
Source word
Code word
Weight
Source word
Code word
Weight
THISJS.AN.EXAMPLE!
A
a
X
b
X
c
X
I
d
X
L M
e
f
X X
N
g
x
P
h
x
S
i
x
IS
m
_EX
n
XAM
o
THI
p
MPLE
q
MPLE!
r
V,
T H I
FIGURE 5
V,
S _ I
V,
V, V,
S _ A N _ E X A M P L E
DATA COMPRESSION
257
If we suppose that the weights of the edges are equal, then the shortest path
from vo to V19 is the path
Using the above model the solution of the problem becomes easy, since we can
apply one of the shortest-path algorithms for a directed, weighted graph. These
algorithms run always in polynomial time. If the corresponding graph has many
cut vertices (i.e., vertices that divide the original problem into independent subproblems) and if these subproblems are reasonably small, we can indeed solve
the problem efficiently and can compute the optimal encoding. Unfortunately,
in practice this will not be the case and a shortest-path algorithm cannot be
applied since it takes too much time and space to compute an optimal solution
for very long strings. Similarly, we have difficulties in case of on-line compression, where we must compress a source string block by block (where a block
is a segment of the given string). Therefore, heuristics have been developed to
derive near optimal solutions.
The earlier developed heuristics (for example, the longest fragment first
heuristic (LFF) cf. Schuegraf and Heaps [59]) have not been deeply analyzed
and only experimental results on their performance have been reported. Later,
when the worst-case analysis became more popular theseand other newly
createdalgorithms were analyzed also from a worst-case point of view. Most
of these algorithms are on-line. An on-line data compression algorithm starts
at the source vertex VQ^ examines all outgoing edges, and chooses one of them
according to some given rule. Then the algorithm continues this procedure from
the vertex reached via the chosen edge. There is no possibility of either undoing
a decision made at an earlier time or backtracking.
Of course, usually an on-line heuristic will generate only a suboptimal
compression. One possibility for measuring the "goodness" of an algorithm is
to analyze its worst-case behavior. This is generally measured by an asymptotic
worst-case ratio, which is defined as follows: Let D = {(t^/, Cj) : / = 1 , . . . , fe}
be a static dictionary and consider an arbitrary data compression algorithm A
Let A(D,S), respectively OP T(D,S), denote the compressed string produced by
algorithm A, respectively, the optimal encoding for a given source string S. The
length of these codings will be denoted by || A(D, S)||, respectively || OP T(D, S)||.
Then the asymptotic worst-case ratio of algorithm A is defined as
R.(D) = Jim sup { ^ ^ ^ ^ ^ ^ ^ : S e S ( . )
where S(n) is the set of all text strings containing exactly n characters.
The first worst-case analysis for an on-line data compression method was
performed by Katajainen and Raita [41]. They analyzed two simple on-line
heuristics, the longest matching and the differential greedy algorithm, which
will be defined exactly later.
Four parameters have been used in the literature to investigate the asymptotic worst-case ratios:
Bt(S) = length of each symbol of the source string S in bits
Imax(D) = m2ix{\wi\i = 1 , . . . , ^}
258
..,k]
Consider again now the original problem, i.e., when we want to find an
optimal encoding of a source text assuming that we have a static dictionary.
There were known results for solving this problem even at the beginning of the
1970s [57^65,66], As was mentioned before, Schuegraf and Heaps [58] showed
that this question is equivalent to the problem of finding a shortest path in a
related directed edge-weighted graph.
Later Katajainen and Raita [40] presented an optimal algorithm, which
can be considered as a refinement of the above methods. Their algorithm is
based on the general shortest-path method, but it uses cut vertices to divide the
problem into smaller parts. The authors used the graph model to describe the
steps of the algorithm. The dictionary is stored in an extended trie. A trie is a
multiway tree, where each path from the root to a node represents an element
from the dictionary. The extended trie was introduced by Aho and Corasick
[3], and it allows fast string matching. In an extended trie each node contains
a special pointercalled a failure transitionwhich points to the node whose
associated string is the longest proper suffix of the string associated to the given
node. Using these pointers a linked list is ordered to each node, and this list
contains the nodes associated to the string itself and to those strings of the
dictionary that are proper suffixes of it. Figure 6 illustrates the extended trie of
259
DATA COMPRESSION
FIGURE 6
&
T
a
R
b
I
c
E
d
IE
e
TR
f
RIE
g
Using the notations given in the Introduction, a formal description of the algorithm is the foUow^ing:
Let S = S1S2 . . . s be a source string and D = {(wi, c^) : i = 1 , . . . , ^} a static
dictionary;
Create an extended trie and add the dictionary strings wi,W2,.. .,Wkto it;
Create an empty output buffer B;
d(vo)
:=0;p(vo):=0;cp:=0;
for each character sy in S do
begin
Using the extended trie find the set I of indices which defines
those dictionary strings which match with the original text and end
at position /;
d(vo) := 00.;
for each index / in / do
begin
p := \Wi\;q := \\ci\\;
if d(vj) > d(vj-p) + q then d(vj) := d(vj-p) + qi p(vj) := /;
end
if / Ifnax > cp then
begin
p u t p(Vj-lrmx) t o B;
260
THEOREM
1 lmax(lmax 1) cmax
< IH
.
\B\
cmtn
21 [40]. Let D be a code-uniform dictionary. Then
T, /T^\
KA(D}
^ /x^v
RA(D)<1
THEOREM
lmax(lmaxl)
+ ^
<
THEOREM
\B\
cmtn
Imax (Imax 1) cmax
1H
\B\
cmtn
1H
RA(D)
..
.
if cmtn < Bt < cmax
.^
tfcmtn < cmax < Bt.
Imax cmax
\B\ cmtn
DATA COMPRESSION
26 I
CU
NL
RA(0)
Imax (Imax 1) cmax
^
1
|B|
cmin
Imax cmax
1+|B| cmin
Imax (Imax 1)
1 +
\B\
Imax (Imax 1) mm(Bt, cmax)
1+
\B\
cmin
Imax
1 _l
Imax cmax
cmin
lmax(lmax 1)
1 + -\B I
1+
Imax
In this section we review the most important on-Hne heuristics and compare
them from worst-case points of view. First of all we give the exact definitions.
The longest matching heuristic LM chooses at each vertex of the underlying
graph the longest outgoing arc, i.e., the arc corresponding to the encoding
of the longest substring starting at the current position. Ties can be broken
arbitrary.
Katajainen and Raita [41] analyzed the worst-case behavior of LM for
dictionaries that are code uniform/nonlengthening/suffix and they derived tight
bounds for all eight combinations of these properties.
The greedy heuristic, which we will call differential greedy DG (introduced
by Gonzalez-Smith and Storer [31]) chooses at each position the arc implied by
the dictionary entry (w//, c/) yielding the maximal "local compression," i.e., the
arc maximizing \wi \ Bt \\ci \\. Ties are broken arbitrarily. The fractional greedy
algorithm (introduced by Bekesi et aL [7]) takes at any actual position in the
source string the fractionally best possible local compression, i.e., if I is the set
of indices of the arcs emanating from the current node of the corresponding
graph then an arc /'o will be chosen such that
\\Ci\\
to = arg mm
iel \Wi\Bt
Obviously, although each heuristic is locally optimal, globally they can give a
rather poor result. On the other hand it is possible that some heuristic compresses optimally such inputs, for which another gives the worst possible result
and reverse.
262
GALAMBOS A N D BEKESI
It is intuitively clear that in many cases greedy type heuristics will perform
better than the LM heuristic, which does not care about code lengths at all.
There are also differences between the greedy methods. This is illustrated by
the following example (let a^ = a, a^^^ = aa\ i e N, for any character a):
EXAMPLE 8. Let us consider the following nonlengthening dictionary with
cmax = 4, cmin = 1 and, as usual for ASCII encodings, Bt = 8.
Source word
Code word
Weight
u
10
2
1101
4
uv
1100
4
RDG(D)
cmax
(lmax-l)cmtn
if (Imax if" cmax Bt < cmin^
and
cmax cmtn
Bt
> Imax 1
(Imax l)cmax
otherwise.
cmin
The above result presents that from worst-case point of view the three
methods work very similar. Only the ratio of the DO algorithm differs a little
in a special case.
Another interesting problem may be to analyze the heuristics for different
types of dictionaries. First we will investigate suffix dictionaries and present the
most important results.
THEOREM
RLM(D)
cmax
cmtn
cmin + (Imax l)cmax
cmin + (Imax l)Bt
RDG(D)
(Imax l)cmax
otherwise,
cmtn
RVG(D)
<
cmax(ln(lmax 1) + 1)
cmtn
and there exists a suffix dictionary DQ, for which
cmax(ln(lmax
\)-\-llnT)
RVG(DO).
cmtn
> Imax 1
263
DATA COMPRESSION
cmin
f cmin + (Imax l)cmax
cmin + (Imax l)Bt
RDG(D)
= \
(Imax l)cmax
cmin
;..2
if (Imax 1) cmax Bt < cmin
, cmax cmin
and L
J > Imax 1,
Bt
otherwise.
We can conclude that prefix property does not help at all in the worst
case, because our results are the same as those in the general case. Finally we
compare the three heuristics for nonlengthening dictionaries. This case may be
interesting, because all heuristics give the same worst-case ratio.
THEOREM
string. Then
RUA(L>)
RLM(D)
= I
(Imax l)cmax
cmin
(Imax l)Bt + cmax
cmin
(Imax Bt
if cmax < Bt
if Bt < cmax < IBt
iflBt
< cmax.
cmin
4. Almost On-line Heuristics
More than twenty years ago Shuegraf and Heaps [59] introduced the
longest fragment first (LFF) heuristic. They supposed that the file we want to
compress is divided into records of the same lengths. The idea is the following:
the longest word fragmentwithin the actual recordwhich matches a word
from the dictionary is chosen and encoded. Then the overlapping matches are
eliminated and the algorithm works repeatedly.
Later Stauffer and Hirschberg [61] presented the so-called LFF parsing
algorithm, which was based on the above idea. However they described the
algorithm for a parallel architecture. In this model a processor is assigned to
each character of the input string and these processors can work in parallel. First each processor calculates the list of the lengths of matches between
the dictionary and the input string beginning at its position. Then the algorithm determines the maximum match length. Starting from this match length,
the algorithm goes down to length 1, and in each step it finds the maximum
264
pp^-h
end
The proof of the correctness of the algorithm is given in [52].
A variation of the above-mentioned longest fragment first parsing algorithm
is when the remaining part of the record is compressed with an on-line algorithm
(the LM heuristic can be chosen, for example). If the file has no record structure,
we can consider a buffer, which always contains the actual part of the text (so
265
DATA COMPRESSION
we will talk about buffers instead of records). Now, before reading the next
part of the file into the buffer, the algorithm always encodes the whole buffer.
We will refer to a whole buffer-coding procedure as a step. We will denote this
algorithm by LFFLM- TO avoid double indexing, instead of RLFFLM(^) ^ ^ write
RLFF(LM,D).
As one can see algorithm LFFLM gives up the strict on-line property, because
it looks ahead in the buffer, getting more information about its content. One has
a feeling that the more information about the text to be compressed, the better
worst-case behavior of a good algorithm. This suggests to us that this algorithm
behaves better than the on-line ones. The experimental results showed [59] that
the LFF-type algorithms give a compression ratio better than that of the longest
matching or the differential greedy heuristics. In the paper [8] we presented
some theoretical results on the behavior of the algorithm LFFLM- Here we
mention the most important theorems.
THEOREM
r-.
t
THEOREM
cmin
RLFF(LM,D) =
iflBt
< cmax.
where
T =
THEOREM
t cmin
30 [8]. Let D be a prefix dictionary. Then
D
RLFF(LM,D) =
cmax
r-.
t
cmin
THEOREM 31 [8]. Let D be a suffix dictionary. Then
l(lmax 1 ) \ cmax
ift>3
1 +
RLFF(LM,D) =
\
t
) cmin
ift = 2.
/lmax-\-1\
cmax
[\
2
J cmin
B. Adaptive Methods
I. The LZ77 Algorithm
This algorithm maintains a window and a lookahead buffer. The window
contains some characters backwards, i.e., the last coded characters. The lookeahed buffer contains some characters to be coded. The pointers of the LZ77
method point to the occurrences of the substrings from the lookeahed buffer
to the occurrences of the same substring in the window. Since it is possible
266
that only one character match can be found, the output can contain individual
characters too. Denote the length of the lookeahed buffer by L5. The algorithm
can work on a buffer of length w, which contains always a part of the source
text. The length of the window is w L5. In the current step we find the longest
substring in the windows that match for the lookahead buffer starting from the
beginning. The two matching substrings can overlap, but they cannot be the
same. This match is coded by a triplet (/, /, a)^ where / is the index of the found
substring in the window, ; is its length, and a is the first character that has not
matched. Then we move the buffer right on the text by / + 1 characters and
continue the process. Putting the character a to (i, j) ensures the working of the
algorithm in the case when we have no match at all.
More formally the algorithm works as follows. First we introduce some
notations. Let n be the length of the applied buffer, A is the basic alphabet, and S
is the source string. As mentioned before denote by Ls the length of the window
and by n the length of the buffer, and let Lc = 1 + \^og(n Ls)^ -\- [logiL^)!,
where the basis of the logarithm is |A|. LQ means the fixed length of the codes,
which are created from alphabet A too. Let S(l, /) be a real prefix of the string S,
and let /, 1 < / < ; a given integer. Let L(i) = max {/ : S(/, / + / 1) = S(/-f 1,
/ + /)}, and L(p) = maxi<^<y L(i), We refer to the S(j + 1, / + L(p)) string as a
reproducible extension of S(l, /) into S. It can be seen that S(j + 1, / + L(p)) is
the longest substring among the matching substrings of S beginning in S(l, /").
9. Let 5 = 01101101 and j=4. Then L(l) = 0, L(2) = 4,
L(3) = 1, L(4) = 0. So S(4 + 1,4 + 4) = 1101 is the reproducible extension
of S(l, 4) into S with/? = 2.
EXAMPLE
L5 = 8, n = 16 ^ Lc = 1 + log2(16 - 8) + log2 8 = 6.
Bi = 00000000100101011
B2 = 00000001101011011
B3 = 00010101110110111
pi = 8, h=3
p2 = 7, /i = 5
p3 = 4, /3 = 4
Ci = 111|010|1
C2 = 101|100|1
C3 = 011|011|1
DATA COMPRESSION
267
The LZ78 algorithm is a simple variation of the LZ77 method. It is obvious that the size of the window gives some restriction for finding matchings
in the coded part of the text. LZ78 eliminates this problem, since it records
all the previous matchings in an explicit dictionary. Of course this dictionary
can contain many words, if we compress a large text. Fortunately there is a
good data structure for making searching efficient; this is the trie. Each node of
the trie contains the number of the represented dictionary item. This number
is used for coding. Because of this representation, adding a new word to the
dictionary requires only creating a new child from the node representing the
longest match. Formally the LZ78 algorithm is as follows [72]:
Let Bi = S(l,), where n is the length of the buffer and let / = 1.
Construct an empty dictionary D using the trie data structure.
Consider the buffer B/, / > 1, and let Si = B/(l, //), where the length of
// 1 prefix of Si is the longest prefix of B^(l, 1) which matches a
dictionary item in D. Add Si to the dictionary D with a new node index.
Let pi be the index of the above matching dictionary item, or let pi be 0
if there is no match at all. Then the code word Q for Si is Q = Q1Q2,
where Q i is some representation of /?/, while Q2 is the last symbol of S/.
Modify the contents of Bi that we leave the first /^, and load the next li
characters. Increase the value of / by 1, and continue the algorithm with
Step 2.
268
01
010
10
11
Oil
101
1011
Number
Output
0,0
1,1
2,0
4,0
4,1
2,1
5,1
0,1
8,1
100
10
5,0
EXAMPLE 11. Consider again the same binary series as in the case of the
LZ77 algorithm
S = 001010110110111011011100.
Table 4 shows the dictionary generated by LZ77 for S. Figure 7 illustrates the
dictionary trie.
Decoding of an LZ78 compressed code starts with an empty dictionary.
In each step a pair of codes is read from the input. This code word refers to
an existing dictionary string or if it is zero, it contains only one symbol. Then
this new string is added to the dictionary in the same way as in the encoding
process. This way during the decoding the dictionary changes the same way as
in the case of encoding. So we get the proper source string.
4. The L Z W Algorithm
DATA COMPRESSION
269
Let pi be the index of the above matching dictionary item. Then the
code word Q for Si is some representation of pi.
Modify the contents of Bi that we leave the first /| 1, and load the
next li 1 characters. Increase the value of / by 1, and continue the
algorithm with Step 2.
Y. UNIVERSAL CODING
The Noiseless Coding Theorem gives a relation between the cost of the optimal
prefix code and the entropy of the given source. It is always assumed that
the probability distribution is explicitly given. Unfortunately sometimes this
distribution is unknown or it is impossible to determine the characteristics of
the source. A natural question is whether it is possible to find such a code that
is optimal for any probability distribution instead of a particular one. This kind
of code is called universal.
Many authors investigated the problem of finding universal codes
[24,25,42,69]. Universal codes can be classified into two classes. Some kind
of universal coding techniques are similar to statistical coding methods; i.e.,
they order a code to the characters of the basic alphabet. Other codes are based
on dictionary technique. A "statistical" universal code was introduced by Elias
[24,25]. The idea of the Elias codes is to represent each source string by integers and then order to each integer a code. This code is given explicitly and
can be applied for each message. Calculating the code of an integer x consists
of two phases. First we consider the binary representation of x^ prefaced by
[lg:x:J zeros. The binary value of x is expressed in the least possible bits, so it
begins with 1. Therefore the prefix property is held. In the second phase this
code is calculated for the integer [IgxJ + 1. Then the binary value of x is appended to this code without the leading 1. This way the resulting code word has
length [Ig^J + 2[lg(l + Llg^J)J + 1- EUas proved the following theorem:
THEOREM
where E (n) is the expected code word length for a source word of length n
divided by n and H (F) is the entropy of the source.
Table 5 summarizes the Elias codes of the first 8 integers.
EXAMPLE 12. Consider an alphabet that contains four symbols a^b^c, d.
Let S be hbaaadccd. The Elias code of S can be constructed as
Symbol
Frequency
Rank
Code word
a
b
c
d
3
3
2
2
1
2
3
4
1
0100
0101
01100
270
TABLES
Elias Codes
Number
Code
1
2
3
4
5
6
7
8
1
0100
0101
01100
01101
OHIO
01111
00100000
Frequency
Rank
Codeword
a
b
c
d
3
3
2
2
1
2
3
4
11
Oil
0011
1011
DATA COMPRESSION
H H
27 I
Fibonacci representation
Fibonacci code
1
2
3
4
5
6
7
8
1
10
100
101
1000
1001
1010
10000
11
Oil
0011
1011
00011
10011
01011
000011
It is also interesting to find a very simple code that still has the universal
property. One of the simplistic universal code was discovered by Neuhoff and
Shields [53]. The idea of the encoding is based on a simple dictionary coding
technique. The dictionary is formed where the original source is divided into
some blocks of a given length /. The dictionary will contain these /-blocks as
source words. The code word of a given dictionary source word is its location
in the dictionary using fixed-length encoding. To get universal code all possible
block lengths should be investigated and the one that produces the shortest code
should be chosen. Denote the length of this shortest code by L for a source
word of length n, Neuhoff and Shields proved the following theorem.
THEOREM 33 [53]. For any stationary, ergodic source F with alphabet
A and entropy H(F) the encoding rate ^ converges to entropy H(F) in both
expected value and almost surely as n^^ oo.
272
0000000011111111111000000000001111111111
wSbllwllblO
MTF code
List
2
1
2
1
2
4
3
1
2
a^ b, c, d
b, a, c, d
b, a, c, d
a, b, c, d
a, b, c, d
b, a, c, d
d, b, c, a
c, d, b, a
c, d, b, a
d, c, b, a
b
a
a
b
d
c
c
Bentley et al, showed that for arbitrary probabilities of symbols, the expected number of bits to encode one symbol of S using MTF coding is linear
in the entropy of the source.
Recently Albers and Mitzenmacher [2] presented a new list update algorithm that can be applied for data compression. They called it the Timestamp
(TS(0)) algorithm. TS(0) inserts the requested item x in front of the first item
DATA COMPRESSION
273
in the list that has been requested at most once since the last request to x. If x
has not been requested so far, it leaves the position of x unchanged. Albers and
Mitzenmacher proved that for TS(0) the expected number of bits to encode
one symbol of a source string is also linear in the entropy of the source, and
the constant is slightly better than that of the MTF coding.
C. Special Codes and Data Types
Sometimes using some special coding can compress data. For example, English
text can be coded where each character is represented by a 7-bit ASCII code.
This technique is widely used to save storage. Binary coded decimal (BCD) is a
way of storing integers. Using this, four bits represent each digit. This way we
can store 100 different numbers in one byte. Some other similar techniques for
saving space also exist, see [10] for details.
Yll. CONCLUSIONS
In this chapter we gave a review of the most important data compression techniques and some issues of the theories behind them. In the first part of the
chapter we presented some classical information theoretical theorems. These
results give the basis of many data compression methods, especially for statistical algorithms. The methods in the chapter can be divided into four groups.
These are statistical coding, dictionary coding, universal coding, and some special techniques. Statistical coding methods assume a priori knowledge of some
statistical characteristics of the data to be compressed. This information is usually the relative frequency of the symbols of the data. The most well-known
methods and the corresponding theory were presented in the chapter. Dictionary coding uses a completely different idea. In this case fragments of the data
are substituted with some code words ordered to these fragments by a given
dictionary. The dictionary can be static or can change dynamically during the
coding process. We presented many theoretical results for different static dictionary algorithms. Classical methods were also given in the chapter. Universal
coding is useful if we do not have any information on the characteristics of the
source data. So it is possible that they give worse results than some statistical
methods. Some simple universal coding methods were also described. Finally
we gave some well-known special methods. These techniques can be efficiently
applied for data with special properties.
REFERENCES
1. Abramson, N. Information Theory and Coding. McGraw-Hill, New York, 1963.
2. Albers, S., and Mitzenmacher, M. Average case analyses of list update algorithms with application to data compression. Algorithmica 2 1 : 312-329, 1998.
3. Aho, A. v., and Corasick, M. J. Efficient string matching: An aid to bibliographic search.
Commun. ACM 18(6): 333-340, 1975.
4. Angluin, D., and Smith, C. H. Inductive inference: Theory and methods. Comput. Surveys
15(3): 237-269, 1983.
274
DATA COMPRESSION
275
35. Huffman, D. A method for the construction of minimum-redundancy codes. In Proc. Inst.
Electr. Electron. Engineers 40(9): 1098-1101,1952.
36. Jones, D. W. AppUcation of splay trees to data compression. Commun. ACM 3: 280-291,
1988.
37. Johnsen, O. On the redundancy of binary Huffman codes. IEEE Trans. Inform. Theory 26(2):
220-222,1980.
38. Karp, R. M. Minimum-redundancy coding for the discrete noisless channel. IRE Trans. Inform.
Theory 7: 27-39, 1961.
39. Katajainen, J., Pentonnen, N., and Teuhola, J. Syntax-directed compression of program files.
Software-Practice Exper. 16(3): 269-276,1986.
40. Katajainen, J., and Raita, T. An approximation algorithm for space-optimal encoding of a text.
Computer}. 32(3): 228-237,1989.
41. Katajainen, J., and Raita, T. An analysis of the longest matching and the greedy heuristic in
text encoding./. Assoc. Comput. Mach. 39: 281-294, 1992.
42. Kieffer, J. C. A survey of the theory of source coding. IEEE Trans. Inform. Theory 39: 14731490,1993.
43. Knuth, D. E. Dynamic Huffman coding. / . Algorithms 6: 163-180, 1985.
44. Kraft, L. G. A device for quantizing, grouping, and coding amplitude modulated pulses. M. Sc.
Thesis, Department of Electrical, Engineering, MIT, Cambridge, MA, 1949.
45. Krause, R. M. Channels which transmit letters of unequal duration. Inform. Control 5:13-24,
1962.
46. Larmore, L. L., and Hirschberg, D. S. A fast algorithm for optimal length-limited codes, Technical report, Department of Information and Computer Science, University of California, Irvine,
CA, 1990.
47. Lelewer, D. A., and Hirschberg, D. S. Data compression. Technical Report, 87-10, Department
of Information and Computer Science. University of California, Irvine, CA, 1987.
48. Lovasz, L. Personal communication, 1995.
49. Mehlhorn, K. An efficient algorithm for constructing nearly optimal prefix codes. IEEE Trans.
Inform. Theory 26(5): 513-517, 1980.
50. Mukherjee, A., and Bassiouni, M. A. On-the fly Algorithms for data compresion. In Proc.
ACM/IEEE Fall Joint Computer Conference. 1987.
51. Moffat, A. A data structure for arithmetic encoding on large alphabets. In Proc. 11 th. Australian
Computer Science Conference. Brisbane, Australia, pp. 309-317.
52. Nagumo, H., Lu, M., and Watson, K. On-Une longest fragment first parsing algorithm. Inform.
Process. Lett. 59: 91-96, 1996.
53. Neuhoff, D. L., and Shields, P. C. Simplistic universal coding. IEEE Trans. Inform. Theory
44(2): 778-781,1998.
54. Rissanen, J. J. Generalized Kraft inequality and arithmetic coding. IBM J. Res. Develop.
20(3): 198-203, 1976.
55. Rissanen, J. J., and Langdon, G. G. Arithmetic coding. IBM J. Res. Develop. 23(2): 149-162,
1979.
56. Rissanen, J. J., and Langdon, G. G. Universal modeling and coding. IEEE Trans. Inform.
Theory 27(1): 12-23, 1981.
57. Rubin, F. Experiments in text file compression. Commun. ACM 19(11): 617-623, 1976.
58. Shuegraf, E. J., and Heaps, H. S. Selection of equifrequent word fragments for information
retrieval. Inform. Storage Retrieval 9: 697-711, 1973.
59. Shuegraf, E. J., and Heaps, H. S. A comparision of algorithms for database compression by use
of fragments as language elements. Inform. Storage Retrieval 10: 309-319, 1974.
60. Shannon, C. E. A mathematical theory of communication. Bell System Tech. J. 27: 398-403,
1948.
61. Stauffer, L. M., and Hirschberg, D. S. PRAM algorithms for static dictionary compression. In
Proc. 8th International Parallel Processing Symposium, pp. 344-348,1994.
62. Teng, S. H. The construction of Huffman-equevalent prefix code in NC. ACM SIGACT News
18: 54-61, 1987.
63. Teuhola, J., and Raita, T. Arithmetic coding into fixed-length code-words. IEEE Trans. Inform.
Theory 40(1): 219-223,1994.
276
I.
II.
III.
IV.
V.
INTRODUCTION 277
MODEL-BASED OBJECT RECOGNITION 278
PRINCIPLES OF GEOMETRIC HASHING 279
EXAMPLES 281
IMPLEMENTATION ISSUES 284
A. Footprint Quality and Matching Parameter 284
B. Rehashing 284
VI. APPLICATIONS 284
A. Molecular Biology 285
B. Medical Imaging 286
C. Other Applications 286
REFERENCES 286
INTRODUCTION
The Geometric Hashing technique was introduced, about a decade ago, as an
efficient method for object recognition in computer vision. Since then it has
been applied in many other fields, e.g., in medical imaging, molecular biology,
and computer-aided design. The underlying idea of Geometric Hashing is using
a database for storing pieces of information (features) of known geometric
objects, in such a way that will allow a fast recognition of an unknown query
object.
The main advantage of this technique is its ability to perform partial
matching between geometric objects, e.g., for recognizing objects in an image
which are partially occluded or have undergone some transformation. It does
not depend on the existence of any particular predefined features in the
matched objects. It is usually very easy to implement, and it performs fast and
accurately.
The Geometric Hashing technique is an indexing-based approach, where
local features of objects are encoded and stored in a database. Such approaches
are now widely recognized as the method of choice for implementing reliable
recognition systems that handle large model databases.
^ ^^
277
278
GILL BAREQUET
279
and by Schwartz and Sharir [26]. This technique, which uses the Geometric
Hashing method, originally solved the curve matching problem in the plane,
under the restrictive assumption that one curve is a proper subcurve of the
other one, namely: Given two curves in the plane, such that one is a (slight
deformation of a) proper subcurve of the other, find the translation and rotation
of the subcurve that yields the best least-squares fit to the appropriate portion
of the longer curve.
This technique was extended (by removing the curve-containment restriction) and used in computer vision for automatic identification of partially obscured objects in two or three dimensions. The Geometric Hashing technique
was applied [17,20,21,28] in various ways for identifying partial curve matches
between an input scene boundary and a preprocessed set of known object
boundaries. This was used for the determination of the objects participating
in the scene, and the computation of the position and orientation of each such
object.
280
GILL BAREQUET
In the preprocessing step, features of all the curves are generated, encoded,
and stored in a database. Each curve is scanned and footprints are generated
at equally spaced points along the curve. Each point is labeled by its sequential
number (proportional to the arclength) along the curve. The footprint is chosen
so that it is invariant under a rigid motion of the curve. A typical (though
certainly not exclusive) choice of a footprint is the second derivative (with
respect to arclength) of the curve function; that is, the change in the direction
of the tangent line to the curve at each point. Each such footprint is used as a
key to a hashing table, where we record the curve and the label of the point
along the curve at which this footprint was generated. The (expected) time
and space complexity of the preprocessing step is linear in the total number
of sample points on the curves stored in the database. Since the processing of
each curve is independent of the others, the given curves can be processed in
parallel. Moreover, adding new curves to the database (or deleting curves from
it) can always be performed without recomputing the entire hashing table. The
construction of the database is performed off-line before the actual matching.
In the recognition step, the query curve is scanned and footprints are computed at equally spaced points, with the same discretization parameter as for
the preprocessed curves. For each such footprint we locate the appropriate
entry in the hashing table, and retrieve all the pairs (curve,label) stored in it.
Each such pair contributes one vote for the model curve and for the relative
shift between this curve and the query curve. The shift is simply the difference
between the labels of the matched points. That is, if the footprint of the /th
sample point of the query curve is close enough to the footprint of the /th point
of model curve c, then we add one vote to the curve c with the relative shift
/ /. In order to tolerate small deviations in the footprints, we do not fetch
from the hashing table only the entry with the same footprints as that of the
point along the query curve, but also entries within some small neighborhood
of the footprint. A commonly used implementation of this process is by rangesearching (see, e.g., [23, p. 69] or [10]). The major assumption on which this
voting mechanism relies is that real matches between long portions of curves
result in a large number of footprint similarities (and hence votes) between the
appropriate model and query curves, with almost identical shifts. By the end
of the voting process we identify those (curve,shift) pairs that got most of the
votes, and for each such pair we determine the approximate endpoints of the
matched portions of the model and the query curves, under this shift. It is then
straightforward to compute the rigid transformation between the two curves,
with accuracy that increases with the length of the matched portions. The running time of the matching step is, on the average, linear in the number of sample
points generated along the query curve. This is based on the assumptions that
on the average each access to the hashing table requires constant time, and
that the output of the corresponding range-searching queries has constant size.
Thus, the expected running time of this step does not depend on the number of
curves stored in the database and on the total number of points on the curves.
Many generalizations and applications of the Geometric Hashing technique
have appeared in the literature. These include different choices of the allowed
transformations, specific domains in which the technique is used (e.g., locating
an object in a raster image, registration of medical images, molecule docking).
281
^3
3
2<
^2
P^
94
95
0^'
P?.
1^ )
^P3
r\(
U^
() 1
P6
P4
2
FIGURE I
10 11 12 13 14
and generalizations to higher dimensions. We note that in most cases the key to
success is defining a good footprint system (in the sense detailed above), so that
the "correct" solutions manifest themselves by sufficiently many correct votes.
In practice, every application of the Geometric Hashing technique has its own
special footprint setting, which strongly depends on the nature of the problem
in question.
lY. EXAMPLES
Let us illustrate the course of Geometric Hashing by a simple example of matching the two planar point sets shown in Fig. 1. The two sets, denoted by P and
Q, contain six and five points, respectively.
In the first matching experiment we allow only translations, and we assume
the availability of only the point coordinates. Here every pair of matched points,
one of each set, defines uniquely the translation that maps P to Q. (Specifically,
if the point p e P matches the point q e Q, then the sought translation is simply
qp.) Thus the matched feature is a point, and in the absence of any additional
information, each pair of points (p, q) (where p e P and q ^ Q) contributes
one vote for the translation ^ ^. (In situations where we trust some of the
matching pairs, we can weigh the votes.) The resulting voting table, which is
also a detailed description of the Minkowski difference between the sets Q and
P (denoted as Q 0 F), is shown in Fig. 2. Indeed, the "correct" translation (10,2)
received the largest number of votes (4). This is because four pairs of points,
one of each set (namely, (pi? ^i)? (ps? qi)-> (p4, ^4)5 and (p6, qs)) casted votes for
this translation. All the other 26 votes were (luckily) spread in the voting table
so that no other translation received four or more votes. Geometric Hashing
is thus a viable method when the number of "correct" votes (compared to a
4
3
2
1
0
1
1
1
1
1
7
1
2
2
1
8
1
4
2
1
10
1
1
1
1
11
2
1
1
12
13
1
1
1
14
282
GILL BAREQUET
283
Pin
P3,g2
(Is
P4,94
JP^
P6,95
4
3
2
1
X
X
P6,^2
(a
P4,94
P3,95
(a
<a
_r\
10 11 12 13 14
(a)(9 = 0, t = ( 1 0 , 2 )
FIGURE 3
o 12 13pi^
"j) 10 11
U
have significantly different footprints. This assumption allows us to significantly speed up the matching process by storing each set in a database (usually
a hashing table) whose keys are the footprints. As noted above, the database
implementation should allow not only fetching the entry (or entries) that match
a given key, but also fetching all the entries whose keys are close to the given key
up to some specified tolerance. This is usually achieved by implementing a data
structure that allows range-searching queries. In our example it suffices to store
only the set P in a database. For matching under translation only, each point
q e Q (or, rather, its footprint) is used as a key for a range-searching query in
the database that contains the set P. Each point p e P returned by the query
then casts a vote ior qp in the voting table, and the matching proceeds as
described above.
Refer again to the matching experiment in which we allowed rotations
and translations. In this experiment the matched feature was a pair of points.
Assuming that each set contains n points, each set has OirP') pairs and we
now potentially have 0(n^) candidate matches. An alternative used in many
Geometric Hashing works (e.g., [14]) is to use every pair of points as a basis
of a coordinate system, to specify the coordinates of all the other points of the
same set with respect to that system, and to store all the points in the database
in a redundant manner, so that each point is specified in terms of all the bases
in which it does not take part. The matching feature in this variant is a single
point: points are matched according to their basis-defined coordinates. The
information given by the match (namely, the two basesone of each set, and
the translation between the two matched points) is enough for casting a vote for
a unique rotation and a unique translation. Asymptotically we do here the same
amount of work. Each point appears (redundantly) 0(n) times in the database,
so the number of stored points is now 0(^). Then the matching step considers
every pair of points, one of each database, giving us a total of 0(n^) work.
The same ideas work for higher dimensions. In a 3-dimensional space, for
example, we can consider triples of points. For every pair of congruent triangles,
one defined by three points of the first set and the other defined by three points
of the other set, we cast a vote for the transformation that maps the first triple
to the second triple. Alternatively, we can have every three noncoUinear points
define a basis for a coordinate system and redundantly represent each point
by its coordinates with respect to all the bases. In the matching step each pair
of points, together with their respective coordinates according to some basis.
284
GILL BAREQUET
casts a vote for the appropriate transformation. In both methods we may use
footprints that depend on the application for pruning the matched pairs of
features.
V. IMPLEMENTATION ISSUES
A. Footprint Quality and Matching Parameter
The success of the Geometric Hashing technique crucially relies on the "descriptiveness" of the footprint system. That is, we expect features that should
match to have similar footprints, and expect features that should not match
to have footprints that differ significantly enough. In practice, the amount of
incorrect votes usually dominates the amount of correct votes, but when the
distribution of the incorrect votes does not have too-high random peaks, the
correct votes still exhibit an "accumulation point" in the voting table, which
suggests the correct solution for the matching problem.
As mentioned earlier, the running time of the matching step is, on average,
linear in the total number of features in the two sets. This is based on the
assumptions that on average each access to the hashing table requires constant
time, and that the output of the corresponding queries has constant size. This
is due to the nature of hashing and does not assume anything about the input
to the algorithm. Nevertheless, it requires a reasonable choice of the proximity
parameter , which should yield on average a constant number of output points
for each range-searching query. Improper choice of e, say, equal to the size of
the entire set, will result in a running time that is quadratic in the complexity
of the input.
B. Rehashing
Even when the footprint quality is satisfactory, it is desirable to have the distribution of footprints (in the space of invariants) as uniform as possible. This
is for optimizing the performance of the hashing table. A highly nonuniform
distribution interferes with the balance of the hashing bins that store the footprints, while the most occupied bin determines the worst-case performance
of the hashing table. When the probability density function of the footprints
is known, one can transform the input point coordinates so as to make the
expected distribution of footprints uniform (see Figs. 5a and 6a of [29, p. 16].
If the rehashing function is chosen carefully, the new hashing table can have
the same number of bins as that of the original table.
Yl. APPLICATIONS
As noted earlier, the first application of Geometric Hashing was for partial curve
matching in the plane [19,26]. This application was later extended in [17,20,
21,28] for identifying objects participating in a scene. (In fact the silhouettes
of the objects and of the scene were used to solve a 2-dimensional matching
285
286
GILL BAREQUET
the atom (obtained in a preprocessing step), are also considered. In all cases,
the footprints are stored in a hashing table, which makes it possible to retrieve
entries with some tolerance. Here this is needed not just because of the noisy
footprints, but also because of the conformational changes that might occur in
the molecule structures during the reaction between them.
B. Medical Imaging
The topic of medical image matching has attracted a lot of attention in the
medical literature. The problem arises when complementary information about
some organ is obtained by several imaging techniques, such as CT (computed
tomography) and MRI (magnetic resonance imaging). The goal is to match
(register) the various models of the same organ obtained by these methods, in
order to obtain a single improved and more accurate model. Such a registration
is needed because the orientations of the organ usually differ from one model
to another.
Many methods, which are similar to the methods for object recognition,
were proposed for the solution of this organ registration problem. These include, among many others, approximated least-squares fit between a small
number of markers, singular-value decomposition for matching point pairs,
high-order polynomials for a least-squares fit, "thin-plate spline" for registering
intrinsic landmarks or extrinsic markers, parametric correspondence, chamfer
maps, partial contour matching, moments and principal axes matching, and correlation functions. Detailed reviews of image-registration techniques are given
in [6,12]. Geometric Hashing was also exploited for alignment of medical data
by Barequet and Sharir [5] and by Gueziec et al. [16]. The former work matched
3-dimensional point sets (voxels), while the latter work registered 3dimensional curves extracted from the data.
C. Other Applications
Geometric Hashing has been applied to other matching problems as well.
Germain et al. [15] use this technique for matching real (human) fingerprints
for noncriminal identification applications. Barequet and Sharir [2,3] apply
Geometric Hashing to solve a computer-aided design problem, namely, for detecting and repairing defects in the boundary of a polyhedral object. These
defects, usually caused by problems in CAD software, consist of small gaps
bounded by edges that are incident to only one face of the model. Barequet and
Sharir [4] apply a similar technique to the reconstruction of a three-dimensional
surface (bounding a human organ) from a series of polygonal cross sections.
REFERENCES
1. Ballard, D. H. Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognit.
13 (2): 111-122, 1981.
2. Barequet, G. Using geometric hashing to repair CAD objects. IEEE Comput. Sci. Engrg. 4 (4):
22-28, 1997.
3. Barequet, G. and Sharir, M. Filling gaps in the boundary of a polyhedron. Comput. Aided
Geom. Design 12 (2): 207-229, 1995.
4. Barequet, G. and Sharir, M. Piecewise-linear interpolation between polygonal slices. Comput.
Vision Image Understanding, 63 (2): 251-272, 1996.
5. Barequet, G. and Sharir, M. Partial surface and volume matching in three dimensions. IEEE
Trans. Pattern Anal. Mach. Intell. 19 (9): 929-948, 1997.
6. Brown, L. G. A survey of image registration techniques. ACM Comput. Surveys 24 (4): 325-376,
1992.
7. Bolles, R. C. and Cain, R. A. Recognizing and locating partially visible objects: The local-feature-focus method. Int. J. Robot. Res. 1 (3): 637-643, 1982.
8. Besl, P. J. and Jain, R. C. Three-dimensional object recognition. ACM Comput. Surveys 17 (1):
75-154,1985.
9. Besl, P. J. and McKay, N. D. A method for registration of 3-D shapes. IEEE Trans. Pattern
Anal. Mach. Intell. 14 (2): 239-256, 1992.
10. Chazelle, B. A functional approach to data structures and its use in multidimensional searching.
SIAM J. Comput. 17 (3): 427-462, 1988.
11. Chin, R. T. and Dyer, C.R. Model-based recognition in robot vision. ACM Comput. Surveys
18 (1): 67-108, 1986.
12. van der Elsen, P. A., Pol, E. J. D., and Viergever, M. A. Medical image matching: A review with
classification. IEEE Engrg. Med. Biol. 12 (1): 26-39, 1993.
13. Fischer, D., Bachar, O., Nussinov, R., and Wolfson, H. J. An efficient computer vision based
technique for detection of three dimensional structural motifs in proteins. J. Biomolec. Structure
Dynam. 9: 769-789, 1992.
14. Fischer, D., Norel, R., Nussinov, R., and Wolfson, H. J. 3-D docking of protein molecules. In
Proc. 4th Symp. on Combinatorial Pattern Matching, Lecture Notes in Computer Science 684,
pp. 20-34. Springer-Verlag, Berlin, 1993.
15. Germain, R. S., Califano, A., and Colville, S. Fingerprint matching using transformation parameter clustering. IEEE Comput. Sci. Engrg. 4 (4): 42-49, 1997.
16. Gueziec, A. P., Pennec, X., and Ayache, N. Medical image registration using geometric hashing.
IEEE Comput. Sci. Engrg. 4 (4): 29-41, 1997.
17. Hong, J. and Wolfson, H. J. An improved model-based matching method using footprints. In
Proc. 9th Int. Conf. on Pattern Recognition, Rome, Italy, November 1988, pp. 72-78.
18. Huttenlocher, D. P. and Ullman, S. Recognizing solid objects by alignment with an image. Int.
J. Comput. Vision 5 (2): 195-212, 1990.
19. Kalvin, A., Schonberg, E., Schwartz, J. T., and Sharir, M. Two-dimensional, model based,
boundary matching using footprints. Int. J. Robot. Res. 5 (4): 38-55, 1986.
20. Kishon, E., Hastie, T., and Wolfson, H. 3-D curve matching using splines. J. Robot. Systems
8 (6): 723-743, 1991.
21. Lamdan, Y., Schwartz, J. T., and Wolfson, H. J. Affine invariant model-based object recognition.
IEEE Trans. Robot. Automat. 6 (5): 578-589, 1991.
22. Linnainmaa, S., Harwood, D., and Davis, L. S. Pose determination of a three-dimensional
object using triangle pairs. IEEE Trans. Pattern Anal. Mach. Intell. 10 (5): 634-647, 1988.
23. Mehlhorn, K. Data Structures and Algorithms 3: Multi-Dimensional Searching and Computational Geometry (Brauer, W., Rozenberg, G., and Salomaa, A. Eds.). Springer-Verlag, Berlin,
1984.
24. Nussinov, R. and Wolfson, H. J. Efficient detection of three-dimensional structural motifs in
biological macromolecules by computer vision techniques. In Proc. Natl. Acad. Sci. USA 88:
10,495-10,499,1991.
25. Sankoff, D. and Kruskal, J. B. Time Warps, String Edits and Macromolecules. Addison-Wesley,
Reading, MA, 1983.
26. Schwartz, J.T. and Sharir, M. Identification of partially obscured objects in two and three
dimensions by matching noisy characteristic curves. Int. J. Robot. Res. 6 (2): 29-44, 1987.
27. Stockman, G. Object recognition and localization via pose clustering. Comput. Vision Graphics
Image Process. 40 (3): 361-387,1987.
28. Wolfson, H. J. On curve matching. IEEE Trans. Pattern Anal. Mach. Intell. 12 (5): 483-489,
1990.
29. Wolfson, H. J. and Rigoutsos, I. Geometric hashing: An overview. IEEE Comput. Sci. Engrg.
4 (4): 10-21, 1997.
I. INTRODUCTION
II. BASIC CONCEPTS AND BACKGROUND
  A. Acronyms, Notation, and Basic Assumptions
  B. Definitions and Problem Statement
III. CHARACTERIZATION AND REPRESENTATION OF DATA COMMUNICATION NETWORKS
  A. Network Characterization
  B. Network Representation
  C. DESNET: An Example of a Design Tool
IV. INTELLIGENT AND HYBRID APPROACHES
  A. Basic Principle of the AI-Based Approach
  B. Basic Concepts and Background
  C. The Inductive Learning Module
  D. Numerical Applications with SIDRO
V. HEURISTIC APPROACHES
  A. Conventional Heuristics and Meta-heuristics
  B. Implementation of the Tabu Search Approach
  C. Numerical Applications with Heuristic Methods
REFERENCES
I. INTRODUCTION
A typical data communication network essentially consists of a set of nodes representing workstations, switches, routers, and so on, linked to each other by
means of communication links. As shown in Fig. 1, such a network is generally
FIGURE 1 Network hierarchy: a backbone network connecting local access networks (hosts, switching nodes, terminals).
considered as a hierarchical structure integrating two levels: the backbone network at the first level, and the local access networks at the second level. The
backbone network is dedicated to the delivery of information from source to
destination. The local access networks are typically centralized systems that
essentially allow users to access hosts or local servers. In this chapter, the focus
is on the backbone network design considered as a distributed network.
The topological design of data communication networks consists essentially
of finding a network topology that satisfies at the lowest possible cost some
constraints related to quality of service and reliability [6,9,12,19,28,29,46].
Traditionally formulated as an integer programming problem and for various
reasons discussed and reported elsewhere [3,9-12,39,45], it is considered to be
a very difficult optimization problem [20]. In fact, if n indicates the number
of nodes, the maximum number of links is given by n(n l ) / 2 , and therefore the maximum number of topological configurations of n nodes is l"^'^"^)/-^.
For instance, it n= 11, the number of topological configurations that can be
exhaustively explored is 3.603 x 10^^; at the generation speed of 10^ configurations per second, the overall CPU time required for such an exploration is
1,142.46 years. Even by taking into account only the configuration aspect of
this problem, the risk of combinatorial explosion is already obvious.
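To make the combinatorial explosion concrete, the following short Python sketch (not part of the original chapter) reproduces the arithmetic above for an arbitrary number of nodes n.

import math

# With n nodes there are n(n-1)/2 possible links, hence 2**(n(n-1)/2)
# topological configurations. At an assumed generation speed of 10**6
# configurations per second, exhaustive exploration quickly becomes impossible.

def exploration_time_years(n, configs_per_second=10**6):
    links = n * (n - 1) // 2
    configurations = 2 ** links
    seconds = configurations / configs_per_second
    return configurations, seconds / (365.25 * 24 * 3600)

configs, years = exploration_time_years(11)
print(f"{configs:.3e} configurations, about {years:,.0f} years")
# roughly 3.6e+16 configurations and on the order of 1,100 years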
In order to facilitate its resolution, the topological design problem is usually divided into three subproblems: (1) topological configuration taking into
account reliability aspects, (2) routing or flow assignment, and (3) capacity
assignment [43]. Nevertheless, this division does not enable one to solve the
overall problem in a reasonable CPU time. As a result, this problem is realistically solved by means of heuristic methods that attempt to reduce the search
space of candidate topologies, even if that possibly leads to suboptimal solutions. Clearly, determining the optimal size of such a search space would be a
compromise between the exploration time and the quality of the solution. This
observation led many researchers to develop heuristic approaches leading to
"good" solutions, instead of optimal solutions [35,37,40,45].
The range and the nature of the heuristics vary widely [17]. Some of them
are inspired by formal optimization approaches [10,11]. Linear programming
II. BASIC CONCEPTS AND BACKGROUND
A. Acronyms, Notation, and Basic Assumptions
Acronyms
AI      Artificial intelligence
BXC     Branch X-change
CBE     Concave branch elimination
CS      Cut saturation
FD      Flow deviation
GA      Genetic algorithm
MENTOR  Mesh network topology optimization and routing
SA      Simulated annealing
TS      Tabu search
WAN     Wide-area network
MAN     Metropolitan-area network
LAN     Local-area network
Notation
G                   A graph
N                   Set of nodes of a graph
A                   Set of edges of a graph
n                   Number of nodes of a network
m                   Number of links of a network
mmax                Maximum number of links
R                   Diameter of the network, that is, the length of the longest of the shortest paths over all node pairs in the network
K                   Connectivity degree of a network
Ck                  Capacity of link k, that is, the maximum data rate in bits per second (bps) carried by this link
fk                  Flow of link k, that is, the effective data rate in bps on this link
Uk                  Utilization of link k, that is, the ratio fk/Ck
Lk                  Length of the link k
Lmax                Maximum Euclidean distance between any node pair of the network
d(i)                Incidence degree of a node i, that is, the number of links connected to it
dG                  Degree of a graph G, that is, the degree of the node having the smallest degree among all the nodes
C = (Ck)            Link capacity vector
f = (fk)            Link flow vector
γij                 Traffic (number of packets per second exchanged) between nodes i and j
Γ                   Total traffic in a network
γaver               Average traffic between all the node pairs
Iij                 Index traffic of link (i, j)
Γ = (γij)           Traffic matrix
T                   Average packet delay in a network
Tmax                Maximum acceptable delay
Tn                  Normalized average delay
dk(Ck)              Cost capacity function of link k
Variable unit cost  Cost in $/month/km for a given link capacity
Fixed cost          Cost in $/month for a given link capacity
CMD                 The most economical path between two nodes i and j
Assumptions
Poisson distribution of the traffic between node pairs
Exponential distribution of packet size with a mean of 1/μ bits/packet
Under these assumptions, the average packet delay T in the network is given by

T = (1/Γ) Σ_{k=1}^{m} fk/(Ck − fk),    (1)

where Γ is the total traffic in the network (in packets per second) and m the number of links.
The average delay T given by (1) must be less than or equal to the maximum
acceptable delay Tmax. Equation (1) does not take into account the propagation delay and the
nodal processing time. These factors play a more important role in high-speed
networks where it is unrealistic to neglect them. Furthermore, the validity of
the previous assumptions (Section A) has been tested by simulation studies on
a variety of applications [12]. Results confirm the robustness of the model.
Thus, the average packet delay obtained under these assumptions is realistic
for medium-speed packet-switched networks. However, such assumptions can
be unrealistic if one is interested in estimating the delay of a particular packet
or the delay distribution rather than just the average value.
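As an illustration of Eq. (1), the following Python sketch computes the average packet delay from link flows and capacities; the sample values are invented and the function name is ours, not part of the chapter.

# Sketch of Eq. (1): average packet delay under the Poisson-arrival and
# exponential-packet-length assumptions. Flows f_k and capacities C_k are in
# bps, total_traffic (Gamma) is in packets per second.

def average_delay(flows_bps, capacities_bps, total_traffic_pps):
    delay = 0.0
    for f, c in zip(flows_bps, capacities_bps):
        if f >= c:
            raise ValueError("link flow must stay below link capacity")
        delay += f / (c - f)
    return delay / total_traffic_pps          # seconds

# Toy usage: three links, 1000-bit packets, 50 packets/s of total traffic.
flows = [30_000, 45_000, 12_000]              # bps
capacities = [56_000, 100_000, 19_200]        # bps
print(f"T = {average_delay(flows, capacities, 50) * 1000:.2f} ms")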
The cost dk of the link k is generally a function of the link length and
capacity. Therefore, the total link cost D is given by
D = Σ_{k=1}^{m} dk(Ck).    (2)
The techniques used to solve the problem of the capacity assignment essentially depends on the nature of the cost capacity functions dk{Ck)^ which can be
linear, concave, or discrete [12]. In practice, the cost of the link k includes two
components: a fixed part and a variable part which depends on the physical
length Lk of this link:
dk(Ck) = (Variable unit cost)k · Lk + (Fixed cost)k.    (3)
Variable unit cost represents the price structure for leased communications
links; the fixed cost refers to a constant cost associated with a given link and
represents the cost of a modem, interface, or other piece of equipment used to
connect this link to its end nodes.
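The cost model of Eqs. (2) and (3) can be sketched in a few lines of Python; the coordinates and tariffs below are illustrative values in the spirit of Tables 3 and 4, and the helper names are ours.

import math

# Sketch of Eqs. (2) and (3): the cost of a link is a fixed part plus a variable
# part proportional to its length; the total link cost D sums over all links.

def link_length(a, b):
    return math.dist(a, b)                    # Euclidean distance

def link_cost(length_km, variable_unit_cost, fixed_cost):
    return variable_unit_cost * length_km + fixed_cost     # Eq. (3)

def total_cost(links, coords, tariffs):
    """links: list of (i, j, capacity); tariffs: capacity -> (variable, fixed)."""
    d = 0.0
    for i, j, cap in links:
        var, fixed = tariffs[cap]
        d += link_cost(link_length(coords[i], coords[j]), var, fixed)
    return d                                  # Eq. (2)

coords = {1: (250, 360), 2: (165, 420), 3: (600, 100)}
tariffs = {9.6: (3.0, 10.0), 56.0: (10.0, 15.0)}           # $/month/km, $/month
links = [(1, 2, 9.6), (1, 3, 56.0)]
print(f"D = ${total_cost(links, coords, tariffs):,.2f}/month")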
The flow assignment consists of determining the average number of packets
Xk on each link k [43]. For this purpose, a routing policy that determines the
route to be taken by packets between each source-destination pair is needed.
Routing strategies are generally classified into fixed, adaptive, and optimal
routing. In this chapter, for the sake of simplicity, we adopt fixed routing based
on the Euclidean distance between nodes.
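A fixed routing policy of this kind can be sketched as a shortest-path computation in which each link is weighted by the Euclidean distance between its end nodes. The following Python sketch uses Dijkstra's algorithm; the small topology is invented for illustration.

import heapq, math

# Sketch of a fixed routing policy: routes are precomputed as shortest paths
# where the weight of a link is the Euclidean distance between its end nodes.

def shortest_path(coords, adjacency, src, dst):
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, math.inf):
            continue
        for v in adjacency[u]:
            nd = d + math.dist(coords[u], coords[v])
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path))

coords = {1: (63, 8), 8: (80, 33), 17: (56, 27), 18: (82, 47)}
adjacency = {1: [8, 17], 8: [1, 17, 18], 17: [1, 8, 18], 18: [8, 17]}
print(shortest_path(coords, adjacency, 1, 18))   # [1, 8, 18]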
Given:
- the number of nodes n and their location (Xi, Yi), with i = 1, 2, ..., n
- the traffic requirements Γ = (γij), with i, j = 1, 2, ..., n and i ≠ j
- the capacity options and their costs
- the maximum acceptable delay Tmax (in ms)
Design variables:
- topological configuration
- routing strategy
- capacity assignment
FIGURE 2
could these models be used to improve the network performance? What degree
of manoeuvrability must these models have? This section identifies the main
characteristics of a typical data communication network, then analyzes some
issues related to its representation, and finally presents some existing network
representation or design tools.
A. Network Characterization
A data communication network is characterized by different structures, types,
topologies, components, performance indexes, and management strategies. Regarding structures, one can distinguish between the distributed and centralized
networks. In centralized networks, terminals or workstations are connected to a
single data source called a server through a variety of communication links.
Conversely, distributed networks are characterized by the multiplicity of routes
that link each source to a given destination.
Generally, three types of network should be distinguished: local-area network, metropolitan-area network, and wide-area network. A LAN is a local
communication system connecting several computers, servers, and other network components; it makes possible high-speed data transfer (1 to 100 Mbps)
through short distances, in small areas such as organizations, campuses, and
firms. A MAN essentially serves a large city; it also regroups several computers
with more or less reduced data throughput and makes it possible to link several
enterprises in one city. WANs are used to connect several cities located some
tens of kilometers apart. Usually, the data throughput of a WAN is less than
100 Mbps.
The topology of a network describes its configuration, that is, the way in
which its nodes are linked to each other; it also indicates the capacity of each
link in the network. If communication is established between two nodes through
a direct link, one can speak of a point-to-point link; every network consisting
of a point-to-point link is called a point-to-point network. Conversely, with a
multipoint link, communication is rather broadcasted from one node to several
nodes; every network consisting of multipoint links is a multipoint network or
general broadcasting network.
Point-to-point networks can have a star, tree, ring, or mesh topology. In a
star topology, all nodes are related by a point-to-point link to a common central
node called the star center. All communications placed in this type of network
must go through this node. In a tree topology, the network has a hierarchically
structured, directory-like organization. The principal node of this topology
through which all applications pass is called a tree root. In this structure, the
common link takes the form of a cable (with several branches) to which one or
several stations are attached. In a ring topology, all the nodes are related to form
a closed ring, which, in its turn, takes a point-to-point form. A mesh topology
is formed by a number of links such that each node pair of the network is linked
by more than one path. Generally, a mesh topology is used with WAN.
In multipoint networks, there are two types of topologies: the bus topologies and the ring topologies. In a bus topology, each network node is set linearly
on a cable which constitutes a common physical link. The information is transmitted by any node through the entire bus in order to reach the other nodes of
the network. In a ring configuration, all the nodes are set on a closed circuit
296
SAMUEL PIERRE
formed by a series of point-to-point links. These nodes form a ring; the information within the ring is transmitted in one direction.
The main components of a network are nodes and links. The nodes represent more or less advanced units, which could be terminals, servers, computers,
multiplexers, concentrators, switches, bridges, routers, or repeaters. Each of
these units has its own attributes. In the framework of topological design, the
most relevant factors are cost, capacity, availability, compatibility, and reliability [22].
The cost of a node includes purchasing and maintenance, as well as the cost
of related software. The capacity of a node refers to the speed of its processor,
the size of both programs and available memories. Availability is defined by
the percentage of time during which a node is usable, or else, as the probability
that a node could be available in a given instant. The compatibility of a node
can be defined as the concordance between the types of traffic which the node
manages and the types of links to which that node could be attached.
The reliability of a node is considered as the probability that it correctly
functions under given conditions. This means that the node is not a subject of
repair nor of any intervention other than that predicted in the technical manuals.
A link or a transmission support refers to a set of physical means put in place
in order to propagate electromagnetic signals that correspond to messages exchanged between an emitter and a receiver. There exist several types of transmission support. Each type has its distinct physical features, the way it carries data,
and its realm of use. Most known and used transmission supports are twisted
pairs, coaxial cables, electromagnetic waves, fiber-optic, and satellite links.
Like the nodes, each type of link has its own characteristics, which are attributes that could be taken into account during the topological design process.
The most important attributes are the following: length, capacity, cost, flow, delay, and utilization. These attributes have been defined in Section A (Notation)
of Section II.
The most usual indexes for measuring the performance of a network are
the following: response time, data throughput, stability, easiness of extension,
information security, reliability, availability, and cost. Response time can be
defined as the time delay between the emission of a message by a node and the
receipt of an answer. The effective data throughput is the useful average quantity
of information processed per unit of time, evaluated under given conditions
and over a given time period.
The stability of a network refers to its capacity to absorb a realistic traffic
pattern. It can be defined as the number of tasks that the system can perform
and the time needed to process each of these tasks in different instants, while the
network is functioning. For a given network, the easiness of extension represents
the capacity of this network to accept new users, or its capacity to be modified
without structural changes. Information security includes both the information
protection that restricts users' access to a part of the system, and the notion of
information integrity, which depends on the capacity of the system not to alter or
lose information.
The reliability of a network can be defined as its ability to continue functioning when some of its nodes or links fail. It is often linked to the notion
of network connectivity [31,34]. The availability of a network refers to the
probability that this network is usable during a period of time, given redundancies, breakdown detection, and repair procedures, as well as reconfiguration
mechanisms.
Networks could assume several functions which could differ by their importance and use. Among these functions, one can mention routing, flow control,
congestion control, configuration management, performance management, error and breakdown handling, security, and accounting information management. Figure 3 synthesizes the characteristics of data communication networks.
B. Network Representation
The purpose of the network representation is to find the most efficient way to
model networks in order to make their design and analysis easier. To achieve
this objective, the representation must be based on data structures that are
sufficiently efficient (i.e., neither congested nor complex) to minimize the
execution times or optimize the data storage process. A good representation
facilitates the implementation of both analysis and design algorithms [22].
There exist two types of representation: external and internal. The external
representation of a network deals with aspects related to data display. In a
design and analysis tool, external representation plays the important role of
taking into account the data supplied by the user, controlling their integrity
and organization, and storing in a manner that facilitates their use and update.
At the level of this representation, the user specifies the number of nodes, the
location of each node, the choice of links, and related characteristics as well as
the traffic requirements. Internal representation has the purpose of facilitating
the implementation of design and analysis algorithms. It also makes the latter
efficient in terms of memory consumption and execution times.
Ag Rhissa et al. [1] describe two types of network representation: the inheritance tree and the content tree. The inheritance tree defines the hierarchical
organization of the network's components and the administrative element included in these components. Such a representation can be transformed into an
object structure whose root is a superclass that contains information regarding
all the objects.
The content or instance tree defines the physical or logical relationships
between the different elements of the network; it allows us to identify, in an
unequivocal manner, an element from its positioning in the tree. This tree can
also be transformed into an object structure and makes it possible to model
the relationships between different objects in the form of an information management tree. The content tree essentially supplies information regarding the
management of configurations [42].
Users still suffer from difficulties while they manipulate the design and
representation models available in the marketplace. Their work is often constrained by the lack of flexibility of such models. Hence, it is necessary to
provide for a generic representation model. By generic model, we refer to the
capacity of representing a variety of types of networks and aspects related to
their performance and management. Therefore, the following questions can be
raised: What are the functions a representation model could offer to the user?
What data structure could one use in order to be able to take into account all
FIGURE 3 Characteristics of data communication networks: structure, type, topology, components, management, and performance.
the aspects of a data communication network under design? How does one
implement and validate such a model by taking into account the diversity of
topologies, structures, and management mechanisms associated with the concept of a network?
The representation of networks is considered an important aspect of topological design. Most models and tools were rather devoted to the aspects of
routing, evaluation, and optimization of performance. In these models, a network representation is often limited to capturing incomplete data and displaying the network in a graphical mode. Tools and models presented in this section
do not contradict these observations.
Dutta and Mitra [9] have designed a hybrid tool that integrates formal models for specifying problems (Lagrangian formulation), and heuristic models for
evaluating and optimizing network performance. This leads to the most needed
flexibility in both the processing of data and the application of constraints. For
this purpose, this tool first divides the network into a core part (Backbone)
and several other access networks. After this procedure, the designer limits
their work to the topological design of the backbone only. They consider as
data the following items: the location of the nodes, traffic between each node
pair, allowed maximum delay, reliability, and cost requirements. From this information, the tool decides the topological configuration, routing, flow, and
capacity assignment. All this is done by taking into account the constraints of
the problem.
In this approach, the network representation has been practically neglected
for the benefit of other topological design aspects. This is explained by the fact
that the objective of the designers was essentially the optimization of network
performance. Therefore, the resulting tool lacks both universality and interactive capabilities.
AUTONET [2] is a tool for analyzing the performance of a WAN that interconnects local networks. It is among the few tools that pay special attention
to network representation. In fact, AUTONET provides the users with sophisticated means to design networks and perform modifications and adjustments
required at any moment. It also offers a user-friendly interface suitable for various levels of users. One of its drawbacks is that it is dedicated
exclusively to the modeling and representation of WANs, without taking into account other types of networks. Thus, the local networks, which could be linked
to a WAN, are considered as forming a single entity, and therefore their specific
characteristics are neglected. The other drawback is that the capture of
data is essentially oriented toward performance evaluation, which is not necessarily
compatible with the requirements of network representation. As a result, these
data are incomplete and insufficient for adequately representing networks.
COMNET III [7] is the latest version of a series of tools produced by the
CACI Products Company. This tool simulates and analyzes the performance
of telecommunication networks, and offers the advantage of modeling and
representing all types of network. COMNET III also integrates sophisticated
graphical display options, allowing users to represent networks with a greater
flexibility. It is available to users of all levels, from beginners to experts. In
terms of universality, COMNET III is certainly better than the other tools
previously mentioned. However, like AUTONET, it has some drawbacks.
DESNET provides the user with a large number of functions. Its flexibility
and simplicity facilitate adding several functions without affecting the robustness and reliability of the system. For organization reasons, we have decided
to regroup the functions offered by DESNET in three classes according to their
type, extent, and domains of application. We have defined a class for the definition of networks, another to represent them, and another for their handling.
Network Definition
This class of functions regroups all the applications that act on the network as a single
entity. The functions allow the user to select the network to be processed and
permit the introduction of a new network in the system, or the deletion of an
existing network.
Network Representation
The class of functions that describe network representation has the role of
representing the network and its components in several manners, according to
the user's needs. The user has the choice among a graphical representation, a
matrix, or a list.
Network Management
FIGURE 4 Network manipulation.
captured: this is the case of those related to network, nodes, links, and traffic.
There are also certain other functions that are accessible once the routing is
calculated: this is the case of flow and capacity assignment, and calculation of
network performance.
2. Example of DESNET's Working
NODE  Abscissa  Ordinate
NYK   92        12
LSA   12        98
CHI   64        26
DAL   56        90
BAL   86        15
SFO    2        90
MIA   99        89
DEN   36        61
FIGURE 5
Figure 7 represents the screen that captures a link. One should note on
this screen that the field Cost is not activated, because the cost of a link
is calculated automatically and need not be captured by the user. Similarly, as for
the nodes, DESNET does not allow the user to capture the same link twice. After
the capture of nodes and links, the user has the following choices: to capture the traffic, to confirm the capture, or to quit the system. For this example,
we have chosen the capture of the traffic matrix. We have supposed that the
traffic matrix is uniform with a constant value of 5 packets/s; the average size of
packets is equal to 1000 bits. Figure 8 represents the capture of traffic between
the nodes NYK and BAL.
Having confirmed the capture of these data, we can then access the data
processing part. Figure 9 shows the graphical representation of this network.
The numbers that appear beside the links represent their length in kilometers.
FIGURE 6
FIGURE 7
Furthermore, we have added to this network two new nodes called SEA and
CANADA5. The node SEA has 5 and 20 as coordinates, while the coordinates
of CANADA5 are 45 and 15. During the addition of one or several nodes,
DESNET allows the user to choose the network's new name and to determine
exactly the number of nodes to be added. The addition of a node is
realized in exactly the same manner as the capture of a node.
To link two new nodes added to an existing network, we have decided to
add four links, which are in this case the links between SEA and SFO, SEA and
CHI, SEA and CANADA5, and CANADA5 and NYK. As shown in Fig. 10, the
addition of a link is considered a data capture. In effect, all control operations
regarding the network's connectivity and integrity of data used by DESNET
during the capture of links are also applicable upon the addition. The user has
FIGURE 8
FIGURE 9
the right to add as many links as she/he likes, provided that the maximum
number of links mmax is not exceeded. For the current example, there are a
maximum of 45 links.
After adding the nodes, it is convenient to add the traffic between these
new nodes and the remaining network nodes. In order to uniformly maintain
the traffic matrix, the traffic between each new node pair is maintained at
5 packets/s; the average size of packets is still 1000 bits. Figure 11 represents
the network obtained after the addition of SEA and CANADA5 nodes. The
new network contains 10 nodes and 16 links.
FIGURE 10
FIGURE 11
examples (e+) are feasible topologies in the sense that they satisfy all the specified
constraints, whereas negative examples (e−) refer to topologies where only the
delay constraint is not satisfied. The solution of the current problem is then the
least-cost positive example, which has been generated up to the last perturbation cycle. In this way, perturbation cycles that aim at improving the current
solutions can be viewed as a method for generalizing a local search process. After a certain number of perturbation cycles, feasible solutions or improvements
to current solutions cannot be obtained by starting new cycles. As a result, refining current perturbation rules and discovering new ones from examples that
have been previously generated constitute the only ways for improving solutions. This can be done by means of machine learning mechanisms [4,5,8,27].
The integration of an inductive learning module into the basic knowledge-based
system has been considered as an extension of the example generator. In this
chapter, we are interested in the specific problem of inferring new design rules
that could reduce the cost of the network, as well as the message delay below
some acceptable threshold.
B. Basic Concepts and Background
A network topology can be characterized by three types of descriptors: local,
global, and structural [31]. These descriptors will be used either to state the
topological design problem or to describe the machine learning module integrated into SIDRO.
1. Local Descriptors
A local descriptor essentially characterizes one node or one link of a network. This category of descriptors includes the length of a link, the incidence
degree of a node, the traffic index of a link, the flow of a link, the capacity of a
link, the utilization ratio of a link, and the excess cost index of a link. Except
for the last one, all these concepts have already been defined in Section A of
Section II.
The excess cost of a link k can be defined as Ek = dk(Ck − fk)/Ck. This
relation can be rewritten as Ek = dk(1 − Uk). The average excess cost of a
network can be defined as E = Dm(1 − U), where Dm is the average cost of the links, and U a mean utilization ratio. The excess cost index Pk can now be
defined, for each link k of a given network topology, as

Pk = Ek/E = [dk(1 − Uk)]/[Dm(1 − U)].
Clearly, if the excess cost index Pk of a link is greater than 1, the excess cost of
that link is greater than the average excess cost of the network.
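The following Python sketch (with illustrative values and a function name of our own) computes the excess cost index Pk for a set of links and indicates whether each link's excess cost lies above or below the network average.

def excess_cost_indices(costs, utilizations):
    # costs: d_k of each link ($/month); utilizations: U_k = f_k / C_k.
    excess = [d * (1.0 - u) for d, u in zip(costs, utilizations)]
    d_mean = sum(costs) / len(costs)                 # Dm: average link cost
    u_mean = sum(utilizations) / len(utilizations)   # U: mean utilization ratio
    e_mean = d_mean * (1.0 - u_mean)                 # average excess cost E
    return [e / e_mean for e in excess]

# Illustrative values for three links.
costs = [1200.0, 450.0, 800.0]
utilizations = [0.52, 0.91, 0.70]
for k, p in enumerate(excess_cost_indices(costs, utilizations), start=1):
    status = "below the network average" if p < 1 else "above the network average"
    print(f"link {k}: Pk = {p:.2f} ({status})")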
2. Some Global and Structural Descriptors
Global descriptors are features whose scope covers the whole network. For
instance, the following parameters are global descriptors: number of nodes,
number of links, total cost of the links, connectivity degree, normalized average delay, diameter, incidence degree of the network, and average traffic of
the network.
Structural (or semantic) descriptors essentially consist of classic concepts
derived from graph theory or conventional network design techniques [12,13].
Format 1.
If (1) P1(x1)
   (2) P2(x2)
   (3) P3(x3)
   (4) P4(x4)
   (5) P5(x5)
Then Qi(hij, O) can lead to a cost reduction (CV: a).
Format 2.
If (1) P1(x1)
   (2) P2(x2)
   (3) P3(x3)
   (4) P4(x4)
   (5) P5(x5)
Then Qi(hij, O) can lead to the delay constraint restoration (CV: a).
Pi(xi) denotes a premise built with some descriptors, and Qi(hij, O) denotes a
perturbation operator built with a hypothesis hij, where i = 1 for a link addition, i = 2 for a link deletion, and i = 3 for a link substitution operator. The
index j specifies a hypothesis, that is, a basic criterion or a combination of basic
criteria that can be used for selecting potential links to be perturbed. Clearly,
because of the great number of possible ways to perturb a network topology,
it is unrealistic to adopt an exhaustive approach.
Once a perturbation operator Qi(hij, O) is applied to an example in a learning process, the objective indicator O takes the value 1 if the aimed objective is
reached, and 0 otherwise. This operation is successively applied to a great number
of examples. The likelihood factor CV is then computed as the success rate of
applying this perturbation operator to all the considered examples. For a CV
greater than or equal to 0.85, a new rule is inferred and built with the specific
operator Qi(hij, O).
For the same i in hij, it can happen that more than one hypothesis leads to
the searched objective: cost reduction or delay constraint restoration. As a result, a hypothesis preference criterion is needed for selecting only one hypothesis
from the appropriate space indexed by i. In the context of topological design,
the hypothesis preference criterion (HPC) chosen is the best cost-performance
ratio; it implies choosing the hypothesis leading to the greatest total link cost
reduction, while restoring or maintaining the delay constraint. According to
this HPC, with a positive example as seed (cost reduction objective), the only
hypothesis hij that can be selected is the one that enables the best cost reduction and preserves delay constraint. Conversely, with a negative example as
seed (delay constraint restoration objective), HPC allows only the hypothesis
enabling the restoration of the delay constraint and the highest cost reduction.
2. Notation Used for Describing the Learning Algorithm
Here is the list of notations used to describe the learning algorithm, which
intrinsically attempts to infer new perturbation rules leading to cost reduction
or delay constraint restoration:
X    Example base
Xc   Base of nonredundant examples
eg   Seed example
Rb   Rule base
rt   Tentative rule
Initialization: X = X ∪ ..., Xc = ∅.
Step 1. Select at random in X a seed eg.
Step 2. From the number of nodes n, the degree of connectivity K, the normalized average delay Tn, and the diameter R of the seed eg, obtain the premises P1(x1), P2(x2), P3(x3), P4(x4).
Step 3. Determine (i, j), with i = 1, 2, 3 and j = 1, 2, ..., qi (qi = number of hypotheses of type i) such that applying the operator Qi(hij, O) to eg results in a cost reduction if eg is a positive example, or in the restoration of the delay constraint if eg is a negative example.
Step 4. In the case of more than one acceptable operator Qi(hij, O), turn to the hypothesis preference criterion (the best cost-performance ratio) to select the best conclusion for a new rule. Then infer from this the value of x5 (x5 = i) specifying the premise P5(x5), together with the tentative rule rt = (P1, P2, P3, P4, P5, i, j, Qi(hij, O), 1).
Step 5. Apply rt to the examples stored in X in order to validate this tentative rule, that is, to verify that its application effectively results in a cost reduction or in the delay constraint restoration, depending on the features of the considered seed eg. Then compute the value of the likelihood factor CV as the ratio vs/vt, where vs denotes the number of successful applications of rt, and vt the total number of applications of this tentative rule to the suitable examples taken from X.
Step 6. If rt covers at least 85% of the examples considered in X (CV ≥ 0.85), then it becomes a valid discovered rule to be inserted in the rule base Rb; go to Step 8.
Step 7. Reduce the short-term example base to the only examples not covered by the tentative rule rt and return to Step 1.
Step 8. Update the long-term example base by keeping only the nonredundant examples, which constitute Xc (X = Xc).
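The learning loop of Steps 1-8 can be sketched schematically as follows; the example representation, the premise builder, and the operator application are placeholders standing in for SIDRO's example generator, so this is only an outline of the control flow, not the module itself.

import random
from dataclasses import dataclass

@dataclass
class Outcome:
    objective_reached: bool     # the objective indicator O (1 or 0)
    cost_reduction: float       # used by the hypothesis preference criterion

def infer_rule(examples, build_premises, operators, apply_operator,
               threshold=0.85, max_cycles=100, rng=random.Random(0)):
    pool = list(examples)
    for _ in range(max_cycles):
        if not pool:
            break
        seed = rng.choice(pool)                                    # Step 1
        premises = build_premises(seed)                            # Step 2: n, K, Tn, R
        outcomes = {op: apply_operator(op, seed) for op in operators}   # Step 3
        successful = [op for op, o in outcomes.items() if o.objective_reached]
        if not successful:
            pool.remove(seed)
            continue
        # Step 4: hypothesis preference criterion (best cost-performance ratio).
        best = max(successful, key=lambda op: outcomes[op].cost_reduction)
        # Step 5: validate the tentative rule on the whole example base.
        trials = [apply_operator(best, e) for e in examples]
        cv = sum(t.objective_reached for t in trials) / len(trials)
        if cv >= threshold:                                        # Step 6
            return premises, best, cv                              # rule to insert in Rb
        pool = [e for e, t in zip(examples, trials)                # Step 7
                if not t.objective_reached]
    return None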
3. Illustrative Example
Once the seed eg is selected at Step 1, a premise-building procedure is used at
Step 2 for extracting the number of nodes n, the connectivity degree K, the normalized average delay Tn, and the diameter R. For instance, for a seed eg characterized by the parameter values n = 20, K = 5, Tn = 0.7, R = 0.95 Lmax,
this procedure builds the following premises:
to 10 packets/s, and the average packet length has been taken to be 1000 bits.
Table 3 gives the Cartesian coordinates of the 20 nodes. Table 4 shows the
costs associated with the link capacity options. The total link cost D of this
topology is equal to $197,776/month. Figure 13 shows the solution resulting
from the application of some addition and deletion rules of Table 2 during three
perturbation cycles. We observe a cost reduction of 9.2% per month, whereas
the delay constraint has been preserved.
V. HEURISTIC APPROACHES
Various heuristic methods for solving the topological design problem, in the
context of packet-switched networks, have been proposed. These methods are
generally incremental in the sense that they start with an initial topology and
perturb it repeatedly until they produce suboptimal solutions.
A. Conventional Heuristics and Meta-heuristics
BXC starts from an arbitrary topological configuration generated by a user or
a design program to reach a local minimum by means of local transformations.
TABLE 3 Node Coordinates (Xi, Yi)
Node   Xi    Yi
1      250   360
2      165   420
3      600   100
4      480    55
5       90   130
6      150   250
7      400   325
8      435   185
9      530   250
10     275   165
11     100   300
12     350   420
13     550   360
14     320   240
15     205    85
16     365    40
17     595   310
18      15   260
19     415    80
20      50    25
TABLE 4 Link Capacity Options and Costs
Capacity (Kbps)   Variable cost ($/month/km)   Fixed cost ($/month)
9.6               3.0                          10.0
19.2              5.0                          12.0
56.0              10.0                         15.0
100.0             15.0                         20.0
200.0             25.0                         25.0
560.0             90.0                         60.0
CS method has been proposed [13]. This method is iterative and consists of
three main steps:
1. Find the saturated cut, that is, assess the minimal set of the most
utilized links that, if removed, leaves the network disconnected.
2. Add new links across the cut in order to connect the two components.
3. Allow the removal of the least utilized links.
CS can be considered as an extension of BXC, in the sense that, rather than
exhaustively performing all possible branch exchanges, it selects only those
exchanges that are likely to improve throughput and cost.
MENTOR has been proposed by Kershenbaum et al. [21 ]. It essentially tries
to find a distributed network with all the following characteristics: (i) traffic
requirements are routed on relatively direct paths; (ii) links have a reasonable
utilization; (iii) relatively high-capacity links are used, thereby allowing us to
benefit from the economy of scale generally present in the relationship between
capacity and cost. These three objectives are, to some extent, contradictory.
Nevertheless, the MENTOR algorithm trades them off against one another to
create low-cost networks.
Another solving approach consists of using combinatorial optimization
meta-heuristic methods, such as SA and GA. The SA method starts by choosing
an arbitrary initial solution, then searches, in the set of neighbor solutions, a
new solution that, hopefully, improves the cost.
SA is a process whereby candidate solutions to a problem are repeatedly
evaluated according to some objective function and incrementally changed to
achieve better solutions [23,24,44]. The nature of each individual change is
probabilistic in the sense that there is some probability it worsens the solution.
In addition, an annealing schedule is followed whereby the probability of allowing a change that worsens the solution is gradually reduced to 0 [38]. If a
better solution is found, then it becomes the current solution; if not, the method
stops and at this step a local optimum is reached [24].
This method has been applied to many combinatorial optimization problems [6]. Pierre etal. [35] adapted this method to solve the problem of topological design of packet-switched networks. This adaption consists of starting with
an initial topology that satisfies the reliability constraint, then applying the SA
algorithm with an initial high value of the temperature parameter, in order to
obtain a new configuration that minimizes the total link cost or improves the
mean delay.
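The acceptance rule and cooling schedule described above can be sketched as follows; cost() and neighbor() are placeholders for the total link cost and a topology perturbation, and the parameter values are arbitrary.

import math
import random

# Minimal simulated annealing sketch: improving moves are always accepted,
# worsening moves are accepted with a probability that shrinks as the
# temperature decreases (the annealing schedule).

def simulated_annealing(initial, cost, neighbor, t0=1000.0, cooling=0.95,
                        iterations=2000, rng=random.Random(0)):
    current, best = initial, initial
    current_cost = best_cost = cost(initial)
    temperature = t0
    for _ in range(iterations):
        candidate = neighbor(current, rng)
        delta = cost(candidate) - current_cost
        if delta < 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temperature *= cooling        # probability of accepting worse moves decays toward 0
    return best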
Genetic algorithms (GA) were introduced by Holland [17]. They
are inspired by Darwin's model and based on the survival of the fittest
species. Just as specimens reproduce themselves in nature, in genetic
algorithms specimens also reproduce themselves. GAs are essentially characterized by the coding of the problem parameters, the solution space, the evaluation
function, and the way of choosing chromosomes to be perturbed. In practice,
from a generation to another, chromosomes that form the population have
a very high aptitude value. GAs generally start with a randomly generated
population. To undertake an efficient search for high-performing structures, genetic
operators are applied to this initial population in order to produce, within a
time limit, successive high-quality populations. We distinguish generally four
main genetic operators: reproduction, crossover, mutation, and inversion [17].
3 I 5
Pierre and Legault [32,33] adapted GAs for configuring economical packet-switched computer networks that satisfy some constraints related to quality of
service. The adaptation results show that the GA method can produce good
solutions by being applied to networks of 15 nodes and more.
A hybrid method has been introduced by Dutta and Mitra [9]. This
method integrates both the algorithmic approach and a knowledge-based system. It builds a solution by subdividing the topological design problem into
modules that can be individually solved by applying optimization models or
heuristic methods. Furthermore, it integrates the partial solutions to obtain a
global solution to the design problem. The system is made up of independent
modules that share a common blackboard structure. It usually provides good
solutions with minimal costs.
B. Implementation of the Tabu Search Approach
TS is an iterative improvement procedure that starts from an initial feasible solution and attempts to determine a better solution in the manner of an ordinary
(descent) local method, until a local optimum is reached [14,15]. This method
can be used to guide any process that employs a set of moves for transforming
one solution into another; thus it provides an evaluation function for measuring
the attractiveness of these moves [14,16].
I. Basic Principles
For a given combinatorial optimization problem, a solution space S may
be defined as the set of all feasible solutions. TS associates with each feasible
solution a numerical value that may be considered as the cost of the solution obtained by optimizing the cost function. For each solution Si in the space solution
S, there is a subset of S, say N(si), considered as a neigborhood of s/. This subset
contains a set of feasible solutions that may be reached from si in one move. TS
starts from an initial feasible solution s/, and then moves to a solution sy, which
is also an element of N(si), This process is repeated iteratively, and solutions
that yield cost values lower than those previously encountered are recorded.
The final cost recorded, when the search is interrupted, constitutes the overall
optimum solution. Thus, TS can be viewed as a variable neighborhood method:
each step redefines the neighborhood for which the next solution will be drawn.
A move from si to sj is made on the basis that sj (sj ≠ si) has the minimum
cost among all the allowable solutions in N(si). Allowability is managed by
a mechanism that involves historical information about moves made while
the procedure progresses. TS is a high-level procedure that can be used for
solving optimization problems; it escapes the trap of local optimality by using
short-term memory recording of the most recently visited solutions [16]. The
short-term memory constitutes a form of aggressive exploration that seeks to
make the best move possible satisfying certain constraints. These constraints are
designed to prevent repetition of some moves considered as forbidden (tabu).
Such attributes are maintained in a tabu list (LT).
TS permits backtracking to previous solutions which may ultimately lead,
via a different direction, to better solutions. This flexibility element is implemented through a mechanism called aspiration criteria [16]. The goal of aspiration criteria is to increase the flexibility of the algorithm while preserving the
FIGURE 14 A general version of TS.
Given
  X: space of feasible solutions
  f: objective function
  N(s): neighbourhood of s ∈ X
  |LTi|
  f*
  nbmax
Initialization
  Choose by any heuristic an initial solution s ∈ X
  s* := s
  nbiter := 0   (iteration counter)
  bestiter := 0
  f(s*) := +∞
DO
  nbiter := nbiter + 1
  Generate a sample V* ⊆ N(s) of neighbour solutions
  If [one of the tabu conditions is violated, i.e., ti(s, m) ∈ LTi] or [at least one of the aspiration criteria conditions is verified, i.e., aj(s, m) < Aj(s, m)] then
    Choose the best s' ∈ V* minimizing f on V* (by a heuristic)
  If f(s') < f(s*) then
    s* := s'
    bestiter := nbiter
Result: s*
basic features that allow the algorithm to escape local optima and avoid cyclic
behavior. Figure 14 presents a general version of TS.
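The general template of Figure 14 can be turned into a short runnable sketch; the cost() and moves() callables are placeholders for the link cost of Eqs. (2) and (3) and the moves defined below, and the stopping rule (nbmax cycles without improvement) is our assumption, since the original figure does not spell it out.

import random
from collections import deque

def tabu_search(initial, cost, moves, tabu_length=7, sample_size=6,
                nbmax=50, rng=random.Random(0)):
    # moves(s) must return a list of (attribute, neighbor_solution) pairs.
    s, best = initial, initial
    best_cost = cost(initial)
    tabu = deque(maxlen=tabu_length)              # short-term memory (tabu list)
    nbiter = bestiter = 0
    while nbiter - bestiter < nbmax:              # assumed stopping rule
        nbiter += 1
        neighborhood = moves(s)
        sample = rng.sample(neighborhood, k=min(sample_size, len(neighborhood)))
        candidates = []
        for attribute, neighbor in sample:
            c = cost(neighbor)
            # A move is admissible if it is not tabu, or if it satisfies the
            # aspiration criterion (it improves on the best cost found so far).
            if attribute not in tabu or c < best_cost:
                candidates.append((c, attribute, neighbor))
        if not candidates:
            continue
        c, attribute, s = min(candidates, key=lambda t: t[0])
        tabu.append(attribute)                    # forbid repeating this move for a while
        if c < best_cost:
            best, best_cost, bestiter = s, c, nbiter
    return best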
2. Definition of the Moves
The removal moves are defined as follows:
- the removal of links k = (i, j) such that L(k) > α·Lmax, with 0 < α < 1, d(i) and d(j) remaining at least equal to the desired connectivity degree after the link removal;
- the removal of links k = (i, j) such that the excess cost index Pk is less than 1, d(i) and d(j) remaining at least equal to the desired connectivity degree after the link removal;
- the removal of links k = (i, j) such that the utilization rate Uk is less than α, with 0 < α < 1, d(i) and d(j) remaining at least equal to the desired connectivity degree after the link removal.
The addition moves cannot violate the connectivity constraint, but generally
have negative effects on the cost. They are defined as follows:
- the substitution of all links k = (i, j) such that L(k) > α·Lmax, with 0 < α < 1, by two other links v = (i, p) and w = (j, q), with L(v) < α1·L(k) and L(w) < α2·L(k), 0 < α1 < 1 and 0 < α2 < 1;
- the substitution of all links k = (i, j) such that Uk > α, with 0 < α < 1, by two other links v = (i, p) and w = (j, q) with Iip > 1 and
The neighborhood of a current topology s is a finite set N(s) of topologies
that are said to be feasible. In our implementation, this neighborhood cannot
contain more than 6 topologies, each of which is obtained from the current
topology by applying one move. The length of the tabu list is fixed to 7. The
average packet delay is calculated using Eq. (1), and the total link cost is computed using Eqs. (2) and (3).
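As an illustration of how one such move generates a neighbor topology, the sketch below applies the length-based removal move to a small example; the data structures (a set of frozensets for links) and coordinates are our own choices, not DESNET's or SIDRO's.

import math

# Sketch of one removal move: drop every link longer than alpha * Lmax whose
# two end nodes would still keep the desired connectivity degree.

def removal_move(links, coords, alpha, desired_degree):
    lengths = {k: math.dist(coords[min(k)], coords[max(k)]) for k in links}
    l_max = max(math.dist(a, b) for a in coords.values() for b in coords.values())
    degree = {i: sum(1 for k in links if i in k) for i in coords}
    new_links = set(links)
    for k in sorted(links, key=lengths.get, reverse=True):
        i, j = tuple(k)
        if lengths[k] > alpha * l_max and \
           degree[i] - 1 >= desired_degree and degree[j] - 1 >= desired_degree:
            new_links.remove(k)
            degree[i] -= 1
            degree[j] -= 1
    return new_links

coords = {1: (63, 8), 2: (22, 71), 3: (41, 45), 4: (33, 10)}
links = {frozenset(p) for p in [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]}
print(sorted(tuple(sorted(k)) for k in removal_move(links, coords, 0.8, 2)))
# the longest link {1, 2} is removed; degrees stay at least 2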
C. Numerical Applications with Heuristic Methods
In order to evaluate the effectiveness and efficiency of the TS approach, we have
considered a set of 20 nodes defined by the Cartesian coordinates given in
Table 5. The node coordinates give an Euclidean representation of the network
and are used to determine the distance between each pair of nodes or the length
TABLE 5 Node Coordinates
Node   Abscissa   Ordinate
1      63         8
2      22         71
3      41         45
4      33         10
5      32         84
6      40         78
7      52         52
8      80         33
9      19         35
10     15         81
11     27         3
12     27         16
13     70         96
14     48         4
15     4          73
16     10         71
17     56         27
18     82         47
19     9          54
20     95         61
TABLE 6
Capacity (Kbps)   Fixed cost ($/month)   Variable cost ($/month/km)
9.60              650.00                 0.40
19.20             850.00                 2.50
50.00             850.00                 7.50
100.00            1700.00                10.00
230.40            2350.00                30.00
460.80            4700.00                60.00
921.60            9400.00                120.00
1843.20           18800.00               240.00
Traffic Matrix
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
0
4
7
8
6
9
5
5
9
3
9
3
6
4
3
4
1
7
7
10
4
0
9
3
2
8
3
1
2
9
8
10
6
10
3
6
10
2
6
3
7
9
0
7
10
10
6
5
5
10
6
8
8
9
1
2
4
4
10
9
4
8
3
7
0
9
6
5
4
7
3
7
6
8
5
9
4
2
10
3
5
5
6
2
10
9
0
8
1
3
8
6
6
7
6
3
3
1
6
5
4
2
6
9
8
10
6
8
0
10
9
6
7
1
5
1
5
9
6
10
5
4
2
7
5
3
6
5
1
10
0
9
7
4
7
6
2
8
5
3
8
8
3
4
5
1
5
4
3
9
9
0
9
4
4
4
1
5
10
9
7
7
10
3
9
2
5
7
8
6
7
9
0
6
7
1
4
5
10
6
6
8
8
3
10
3
9
10
3
6
7
4
4
6
0
10
2
6
5
7
2
4
8
7
8
11
9
8
6
7
6
1
7
4
7
10
0
5
4
6
10
9
8
5
6
2
12
3
10
8
6
7
5
6
4
1
2
5
0
3
6
5
6
6
3
7
7
13
6
6
8
8
6
1
2
1
4
6
4
3
0
3
9
3
4
7
1
9
14
4
10
9
5
3
5
8
5
5
5
6
6
3
0
3
4
3
4
2
4
15
3
3
1
9
3
9
5
10
10
7
10
5
9
3
0
8
5
1
9
9
16
4
6
2
4
1
6
3
9
6
2
9
6
3
4
8
0
2
1
4
7
17
1
10
4
2
6
10
8
7
6
4
8
6
4
3
5
2
0
10
10
9
18
7
2
4
10
5
5
8
7
8
8
5
3
7
4
1
1
10
0
10
6
19
7
6
10
3
4
4
3
10
8
7
6
7
1
2
9
4
10
10
0
8
20
10
3
9
5
2
2
4
3
3
8
2
7
9
4
9
7
9
6
8
0
FIGURE 15
initial topology, the application of the third removal move during the
first iteration cycle has generated, by deletion of links 1-14, 2-5, and 10-15,
the topology specified by Fig. 16 and Table 9. The cost of this topology was
checked against the cost of each topology generated from the initial topology
during the current iteration cycle; this cost is found to be the lowest. Thus, it
becomes the solution of the first perturbation cycle.
Therefore, iteration cycle 1 did not lead to a solution less expensive than
the current initial topology. By contrast, we observe a reduction of the average delay T. In fact, it changed from 81.84 to 73.78 ms, that is,
a reduction of 9.48%. For iteration cycle 2, this solution is kept as a new
starting topology to which different moves are applied. The link removal move
led to a new topology with a total link cost D = $111,420.70/month and
an average delay T = 95 ms.
In iteration cycle 3, the application of the substitution move to the starting topology shown in Fig. 15 led to the new topology specified by Fig. 17 and
Table 10. The improvement percentage in terms of cost is 17.6%. By contrast,
the average delay is changed from 81.84 to 133.68 ms, with an increase of
51.84 ms. This delay did not exceed the maximum acceptable delay Tmax, which
is fixed in this application at 250 ms. Since we are looking for a solution that
minimizes the total link cost, taking into consideration some performance constraints, this solution therefore becomes the best solution found during the
overall execution of this method.
TABLE 8
Link number
3
7
13
16
22
27
32
33
40
41
43
51
61
62
64
71
75
78
86
92
110
121
122
124
127
134
140
141
146
148
169
172
176
179
183
189
Link
1-4
1-8
1-14
1-17
2-5
2-10
2-15
2-16
3-6
3-7
3-9
3-17
4-11
4-12
4-14
5-6
5-10
5-13
6-7
6-13
7-18
8-17
8-18
8-20
9-12
9-19
10-15
10-16
11-12
11-14
13-20
14-17
15-16
15-19
16-19
18-20
134
170
74
70
216
18
6
210
302
174
388
348
14
302
96
296
192
120
134
78
204
138
144
94
500
550
86
70
158
68
108
94
16
130
322
74
230.40
230.40
100.00
100.00
230.40
19.20
9.60
230.40
460.80
230.40
460.80
460.80
19.20
460.80
100.00
460.80
230.40
230.40
230.40
100.00
230.40
230.40
230.40
100.00
921,60
921.60
100.00
100.00
230.40
100.00
230.40
100.00
19.20
230.40
460.80
100.00
0.58
0.73
0.74
0.70
0.93
0.93
0.62
0.91
0.65
0.75
0.84
0.75
0.72
0.65
0.96
0.64
0.83
0.52
0.58
0.78
0.88
0.59
0.62
0.94
0.54
0.59
0.86
0.70
0.68
0.68
0.46
0.94
0.83
0.28
0.69
0.74
Our results in terms of cost and mean delay are compared with results
provided by other heuristics. First, TS is compared with CS. For both methods,
the experiment was based on the following choices: uniform traffic of 5 packets
per second between each pair of nodes, 2-connected topologies (the CS method
handles only 2-connected topologies), and the average length of packets being 1000 bits. We have used the capacity options and costs of Table 5. The
Cartesian coordinates of the nodes are given in Table 11. Table 12 shows the
comparative results provided by both methods. The last column of this table
provides the improvement rate obtained by TS versus CS. In all cases, TS offers
better solutions in terms of cost than CS. However, delays provided by CS are
generally better.
FIGURE 16 Topology after the first iteration cycle (D = $102,419.91/month, T = 73.78 ms).
TABLE 9
3
7
16
27
32
33
40
41
43
51
61
62
64
71
75
78
86
92
110
121
122
124
127
134
141
146
148
169
172
176
179
183
189
Link
1-4
1-8
1-17
2-10
2-15
2-16
3-6
3-7
3-9
3-17
4-11
4-12
4-14
5-6
5-10
5-13
6-7
6-13
7-18
8-17
8-18
8-20
9-12
9-19
10-16
11-12
11-14
13-20
14-17
15-16
15-19
16-19
18-20
Flow (Kbps)
Capacity (Kbps)
Utilization
182
144
70
106
6
98
294
194
388
340
54
310
96
288
400
120
134
78
204
164
144
94
508
558
276
158
28
108
128
102
130
330
74
230.40
230.40
100.00
230.40
9.60
100.00
460.80
230.40
460.80
460.80
100.00
460.80
100.00
460.80
460.80
230.40
230.40
100.00
230.40
230.40
230.40
100.00
921.60
921.60
460.80
230.40
50.00
230.40
230.40
230.40
230.40
460.80
100.00
0.78
0.62
0.70
0.46
0.62
0.98
0.63
0.84
0.84
0.73
0.54
0.67
0.96
0.45
0.86
0.52
0.58
0.78
0.88
0.71
0.62
0.94
0.55
0.60
0.59
0.68
0.56
0.46
0.55
0.44
0.56
0.71
0.74
In most cases (13 out of 14), costs provided by TS are better than those obtained by GA.
However, delays obtained by TS are relatively less than those obtained by GA.
In general, TS provides better solutions in terms of cost than GA.
We have also compared our results to those provided by SA. Our experiments
used the same data previously mentioned. Table 14 gives a summary of the
results obtained by the two methods. These results confirm once again that TS
offers better solutions than SA in terms of cost.
Link number   Link    Flow (Kbps)   Capacity (Kbps)   Utilization
6             1-7     44            50.00             0.88
7             1-8     96            100.00            0.96
13            1-14    144           230.40            0.62
16            1-17    196           230.40            0.85
22            2-5     112           230.40            0.48
27            2-10    76            100.00            0.76
30            2-13    38            50.00             0.76
33            2-16    184           230.40            0.79
39            3-5     62            100.00            0.62
40            3-6     44            50.00             0.88
41            3-7     214           230.40            0.92
42            3-8     206           230.40            0.89
43            3-9     186           230.40            0.80
45            3-11    48            50.00             0.96
49            3-15    48            50.00             0.96
50            3-16    136           230.40            0.59
51            3-17    220           230.40            0.95
53            3-19    116           230.40            0.50
62            4-12    212           230.40            0.92
64            4-14    220           230.40            0.95
67            4-17    74            100.00            0.74
71            5-6     200           230.40            0.86
78            5-13    46            50.00             0.92
86            6-7     28            50.00             0.56
95            6-16    90            100.00            0.90
96            6-17    108           230.40            0.46
97            6-18    84            100.00            0.84
99            6-20    96            100.00            0.96
105           7-13    86            100.00            0.86
118           8-14    114           230.40            0.49
122           8-18    178           230.40            0.77
124           8-20    112           230.40            0.48
127           9-12    150           230.40            0.65
130           9-15    82            100.00            0.82
138           10-13   12            19.20             0.62
141           10-16   134           230.40            0.58
148           11-14   110           230.40            0.47
153           11-19   90            100.00            0.90
158           12-16   114           230.40            0.49
159           12-17   12            19.20             0.62
176           15-16   164           230.40            0.71
183           16-19   180           230.40            0.78
189           18-20   12            19.20             0.62
TABLE 11 Node Coordinates

Node   Abscissa   Ordinate
1      63         8
2      22         72
3      41         45
4      33         10
5      32         84
6      40         78
7      52         52
8      80         33
9      19         35
10     15         81
11     27         3
12     27         16
13     70         96
14     48         4
15     4          73
16     10         71
17     56         27
18     82         47
19     9          54
20     95         61
21     67         8
22     56         54
23     54         23
TABLE 12

No.   Cut Saturation D ($/month)   TS D ($/month)   Improvement (%)
1     19972                        8480             58
2     44262                        25272            43
3     56231                        43338            23
4     160984                       65161            60
5     381278                       144131           63
6     394455                       210070           45

TABLE 13

No.   N    Dmax   Tmax (ms)   GA D ($/month)   GA T (ms)   TS D ($/month)   TS T (ms)   %D
1     6    3      100         10697            80.62       9867             80.89       8
2     6    4      100         12938            91.35       11338            78.50       13
3     10   3      150         27534            115.08      24322            97.12       12
4     10   4      150         26860            94.10       22252            94.82       18
5     10   5      150         30189            91.57       28407            87.20       6
6     15   3      200         65142            95.06       60907            78.95       7
7     15   4      200         63838            83.10       60013            93.10       6
8     15   5      200         46774            94.63       41626            90.62       12
9     20   3      250         113714           96.25       105495           112         8
10    20   4      250         110815           103.06      103524           88.02       7
11    20   5      250         116123           76.98       106122           83.14       9
12    23   3      190         162627           107.18      148340           90.67       9
13    23   4      190         163129           90.14       168118           113         -3
14    23   5      190         149522           87.13       139154           86.14       7

TABLE 14

No.   N    Dmax   Tmax (ms)   SA D ($/month)   SA T (ms)   TS D ($/month)   TS T (ms)   %T   %D
1     6    3      100         11490            90.50       9867             80.89       11   15
2     6    4      100         16939            88.15       11338            78.50       11   33
3     10   3      150         30433            103.06      24322            97.12       6    20
4     10   4      150         28450            100.02      22252            94.82       5    22
5     10   5      150         33982            98.23       28407            87.20       12   17
6     15   3      200         65200            110         60907            78.95       29   7
7     15   4      200         70434            96.12       60013            93.10       4    15
8     15   5      200         54810            105         41626            90.62       14   24
9     20   3      250         132872           123.06      105495           112         10   21
10    20   4      250         151918           99.09       103524           88.02       12   32
11    20   5      250         143341           114.03      106122           83.14       28   26
12    23   3      190         174696           97.45       148340           90.67       7    15
13    23   4      190         186871           115.08      168118           113         2    11
14    23   5      190         164123           93.67       139154           86.14       9    16
REFERENCES
1. Ag Rhissa, A., Jiang, S., Ray Barman, J., and Siboni, D. Network generic modeling for fault
management expert system. In International Symposium on Information, Computer and Network Control, Beijing, China, Feb. 1994.
2. AUTONET / Performance-3, Network Design and Analysis Corporation, VA, USA, 1995.
3. Boorstyn, R. R., and Frank, H. Large-scale network topological optimization. IEEE Trans.
Commun, 25(1): 29-47, 1977.
4. Buchanan, B. G. Some approaches to knowledge acquisition. In Machine Learning: A Guide
to Current Research (Mitchell, T. M., Carbonell, J. G., and Michalski, R. S. Eds.), pp. 19-24.
Kluwer Academic, Dordrecht, 1986.
5. Carbonell, J. G., Michalski, R. S., and Mitchell, T. M. An overview of machine learning. In
Machine Learning: An Artificial Intelligence Approach (Michalski, R. S., Carbonell, J. G., and
Mitchell, T. M. Eds.), pp. 3-23. Tioga, Portola Valley, CA, 1983.
6. Coan, B. A., Leland, W. E., Vechi, M. R, and Weinrib, A. Using distributed topology update an
preplanned configurations to achieve trunk network survivability. IEEE Trans. Reliability 40:
404-416,1991.
7. COMNET III. A Quick Look at COMNET III, Planning for Network Managers. CACI Products Company, La Jolla, CA, 1995.
8. Dietterich, T., and Michalski, R. A comparative review of selected methods for learning
from examples. In Machine Learning: An Artificial Intelligence Approach (Michalski, R. S.,
Carbonell, J. G., and Mitchell, T. M. Eds.), pp. 41-82.Tioga, Portola Valley, CA, 1983.
9. Dutta, A., and Mitra, S. Integrating heuristic knowledge and optimization models for communication network design. IEEE Trans. Knowledge and Data Engrg. 5: 999-1017, 1993.
10. Gavish, B. Topological design of computer networks: The overall design problem. Eur. J.
Oper. Res. 58: 149-172, 1992.
11. Gavish, B., and Neuman, I. A system for routing and capacity assignment in computer communication networks. IEEE Trans. Commun. 37(4): 360-366, 1989.
12. Gerla, M., and Kleinrock, L. On the topological design of distributed computer networks. IEEE
Trans. Commun. 25: 48-60, 1977.
13. Gerla, M., Frank, H., Chou, H. M., and Eckl, J. A cut saturation algorithm for topological
design of packet switched communication networks. In Proc. of National Telecommunication
Conference, Dec. 1974, pp. 1074-1085.
14. Glover, F. Tabu search: Improved solution alternatives for real world problems. In Mathematical
Programming: State of the Art (J. R. Birge and K. G. Murty, Eds.), pp. 64-92 Univ. of Michigan
Press, Ann Arbor, MI, 1994.
15. Glover, F. Tabu thresholding: improved search by nonmonotonic trajectories. INFORMS J.
Comput. 7: 426-442,1995.
16. Glover, F., and Laguna, M. Tabu search. In Modern Heuristic Techniques for Combinatorial
Problems (C. Reeves, Ed.), pp. 70-141. Blackwell Scientific, Oxford, 1993.
17. Holland, J. H. Adaptation in Natural and Artificial Systems. Univ. of Michigan Press, Ann
Arbor, MI, 1975.
18. Jan, R. H., Hwang, F. J., and Cheng, S. T. Topological optimization of a communication
network subject to reliability constraints. IEEE Trans. Reliability 42: 63-70, 1993.
19. Kamimura, K., and Nishino, H. An efficient method for determining economical configurations
of elementary packet-switched networks. IEEE Trans. Commun. 39(2): 278-288, 1991.
20. Karp, R. M. Combinatorics, complexity, and randomness. Commun. ACM 29(2): 97-109,
1986.
21. Kershenbaum, A., Kermani, P., and Grover, G. A. MENTOR: An algorithm for mesh network
topological optimization and routing. IEEE Trans. Commun. 39: 503-513, 1991.
22. Kershenbaum, A. Telecommunication Network Design Algorithms. IBM Thomas J. Watson
Research Center, USA, 1993.
23. Kirkpatrick, S. Optimization by simulated annealing: Quantitative studies. J. Stat. Phys. 34:
975-986, 1984.
24. Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. Optimization by simulated annealing. Science
220: 671-680, 1983.
326
SAMUEL PIERRE
25. Kleinrock, L. Queueing Systems: Vol. II, Computer Applications. Wiley-Interscience, New
York, 1976.
26. MILS. The OPNET Modeler Simulation Environment, 1997. Available at http://www.
mil3.com/products/modeler/home.html.
27. Michalski, R. S. A theory and methodology of inductive learning. In Machine Learning: An
Artificial Intelligence Approach (Michalski, R. S., Carbonell, J. G., and Mitchell, T. M. Eds.),
pp. 83-134. Tioga, Portola Valley, CA, 1983.
28. Monma, C. L., and Sheng, D. D. Backbone network design and performance analysis: A
methodology for packet switching networks. IEEE J. Selected Areas in Commun. 4(6): 946-965, 1986.
29. Newport, K. X, and Varshney, P. K. Design of survivable communications networks under
performance constraints. IEEE Trans. Reliability 40(4): 433-440, 1991.
30. Pierre, S., and Gharbi, I. A generic object-oriented model for representing computer network
topologies. Adv. Eng. Software 32(2): 95-110, 2001.
31. Pierre, S. Inferring new design rules by machine learning: A case study of topological optimization. IEEE Trans. Man Systems Cybernet. 28A (5): 575-585, 1998.
32. Pierre, S., and Legault, G. A genetic algorithm for designing distributed computer network
topologies. IEEE Trans. Man Systems Cybernet. 28(2): 249-258,1998.
33. Pierre, S., and Legault, G. An evolutionary approach for configuring economical packetswitched computer networks. Artif. Intell. Engrg. 10: 127-134, 1996.
34. Pierre, S., and Elgibaoui, A. A tabu search approach for designing computer network topologies
with unreliable components. IEEE Trans. Reliability 46(3): 350-359, 1997.
35. Pierre, S., Hyppolite, M.-A., Bourjolly, J.-M., and Dioume, O. Topological design of computer
communications networks using simulated annealing. Engrg. Appl. Artif. Intell. 8: 61-69,
1995.
36. Pierre, S. A new methodology for generating rules in topological design of computer networks.
Engrg. Appl. Artif. Intell. 8(3): 333-344, 1995.
37. Pierre, S. Application of artificial intelligence techniques to computer network topologies. Eng.
Appl. Artif. Intell. 6: 465-472, 1993.
38. Rose, C. Low mean internodal distance network topologies and simulated annealing. IEEE
Trans. Commun. 40: 1319-1326, 1992.
39. Saksena, V. R. Topological analysis of packet networks. IEEE J. Selected Areas Commun. 7:
1243-1252, 1989.
40. Samoylenko, S. I. Application of heuristic problem-solving methods in computer communication networks. Mach. Intell. 3(4): 197-210, 1985.
41. Schumacher, U. An algorithm for construction of a K-connected graph with minimum number
of edges and quasiminimal diameter. Networks 14: 63-74, 1984.
42. Siboni, D., Ag Rhissa, A., and Jiang, S. Une fonction intelligente de gestion globale des incidents
pour un hyperviseur de reseaux heterogenes. In Actes des quatorziemes journees internationales
d'Avignon, kV 94, Paris, 30 Mai-3 Juin 1994, pp. 379-387.
43. Suk-Gwon, C. Fair integration of routing and flow control in communications networks. IEEE
Trans. Commun. 40(4): 821-834, 1992.
44. Van Laarhoven, P. J. M., and Aarts, E. H. L. Simulated Annealing: Theory and Applications.
Reidel, D., Dordrecht, Holland, 1987.
45. Wong, R. T. Probabilistic analysis of a network design problem heuristic. Networks 15: 347-362, 1985.
46. Yokohira, T., Sugano, M., Nishida, T., and Miyahara, H. Fault-tolerant packet-switched network design and its sensitivity. IEEE Trans. Reliability 40(4): 452-460, 1991.
I. INTRODUCTION 328
A. Multimedia Presentation 328
B. Multimedia Database Management System 329
C. Multimedia Synchronization 330
D. Multimedia Networking 331
E. Reusability 331
F. Other Considerations 333
II. DATABASE APPLICATIONS IN TRAINING: THE IMMPS PROJECT 333
III. DATABASE APPLICATIONS IN EDUCATION: A WEB DOCUMENT
DATABASE 341
IV. FUTURE DIRECTIONS 350
APPENDIX A: THE DESIGN AND IMPLEMENTATION OF IMMPS 350
APPENDIX B: THE DESIGN AND IMPLEMENTATION OF MMU 353
REFERENCES 364
Multimedia computing and networking change the style of interaction between computer and human. With the growth of the Internet, multimedia applications such as
educational software, electronic commerce applications, and video games have had
a great impact on the way humans think of, use, and rely on computers and networks. One of
the most important technologies supporting these applications is the distributed multimedia database management system (MDBMS). This chapter summarizes research issues
and state-of-the-art technologies of MDBMSs from the perspective of multimedia presentations. Multimedia presentations are widely used in different forms, from instruction
delivery to advertisement and electronic commerce, and in different software architectures, from a stand-alone computer to local area networked computers and World Wide
Web servers. These varieties of architectures result in different organizations of
MDBMSs. The chapter discusses MDBMS architectures and examples that were developed at our university.
I. INTRODUCTION
Multimedia computing and networking change the way people interact with
computers. In line with new multimedia hardware technologies, as well
as well-engineered multimedia software, multimedia computers, with the assistance of the Internet, have changed our society into a distanceless and colorful
global community. Yet, even as this vision is gradually being realized, many
technical problems remain to be solved. This chapter summarizes state-of-the-art research topics in multimedia computing and networking, with emphasis
on database technologies for multimedia presentations. The chapter addresses
many problems from the perspective of multimedia applications. Theoretical
details are omitted from the discussion not because they lack importance but to avoid tediousness. A list of carefully selected
references serves as suggested readings for those who are participating in this
research area.
In this section, the discussion starts with the preliminary concepts of
multimedia presentations widely used in education, training, and demonstrations. The supporting multimedia database management systems (MDBMSs)
and related techniques in terms of reusability, distribution, and real-time considerations are then presented. In Section II and Section III, two MDBMSs are
discussed. The first is an intelligent multimedia presentation design system for
training and product demonstration. The second is a World Wide Web documentation database used in a distance learning environment. These systems are
also illustrated in the Appendices. Finally, some suggestions for future extensions of MDBMSs are presented in Section IV.
A. Multimedia Presentation
One of the key reasons that multimedia computers have become attractive
and successful is the availability of a wide variety of multimedia presentation
software, including many CD-ROM titles carrying educational software,
entertainment, product demonstrations, training programs, and tutoring
packages. Most presentations are hypertext-like multimedia documents
[1,8,26]. When retrieved, these documents involve a navigation topology,
which consists of hyperlinks jumping from hot spots to other presentation
windows. The underlying implementation mechanism of this type of document traversal may rely on a message passing system and an event-based synchronization scheme. Discussions of these mechanisms are found in
[6,16,18,19,37,39,40].
Other new areas related to multimedia are intelligent tutoring [32] and intelligent interfaces [4,5,11,12,36]. The incorporation of expert system technology has caused multimedia presentations to diversify. An intelligent multimedia
presentation can learn from its audience through interactions. The feedback
is asserted into the knowledge base of the presentation. Therefore, after audiences of different backgrounds interact with a tutorial or a training
program, the multimedia workstation may act according to the individual and
give appropriate guidance.
The first approach is too limited in that it is relatively hard to share resources
among multimedia documents. It is also hard for a presentation designer to
keep track of the various versions of a resource. Using a relational database will partially solve this problem. However, since multimedia documentation is object-oriented, using a relational database will introduce some overhead. Therefore,
many researchers suggest using an existing object-oriented database system for
building multimedia databases [9,12,13,29,41]. However, there are still some
problems yet to be solved:
Quality of service. Multimedia resources require a guarantee of the presentation quality. It is possible to use a specific file structure and program to
control and guarantee the quality of service (QoS). A traditional OODBMS
does not support QoS multimedia objects.
Synchronization. Synchronization of multiple resource streams plays an
important role in a multimedia presentation, especially when the presentation
is running across a network. In our projects, we are not currently dealing with
this interstream synchronization problem. We use an event-based synchronization mechanism in the current systems. Interstream and other synchronization
issues are left for future development. However, in the future we plan to break
multimedia resources into small pieces in order to implement interstream synchronization, which is especially important in a distributed environment.
Networking. A distributed database requires a locking mechanism for
concurrent access controls. Using our own architecture design, it is easy for us
to control some implementation issues, such as a two-phase locking mechanism
of the database, database administration, and control of traffic on the network.
In order to ensure the synchronization of a multimedia document, a multimedia system needs to have full control of data access activity and its timing. It is sometimes difficult to have total control of the performance of an
object-oriented database, especially when the multimedia database is distributed. Therefore, if needed, a multimedia database must be built using only
primitive multimedia programming facilities.
Building a multimedia database requires three layers of architecture to be
considered:
the interface layer,
the object composition layer, and
the storage management layer.
The tasks that we must deal with in the interface level include object browsing,
query processing, and the interaction of object composition/decomposition.
Object browsing [8] allows the user to find multimedia resource entities to be
reused. Through queries, either text-based or visualized, the user specifies a
number of conditions to the properties of resource and retrieves a list of candidate objects. Suitable objects are then reused. Multimedia resources, unlike text
or numerical information, cannot be effectively located using a text-based query
language. Even a natural language query in text form can hardly retrieve
a picture or a video with specific content precisely. Content-based information
retrieval research [21,49] focuses on the mechanism that allows the user to effectively find reusable multimedia objects, including pictures, sound, video, and
the combined forms. After a successful retrieval, the database interface should
help the user to compose/decompose multimedia documents. The second layer
works in conjunction with the interface layer to manage objects. Typically, object composition requires a number of links, such as association links, similarity
links, and inheritance links to specify different relations among objects. These
links are specified either via the database graphical user interface, or via a number of application program interface (API) functions. The last layer, the storage
management layer, needs to consider two performance issues: clustering and
indexing. Clustering means to organize multimedia information physically on
a hard disk (or an optical storage) such that, when retrieved, the system is able
to access the large binary data efficiently. Usually, the performance of retrieval
needs to guarantee some sort of QoS [44]. Indexing means that a fast-locating
mechanism is essential for finding the physical address of a multimedia object
[28]. Sometimes, the scheme involves a complex data or file structure. Media
synchronization should be considered in both issues.
There are other issues in multimedia database research, including transaction processing, object locking mechanisms and concurrent access, persistency, versioning, security, and referential integrity. Most of these are also issues
of traditional database research. The research of multimedia databases has become an important issue in the community of multimedia computing.
C. Multimedia Synchronization
Multimedia information is processed in real time. Therefore, the temporal
aspect of multimedia information coherence, that is, synchronization, is one of the
most interesting research topics in multimedia computing [3,7,14,15,17,19,22,
23,25,31,42,45,47,48]. Synchronization mechanisms are, in general, divided
into intrastream- and interstream-based schemes. The former focuses on the
simultaneous demonstration of one or more sources of information in one
multimedia resource stream (e.g., a video file contains both motion pictures and
sound data). The latter addresses the simultaneous processing of multiple streams
holding different resources (e.g., an animation video is synchronized with a
MIDI music file and a sound record).
Among a number of synchronization computation models, the timed Petri net
[20,31] seems to be a powerful model for describing the behavior of real-time
systems. A Petri net is a bipartite directed graph with two types of nodes: places
and transitions. A place node holds tokens, which are passed to transitions. A
place can also represent a multimedia resource, which is demonstrated for a
period of time before its token is passed to a transition. A transition controls
the synchronization of the places adjacent to it. The timed Petri net
is found to be one of the most suitable models for multimedia synchronization control. Another theoretical model of synchronization is based on
temporal interval relations [2]. There are 13 types of relations between a pair
of temporal intervals. Temporal interval relations can be used to calculate the
schedule of a multimedia presentation [38]. Synchronization information can
also be embedded into multimedia resources, and synchronization can also be controlled by events and message passing. Multimedia synchronization becomes even more challenging when the mechanism is to be implemented in a distributed
environment.
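To make the model concrete, the following minimal sketch represents a timed Petri net in Java: a place holds a media resource that is demonstrated for a fixed duration, and a transition may fire only when all of its input places have released their tokens. The classes are illustrative only and are not taken from any of the systems cited above.

// Minimal sketch of a timed Petri net for presentation synchronization.
// A place shows a media resource for a fixed duration; a transition fires
// only after every input place has finished. All names are illustrative.
import java.util.*;

class Place {
    String resource; double duration; boolean token;
    Place(String r, double d) { resource = r; duration = d; token = true; }
}

class Transition {
    List<Place> inputs = new ArrayList<>(), outputs = new ArrayList<>();

    /** The transition may fire at the time the slowest input place finishes. */
    double earliestFiringTime(double start) {
        double t = start;
        for (Place p : inputs) t = Math.max(t, start + p.duration);
        return t;
    }

    void fire() {
        for (Place p : inputs) p.token = false;   // collect input tokens
        for (Place p : outputs) p.token = true;   // downstream media may now start
    }
}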
D. Multimedia Networking
In line with the success of the Internet, the need for multimedia communication has attracted the attention of many software developers and researchers
[10,24,34,35,46]. Two important issues are network transmission protocols
and network infrastructure. The former includes mechanisms enabling transmissions with guaranteed quality and security. The latter focuses on the structure
of the communication system so that reliability and traffic management are feasible.
On top of multimedia network systems (either intra- or internetworking),
many applications are available. Multimedia client-server applications
include the World Wide Web (WWW), multimedia documentation on demand (e.g., video-on-demand and news-on-demand), electronic catalog ordering, computer-supported cooperative work (CSCW), video conferencing and
distance learning, and multimedia electronic mail. The combination of multimedia and networking technologies changes the way people think of, use, and
rely on computers.
E. Reusability
Multimedia presentations are software, and thus need a specification to describe
their functions. Nevertheless, a well-defined specification alone does not realize
improved functionality of a system. From the perspective of software development, reusability is one of the most important factors in improving the efficacy of multimedia presentation design and multimedia database access. Many
As a super-class,
When a new object is created via the "new" operator,
As a prototype for specifying components of another class,
As the specification of a predicate, and
As a specification of problem conditions in problem statements.
FIGURE 1 (the IMMPS architecture: the presentation navigation and topics designer and the addressee knowledge acquisition module feed MM resources, presentation topics, navigation information, and knowledge into the resource/presentation database of the MDBMS; the runtime subsystem consists of a presentation carrier, an inference engine, an addressee learning system, and MM device drivers that produce the MM presentation).
inheritance structure such as a tree or a DAG (directed acyclic graph) is suitable for our knowledge representation architecture. Figure 3 illustrates these
two structures. In the figure, a presentation window (i.e., PWin) is a composed
object that represents a topic that a presenter wants to discuss. A presentation
window may contain push buttons, one or more multimedia resources to be
FIGURE 2 (the control view and the knowledge view exchange messages; knowledge is asserted/retracted for presentation navigation knowledge inference, and Windows/MCI function calls drive the window input/output controls and the multimedia resource database).
FIGURE 3
presented, and a number of knowledge rules (e.g., k1, k2, k3). A message (e.g.,
m1, m2) with optional parameters can be passed between two presentation
windows (or passed back to the same presentation window). The graph edges
representing navigation links are shown as thin lines, with message names as
labels. The DAG edges representing knowledge inheritance are shown as thick
lines without labels. In the figure, to the right of each presentation window, we
show the knowledge rules that can be used in the presentation window. Even
though knowledge rules "k1" and "k2" are shared among PWin 1, PWin 2,
PWin 3, PWin 5, and PWin 6, they are stored only once in PWin 1. Note that
multiple inheritance is also allowed, as PWin 5 inherits knowledge rules from
both PWin 3 and PWin 4.
There are a number of restrictions applied to our message passing system
and knowledge inheritance system. For instance, a message passed between
two presentation windows has a unique name. Only the destination presentation window can receive the specific message sent to it. Each message has
only one source and one destination. A child presentation window inherits all
knowledge rules and facts from its parent presentation windows. The relation
of knowledge inheritance is transitive. However, the inheritance architecture is
acyclic. That is, a presentation window cannot be a parent presentation window of itself. If a child presentation window contains a knowledge rule that
has the same rule name as one of the rules the child presentation window inherits (directly or indirectly), the rule defined in the child presentation window
overrides the one from its parent presentation windows. If two rules belonging
to two presentation windows have the same name, the rule should be stored in
a common parent presentation window only once to avoid inconsistency.
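The inheritance and override behavior described above can be summarized by the following sketch. It is a simplified Java illustration; the class and method names are ours and do not correspond to the actual IMMPS code.

// Sketch of knowledge inheritance among presentation windows: a window
// keeps only its local rules, inherited rules are found by walking the
// acyclic parent links, and a local rule with the same name overrides an
// inherited one. Names are illustrative.
import java.util.*;

class PWin {
    String name;
    Map<String, String> localRules = new HashMap<>();  // rule name -> rule body
    List<PWin> parents = new ArrayList<>();            // DAG edges (thick lines)

    PWin(String name) { this.name = name; }

    /** A local definition overrides anything inherited from the parents. */
    String lookupRule(String ruleName) {
        if (localRules.containsKey(ruleName)) return localRules.get(ruleName);
        for (PWin parent : parents) {
            String inherited = parent.lookupRule(ruleName);  // transitive lookup
            if (inherited != null) return inherited;
        }
        return null;                                         // rule not known here
    }
}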
IMMPS can be used in general purpose presentations or demonstrations
in different fields such as education, training, product demonstration, and
others. This system serves as a sample application showing our multimedia
database research results [40,41]. The multimedia database of IMMPS has two
major layers: the frame object layer and the resource object layer. Figure 4
Frame
I reusable
Objects I ^ / ^
T
I
/
Layer I /
f^K'^N^
Keyword
v/ll>i
Storage
Management
Disk Storage for Resource Data
aggregation
association
resource
FIGURE 4
object classes and frame object classes. Groups in a presentation are co-related
since aggregation links and usage links are used among groups for resource
access and navigation. However, in a class database, object classes are individual
elements.
We are led to a two-layered approach by the nature of reusable objects. The
reuse of a presentation script (i.e., in a frame group) and the reuse of multimedia resources (i.e., a resource group) are the two levels of reuse. Other reuse
processes can be modeled by a combination of the two levels. For instance, the
reuse of a whole presentation can be achieved by grouping the presentation as a
frame group with several resource groups. The reuse of a single resource can be
achieved by treating the single resource as a resource group. Therefore, the two-layered approach is a natural design. A multimedia presentation is a collection
of frame groups and resource groups. Strictly speaking, a presentation is not a
part of the database, at least from the database modeling perspective. Similarly,
the underlying storage management layer is not a part of the database model.
However, the two-layered model is to emphasize the two levels of object reuse.
There are also other models that can be used for presentations (e.g., the Petri
net model). Petri nets, especially timed Petri nets, are suitable for interstream
synchronization, which is not the mechanism used in our presentation system.
Our system, on the other hand, relies on event-based synchronization with
assistance from a message passing mechanism. Therefore, the Petri net model
was not adopted.
We present an overview of the database storage management, which includes memory management and disk storage management. Memory management is used to run a presentation. A presentation contains a number of object
groups (frame groups and resource groups). Each object group is a set, represented by a dynamic array that contains addresses of frames or resource
descriptors. Object classes are also represented using dynamic arrays. A DAG
is used to store frames and the inheritance links among them. At the resource
object level, a digraph with bi-directional pointers (association links) is used
to store resource descriptor addresses. Each resource descriptor contains a
resource file name. The binary resource data are loaded at run time. This strategy prevents dynamic memory from being overused by the huge amount of
binary data. Pointers (aggregation links) to resource descriptors are stored in a
frame so that the frame can access its resources. The disk storage is rather complicated. Figure 5 pictures a simplified overview. The database server (MDBMS)
allows multiple databases. Two types of databases are allowed: the presentation database (i.e., P.Database) and the object class database (i.e., C.Database).
Each database has two files: a data file and an index file. A database may store
a number of presentations or object classes. Each presentation contains several
pointers to frame groups and resource groups. Each object group (frame or
resource) contains a number of chained records. Similarly, an object class has a
number of chained records. These records of resource groups (or resource object classes) are pointers to the BLOBs and attributes of multimedia resources,
which can be shared. The index file has two parts, for presentation indices
(P.Indices) and object class indices (C.Indices). Each part is a hash table that contains data file offsets and other information, such as locking
statuses.
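A minimal sketch of this memory organization is given below. It is an illustration in Java with names of our own choosing; the actual IMMPS implementation (written in VB, VC, and Prolog, as noted in Appendix A) differs in detail.

// Sketch of the in-memory organization just described: object groups are
// dynamic arrays, a frame keeps aggregation links to resource descriptors,
// and a descriptor stores only a file name so the BLOB is loaded at run time.
import java.util.*;

class ResourceDescriptor {
    String fileName;                                          // BLOB stays on disk
    List<ResourceDescriptor> associations = new ArrayList<>(); // bi-directional links
    byte[] load() throws java.io.IOException {                // fetched only when presented
        return java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(fileName));
    }
}

class Frame {
    List<ResourceDescriptor> aggregation = new ArrayList<>(); // resources it presents
    List<Frame> parents = new ArrayList<>();                  // inheritance DAG links
}

class Presentation {
    List<List<Frame>> frameGroups = new ArrayList<>();               // dynamic arrays
    List<List<ResourceDescriptor>> resourceGroups = new ArrayList<>();
}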
FIGURE 5 (a simplified overview of the MDBMS disk storage organization).
FIGURE 6 (the MMU system architecture: instruction design with an instruction annotation editor, instruction playback with an instruction annotation playback facility, an assessment facility, a JDBC/ODBC facility, a virtual library interface, and an on-line help facility, all communicating over the Internet).
administrators. The instructors will use our system running on a Web browser in conjunction with some commercially available software.
Adaptive to open architecture. A minimal compatibility is defined as
the requirement for the open architecture. Compatibility requirements include
presentation standard, network standard, and database standard.
Figure 6 illustrates the system architecture of our MMU system. On the instruction design side, we encourage instructors to use the Microsoft FrontPage
Web Document editor, or an equivalent on a Sun workstation, to design virtual courses. Virtual courses may also be provided via some Java application
programs, which are embedded into HTML documents. Since HTML and Java
are portable languages, multiplatform courses are thus feasible. An instruction
annotation editor, written as a Java-based daemon, is also running under the
Java virtual machine (which is supported by Web browsers). This annotation
daemon allows an individual instructor to draw lines, text, and simple graphic
objects on top of a Web page. Different instructors can use the same virtual
course but with different annotations. These annotations, as well as virtual
courses, are stored as software configuration items (SCIs) in the virtual course
database management system. An SCI can be a page showing a piece of lecture.
FIGURE 7 (the three layers of the Web document database: the database layer, the document layer, and the BLOB layer, with document resource sharing).
Multimedia sources: multimedia files in standard formats (i.e., video,
audio, still image, animation, and MIDI files). Objects in this layer are
shared by instances and classes.
Implementation Table
Starting URL: a unique starting URL of the Web document
implementation.
HTML files: implementation objects such as HTML or XML files.
Program files: implementation objects such as Java applets or ASP
programs.
Multimedia resources: implementation objects such as audio files.
Script name: foreign key to the script table.
Test record names: foreign key to the test record table.
Bug report names: foreign key to the bug report table.
Annotation names: foreign key to the annotation table.
FIGURE 8
container can have both read and write access by another user. Of course,
the accesses are prohibited in the current container object. Locking tables are
implemented in the instructor workstation. With the table, the system can
control which instructor is changing a Web document. Therefore, collaborative
work is feasible.
Web documents are reusable. Among many object reuse paradigms, classification and prototyping are the most common ones. Object classification
allows object properties or methods at a higher position of the hierarchy to
be inherited by another object at a lower position. The properties and methods are reused. Object prototyping allows reusable objects to be declared as
templates (or classes), which can be instantiated to new instances. A Web document in our system contains SCIs for script, implementation, and testing. As
a collection of these three phases of objects, a Web document is a prototype-based reusable object. Object reuse is essential to the design of Web
documents. However, the demonstration of Web documents requires different considerations due to the size and the continuous property of BLOBs.
Web documents may contain BLOB objects, which are infeasible to demonstrate in real time when the BLOB objects are located at a remote station, given the current Internet bandwidth. However, if some of the BLOB
objects are preloaded before their presentation, even though the process involves the use of some extra disk space, the Web document can be demonstrated
in real-time. However, BLOB objects in the same station should be shared as
much as possible among different documents. We aim to provide a system to
make distributed Web documents to be reused in a reasonable efficient manner.
The design goal is to provide a transparent access mechanism for the
database users. From different perspectives, all database users look at the same
database, which is stored across many networked stations. Some Web documents can be stored with duplicated copies in different machines for the ease
of real-time information retrieval. A Web document may exist in the database
at different physical locations in one of the following three forms:
Web document class
Web document instance
Web document reference to instance
A document class is a reusable object, which is declared from a document
instance. A document instance may contain the physical multimedia data, if
the instance is newly created. After the declaration of the document instance,
the instance creates a new document class. The newly created class contains the
structure of the document instance and all multimedia data, such as BLOBs.
The original document instance maintains its structure. However, pointers to
the multimedia data in the class are used instead of storing the original BLOBs. When
a new document instance is instantiated from a document class, the structure
of the document class is copied to the new document instance and pointers
to multimedia data are created. This design allows the BLOBs to be stored
in a class. The BLOBs are shared by different instances instantiated from the
class.
A document instance is a physical element of a Web document. When a
database user looks at the Web document from different network locations,
the user can access the Web document in two ways. The first is to access the
document directly. The second mechanism looks at the document via document
reference. A document reference to instance is a mirror of the instance. When
a document instance is created, it exists as a physical data element of a Web
document in the creation station. References to the instance are broadcast and
stored in many remote stations.
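The sketch below illustrates the relationship among the three forms: declaring a class from an instance moves the BLOBs into the class, and instantiating a class copies only the structure while pointing back at the shared BLOBs. It is a simplified Java illustration; the names are ours and not those of the MMU implementation.

// Sketch of Web document classes, instances, and references. Names are
// illustrative; the point is that BLOBs live in the class and are shared.
import java.util.*;

class Blob { byte[] data; }

class WebDocumentClass {
    List<String> structure = new ArrayList<>();   // page/script structure
    List<Blob> sharedBlobs = new ArrayList<>();   // physical multimedia data
}

class WebDocumentInstance {
    List<String> structure;
    List<Blob> blobPointers;                      // shared with the class

    /** Declare a class from this instance: the BLOBs migrate to the class. */
    WebDocumentClass declareClass(List<Blob> ownedBlobs) {
        WebDocumentClass c = new WebDocumentClass();
        c.structure = new ArrayList<>(structure);
        c.sharedBlobs = ownedBlobs;
        this.blobPointers = c.sharedBlobs;        // instance now only points
        return c;
    }

    /** Instantiate a new instance that reuses the class BLOBs. */
    static WebDocumentInstance instantiate(WebDocumentClass c) {
        WebDocumentInstance d = new WebDocumentInstance();
        d.structure = new ArrayList<>(c.structure);
        d.blobPointers = c.sharedBlobs;
        return d;
    }
}

class WebDocumentReference { String hostStation; WebDocumentInstance target; }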
When a document instance is retrieved from a remote station more than a
certain number of times (i.e., more often than a watermark frequency), the physical
multimedia data are copied to the remote station. The duplication process may
include the duplication of document classes, which contain the physical BLOBs.
The duplication process proceeds according to a hierarchical distribution
strategy. Assume that N networked stations join the database system in a
linear order. We can arrange the N stations in a full m-ary tree according to a
breadth-first order. A full m-ary tree is a tree with each node containing exactly
m children, except the trailing nodes. The nth station, where 1 <= n <= N, in
the linear joining sequence has its ith child, where 1 <= i <= m, at the following
position in the linear order:

m(n - 1) + i + 1.

Conversely, the kth station, for k >= 2, is the ith child of its parent, where
i = (k - 1) mod m if m does not divide (k - 1), and i = m otherwise.
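A small sketch of these position computations is given below, in Java; the class and method names are illustrative. The second method inverts the first, recovering which child a given station is of its parent, and the third recovers the parent's position.

// Sketch of the hierarchical distribution strategy: stations join in a linear
// order and are arranged as a full m-ary tree in breadth-first order.
public class DistributionTree {
    /** Position (1-based) of the i-th child of the n-th station, 1 <= i <= m. */
    static int childPosition(int m, int n, int i) {
        return m * (n - 1) + i + 1;
    }

    /** Which child (1..m) the k-th station (k >= 2) is of its parent. */
    static int childIndex(int m, int k) {
        int r = (k - 1) % m;
        return (r == 0) ? m : r;
    }

    /** Parent position of the k-th station (k >= 2) in the linear order. */
    static int parentPosition(int m, int k) {
        return (k - 2) / m + 1;
    }
}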
names, and course numbers/titles. This virtual library is Web-savvy. That is, the
searching and retrieval processes run under a standard Web browser.
The library is updated as needed. The mechanism follows another of
our project goals: adaptivity to changing user needs. We are developing three Web
courses based on the virtual library system: introduction to computer engineering, introduction to multimedia computing, and introduction to engineering
drawing.
Distance learning, virtual university, and remote classroom projects change
the manner of education. With the tremendously growing number of Internet
users, the virtual university is a step toward the university of the future. However, most distance learning systems rely on a high-bandwidth network infrastructure. Since such a high requirement is not met everywhere on the
Internet, it is worthwhile to investigate mechanisms that cope with the real situation. Even in the near future, with the next
generation of the Internet, the increasing number of users will consume even more
network bandwidth. The primary directive of the Multimedia Micro-University
Consortium is to look for solutions that realize the virtual university. Some of our
research results, as pointed out in this chapter, adapt to changing network conditions. Using an off-line multicasting mechanism, we implemented a distributed
virtual course database with a number of on-line communication facilities to
fit the limitations of the current Internet environment. The proposed database
architecture and database system play an important role in our virtual
university environment. We are currently using the Web document development environment to design undergraduate courses, including introduction to
computer science and others.
FIGURE A1
We present the graphical user interface of IMMPS. The first window is the
main window of IMMPS shown in Fig. A1. This window allows the user to
drop various presentation topics in the design area shown in the subwindow.
Presentation topics may include text, audio, etc. shown as buttons on the left
side of the window. The main window is used in conjunction with the presentation knowledge inheritance window and the presentation messages passing
window shown in Fig. A2. The knowledge inheritance hierarchy is constructed
by using different functions shown in the menu bar of the window. Inheritance
links among presentation window objects are thus created as the inheritance
relations are declared.
To add a usage (or message) link, the user accesses the presentation messages passing window. A vertical bar represents a presentation window indicated by its name on the top of that bar. Messages are passed when a button in
a presentation window is pushed (or, as a side effect of a knowledge inference).
Push buttons are shown as boxes on vertical bars. Each message, with its name
displayed as the label of a usage link (shown as a horizontal arrow), is associated with one or more parameters separated by commas. These parameters are
entered in the message window shown in the lower-left corner of Fig. A2.
To add rules or facts to the knowledge set of a presentation window, the
presentation intelligence window shown in Fig. A3 is used. When a message
is received by a presentation window, the corresponding actions that the presentation window needs to perform will be entered in the presentation action
control window in Fig. A3. The presentation button control window is used to
specify actions associated with each button when it is clicked.
FIGURE A2 The presentation knowledge inheritance window and the presentation message passing window.
FIGURE A3 The presentation intelligence window, the presentation action control window, and the presentation button control window.
We spent about two years developing the system. In the first half year,
we surveyed other presentation systems and research before documenting
a draft specification. The functional specification and design of IMMPS took
about nine months. Two Ph.D. students and three Masters students spent
almost a year on the implementation, before the prototype was tested by a few
undergraduate students for one month. About 9000 lines of VB, 4000 lines of
VC, and another 4000 lines of Prolog code were used.
FIGURE A4 (resource selection criteria: key words, temporal endurance, startup delay, hardware limitation, resolution, and medium).
The development process may proceed through several cycles in the spiral
model that we have proposed. As the Web document evolves, the user may start
from a Web document only containing the script portion. The implementation
and test records are omitted at the beginning. In the next iteration, the user uses
the FrontPage editor to design his/her Web page and add intradirectory and
intrastation hyperlinks. The test record has a local testing scope in this cycle.
Assessment is conducted to decide the status of the document, including checking the percentage of completeness and inconsistency, to decide whether to
proceed with the next iteration. When the user starts to add interstation hyperlinks in the next cycle, the testing scope should be changed. Assessment is then
performed again to decide whether to post the Web page. The evolution process
of a Web document can be assessed from another perspective. A document may
start from an implementation without add-on control programs. Gradually,
the user adds Java applets. Regression tests should be performed at each cycle.
FIGURE A5 The presentation reuse control window and the multimedia resource association control window.
Another way to look at the evolution is that the user can start from a Web
document without image and audio records. The user can add images in the
next cycle, and audio records in the following cycle, and so on (for video clips).
The spiral model can be used in different types of Web document evolutions.
After a Web document is developed, the instructor can use the document
annotation editor (shown in Fig. A9) to explain his/her lecture. The annotations,
as well as audio descriptions, can be recorded with time stamps encoded for
real-time playback. Note that a Web document (serving as a lecture) can be
used by different instructors with different annotations.
Even though the annotation system allows instructors to explain course
materials, in a virtual university system, it is necessary to have on-line discussion
tools for students to ask questions and conduct group discussions. Figures A10
and A11 illustrate a chat room tool, which allows multiple users to send messages to each other. The chat room tool is written in Java running on a standard
Web browser. The chat room tool, as well as a white board system (see Fig. A12),
has four floor control modes:
Free access: All participants can listen and talk.
Equal control: Only one person can talk but all participants can listen.
An individual user sends a request to the speaker for the floor. The speaker
FIGURE A6 (IMMPS components: the presentation navigation and topic designer, the object reuse control system, addressee knowledge acquisition, and the addressee knowledge learning system, running on Microsoft Windows 95).
FIGURE A7 (a script SCI describing the DMS'98 call-for-papers Web document, with author, date, description, expected date/time of completion, percentage of completion, and associated multimedia resources).
FIGURE A8 (a test record SCI with test procedure, testing scope, starting URL, bug reports and bug description, and missing and redundant objects).
FIGURE A9 (the document annotation editor showing an annotated Web page).
FIGURE A10 (the chat room tool, showing the member list and exchanged messages).
FIGURE A11
FIGURE A12 (the white board system).
grants the floor to the individual (on a first-come, first-served basis). The first
person logged into the chat room has the floor first. No chairperson
is assigned. That is, everyone has the same priority.
Group discussion: A participant can select a group of persons to whom
he/she wants to talk and a group of persons to whom he/she agrees to
listen. When an individual is asked to listen to another person, the individual
can decide if he/she wants to listen. If so, the name of this individual is added
to the listen group of the talking person.
Direct contact: Two persons talk and listen to each other (i.e., private
discussion). This is for a private conversation in the direct contact area illustrated in the GUI below. A person can have a private discussion with his/her
partner while still joining the chat room.
The control mode of the discussion is decided by the course instructor.
Drawing works the same as chatting in the three modes. Direct contact can be
turned on/off by the instructor.
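The four floor-control modes can be summarized as a simple permission check, as in the following illustrative Java sketch; it is not part of the chat room or white board implementation.

// Sketch of the four floor-control modes as an enum plus a permission check.
import java.util.*;

class FloorControl {
    enum Mode { FREE_ACCESS, EQUAL_CONTROL, GROUP_DISCUSSION, DIRECT_CONTACT }

    Mode mode = Mode.FREE_ACCESS;           // chosen by the course instructor
    String floorHolder;                     // EQUAL_CONTROL: current speaker
    Map<String, Set<String>> listenGroup = new HashMap<>(); // GROUP_DISCUSSION
    Map<String, String> partner = new HashMap<>();          // DIRECT_CONTACT

    boolean mayTalk(String user, String toUser) {
        switch (mode) {
            case FREE_ACCESS:      return true;
            case EQUAL_CONTROL:    return user.equals(floorHolder);
            case GROUP_DISCUSSION: return listenGroup
                                        .getOrDefault(user, Collections.emptySet())
                                        .contains(toUser);
            case DIRECT_CONTACT:   return toUser.equals(partner.get(user));
            default:               return false;
        }
    }
}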
We use a relational database management system to implement our object-oriented database. The following database schema is implemented on the MS
SQL server. However, a similar schema can be used in the Sybase or the Oracle
database server. The schema can be used by ODBC or JDBC applications, which
serve as open connections to various types of other database servers.
CREATE TABLE Annotation_table
(ANAME            CHAR      NOT NULL,
 SNAME            CHAR,
 STARTING_URL     CHAR,
 AUTHOR           CHAR,
 VERSION          INT,
 DATE_TIME        DATETIME,
 ANNOTATION_FILE  CHAR,
 PRIMARY KEY(ANAME),
 FOREIGN KEY(SNAME) REFERENCES Script_table(SNAME),
 FOREIGN KEY(STARTING_URL) REFERENCES Implementation_table(STARTING_URL),
 UNIQUE(ANAME, SNAME, STARTING_URL, AUTHOR, ANNOTATION_FILE, VERSION)
)

CREATE TABLE Resource_table
(R_ID         INT   NOT NULL,
 NAME         CHAR,
 AUTHOR       CHAR,
 DATE_TIME    DATETIME,
 SIZE         INT,
 DESCRIPTION  CHAR,
 LOCATION     CHAR  NOT NULL,
 TYPE         CHAR,
 PRIMARY KEY(R_ID),
 UNIQUE(R_ID, AUTHOR, NAME, LOCATION, TYPE)
)

CREATE TABLE Html_table
(H_ID         INT   NOT NULL,
 NAME         CHAR,
 AUTHOR       CHAR,
 DATE_TIME    DATETIME,
 DESCRIPTION  CHAR,
 LOCATION     CHAR  NOT NULL,
 PRIMARY KEY(H_ID),
 UNIQUE(H_ID, AUTHOR, NAME, LOCATION)
)

CREATE TABLE Program_table
(P_ID         INT   NOT NULL,
 NAME         CHAR,
 AUTHOR       CHAR,
 SIZE         INT,
 DATE_TIME    DATETIME,
 DESCRIPTION  CHAR,
 TYPE         CHAR,
 LOCATION     CHAR  NOT NULL,
 PRIMARY KEY(P_ID),
 UNIQUE(P_ID, AUTHOR, LOCATION, TYPE)
)
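As an example of the open connections mentioned above, the sketch below shows how a JDBC client might query the Resource_table defined in this schema. The connection URL, credentials, and the TYPE value are placeholders; only the table and column names come from the schema.

// Illustrative JDBC query against Resource_table; URL and credentials are
// placeholders, and an appropriate JDBC driver is assumed on the classpath.
import java.sql.*;

public class ResourceQuery {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:sqlserver://dbhost;databaseName=mmu";   // placeholder
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(
                 "SELECT R_ID, NAME, LOCATION, TYPE FROM Resource_table WHERE TYPE = ?")) {
            ps.setString(1, "video");                              // placeholder value
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt("R_ID") + " " +
                                       rs.getString("NAME") + " " +
                                       rs.getString("LOCATION"));
                }
            }
        }
    }
}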
(BNAME  CHAR  NOT NULL,
 R_ID   INT   NOT NULL,
19. Schnepf, J. etal. Doing FLIPS: Flexible interactive presentation synchronization. In Proceedings
of the International Conference on Multimedia Computing and Systems, pp. 1\?>-111, 1995.
20. Coolahan, J. E., Jr., and Roussopoulos, N. Timing requirements for time-driven systems using augmented
Petri nets. IEEE Trans. Software Engrg. 9(5):603-616, 1983.
21. Kunii, T., Shinagawa, Y., Paul, R., Khan, M., and Khokhar, A. A. Issues in storage and retrieval
of multimedia data. Multimedia Systems 3:298-304,1995.
22. Leydekkers, P., and Teunissen, B. Synchronization of multimedia data streams in open distributed environments. In Network and Operating System Support for Digital Audio and Video,
Second International Workshop Heidelberg, Germany, (R. G. Herrtwich, Ed.), pp. 94-104,
1991.
23. Little, T. D., and Ghafoor, A. Multimedia synchronization protocols for broadband integrated
services. IEEE J. Selected Areas Commun. 9(9):1368-1382, 1991.
24. Little, T. D. C., and Ghafoor, A. G. Spatio-temporal composition of distributed multimedia
objects for value-added networks. IEEE Computer, 42-50, 1991.
25. Little, T. D. C., and Ghafoor, A. Synchronization and storage models for multimedia objects.
IEEE J. Selected Areas Commun. 8(3):413-427, 1990.
26. Lundeberg, A., Yamamoto, T., and Usuki, T. SAL, A hypermedia prototype system. In Eurographic Seminars, Tutorials and Perspectives in Computer Graphics, Multimedia Systems,
Interaction and Applications (L. Kjelldahl, Ed.), Chapter 10, 1991.
27. Oomoto, E., and Tanaka, K. OVID: Design and implementation of a video-object database
system. IEEE Trans. Knowledge Data Engrg. 5(4):629-643, 1993.
28. Ouyang, Y. C., and Lin, H.-P. A multimedia information indexing and retrieval method. In
Proceedings of the Second ISATED/ISMM International Conference on Distributed Multimedia
Systems and Applications, Stanford, CA, August 7-9, pp. SS-57, 1995.
29. Ozsu, M. T., Szafron, D., El-Medani, G., and Vittal, C. An object-oriented multimedia database
system for a news-on-demand application. Multimedia Systems, 3:182-203, 1995.
30. Paul, R., Khan, M. E, Khokhar, A., and Ghafoor, A. Issues in database management of multimedia information. In Proceedings of the 18th IEEE Annual International Computer Software
and Application Conference (COMPSAC94), Taipei, Taiwan, pp. 209-214,1994.
31. Prabhakaran, B., and Raghavan, S. V. Synchronization Models for Multimedia Presentation
with User Participation, Vol. 2 of Multimedia Systems. Springer-Verlag, Berlin, 1994.
32. Dannenberg, R. B. et al. A computer based multimedia tutor for beginning piano students.
Interface 19(2-3):155-173, 1990.
33. Rhiner, M., and Stucki, P. Database requirements for multimedia applications. In Multimedia
System, Interaction and Applications (L. Kjelldahl Ed.), pp. 269-281, 1991.
34. Rosenberg, J., Cruz, G., and Judd, T. Presenting multimedia documents over a digital network.
In Network and Operating System Support for Digital Audio and Video, Second International
Workshop Heidelberg, Germany, (R. G. Herrtwich Ed.), pp. 346-356, 1991.
35. Schurmann, G., and Holzmann-Kaiser, U. Distributed multimedia information handling and
processing. IEEE Network, November: 23-31, 1990.
36. Shih, T. K. An Artificial intelligent approach to multimedia authoring. In Proceedings of
the Second lASTED/ISMM International Conference on Distributed Multimedia Systems and
Applications, pp. 71-74, 1995.
37. Shih, T. K. On making a better interactive multimedia presentation. In Proceedings of the
International Conference on Multimedia Modeling, 1995.
38. Shih, T. K., and Chang, A. Y. Toward a generic spatial/temporal computation model for multimedia presentations. In Proceedings of the IEEE ICMCS Conference, 1997.
39. Shih, T. K., and Davis, R. E. IMMPS: A multimedia presentation design system. IEEE Multimedia Fall, 1997.
40. Shih, T. K., Kuo, C.-H., and An, K.-S. Multimedia presentation designs with database support.
In Proceedings of the NCS'95 Conference, 1995.
41. Shih, T. K., Kuo, C.-H., and An, K.-S. An object-oriented database for intelligent multimedia
presentations. In Proceedings of the IEEE International Conference on System, Man, and
Cybernetics Information, Intelligence and Systems Conference, 1996.
42. Shivakumar, N., and Sreenan, C. J. The concord algorithm for synchronization of networked
multimedia streams. In Proceedings of the International Conference on Multimedia Computing
and Systems, Washington, DC, May 15-18, pp. 31-40, 1995.
366
TIMOTHY K. SHIH
43. Smoliar, S. W., and Zhang, H. J. Content-based video indexing and retrieval. IEEE MultiMedia,
62-72, 1994.
44. Staehli, R., Walpole, J., and Maier, D. A quality-of-service specification for multimedia presentations. Multimedia Systems, 3: 251-263, 1995.
45. Steinmetz, R. Synchronization properties in multimedia systems. IEEE J. Selected Areas
Commun. 8(3):401-412, 1990.
46. Woodruff, G. M., and Kositpaiboon, R. Multimedia traffic management principles for guaranteed ATM network performance. IEEE J. Selected Areas Commun. 8(3):437-446, 1990.
47. Al-Salqan, Y. Y. et al. MediaWare: On Multimedia Synchronization, pp. 150-157, 1995.
48. Yavatkar, R. MCP: A protocol for coordination and temporal synchronization in multimedia
collaborative applications. In IEEE 12th Intl. Conference on Distributed Computing Systems,
June 9-12, pp. 606-613,1992.
49. Yoshitaka, A., Kishida, S., Hirakawa, M., and Ichikawa, T. Knowledge-assisted content-based
retrieval for multimedia database. IEEE Multimedia Magazine 12-21, 1994.
50. Tracz, W. Software reuse myths. ACM SIGSOFT Software Engineering Notes 13(1):17-21,
1988.
51. Kaiser, G. E. et al. Melding software systems for reusable building blocks. IEEE Software
17-24, 1987.
52. Lenz, M. et al. Software reuse through building blocks. IEEE Software 34-42, 1987.
53. Gargaro, A. et al. Reusability issues and Ada. IEEE Software 43-51,1987.
54. Prieto-Diaz, R. et al. Classifying software for reusability. IEEE Software 6-16, 1987.
55. Prieto-Diaz, R. Implementing faceted classification for software reuse. Commun. ACM 34(5):
88-97, 1991.
56. Ghezala, H. H. B. et al. A reuse approach based on object orientation: Its contributions in the
development of CASE tools. In Proceedings of the SSR'95 Conference, Seattle, WA, pp. 53-62.
57. Bieman, J. M. et al. Reuse through inheritance: A quantitative study of C"'""'' software. In
Proceedings of the SSR'95 Conference, Seattle, WA, pp. 47-52.
58. Bieman, J. M. et al. Cohesion and reuse in an object-oriented system. In Proceedings of the
SSR'95 Conference, Seattle, WA, pp. 259-262.
59. Burton, B. A. et al. The reusable software fibrary. IEEE Software 25-33, 1987.
60. Fischer, G. Cognitive view of reuse and redesign. IEEE Software 60-72, 1987.
61. Tyugu, E. Three new-generation software environments. Commun. ACM 34(6):46-59, 1991.
62. Tyugu, E. et al. NUTAn object-oriented language. Comput. Artif Intell. 5(6):521-542,1986.
II
DATA STRUCTURE IN
RAPID PROTOTYPING
AND MANUFACTURING
CHUA CHEE KAI, JACOB GAN, AND DU ZHAOHUI
School of Mechanical and Production Engineering, Nanyang Technological University,
Singapore 639798
TONG MEI
Gintic Institute of Manufacturing Technology, Singapore 638075
I. INTRODUCTION 368
A. Data Structure Technique 368
B. Rapid Prototyping and Manufacturing Technology 369
II. INTERFACES BETWEEN CAD AND RP&M 376
A. CAD Modeling 376
B. Interfaces between CAD Systems 379
C. Interface between CAD and RP&M: STL Format 382
D. Proposed Standard: Layer Manufacturing Interface (LMI) 386
III. SLICING 395
A. Direct Slicing of a CAD File 396
B. Slicing an STL File 397
C. Slicing an LMI File 398
D. Adaptive Slicing 399
IV. LAYER DATA INTERFACES 400
A. Scanning and Hatching Pattern 401
B. Two-Dimensional Contour Format 403
C. Common Layer Interface (CLI) 405
D. Rapid Prototyping Interface (RPI) 406
E. Layer Exchange ASCII Format (LEAF) 407
F. SLC Format 409
V. SOLID INTERCHANGE FORMAT (SIF): THE
FUTURE INTERFACE 409
VI. VIRTUAL REALITY AND RP&M 410
A. Virtual Prototype and Rapid Prototype 410
B. Virtual Reality Modeling Language (VRML) 411
VII. VOLUMETRIC MODELING FOR RP&M 412
REFERENCES 414
I. INTRODUCTION
A. Data Structure Technique
The efficient management of geometric information, such as points, curves, or
polyhedrons, is of significant importance in many engineering applications,
such as computer-aided design (CAD), computer-aided manufacturing (CAM),
robotics, and rapid prototyping and manufacturing (RP&M). In addition to
representing the objects correctly and sufficiently, a good representation scheme
maps the original data objects into a set of objects that facilitate efficient storage
and computation.
In geometric computing encountered in engineering applications, it is often
necessary to store multiple representations of the same data in order to facilitate
efficient computation of a great variety of operators. Moreover, the same data
may be utilized by categories of users and across heterogeneous systems during
different phases of the product design and manufacturing process; thus more
than one representation may be necessary. Multiple representations incur a
significant overhead to ensure availability and consistency of the data.
A data structure is the form of organization imposed on the collection of
those data elements. It is defined by specifying what kind of elements it contains
and stating the rules of how to store the elements and how to retrieve them when
needed. Also, data structures are the materials from which computer programs
are built, just as physical materials are built from molecules of their component
substances. In engineering computing, they reduce to only a few simple entities,
mainly numbers and characters. Correspondingly, any kind of representation
should be realized with numbers and characters.
Data structures may be classified into linear and nonlinear types [1]. Linear
structures are those elements that have a sequential relationship. For example,
a list of houses along a street is a linear structure: a collection of like elements with a clearly defined ordering. Such structures are often represented in
diagrams by collections of boxes with lines to show their relationship. Linear
structures occupy a special place in the study of data structures because the
addressing of storage locations in a computer is nearly always linear, so the set
of memory storage locations in the machine itself constitutes a linear structure.
Linear structures may be further classified as addressable or sequential. Arrays
are important types of addressable structure: a specific element can be retrieved
knowing only its address, without reference to the others. In a sequential data
structure an element can only be reached by first accessing its predecessor in
sequential order. Because much engineering software is devoted to mathematically oriented tasks that involve solving simultaneous equations, arrays and
matrices play a large role in its design. Nonlinear data structures are of varied
sorts. One category important to the software designer is that of hierarchical
structures, in which each element is itself a data structure. One key feature
distinguishes a hierarchy: there is only a single unique path connecting any one
element to another. Given that computer memory devices are arranged as linear strings of storage locations, there is no "natural" way of placing nonlinear
structures in memory.
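As an illustration of these categories, a sketch in C (with hypothetical names, not drawn from the chapter) might contrast an addressable array, a sequential linked list, and a hierarchical tree node:

/* Illustrative sketch only: one addressable, one sequential, and one
 * hierarchical structure, as classified above. */

/* Addressable (linear): any element is reached directly by its index. */
double coords[3] = { 1.0, 2.0, 3.0 };

/* Sequential (linear): an element is reached through its predecessor. */
typedef struct ListNode {
    double value;
    struct ListNode *next;
} ListNode;

/* Hierarchical (nonlinear): each element may itself own further elements,
 * and there is a single unique path between any two elements. */
typedef struct TreeNode {
    double value;
    struct TreeNode *left, *right;
} TreeNode;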
That more than one data structure type exists means there is no single data structure type suitable for all applications. A data structure is often selectively designed for the efficiency of storing, retrieving, and computing operators. A good data structure is vital to the reliability and efficiency of a program or software.
RP&M is a set of manufacturing processes that can fabricate complex
freeform solid objects with various materials directly from a CAD file of an object without part-specific tooling. Despite the fast development and worldwide
installation of 3289 systems in 1998 [2], there is no suitable single information
standard. The RP&M information processing involves transforming the CAD
file into a special 3D facet representation, slicing the geometric form of the part
into layers, generating the contours of the part for each layer, and hatching the
contours of each layer.
B. Rapid Prototyping and Manufacturing Technology
For a better appreciation of the need for good data structure techniques to be used
in RP&M, a discussion on the various RP&M processes is in order. RP&M entails the fabrication of an object from its CAD by selectively solidifying or bonding one or more raw materials into a layer, representing a slice of the desired
part, and then fusing the successive layers into a 3D solid object. The starting
material may be liquid, powder, or solid sheets, while the solidification process may be polymerization, sintering, chemical reaction, plasma spraying, or
gluing.
RP&M technology has the potential of ensuring that quality-assured prototypes or parts are developed quickly for two major reasons. There are almost
no restrictions on geometrical shapes, and the layered manufacturing allows a
direct interface from CAD to CAM, which almost eliminates the need for process planning. These advantages of RP&M technology enhance and improve the product development process and, at the same time, reduce the costs and time required to take the product from conception to market. The technology has already shown its potential and value in
ever-increasing application fields, including the manufacturing/tooling industry,
automobile industry, architecture, and biomedical engineering.
I. RP&M Principle and Processes
Over the past few years, a variety of RP&M techniques has been developed.
RP&M processes are classified into variable energy processes and variable mass
processes [3]. Variable energy processes are those where a uniform mass is
selectively activated, removed, or bonded by a variable energy controlled by the
layer description. These processes include molecule bonding, particle bonding,
and sheet lamination. Variable mass processes are those where a constant source
of energy is used to fuse or solidify a variable mass controlled by the layer
description. These processes are the droplet deposition process, the particle
deposition process, and the melt deposition process.
Liquid Solidification Processes
The photopolymerization processes are those where the optical energy like
laser or UV light driven by layer information solidifies the scanned areas of thin
photopolymer resin layers.
FIGURE 1 SLA process (UV light source, liquid surface, elevator, curable liquid, formed object, support structure).
Thirdly, the UV light is turned on for a few seconds. Part of the resin layer is hardened according to the photomask. Then the unsolidified resin is collected and removed from the workpiece. After that, melted wax is spread into the cavities
created after collecting the uncured liquid resin. Consequently, the wax in the
cavities is cooled to produce a wholly solid layer. Finally, the layer is milled to
its exact thickness, producing a flat solid surface ready to receive the next layer.
Particle Bonding Processes
FIGURE 2 A particle bonding process: spread powder, print layer (drop), last layer printed, finished part.
process is repeated until the part is completed. Finally, the part is removed from
the build chamber, and the loose powder falls away. SLS parts may then require
some post-processing, such as sanding, depending upon the application.
Sheet Lamination Processes
Sheet lamination processes are those by which layer information in the form
of electronic signals is used to cut thin layers of sheet material to individual
two-dimensional cross-sectional shapes. Those two-dimensional layers are
bonded one upon another to form a complete three-dimensional object.
A representative sheet lamination process is the laminated object manufacturing (LOM) process, which was first developed by Helisys Inc, USA. The
LOM process is an automated fabrication method in which a three-dimensional object is constructed from a solid CAD representation by sequentially
FIGURE 3 SLS process (scanning mirrors, powder, SLS part).
laminating the constituent cross sections [11]. The process consists of three
essential phases: preprocessing, building, and postprocessing.
The preprocessing phase encompasses several operations. The initial steps
include generating an image from a CAD-derived .STL file format of the part
to be manufactured, sorting input, and creating secondary data structures.
During the building phase, thin layers of adhesive-coated material are sequentially bonded to each other and individually cut by a CO2 laser beam (see
Fig. 4). The building cycle includes the following steps. Firstly, a cross section
FIGURE 4 The LOM process (feed roller).
of the 3D model is created through measuring the exact height of the model
and slicing the horizontal plane accordingly. Secondly, the computer generates
precise calculations that guide the focused laser beam to cut the cross-sectional
outline, the crosshatches, and the model's perimeter. The laser beam power is
designed to cut exactly the thickness of one layer of material at a time. After
that, the platform with the stack of previously formed layers descends and a
new section of the material is advanced. The platform ascends and the heated
roller laminates the material to the stack with a single reciprocal motion, thereby
bonding it to the previous layer. Finally, the vertical encoder measures the height
of the stack and relays the new height to the slicing software, which calculates
the cross section for the next layer as the laser cuts the model's current layer.
These procedures repeat until all layers are built.
In the final phase of the LOM process, postprocessing, the part is separated
from support material. After that, if necessary, the finishing process may be
performed.
Droplet Deposition Processes
FIGURE 5 Droplet deposition process flow: CAD model, object slicing, layer information, liquid imaging and droplet deposition, layer formed, postprocessing (such as cleaning and support removal), completed objects.
The ballistic particle manufacturing (BPM) process is a typical droplet deposition process [12]. The BPM process uses three-dimensional solid model data to direct streams of material at a target. The three-dimensional objects are generated in a manner similar to that of an ink-jet printer producing two-dimensional images.
Melt Deposition Processes
FIGURE 6 From CAD to RP: the 3D solid model (CAD geometry representation) is converted to a triangular approximation/representation (STL), with validation and repair, orientation, assembly and nesting, and compensation/scaling; 2D/3D data may also come from reverse engineering; the resulting RP-readable model is processed by slicing, hatching, scanning vector generation, and support generation.
lately or to be created in the future will prove their potential as long as they
overcome the existing shortcomings. In this chapter, the involved manipulations
and related data structures and formats will be reviewed and discussed.
As the first attempt, the wire-frame modeling method uses lines and curves in space to represent the edges of surfaces and planes, providing a 3-D illusion
of the object. When the object is displayed on a computer monitor, it appears
as if wires had been strung along the edges of the object, and they are those
edges being seen. So in a wire-frame modeler, entities such as lines and curves
(mostly arcs) are used to connect the nodes. It is obvious that the data structure
and database are relatively simple with a small amount of basic elements.
However, a wire-frame model does not have all the information about
a designed object. The most obvious drawback is that such a representation
includes all the "wires" necessary to construct the edge lines of every surface
and feature, even if those are normally hidden from view. As a result the image is
often confusing and difficult to interpret. Although it works to a limited extent, the removal of hidden lines involves making some assumptions regarding
the nature of the nonexistent surfaces between the frames. If a wire-frame model
is used to build a part by RP&M, the systems would be confused as to which
side of the part should be solidified, sintered, or deposited because the frame
shows no direction. Furthermore, slicing manipulation in RP&M, which cuts
the 3-D model into a series of 2-D contours, would encounter great difficulty, because neither planes nor surfaces exist in a wire-frame model to intersect with the cutting planes. What would be obtained would be only a set of points instead of contours.
Partly due to the difficulties of interpreting wire-frame images and partly
due to the lack of a full description of an object's information, this method has
not been well utilized by industry. Therefore, a wire-frame modeler is not suitable for any commercial RP&M system. This brings us to
the concept of surface modeling and to the next module.
2. Surface Modeling
The addition of surface patches to form the faces between the edges of a
wire-frame model is an attempt to put more information about the designed
object into the computer model. CAD systems that can include surfaces within
the description of an object in this way are called surface modelers. Most of
the surface modeling systems support surface patches generated from standard
shapes such as planar facets, spheres, and circular cylinders, which is not enough to represent arbitrary shapes; such shapes are therefore often handled with typical parametric surface forms such as Bezier surfaces, B-spline surfaces, and nonuniform rational B-spline (NURBS) surfaces.
In order to store information about the inserted surfaces, some CAD systems make use of a face list in addition to the node and entity lists. Each face list might consist of the sequence of the numbers of the entities that bound the
face. The data structure is such that entities are defined as sets of nodes and
faces are defined as sets of entities. For many applications, such as the aircraft
industry and the design of car body shells, the ability to handle complex surface
geometry, which is often called a sculptured surface for function or aesthetic
reasons, is demanded. Surface modelers have sufficient sophistication in their
surface types to allow these requirements to be satisfied while solid modelers
are often restrictive and not needed.
Although surface modeling is simple in data organization and highly efficient, it still holds only partial information about the designed object. The biggest problem concerns the lack of explicit topological information, which makes certain validity checks with surface models difficult. The drawback frequently appears when using the STL format as an interface between CAD and RP&M, in which the STL file can be regarded as a surface model with triangular facets to represent an object. Topological information is not provided in surface modeling to indicate how the primitives that make up the object are connected. Topological information may also be used to check the validity of the data in objects. Normally, a surface model is not constrained to enclose a closed space. For RP&M applications this brings confusion as to which side is filled with material and which side is empty. Moreover, a model with open
surfaces cannot be processed in RP&M. A solid modeling system is intended to
hold complete information about the object being designed for that purpose.
3. Solid Modeling
2. All the support structures required are generated in the CAD system
and sliced in the same way before the CAD model is converted into
HP/GL files.
3. Standard for the Exchange of Product Data Model (STEP)
The standard for the exchange of product data model, STEP, is a new
engineering product data exchange standard that is documented as ISO 10303
[18]. The aim of STEP is to produce a single and better standard to cover all
aspects of the product life cycle in all industries. European Action on Rapid
Prototyping (EARP) is working on using STEP as a tool for data transfer from
CAD to 3D layer manufacturing systems [19].
The reasons why STEP is recommended to be used as the interface between
CAD and RP&M are given as follows:
1. It will be an international standard format to exchange product data
and be supported by all CAD systems.
2. Since it is a complete representation of products for data exchange, the
information in STEP is enough for the data exchange from CAD to 3D
layer manufacturing systems.
3. It is efficient in both the file size and computer resources needed for
processing.
4. It is independent of hardware and software.
However, STEP still has some disadvantages as the interface between CAD and
RP&M.
1. It still carries much redundant information that may not be necessary for RP&M.
2. New interpreters and algorithms must be developed to transfer data to
rapid prototyping and manufacturing systems.
4. CT/MRI Data
process CT data. Currently, there are three approaches to make models out of
CT scan information: through CAD systems, STL interfacing, and direct interfacing [20]. Besides a CT scanner, MRI (magnetic resonance imaging), ultrasound imaging, X-ray imaging, etc. may all be tools to generate the layered
images that represent the human organs and also can be reconstructed into
what they represent. Recently, the most successful reverse engineering cases
that build human parts by RP&M with the layered image data are from a CT
or MRI scanner.
C. Interface between CAD and RP&M: STL Format
I. STL File
The STL file, introduced by 3D Systems, is created from the CAD database
via an interface to CAD systems [5,21,22]. It is a polyhedral model derived
from a precise CAD model by a process called tessellation. This file consists of
an unordered list of triangular facets representing the outside skin of an object.
The STL has two file formats, ASCII format and binary format. The size of an
ASCII STL file is larger than that of the same binary format file but is human
readable. In a STL file, triangular facets are described by a set of X, Y, and Z
coordinates for each of the three vertices and a unit normal vector to indicate
the side of the facet that is inside the object as shown in Fig. 7.
The following is a facet representation as an example trimmed from an STL
file that consists of a series of similar facets.
solid Untitled1
  facet normal 9.86393923E-01 1.64398991E-01 0.00000000E+00
    outer loop
      vertex 9.73762280E-01 7.40301994E-01 1.35078953E+00
      vertex 1.00078931E+00 5.78139828E-01 1.35078953E+00
      vertex 1.00078931E+00 5.78139828E-01 3.50789527E-01
    endloop
  endfacet
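For comparison, the binary variant packs the same facet data into fixed-size records (commonly documented as an 80-byte header, a 4-byte facet count, and 50 bytes per facet). The sketch below is illustrative only and assumes that layout with little-endian IEEE single-precision floats:

#include <stdint.h>

/* One facet record of a binary STL file (assumed layout: 12 single-precision
 * floats followed by a 2-byte attribute count, 50 bytes in total).  Read it
 * field by field rather than as a packed struct to avoid padding issues. */
typedef struct {
    float    normal[3];      /* unit normal of the facet                */
    float    vertex[3][3];   /* X, Y, Z of the three vertices           */
    uint16_t attribute;      /* attribute byte count, usually zero      */
} StlBinaryFacet;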
The accuracy level of a facet model represented by the STL format is controlled by the number of facets representing the model. Higher part accuracy
FIGURE 7 Facet model: (a) triangular representation and (b) single facet (with permission from Springer-Verlag).
Except for the flaws of redundant information and incomplete representation, all the problems mentioned previously would be difficult for slicing
FIGURE 8 Edge and vertex redundancy in the STL file: (a) duplicate edges and vertices in an STL file and (b) coincident edges and vertices are stored only once (with permission from Springer-Verlag).
FIGURE 9 Cracks in the STL cause the laser to produce stray scan vectors: (a) facet model with a crack and (b) cross-sectional view of a slice (with permission from Springer-Verlag).
FIGURE 10 (a) Correct and (b) incorrect orientation (with permission from Springer-Verlag).
FIGURE 11 (a) Correct and (b) incorrect normal (with permission from Springer-Verlag).
FIGURE 12
algorithms to handle and would cause failure for RP&M processes which essentially require a valid tessellated solid as input. Moreover, these problems
arise because tessellation is a first-order approximation of more complex geometric entities. Thus, such problems have become almost inevitable as long
as the representation of the solid model is made using the STL format, which
inherently has these limitations.
Instead of abandoning the non-error-free STL format, many efforts have been made to detect the invalid data within an STL file and to repair the faceted geometric model [23-26], generalizing all STL-related errors and proposing a generic solution to the problems of missing facets and wrong orientations. The basic approach of the algorithm is to detect and identify the boundaries of all the gaps in the model. It rests on the fact that in a valid tessellated model, there must be exactly two facets sharing every
edge. If this condition is not fulfilled, then this indicates that there are some
missing facets, which cause gaps in STL files. Figure 13 gives an example of
such a case. Once the boundaries of a gap are identified, suitable facets would
then be generated to repair and "patch up" the gaps. The size of the generated
facets would be restricted by the gap's boundaries while the orientation of its
normal would be controlled through comparing it with the rest of the shell.
This is to ensure that the generated facet orientation is correct and consistent
throughout the gap closure process, as shown in Fig. 14.
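A minimal sketch of this validity check (a hypothetical helper, assuming the edges of all facets have already been collected and deduplicated) simply counts how many facets use each edge; any edge used by a number of facets other than two marks the boundary of a gap or a nonmanifold condition:

#include <stdio.h>

/* edge_use[i] holds how many facets reference edge i.  In a valid closed
 * tessellation every entry must be exactly 2; anything else flags an error. */
static int count_bad_edges(const int *edge_use, int num_edges)
{
    int bad = 0;
    for (int i = 0; i < num_edges; i++) {
        if (edge_use[i] != 2) {
            printf("edge %d is used by %d facet(s)\n", i, edge_use[i]);
            bad++;
        }
    }
    return bad;  /* 0 means the shell passes this particular check */
}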
Some RP&M system vendors have built up the functions inside their data
processing software, like SGC from Cubital, Israel. In the multifunctional rapid prototyping system (MRPS) developed by Tsinghua University, China,
topological information is generated through reading an STL file. This information is not only useful for data checking and validation, but also helpful
for shortening the slicing time with an upgraded algorithm, by which it becomes unnecessary to search all the facets in the file when slicing one
layer.
Besides RP&M vendors, several software companies supply products
based on STL format. Many functions include generation, visualization, manipulation, support generation, repair, and verification of STL files. Materialise
FIGURE 14 A repaired surface with facets generated to patch up the gap (with permission from Springer-Verlag).
(Belgium) provides almost all these operations in the Magic module and is widely used across the RP&M world. Imageware (MI, USA) offers its users
several competitive advantages for the creation, modification, and verification
of polygonal models. It begins every prototyping process by allowing the import and export of IGES, STL, DXF, ASCII, VRML, SLC, and 3D measurement
data.
1. Facet modeling should be supported. The facet model as used in the STL
format is accepted by nearly all RP&M systems. The STL file is, so far, the most
common way to send a CAD model to rapid prototyping and manufacturing
systems because it is a simple representation of the CAD model and is able to be
generated by most CAD systems. Therefore, the newly proposed format should
support the facet model. The optimization of the facet model (STL file) is necessary in the new format to avoid defects in the STL format. The improvements
and intended improvements of the LMI format are given as follows:
Add topological information. In the STL format, only unordered vertices with facet normals are provided to describe the faceted models. A vertex is
the simplest geometric entity that contains little topological information. In the
LMI format, the topological information is supplied by adding new geometric entities and topological data that refer to the topological relations between
geometric entities.
Remove redundant information. The STL file consists of a lot of unnecessary repeated information such as sets of coordinates and strings of text.
Normals in the STL can also be omitted because they can be derived from other
information.
Repair errors existing in the STL format like cracks, nonmanifold topology, and overlaps.
2. Support precise models. With increasing experience in data transfer between CAD systems and 3D layer manufacturing systems, it is obvious that
preservation of geometric accuracy and geometric intent is important. The
problems occur especially with high-accuracy downstream processes such as
slicing and high-accuracy machines. Therefore, the LMI format should support
precise models.
3. Be flexible and extensible. Because CAD technology, especially solid
modeling, and RP&M are two developing areas, flexibility and extensibility
of the interface between CAD and 3D layer manufacturing systems must be
considered to meet future needs.
4. Be relatively easy to implement and use. The format should be easy to create robustly in CAD systems and to process in the downstream
slicing process.
5. Be unambiguous. The LMI format should not contain ambiguous information.
6. Be independent of computer platforms and rapid prototyping and manufacturing processes and commercial systems. This format should be a neutral
file so that it can be created by commercial CAD systems.
7. Be as compact as possible. The LMI format should not contain redundant information that would make a file unnecessarily large.
2. Boundary Representation
FIGURE 17 Data structure of the facet model in the LMI format: object, shell, triangular facet (plane), loop, edge (line), point (with permission from Springer-Verlag).
facets. The data structure depicted in Fig. 18 will be mapped into the LMI
format file.
From the topological viewpoint, the facet modeling technique (indicating
triangular facet model) models the simplest solids that have a closed oriented
surface and no holes or interior voids. Each facet is bounded by a single loop of
adjacent vertices and edges; that is, the facet is homeomorphic to a closed disk.
Therefore the number of vertices V, edges E, and facets F of the solid satisfy

V - E + F = 2.    (1)

FIGURE 18 Adjacency relationship of the facet model in the LMI format: (a) diagram and (b) storage allocation description (see also Fig. 19).
This fact can be used to check the correctness of the faceted model, especially
for checking errors such as cracks or holes and nonmanifolds.
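Equation (1) translates directly into a cheap validity test; the sketch below (illustrative only) flags models whose counts violate it:

/* Returns 1 if the vertex, edge, and facet counts satisfy Eq. (1),
 * V - E + F = 2, which holds for a closed facet model without holes
 * or interior voids; returns 0 otherwise (suggesting cracks, holes,
 * or a nonmanifold model). */
static int satisfies_euler_formula(long V, long E, long F)
{
    return (V - E + F) == 2;
}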
4. Description of LMI Format
The LMI format is a universal format for the input of geometry data to
model fabrication systems based on RP&M. It is suitable for systems using the
layer-wise photocuring of resin, sintering, or binding of powder, the cutting
of sheet material, solidification of molten material, and any other systems that
build models on a layer-by-layer basis.
The LMI format can represent both the precise model and the triangular
faceted model derived from the STL file. It can be divided into two parts: header
and geometry.
In B-rep, the fundamental unit of data in the file is the entity. Entities
are categorized as geometric or nongeometric. Geometric entities represent the
physical shapes and include vertices, faces, edges, surfaces, loops, and facets.
The nongeometric entity is the transformation matrix entity. All geometrical entities are defined in a three-dimensional Cartesian coordinate system called the
model coordinate system. The entity is defined by a series of parameters based
on a local coordinate system and a transformation matrix entity to transfer
entity to the model coordinate system if needed. The parameterized expression
of a general entity is given as
    | a11  a12  a13  a14 |
    | a21  a22  a23  a24 |
    | a31  a32  a33  a34 |        (2)
    | a41  a42  a43  a44 |
In facet modeling, vertices, edges, and facets are the three fundamental entities.
Transformation matrices are unnecessary in the facet model. Therefore, the
LMI format includes four sections for a facet model. Each section, except for
the Header Section, starts with a keyword specifying the entity type in the section and an integer number specifying the number of the entities in the section. In
the section, the data of an entity are a record. The section is ended by two words
END and a keyword that is the same as the keyword at the beginning of the section. The four sections in the LMI format are Header, Vertex, Edge, and Facet.
The Header Section is a unique nongeometry section in the LMI file and
designed to provide a human readable prologue to the file. The Vertex Section
includes geometrical information of all vertices in the facet model. Each vertex
is defined by its coordinates in 3D space. The Edge Section is a major section
for an edge-based B-rep modeling. It not only gives the geometrical information
about all the edges but also contains adjacency relationships between entities.
In facet modeling, it contains the information of the adjacent relationship of
edges and facets as well as edges and vertices. The Facet Section is a collection
of all facets in the facet model. The facet is defined by the edge indices.
FIGURE 19 Vertex entity description:

typedef struct {
    int vertex_index;
    double x, y, z;
} Vertex;
Header Section
The Vertex Section consists of X, Y, and Z coordinates for all vertices in the
facet model. A vertex is a record. Figure 19 shows the vertex entity description.
The content of the Vertex Section in a LMI format file is given as follows:
Vertices: number of the vertices
No.   X-coordinate    Y-coordinate    Z-coordinate
1     9.547921e+01    0.0000000e+00   0.000000e+00
2     7.104792e+02    4.5000000e+00   0.000000e+00
End Vertices
Edge Section
The Edge Section in the LMI format provides information of all the edges
in a facet model. Each edge includes an edge index, two vertex indices, and two
facet indices associated with it. Each edge is composed of two half-edges that
have directions according to the vertices of the edge: vert1->vert2 associated with facet1 and vert2->vert1 associated with facet2, as shown in Fig. 20.
FIGURE 20 Edge entity (edge index, vert1, vert2, facet1, facet2).
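The edge record itself is not reproduced legibly here, but from the description above (an edge index, two vertex indices, and two facet indices) a sketch analogous to the Vertex and Facet records of Figs. 19 and 21 might look like:

typedef struct {
    int edge_index;
    int vert1, vert2;    /* the two end vertices of the edge               */
    int facet1, facet2;  /* facets owning the half-edges vert1->vert2 and
                            vert2->vert1, respectively                     */
} Edge;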
FIGURE 21 Facet entity description:

typedef struct {
    int facet_index;
    int half_edge1, half_edge2, half_edge3;
} Facet;
Facet Section
The Facet Section of the LMI facet model is used to define facets in the
model. Each facet is composed of three half-edges. Each half-edge is specified
by the edge index and the edge side according to the facet fields in the edge: facet field facet1 is associated with the half-edge vert1->vert2 and facet2 with the half-edge vert2->vert1. These half-edges are arranged based on the right-hand screw
rule. Figure 21 shows the facet entity description.
5. Comparison of the LMI Format with the STL format
Routine begin
    Open an STL file.
    While it is not the end of the file
        Read the data for a facet.
        For each of the three vertices
            Check if the vertex exists.
            If it exists
                Retrieve the pointer to the vertex.
            Else
                Create the vertex and store its pointer.
            End if
        End for
        For each pair of the three vertex pointers
            Check if an edge exists between the pair of vertices.
            If it exists
                Retrieve the edge pointer.
            Else
                Create and store the edge.
            End if
        End for
        Create the facet with the three edge pointers.
    End while
    Output the LMI format file.
End the routine.
In a closed facet model, each facet contributes three half-edges and each edge consists of two half-edges. Then

E = 3F x (1/2) = 3F/2.    (4)
For the STL format, each triangle is described by a set of X, Y, and Z coordinates for each of the three vertices. Each coordinate is a double-precision value
that requires eight bytes. Therefore, the size of the storage used to describe
a facet in the STL format requires 72 bytes (3 vertices x 3 doubles/vertex x
8 bytes/double). Hence, the total size of the triangles in the STL file is 72F, not including the normal description and other characters for specification. Then

Ss = 72F.    (5)
For the LMI format, all vertices are described by three coordinates in the Vertex
Section. The edges and facets are described respectively in the Edge Section
and Facet Section by index values. An index value usually requires two bytes.
Therefore, referring to Figs. 19, 20, and 21, the sizes of the Vertex Section, Edge Section, and Facet Section are 26V bytes (2 bytes for the index + 3 coordinates x 8 bytes each per vertex), 10E bytes (2 bytes for the index + 2 vertex indices x 2 bytes + 2 facet indices x 2 bytes per edge), and 8F bytes (2 bytes for the index + 3 edge indices x 2 bytes per facet), respectively. Therefore, the total size of the facet model in the LMI format is

SL = 26V + 10E + 8F.    (6)

Substituting E = 3F/2 from Eq. (4) and V = 2 + E - F = 2 + F/2 from Eq. (1) gives

SL = 36F + 52.    (7)

The ratio of the two sizes is then

SL/Ss = (36F + 52)/72F.    (8)

FIGURE 23 Comparison of the sizes of STL and LMI format files (file size in bytes versus number of triangles; the size of the corresponding B-rep model is also shown).
For the simplest solid object, the tetrahedron, F = 4. Usually, however, F is
much larger. Therefore, the size of the facet model in the LMI format is almost
half the size of the STL format. However, the real size of the LMI format file is
almost one-fourth of the size of the STL format file as shown in Fig. 23.
Figure 23 shows the comparison of the file size of STL format files with
the size of corresponding LMI format files. As illustrated in the graph, the file
sizes of both STL facet models and LMI facet models increase with the increase
in the number of triangles. However, the size of the B-rep model only depends
on the complexity of the part; it is not related to the number of triangles. In
addition, not only do both the sizes of the STL and LMI facet models increase
with the number of the triangles, but also the size of the LMI format is always
about one-fourth of the size of the STL format when the same part has the
same number of triangles. This is because there is still much other redundant
information in the STL file, such as normals and text.
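The estimates of Eqs. (5)-(8) can be checked with a few lines of arithmetic; the helper below is illustrative only and ignores the extra normals and text that inflate a real STL file:

#include <stdio.h>

/* Theoretical sizes, in bytes, for a closed facet model with F facets:
 * STL stores 3 vertices x 3 doubles x 8 bytes per facet (Eq. (5));
 * LMI stores 26V + 10E + 8F = 36F + 52 bytes (Eqs. (6) and (7)). */
static void compare_sizes(long F)
{
    long stl = 72 * F;
    long lmi = 36 * F + 52;
    printf("F = %ld: STL ~ %ld bytes, LMI ~ %ld bytes (ratio %.2f)\n",
           F, stl, lmi, (double)lmi / (double)stl);
}

/* e.g. compare_sizes(1896) prints a ratio close to 0.5 */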
III. SLICING
Rapid prototyping relies on "slicing" a 3-dimensional computer model to get a
series of cross sections that can then be made individually, typically at intervals of 0.1-0.5 mm. Slicing is the process of intersecting the model with planes parallel
to the platform in order to obtain the contours. There are various ways to make
the slices, each of which has its own advantages and limitations.
A. Direct Slicing of a CAD File
In RP&M, direct slicing refers to slicing the original CAD model. In this way, one obtains the contours without using an intermediary faceted model such as an STL file. Direct slicing has many benefits besides avoiding an intermediary representation. One important benefit is that it makes it possible to use techniques embedded in the application program that result in parts with a better surface finish, which comes from more precise contours sliced from the exact CAD model. The computation involves a geometric description of the part of higher order than its triangular approximation. An approach has been proposed by Guduri et al. [31,32] to directly slice a model based on constructive solid geometry (CSG). In the process the primitives in the CSG assembly are sliced individually, generating a cross section for each primitive. The whole contour in the sliced layer is then calculated by combining the primitive slices with the same Boolean operations that connect the primitives.
The contour of the part in each layer is a collection of piecewise continuous
curves. A similar method for manipulating CAD parts in a B-rep or surface
model has been introduced. No matter which modeling method is used, the
contour finding is realized through solving the degraded equations. For common quadric surfaces (sphere, cones, torus, and ellipsoids), the surface-plane
intersection calculation is exact by neglecting the round-off error. For example,
the description equation for the boundary of a sphere is listed as
X^2 + Y^2 + Z^2 = R^2,    (9)

where X, Y, and Z represent the coordinates of a point on the surface of the sphere that is centered at the origin (0, 0, 0) with a radius of R.
A given plane at height hi that is adopted to slice the sphere can be written as

Z = hi.    (10)

To compute the intersection of the sphere and the plane, one simply replaces Z in Eq. (9) using Eq. (10). The result is then

X^2 + Y^2 = R^2 - hi^2,    (11)
which shows the intersection is a planar circle described by a definite formulation, which is more compact and accurate than an approximate representation.
For other surface catalogues, the method requires an approximation of the slice
contour of the primitive. Such an approximation is still much more accurate
and efficient than the linear approximations obtained from a faceted model.
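As a concrete illustration of Eqs. (9)-(11), the sketch below (a hypothetical helper, not part of any published slicer) computes the radius of the circular contour produced when the plane Z = hi cuts a sphere of radius R centered at the origin:

#include <math.h>

/* Returns the radius of the intersection circle X^2 + Y^2 = R^2 - hi^2,
 * or -1.0 if the plane Z = hi misses the sphere entirely. */
static double sphere_slice_radius(double R, double hi)
{
    double r2 = R * R - hi * hi;
    return (r2 >= 0.0) ? sqrt(r2) : -1.0;
}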
It is obvious that this intersection calculation method depends heavily upon
the knowledge of geometric representation in individual CAD systems. Slicing,
on the other hand, requires very good process knowledge and is dependent on
a user's specification. It also alters the relationship between the user and the
manufacturer. Therefore, slicing should be carried out by an expert normally
located at the manufacturing site instead of CAD vendors or RP&M users. This
FIGURE 24 Intersection of a triangular facet, with vertices (Xi, Yi, Zi), and a slicing plane.
relationship will bring to RP&M many of the problems that could be avoided
by using a data exchange interface between incompatible CAD and RP&M
systems. However, an STL file can be generated so easily by users with little or
even no knowledge of the RP&M process that many efforts have been made to
slice an STL file.
B. Slicing an STL File
Slicing an STL file is classified as indirect slicing, in contrast to slicing the CAD model directly. Figure 24 illustrates the procedure of
the operation. The triangular facet used in the STL format simplifies the calculations further, since each facet is exactly convex. Mathematically, determining
the intersection of a triangular facet and a slicing plane reduces to computing
the intersection of the slicing plane with each of the three line segments that
define the triangular facets. If the coordinates of the facet vertices are denoted
(Xi, Yi, Zi), i = 1, ..., 3, and the slicing plane is given by Z = Z0, where Z0 is the height of the slicing plane, then the coordinates of the intersection point are obtained by solving the equation

(Z0 - Zi)/(Zi+1 - Zi) = (X0 - Xi)/(Xi+1 - Xi) = (Y0 - Yi)/(Yi+1 - Yi),    (12)
where (X0, Y0, Z0) are the coordinates of the intersection point. Normally there
will be two such intersection points for each slicing plane. These intersection
points are then assembled in the proper order to produce the approximate
planar contour of the part. It is obvious that the result from slicing a STL file is
only polygonal contours in terms of a series of loops at each layer. Each loop is
described by listing the vertices that compose the loop, and ordered according to
the right-hand rule (ccw for outer loops and cw for inner loops). This approach
may encounter some special cases, namely degenerate facets, where a vertex, an edge, or an entire facet lies in the slicing plane, so the algorithm should be robust
enough to handle them.
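A minimal sketch of Eq. (12) applied to one facet (illustrative names; the degenerate cases mentioned above are not fully handled here) tests each of the three bounding segments against the plane Z = z0:

typedef struct { double x, y, z; } Point3;

/* Linear interpolation of one segment (a, b) against the plane Z = z0.
 * Returns 1 and fills *out if the segment crosses the plane. */
static int segment_plane(Point3 a, Point3 b, double z0, Point3 *out)
{
    if (a.z == b.z) return 0;                     /* parallel to, or lying in, the plane */
    if ((a.z - z0) * (b.z - z0) > 0.0) return 0;  /* both ends on the same side          */
    double t = (z0 - a.z) / (b.z - a.z);
    out->x = a.x + t * (b.x - a.x);
    out->y = a.y + t * (b.y - a.y);
    out->z = z0;
    return 1;
}

/* Collects the (normally two) intersection points of one triangular facet. */
static int slice_facet(const Point3 v[3], double z0, Point3 pts[2])
{
    int n = 0;
    for (int i = 0; i < 3 && n < 2; i++)
        if (segment_plane(v[i], v[(i + 1) % 3], z0, &pts[n]))
            n++;
    return n;
}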
C. Slicing an LMI File
The unordered method mentioned above is not efficient, since generating each contour segment requires intersecting each facet sequentially with the slicing plane, without considering whether an intersection can occur at all. The performance can be improved by utilizing topological information. Rock and Wozny [33] and Crawford et al. [34] have implemented another approach to slicing facet models, based on extracting the topological information from the input file and explicitly representing adjacent facets through a face-edge-vertex structure. The enhanced method utilizes the edge information and
generates contours by marching from edge to edge. Each edge must reference its
two vertices and the two facets that define the edge. Each facet must reference its
neighboring faces and three edges. The method saves time on searching for the
sequential facet to be sliced while other irrelevant facets are put aside untouched
temporarily during the current slicing. Given Rock's test-case performance, it is clear that the slicing algorithm can run as an on-line operation at the process
controller during building.
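A sketch of the marching idea, using the illustrative Vertex, Edge, and Facet records from Section II.D (repeated here for self-containment; this is an assumed outline, not Rock and Wozny's actual implementation): a contour crosses a facet through two of its edges, and the next facet to visit is simply the other facet sharing the exit edge.

typedef struct { int vertex_index; double x, y, z; } Vertex;
typedef struct { int edge_index; int vert1, vert2; int facet1, facet2; } Edge;
typedef struct { int facet_index; int half_edge1, half_edge2, half_edge3; } Facet;

/* Does edge e cross the slicing plane z = z0?  (Indices are assumed to be
 * positions in the vertex array.) */
static int edge_crosses(const Vertex *verts, const Edge *e, double z0)
{
    double a = verts[e->vert1].z - z0;
    double b = verts[e->vert2].z - z0;
    return a * b < 0.0;
}

/* One marching step: leave the current facet through crossing edge e and
 * continue with the facet on its other side. */
static int next_facet(const Edge *e, int current_facet)
{
    return (e->facet1 == current_facet) ? e->facet2 : e->facet1;
}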
The slicing of a LMI file is a topological information-based method. It is
briefly described using a flow chart in Fig. 25.
Compare the slicing algorithm based on the STL format and that on the
LMI format in terms of complexity:
Complexity of the STL-based algorithm = S*n + S*p,    (13)
where S is the number of slices, n the number of facets in the model, and p the
average number of points in each slice.
The algorithm in Fig. 25 shows that the searching goes from the first facet to
the last facet for each slicing. Hence, the complexity of the algorithm searching
for triangles that intersect the z plane is S*n. S*p is the complexity of the
algorithm sorting the intersection lines and linking them together. In contrast,

Complexity of the LMI-based algorithm = n + anS = n(1 + aS),    (14)
where a << 1 and a is used to represent the average of these proportions over the slices. The first n occurs during the first slice: in the worst case, all n facets must be checked for intersection with the slice. Subsequently, only those facets
connected directly to the facets being cut need to be checked, and their number
can only be a small proportion of n.
Since a << 1, n(1 + aS) << nS for large S, with S > 1. Comparing formula
(13) with (14), the cost of the slicing algorithm based on the STL format is
much higher than that based on the LMI format.
The LMI format is smaller than the STL format: in the slicing algorithm
based on the LMI format, the memory used to store a facet model can be easily
freed step by step by freeing the space used to store facets that have been sliced.
Therefore, the algorithm based on LMI uses less memory to process.
FIGURE 25 Flow chart for slicing an LMI file: for z = min z to max z, and for each facet from the first to the last in set A, find the lines of intersection; if a facet lies below the next z plane, delete it, free it from memory, and put the facets related to it at the corresponding place in set A; finally, free the memory.
In summary, the slicing algorithm based on the LMI format costs less in both space and processing time than one based on the STL format.
D. Adaptive Slicing
A lot of efforts have been made continuously on improving the accuracy and
efficiency of slicing. One of the results of the research is adaptive slicing [35].
The main objective is to control the staircase effect by a user-supplied tolerance.
The basic principle of adaptive slicing can be summarized as follows: vertical and
FIGURE 26 Comparison of (b) adaptive slicing and (a) slicing with constant layer thickness; the building direction and local normal are indicated.
near-vertical features are built with thick layers while features with a bigger p
(the angle between the local normal and building direction) are built with thin
layers (as shown in Fig. 26).
Most of the adaptive slicing methods produce unnecessary layers that contribute to increasing fabrication times without improving the overall quality
of the part surface; thus they are seldom commercialized. Tyberg and Bohn
present an approach of fabricating each part and feature independently so that
each thickness is commonly derived from the one part or feature existing at that
height whose surface geometry requires the thinnest layer to meet a tolerance
criterion [36].
An additional benefit is that one obtains a part with the desired surface
finish using the minimal number of layers. Despite these benefits, this building
technique cannot be used yet in practice. In some cases, this is due to the
underlying characteristics of the RP&M processes. For example, in the FDM
process, the layer thickness must be adjusted manually by the machine operator.
In the LOM process developed and commercialized by Helisys, the thickness
of the layer is determined by the current thickness of the sheet. In other cases,
the problem is to find the optimal parameters for a given layer thickness. Let
us take, for instance, the SLA. Although the SLA systems make it adjustable,
both the manufacturer and the supplier of resins specify building parameters
for the layer thickness within limited options.
FIGURE 28 Boundary scanning: (a) segment contour and (b) generated continuous contour.
based on the fact that the scanner should be driven to follow the prescribed
path as soon as possible, limited by available torque.
In Wu's tracking control strategy, the minimum time optimal control problem with specified path and limited control torque is formulated. The algorithm uses information about the curvature of the contour to determine the
appropriate laser parameter to achieve the desired power density. A polygonal
approximation of the contour, such as that obtained from slicing a faceted part
model, is not accurate enough to support this scheme. Nevertheless, it is still
possible that an appropriate continuous contour is obtained through smooth
interpolation of the polygonal segment or adding an arc at sharp corners to
allow near constant tracking speed, as shown in Fig. 28.
3. Model-Based Scanning
Model-based scanning is characterized by adopting variable scanning directions, in contrast to the raster scanning style. It has been shown that, besides many other control parameters, the solidification sequence in each RP&M process has
an effect on the physical performance of the final part. For example, initial investigations [38] indicate that fabrication of a metal part with SLS will require
local control of laser beam parameters, allowing these parameters to change
from layer to layer or even within different areas in a given layer. Such model-based scanning is depicted conceptually in Fig. 29, where a part layer has been
divided into several regions based on part quality predictions from a physical
model of the process.
FIGURE 29 Model-based scanning.
SLC is 3D Systems' contour data format for importing external slice data,
and SLI is the company's machine-specific 2D format for the vector commands
that control the laser beam.
The Cubital Facet List (CFL) format is based on a polygon-based representation consisting of n-sided polygons that can have multiple holes. The format
avoids redundant vertex information and maintains topological consistency.
CFL consists of a header and fields containing the total number of vertices
(points) and facets in the object, a numbered sequence of vertex coordinates,
numbered facets (with a number of holes), and pointers back to their respective
vertices.
C. Common Layer Interface (CLI)
The CLI format was developed in a Brite EuRam project (Basic Research in
Industrial Technologies for Europe/European Research on Advanced Materials)
with the support of major European car manufacturers. It is a universal format
for the input of geometry data to model fabrication systems based on RP&M.
The CLI format is intended as a simple, efficient, and unambiguous format for
data input to all 3D layer manufacturing systems, based on a two-and-a-half-dimensional layer representation. It is meant as a vendor-independent format
for layer-by-layer manufacturing technologies.
The CLI file can be in binary or ASCII format. In a CLI format, the part
is built by a succession of layer descriptions. The geometry part of the file is
organized in layers in ascending order. Every layer is the volume between two
parallel slices, and is defined by its thickness, a set of contours, and hatches
(optionally). Contours represent the boundaries of solid material within a layer,
and are defined by polylines.
The CLI format has two kinds of entities. One is the polyline. A polyline
is defined by a set of vertex points (x, y), connected contiguously in listed
order by straight line segments. The polylines are closed, which means that
they have a unique sense, either clockwise or counterclockwise. This sense is
used in the CLI format to state whether a polyline is on the outside of the part or
surrounding a hole in the part. Counterclockwise polylines surround the part,
whereas clockwise polylines surround holes. This allows correct directions for
beam offsetting.
The other is the hatch, which is a set of independent straight lines, each
defined by one start point and one end point. One of the purposes of the hatch
is to distinguish between the inside and outside of the part. The other is that
hatches and open polylines are used to define support structures or filling structures, which are necessary for some 3D layer manufacturing systems like SLA,
to obtain a solid model.
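A sketch of in-memory structures that could hold such a CLI-style layer (illustrative names and fields, not the CLI specification itself) might look like:

typedef struct { double x, y; } Point2;

/* A closed polyline: sense = +1 (counterclockwise) marks an outer contour,
 * sense = -1 (clockwise) marks a hole, following the convention above. */
typedef struct {
    int     sense;
    int     num_points;
    Point2 *points;
} Polyline;

/* An independent hatch line, defined by one start and one end point. */
typedef struct { Point2 start, end; } Hatch;

/* One layer: its thickness plus its contours and optional hatches. */
typedef struct {
    double    thickness;
    int       num_contours;
    Polyline *contours;
    int       num_hatches;
    Hatch    *hatches;
} Layer;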
The advantages of the CLI format are presented as follows:
1. Since the CLI format only supports polyline entities, it is a simpler
format than the HP/GL format.
2. The slicing step can be avoided in some applications.
3. Errors in layer information are much easier to correct than those in 3D
information. Automated recovery procedures can be used, if required;
and editing is also not difficult.
FIGURE 31 RPI format entity schema (schema variable name, type, and definition; variable types include INT, FLT, DBL, STR, BOOL, and BITn).
LMT-file
  Part
    Layer
      2D Primitives
In this example, the object is an LMT-file. It contains exactly one child, the object P1. P1 is a combination of two parts, one of which is a support structure
and the other is P2, again a combination of two others. The objects at the leaves
of the tree (P3, P4, and S) must have been, evidently, sliced with the same z
values so that the required operations, in this case or and binary-or, can be
performed and the layers of P1 and P2 are constructed.
In LEAF, some properties, like support-structure and open, can also be
attached to layers or even polyline objects, allowing the sender to represent the
original model and the support structures as one single part. In Fig. 33, all parts
inherit the properties of object, their ultimate parent. Likewise, all layers of the
object S inherit the open property, indicating that the contours in the layers are
always interpreted as open, even if they are geometrically closed.
Advantages of the LEAF format include the following:
1. It is easy to implement and use.
2. It is not ambiguous.
3. It allows for data compression and for a human-readable
representation.
4. It is both CAD system and LMT process independent.
5. Slices of CSG models can be represented almost directly in LEAF.
6. The part representing the support structures can be easily separated
from the original part.
The disadvantages of the LEAF format are the following:
1. A new interpreter is needed for connecting the 3D layer manufacturing
systems.
2. The structure of the format is more complicated than that of the
STL format.
3. The STL format cannot be translated into this format.
(LMT-file
  (name Object) (radix 85) (units 1mm)...
  (Part (name P1)...
    (binary-or (Part (name S) (support-structure) (open)...
                 (Layer...))
               (Part (name P2)...
                 (or (Part (name P3))...
                       (Layer (name ...) (polyline ...)))
                     (Part (name P4)...
                       (Layer (name P4_L1) (polyline...)))
               )
)))

FIGURE 33 A LEAF file at the physical level and the corresponding object tree: Object contains P1; P1 combines the support structure S with P2; P2 combines P3 and P4; P4_L1 is a layer of P4.
The LEAF format is described in several levels, mainly at a logical level using
a data model based on object-oriented concepts, and at a physical level using a
LlSP-like syntax as show^n in Fig. 33. At the physical level, the syntax rules are
specified by several translation phases. Thus defined, it allows one to choose at
which level interaction with LEAF is desirable, and at each level there is a clear
and easy-to-use interface. It is doubtful if LEAF currently supports the needs
of all processes currently available but it is a step forward in that direction.
F. SLC Format
The SLC file format (3D Systems, 1994) was developed by 3D Systems for representing cross-sectional information [43]. It is a 2 1/2 D contour representation
and also can be used to represent a CAD model. It consists of successive cross
sections taken at ascending Z intervals in which solid material is represented
by interior and boundary polylines. SLC data can be generated from various
sources, either by conversion from solid or surface models or more directly
from systems that produce data arranged in layers, such as from CT scanners.
The SLC format contains only polylines; a polyline is an ordered list of X-Y vertex points connected contiguously by successive line segments.
The polyline must be closed whereby the last point must equal the first point
in the vertex list.
One of the strengths of the SLC format is that it is a simple representation of
the solid object. In addition, the SLC format is directly accepted by rapid prototyping and manufacturing systems and does not need to be sliced in some cases.
However, the major weakness of the SLC format is that it can only approximately represent solid objects.
Through the comparison of several currently existing neutral formats and
several proposed formats used for RP&M systems, it is clear that each format
has its limitations in different aspects. Because of this, many efforts have been carried out to improve information processing in RP&M and to catch up with the latest developments in engineering. The next section will
introduce these explorations, including solid interchange format, virtual reality
modeling language, and volumetric modeling to support RP&M.
limitations of STL, the data representation capabilities of SIF must accommodate expected capabilities of future RP&M systems, including the use of
multiple materials, gradient material properties, part color specification, nonplanar build layers, explicit fiber directions, surface roughness, tolerance, and
embedded foreign components. Also SIF is designed to have strong extensibility
and additional annotation capabilities.
As with the proposed improvements to the STL-based data transfer, development and acceptance of an alternative RP&M data transfer mechanism will
require support of both RP&M systems developers and CAD vendors. Modifications would be required within both CAD and RP&M vendor products
to implement a new data interface. The proposed approach is to standardize
an industry-consensus SIF through the ANSI/ISO process when the technical
concepts have matured and RP&M industry support is formed. The basis for
this representation might be provided by the ACIS save file format, since ACIS
has been incorporated as the geometry engine for many commercially available
geometric modeling systems.
Although it has been announced to be under development and liable to
change, contents of the SIF/SFF (SFF here represents RP&M) already include
four major categories: lexical conventions, grammar, semantics, and example, whose detail is available on the Web site of http;//http.cs.berkeley.edu/
~ jordans/sif/SIF_SFF.html.
Another language, L-SIF (Layered Solid Interchange Format), can be developed to describe 2 1/2 D layers. Based on this standard, a slicer for generating
the layered description from the 3D description can be developed. Other translators might be developed to perform certain transformations on layer-based
data such as that obtained from laser digitizing measurements, CT, and MRI.
base supported by CAD systems to supply the engineering detail, while VR will be linked to the CAD systems for the consideration of aesthetics, communication, and optimization.
SolidView, a software package for RP&M designed to facilitate the preparation of STL files for RP&M, also supports 3D communication through the use of virtual prototypes. SolidView's viewing, measuring, and annotating capabilities allow the user to create an electronic design review using a virtual prototype. In conjunction with STL files, since each view has its own set of measurements and annotations, users can literally "walk around" the design, showing areas of change or special concern. Furthermore, it is possible to share the virtual prototype, in the form of a specific file, with anyone on the Internet.
B. Virtual Reality Modeling Language (VRML)
The Virtual Reality Modeling Language (VRML) is a language for describing
interactive 3D objects and worlds. VRML is designed to be used on the Internet, intranets, and local client systems. VRML is also intended to be a universal
interchange format for integrated 3D graphics and multimedia. VRML may be
used in a variety of application areas such as engineering and scientific visualization, multimedia presentations, entertainment and educational titles, Web
pages, and shared virtual worlds. All aspects of virtual world display, interaction, and internetworking can be specified using VRML. It is the intention of its
designers that VRML become the standard language for interactive simulation
within the World Wide Web. VRML is based on the Open Inventor ASCII File
Format from Silicon Graphics, Inc., which supports descriptions of computer
graphics 3D scenes with polygonally rendered objects, lighting, materials, ambient properties, and realism effects. The first version of VRML allows for the
creation of virtual worlds with limited interactive behavior. These worlds can
contain objects that have hyperlinks to other worlds or data objects. When the
user selects a link to a VRML document from within a correctly configured
World Wide Web browser, a VRML viewer is launched for navigating and visualizing the Web. Future versions of VRML will allow for richer behaviors,
including animation, motion physics, and real time multiuser interaction.
The Bremen Institute for Industrial Technology and Applied Work Science (BIBA) has opened a Web dialog on replacing the STL format with
the Virtual Reality Modeling Language (VRML) format (http://www.biba.unibremen.de/users/bau/s2v.html). VRML will help to do the following:
Improve the communication and information exchange in the process
chain. As the different steps involved in the RP&M process chain are often
done by different companies, and even to meet different requirements, there is a need for communication and the sharing of information among the different sites. By using a standard like VRML, which is not restricted to RP, many standard software
tools, mostly independent from a specific hardware platform, are available,
mostly as shareware.
Lower the barrier for RP&M application. VRML can be used as a replacement for STL with little difficulty. It has a node structure, and if they are ever needed, RP-specific subnodes can easily be defined. By using VRML, all
can be done with one standard. You do not have to buy several costly interfaces
to your CAD package, or use a conversion service where you are not even sure
whether you get it right the first time.
Open new markets. VRML reaches the consumer market and therefore has enormous reach in contrast to STL. If RP&M adopts VRML now, it can profit from developments done by others. As concept modelers like the 3D Systems Actua and low-end "3D printers" appear, VRML opens the mass market to them.
VRML files are more compact than STL. This is because VRML uses a list of numbered points followed by a list of triangles (three indices, whose order indicates the surface normal direction). STL has a large overhead: for each triangle, not only are all three 3D points given, but the surface normal vector is included as well. The compact form also makes verification and correction easier.
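The saving can be illustrated with a small sketch (an illustration of the indexed-mesh idea only, not a parser or writer for either format): it converts an STL-style triangle soup, in which every facet stores its three vertices plus a normal, into a shared vertex list with index triples of the kind used by VRML's IndexedFaceSet, and compares the number of stored coordinates.

# Illustrative comparison of an STL-style triangle soup with an indexed
# vertex/triangle representation.
from typing import Dict, List, Tuple

Point3 = Tuple[float, float, float]

def index_mesh(soup: List[Tuple[Point3, Point3, Point3]]):
    """Deduplicate vertices and describe each triangle by three indices."""
    vertex_index: Dict[Point3, int] = {}
    vertices: List[Point3] = []
    triangles: List[Tuple[int, int, int]] = []
    for tri in soup:
        ids = []
        for v in tri:
            if v not in vertex_index:          # first time this vertex is seen
                vertex_index[v] = len(vertices)
                vertices.append(v)
            ids.append(vertex_index[v])
        triangles.append(tuple(ids))           # vertex order fixes the normal direction
    return vertices, triangles

# Two triangles forming a square share two vertices.
soup = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)),
    ((0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)),
]
vertices, triangles = index_mesh(soup)

soup_floats = len(soup) * (3 * 3 + 3)          # 3 vertices + 1 normal per STL facet
indexed_floats = len(vertices) * 3             # shared coordinates only
print(soup_floats, indexed_floats, triangles)  # 24 vs 12 floats plus small index triples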
Advocates of replacing STL point out that VRML deals with more issues,
including 3D extension of the World Wide Web and future 3D telecommunication and networking standards that could also become the standard interface for
all design and manufacturing activities. This format will be applied for storing
and viewing with a 3D Web browser before printing for telemanufacturing. Advocates claim that the adoption of VRML would make RP&M more accessible
globally, but much is still left to do. VRML today provides a geometry-viewing
capability but has not addressed engineering needs such as the creation of very
complex geometries and the ability to use it in other analyses. Future versions
of VRML are expected to include NURBS, but today, such geometry creates
a heavy processing load. Additionally, software based on the VRML format still has to be developed to realize the RP&M-specific information processes, such as support generation, slicing, and hatching.
Unfortunately, neither solid modeling nor surface modeling has such a character as to match the additive forming process, even in principle. As a result, it unnecessarily increases the complexity of information processing by introducing slice and scanning vector generation, which can be regarded as an inverse procedure for stacking and layer generation in the building process.
When these flaws in the commonly used modeling systems are considered, it is natural to introduce a voxel-based modeling method for the RP&M field. Actually, voxel-based modeling is not a new concept in volume graphics, where it has another name, volumetric modeling. Kaufman et al. [45] proposed that
graphics is ready to make a paradigm shift from 2D raster graphics to 3D
volume graphics with implications similar to those of the shift from vector
to raster graphics. Volume graphics, voxelization, and volume rendering have
attracted considerable research recently. The term "voxel" represents a volume
element in volume graphics, just like the term "pixel" denotes a picture element
in raster graphics. Typically, the volumetric data set is represented as a 3D
discrete regular grid of voxels and commonly stored in a volume buffer, which
is a large 3D array of voxels. A voxel is the cubic unit of volume centered
at the integral grid point. Each voxel has numeric values associated with it,
which represent some measurable properties or independent variables (e.g.,
color, opacity, density, material, coverage proportion, refractive index, even
velocity, strength, and time) of the real object.
Unlike solid or surface modeling, volumetric modeling in discrete form
makes it close to the idea of the piece-by-piece building process used in RP&M.
In image-based systems, like SGC, successive layers of the part under construction are generated by the use of masks that either allow a light source to solidify
a photopolymer under the exposed regions or deposit material on the exposed
areas of the mask. Each mask is the image of the object's cross section, which
is easily generated by taking out all the voxels that have the same Z-axis coordinate value as that desired. Although most of the currently installed RP&M systems are vector-based, in which the contours are formed sequentially by scanning the object's cross section and hatching is needed to obtain the interiors, image-based systems will dominate the market in the long term due to the advantages of faster speed and independence from the objects' geometric complexity. Furthermore, generation of scanning vectors from an image is still feasible if necessary.
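As a sketch of this idea (using NumPy purely for illustration; the array sizes and helper names are assumptions, not part of any RP&M system), a voxel model stored as a 3D array yields a cross-sectional mask with a single slice operation, and scanning vectors can still be derived from runs of solid voxels in that mask:

# Sketch: extract the cross-section mask at one Z level from a voxel buffer
# and derive simple scan segments (runs of solid voxels along each row).
import numpy as np

nx, ny, nz = 64, 64, 64
volume = np.zeros((nx, ny, nz), dtype=np.uint8)    # 1 byte per voxel
volume[16:48, 16:48, 16:48] = 1                    # a solid cube as the "part"

def cross_section(vol: np.ndarray, z: int) -> np.ndarray:
    """Mask of all voxels sharing the given Z index (the layer image)."""
    return vol[:, :, z]

def scan_segments(mask: np.ndarray):
    """Start/end X indices of solid runs in each Y row (simple hatching)."""
    segments = []
    for y in range(mask.shape[1]):
        xs = np.flatnonzero(mask[:, y])
        if xs.size:
            breaks = np.where(np.diff(xs) > 1)[0]
            starts = np.concatenate(([xs[0]], xs[breaks + 1]))
            ends = np.concatenate((xs[breaks], [xs[-1]]))
            segments.extend((int(s), int(e), y) for s, e in zip(starts, ends))
    return segments

layer = cross_section(volume, 32)
print(layer.sum(), len(scan_segments(layer)))      # 1024 solid voxels, 32 row segments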
The biggest disadvantage is that a typical volume buffer occupies a large amount of memory. For example, for a moderate resolution of 512 × 512 × 512, the volume buffer consists of more than 10^8 voxels. Even if we allocate only one byte per voxel, 128 Mbytes will be required. However, since computer memories
are significantly decreasing in price and increasing in their compactness and
speed, such large memories are becoming more and more feasible. Another
drawback is the loss of geometric information in the volumetric model. A voxel-based object is only a discrete approximation of the original continuous object
where the properties of voxels determine the object. In voxel-based models,
a discrete shading method for estimating the normal from the context of the
models is employed [46].
Chandru and Manohar [47] have proposed a system named G-WoRP, a
geometric workbench for rapid prototyping, in which the voxel representation scheme (V-rep) provides an efficient interface among the various modules.
Several problems that are difficult using conventional geometry-based approaches have a simple solution using voxel models. These include estimation
of mass properties, interference detection, tolerance calculation, and implementation of CSG operation. Further, voxel-based models permit the designer to
analyze the object and modify it at the voxel level, leading to the design of custom composites of arbitrary topology. The generation of slices is made simple,
and reverse engineering is greatly facilitated.
Presently, there are two approaches to generating an object as volumetric data. The first source of volume data is a geometrical model, either a closed surface or a solid. The voxels are used to fill up the interior of the object with exact size and correct properties according to the specification. Obviously, the more voxels that are adopted, the smaller the representation error is. Of course, increasing the number of voxels requires more storage space and, in turn, more processing time. The second way of building a voxel-based model is much like the process called
reverse engineering with many examples of successful applications. Volumetric
data sets are generated from medical imaging (e.g., CT, MRI, and ultrasonography), biology (e.g., confocal microscopy), geoscience (e.g., seismic measurements), industry (e.g., industrial CT inspection), and molecular systems
(e.g., electron density maps) [47].
REFERENCES
1. Peter, P. S. Data Structures for Engineering Software. Computational Mechanics, USA, 1993.
2. Wohlers, T. Rapid prototyping and tooling state of the industry. 1998 Worldwide Progress Report, RPA of SME, Michigan, 1998.
3. Johnson, J. L. Principles of Computer Automated Fabrication. Palatino Press, Irvine, CA, 1994.
4. Hull, C. W. Apparatus for production of three-dimensional objects by stereolithography. U.S. Patent 4,575,330, 1986.
5. Jacobs, P. F. Rapid Prototyping & Manufacturing. Society of Manufacturing Engineers, 1992.
6. Bourell, D. L., Beaman, J. J., Marcus, H. L., and Barlow, J. W. Solid freeform fabrication: An
advanced manufacturing approach. In Proceedings of Solid Freeform Fabrication Symposium,
Austin, TX, August 12-14, 1991, pp. 1-7.
7. Levi, H. Accurate rapid prototyping by the solid ground curing technology. In Proceedings of
Solid Freeform Fabrication Symposium, Austin, TX, August 12-14, 1991, pp. 110-114.
8. Sachs, E., Cima M., Cornie, J. et al. Three dimensional printing: Rapid tooling and prototypes
directly from CAD representation. In Proceedings of Solid Freeform Fabrication Symposium,
Austin, TX, August 6-8, 1990, pp. 52-64.
9. Lee, S. J., Sachs, E., and Cima, M. Powder layer deposition accuracy in powder based rapid
prototyping. In Proceedings of Solid Freeform Fabrication Symposium, Austin, TX, August
9-11,1993, pp. 223-234.
10. Nutt, K. Selective laser sintering as a rapid prototyping and manufacturing technique. In
Proceedings of Solid Freeform Fabrication Symposium, Austin, TX, August 12-14, 1991,
pp. 131-137.
11. Feygin, M., and Hsieh, B. Laminated object manufacturing (LOM): A simple process. In Proceedings of Solid Freeform Fabrication Symposium, Austin, TX, August 12-14, 1991, pp. 123-130.
12. Richardson, K. E. The production of wax models by the ballistic particle manufacturing process.
In Proceedings of the 2nd International Conference on Rapid Prototyping, Dayton, OH, June
23-26,1991, pp. 15-20.
13. Kochan, D. Solid Freeform Manufacturing. Elsevier, Amsterdam, 1993.
14. Greulich, M., Greul, M., and Pintat, T. Fast, functional prototypes via multiphase jet solidification. Rapid Prototyping J. 1(1):20-25, 1995.
15. Reed, K., Harrvd, D., and Conroy, W. Initial Graphics Exchange Specification (IGES) version
5.0. CAD-CAM Data Exchange Technical Center, 1998.
16. Li, J. H. Improving stereolithography parts quality: Practical solutions. In Proceedings of
the 3rd International Conference on Rapid Prototyping, Dayton, OH, June 7-10, 1992,
pp. 171-179.
17. Jamieson, R., and Hacker, H. Direct slicing of CAD models for rapid prototyping. Rapid Prototyping J. 1(2):4-12, 1995.
18. Owen, J. STEP: An Introduction. Information Geometers, 1993.
19. Bloor, S., Brown, J., Dolenc, A., Owen, J., and Steger, W. Data exchange for rapid prototyping, summary of EARP investigation. Presented at Rapid Prototyping and Manufacturing Research Forum, University of Warwick, Coventry, October 1994.
20. Swaelens, B., and Kruth, J. P. Medical applications of rapid prototyping techniques. In Proceedings of 4th International Conference on Rapid Prototyping, Dayton, OH, June 14-17, 1993,
pp. 107-120.
21. Vancraen, W., Swaelens, B., and Pauwels, J. Contour interfacing in rapid prototyping: Tools
that make it work. In Proceedings of the 3rd International Conference on Rapid Prototyping
and Manufacturing, Dayton, OH, June 7-10, 1994, pp. 25-33.
22. Donahue, R. J. CAD model and alternative methods of information transfer for rapid prototyping systems. In Proceedings of 2nd International Conference on Rapid Prototyping, Dayton,
OH, June 23-26, 1991, pp. 217-235.
23. Bohn J. H., and Wozny, M. J. Automatic CAD-model repair: Shell-closure. In Proceedings of
Solid Freeform Fabrication Symposium, Austin, TX, August 3-5, 1992, pp. 86-94.
24. Makela, I., and Dolenc, A. Some efficient procedures for correcting triangulated models. In Proceedings of
Solid Freeform Fabrication Symposium, Austin, TX, August 9-11, 1993, pp. 126-132.
25. Leong, K. F., Chua, C. K., and Ng, Y. M. A study of stereolithography file errors and repair. Part 1. Generic solution. Int. J. Adv. Manufact. Technol. (12):407-414, 1996.
26. Leong, K. F., Chua, C. K., and Ng, Y. M. A study of stereolithography file errors and repair. Part 2. Special cases. Int. J. Adv. Manufact. Technol. (12):415-422, 1996.
27. Chua, C. K., Gan, G. K., and Tong, M. Interface between CAD and rapid prototyping systems. Part 2: LMI, an improved interface. Int. J. Adv. Manufact. Technol. 13(8):571-576, 1997.
28. Chua, C. K., Gan, G. K., and Tong, M. Interface between CAD and rapid prototyping systems. Part 1: A study of existing interfaces. Int. J. Adv. Manufact. Technol. 13(8):566-570, 1997.
29. Weiler, K. J. Topology as a framework for solid modeling. In Proceedings of Graphics Interface '84, Ottawa, 1984.
30. Mortenson, M. E. Geometric Modeling, Wiley, New York, 1985.
31. Guduri, S., Crawford, R. H., and Beaman, J. J. A method to generate exact contour files for
solid freeform fabrication. In Proceedings of Solid Freeform Fabrication Symposium, Austin,
TX, August 6-8, 1992, pp. 95-101.
32. Guduri, S., Crawford, R. H., and Beaman, J. J. Direct generation of contour files from constructive solid geometry representations. In Proceedings of Solid Freeform Fabrication Symposium,
Austin, TX, August 9-11, 1993, pp. 291-302.
33. Rock, S. J., and Wozny, M. J. Utilizing topological information to increase scan vector generation efficiency. In Proceedings of Solid Freeform Fabrication Symposium, Austin, TX, August
12-14,1991, pp. 28-36.
34. Crawford, R. H., Das S., and Beaman, J. J. Software testbed for selective laser sintering.
In Proceedings of Solid Freeform Fabrication Symposium, Austin, TX, August, 12-14, 1991,
pp. 21-27.
35. Dolenc, A., and Makela, I. Slicing procedures for layered manufacturing techniques. Comput.
Aided Design 26(2):119-126, 1994.
36. Tyberg, J., and Bohn, J. H. Local adaptive slicing. Rapid Prototyping J. 4(3):119-127, 1998.
37. Wu, Y-J. E., and Beaman, J. J. Contour following for scanning control in SFF application: Control trajectory planning. In Proceedings of Solid Freeform Fabrication Symposium, Austin, TX, August 6-8, 1990, pp. 126-134.
38. Beaman, J. J. Solid Freeform Fabrication: A New Direction in Manufacturing: With Research and Applications in Thermal Laser Processing. Kluwer Academic, Dordrecht, 1997.
39. Wozny, M. J. Systems issues in solid freeform fabrication. In Proceedings of Solid Freeform
Fabrication Symposium, Austin, TX, Aug. 3-5, 1992, pp. 1-15.
40. Du, Z. H. The Research and Development on Patternless Casting Mold Manufacturing Directly
Driven by CAD Model. Ph.D. dissertation, Tsinghua University, Beijing, China, 1998.
41. Rock, S. J., and Wozny, M. J. A flexible file format for solid freeform fabrication. In Proceedings
of Solid Freeform Fabrication Symposium, Austin, Texas, August, 6-8, 1991, pp. 155-160.
42. Dolenc, A., and Makela, I. LEAF: A data exchange format for LMT processes. In Proceedings
of the 3rd International Conference on Rapid Prototyping, Dayton, OH, June 7-10, 1992,
pp. 4-12.
43. 3D Systems, Inc. SLC File Specification. 3D Systems, Inc., Valencia, CA, 1994.
44. Gibson, I., Brown, D., Cobb, S., and Eastgate, R. Virtual reality and rapid prototyping: Conflicting or complimentary. In Proceedings of Solid Freeform Fabrication Symposium, Austin,
TX, Aug 9-11,1993, pp. 113-120.
45. Kaufman, A., Cohen, D., and Yagel, R. Volume graphics. IEEE Computer 26(7):51-64, 1993.
46. Kaufman, A. Volume Visualization. IEEE Computer Society Press, Los Alamitos, CA, 1990.
47. Chandru, V., and Manohar, S. G-WoRP: A geometric workbench for rapid prototyping. In
Proceedings of the ASME International Mechanical Engineering Congress, 1994.
DATABASE SYSTEMS
IN MANUFACTURING
RESOURCE PLANNING
M. AHSAN AKHTAR HASIN
Industrial Systems Engineering, Asian Institute of Technology, Klong Luang, Pathumthani
12120, Thailand
P. C. PANDEY
Asian Institute of Technology, Klong Luang, Pathumthani 12120, Thailand
I. INTRODUCTION 417
II. MRPII CONCEPTS AND PLANNING PROCEDURE 418
A. Manufacturing Resource Planning (MRPII): What It Is 418
B. Hierarchical Manufacturing Planning and Control 419
III. DATA ELEMENT REQUIREMENTS IN THE MRPII SYSTEM 433
A. Database of the MRPII System 433
B. Data Storage and Retrieval in the MRPII System 438
C. Information Transaction in MRPII 440
D. Early-Stage Information Systems in Manufacturing
Planning 443
E. Information and Database Systems Centered around
MRPII 445
IV. APPLICATION OF RELATIONAL DATABASE MANAGEMENT
TECHNIQUE IN MRPII 452
V. APPLICATIONS OF OBJECT-ORIENTED TECHNIQUES
IN MRPII 457
A. Object-Oriented MRPII Modules 465
B. Object-Oriented Inventory Control System 465
C. Object-Oriented PAC System 467
D. Object-Oriented Capacity Planning System 475
E. Object-Oriented Bill of Materials 477
F. MRPII as an Enterprise Information System 486
G. Scopes of Object Orientation 492
REFERENCES 494
I. INTRODUCTION
The overall manufacturing system is composed of complex functions and activities. Most of the functions and activities are interrelated in some way. For
from inventory, product structure of the goods, and other functions to provide necessary feedback to the earlier steps in a closed loop, in order to generate the amounts and timings of purchasing and manufacturing.
It is a computer-based materials planning system, sometimes also known as a production and inventory planning and control system. This system introduced a shift from the traditional two-bin or periodic inventory control policy to a time-phased discrete inventory policy.
The "closed loop MRP" provides information feedback that leads to the
capability of plan adjustments and regeneration. The MRPII system provides
some additional facilities, such as financial and accounting.
In the 1960s, a main-frame computer was necessary to run an MRPII system, but now, a Window-based PC or UNIX-based work station is sufficient to
perform the job [14].
FIGURE 1 Hierarchical manufacturing planning and control: the business plan, production plan, and master production schedule form the planning phase, which becomes more detailed at each level, followed by the material requirements plan and the execution and control phase.
from years to days, when we gradually move from top to bottom. At any level,
the target is to find out what, when, and how much of a product or component
to produce and/or purchase. Basically, a commercial MRPII system starts from a master production schedule (MPS), which is built from two different inputs of the production plan, namely the forecast and customer orders.
The MRPII system begins with the identification of each component
through a product structure, and then finds out the requirements along with
the timings of all those components/subassemblies, based on the forecast or customer orders and in accordance with the available stock on hand. This misses a major requirement: the plans are not generated in accordance with available capacity. During order generation, the system cannot consider the capacity limitations of the plant when fulfilling the requirements or production volume. That is why the system is termed an infinite capacity planning system.
Some of the major characteristics of this system are as follows:
1. The MRPII system is mostly applicable to make-to-stock, discrete, batch-oriented items. However, job shop, rate-based, and process manufacturing can also be accommodated with variants or enhancements.
2. The MRPII system is divided into separate modules, which interact closely while preparing materials orders. The main modules are forecast, bill of materials (BOM) or product structure, MPS, inventory control, shop floor control (SFC) or production activity control (PAC), purchasing, sales and order processing, capacity management, etc. The details of these modules or functions are discussed later.
3. The demand for finished goods, or end items, follows independent inventory policies, like economic order quantity (EOQ), re-order point (ROP), or others, whereas demand for subassemblies, components, or raw materials is dependent on the demand for finished goods. The demand for end items is arranged in an MPS. The dependent demand can be calculated from this MPS by time phasing, and is of a lumpy type. This calculation is managed by the material requirements planning (MRP) module of an overall MRPII system.
4. MRP is part of an overall resource planning (MRPII) system. The program, which generates the materials plan (i.e. orders), is known as the BOM
explosion program. Since an MRPII system can prepare plans for all manufacturing resources, such as manpower and machines, including materials, it is
popularly known as a resource planning system. The generation of a materials
plan in accordance with capacity constraint is still not possible.
5. The orders are scheduled based on due dates and an estimated lead time.
From the due dates, the order release dates are obtained by going backward,
equal to lead time (known as lead time offsetting), in the production calendar.
This scheduling process is known as backward scheduling.
6. An MRPII system is driven by MPS, which is derived from forecast, or
customer order, or a distribution requirements planning (DRP) system.
The general structure of an MRPII system is shown in Fig. 2.
The MRPII system is modular, based on the functions/departments it serves.
These modules share the manufacturing database, in order to generate combined materials plans for each department. The major modules are described
below with their functionality.
FIGURE 2 The closed loop system: the MPS, together with the BOM and inventory records, drives MRP and rough-cut capacity planning (RCCP); the orders generated are checked by capacity requirements planning (CRP) and released as manufacturing orders (scheduling, PAC/SFC) or purchase orders (to vendors), with materials supply, inventory control, finished goods, and dispatching to customers closing the loop.
I. Bill of Materials
The bill of materials (BOM) of a product lists all the components, and
sometimes the resources (such as labor), required to build a product. The BOM
enables the planner to identify not only the components, but also their relationship in the final assembly.
There are several kinds of BOMs to facilitate different purposes. The most
commonly used BOM is a tree-type structure. An example of a BOM of a
product, a wooden table, is shown in Fig. 3. The major aspects of this BOM
are explained as follows: The BOM shows a multilevel product structure,
where components are arranged according to parent-component relationships.
The part identification (Part ID) numbers are shown in parentheses just below
the part name, inside the box. This information is necessary during explosion to
FIGURE 3 Multilevel BOM of a wooden Table (100), at level 0 with a 1-week lead time, comprising a Top (201, level 1, 1 week) and a Base (202, level 1, 2 weeks); the Base in turn comprises a Body (32021, level 2, 1 week) and Legs (32022, level 2, 1 week).
find out the requirements of the total amounts of the individual parts. The contents in parentheses, outside of the box, show the lead times in weeks. For
manufacturing components, it is the manufacturing lead time, whereas it is the
purchasing lead time for the purchasing component. A critical path is the line
in the BOM that has the longest cumulative lead time. Here, it is 4 weeks, along
Table-Base-Legs. This information is necessary during MRP explosion, to find
out the total lead time required to manufacture a batch of products (Table). The
order release dates for the item are found by subtracting the lead time from the
order due date. This subtraction process is known as the lead time offsetting for
backward scheduling. The numbers on the left-hand side of the box show the
quantity required per (Q/per) assembly, which means number of components
necessary to build one unit of the parent item.
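A minimal sketch of this lead-time offsetting over the Table BOM follows (lead times in weeks as in Fig. 3; the calendar is simplified to whole weeks, and the function and variable names are illustrative):

# Sketch: backward scheduling (lead-time offsetting) over the Table BOM of Fig. 3.
lead_time = {"Table": 1, "Top": 1, "Base": 2, "Body": 1, "Legs": 1}
children = {"Table": ["Top", "Base"], "Base": ["Body", "Legs"], "Top": [], "Body": [], "Legs": []}

def release_dates(item: str, due_week: int, schedule=None):
    """Order release week = due week minus lead time, applied down the BOM."""
    if schedule is None:
        schedule = {}
    release = due_week - lead_time[item]
    schedule[item] = (due_week, release)
    for child in children[item]:
        release_dates(child, due_week=release, schedule=schedule)   # a child is due when its parent starts
    return schedule

schedule = release_dates("Table", due_week=10)
print(schedule)
print("cumulative (critical path) lead time:", 10 - min(r for _, r in schedule.values()))   # 4 weeks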
The item at level zero is the table which is known as the finished good, or
end item, and is sold to the customer. This is planned for production in the MPS
and is said to have an "independent demand." Below this are the subassemblies
and the components required to manufacture a table.
To facilitate easy display of this graphical BOM on a computer screen, it is
generally converted to another format, known as indented BOM. The indented
BOM of the Table is shown in Table 1.
The BOM in Fig. 3 is known as a multilevel BOM. When a subassembly is
shown only one level down, then it is known as a single-level BOM. A singlelevel BOM for Table (Part ID 100) would be as shown in Fig. 4, where it can
be said that "a Table (100) comprises two components, a Top (201) and a Base
(202)," whereas the general expression of a "where-used" format of the BOM
is the reverse presentation of that. For example, it can be said that "the Base
(202) is an item, which is used to make one Table (100)."
TABLE I The Indented BOM

Manufacturing Bill of Material for Table, Part ID No. 100

Part ID no.    Part name
100            Table
202            . Base
32021          . . Body
32022          . . Legs
201            . Top
FIGURE 4 Single-level BOM for Table (100), comprising a Top (201) and a Base (202), shown in both tree structure and indented format.
There are several other variants of BOM presentation schemes [1], which
are not elaborated here, as this chapter is not intended to include all possible
aspects of the MRPII system.
2. Master Production Scheduling
TABLE 2 An example MPS: weekly production quantities (lots of 75, 100, and 110 units) for the House and Office table products over a 12-week planning horizon.
FIGURE 5 Time fences: the planning horizon is divided along the time axis into frozen, slushy, and liquid zones.
bucket and a planning horizon of three months. It is assumed that each month comprises four weeks.
This MPS is still at a preliminary level, which requires adjustment in terms of capacity. The capacity planning section is given later on. Some terminology related to the MPS is as follows:
Delivery promise. It is an amount not booked by any customer and thus available to promise to customers. A delivery promise can be made on the available to promise (ATP) amount, where ATP = Beginning inventory + Scheduled receipts − Actual orders scheduled before the next scheduled receipt (see the sketch after this list).
Scheduled receipts. These are orders for which delivery is scheduled but not yet fulfilled.
Time fences. As discussed earlier, the planning horizon is the time period,
generally from 3 months to 1 year, for which the MPS is prepared. This may
again be divided into separate zones, depending upon the level of certainty, as
shown in Fig. 5 [1]. For the frozen zone, capacity and materials are committed to specific orders. Generally, no changes are allowed in this time period,
thus is flagged as "Firm Planned." In the slushy zone, capacity and materials
are committed to a reduced extent. Changes are allowed. The liquid zone is
completely uncertain. No materials and capacity are committed for this time
period [1].
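As a sketch of the ATP calculation for the interval up to the next scheduled receipt (one common convention; commercial MRPII packages differ in detail, and the values below are illustrative):

# Sketch: available-to-promise for the interval up to the next scheduled receipt.
def available_to_promise(beginning_inventory: int, scheduled_receipt: int, booked_orders: list) -> int:
    """ATP = beginning inventory + scheduled receipt - orders booked before the next receipt."""
    return beginning_inventory + scheduled_receipt - sum(booked_orders)

print(available_to_promise(20, 100, [30, 25]))   # 65 units still free to promise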
3. Inventory Control
FIGURE 6 ABC Pareto curve: percentage of dollar usage plotted against percentage of total items.
A class items. A single-order inventory policy and the Wagner-Whitin algorithm can be used to find a refined lot size, or lot-for-lot (MRP) purchasing/production can be followed.
B class items. In this case, lot-for-lot (MRP), a fixed lot size obtained using EOQ or EPQ (economic production quantity), or the least total cost method can be followed.
C class items. In this case, bulk purchasing, a fixed lot size, or the EOQ/two-bin system/periodic review system can be followed.
The MPS (or the finished good) is based on an independent inventory policy,
whereas the MRP (or the lower-level components in the BOM of the finished
good) is based on a dependent policy.
Independent policies. These policies find out the ordering size (volume)
and timings of the finished goods. The most widely used policies in an MRPII
system are EOQ/EPQ, order point system, lot-for-lot (MRP), etc.
Dependent policies. The components, raw materials, and subassemblies,
which act as part of the finished good, do not have an independent demand
of their own. The demand of these components depends on the demand of the
finished goods. For example, the leg of a table is said to have dependent demand.
Lot-sizing rules, which are commonly used by MRPII professionals, are
discussed below.
Lot-for-lot. Order size is exactly equal to the requirement in a time bucket. This is also known as the MRP lot size.
Fixed lot. Irrespective of the requirement, a fixed lot or multiples of a lot
are ordered to achieve minimum cost. EOQ, the most commonly used rule to
determine a fixed lot, is discussed below.
The EOQ (or EPQ) formula tries to determine a lot size that offers minimum
cost in terms of the above cost elements. Suppose that for Top of the Table, the
order quantity is 400 units and the usage rate is 200 units per week, then Fig. 7
shows its inventory levels with time.
If the economic purchase quantity is Q (EOQ) per order, the cost of placing one order is C_o, the item has an annual demand rate of D, and the unit purchase price is P with a carrying charge of i percent per year to hold the materials in stock, then

Q = √(2 C_o D / (i P)).

FIGURE 7 On-hand inventory level over time, varying between a maximum of Q and an average of Q/2.

With an ordering cost of 10, an annual demand of 10,000 units, a carrying rate of 0.20, and a unit price of 10,

Q = √((2 × 10 × 10,000) / (0.20 × 10)) ≈ 316 units.
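A minimal sketch of this calculation, assuming the standard EOQ expression reconstructed above (C_o the ordering cost per order, D the annual demand, P the unit price, i the annual carrying rate):

# Minimal EOQ sketch using the values from the example above.
from math import sqrt

def eoq(ordering_cost: float, annual_demand: float, unit_price: float, carrying_rate: float) -> float:
    """Economic order quantity Q = sqrt(2 * Co * D / (i * P))."""
    return sqrt(2 * ordering_cost * annual_demand / (carrying_rate * unit_price))

print(round(eoq(10, 10000, 10, 0.20)))   # about 316 units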
the planned order receipt date, and by lead time offsetting, the date on which the order should be released for execution is known as the Planned Order Release
date.
Scheduled receipts. Scheduled receipts are orders already placed on manufacturing or purchasing; they represent a commitment, with capacity allocated at the work centers. Until the order is completed (closed), it is termed an open order.
Gross and net requirement. Gross requirement is the total amount required
for an item, computed from MPS, without considering on-hand inventory or
open orders. Net requirements are calculated as
Net requirements = Gross requirements − Scheduled receipts − On-hand inventory.    (2)
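As a sketch of how a single MRP record is processed under a lot-for-lot policy (illustrative only: other lot-sizing rules, safety stock, and scrap factors are ignored, and the example numbers are illustrative), gross requirements are netted against scheduled receipts and projected on-hand stock, and each planned order receipt is offset backward by the lead time to give its release period:

# Sketch of lot-for-lot MRP netting with lead-time offsetting for one item.
from typing import List, Dict

def mrp_record(gross: List[int], scheduled: List[int], on_hand: int, lead_time: int) -> Dict[str, List[int]]:
    periods = len(gross)
    net = [0] * periods
    receipts = [0] * periods          # planned order receipts
    releases = [0] * periods          # planned order releases (offset by lead time)
    projected = on_hand
    for t in range(periods):
        available = projected + scheduled[t]
        net[t] = max(0, gross[t] - available)
        receipts[t] = net[t]          # lot-for-lot: order exactly the net requirement
        if net[t] and t - lead_time >= 0:
            releases[t - lead_time] = net[t]    # releases before period 0 are dropped in this sketch
        projected = available + receipts[t] - gross[t]
    return {"net": net, "planned_receipt": receipts, "planned_release": releases}

# Example: 40 units on hand, a requirement of 70 in week 4, one week lead time.
print(mrp_record(gross=[0, 0, 0, 70, 0], scheduled=[0, 0, 0, 0, 0], on_hand=40, lead_time=1))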
Table 3 shows the material requirements plan for the components of the
finished good, Wooden Table.
TABLE 3 MRP Computation: for each item of the Wooden Table BOM (Table, Top, Base, Body, and Legs, listed by low-level code), the gross requirements, scheduled receipts, on-hand balance, net requirements, planned order receipts, and planned order releases are computed week by week, with each planned order release offset backward from its receipt by the item's lead time.
TABLE 4 An MRP record example showing, by period, the gross requirements, scheduled receipts, projected on-hand inventory, and net requirements.
FIGURE 8 The PAC model: a scheduler, dispatcher, mover, producer, and monitor exchange information with MRP and control the physical layer of the shop floor (work centers and material handling systems) through move orders, produce orders, instructions, and monitoring updates.
are known: (i) net change MRP, a procedure where only the changed orders are recalculated while others remain unchanged, and (ii) regenerative MRP, where all the orders are recalculated in case of any change.
5. Production Activity Control
Production activity control (PAC), alternatively known as shop floor control (SFC), describes the principles and techniques of planning and controlling production during the execution of manufacturing orders on the production floor [1,4]. It supports the following major functions:
Scheduling,
Dispatching,
Monitoring,
Control,
Capacity management, and
Physical materials arrangements.
The PAC functional elements and its model are shown in Fig. 8. After approval of the MRP generated orders, they are released as either purchasing
orders for those components purchased from the vendors or manufacturing orders for those components manufactured on the shop floor. While the purchasing orders go to the purchasing department as purchase request, or purchase
requisition (PR), the manufacturing orders are released to the shop floor.
One of the most important tasks of shop floor control is to assign priority
to jobs, in order to prepare a dispatching list [1,4]. The dispatching list arranges
the jobs in order to be processed at each work center according to a certain
priority. A typical dispatch list is shown below:
Dispatch list
Work center ID: Assembly center    Work center: 5    Today: 16/9/98

Job No.    Part ID    Amount    Due date     Start date    Finish date    Hours
3151       1400       100       27/09/98     23/09/98      27/09/98       32.3
3156       1500       50        30/09/98     26/09/98      30/09/98       30.0
4134       1300       60        10/10/98     01/10/98      05/10/98       40.5
TABLE 6 Priority Schedules under the EDD and CR Rules

Rank    EDD (days remaining until due)    CR (critical ratio)
1st     1001 (2)                          1001 (0.50)
2nd     1010 (5)                          1010 (0.83)
3rd     1009 (7)                          1000 (1.29)
4th     1003 (8)                          1003 (1.60)
5th     1000 (9)                          1009 (2.33)
There are several priority rules. The two most common rules are (i) earliest due date (EDD), where the jobs are arranged in the sequence of their due dates, and (ii) critical ratio (CR), which is an index computed as

CR = (number of days remaining until the due date) / (number of operation days required).
The priority schedules, prepared in accordance with the above two rules, are
shown in Table 6.
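A short sketch of both rules follows (the job data are illustrative values consistent with the ranks and ratios shown in Table 6):

# Sketch: rank jobs by earliest due date (EDD) and by critical ratio (CR).
jobs = {
    # job: (days remaining until due, operation days required)
    1000: (9, 7), 1001: (2, 4), 1003: (8, 5), 1009: (7, 3), 1010: (5, 6),
}

edd_order = sorted(jobs, key=lambda j: jobs[j][0])                  # fewest days remaining first
cr_order = sorted(jobs, key=lambda j: jobs[j][0] / jobs[j][1])      # smallest critical ratio first

print(edd_order)  # [1001, 1010, 1009, 1003, 1000]
print(cr_order)   # [1001, 1010, 1000, 1003, 1009]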
Recently, information integration has gained momentum, leading toward CIM. The ESPRIT (European Strategic Program for Research in Information Technology) project is an attempt in that direction. Of the several islands of automation in manufacturing, PAC [4] is possibly the module that requires integration of the most heterogeneous hardware systems. The problem is aggravated by the closed architecture of CNC (computer numerical control) machines and other computer-controlled equipment, and by the lack of implementation of a common 7-layer OSI communication protocol, such as MAP/TOP (Manufacturing Automation Protocol/Technical and Office Protocol). Additionally, PAC is a highly dynamic module, which goes through physical and logical changes frequently because of changes in customer requirements, changes of equipment, disturbances, changes in product types, etc.
6. Capacity Management
FIGURE 9 Capacity checking loop: the MPS and MRP plans are checked against capacity; if capacity is insufficient, the MPS and/or MRP is revised before execution in PAC.

FIGURE 10 Capacity planning levels: resource planning (RP), rough-cut capacity planning (RCCP), capacity requirements planning (CRP), and capacity control, linked to the MPS, MRP, and PAC.
TABLE 7

Table type    Production vol. (period 1)    Run time per unit (min)    Setup (min)    Required work minutes
House         120                           12                         60             1500
Office        120                           10                         60             1260
Similarly, suppose that for period 2, total required work minutes is 2000
minutes.
Against the above required work minutes, the available work minutes (man-minutes) at that station are calculated as 1 worker, 5 working days per week, 1 shift per day, 8 hours per shift, and 60 minutes per hour. Thus, the available man-minutes in periods 1 and 2 = (5 × 1 × 8 × 60) = 2400 minutes.
Now, the load profile can be generated as shown in Fig. 11, where it is seen
that the required capacity does not match the available capacity in each period.
Some manual forward and backward adjustment may, thus, be necessary to
keep the required capacity within the available capacity.
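A small sketch of this load calculation follows (reading Table 7 as required minutes = production volume × run time per unit + setup, which is an assumption about its columns):

# Sketch: required versus available work minutes at one work center.
def required_minutes(volume: int, run_time_per_unit: float, setup: float) -> float:
    return volume * run_time_per_unit + setup

available = 5 * 1 * 8 * 60                      # days x shifts x hours x minutes = 2400

period_1 = required_minutes(120, 12, 60) + required_minutes(120, 10, 60)   # House + Office
period_2 = 2000                                 # given directly in the text

for period, load in ((1, period_1), (2, period_2)):
    status = "overload" if load > available else "slack"
    print(period, load, available, status)      # period 1: 2760 vs 2400 -> overload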
A shop calendar is maintained for finding the available work hours. With the assumption that each week comprises 40 work hours (excluding the provision of overtime), Table 8 is an example of a typical shop calendar.
7. Other Modules
FIGURE 11 Load profile: the required capacity (work minutes) in each period compared with the available capacity.
TABLE 8 A typical shop calendar, marking the working days (Monday through Friday) of each week and excluding weekends and holidays.
computer-based MRPII systems are highly integrated, they maintain a single set
of files. The appointed users can retrieve the information either as display or
printout through standard reports. Numerous such built-in reports can be found in commercial MRPII packages, which, in the background, use a standard query system to answer the users' inquiries.
Although the arrangement of data elements may vary from system to system, a typical system contains the following files:
Data input files
   Master data files
   Supporting data files
   Transaction (ongoing/continuous) data files
Output files
   Manufacturing/purchasing order files
   Status (of different departments) data files
I. Master Data Files
There are four master data files: item master file (IMF), BOM file, work
center file, and routing file.
Item Master File
This is also known as the part master record (PMR) file, which contains the
records of all parts/components, and even resources, required to build products. This file allows the planner to enter into the system the necessary part
information, which may be categorized as the following:
Basic part information:
Part identification. A unique alphanumeric number which is used as a key to the records. The ID may contain some group technology concepts in order to incorporate feature qualifiers, such as design and manufacturing attributes, according to the similarity of features among several components.
Descriptions. A lengthy character field to describe the name of the part,
corresponding to the part ID.
User code. Although the above two items are common to any user, sometimes the users may differ as to which items they are interested in. For example,
the accounting department may want to include packaging materials inside the
BOM such that the cost of the packages may be automatically taken into account while calculating the total production cost of the product, whereas the
design and engineering department is generally not interested in defining the
packaging materials in the product design and BOM [45]. Similarly, the planning and the shop floor departments are generally interested in putting the tools
as resources inside the BOM to facilitate capacity planning, but as it is not a
part of the product design, the design department is not interested to include
it. By having a user code with different levels and degrees of access, the needs
of various interested groups can be served. The reports are also displayed or
printed as per user code and other requirements.
Date added and obsoleted. The planner may be interested to keep records
as to when the part was defined the first time, and afterward when the part
would no longer be necessary for any reason, such as design change.
BOM File

This file describes the product structure, which is composed of the parts described in the PMR. The records in this file each contain one parent-component
relationship, with additional data, such as quantity per assembly, quantity code,
BOM-type code, BOM level code, effectivity date, alternate part, lead time for
assembly and lead time offset [27,45]. The search procedure for a component
in the BOM, along with other information for the part, is explained later in
this section.
Work Center File
The work center (WC) file and the shop routing file may reside under the
PAC module. A WC is a single machine or a place, or a group of machines or
places, that perform a particular operation on the shop floor. This is required
to plan capacity and production schedules. Each WC must be defined by the
required information, and must be identified by a unique number, known as the work center ID.

Routing File
The shop routing describes the sequence of operations, with required WCs
and tools, for producing a product. The routing should have an identifying
number distinct from the number of the part being manufactured. This allows
different parts to have the same shop routing [45]. A typical shop routing file
may contain the following information (data elements):
Part ID (to link a routing to a particular part), operation sequence number,
work center ID and descriptions, operation type (e.g., unit operation or batch
operation), tool name and number, run time, setup time, planned scrap, etc.
[27].
A standard shop routing is saved for a part. However, there are times when a
particular manufacturing order may require some deviations from the standard
one, due to specific reasons. In such a case, changes can be made in the standard
file and then saved as an "order routing file," which indicates its application to
a particular order only, whereas the standard file remains unchanged [27].
2. Supporting Data Files
Several other files contribute to the above master files to complete the system, depending upon the system's capability. Some major files are discussed,
with their functionality and major data elements, below.
Cost Data File
This is used to input and review the cost-related data of a part, defined in
IMF. Some major data elements are the part ID (a key to the IMF), part-type code, cost-type code (manually or automatically calculated), accounting-type code, unit
of measure, costing date, direct material cost, direct labor cost, material and
labor burdens, etc. [27]. Sometimes, a key to the general ledger (GL) accounting
system is included in this file. Otherwise, a separate file that acts as an interface between closed-loop MRP and a financial and cost accounting system/module
may be introduced. The accounting system is also linked to different transaction
files, such as materials purchase and sales, to exchange accounting data from/to
accounts receivable (A/R) and accounts payable (A/P) files under an accounting
system/module.
Inventory/Stock Files
There are two files associated with inventory control. The stockroom
data file maintains information on available stockrooms in the company. Its
data elements are a unique stockroom identification number and name, location, a code to flag whether it is a nettable (a permanent storage for regular
transactions against usage) or nonnettable (a location for temporary storage,
such as an inspection storage or a materials review board) stock, size/floor
space, etc. The inventory part data file is necessary for linking part/component
Purchasing Files
These files link the part/component in the IMF to its vendors and other purchasing data/information. Several files may exist in an MRPII system to maintain these data. The purchase part data file maintains the basic purchasing information of a part if it is a purchased part. The data elements that may be found in this file are the part ID (from the IMF), purchasing lead times, purchaser's ID, UOM, etc. The vendor data file keeps the records, such as vendor ID and name, address, and FOB points, of all possible vendors/suppliers of input materials for the company. The vendor part data file identifies the relationships between parts/components and their suppliers (i.e., it shows which part can be purchased from which vendor, so it basically provides a linkage between the purchase part data file and the vendor data file). It also provides price breakdown information for a part.
Sales Files
Two major files can be found in an MRPII system. The sales part data file identifies the MPS parts to be sold to the customers. It includes the part ID (from the IMF), sales price breakdown, taxation code, and sales UOM. The customer
file keeps the records of permanent, potential, and past customers for future
reference.
MRP File
This file contains information regarding a part's lot size policy in order
to determine the amount to manufacture or purchase, and the lead time for
backward scheduling. Additionally, it may include the planner's ID and name.
Other information in this file are basically fed from other pertinent files, as
given above.
Capacity Planning File
Although the current MRPII systems are unable to manage the CRP process, some systems provide connectivity with a third party scheduling software.
This is beyond the scope of discussion in this chapter. However, some commercial MRPII systems provide only a RCCP module with limited capability.
A RCCP module may contain two files for this process. The shop calendar file
maintains a yearly calendar marked with weekly and other holidays, in order
to exclude those days during backward scheduling. A capacity maintenance file
may contain data on planned utilization of each work center, planned idleness
of a machine for maintenance, etc.
For permanent data maintenance (input, display, print) purposes, in addition to the above major files, several other secondary/optional files, distributed
in the pertinent modules, may be found in a system. Some examples of such
secondary modules that may be found in a system are quoting (for quotation in
In addition to the above files for permanent data maintenance, several other
files are necessary for regular and ongoing continuous data inputs against all
transactions. Some major transactional data files are discussed below along
with their functionality.
Physical Inventory File
Although the purchase orders are created automatically from the MRP
explosion program, in case of urgency, the MRPII systems provide an additional
file for entering a purchase order without any formal planning steps. This is
the unplanned purchase order file, which includes part ID, vendor ID, line and
delivery number, dates, quantity, price, etc.
To monitor and track the materials that are already purchased and waiting
for reception or are in transit, a separate file, reception record file or purchase
order transit file, is necessary to keep the records of amounts already arrived,
the possible date of arrival, transportation mode, etc.
B. Data Storage and Retrieval in the MRPII System
In most traditional MRPII systems, a record dictionary and pointers [43] are used to identify and retrieve data, such as item/component/materials data from the BOM file. An example of a tree-type BOM of a Pen is given in Fig. 12, and the IMF for this BOM, with selected fields, is given in Table 9.
FIGURE 12 Tree-type BOM of a Pen (100): the Body (101) comprises an Ink Refill (103), Color (104), and Cover (105); the Cap (102) comprises Color (104) and a Cap-body (106).
TABLE 9 Item Master File with Pointers (Selected Fields)

Record address    Part ID    Part name     Vendor, and (Mfg./Pur.)    First parent record    First component record
1                 103        Ink Refill    TDCL (Pur.)                -                      10
2                 104        Color         TDCL (Pur.)                -                      13
3                 105        Cover         In-house (Mfg.)            -                      16
4                 102        Cap           In-house (Mfg.)            18                     11
5                 101        Body          In-house (Mfg.)            10                     12
6                 100        Pen           In-house (Mfg.)            12                     -
7                 106        Cap-body      In-house (Mfg.)            -                      15
As the part ID is a unique value for any component, it is used as the key to
index and search data [43], as given in Table 10. The corresponding BOM file,
with parent-component relationships and selected fields, is given in Table 11.
Now, if it is desired to find the single-level BOM for the subassembly Body (part ID 101), along with the information as to whether each component is a purchased part or an in-house manufactured part and, if purchased, the vendor name, the record dictionary (Table 10) is searched. It is found that for
this part (Body), the record address in IMF (Table 9) is 5. From Table 9, it is
found that the pointer to the first parent record in BOM file (Table 11) is 10.
When searched in the BOM file (Table 11) against the record number 10, it is
found that the first component for Body (part ID 101) is 103 (Ink Refill). The
pointer additionally indicates that the next component record can be found at
address 13, which is the component with ID 104 (Color). The pointer at this
record, in turn, indicates that the next component for this part is at record 16,
where the component with ID 105 (Cover) is found. Thus, it is found that there
are three components, with IDs 103, 104, and 105, for the parent part Body
(ID 101). Additionally, the records at addresses 10, 13, and 16 give pointers
to the IMF (Table 9) as 1, 2, and 3. When searched at record addresses 1, 2,
and 3 in Table 9, the single-level BOM for subassembly Body (part ID 101) is
established as shown in Table 12.
TABLE 10 Item Master, Indexed against Part ID (Key): Record Address Dictionary

Part ID (key)    Record address
100              6
101              5
102              4
103              1
104              2
105              3
106              7
TABLE 11 BOM File with Pointers

Record no.    Parent (Part ID)    Component (Part ID)    Component next where-used    Parent's next component    Component's IMF record addr.
12            100                 101                    -                            11                         5
11            100                 102                    -                            -                          4
10            101                 103                    -                            13                         1
13            101                 104                    18                           16                         2
16            101                 105                    -                            -                          3
18            102                 104                    -                            15                         2
15            102                 106                    -                            -                          7
TABLE 12 Single-Level BOM for the Subassembly Body (Part ID 101)

Part ID    Part name     Vendor, and (Mfg./Pur.)
103        Ink Refill    TDCL (Pur.)
104        Color         TDCL (Pur.)
105        Cover         In-house (Mfg.)
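The traversal just described can be sketched as follows; Python dictionaries stand in for the indexed files, the record numbers follow Tables 9-11 as reconstructed above, and only the fields needed for the traversal are kept:

# Sketch of the record-dictionary/pointer retrieval of a single-level BOM.
record_dictionary = {100: 6, 101: 5, 102: 4, 103: 1, 104: 2, 105: 3, 106: 7}

imf = {  # record address: (part ID, part name, vendor/source, first parent record)
    1: (103, "Ink Refill", "TDCL (Pur.)", None),
    2: (104, "Color", "TDCL (Pur.)", None),
    3: (105, "Cover", "In-house (Mfg.)", None),
    4: (102, "Cap", "In-house (Mfg.)", 18),
    5: (101, "Body", "In-house (Mfg.)", 10),
    6: (100, "Pen", "In-house (Mfg.)", 12),
    7: (106, "Cap-body", "In-house (Mfg.)", None),
}

bom = {  # record no.: (parent ID, component ID, parent's next component, component's IMF record)
    12: (100, 101, 11, 5),
    11: (100, 102, None, 4),
    10: (101, 103, 13, 1),
    13: (101, 104, 16, 2),
    16: (101, 105, None, 3),
    18: (102, 104, 15, 2),
    15: (102, 106, None, 7),
}

def single_level_bom(part_id: int):
    """Follow the pointer chain to list a part's immediate components."""
    record = imf[record_dictionary[part_id]][3]   # first BOM record where the part is the parent
    components = []
    while record is not None:
        _parent, _component, next_record, comp_imf = bom[record]
        comp_id, name, vendor, _ = imf[comp_imf]
        components.append((comp_id, name, vendor))
        record = next_record
    return components

print(single_level_bom(101))   # [(103, 'Ink Refill', ...), (104, 'Color', ...), (105, 'Cover', ...)]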
FIGURE 13 Sales and order processing subsystem: customer orders flow from the sales module through order processing (MPS module) and the complete MRPII steps to the finished good warehouse (IC module) and the accounting module, which issues the shipment slip.

FIGURE 14 Purchasing function subsystem: purchase requisitions from the MRP module go to the purchasing department (purchase module), which (1) selects vendors and assigns PRs to vendors and (2) adds purchase order (PO) notes and headings; the resulting purchase orders (known as open orders) are sent to the vendors.
FIGURE 15 Manufacturing subsystem: planned orders from the MRP explosion (MRP module) and unplanned manufacturing orders are approved and released to the shop floor (SFC/PAC module) with an inventory pick list and an order routing list; the stockroom (inventory control module) issues raw materials and components as stated in the pick list and updates the on-hand inventory, with links to vendors (purchasing module) and customers (sales module) for invoices and payments.
A vendor database with all possible vendors is maintained in the MRPII system, including their addresses, payment and shipment procedures, and possible price breakdowns. The complete process is known as vendor scheduling. Once a vendor is assigned to a particular PR and the PR is sent, the order status is changed from PR to purchase order (PO).
3. Manufacturing Subsystem
The manufacturing subsystem is responsible for generating and managing
manufacturing orders (see Fig. 15). Generally, the orders are generated through
the MRP explosion program, although some orders may be generated manually,
as unplanned order. These orders need approval in the computer system. The
manufacturing orders contain two vital lists, namely the inventory pick list and
shop routing list. The inventory pick list lists all the raw materials and components needed to manufacture a particular product. The inventory people verify
this list and then send the required materials to the shop floor through a menu
item, "issue stock." At the same time, the shop routing list, which describes the
requirements of operations, work centers, machines, tools, and sequence of operations to produce a particular product, is sent to the shop floor. After necessary
prioritization, using some priority rules, a shop schedule is generated for maintaining the schedule within available capacity. Once production is complete for
a particular order, the finished goods are sent to the finished good stockroom for
shipment to the customers. When payment is also completed from the customer
side, the accounting module describes the order as a "closed order".
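The order life cycle running through these subsystems can be summarized as a small state-transition sketch (the status names follow the text; the linear progression and the function names are illustrative simplifications):

# Sketch of the order status transitions described for the MRPII subsystems.
transitions = {
    "planned":  "approved",        # MRP-generated (or unplanned) order is approved
    "approved": "released",        # released as a PR (purchase) or a manufacturing order
    "released": "open",            # PR becomes a PO / the job is on the shop floor
    "open":     "closed",          # goods delivered or produced and payment completed
}

def advance(status: str) -> str:
    return transitions.get(status, status)   # closed orders stay closed

status = "planned"
while status != "closed":
    nxt = advance(status)
    print(status, "->", nxt)
    status = nxt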
The three functional subsystems described above are the most important
subsystems, although there are several other functional subsystems in the complete MRPII system.
FIGURE 16 A general flow chart of the CAPIS system: the master production plan, basic engineering data, the status of jobs and machines, and changes of operating conditions feed the OSM and SAC systems, whose satisfactory schedules are dispatched as production schedule lists for implementation in the factory, with production data fed back through terminal input devices (Ref. 17, copyright Taylor & Francis).
The SAC performs simulations to find out the optimal schedule, which is
then displayed at each work center by means of both visual display and/or
printed instruction sheets.
It can be observed that although this on-line computer system is powerful in production scheduling, it lacks the other functions of manufacturing planning, such as forecasting and linking the forecast to an MPS, which are provided in the recent MRPII systems.
2. Parts-Oriented Production Information System (POPIS)
This system is broader and one step closer to the MRPII system, in terms
of functionality, than the CAPIS system. It consists of three modules: the demand forecasting and production planning subsystem (Module I), the parts production subsystem (Module II), and the products assembly subsystem (Module III), all of which again consist of several submodules (see Fig. 17) [17].
The first module forecasts demand and sales, which is known as "primary
information." It also manages data/information relating to the reception of
FIGURE 17 General flow chart of POPIS, showing Module I (demand forecasting and production planning subsystem), Module II (parts production subsystem), and Module III (products assembly subsystem) and their submodules: demand forecasting (primary information), safety stock determination, order reception (secondary information), parts explosion, parts production and production assembly planning, machine and assembly loading, subcontracting and overtime decisions, parts production and assembly scheduling, parts production, products assembly, and shipment (Ref. 17, copyright Taylor & Francis).
orders, the determination of safety stock, and then the preparation of production plan, periodically, based on the primary information.
The second module, having input from the first module, prepares production schedule for the dependent parts (produced for stock) by properly sequencing them according to priority rules, and then manages the data/information
of these produced parts along with the purchased parts as part of inventory
control.
The third module, based on the information and data from the first module
and materials along with information from the second module, prepares an
assembly plan and executes the plan.
Although this computer-based information system has been proven to be
useful for resource planning in manufacturing, it has not gained much popularity in today's manufacturing industries.
3. Communications-Oriented Production Information and Control
System (COPICS)
This computer-integrated production information and management system
was proposed by the International Business Machines Corporation as a concept
of management operating systems (MOS) [17]. It deals with establishing a
production plan based on forecast, and executing the plan as per capacity by
purchasing raw materials and shipping the finished goods to customers. This is
quite close to the idea of the current MRPII system.
This system introduced the idea of bill of materials of MRPII, from which
an explosion is run to find out component requirements (see Fig. 18).
E. Information and Database Systems Centered Around MRPII
On several occasions, MRPII has been used as the center of a company-wide
information system, sometimes leading toward the idea of enterprise resource
planning (ERP), and supply chain management (SCM). Additionally, in several cases, the concepts of MRPII have been utilized to build customized
information systems for companies. That means, these data and information systems can also be considered as a version of MRPII [13,18,23,25,37,
54].
Although arguable, it can be stated that MRPII evolved toward ERP with
the inclusion of several facilities, like electronic commerce and electronic data
interchange, and toward supply/demand chain management with the inclusion
of a network of companies who trade among themselves, either as a supplier
or as a buyer. MRPII plays the central role of managing the data/information
of resources in this connection (Fig. 19) [23].
The limitations of the MRPII system, either as an independent system or as a part of a company-wide information system, have also been addressed on several occasions by integrating the information systems of other islands of manufacturing (see Fig. 20) [18,25,36].
FIGURE 18   Overview of COPICS: engineering and production data, customer order servicing, forecasting, master production schedule planning, inventory management, manufacturing activity planning, order release, purchasing and receiving, and stores control, organized around a common database.

From a survey, Meyer [25] identified four clusters of information for integration in manufacturing planning. These are (i) internally oriented administrative systems, e.g., inventory status, purchasing, and accounting; (ii) market-oriented administrative systems, e.g., order entry, sales, forecasting, and distribution; (iii) internally oriented technical control systems, e.g., shop floor control and quality reporting; and (iv) technical systems that link manufacturing to external groups and eventually the customer, e.g., design (CAD) and engineering. It can be noted that, of the four clusters of information, three belong to the integrated MRPII system, whereas only the last one is beyond the scope of MRPII. Meyer has shown this in a two-dimensional map, as seen in Fig. 21, where the islands of integration are linked with each other through a pivotal database, which is the MPS/MRP.
As opposed to the tradition of interfacing the databases of two systems, Hsu and Skevington [18] formulated a fundamental approach to the information integration of manufacturing enterprises, which includes MRPII as a key provider of data and information for several enterprise functions. This approach entails the use of an intelligent metadatabase, which differs fundamentally from the strategies of both interfacing and a prevailing "super database management system." They applied the "feature" concept, traditionally known as a technique for CAD/CAM integration, to information modeling for all the functions in which MRPII takes a major part. This knowledge-based modeling is organized around the metadatabase.
FIGURE 19   Information as driver for the supply/demand chain (Ref. 23, copyright Taylor & Francis): capacity management, supply chain management, MRP, aggregate production planning and scheduling, procurement, manufacturing, inventory, distribution, customers, and sales management, linked by periodical and rolling forecasts, bundled and daily orders, supply allocation, replenishments, order fulfillments, and performance feedback.

FIGURE 20   Systems integration around MRPII. SQC, statistical quality control, includes SPC (statistical process control); QMS, quality management system, includes QA (quality assurance); CRP, capacity requirements planning; GT, group technology; CAPP, computer-aided process planning; CAD, computer-aided design; CAM, computer-aided manufacturing (Ref. 36, reproduced with permission from Inderscience Enterprises Limited).

FIGURE 21   The islands of integration (shop floor control, quality reporting, manufacturing, design engineering, accounting, purchasing, inventory control, distribution, order entry, and sales) linked through the pivotal MPS/MRP database (after Meyer [25]).
While the subsystems in this integrated database, such as MRP, can function independently, their status and directories are monitored by the metadatabase.
The main difference in this system is that the conventional MRPII system's
functional entities, such as orders and schedules, are converted to features, and
the features and feature-based rules are then mapped onto data and knowledge
representations. In this case, an object-oriented (or frame-based) representation
for the feature hierarchy and dictionary is developed.
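To make the feature idea more concrete, the following minimal Python sketch is offered as an illustration only; the class, slot, and rule names are our assumptions and not those of Hsu and Skevington. It shows how a conventional MRPII entity, such as an order, could be recast as a frame-like feature holding named slots and feature-based rules that a metadatabase could map onto data and knowledge representations.

# A minimal frame-like "feature": named slots plus rules over the slot values.
class Feature:
    def __init__(self, name, **slots):
        self.name = name
        self.slots = dict(slots)          # slot name -> value
        self.rules = []                   # list of (description, predicate)

    def add_rule(self, description, predicate):
        self.rules.append((description, predicate))

    def violated_rules(self):
        # Return the descriptions of all rules whose predicate fails.
        return [d for d, p in self.rules if not p(self.slots)]

# A conventional MRPII entity ("order") expressed as a feature (hypothetical values).
order = Feature("customer_order", part_id="SM003", quantity=40, due_week=12)
order.add_rule("quantity must be positive", lambda s: s["quantity"] > 0)
order.add_rule("due date must lie in the planning horizon",
               lambda s: 1 <= s["due_week"] <= 52)

print(order.violated_rules())   # -> [] when the order satisfies its rules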
Harhalakis et al. [13] have also utilized the MRPII system's large database for further integration, to create an enhanced database system for a company. They are of the opinion that it is necessary to develop and share data from a variety of sources, especially the MRPII system. The authors have proposed the
integration of MRPII with computer-aided design (CAD) and computer-aided
process planning (CAPP), where they modeled and analyzed the system using
Petri nets.
Since integrated information and databases have become the center of research in CIM, the MRPII system has been used to integrate CAD, CAPP, and related functions, as illustrated in Figs. 22 and 23.
FIGURE 22   Components of the metadatabase (at the facility level): control engine, inference and mapping algorithms, view-entity relationship generator, database supervisor, enterprise schema, entity-relationship catalog, operating rules, manufacturing knowledge, feature dictionary, and database dictionary.

FIGURE 23   The metadatabase at the facility level, linking the MIS, DSS, MRPII scheduling, materials planning, and enterprise information planning systems through DBMS 1 to DBMS n and their databases DB 1 to DB n.
TABLE 13   Shared record attributes among CAD, CAPP, and MRPII [13]

Part record
  CAD:   Part number; Drawing number; Drawing size; BOM unit of measure
  MRPII: Part number; Drawing number; Drawing size; BOM unit of measure; Purchase/inventory unit of measure; Unit of measure conversion factor; Source code; Standard cost; Lead time; Supersedes part no.; Superseded by part number

Part revision record
  CAD:   Part number; Revision level; Effectivity start date; Effectivity end date; Status code
  MRPII: Part number; Revision level; Effectivity start date; Effectivity end date; Status code

Routing record
  CAPP:  Routing number; Part description; Unit of measure; Operation number; Operation description; Work center ID no.; Setup time; Machining time; Handling time; Run time; Feed; Speed; Depth of cut; Number of passes
  MRPII: Routing number; Part description; Unit of measure; Operation number; Operation description; Work center ID no.; Setup time; Status code; Run time

Work center record
  CAPP:  ID number; Description; Department; Capacity (HR); Rate code; Resource capacity; Dispatch horizon; Effectivity start date; Effectivity end date; Status code
  MRPII: ID number; Description; Department; Resource code; Begin date; End date; Status code
TABLE 14   Part revision and routing status codes in MRPII, CAD, and CAPP [13]. The three systems share a common set of status codes: W "Working"; R "Released" (an active part or routing whose design has been finalized and approved); H "Hold" (under review, pending approval, possibly with a new revision level; the part or routing should not be used by any system); O "Obsolete"; and D "Delete."
According to the authors [13], the Petri net theory has distinct advantages
that enabled them to model an integrated database system around MRPII. The
advantages are listed as: (i) its ability to model simultaneous events, (ii) its
ability to represent conflicts, (iii) its potentiality to identify the status codes,
using Petri net graphs, and (iv) its hierarchical modeling capability.
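As a rough illustration of the first two advantages, the Python sketch below (our own simplification, not the authors' model) encodes a tiny Petri net: places hold tokens, a transition fires only when all of its input places are marked, and two enabled transitions that share an input place are naturally in conflict, much like competing status changes on a shared part revision.

# Minimal Petri net: places hold token counts; a transition fires when all
# of its input places hold at least one token.
class PetriNet:
    def __init__(self, places):
        self.marking = dict(places)              # place -> token count
        self.transitions = {}                    # name -> (inputs, outputs)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking[p] > 0 for p in inputs)

    def fire(self, name):
        if not self.enabled(name):
            raise ValueError(f"transition {name} is not enabled")
        inputs, outputs = self.transitions[name]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] += 1

# Assumed status flow for a shared part revision: Working -> Released,
# then Hold and Obsolete compete for the same "released" token (a conflict).
net = PetriNet({"working": 1, "released": 0, "hold": 0, "obsolete": 0})
net.add_transition("approve", ["working"], ["released"])
net.add_transition("review",  ["released"], ["hold"])
net.add_transition("retire",  ["released"], ["obsolete"])

net.fire("approve")
print(net.enabled("review"), net.enabled("retire"))   # True True: conflicting events
net.fire("review")
print(net.marking)   # {'working': 0, 'released': 0, 'hold': 1, 'obsolete': 0}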
FIGURE 24   An overview of an abstraction hierarchy for a product structure: product family, product group, product, assembly, part, and piece part, related through generalization, association, and aggregation (Ref. 54, reproduced with permission).
FIGURE 26   Relational view of the net requirement computation, relating the gross requirement, cumulative gross requirement, inventory (PART_ID, PERIOD_ID, ON_HAND), inventory out (PART_ID, PERIOD_ID, INV_OUT), and net requirement (PART_ID, PERIOD_ID, NET_REQ) relations (Ref. 54, reproduced with permission).
With this representation the netting procedure is always feasible and leads to net requirements at the same level of abstraction [54]. Similarly, several other MRPII procedures, such as lot sizing, MRP explosion, and lead time offsetting, can also be represented in a relational view. Finally, a relational database system integrates the representation of abstraction hierarchies and planning procedures [54].
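A minimal sketch of the netting step in this relational spirit is shown below. It is written in Python rather than SQL, the part and column names follow Fig. 26 but are otherwise our assumptions, and it simplifies full MRP netting (no carry-forward of projected inventory across periods): the relations are keyed by part and period, and net requirements are derived from them row by row.

# Relations represented as dictionaries keyed by (part_id, period_id).
gross_req = {("P1", 1): 100, ("P1", 2): 120, ("P1", 3): 80}
on_hand   = {("P1", 1): 150, ("P1", 2): 50,  ("P1", 3): 0}

def net_requirements(gross, inventory):
    """NET_REQ = max(GROSS_REQ - ON_HAND, 0) per (PART_ID, PERIOD_ID)."""
    # Simplification: a full MRP run would also carry projected on-hand
    # inventory forward from period to period.
    return {key: max(qty - inventory.get(key, 0), 0)
            for key, qty in gross.items()}

print(net_requirements(gross_req, on_hand))
# {('P1', 1): 0, ('P1', 2): 70, ('P1', 3): 80}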
A prototype system was developed, with database definition and processing in the relational DBMS Oracle V5.1B, its SQL interface Oracle SQL*PLUS V2.1, and its C interface Oracle PRO*C V1.2. This system provides the
facilities of normal materials planning, and additionally reusability and simulation. It is true that some of the complex planning procedures cannot be represented solely as a sequence of relational operations. Those were implemented
in C programs. Database functions and relational operations were embedded
in these procedural programs as high-level SQL commands. All base tables are
loaded/unloaded via import/export procedures from a host-based MRP system
that supports a hierarchical database model. The authors hoped that, despite several drawbacks, the continuing development of relational DBMS and the integration of database representation techniques with inferential processing would allow the above concepts to be applied to the design of more complex planning systems at a reasonable level of performance [54].
Olsen et al. [35] described the use of this approach in the case of customer-oriented products, which require a large number of variants. If separate BOMs were created for each variant, the database would become so large that it would be impossible (or at least inefficient) to manage in the computer system. As a solution to the problem of managing a large number of product variants, the authors therefore suggested a generic structure of BOM. They designed this generic BOM using a relational database technique, but their approach differs from the traditional method of generic BOM. Earlier attempts to develop generic BOMs were directed toward either creating planning BOMs or improving the tabular structures only. As these techniques have several limitations, including inflexibility, the authors proposed resolving them by applying programming language notation [35], although the use of the object-oriented technique for this purpose has also been reported [7,8]. The authors, however, disagreed with the idea of object-oriented techniques for this purpose, as it violates the requirement of describing components independently of their utilization.
Olsen et al. [35] proposed building the generic BOM by utilizing constructs of programming languages, namely the procedure concept, variables, the input concept, and the selection (case) statement. This has been termed a procedure-oriented approach. In this approach, the variant, or customer-specific product specification, is established through input from the user. A prototype was tested on a PC under MS Windows. The programs were implemented with the SQLWindows CASE tools from Gupta Corporation.
The description of the generic BOM (GBOM) as a program enables the
system to support the user in the product variant specification task by defining
each component with header and body, which is stored as a database record.
Variants are expressed simply through attributes.
Suppose that a "stool" has a component "seat," which may be offered in three colors, namely red, blue, and white. The following code defines the "stool" component:
component 200 is
  name("stool");
end component;

body 200 is
  include 400;
  include 500;
end body;
In the above code, the procedure construct "component" is illustrated. While the head identifies a component and presents its attributes, the body presents the
goes-into relationships, i.e., BOM structure. Here, the head part of the component "seat" has a single attribute seatColor, which gives optional views of the
component, i.e., may have different colors of red, blue, or white, based upon
the user's selection. The other codes show it as a part of "stool" (ID: 200). The
body part of the "seat" may be defined as shown in Fig. 27 [35].
In the figure, the head part of the code is shown only for the cover (ID: 450), although the seat is composed of a cushion, chipboard, and cover. The heads of the other components can be defined similarly. The code shows that the
seat color may vary, depending upon the options of colors of cover, which may
be selected by the user during variant creation. The attribute coverColor is,
thus, used as a specification of the product variant. Further specifications with
options in part features can be added, such as options of different seat sizes.
The generic view of the complete procedure to create a specific BOM from the
generic one is shown in Fig. 28 [35].
The system in Fig. 28 has three tasks: the Generic BOM Task (GBST), the Product Variant Specification Task (PVST), and the BOM Conversion Task (BCT). The
GBST specifies the generic BOM for one or more products. At the stage of
PVST, the attributes for a specific variant are given, in order to create a specific
BOM for a particular product. At the step of BCT, each component variant
referenced in the ABOM is replaced with a unique item number. The translation
tables (TTAB) contain the necessary information for conversion from attribute
values to component numbers [35].
The data structures, output from these three tasks, are: (i) The GBOM
describes all components used in a BOM; (ii) ABOM describes a specific product
variant; and (iii) NBOM describes the BOM in terms of indented BOM for use
in a conventional information system, such as for use in commercial MRPII
systems [35].
body 400 is
  include 410;  cushion
  include 420;  chipboard
  include 450 with;  cover
    coverColor(seatColor);
  end include;
end body;

component 450 is
  name("cover");
  coverColor(red|blue|white);
end component;

FIGURE 27   A GBOM view [35].
FIGURE 28   Tasks and data outputs in creating a specific BOM from the generic one [35]. The Product Variant Specification Task outputs an attribute-identified specific BOM (ABOM), e.g., parent 200 with child 400.red, and parent 400.red with children 410 and 420, each with quantity 1; the BOM Conversion Task (BCT) then outputs a number-identified specific BOM (NBOM) listing, by level, the items 200, 400.1, 410, and 420, each with quantity 1.
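The GBOM-to-ABOM-to-NBOM pipeline just described can be sketched in a few lines of Python. This is our own simplification, not the SQLWindows prototype: the component numbers follow the stool example, the leg component (500) and the translation-table entries are invented, and TTAB is reduced to a plain dictionary.

# GBOM: parent -> list of (child, attribute expression or None).
gbom = {
    200: [(400, "seatColor"), (500, None)],                  # stool = seat + legs (500 assumed)
    400: [(410, None), (420, None), (450, "seatColor")],     # seat = cushion + chipboard + cover
}

# Translation table (TTAB): attribute-identified variant -> unique item number.
ttab = {("400", "red"): "400.1", ("450", "red"): "450.1"}

def specify_variant(gbom, root, attributes):
    """PVST: produce an attribute-identified BOM (ABOM) for one variant."""
    abom = []
    def expand(parent_id, parent_label):
        for child, attr in gbom.get(parent_id, []):
            label = f"{child}.{attributes[attr]}" if attr else str(child)
            abom.append((parent_label, label, 1))             # (parent, child, quantity)
            expand(child, label)
    expand(root, str(root))
    return abom

def convert(abom, ttab):
    """BCT: replace attribute-identified ids with unique item numbers (NBOM rows)."""
    def resolve(label):
        if "." in label:
            base, value = label.split(".")
            return ttab[(base, value)]
        return label
    return [(resolve(p), resolve(c), q) for p, c, q in abom]

abom = specify_variant(gbom, 200, {"seatColor": "red"})
for row in convert(abom, ttab):
    print(row)          # e.g. ('200', '400.1', 1), ('400.1', '410', 1), ...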
A hybrid approach has also been proposed [47] to model and integrate data in a CIM environment. The authors present CIM as a
system composed of design and engineering data and some production-related
data, which is in fact an integrated MRPII system. Interestingly, the authors
prescribe the use of object technology in the case of the first type of data, and
the relational data technique for the MRPII-related functions. The logic behind
this is explained:
Data abstraction provides a means for organizing data into objects that
are semantically meaningful to application systems. Object Oriented programming techniques enhance the semantic content of data by defining
relationships, possibly with inheritance of properties for CIM applications
[the authors here mean the engineering and design applications] because
the fundamental principles behind it are the creation and manipulation
of data objects, and these are exactly the activities done during typical
engineering, design and manufacturing processes. The object paradigm
also provides a flexible data model in which the complex data structures
found in many design and manufacturing applications can be created and
manipulated easily. This flexibility also helps in integration of the many
applications in a typical CIM environment. [47, p. 144]
At the same time, the authors were of the opinion that other aspects of CIM,
arguably the MRPII functions, having quite similar data, fit into the realm of
traditional relational database technology. As a result, the CIM database management system was shown as a hybrid system, comprising object technology
and relational data technology. This hybrid system was proposed as the ROSE
Data Manager. The framework of this system, as shown in Fig. 29, contains a
distributed object management component, which can be used to transfer data
between an object database (design, engineering, etc.) and a relational database
(MRPII functions).
In the majority of cases, MRPII systems have been built on relational database management systems, although these have shortcomings in processing long-duration transactions, in storing images or large text items, and in handling complex data structures. Currently, the research direction in database management systems for MRPII (and other manufacturing systems, such as CAD and CAM) has moved from the relational database toward the object-oriented database.
FIGURE 29   Framework of the ROSE Data Manager: a search engine (extended SQL), user interface manager, application program interface, and distributed object manager linking the object database with a relational DBMS.
FIGURE 30   Structure of the information manager: a global information manager (GIM), local information managers (LIM), data object servers, meta objects, and virtual enterprise units.
FIGURE 31   Manufacturing workflow class objects: event handler, user interface, manufacturing workflow engine, manufacturing workflow, manufacturing control interface, manufacturing control, manufacturing data interface, manufacturing data, and database interface class objects.
TABLE 15   Object class patterns, their domains, and solutions

Functional object class pattern: for function-oriented systems from which it is hard to extract objects.
Interface object class pattern: for the various objects needed for accessing other objects.
Multiple-strategy pattern: for algorithms utilized in an object class member function.
Chained strategy pattern (structural pattern): for algorithms utilized in object class member functions generated through the multiple-strategy pattern; if the previously generated member functions of an object class can be implemented using the multiple-strategy pattern, the objects are connected with the supplementary algorithm in the object class.
TABLE 16
OrderManagement
MC class
Interface
ExecOrderAccept()
OrderAcception
OrderDataAccept(OrderData)
ExecOrderCheck()
OrderAvailabilityCheck
OrderDataCheck( OrderData)
ExecDuedateEstimate()
DuedataEstimation
OrderDataEstimate(OrderData)
ExecOrderSequence()
OrderSequencing
OrderDataSequence (OrderData)
FIGURE 32   Implementation strategy for CIM: drivers (logistic planning, information system) and elements (strategic projects, soft automation, hard automation).

FIGURE 33   Entity/relationship model (RTMS).
FIGURE 34   Presentation domain components (models, views, elements, inter-domain services, and messaging over an object request broker on top of an operating system abstraction) and a fragment of a view definition file, e.g., [Type]"B_TEXT", [TEXT]"ItemNo.", [Type]"B_ENTRY", [Name, Id]"Item".
The business object is a representation of only a part of the overall E/R model, the part that a user deals
with in the ERP system. Conceptually, the business object is a hierarchical composite object that has a root subject entity. For instance, a purchase order of an
ERP/MRPII system is a form created and filled out from the purchase request,
with associated information of a header, item names and IDs, vendor name,
price, etc., whereas an E/R relationship model would show it as a combination
of several entities interacting with it [58].
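A hypothetical Python sketch of such a hierarchical business object is given below; the class and field names are ours and not Ng and Ip's. The purchase order is the root subject entity, and the header data and line items hang off it as a composite, even though an E/R model would store them as several related entities.

from dataclasses import dataclass, field
from typing import List

@dataclass
class OrderLine:
    item_id: str
    item_name: str
    quantity: int
    price: float

@dataclass
class PurchaseOrder:
    """Root subject entity of the business object (hypothetical structure)."""
    po_number: str
    vendor_name: str
    lines: List[OrderLine] = field(default_factory=list)

    def total(self) -> float:
        # The composite can answer questions that span its children.
        return sum(line.quantity * line.price for line in self.lines)

po = PurchaseOrder("PO-1001", "Acme Tools")
po.lines.append(OrderLine("IT-17", "Cutting insert", 200, 1.25))
po.lines.append(OrderLine("IT-42", "Tool holder", 10, 48.00))
print(po.total())   # 730.0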
As Ng and Ip [58] explain, the presentation domain, responsible for managing the user interface and related matters, consists of three components, namely,
business object models, business object views, and business object elements. The
model, as the central control point of a business object, represents a hierarchical
data collection, with a root object that identifies the business object. It is responsible for managing the data cache, requesting data from servers, updating the
servers against transactions, and interacting with the user through the business
object view. Each E/R represented in the business object model is implemented
as a server object in the logic domain. The business object view is responsible
for displaying the information in GUI windows and for handling user input/output through peripherals, without involving the server. The view layout is defined in ASCII format in a file, termed the view definition file (VDF), as shown in Fig. 34.
The functions of the logic domain are to manage corporate data storage
and retrieval, trigger event-driven processes, and provide program logic for the
business operations. Its components are entities, relationships, transactions,
and databases (Fig. 35). In this logic model, the main target is to establish the entity objects and their relationship objects. The hierarchical object model is shown in Fig. 36. The relationships manager in the logic domain is responsible for propagating the event-driven business processes, providing query methods for listing the members of a relationship, assigning sequential keys to entities by querying the database, and resolving concurrency locks and registering interest. The transaction manager is responsible for managing the activities of the database, including data updating and consistency [58].
FIGURE 35   Logic domain components: the transaction manager, entities manager, and relationships manager, connected through an object request broker and inter-domain services on top of an operating system abstraction.

The reporting domain is responsible for presenting reports, graphs, and queries to the user on screen and printer through the Presenter; for communicating with the logic domain through the database object, which calls the DBMS to perform SQL operations through the Extractor; and for analyzing, calculating, sorting, and assimilating the data for a report.
A. Object-Oriented MRPII Modules
The development of object-oriented systems for individual modules/functions of the MRPII system has been reported by several authors [7,8,11,12,44,49].
B. Object-Oriented Inventory Control System
Sinha and Chen [44] have suggested an object-oriented inventory control system using the hierarchical inventory management model introduced by Browne [4]. The multilevel (inventory control) system has been designed utilizing the aggregate, family, and item levels shown in Fig. 37.
FIGURE 36   Hierarchical object model of the logic domain [58]: the root object specializes into element, entity (person, organization, place, thing, event), and relationship (composition, reference) classes, with instances such as employee, company, inventory, fixed asset, and orders, and relationships such as "order has line" and "line refs item".
FIGURE 37   Object-oriented hierarchical inventory system [44]: (a) the inventory hierarchy and (b) the inventory data-object hierarchy, with aggregate, family, and item (Item 1 to Item n) levels.
Because attributes defined at the higher levels are inherited by Class Family, they are available within Item. Here, X represents the order quantity of an item.
It is observed that this object orientation in inventory eliminates the duplication of data storage of ordering and holding costs for individual items, as
they are stored under Family or Aggregate. The authors have also developed,
as a user interface, different objects for this inventory control system. These are system-level objects, such as Print, Save_List, and Graph (cost curve), and user-level objects, which are described above (e.g., Item, Family, Aggregate) [44].
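A minimal Python sketch of this idea follows; the class and attribute names, cost figures, and the EOQ formula used for the order quantity are our assumptions, not Sinha and Chen's code. Ordering and holding costs are defined once at the Aggregate or Family level and reach individual Items through inheritance, so they are not duplicated per item.

class Aggregate:
    holding_cost_rate = 0.25          # stored once at the aggregate level (assumed value)

class Family(Aggregate):
    ordering_cost = 50.0              # stored once per family (assumed value)

class Item(Family):
    def __init__(self, item_id, unit_cost, annual_demand):
        self.item_id = item_id
        self.unit_cost = unit_cost
        self.annual_demand = annual_demand

    def order_quantity(self):
        # Economic order quantity using the inherited family/aggregate costs.
        holding = self.holding_cost_rate * self.unit_cost
        return (2 * self.annual_demand * self.ordering_cost / holding) ** 0.5

item = Item("IT-17", unit_cost=4.0, annual_demand=1200)
print(round(item.order_quantity(), 1))   # the order quantity X for this item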
From the success of this system, it is suggested that benefits of inheritance of
object orientation can be utilized to develop other subsystems of MRPII, where
many things are hierarchical in nature, e.g., (i) BOM, (ii) Aggregate planning
and MPS, (iii) shop level, family level, and work center level scheduling, and
(iv) capacity planning.
C. Object-Oriented PAC System
Reports also reveal that object-oriented systems have been developed for the production activity control (PAC) subsystem, alternatively known as shop floor control, of the overall resource planning system.
Gausemeier et al. [11] report a case of successful development and implementation of such an object-oriented PAC system, integrated with other modules of MRPII. In this respect, it must be noted that the MRPII system's PAC module is limited in its functionality, thereby needing additional third-party development of a detailed PAC system, and integration with it, for full-fledged operation or customized control needs.
Gausemeier et al. [11] are of the opinion that although the material flow in
the shop floor has reached a high level of automation, information processing
related to that has not yet achieved that level. With differences between job
processing situations and environments in different shops, disturbances (e.g.,
accidental events), and corresponding capacity changes in a single shop, and
with ever-changing customer requirements from time to time, the PAC system
must be flexible enough to accommodate those changes. Additionally, it must be
noted that the PAC subsystem is one of the areas of a manufacturing information system (another, for instance, is the BOM-designed integrated system) that
must deal with complex databases, because of the presence of heterogeneous
types of hardware systems. Thus, these areas need special attention regarding
system development and maintenance. Several authors [11,12] have advocated
that object-oriented systems can be best utilized to deal with such complex
requirements (Fig. 38).
Gausemeier et al. [11] also opine that the PAC system needs to fulfill three special requirements: flexibility toward requirement changes over time, a coordinated communication structure for integrating different devices, and efficient scheduling and rescheduling capability in cases of disturbances and overloads. These requirements can best be satisfied by object orientation. Without detailing the object structures and patterns, Gausemeier et al. [11] give a brief outline of the system as follows.
Gausemeier et al. [11] draw an analogy between the basic construction and operation principles of natural objects (or organisms) and manufacturing objects (Fig. 38).
FIGURE 38   Outline of the object-oriented PAC system [11]: a graphic terminal and user interface; the MRPII system (manufacturing orders and reporting) connected through an MRP interface; configuration manager, global tool management, maintenance management, on-line simulation, MMI, and DNC tools on a software bus; and agents and objects linked to the materials flow through MAP/field bus interfaces.
For scheduling, the IOs calculate grades for competing devices, and the one with the highest score is selected for the operation; this is a kind of priority-fixing mechanism. The results of the scheduling are set into the time schedule that exists for every IO. In case of disturbances, which are quite common in any shop, the IO first tries to fix the problem using the strategies and procedures encapsulated in its methods. If the problem goes beyond those methods, the operator is informed through dialogues [11].
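The grade-and-select idea can be sketched as follows; this is a hypothetical Python illustration, and the grading criteria, weights, and names are our assumptions rather than those of Gausemeier et al. Each intelligent object wrapping a device computes a score for a requested operation, and the dispatcher picks the highest bidder.

class DeviceIO:
    """Intelligent object wrapping one device on the shop floor."""
    def __init__(self, name, queue_length, setup_time, capable):
        self.name = name
        self.queue_length = queue_length
        self.setup_time = setup_time
        self.capable = capable

    def grade(self, operation):
        # Higher is better; incapable devices simply do not bid (assumed scoring rule).
        if operation not in self.capable:
            return None
        return 100.0 - 10.0 * self.queue_length - self.setup_time

def select_device(devices, operation):
    bids = [(d.grade(operation), d) for d in devices]
    bids = [(g, d) for g, d in bids if g is not None]
    return max(bids, key=lambda b: b[0])[1] if bids else None

cells = [DeviceIO("mill-1", 3, 12.0, {"milling"}),
         DeviceIO("mill-2", 1, 20.0, {"milling", "drilling"}),
         DeviceIO("lathe-1", 0, 5.0, {"turning"})]
print(select_device(cells, "milling").name)   # "mill-2" (grade 70 vs. 58)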
Grabot and Huguet [12] also presented an object-oriented PAC system
development method at Aerospatiale. The authors opine similarly to Gausemeier et al. that a PAC module changes from time to time, because of the
changes of jobs of different customers, disturbances in the shop, expansions
in shop size, inclusion and exclusion of resources, i.e., in general, because of
factory dynamics. This makes the development process of a PAC module and
subsequent integration with an MRPII system difficult, time consuming, and
costly. The authors are of the opinion that object-oriented technology, with
the ability to reuse past experience by utilizing reference models, can be of
tremendous help in this regard.
The authors consider a PAC system composed of PAC modules, which
communicate among themselves and with other external systems. There may
be several types of modules in a PAC system, depending upon the workshop
characteristics. The authors found several such modules, three of them being
given in Fig. 39, as example. These modules can be organized through a hierarchic or distributed structure. The design of a PAC system depends on the
association of these modules. Once they are defined, the activities involved in
the corresponding functions associated with each module must be described.
FIGURE 39   Examples of PAC modules [12]. Each module combines Plan (planning time horizon T1, aggregated data), Dispatch (time horizon T2 < T1), React (reaction time horizon T3 <= T2 on the dispatch function and T4 > T3 on the plan function), and Follow-up (raw data) functions, synchronized with resources and other modules.
The OM and OIG of the FUSION method have been suggested for describing each component, whereas the state-chart formalism has been suggested for describing the internal behavior of the classes (i.e., the dynamic model). Two CASE tools, FUSION softcase and Paradigm Plus, have been tested and proven to be useful. The authors are of the opinion that reusability is an essential requirement in PAC system development, which can be ensured by first developing the generic components and then efficiently managing object and object pattern (component) libraries [12].
Smith and Joshi [46] present an object-oriented architecture for the shop floor control (SFC) system, though under the CIM environment. It must be noted that SFC is a dynamic module of MRPII which, viewed from the OO standpoint, is a collection of physical and conceptual objects, such as orders (to be completed), parts (to be manufactured and/or assembled), machines, equipment, material handling systems (e.g., conveyor belts, forks, robots), and people. The target of OO SFC is to create suitable objects with the necessary behavior, attributes, and methods for interaction among objects, in order to complete production as per the plan generated by the MRPII system, be it as a stand-alone island or under a CIM environment.
FIGURE 41   A PAC module and its environment [12]: it receives the order list and parts to manufacture (material) from production management (MRPII) and from the transportation system between workshops, and returns manufactured parts and orders follow-up.
FIGURE 42   Message flow among PAC control objects (job manager, dispatch manager, and follow-up manager) for creating planned operations, dispatching jobs to resources, and signaling job completion [12].
Smith and Joshi [46] have defined the organization of an SFC system as hierarchical, with three levels (shop, workstation, and equipment), in order to improve controllability and make it generic. The equipment level provides a view of the physical machines on the shop floor. The authors have created several objects at this level; those of particular importance for SFC under MRPII are material processing machines (MP), consisting of the machines, inspection devices, etc. that autonomously process a part; material handling machines (MH), for intraworkstation part movement functions (e.g., loading and unloading processing machines); and material transport machines (MT), for interworkstation part movement functions (e.g., transporting parts from one station to another). At the workstation level, a workstation is considered a set of equipment-level objects plus some additional objects, such as a part buffer. The shop floor level, composed of workstation objects, is responsible for managing part routing and reporting shop status to other modules.
FIGURE 43   PAC application components on a common system infrastructure: databases, common facilities, communications, languages, object services, and an object bus supporting components for PAC services (generic and task management), workshop management (workshop structure, store task), equipment management (material management), technical information management (product and process specifications, operator polyvalence), resource management tasks (production, maintenance, resource and equipment configuration), MCS management (planning, dispatching, reaction to perturbation, and follow-up), and quality management (quality control, traceability).

This SFC system developed a shop floor equipment controller class, consisting of communications and storage classes. These aspects mostly relate to the information integration of CIM hardware systems, which is beyond the scope of this chapter. However, the system has also introduced scheduling as part of the OOSFC (object-oriented SFC) information system, which has direct relevance to MRPII's PAC aspect. The control cycle (function) has been presented as a generic sequence of activities with the following pseudocode [46]:
wait for startup message
read configuration information
do until shutdown message received
  call scheduler
  call task executor
  read messages
  update system states
enddo
write configuration information
shutdown
The separate calls of execution and scheduling in the above codes of the SFC
system have kept scheduling separate from execution. While "execution" is
responsible for determining whether a particular action is valid, and performing
it if valid, "scheduling" is responsible for determining an efficient sequence of
actions. The scheduler is responsible for placing task request records in the task
queue. The task executor, after examining the records, executes the tasks if the
preconditions (such as whether the machine is idle) are met. By separating the scheduling and execution functions, different scheduling algorithms can be "plugged in" to the controller, depending on the production requirements. The system has been implemented in C++ [46].
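A hedged Python sketch of this separation follows; it is our own illustration, not the authors' C++ code, and the task fields and scheduling rules are assumptions. The controller loop only calls a scheduler, which orders the task queue, and a task executor, which checks preconditions such as machine idleness before performing a task; any rule with the same interface can be plugged in.

def spt_scheduler(tasks):
    """One pluggable rule: shortest processing time first."""
    return sorted(tasks, key=lambda t: t["proc_time"])

def edd_scheduler(tasks):
    """Another pluggable rule: earliest due date first."""
    return sorted(tasks, key=lambda t: t["due"])

def task_executor(task, machine_idle):
    # Execution only checks validity (the precondition) and performs the task.
    if machine_idle.get(task["machine"], False):
        machine_idle[task["machine"]] = False
        return True
    return False

def control_cycle(tasks, machine_idle, scheduler):
    executed = []
    for task in scheduler(tasks):          # scheduling is kept separate from execution
        if task_executor(task, machine_idle):
            executed.append(task["id"])
    return executed

tasks = [{"id": "J1", "machine": "M1", "proc_time": 5, "due": 20},
         {"id": "J2", "machine": "M1", "proc_time": 2, "due": 30},
         {"id": "J3", "machine": "M2", "proc_time": 9, "due": 10}]
print(control_cycle(tasks, {"M1": True, "M2": True}, spt_scheduler))  # ['J2', 'J3']
print(control_cycle(tasks, {"M1": True, "M2": True}, edd_scheduler))  # ['J3', 'J1']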
Moriwaki et al. [26] have developed an object-oriented scheduling system. Although the system is real-time and refers specifically to FMS and autonomous distributed manufacturing, its concept is well applicable to the MRPII system in general. The study first discusses the architecture of autonomous distributed manufacturing systems (for FMS), and then its real-time scheduling system. Since the architecture of FMS is beyond the scope of this chapter, the following section discusses only the scheduling function, as it has direct relevance to MRPII's PAC module.
A monitoring system has an important role in this scheduling system, as it
needs to monitor and inform the SFC system of the needs for real-time scheduling. The status of the objects is changed according to the status data transmitted
from the monitoring system. Thus, the objects are autonomous. The conflicts
between objects are resolved through negotiation. The ultimate manufacturing processes are done according to the schedules determined in the system. It
considers only machining processes, and not any assembly or transportation
process, and is based on the assumptions that the process plans and machining
times of individual work pieces are fixed [26].
Figure 44 shows the objects, their attributes, and methods. The objects are classified into four classes: work pieces, process plans of the work pieces, equipment, and operations. The objects representing the work pieces include three classes of objects: part type, lot, and part. While the "part type" object represents the kind of part, the "lot" object represents individual lots of that part type, and the "part" object corresponds to individual work pieces in the lot. The "equipment" class is represented by the equipment type and equipment objects. The process plans of individual part types are established by the process objects and preference relations objects. The operation objects give the
relations among the processes of the part types and the feasible "equipment types" that can perform the operations. An example of such processes, operations, and their preference relations for part types A and B is shown in Fig. 45. In this figure, ai and bj are the processes, and the arcs are the preference relations among the processes. Each process has been defined as a set of available alternative operations (e.g., aik, bjm), which specify a feasible type of machining cell ("equipment" and "equipment type" objects) [26].

FIGURE 44   Objects, attributes, and relations in the scheduling system [26]: part type, lot (lot size, due date), part, processes, preference relations, and operations (date, position, status), with operation states such as operation to be scheduled, scheduled operation, finished operation, and unnecessary operation, linked by is-a and part-of relations.
FIGURE 45   Processes, operations, and preference relations for part types A and B [26].
FIGURE 46   Machining cells and lots in the autonomous distributed scheduling system [26].

FIGURE 47   The real-time scheduling cycle [26]: when an operation ends or a new lot is input, the monitoring system triggers the coordinator object, which selects and negotiates operations against a rule base and modifies the status of the objects accordingly.
Ulrich and Durig [50] present a capacity planning and scheduling system using object-oriented technology. Together, these two functions aim at placing a production schedule on the time axis (time horizon) while consuming the available resources.
In order to ensure reusability of objects, a three-level OO system is suggested as follows [50]:
Universal platform. It provides a universal base (library) of available objects that are quite generic and, thus, applicable to several kinds of application
areas/functions.
Specific platform. This becomes more specific to a functional application
area (e.g., capacity planning). The objects are defined based on the field/function
in addition to the universal ones.
Individual supplements. For a field (such as capacity planning), individual
realizations need supplements to the specific platform. It may be an individual
capacity planning solution for a particular case, whereas the specific platform
defines the capacity planning method in general.
Three basic elements, as listed below, are identified as describing the problem
at a very high level of abstraction for the specific platform [50].
Resources. It represents capacity of resources, such as machines and
labor.
Planning items. These are the production orders, generated internally, or
ordered externally.
Time axis. It shows a shop calendar on a continuous time axis.
The following two data administration techniques for administering a set of
objects of the same kind that belong to a collection are developed: (i) Open
administration, which gives the user access to these objects and allows normal database operations to be performed on them, and (ii) hidden administration, which does not give the user access to these objects; here the administration is performed automatically by the system [50].
The consumption resources comprise consumption elements, which perform the acts of consumption, for which a hidden data administration is necessary to manage those "acts." Each resource administers its adjoined consumption elements according to a well-defined pattern, e.g., intervals and areas. A
link to the shop calendar helps to perform that along the time axis. A scheduling of the consumption elements along the time axis according to the defined
pattern determines the availability of resources [50].
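The following Python sketch illustrates this arrangement under assumed names, not Ulrich and Durig's classes: a resource hides the administration of its consumption elements, and callers only ask whether an interval on the time axis is free and request a new consumption element, while the overlap bookkeeping stays inside the resource.

class Resource:
    """A capacity resource that administers its own consumption elements."""
    def __init__(self, name):
        self.name = name
        self._elements = []            # hidden administration: (start, end, planning item)

    def is_free(self, start, end):
        return all(end <= s or start >= e for s, e, _ in self._elements)

    def consume(self, start, end, planning_item):
        # Intervals are scheduled along the time axis only if they do not overlap.
        if not self.is_free(start, end):
            return False
        self._elements.append((start, end, planning_item))
        return True

line = Resource("final-assembly")
print(line.consume(0, 8, "order-101"))    # True: interval is free
print(line.consume(6, 10, "order-102"))   # False: overlaps the first element
print(line.consume(8, 12, "order-102"))   # True: scheduled after it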
This kernel of resources and consumption elements, administered by the
data administration technique, is linked to the production process environment for consumption (Fig. 48). Production factors, such as machines and labor, comprise resources, represented by the workload or the material available for consumption by the production process. The planning items utilize the consumption elements, which are allocated to resources. The user manages these directly, using open data administration [50].
FIGURE 48   Object-oriented view of the relation between planning items and production factors [50].

In order to demonstrate the use of the individual supplement object classes, a particular task has been illustrated: the capacity-oriented time scheduling of production orders over a horizon of 1 week to 3 months (i.e., short-term planning) for final assembly. For this situation, the hierarchies of abstract objects for the
specific platform and the individual supplements are given in Table 17; the classes that appear only under the individual supplements are the additional ones. For every customer order, a
number of elementary orders may exist, which act as internal production orders,
and are released to fulfill a specific customer request. A new class "production
order" is introduced as an individual supplement class, comprising "customer
order" and "planning item". The "planning item" contains all general methods for performing planning activities within the scheduling process. The new
subclass "elementary order" inherits all these methods, in addition to some
supplementary methods for this specific application [50].
E. Object-Oriented Bill of Materials
Several authors [7,8,49] have reported successful development of an OOBOM
(object-oriented BOM) system, narrating its advantages. On the other hand, Olsen et al. [35] disagree. According to Olsen et al., although the power of an OO approach is recognized in general, the composite object hierarchy and inheritance features are not advantageous for modeling product structures. The reason is that in an OO system the components inherit the features of their parents, which violates the requirement of describing components
independently of their utilization.
Trappey et al. [49] state that a conventional BOM structure, which manages data in a relational database management style, cannot satisfy the needs of all departments in a company. In this respect, they mention several limitations, such as rigidity of the data structure, inability to describe behavioral relations, and difficulty in changing the data description. They advocate that the object-oriented programming (OOP) concept can be used to design a BOM (OOBOM) that avoids these limitations.
TABLE 17   Hierarchies of abstract object classes for the specific platform and the individual supplements [50]

Specific platform:
  Object
    Model: Production planning model
    Element of production: Factor of production; Planning item; Process data; Calendar
    Data administration: Open administration (Calendar administration; Production order administration; Planning item administration; Process data administration); Hidden administration
    Resources: Interval covering resource; Area covering resource
    Element of consumption: Interval covering element; Area covering element

Individual supplements:
  Object
    Model: Production planning model
    Element of production: Factor of production (Assembly line); Production order (Customer order); Planning item (Elementary order); Process data (Bill of materials; Production parameters); Calendar
    Data administration: Open administration (Calendar administration; Production order administration with Customer order and Elementary order administration; Production factor administration with Assembly line administration; Process data administration with Bill of materials and Production parameter administration); Hidden administration
    Resources: Interval covering resource; Area covering resource
    Element of consumption: Interval of consumption; Interval covering element (Production element; Setup element; Blocking element); Area covering element
  View: Production planning view; One-dimensional view (Assembly line view); Two-dimensional view
  Controller: Production planning controller; One-dimensional controller (Assembly line controller); Two-dimensional controller
FIGURE 49   OOBOM user interface objects: part creation starts at the PartLibrary through the use of the PartWindow; the BomTree is invoked when a product must be modified or a sub-part added.
FIGURE 50   A BOM structure with aggregation and generalization/specialization relationships among objects such as SM001, SM002, and SM003 (with ID number and SM003 Category as component objects), RM003, RM005, material, geometry, and shape category (rotation/non-rotation) [8].
The hierarchical structure of a BOM matches the abstraction mechanisms of an OO data model, and thus the conceptual data model of OOBOM can be mapped onto the abstraction and inheritance architectures of an OO data model. The conceptual data model integrates the semantic relationships, such as References, Referenced-by, Owns, Owned-by, Composed-of, and Part-of, with object orientation concepts.
In an OOBOM system, the parts, which are assembled together to form a
subassembly, can have aggregation semantic relationships, i.e., is-part-of relationship, where the subassembly is termed an aggregation object and the parts
that go into subassembly are called the component objects. For instance, in
Fig. 50, SM003 is an aggregation object and ID number and SM003 Category
are the component objects. A generalization relationship, i.e., is-kind-of relationship, is used to define generic part objects from their possible categories or
options. For instance, in Fig. 50, Rotation and Nonrotation are the categories
of the generic object Shape Category. SM003-1 is a possible option of SM003 and thereby has a specialization relationship with it, which is the converse of a generalization one [8].
The authors propose two classes of objects, namely the BOM class, which
simply defines the physical and logical relationships between components in the
hierarchy of BOM, and the Part Property class, which defines the properties,
such as shape, color, and material, of the parts in the BOM class (Figs. 50 and 51).
They opine that this classification is similar to E-class (entity class)
and D-class (domain class) of an OSAM database respectively. The BOM
objects may reference (Reference semantic relationship) the properties in the
Part Property class. The Own semantic relationship can be used to generate/delete/change/modify an individual object to maintain engineering changes
that happen frequently in production organizations. While a BOM object
may contain a set of instances created by the user of the OOBOM database,
the property object does not contain the user-created instances. The property
object is only a property that specifies the data type and/or a permissible range
of values [8].
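A hypothetical Python sketch of these two classes is given below; the names and structure are our simplification of Chung and Fischer's description rather than their implementation. BOM objects form the is-part-of hierarchy and merely reference property objects, which only define a data type and a permissible range of values.

class PropertyObject:
    """Defines a data type and a permissible range of values, not user instances."""
    def __init__(self, name, value_type, allowed=None):
        self.name = name
        self.value_type = value_type
        self.allowed = allowed
        self.referenced_by = []        # Referenced-by relationship

    def valid(self, value):
        ok = isinstance(value, self.value_type)
        return ok and (self.allowed is None or value in self.allowed)

class BOMObject:
    """A node in the BOM hierarchy with aggregation and reference links."""
    def __init__(self, part_id):
        self.part_id = part_id
        self.components = []           # composed-of (aggregation)
        self.references = {}           # property name -> (PropertyObject, value)

    def add_component(self, child):
        self.components.append(child)  # child is part-of self

    def reference(self, prop, value):
        if not prop.valid(value):
            raise ValueError(f"{value!r} not allowed for {prop.name}")
        self.references[prop.name] = (prop, value)
        prop.referenced_by.append(self.part_id)

color = PropertyObject("color", str, allowed={"red", "blue", "white"})
sm003 = BOMObject("SM003")
sm003.add_component(BOMObject("RM005"))
sm003.reference(color, "red")
print(color.referenced_by)   # ['SM003']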
FIGURE 51   The BOM class, the Part Property class, and their semantic relationships [8].
An object in the BOM class can get property information of the Property
class via Referencing (R), Generalization (G), Own (O), and Aggregation (A)
relationships. When referencing is needed, an attribute (which is a character
string that is the unique identification of the referenced object, or an address
pointer to the object) that establishes reference to another object is created,
thereby creating a relationship between two objects. Communications between
objects are established by sending messages through the execution of methods
and utilization of defined semantic relationships. Since several objects may try to
have a lot of communications, a control mechanism is necessary to simplify that.
The OOBOM allows a relationship and communication between objects when,
and only when, one object has an attribute that is a reference to another object.
The relationships are shown in Fig. 52.

FIGURE 52   Relationships and message flow between BOM objects and property objects [8].

In the case of referencing, the messages
associated with a Referencing relationship are sent by the BOM object, and any
messages associated with the Referenced-by relationship are sent by the property object. The existence of a Referenced-by relationship means that a property object contains a list of all the BOM objects that reference the property object [8].
In an Aggregation relationship (Fig. 52), messages associated with
composed-of relationships are sent from parent to child, whereas messages associated with part-of relationships are sent from child to parent. In an MRPII
environment, there may be frequent engineering changes. In such a situation, decomposed components need to be updated if their parent is modified. In an Own
relationship, a BOM object may own another object, which may be a property
object as well. The owner object can order the destruction of the owned object
and can perform operations, like resetting the values of the owned objects' attributes. An example of an Own relationship is the creation and maintenance
of BOM and property objects [8].
Chung and Fischer [8] have listed possible attributes of both BOM and
property classes. The example of a property object is given in Table 18.
TABLE 18   Example of a property object class definition [8]

CLASS NAME: P-RM005
DOCUMENTATION
CLASS ATTRIBUTES: Superclass: PROPERTY; Subclass: None; Class category: Property
OBJECT IDENTITY: Object identifier
DATA ATTRIBUTES
RELATIONSHIP METHODS: Referenced-by/Owned-by relationships: RM005; Part-of relationship: P-SM001, P-SM003
OTHER METHODS
END CLASS

FIGURE 53   The PACKAGE_PENCIL object class and its component classes BOX_KIT, LABEL, and PENCIL_ASSY; PENCIL_ASSY is in turn composed of BODY, REWIND_MECH, and CAP_ASSY [7].

On a previous occasion, Chung and Fischer [7] illustrated the use of an object-oriented database system, ORION, a commercial OO database system from Itasca, for developing a BOM for use in the MRPII system. ORION is a distributed database system; as such, the object identity also needs the site identifier of the site in which the object is created. Opposing the traditional idea of using a relational database system for BOM, the authors mention
several limitations, and advocate the use of OODB instead. The authors opine
that unlike relational data models, an OO data model allows the design of an
application in terms of complex objects, classes, and their associations, instead
of tuples and relationships. This characteristic of the OO technique is highly
desirable in a BOM system, as a BOM is a complex composite object.
Aggregation is useful and necessary for defining part-component hierarchies, and is an abstraction concept for building the composite objects from
the component objects. It must be remembered that BOM is a hierarchy that
relates parent-component relationships, and several components are logically
(logically during the modeling stage and, later on, physically on the shop floor)
assembled together to form a composite object. During operation of the BOM
module in the MRPII system, it may be necessary to perform some operations
(such as deletion, addition, storage management) in the entire BOM, considering it as a single logical unit. In the OOBOM proposed by Chung and Fischer
[7], the BOM has a composite object hierarchy with an is-part-of relationship between parent and component classes, and this composite object may be
treated as a single logical unit, and thus fulfills the requirement.
Fig. 53 is an example of a PACKAGE_PENCIL object class, having
component object classes of BOX_KIT, PENCIL_ASSY, and LABEL. The PENCIL_ASSY is again composed of several component classes. Thus, the
BOM shown has a hierarchy of object classes that can form a composite object
hierarchy.
The detailed information on the PENCIL_ASSY object class is shown in
Table 19.
The components in a BOM require a unique component identifier. This
(in the form of an object identifier) is created automatically in ORION, without the designer's intervention, when an object is instantiated. The object identifier is used to facilitate the sharing and updating of objects. The uniqueness
of an identifier allows a direct correspondence between an object in the data
representation and a subassembly in BOM.
During creation of a BOM, each part is defined independently, and then dependency relationships are established between parts and components defining
parent-component relationships. In the case of OOBOM, objects are created
independently, but then dependency relationships between object instances are
established. An example of such a BOM for PACKAGE_PENCIL is shown in Fig. 54. There exists an aggregation relationship between the component instances associated with PACKAGE_PENCIL; for example, each PACKAGE_PENCIL instance aggregates instances of BOX_KIT, PENCIL_ASSY, and LABEL.
TABLE 19   The PENCIL_ASSY object class [7]

Class name: PENCIL_ASSY
Object identity: OID
Documentation
Superclass: PACKAGE_PENCIL
Subclass
Attributes
Methods

FIGURE 54   Hierarchy of object classes and instances [7]: (a) composite link between generic instances; (b) hierarchy of version and generic instances of BOX_KIT; (c) object version hierarchy.
FIGURE 55   Multiple inheritance of a component class used by several parent classes in a BOM [7].
A generic instance of BOX_KIT, for example, is METAL_CASE. Again, M292 may have several options, like M292-1 and M292-2. Thus, M292 is a generic instance of PACKAGE_PENCIL, and METAL_CASE is a generic instance of BOX_KIT. The generic instance may be used to reference objects (e.g., iron) without specifying in advance the particular version needed. Changes in the definition of a class must be propagated to its generic instances unless a new generic or version instance is explicitly created.
When a part/component is used to assemble several parents in a BOM, there arises a situation of multiple inheritance, as shown in Fig. 55. In this situation there may be name conflicts, which are resolved by arranging (through user-defined or default ordering) the order of the parent classes for each component class. In this example, the order of parent classes for the component class PAPER is BOX_KIT, LABEL, and PROMOTION_GIFT (an object class used to promote sales). However, this order can be changed by the user, based on requirements. In the case of name conflicts arising from the inheritance of instance variables and methods, one must add new instance variables or modify the name in a class lattice composed of the conflicting component classes [7].
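The effect of ordering parent classes can be sketched with Python's own multiple inheritance, offered only as a loose analogy to ORION's mechanism; the attribute name "grade" and its values are invented. An attribute defined by more than one parent is taken from the earliest parent in the declared order, and changing the order changes the result.

class BoxKit:
    grade = "kraft"        # both parents define the same attribute name (assumed)

class Label:
    grade = "glossy"

class PromotionGift:
    pass

class Paper(BoxKit, Label, PromotionGift):
    """Parent order BOX_KIT, LABEL, PROMOTION_GIFT resolves the name conflict."""
    pass

class PaperReordered(Label, BoxKit, PromotionGift):
    pass

print(Paper.grade)            # 'kraft'  (taken from BoxKit, the first parent)
print(PaperReordered.grade)   # 'glossy' (the user has changed the parent order)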
Chung and Fischer [7] suggest that certain numerical algorithms should
be added to the OOBOM system, in the form of rule-based procedures, to
facilitate integration of OOBOM with the MRP system, which needs to perform
calculations for MRP. The authors also suggest that for graphical display, Pascal
language may be used for pixel-based graphic operations. However, several
other issues need to be addressed.
F. MRPII as an Enterprise Information System
On many occasions, OO information systems have been modeled in which MRPII played either the central role of a company-wide information system or a major role in it [22,57].
An information system is considered a vital ingredient of a computer-integrated manufacturing (CIM) system. An efficient design of the information system, based on data from the different islands of information, is thus a must.
Ngwenyama and Grant [57] present a case study of modeling an information system with MRPII as the central part. The primary reason for using the object-oriented paradigm is the recognition that the OO approach offers powerful, but simple, concepts for representing organizational reality and the system designer's ideas about the system. They identified three levels of abstraction in
OO modeling: (i) organizational, (ii) conceptual, and (iii) technical, as shown
in Table 20.
TABLE 20   Levels of abstraction in OO modeling [57], characterized by their objects, messages, and methods

Organizational level: business processes, business activities, and organizational roles; information flows, physical flows, and data flows; business procedures, information processing, information views, and information handling procedures.
Conceptual level: processes, data, subject databases, and information systems; communication processes; communication procedures.
Technical level: database structures, hardware, and systems software; data flows and the communication system; application programs and operating procedures.
Ngwenyama and Grant [57] develop the information system using seven conceptual models. The procedure for deriving a model is as follows: (i) identify candidate objects, (ii) classify them as active or passive, (iii) define their characteristics, (iv) define the methods they enact with or which operate on them, (v) define the messages that they send or receive, and (vi) construct the diagrams. The models are shown in Fig. 56.
The object of the global model is to capture, at the highest level of abstraction, the interactions among the organization system and/or subsystems.
This shows the primary information links (messages). Developing the global
model requires the definition of major operating systems, also considered objects, embedded in the enterprise without regard to artificial boundaries, such
as departments. The MRPII system and its modules have the central role of
linking several business activities at this level. Next, the business model decomposes each subsystem (i.e., object) into its set of related activities/processes. These can be further divided into individual tasks. Later, the information flows among these activities are captured in the message flow, data, and responsibility models (Fig. 56).
FIGURE 56   Conceptual models [57]: the global model, business model, message flow model, data model, and responsibility model.
FIGURE 57   The OOMIS methodology [22]. The analysis steps decompose the manufacturing system's (MRPII) functions into function name, function code, procedure name, input data, output data, and control data, together with object descriptions (object name, attribute set, methods); the design steps derive entity classes, function classes, semantic relationship diagrams, and the information model.
FIGURE 58   Functional diagram of the inventory control function: inputs include the master production schedule (MPS), purchase orders (purchasing), work orders (production management), product structure, cost (accounting), forecasts, and safety stock; outputs include the current inventory, inventory data, and exceptional conditions.
Kim et al. [22] propose the OOMIS methodology, outlined in Fig. 57. In its analysis phase, similar to the IDEF technique, functional diagrams are prepared indicating inputs, outputs, and control mechanisms, as shown in Figs. 58 and 59. In the
diagrams, information and materials must be distinguished by their different
logistic properties. The functional diagrams are converted into a function table
giving some additional information, like function code and procedure name.
The function table for inventory control function is shown in Table 21a. The
function code includes some codes for indicating decomposition level and order.
The data table represents the output data of the functional diagrams. These data may be modified and referenced by other functions, since the complete system's functions are highly interrelated and interacting, as is the case with MRPII. A data table contains the function name and code, received respectively from the functional diagram and table; attribute_name, which is a part of the manufacturing data and represents the output data name literally; attribute_of, which is the entire manufacturing data; and a description of the attribute. Next in the hierarchy is
the operation table, which describes how a manufacturing function handles its
operations. The operation table for the inventory control function is shown in Table
21b. The operations in the operation table are the aggregated set of processes
or procedures of the unit functions, which become the class methods later in
the design phase [22].
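To make the table structures concrete, the analysis-phase records can be sketched as plain data structures. The field names below follow the function, data, and operation tables just described; the concrete values beyond the inventory control example (function code F7) are hypothetical.

# Sketch of the OOMIS analysis-phase tables as plain records (illustrative only).
# Values other than the inventory control example (code F7) are hypothetical.
function_table = {
    "function_name": "inventory_control",
    "function_code": "F7",               # encodes decomposition level and order
    "procedure_name": "inv_ctrl_proc",   # hypothetical procedure name
    "input_data": ["work_order", "purchase_order", "product_structure"],
    "output_data": ["current_inventory", "inventory_data", "exceptional_conditions"],
    "control_data": ["safety_stock"],
}

data_table = [
    # one row per output datum of the functional diagram
    {"function_name": "inventory_control", "function_code": "F7",
     "attribute_name": "on_hand", "attribute_of": "inventory_data",
     "description": "quantity currently held in stock"},
]

operation_table = {
    "function_code": "F7",
    "operations": {                      # these become class methods in the design phase
        "inventory_trans": "record an inventory transaction",
        "report_inventory": "report the current inventory status",
        "material_receipt": "process a material receipt",
        "work_order_proc": "process a work order",
        "proc_bom": "process bill-of-materials information",
    },
}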
FIGURE 59 Decomposition of the inventory control function into work order control, material issue and receipts, and inventory balance management, involving the purchase order (purchasing), material plan (MRP), cost (accounting), current inventory, and exceptional conditions.
TABLE 21b Operation Table for the Inventory Control Function (function code F7). Its operations include inventory_trans, report_inventory, material_receipt, work_order_proc, and proc_bom, together with their descriptions.
In the design phase, the tables obtained in the analysis phase are translated
into an object-oriented information system (such as for MRPII) using OOMIS.
The aggregation and integration process converts the function, operation, and
data tables into a class dictionary having two types of classes, namely function
class and entity class (Fig. 57). As an example, a part of the class dictionary is
shown in Table 22. Here, the operations of inventory control function (Table
21b) have been used to define the methods of the classes of Table 22. While the
function class describes the properties of the MRPII functions, the entity class
describes the data. The class dictionary can be translated into a specific data dictionary of OODBMS. The semantic relationships (aggregation, generalization,
unidirectional and bidirectional synchronous and asynchronous interaction)
are used to describe class relationships. Figure 60 illustrates the aggregation
hierarchy of the classes of Table 22 [22].
The aggregation process is carried out by identifying common concepts
in the function, operation, and data tables. The attribute_name tuples of the
data tables, which have the same attribute_of field, are aggregated to represent a manufacturing datum whose name is attribute_of. If the function tables
have the same input and control data, these are merged together, where the
merged function table contains the output data that are related to each function table. This combined table can refer to the operation tables, which are
then related to each function table, to choose a suitable method for organizing the function table. Then the integration process is carried out to resolve
TABLE 22 A Part of the Class Dictionary [22]
The dictionary lists function classes such as work_order_control (fn_code: F7.1; proc_name: w_o_c_proc; in_data: work_order, BOM_info, inventory_data; out_data: exceptional_conditions; con_data: safety_stock; method: work_order_control) and entity classes such as work_order, inventory_data, production_info, part_info, and BOM_info, each with its own attribute set (e.g., product_id, part_id, batch_id, on_hand, lead_time, lot_size, quantity, cost) and methods (e.g., update, update_inventory_status, add_bom, modify_bom).
FIGURE 60 Aggregation hierarchy of the classes of Table 22, relating work_order_control and its input data to work_order, part_info, product_info, BOM_info, BOL_info, drawing, material, and NC program information [22].
conflicts in the aggregated tables. During this, each table is compared with other
tables to identify and remove conflicts between corresponding components [22].
The function class represents the characteristics of the manufacturing
(MRPII here) function. Methods of this class describe the operations of the
function. The attributes of the class are function code, procedure name, source
function, and manufacturing entities, which are the function's input, output,
and control data. Entity class describes the physical properties of the manufacturing (here, MRPII) data (or entities). The methods of this class come from
the operation tables, and are responsible for calculating the values of the manufacturing data or evaluating the status of the function [22].
If a new function is necessary in the system (here MRPII), it can easily
be designed by selecting the suitable classes from the existing class dictionary.
In case there is no such class, new classes may be designed by describing the
properties of the function [22].
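As a rough sketch of how the two kinds of classes in the class dictionary could be realized, the fragment below mirrors the function and entity classes of Tables 21 and 22; the class bodies are simplified placeholders rather than the OOMIS implementation, and any names not taken from the tables are hypothetical.

class EntityClass:
    """Describes a manufacturing (MRPII) datum, e.g., inventory_data."""
    def __init__(self, name, attribute_set):
        self.name = name                      # e.g., "inventory_data"
        self.attribute_set = attribute_set    # e.g., {"part_id": "P-1", "on_hand": 120}

    def update(self, **values):
        # Method drawn from the operation tables: record new attribute values.
        self.attribute_set.update(values)


class FunctionClass:
    """Describes an MRPII function, e.g., work_order_control (fn_code F7.1)."""
    def __init__(self, fn_name, fn_code, proc_name, in_data, out_data, con_data):
        self.fn_name = fn_name
        self.fn_code = fn_code
        self.proc_name = proc_name
        self.in_data = in_data        # input entities, e.g., work_order, BOM_info
        self.out_data = out_data      # output data, e.g., exceptional_conditions
        self.con_data = con_data      # control data, e.g., safety_stock

    def evaluate_status(self):
        # Method: evaluate the status of the function from its input entities.
        return {entity.name: dict(entity.attribute_set) for entity in self.in_data}


inventory_data = EntityClass("inventory_data", {"part_id": "P-1", "on_hand": 120})
work_order_control = FunctionClass(
    "work_order_control", "F7.1", "w_o_c_proc",
    in_data=[inventory_data],
    out_data=["exceptional_conditions"],
    con_data=["safety_stock"],
)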
Huarng and Krovi [19] are of the opinion that object orientation provides the modeler with the freedom to represent interactions between object behaviors.
This makes it possible to understand the dynamics of information flow. It must
be noted that several modules of MRPII, such as BOM and PAC, are highly
dynamic, particularly in a make-to-order environment, which, in turn, makes
the information flow dynamic. Each function in an organization is composed of
a sequence of activities, or in other words, each function is a cycle of activities.
Such is the case for MRPII as well. For example, order generation can be represented as a sequence of behaviors of the following objects: gross requirement,
inventory on-hand, open order.
G. Scopes of Object Orientation
Although object orientation has gained momentum in manufacturing research, including MRPII, its application has also been argued to be confined to some specific areas.
Nof [32,33] assesses what can and cannot be expected to be accomplished with object orientation in manufacturing systems engineering.
It simplifies the burdensome complexity of manufacturing, including planning
and control and software development. It provides intuitive clarity, discipline,
and flexibility essential in the functions of manufacturing. Object-oriented programming (OOP) can increase a programmer's productivity. It also adapts flexibly to
subsequent modifications. Although most of the MRPII functions do not contain complex data structures, some modules, such as PAC, BOM generation,
and its (optionally) graphical display, indeed need a complex data structure.
In such a case, object orientation is favorable. Object-oriented control (OOC)
and object-oriented modeling also provide distinct advantages, such as modularity of modeling and logic requirements, direct and intuitive mapping of real
objects to software objects, and minimization of software development effort through reusability. As discussed earlier, these are especially useful in the
case of PAC system design. Table 23 lists some manufacturing objectives, including functions of MRPII, and the corresponding promise of object orientation.
With the increase in computational capabilities and extended functions in the MRPII system, customer expectations are also increasing.
TABLE 23 Manufacturing objectives and the corresponding object-orientation promises. At the information level, the objectives include data, graphics, information, knowledge, and what-if simulation.
Software developers have therefore increasingly turned from RDBMS to OODBMS. The reasons behind this are the limitations of RDBMS and the corresponding advantages of OODBMS. As stated, a relational data model has
several drawbacks including a lack of semantic expression, a limitation of supporting heterogeneous data types, a difficulty in representing complex objects,
and an inability to define a dynamic schema, whereas these are overcome in
object-oriented design and OODBMS. Thus, object orientation has been highly
recommended for such manufacturing information systems (e.g., MRPII).
REFERENCES
1. Arnold, J. R. T. Introduction to Materials Management. Prentice-Hall, Englewood Cliffs, NJ,
1991.
2. Berio, G. et al. The M*-object methodology for information system design in CIM environments. IEEE Trans. Systems Man Cybernet. 25(1):68-85, 1995.
3. Bertrand, J. W. M., and Wortmann, J. C. Information systems for production planning and
control: Developments in perspective. Production Planning and Control 3(3):280-289, 1992.
4. Browne, J. Production activity control: A key aspect of production control. Int. J. Prod. Res. 26(3):415-427, 1988.
5. Browne, J., Harhen, J., and Shivnan, J. Production Management Systems: A CIM Perspective.
Addison-Wesley, Reading, MA, 1989.
6. Chang, S.-H. et al. Manufacturing bill of material planning. Production Planning and Control
8(5):437-450, 1997.
7. Chung, Y., and Fischer, G. W. Illustration of object-oriented databases for the structure of a
bill of materials (BOM). Comput. Indust. Eng. 19(3):257-270, 1992.
8. Chung, Y., and Fischer, G. W. A conceptual structure and issues for an object-oriented bill of
materials (BOM) data model. Comput. Indust. Engrg. 26(2):321-339, 1994.
9. Dangerfield, B. J., and Morris, J. S. Relational database management systems: A new tool for
coding and classification. Int. J. Oper. Production Manage. 11(5):47-56, 1991.
10. Du, T. C. T., and Wolfe, P. M. An implementation perspective of applying object-oriented
database technologies. IIE Transactions 29(9):733-742, 1997.
11. Gausemeier, J. et al. Intelligent object networks: The solution for tomorrow's manufacturing
control systems. In Proceedings of the 3rd International Conference on Computer Integrated
Manufacturing, Singapore, 11-14 July, 1995, pp. 729-736.
12. Grabot, B., and Huguet, P. Reference models and object-oriented method for reuse in production
activity control system design. Comput. Indus. 32(1):17-31, 1996.
13. Harhalakis, G. et al. Development of a factory level CIM model. J. Manufact. Systems 9(2):116-128, 1990.
14. Hasin, M. A. A., and Pandey, P. C. MRPII: Should its simplicity remain unchanged? Indust.
Manage. 38(3):19-21, 1996.
15. Higgins, P. et al. Manufacturing Planning and Control: Beyond MRPII. Chapman & Hall,
New York, 1996.
16. Higgins, P. et al. From MRPII to mrp. Production Planning and Control 3(3):227-238,
1992.
17. Hitomi, K. Manufacturing Systems Engineering. Taylor & Francis, London, 1979.
18. Hsu, C., and Skevington, C. Integration of data and knowledge in manufacturing enterprises: A conceptual framework. J. Manufact. Systems 6(4):277-285, 1987.
19. Huarng, A. S., and Krovi, R. An object-based infrastructure for IRM. Inform. Systems Manage.
15(2):46-52, 1998.
20. Jackson, D. F., and Okike, K. Relational database management systems and industrial engineering. Comput. Indust. Engrg. 23(1-4):479-482, 1992.
21. Kemper, A., and Moerkotte, G. Object-Oriented Database Management: Applications in Engineering and Computer Science. Prentice-Hall, Englewood Cliffs, NJ, 1994.
22. Kim, C. et al. An object-oriented information modeling methodology for manufacturing information systems. Comput. Indust. Engrg. 24(3):337-353, 1993.
23. Korhonen, P. et al. Demand chain management in a global enterprise: Information management view. Production Planning and Control 9(6):526-531, 1998.
24. Kroenke, D. Management Information Systems. Mitchell McGraw-Hill, New York, 1989.
25. Meyer, A. De. The integration of manufacturing information systems. In Proceedings of
the International Conference on Computer Integrated Manufacturing, Rensselaer Polytechnic
Institute, Troy, New York, May 23-25, 1988, pp. 217-225.
26. Moriwaki, T. et al. Object-oriented modeling of autonomous distributed manufacturing system
and its application to real time scheduling. In Proceedings of the International Conference on
Object-Oriented Manufacturing Systems, Calgary, Alberta, Canada, 3-6 May, 1992, pp. 207-212.
27. Micro-Max User Manual. Micro-MRP, Foster City, CA, 1991-1992.
28. MRPII Elements: Datapro Manufacturing Automation Series. McGraw-Hill, Delran, NJ, 1989.
29. Nandakumar, G. The design of a bill of materials using relational database. Comput. Indust.
Engrg. 6(1):15-21, 1985.
30. Naquin, B., and Ali, D. Active database and its utilization in the object-oriented environment.
Comput. Indust. Engrg. 25(1-4):313-316, 1993.
31. Nicholson, T. A. J. Beyond MRP: The management question. Production Planning and Control
3(3):247-257, 1992.
32. Nof, S. Is all manufacturing object-oriented? In Proceedings of the International Conference on Object-Oriented Manufacturing Systems, Calgary, Alberta, Canada, 3-6 May, 1992,
pp. 37-54.
33. Nof, S. Y. Critiquing the potential of object orientation in manufacturing. Int. J. Comput.
Integr. Manufact. 7(1):3-16, 1994.
34. Ojelanki, K. N., and Delvin, A. G. Enterprise modeling for CIM information systems architectures: An object-oriented approach. Comput. Indust. Engrg. 26(2):279-293, 1994.
35. Olsen, K. A. et al. A procedure-oriented generic bill of materials. Comput. Indust. Engrg.
32(1):29-45, 1997.
36. Pandey, P. C., and Hasin, M. A. A. A scheme for an integrated production planning and control
system. Int. J. Comput. Appl. Technol. 8(5/6):301-306, 1995.
37. Park, H. G. et al. An object oriented production planning system development in ERP environment. Comput. Indust. Engrg. 35(1/2):157-160, 1998.
38. Pels, H. J., and Wortmann, J. C. Modular decomposition of integrated databases. Production
Planning and Control 1(3):132-146,1990.
39. Ramalingam, P. Bill of material: A valuable management tool: Part I. Indust. Manage. 24(2):28-31, 1982.
40. Ramalingam, P. Bill of material: A valuable management tool: Part II. Indust. Manage.
25(l):22-25, 1983.
41. Ranganathan, V. S., and Ali, D. L. Distributed object management: Integrating distributed information in heterogeneous environment. Comput. Indust. Engrg. 25(1-4):317-320, 1993.
42. Ranky, P. G. Manufacturing Database Management and Knowledge-Based Expert Systems.
CIMware, Addingham, England, 1990.
43. Seymour, L. Data Structures, Schaum's Outline Series in Computers. McGraw-Hill, New York,
1997.
44. Sinha, D., and Chen, H. G. Object-oriented DSS construction for hierarchical inventory control.
Comput. Indust. Engrg. 21(1-4):441-445, 1991.
45. Smith, S. B. Computer-Based Production and Inventory Control. Prentice-Hall, Englewood
Cliffs, NJ, 1989.
46. Smith, J., and Joshi, S. Object-oriented development of shop floor control systems for CIM.
In Proceedings of the International Conference on Object-Oriented Manufacturing Systems,
Calgary, Alberta, Canada, 3-6 May, 1992, pp. 152-157.
47. Spooner, D., Hardwick, M., and Kelvin, W. L. Integrating the CIM environment using object-oriented data management technology. In Proceedings of the International Conference on Computer Integrated Manufacturing, Rensselaer Polytechnic Institute, Troy, New York, May 23-25,
1988, pp. 144-152.
48. Starmer, C., and Kochhar, A. K. Fourth generation languages based manufacturing control systems: Lessons from an application case study. Production Planning and Control 3(3):271-279, 1992.
DEVELOPING APPLICATIONS IN
CORPORATE FINANCE: AN
OBJECT-ORIENTED DATABASE
MANAGEMENT APPROACH
IRENE M. Y. WOON
MONG LI LEE
School of Computing, National University of Singapore, Singapore 117543
I. INTRODUCTION 498
II. FINANCIAL INFORMATION AND ITS USES 499
A. Financial Statements 500
B. Using the Financial Statements 502
C. Financial Policies 503
III. DATABASE MANAGEMENT SYSTEMS 505
A. Components of a DBMS 505
B. Capabilities of a DBMS 506
C. Data Models 507
IV. FINANCIAL OBJECT-ORIENTED DATABASES 508
A. General Concepts 509
B. Object Modeling 510
C. Modeling Financial Policies 513
V. DISCUSSION 515
REFERENCES 516
Financial information systems have evolved from spreadsheets to simple database systems where data are stored in a central repository, to today's sophisticated systems that
integrate databases with complex applications. The object-oriented paradigm promises
productivity and reliability of software systems through natural modeling concepts and
cleaner and extensible designs while database management systems offer necessary capabilities such as persistence, integrity, and security. In this paper, we propose a unified
framework for modeling both financial information and policies. A financial information
system can be modeled and implemented using the object-oriented model. This forms the
basis for exploratory and complex business data analysis and strategic decision-making
support. In addition, we will illustrate that the object-oriented approach also provides
a uniform representation for expressing financial policy formulation.
I. INTRODUCTION
In the early 1980s, a number of integrated financial spreadsheet-graphics-word processing packages, e.g., Symphony and Excel, were used to capture, analyze, and
present the financial information of a company. The main problem with such
packages was their inability to automatically update links; when financial data
in one spreadsheet were changed, then many related spreadsheets would have
to be updated manually by the user to ensure the correctness of the financial reports. As a consequence, interest in databases as a medium for storing financial
data grew.
As the amount of data multiplies, the many features offered by a database
management system (DBMS) for data management, such as reduced application
development time, concurrency control and recovery, indexing support, and
query capabilities, become increasingly attractive and ultimately necessary. In
addition, consolidating data from several databases, together with historical
and summary information can create a comprehensive view of all aspects of
an enterprise. This is also known as data warehousing. The data warehouse
facilitates complex and statistical analysis such as:
• On-line analytic processing (OLAP), which supports a multidimensional model of data and ad hoc complex queries involving group-by and aggregation, and
• Exploratory data analysis or data mining, where a user looks for interesting patterns in the data.
Systems that were specifically developed for use by the financial community
included SAP, Peoplesoft, and Oracle Financials, which had Oracle or Sybase
as their data repository.
Research into artificial intelligence provided the impetus for the development of products that embed human expertise into financial software programs.
The expert's knowledge was typically represented as a production rule of the form: observation → hypothesis. FINSIM Expert [1] is one such
system used to analyze financial information for various purposes, e.g., credit
analysis and investment analysis. Real-world "expert" systems would typically
marry a traditional database, providing access to the financial data, with a knowledge base in which the expert's cognitive processes were encoded. This was the early
precursor to deductive databases. Research into deductive databases is aimed at
discovering efficient schemes for uniformly representing assertions and deductive rules and for responding to highly expressive queries about the knowledge
base of assertions and rules.
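As a toy illustration of such observation → hypothesis rules (not FINSIM's actual rule base), a single credit-analysis rule might be encoded as below; the threshold and the wording of the hypothesis are hypothetical.

# Toy production rule of the form observation -> hypothesis (illustrative only).
def current_ratio(current_assets, current_liabilities):
    return current_assets / current_liabilities

def liquidity_rule(financials, threshold=1.0):
    """Observation: current ratio below a (hypothetical) threshold.
    Hypothesis: the firm may have difficulty meeting short-term obligations."""
    if current_ratio(financials["current_assets"], financials["current_liabilities"]) < threshold:
        return "hypothesis: potential liquidity problem"
    return "hypothesis: no liquidity concern indicated"

print(liquidity_rule({"current_assets": 1000, "current_liabilities": 300}))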
In many application domains, complex kinds of data must be supported.
Object-oriented (OO) concepts have strongly influenced efforts to enhance
database support for complex types. This has led to the development of object database systems. An important advantage of object database systems is
that they can store code as well as data. Abstract data type methods are collected in the database, and the set of operations for a given type can be found
by querying the database catalogs. Operations can be composed in ad hoc ways
in query expressions. As a result, the object-oriented DBMS is like a software
repository, with built-in query support for identifying software modules and
combining operations to generate new applications. Data consistency is preserved as the DBMS provides automatic management of constraints, which
include user-defined operations.
The features of the financial domain that make this technique appealing are:
• Financial data have relationships to one another, which can be naturally expressed in the object-oriented modeling notation.
• Within the framework of generally accepted financial accounting principles, accountants are free to choose from a number of alternative procedures to record the effects of an event. The encapsulation feature in the object-oriented approach allows these different procedures to be modeled such that the applications that use them are not affected.
• Financial data requirements vary for different industries and for different companies within the same industry, which is also supported by the encapsulation feature in the object-oriented approach.
The remainder of this paper is organized as follows. Section II gives a quick
tutorial in finance covering various relevant aspects such as financial statements
and their definition and use such as in implementing financial policies. This will
give the reader an appreciation of the domain area with a view to understanding
fragments of the model shown in later sections. Section III provides an introduction to database management systems covering its components, capabilities,
and data models. Readers well versed in finance and/or databases may choose
to skip the appropriate section(s). In Section IV, we show how the data in the
financial statements may be modeled using the OO paradigm. In addition, we
demonstrate how financial policies, which can be expressed in terms of financial
statement information, may be modeled.
II. FINANCIAL INFORMATION AND ITS USES
TABLE 1 Balance Sheet (000s)
Assets
  Current: Cash 100; Accounts Receivables 400; Inventories 500; Total current assets 1000
  Fixed: Net Value of Equipment 1000; Net Value of Vans 1000; Total fixed assets 2000
  Total assets 3000
Liabilities
  Current: Accounts Payable 300; Total current liabilities 300
  Long-term: Bank Loans 1700; Total long-term liabilities 1700
  Shareholder's fund 1000
  Total liabilities 3000
Peter and Jane's plans for expansion included acquiring bigger premises, hiring more workers, etc. To finance all these investments, they decided to incorporate their business and float some shares to the public, i.e., in return for funds, members of the public became joint owners of the firm.
The value of Peter and Jane's business can be recorded and reflected in
simple financial models. We will look at three of these: the balance sheet, the
income statement, and the cash flow statement.
A. Financial Statements
Table 1 shows Peter and Jane's balance sheet. The balance sheet gives a financial
picture of the business on a particular day.
The assets that are shown on the left-hand side of the balance sheet can be
broadly classified as current and fixed assets. Fixed assets are those that will
last a relatively long time, e.g., shoe-making machinery. The other category of
assets, current assets, comprises those that have a shorter life-span; they can be
converted into cash in one business operating cycle, e.g., raw materials, semifinished shoes/parts of shoes, and unsold finished shoes. An operating cycle is
defined as a typical length of time between when an order is placed and when
cash is received for that sale. Whether an asset is classified as current or fixed
depends on the nature of the business. A firm selling vans will list a van as a current asset, whereas Peter and Jane, who use vans to deliver goods, will list it as a fixed asset. The liabilities represent what the firm owes, e.g., bank loans and
creditors, and this is shown on the right-hand side of the balance sheet. Just as
assets were classified based on their life span, so are liabilities. Current liabilities
are obligations that must be met within an operating cycle, while long-term
liabilities are obligations that do not have to be met within an operating cycle.
Also shown on the right-hand side of the balance sheet is the shareholder's fund,
e.g., money received from the issuance of shares.
The balance sheet shows the values of fixed assets at their net value, i.e.,
the purchase price less accumulated depreciation. Depreciation reflects the
accountant's estimate of the cost of the equipment used up in the production
process. For example, suppose one of the second-hand vans bought at 500 is
estimated to have 5 more useful years of life left and no resale value beyond that.
TABLE 2 Income Statement (000s)
  Revenues 3000
  Expenses: Cost of goods sold (1800); Admin and selling expenses (500); Depreciation (200)
  Earnings before interest and taxes 500
  Interest expense (100)
  Pretax income 400
  Taxes (150)
  Net income 250
According to the accountant, the 500 cost must be apportioned out as an expense over the useful life of the asset. The straight-line depreciation method will give 100 of depreciation expense per year. At the end of its first year of use, the van's net value will be 500 - 100 = 400 and at the end of its second year of use, its net value will be 500 - 200 = 300, and so on.
Table 2 gives the income statement of Peter and Jane's business for the past
year. Rather than being a snapshot of the firm at a point in time, the income
statement describes the performance of the firm over a particular period of time,
i.e., what happened in between two periods of time. It shows the revenues, the
expenses incurred, and the resulting profit or loss resulting from its operations.
Details of income and expenses shown in the statement differ according to the
firm, industry, and country.
The last financial model is called the cash flow statement. This shows the
position of cash generated from operations in relation to its operating costs.
Consider some typical cash flows: cash outflows for the purchase of raw materials and the payment of utilities and rent, and cash inflows when debtors pay or bank
loans are activated. This statement shows the operating cash flow, which is the
cash flow that comes from selling goods and services. This should usually be
positive; a firm is in trouble if this is negative for a long time because the firm
is not generating enough cash to pay its operating costs. Total operating cash
flow for the firm includes increases to working capital, which is the difference
between the current assets and current liabilities. The total operating cash flow
in this case is negative. In its early years of operations, it is normal for firms to
have negative total operating cash flows, as spending on inventory, etc., will be higher than the cash flow from sales.
Profit as shown in the income statement is not the same as cash flow. For
instance, sales on the income statement will tend to overstate actual cash inflows
because most customers pay their bills on credit. The cash flow statement shown
in Table 3 is derived from the balance sheet and the income statement. In
determining the economic and financial condition of a firm, an analysis of the
firm's cash flow can be more revealing.
TABLE 3 Cash Flow Statement (000s), derived from the balance sheet and the income statement: earnings before interest and taxes 500, depreciation 200, taxes (150), operating cash flow 450, increase in working capital (700), total operating cash flow (250).
B. Using the Financial Statements
1. Liquidity ratios measure the ability of a firm to meet its short-term obligations. Examples are the ratio of current assets to current liabilities, the ratio of all current assets except inventory to current liabilities, and the ratio of cost of goods sold to average inventory.
2. Activity ratios are constructed to measure how effectively a firm's assets
are being managed. The idea is to find out how quickly assets are used to
generate sales. Examples are the ratio of total operating revenues to average
total assets, and the ratio of total operating revenues to average receivables.
3. Debt ratios show the relative claim of creditors and shareholders on the
firm and thus indicate the ability of a firm to deal with financial problems and
opportunities as they arise. A firm highly dependent on creditors might suffer
from creditor pressure, be less of a risk-taker, and have difficulty raising funds;
i.e., it will have less operating freedom. Examples are the ratio of total debt to total shareholders' fund and the ratio of total debt to total assets.
4. Coverage ratios are designed to relate the financial charges of a firm to
its ability to service them, e.g., the ratio of interest to pretax profit.
5. Profitability ratios indicate the firm's efficiency of operation. Examples
of ratios measuring profitability are the ratio of profit to sales, the ratio of profit
to total assets, and the ratio of profit to shareholders' funds.
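Using the figures from Tables 1 and 2, several of these ratios can be computed directly. The dictionary keys and variable names below are ours, and year-end rather than average figures are used where the text calls for averages.

# Figures (in 000s) taken from the balance sheet (Table 1) and income statement (Table 2).
balance_sheet = {"current_assets": 1000, "inventories": 500, "current_liabilities": 300,
                 "total_assets": 3000, "total_debt": 300 + 1700}
income_statement = {"revenues": 3000, "net_income": 250,
                    "interest_expense": 100, "pretax_income": 400}

liquidity = balance_sheet["current_assets"] / balance_sheet["current_liabilities"]
quick = ((balance_sheet["current_assets"] - balance_sheet["inventories"])
         / balance_sheet["current_liabilities"])
activity = income_statement["revenues"] / balance_sheet["total_assets"]        # revenues to assets
debt = balance_sheet["total_debt"] / balance_sheet["total_assets"]             # debt to assets
coverage = income_statement["interest_expense"] / income_statement["pretax_income"]
profitability = income_statement["net_income"] / income_statement["revenues"]  # profit to sales

print(liquidity, quick, activity, debt, coverage, profitability)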
C. Financial Policies
The total value of a firm can be thought of as a pie (see Fig. 1). Initially, the size
of the pie will depend on how well the firm has made its investment decision.
The composition of the pie (usually known as capital structure) depends on the
financing arrangements. Thus, the two major classes of decisions that financial
managers have to make are investment and financing.
A major part of the investment decision is concerned with evaluating longlived capital projects and deciding whether the firm should invest in them.
This process is usually referred to as capital budgeting. Although the types
and proportions of assets the firm needs tend to be determined by the nature of the business, within these bounds there will be many acceptable proposals that the firm may need to consider. Many rules can be employed to decide
which ones to select, e.g., NPV rule, payback rule, the accounting rate of return
rule, internal rate of return rule, profitability index, and capital asset pricing
model.
FIGURE 1 The total value of the firm as a pie: 58% loan and 42% shareholder's fund.
After a firm has made its investment decisions, it can then determine its
financing decisions. This is what Peter and Jane have done: they identified
their business, worked out what they needed, and then looked for ways to raise
money for what they needed to start up the business. Thus, financing decisions
are concerned with determining how a firm's investments should be financed. In
general, a firm may choose any capital structure it desires: equity, bonds, loans,
etc. However, the capital structure of the firm has a great impact on the way in
which the firm pays out its cash flows. There are many theories put forward to
suggest the optimal capital structure of a firm [4,5].
From the examination of the domain area of investment and financing
decisions, it can be seen that there are many theories and methods supporting various arguments as to what policies financial managers should pursue to ensure that the firm achieves growth and profitability. Each theory/
method has its own assumptions, shortcomings, and advantages. This paper
will not attempt to model any of these theories or methods. Instead it will
look at the implementation of these policies and demonstrate how they can
be modeled so that the object-oriented implementation can be automatically
generated.
Some examples of policy implementations are:
Capital budgeting (investment).
1. Fund capital expenditures from profit only.
2. Expected return on investment > = X%.
3. Total operating revenues to average inventory ratio = X ... Y.
4. The market value of any individual stock cannot exceed X% of
total account.
Capital structure (financing).
1. Debt < = X times share capital.
2. Interest expense > X times operating cash flow.
3. Debt-equity ratio = X ... Y.
4. Liquidity ratio > = X.
Policies are thus implemented as rules of thumb, giving targets to be
achieved or indicators to monitor. These targets could be expressed as values,
ranges of values, and upper and lower bounds of values that ratios or variables must have. These targets could be absolutes or benchmarked to some
other ratio/variable in a firm and could be derived from comparison with
some industry average or derived from comparison with the past year's data,
etc. They affect the flow of funds within the firm by controlling the investments made or changing the financial structure of the firm.
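Once the targets are written down, a policy of this rule-of-thumb kind can be checked mechanically. The sketch below encodes two of the example policies; the values chosen for X are hypothetical, and the input figures are taken from the balance sheet in Table 1.

# Policies expressed as targets on ratios or variables; the X values are hypothetical.
def debt_policy_met(total_debt, share_capital, x=2.0):
    """Capital structure policy: Debt <= X times share capital."""
    return total_debt <= x * share_capital

def liquidity_policy_met(current_assets, current_liabilities, x=1.5):
    """Capital structure policy: liquidity ratio >= X."""
    return (current_assets / current_liabilities) >= x

# Figures (000s) from Peter and Jane's balance sheet.
print(debt_policy_met(total_debt=2000, share_capital=1000))                # True for X = 2.0
print(liquidity_policy_met(current_assets=1000, current_liabilities=300))  # True for X = 1.5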
In this section, we have shown that financial policies are expressed as target
values for ratios or financial variables to satisfy. These values are derived from
the financial statements, which in turn record and assign values to events and
operations that occur within a firm (such as accepting orders, producing goods,
and delivering goods). This section sets the scene for the rest of the paper and
allows the reader to appreciate examples given in subsequent sections.
III. DATABASE MANAGEMENT SYSTEMS

A. Components of a DBMS
FIGURE 2 Components of a DBMS: the query processor and optimizer, transaction manager, file manager, buffer manager, lock manager, recovery manager, and disk manager.
When a requested page is not already in memory, the buffer manager fetches it into the buffer from the disk. When a user issues a
query, the query optimizer uses information about how the data are stored to
generate an efficient execution plan.
In order to provide concurrency control and crash recovery in the DBMS,
the disk space manager, buffer manager, and file manager must interact with
the transaction manager, lock manager, and recovery manager. The transaction
manager ensures that transactions (or user programs) request and release locks
on data records according to some protocol and schedules the transaction executions. The lock manager keeps track of requests for locks while the recovery
manager maintains a log of changes and restores the system to a consistent state
after a crash.
B. Capabilities of a DBMS
Using a DBMS to manage data offers many advantages. These include:
Persistence. Large volumes of data can be managed systematically and
efficiently as a DBMS utilizes a variety of sophisticated techniques to store and
retrieve data. Persistence due to permanent storage of data is important to many
applications.
Data independence. The DBMS provides an abstract view of the data, which insulates application programs from the details of data representation and storage.
Control of data redundancy. When several users share the data, centralizing the administration of data can minimize redundancy, without which an
undesirable inconsistency in data and wastage of storage space can occur.
Data integrity and security. The DBMS can enforce compliance with known
constraints imposed by application semantics. Furthermore, it can restrict access to different classes of users.
Data availability and reliability. In the event of system failure, the DBMS
provides users access to as much of the uncorrupted data as possible. It also has
the ability to recover from system failures without losing data (crash recovery).
The DBMS provides correct, concurrent access to the database by multiple users
simultaneously.
High-level access. This is provided by the data model and language, which
defines the database structure (also known as schema) and the retrieval and
manipulation of data.
Distribution. Multiple databases can be distributed over one or more machines and viewed as a single database, which is useful in a variety of applications. This
feature is crucial especially when databases in the different departments of an organization may have been developed independently over time and the management now needs an integrated view of the data for making high-level decisions.
This list of capabilities is not meant to be exhaustive but it serves to highlight
the many important functions that are common to many applications accessing
data stored in the DBMS. This facilitates quick development of applications that
are also likely to be more robust than applications developed from scratch since
many important tasks are handled by the DBMS instead of being implemented
by the application.
C. Data Models
A central feature of any DBMS is the data model upon which the system is
built. At the conceptual level of a DBMS architecture, the data model plays
two important roles. First, the data model provides a methodology for representing the objects of a particular application environment, and the relationships among these objects (the conceptual or semantic role). Second, the data
model is structured to allow a straightforward translation from the conceptual
schema into the physical data structures of the internal level of the DBMS (the
representational role).
Many commercial DBMS today such as DB2, Informix, Oracle, and Sybase
are based on the relational data model proposed by [6]. At that time, most
database systems were based on either the hierarchical model [7] (IBM's IMS
and SYSTEM-2000) or the network model [8] (IDS and IDMS). The hierarchical and network models are strongly oriented toward facilitating the subsequent implementation of the conceptual schema. This is because, historically, the physical structure of a DBMS was designed first, and then a data
model was developed to allow conceptual modeling on the particular physical design. Thus, the hierarchical data model is based on underlying tree-oriented data structures, while the network model is based on ring-oriented
data structures. The use of the hierarchical and network data models in the
semantic role is thus burdened with numerous construction rules and artificial
constraints.
On the other hand, the relational data model is simple and elegant: a
database is a collection of one or more relations, and each relation is a table
with rows and columns. The tabular representation of data allows the use of
simple, high-level languages to query the data. Despite its attractive simplicity,
however, the relational model must be enhanced to fulfil the two roles of a
conceptual level data model. In pursuit of discovering semantic enhancements
to the relational model, a rich theoretical foundation about data dependencies
and normalization was produced [9].
IV. FINANCIAL OBJECT-ORIENTED DATABASES
In this section, we first review the fundamental OO modeling concepts and subsequently apply these concepts to the financial domain. A comprehensive exposition of the OO method
and its modeling and application can be found in [15,16].
A. General Concepts
The object-oriented approach views software as a collection of discrete objects
that incorporates both structure (attributes) and behavior (operations). This is
in contrast to conventional software development where data and behavior are
loosely connected.
An object is defined as a concept, abstraction, or tangible object with well-defined boundaries and meaning for the problem at hand. Examples of objects
are customer John Doe, Order No 1234, or a pair of shoes stock no 12345
B-PRI. All objects have an identity and are distinguishable. This means that
two objects are distinct even if they have the same attributes, e.g., two pairs
of shoes of the same color, style, and weight with the same stock number. The
term "identity" means that objects are distinguished by their inherent existence
and not by the descriptive properties they may have.
An object is an encapsulation of attributes and operations. Encapsulation
separates the external interface of an object from the internal implementation
details of the object. The external interface is accessible to other objects and
consists of the specifications of the operations that can be performed on the
object. The operations define the behavior of the object and manipulate the
attributes of that object. The internal implementation details are visible only to
the designer. It consists of a data section that describes the internal structure of
the object and a procedural section that describes the procedures that implement the operations part of the interface. This means that the implementation
of an object may be changed without affecting the applications that use it. For
example, the cost of a pair of shoes can be measured by different methods,
such as average cost, LIFO, and FIFO. If the firm decides to change its costing method for shoes, the encapsulation feature ensures that no changes will
have to be made to any application that requires the unit cost for a pair of
shoes.
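To make the encapsulation point concrete, a shoe object could expose a unit cost operation whose internal costing method (average cost, FIFO, or LIFO) can be changed without touching the applications that call it. The sketch below is ours; the cost layers and prices are invented.

class Shoe:
    """Encapsulation: clients call unit_cost(); the costing method is an internal detail."""
    def __init__(self, stock_no, cost_layers, costing="average"):
        self.stock_no = stock_no
        self._layers = cost_layers     # internal: list of (quantity, unit_price) purchases
        self._costing = costing        # internal: "average", "fifo", or "lifo"

    def unit_cost(self):
        if self._costing == "average":
            total_quantity = sum(q for q, _ in self._layers)
            return sum(q * p for q, p in self._layers) / total_quantity
        if self._costing == "fifo":
            return self._layers[0][1]  # earliest purchase price
        return self._layers[-1][1]     # lifo: latest purchase price


# Switching the firm's costing method changes only the object's internals;
# applications keep calling unit_cost() unchanged.
shoe = Shoe("12345 B-PRI", [(100, 20.0), (50, 26.0)], costing="average")
print(shoe.unit_cost())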
A class is an abstraction that describes properties that are important to
the application. Each class describes a set of individual objects, with each object having its own value for each attribute but sharing the attribute names
and operations with other objects in the class. The objects in a class not only
share common attributes and common operations, they also share a common
semantic purpose. Even if a van and a shoe both have cost and size, they could
belong to different classes. If they were regarded as purely financial assets, they
could belong to one class. If the developer took into consideration that a person
drives a van and wears shoes, they would be modeled as different classes. The
interpretation of semantics depends on the purpose of each application and is
a matter of judgement.
Figure 3 shows that objects such as Leather and Shoe can be abstracted
into an Inventory Item class with attributes Description, Quantity on Hand,
and Unit Cost. Operations that manipulate the attributes include Issue Stock
(which reduces the Quantity on Hand when the raw material is required for the
assembly process), and Write-off Stock (which sets the value of Quantity on Hand to 0 when the goods are considered damaged and unusable).
FIGURE 3 Objects such as Leather (Batch No XXX, Batch No XYZ) and Shoe (SNo: 12346 B-PRI) are abstracted into an Inventory Item class with attributes Description, Quantity on Hand, and Unit Cost and operations Issue Stock and Write-off Stock.
Note that the Inventory Item Class includes raw materials (as in leather
and rubber) and finished products (shoes). Hence we can design classes such as
Raw Materials and Products, which are refinements of Inventory Item.
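A minimal sketch of the Inventory Item class of Fig. 3 and its refinements might look as follows. The attribute and operation names follow the figure; the method bodies and the example quantities are our simplification.

class InventoryItem:
    def __init__(self, description, quantity_on_hand, unit_cost):
        self.description = description
        self.quantity_on_hand = quantity_on_hand
        self.unit_cost = unit_cost

    def issue_stock(self, quantity):
        """Reduce Quantity on Hand when material is required for assembly."""
        self.quantity_on_hand -= quantity

    def write_off_stock(self):
        """Set Quantity on Hand to 0 when the goods are damaged and unusable."""
        self.quantity_on_hand = 0


class RawMaterial(InventoryItem):   # refinement: leather, rubber, ...
    pass


class Product(InventoryItem):       # refinement: finished shoes
    pass


leather = RawMaterial("Leather - Batch No XXX", quantity_on_hand=200, unit_cost=5.0)
leather.issue_stock(20)             # raw material drawn for the assembly process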
B. Object Modeling
To define an object model for any domain, the following logical activities will
have to be carried out:
1. Identify the objects and classes and prepare a data dictionary showing
the precise definition of each.
2. Identify the association and aggregation relationships between objects.
3. Identify the attributes of the objects.
4. Organize and simplify the object classes through generalization.
These activities are logical; in practice, it may be possible to combine several
steps. In addition, the process of deriving an object model is rarely straightforward and usually involves several iterations.
The object modeling technique consists of three models, each representing
a related but different viewpoint of the system:
1. the object model, which describes the static structural aspect,
2. the dynamic model, which describes the temporal behavioral aspect,
and
3. the functional model, which describes the transformational aspect.
Each model contains references to other models; e.g., operations that are attached to objects in the object model are more fully expanded in the functional
model. Although each model is not completely independent, each model can be
examined and understood on its own.
1. Object Model
The object model describes the structure of the objects in the system: their identity, their relationships to other objects, their attributes, and their operations. The object model is represented graphically with object diagrams containing object classes.
FIGURE 4 Association relationship between the Invoice Line class (attributes: Description, Quantity Delivered, Unit Price; operation: Calculate total price) and the Inventory Item class (attributes: Description, Quantity on Hand, Unit Cost; operations: Issue Stock, Write-off Stock).
Classes are arranged into hierarchies sharing common
structure and behavior and are associated with other classes. Classes define
attribute values carried by each object and the operations that each object
performs or undergoes. It is thus the framew^ork into which the dynamic and
functional model may be placed.
Objects and object classes may be related to other objects in several ways.
They could be dependent on one another in which case the relationship is
known as an association. Associations may be binary, ternary, or of higher order and are modeled as bidirectional. These links may also express the multiplicity of the relationship, either 1-to-1, 1-to-many, or many-to-many. For example,
the association between an Inventory Item and an Invoice Line is 1-to-many
because an inventory item may appear on several invoice lines. The solid ball in
Fig. 4 denotes the multiplicity of the Inventory Item objects in the Invoice Line
class.
The second type of relationship is aggregation, which expresses a "part-whole," or "part-of," relationship in which objects representing the components of something are associated with an object representing the entire assembly. An obvious example from Peter and Jane's firm is the components that go
into making a shoe, e.g., leather, rubber, bells, and thread. Another example is
shown in Fig. 5, where an invoice is composed of its individual invoice lines. A
small diamond drawn at the assembly end of the relationship denotes aggregation. The Invoice class has a Status attribute which gives the state of the object.
We will elaborate on this in the next section on dynamic models.
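The association and aggregation relationships of Figs. 4 and 5 map naturally onto object references: each invoice line refers to one inventory item (the 1-to-many association), and an invoice holds the collection of its lines (aggregation). The sketch below follows the class shapes in the figures, with simplified method bodies; it assumes an inventory-item object like the one sketched earlier.

class InvoiceLine:
    def __init__(self, item, quantity_delivered, unit_price):
        self.item = item                  # association to an inventory item object
        self.quantity_delivered = quantity_delivered
        self.unit_price = unit_price

    def calculate_item_price(self):
        return self.quantity_delivered * self.unit_price


class Invoice:
    def __init__(self, lines):
        self.lines = list(lines)          # aggregation: an invoice is composed of its lines
        self.status = "Outstanding"
        self.total_amount = 0.0

    def calculate_total(self):
        self.total_amount = sum(line.calculate_item_price() for line in self.lines)
        return self.total_amount

    def issue_invoice(self):
        self.calculate_total()
        return self.total_amount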
FIGURE 5 Aggregation relationship: an Invoice (attributes: Total Amount, Status; operations: Issue Invoice, Calculate Total) is composed of Invoice Lines (attributes: Description, Quantity Delivered, Unit Price; operation: Calculate item price).
FIGURE 6 Generalization relationship: Current Asset (attributes: Opening Balance, Total To Date; operation: Calculate Total) is the superclass of Cash (attribute: Currency; operation: Convert), Inventory Item (attributes: Description, Quantity on Hand, Unit Cost; operations: Issue Stock, Write-off Stock), and Accounts Receivables (attributes: Name, Credit Limit, Credit Terms; operations: Update Credit Terms, Categorize A/R, Age A/R).
Generalization is the relationship between a class and its more refined classes. It is sometimes referred to as an "is-a" relationship. For example, Current Asset is the generalization of Cash, Inventory Item, and Accounts Receivables. Current Asset is called the superclass, while Cash, Inventory Item, and Accounts Receivables are called subclasses. Each subclass inherits all the attributes and operations of its superclass in addition to its own unique attributes
and operations. Figure 6 shows how this relationship is depicted. The subclass
Accounts Receivables has attributes Name, Credit Limit, and Credit Terms for
its customers and operations Update Credit Terms and Categorize A/R, which
determines good and bad payers. In addition to its own class attributes and
operations, objects of Account Receivables will also inherit the attributes and
operations of Current Asset.
This ability to factor out common attributes of several classes into a common class and to inherit the properties from the superclass can greatly reduce
repetition within design and programs and is one of the advantages of object-oriented systems.
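The generalization of Fig. 6 corresponds directly to class inheritance. A minimal sketch (with simplified attributes and operations, and hypothetical categorization thresholds) might be:

class CurrentAsset:                      # superclass
    def __init__(self, opening_balance=0.0):
        self.opening_balance = opening_balance
        self.total_to_date = opening_balance

    def calculate_total(self, movements):
        self.total_to_date = self.opening_balance + sum(movements)
        return self.total_to_date


class Cash(CurrentAsset):                # subclass: adds its own attribute and operation
    def __init__(self, opening_balance=0.0, currency="USD"):
        super().__init__(opening_balance)
        self.currency = currency

    def convert(self, rate):
        return self.total_to_date * rate


class AccountsReceivable(CurrentAsset):  # subclass: inherits calculate_total unchanged
    def __init__(self, name, credit_limit, credit_terms, opening_balance=0.0):
        super().__init__(opening_balance)
        self.name = name
        self.credit_limit = credit_limit
        self.credit_terms = credit_terms

    def update_credit_terms(self, terms):
        self.credit_terms = terms

    def categorize(self, days_overdue):
        """Crude good/bad payer categorization; the 90-day threshold is hypothetical."""
        return "bad payer" if days_overdue > 90 else "good payer"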
2. Dynamic Model
The dynamic model describes the temporal behavior of the objects: their states and the events that trigger transitions between states. Figure 7 shows the states of an Invoice object and the events and actions associated with each transition.
FIGURE 7 State diagram of the Invoice class. An invoice is Outstanding while no payment has been received; if no payment arrives and the time exceeds the credit period, it becomes Overdue and reminders are sent; on payment it becomes Fully Paid, a receipt is issued, and account balances are adjusted; after a time lapse or if the customer goes bust, it is written off and account balances are adjusted.
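The life cycle in Fig. 7 can be coded as a simple state machine; the sketch below is ours, with the event names, the credit-period parameter, and the empty action bodies standing in for the operations named in the figure.

class InvoiceLifecycle:
    """States and transitions from the invoice state diagram (Fig. 7), simplified."""
    def __init__(self, credit_period_days=30):
        self.state = "Outstanding"
        self.credit_period_days = credit_period_days

    def on_time_elapsed(self, days_since_issue):
        if self.state == "Outstanding" and days_since_issue > self.credit_period_days:
            self.state = "Overdue"
            self.send_reminders()

    def on_payment(self):
        if self.state in ("Outstanding", "Overdue"):
            self.state = "Fully Paid"
            self.issue_receipt()
            self.adjust_account_balances()

    def on_customer_goes_bust(self):
        if self.state in ("Outstanding", "Overdue"):
            self.state = "Written Off"
            self.adjust_account_balances()

    # Placeholder actions standing in for the operations named in the figure.
    def send_reminders(self): pass
    def issue_receipt(self): pass
    def adjust_account_balances(self): pass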
3. Functional Model
The functional model describes those aspects of a system concerned with
the transformation of values: functions, mappings, constraints, and functional dependencies. The functional model captures what the system does regardless of
how and when it is done. The functional model may be represented with data
flow diagrams that show the dependencies between the inputs and the outputs
to a process. Functions are invoked as actions in the dynamic model and are
shown as operations on objects in the object model. An example of an operation on Invoice (see Fig. 7) is "Issue receipt." The inputs to this process are the
invoice details, such as its number and customer name, and its corresponding
payment details, and its output is the receipt that is sent to the customer.
C. Modeling Financial Policies
In the previous section, we have shown how the OO paradigm can be used to
model financial information. Entities such as customers, customer orders, suppliers, requisition orders, and employees can be intuitively modeled as objects
and classes. Based on these classes, events and operations that occur in the daily
routine of running a firm such as Peter and Jane's can also be modeled. In this
section, we will examine how financial policies may also be modeled using the
OO approach.
We view financial data as a class that is a generalization of the Balance
Sheet, Income Statement, and Cash Flow Statement classes. The accounts' hierarchy of each of the financial statements can be directly mapped as superclasses
and subclasses. For example, the Balance Sheet class is a superclass of the Asset,
Liability, and Shareholders' Fund classes, and the Asset class is in turn a superclass of the Current Asset and the Fixed Assets classes. Financial policies
can therefore be modeled as a class that interacts with the Financial Data class.
FIGURE 8 A Financial Policy class (attributes: Objective, Decision-Maker; operation: Get Value) expressed in terms of Financial Data objects (attributes: Name, Value, Target, Date).
FIGURE 9 The Financial Ratio class (attributes: Name, /Numerator, /Denominator, Value, Target; operations: Compute Ratio, Compute Value, Get Target) and an instance of this class.
Figure 9 shows the attributes and operations for a Financial Ratio class and an instance of this class.
Constraints on derived objects can be embedded in the operations. For
example, most firms require their liquidity ratio to be within a certain range to
ensure that they are well positioned to handle business emergencies. However,
the value for the liquidity ratio is usually obtained by a trend analysis of the
past 10 years or by comparing with the industry norm. This requires complex
statistical computations and the constraint is better expressed procedurally in
the operations such as Get Target in Fig. 9.
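Following Fig. 9, a financial ratio object can carry its numerator, denominator, and target, with Get Target encapsulating however the target is derived (trend analysis, industry norm, or otherwise). In the sketch below the target range is simply supplied by the caller, and the example figures and range are hypothetical.

class FinancialRatio:
    """Sketch of the Financial Ratio class of Fig. 9 with simplified target derivation."""
    def __init__(self, name, numerator, denominator, target_range):
        self.name = name
        self.numerator = numerator          # values derived from Financial Data objects
        self.denominator = denominator
        self.target_range = target_range    # (low, high), supplied instead of computed
        self.value = None

    def compute_ratio(self):
        self.value = self.numerator / self.denominator
        return self.value

    def get_target(self):
        # In the chapter this may involve trend analysis or industry norms;
        # here the target range is simply returned as given.
        return self.target_range

    def within_target(self):
        low, high = self.get_target()
        return low <= self.compute_ratio() <= high


liquidity_ratio = FinancialRatio("liquidity ratio", numerator=1000, denominator=300,
                                 target_range=(1.5, 4.0))
print(liquidity_ratio.within_target())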
To conclude, we see that the OO approach provides a uniform and systematic way in which financial information and policies are modeled. This
facilitates the automatic implementation of a financial database management
system through the use of commercially available tools and products such as
ObjectTeam and Intelligent OOA.
V. DISCUSSION
A database management system offers many capabilities and handles important tasks that are crucial to applications accessing the stored data. The object-oriented technology provides a framework for representing and managing both
data and application programs. It is a promising paradigm to solve the so-called
impedance mismatch: the awkward communication between a query language
and a programming language that results when developing applications with a
database system. We have seen how the financial domain can be sensibly modeled using object-oriented concepts. The object-oriented approach also removes
the gap between an application and its representation. It allows the designer to
model the real world as closely as possible, which can be mapped directly to
design and implementation constructs.
Financial information can be viewed at two levels. At one level, we look
at the raw data that are associated with routine events and operations such
as sales, invoices, and payments. At another level, we abstract and generalize the raw financial data to analyze and interpret them. It is quite apparent
from the examples given in Section IV that the basic financial data associated
with the routine events and operations in a firm can be adequately represented
by the OO modeling notation. Basic financial data associated with the routine
events and operations in a firm can be abstracted to a higher level through the
class hierarchy, which allows strategic decisions and policy formulations. For
example, policy statements could be expressed in terms of financial ratios that
are derived from financial statements, which are composed of basic financial
data. The encapsulation feature proves useful as the composition of financial
classes such as Asset and Liability varies for different industries. For example,
the Current Asset class will have objects such as inventory items for the trading and manufacturing industry that will not exist in the service industry such
as banks. Instead, the Current Asset class in the banking industry will have
objects such as investment items, which include marketable securities, government bonds, and fixed deposits. In the same way, the key indicators used to
implement financial policies may be modified without affecting the application programs.
REFERENCES
1. Klein, M., FINSIM Expert: A KB/DSS for financial analysis and planning. In EUROINFO '88:
Concepts for Increased Competitiveness (H. J. Bullinger, E. N. Protonotaris, D. Bouwhuis, and F. Reim, Eds.), 908-916. North-Holland, Amsterdam, 1988.
2. Van Horne, J. C. Financial Management and Policy, 11th ed. Prentice-Hall, Englewood Cliffs,
NJ, 1998.
3. Miller, D. E. The Meaningful Interpretation of Financial Statements: The Cause and Effect
Ratio Approach. Am. Management Assoc., New York, 1996.
4. Myers, S. C. The capital structure puzzle. J. Finance 39:575-592, 1984.
5. Smith, C. Raising capital: Theory and evidence. In The Revolution on Corporate Finance
(J. Stern and D. Chew, Jr, Eds.), 2nd ed. Blackwell, Oxford, 1992.
6. Codd, E. F. A relational model of data for large shared data banks. Commun. ACM 13(6):
377-387, 1970.
7. Tsichritzis, D. C., and Lochovsky, F. H. Hierarchical database management. ACM Comput. Surveys 8(1):105-123, 1976.
8. Taylor, R., and Frank, R. CODASYL database management systems. ACM Comput. Surveys
8(1):67-104, 1976.
9. Maier, D. Theory of Relational Databases. Computer Science Press, Rockville, MD, 1983.
10. Chen, P. P. The entity-relationship model: Toward a unified view of data. ACM Trans. Database
Systems 1(1):166-192, 1976.
11. Atkinson, M., Bancilhon, F., DeWitt, D., Dittrich, K., Maier, D., and Zdonik, S. The object-oriented database system manifesto. In Deductive and Object-Oriented Databases, 223-240. Elsevier Science, Amsterdam, 1990.
12. Lieberman, A. Z., and Whinston, A. B. A structuring of an event accounting information system.
Accounting Rev. (April):246-258, 1975.
13. Haseman, W. D., and Whinston, A. B. Design of a multi-dimensional accounting system.
Accounting Rev. (January):65-79, 1976.
14. Everest, G. C., and Weber, R. A relational approach to accounting models. Accounting Rev.
(April): 340-359, 1977.
15. Booch, G. Object-Oriented Analysis and Design with Applications, 2nd ed. Addison-Wesley,
Reading, MA, 1994.
16. Rumbaugh, J., Jacobson, I. and Booch, G. The Unified Modeling Language Reference Manual.
Addison-Wesley, Reading, MA, 1999.
17. Woon, I. M. Y., and Loh, W. L. Formal derivation to object-oriented implementation of financial
policies. Int. J. Comput. Appl. Technol. 10(5/6):316-326, 1997.
KUNWOO LEE
School of Mechanical and Aerospace Engineering, Seoul National University,
Seoul 151-742, Korea
I. INTRODUCTION 518
A. Volume Visualization 518
B. Volume Graphics 518
C. Volume Modeling 519
D. Multiresolution Modeling 519
E. Scattered Data Modeling 519
F. Feature Segmentation 520
II. REPRESENTATION OF VOLUMETRIC DATA 520
A. Introduction to Hypervolume 520
B. Mathematical Representation of a Hypervolume 521
C. Main Features and Application Areas of a Hypervolume 523
III. MANIPULATION OF VOLUMETRIC DATA 525
A. Derivatives of a NURBS Volume 525
B. Generation of a NURBS Volume 527
C. Gridding Methods of Scattered Data 536
IV. RENDERING METHODS OF VOLUMETRIC DATA 538
V. APPLICATION TO FLOW VISUALIZATION 540
A. Related Works 540
B. Application Examples 541
C. Feature Segmentation 544
VI. SUMMARY AND CONCLUSIONS 546
REFERENCES 547
I. INTRODUCTION
Scientific data visualization aims to devise algorithms and methods that transform massive scientific data sets into valuable pictures and other graphic representations that facilitate comprehension and interpretation. In many scientific
domains, analysis of these pictures has motivated further scientific investigation.
Pioneering work on scientific visualization as a discipline started in 1987 with the report of the National Science Foundation's Advisory Panel on Graphics, Image Processing, and Workstations [1]. The report justified the need for scientific visualization, introduced the short-term potential and long-term goals of visualization environments, and emphasized the role of scientific
visualization in industrial fields.
The IEEE Visualization conference series has been a leading conference in
scientific visualization since 1990, and its importance has grown within ACM's
SIGGRAPH conference. Many researchers and practitioners get together at
the annual Eurographics Workshop on Visualization in Scientific Computing.
Similarly, hundreds of conferences and workshops around the world have developed the theme. Numerous journals and books worldwide in computer
science, physical science, and engineering now devote themselves, in part or in
full, to the topic.
Recent work in scientific visualization has been stimulated by various conferences or workshops as described above. It mainly includes topics in volume
visualization, volume graphics, and volume modeling. It also covers the following topics: multiresolution modeling, scattered data modeling, and feature segmentation. We introduce key aspects of each topic below, commenting on the current status of the work and its requirements.
A. Volume Visualization
Volume visualization found its initial applications in medical imaging. The overall goal of volume visualization is to extract meaningful information from volumetric data. Volume visualization addresses the representation, manipulation,
and rendering of volumetric data sets without mathematically describing surfaces used in the CAD community. It provides procedures or mechanisms for
looking into internal structures and analyzing their complexity and dynamics.
Recent progress is impressive, yet further research still remains.
B. Volume Graphics
As an emerging subfield of computer graphics, volume graphics is concerned
with the synthesis, manipulation, and rendering of 3D modeled objects, stored
as a volume buffer of voxels [2]. It primarily addresses modeling and rendering
geometric scenes, particularly those represented in a regular volume buffer,
by employing a discrete volumetric representation. The primary procedure for
its representation needs voxelization algorithms that synthesize voxel-based
models by converting continuous geometric objects into their discrete voxelbased representation.
Volume graphics is insensitive to scene and object complexities, and
supports the visualization of internal structures. However, there are several
Further research is needed to develop accurate, efficient, and easily implemented modeling methods for very large data sets. In particular, we are interested in estimating and controlling errors. Well-known approaches for scattered data modeling are the modified quadratic Shepard (MQS) method [4], volume splines [5,6], multiquadrics [5,6], and the volume minimum norm network (MNN) [7]. See the references for more details on these methods.
F. Feature Segmentation
The term "segmentation" has been defined as the process of identifying which pixels or voxels of an image belong to a particular object, such as bone, fat, or tissue in medical applications. It now also includes the extraction of intrinsic data characteristics or meaningful objects from the given original data, and so is often referred to as feature extraction or segmentation.
Generally, the features are application-dependent, so the mechanisms for characterizing them must be flexible and general. Some researchers expect that a suitable mathematical model will make it possible to detect the feature objects and/or represent them; others expect that expert system technology will serve this purpose.
Scientific data visualization has acquired increasing interest and popularity in many scientific domains. It aims at devising mathematical principles, computational algorithms, and well-organized data structures that transform massive scientific data sets into meaningful pictures and other graphic representations that improve comprehension and insight. To achieve these goals, many visualization tools and techniques have appeared and have been integrated to some degree within individual systems. Few systems, however, fully meet their users' various needs for visualization, and system developers cannot construct general visualization tools without considering data dimensionality and distribution or other intrinsic, domain-dependent data characteristics.
These difficulties arise largely from the lack of mathematical models that can represent a wide range of scientific data sets and realize all standard visualization procedures, such as volumetric modeling, graphical representation, and feature segmentation, within a unified system framework.
In this chapter, we describe a mathematical model necessary for establishing a foundation for the evolving needs of visualization systems as mentioned
above, and demonstrate its usefulness by applying it to data sets from computational fluid dynamics.
II. REPRESENTATION OF VOLUMETRIC DATA
A. Introduction to Hypervolume
To understand the internal structures or relationships implied by scientific data sets, we first should describe their shapes, which represent a three-dimensional geometric object or a region of interest in which physical phenomena occur. Also, to visualize these phenomena at a given time, we must describe the scene's appearance at that time.
For all of these purposes, we propose a hypervolume, a mathematical formulation based on a higher-dimensional, trivariate, and dynamic NURBS (nonuniform rational B-spline) representation [8-10]. The term "higher dimensional" indicates that the formulation does not depend on the dimensionality of the data, and "trivariate and dynamic" implies the capability of describing an evolving physical object in spatial and temporal coordinates, respectively. NURBS, as a well-known interpolating function, plays a key role in transforming a discrete data set into a continuous one.
The hypervolume consists of two different models that are independent of
each other. One is the geometry volume, which defines 3D geometric objects,
such as inhomogeneous materials or a region of interest in which physical phenomena occur, covered by the scientific data sets. The other is the attribute
volume, which describes scalar- or vector-valued physical field variables, such
as pressure, temperature, velocity, and density, as functions of four variables,
i.e., the three positional coordinates and time. It also can include the graphical
variables, such as color and opacity. The combination of these two volumes can
provide all the geometric, physical, and graphical information for representing,
manipulating, analyzing, and rendering scientific data sets. The relevant procedure will be explained later in detail.
The geometry volume of a hypervolume is expressed as a quadvariate NURBS:

G(u, v, w, t) = \frac{\sum_{i=0}^{n_u}\sum_{j=0}^{n_v}\sum_{k=0}^{n_w}\sum_{l=0}^{n_t} h_{ijkl}\, G_{ijkl}\, N_i^{k_u}(u)\, N_j^{k_v}(v)\, N_k^{k_w}(w)\, N_l^{k_t}(t)}{\sum_{i=0}^{n_u}\sum_{j=0}^{n_v}\sum_{k=0}^{n_w}\sum_{l=0}^{n_t} h_{ijkl}\, N_i^{k_u}(u)\, N_j^{k_v}(v)\, N_k^{k_w}(w)\, N_l^{k_t}(t)}.     (1)

{G_{ijkl} = (x_{ijkl}, y_{ijkl}, z_{ijkl})} ⊂ R^3 forms a tridirectional control grid in the three-dimensional rectangular space, {h_{ijkl}} are the weights, and {N_i^{k_u}(u)}, {N_j^{k_v}(v)}, {N_k^{k_w}(w)}, and {N_l^{k_t}(t)} are the normalized B-spline basis functions of order k_u, k_v, k_w, and k_t defined on the knot vectors u = {u_0, ..., u_{n_u+k_u}}, v = {v_0, ..., v_{n_v+k_v}}, w = {w_0, ..., w_{n_w+k_w}}, and t = {t_0, ..., t_{n_t+k_t}}.
Also note that the parameters u, v, and w in Eq. (1) represent the three positional coordinates in the four-dimensional parameter space of the geometry volume, and the parameter t is used for the time coordinate.
To obtain a more interpretable form of Eq. (1), we can rewrite it as

G(u, v, w, t) = \frac{\sum_{i=0}^{n_u}\sum_{j=0}^{n_v}\sum_{k=0}^{n_w} \left(\sum_{l=0}^{n_t} h_{ijkl}\, G_{ijkl}\, N_l^{k_t}(t)\right) N_i^{k_u}(u)\, N_j^{k_v}(v)\, N_k^{k_w}(w)}{\sum_{i=0}^{n_u}\sum_{j=0}^{n_v}\sum_{k=0}^{n_w} \left(\sum_{l=0}^{n_t} h_{ijkl}\, N_l^{k_t}(t)\right) N_i^{k_u}(u)\, N_j^{k_v}(v)\, N_k^{k_w}(w)}     (2)

= \frac{\sum_{i=0}^{n_u}\sum_{j=0}^{n_v}\sum_{k=0}^{n_w} h_{ijk}(t) \left(\dfrac{\sum_{l=0}^{n_t} h_{ijkl}\, G_{ijkl}\, N_l^{k_t}(t)}{\sum_{l=0}^{n_t} h_{ijkl}\, N_l^{k_t}(t)}\right) N_i^{k_u}(u)\, N_j^{k_v}(v)\, N_k^{k_w}(w)}{\sum_{i=0}^{n_u}\sum_{j=0}^{n_v}\sum_{k=0}^{n_w} h_{ijk}(t)\, N_i^{k_u}(u)\, N_j^{k_v}(v)\, N_k^{k_w}(w)},     (3)

where h_{ijk}(t) = \sum_{l=0}^{n_t} h_{ijkl}\, N_l^{k_t}(t). Therefore, we obtain

G(u, v, w, t) = \frac{\sum_{i=0}^{n_u}\sum_{j=0}^{n_v}\sum_{k=0}^{n_w} h_{ijk}(t)\, G_{ijk}(t)\, N_i^{k_u}(u)\, N_j^{k_v}(v)\, N_k^{k_w}(w)}{\sum_{i=0}^{n_u}\sum_{j=0}^{n_v}\sum_{k=0}^{n_w} h_{ijk}(t)\, N_i^{k_u}(u)\, N_j^{k_v}(v)\, N_k^{k_w}(w)},     (4)

where

G_{ijk}(t) = \frac{\sum_{l=0}^{n_t} h_{ijkl}\, G_{ijkl}\, N_l^{k_t}(t)}{\sum_{l=0}^{n_t} h_{ijkl}\, N_l^{k_t}(t)}.     (5)

From Eq. (4), we see that the geometry volume has dynamic behavior, since the control grid term in Eq. (4), which is Eq. (5) or simply a NURBS curve, describes a spatial movement or deformation as the parameter t (= time) elapses. Also, if we set the control grid term to be constant for all elapsed time, then Eq. (4) describes only a static geometric object; that is, it contains no dynamic behavior information. The static geometry volume is often referred to as a NURBS volume, which has the form

V(u, v, w) = \frac{\sum_{i=0}^{n_u}\sum_{j=0}^{n_v}\sum_{k=0}^{n_w} h_{ijk}\, V_{ijk}\, N_i^{k_u}(u)\, N_j^{k_v}(v)\, N_k^{k_w}(w)}{\sum_{i=0}^{n_u}\sum_{j=0}^{n_v}\sum_{k=0}^{n_w} h_{ijk}\, N_i^{k_u}(u)\, N_j^{k_v}(v)\, N_k^{k_w}(w)}.     (6)
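To make the evaluation of Eq. (6) concrete, the following Python sketch evaluates a static NURBS volume at one parameter triple using the Cox-de Boor recursion. The function names, array layout, and the half-open treatment of the last knot span are illustrative assumptions, not part of the authors' implementation.

```python
import numpy as np

def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion: value of the normalized B-spline basis
    function N_i^k(t) of order k (degree k-1) on the given knot vector.
    Evaluation exactly at the right end of the knot vector is not handled."""
    if k == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + k - 1] > knots[i]:
        left = (t - knots[i]) / (knots[i + k - 1] - knots[i]) * bspline_basis(i, k - 1, t, knots)
    if knots[i + k] > knots[i + 1]:
        right = (knots[i + k] - t) / (knots[i + k] - knots[i + 1]) * bspline_basis(i + 1, k - 1, t, knots)
    return left + right

def nurbs_volume(u, v, w, ctrl, weights, ku, kv, kw, U, V, W):
    """Evaluate the static NURBS volume of Eq. (6) at parameters (u, v, w).
    ctrl: (nu+1, nv+1, nw+1, 3) control grid; weights: (nu+1, nv+1, nw+1)."""
    nu, nv, nw = ctrl.shape[0] - 1, ctrl.shape[1] - 1, ctrl.shape[2] - 1
    Nu = np.array([bspline_basis(i, ku, u, U) for i in range(nu + 1)])
    Nv = np.array([bspline_basis(j, kv, v, V) for j in range(nv + 1)])
    Nw = np.array([bspline_basis(k, kw, w, W) for k in range(nw + 1)])
    # tensor-product terms h_ijk * N_i(u) * N_j(v) * N_k(w)
    B = weights * Nu[:, None, None] * Nv[None, :, None] * Nw[None, None, :]
    return np.tensordot(B, ctrl, axes=([0, 1, 2], [0, 1, 2])) / B.sum()
```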
Similar to Eq. (1), the attribute volume of a hypervolume can be expressed by

A(u, v, w, t) = \frac{\sum_{i=0}^{n_u}\sum_{j=0}^{n_v}\sum_{k=0}^{n_w}\sum_{l=0}^{n_t} h_{ijkl}\, A_{ijkl}\, N_i^{k_u}(u)\, N_j^{k_v}(v)\, N_k^{k_w}(w)\, N_l^{k_t}(t)}{\sum_{i=0}^{n_u}\sum_{j=0}^{n_v}\sum_{k=0}^{n_w}\sum_{l=0}^{n_t} h_{ijkl}\, N_i^{k_u}(u)\, N_j^{k_v}(v)\, N_k^{k_w}(w)\, N_l^{k_t}(t)}     (7)

= \frac{\sum_{i=0}^{n_u}\sum_{j=0}^{n_v}\sum_{k=0}^{n_w} h_{ijk}(t)\, A_{ijk}(t)\, N_i^{k_u}(u)\, N_j^{k_v}(v)\, N_k^{k_w}(w)}{\sum_{i=0}^{n_u}\sum_{j=0}^{n_v}\sum_{k=0}^{n_w} h_{ijk}(t)\, N_i^{k_u}(u)\, N_j^{k_v}(v)\, N_k^{k_w}(w)},     (8)

where

A_{ijk}(t) = \frac{\sum_{l=0}^{n_t} h_{ijkl}\, A_{ijkl}\, N_l^{k_t}(t)}{\sum_{l=0}^{n_t} h_{ijkl}\, N_l^{k_t}(t)}.     (9)

The attribute volume in Eq. (8) is also a quadvariate (u, v, w spatially and t temporally), vector-valued, piecewise rational function whose order is the same as that of the geometry volume in Eq. (1). Note that Eq. (7) is also a tensor product of nonuniform rational B-splines of order k_u, k_v, k_w, and k_t in the u, v, w, and t directions, respectively.
{A_{ijkl}} ⊂ R^d forms a tridirectional control grid in the d-dimensional application space. For a fluid flow application, A_{ijkl} = (ρ_{ijkl}, V_{ijkl}, e_{ijkl}) is defined in this chapter, where ρ ∈ R^1 is a flow density, V ∈ R^3 is a flow velocity, and e ∈ R^1 is an internal energy per unit mass.
Finally, a hypervolume can be derived simply from Eqs. (1) and (7) by the following procedure. We assume that the orders k_u, k_v, k_w, k_t, the weights h_{ijkl}, the numbers of control vertices along each direction n_u, n_v, n_w, n_t, and the knot vectors u, v, w, t of the two volumes, i.e., Eqs. (1) and (7), are identical. We then substitute (G_{ijk}(t), 0) for G_{ijk}(t) in Eq. (4) and (0, A_{ijk}(t)) for A_{ijk}(t) in Eq. (8), and perform the vector sum of the two results. From this summation, we obtain our goal of a hypervolume. That is,
\tilde{G}(u, v, w, t) = \frac{\sum_{i=0}^{n_u}\sum_{j=0}^{n_v}\sum_{k=0}^{n_w} h_{ijk}(t)\,\bigl(G_{ijk}(t), 0\bigr)\, N_i^{k_u}(u)\, N_j^{k_v}(v)\, N_k^{k_w}(w)}{\sum_{i=0}^{n_u}\sum_{j=0}^{n_v}\sum_{k=0}^{n_w} h_{ijk}(t)\, N_i^{k_u}(u)\, N_j^{k_v}(v)\, N_k^{k_w}(w)},     (10)

\tilde{A}(u, v, w, t) = \frac{\sum_{i=0}^{n_u}\sum_{j=0}^{n_v}\sum_{k=0}^{n_w} h_{ijk}(t)\,\bigl(0, A_{ijk}(t)\bigr)\, N_i^{k_u}(u)\, N_j^{k_v}(v)\, N_k^{k_w}(w)}{\sum_{i=0}^{n_u}\sum_{j=0}^{n_v}\sum_{k=0}^{n_w} h_{ijk}(t)\, N_i^{k_u}(u)\, N_j^{k_v}(v)\, N_k^{k_w}(w)},     (11)

and the hypervolume is their vector sum,

H(u, v, w, t) = \tilde{G}(u, v, w, t) + \tilde{A}(u, v, w, t)     (12)

= \frac{\sum_{i=0}^{n_u}\sum_{j=0}^{n_v}\sum_{k=0}^{n_w} h_{ijk}(t)\, H_{ijk}(t)\, N_i^{k_u}(u)\, N_j^{k_v}(v)\, N_k^{k_w}(w)}{\sum_{i=0}^{n_u}\sum_{j=0}^{n_v}\sum_{k=0}^{n_w} h_{ijk}(t)\, N_i^{k_u}(u)\, N_j^{k_v}(v)\, N_k^{k_w}(w)},     (13)

where

H_{ijk}(t) = \bigl(G_{ijk}(t), A_{ijk}(t)\bigr).     (14)
This volume has the convex hull, local control, and affine invariance properties under geometric transformations such as translation, rotation, and parallel and perspective projections, and it can utilize other useful techniques developed in the CAGD (computer-aided geometric design) literature, because the model is based on a NURBS representation.
The geometric volume, which describes a physical domain in parametric form, provides many differential elements, such as the arc length, surface area, and volume elements, which are often required to compute numerically any physical or otherwise interesting feature for a feature-based visualization.
The attribute volume, which describes field variables in the parametric
form, provides various expressions for the derivative operators such as gradient,
divergence, curl, and Laplacian, in connection with the geometry volume.
This volume allows existing visualization techniques to visualize the structure of a physical phenomenon without changing their internal system structure, and it makes it possible to implement and enhance feature-based visualization.
This volume permits multiple physical domains. That is, a set of hypervolumes can be dealt with from the decomposed physical domains in a systematic
way.
This volume enables us to represent and manage both the spatial and the temporal domains. That is, the independent parameters u, v, and w of a hypervolume govern a physical domain at an instant of time, and the parameter t is used to record the time history of this physical domain. For example, in the case of flow visualization, this volume can describe a complex physical motion in a turbulent flow domain together with its time history.
The hypervolume presented in this chapter can be applied to many physical
situations where interesting objects are evolving as time elapses. Note that
these situations typically have been described by numerical computations or
experimental measurements, and recorded into scientific data sets. Some of the
applications include:
the description of 3D geometric objects when they do not have
movement or deformation and their internal physical properties or field
variables do not vary as time elapses (e.g., the description of
homogeneous solids standing fixed),
the description of the spatial movement or deformation of 3D
geometric objects of which internal physical properties are constant
along time (e.g., the description of a rigid body motion of
inhomogeneous materials),
the description of historic records of physical field variables when 3D
geometric objects in which physical phenomena occur do not vary (e.g.,
the description of water flowing into a fixed space), and
the description of dynamics of 3D geometric objects with varying
physical field variables as time elapses (e.g., the description of gas
movement as its internal properties are varying).
Expressing these situations in terms of two constituent volumes of a hypervolume, we simply state that:
both the geometric volume and the attribute volume are static,
the geometric volume is static but the attribute volume is dynamic.
the geometric volume is dynamic but the attribute volume is static, and
both the geometric volume and the attribute volume are dynamic.
Note that "static" means no change over time, while "dynamic" denotes any alteration as time elapses.
III. MANIPULATION OF VOLUMETRIC DATA

A. Derivatives of a NURBS Volume

The partial derivatives of a NURBS volume follow from the generalized Leibniz rule,

\frac{d^p}{dx^p}\{f(x)\,g(x)\} = \sum_{r=0}^{p} \binom{p}{r} \left\{\frac{d^r}{dx^r} f(x)\right\} \left\{\frac{d^{p-r}}{dx^{p-r}} g(x)\right\},     (15)

where

\binom{p}{r} = \frac{p!}{r!\,(p-r)!}.

We write the mixed partial derivatives of a NURBS volume as

D_u^p D_v^q D_w^r\, V(u, v, w) = \frac{\partial^{p+q+r} V(u, v, w)}{\partial u^p\, \partial v^q\, \partial w^r}.     (16)

The NURBS volume of Eq. (6) can be expressed as the quotient

V(u, v, w) = \frac{\sum_{i=0}^{n_u}\sum_{j=0}^{n_v}\sum_{k=0}^{n_w} h_{ijk}\, V_{ijk}\, N_i^{k_u}(u)\, N_j^{k_v}(v)\, N_k^{k_w}(w)}{\sum_{i=0}^{n_u}\sum_{j=0}^{n_v}\sum_{k=0}^{n_w} h_{ijk}\, N_i^{k_u}(u)\, N_j^{k_v}(v)\, N_k^{k_w}(w)} = \frac{Q(u, v, w)}{h(u, v, w)},     (17)

so that

Q(u, v, w) = h(u, v, w)\, V(u, v, w).     (18)

Applying the Leibniz rule (15) to Eq. (18) with respect to w gives

D_w^r Q = \sum_{c=0}^{r} \binom{r}{c} D_w^c h\; D_w^{r-c} V,     (19)

and thus

D_w^r V = \frac{1}{h}\left\{ D_w^r Q - \sum_{c=1}^{r} \binom{r}{c} D_w^c h\; D_w^{r-c} V \right\},     (20)

which is the rth partial derivative of a NURBS volume with respect to w. Next, the qth partial derivative of Eq. (20) with respect to v is given by

D_v^q D_w^r V = \frac{1}{h}\left\{ D_v^q D_w^r Q - \sum_{b=1}^{q}\sum_{c=0}^{r} \binom{q}{b}\binom{r}{c} D_v^b D_w^c h\; D_v^{q-b} D_w^{r-c} V - \sum_{c=1}^{r} \binom{r}{c} D_w^c h\; D_v^q D_w^{r-c} V \right\}.     (21)

Finally, taking the pth partial derivative of Eq. (21) with respect to u in the same way, we obtain

D_u^p D_v^q D_w^r V = \frac{1}{h}\left\{ D_u^p D_v^q D_w^r Q - \sum_{\substack{a=0,\ b=0,\ c=0 \\ (a,b,c)\neq(0,0,0)}}^{p,\ q,\ r} \binom{p}{a}\binom{q}{b}\binom{r}{c} D_u^a D_v^b D_w^c h\; D_u^{p-a} D_v^{q-b} D_w^{r-c} V \right\},     (22)

which gives the pth, qth, and rth partial derivatives of a NURBS volume with respect to u, v, and w, respectively. Note that we can compute D_u^p D_v^q D_w^r Q(u, v, w) and D_u^p D_v^q D_w^r h(u, v, w) in Eq. (22) by using the de Boor algorithm [11].
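The recursion implied by Eq. (22) can be sketched in a few lines of Python, assuming hypothetical callables dQ and dh that return the partial derivatives of the numerator Q and denominator h (for example, computed with the de Boor algorithm). This is a minimal illustration under those assumptions, not the authors' implementation.

```python
from functools import lru_cache
from math import comb

def rational_volume_derivative(dQ, dh, p, q, r, u, v, w):
    """Partial derivative D_u^p D_v^q D_w^r of V = Q/h at (u, v, w) via the
    Leibniz/quotient recursion of Eq. (22).
    dQ(a, b, c, u, v, w) and dh(a, b, c, u, v, w) are assumed callables that
    return the corresponding partial derivatives of Q and h."""
    @lru_cache(maxsize=None)
    def dV(a, b, c):
        total = dQ(a, b, c, u, v, w)
        for i in range(a + 1):
            for j in range(b + 1):
                for k in range(c + 1):
                    if (i, j, k) == (0, 0, 0):
                        continue  # the (0,0,0) term contains the unknown dV itself
                    total -= (comb(a, i) * comb(b, j) * comb(c, k)
                              * dh(i, j, k, u, v, w) * dV(a - i, b - j, c - k))
        return total / dh(0, 0, 0, u, v, w)
    return dV(p, q, r)
```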
B. Generation of a NURBS Volume
In this section, we consider two simple algorithms for NURBS volume generation. The first algorithm concerns the generation of an interpolated volume and the second concerns swept volumes, both of which are based on the NURBS parametric form.
1. Interpolated Volume
We describe how to construct a NURBS volume by an interpolation
method, which will be presented in detail below, with a volumetric data distributed over a regular Cartesian or curvilinear grid. If the volumetric data
are not available on a grid configuration, we need to go through the gridding
process and produce an array of values from original data.
The geometry volume of a hypervolume can be generated by the simple
extension of the same interpolation algorithm applied to the NURBS volume,
if we have time series information of the volumetric data. Similarly the attribute
volume can be constructed.
For a better understanding of the interpolation algorithm, we assume all homogeneous coordinates (i.e., weight values) have unit values, so that the rational form of a NURBS volume reduces to the simpler nonrational form. That is, with the assumption h_{ijk} = 1 and the relation \sum_{i=0}^{n_u} N_i^{k_u}(u) = \sum_{j=0}^{n_v} N_j^{k_v}(v) = \sum_{k=0}^{n_w} N_k^{k_w}(w) = 1, the NURBS volume in Eq. (6) can be written as
V(u, v, w) = \sum_{i=0}^{n_u}\sum_{j=0}^{n_v}\sum_{k=0}^{n_w} V_{ijk}\, N_i^{k_u}(u)\, N_j^{k_v}(v)\, N_k^{k_w}(w).     (23)
Note that we need to use the rational form of Eq. (23), i.e., Eq. (6), if we have
an analytical shape [12]. For example, in analyzing or visualizing a pipe flow,
the pipe's geometry, which has the shape of a cylinder, cannot be represented
by a nonrational form.
Now, we are given (n_u + 1) × (n_v + 1) × (n_w + 1) grid points, and our goal is to build the trivariate piecewise nonrational function given in
Eq. (23) that interpolates the grid data. The nonrational B-spline volume in
Eq. (23) can be written as
V(u, v, w) = \sum_{i=0}^{n_u}\left(\sum_{j=0}^{n_v}\left(\sum_{k=0}^{n_w} V_{ijk}\, N_k^{k_w}(w)\right) N_j^{k_v}(v)\right) N_i^{k_u}(u) = \sum_{i=0}^{n_u} S_i(v, w)\, N_i^{k_u}(u).     (24)

In Eq. (24), C_{ij}(w) and S_i(v, w) are referred to as a control curve and a control surface, respectively, where

C_{ij}(w) = \sum_{k=0}^{n_w} V_{ijk}\, N_k^{k_w}(w),     (25)

S_i(v, w) = \sum_{j=0}^{n_v} C_{ij}(w)\, N_j^{k_v}(v).     (26)
The common parameter values in each direction are obtained by parameterizing the grid data P_{ijk} (e.g., by the chord-length method) and averaging the resulting values over the other two parametric directions, i.e., over the (n_v + 1)(n_w + 1), (n_w + 1)(n_u + 1), and (n_u + 1)(n_v + 1) grid curves for the u, v, and w directions, respectively. The knot vectors in the u, v, and w directions are then computed from these averaged parameter values.
From the knot vectors computed, we compute the Greville abscissae [14] (\xi_i, \eta_j, \zeta_k) corresponding to the grid data P_{ijk}; they are given by

\xi_i = \frac{u_{i+1} + \cdots + u_{i+k_u-1}}{k_u - 1}, \qquad i = 0, \ldots, n_u,

\eta_j = \frac{v_{j+1} + \cdots + v_{j+k_v-1}}{k_v - 1}, \qquad j = 0, \ldots, n_v,     (29)

\zeta_k = \frac{w_{k+1} + \cdots + w_{k+k_w-1}}{k_w - 1}, \qquad k = 0, \ldots, n_w.
Now, with the Greville abscissae {(\xi_i, \eta_j, \zeta_k)} computed in Eq. (29) and the grid data set {P_{ijk}}, we compute {V_{ijk}} such that

\sum_{i'=0}^{n_u}\left(\sum_{j'=0}^{n_v}\left(\sum_{k'=0}^{n_w} V_{i'j'k'}\, N_{k'}^{k_w}(\zeta_k)\right) N_{j'}^{k_v}(\eta_j)\right) N_{i'}^{k_u}(\xi_i) = P_{ijk},     (30)

that is,

\sum_{i'=0}^{n_u} S_{i'}(\eta_j, \zeta_k)\, N_{i'}^{k_u}(\xi_i) = P_{ijk},     (31)

where

C_{ij}(\zeta_k) = \sum_{k'=0}^{n_w} V_{ijk'}\, N_{k'}^{k_w}(\zeta_k),     (32)

S_i(\eta_j, \zeta_k) = \sum_{j'=0}^{n_v} C_{ij'}(\zeta_k)\, N_{j'}^{k_v}(\eta_j).     (33)
Written out for j = 0, ..., n_v and k = 0, ..., n_w, Eq. (31) is a set of (n_v + 1) × (n_w + 1) linear systems, each consisting of the (n_u + 1) equations

\sum_{i=0}^{n_u} S_i(\eta_j, \zeta_k)\, N_i^{k_u}(\xi_l) = P_{ljk}, \qquad l = 0, \ldots, n_u.

From this system of equations, we can see that S_i(\eta_j, \zeta_k) is the control net of a control curve that interpolates the {P_{ijk}} along the u direction. Similarly, for k = 0, ..., n_w and i = 0, ..., n_u, the system of (n_w + 1) × (n_u + 1) equations in Eq. (33) is given by
\sum_{j=0}^{n_v} C_{ij}(\zeta_k)\, N_j^{k_v}(\eta_l) = S_i(\eta_l, \zeta_k), \qquad l = 0, \ldots, n_v.

From this system of equations, we can see that C_{ij}(\zeta_k) is the control net of a control curve that interpolates the S_i(\eta_j, \zeta_k) along the v direction. Again, the system of (n_u + 1) × (n_v + 1) equations in Eq. (32) for i = 0, ..., n_u and j = 0, ..., n_v is given by
\sum_{k=0}^{n_w} V_{ijk}\, N_k^{k_w}(\zeta_l) = C_{ij}(\zeta_l), \qquad l = 0, \ldots, n_w.

From this system of equations, we can see that V_{ijk} is the control net of a control curve that interpolates the C_{ij}(\zeta_k) along the w direction. Therefore, the V_{ijk} form the control grid of a control volume that interpolates the {P_{ijk}} along each parametric direction.
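The three directional solves described above can be sketched as follows, assuming a single common order in every direction, clamped knot vectors that have already been computed, and grid data stored in a NumPy array; the function names and array layout are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def basis(i, k, t, knots):
    """Normalized B-spline basis N_i^k(t) of order k (Cox-de Boor recursion)."""
    if k == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    a = b = 0.0
    if knots[i + k - 1] > knots[i]:
        a = (t - knots[i]) / (knots[i + k - 1] - knots[i]) * basis(i, k - 1, t, knots)
    if knots[i + k] > knots[i + 1]:
        b = (knots[i + k] - t) / (knots[i + k] - knots[i + 1]) * basis(i + 1, k - 1, t, knots)
    return a + b

def greville(knots, order, n):
    """Greville abscissae of Eq. (29): averages of order-1 successive knots."""
    return np.array([np.mean(knots[i + 1:i + order]) for i in range(n + 1)])

def interpolate_volume(P, order, Uk, Vk, Wk):
    """Solve for the control grid V_ijk of the nonrational volume in Eq. (23)
    so that it interpolates the grid data P (shape (nu+1, nv+1, nw+1, 3))
    at the Greville abscissae, sweeping the u, v, and w directions in turn."""
    nu, nv, nw = P.shape[0] - 1, P.shape[1] - 1, P.shape[2] - 1
    xi, eta, zeta = greville(Uk, order, nu), greville(Vk, order, nv), greville(Wk, order, nw)
    # nudge the last abscissa inside the domain for the half-open basis spans
    xi[-1] -= 1e-12; eta[-1] -= 1e-12; zeta[-1] -= 1e-12
    Au = np.array([[basis(i, order, x, Uk) for i in range(nu + 1)] for x in xi])
    Av = np.array([[basis(j, order, y, Vk) for j in range(nv + 1)] for y in eta])
    Aw = np.array([[basis(k, order, z, Wk) for k in range(nw + 1)] for z in zeta])
    S = np.linalg.solve(Au, P.reshape(nu + 1, -1)).reshape(P.shape)        # u sweep: S_i
    C = np.linalg.solve(Av, S.transpose(1, 0, 2, 3).reshape(nv + 1, -1))   # v sweep: C_ij
    C = C.reshape(nv + 1, nu + 1, nw + 1, 3).transpose(1, 0, 2, 3)
    V = np.linalg.solve(Aw, C.transpose(2, 0, 1, 3).reshape(nw + 1, -1))   # w sweep: V_ijk
    return V.reshape(nw + 1, nu + 1, nv + 1, 3).transpose(1, 2, 0, 3)
```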
2. Swept Volume
FIGURE 1
FIGURE 2
(d) Compute the Greville points (= node points) of G(w), evaluated at the Greville abscissae, and also the tangent vectors at the same positions. For a better understanding, see Fig. 2, which illustrates six node points, n1, n2, n3, n4, n5, n6, of G(w) when G(w) has six control points. In addition, compute the tangent vector of G(w) at the point P.
(e) Transform the section surface S(u, v) by translation and rotation in two steps as follows. The first step is to translate S(u, v) by (n1 - P). The second step is to rotate the S(u, v) translated in the first step. Here, the rotational axis is determined from the cross product of two tangent vectors, one being the tangent vector of G(w) at position P and the other being that at position n1. The rotational angle is measured between these two tangent vectors, and the center of rotation is n1. In addition, we evaluate the node points of the transformed S(u, v) at the Greville abscissae of S(u, v) already calculated in step (c), and save the results together with position n1.
(f) Transform the S(u, v) resulting from step (e) by a process similar to step (e). That is, move S(u, v) by (n2 - n1) and then rotate S(u, v) by the angle between two tangent vectors, one being the tangent vector of G(w) at position n1 and the other being that at position n2. Here the rotational axis is calculated from the cross product of these tangent vectors, and the rotation center is n2. In addition, evaluate and save the node points of the transformed S(u, v) at the Greville abscissae already calculated in step (c). The same procedure is repeated until the node points of the transformed S(u, v) have been computed at the last node point of G(w). Figure 2 illustrates the section surface after translating from n3 to n4, and the section surface after rotating by the angle between the two tangent vectors of G(w) at positions n3 and n4. Through the processes described above, we obtain the node point distribution in the u and v directions at each w-directional node position.
(g) Now we can obtain the control grid (or regular control point set) for generating a swept volume by using the interpolation method described in the previous section, where the Greville abscissae (or node values) from step (c) and the node points from steps (e) and (f) are used as input data (a sketch of the tangent-alignment transform used in steps (e) and (f) is given after this list). The knot vectors along the u and v directions of the swept volume are the same as those of the section surface S(u, v), and the w-directional knot vector of the swept volume is the same as that of the guide curve G(w).
Note that the procedures for constructing extruded volumes and revolved ones can easily be derived from the swept volume algorithm described above.
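The tangent-alignment transform used in steps (e) and (f) can be sketched as follows; the Rodrigues-formula construction and the function names are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def rotation_between(t_prev, t_next):
    """Rotation matrix that turns tangent t_prev into t_next about the axis
    given by their cross product (Rodrigues' formula)."""
    t_prev = t_prev / np.linalg.norm(t_prev)
    t_next = t_next / np.linalg.norm(t_next)
    axis = np.cross(t_prev, t_next)
    s, c = np.linalg.norm(axis), float(np.dot(t_prev, t_next))
    if s < 1e-12:
        if c > 0:
            return np.eye(3)                      # tangents already aligned
        # 180-degree turn: rotate about any axis perpendicular to t_prev
        p = np.eye(3)[int(np.argmin(np.abs(t_prev)))]
        k = np.cross(t_prev, p); k /= np.linalg.norm(k)
        K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
        return np.eye(3) + 2.0 * (K @ K)
    k = axis / s
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + s * K + (1.0 - c) * (K @ K)

def move_section(points, n_prev, n_next, t_prev, t_next):
    """Translate the section's node points by (n_next - n_prev) and rotate them
    about n_next by the angle between the two guide-curve tangents."""
    R = rotation_between(t_prev, t_next)
    translated = points + (n_next - n_prev)
    return (translated - n_next) @ R.T + n_next
```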
FIGURE 3 Four guide curves and three section surfaces for a swept volume.
The extruded volume can be viewed as a special case of swept volume when
the guide curve G(w) is a straight line. Similarly the revolved volume can be
generated if G(w) is a circular arc or a circle.
Furthermore, we present another algorithm for constructing a swept volume in a situation different from that described above. Its configuration is shown in Fig. 3. That is, four guide curves, G1(w), G2(w), G3(w), and G4(w), and n section surfaces, S1(u, v), S2(u, v), ..., Sn(u, v), are used to define the swept volume. Note that only three section surfaces are shown in Fig. 3. The detailed procedure is as follows:
(a) Input the four guide curves G1(w), G2(w), G3(w), and G4(w) and the n section surfaces S1(u, v), S2(u, v), ..., Sn(u, v). See Fig. 3 for n = 3.
(b) Arrange the four guide curves to have the same direction as that shown
in Fig. 3. All section surfaces are also adjusted to have the same configuration,
and their normal vectors are directed toward the same direction as the four
guide curves as shown in Fig. 3.
(c) Reparameterize the four guide curves such that the parameter for each
curve ranges from 0.0 to 1.0. Similarly, reparameterize all the section surfaces
such that each surface has a parameter range 0.0 to 1.0 along the u and v
directions.
(d) Raise the degree of the guide curves of lower degree such that the four
guide curves have the same degree, and insert proper knots such that each curve
has the same knot vector.
(e) Raise the degree of section surfaces of lower degree such that all the
section surfaces have the same degree along the u and v directions, and insert
proper knots such that each surface has the same knot vectors along u and v
directions.
(f) For the ith section surface, take the following procedure: Find t1, t2, t3, t4 such that G1(t1) = S_i(0, 0), G2(t2) = S_i(1, 0), G3(t3) = S_i(0, 1), and G4(t4) = S_i(1, 1). If t1, t2, t3, t4 are not the same, then for the jth guide curve (j = 1, 2, 3, 4) split G_j(w) at w = t_j, resulting in two split curves, G_j(w = 0 : t_j) and G_j(w = t_j : 1), and then reparameterize G_j(w = 0 : t_j) and G_j(w = t_j : 1) such that they have the parameter ranges [0, (t1 + t2 + t3 + t4)/4] and [(t1 + t2 + t3 + t4)/4, 1], respectively.
FIGURE 4
Note that each of the four corner points of the ith section surface lies on the corresponding guide curve at the same parameter value, (t1 + t2 + t3 + t4)/4 (= t̄_i), which means that S_i(0, 0) = G1(t̄_i), S_i(1, 0) = G2(t̄_i), S_i(0, 1) = G3(t̄_i), and S_i(1, 1) = G4(t̄_i). Thus t̄_i is a common Greville abscissa (= node value) of each guide curve, and the four corner points are node points of the corresponding guide curves.
(g) Transform all the section surfaces to form a single merged surface located at the new node points of the four guide curves at which the section surfaces are not originally located. Figure 4 shows the configuration for this transformation.
Let us denote the new node points by o1, o2, o3, o4, where o_i is the node point of G_i(w) (i = 1, 2, 3, 4), which can be computed in step (f). Also, we denote by n11, n12, n13 the node points of G1(w) at which the section surfaces are originally located. Similarly, the node points of G2(w), G3(w), and G4(w) are denoted by n21, n22, n23, n31, n32, n33, and n41, n42, n43, respectively. Now, we compute s11, s12, s13, which are the arc lengths between o1 and n11, n12, n13 along G1(w). Similarly, we compute s21, s22, s23, s31, s32, s33, and s41, s42, s43 along G2(w), G3(w), and G4(w), respectively.
The transformation consists of two steps. One is the transformation of each section surface. That is, in Fig. 4, S1(u, v), S2(u, v), and S3(u, v) are transformed into S1*(u, v), S2*(u, v), and S3*(u, v) such that the four corner points of each transformed surface are located at o1, o2, o3, and o4. The details of the transformation will be illustrated later in Fig. 5. The other step is the weighted sum of the S_i*(u, v). First, we explain the weighted sum. In Fig. 4, we compute the weighting factor s_i of each section surface S_i(u, v) by

s_1 = \frac{1}{(s_{11} + s_{21} + s_{31} + s_{41})/4}, \qquad
s_2 = \frac{1}{(s_{12} + s_{22} + s_{32} + s_{42})/4}, \qquad
s_3 = \frac{1}{(s_{13} + s_{23} + s_{33} + s_{43})/4},
FIGURE 5
from which we see that the weighting factors are determined from the inverse of the average distance between the new node points o1, o2, o3, o4 and each section surface S_i(u, v). Thus, by using the weighting factors, we compute the single merged surface S(u, v) by

S(u, v) = \frac{s_1\, S_1^*(u, v) + s_2\, S_2^*(u, v) + s_3\, S_3^*(u, v)}{s_1 + s_2 + s_3}.
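A small sketch of this inverse-average-distance weighting, assuming the transformed section surfaces are available as equally sampled point arrays; all names are illustrative.

```python
import numpy as np

def merge_sections(surfaces, arc_lengths):
    """Weighted sum of the transformed section surfaces S_i*(u, v).
    surfaces: list of equally sampled point arrays, all with the same shape;
    arc_lengths[i] holds the four arc lengths (s_1i, s_2i, s_3i, s_4i) measured
    along the guide curves between the new node points and section i."""
    w = np.array([1.0 / np.mean(s) for s in arc_lengths])  # s_i = 1 / average distance
    stacked = np.stack(surfaces)                           # (n_sections, ..., 3)
    return np.tensordot(w, stacked, axes=1) / w.sum()
```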
Now, we explain the transformation from S_i(u, v) to S_i*(u, v) with Fig. 5. For the ith section surface, take the following procedure:
(1) Compute the u-directional node values of S_i(u, v) and then compute the v-ray curves, i.e., the isoparametric curves along the v direction at each of the u-directional node values. Here, we know that the number of v-ray curves is equal to the number of u-directional node points.
(2) Compute the u-ray curves, i.e., the isoparametric curves along the u direction, at the starting and ending v parameters of S_i(u, v). Again, we know that the number of u-ray curves is two.
(3) Move the two u-ray curves to the positions o1, o2 and o3, o4. That is, move both end points of the lower u-ray to the positions o1 and o2, and those of the upper u-ray to o3 and o4, while preserving the tangent vectors of each u-ray curve at its end points during the movement. Then evaluate the two u-ray curves at the u-directional node values already calculated in step (1).
(4) Move each of the v-ray curves to its two corresponding node points, one calculated from the upper u-ray and the other from the lower u-ray in step (3), while preserving the tangent vectors of the v-ray curve at its end points during the movement. This movement process is applied to all v-ray curves positioned at the u-directional node points.
(5) Compute S_i*(u, v) by interpolating the node points of the v-ray curves obtained in step (4).
(h) By using the interpolated volume method, we obtain the control grid of a swept volume from the initial section surfaces and their transformed surfaces obtained in step (g). Note that the u- and v-knot vectors of the swept volume are equal to those of any one of the section surfaces, and the w-knot vector is equal to that of any one of the guide curves.
FIGURE 6 Original points and grid points.
FIGURE 7 Original points and grid points.
FIGURE 8
FIGURE 9
its four vertices. By using this method, the interior planes can be viewed if
the exterior planes are rendered as partially transparent. In real practice, the
number of planes embedded in the volume model cannot be too large.
Direct volume rendering involves "shooting" rays through an object. To
simplify the required summation or integration process [23], the object is usually divided into a large number of uniformly sized boxes called voxels. Then
for each voxel, the voxel intensity and its gradient magnitude should be calculated in order to classify and render the data set. The classification requires
calculating an optical density value for each voxel in the data set, called opacity.
The opacity is often used to put a stress on any meaningful surfaces or boundaries between different materials. Typically, the opacities are calculated from
either voxel intensities or a combination of voxel intensities and gradient information. Once the data set is classified and the opacity and color are computed,
we obtain two new voxel data sets: a color data set and an opacity data set. Then we simply determine the color and opacity at sampling points (generally not at voxel positions), commonly by trilinear interpolation, but this step may lower the quality of the image. As an alternative, we may directly compute the color
and opacity on the sampling positions without using interpolation during the
rendering process. However, this step may increase the complexity of the algorithm and require more computational processes. Finally, the rendering images
are constructed by using several rendering techniques. The major distinction between the techniques is the order in which the voxels are processed to generate
an image, and they are grouped into image-order and object-order algorithms.
Examples of image-order algorithms are ray casting and Sabella's method [24]. The splatting algorithm [19], the V-buffer algorithm [25], and the slice shearing algorithm [26] belong to the object-order algorithms. Note that these methods require a voxelization algorithm or an octree structure [18] as a preprocessing step, and they have the disadvantage of being computationally intensive in any case.
As a generalization of the direct volume method, a method based on linear
transport theory has been proposed [23]. In this method, the rays are replaced
by virtual (light) particles moving through the data field. These particles interact with the 3D field data (which is represented by grid points) and collect
information about the light intensity for screen display. The general transport
theory model for volume rendering is given by the integro-differential equation.
The evaluation of this model requires very expensive computations, and thus
FIGURE 10 A rectangular duct geometry. Reprinted from S. Park and K. Lee (1997) with permission from Elsevier Science.
developed in this chapter. Then the streamlines or stream surfaces are computed by a procedure to be explained later under Feature Segmentation (see the next section). In addition, the pressure as an additional field variable is concurrently
derived from the generated hypervolume using the laws of physics. Finally, the
computed results are displayed with various different sizes and colors, which
map the computed numerical values to the visual images on a computer screen.
Our first example is an internal flow in a rectangular duct as shown in
Fig. 10, where several planes are removed to show the duct geometry in detail.
Figure 11 shows the four streamtubes in the duct. Note that the starting seed
curves of the streamtubes are four circles of the same radius, and the displayed
color and radius of the streamtubes indicate the magnitude of flow velocity and
the flow pressure, respectively.
Our second example illustrates the capabilities of multiple hypervolumes
applied to decomposed flow domains as shown in Fig. 12. Figure 13 shows a
FIGURE 11 Four streamtubes in a rectangular duct where color and radius indicate the flow velocity and pressure, respectively. Reprinted from S. Park and K. Lee (1997) with permission from Elsevier Science.
FIGURE 12 Decomposed flow domains in a tube bank. Reprinted from S. Park and K. Lee (1997) with permission from Elsevier Science.
FIGURE 13 Streamlines in a tube bank where the diameter of the particles indicates the pressure. Reprinted from S. Park and K. Lee (1997) with permission from Elsevier Science.
FIGURE 14 Six streamtubes in a tube bank where color and radius indicate the flow velocity and pressure, respectively. Reprinted from S. Park and K. Lee (1997) with permission from Elsevier Science.
frame at an instant when particles, having left their start points, are located
at specific positions along their trajectories. In the figure, the diameter of the
particles indicates the instantaneous pressure along the path. Figure 14 shows
the six streamtubes in the tube bank in Fig. 12.
Our third example presents the stream surfaces around an airplane wing as
shown in Fig. 15. The starting curve is a line, and the line becomes curved along
its trajectory in the figure. Note that the color indicates the relative magnitude
of a flow velocity.
C. Feature Segmentation
In this section, we will show how a streamline is generated from the hypervolume as an example of feature segmentation. This example will show the
capabilities of our hypervolume for feature-based visualization. As is well known, a streamline in a steady fluid velocity field is the trajectory of a single small particle injected into the flow. Streamlines, being integral curves, give insight into the global structure of the velocity vector field and serve as basic elements for other flow visualizations; e.g., a stream surface can be seen as a set of streamlines. The streamline in a steady state can be computed by

x = x_0 + \int_0^t \mathbf{v}\, dt, \qquad \mathbf{v} = \mathbf{v}_0 + \int_0^t \mathbf{a}\, dt,     (34)

using a velocity vector field v(x) and an initial seed point x_0.
FIGURE 15 Four stream surfaces around an airplane wing where color indicates the flow velocity. Reprinted from S. Park and K. Lee (1997) with permission from Elsevier Science.
If we assume that the acceleration a in Eq. (34) is constant during Δt, then the streamline equation in Eq. (34) can be rewritten as

\Delta x = \mathbf{v}\,\Delta t + \tfrac{1}{2}\,\mathbf{a}\,(\Delta t)^2.     (35)

Thus, we can compute the streamline if we can compute the term a in Eq. (35). The acceleration a can be expressed in terms of the velocity v, the partial derivatives ∂v/∂u, ∂v/∂v, ∂v/∂w, and the contravariant basis vectors ∇u, ∇v, ∇w as follows.
Since

\frac{d\mathbf{v}}{dt} = \frac{\partial \mathbf{v}}{\partial u}\frac{du}{dt} + \frac{\partial \mathbf{v}}{\partial v}\frac{dv}{dt} + \frac{\partial \mathbf{v}}{\partial w}\frac{dw}{dt},

where

\frac{du}{dt} = \frac{\partial u}{\partial x}\frac{dx}{dt} + \frac{\partial u}{\partial y}\frac{dy}{dt} + \frac{\partial u}{\partial z}\frac{dz}{dt} = \nabla u \cdot \mathbf{v},

\frac{dv}{dt} = \frac{\partial v}{\partial x}\frac{dx}{dt} + \frac{\partial v}{\partial y}\frac{dy}{dt} + \frac{\partial v}{\partial z}\frac{dz}{dt} = \nabla v \cdot \mathbf{v},

and

\frac{dw}{dt} = \frac{\partial w}{\partial x}\frac{dx}{dt} + \frac{\partial w}{\partial y}\frac{dy}{dt} + \frac{\partial w}{\partial z}\frac{dz}{dt} = \nabla w \cdot \mathbf{v},

we obtain

\mathbf{a} = \frac{d\mathbf{v}}{dt} = \frac{\partial \mathbf{v}}{\partial u}\,(\nabla u \cdot \mathbf{v}) + \frac{\partial \mathbf{v}}{\partial v}\,(\nabla v \cdot \mathbf{v}) + \frac{\partial \mathbf{v}}{\partial w}\,(\nabla w \cdot \mathbf{v}).     (36)
Here we can easily obtain ∇u, ∇v, and ∇w from the geometry volume, and v, ∂v/∂u, ∂v/∂v, and ∂v/∂w from the attribute volume suggested in this chapter. Note that the parameters u, v, and w represent the three positional coordinates in the parameter space of a hypervolume.
Therefore, we can compute Δx during a time interval Δt from Eqs. (35) and (36). Note that as the time interval Δt decreases, the assumption that a is constant during Δt becomes more reasonable, and so we obtain a more realistic streamline during Δt. Finally, the streamline over the full domain of a fluid flow can be obtained as follows:
(1) Find (u_0, v_0, w_0) such that G(u_0, v_0, w_0) = x_0, where x_0 is an initial seed point and G(u, v, w) is the geometry volume suggested in this chapter. Note that the Newton-Raphson method for solving this nonlinear equation is applied in the form

\left[\frac{\partial G(u_i, v_j, w_k)}{\partial u}\;\; \frac{\partial G(u_i, v_j, w_k)}{\partial v}\;\; \frac{\partial G(u_i, v_j, w_k)}{\partial w}\right]
\begin{bmatrix} u_{i+1} - u_i \\ v_{j+1} - v_j \\ w_{k+1} - w_k \end{bmatrix}
= -\bigl[G(u_i, v_j, w_k) - x_0\bigr].
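A compact sketch of the seed-point inversion of step (1) and of the marching step of Eq. (35) is given below. The callables G, jacobian, velocity, and acceleration are assumed to be supplied by the geometry and attribute volumes and are named here only for illustration.

```python
import numpy as np

def invert_geometry(x0, G, jacobian, uvw0, tol=1e-8, max_iter=50):
    """Newton-Raphson solution of G(u, v, w) = x0 (step (1) above).
    G(u, v, w) returns a 3-vector; jacobian(u, v, w) returns the 3x3 matrix
    [dG/du dG/dv dG/dw] of partial derivatives."""
    uvw = np.asarray(uvw0, dtype=float)
    for _ in range(max_iter):
        r = np.asarray(G(*uvw)) - x0
        if np.linalg.norm(r) < tol:
            break
        uvw = uvw - np.linalg.solve(np.asarray(jacobian(*uvw)), r)
    return uvw

def trace_streamline(x0, velocity, acceleration, dt=1e-2, n_steps=500):
    """March a streamline with the second-order step of Eq. (35),
    dx = v*dt + 0.5*a*dt**2, starting from the seed point x0.
    velocity(x) and acceleration(x) are assumed to come from the hypervolume
    (v from the attribute volume, a from Eq. (36))."""
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        v = np.asarray(velocity(x))
        a = np.asarray(acceleration(x))
        x = x + v * dt + 0.5 * a * dt * dt
        path.append(x.copy())
    return np.array(path)
```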
VI. SUMMARY AND CONCLUSIONS
Despite many needs for a data model to build a flexible visualization system
and support various application developments, a very powerful data model has
not yet been established. The data model should describe adequately the full
range of data used, allow a simple evaluation for rendering purposes, and derive
all the required data, e.g., higher-order derivative data, from the given data set
without additional computations. These requirements or expectations will be
sufficiently satisfied if the hypervolume proposed in this chapter is chosen.
We have suggested the hypervolume model as a mathematical representation for modeling and rendering volumetric data, and have described two
methods for creating a volume model, depending on the spatial distribution of
volumetric data. The interpolated volume method can be applied to the gridded
distribution of volumetric data, and the swept volume method can be utilized
for volumetric modeling of the cross-sectional distribution of data sets. We have
also introduced several gridding methods for scattered volumetric data, from
which we can obtain discrete grid points to be directly used in the interpolated
volume method. In addition, we have mentioned several rendering methods of
volumetric data, all of which can be implemented with a hypervolume.
We are currently verifying that the hypervolume can provide a mathematical framework for the analysis of integrated, multidisciplinary simulations that involve interactions between disciplines such as fluid dynamics, structural dynamics, and solid mechanics. The hypervolume model presented in this chapter makes a contribution in this sense.
REFERENCES
1. McCormick, B. H., DeFanti, T. A., and Brown, M. D. (Eds.). "Visualization in Scientific Computing." Computer Graphics 21(6): Nov. 1987.
2. Kaufman, A., Cohen, D., and Yagel, R. Volume graphics. Computer 26(7):51-64, 1993.
3. Muraki, S. Volume data and wavelet transform. IEEE Comput. Graphics Appl. 13(4):50-56,
1993.
4. Franke, R., and Nielson, G. Smooth interpolation of large sets of scattered data. Internat. J.
Numer. Methods Engrg. 15:1691-1704, 1980.
5. Franke, R., and Nielson, G. Scattered data interpolation and applications: A tutorial and survey. In Geometric Modeling: Methods and Their Application (H. Hagen and D. Roller, Eds.),
pp. 131-160. Springer-Verlag, Berlin, 1990.
6. Nielson, G. et al. Visualization and modeling of scattered multivariate data. IEEE Comput.
Graphics Appl. ll(3):47-55, 1991.
7. Nielson, G. A method for interpolating scattered data based upon a minimum norm network.
Math. Comp. 40:253-271, 1983.
8. Casale, M. S., and Stanton, E. L. An overview of analytic solid modeling. IEEE Comput.
Graphics Appl. 5:45-56, 1985.
9. Lasser, D. Bernstein-Bezier representation of volumes. Comput. Aided Geom. Design 2:145-149, 1985.
10. Lasser, D., and Hoschek, J. Fundamentals of Computer Aided Geometric Design. A K Peters,
Wellesley, MA, 1993.
11. de Boor, C. A Practical Guide to Splines. Springer-Verlag, New York, 1978.
12. Tiller, W. Rational B-splines for curve and surface representation. IEEE Comput. Graphics
Appl. 3:61-69,1983.
13. Hartley, P. J., and Judd, C. J. Parametrization and shape of B-spline curves for CAD. Comput.
Aided Design 12:235-238, 1980.
14. Farin, G. Curves and Surfaces for Computer Aided Geometric Design, pp. 150-151. Academic
Press, San Diego, 1990.
15. Agishtein, M. E., and Migdal, A. A. Smooth surface reconstruction from scattered data points.
Comput. Graphics (UK) 15(l):29-39, 1991.
16. Hibbard, W., and Santek, D. The VIS-5D system for easy interactive visualization. In Proceedings of Visualization '90 (A. Kaufman, Ed.), pp. 28-35. IEEE Comput. Soc. Press, Los Alamitos,
CA, 1990.
17. Shepard, D. A two dimensional interpolation function for irregular spaced data. In Proceedings
23rd ACM National Conference, pp. 517-524,1968.
18. Levoy, M. Efficient ray tracing of volume data. ACM Trans. Graphics 9:245-261,1990.
19. Westover, L. Footprint evaluation for volume rendering. Comput. Graphics 24(4):367-376,
1990.
20. Nielson, G. M., and Hamann, B. Techniques for the visualization of volumetric data. In Visualization '90 (A. Kaufman, Ed.), pp. 45-50. IEEE Comput. Soc. Press, Los Alamitos, CA,
1990.
21. Nielson, G. M., Foley, T. A., Hamann, B., and Lane, D. Visualizing and modeling scattered
multivariate data. IEEE Comput. Graphics Appl. 11:47-55,1991.
22. Pajon, J. L., and Tran, V. B. Visualization of scalar data defined on a structured grid. In Visualization '90 (A. Kaufman, Ed.), pp. 281-287. IEEE Comput. Soc. Press, Los Alamitos, CA,
1990.
23. Krüger, W. The application of transport theory to visualization of 3D scalar data fields. Comput.
Phys. (July/Aug): 397-406,1991.
24. Sabella, P. A rendering algorithm for visualizing 3D scalar fields. ACM Comput. Graphics
22:51-58, 1988.
25. Upson, C., and Keeler, M. V-BUFFER: Visible volume rendering. ACM Comput. Graphics
22:59-64,1988.
26. Lacroute, P., and Levoy, M. Fast volume rendering using a shear-warp factorization of the
viewing transformation. Comput. Graphics 28(4):451-458,1994.
27. Krüger, W., and Schröder, P. Data parallel volume rendering algorithms for interactive visualization. Visual Comput. 9:405-416, 1993.
28. Hesselink, L., and Delmarcelle, T. Visualization of vector and tensor data sets. In Frontiers in
Scientific Visualization (L. Rosenblum et al., Eds.). Academic Press, New York, 1994.
29. Delmarcelle, T., and Hesselink, L. A unified framework for flow visualization. In Engineering
Visualization (R. Gallagher, Ed.). CRC Press, Boca Raton, FL, 1994.
30. Kenwright, D., and Mallinson, G. A streamline tracking algorithm using dual stream functions.
In Proceedings of Visualization '92, pp. 62-68. IEEE Comput. Soc. Press, Los Alamitos, CA,
1992.
31. Hultquist, J. P. M. Constructing stream surfaces in steady 3D vector fields. In Proceedings of
Visualization '92, pp. 171-178. IEEE Comput. Soc. Press, Los Alamitos, CA, 1992.
32. de Leeuw, W. C., and van Wijk, J. J. A probe for local flow field visualization. In Proceedings
of Visualization '93, pp. 39-45. IEEE Comput. Soc. Press, Los Alamitos, CA, 1993.
33. Helman, J., and Hesselink, L. Visualizing vector field topology in fluid flows. IEEE Comput.
Graphics Appl. 11:36-46,1991.
34. Globus, A., Levit, C., and Lasinski, T. A tool for visualizing the topology of 3D vector fields.
In Proceedings of Visualization '91, pp. 33-40. IEEE Comput. Soc. Press, Los Alamitos, CA,
1991.
35. Delmarcelle, T., and Hesselink, L. Visualizing second-order tensor fields with hyperstreamlines.
IEEE Comput. Graphics Appl. 13:25-33, 1993.
36. Silver, D., and Zabusky, N. J. Quantifying visualizations for reduced modeling in nonlinear
science: Extracting structures from data sets. J. Visual Commun. Image Represent. 4:46-61,
1993.
37. Kerlick, G. D. Moving iconic objects in scientific visualization. In Proceedings of Visualization
'90 (A. Kaufman, Ed.), pp. 124-130. IEEE Comput. Soc. Press, Los Alamitos, CA, 1990.
38. Buning, P. G., and Steger, J. L. Graphics and flow visualization in CFD. In AIAA 7th CFD
Conference, Cincinnati, OH, AIAA Paper 85-1507-CP, pp. 162-170,1985.
39. Park, S., and Lee, K. High-dimensional trivariate NURBS representation for analyzing and
visualizing fluid flow data. Computers and Graphics 21(4):473-482, 1997.
THE DEVELOPMENT OF
DATABASE SYSTEMS FOR
THE CONSTRUCTION OF
VIRTUAL ENVIRONMENTS
WITH FORCE FEEDBACK
HIROO IWATA
Institute of Engineering Mechanics and Systems, University of Tsukuba,
Tsukuba 305-8573, Japan
I. INTRODUCTION 550
A. Haptic Interface 550
B. Issues in Haptic Software 552
II. LHX 554
A. Basic Structure of LHX 554
B. Implementation of LHX 556
C. Haptic User Interface 557
III. APPLICATIONS OF LHX: DATA HAPTIZATION 557
A. Basic Idea of Haptization 557
B. Methods of Haptization 559
C. Volume Haptics Library 561
IV. APPLICATIONS OF LHX: 3D SHAPE DESIGN USING
AUTONOMOUS VIRTUAL OBJECT 563
A. Shape Design and Artificial Life 563
B. Methods for Interaction with Autonomous Virtual
Objects 564
C. Direct Manipulation of Tree-Like Artificial Life 564
D. Manipulation of Autonomous Free-Form Surface 565
V. OTHER APPLICATIONS OF LHX 570
A. Surgical Simulator 570
B. Shared Haptic World 570
C. HapticWeb 571
VI. CONCLUSION 571
REFERENCES 571
I. INTRODUCTION
It is well known that the sense of touch is indispensable for understanding the real world. Force sensation plays an important role in the manipulation of virtual objects. A haptic interface is a feedback device that generates skin and muscle sensations, including the sense of touch, weight, and rigidity. We have been working on research into haptic interfaces in virtual environments for a number of years and have developed various force feedback devices and their applications. In most haptic interfaces, the software of the virtual environment is tightly coupled to the control program of the force display. This tight coupling hinders the development of further applications of haptic virtual environments.
In 1991, we started a project for the development of a modular software tool
for a haptic interface. The system was called VECS at the time. We have been
improving the software tools to support various force displays and their applications [3,4,6]. Our latest system is composed of seven modules: the device
driver of force display, haptic renderer, model manager, primitive manager,
autonomy engine, visual display manager, and communication interface. The
system is called LHX, named for library for haptics. Various types of force
displays can be plugged into LHX. This chapter introduces techniques and applications in the development of a database system for haptic environments
using LHX.
A. Haptic Interface
A haptic interface, or force display, is a mechanical device that generates reaction forces from virtual objects. Research activity on haptic interfaces has been growing rapidly, although the technology is still in a state of trial and error. There are three approaches to implementing a haptic interface: the tool handling-type force display, the exoskeleton-type force display, and the object-oriented-type force display. We have developed prototypes in each category.
1. Tool Handling-Type Force Display
Tool handling-type force display is the easiest way to realize force feedback.
The configuration of this type is similar to a joystick. Virtual world technology
usually employs glove-like tactile input devices. Users feel anxious when they
put on one of these devices. If the glove is equipped with a force feedback
device, the problem is more severe. This disadvantage obstructs practical use of
the haptic interface. Tool handling-type force display is free from being fitted
to the user's hand. Even though it cannot generate force between the fingers, it
has practical advantages.
We developed a 6-DOF (degree-of-freedom) force display which has a ball
grip [5]. The device is called "HapticMaster" and is commercialized by Nissho
Electronics Co. (Fig. 1). The HapticMaster is a high-performance force feedback
device for desktop use. This device employs a parallel mechanism in which a
top triangular platform and a base triangular platform are connected by three
sets of pantographs. The top end of the pantograph is connected with a vertex
FIGURE 1 HapticMaster.
of the top platform by a spherical joint. This compact hardware has the ability
of carrying a large payload. Each pantograph has three DC motors.
The grip of HapticMaster can be replaced to fit specialized applications. A
simulator of laparoscopic surgery, for example, has been developed by attaching
a real tool for surgery on the top plate of the HapticMaster.
2. Exoskeleton-Type Force Display
In the field of robotics research, master manipulators are used in teleoperation. Most master manipulators, however, have large hardware with high
costs, which restricts their application areas. In 1989, we developed a compact
master manipulator as a desktop force display [2]. The core element of the
device is a 6-DOF parallel manipulator, in which three sets of pantograph link
mechanisms are employed. Three actuators are set coaxially with the first joint
of the thumb, forefinger, and middle finger of the operator.
We improved the device by increasing the degrees-of-freedom for each finger. We have developed a new haptic interface that allows 6-DOF motion for
three independent fingers: thumb, index finger and middle finger [12]. The device has three sets of 3-DOF pantographs, at the top of which three 3-DOF
gimbals are connected. A thimble is mounted at the end of each gimbal. The
thimble is carefully designed to fit the finger tip and is easy to put on or take
FIGURE 2 3-DOF manipulator.
off. The device applies 3-DOF force at each finger tip. The user can grasp and
manipulate virtual objects by their three fingers. Figure 2 illustrates the mechanical configuration of the device. This device is easily realized by replacing
the top platform of the HapticMaster with three gimbals with thimbles.
3. Object-Oriented-Type Force Display
Object-oriented-type force display is a radical idea of design of a haptic
interface. The device moves and deforms to present shapes of virtual objects. A
user of the device can make contact with a virtual object by its surface. It allows
natural interaction compared to that of the exoskeleton and tool handling types.
However, it is fairly difficult to implement. Furthermore, its ability to simulate
virtual objects is limited. Because of these characteristics, object-oriented type
is effective for specific applications. We focused on 3D shape modeling as an
application of our object-oriented-type force display. We have developed a prototype named Haptic Screen [13]. The device employs an elastic surface made
of rubber. An array of actuators is set under the elastic surface. The surface
deforms by the actuators. Each actuator has force sensors. Hardness of the surface is made variable by these actuators and sensors. Deformation of a virtual
object occurs according to the force applied by the user. Figure 3 illustrates the
mechanical configuration of the Haptic Screen.
B. Issues in Haptic Software
Software tool for development of a virtual environment is one of the key technologies in the field of virtual reality. There are commercially available software
tools such as WorldToolKit, dVS, and Super Scape VRT [7-9]. Those softwares
are not designed to support a haptic interface.
FIGURE 3 Mechanical configuration of the Haptic Screen (projector).
II. LHX
A. Basic Structure of LHX
In order to deal with the issues in haptic software discussed in the previous
section, LHX is composed of seven modules: a device driver of force display, a
haptic renderer, a model manager, a primitive manager, an autonomy engine, a
visual display manager, and a communication interface. By dividing these modules, force displays and virtual environments are easily reconfigured. Figure 4
shows the basic structure of LHX.
The functions of these modules are as follows:
1. Device Driver
A device driver manages sensor input and actuator output for a haptic
interface. Various types of haptic interface can be connected to LHX by changing the device driver. We developed a device driver of above-mentioned force
displays so that they can be connected to LHX.
FIGURE 4 Basic structure of LHX (primary process, secondary process, and shared memory).

FIGURE 5 Spring and damper model.
2. Haptic Renderer
LHX supports two haptic renderers: a surface renderer and a volume renderer. The surface renderer is implemented by a spring and damper model (Fig. 5). The volume renderer is implemented by mapping voxel data to force and torque (Fig. 6).
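A minimal sketch of a spring and damper surface response of the kind described above is given below; the stiffness and damping constants and the function name are illustrative assumptions, not LHX's actual renderer.

```python
import numpy as np

def surface_contact_force(penetration, velocity, normal, k=500.0, c=5.0):
    """Spring and damper model of a surface renderer: when the haptic pointer
    penetrates a virtual surface by 'penetration' (>= 0) along the unit surface
    normal, the reaction force is a spring term plus a damping term on the
    normal component of the pointer velocity. k and c are illustrative values."""
    normal = np.asarray(normal, dtype=float)
    normal = normal / np.linalg.norm(normal)
    if penetration <= 0.0:
        return np.zeros(3)                        # no contact, no force
    v_n = float(np.dot(velocity, normal))         # normal component of velocity
    magnitude = k * penetration - c * v_n         # spring pushes out, damper resists motion
    return max(magnitude, 0.0) * normal           # never pull the pointer into the surface
```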
FIGURE 6 Mapping voxel data P(x, y, z) to force.
Users of the LHX program determine the methods for interaction between the
virtual objects and the operators.
4. Primitive Manager
LHX has a network interface by which multiple force displays are connected to each other. Multiple users can simultaneously interact in the same
virtual environment. This function enables easy construction of a groupware program. LHX supports TCP/IP so that the system can use the existing
Internet.
7. Visual Display Manager
The visual display manager generates graphic images of a virtual environment. This module translates the haptic model into OpenGL format. HMD,
stereo shutter glasses, and a spherical screen are supported as visual displays.
B. Implementation of LHX
LHX is currently implemented on SGI and Windows NT workstations. Considering the connection of a haptic interface to a visual image generator, SGI and Windows NT workstations are the most promising platforms. C++ is used for the implementation. Since our force displays are interfaced with PCs, the device driver module is implemented on a PC. The host workstation and the PC are connected by RS-232C.
LHX is composed of two processes: visual feedback and force feedback.
The visual feedback process runs the visual display manager, primitive manager,
and autonomy engine. The force feedback process runs the other modules.
Shared memory is used for communication between these processes. The required update rate of force feedback is much higher than that of visual feedback. Images can be seen continuously at an update rate of 10 Hz. On the
other hand, force feedback requires 40 Hz at least. In LHX the force feedback
process has a higher priority than the visual feedback process. LHX enables a
high update rate of force display in a complex virtual environment.
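The split between a fast force-feedback loop and a slower visual loop communicating through shared state can be sketched as follows. Python threads and a dictionary stand in for LHX's processes and shared memory, and the rates, constants, and names are illustrative only.

```python
import threading
import time

shared = {"pointer": (0.0, 0.0, 0.0), "force": (0.0, 0.0, 0.0)}  # stand-in for shared memory
lock = threading.Lock()
running = True

def haptic_loop(rate_hz=500.0):
    """High-priority loop: read the pointer, compute a force, at a high rate."""
    period = 1.0 / rate_hz
    while running:
        with lock:
            x, y, z = shared["pointer"]
            shared["force"] = (-50.0 * x, -50.0 * y, -50.0 * z)  # placeholder renderer
        time.sleep(period)

def visual_loop(rate_hz=30.0):
    """Slower loop: take a snapshot of the shared state and draw the scene."""
    period = 1.0 / rate_hz
    while running:
        with lock:
            snapshot = dict(shared)
        # draw the scene from 'snapshot' here (e.g., with OpenGL)
        time.sleep(period)

haptics = threading.Thread(target=haptic_loop, daemon=True)
visuals = threading.Thread(target=visual_loop, daemon=True)
haptics.start(); visuals.start()
time.sleep(1.0)      # run for one second in this sketch
running = False
```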
FIGURE 7 Haptic icon (slide bar).
FIGURE 8

III. APPLICATIONS OF LHX: DATA HAPTIZATION

A. Basic Idea of Haptization
Visual information essentially consists of a two-dimensional image. A threedimensional scene is recognized by binocular parallax cues or motion parallax.
Complex 3D objects are often difficult to comprehend because of occlusion. A
possible method for visual representation of such objects is semi-transparent
graphics. However, multiple objects are overlapped in the image. This drawback
leads to difficulty in distinguishing objects.
Visual representation of higher-dimensional and multiparameter data sets is a much harder problem. A typical technique for visualizing such data is iconification. For example, a vector field from fluid dynamics is visualized by streamlines. The effectiveness of the technique depends on the icon design; inadequate icon design leads to misunderstanding of the volume data. Moreover, values of higher dimension, such as four or five, are difficult to map to icons.
The major objective of our research is representation of volume data by
force sensation. Force sensation plays an important role in the recognition of
3D objects. An example of haptic representation of scientific data is found in
the work of Brooks et aL [1]. A complex molecular docking task is assisted
by a force reflective master manipulator. In this work, force display is used
for magnifying the length and scale of molecules. We are proposing the haptic
mapping of general physical fields.
Force sensation contains six-dimensional information: three-dimensional
force and three-dimensional torque. Therefore, higher-dimensional data can be
represented by force sensation. The basic idea of volume haptization is mapping voxel data to force and/or torque (Fig. 6). The Haptic Master is used for
volume haptization. The force display is combined with real-time visual images
of volume data.
B. Methods of Haptization
Volume data consist of scalar, vector, or tensor data defined in a three-dimensional space. Data consisting of single scalar values are the simplest case. Often multiple parameters occur at the same voxel, some of which are sets of scalars and vectors. For example, data from computational fluid dynamics may consist of a scalar for density and vectors for velocity. Visualization of such multiparameter data sets is a difficult problem. A combination of visualization techniques can be used, but there is a danger of the image becoming confusing.
We propose representation of volume data by force sensation. Our force
display has an ability to apply six-dimensional force and torque at the fingertips.
Values at each voxel can be mapped to force and/or torque. Visual information
has an advantage of presenting whole images of objects. On the other hand,
haptic information has an advantage in presenting complex attributes of local
regions. In our system, a visual image of volume data is represented by direct
volume rendering using semi-transparent graphics.
Methods of haptization can be classified into the following three categories:
1. Haptic Representation of Scalar Data
There are two possibilities for mapping scalar data to force/torque. One is mapping scalar values to torque vectors:

T = a S(x, y, z),     (1)
where S(x, y, z) is a scalar value at each voxel and a is a scaling factor. In
this case, the direction of these torque vectors is the same. The user's hand
is twisted at each voxel. The other method is mapping gradients of scalar values
to three-dimensional force vectors (Fig. 9):
F = a [-grad S(x, y, z)].     (2)
This formula converts the scalar field to a three-dimensional potential field. The
user's hand is pulled toward a low-potential area. This method magnifies the
FIGURE 9
transition area of density data. As for medical imaging such as ultrasound scanning, voxel data are classified according to density. Representation of gradients
by force will be effective in such applications.
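A small sketch of the gradient mapping of Eq. (2), assuming the scalar volume is available as a NumPy array; the helper names and the nearest-voxel lookup are illustrative simplifications.

```python
import numpy as np

def gradient_force_field(scalar, spacing=1.0, a=1.0):
    """Haptization of a scalar volume by Eq. (2): the force at every voxel is
    proportional to the negative gradient of the scalar field, so the user's
    hand is pulled toward low-potential regions.
    scalar: 3D array of voxel values; spacing: voxel size; a: scaling factor."""
    gx, gy, gz = np.gradient(scalar, spacing)
    return -a * np.stack((gx, gy, gz), axis=-1)     # shape (X, Y, Z, 3)

def force_at(field, x, y, z):
    """Nearest-voxel lookup of the precomputed force field for a probe position
    given in voxel coordinates (a real system would interpolate)."""
    i, j, k = (int(round(c)) for c in (x, y, z))
    return field[i, j, k]
```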
2. Haptic Representation of Vector/Tensor Data
Vector data can be mapped directly to force or torque vectors:

F or T = a V(x, y, z),     (3)

where V(x, y, z) are the vector data at each voxel. Tensor data are given by a matrix that has nine components. These components cannot be directly mapped to a haptic channel. Some components must be selected according to the user's interest:

F or T = a (T_ij, T_kl, T_mn).     (4)
Multiparameter data sets that consist of one or more scalars and vectors
can be mapped to force and torque. For example, velocity is mapped to force
and density is mapped to one component of torque:
F = a V(x, y, z),     (5)

T_z = b S(x, y, z).     (6)
FIGURE 10 Force (velocity) and torque (vorticity).
FIGURE 11 (shared memory; object transformation; parameter data)
FIGURE 12
FIGURE 13 Select ramification point; select growth direction; set branch.
FIGURE 14 Virtual tree.
The basic theme of the form is created through deformation of the sphere. We
implemented three functions in the autonomous surface:
a. Restoration
Each lattice of the surface is connected by four springs (Fig. 15). These
springs generate surface tension. If the surface is pulled or pushed by the
user, the surface tension restores the surface to the original sphere (Fig. 16).
Unexpected shapes are found during the autonomous deformation process.
The user can manipulate the surface while it is autonomously deforming. In
FIGURE 15
FIGURE 16 An example of restoration.
FIGURE 17
The user can put out "food" for the surface. If the food comes near, the
surface reaches out to it. The motion of each lattice is caused according to the
distance between the surface and the food. Transition from the original surface
is determined by a negative exponential function (Fig. 17). If the food is large,
a large area of the surface deforms.
c. Avoidance from Enemy
The user can put out "enemy" for the surface. If the enemy comes near, the
surface avoids it. The motion of each lattice is caused according to the distance
between the surface and the enemy. Transition from the original surface is
Gc-enemy
F I G U R E 18
F I G U R E 19
569
570
HIROO IWATA
2.5
2
1.5
1
0.5
condition(1)
with autonomy
condition(2)
without autonomy
Laparoscopic surgery requires training in a virtual environment. We are collaborating with Olympus Co. on developing a training simulator for laparoscopy
[15]. Two specialized HapticMasters are used for both hands. LHX supports
two force displays simultaneously.
Primitives of LHX include autonomous free-form surfaces. The surface has
functions similar to that of living creatures. The free-form surface has surface
tension, which enables restoration to the original shape. We developed virtual
tissue by using an autonomous free-form surface. An operator can feel viscoelasticity of the virtual tissue. The virtual tissue can be cut at any place in
realtime.
57 I
C. HapticWeb
The "HapticWeb" is a WWW client, which enables a user to feel the rigidity
or weight of a virtual object. HapticWeb uses the haptic renderer of LHX. It
realizes force feedback from the VRML data set. Users can feel the rigidity or
weight of virtual objects stored in a WWW server. The system was demonstrated
at SIGGRAPH'96.
We also developed an authoring tool for the HapticWeb. Parameters of the
rigidity and weight of virtual objects are presented by 3D icons. The user can
change the rigidity or weight of virtual objects by manipulating slide bars.
We observed the behavior of users at SIGGRAPH'96. A total of 647 people
experienced our system. Of these, 637 people (98%) could feel force feedback,
446 people (69%) found haptic icons, and 642 people (99%) of them could
manipulate the slide bars. The result shows that the design of the haptic icon
was successful.
VI. CONCLUSION
This chapter introduced techniques and applications in the development of a
database system for a haptic environment. We developed a software infrastructure for a haptic interface named LHX. It achieved the reproductivity of
software of virtual environments with force feedback. The software tool has
been improved through various applications including haptization of scientific
data and 3D shape manipulation. Future work will be the further development
of practical software tools such as the volume haptic library.
REFERENCES
1. Brooks, E P. et al. Project GROPEHaptic displays for scientific visualization. ACM
SIGGRAPH Comput. Graphics 24(4):177-185, 1990.
2. Iwata, H. Artificial reality with force-feedback: Development of desktop virtual space with
compact master manipulator. ACM SIGGRAPH Comput. Graphics 24(4):165-170, 1990.
3. Iwata, H., and Yano, H. Artificial life in haptic virtual environment. In Proceedings ofICAT'93,
pp. 91-96, 1993.
4. Iwata, H., and Yano, H. Interaction with autonomous free-form surface. In Proceedings of
ICAT94, pp. 27-32, 1994.
5. Iwata, H. Desktop force display. In SIGGRAPH'94 Visual Proceedings, 1994.
6. Yano, H., and Iwata, H. Cooperative work in virtual environment with force feedback. In
Proceedings ofICAT/VRST9S,
pp. 203-210, 1995.
7. World Tool Kit User's Guide, SENSE8.
8. dVS, http://www.ptc.com/products/division/index.htm.
9. ViScape, available at http://wwrsv.superscape.com.
10. Mark, W. R. et al. Adding force feedback to graphics system: Issues and solutions. In Proceedings of SIGGRAPH'96, pp. 447-452,1996.
11. Brochure of SensAble Technologies. GHOST: General Haptics Open Software Toolkit, 1996.
12. Iwata, H., and Hayakawa, K. Force display for grasping virtual object by three fingers. In
Proceedings of the Eleventh Symposium on Human Interface, pp. 395-400,1995. [In Japanese.]
13. Iwata, H., and Ichigaya, A. Haptic screen. In Proceedings of the Virtual Society of Japan Annual
Conference, Vol. 1, pp. 7-10, 1996. [In Japanese.]
572
HIROOIWATA
14. Iwata, H., and Noma, H. Volume haptization. In Proceedings of IEEE Symposium on Research
Frontiers in Virtual Reality, pp. 91-96, 1993.
15. Asano, T., Yano, H., and Iwata, H. Basic technology of simulation system for laparoscopic
surgery in virtual environment with force display. In Medicine Meets Virtual Reality, pp. 2 0 7 215, lOS Press.
16. Hashimoto, W., and Iwata, H. Multi-dimensional data browser with haptic sensation. In Transactions of the Virtual Reality Society of Japan, Vol. 2, No. 3, pp. 9-16, 1997. [In Japanese.]
17. Hashimoto, W., and Iwata, H. Haptic representation of non-invasive region for surgery based
on volumetric Data. In Transactions of the Virtual Reality Society of Japan, Vol. 3, No. 4,
pp. 197-202, 1998. [In Japanese.]
DATA COMPRESSION
IN INFORMATION
RETRIEVAL SYSTEMS
S H M U E L T O M I KLEIN
Department of Computer Science, Bar-Ilan University,
Ramat Can 52900, Israel
I. INTRODUCTION 573
II. TEXT COMPRESSION 579
A. Huffman Coding 582
B. Huffman Coding without Bit Manipulations 585
C. Space-Efficient Decoding of Huffman Codes 595
D. Arithmetic Coding 602
E. Dictionary-Based Text Compression 604
III. DICTIONARIES 607
IV. CONCORDANCES 609
A. Using Variable-Length Fields 611
B. Model-Based Concordance Compression 618
V BITMAPS 622
A. Usefulness of Bitmaps in IR 622
B. Compression of Bitmaps 624
VI. FINAL REMARKS 631
REFERENCES 631
INTRODUCTION
As can be seen from the title, we shall concentrate on techniques that are at the
crossroads of two disciplines: data compression (DC) and information retrieval
(IR). Each of these encompass a large body of knowledge that has evolved over
the past decades, each with its own philosophy and its own scientific community.
Nevertheless, their intersection is particularly interesting, the various files of
large full-text IR systems providing a natural testbed for new compression
methods, and DC enabling the proliferation of improved retrieval algorithms.
A chapter about data compression in a book published at the beginning
of the twenty-first century might at a first glance seem anachronistic. Critics
will say that storage space is getting cheaper every day; tomorrow it will be
Database and Data Communication Network Systems, Vol. 2
Copyright 2002, Elsevier Science (USA). All rights reserved.
5 73
574
SHMUELTOMI KLEIN
almost given for free, so who needs complicated methods to save a few bytes.
What these critics overlook is that for data storage, supply drives demand:
our appetite for getting ever-increasing amounts of data into electronic storage
grows just as steadily as does the standard size of the hard disk in our current
personal computer. Most users know that whatever the size of their disks, they
will fill up sooner or later, and generally sooner than they wish.
However, there are also other benefits to be gained from data compression,
beyond the reduction of storage space. One of the bottlenecks of our computing
systems is still the slow data transfer from external storage devices. Similarly,
for communication applications, the problem is not storing the data but rather
squeezing it through some channel. However, many users are competing for the
same limited bandwidth, effectively reducing the amount of data that can be
transferred in a given time span. Here, DC may help reduce the number of I/O
operations to and from secondary memory, and for communication it reduces
the actual amount of data that must pass through the channel. The additional
time spent on compression and decompression is generally largely compensated
for by the savings in transfer time.
For these reasons, research in DC is not dying out, but just the opposite is
true, as evidenced by the recent spurt of literature in this area. An international
Data Compression Conference has convened annually since 1990, and many
journals, including even popular ones such as Byte, Dr. Dobbs, IEEE Spectrum,
Datamation, PC Magazine, and others, have repeatedly published articles on
compression recently.
It is true that a large part of the research concentrates on image compression. Indeed, pictorial data are storage voracious so that the expected profit of
efficient compression is substantial. The techniques generally applied to images
belong to the class of lossy compression, because they concentrate on how to
throw away part of the data, without too much changing its general appearance. For instance, most humans do not really see any difference between a
picture coded with 24 bits per pixel, allowing more than 16 million colors, and
the same picture recoded with 12 bits per pixel, giving "only" about 4000 different color shades. Of course, most image compression techniques are much
more sophisticated, but we shall not deal with them in the present survey. The
interested reader is referred to the large literature on lossy compression, e.g., [1].
Information retrieval is concerned, on the one hand, with procedures
for helping a user satisfy his information needs by facilitating his access to
large amounts of data, on the other hand, with techniques for evaluating his
(dis)satisfaction with whatever data the system provided. We shall concentrate
primarily on the algorithmic aspects of IR. A functional full-text retrieval system is constituted of a large variety of files, most of which can and should be
compressed. Some of the methods described below are of general applicability,
and some are specially designed for an IR environment.
Full-text information retrieval systems may be partitioned according to the
level of specificity supported by their queries. For example, in a system operating
at the document level, queries can be formulated as to the presence of certain
keywords in each document of the database, but not as to their exact locations
within the document. Similarly, one can define the paragraph level and sentence
level, each of which is a refinement of its predecessor. The highest specificity
575
level is the word level, in which the requirement is that the keywords appear
within specified distances of each other. With such a specificity level, one could
retrieve all the occurrences of A and B such that there are at least two but at
most five words between them. In the same way, the paragraph and sentence
levels permit also appropriate distance constraints; e.g., at the sentence level one
could ask for all the occurrences of A and B in the same or adjacent sentences.
Formally, a typical query consists of an optional level indicator, m keywords, and m 1 distance constraints, as in
level: Ai (/i, ui) Ai (/i, wi) A^_i(/
(1)
The li and ui are (positive or negative) integers satisfying // < ui for 1 < / < m,
with the couple (//, ui) imposing lower and upper limits on the distance from Ai
to A/+1. Negative distance means that A/+i may appear before Ai in the text.
The distance is measured in words, sentences, or paragraphs, as prescribed by
the level indicator. In case the latter is omitted, word level is assumed; in this
case, constraints of the form A (1,1) B (meaning that A should be followed
immediately by B) are omitted. Also, if the query is on the document level, then
the distances are meaningless and should be omitted (the query degenerates
then into a conjunction of the occurrences of all the keywords in the query).
In its simplest form, the keyword A/ is a single word or a (usually very
small) set of words given expUcitly by the user. In more complex cases a keyword Ai in (1) will represent a set of words A/ = U/Li A / J all of which are
considered synonymous to A/ in the context of the given query. For example,
a variable-length-don't-care-character * can be used, which stands for an arbitrary, possibly empty, string. This allows the use of prefix, suffix, and infix
truncation in the query. Thus Ai could be comput*, representing, among others,
the words computer, computing, computerize, etc., or it could be *mycin,
which retrieves a large class of antibiotics; infix truncation also can be useful
for spelling foreign names, such as Ba*tyar, where * could be matched by h, k,
kh, ch, sh, sch, etc.
Another possibility for getting the variants of a keyword is from the use
of a thesaurus (month representing January, February, etc.), or from some
morphological processing (do representing does, did, done, etc.). Although
these grammatical variants can be easily generated in some languages with
simple morphology like English, sophisticated linguistic tools are needed for
languages such as Hebrew, Arabic, and many others. One of the derivatives
of the 2-character word daughter in Hebrew, for example, is a 10-character
string meaning and when our daughters, and it shares only one common letter
with its original stem; a similar phenomenon occurs in French with the verb
f a i r e , for example.
For all these cases, the families A/ are constructed in a preprocessing stage.
Algorithms for generating the families identified by truncated terms can be
found in [2], and for the families of grammatical variants in [3].
This general definition of a query with distance constraints allows great
flexibility in the formulation of the query. For example, the query solving
(1,3) d i f f e r e n t i a l equations will retrieve sentences containing solving
d i f f e r e n t i a l equations, as well as solving t h e s e d i f f e r e n t i a l equations
and solving t h e r e q u i r e d d i f f e r e n t i a l equations, but not solving
576
577
of the occurrence. For every word W, let C(W) be the ordered Hst of the coordinates of all its occurrences in the text. The problem of processing a query
of type (1) consists then, in its most general form, of finding all the m-tuples
(^1, . . . , Ufn) of coordinates satisfying
V/G {!,..., m}
3j e {!,.,.,Hi]
with
^,GC(A;)
and
li < d(ai, di^i) < Ui
where d{x^ y) denotes the distance from x to y on the given level. Every m-tuple
satisfying these two equations is called a solution.
In the inverted files approach, processing (1) does not involve directly the
original text files, but rather the auxiliary dictionary and concordance files. The
concordance contains, for each distinct word W in the database, the ordered
list C(W) of all its coordinates in the text; it is accessed via the dictionary that
contains for every such word a pointer to the corresponding list in the concordance. For each keyword Ai in (1) and its attached variants A^y, the lists C(A^y)
are fetched from the concordance and merged to form the combined list C( A/).
Beginning now with Ai and A2, the two Hsts C(Ai) and C(A2) are compared,
and the set of all pairs of coordinates (ai^ ai) that satisfy the given distance constraints (/i, ^1) at the appropriate level is constructed. (Note that a unique ai can
satisfy the requirements with different ai, and vice versa.) C( A2) is now purged
from the irrelevant coordinates, and the procedure is repeated with A2 and A3,
resulting in the set {(^1, ai, as)} of partial solutions of (1). Finally, when the last
keyword A^ is processed in this way, we have the required set of solutions.
Note that it is not really necessary to always begin the processing with the
first given keyword Ai in (1), going all the way in a left-to-right mode. In some
cases, it might be more efficient to begin it with a different keyword Ay, and to
proceed with the other keywords in some specified order.
The main drawback of the inverted files approach is its huge overhead: the
size of the concordance is comparable to that of the text itself and sometimes
larger. For the intermediate range, a popular technique is based on assigning
signatures to text fragments and to individual words. The signatures are then
transformed into a set of bitmaps, on which Boolean operations, induced by
the structure of the query, are performed. The idea is first to effectively reduce
the size of the database by removing from consideration segments that cannot
possibly satisfy the request, then to use pattern-matching techniques to process
the query, but only over thehopefully smallremaining part of the database
[4]. For systems supporting retrieval only at the document level, a different
approach to query processing might be useful. The idea is to replace the concordance of a system with i documents by a set of bitmaps of fixed length t.
Given some fixed ordering of the documents, a bitmap B(W) is constructed for
every distinct word W of the database, where the ith bit of B( W) is 1 if W
occurs in the /th document, and is 0 otherwise. Processing queries then reduces
to performing logical OR/AND operations on binary sequences, which is easily
done on most machines, instead of merge/collate operations on more general
sequences. Davis and Lin [5] were apparently the first to propose the use of
bitmaps for secondary key retrieval. It would be wasteful to store the bitmaps
in their original form, since they are usually very sparse (the great majority of
578
SHMUELTOMI KLEIN
the words appear in very few documents), and we shall review various methods
for the compression of such large sparse bit-vectors. However, the concordance
can be dropped only if all the information we need is kept in the bitmaps.
Hence, if we wish to extend this approach to systems supporting queries also
at the paragraph, sentence, or word level, the length of each map must equal
the number of paragraphs, sentences, or words respectively, a clearly infeasible
scheme for large systems. Moreover, the processing of distance constraints is
hard to implement with such a data structure.
In [6], a method in which, basically, the concordance and bitmap approaches are combined is presented. At the cost of marginally expanding the
inverted files' structure, compressed bitmaps are added to the system; these
maps give partial information on the location of the different words in the text
and their distribution. This approach is described in more detail in Section V.
Most of the techniques below were tested on two real-life full-text information retrieval systems, both using the inverted files approach. The first is the
Tresor de la Langue Frangaise (TLF) [7], a database of 680 MB of French language texts (112 miUion words) made up from a variety of complete documents
including novels, short stories, poetry, and essays, by many different authors.
The bulk of the texts are from the 17th through 20th centuries, although smaller
databases include texts from the 16th century and earlier. The other system is
the Responsa Retrieval Project (RRP) [8], 350 MB of Hebrew and Aramaic
texts (60 million words) written over the past ten centuries. For the sake of
conciseness, detailed experimental results have been omitted throughout.
Table 1 shows roughly what one can expect from applying compression
methods to the various files of a full-text retrieval system. The numbers correspond to TLF. Various smaller auxiliary files are not mentioned here, including
grammatical files, and thesauri.
For the given example, the overall size of the system, which was close to
2 Gbytes, could be reduced to fit onto a single CD-Rom.
The organization of this chapter is as follows. The subsequent sections
consider, in turn, compression techniques for the file types mentioned above,
namely, the text, dictionaries, concordances, and bitmaps. For text compression, we first shortly review some background material. While concentrating
on Huffman coding and related techniques, arithmetic coding and dictionarybased text compression are also mentioned. For Huffman coding, we focus in
particular on techniques allowing fast decoding, since decoding is more important than encoding in an information retrieval environment. For dictionary and
H H
TABLE I
File
Full size
(MB)
Compressed size
(MB)
Text
700
245
65
30
18
40
Concordance
400
240
40
Bitmaps
800
40
95
Dictionary
Total
543
(%)
5 79
concordance compression the prefix omission method and various variants are
suggested. Finally, we describe the usefulness of bitmaps for the enhancement
of IR systems and then show how these large structures may in fact be stored
quite efficiently.
The choice of the methods to be described is not meant to be exhaustive.
It is a blend of techniques that reflect the personal taste of the author rather
than some well-established core curriculum in information retrieval and data
compression. The interested reader will find pointers to further details in the
appended references.
580
A
B
C
D
0
01
110
1011
(a)
FIGURE
complete.
A
B
C
D
E
A
B
C
D
E
11
110
1100
1101
11000
(b)
11
Oil
0011
1011
00011
(c)
A
B
C
D
E
1
00
010
0110
0111
(d)
Examples of codes: (a) Non-UD, (b) UD nonprefix, (c) prefix noncomplete, and (d)
about 12, 10, and 8% respectively, while J, Q, and Z occur each with probability less than 0.1%. Similar phenomena can be noted in other languages. The
skewness of the frequency distributions can be exploited if one is ready to abandon the convenience of fixed-length codes, and trade processing ease for better
compression by allowing the codewords to have variable length. It is then easy
to see that one may gain by assigning shorter codewords to the more frequent
characters, even at the price of encoding the rare characters by longer strings,
as long as the average codeword length is reduced. Encoding is just as simple as
with fixed-length codes and still consists in concatenating the codeword strings.
There are however a few technical problems concerning the decoding that must
be dealt with.
A code has been defined above as a set of codewords, which are binary
strings, but not every set of strings gives a useful code. Consider, for example,
the four codewords in column (a) of Fig. 1. If a string of O's is given, it is easily
recognized as a sequence of A's. Similarly, the string 010101 can only be parsed
as BBB. However, the string 010110 has two possible interpretations: 0 | 1011 |
0 = ADA or 01 I 0 I 110 = BAG. This situation is intolerable, because it violates
our basic premise of reversibility of the encoding process. We shall thus restrict
attention to codes for which every binary string obtained by concatenating
codewords can be parsed only in one way, namely into the original sequence of
codewords. Such codes are called uniquely decipherable (UD).
At first sight, it seems difficult to decide whether a code is UD, because infinitely many potential concatenations must be checked. Nevertheless, efficient
algorithms solving the problem do exist [9]. A necessary, albeit not sufficient,
condition for a code to be UD is that its codewords should not be too short. A
precise condition has been found by McMillan [10]: any binary UD code with
codewords lengths { i , . . . , } satisfies
E2-''^l-
(2)
i= l
For example, referring to the four codes of Figure 1, the sum is 0.9375,0.53125,
0.53125, and 1 for codes (a) to (d) respectively. Case (a) is also an example
showing that the condition is not sufficient.
However, even if a code is UD, the decoding of certain strings may not be
so easy. The code in column (b) of Fig. 1 is UD, but consider the encoded string
11011111110: a first attempt to parse it as 110 | 11 | 11 | 11 | 10 = BAAAIO
would fail, because the tail 10 is not a codeword; hence only when trying to
58 I
decode the fifth codeword do we reaHze that the first one is not correct, and
that the parsing should rather be 1101 | 11 | 11 | 110 = DAAB. In this case, a
codeword is not immediately recognized as soon as all its bits are read, but
only after a certain delay. There are codes for which this delay never exceeds a
certain fixed number of bits, but the example above is easily extended to show
that the delay for the given code is unbounded.
We would like to be able to recognize a codeword as soon as all its bits are
processed, that is, with no delay at all; such codes are called instantaneous, A
special class of instantaneous codes is known as the class of prefix codes: a code
is said to have the prefix property, and is hence called a prefix code, if none
of its codewords is a prefix of any other. It is unfortunate that this definition
is misleading (shouldn't such a code be rather called a nonprefix code.^), but
it is widespread and therefore we shall keep it. For example, the code in Fig.
1(a) is not prefix because the codeword for A (0) is a prefix of the codeword
for B (01). Similarly, the code in (b) is not prefix, since all the codewords start
with 11, which is the codeword for A. On the other hand, codes (c) and (d) are
prefix.
It is easy to see that any prefix code is instantaneous and therefore UD.
Suppose that while scanning the encoded string for decoding, a codeword x
has been detected. In that case, there is no ambiguity as in the example above
for code (b), because if there were another possible interpretation y that can
be detected later, it would imply that x is a prefix of y, contradicting the prefix
property.
In our search for good codes, we shall henceforth concentrate on prefix
codes. In fact, we incur no loss by this restriction, even though the set of prefix
codes is a proper subset of the UD codes: it can be shown that given any UD
code whose codeword lengths are {i, . . . , ^}, one can construct a prefix code
with the same set of codeword lengths [11]. As example, note that the prefix
code (c) has the same codeword lengths as code (b). In this special case, (c)'s
codewords are obtained from those of code (b) by reversing the strings; now
every codeword terminates in 11, and the substring 11 occurs only as suffix
of any codeword. Thus no codeword can be the proper prefix of any other.
Incidently, this also shows that code (b), which is not prefix, is nevertheless UD.
There is a natural one-to-one correspondence between binary prefix codes
and binary trees. Let us assign labels to the edges and vertices of a binary tree
in the following way:
every edge pointing to a left child is assigned the label 0, and every edge
pointing to a right child is assigned the label 1;
the root of the tree is assigned the empty string A;
every vertex v of the tree below the root is assigned a binary string,
which is obtained by concatenating the labels on the edges of the path
leading from the root to vertex v.
It follows from the construction that the string associated with vertex i/ is a
prefix of the string associated with vertex w if and only if z/ is a vertex on the
path from the root to w. Thus, the set of strings associated with the leaves of any
binary tree satisfies the prefix property and may be considered as a prefix code.
Conversely, given any prefix code, one can easily construct the corresponding
582
noi)
FIGURE 2
binary tree. For example, the tree corresponding to the code {11, 101, 001,
000} is depicted in Fig. 2.
The tree corresponding to a code is a convenient tool for decompression.
One starts with a pointer to the root and another one to the encoded string,
which acts as a guide for the traversal of the tree. While scanning the encoded
string from left to right, the tree-pointer is updated to point to the left, respectively right, child of the current node, if the next bit of the encoded string is a 0,
respectively a 1. If a leaf of the tree is reached, a codeword has been detected,
it is sent to the output, and the tree-pointer is reset to point to the root.
Note that not all the vertices of the tree in Fig. 2 have two children. From the
compression point of view, this is a waste, because we could, in that case, replace
certain codewords by shorter ones, without violating the prefix property, i.e.,
build another UD code with a strictly smaller average codeword length. For
example, the node labeled 10 has only a right child, so the codeword 101
could be replaced by 10; similarly, the vertex labeled 0 has only a left child, so
the codewords 000 and 001 could be replaced by 00 and 01, respectively. A
tree for which all internal vertices have two children is called a complete tree,
and accordingly, the corresponding code is called a complete code. A code is
complete if and only if the lengths {li} of its codewords satisfy Eq. (2) with
equality, i.e., Y!i=\ ^~^' = 1A. Huffman Coding
To summarize what we have seen so far, we have restricted the class of codes
under consideration in several steps. Starting from general UD codes, we have
passed to instantaneous and prefix codes and finally to complete prefix codes,
since we are interested in good compression performance. The general problem
can thus be stated as follows: we are given a set of n nonnegative weights
{w^i,..., w/}, which are the frequencies of occurrence of the letters of some
alphabet. The problem is to generate a complete binary variable-length prefix
code, consisting of codewords with lengths ii bits, 1 < / < w, with optimal
compression capabilities, i.e., such that the total length of the encoded text
n
(=1
583
length (ACL) Y.U Pi^iLet us for a moment forget about the interpretation of the Ij as codeword
lengths, and try to solve the minimization problem analytically without restricting the Ij to be integers, but still keeping the constraint that they must satisfy
the McMillan equality Yll=i ^~^' = 1. To find the set of ^/'s minimizing (3), one
can use Langrange multipliers. Define a function L(i, ...,) oi n variables
and with a parameter k by
which yields
To find the constant X, substitute the values for ; derived in (4) in the McMillan
equality
t=l
1=1
from which one can derive k = W/ln 2. Plugging this value back into (4), one
finally gets
, = - l o g 2 ( ^ ) = - l o g 2 A .
This quantity is known as the information content of a symbol with probability
/?/, and it represents the minimal number of bits in which the symbol could be
coded. Note that this number is not necessarily an integer. Returning to the sum
in (3), we may therefore conclude that the lower limit of the total size of the
encoded file is given by
Pi 1^S2 Pi 1
(5)
The quantity H= Yll=i Pi log2 pi has been defined by Shannon [12] as the
entropy of the probability distribution { p i , . . . , p}, and it gives a lower bound
to the weighted average codeword length.
In 1952, Huffman [13] proposed the following algorithm which solves the
problem:
1. If w = 1, the codeword corresponding to the only weight is the
null-string; return.
2. Let wi and W2, without loss of generality, be the two smallest weights.
584
In the straightforward implementation, the weights are first sorted and then
every weight obtained by combining the two that are currently the smallest is
inserted in its proper place in the sequence so as to maintain order. This yields
an 0(n^) time complexity. One can reduce the time complexity to O(nlogn)
by using two queues, one, Qi, containing the original elements, the other, Qi,
the newly created combined elements. At each step, the two smallest elements
in Qi U Q2 are combined and the resulting new element is inserted at the end
of Q25 which remains in order [14].
THEOREM.
CLAIM 3. Without loss of generality one can assume that the smallest
weights w\ and wi correspond to sibling nodes in T\.
585
P and assigning the weight w\ to ^'s left child and W2 to its right child. Then
the ACL for T^ is
Ad4 = M3 + (w/i + wi) < Ml + (wi + W2) = Ml,
which is impossible, since 74 is a tree for n elements with weights wi,.. .,Wn
and T[ is optimal among all those trees.
Using the inductive assumption, Ti, which is an optimal tree ior n1
elements, has the same ACL as the Huffman tree for these weights. However, the
Huffman tree for w^i,..., u/ is obtained from the Huffman tree for (wi + M/2),
W3,'' -yWn in the same way as Ti is obtained from T2. Thus the Huffman tree
for the n elements has the same ACL as Ti; hence it is optimal.
B. Huffman Coding without Bit Manipulations
In many applications, compression is by far not as frequent as decompression.
In particular, in the context of static IR systems, compression is done only
once (when building the database), whereas decompression directly affects the
response time for on-line queries. We are thus more concerned with a good
decoding procedure. Despite their optimality, Huffman codes are not always
popular with programmers as they require bit manipulations and are thus not
suitable for smooth programming and efficient implementation in most highlevel languages.
This section presents decoding routines that directly process only bit blocks
of fixed and convenient size (typically, but not necessarily, integral bytes), making it therefore faster and better adapted to high-level languages programming,
while still being efficient in terms of space requirements. In principle, byte decoding can be achieved either by using specially built tables to isolate each bit
of the input into a corresponding byte or by extracting the required bits while
simulating shift operations.
I. Eliminating the Reference to Bits
We are given an alphabet E, the elements of which are called letters, and a
message (= sequence of elements of U) to be compressed, using variable-length
codes. Let L denote the set of N items to be encoded. Often L = S, but we do
not restrict the codewords necessarily to represent single letters of S. Indeed,
the elements of L can be pairs, triplets, or any -grams of letters, they can
represent words of a natural language, and they can finally form a set of items
of completely different nature, provided that there is an unambiguous way to
decompose a given file into these items (see, for example, [15]). We call L an
alphabet and its elements characters, where these terms should be understood
in a broad sense. We thus include also in our discussion applications where N,
the size of the alphabet, can be fairly large.
We begin by compressing L using the variable-length Huffman codewords
of its different characters, as computed by the conventional Huffman algorithm.
We now partition the resulting bit string into fe-bit blocks, where k is chosen
so as to make the processing of fe-bit blocks, with the particular machine and
high-level language at hand, easy and natural. Clearly, the boundaries of these
blocks do not necessarily coincide with those of the codewords: a fe-bit block
586
SHMUELTOMI KLEIN
may contain several codewords, and a codeword may be split into two (or
more) adjacent ^-bit blocks. As an example, let L = {A, B, C, D}, with codewords
{0,11,100,101} respectively, and choose k = 3, Consider the following input
string, its coding and the coding's partition into 3-bit blocks:
1 1
The last line gives the integer value 0 < / < 2^ of the block.
The basic idea for all the methods is to use these fe-bit blocks, which can
be regarded as the binary representation of integers, as indices to some tables
prepared in advance in the preprocessing stage.
In this section we first describe two straightforwardalbeit not very
efficientmethods for implementing this idea.
For the first method, we use a table B of 2^ rows and k columns. In fact, B
will contain only zeros and ones, but as we want to avoid bit manipulations, we
shall use one byte for each of the kl^ elements of this matrix. Let i = Ii - - Ik
be the binary representation of length k (with leading zeros) of /, for 0 < / < 2^,
then B(/, /') = I/, for 1 < ; < ^; in other words, the /th line of B contains
the binary representation of /, one bit per byte. The matrix B will be used to
decompose the input string into individual bits, without any bit manipulation.
Figure 3(a) depicts the matrix S for ^ = 3.
The value 0 or 1 extracted from B is used to decode the input, using the
Huffman tree of the given alphabet. The Huffman tree of the alphabet L of our
small example is in Fig. 4(a).
A Huffman tree with N leaves (and N 1 internal nodes) can be kept as
a table H with N 1 rows (one for each internal node) and two columns.
The internal nodes are numbered from 0 to N 2 in arbitrary order, but for
convenience the root will always be numbered zero. For example, in Fig. 4(a),
B
0
1
2
3
4
5
6
7
1
0
0
0
0
1
1
1
1
2 3
0 0
0 1
1 0
1 1
0 0
0 1
1 0
1 1
(a)
0
1
2
3
4
5
6
7
1
0
2
4
6
0
2
4
6
(b)
2
0
0
0
0
1
1
1
1
587
-A
-B
-C
-D
(b)
(a)
FIGURE 4
Example of Huffman code, (a) Tree form and (b) table form.
if newind > 0
else
then
ind ^^ newind
output(newind)
ind <- 0
until
end
input is exhausted
H(ind,S(n,2))
5(^,1).
588
The first statement extracts the leftmost bit and the second statement shifts
the fe-bit block by one bit to the left. Figure 3(b) shows Table S ior k = 3.
Hence we have reduced the space needed for the tables from k2^ + 2(N 1)
to 2^"^^ + 2(N 1), but now there are three table accesses for every bit of the
input, instead of only two accesses for the first method.
Although there is no reference to bits in these algorithms and their programming is straightforward, the number of table accesses makes their efficiency rather doubtful; their only advantage is that their space requirements
are linear in N (kis a constant), while for all other time-efficient variants to
be presented below, space is at least Q(N log N). However, for these first two
methods, the term 2^ of the space complexity is dominant for small N, so that
they can be justifiedif at allonly for rather large N,
2. Partial-Decoding Tables
Recall that our goal is to permit a block-per-block processing of the input
string for some fixed block-size k. Efficient decoding under these conditions
is made possible by using a set of m auxiliary tables, which are prepared in
advance for every given Huffman code, whereas Tables B and S above were
independent of the character distribution.
The number of entries in each table is 2^, corresponding to the 2^ possible
values of the ^-bit patterns. Each entry is of the form (W, / ) , where W is a
sequence of characters and / (0 < / < m) is the index of the next table to be
used. The idea is that entry /, 0 < / < 2^, of Table 0 contains, first, the longest
possible decoded sequence W of characters from the fe-bit block representing
the integer / (W may be empty when there are codewords of more than k bits).
Usually some of the last bits of the block will not be decipherable, being the
prefix P of more than one codeword; / will then be the index of the table
corresponding to that prefix (if ? = A, then / = 0). Table / is constructed in
a similar way except for the fact that entry / will contain the analysis of the
bit pattern formed by the prefixing of P to the binary representation of /. We
thus need a table for every possible proper prefix of the given codewords; the
number of these prefixes is obviously equal to the number of internal nodes of
the appropriate Huffman tree (the root corresponding to the empty string and
the leaves corresponding to the codewords), so that m= N 1.
More formally, let Py, 0 < / < N 1, be an enumeration of all the proper
prefixes of the codewords (no special relationship needs to exist between ; and
Py, except for the fact that PQ = A). In Table / corresponding to Py, the /th
entry, T(/,/), is defined as follows: let B be the bit string composed of the
juxtaposition of Py to the left of the ^-bit binary representation of /. Let Wbe
the (possibly empty) longest sequence of characters that can be decoded from
B, and Pi the remaining undecipherable bits of B; then T(/, i) = (W, I),
Referring again to the simple example given above, there are three possible
proper prefixes: A,1,10, hence three corresponding tables indexed 0,1,2 respectively, and these are given in Fig. 5. The column headed "Pattern" contains for
every entry the binary string decoded in Table 0; the binary strings decoded by
Tables 1 and 2 are obtained by prefixing " 1 , " respectively "10," to the strings
in "Pattern."
Table 0
Pattern
Entry
0
1
2
3
4
5
6
7
FIGURE 5
589
Table 2
Table 1
for Table 0
i.
000
kkk
CA
CAA
001
010
Oil
100
101
110
111
kk
A
AB
C
D
BA
B
1
2
0
0
0
0
1
C
DA
D
BAA
BA
B
BB
1
0
1
0
1
2
0
CA
CB
DAA
DA
DB
Partial-decoding tables.
For the input example given above, we first access Table 0 at entry 1, which
yields the output string AA, Table 1 is then used with entry 6, giving the output
B, and finally Table 2 at entry 7 gives output DB.
The utterly simple decoding subroutine (for the general case) is as follows
(M(i) denotes the ith block of the input stream, / is the index of the table
currently being used, and T(/, I) is the th entry of table /):
Basic Decoding Algorithm
/ ^ 0
for / ^ 1 to length of input
(output, /) ^T(j,M(i))
end
do
As mentioned before, the choice of k is largely governed by the machineword structure and the high-level language architecture. A natural choice in
most cases would be ^ = 8, corresponding to a byte context, but k = 4 (halfbyte) or ^ = 16 (half-word) are also conceivable. The larger k is, the greater the
number of characters that can be decoded in a single iteration, thus transferring
a substantial part of the decoding time to the preprocessing stage. The size of the
tables, however, grows exponentially with k, and with every entry occupying
(for N < 256 and fe = 8) 1 to 8 bytes, each table may require between 1 and 2
Kbytes of internal memory. For N > 256, we need more than one byte for the
representation of a character, so that the size of a table will be even larger, and
for larger alphabets these storage requirements may become prohibitive. We
now develop an approach that can help reduce the number of required tables
and their size.
3. Reducing the Number of Tables: Binary Forests
590
conventional Huffman decoding algorithm no more than once for every block,
while still processing only fe-bit blocks. This is done by redefining the tables
and adding some new data structures.
Let us suppose, just for a moment, that after deciphering a given block
B of the input that contains a "remainder" P (which is a prefix of a certain
codeword), we are somehow able to determine the correct complement of P
and its length , and accordingly its corresponding encoded character. More
precisely, since a codeword can extend into more than two blocks, I will be
the length of the complement of P in the next i^-bit block which contains also
other codewords; hence 0 < I < kAn the next iteration (decoding of the next
fe-bit block not yet entirely deciphered), table number I will be used, which is
similar to Table 0, but ignores the first i bits of the corresponding entry, instead
of prefixing P to this entry as in the previous section.
Therefore the number of tables reduces from N-1 (about 30 in a typical
single-letter natural-language case, or 700-900 if we use letter pairs) to only
fe (8 or 16 in a typical byte or half-word context), where entry / in Table
, 0 < < fe, contains the decoding of the k I rightmost bits of the binary
representation of /. It is clear, however, that Table 1 contains two exactly equal
halves, and in general table 1(0 < I < k) consists of 2^ identical parts. Retaining
then in each table only the first 2^~^ entries, we are able to compress the needed
k tables into the size of only two tables. The entries of the tables are again of
the form (W, /); note however that ; is not an index to the next table, but an
identifier of the remainder P. It is only after finding the correct complement of
P and its length i we can access the right table i.
For the same example as before one obtains the tables of Fig. 6, where table t
decodes the bit strings given in "Pattern," but ignoring the t leftmost bits, ? = 0,
1, 2, and / = 0, 1, 2 corresponds respectively to the proper prefixes A, 1, 10.
The algorithm will be completed if we can find a method for identifying
the codeword corresponding to the remainder of a given input block, using
of course the following input block(s). We introduce the method through an
example.
Pattern
Entry
for Table 0
Table 0
Table 1
000
AAA
AA
1
2
3
4
5
6
7
001
010
Oil
100
101
110
111
AA
A
AB
C
D
BA
B
1
2
0
0
0
0
1
A
B
1
2
0
FIGURE 6
Table 2
C
FIGURE?
59I
592
n
v^n
0 0000000
10000000
0 0000000
00 000000
@i
0 0000000
FIGURE 8
1 0000000
01 000000
0-CZ
10000000
0 0000000
10000000
String in node v is denoted by VAL(t/). Figure 8 depicts the forest obtained from
the tree of our example, where the pointer to each tree is symboHzed by the
corresponding proper prefix. The idea is that the identifier of the remainder in
an entry of the tables described above is in fact a pointer to the corresponding
tree. The traversal of this tree is guided by the bits of the next )^-bit block of
the input, v^hich can directly be compared w^ith the contents of the nodes of the
tree, as will be described below.
Consider now also the possibility of long codewords, which extend over
several blocks. They correspond to long paths so that the depth of some trees
in the forest may exceed k. During the traversal of a tree, passing from one level
to the next lowest one is equivalent to advancing one bit in the input string.
Hence when the depth exceeds fe, all the bits of the current fe-bit block were
used, and we pass to the next block. Therefore the above definition of VAL(i/)
applies only to nodes on levels up to k; this definition is generalized to any node
by the following: VAL(i/) for a node v on level /, with ik< j < (i + l)k, i > 0,
is the concatenation of the labels on the edges on the path from level iktov.
In the second step, we compress the forest as could have been done with
any Huffman tree. In such trees, every node has degree 0 or 2; i.e., they appear
in pairs of siblings (except the root). For a pair of sibling nodes (a, b), WAL(a)
and VAL(fc) differ only in the /th bit, where / is the level of the pair (here
and in what follows, the level of the root of a tree is 0), or more precisely,
; = (level 1) mod ^ + 1. In the compressed tree, every pair is represented by
a unique node containing the VAL of the right node of the pair, the new root is
the node obtained from the only pair in level 1, and the tree structure is induced
by the noncompressed tree. Thus a tree of I nodes shrinks now to (t l ) / 2
nodes. Another way to look at this "compression" method is to take the tree of
internal nodes, and store it in the form of a table as described in the previous
section. We use here a tree-oriented vocabulary, but each tree can equivalently
be implemented as a table. Figure 9 is the compressed form of the forest of
Fig. 8.
We can now compare directly the values VAL stored in the nodes of the
trees with the ife-bit blocks of the Huffman encoded string. The VAL values
have the following property: let i/ be a node on level / of one of the trees in the
compressed forest, with ik < j < (i + 1)^, / > 0 as above, and let I{B) be the
593
0-
1 0000000
E
^
/
/
(ffi) J1 C0000000
D
01 000000
D
C
^ 1 0000000
FIGURE 9
0-
1 0000000
G
F
Compressed forest.
Thus after accessing one of the trees, the VAL of its root is compared with
the next fe-bit block B. If B, interpreted as a binary integer, is smaller, it must
start with 0 and we turn left; if B is greater or equal, it must start with 1 and
we turn right. These comparisons are repeated at the next levels, simulating
the search for an element in a binary search tree [16, Section 6.2.2]. This leads
to the modified algorithm below. Notations are like before, where ROOT(if)
points to the rth tree of the forest, and every node has three fields: VAL, a ^-bit
value, and LEFT and RIGHT, each of which is either a pointer to the next level
or contains a character of the alphabet. When accessing Table / , the index is
taken modulo the size of the table, which is 2^~L
Revised Decoding Algorithm
i ^ 1
/ ^ 0
repeat
(output, tree-nbr) <- T ( / , S(i) mod 2^~^)
/<-/ + !
until
if tree-nbr / 0 then
input Is exhausted
TRAVERSE ( ROOT(tree-nbr) )
until
end
i <r- i -\-l
[advance to next fe-bit block]
a character was output
594
SHMUELTOMI KLEIN
B
F I G U R E 10
595
Due to the restrictions on the choice of r, there are only few possible values.
For example, for ^ = 8, one could use a quaternary code (r = 2^), where every
codeword has an even number of bits and the number of tables is reduced by
a factor of 3, or a hexadecimal code (r =2^)^ where the codeword length is a
multiple of 4 and the number of tables is divided by 15. Note that for alphabets
with N < 31, the hexadecimal code can be viewed as the classical method using
"restricted variability" (see, for example, [20]): assign 4-bit encodings to the 15
most frequent characters and use the last 4-bit pattern as "escape character"
to indicate that the actual character is encoded in the next 4 bits. Thus up to
16 least frequent characters have 8-bit encodings, all of which have their first
4 bits equal to the escape character.
Referring to the Huffman tree given in Fig. 7, suppose that a character
corresponding to a leaf on level i appears with probability 2~^, then the corresponding 2^-ary tree is given in Fig. 10. Note that the only proper prefixes
of even length are A and 00, so that the number of tables dropped from 6
to 2.
However, with increasing r, compression will get worse, so that the right
tradeoff must be chosen according to the desired application.
C. Space-Efficient Decoding of Huffman Codes
The data structures needed for the decoding of a Huffman encoded file (a
Huffman tree or lookup table) are generally considered negligible overhead
relative to large texts. However, not all texts are large, and if Huffman coding
is applied in connection with a Markov model [21], the required Huffman forest
may become itself a storage problem. Moreover, the "alphabet" to be encoded
is not necessarily small, and may, e.g., consist of all the different words in the
text, so that Huffman trees with thousands and even millions of nodes are not
uncommon [22]. We try here to reduce the necessary internal memory space by
devising efficient ways of encoding these trees. In addition, the suggested data
structure also allows a speed-up of the decompression process, by reducing the
number of necessary bit comparisons.
596
SHMUELTOMI KLEIN
597
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
29
30
31
32
33
61
62
63
64
010]
Oil
looi
101
llOOl
110l|
11101
1111
ooool
0001
OOlOl
0011
010001
01001
OlOlOl
01011
1010101
101011 Ol
10101110|
10101111
lOllOOOOl
iiooiiool
11001101
1100111001
110011101
124
125
126
127
111011001
11101101 ol
1110110110|
1110110111
198
199
111111111 ol
1111111111
FIGURE I I
Let Bs(k) denote the standard s-bit binary representation of the integer k
(with leading zeros, if necessary). Then the /th codeword of length /, for
; = 0 , 1 , . . . , ; 1, is Bi(base(i) + / ) . Let seq(i) be the sequential index of
the first codeword of length /:
seq(m) = 0
seq(i) = seq(i 1) + rii-i
598
n,
base(f)
$eq(/)
diffO)
3
4
5
6
7
8
9
10
1
3
4
8
15
32
63
74
0
2
10
28
72
174
412
950
0
1
4
8
16
31
63
126
0
1
6
20
56
143
349
824
The following small example, using the data above, shows how such
savings are possible. Suppose that while decoding, we detect that the next codeword starts with 1101. This information should be enough to decide that the
following codeword ought to be of length 9 bits. We should thus be able, after
having detected the first 4 bits of this codeword, to read the following 5 bits as a
block, without having to check after each bit if the end of a codeword has been
reached. Our goal is to construct an efficient data structure that permits similar
decisions as soon as they are possible. The fourth bit was the earliest possible
in the above example, since there are also codewords of length 8 starting
with 110.
Decoding with sk-trees
The suggested solution is a binary tree, called below an sk-tree (for skeleton
tree), the structure of which is induced by the underlying Huffman tree, but
which has generally significantly fewer nodes. The tree will be traversed like
a regular Huffman tree. That is, we start with a pointer to the root of the
tree, and another pointer to the first bit of the encoded binary sequence. This
sequence is scanned, and after having read a zero (respectively, a 1), we proceed
to the left (respectively, right) child of the current node. In a regular Huffman
tree, the leaves correspond to full codewords that have been scanned, so the
decoding algorithm just outputs the corresponding item, resets the tree pointer
to the root and proceeds with scanning the binary string. In our case, however,
we visit the tree only up to the depth necessary to identify the length of the
current codeword. The leaves of the sk-tree then contain the lengths of the
corresponding codewords.
The formal decoding process using an sk-tree is depicted in Fig. 12. The
variable start points to the index of the bit at the beginning of the current
599
{
tree_pointer
<
root
< 1
start < 1
while i < length_of_string
{
if string[z] = 0
tree_pointer
else
tree_pointer
if value(tree_pointer)> 0
<
<
left(tree_pointer)
right(tree_pointer)
{
codeword < string[start (start + value(tree_pointer) 1)]
output i table[ I{codeword)diff[ value(treepointer)] ]
tree_pointer < root
start < start + value(tree_pointer)
i < start
else
i < i + 1
}
}
I ^ ^ B FIGURE 12 Decoding procedure using sk-tree.
codeword in the encoded string, which is stored in the vector string[]. Each
node of the sk-tree consists of three fields: a left and a right pointer, which are
not null if the node is not a leaf, and a value field, which is zero for internal
nodes, but contains the length in bits of the current codeword, if the node is a
leaf. In an actual implementation, we can use the fact that any internal node
has either zero or two sons, and store the value-&eld and the right-field in the
same space, with left = null serving as flag for the use of the right pointer. The
procedure also uses two tables: table[j]^ 0 < / < w, giving the /th element (in
nonincreasing order of frequency) of the encoded alphabet; and diff[i] defined
above, for / varying from m to k, that is, from the length of the shortest to the
length of the longest codeword.
The procedure passes from one level in the tree to the one below according
to the bits of the encoded string. Once a leaf is reached, the next codeword
can be read in one operation. Note that not all the bits of the input vector are
individually scanned, which yields possible time savings.
Figure 13 shows the sk-tree corresponding to Zipf's distribution for n =
200. The tree is tilted by 45, so that left (right) children are indicated by
arrows pointing down (to the right). The framed leaves correspond to the last
codewords of the indicated length. The sk-tree of our example consists of only
49 nodes, as opposed to 399 nodes of the original Huffman tree.
Construction ofskrtrees
600
-10
^-6
f
9
^-8
--r-^
7
^-9
-*r
-40
-r^
9
-^-10
full subtrees of depth h > 1 have been pruned. A more direct and much more
efficient construction is as follows.
The one-to-one correspondence between the codewords and the paths from
the root to the leaves in a Huffman tree can be extended to define, for any binary
string S = si Sg, the path P(S) induced by it in a tree with given root TQ. This
path will consist oi e -\-l nodes r/, 0 < / < e, where for / > 0, r^ is the left
(respectively, right) child of r^_i, if s/ = 0 (respectively, if s/ = 1). For example,
in Fig. 13, P ( l l l ) consists of the four nodes represented as bullets in the top
line. The skeleton of the sk-tree will consist of the paths corresponding to the
last codeword of every length. Let these codewords be denoted by L/, m < / < k;
they are, for our example, 000, 0100, 01101,100011, etc. The idea is that P(Li)
serves as "demarcation line": any node to the left (respectively, right) of P(Li)y
i.e., a left (respectively, right) child of one of the nodes in P(Li), corresponds
to a prefix of a codeword with length < / (respectively, > /).
As a first approximation, the construction procedure thus takes the tree
obtained by |J/=m ^(^i) (there is clearly no need to include the longest codeword
Lk, which is always a string of k I's); and adjoins the missing children to turn it
into a complete tree in which each internal node has both a left and a right child.
The label on such a new leaf is set equal to the label of the closest leaf following it
in an in-order traversal. In other words, when creating the path for L^, one first
follows a few nodes in the already existing tree, then one branches off creating
new nodes; as to the labeling, the missing right child of any node in the path will
be labeled / + 1 (basing ourselves on the assumption that there are no holes),
but only the missing left children of any new node in the path will be labeled /.
A closer look then implies the following refinement. Suppose a codeword L_i has a zero in its rightmost position, i.e., L_i = α0 for some string α of length i − 1. Then the first codeword of length i + 1 is α10. It follows that only when getting to the ith bit can one decide if the length of the current codeword is i or i + 1. However, if L_i terminates in a string of 1's, L_i = β01^a, with a > 0 and |β| + a = i − 1, then the first codeword of length i + 1 is β10^(a+1), so the length of the codeword can be deduced already after having read the bit following β. It follows that one does not always need the full string L_i in the sk-tree, but only its prefix up to and not including the rightmost zero. Let L*_i = β denote this prefix. The revised version of the above procedure starts with the tree obtained by ∪_{i=m}^{k−1} P(L*_i). The nodes of this tree are depicted as bullets in Fig. 13. For each path P(L*_i) there is a leaf in the tree, and the left child of this leaf is the new
To evaluate the size of the sk-tree, we count the number of nodes added by path P(L*_i), for m ≤ i ≤ k. Since the codewords in a canonical code, when ordered by their corresponding frequencies, are also alphabetically sorted, it suffices to compare L_i to L_{i−1}. Let y(m) be the empty string, and for i > m, let y(i) be the longest common prefix of L_i and L_{i−1}; e.g., y(7) is the string 10 in our example. Then the number of nodes in the sk-tree is given by
size = 2 ( Σ_{i=m}^{k} max(0, |L*_i| − |y(i)|) ) − 1,
since the summation alone is the number of internal nodes (the bullets in Fig. 13).
The maximum function comes to prevent an extreme case in which the difference might be negative. For example, if L_6 = 010001 and L_7 = 0101111, the longest common prefix is y(7) = 010, but since we consider only the bits up to and not including the rightmost zero, we have L*_7 = 01. In this case, indeed, no new nodes are added for P(L*_7).
An immediate bound on the number of nodes in the sk-tree follows from the fact that, on the one hand, there are up to k − 1 paths P(L*_i), each of length ≤ k − 2, while on the other hand the sk-tree cannot have more nodes than the underlying Huffman tree, which has 2n − 1. To get a tighter bound, consider the nodes in the upper levels of the sk-tree belonging to the full binary tree F with k − 1 leaves and having the same root as the sk-tree. The depth of F is d = ⌈log_2(k − 1)⌉, and all its leaves are at level d or d − 1. The tree F is the part of the sk-tree where some of the paths P(L*_i) must be overlapping, so we account for the nodes in F and for those below separately. There are at most 2k − 1 nodes in F; there are at most k − 1 disjoint paths below it, with path P(L*_i) extending at most i − 2 − ⌊log_2(k − 1)⌋ nodes below F, for log_2(k − 1) < i ≤ k. This yields as bound for the number of nodes in the sk-tree
2k + 2 Σ_{j=0}^{k−2−⌊log_2(k−1)⌋} j = 2k + (k − 2 − ⌊log_2(k − 1)⌋)(k − 1 − ⌊log_2(k − 1)⌋).
There are no savings in the worst case, e.g., when there is only one codeword of each length (except for the longest, for which there are always at least two). More generally, if the depth of the Huffman tree is Θ(n), the savings might not be significant, but such trees are optimal only for some very skewed distributions. In many applications, like for most distributions of characters or character pairs or words in most natural languages, the depth of the Huffman tree is O(log n), and for large n, even the constant c, if the depth is c log_2 n, must be quite small. For suppose the Huffman tree has a leaf at depth d. Then by [30, Theorem 1], the probability of the element corresponding to this leaf is p < 1/F_{d+1}, where F_j is the jth Fibonacci number, and we get from [17, Exercise 1.2.1-4] that p < (1/φ)^(d−1), where φ = (1 + √5)/2 is the golden ratio. Thus if d > c log_2 n,
we have p < (1/φ)^(c log_2 n − 1) = φ · n^(−c log_2 φ).
next 30% to B, etc. The new subdivision can be seen next to the second bar from
the left. The second character to be encoded is D, so the corresponding interval
is [0.34, 0.40). Repeating now the process, we see that the next character, A,
narrows the chosen subinterval further to [0.364,0.388), and the next A to
[0.3736, 0.3832), and finally the last C to [0.37360, 0.37456).
To allow unambiguous decoding, it is this last interval that should be transmitted. This would, however, be rather wasteful: as more characters are encoded, the interval will get narrower, and many of the leftmost digits of its
upper limit will overlap with those of its lower limit. In our example, both
limits start with 0.37. One can overcome this inefficiency and transmit only
a single number if some additional information is given. For instance, if the
number of characters is also given to the decoder or, as is customary, a special
end-of-file character is added at the end of the message, it suffices to transmit
any single number within the final interval. In our case, the best choice would
be y = 0.3740234375, because its binary representation 0.0101111111 is the
shortest among the numbers of the interval.
Decoding is then just the inverse of the above process. Since y is between
0.1 and 0.4, we know that the first character must be B. If so, the interval has
been narrowed to [0.1, 0.4). We thus seek the next subinterval that contains
y, and find it to be [0.34, 0.40), which corresponds to D, etc. Once we get to
[0.37360, 0.37456), the process must be stopped by some external condition;
otherwise we could continue this decoding process indefinitely, for example, by
noting that y belongs to [0.373984, 0.374368), which could be interpreted as
if the following character were A.
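The interval narrowing and its inverse can be made concrete with the following toy sketch. The static probabilities and the message are illustrative only (the chapter's example uses a changing subdivision that is not repeated here), and exact fractions are used to sidestep precision issues.

from fractions import Fraction

# illustrative static model; not the adaptive subdivision of the running example
PROBS = {'A': Fraction(2, 10), 'B': Fraction(3, 10),
         'C': Fraction(3, 10), 'D': Fraction(2, 10)}

def cumulative(probs):
    """Return, for each symbol, the left end of its subinterval of [0, 1)."""
    left, total = {}, Fraction(0)
    for sym, p in probs.items():
        left[sym] = total
        total += p
    return left

def encode(text, probs):
    """Narrow [low, low+width) once per character and return the final interval."""
    left = cumulative(probs)
    low, width = Fraction(0), Fraction(1)
    for ch in text:
        low += width * left[ch]
        width *= probs[ch]
    return low, low + width          # any number in this interval identifies text

def decode(x, n, probs):
    """Recover n characters from a number x lying in the final interval."""
    left = cumulative(probs)
    out = []
    for _ in range(n):
        for ch, p in probs.items():
            if left[ch] <= x < left[ch] + p:
                out.append(ch)
                x = (x - left[ch]) / p   # rescale to [0, 1) and continue
                break
    return ''.join(out)

low, high = encode("BDAAC", PROBS)
assert decode((low + high) / 2, 5, PROBS) == "BDAAC"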
As has been mentioned, the longer the input string, the more digits or bits are needed to specify a number encoding the string. Compression is achieved by the fact that a frequently occurring character only slightly narrows the current interval. The number of bits needed to represent a number depends on the required precision. The smaller the given interval, the higher the precision necessary to specify a number in it; if the interval size is p, about −log_2 p bits might be needed.
To evaluate the number of bits necessary by arithmetic coding, we recall the notation used in Section II.A. The text consists of characters x_1 x_2 ··· x_W, each of which belongs to an alphabet {a_1, ..., a_n}. Let w_i be the number of occurrences of letter a_i, so that W = Σ_{i=1}^{n} w_i is the total length of the text, and let p_i = w_i/W be the probability of occurrence of letter a_i, 1 ≤ i ≤ n. Denote by p_{x_j} the probability associated with the jth character of the text.
After having processed the first character, x_1, the interval has been narrowed to size p_{x_1}; after the second character, the interval size is p_{x_1} p_{x_2}, etc. We get that the size of the final interval after the whole text has been processed is p_{x_1} p_{x_2} ··· p_{x_W}. Therefore the number of bits needed to encode the full text is
−log_2 ( Π_{j=1}^{W} p_{x_j} ) = −Σ_{j=1}^{W} log_2 p_{x_j} = −Σ_{i=1}^{n} w_i log_2 p_i = W · H,
w^here we get the second equality by summing over the letters of the alphabet
with their frequency instead of summing over the characters of the text, and
where H is the entropy of the given probability distribution. Amortizing this
per character, we get that the average number of bits needed to encode a single
character is just H, which has been shown in Eq. (5) to be the information
theoretic lower bound.
We conclude that from the point of view of compression, arithmetic coding has an optimal performance. However, our presentation and the analysis
are oversimplified: they do not take into account the overhead incurred by the
end_of_file character nor the fractions of bits lost by alignment for each block to
be encoded. It can be shown [28] that although these additions are often negligible relative to the average size of a codeword, they might be significant relative
to the difference between the codeword lengths for Huffman and arithmetic
codes. There are also other technical problems, such as the limited precision
of our computers, which does not allow the computation of a single number
for a long text; there is thus a need for incremental transmission, which further
complicates the algorithms, see [34].
Despite the optimality of arithmetic codes, Huffman codes may still be
the preferred choice in many applications: they are much faster for encoding
and especially decoding, they are less error prone, and after all, the loss in
compression efficiency, if any, is generally very small.
E. Dictionary-Based Text Compression
The text compression methods we have seen so far are called statistical methods,
as they exploit the skewness of the distribution of occurrence of the characters.
Another family of compression methods is based on dictionaries, which replace
variable-length substrings of the text by (shorter) pointers to a dictionary in
which a collection of such substrings has been stored. Depending on the application and the implementation details, each method can outperform the other.
Given a fixed amount of RAM that we would allocate for the storage of a
dictionary, the selection of an optimal set of strings to be stored in the dictionary
turns out to be a difficult task, because the potential strings are overlapping.
A similar problem is shown to be NP-complete in [35], but more restricted
versions of this problem of optimal dictionary construction are tractable [36].
For IR applications, the dictionary ought to be fixed, since the compressed
text needs to be accessed randomly. For the sake of completeness, however,
we mention also adaptive techniques, which are the basis of most popular
compression methods. Many of these are based on two algorithms designed by
Ziv and Lempel [37,38].
In one of the variants of the first algorithm [37], often referred to as LZ77,
the dictionary is in fact the previously scanned text, and pointers to it are of
the form (d, ℓ), where d is an offset (the number of characters from the current location to the previous occurrence of a substring matching the one that starts at the current location), and ℓ is the length of the matching string. There is
therefore no need to store an explicit dictionary. In the second algorithm [38],
the dictionary is dynamically expanded by adjoining substrings of the text that
could not be parsed. For more details on LZ methods and their variants, the
reader is referred to [25].
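As an illustration of such pointers, the following sketch produces an LZ77-style greedy parse of a string into literal characters and (offset, length) pairs; the brute-force window search and the parameter names are illustrative simplifications, not the algorithm of [37] itself.

def lz77_parse(text, window=4096, min_match=2):
    """Greedy LZ77-style parsing: emit (offset, length) pairs for repeated
    substrings and bare characters otherwise.  Brute-force matching is used
    for clarity; production coders use hashing or suffix structures."""
    i, out = 0, []
    while i < len(text):
        best_len, best_off = 0, 0
        start = max(0, i - window)
        for j in range(start, i):                     # candidate match positions
            l = 0
            while i + l < len(text) and text[j + l] == text[i + l]:
                l += 1
            if l > best_len:
                best_len, best_off = l, i - j
        if best_len >= min_match:
            out.append(('pair', best_off, best_len))  # pointer into the previous text
            i += best_len
        else:
            out.append(('char', text[i]))             # literal character
            i += 1
    return out

print(lz77_parse("abbaabbabab"))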
Even once the dictionary is given, the compression scheme is not yet well
defined, as one must decide how to parse the text into a sequence of dictionary
elements. Generally, the parsing is done by a greedy method; i.e., at any stage,
the longest matching element from the dictionary is sought. A greedy approach
is fast, but not necessarily optimal. Because the elements of the dictionary are
often overlapping, and particularly for LZ77 variants, where the dictionary is
the text itself, a different way of parsing might yield better compression. For
example, assume the dictionary consists of the strings D = {abc, ab, cdef, d, de,
ef, f} and that the text is S = abcdef; assume further that the elements of D are
encoded by some fixed-length code, which means that ⌈log_2 |D|⌉ bits are used
to refer to any of the elements of D. Then parsing S by a greedy method, trying to
match always the longest available string, would yield abc-de-f, requiring
three codewords, whereas a better partition would be ab-cdef, requiring only
two.
The various dictionary compression methods differ also by the way they
encode the elements. This is most simply done by a fixed length code, as in the
above example. Obviously, different encoding methods might yield different
optimal parsings. Returning to the above example, if the elements abc, d, de,
ef, f, ab, cdef of D are encoded respectively by 1, 2, 3, 4, 5, 6, and 6 bits,
then the parsing abc-de-f would need 9 bits for its encoding, and for the
encoding of the parsing ab-cdef, 12 bits would be needed. The best parsing,
however, for the given codeword lengths, is abc-d-ef, which neither is a greedy
parsing nor does it minimize the number of codewords, and requires only seven
bits.
The way to search for the optimal parsing is by reduction to a well-known
graph theoretical problem. Consider a text string S consisting of a sequence of n characters S_1 S_2 ··· S_n, each character S_i belonging to a fixed alphabet Σ. Substrings of S are referenced by their limiting indices; i.e., S_i ··· S_j is the substring starting at the ith character in S, up to and including the jth character. We wish to compress S by means of a dictionary D, which is a set of character strings {a_1, a_2, ...}, with a_i ∈ Σ+. The dictionary may be explicitly given and
finite, as in the example above, or it may be potentially infinite, e.g., for the
Lempel-Ziv variants, where any previously occurring string can be referenced.
The compression process consists of two independent phases: parsing and encoding. In the parsing phase, the string S is broken into a sequence of consecutive substrings, each belonging to the dictionary D, i.e., an increasing sequence of indices i_0 = 0 < i_1 < i_2 < ··· is chosen, with S_{i_j + 1} ··· S_{i_{j+1}} ∈ D for j = 0, 1, .... One way to assure that at least one such parsing exists is to force the dictionary D to include each of the individual characters of S. The second phase is based on an encoding function λ: D → {0, 1}*, which assigns to each element of the dictionary a binary string, called its encoding. The assumption on λ is that it produces a code that is UD. This is most easily obtained by a fixed-length code, but as has been seen earlier, a sufficient condition for a code being UD is to choose it as a prefix code.
The problem is the following: given the dictionary D and the encoding function λ, we are looking for the optimal partition of the text string S; i.e., the sequence of indices i_1, i_2, ... is sought that minimizes Σ_{j≥0} |λ(S_{i_j + 1} ··· S_{i_{j+1}})|.
To solve the problem, a directed, labeled graph G = (V, E) is defined for the given text S. The set of vertices is V = {1, 2, ..., n, n + 1}, with vertex i corresponding to the character S_i for i ≤ n, and n + 1 corresponding to the end of the text; E is the set of directed edges: an ordered pair (i, j), with i < j, belongs to E if and only if the corresponding substring of the text, that is, the sequence of characters S_i ··· S_{j−1}, can be encoded as a single unit. In other words, the sequence S_i ··· S_{j−1} must be a member of the dictionary, or, more specifically for LZ77, if j > i + 1, the string S_i ··· S_{j−1} must have appeared earlier in the text. The label L_{ij} is defined for every edge (i, j) ∈ E as |λ(S_i ··· S_{j−1})|, the number of bits necessary to encode the corresponding member of the dictionary, for the given encoding scheme at hand. The problem of finding the optimal parsing of the text, relative to the given dictionary and the given encoding scheme, therefore reduces to the well-known problem of finding the shortest path in G from vertex 1 to vertex n + 1. In our case, there is no need to use Dijkstra's algorithm, since the directed graph contains no cycles, all edges being of the form (i, j) with i < j. Thus by a simple dynamic programming method, the shortest path can be found in time O(|E|).
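The dynamic programming solution can be sketched as follows; edge_cost is an assumed helper standing for the dictionary and encoding scheme at hand, returning |λ(S_i ··· S_{j−1})| or None if that substring cannot be encoded as a single unit.

def optimal_parse(n, edge_cost):
    """Return (total bits, list of cut points) for the cheapest parsing of
    S_1..S_n, exploiting the acyclic structure of the graph (edges go forward)."""
    INF = float('inf')
    best = [INF] * (n + 2)        # best[i] = cheapest encoding of S_1..S_{i-1}
    pred = [0] * (n + 2)
    best[1] = 0
    for j in range(2, n + 2):     # vertices are 1..n+1
        for i in range(1, j):
            c = edge_cost(i, j)
            if c is not None and best[i] + c < best[j]:
                best[j] = best[i] + c
                pred[j] = i
    cuts, v = [], n + 1           # recover the sequence of cut points
    while v > 1:
        cuts.append(v)
        v = pred[v]
    return best[n + 1], [1] + cuts[::-1]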
Figure 15 displays a small example of a graph, corresponding to the text
abbaabbabab and assuming that LZ77 is used. The edges connecting vertices i to i + 1, for i = 1, ..., n, are labeled by the character S_i.
FIGURE 15
As an example of an encoding scheme, we refer to the on-the-fly compression routine recently included in a popular operating system. It is based on [39], a variant of LZ77, using hashing on character pairs to locate (the beginning of)
recurrent strings. The output of the compression process is thus a sequence of elements, each being either a single (uncompressed) character or an offset-length pair (d, ℓ). The elements are identified by a flag bit, so that a single character is encoded by a zero, followed by the 8-bit ASCII representation of the character, and the encoding of each (d, ℓ) pair starts with a 1. The sets of possible offsets and lengths are split into classes as follows: let B_m(n) denote the standard m-bit binary representation of n (with leading zeros if necessary); then, denoting the encoding scheme by λ_M,
λ_M(offset d) = 1 B_6(d − 1) if 1 ≤ d ≤ 64, 01 B_8(d − 65) if 65 ≤ d ≤ 320, and 00 B_12(d − 321) if 321 ≤ d ≤ 4416;
λ_M(length ℓ) = 0 if ℓ = 2, and 1^(j+1) 0 B_j(ℓ − 2 − 2^j) if 2 + 2^j ≤ ℓ ≤ 1 + 2^(j+1), for j = 0, 1, 2, ....
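Transcribing these classes directly into code gives the following sketch; it follows the offset and length classes as given above and is meant only to make the variable-length encoding concrete.

def B(m, n):
    """Standard m-bit binary representation of n (empty string if m = 0)."""
    return format(n, '0{}b'.format(m)) if m > 0 else ''

def encode_offset(d):
    if 1 <= d <= 64:
        return '1' + B(6, d - 1)
    if 65 <= d <= 320:
        return '01' + B(8, d - 65)
    return '00' + B(12, d - 321)          # 321 <= d <= 4416

def encode_length(l):
    if l == 2:
        return '0'
    j = 0
    while l > 1 + 2 ** (j + 1):           # find the class 2 + 2^j <= l <= 1 + 2^(j+1)
        j += 1
    return '1' * (j + 1) + '0' + B(j, l - 2 - 2 ** j)

print(encode_offset(70), encode_length(5))   # '01' + B(8, 5) and '110' + B(1, 1)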
III. DICTIONARIES
All large full-text retrieval systems make extensive use of dictionaries of all
kinds. They are needed to quickly access the concordance, they may be used
for compressing the text itself, and they generally provide some useful additional
information that can guide the user in the choice of his keywords.
Dictionaries can of course be compressed as if they were regular text, but
taking their special structure into account may lead to improved methods [40].
A simple, yet efficient, technique is the prefix omission method (POM), a formal
definition of which can be found in [2], where it is called front-end compression.
The method is based on the observation that consecutive entries in a dictionary mostly share some leading letters. Let x and y be consecutive dictionary
entries and let m be the length (number of letters) of their longest common prefix. Then it suffices to store this common prefix only once (with x) and to omit it
from the following entry, where instead the length m will be kept. This is easily
generalized to a longer list of dictionary entries, as in the example in Fig. 16.
Note that the value given for the prefix length does not refer to the string
that was actually stored, but rather to the corresponding full-length dictionary
entry. The compression and decompression algorithms are immediate.
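A direct sketch of POM compression and decompression, using the entries of Fig. 16 as a check, might look as follows (the function names are illustrative):

def pom_compress(entries):
    """For each consecutive dictionary entry store the length of the prefix it
    shares with the previous (full) entry, followed by the remaining suffix."""
    out, prev = [], ""
    for word in entries:
        m = 0
        while m < min(len(prev), len(word)) and prev[m] == word[m]:
            m += 1
        out.append((m, word[m:]))             # (shared-prefix length, stored suffix)
        prev = word
    return out

def pom_decompress(pairs):
    words, prev = [], ""
    for m, suffix in pairs:
        word = prev[:m] + suffix              # rebuild from the previous full entry
        words.append(word)
        prev = word
    return words

entries = ["FORM", "FORMALLY", "FORMAT", "FORMATION",
           "FORMULATE", "FORMULATING", "FORTY", "FORTHWITH"]
assert pom_decompress(pom_compress(entries)) == entries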
If the dictionary entries are coded in standard format, with one byte per
character, one could use the first byte of each entry in the compressed dictionary
to store the value of m. There will mostly be a considerable gain, since the
average length of common prefixes of consecutive entries in large dictionaries
is generally much larger than 1. Even when the entries are already compressed.
FIGURE 16
dictionary entry    prefix length    stored suffix
FORM                0                FORM
FORMALLY            4                ALLY
FORMAT              5                T
FORMATION           6                ION
FORMULATE           4                ULATE
FORMULATING         8                ING
FORTY               3                TY
FORTHWITH           4                HWITH
FIGURE 17
original    permuted    sorted     compressed (prefix length, suffix)
JACM        JACM/       /IPM       0  /IPM
            ACM/J       /JACM      1  JACM
            CM/JA       /JASIS     3  SIS
            M/JAC       ACM/J      0  ACM/J
            /JACM       ASIS/J     1  SIS/J
JASIS       JASIS/      CM/JA      0  CM/JA
            ASIS/J      IPM/       0  IPM/
            SIS/JA      IS/JAS     1  S/JAS
            IS/JAS      JACM/      0  JACM/
            S/JASI      JASIS/     2  SIS/
            /JASIS      M/IP       0  M/IP
IPM         IPM/        M/JAC      2  JAC
            PM/I        PM/I       0  PM/I
            M/IP        S/JASI     0  S/JASI
            /IPM        SIS/JA     1  IS/JA
The key for using the permuted dictionary efficiently is a function get(x),
which accesses the file and retrieves all the strings having x as prefix. These
strings are easily located since they appear consecutively, and the corresponding
original terms are recovered by a simple cyclic shift. To process truncated terms,
all one needs is to call get() with the appropriate parameter. Figure 18 shows
in its leftmost columns how to deal with suffix, prefix, infix, and simultaneous
prefix and suffix truncations. The other columns then bring an example for
each of these categories: first the query itself, then the corresponding call to
get(), the retrieved entries from the permuted dictionary, and the corresponding
reconstructed terms.
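A small sketch of this mechanism, with illustrative function names and the three terms of Fig. 17, is given below; a real system would of course keep the permuted file on disk and use its index rather than an in-memory binary search.

from bisect import bisect_left

def build_permuted(terms):
    """All rotations of every term, with '/' marking the end of the original word."""
    rot = []
    for t in terms:
        s = t + '/'
        rot.extend(s[i:] + s[:i] for i in range(len(s)))
    return sorted(rot)

def get(permuted, x):
    """Return all rotations having x as a prefix (they are stored consecutively).
    Assumes the last character of x is not the maximal character."""
    lo = bisect_left(permuted, x)
    hi = bisect_left(permuted, x[:-1] + chr(ord(x[-1]) + 1)) if x else len(permuted)
    return permuted[lo:hi]

def original(rotation):
    """Recover the original term by cyclically shifting back to the '/' marker."""
    i = rotation.index('/')
    return rotation[i + 1:] + rotation[:i]

perm = build_permuted(["JACM", "JASIS", "IPM"])
print([original(r) for r in get(perm, "/JA")])   # prefix query JA*  ->  ['JACM', 'JASIS']
print([original(r) for r in get(perm, "M/")])    # suffix query *M   ->  ['IPM', 'JACM']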
IV. CONCORDANCES
Every occurrence of every word in the database can be uniquely characterized
by a sequence of numbers that give its exact position in the text. Typically, such
a sequence would consist of the document number J, the paragraph number
p (in the document), the sentence number s (in the paragraph), and the word
number w (in the sentence). The quadruple (d, p,s,w) is the coordinate of the
occurrence, and the corresponding fields will be called for short d-field, p-field, s-field, and w-field.

FIGURE 18
X*      get(/X)     JA*     get(/JA)    /JACM, /JASIS    JACM, JASIS
*X      get(X/)     *M      get(M/)     M/IP, M/JAC      IPM, JACM
X*Y     get(Y/X)    J*S     get(S/J)    S/JASI           JASIS
*X*     get(X)      *A*     get(A)      ACM/J, ASIS/J    JACM, JASIS

In the following, we assume for the ease of discussion that
coordinates of every retrieval system are of this form; however, all the methods
can also be applied to systems with a different coordinate structure, such as
book-page-line-word, etc. The concordance contains, for every word of the
dictionary, the lexicographically ordered list of all its coordinates in the text;
it is accessed via the dictionary that contains for every word a pointer to the
corresponding list in the concordance. The concordance is kept in compressed
form on secondary storage, and parts of it are fetched when needed and decompressed. The compressed file is partitioned into equi-sized blocks such that
one block can be read by a single I/O operation.
Since the list of coordinates of any given word is ordered, adjacent coordinates will often have the same d-field, or even the same d- and p-fields, and
sometimes, especially for high-frequency words, identical d-, p-, and s-fields.
Thus POM can be adapted to the compression of concordances, where to each
coordinate a header is adjoined, giving the number of fields that can be copied
from the preceding coordinate; these fields are then omitted. For instance in
our model with coordinates (J, p, s, w/), it would suffice to keep a header of
2 bits. The four possibilities are don't copy any field from the previous coordinate, copy the d-field, copy the d- and p-fields, and copy the d-, p-, and s-fields.
Obviously, different coordinates cannot have all four fields identical.
For convenient computer manipulation, one generally chooses a fixed
length for each field, which therefore must be large enough to represent the maximal possible values. However, most stored values are small; thus there is usually
much wasted space in each coordinate. In some situations, some space can be
saved at the expense of a longer processing time, as in the following example.
At RRP, the maximal length of a sentence is 676 words! Such long sentences can be explained by the fact that in the Responsa literature punctuation
marks are often omitted or used very scarcely. At TLF, there is even a "sentence" of more than 2000 words (a modern poem). Since on the other hand
most sentences are short and it was preferred to use only field-sizes that are
multiples of half-bytes, the following method is used: the size of the w-field is
chosen to be one byte (8 bits); any sentence of length I > 256 words, such that
i = 80it -I- r(0 < r < 80), is split into k units of 80 words, followed (if r > 0)
by a sentence of r words. These sentences form only a negligible percentage of
the database. While resolving the storage problem, the insertion of such "virtual points" in the middle of a sentence creates some problems for the retrieval
process. When in a query one asks to retrieve occurrences of keywords A and
B such that A and B are adjacent or that no more than some small number of
words appear between them, one usually does not allow A and B to appear
in different sentences. This is justified, since "adjacency" and "near vicinity"
operators are generally used to retrieve expressions, and not the coincidental
juxtaposition of A at the end of a sentence with B at the beginning of the
following one. However in the presence of virtual points, the search should be
extended also into neighboring "sentences," if necessary, since the virtual points
are only artificial boundaries that might have split some interesting expression.
Hence this solution further complicates the retrieval algorithms.
The methods presented in the next section not only yield improved compression, but also get rid of the virtual points.
On the other hand, consecutive coordinates having the same value in one of their
other fields correspond to a certain word appearing more than once in the same
sentence, paragraph, or document, and this occurs frequently. For instance, at
RRP, 23.4% of the coordinates have the same s-field as their predecessors,
41.7% have the same p-field, and 51.6% have the same d-field.
Note that the header does not contain the binary encoding of the lengths,
since this would require a larger number of bits. By storing a code for the lengths
the header is kept smaller, but at the expense of increasing decompression time,
since a table that translates the codes into actual lengths is needed. This remark
applies also to the subsequent methods.
(ii) Allocate three bits in the header for each of the p-, s-, and w-fields,
giving 8 possible choices for each.
The idea of (ii) is that by increasing the number of possibilities (and hence
the overhead for each coordinate), the range of possible values can be partitioned more efficiently, which should lead to savings in the remaining part of
the coordinate. Again three methods corresponding to a, b, and c of (i) were
checked.
B. Using some fields to encode frequent values.
For some very frequent values, the code in the header will be interpreted
directly as one of the values, and not as the length of the field in which they are
stored. Thus, the corresponding field can be omitted in all these cases. However,
the savings for the frequent values come at the expense of reducing the number
of possible choices for the lengths of the fields for the less frequent values.
For instance, at RRP, the value 1 appears in the s-field of more than 9 million
coordinates (about 24% of the concordance); thus, all these coordinates will
have no s-field in their compressed form, and the code in the part of the header
corresponding to the s-field, will be interpreted as "value 1 in the s-field."
(i) Allocate 2 bits in the header for each of the p-, s-, and w-fields; one of
the codes points to the most frequent value,
(ii) Allocate 3 bits in the header for each of the p-, s-, and w-fields; three
of the codes point to the three most frequent values.
There is no subdivision into methods a, b, and c as in A (in fact the method
used corresponds to a), because we concluded from our experiments that it is
worth keeping the possibility of using the previous coordinate in case of equal
values in some field. Hence, one code was allocated for this purpose, which left
only two codes to encode the field lengths in (i) and four codes in (ii). For (ii)
we experimented also with allowing two or four of the eight possible choices
to encode the two or four most frequent values; however, on our data, the
optimum was always obtained for three. There is some redundancy in the case
of consecutive coordinates having both the same value in some field, and this
value being the most frequent one. There are then two possibilities to encode
the second coordinate using the same number of bits. In such a case, the code
for the frequent value should be preferred over that pointing to the previous
coordinate, as decoding of the former is usually faster.
C. Combining methods A and B.
Choose individually for each of the p-, s-, and w-fields the best of the
previous methods.
D. Encoding length combinations.
If we want to push the idea of A further, we should have a code for every
possible length of a field, but the maxima of the values can be large. For example,
at RRP, one needs 10 bits for the maximal value of the w-field, 9 bits for the
s-field, and 10 bits for the p-field. This would imply a header length of 4 bits
for each of these fields, which cannot be justified by the negligible improvement
over method A(ii).
The size of the header can be reduced by replacing the three codes for the
sizes of the p-, s-, and w-fields by a single code in the following way. Denote by l_p, l_s, and l_w the lengths of the p-, s-, and w-fields respectively, i.e., the sizes (in bits) of the binary representations without leading zeros of the values stored in them. In our model 1 ≤ l_p, l_s, l_w ≤ 10, so there are up to 10^3 possible triplets (l_p, l_s, l_w). However, most of these length combinations occur only rarely, if at all. At RRP, the 255 most frequent (l_p, l_s, l_w) triplets account already for 98.05% of the concordance. Therefore:
(i) Allocate 9 bits as header, of which 1 bit is used for the d-field; 255 of the possible codes in the remaining 8 bits point to the 255 most frequent (l_p, l_s, l_w) triplets; the last code is used to indicate that the coordinate corresponds to a "rare" triplet, in which case the p-, s-, and w-fields appear already in their decompressed form.
Although the "compressed" form of the rare coordinates, including a 9-bit
header, may in fact need more space than the original coordinate, we still save
on the average.
Two refinements are now superimposed. We first note that one does not need to represent the integer 0 in any field. Therefore one can use a representation of the integer n − 1 in order to encode the value n, so that only ⌊log_2(n − 1)⌋ + 1 bits are needed instead of ⌊log_2 n⌋ + 1. This may seem negligible, because only one bit is saved and only when n is a power of 2, thus for very few values of n. However, the first few of these values, 1, 2, and 4, appear very frequently, so that in fact this yields a significant improvement. At RRP, the total size of the compressed p-, s-, and w-fields (using method D) was further reduced by 7.4%, just by shifting the stored values from n to n − 1.
The second refinement is based on the observation that since we know from the header the exact length of each field, we know the position of the leftmost 1 in it, so that this 1 is also redundant. The possible values in the fields are partitioned into classes C_i defined by C_0 = {0}, C_i = {l : 2^(i−1) ≤ l < 2^i}, and the header gives for the values in each of the p-, s-, and w-fields the indices i of the corresponding classes. Therefore if i ≤ 1, there is no need to store any additional information, because C_0 and C_1 are singletons, and for l ∈ C_i with i > 1, only the i − 1 bits representing the number l − 2^(i−1) are kept. For example, suppose the values in the p-, s-, and w-fields are 3, 1, and 28. Then the encoded values are 2, 0, and 27, which belong to C_2, C_0, and C_5 respectively. The header thus points to the triplet (2, 0, 5) (assuming that this is one of the 255 frequent ones), and the rest of the coordinate consists of the five bits 01011, which are parsed from left to right as 1 bit for the p-field, 0 bits for the s-field, and 4 bits for the w-field.
A similar idea was used in [15] for encoding run lengths in the compression of
sparse bit vectors.
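The two refinements can be summarized in a few lines of code; the function names are illustrative, and the worked example above (field values 3, 1, and 28) is used as a check.

def field_class_and_bits(n):
    """Encode a positive field value n: store v = n - 1; its class index i
    (C_0 = {0}, C_i = [2^(i-1), 2^i)) goes into the header, and only the
    i - 1 bits of v - 2^(i-1) are kept (the leading 1 is implicit)."""
    v = n - 1
    if v == 0:
        return 0, ''                          # class C_0: nothing stored
    i = v.bit_length()                        # v belongs to C_i
    if i == 1:
        return 1, ''                          # C_1 = {1} is a singleton too
    return i, format(v - (1 << (i - 1)), '0{}b'.format(i - 1))

def field_decode(i, bits):
    """Inverse of field_class_and_bits."""
    if i == 0:
        return 1
    if i == 1:
        return 2
    return (1 << (i - 1)) + int(bits, 2) + 1

assert [field_class_and_bits(n) for n in (3, 1, 28)] == [(2, '0'), (0, ''), (5, '1011')]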
(ii) Allocate 8 bits as header, of which 1 bit is used for the d-field; the remaining 7 bits are used to encode the 127 most frequent (l_p, l_s, l_w) triplets.
The 127 most frequent triplets still correspond to 85.19% of the concordance at RRP. This is therefore an attempt to save one bit in the header of each
coordinate at the expense of having more noncompressed coordinates.
Another possibility is to extend method D also to the d-field. Let b be a Boolean variable corresponding to the two possibilities for the d-field, namely T = the value is identical to that of the preceding coordinate, thus omit it, or F = different value, keep it. We therefore have up to 2000 quadruples (b, l_p, l_s, l_w), which are again sorted by decreasing frequency.
(iii) Allocate 8 bits as header; 255 of the codes point to the 255 most
frequent quadruples.
At RRP, these 255 most frequent quadruples cover 87.08% of the concordance. For the last two methods, one could try to get better results by
compressing also some of the coordinates with the nonfrequent length combinations, instead of storing them in their decompressed form. We did not,
however, pursue this possibility.
1. Encoding
Note that for a static information retrieval system, encoding is done only
once (when building the database), whereas decoding directly affects the response time for on-line queries. In order to increase the decoding speed, we use
a small precomputed table T that is stored in internal memory. For a method with header length k bits, this table has 2^k entries. In entry i of T, 0 ≤ i < 2^k, we store the relevant information for the header consisting of the k-bit binary representation of the integer i.
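A much simplified sketch of such table-driven decoding is shown below; the header length, the table entries, and the omission of the d-field and of the frequent-value codes are all illustrative simplifications of the scheme described here.

K = 4                                           # illustrative header length in bits
# TABLE[i] = (bits of p-field, bits of s-field, bits of w-field) for header value i
TABLE = {0b0110: (1, 2, 3), 0b0000: (0, 0, 4)}  # two illustrative entries

def decode(bits, pos, prev):
    """Decode one (p, s, w) coordinate starting at bit position pos;
    a field length of 0 means 'copy the field from the previous coordinate'."""
    header = int(bits[pos:pos + K], 2); pos += K
    fields = []
    for length, old in zip(TABLE[header], prev):
        if length == 0:
            fields.append(old)                  # copied from the previous coordinate
        else:
            fields.append(int(bits[pos:pos + length], 2) + 1)  # values stored as n-1
            pos += length
    return tuple(fields), pos

coord, pos = decode("0110" + "1" + "10" + "011", 0, prev=(1, 1, 1))
print(coord)        # (2, 3, 4): p = 1+1, s = 2+1, w = 3+1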
and statements similar to the latter for the s- and w-fields. After statement 11 we should insert
if P1 < 0 then put −P1 in the p-field of COOR
and similar statements for the s- and w-fields.
The decoding of the methods in D is equivalent to that of A. The only
difference is in the preparation of the table T (which is done only once). While
for A to each field correspond certain fixed bits of the header that determine the
length of that field, for D the header is nondivisible and represents the lengths
of all the fields together. This does not affect the decoding process, since in both methods a table-lookup is used to interpret the header. An example of the encoding and decoding processes appears in the next section.

TABLE 3
Ignoring preceding coordinate
             1      2      3      4      5    value first reaching 99%
p-field    14.1   35.2   46.5   54.2   60.2     79
s-field    24.2   40.2   51.1   58.8   64.5     83
w-field     3.0    5.8    8.6   11.4   14.0     87
Using preceding coordinate
p-field     9.6   25.2   36.5   45.0   51.7     93
s-field    17.9   33.0   44.3   52.6   58.9    119
w-field     1.9    4.4    7.1    9.7   12.4    120
3. Parameter Setting
All the methods of the previous section were compared on the concordance of RRP. Each coordinate had a {d^p^s^w) structure and was of length
6 bytes (48 bits). Using POM, the average length of a compressed coordinate
was 4.196 bytes, i.e., a compression gain of 30%.
Table 3 gives the frequencies of the first few values in each of the p-, s-, and
w-fields, both with and without taking into account the previous coordinate.
The frequencies are given in cumulative percentages; e.g., the row entitled s-field contains in the column headed i the percentage of coordinates having a value ≤ i in their s-field. We have also added the values for which the cumulative
percentage first exceeds 99%.
As one can see, the first four values in the p- and s-fields account already for
half of the concordance. This means that most of the paragraphs consist of only
a few sentences and most of the documents consist of only a few paragraphs.
The figures for the w-field are different, because short sentences are not preponderant. While the (noncumulative) frequency of the values / in the s-field is a
clearly decreasing function of i, it is interesting to note the peak at value 2 for the
p-field. This can be explained by the specific nature of the Responsa literature, in
which most of the documents have a question-answer structure. Therefore the
first paragraph of a document usually contains just a short question, whereas
the answer, starting from the second paragraph, may be much longer.
When all the coordinates are considered (upper half of Table 3), the percentages are higher than the corresponding percentages for the case where identical
fields in adjacent coordinates are omitted (lower half of Table 3). This means
that the idea of copying certain fields from the preceding coordinate yields
savings, which are, for the small values, larger than could have been expected
from knowing their distribution in the noncompressed concordance.
Using the information collected from the concordance, all the possible variants for each of the methods in A and B have been checked. Table 4 lists for
each of the methods the variant for which maximal compression was achieved.
The numbers in boldface are the frequent values used in methods B and C; the
other numbers refer to the lengths of the fields. The value 0 indicates that the
field of the preceding coordinate should be copied.
TABLE 4
Method    p-field               s-field               w-field
A(i)a     0 2 5 10              0 2 5 9               0 4 6 10
A(i)b     1 3 5 10              1 3 5 9               3 5 6 10
A(ii)a    0 1 2 3 4 5 6 10      0 1 2 3 4 5 6 9       0 1 3 4 5 6 7 10
A(ii)b    1 2 3 4 5 6 7 10      1 2 3 4 5 6 7 9       1 2 3 4 5 6 7 10
B(i)      0 2 4 10              0 1 4 9               0 4 6 10
B(ii)     0 1 2 3 3 4 5 10      0 1 2 3 3 4 5 9       0 3 4 5 3 5 6 10
C         0 2 5 10              0 1 2 3 3 4 5 9       3 5 6 10
The optimal variants for the methods A(ii) are not surprising: since most
of the stored values are small, one could expect the optimal partition to give
priority to small field lengths. For method C, each field is compressed by the
best of the other methods, which are A(i)a for the p-field, B(ii) for the s-field,
and A(i)b for the w-field, thus requiring a header of 1 + 2 + 3 + 2 = 8 bits (including one bit for the d-field).
The entries of Table 4 were computed using the first refinement mentioned in the description of method D, namely storing n − 1 instead of n. The second refinement (dropping the leftmost 1) could not be applied, because it is not true that the leftmost bit in every field is a 1. Thus for all the calculations with methods A and B, an integer n was supposed to require ⌊log_2(n − 1)⌋ + 1 bits for n > 1 and one bit for n = 1.
As an example for the encoding and decoding processes, consider method C, and a coordinate structure with (h_d, h_p, h_s, h_w) = (8, 8, 8, 8), i.e., one byte for each field. The coordinate we wish to process is (159, 2, 2, 35). Suppose further that only the value in the d-field is the same as in the previous coordinate. Then the length of the d-field is 0; in the p-field the value 1 is stored, using two bits; nothing is stored in the s-field, because 2 is one of the frequent values and directly referenced by the header; and in the w-field the value 34 is stored, using 6 bits. The possible options for the header are numbered from left to right as they appear in Table 4; hence the header of this coordinate is 0-10-011-11, where dashes separating the parts corresponding to different fields have been added for clarity; the remaining part of the coordinate is 01-100010.
Table T has 2^8 = 256 entries; at entry 79 (= 01001111 in binary) the values stored are (TOT, P1, S1, W1) = (8, 2, −2, 6). When decoding the compressed coordinate 0100111101100010, the leftmost 8 bits are considered as header and converted to the integer 79. Table T is then accessed with that index, retrieving the 4-tuple (8, 2, −2, 6), which yields the values (P, S, W) = (2, 0, 6). The next TOT = 8 bits are therefore loaded into COOR of size 4 bytes, and after the three shifts we get
COOR = 00000000 - 00000001 - 00000000 - 00100010.
Since TOT = P + S + W, the value of the d-field is copied from the last coordinate. Since S1 < 0, the value −S1 = 2 is put into the s-field.
On our data, the best method was D(i) with an average coordinate length
of 3.082 bytes, corresponding to 49% compression relative to the full 6-byte
coordinate, and giving a 27% improvement over POM. The next best method
was C with 3.14 bytes. Nevertheless, the results depend heavily on the statistics
of the specific system at hand, so that for another database, other methods
could be preferable.
The main target of the efforts was to try to eliminate or at least reduce the
unused space in the coordinates. Note that this can easily be achieved by considering the entire database as a single long run of words, which we could index
sequentially from 1 to N, N being the total number of words in the text. Thus
⌊log_2 N⌋ + 1 bits would be necessary per coordinate. However, the hierarchical
structure is lost, so that, for example, queries asking for the cooccurrence of several words in the same sentence or paragraph are much harder to process. Moreover, when a coordinate is represented by a single, usually large, number, we lose
also the possibility of omitting certain fields that could be copied from preceding
coordinates. A hierarchical structure of a coordinate is therefore preferable for
the retrieval algorithms. Some of the new compression methods even outperform the simple method of sequentially numbering the words, since the latter
would imply at the RRP database a coordinate length of 26 bits = 3.25 bytes.
B. Model-Based Concordance Compression
For our model of a textual database, we assume that the text is divided into documents and the documents are made up of words. We thus use only a two-level
hierarchy to identify the location of a word, which makes the exposition here
easier. The methods can, however, be readily adapted to more complex concordance structures, like the 4-level hierarchy mentioned above. In our present
model, the conceptual concordance consists, for each word, of a series of (d, w) pairs, d standing for a document number, and w for the index, or offset, of a word within the given document:
word_1 : (d_1, w_1)(d_1, w_2) ··· (d_1, w_{m_1}) (d_2, w_1)(d_2, w_2) ··· (d_2, w_{m_2}) ···
word_2 : ···
For a discussion of the problems of relating this conceptual location to a physical
location on the disc, see [7].
It is sometimes convenient to translate our 4-level hierarchy to an equivalent
one, in which we indicate the index of the next document containing the word,
the number of times the word occurs in the document, followed by the list of
word indices of the various occurrences:
word_1 : (d_1, m_1; w_1, w_2, ..., w_{m_1}) (d_2, m_2; w_1, ..., w_{m_2}) ··· (d_N, m_N; w_1, ..., w_{m_N})
word_2 : ···
where the last equality uses the well-known combinatoric identity that permits
summation over the upper value in the binomial coefficient [11]. Second, we
note that we can rewrite the probability as
(N/D) × (1 − (d − 1)/(D − 1)) × (1 − (d − 1)/(D − 2)) × ··· × (1 − (d − 1)/(D − N + 1)).
If d ≪ D, this is approximately (N/D) × (1 − d/D)^(N−1), which is in turn approximately proportional to e^(−d(N−1)/D), or γ^d, for γ = e^(−(N−1)/D). This last form is that of the geometric distribution recommended by Witten et al. [42].
The encoding process is then as follows. We wish to encode the d-field of the next coordinate (d, m; w_1, ..., w_m). Assuming that the probability distribution of d is given by Pr(d), we construct a code based on {Pr(d)}_{d=1}^{D−N+1}. This assigns
corresponding to the actual value d in our coordinate. If the estimate is good,
the actual value d will be assigned a high probability by the model, and therefore
be encoded with a small number of bits.
Next we encode the number of occurrences of the term in this document.
Let us suppose that we have T occurrences of the term remaining (initially, this
will be the total number of occurrences of the term in the database). The T
occurrences are to be distributed into the N remaining documents the word
occurs in. Thus we know that each document being considered must have at
least a single term, that is, m = 1 + x, where x ≥ 0. If T = N, then clearly x = 0 (m = 1), and we need output no code: m conveys no information in this case. If T > N, then we must distribute the T − N terms not accounted for over the remaining N documents that contain the term. We assume, for simplicity, that the additional amount, x, going to the currently considered document is Poisson distributed, with mean λ = (T − N)/N. The Poisson distribution is given by Pr(x) = e^(−λ) λ^x / x!. This allows us to compute the probability of x for all possible values (x = 0, 1, ..., T − N) and then to encode x using one of the encodings above.
We must finally encode all the m offsets, but this problem is formally identical to that of encoding the next document. The current document has W words, so the distribution of w, the first occurrence of the word, is given by the probabilities C(W − w, m − 1)/C(W, m). Once this is encoded, we have a problem identical to the initial one in form, except that we now have m − 1 positions left to encode and W − w locations. This continues until the last term, which is uniformly distributed over the remaining word locations.
Then we encode the next document, but this is again a problem identical in form to the initial problem, only we now have one fewer document (N − 1) having the term, and d fewer target documents (D − d) to consider.
The formal encoding algorithm is given in Fig. 19. We begin with a conceptual concordance, represented for the purpose of this algorithm as a list of entries. Our concordance controls S different words. For each word, there is an entry for each document it occurs in, of the form (d_i, m_i; w_1, ..., w_{m_i}), where d_i, m_i, and w_j are given similarly to the second representation defined above. Note that we do not encode the absolute values d_i and w_j, but the relative increases d_i − d_{i−1} and w_j − w_{j−1}; this is necessary, because we redefine, in each iteration, the sizes D, W, and T to be the remaining number of documents, the number of words in the current document, and the number of occurrences of the current word, respectively.

FIGURE 19 The formal encoding algorithm.
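The following sketch is not the algorithm of Fig. 19 itself, but illustrates the same model: for each field it builds the probability distribution described above and charges the ideal code length −log_2 Pr(value); a real implementation would feed these distributions to an arithmetic coder. The document sizes and occurrence data below are invented for the example.

import math

def comb(n, k):
    return math.comb(n, k) if 0 <= k <= n else 0

def d_probs(N, D):
    """Pr(next document gap = k), k = 1..D-N+1: C(D-k, N-1)/C(D, N)."""
    total = comb(D, N)
    return {k: comb(D - k, N - 1) / total for k in range(1, D - N + 2)}

def m_probs(T, N):
    """Extra occurrences x in the current document, Poisson with mean (T-N)/N."""
    lam = (T - N) / N
    return {x: math.exp(-lam) * lam ** x / math.factorial(x) for x in range(T - N + 1)}

def w_probs(m, W):
    """Pr(first of m remaining offsets = k) among W remaining word positions."""
    total = comb(W, m)
    return {k: comb(W - k, m - 1) / total for k in range(1, W - m + 2)}

def cost(value, probs):
    return -math.log2(probs[value])          # ideal number of bits

# Example: one word occurring once in documents 2 and 5 of a 10-document
# collection; each document is assumed to contain 50 words.
D, N, T = 10, 2, 2
bits, prev_d = 0.0, 0
for d, offsets in [(2, [7]), (5, [13])]:
    gap = d - prev_d
    bits += cost(gap, d_probs(N, D))
    if T > N:                                # otherwise m = 1 is implied, no output
        bits += cost(len(offsets) - 1, m_probs(T, N))
    W, prev_w = 50, 0
    for k, w in enumerate(offsets):
        bits += cost(w - prev_w, w_probs(len(offsets) - k, W))
        W -= w - prev_w
        prev_w = w
    D -= gap; T -= len(offsets); N -= 1
    prev_d = d
print(round(bits, 2), "bits for this word's entries")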
In fact, one should also deal with the possibility where the independence
assumptions of the previous section are not necessarily true. In particular, we
consider the case where terms cluster not only within a document, but even at
the between document level. Details of this model can be found in [43].
Y. BITMAPS
For every distinct word W of the database, a bitmap B(W) is constructed,
which acts as an "occurrence" map at the document level. The length (in bits)
of each map is the number of documents in the system. Thus, in the RRP for
example, the length of each map is about 6K bytes. These maps are stored
in compressed form on a secondary storage device. At RRP, the compression
algorithm was taken from [44], reducing the size of a map to 350 bytes on
the average. This compression method was used for only about 10% of the
words, those which appear at least 70 times; for the remaining words, the list
of document numbers is kept and transformed into bitmap form at processing
time. The space needed for the bitmap file in its entirety is 33.5 MB, expanding
the overall space requirement of the entire retrieval system by about 5%.
At the beginning of the process dealing with a query of the type given in
Eq. (1), the maps B(A_ij) are retrieved, for i = 1, ..., m and j = 1, ..., n_i. They are decompressed and a new map ANDVEC is constructed:
ANDVEC = ∧_{i=1}^{m} ( ∨_{j=1}^{n_i} B(A_ij) ).
The bitmap ANDVEC serves as a "filter," for only documents corresponding
to 1 bits in ANDVEC can possibly contain a solution. Note that no more than
three full-length maps are simultaneously needed for its construction.
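The construction of ANDVEC is easily sketched with bitmaps modeled as integers; the grouping of the maps into keywords and their variants is illustrative.

def andvec(keyword_groups):
    """keyword_groups: one inner list per keyword A_i, holding the bitmaps of
    its variants A_i1, ..., A_in_i (bit k set <=> the word occurs in document k)."""
    result = ~0                      # conceptually a map of all 1's
    for group in keyword_groups:
        or_map = 0
        for bitmap in group:
            or_map |= bitmap         # documents containing any variant of A_i
        result &= or_map             # must hold simultaneously for every keyword
    return result

# two keywords, each with two variants; only document 3 contains both
maps = [[0b01010, 0b01100], [0b01001, 0b11000]]
print(bin(andvec(maps)))             # -> 0b1000 (document 3)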
For certain queries, in particular when keywords with a small number of
occurrences in the text are used, ANDVEC will consist only of zeros, which
indicates that nothing should be retrieved. In such cases the user gets the correct
if somewhat meager results, without a single merge or collate action having been
executed. However, even if ANDVEC is not null, it will usually be much sparser
than its components. These maps can improve the performance of the retrieval
process in many ways to be now described.
A. Usefulness of Bitmaps in IR
First, bitmaps can be helpful in reducing the number of I/O operations involved
in the query-processing phase. Indeed, since the concordance file is usually too
are therefore encouraged to change their policy and to submit more complex
queries!
Another possible application of the bitmaps is for getting a selective display
of the results. A user is often not interested in finding all the occurrences of a
certain phrase in the database, as specified by the query, but only in a small
subset corresponding to a certain author or a certain period. The usual way
to process such special requests consists in executing first the search ignoring
the restrictions, and then filtering out the solutions not needed. This can be
very wasteful and time-consuming, particularly if the required subrange (period
or author(s)) is small. The bitmaps allow the problem to be dealt with in a
natural way, requiring only minor changes to adapt the search program to this
application. All we need is to prepare a small repertoire R of fixed bitmaps, say one for each author, where the 1 bits indicate the documents written by this author, and a map for the documents of each year or period, etc. The restrictions can now be formulated at the same time the query is submitted. In the construction algorithm, ANDVEC will not be initialized by a string containing only 1's, but by a logical combination of elements of R, as induced
by the additional restrictions. Thus user-imposed restrictions on required ranges
to which solutions should belong on one hand, and query-imposed restrictions
on the co-occurrence of keywords on the other, are processed in exactly the
same way, resulting in a bit vector, the sparsity of which depends directly on
the severity of the restrictions. As was pointed out earlier, this may lead to
savings in processing time and I/O operations.
Finally, bitmaps can be also helpful in handling negative keywords. If a
query including some negative keywords D_i is submitted at the document level,
one can use the binary complements B(Di) of the maps, since only documents
with no occurrence of Di (indicated by the 0 bits) can be relevant. However,
for other levels, the processing is not so simple. In fact, if the query is not
on the document level, the bitmaps of the negative keywords are useless, and
ANDVEC is formed only by the maps of the positive keywords. This difference in the treatment of negative and positive keywords is due to the fact that
a 0 bit in the bit vector of a positive keyword means that the corresponding
document cannot possibly be relevant, whereas a 1 bit in the bit vector of a
negative keyword Dj only implies that Dj appears in the corresponding document; however, this document can still be retrieved, if Dj is not in the specified
neighborhood of the other keywords. Nevertheless, even though the negative
keywords do not contribute in rendering ANDVEC sparser, ANDVEC will still
be useful also for the negative words: only coordinates in the relevant documents must be checked not to fall in the vicinity of the negative keywords, as imposed by the distance constraints (l_i, u_i) of the query.
B. Compression of Bitmaps
It would be wasteful to store the bitmaps in their original form, since they
are usually very sparse (the great majority of the words occur in very few
documents). Schuegraf [45] proposes to use run-length coding for the compression of sparse bit vectors, in which a string of consecutive 0's terminated by a
FIGURE 20 Hierarchical bit-vector compression. (a) Original vector and two derived levels and (b) compressed vector.
for our applications. Because of the structure of the compressed vector, we call
this the TREE method, and shall use in our discussion the usual tree vocabulary:
the root of the tree is the single block on the top level, and for a block x in v_{j+1}, which is obtained by ORing the blocks y_1, ..., y_r of v_j, we say that x is the parent of the nonzero blocks among the y_i.
The TREE method was proposed by Wedekind and Harder [48]. It appears
also in Vallarino [49], who used it for two-dimensional bitmaps, but only with
one level of compression. In [50], the parameters (block size and height of the
tree) are chosen assuming that the bit vectors are generated by a memoryless
information source; i.e., each bit in VQ has a constant probability po for being 1,
independently from each other. However, for bitmaps in information retrieval
systems, this assumption is not very realistic a priori, as adjacent bits often
represent documents written by the same author; there is a positive correlation
for a word to appear in consecutive documents, because of the specific style of
the author or simply because such documents often treat the same or related
subjects.
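A compact sketch of the hierarchical construction and pruning follows; the block size, the number of levels, and the example vector are illustrative, and no attempt is made to reproduce the exact layout of the compressed vector of Fig. 20.

def build_levels(bits, r, height):
    """bits: list of 0/1; r: block size; returns the levels v_0 .. v_height,
    each obtained by ORing fixed-size blocks of the level below."""
    levels = [bits]
    for _ in range(height):
        prev = levels[-1]
        nxt = [1 if any(prev[i:i + r]) else 0 for i in range(0, len(prev), r)]
        levels.append(nxt)
    return levels

def compress(bits, r, height):
    """Keep the top level in full, and for every lower level only the blocks
    whose parent bit is 1 (the zero blocks are implied and omitted)."""
    levels = build_levels(bits, r, height)
    out = [levels[-1]]                               # the root level
    for j in range(height - 1, -1, -1):
        kept = []
        for i, parent in enumerate(levels[j + 1]):
            if parent:                               # nonzero block below a 1 bit
                kept.append(levels[j][i * r:(i + 1) * r])
        out.append(kept)
    return out

vec = [0] * 12 + [1] + [0] * 19                      # a single 1 bit at position 12
print(compress(vec, 4, 2))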
We first remark that the hierarchical method does not always yield real compression. Consider, for example, a vector v_0 for which the indices of the 1 bits are of the form i·r_0 for i < l_0/r_0. Then there are no zero blocks (of size r_0) in v_0; moreover, all the bits of v_i for i > 0 will be 1, so that the whole tree must be kept. Therefore the method should be used only for sparse vectors.
In the other extreme case, when v_0 is very sparse, the TREE method may again be wasteful: let d = ⌈log_2 l_0⌉, so that a d-bit number suffices to identify any bit position in v_0. If the vector is extremely sparse, we could simply list the positions of all the 1 bits, using d bits for each. This is in fact the inverse of the
transformation performed by the bit vectors: basically, for every different word
W of the database, there is one entry in the inverted file containing the list of
references of W, and this list is transformed into a bitmap; here we change the
bitmap back into its original form of a list.
A small example will illustrate how the bijection of the previous paragraph between lists and bitmaps can be used to improve method TREE. Suppose that among the r_0 r_1 r_2 first bits of v_0 only position i contains a 1. The first bit in level 3, which corresponds to the ORing of these bits, will thus be set to 1 and will point to a subtree consisting of three blocks, one on each of the lower levels. Hence in this case a single 1 bit caused the addition of at least r_0 + r_1 + r_2 bits to the compressed map, since if it were zero, the whole subtree would have been omitted. We conclude that if r_0 + r_1 + r_2 > d, it is preferable to consider position i as containing zero, thus omitting the bits of the subtree, and to add the number i to an appended list L, using only d bits. This example is readily generalized so as to obtain an optimal partition between tree and list for every given vector, as will now be shown.
We define l_j and k_j respectively as the number of bits and the number of blocks in v_j, for 0 ≤ j ≤ t. Note that r_j k_j = l_j. Denote by T(i, j) the subtree rooted at the ith block of v_j, with 0 ≤ j ≤ t and 1 ≤ i ≤ k_j. Let S(i, j) be the size in bits of the compressed form of the subtree T(i, j), i.e., the total number of bits in all the nonzero blocks in T(i, j), and let N(i, j) be the number of 1 bits in the part of the original vector v_0 which belongs to T(i, j).
During the bottom-up construction of the tree these quantities are recursively evaluated for 0 ≤ j ≤ t and 1 ≤ i ≤ k_j: for j = 0, N(i, 0) is the number of 1 bits in block i of v_0, and for j > 0, N(i, j) is obtained by summing the corresponding values one level below.
(c + 1) · N(i, j) < S(i, j),    (8)
since any index that will be added to L will use only c + 1 bits for its encoding. In fact, after recognizing that L will be compressed, we should check again the blocks already handled, since a subtree T(i, j) may satisfy (8) without satisfying (6). Nevertheless, we have preferred to keep the simplicity of the algorithm and not to check again previously handled blocks, even at the price of losing some of the compression efficiency. Often, there will be no such loss, since if we are at the top level when |L| becomes large enough to satisfy (7), this means that the vector v_0 will be kept in its entirety as a list. If we are not at the top level, say at the root of T(i, j) for j < t, then all the previously handled trees will be reconsidered as part of larger trees, which are rooted on the next higher level. Hence it is possible that the subtree T(i, j), which satisfies (8) but not (6) (and thus was not pruned at level j), will be removed as part of a larger subtree rooted at level j + 1.
2. Combining Huffman and Run-Length Coding
As we are interested in sparse bit strings, we can assume that the probability p of a block of k consecutive bits being zero is high. If p > 0.5, method NORUN assigns to this 0 block a codeword of length one bit, so we can never expect a better compression factor than k. On the other hand, k cannot be too large, since we must generate codewords for 2^k different blocks.
In order to get a better compression, we extend the idea of method NORUN in the following way: there will be codewords for the 2^k − 1 nonzero blocks of length k, plus some additional codewords representing runs of zero-blocks of different lengths. In the sequel, we use the term "run" to designate a run of zero-blocks of k bits each.
The length (number of k-bit blocks) of a run can take any value up to l_0/k, so it is impractical to generate a codeword for each; as was just pointed out, k
F_1 = 1,    F_i = F_{i−1} + F_{i−2}    for i ≥ 2.
REFERENCES
1. Pennebaker, W. B., and Mitchell, J. L. JPEG: Still Image Data Compression Standard. Van
Nostrand Reinhold, New York, 1993.
2. Bratley, P., and Choueka, Y. Inform. Process. Manage. 18:257-266, 1982.
48. Wedekind, H., and Härder, T. Datenbanksysteme II. Wissenschaftsverlag, Mannheim, 1976.
49. Vallarino, O. Special Issue, SIGPLAN Notices, 2:108-114, 1976.
50. Jakobsson, M. Inform. Process. Lett. 14:147-149, 1982.
51. Fraenkel, A. S. Amer. Math. Monthly 92:105-114, 1985.
52. Boyer, R. S., and Moore, J. S. Commun. ACM 20:762-772, 1977.
53. Faloutsos, C., and Christodoulakis, S. ACM Trans. Office Inform. Systems 2:267-288, 1984.
I. INTRODUCTION 635
II. GATEWAY SPECIFICATIONS 637
A. Common Gateway Interface 637
B. Netscape Server API 638
C. Web Application Interface 639
D. Internet Server API 639
E. FastCGI 640
F. Servlet Java API 640
G. Comparisons between CGI and Server APIs 641
III. ARCHITECTURES OF RDBMS GATEWAYS 643
A. Generic DBMS Interfaces 644
B. Protocols and Interprocess Communication Mechanisms 646
C. Experiments and Measurements 648
D. State Management Issues 653
E. Persistent or Nonpersistent Connections to Databases 654
F. Template-Based Middleware 654
IV. WEB SERVER ARCHITECTURES 655
V. PERFORMANCE EVALUATION TOOLS 656
A. Analytic Workload Generation 657
B. Trace-Driven Workload Generation 657
VI. EPILOGUE 659
REFERENCES 660
INTRODUCTION
In the second half of the 1990s, the WWW evolved into a dominant software technology and broke several barriers. Despite the fact that it was initially intended
for a wide (universal) area network, the WWW today enjoys tremendous
success and penetration even into corporate environments (intranets). Key players in the business software arena, like Oracle, Informix, and Microsoft, recognize the success of this open platform and constantly adapt their products
to it. Originally conceived as a tool for cooperation between high-energy
physicists, this network service is nowadays becoming synonymous with the
Internet, as it is used heavily by the majority of Internet users even for tasks like
e-mail communication.
The universal acceptance of the WWW stimulated the need to provide access to the vast legacy of existing heterogeneous information.^ Such information
ranged from proprietary representation formats to engineering and financial
databases, which were, until the introduction of WWW technology, accessed
through specialized tools and individually developed applications. The vast
majority of the various information sources were stored in Relational DBMS
(RDBMS), a technology that enjoys wide acceptance in all kinds of applications and environments. The advent of the Web provided a unique opportunity
for accessing such data repositories, through a common front-end interface,
in an easy and inexpensive manner. The importance of the synergy of WWW
and database technologies is also underlined by the constantly increasing management requirements for Web content ("database systems are often used as
high-end Web servers, as Webmasters with a million pages of content invariably switch to a web site managed by database technology rather than using file
system technology," extract from the Asilomar Report on Database Research
[4]).
Initial research on the WWW database framework did not address performance issues, but since this area is becoming more and more important, relevant
concern is beginning to grow. The main topic of this chapter, namely the study of
the behavior exhibited by WWW-enabled information systems under a heavy
workload, spans a number of very important issues, directly associated with the
efficiency of service provision. Gateway specifications (e.g., CGI, ISAPI^) are
very important due to their use for porting existing systems and applications
to the WWW environment. Apart from the gateway specifications used, the
architecture of the interface toward the information repository (i.e., RDBMS)
is also extremely important (even with the same specifications quite different
architectures may exhibit quite different behavior in terms of performance).
The internal architecture of HTTP server software is also an important issue in
such considerations. Older single-process architectures are surely slower in dispatching clients' requests than newer multithreaded schemes. Prespawned server
instances (processes or threads) also play an important role.
Lastly, as part of our study, we are considering how existing WWW-based
systems are being measured. We provide a brief description of some of the most
important performance evaluation tools.
^The significance of this issue triggered the organization, by W3C, of the Workshop on Web
Access to Legacy Data, in parallel with the Fourth International WWW Conference, held in Boston,
MA in December 1995.
^In this chapter, server APIs like ISAPI and NSAPI will also be referred to as Gateway specifications, despite the flexibility they offer for the modification of basic server functionality. Since
we are mostly concerned with interfaces to legacy systems, such features of server APIs will not be
considered.
FIGURE 1  A Web browser exchanging HTTP requests and responses with a Web server.
It should be noted that the entire chapter is focused on the three-tier scenario where access to a legacy information system is always realized through
a browser-server combined architecture (i.e., HTTP is always used), as shown
in Fig. 1 (as opposed to an applet-based architecture where access to the legacy
system can be realized directly, without the intervention of HTTP: a 2-tier
scheme).
As illustrated in Fig. 1, we do not consider network-side aspects (e.g., network latency, HTTP). The network infrastructure underneath the Internet is
continuously becoming more and more sophisticated with technologies like
ATM and xDSL. The best-effort model for Internet communications is being supplemented with the introduction of QoS frameworks like Integrated
Services/Differentiated Services. Lastly, on the application layer, HTTP/1.1,
Cascading Style Sheets (CSS), and Portable Network Graphics (PNG) show
W3C's intention to "alleviate Web slowdown" [24].
The CGI is a relatively simple interface mechanism for running external programs (general-purpose software, gateways to existing systems) in the context
of a WWW information server in a platform-independent way. The mechanism has been in use in the WWW since 1993, when CGI appeared in
the server developed by the National Center for Supercomputing Applications
(NCSA http demon, httpd). The CGI specification is currently in Internet
Draft status [10]. Practically, CGI specifies a protocol for information exchange
between the information server and the aforementioned external programs, as
well as a method for their invocation by the WWW server (clause 5 of the
Internet Draft). Data are supplied to the external program by the information
server mainly through environment variables and its standard input stream.
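As a concrete illustration of this mechanism, the following minimal CGI program in C (a generic sketch, not code from the experiments reported later in this chapter) reads the QUERY_STRING environment variable set by the server for GET requests and writes a Content-type header, a blank line, and an HTML body to standard output, which the server relays back to the browser.

#include <stdio.h>
#include <stdlib.h>

/* Minimal CGI program: the Web server passes request data through
 * environment variables (here QUERY_STRING, used with the GET method)
 * and expects the response, preceded by a header block and a blank
 * line, on standard output. */
int main(void)
{
    const char *query = getenv("QUERY_STRING");

    printf("Content-type: text/html\r\n\r\n");
    printf("<HTML><BODY>\n");
    printf("<P>Query string: %s</P>\n", query ? query : "(none)");
    printf("</BODY></HTML>\n");
    return 0;
}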
Netscape and Apache (such techniques will be discussed below). They are to the
server side what applets are to the client side. Servlets can be used in many ways
but are generally regarded as a replacement for CGI for the dynamic creation of
HTML content. Servlets first appeared with the Java Web Server launched in
1997 [9]. Servlets allow the realization of three-tier schemes where access to
a legacy database or another system is accomplished using some specification
like Java Database Connectivity (JDBC) or the Internet Inter-ORB Protocol (IIOP).
Unlike CGI, the Servlet code stays resident after the dispatch of an incoming
request. To handle simultaneous requests, new threads are spawned instead of
processes. A Web server uses the Servlet API, a generic call-back interface, to
control the behavior of a Servlet (i.e., initialization/termination through the
invocation of specific methods). Servlets may be loaded from the local file system
or invoked from a remote host, much like in the applet case.
As reported in [9], FastCGI and Java Servlets accomplish comparable performance, much higher than that of conventional CGI. Specifically, Organic
Online, a San Francisco-based Web development company, measured a throughput of 3-4 requests/s for FastCGI or Servlets, while CGI managed to handle
only 0.25 requests/s.
Some concerns about the performance of Servlets in Web servers other than
the Java Web Server are raised in [27]. Servlets execute on top of the Java Virtual
Machine (JVM). Spawning the JVM is a time- and resource-consuming task (a
time of 5 s and a memory requirement of 4 MB are reported for a standard
PC configuration). Such consumption could even compromise the advantage
of Servlets over conventional CGI if a different instance of JVM was needed
for each individual request reaching the Web server. In the Java Web Server
case there is no need to invoke a new instance of JVM since the Servlet can
be executed by the JVM running the server module (a "two-tier" scheme). In
the Apache case, the Servlet engine is a stand-alone application (called JServ),
running independently of the server process. The Web server passes requests
to this prespawned engine, which then undertakes their execution (a proxy-like scheme). A quite similar scheme is followed in every other Web server
whose implementation is not based on Java. Indicative examples are the latest
versions of Microsoft's IIS and Netscape's FastTrack. Popular external Servlet
engines include JRun from Live Software, Inc. (http://www.livesoftware.com/)
and WAICoolRunner from Gefion Software (http://www.gefionsoftware.com/).
WAICoolRunner has been used in some of the experiments documented in
subsequent paragraphs.
G. Comparisons between CGI and Server APIs
Gateway specifications can be compared in terms of their characteristics or the performance they achieve. A detailed comparative analysis of the characteristics
of CGI and server APIs can be found in [12,37].
Specifically, CGI is the most widely deployed mechanism for integrating
WWW servers with legacy information systems. However, its design does not
match the performance requirements of contemporary applications: CGI applications do not run within the server process. In addition to the performance
overhead (new process per request), this implies that CGI applications cannot
modify the behavior of the server's internal operations, such as logging and
authorization (see Section B above). Finally, CGI is viewed as a security risk,
due to its connection to a user-level shell.
Server APIs can be considered as an efficient alternative to CGI. This is
mainly attributed to the fact that server APIs entail a considerable performance
increase and load decrease, as gateway applications run in or as part of the server
processes (practically, the invocation of the gateway module is equivalent to a
regular function call^) instead of starting a completely new process for each new
request, as the CGI specification dictates. Furthermore, through the APIs, the
operation of the server process can be customized to the individual needs of each
site. The disadvantages of the API solution include the limited portability of the
gateway code, which is attributed to the absence of standardization (completely
different syntax and command sets) and strong dependence on internal server
architecture.
The choice of programming language in API configurations is extremely restricted compared to CGI (C or C++ vs C, C++, Perl, Tcl/Tk,
Rexx, Python, and a wide range of other languages). As API-based programs
are allowed to modify the basic functionality offered by the Web server, there
is always the concern that buggy code may lead to crashes.
The two scenarios involve quite different resource requirements (e.g.,
memory) as discussed in [37]. In the CGI case, the resource needs are proportional to the number of clients simultaneously served. In the API case, resource
needs are substantially lower due to the function-call-like implementation of
gateways and multithreaded server architecture.
In terms of performance, many evaluation reports have been published
during the past years. Such reports clearly show the advantages of server APIs
over CGI or other similar mechanisms but also discuss performance differences
between various commercial products.
In [19], three UNIX-based Web servers (NCSA, Netscape, and OpenMarket) have been compared using the WebStone benchmark (see Section V. B).
Under an extensive load, the NSAPI configuration achieved 119% better
throughput^ than the CGI configuration running in NCSA. In terms of connections per second, the Netscape/NSAPI configuration outperformed NCSA/CGI
by 73%. OpenMarket/CGI achieved slightly better performance than the
NCSA/CGI combination. Under any other load class, NSAPI outperformed
the other configurations under test but the gap was not as extensive as in the
high-load case.
In [45], the comparison between server API implementations and CGI
shows that ISAPI is around five times faster than CGI (in terms of throughput). NSAPI, on the other hand, is reported to be only two times faster than CGI. A
quite similar ratio is achieved for the connections-per-second metric. In terms
of average response times, ISAPI's performance is reported to be 1/3 that of
NSAPI's.
In [44], published by Mindcraft Inc. (the company that purchased
WebStone from Silicon Graphics), quite different results are reported for the
^Differences arise from the start-up strategies followed: some schemes preload gateway
instances/modules into the server, while others follow on-demand invocation.
^Throughput measured in MBps.
FIGURE 2  Enhanced, demon-based gateway architecture: HTTP communication (interface A), any gateway specification (interface B), proprietary protocol (interface C), and any generic database API toward the DBMS (interface D); the cooperating modules may run in the same or separate processes.
A second option for generic access to a DBMS is the SQL Call-Level Interface (CLI). The SQL CLI was originally defined by the SQL Access Group
(SAG) to provide a unified standard for remote data access. The CLI requires
the use of intelligent database drivers that accept a call and translate it into the
native database server's access language. The CLI is used by front-end tools to
gain access to the RDBMS; the latter should incorporate the appropriate driver.
The CLI requires a driver for every database to which it connects. Each driver
must be written for a specific server using the server's access methods and network transport stack. The SAG API is based on Dynamic SQL (statements still
need to be prepared^ and then executed). During the fall of 1994, the SAG CLI
became an X/Open specification (currently, it is also referred to as X/Open CLI
[42]) and later on an ISO international standard (ISO 9075-3 [21]). Practically,
the X/Open CLI is a SQL wrapper (a procedural interface to the language): a
library of DBMS functions, based on SQL, that can be invoked by an application. As stated in [30], "a CLI interface is similar to Dynamic SQL, in that SQL
statements are passed to the RDBMS for processing at runtime, but it differs
from Embedded SQL as a whole in that there are no embedded SQL statements
and no precompiler is needed."
Microsoft's Open DataBase Connectivity (ODBC) API is based on the
X/Open CLI. The 1.0 version of the specification and the relevant Software
Development Kit (SDK), launched by Microsoft in 1992, have been extensively criticized for poor performance and limited documentation. Initially,
ODBC was confined to the MS Windows platform, but later was ported to
other platforms like Sun's Solaris. The ODBC 2.0, which was announced in
1994, has been considerably improved over its predecessor. Its 32-bit support contributed to the efficiency of the new generation of ODBC drivers.
In 1996, Microsoft announced ODBC 3.0. Nowadays, most database vendors (e.g., Oracle, Informix, Sybase) support the ODBC API in addition to
their native SQL APIs. However, a number of problems and concerns undermine the future of ODBC technology. ODBC introduces substantial overhead (especially in SQL updates and inserts) due to the extra layers of its
architecture (usually, ODBC sits on top of some other vendor-specific middleware like Oracle's SQL*Net). It is a specification entirely controlled by
Microsoft. Moreover, the introduction of the OLE/DB framework gave grounds to the
suspicion that Microsoft is not committed to progressing ODBC any
further.
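For illustration, the sketch below shows the typical sequence of ODBC 3.0 calls a gateway module would issue: allocate the environment, connection, and statement handles, connect through the driver manager, execute a statement dynamically, and fetch the results. The data source name, credentials, and the table and column names are placeholders that merely mimic the access-log query used in the experiments of Section III.C, and error handling is reduced to a bare minimum.

#include <stdio.h>
#include <sql.h>
#include <sqlext.h>

/* Sketch of a gateway using the ODBC call-level interface (interface D).
 * DSN, credentials, table, and columns are hypothetical. */
int main(void)
{
    SQLHENV env;  SQLHDBC dbc;  SQLHSTMT stmt;
    SQLCHAR ip[64];  SQLINTEGER total;  SQLLEN ind1, ind2;

    SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
    SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, 0);
    SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);

    /* Connect through the driver manager; "weblog" is a hypothetical DSN. */
    if (!SQL_SUCCEEDED(SQLConnect(dbc, (SQLCHAR *)"weblog", SQL_NTS,
                                  (SQLCHAR *)"user", SQL_NTS,
                                  (SQLCHAR *)"secret", SQL_NTS))) {
        fprintf(stderr, "connection failed\n");
        return 1;
    }

    SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);
    /* Dynamic execution: the statement is prepared and run at run time. */
    SQLExecDirect(stmt, (SQLCHAR *)
        "SELECT ip, SUM(bytes) FROM access_log "
        "WHERE status = 200 GROUP BY ip", SQL_NTS);

    SQLBindCol(stmt, 1, SQL_C_CHAR,  ip,     sizeof ip, &ind1);
    SQLBindCol(stmt, 2, SQL_C_SLONG, &total, 0,         &ind2);
    while (SQL_SUCCEEDED(SQLFetch(stmt)))
        printf("%s %ld\n", (char *)ip, (long)total);

    SQLFreeHandle(SQL_HANDLE_STMT, stmt);
    SQLDisconnect(dbc);
    SQLFreeHandle(SQL_HANDLE_DBC, dbc);
    SQLFreeHandle(SQL_HANDLE_ENV, env);
    return 0;
}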
Since the advent of the Java programming language, a new SQL CLI has
emerged. It is named JDBC (Java DataBase Connectivity) and was the result
of a joint development effort by Javasoft, Sybase, Informix, IBM, and other
vendors. JDBC is a portable, object-oriented CLI, written entirely in Java but
very similar to ODBC [36]. It allows the development of DBMS-independent
Java code that is, at the same time, independent of the executing platform.
JDBC's architecture, similar to ODBC's, introduces a driver manager (JDBC
driver manager) for controlling individual DBMS drivers. Applications share
a common interface with the driver manager. A classification of JDBC drivers
^The RDBMS is requested to parse, validate, and optimize the involved statement and,
subsequently, to generate an execution plan for it.
suggests that they are either direct or ODBC-bridged. Specifically, there are four
types of JDBC drivers [23]:
Type 1 refers to the ODBC-bridged architecture and involves the introduction of a translation interface between the JDBC and the ODBC driver.
ODBC binary code and the required database client code must be present in
the communicating party. Thus, it is not appropriate for applets running in a
the communicating party. Thus, it is not appropriate for applets running in a
browser environment.
Type 2 drivers are based on the native protocols of individual DBMS
(i.e., vendor-specific) and were developed using both Java and native code (Java
methods invoke C or C++ functions provided by the database vendor).
Type 3 drivers are exclusively Java based. They use a vendor-neutral
protocol to transmit (over TCP/IP sockets) SQL statements to the DBMS, thus
necessitating the presence of a conversion interface (middleware) on the side of
the DBMS.
Type 4 drivers are also exclusively based on Java (pure Java driver) but,
in contrast to Type 3, use a DBMS-specific protocol (native) to deliver SQL
statements to the DBMS.
As discussed in [31], Type 1 drivers are the slowest, owing to the bridging
technique. Type 2 drivers are at the other extreme. Type 3 drivers load
quickly (due to their limited size) but do not execute requests as fast as Type 2.
Similarly, Type 4 drivers execute quite efficiently but are not comparable to Type 2.
Type 1 is generally two times slower than Type 4. In the same article, it is
argued that the highest consumption of CPU power in JDBC drivers comes from
conversions between different data types and the needed translation between
interfaces. JDBC is included, as a standard set of core functions, in the Java
Development Kit (JDK) ver. 1.1.
B. Protocols and Interprocess Communication Mechanisms
Another very important issue in the architecture shown in Fig. 2 is the C interface (i.e., the interface between gateway instances and the database demon).
As discussed in [15], the definition of interface C involves the adoption of a
protocol between the two cooperating entities (i.e., the gateway instance and
the database demon) as well as the selection of the proper IPC (Interprocess
Communication) mechanism for its implementation.
In the same paper we proposed a simplistic, request/response, client/server
protocol for the realization of the interface. The gateway instance (ISAPI/NSAPI
thread, CGI process, Servlet, etc.) transmits to the database demon a ClientRequest message, and the database demon responds with a ServerResponse. The
ClientRequest message indicates the database to be accessed, the SQL statement
to be executed, an identifier of the transmitting entity/instance, and the layout
of the anticipated results (results are returned merged with HTML tags). The
Backus-Naur Form (BNF) of ClientRequest is:
ClientRequest = DatabaseName SQLStatement [ClientIdentifier] ResultsLayout
DatabaseName  = *OCTET
SQLStatement  = *OCTET
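As an illustration only, a gateway instance might flatten such a request into a single buffer before handing it to the IPC mechanism of interface C. The structure, field sizes, and newline separator below are our own choices and are not taken from the actual implementation in [15].

#include <stdio.h>

/* Illustrative only: flatten the fields of a ClientRequest into one
 * newline-delimited text buffer before it is written to the IPC channel
 * (interface C).  The separator and sizes are arbitrary choices. */
struct client_request {
    char database[64];      /* DatabaseName                 */
    char sql[1024];         /* SQLStatement                 */
    char client_id[32];     /* ClientIdentifier (optional)  */
    char layout[32];        /* ResultsLayout, e.g. "TABLE"  */
};

static int pack_request(const struct client_request *r, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%s\n%s\n%s\n%s\n",
                     r->database, r->sql, r->client_id, r->layout);
    return (n > 0 && (size_t)n < len) ? n : -1;   /* -1 on truncation */
}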
The efficiency of CORBA ORBs for building interfaces between Java applications is discussed in [36]. It is reported that CORBA performs similarly to (and,
in some cases, better than) Socket-based implementations, while only buffering
entails a substantial improvement in Socket communications. Another comparison between several implementations of CORBA ORBs (e.g., Orbix, ORBeline)
and other types of middleware like Sockets can be found in [13]. In particular,
low-level implementations such as Socket-based C modules and C++ wrappers for Sockets significantly outperformed their CORBA or RPC higher-level
competitors (tests were performed over high-speed ATM networks; traffic was
generated by the TTCP protocol benchmark tool). Differences in performance
ranged from 20 to 70% depending on the data types transferred through the
middleware (transmission of structures with binary fields has proved considerably "heavier" than scalar data types). A very important competitor of CORBA
is DCOM developed by Microsoft. DCOM ships with Microsoft operating systems, thus increasing the impact of this emerging specification.
C. Experiments and Measurements
In this section, we present two series of experiments that allow the quantification of the time overheads imposed by conventional gateway architectures and
the benefits that can be obtained by evolved schemes (such as that shown in
Fig. 2).
Firstly, we examined the behavior of a Web server setup encompassing a
Netscape FastTrack server and Informix Online Dynamic Server (ver. 7.2), both
running on a SUN Ultra 30 workstation (processor, SUN Ultra 250 MHz; OS,
Solaris 2.6) with 256 MB of RAM. In this testing platform we have evaluated
the demon-based architecture of Fig. 2 and the ClientRequest/ServerResponse
protocol of the previous Section, using, on the B interface, the CGI and NSAPI
specifications (all tested scenarios are shown in Fig. 3) [17].
The IPC mechanism that we adopted was Message Queues, a member
of the System V IPC family. Quite similar experiments were also performed with
BSD Sockets [18] but are not reported in this section. Both types of gateway
instances (i.e., CGI scripts and NSAPI SAFs) as well as the server demon were
programmed in the C language. The server demon, for the D interface, used the
Dynamic SQL option of Embedded SQL (Informix E/SQL). Only one database
demon existed in the setup. Its internal operation is shown in Fig. 4 by means
of a flowchart. It is obvious, from Fig. 4, that the database demon operates in a
FIGURE 3  Tested configurations on the FastTrack/Informix platform. Scenario 1: CGI-RDBMS; Scenario 2: CGI-demon-RDBMS; Scenario 3: NSAPI SAF-demon-RDBMS.
FIGURE 4  Flowchart of the database demon's internal operation (connection handling, SQL execution, and passing of the results to the gateway instance).
generic way, accessing any of the databases handled by the DBMS and executing
any kind of SQL statement. If the most recently used DBMS connection (e.g.,
to a database or specific user account) is the same as the connection needed
by the current gateway request, then that connection is reused. Lastly, we
should note that incoming requests are dispatched by the demon process in an
iterative way.
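A minimal sketch of such an iterative dispatch loop over a System V message queue is given below. The queue key, message types, and buffer size are illustrative choices, and the database work itself (connection reuse, Dynamic SQL execution, HTML formatting of the tuples) is only indicated by comments; it is not the actual demon code.

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define REQ_TYPE 1L            /* message type used for ClientRequests     */
#define MAX_TEXT 4096

struct msgbuf_txt {            /* System V message: type + flat payload    */
    long mtype;
    char mtext[MAX_TEXT];
};

int main(void)
{
    /* Key and permissions are illustrative; the gateway instances must
     * attach to the same queue. */
    int qid = msgget((key_t)0x4442, IPC_CREAT | 0600);
    struct msgbuf_txt req, resp;

    if (qid < 0) { perror("msgget"); return 1; }

    for (;;) {   /* iterative dispatch: one ClientRequest at a time */
        ssize_t n = msgrcv(qid, &req, MAX_TEXT, REQ_TYPE, 0);
        if (n < 0) { perror("msgrcv"); break; }

        /* ... parse the flattened ClientRequest, reuse or (re)open the
         * DBMS connection, run the statement via Dynamic SQL, and merge
         * the tuples with HTML tags into resp.mtext ... */
        resp.mtype = 2L;   /* in practice derived from ClientIdentifier so
                              the right gateway instance picks up the reply */
        snprintf(resp.mtext, MAX_TEXT, "<TABLE>...</TABLE>");
        msgsnd(qid, &resp, strlen(resp.mtext) + 1, 0);
    }
    return 0;
}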
As shown in Fig. 3, we compared the conventional (monolithic), CGI-based Informix gateway (C and Embedded SQL, Static SQL option) against
the CGI ↔ Database Demon ↔ Informix scheme (Scenario 2) and the NSAPI
SAF ↔ Database Demon ↔ Informix combined architecture (Scenario 3). In all
three cases, the designated database access involved the execution of a complicated query over a sufficiently populated Informix table (around 50,000 rows).
The table contained the access log accumulated in a Web server over a period of
six months. The layout of the table followed the Common Log Format found
in all Web servers, and the row size was 196 bytes. The executed query was the
following: Select the IP address and total size of transmitted bytes (grouping by
the IP address) from the access log table where the HTTP status code equals 200
(i.e., Document follows). The size of the HTML page produced was 5.605 KB
in all scenarios (a realistic page size, considering the published WWW statistics
[3,5]). The tuples extracted by the database were embedded in an HTML table
(ResultsLayout = "TABLE").
The above described experiments were realized by means of an HTTP
pinger, a simple form of the benchmark software to be discussed in Section V.
The pinger program was configured to emulate the traffic caused by up to
16 HTTP clients. In each workload level (i.e., number of simultaneous HTTP
clients), 100 repetitions of the same request were directed, by the same thread of
the benchmark software, to the WWW server. The recorded statistics included:
Connect time (ms): the time required for establishing a connection to
the server.
Response time (ms): the time required to complete the data transfer
once the connection has been established.
Connect rate (connections/s): the average sustained throughput of the
server.
Total duration (s): total duration of the experiment.
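The following fragment is a rough, single-request illustration of how such a pinger can obtain the first two statistics with BSD sockets: it times TCP connection establishment (connect time) and then the complete transfer of the response (response time). The server address and requested resource are placeholders; the real tool additionally runs many client threads and aggregates the measurements.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Milliseconds elapsed since t0. */
static double ms_since(struct timeval t0)
{
    struct timeval t1;
    gettimeofday(&t1, NULL);
    return (t1.tv_sec - t0.tv_sec) * 1000.0 + (t1.tv_usec - t0.tv_usec) / 1000.0;
}

int main(void)
{
    const char *request = "GET /cgi-bin/weblog HTTP/1.0\r\n\r\n"; /* placeholder */
    struct sockaddr_in srv;
    struct timeval t0;
    char buf[4096];
    ssize_t n;

    int s = socket(AF_INET, SOCK_STREAM, 0);
    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_port   = htons(80);
    srv.sin_addr.s_addr = inet_addr("10.0.0.1");   /* placeholder server */

    gettimeofday(&t0, NULL);
    if (connect(s, (struct sockaddr *)&srv, sizeof srv) < 0) {
        perror("connect");
        return 1;
    }
    printf("connect time:  %.1f ms\n", ms_since(t0));

    gettimeofday(&t0, NULL);
    write(s, request, strlen(request));
    while ((n = read(s, buf, sizeof buf)) > 0)
        ;                                          /* drain the response */
    printf("response time: %.1f ms\n", ms_since(t0));

    close(s);
    return 0;
}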
The pinger program executed on an MS Windows NT Server (Version 4) hosted
by a Pentium II 300-MHz machine with 256 MB of RAM and a PCI Ethernet
adapter. Both machines (i.e., the pinger workstation and the web/database
server) were interconnected by a 10-Mbps LAN and were isolated from any
other computer to avoid additional traffic, which could endanger the reliability
of the experiments.
From the gathered statistics, we plot, in Fig. 5, the average response time
per request. The scatter plot of Fig. 5 is also enriched with polynomial fits.
Figure 5 shows that the CGI ↔ demon architecture (Scenario 2) performs systematically better than the monolithic CGI gateway (Scenario 1) and worse than
the NSAPI ↔ demon configuration (Scenario 3), irrespective of the number of
FIGURE 5  Average response time (ms) versus number of HTTP clients for the three scenarios (e.g., Scenario 1: CGI-RDBMS; Scenario 3: NSAPI-DB demon), with polynomial fits.
HTTP clients (i.e., threads of the pinger program). The performance gap of
the three solutions increases proportionally to the number of clients. Figure 5
also suggests that for a relatively small/medium workload (i.e., up to 16 simultaneous users), the serialization of ClientRequests in Scenarios 2 and 3 (i.e., each
ClientRequest incurs a queuing delay due to the iterative nature of the single
database demon) does not undermine the performance of this technical option.
A number of additional tests were performed in order to cover even more
specifications and gateway architectures. Such tests were performed with the
Oracle RDBMS (Ver. 7.3.4) running on a Windows NT Server operating system
(Ver. 4). The Web server setup included Microsoft's Internet Information Server
(IIS) as well as Netscape's FastTrack Server (not operating simultaneously). Both
the Web servers and RDBMS were hosted by a Pentium 133 HP machine with
64 MB of RAM. In this setup we employed, on the B interface (Fig. 2), the CGI,
NSAPI, ISAPI, and Servlet specifications, already discussed in previous paragraphs. On the D interface we made use of Embedded SQL (both Static and
Dynamic options), ODBC, and JDBC. Apart from conventional, monolithic
solutions, we have also evaluated the enhanced, demon-based architecture of
Fig. 2. All the access scenarios subject to evaluation are shown in Fig. 6.
The IPC mechanism that we adopted for the demon-based architecture
(Scenario 7, Fig. 6) was Named Pipes. All modules were programmed in the
C language and compiled using Microsoft's Visual C. Similarly to the previous
set of experiments, only one database demon (Dynamic SQL) existed in this
setup. The flowchart of its internal operation is identical to that provided in
Fig. 4. The database schema and the executed query were also identical to the
previous series of experiments.
In this second family of experiments we employed the same HTTP pinger
application as in the previously discussed trials. It was executed on the same
workstation hosting the two Web servers as well as the RDBMS. The pinger
was configured to emulate the traffic caused by a single HTTP user. Each experiment consisted of 10 repetitions of the same request transmitted toward
the Web server over the TCP loop-back interface. As in the previous case, connect time, response time, connect rate, and total duration were the statistics
recorded. Apart from those statistics, the breakdown of the execution time of
each gateway instance was also logged.

FIGURE 6  Evaluated access scenarios. Scenario 1: CGI/Embedded SQL (S); Scenario 2: CGI/ODBC 3.0; Scenario 3: ISAPI/Embedded SQL (S); Scenario 4: Servlet API/JDBC; Scenario 5: NSAPI/Embedded SQL (S); Scenario 6: WAI/Embedded SQL (S); Scenario 7: CGI - Database demon/Embedded SQL (D). (S: Static SQL; D: Dynamic SQL.)
This was accomplished by enriching, with time-logging code, the gateway instances of
Scenarios 1 (CGI/Embedded SQL), 2 (CGI/ODBC), 4 (Servlet/JDBC),^ and
7 (CGI ↔ Database Demon/Dynamic SQL).

FIGURE 7  Average response time for each of the seven scenarios.
We restricted the time breakdown logging to those scenarios since they involve
different database access technologies.
Figure 7 plots the average response time for each scenario, and shows that
the CGI ↔ Demon architecture accomplishes quite similar times to those of the
NSAPI and ISAPI gateways. In the multithreaded gateway specifications (i.e.,
NSAPI, ISAPI, and WAI), connections toward the RDBMS are established by the
executing thread (hence, the associated cost is taken into account). Connections
by multithreaded applications are only possible if mechanisms like Oracle's
Runtime Contexts [35] or Informix's dormant connections [20] are used. Other
multithreaded configurations are also feasible: a database connection (and the
associated runtime context) could be preestablished (e.g., by the DLL or the WAI
application) and shared among the various threads, but such configurations
necessitate the use of synchronization objects like mutexes. In such scenarios a
better performance is achieved at the expense of the generality of the gateway
instance (i.e., only the initially opened connection may be reused).
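A bare-bones sketch of this shared-connection arrangement is shown below, using POSIX threads for brevity (a Windows NT module would use the equivalent Win32 primitives). The connection handle type and names are placeholders, and a real gateway would rely on the vendor facilities mentioned above rather than a raw pointer.

#include <pthread.h>

/* Illustrative only: one preestablished DBMS connection shared by all
 * gateway threads.  Serializing access with a mutex trades generality
 * (and some concurrency) for avoiding per-request connection costs. */
static void           *shared_conn;   /* hypothetical handle, opened at start-up */
static pthread_mutex_t conn_lock = PTHREAD_MUTEX_INITIALIZER;

void run_query_shared(const char *sql)
{
    pthread_mutex_lock(&conn_lock);
    /* ... execute `sql` on shared_conn and format the results ... */
    (void)sql; (void)shared_conn;
    pthread_mutex_unlock(&conn_lock);
}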
In Scenario 2, ODBC Connection Pooling was performed by the ODBC
Driver Manager (ODBC 3.0), thus reducing the time overhead associated with
connection establishments after the initial request. In Scenario 4, the JDBC
^Gefion's WAICoolRunner was used as a Servlet engine in this scenario.
FIGURE 8 Time breakdown for Scenarios 1, 2, 4, and 7.
drivers shipped with Oracle 7.3.4 were used. Specifically, we employed Type 2
drivers (see Section III. A).
In Fig. 8 we show where the factors Tcon and Tret range, and how in the
ODBC and JDBC scenarios (i.e., Scenarios 2 and 4) a very important percentage
of the execution time of the gateway is consumed for the establishment of
connections toward the DBMS. The Embedded SQL scenario (i.e., Scenario 1)
achieves the lowest Tcon. The highest Tret is incurred in Scenario 7, where query
execution is fully dynamic (see Section III. A for the Dynamic SQL option of
Embedded SQL). Tcon is not plotted in Fig. 8 for Scenario 7, as this cost is only
incurred once by the database demon.
FIGURE 9  Architecture with multiple DB demons coordinated by a regulator (NSAPI/ISAPI gateway instances; interfaces C and D).
such tools are extremely efficient programming tools that drastically reduce
the time required to develop database gateways relative to the old-fashioned way of using
techniques such as Embedded SQL.
FIGURE 12  ON/OFF model of Web client behavior: during the ON period the requested resources are transferred (Weibull-distributed times); during the OFF period the client is inactive (client think time, Pareto-distributed).
WebStone allows the measurement of a series of variables, namely the average and maximum connect time, the average and maximum response time,
the data throughput rate, and lastly, the number of pages and files retrieved.
The benchmark software is executed simultaneously in one or more clients positioned in the same network as the measured server. Each client is capable of
spawning a number of processes, named "WebChildren," depending on how
the system load has been configured. Each WebChild requests information from
the server based on a given configuration file. WebStone treats the server as a
black box (black box testing). The execution of WebChildren is coordinated by
a process called WebMaster. The WebMaster distributes the WebChildren software and test configuration files to the clients. Then it starts a benchmark run
and waits for the WebChildren to report back the performance they measured.
Such information is combined in a single report by the WebMaster. WebStone
has been extended to allow the performance evaluation of CGI programs and
server API modules (in WebStone 2.5, both ISAPI and NSAPI are supported).
The performance measured is largely dependent on the configuration files used
by WebChildren. Although such files can be modified, using the standard
WebStone file enables the comparison between different benchmark reports.
This standard file, practically, focuses on the performance of the Web server,
operating system, network, and CPU speed.
WebStone configuration files determine basic parameters for the testing
procedure (e.g., duration, number of clients, number of WebChildren) as well
as the workload mix that should be used. Each workload mix corresponds
to a specific set of resources that should be requested, in a random sequence,
from the measured server. Mixes may contain different sizes of graphic files or
other multimedia resources (e.g., audio, movies). As discussed in the previous
paragraph, WebStone mixes were recently enriched to invoke CGI scripts or
ISAPI/NSAPI-compliant modules.
2. SPECWeb96
VI. EPILOGUE
Database connectivity is surely one of the most important issues in the constantly progressing area of WWW software. More and more companies and
organizations are using the WWW platform for exposing their legacy data
to the Internet. On the other hand, the amazing growth of WWW content
forces the adoption of technologies like RDBMS for the systematic storage and
retrieval of such information. Efficiency in the mechanisms intended for bridging the WWW and RDBMS is a very crucial topic. Its importance stems from
the stateless character of the WWW computing paradigm, which necessitates
a high frequency of short-lived connections toward the database management
systems. In this chapter, we have addressed a series of issues associated with
the considered area of WWW technology. We have evaluated different schemes
for database gateways (involving different gateway specifications and different
types of database middleware). Although aspects like generality, compliance
with standards, state management, and portability are extremely important, their
pursuit may compromise the performance of the database gateway. The accumulated experience from the use and development of database gateways over
the past 5-6 years suggests the use of architectures like the database demon
scheme, which try to meet all the above-mentioned requirements to a certain
extent but also exhibit performance close to that of server APIs.
ACKNOWLEDGMENTS
We are indebted to Mr. J. Varouxis for extremely useful discussions and technical assistance in the
preparation of the experiments presented in this chapter. We also express our thanks to Mrs. F.
Hadjiefthymiades for reading this draft and commenting on its technical content.
REFERENCES
1. COLD FUSION User's Guide Ver. 1.5, Allaire Corp. 1996.
2. Banga, G., and Druschel, P. Measuring the capacity of a Web server. In Proceedings of the
USENIX Symposium on Internet Technologies and Systems, Monterey, Dec. 1997.
3. Barford, P., and Crovella, M. Generating representative Web workloads for network and server
performance evaluation. In Proceedings of ACM SIGMETRICS, International Conference on
Measurement and Modeling of Computer Systems, July 1998.
4. Bernstein, P. et al. The Asilomar report on database research. ACM SIGMOD Record 27(4):
1998.
5. Bray, T. Measuring the Web. Comput. Networks ISDN Systems 28(7-11): 1996.
6. Brown, M. Fast CGI specification. Open Market Inc., available at http://www.fastcgi.idle.com/
devkit/doc/fcgi-spec.html, Apr. 1996.
7. Brown, M. FastCGI: A high performance Gateway interface. Position paper for the workshop
Programming the Web - A search for APIs, 5th International WWW Conference, Paris, France,
1996.
8. Brown, M. Understanding FastCGI Application Performance. Open Market Inc., available at
http://www.fastcgi.idle.com/devkit/doc/fcgi-perf.htm, June 1996.
9. Chang, P. I. Inside the Java Web Server: An overview of Java Web server 1.0, Java Servlets,
and the JavaServer architecture. Available at http://java.sun.com/features/1997/aug/jwsl.html,
1997.
10. Coar, K., and Robinson, D. The WWW Common Gateway Interface, Version 1.2. Internet
Draft, Feb. 1998.
11. Crovella, M., Taqqu, M., and Bestavros, A. Heavy-tailed probability distributions in the World
Wide Web. In A Practical Guide to Heavy Tails: Statistical Techniques and Applications
(Adler R., Feldman R., and Taqqu M., Eds.). Birkhäuser, Basel, 1998.
12. Everitt, P. The ILU requester: Object services in HTTP servers. W3C Informational Draft, Mar.
1996.
13. Gokhale, A., and Schmidt, D. C. Measuring the performance of communication middleware
on high speed networks. In Proc. ACM SIGCOMM Conference, 1996.
14. Hadjiefthymiades, S., and Martakos, D. A generic framework for the deployment of structured
databases on the World Wide Web. Comput. Networks ISDN Systems 28(7-11): 1996.
15. Hadjiefthymiades, S., and Martakos, D. Improving the performance of CGI compliant database
gateways. Comput. Networks ISDN Systems 29(8-13): 1997.
16. Hadjiefthymiades, S., Martakos, D., and Petrou, C. State management in WWW database
applications. In Proceedings of IEEE Compsac '98, Vienna, Aug. 1998.
17. Hadjiefthymiades, S., Martakos, D., and Varouxis, I. Bridging the gap between CGI and
server APIs in WWW database gateways. Technical Report TR99-0003, University of Athens,
1999.
18. Hadjiefthymiades, S., Papayiannis, S., Martakos, D., and Metaxaki-Kossionides, Ch. Linking
the WWW and relational databases through Server APIs: A distributed approach. Technical
Report TR99-0002, University of Athens, 1999.
19. Performance benchmark tests of Unix Web servers using APIs and CGIs. Haynes &
Company, Shiloh Consulting, available at http://home.netscape.com/comprod/server_central/
performance_whitepaper.html, Nov. 1995.
20. Informix-ESQL/C Programmer's Manual. Informix Software Inc., 1996.
21. IS 9075-3, International standard for database language SQL, Part 3: Call level interface.
ISO/IEC 9075-3, 1995.
22. Iyengar, A. Dynamic argument embedding: Preserving state on the World Wide Web. IEEE
Internet Comput. Mar-Apr. 1997.
23. JDBC Guide: Getting Started. Sun Microsystems Inc., 1997.
24. Khare, R., and Jacobs, I. W3C recommendations reduce "world wide wait". World Wide Web
Consortium, available at http://www.w3.org/Protocols/NL-PerfNote.html, 1997.
25. Kristol, D., and Montulli, L. HTTP State Management Mechanism. RFC 2109, Network
Working Group, 1997.
26. Laurel, J. dbWeb white paper. Aspect Software Engineering Inc., Aug. 1995.
27. Mazzocchi, S., and Fumagalli, P. Advanced Apache Jserv techniques. In Proceedings of
ApacheCon '98, San Francisco, CA, Oct. 1998.
28. McGrath, R. Performance of several HTTP demons on an HP 735 workstation. Available
at http://www.archive.ncsa.uiuc.edu/InformationServers/Performance/V1.4/report.html, Apr.
1995.
29. Internet Server API (ISAPI) Extensions. MSDN Library, MS-Visual Studio '97, Microsoft
Corporation, 1997.
30. ODBC 3.0 Programmer's Reference. Microsoft Corporation, 1997.
31. Nance, B. Examining the network performance of JDBC. Network Comput. Online May 1997.
32. Writing Web Applications with WAI, Netscape Enterprise Server/FastTrack Server. Netscape
Communications Co., 1997.
33. User's Guide, WebDBC Version 1.0 for Windows NT. Nomad Development Co., 1995.
34. CORBA: Architecture and Specification. Object Management Group, 1997.
35. Programmer's Guide to the Oracle Pro*C/C++ Precompiler. Oracle Co., Feb. 1996.
36. Orfali, R., and Harkey, D. Client/Server Programming with JAVA and CORBA. Wiley,
New York, 1998.
37. Ju, P., and Pencom Web Works. Databases on the Web: Designing and Programming for
Network Access. M & T Books, New York, 1997.
38. An explanation of the SPECweb96 benchmark. Standard Performance Evaluation Corporation,
available at http://www.spec.org/osg/web96/webpaper.html, 1996.
39. Stevens, W. R. UNIX Network Programming. Prentice-Hall, Englewood Cliffs, NJ, 1990.
40. Tracy, M. Professional Visual C++ ISAPI Programming. Wrox Press, Chicago, 1996.
41. Trent, G., and Sake, M. WebSTONE: The first generation in HTTP server benchmarking.
Silicon Graphics Inc., Feb. 1995.
42. Data management: SQL call-level interface (CLI). X/Open CAE Specification, 1994.
43. Yeager, N., and McGrath, R. Web Server Technology: The Advanced Guide for World Wide
Web Information Providers. Morgan Kaufmann, San Mateo, CA, 1996.
44. Karish, C., and Blakeley, M. Performance benchmark test of the Netscape FastTrack Beta 3 Web
Server. Mindcraft Inc., available at http://www.mindcraft.com/services/web/ns01-fasttracknt.html, 1996.
45. Performance benchmark tests of Microsoft and Netscape Web Servers. Haynes & Company,
Shiloh Consulting, Feb. 1996.
INFORMATION EXPLORATION
ON THE WORLD WIDE WEB
XINDONG W U
Department of Computer Science, University of Vermont, Burlington,
Vermont 05405
SAMEER PRADHAN
JIAN CHEN
Department of Mathematical and Computer Sciences, Colorado School of Mines,
Golden, Colorado 80401
TROY MILNER
JASON LOWDER
School of Computer Science and Software Engineering, Monash University,
Melbourne, Victoria 3145, Australia
I. INTRODUCTION 664
II. GETTING STARTED WITH NETSCAPE COMMUNICATOR
AND INTERNET EXPLORER 664
A. Starting with Your Web Browser 665
B. Searching the Web Using Explorer and Netscape 667
C. Speeding up Web Browsing 669
III. H O W SEARCH ENGINES WORK 670
A. Archie Indexing 671
B. Relevancy Ranking 672
C. Repositories 674
D. How a Harvest Search Engine Works 674
E. Anatomy of an Information Retrieval System 677
IV. TYPICAL SEARCH ENGINES 679
A. Search Engine Types 679
B. Analysis of Some Leading Search Engines 682
C. Search Engine Sizes 685
D. Summing It Up 686
V. ADVANCED INFORMATION EXPLORATION
WITH DATA MINING 686
A. SiteHelper: A Localized Web Helper 687
B. Other Advanced Web Exploration Agents 688
VI. CONCLUSIONS 689
REFERENCES 690
I. INTRODUCTION
Over the past ten years, the World Wide Web (or the Web, for short) has grown
exponentially [9,39,51,64]. According to the Internet Domain Survey by Zakon
[70], the Internet has grown from only 617,000 hosts in October 1991 to over
43 million hosts in January 1999 and in excess of 4 million Web servers in
April 1999. The amount of information on the Web is immense. Commercial
sites like Lycos [37], Alta Vista [3], and Web Crawler [61] and many others
are search engines that help Web users find information on the Web. These
commercial sites use indexing software agents to index as much of the Web as
possible. For example, Lycos [37] claims that it has indexed more than 90% of
the Web [34].
Currently, the Web is indexed mainly on its visible information: headings,
subheadings, titles, images, metadata, and text. Information retrieval from indexes is usually via one of the search engines, where submission of keywords
as a query returns a list of Web resources related to the query.
The chapter is organized as follows. Section II provides an introduction
to the most popular Web browsers, Netscape Communicator and Internet
Explorer, from which you can start your search engines. Section III describes
how search engines work on the Web. Section IV analyzes some leading search
engines. In Section V, we outline the limitations of existing search engines, and
review some research efforts on advanced Web information exploration using
data-mining facilities.
FIGURE 1  Netscape icon.
so we will primarily cover their similarities. For the most up-to-date information
about the browsers and a complete tutorial, check out the online handbook
under the Help menu (on either Explorer or Netscape) or go to the Websites of
the respective software companies.
A. Starting with Your Web Browser
When you first launch your Web browser, usually by double-clicking on the
icon (Figs. 1 and 2) on your desktop, a predefined Web page will appear. With
Netscape for instance, you will be taken to Netscape's NetCenter.
You can change the home page that loads when you launch your browser.
For example, with Netscape Version 4.6, go to the Edit menu, then select Preferences. In the Home page section, type in the new Web address in the text
box. Anytime you want to return to your home page from any other Website,
just click the Home button on Netscape's toolbar (Fig. 3).
If you are using Explorer, and you want to set the home page to your
favorite Website, just go to that site, then click the View menu, then select
Options. Click the "Navigation" tab, then "Use Current" button. Finally, click
"OK." And it is done!
Both Netscape and Explorer have a small picture in the upper right hand
corner of the browser. When this image is animated, it indicates that your
browser software, known as a client, is accessing data from a remote computer,
called a server. The server can be located anywhere. Your browser downloads
remote files to your computer, then displays them on your screen. The speed of
this process depends on a number of factors: your modem speed, your Internet
service provider's modem speed, the size of the files you are downloading, how
busy the server is, and the traffic on the Internet.
At the bottom of the Web browser you will find a window known as a
status bar. You can watch the progress of Web page transactions, such as the
address of the site you are contacting, whether the host computer has been
contacted, and the size of the files to be downloaded.
Once the files are downloaded and displayed in your browser, you can click
on any of the links to jump to other Web pages. However, it is easy to get lost
on this electronic web. That is where your browser can really help you.
FIGURE 2
Explorer icon.
FIGURE 3  Netscape toolbar.
The row of buttons at the top of your web browser, known as the toolbar
(Figs. 3 and 4), helps you travel through the Web of possibilities, even keeping
track of where you have been. Since the toolbars for Netscape and Explorer
differ slightly, we will first describe what the buttons in common do:
"Back" returns you to the previous page you have visited.
"Forward" returns you to that page again.
"Home" takes you to whichever home page you have chosen. (If you
haven't selected one, it will return you to the default home page, usually the
Microsoft or Netscape Website.)
"Reload" or "Refresh" does just that, loads the current web page again.
Why would you want to do this? Sometimes all of the elements of a Web page
do not get loaded the first time, because the file transfer is interrupted. Also,
when you download a Web page, the data are cached, meaning they are stored
temporarily on your computer. The next time you want that page, instead of
requesting the file from the Web server, your Web browser just accesses it from
the cache. However, if a Web page is updated frequently, as may be the case
with news, sports scores, or financial data, you would not get the most current
information. By reloading the page, these timely data are updated.
"Print" lets you make a hard copy of the current document loaded in
your browser.
Finally, "Stop" stops the browser from loading the current page.
1. Buttons Unique to Netscape
You can turn off the graphic images (see Section II.C) that load when
you access a Web page. Because graphic files are large, the page will load much
faster if it has only text. If you then decide to view the graphics, click the
"Images" button.
"Open" lets you load a Web page you may have stored on your computer's hard drive. (With Explorer, you can access this feature under the File
menu.)
"Find" lets you search for specific words in a document.
FIGURE 4  Explorer toolbar.
667
When you use a search engine to find information on the Web, it helps
to find ways to narrow your search so it is faster and more efficient. Most of
the search engines use the same set of operators and commands in their search
vocabulary.
The following are the most commonly used operators and a brief description of each. These would be used when typing a keyword or phrase into a
search engine.
Quotes (" "): Putting quotes around a set of words will only find results
that match the words in that exact sequence.
Wildcard Use (*): Attaching an * to the right-hand side of a word will
return partial matches to that word.
Using Plus (+): Attaching a + in front of a word requires that the word
be found in every one of the search results.
Using Minus (-): Attaching a - in front of a word requires that the
word not be found in any of the search results.
2. Netscape's Web Search Features
Netscape's search engine is called Net Search (Fig. 5). Net Search in 1999
let you choose between six different search tools: Netscape Search, Alta Vista
[3], Excite [18], Infoseek [23], LookSmart [35], and Lycos [37]. All these tools
contain slightly different information, but each works in much the same way.
FIGURE 5
Netscape Search.
Click the Search button on Netscape's toolbar. Netscape will load the
Net Search page.
Type the keywords into the text box.
Press Enter (Return) or click the Search button. In a moment, Netscape
will load a page that contains links to pages related to the keywords
you have typed. When a link looks interesting, just click the link.
If you do not find what you are looking for, you can try your search again
with a different search tool. Just go back to the Search page and click the name
of a different search engine. The keywords you are searching for will appear in
the new search tool automatically.
Browsing Directories
Search engines are a good help for finding the information you are looking
for, but you may not want to conduct a search every time you want to find a
Web site that you just know is out there.
New versions of Netscape offer powerful Smart Browsing features that help
you quickly and easily find the information that you need on the Web. Instead
of remembering long URLs or typing an entire address to go to a site. Smart
Browsing allows you to use a plain language to find exactly the information
you want.
Two important features of Smart Browsing are Internet Keywords and
What's Related.
Internet Keywords. Internet Keywords let you type words or questions
into the Location field. When you press Enter (Return), Netscape sends the
query to the Internet Keywords database, which then decides what to do with
your words.
2. With Explorer
From the Explorer menu bar, select View, then Options. The Options
dialog box will appear.
Select General on the Category list. That panel will move to the front.
In the multimedia section, uncheck Show Pictures.
Click OK to close the Options dialog box and return to the Explorer
window.
The next time you access a web page, you will see little placeholders instead
of the images that would normally load with the page, and your toolbar will
have a new button called Images for you to click when you want to see the
images on a page.
FIGURE 7  ALIWEB [29]: harvesting and combination of distributed index files, and querying of the combined index.
Technically, meta-indexes are not search engines, but rather a query interface
to multiple search engines. Meta-indexes mostly return a limited number of
results from each search engine or subject directory they query, and hence,
can be noncomprehensive [62]. Examples of meta-indexes are Dogpile [16],
SavvySearch [56], and the customizable multiengine search tool [14].
There are two broad techniques that search engines employ to create and
maintain their indexes, namely Archie indexing and Harvest indexing. We
address them in Sections A and D, respectively.
A. Archie Indexing
Archie indexing is a term used to describe manual indexing of Web resources,
where the index file is located on each server where the resource is located.
One site, generally the site where the search engine is located, retrieves the index
file from each Web server, allowing the combined index to be searched [29].
ALIWEB (Archie-like indexing on the Web) is an implementation of Archie
indexing. Figure 7 describes the ALIWEB technique of combining its distributed
indexes [29].
Search engines use specialized programs called gatherers (also known as
"robots," "spiders," "collectors," or "worms") to traverse the Web and collect
data. Archie indexing tries to remove some Web server load presented by these
gatherers (see Section III. D. 1). Archie indexing does not require gatherers for
resource discovery, minimizing the potentially serious dangers that gatherers
have of overloading a server,^ especially when several gatherers operate simultaneously. Such an approach also lowers the load on the computer hosting the
gatherer by allowing inclusion of index data, with a minimum of overhead, into
the existing index. Traditional gatherers are required to parse the document,
extract the indexable information, and add this information to the index. This,
however, creates a substantially higher workload for the host.
^ A local test by [29] shows the number of files a gatherer requested from a lightly loaded
server was an average of 3 per second.
text retrieval techniques such as those discussed by Salton [55], we can see
some of the approaches in common use.
Keyword frequency is one of the more popular methods of index ranking.
An inverted list is built for each file, and from there repeating words can be extracted to form an index. Words occurring in the title of the document and any
headings can have a higher weight placed on them. Meta-data retrieval is one of
the simple techniques for query indexing and ranking, where keywords in the
meta-data are compared to the keywords in the body of the text to determine
document relevancy. Meta-data can also be used to increase the word-weighting
factor. Both of these options are easily abused by spamming. Anti-spam measures for such relevancy judgments include looking for duplicate words that
repeat frequently within a short span of text, as well as words whose color is
the same as the background color of the document. An anti-anti-spam measure
employed for a short period of time was "hit-farms," where documents were
duplicated under similar uniform resource locators (URLs) and sent to a search
engine. Each of these documents had a redirect to the "real" page. Search engine
owners took only a few weeks to include duplicate-document and near-match
URL tests in their indexing algorithms.
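As a toy illustration of keyword-frequency ranking, the fragment below counts term occurrences in a small in-memory document and gives words from the title twice the weight of body words, roughly in the spirit described above. The tokenization, the weighting factor, and the fixed-size term table are simplifications of our own, and the sample text reuses the "Joe's Motor Parts" example from later in this section.

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define MAX_TERMS 256
#define MAX_WORD  32

struct term { char word[MAX_WORD]; int weight; };

static struct term table[MAX_TERMS];
static int nterms;

/* Add `bonus` weight for each occurrence of every word in `text`. */
static void add_text(const char *text, int bonus)
{
    char word[MAX_WORD];
    int len = 0, i;

    for (;; text++) {
        if (isalnum((unsigned char)*text) && len < MAX_WORD - 1) {
            word[len++] = (char)tolower((unsigned char)*text);
            continue;
        }
        if (len > 0) {                      /* end of a word: record it */
            word[len] = '\0';
            for (i = 0; i < nterms; i++)
                if (strcmp(table[i].word, word) == 0) break;
            if (i == nterms && nterms < MAX_TERMS)
                strcpy(table[nterms++].word, word);
            if (i < MAX_TERMS) table[i].weight += bonus;
            len = 0;
        }
        if (*text == '\0') break;
    }
}

int main(void)
{
    add_text("Joe's Motor Parts and Workshop", 2);   /* title: higher weight */
    add_text("Welcome to Joe's, the best in automotive motor parts", 1);

    for (int i = 0; i < nterms; i++)
        printf("%-12s %d\n", table[i].word, table[i].weight);
    return 0;
}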
Ranking by inward links is one strategy employed to ensure that popular
pages are always presented first. Pages with the greatest number of other pages
pointing to them can be brought back first. Such an approach has the problem
of spamming by collusion, where groups of people team together to link to
each other's sites to increase their number of inward links and thus their ranking.
"Voting" approaches are a different style of relevancy ranking. When users
click on a site to visit from the results page, they have their choice counted by
the search engine and the rankings adjusted for the query terms used.
The most accurate way for an author to ensure that their page appears at the
correct place in a hierarchy, or is displayed when a certain query is entered, is
to be honest about the content of their document. Adding the keywords that
are prominent in the text to the title of the document is useful in allowing
users to identify the document quickly in a list of search results. Omitted titles
in documents will quite often give the user "No Title" as the title of the
document, or just the URL. Most Web-editing tools have a Description field that
you can fill in for the document. Many search engines look for this in order to
get text to return as the search result. If you have to code this in by hand, it
is worth entering. Inside the <HTML> tag, the author can place the following
HTML:
<META NAME = "Description" CONTENT = "Joe's Motor Parts and Workshop">
Documents that do not have the "Description" META tag will often
have the first 200 words extracted as the description of the document instead.
Authors must, therefore, be careful about the text that makes up the first 200
characters of the document. Frequently the results from search engines appear
with
Joe's Motor Parts and Workshop   last updated 2/4/98
[Home] [Paints] [Mags] [Engines] [Exhausts] [Bookings] [Hours]
Welcome to Joe's! You will find the best in automotive...
C. Repositories
Repositories, sometimes called subject directories, are a hierarchical collection
of references that allow a user to locate documents restricted by topic domain.
Locating a document is performed by narrowing a selected topic from a broad
area to a specialized area. To find a document on hyper-text markup language
for example, a user would begin searching at the broad topic of WWW, then
navigate through the document hierarchy, narrowing the topics until hypertext markup language was located. An excellent example of a repository is
Yahoo [68].
Humans manually maintain repositories. URLs are submitted to the repository by document owners, and are then indexed and categorized by the maintainers of the repository. The person submitting the document for indexing is asked to suggest the categories into which that page should be placed. Maintainers may also be employed to search the Web for new pages, and Web pages found in this way, as well as those submitted, may still be discarded by the maintainers. Repositories are one of the most basic search tools; they resemble the table of contents of a paper, presenting users with little cognitive overhead in learning to interact with the index.
Repositories, being maintained and categorized by humans, are usually the
most accurate at retrieving a particular document that is sought by the user. Such
collections can be faster than search engines if trying to find an exact document
when the topic domain is known. Westera [63] concludes that repositories can
never index the entire Web due to human controls, but repositories are now
themselves resorting to implementing search engines to keep their databases
up-to-date [38].
D. How a Harvest Search Engine Works
The Harvest architecture was pioneered by Bowman et al. [11], and it has since been partially replicated by many other advanced search engines. Harvest indexing was so named to describe its "reaping" of Internet information.
The following subsections will explain the central architecture of a Harvest
search engine.
The architecture for Harvest can be broken down into three core subsystems: gatherer, index and search, and query interface. The gatherer subsystem
(see Section 1 below) collects and extracts indexing information from one or
more documents hosted by one or more Web servers. The index and search subsystem (see Section 2 below) provides indexing and searching of the gathered
information, and the query interface subsystem (see Section 3 below) provides
the interface to the index and search subsystem.
I. Gatherer Subsystem
Information retrieval of most search engines is done by the use of a gatherer
(robot). A gatherer is an automatic program that traverses the Web collecting
information [29]. The general algorithm for gatherer traversal is as follows
[15]:
1. A gatherer starts with the URL of an initial Web page, P.
2. It retrieves P, extracts any URLs from P, and adds them to a list, L.
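The remaining steps of the printed algorithm are cut off here; the following Python sketch fills in the usual traversal loop (fetch a page, extract its links, queue the unseen ones) under the assumption of a simple breadth-first order, and is not Harvest's actual gatherer code.

import re
import urllib.request
from collections import deque

def extract_urls(html):
    # crude href extraction; a real gatherer would use a proper HTML parser
    return re.findall(r'href="(http[^"]+)"', html, flags=re.IGNORECASE)

def gather(start_url, max_pages=50):
    queue, seen, pages = deque([start_url]), {start_url}, {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = urllib.request.urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except OSError:
            continue                            # skip unreachable pages
        pages[url] = html                       # hand the page to the indexer
        for link in extract_urls(html):
            if link not in seen:                # avoid revisiting pages
                seen.add(link)
                queue.append(link)
    return pages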
2. Index and Search Subsystem
Harvest's index allows for the diverse indexing and searching techniques that are required by different search engines. The principal requirements of a client search engine are that it supports incremental updates of its index and Boolean combinations of attribute-based queries. Harvest offers two different index and search subsystems: Glimpse and Nebula.
Glimpse
and approximate matching (e.g., spelling mistakes). The index also allows for
incremental updates and easy modification due to its relatively small size.
Glimpse is based on an inverted index (see Section III.E.l), with the main
part of the index consisting of a list of all the words appearing in the index. For
each word there is a pointer to a list of each occurrence of the word. The pointers
can point to the same file, another file, or even positions in files, like paragraphs.
Glimpse uses agrep [67] to search its main index and to search the areas found
by the pointers in the main index. This also allows for approximate matching
and other agrep services.
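The two-level organization just described can be illustrated with a short Python sketch: a small main index maps each word to (file, paragraph) pointers, and a query rescans only the pointed-to paragraphs (an exact substring test stands in here for agrep's approximate matching). All names are illustrative.

from collections import defaultdict

def build_index(files):
    index = defaultdict(list)                   # word -> [(filename, paragraph_no)]
    for name, text in files.items():
        for p_no, paragraph in enumerate(text.split("\n\n")):
            for word in set(paragraph.lower().split()):
                index[word].append((name, p_no))
    return index

def search(index, files, word):
    hits = []
    for name, p_no in index.get(word.lower(), []):
        paragraph = files[name].split("\n\n")[p_no]
        if word.lower() in paragraph.lower():   # second-pass scan of the small region
            hits.append((name, p_no))
    return hits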
Bowman et al. [11] claim that using agrep allows for much more selective
queries than keyword searching when the gatherer has extracted attributes on
the gathered page, like titles, headings, or links. For example, searching for all
pages with the keywords "incremental searching" with "unknown title" with
one spelling mistake is possible.
Nebula
The purpose of Nebula is to allow fast queries, but as a tradeoff, the index
size is larger than that of Glimpse. It provides support for views, which are defined by standing queries against the database. A view can scope
a search to a particular subset of the database, for example, computing or
homepages, allowing domain-based search engines to be designed.
Nebula can be extended to include domain-specific query resolution functions. A server that contains indexes of Master's theses can be extended with a
query function to rank words found in "abstracts" and "introductions" higher
than if they are located in the body of the thesis, for example. The database is
set up with each term having a corresponding index that maps values to the
objects, thus speeding up queries but increasing index size [11].
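The notion of a view as a standing query can be sketched as follows; the class and attribute names are assumptions for illustration and do not reflect Nebula's actual interface.

class View:
    def __init__(self, predicate):
        self.predicate = predicate              # standing query over object attributes

    def search(self, objects, term):
        # restrict the search to the subset of objects selected by the view
        scoped = [o for o in objects if self.predicate(o)]
        return [o for o in scoped if term.lower() in o.get("text", "").lower()]

homepages = View(lambda o: o.get("type") == "homepage")
results = homepages.search([{"type": "homepage", "text": "welcome"}], "welcome")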
3. Query Interface Subsystem
Bowman et al. [11] indicate that there are two different goals to keep in
mind when designing a query interface for use in an environment as diverse as
the Internet. They believe search engine designers need to provide:
A great degree of flexibility and the ability to customize for different
communities and for different users.
A simple, uniform interface so that users can move between domains
without being overwhelmed.
The Harvest query interface handles all queries to the underlying database,
allowing Boolean combinations of keywords and attribute-based expressions.
Query results are given in a uniform format that includes index-specific information and the location of each object.
The query interface is a simple HTML form (see Section II. B for examples),
which allows the query to be entered by the user, then spawns a process that
queries the underlying database. The results of the query are presented in the
browser as an HTML page, with hyperlinks to the location of returned pages.
The current implementation of the Harvest query interface maintains no state
across queries. The illusion of continuity and refinement of queries is created by
embedding the query state information within those URLs that form the links
Full-Text Scanning
With full-text scanning, the document structure is not rearranged at all. Retrieval consists of sequentially searching the stored documents and locating all
documents that match the query. This method requires no extra storage space
for indexes, and processing requirements are small when new documents are
added, as the documents do not need rearrangement.
The largest advantage of full-text scanning is that as documents change, reindexing is not required. Documents can be deleted and updated without affecting the system as a whole [13]. The major disadvantage of full-text scanning is that retrieval of documents may be slow, as sequentially scanning each document takes more time than using indexes to locate keywords
in documents. Faloutsos [19] and Salton [55] demonstrate that speed can be
increased by using parallel processors and faster string matching algorithms.
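A bare-bones sketch of the approach: no index is kept and every stored document is scanned sequentially at query time, so documents can be added or removed without any reindexing. The function name and data layout are illustrative assumptions.

def full_text_scan(documents, query):
    """documents: dict of id -> text; returns ids of documents containing the query."""
    q = query.lower()
    # sequentially scan every document; slower than an index but needs no extra storage
    return [doc_id for doc_id, text in documents.items() if q in text.lower()]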
Document Inversion
Signature files, like inverted indexes, require at least one extra file to be
created. Most authors, cited in Campbell [13], agree that this method is only
The Internet is divided into several different information spaces based on the types of servers that host the information: the Gopher servers, the FTP (File Transfer Protocol) servers, the Telnet servers, and the HTTP (Hypertext Transfer Protocol) servers (or Web servers). To facilitate quick searching through these servers, numerous services and applications have come into being. The Gopher space is searched by a program called "Veronica" (very easy rodent-oriented net-wide index to computerized archives); the FTP space is searched with the program called "Archie" (from the word "archives"); the Telnet space is searched with the program called "Hytelnet"; and the Web space, which is a superset of all the preceding information subsets, is searched with the help of "search engines" [25]. This last category of searching applications can search the whole Internet and is the most keenly contested in the industrial arena. A search engine is a software program that takes a query from the user and finds information on numerous servers on the Internet using
that query [25]. It then filters the huge amount of information retrieved and provides the user with a list of Internet sites. The arrival of graphical browsers, and therefore graphical search engines, can be traced back to 1993.
There are currently six major search engines and many others that help a
user to search through the maze of documents available on the Web. The majority of users that search the Internet use these search engines, and there have
been strong efforts in academia to make the search simpler and more viable.
In search applications, we have two distinct subcategories: directories or repositories (see Section III.C; for example, Yahoo [68] and Magellan [41]) and "query-based engines" (for example, AltaVista [3] and Excite [18]). The latter search engines are further subdivided into "search engine directories" (for example, All-In-One Search [2]), "metasearch engines" (for example, SavvySearch [56] and Askjeeves [5]), and "subject-specific search engines" (for example, Medical World Search [40]).
Similar to the Web, the world of search engines has become complex, rich,
volatile, and frequently frustrating [57]. Moreover, the designers of these engines must cater to users who typically enter only a small number of keywords and still expect relevant results. Not many users use the advanced facilities of search engines, and few bother to decipher the mechanism behind the search engine [22]. As the search engine domain becomes more and more focused
on particular subjects, the likelihood of a user getting relevant information
increases, but the breadth of information is affected significantly. At the same
time, as the search engines become more and more comprehensive, there can
be various documents that are seemingly relevant, but belong to a completely
different domain that uses the same semantics, but in a different sense (like
Java is used to denote coffee as well as the programming language). Therefore
the size of a search engine itself does not dominate the quality or relevancy of
results. There are some search engines that try to address this issue with the
help of "concept search" (see Section III), which tries to find more than just
the semantic meaning of the keyword, by considering other words that might
accompany that keyword in the query or from the earlier queries of the same
user (in case there is a record of the user available for analysis). Another way of
knowing what the user is looking for is to get feedback from the initial query,
for example, asking the user to identify the result that most closely matches the
one that is expected. One of the search engines that exhibits these two features
is Excite [18].
Each search engine can differ considerably in terms of the results generated,
as it has its own perspective of the Web, which may or may not overlap with
that of the other engines. It is observed sometimes that the optimum result set
of hypertext links for a query is a list of selected results from several search
engines. The plethora of search engines has made it difficult to decide which search engine to use and when. To make the most of any search engine, the user needs to know how it functions, what it is designed to retrieve, when it gives the best response time, and even the fact that it exists. Empirical results have indicated that no single search engine is likely to return more than 45% of relevant results [17]. Taking this cue, recently (from early 1994) the
Web has witnessed an emergence of a new family of search engines that make
use of several conventional search engines to bring relevant information to the
users. These are called "metasearch engines" (see Section III). In the same way
as robot-based search engines were developed in response to the rapid Web
growth, metasearch engines came into being owing to the increasing availability of conventional search engines. The search domain of these search engines
is not the Web anymore, but the result set obtained through the use of interfacing conventional search engines. These metasearch engines, by means of some
predetermined mechanism, decide which of the search engines they are going
to avail and trigger a parallel query to each of them. They then analyze the
outputs of these engines and display it in a uniform format. There are several
metasearch engines on the Web today, including SavvySearch [56], Askjeeves
[5], and MetaCrawler [42]. The query dissemination mechanism of metasearch
engines must bring about a balance between two conflicting factors, viz. the
time taken to get the results and the relevancy of the results. In some cases it
is wise to avoid certain specialized engines, as their response would hardly be relevant, and sometimes to avoid certain engines just because they introduce redundancy into the system, owing to a significant overlap between the areas of the Web that they cover. Some metasearch engines, like SavvySearch [56], allow the user to override the mechanism for selecting the search engines by specifying whether a particular search engine is to be used for a query and, if so, what weight is to be given to the results generated by that search engine in the final compilation [17].
The mechanism of metasearch is complicated by four issues:
1. The corpus of the Web is not directly available and is indexed by the
other search engines.
2. The search engine set comprises both the general and specific search
engines and thus it must account for the varying expertise.
3. The capabilities of search engines are not a static factor and change
continuously, depending on the update of their indexes.
4. The consumption of resources must be balanced against result quality.
These four issues are resolved with the help of the following mechanisms:
1. Generation of a metaindex (see Section III) that tracks experiences in
dispatching queries to the search engines.
2. Ranking of each engine based on the information in the metaindex as
well as recent data on search engine performance.
3. Varying the degree of parallelism depending on the current machine
and network conditions.
I. Metaindex of Prior Experience
the user follows one of the links suggested by the search engine. Positive values
are used to represent Visit events and negative values for the No Visit events.
The intuition behind this is that higher positive values indicate the tendency of
the search engine to return interesting results. Thus the likelihood of a search engine providing relevant results is considered when selecting it for a particular query or avoiding it as a waste of resources.
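A minimal sketch of how such a metaindex of Visit/No Visit events could be kept is given below; the data structure and the score increments are assumptions for illustration, not SavvySearch's actual implementation.

from collections import defaultdict

meta_index = defaultdict(float)                 # (query term, engine) -> score

def record_event(term, engine, visited):
    # Visit events add to the score, No Visit events subtract from it
    meta_index[(term, engine)] += 1.0 if visited else -0.5

def promising_engines(term, engines, top_n=3):
    # prefer engines whose past results for this term were followed by the user
    return sorted(engines, key=lambda e: meta_index[(term, e)], reverse=True)[:top_n]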
2. Search Engine Ranking for a Query
Each search engine is ranked based on the following two factors: whether
the search engine has done well over a period of time, and whether it has
returned relevant results quickly in the recent past. Higher ranking engines are
preferred while passing the query.
3. Degree of Parallelism
The concurrency value of an engine is calculated and used to determine the
amount of resources needed for a particular query. It is inversely proportional
to the estimated query cost, and is calculated using the following three factors:
expected network load, local CPU (central processing unit) load, and discrimination value. The first two terms are self-explanatory; the third means that the metasearch engine checks how general the query is, and if it is very general, it selects fewer search engines, since there would be a significant overlap between the individual results, thus reducing the amount of duplicate elimination needed when the unique list is generated.
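The dispatch-and-merge step of a metasearch engine can be pictured with the following simplified Python sketch; the engine callables, the fixed degree of parallelism, and the merging rule are illustrative assumptions rather than the mechanism of any particular system.

from concurrent.futures import ThreadPoolExecutor

def metasearch(query, engines, max_parallel=4):
    """engines: list of callables, each returning a list of result URLs."""
    merged, seen = [], set()
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # query the selected engines in parallel
        for result_list in pool.map(lambda e: e(query), engines):
            for url in result_list:
                if url not in seen:             # duplicate elimination
                    seen.add(url)
                    merged.append(url)
    return merged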
There is also another family of metasearch engines that bring about a fusion
between the "search engine directories," which can also be referred to as the
"manual" version of metasearch engines, and "fully automatic" metasearch
engines. With search engine directories, a user selects the search engines that
they want to employ for the required search and dictates some more parameters
that decide the pressure on the system resources, in terms of time and network
congestion, that they want to allot for the search. Automatic metasearch engines
automatically decide all the above-mentioned factors. A hybrid search engine of this kind is ProFusion [52]. In a recent controlled experiment in which a number of users were asked to rate the relevance of results obtained from 10 search engines, among which were MetaCrawler [42], SavvySearch [56], and ProFusion [52] (both manual and automatic) along with six other engines, ProFusion [52] returned the highest number of links judged to be relevant by the users [17].
B. Analysis of Some Leading Search Engines
Now let's look at some of the salient features that can be incorporated into
the search engines for the purpose of enhancements and checking which of the
search engines in the market address them. In our discussion, we will consider
the following search engines: Altavista [3], Excite [18], Inktomi [24], which
powers HotBot [26] and MSN [49], Infoseek [23], Lycos [37], and Northern
Light [48]. Excite [18] also covers the Excite powered searches of AOL NetFind
[4], Netscape Search [71], and WebCrawler [61]. Some data for Google [20] is
also listed. These data are as of April 5, 1999 [58].
1. Deep Crawling
Search engines exhibiting deep crawling list many pages from a Website
even if the site is not explicitly submitted to them. So the engines that support
this feature will have, for searching, many more pages per site than those that
do not implement it.
AltaVista, Inktomi, and Northern Light implement this feature, whereas
Excite, Infoseek, Lycos, and WebCrawler do not. In the case of WebCrawler,
only home pages are listed.
2. Instant Indexing
3. Frame Support

A search engine that demonstrates frame support follows all the links on a Web page that are in several different frames. So if you have a lot of
information that is represented in different frames, then you should make sure
that the targeted search engines have this facility.
AltaVista and Northern Light are the only search engines that fully support
this facility, whereas Lycos provides limited support and Excite, Inktomi, and
Infoseek do not support it at all.
4. Image Maps
Image maps are images on Web pages that include hot regions (i.e., regions
on the image that point to a particular Web page). This facility shows that a
search engine can follow client-side image maps (for more information see http://www.oris.com/~automata/maps.htm).
AltaVista, Infoseek, and Northern Light support this facility; however,
Excite, Inktomi, and Lycos do not.
5. Password Protected Sites
This is the feature by which a search engine has access to data held at a password-protected site. This means that if the user is looking for something that is addressed at these sites, then the search engine indicates it, although the user needs to acquire the password in order to get the detailed information. With such search engines, there is a username and password for
each password protected site.
AltaVista and Inktomi can access password-protected sites. However,
Excite, Infoseek, Lycos, and Northern Light do not support it.
6. Robot Exclusion Standard
When Webmasters want to keep their Websites out of reach of some search
engines, for privacy and other purposes, they must maintain a "robots.txt" file in the root directory of their Web servers (see Section III.D.1).
The search engines that abide by this protocol will not index these Web sites.
whereas others will go ahead and index them. For servers where it is difficult or impossible to maintain this "robots.txt" file, there is another solution in the form of an HTML tag, called the Meta Robots tag. When this tag is inserted and set in an HTML file, the search engine ignores the HTML file for indexing purposes.
The following tag, if inserted in an HTML file, prevents it from being indexed
by search engines that follow this standard:
<META NAME="ROBOTS" CONTENT="NOINDEX">
All leading search engines support this protocol.
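As an illustration, a crawler written in Python can honor the robot exclusion standard with the standard library's urllib.robotparser; the site URL and user agent below are placeholders.

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")   # hypothetical site
rp.read()                                          # fetch and parse the exclusion rules
if rp.can_fetch("MyGatherer", "http://www.example.com/private/page.html"):
    pass  # only then may the page be fetched and indexed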
7. Link Popularity
All major search engines index the full visible body of Web text, although
some exclude stopwords or the words that are deemed to be spam.
10. Stop Words
Some search engines do not consider stopwords (to, at, and, etc.) while
indexing web pages.
AltaVista, Inktomi, Excite, Lycos, and Google ignore the stop words,
whereas Infoseek and Northern Light do not.
11. ALT Text
Some search engines also index the ALT text (i.e., the text that is used as a
substitute for images, in case the browser does not support their display or in
case the user chooses to view a "text only" copy of the Web page) associated
with some images in Web files.
AltaVista, Infoseek, and Lycos have this feature, whereas Excite, Inktomi,
and Northern Light do not.
12. Stemming
Some search engines search for the variations of the keywords submitted
in queries; for example, in the case of the keyword "grant," the search engines
will also find pages for "grants," "granting," etc.
Infoseek, Lycos, and Northern Light have the stemming feature turned on
by default, whereas in HotBot the user must manually turn it on. AltaVista, on the other hand, searches for related terms, and in Excite you can modify the
search by telling it to find Web pages similar to one of the previous results, that is, the one closest to the favorable response.
13. Meta Tags Boost Ranking
Some search engines rank a page higher when one of the keywords falls in
the META tag of the HTML page.
Infoseek and Inktomi support this feature, while AltaVista, Excite, Northern Light, and Lycos do not.
14. Links Popularity Boosts Ranking
The number of links to the page under consideration is used by some search
engines while ranking the Web page.
AltaVista, Excite, Google, and Infoseek respect this type of ranking, while
Inktomi, Lycos, and Northern Light do not.
15. Spamming through Meta Index Refresh
Some site owners create target pages that automatically take visitors to
different pages within a Web site. This is done using the meta index refresh tag.
For example, the following tag, when inserted in an HTML file, will
display the page in which it is placed, and then after 15 s will display
http://mywebpage.com
<META HTTP-EQUIV="refresh" CONTENT="15; URL=http://mywebpage.com">
AltaVista and Infoseek do not index pages with any redirection whatsoever,
whereas Excite, Inktomi, Lycos, and Northern Light go ahead and index them.
16. Invisible Text Spamming
This is a variation on invisible text spamming: instead of being hidden in the background color, the text is made so small that it is hardly readable. Some search engines refuse to index Web pages that have text below a certain font size.
AltaVista, Inktomi, and Lycos detect this type of spamming, whereas
Excite, Infoseek, and Northern Light do not.
C. Search Engine Sizes
The larger a search engine's index, the higher the probability of finding the requested information; at the same time, the greater the chance of getting back irrelevant results.
FIGURE 8 Search engine sizes (12/95 to 12/98). AV, AltaVista; INK, Inktomi; NL, Northern Light; EX, Excite; LY, Lycos; IS, Infoseek; WC, WebCrawler.
Serious searchers prefer to use search engines with a larger search index and a larger number of indexed keywords. Figure 8 shows how the index of each search
engine has increased since 1995 [58].
D. Summing It Up
Many efforts have been made to evaluate the performance of different search engines; however, they lack systematic effort and consistency in the parameters chosen for measurement. The absence of user involvement has also been a problem for many such studies [57].
Along with the enormous growth of the Web, search engines can return a large
number of Web pages for a single search. Thus, it becomes time consuming for
the user to go through the list of pages just to find the information. Information
retrieval on the Web becomes more and more difficult. To remedy this problem,
many researchers are currently investigating the use of their own robots (or
"Web wanderers" [30]) that are more efficient than general search engines.
These robots are software programs that are also known as agents, like CIFI
[34], Web Learner [49], and many others. Some of these agents are called intelligent software agents [54] because they have integrated machine learning techniques. The Web page entitled "Database of Web Robots Overview" at http://www.robotstxt.org/wc/active/html/index.html lists 272 of these
robots or agents as of January 10, 2002.
The advantages of these robots are that they can perform useful tasks like
statistical analysis, maintenance, mirroring, and, most important of all, resource discovery. However, there are a number of drawbacks, such as the following: they normally require considerable bandwidth to operate, thus resulting in network overload, bandwidth shortages, and increases in maintenance costs. Due
at the point of entry to the department site, SiteHelper displays a list of changes
that have been made since the user last visited. By viewing these changes, the
user will know whether the PostScript file is now available, rather than accessing
the Web page again.
SiteHelper can easily be adapted for many other services, such as library sites, Internet music stores, and archives. With library sites, for example, different users have interests in different topics: SiteHelper can retrieve the books
related to the user's topics during their visit. In their next visit, the system will let
the user know what other additional books or materials have become available
since their last visit.
B. Other Advanced Web Exploration Agents
Assisting Web users by identifying their areas of interest has attracted the attention of quite a few recent research efforts. Two research projects reported
in [7] at Stanford University and [69] at Stanford University in cooperation
with Hewlett-Packard laboratories are along this line. Two other projects,
WebWatcher [6] and Letizia [33] also share similar ideas.
Balabanovic and Shoham [7] have developed a system that helps a Web user
to discover new sites that are of the user's interest. The system uses artificial
intelligence techniques to present the user every day with a selection of Web
pages that it thinks the user would find interesting. The user evaluates these
Web pages and provides feedback for the system. The user's areas of interest
are represented in the form of (keyword, weight) pairs, and each Web page is
represented as a vector of weights for the keywords in a vector space.^ From the user's feedback, the system learns more about the user's areas of interest in
order to better serve the users on the following day. If the user's feedback on a
particular Web page is positive, the weights for relevant keywords of the web
page are increased; otherwise, they are decreased.
This system adds learning facilities to existing search engines and, as a
global Web search agent, does not avoid the general problems associated with
search engines and Web robots. In addition, the (keyword, weight) pairs used
in this system cannot represent logical relations between different keywords.
For example, if the user's areas of interest are "data mining on Internet" and
"rule induction," the logical representation should be ("data mining" AND
"Internet") OR "rule induction."
Yan et al. [69] investigate a way to record and learn user access patterns in
the area of designing on-line catalogues for electronic commerce. This approach
identifies and categorizes user access patterns using unsupervised clustering
techniques. User access logs are used to discover clusters of users that access
similar pages. When a user arrives, the system first identifies the user's pattern,
and then dynamically reorganizes itself to suit the user by putting similar pages
together.
An (item, weight) vector, similar to the (keyword, weight) vector used to
represent each web page in [7], is used in [69] to represent a user's access
pattern. The system views each Web page as an item, and the weight of a user
"^The vector space approach is one of the most promising paradigms in information retrieval [7].
on the item is the number of times the user has accessed the Web page. This
system does not use semantic information (such as areas of interest) to model
user interests, but just actual visits. Also, it does not aim to provide users with
newly created or updated Web pages when they visit the same Web site again.
WebWatcher [6] designed at Carnegie Mellon University is an agent that
helps the user in an interactive mode by suggesting pages relevant to the current
page the user is browsing. It learns by observing the user's feedback to the suggested pages, and its objective is to guide the user to find a particular target
page. A user can specify their areas of interest by providing a set of keywords when they
enter WebWatcher, mark a page as interesting after reading it, and leave the
system at any time by telling whether the search process was successful. WebWatcher creates and keeps a log file for each user, and from the user's areas of
interest and the "interesting" pages they have visited, it highlights hyperlinks
on the current page and adds new hyperlinks to the current page.
WebWatcher is basically a search engine, and therefore does not avoid the
general problems associated with search engines and Web robots. In addition,
it helps the user to find a particular target page rather than incrementally exploring all relevant, newly created, and updated pages at a local site.
Letizia [33] learns the areas that are of interest to a user, by recording the
user's browsing behavior. It performs some tasks at idle times (when the user is
reading a document and is not browsing). These tasks include looking for more
documents that are related to the user's interest or might be relevant to future
requests.
Different from WebWatcher, Letizia is a user interface that has no predefined search goals, but it assumes persistence of interest; i.e., when the user
indicates interest by following a hyperlink or performing a search with a keyword, their interest in the keyword topic rarely ends with the returning of the
search results. There are no specific learning facilities in Letizia, but just a set
of heuristics like the persistence of interest plus a best-first search.
VI. CONCLUSIONS
The exponential growth of the World Wide Web makes information retrieval
on the Internet more and more difficult. Search engines and robots are useful
for narrowing the user's search so that it is faster and more efficient. However,
a search engine can still easily return a large number of Web pages for a single
search, and it is time consuming for the user to go through the list of pages just
to find the information. To address this problem, advanced information exploration with data-mining capacities is a direction for research and development.
In this chapter, we have introduced the two most popular Web browsers
(Netscape Communicator and Internet Explorer), and some leading search engines. We have also reviewed some research efforts on advanced Web information exploration using data-mining facilities.
ACKNOWLEDGMENTS
The paper has benefited from discussions and joint work with Daniel Ngu at Monash University.
REFERENCES
1. Deleted in proof.
2. Deleted in proof.
3. AltaVista, http://www.altavista.com, 1999.
4. AOL NetFind, http://www.aol.com/netfind/, 1999.
5. Askjeeves, http://www.askjeeves.com, 1999.
6. Armstrong, R., Freitag, D., Joachims, T., and Mitchell, T. WebWatcher: A learning apprentice for the World Wide Web. Available at http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-6/web-agent/www/webagent-plus/webagent-plus.html, 1995.
7. Balabanovic, M., and Shoham, Y. Learning information retrieval agents: Experiments with
automated Web browsing. In On-line Working Notes of the AAAI Spring Symposium Series on Information Gathering from Distributed, Heterogeneous Environments, 1995.
8. Barlow, L. The Spider's Apprentice: How to use Web search engines: How search engines work. Available at http://www.monash.com/spidap4.html, Nov. 1997.
9. Berners-Lee, T., Cailliau, R., Luotonen, A., Nielsen, H. F., and Secret, A. The World-Wide Web. Commun. ACM 37(8): 1994.
10. Bowman, C. M., Danzig, P. B., Hardy, D. R., Manber, U., and Schwartz, M. F. The Harvest information discovery and access system. In Proceedings of the Second International World-Wide Web Conference, Chicago, IL, Oct. 1994.
11. Bowman, C. M., Manber, U., Danzig, P. B., Schwartz, M. F., Hardy, D. R., and Wessels, D. P. Harvest: A scalable, customizable discovery and access system. Technical report, University of Colorado, Boulder, Mar. 1995.
12. Deleted in proof.
13. Campbell, M. An evaluation of indexing techniques for text-based information retrieval systems. Master's thesis, Chisholm Institute of Technology, Nov. 1989.
14. Chang, C.-H., and Hsu, C.-C. Customizable multiengine search tool with clustering. In Proceedings of Sixth International World Wide Web Conference, 1997.
15. Cho, J., Garcia-Molina, H., and Page, L. Efficient crawling through URL ordering. In Proceedings of Seventh International World Wide Web Conference, 1998.
16. Dogpile, http://www.dogpile.com, 1999.
17. Dreilinger, D., and Howe, A. E. Experiences with selecting search engines using metasearch.
ACM Trans. Inform. Systems 15(3): 195-222, 1997.
18. Excite Inc., Excite, http://www.excite.com, 1999.
19. Faloutsos, C. Access methods for text. Comput. Surveys, March 1985.
20. Google, http://www.google.com, 1999.
21. Gordon, M. Probabilistic and genetic algorithms for document retrieval. Commun. ACM, Oct. 1998.
22. Deleted in proof.
23. Infoseek, http://www.infoseek.com, 1999.
24. Inktomi, http://www.inktomi.com/products/search/, 1999.
25. He, J. Search engines on the Internet. Experiment. Tech. 34-38, Jan./Feb. 1998.
26. HotBot, http://www.HotBot.com, 1999.
27. Joachims, T., Mitchell, T., Freitag, D., and Armstrong, R. WebWatcher: Machine learning and hypertext. Available at http://www.cs.cmu.edu/afs/cs.cmu.edu/Web/People/webwatcher/mltagung-e.ps.Z, 1995.
28. Jones, S. Text and Context: Document Storage and Processing. Springer-Verlag, London, 1991.
29. Koster, M. ALIWEB: Archie-like indexing in the Web. In Proceedings of the First International World-Wide Web Conference, Geneva, Switzerland, May 1994.
30. Koster, M. Robots in the Web: Threat or treat? Connexions 9(4): Apr. 1995.
31. Koster, M. Guidelines for Robot Writers. http://www.robotstxt.org/wc/guidelines.html.
32. Koster, M. A Standard for Robot Exclusion. http://www.robotstxt.org/wc/norobots.html.
33. Lieberman, H. Letizia: An agent that assists Web browsing. In Proceedings of the 1995
International Joint Conference on Artificial Intelligent, Montreal, Canada, Aug. 1995.
34. Loke, S. W., Davison, A., and Sterling, L. CIFI: An intelligent agent for citation. Technical
Report 96/4, Department of Computer Science, University of Melbourne, Parkville, Victoria
3052, Australia.
ASYNCHRONOUS TRANSFER
MODE (ATM) CONGESTION
CONTROL IN COMMUNICATION
AND DATA NETWORK SYSTEMS
SAVERIO MASCOLO
Dipartimento di Elettrotecnica ed Elettronica, Politecnico di Bari, 70125 Bari, Italy

MARIO GERLA
Computer Science Department, University of California, Los Angeles, Los Angeles, California 90095
I. INTRODUCTION 694
II. THE DATA NETWORK MODEL 697
A. Advantages of Per-Flow Queuing 698
B. A Classical Control Approach to Model the Input-Output
Dynamics of a Per-VC Queue 700
III. A CLASSICAL CONTROL APPROACH TO MODEL A
FLOW-CONTROLLED DATA NETWORK 701
A. Why a Linear Feedback Control Scheme Is Proposed 702
B. The Reference Signal and the Worst Case Disturbance 703
IV. DESIGNING THE CONTROL LAW USING
THE SMITH PRINCIPLE 703
A. The Smith Principle: Basic Concepts 703
B. The Proposed Control Law 704
V. MATHEMATICAL ANALYSIS OF STEADY-STATE
AND TRANSIENT DYNAMICS 707
VI. CONGESTION CONTROL FOR ATM NETWORKS 709
A. Discrete Time Control Equation 710
VII. PERFORMANCE EVALUATION OF THE
CONTROL LAW 711
A. Computer Simulation Results 711
B. A Comparison with the ERICA 712
VIII. CONCLUSIONS 715
REFERENCES 716
Flow and congestion control is a major research issue in high-speed communication data networks. Link propagation delays along with higher and higher
transmission speeds increase the bandwidth-delay product and have an adverse
impact on the stability of closed loop control algorithms. Moreover, the growth
of data traffic along with the ever-increasing demand of quality of service require more sophisticated congestion control techniques in order to manage
the sharing of network resources. In this chapter, we propose classical control
theory as a simple and powerful tool for tackling the issue of congestion control
in high-speed communication networks. In particular, we propose the Smith
principle for designing a congestion control algorithm for high-speed asynchronous transfer mode (ATM) data networks. Smith's principle is a classical
control technique for time-delay systems. A major advantage of Smith's principle is that it allows the design problem for a system with delay to be transformed into one without delay. In our case, we transform the dynamics of the controlled
system into a simple first-order dynamics with a delay in cascade. The network
supplies queue levels as a feedback to the sources, which, in turn, execute the
control algorithm. The dynamics of the controlled data networks are analyzed
using standard Laplace transform techniques. It is shown that it is necessary
to provide a minimum buffer capacity at the switches equal to the bandwidthdelay product to guarantee full link utilization during transients, whereas under
steady-state condition full link utilization can be obtained with any small buffer
capacity. Moreover, data loss prevention during both transient and steady-state
conditions can be guaranteed with any buffer capacity at the expense of less
than full link utilization. A discrete time implementation of the control algorithm for ATM networks is presented; it shows that a multiplicative decrease algorithm is required because, in real communication networks, the delivery of feedback information can never be guaranteed at a fixed rate.
Finally the advantages of the proposed algorithm in comparison with a popular
explicit rate indication algorithm are discussed, and computer simulations are
carried out.
I. INTRODUCTION
Nowadays, communication networks are among the fastest-growing engineering areas. This growth is fueled by the rapid progress in computer and communications technology and by the extraordinary increase in productivity generated
by improved communications.
The first developed communication network was the telephone network
(Alexander Bell, 1876). In telephone networks a circuit is set up from the calling party to the called party when a telephone call is started. This circuit is
exclusively allocated to the phone conversation. Only one telephone conversation can be transmitted along one circuit. The switching of the circuits occurs
at the beginning of a new telephone call (circuit switched networks) [1].
The key innovations in computer or data networks are the organization of
data in packets and the store-and-forward packet switching. To send a packet
from computer A to computer F, computer A puts the source address A and the destination address F into the packet header and sends the packet first to
computer B. When B gets the packet from A, it reads the destination address
and finds out that it must forward the packet to C and so on. Thus, when a node
receives a packet, it first stores it, then forwards it to another node. Using store-and-forward packet switching, links can be efficiently shared by a large number
of intermittent transmissions. This method is called statistical multiplexing, and
it contrasts with circuit switching that reserves circuits for the duration of the
conversation even though the parties connected by the circuits may not transmit
continuously. The reduction in delivery time due to the decomposition of data
in small packets is called pipelining gain [1].
The U.S. Department of Defense Advanced Research Projects Agency (DARPA) has promoted the use of packet switched networks since the late
1960s. The resulting network, ARPANET, started operation in 1969 to evolve
into the Internet, a set of interconnected networks that are used today by millions of people to exchange text, audio, video clips, documents, etc.
A major advantage of packet switched networks is the sharing of network resources. This sharing drastically reduces the cost of communications. A
consequence is that sophisticated mechanisms of flow and congestion control
are required to manage resource-sharing, without incurring congestion phenomena [1-4].
An increasing amount of research is devoted to different control issues,
concerned with the goal of ensuring that users get their desired quality of service
(QoS) [1,4-7].
The deployment of new communication networks that merge the capabilities of telephone and computer networks in order to transmit multimedia
traffic over a fully integrated universal network has led to the introduction of
broadband integrated service digital networks (B-ISDNs). The emerging asynchronous transfer mode (ATM) technology has been selected as the transfer
mode to be used in B-ISDNs [1,3].
ATM networks seek to provide an end-to-end transfer of fixed size cells
(53 bytes) and with specified quality of service [1,3,8]. ATM is a class of
virtual circuit switching networks conceived to merge the advantages of circuit switched technology (telephone networks), with those of packet switched
technology (computer networks). In particular, ATM networks are connectionoriented in the sense that before two systems on the network can communicate,
they should inform all intermediate switches about their service requirements
and traffic parameters by establishing a virtual circuit. This is similar to telephone networks, where an exclusive circuit is set up from the calling party to the called party, with the important difference that, in the case of ATM, many virtual circuits can share network links via store-and-forward packet switching [1].
The ATM Forum Traffic Management Group [8] defines five service classes
for supporting multimedia traffic: (1) the constant bit rate (CBR) class, which
is conceived for applications such as telephone, video conferencing, and television; (2) the variable bit rate (VBR) class, which allows users to send at a
variable rate. This category is subdivided into two categories: real-time VBR
(RT-VBR), and non-real-time VBR (NRT-VBR). An example of RT-VBR is interactive compressed video or industrial control (you would like a command
sent to a robot arm to reach it before the arm crashes into something), while
that of NRT-VBR is multimedia email. (3) The unspecified bit rate (UBR) class
is designed for those data applications, such as email and file transfer, that
want to use any leftover capacity and are not sensitive to cell loss or delay. UBR
does not require service guarantee and cell losses may result in retransmissions,
which further increase congestion. (4) The available bit rate (ABR) class was
defined to overcome this problem. It is the only class that responds to network
congestion by means of a feedback control mechanism. This class is designed
for normal data traffic such as file transfer and email. It does not require cell
transfer delay to be guaranteed. However, the source is required to control
its rate in order to take into account the congestion status of the network. In
this way the cell loss ratio (i.e., lost cells/transmitted cells) is minimized, and
retransmissions are reduced, improving network utilization [1,3,8].
Congestion control is critical in both ATM and Internet networks, and
it is the most essential aspect of traffic management. In the context of ATM
networks, binary feedback schemes were first introduced due to their easy implementation [9-13]. In these schemes, if the queue length in a switch is greater
than a threshold, then a binary digit is set in the control management cell. However, they suffer serious problems of stability, exhibit oscillatory dynamics, and
require large amounts of buffer in order to avoid cell loss. As a consequence,
explicit rate algorithms have been largely considered and investigated [3]. Most
of the existing explicit rate algorithms lack two fundamental parts in the feedback control design: (1) the analysis of the closed loop network dynamics;
(2) the interaction with VBR traffic. In [14,15], an algorithm that computes
input rates dividing the measured available bandwidth by the number of active
connections is proposed. In [9], an analytic method for the design of a congestion controller, which ensures good dynamic performance along with fairness
in bandwidth allocation, has been proposed. However, this algorithm requires
a complex on-line tuning of control parameters in order to ensure stability
and damping of oscillations under different network conditions; moreover, it
is difficult to prove its global stability due to the complexity of the control
strategy. In [16], a dual proportional derivative controller has been suggested
to make easier the implementation of the algorithm presented in [9]. In [17],
two linear feedback control algorithms have been proposed for the case of a
single connection with a constant service rate. In [18], these algorithms have
been extended to the case of multiple connections with the same round trip
delay sharing the bottleneck queue, and the robustness of these algorithms for
nonstationary service rate has been analyzed. In [19], the ABR source rate is
adapted to the low-frequency variation of the available bandwidth; a linear
dynamic model is presented, and Hi optimal control is applied to design the
controller. The controller is simple and its stability is mathematically shown.
In [20], a single controlled traffic source, sharing a bottleneck node with other
sources, is considered. The traffic is modeled by an ARMA process, whereas
an Hoo approach is used for designing the controller. In [21] a control scheme
based on Smith's principle is proposed assuming that a first in, first out (FIFO)
buffering is maintained at switch output links.
In this chapter, we state the problem of congestion control in a general
packet switched network using classical control theory. A fluid model approximation of the network, where packets are infinitely divisible and small, is
assumed. The advantages of assuming per-VC queuing versus FIFO queuing
II. THE DATA NETWORK MODEL
FIGURE 1 Store-and-forward packet switched network. Four (S_i, D_i) VC connections with per-flow queueing are shown.
(b) A set L = {l_j} of communication links that connect the nodes to permit the exchange of information. Each link is characterized by its transmission capacity c_j = 1/t_j (packets/s) and its propagation delay t_dj. For each node i in N, let O(i) ⊂ L denote the set of its outgoing links and let I(i) ⊂ L denote the set of its incoming links.
The network traffic is contributed by source/destination pairs (S_i, D_i) ∈ N × N. To each (S_i, D_i) connection is associated a virtual circuit (VC), or flow, i mapped on the path p(S_i, D_i) = (n_1, n_2, ..., n_m).
A deterministic fluid model approximation of packet flow is assumed; that is, each input flow is described by the continuous variable u_i(t) measured in cells per second.
In high-speed networks, the bandwidth-delay product t_dj/t_j is a key parameter that affects the stability of closed loop control algorithms. It represents the large number of packets "in flight" on the transmission link. These packets are also called in-pipe cells.
A. Advantages of Per-Flow Queuing
In this section we discuss the advantages of using switches that maintain per-flow queuing at the output link. Per-VC queuing has two important
implications: (1) fairness can be easily enforced and (2) per-flow queue dynamics
are uncoupled.
Per-flow buffering separates cells according to the flow to which they belong
[4,22,23]. Thus, it is easy for a switch to enforce fairness. In fact, as shown in
Fig. 2, the link bandwidth leftover by high-priority traffic can be assigned to the
FIGURE 2 Output link shared by n ABR flows and high-priority traffic (CBR + VBR). Per-VC buffering is shown. The switch services packets belonging to different flows in accordance with a given service discipline.
ABR flows in accordance with any definition of fairness by means of a scheduling mechanism executed at the switch, for instance, via a round-robin service discipline applied to cells belonging to different queues. Thus, each VC gets a fair amount of the
"best effort" available bandwidth. At the same time, the congestion control
algorithm must ensure the full utilization of this bandwidth without incurring
cell loss. Note that the assumption of per-VC queuing leads to separating the
mechanism for ensuring fairness (i.e., the scheduling at the switch) and the
mechanism for avoiding congestion (i.e., the control algorithm executed at the
source). Figure 1 shows a network where switches allocate their memory on a
per-flow basis, whereas output links are shared by different flows.
Link sharing means that the per-flow available bandwidth is time varying; that is, each flow sees the effect of other flows only through a time-varying available bandwidth d(t). In this way, the dynamics of each per-flow queue level in response to its input rate is uncoupled from the other flows. The bandwidth that a generic virtual circuit i can utilize is the minimum of the available bandwidth over all the links belonging to its communication path. This minimum
is called the bottleneck available bandwidth, and the buffer feeding this link
is the bottleneck queue. Note that this queue is the only queue with a level
greater than 0 along the considered VC path; that is, it is the only queue level
that must be controlled. For these considerations, the control problem design
is a single-input/single-output (SISO) problem design, where the input is the
rate of the VC connection and the output is the per-flow queue level that is
the bottleneck for the considered VC. A major consequence of the fact that the
flows are uncoupled is that any coexisting traffic scenario influences a single
VC only through a time-varying available bandwidth at the bottleneck link.
Figure 3 shows a single VC extracted from the network shown in Fig. 1. T_fw is the forward propagation delay from the source to the bottleneck buffer that feeds the bottleneck link l_b, whereas T_fb is the backward propagation delay from
FIGURE 3 A single VC connection (S_3, D_3) with the bottleneck link l_b = l_2 is shown. The interaction with all other flows shown in Fig. 1 is completely captured by means of the unknown and bounded disturbance d_3,2(t).
the bottleneck to the destination and then back to the source. The interaction of
the considered flow with all other flows in the network is completely captured
by means of the time-varying bandwidth available at the bottleneck link. The
drawbacks of per-flow queuing are that it is not scalable and that it may be costly to implement.
B. A Classical Control Approach to Model the Input-Output Dynamics
of a Per-VC Queue
The level of the per-VC queue that the ith flow builds at link l_j can be modeled as

x_ij(t) = \int_0^t [u_i(\tau - T_ij) - d_ij(\tau)]\, d\tau,   (1)
where u_i(t) is the queue inflow rate due to the ith VC, T_ij is the propagation delay from the ith source to the jth queue, and d_ij(t) is the rate of packets leaving the jth queue, that is, the bandwidth left available at link l_j for the ith VC. Note that the available bandwidth d_ij(t) depends on the global traffic loading the link; that is, it is the best effort capacity available for the ith VC.
Since this bandwidth depends on coexisting traffic, we assume that its amount
is unknown at the VC source. In control theory an unknown input that cannot be handled is called a disturbance input.
The operation of integration is linear. Equation (1) can be transformed into an algebraic equation by using the Laplace transform,^ as

X_ij(s) = \frac{U_i(s)\, e^{-s T_ij} - D_ij(s)}{s}.   (2)
The objective of the control is to throttle the VC input rate u_i(t) so that the bottleneck queue x_ij(t) of the ith VC is bounded (this guarantees cell loss prevention) and greater than 0 (this guarantees full link utilization) in the presence of an unknown and bounded disturbance d_ij(t), which models the time-varying leftover bandwidth.

^The Laplace transform L(·) of the integral \int_0^t u(\tau)\, d\tau is obtained by multiplying the Laplace transform of u(t) by 1/s; that is, L(\int_0^t u(\tau)\, d\tau) = L(u(t))/s = U(s)/s. The Laplace transform of the delayed signal u(t - T) is obtained by multiplying the Laplace transform of u(t) by e^{-sT}; that is, L(u(t - T)) = U(s)\, e^{-sT}.

FIGURE 4 Input-output model of the per-VC queue: the input rate u_i(t), delayed by e^{-sT}, feeds the integrator 1/s whose output is the queue level x_ij(t).
We choose to execute the control algorithm at the source, whereas the
bottleneck queue level is fed back to the source from the switches. This choice
is motivated by the fact that the VC source is the best place from which to determine where the bottleneck link of the VC path is. The mechanism used to implement the feedback is that proposed by the ATM Forum [8]. In this scheme an ABR source sends one control cell (a resource management (RM) cell) every N_RM data cells. Each
switch encountered by the RM cell along the VC path stamps on the RM cell
the free buffer space if and only if this value is smaller than that already stored.
At the destination, the RM cell carries the minimum available space over all
the encountered buffers (i.e., the bottleneck-free space), and it comes back to
the source conveying this value. Upon receiving this information, the source
updates its input rate.
702
dim
m^ O H
Gcis)
U,{t)
-sLMu
O-
Xijit)
"plant"
FIGURE 6 Classical feedback control loop with the controller G_c(s), the plant G(s), and the delay e^{-sT} inside the loop.
goal of designing a controller Gc(s) such that the resulting closed-loop dynamics
is delay-free. More precisely, G_c(s) is chosen so that the system becomes equivalent to the reference system reported in Fig. 7. This system consists of (1) the delay-free "plant" G(s); (2) the controller K(s); and (3) the delay exp(-sT), which is outside the feedback loop.
The chosen reference system is appealing because it is a delay-free system
whose output is delayed by the time T.
By equating the transfer functions of the systems in Fig. 6 and Fig. 7,

\frac{G_c(s)\, G(s)\, e^{-sT}}{1 + G_c(s)\, G(s)\, e^{-sT}} = \frac{K(s)\, G(s)}{1 + K(s)\, G(s)}\, e^{-sT},

the required controller G_c(s) results in

G_c(s) = \frac{K(s)}{1 + K(s)\, G(s)\,(1 - e^{-sT})}.   (3)
FIGURE 7 Reference system: a delay-free loop with the controller K(s) and the plant G(s); the delay e^{-sT} lies outside the loop.
FIGURE 8 Implementation of the controller G_c(s) at the source: the rate u(t) is computed from the fed-back queue level x(t - T_fb) using the integrator 1/s and the delay e^{-sT}.
T = RTT.   (5)
FIGURE 9 Reference system for the congestion controller, with the set point r_i(t), the integrator 1/s, and the round trip delay outside the loop.
By choosing K(s) = k and G(s) = 1/s, and letting T = RTT as in (5), the controller (3) becomes

G_c(s) = \frac{k}{1 + \frac{k}{s}\,(1 - e^{-RTT \cdot s})}.   (6)
Looking at the controller (6) shown in Fig. 8, it is easy to write the rate control equation, that is,

u(t) = k\left( r(t - T_fb) - x(t - T_fb) - \int_{t-RTT}^{t} u(\tau)\, d\tau \right).   (7)
The control equation (7) is very simple and can be intuitively interpreted as follows: the computed input rate is proportional, through the coefficient k, to the free space in the bottleneck queue, that is, r(t - T_fb) - x(t - T_fb), decreased by the number of cells released by the VC during the last round trip time RTT, i.e., the "in-flight cells."^ By denoting the bottleneck free space as

Bottleneck_free_space = r(t - T_fb) - x(t - T_fb)

and

In_flight_cells = \int_{t-RTT}^{t} u(\tau)\, d\tau,

the control equation (7) can be written as

u(t) = k\,(Bottleneck_free_space - In_flight_cells).   (8)

^From now on, the subscripts in x_ij(t), d_ij(t), G_ci(s), u_i(t), and RTT_i, which refer to the jth output link and the ith VC connection, are dropped.
In the next section we show via mathematical analysis that the proposed control
law guarantees bottleneck queue stability and full link utilization during both
transient and steady-state conditions.
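The behavior of the control law can be explored with a rough discrete-time sketch such as the following; it is not the authors' simulator. The forward and backward delays are lumped into a single RTT delay, the parameter values follow the example of Section VII, and the available-bandwidth profile is an arbitrary assumption.

RTT, k, r0, slots = 10000, 1.0 / 750, 10750.0, 40000   # values from Section VII
u = [0.0] * (slots + 1)          # source rate in each time slot
x_hist = [0.0] * (slots + 1)     # bottleneck queue level in each time slot
in_flight = 0.0                  # cells sent during the last RTT slots
for t in range(slots):
    d = 0.4 if (t // 8000) % 2 == 0 else 1.0             # assumed available bandwidth profile
    fed_back = x_hist[t - RTT] if t >= RTT else 0.0      # queue level seen by the source (delay lumped to RTT)
    u[t] = max(0.0, k * (r0 - fed_back - in_flight))     # control law (8)
    in_flight += u[t]
    arrival = 0.0
    if t >= RTT:
        arrival = u[t - RTT]                             # rate sent RTT slots ago reaches the queue
        in_flight -= u[t - RTT]
    x_hist[t + 1] = max(0.0, x_hist[t] + arrival - d)    # queue integrates inflow minus available bandwidth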
V. MATHEMATICAL ANALYSIS OF STEADY-STATE AND TRANSIENT DYNAMICS

Using the controller (6), standard Laplace transform techniques give the queue level as the superposition x(t) = x_r(t) + x_d(t) of the response to the reference signal and the response to the disturbance:

X_r(s) = \frac{k}{s + k}\, e^{-s \cdot RTT}\, R(s),   (9)

X_d(s) = -\,\frac{s + k\,(1 - e^{-s \cdot RTT})}{s\,(s + k)}\, D(s).   (10)

For the step reference signal r(t) = r^0 \cdot 1(t) and the step disturbance d(t) = a \cdot 1(t - T_1), the transforms are

R(s) = \frac{r^0}{s},   (11)

D(s) = \frac{a}{s}\, e^{-s T_1}.   (12)
It can be noted that 0 < x_r(t) \le r^0 for t > RTT, x_r(RTT) = 0, x_r(\infty) = r^0, x_d(t) \le 0 for t > T_1, and x_d(T_1) = 0. Therefore,

x(t) = x_r(t) + x_d(t) \le x_r(t) \le r^0   for t > 0,

that is, the stability condition is satisfied.
PROPOSITION 3. Considering the reference signal r(t) = r^0 \cdot 1(t), the disturbance d(t) = a \cdot 1(t - T_1), and the controller (6), the steady-state queue level is x(\infty) = r^0 - a\,(1/k + RTT).

Now we show that the bottleneck link is fully utilized during both transient and steady-state conditions if the bottleneck buffer capacity is at least equal to a\,(1/k + RTT).
PROPOSITION 4. Considering the reference signal r(t) = r^0 \cdot 1(t), the disturbance d(t) = a \cdot 1(t - T_1), and the controller (6), the bottleneck link is fully utilized for t > RTT if the bottleneck buffer capacity satisfies the condition

r^0 \ge a\,(RTT + 1/k).   (13)

Proof. The derivative of the disturbance response is

\dot{x}_d(t) = -a \cdot 1(t - T_1) + a \cdot 1(t - T_1 - RTT) - a\, e^{-k(t - T_1 - RTT)} \cdot 1(t - T_1 - RTT);
-100
-150
400
600
1000
time
F I G U R E 10 Time waveform Xr{t) of the queue in response t o the set point r(t), time waveform Xd(t) of
the queue in response t o the disturbance d(t), and time waveform of the queue length x(t) = Xr(t) + Xd(t).
709
In the time interval [RTT, Ti) it results mx(t) = Xr(t) = k'r''- ^-^(^-^TT) > Q,
whereas in the interval [Ti, RTT + Ti), it results mx(t) = k'r''' e-^^^'^^^ - a.
This derivative is zero at ^ = RTT + | In ^ , which corresponds to a positive
maximum since x{t) = -^2^O^-^(^-RTT) ^ Q clearly, a condition for the existence of this maximum is that T\ < RTT + | In ^ < Ti + RTT. If there is no
zero for x, because Ti > RTT + r- In ^ , then x(Tf) = )^ r^ ^-^(^i-RTT) > Q ^^d
x(T^) = k'r''' ^-^(^i-RTT) - ^ < 0. Thus ? = Ti is still a point of maximum. If
there is no zero for x because Ti < | In ^ , then x(t) is strictly increasing in [Ti,
RTT+Ti).
In the interval [RTT + Ti, oo) it results in
x(t) = i^ . r^ . e-^(^-R^) - a . ^-^(^-Ti-RTT) ^(k^r^-a-
e^^')e-^^'-^^\
7 I0
[I'll
(14)
7I I
(2) If the time 7^ since the last RM cell was received at timefe_iexpires
and no RM cell is yet received, then the source performs a worst case
estimate of the feedback information dX th = th-\-\-%. To be
conservative and to prevent cell loss we propose the following worst
case estimate of the missing feedback. We assume that in the time
interval [fe-i, fe-i + %] the queue has zero output rate. Thus, the
worst case estimate of the bottleneck-free space is the last received or
the last estimate (r^(th-i - Tfb) ^(fe-i - Ifb)) minus what the
source has pumped into the network during the interval
[th-i RTT, th-i - RTT + 7^]. Without loss of generality and for
sake of simplicity we assume that in the interval
[th-i RTT, th-i RTT + 7^) the rate is constant and equal to
u(th-.i RTT). Thus the estimate is
r^(fe - Tfb) - x(th - Tfb) = r^(fe_i - Tfb) - x(th-i - Tfb) - u(th-i - RTT) Ts,
and the rate is computed as
u(th) = k |r^(fe_i - Tfb) - x(th-i - Tfb) - u(th-i - RTT) Ts - ^ u(th-i) A,- J
= u(th-i)-k'u(th-i)'Ts.
(15)
Now, if the RM cell is not received in the interval [fe-m? fejj then the calculated
rate is
U(th) = U[th-l) . (1 - ^ Ts) = U{th-m) (1 - ^ Ts)".
(16)
712
0.9
0.8 -
T3
1 0.6
-\
-S 0.5
.? 0.4
^0.3
0.2 0.1
n
.,
1,,
10
time
FIGURE I I
xlO"
the bottleneck with (ABR + VBR + CBR) traffic. Figure 11 shows the timevarying best-effort bandwidth, which is available for the considered VC. The
bottleneck link capacity is normalized to unity, so that RTT = 10,000 time slots.
This can correspond to a cross-continental connection (^24,000 km round trip)
through a wide area network (WAN) with a typical bottleneck link capacity of
155 Mb/s. The constant gain is set to ^ = 1/750, and according to (13), the
allocated buffer capacity is set to r = 1 (10000 + 750). The sampling time is
7^ = (2/5)r = 300. Figure 12 shows that the queue length is upper bounded
by 9,000 cells < r^ (i.e., cell loss is prevented) and always greater than 0, i.e.,
100% link utilization is guaranteed. Figure 13 shows the time waveform of the
ABR source input rate.
B. A Comparison with the ERICA
CBR)BW),
where the target utilization (U) is a parameter set to a fraction of the available
capacity. Typical values of U are 0.9 and 0.95. The fair share of each VC is
computed as
Fair Share :
ABRBW
Nvc
713
9000
0
F I G U R E 12
2
Botdeneck queue length.
7-
6B 5J5
cc
CO
< 3
2
1
0
()
l\^
ir^^L,^
2
time
FIGLIRE 13
10
x10'
714
where Nye is the number of active connections. The weakness of this algorithm
is the computation of fair share. In fact, (1) it is difficult to measure the available
bandwidth, which is bursty; and (2) a switch calculates a Fair Share that can
be effectively used only if no connection is bottlenecked downstream [14,15].
To overcome point (2) the load factor is defined as
z=
ABR;input rate
ABRB w
where CCR is the source current cell rate stored in the RM cells. The load factor
z is an indicator of the congestion level of the link. The optimal operating point
is z = 1. The goal of the switch is to maintain the network at unit overload
[15]. Thus the explicit rate (ER) is computed as [15]
ERcalcuiated ^^ Max(FairShare, VCShare)
ERcalculated ^
Min(ERcalculated5 ABRfiw)
ERICA depends upon the measurements of ABRBW and H/c If there are errors in these measurements, ERICA may diverge; i.e., queues may become
unbounded and the capacity allocated to drain queues becomes insufficient.
ERICA+ is an enhancement obtained by using queue length as a secondary
metric [15]. Other drawbacks of ERICA are: (1) an oscillatory behavior due to
the decrease/increase mechanism introduced by the load factor; (2) the selection of the target utilization (U) and of the switch measurement interval; and
(3) there are cases in which the algorithm does not converge to max-min fair
allocation [15].
Here we show that, even if we assume, despite points (1) and (2), that
a switch is able to perform a precise calculation of the bottleneck explicit
rate, the lowest upper bound for the queue length is still equal to the the
bandwidth X delay product. Thus, the complexity of bandwidth measurement
with calculation of bottleneck explicit rate is not justified in comparison with
the simplicity of feeding back the free buffer space. To show this result, consider
Fig. 14, which contains a block diagram depicting the basic dynamics of the
ERICA. It is assumed that the switch, which is the bottleneck for the considered
VC, is able to measure and feedback the per-VC available bandwidth filtered
diit-T^).)
Fis)
e-'^fl>
"(0
+
F I G U R E 14
-sT/iu
d(t)
1
s
x{t)
715
0.8
S 0-6
n
CD
S 0.4
<
0.2 -
n
0
6
time
FIGURE 15
10
x10*
by a low-pass filter F(s),^ This assumption is made for two reasons: (1) congestion occurs whenever the low-frequency input traffic rate exceeds the link
capacity [19,28]; and (2) since the controlled system is sampled via RM cells,
it is not possible to reconstruct the higher-frequency component of d(t) due
to the Shannon sampling theorem. Suppose that the available bandwidth d(t)
changes from Otoaatt = to. Looking at the block diagram shown in Fig. 14, it
is easy to see that, after the time 4r + Tf^ + Tfb, the VC input rate at the input
of the queue is a; that, is all available bandwidth is used. Now suppose that the
available bandwidth suddenly drops to 0. The queue input rate remains equal
to a for a time equal to 4T + RTT, then it drops to 0. Thus, during this time the
queue level reaches the value a - (4x -^ RTT). For instance, assuming the on-off
available bandwidth shown in Fig. 15, with RTT = 1000, simulation results
give the queue behavior reported in Fig. 16. In general, it can be noted that
an on-off available bandwidth with amplitude a and period equal to 2* RTT
causes a maximum queue length equal to a - (4z -\- RTT).
Ylll. COKCLUSIOKS
Classical control theory is proposed for modeling the dynamics of high-speed
ATM communication networks. Smith's principle is exploited to design a congestion control law that guarantees no data loss and full link utilization over
communication paths with any bandwidth-delay product. The properties of
the proposed control law are demonstrated in a general setting via mathematical analysis. In particular, it is shown that full link utilization and no cell loss
^For instance, P(s) can be thefirst-orderlow-pass filter 1/(1 + rs).
716
10000
9000
-\
8000
S
7000
1" 6000
o
1 5000
1 4000
3000
2000
1000
n
1
/
\ 1
8
time
F I G U R E 16
\10
XlO'
are guaranteed during both transient and steady-state conditions. The advantages of the proposed algorithm over a heuristic algorithm such as the ERICA
algorithm are discussed. Finally, computer simulations confirm the validity of
the theoretical results.
REFERENCES
1. Varaiya, P., and Walrand, J. High-Performance Communication Networks. Morgan Kaufmann,
San Francisco, CA, 1996.
2. Jacobson, V. ACM Comput. Commun. Rev. 18(4): 314-329, 1988.
3. Jain, R. Comput. Networks ISDN Systems 28(13): 1723-1738, 1996.
4. Peterson, L. L., and Davie, B. S. Computer Networks. Morgan Kaufman, San Francisco, CA,
1996.
5. Ding, W. IEEE Trans. Circuits Systems Video Technol. 7(2): 266-278, 1997.
6. Le Boudec, J., de Veciana G., and Walrand, L. In Proc. of IEEE Conf on Dec. and Control
'96, Kobe, Japan, Vol. 1, 1996, pp. 773-778.
7. Liew, S. C , and Chi-yin Tse, D. IEEE/ACM Trans. Networking 6(1): 42-55, 1998.
8. ATM Forum Technical Committee TMWG, "ATM Forum Traffic Management Specification
Version 4.0," af-tm-0056.000, 1996. Available at http://v^rww.atmforum.com.
9. Benmohamed, L., and Meerkov, S. M. IEEE/ACM Trans. Networking 1(6): 693-708, 1993.
10. Bonomi, F., Mitra, D., and Seery, J. B. IEEE J. Selected Areas Commun. 13(7): 1267-1283,
1995.
11. Fendick, K. W., Rodrigues, M. A., and Weiss, A. Perform. Eval. 16(1-3): 67-84, 1992.
12. Ramakrishnan, K., and Jain, R. ACM Trans. Comput. Systems. 8(2): 158-181, 1990.
13. Yin, N., and Hluchyj, M. G. In Proc. IEEE Infocom 94, vol. 1, pp. 99-108,1994.
14. Charny, A., Clark, D. D., and Jain, R. In Proc. of IEEE ICC95, vol. 3, pp. 1954-1963,1995.
15. Jain, R., Kalyanaraman, S., Goyal, R., Fahmy, S., and Visv^anathan, R. "ERICA Sv^itch
Algorithm: A Complete Description," ATM Forum, af-tm 96-1172, August 1996. Available
at http://www.cis.ohio-state.edu/jain/.
7 I 7
16. Kolarov, A., and Ramamurthy, G. In Proc. of IEEE Infocom 97, vol. 1, pp. 293-301, 1997.
17. Izmailov, R. IEEE Trans. Automat. Control 40(S): 1469-1471, 1995.
18. Izmailov, R. SIAMJ. Control Optim. 34(5): 1767-1780, 1996.
19. Zhao, Y., Li, S. Q., and Sigarto, S. In Proc. of IEEE Infocom 97, vol. 1, pp. 283-292,1997.
20. Pan, Z., Altman, E., and Basar, T. In Proc. of the 35th IEEE Conf. on Dec. and Control, vol. 2,
pp. 1341-1346,1996.
21. Mascolo, S. In Proc. IEEE Conf on Dec. and Control '97, vol. 5, pp. 4595-4600, 1997.
22. Benmohamed, L., and Wang, Y. T. In Proc. of IEEE Infocom 98, vol. 1, pp. 183-191,1998.
23. Suter, B., Lakshman, T. V., Stiliadis D., and Choudhury A. K. IEEE Proc. Infocom 98, San
Francisco, Mar. 1998.
24. Smith, O. ISA J. 6(2): 28-33,1959.
25. Marshall, J. E. Control of Time-Delay Systems. Peter Peregrinus, London, 1979.
26. Franklin, G. E, Pow^ell, J. D., and Emami-Naeini, A. Feedback Control of Dynamic Systems.
Addison-Wesley, Reading, MA, 1994.
27. Astrom, K. J., and Wittenmark, B. Computer-Controlled Systems. Prentice-Hall, Englewood
Cliffs, NJ, 1984.
28. Li, S. Q., and Hwang, C. IEEE/ACM Trans. Networking 3(1): 10-15, 1995.
OPTIMIZATION TECHNIQUES IN
CONNECTIONLESS (WIRELESS)
DATA SYSTEMS ON ATM-BASED
ISDN NETWORKS AND THEIR
APPLICATIONS
RONG-HONG JAN
Department of Computer and Information Science, National Chiao Tung University,
Hsinchu 300, Taiwan
l-FEl TSAI
Wistron Corporation, Taipei 221, Taiwan
I. INTRODUCTION 720
II. CONNECTIONLESS DATA SERVICES IN
AN ATM-BASED B-ISDN 724
III. CONNECTIONLESS DATA SYSTEM OPTIMIZATION 727
A. System Models 727
B. Statement of the Various CLSs Allocation Problems 729
IV SOLUTION METHODS FOR THE UNCONSTRAINED
OPTIMIZATION PROBLEM 733
A. A Greedy Method for Solving Problem I 733
B. A Branch-and-Bound Method for Solving Problem I 736
V SOLUTION METHODS FOR THE CONSTRAINED
OPTIMIZATION PROBLEM 739
A. A Heuristic Algorithm for Solving Problem I Subject
to Bandwidth Constraints 739
B. A Heuristic Algorithm for Solving Problem 2 742
C. A Heuristic Algorithm for Solving Problem 3 745
VI. CONSTRUCTION OF VIRTUAL OVERLAYED
NETWORK 745
VII. CONCLUSIONS AND DISCUSSIONS 748
REFERENCES 749
#19
720
I. rNTRODUCTION
To provide the interconnection between LANs/MANs is the most important
appHcations in the ATM-based B-ISDN. ATM networks are inherently
connection-oriented (CO); however, conectionless (CL) service is the
predominant mode of communications in legacy LANs/MANs. This implies
that the CL data will make up a dominant portion of ATM network's traffic
load in the long term. To support the CL data service on a B-ISDN, two alternatives, direct and indirect approaches, for the ATM networks have been
specified by ITU-T [1,2].
For the direct approach, CL service is provided directly in the B-ISDN.
Connectionless data traffics between the B-ISDN customers are transferred via
connectionless servers (CLSs) in which connectionless service functions (CLSFs)
are provided. Connectionless protocol functions such as addressing, routing,
and quality of service (QoS) selection are handled by CLSs. To serve as a ATM
cell router, the CLS routes cells to their destination or intermediate CLS according to the routing information included in the user data. Although logically a
separate entity, the CLSF should be physically located in the same block as the
ATM switch or may even be a physical part of the switch itself [19,36].
For the indirect approach, CL data traffics between customers are transparently transferred above the ATM layer via the connection-oriented service.
However, the use of fully meshed semipermanent connections may not be feasible due to the limit of the number of virtual path identifiers (VPIs) that can be
supported by a switch. On the other hand, the use of switched connections will
introduce call setup delay and overhead for call control functions within the
network. Besides, the indirect approach cannot fully utilize the benefits from
multiple path routing of CL data traffic that is inherently free from being out
of sequence. Thus, from the point of scalability and efficiency, direct CL services in a large-scale network realized by means of the CLSF seem to be a more
reasonable approach. There is growing consensus that providing CL service for
the interconnection of LANs and MANs in a public ATM environment will
more likely entail the use of CLSs [20].
For the sake of load balancing and congestion avoidance, a large-scale
B-ISDN necessitates more than one CLS. In general, the fewer the number of
CLSs, the higher the load on each VP, and the higher the total transport cost
in the virtual overlay network. On the other hand, it is not necessary to attach a
CLSF at every switching node. The number of switching nodes with a CLSF depends upon the volume of CL traffic to be handled and the cost of the network.
So the determination of where to place CLSs and how to interconnect CLSs has
important performance consequences. These problems are similar to network
optimization problems, such as access facility location, warehouse location,
and partitioning. It has been proved [35] that the problem of determining the
optimal number of CLSs in an ATM network is as difficult as the problem of
determining the minimum size of vertex cover in the corresponding graph. Note
that the problem of determining the minimum size of vertex cover in a network
is known as the vertex cover problem, which is NP-complete [27]. Optimization techniques for obtaining optimal CLS configurations in an ATM network
have been presented in [35]. These include the branch and bound method, the
greedy method, the heuristic approach, and the simulating annealing method.
721
ATM Switch
Interworking Unit
Connectionless Server
1 \^A
/ n
; i i i :
IEEE 802.3 CSMA/CD /
FIGURE I
Q FDDI
MAN
*
.. *
:',6 6 6 6 61-:
IEEE 802.6 DQDB
After a determinant number of CLSs are located among the ATM switching
nodes, an efficient semipermanent VP layout should be preconfigured around
the CLSs and the interworking unit (IWUs) to which LANs/MANs are attached.
Despite virtually any embedded topology being possible, it is desirable that there
is at least one CLS lying on the virtual path connection for each IWU pair. The
IWUs between the CLSs and the LANs/MANs work as an interface of CL and
CO environment. CL traffic between LANs/MANs will be transported by way
of the VPs between the intermediate CLSs and IWUs (see Fig.l).
As a packet forwarder, CLS can operate in two modes, say the reassembly
mode and the on-the-fly mode [38]. In the reassembly mode, each CLS segments
an entire data packet from the upper layer into multiple ATM cells, then forwards them to the same downstream CLS. If any cell of the same packet is lost,
then all the other cells will be discarded. In the on-the-fly mode, a CLS routes
each cell independently, and the segmentation/reassembly is done only on the
source and destination IWUs or ATM hosts. That is, ATM cells are forwarded
as a pipeline without regards to their entirety.
Although the former seems more inefficient and incurs more processing
delay than the latter, only the reassembly mode server survives under high traffic
load. This is because the on-the-fly server forwards each cell blindly regardless
of whether the cells belong to broken packets destined to be dropped at the
destination end. The functionality of loss and retransmission is done only by
the upper layer (such as TCP) of both ends, so the effective throughput degrades
as traffic load (/cell loss rate) becomes high.
To improve the performance of the on-the-fly server, Boiocchi et aL [38]
proposed a data unit discard algorithm for the buffer management in a CLS.
722
The main idea is that if the occupation of the shared output buffer exceeds a
certain threshold, all the new "head cells" (beginning of message) and their
succeeding cells (continuation and end of message) should be dropped. The
performance was then evaluated in terms of packet loss rate as a function of
traffic intensity and buffer size. With this strategy, the on-the-fly CLS performs
consistently better than the reassembly CLS. It is remarkable that this simple
buffer management scheme can be introduced not only to CLSs but also to
the ordinary ATM switches without a CLSF. In [39], a packet discard strategy named early packet discard that has the same idea as that in [38] was
investigated to alleviate the effects of fragmentation of TCP traffic over ATM
networks. The detailed hardware architecture of the on-the-fly CLS can be
found in [36,37], where only [37] employed the early packet discarding as an
enhanced switching node.
A number of studies concerned with supporting CL data service on a
B-ISDN, such as the general framework of the protocol architectures [6-14],
hardware architecture and buffer management [36-38], the CLS location problem [34,35], and bandwidth management and traffic control [15-17], have been
proposed. In [35], the authors considered how to locate a certain amount of
connectionless servers among the switching nodes in a public ATM network for
the internetworking of connectionless data networks. The problem, so-called
the p-CLS problem, which is similar to the p-median problem [13], was formulated as a network optimization problem and shown to be NP-complete even
when the network has a simple structure. The p-median problem is to minimize
the sum of distances from demand sites to their respective nearest supply sites.
This kind of network optimization problem occurs when new facilities are to
be located on a network.
The objective function of the CLS location problem is to minimize total transport cost. Minimized total transport cost is equivalent to minimized
average transport cost. Therefore, the minimum average end-to-end delay or
bandwidth requirement can be achieved. Two algorithms, one using the greedy
method and the other based on branch-and-bound strategy, for determining the
locations of CLSs were presented. By finding the optimal locations of a CLS,
an optimal virtual overlay network that has minimized total transport costs for
the CL data traffics can be constructed.
Most of the optimization techniques available for this kind of location
problem rely on integer programming for a final solution. The emphasis of [35]
is to develop two efficient algorithmic approaches for the optimal location of
CLSs in the ATM networks and to determine the optimal virtual topology.
The optimal virtual path connections between all the gateway pairs can be
determined as long as the optimal location of CLS sites have been determined.
Thus the CL traffic routing is also decided. A conceptual virtual overlayed
network for the direct provision of CL data service in the ATM networks is
shown in Fig. 2. In this figure VPs are preconfigured between the gateways and
CLSs, and the traffics from the source gateway are routed to the destination
gateway by CLS. It is clear that the proposed solution can be applied to homogeneous ATM LANs by substituting gateways with ATM stations. It can also
be employed to design the broadband virtual private networks [32].
There exists two other related paper works on the problem of where to
place the CLSs and how to connect them [26,34]. However, only in [35] is
723
Gateways
LANs/MANs
/
/
724
:* LAN '.
Gateway
/V
.; LAN ^ - n i ^ : : . .
\^
y.^9.!^i^)ip
ATM-based
/ . . .
.'"ibHu:' LAN
BISDN
: L A N
I
ConnectionLess Service Function
; LAN (H I \i|CLSF^
\pisF\-
FIGURE 4
^CLSF|
HC^F]!HH'^
LAN
ATM-based BISDN
Direct provision of connectionless data service.
725
CLSF handles connectionless protocols and routes data to a destination user according to routing information included in the user data. Thus a connectionless
service above the adaptation layer is provided in this case.
Service (1) may lead to an inefficient use of virtual connections of usernetwork interface and network-node interface, if permanent or reserved connections are configured among users. ATM connections require the allocation
of connection identifiers (VPIs and VCIs) at each intermediate ATM switch.
This means that some bandwidth amount at each switch will be reserved
implicitly for these connections. With the availability of signaling capabilities,
an end-to-end connection may be established on demand at the commencement
of connectionless data service. This on-demand operation of Service (1) may
cause a call setup delay, and may introduce a load on call control functions
within the network.
For Service (2), there are also two options, depending on the availability of
B-ISDN signaling capabilities. Option one is to use preconfigured or semipermanent virtual connections between users and connectionless service functions
to route and switch connectionless data across the network. Option two is to
establish virtual connections at the commencement of the connectionless service
session.
Support of Service (1) will always be possible. The support of a direct
B-ISDN connectionless service (Service (2)) and the detailed service aspects are
in [2].
The provision of the connectionless data service in B-ISDN can be realized by means of ATM switched capabilities and CLSFs. The ATM switched
capabilities support the transport of connectionless data units in a B-ISDN
between specific CLSF functional groups to realize the adaptation of the connectionless data units into ATM cells to be transferred in a connection-oriented
environment. The CLSF functional groups may be located outside the B-ISDN,
in a private connectionless network or in a specialized service provider, or inside the B-ISDN. The relevant reference configuration for the provision of the
connectionless data service in the B-ISDN is depicted in Fig. 5.
The ATM switched capabilities are performed by the ATM nodes (ATM
switch/cross-connect), which realize the ATM transport network. The CLSF
functional group terminates the B-ISDN connectionless protocol and includes
functions for the adaptation of the connectionless protocol to the intrinsically
connection-oriented ATM layer protocol. These latter functions are those performed by the ATM adaptation layer type 3/4 (AAL 3/4), while the former ones
are those related to the layer above AAL 3/4, denoted CLNAP (connection/ess
network access protocol). The general protocol structure for the provision of
CL data service in a B-ISDN is shown in Fig. 6.
The CL protocol includes functions such as routing, addressing, and
QOS selection. In order to perform the routing of CL data units, the CLSF
must interact with the control/management planes of the underlying ATM
network.
The CLSF functional group can be considered implemented in the same
equipment together with the ATM switched capabilities as depicted in Fig. 7
(option A). In this case it is not necessary to define the interface at the P reference
726
FIGURE 5
point. CLSF functional group and ATM switched capabilities can be implemented also in separate equipment (option B). In this case interfaces shall be
defined at the M or P reference points depending on whether the CLSF is located
outside or inside the B-ISDN.
From the above discussion, the indirect provision of CL service is only
applicable in the case where a few users of CL services are attached to the
Customer
Terminal
Equipment
Customer
Terminal
Equipment
ATM switched
capabilities plus
CLSF
CLNAP
ATM
CLNAP
ATM
CLNAP
AAL3/4
node
AAL 3/4
node
AAL 3/4
ATM
ATM
ATM
ATM
ATM
[ Physical
FIGURE 6
Physical
Physical
Physical
Physical
727
CLSF
CLSF
1
Porl\1
1
f
ATM
>.
switched
V capabilities J
F I G U R E 7 Two options for the CLSF functional group and ATM switched capabilities, implemented
in (A) the same and (B) different equipment.
B-ISDN. Maintaining a full mesh between all the gateway pairs is impractical
for a large number of gateways. So the direct CL service realized by means
of the CLSF seems to be a more reasonable approach. CL store-and-forward
nodes (CLSs) make it possible to implement arbitrary topologies (not necessarily fully meshed), thus reducing the number of VPCs required. Its drawback, of course, is the processing overhead at each CLS site. In the next section, based on the direct approach, the problem of internetting LANs/MANs
with a wide area public ATM network is formulated as an optimization
problem.
728
Thus, the transmission cost for gateway pair (gij, ghk) is equal to the traffic
flow w(gij,ghk) multiplied by the path cost d(gij,ghk)' It can be written as
Cd(gij, ghk) = Mgij.ghk) X d(gij, ghk)- Note that the CL data flows between all
gateway pairs are transported via the CLSs. That is, for each gateway pair, CL
data flows must pass through at least one intermediate CLS. Because the extra
delay being introduced into each intermediate CLS, in general, CL data flows
729
are handled by exactly one CLS. Therefore the transmission cost for the CL
traffic between gateway pair (gij, ghk) can be defined as
Cd(gij, ghk) = Mgij, ghk) X {d(gij, s) + d(s, ghk)},
where switch node s is a CLS.
The cost of CLS can be defined as the fixed cost to install CLS and the
processing cost of CL traffic that passes CLS. Thus, the cost of CLS s/ can be
written as
Cs(si)=
/ + X ] ^(/'^^)'
where f is the fixed cost to install CLS s/, a is the processing cost per unit flow,
N(si) is the set of all the neighboring nodes of CLS s/, and a(j, Si) is the data
flow of link (/, s/).
The buffer cost for CLS s/ is assumed to be proportional to the buffer size
(memory size) of server Sj, Thus, the buffer cost for Sj can be written as
Cb(s/) = ^m,.,
where ^ is the memory cost per unit size (e.g., 1 Mbits) and rrts- is the buffer
size of server Sj. Similarly, we defined the buffer cost for gateway gj as
Ch(gj) = ymg.,
where y is the memory cost per unit size (e.g., 1 Mbits) and nig. is the buffer
size of server gj. Therefore, the total cost of the CL service is
TC =
J2
(1)
f(M)=
C,i(gii,gkk).
(2)
730
Note that the minimum virtual path connection from gij to ghk via CLS s is
niin{J(^/y,s) + J(s,g^^)}.
^seM
Our goal is to choose a set of M from Vs such that (3) is minimized. This can
be stated as:
PROBLEM 1.
(4)
^gii^ghkeVg
y^
(5)
We can restrict the p-CLS problem to a vertex cover problem by allowing only instances having c(s/, sy) = 1 for all {s/, S/} 55 and having B being
equal to the lower bound of our objective function in Eq. (4), say LBss. The
NP-completeness of the p-CLS problem can be derived from the following
lemmas.
LEMMA 1. A subset M* of Vs, with cardinality p, is a feasible solution
of the p-CLS problem for G'(\^, 55) if and only if for each node pair of
731
G there always exists at least one node of M* lying on one of its shortest
paths.
Proof, As we have described at Section II, the lower bound of Eq. (4) can
be reached only if, in the ideal case, for each gateway pair there always exists at
least one CLS on one of its shortest paths. In other words, for each node pair of
G there always exists at least one node of M* lying on one of its shortest paths
if and only if the total transport cost for M*, say T T C M * , is equal to LB5S. Since
LB55 is the lower bound of the value of optimal solution, i.e., T T C M > LB55
for all subsets M of \^, therefore, M* is of size p for which total transport
cost T T C M * is equal to LB55 if and only if M* is an optimal solution of p-CLS
oiG.
LEMMA 2. For each node pair of G there always exists at least one node
of M* lying on one of its shortest paths if and only if for each edge {s/, s/} e Ess
at least one of si and Sj belongs to M",
Proof, =^\ For those adjacent node pairs [si^Sj],, there always exists at
least one node of M* lying on its only shortest path < s/, sy>, indicating that
Si e M* or sy e M*.
4=J For each node pair, its shortest path always contains no less than one
edge. Thus, if for each edge {Si,,Sj} e Ess at least one of s/ and sy belongs to
M*, then for each node pair there always exists at least one node of M* lying
on one of its shortest paths.
THEOREM 1. The p-CLS problem is NP-complete even if the network is
a planar graph of maximum vertex degree 3, all of whose links are of cost 1
and all of whose node pairs have weight 1.
Proof, We shall argue first that p-CLS e NP, Then we shall show that
p-CLS is NP-hard.
First, to show that p-CLS belongs to NP, suppose we are given a graph G =
{Vs, Ess), and integers p and B, The certificate M we choose is the subset of Vg,
The verification algorithms affirm that |M| < /?, and then it checks whether the
inequation (6) is satisfied. This verification can be performed straightforwardly
in polynomial time.
Garey and Johnson have shown that the vertex cover in planar graphs
with maximum degree 3 is NP-complete [27]. Let G(Vs,Ess) be a planar
graph of maximum degree 3, edge length c(si, Sj) = 1 for all {s^, s/} e Ess, and
weight w(sa, Sc) = 1 for all node pairs (s^, Sc) of G, According to Lemma 1 and
Lemma 2, we can conclude that M* is a vertex cover of size p for G, Therefore, the vertex cover problem is a special case of the p-CLS problem, thus
completing the proof.
PROBLEM 2.
mm
McV, .Vg,
732
= JBJ?. I
2Z
VS/GM \
i=N{si)
where f is the fixed cost to install server s^, a is the processing cost per unit
flow, l>i{si) is the set of all the neighboring nodes of server s/, and ^ ( / , s/) is the
data flow of link (/, s^).
Problem 3 is proposed by Eom et al. [34]. Their problem is to find an optimal number of CLSs and their locations subject to maximum delay constraints
and packet loss probability constraints. Given the ATM network topology, the
number of gateways and their locations, and the traffic between each pair of
gateways T = [w(gij, ghk)]uxu, the problem can be stated as:
PROBLEM 3.
min
Yl
(6)
(7)
subject to
Y^
^SiVP{gii,ghk)
Y\
'^'"^'^^'^'^^^^
pip(gif,ghkh
yginghkeVg
(8)
VM CV VVP(g,y,gM) e G,ygij,ghk e Vg
where t{si) is the maximum queuing delay at the buffer of Sj for VP
(gij'> ghk), t(gij) is the maximum queuing delay at the buffer of gateway gij for
VP(g/;, ghk), tnyaxigij, ghk) is the maximum end-to-end packet delay from gij to
ghk, P(si) is the packet loss probability at si for VP(g/;, ghk), P(gij) is the packet
loss probability at gij for V^igij^^ghk), plp(gij,ghk) is the end-to-end packet
loss probability for pair (gij, ghk), and V^igij^fghk) is a virtual path of pair
(gij,
ghk)'
J2
P(^i)-P(gii)-P(ghk)'
Vs,eVP(g//,gM)
ysieW?{gii,ghk)
For a set of CLSs M and a set of VPs, the end-to-end packet delay and loss
probability can be determined. Then we check whether obtained VPs satisfy
733
the given maximum end-to-end packet delay and packet loss probability constraints. Among all feasible M and VPs, a solution with minimum total network
cost can be obtained.
where
^ss = ^
i=l
IZ
12
jeN(si)h=l,h^ikeN{sh)
and
^55 = min
(9)
'Wsi,SheVs.,t^h
Note that the aggregated traffic flow w(si, Sh) between si and Sh is
w(Si,Sh)=
^
^
aeN{si) beN{sh)
Mgia,ghb)'
734
traversed, we find the switch node with the maximum value of v(sa); i.e., find
s* such that
Pis'") = max{v(sa)\sa e Vs).
We say the switch node s* is a candidate node for CLS, and then we remove
the shortest paths passing through s* from P. Repeat the same procedure for
all shortest paths in P and find another switch node for CLS until P becomes
empty or the number of candidate nodes is p.
The greedy algorithm is summarized as follows.
ALGORITHM 1.
Step 1. Apply the Floyd-Warshall algorithm to find all pairs' shortest paths
for G= (Vs^Ess)' Let P be the set that includes the shortest paths for all switch
pairs (S|, s/i,). Set L = 0.
Step 2. If P = 0 or \L\ = p, then stop. Otherwise, traverse every shortest
path in P and compute v(Sa) for each Sa eVs - L. Find s* such that
i/(s*) = max{v(sa)\Sa eVs-
L}.
for s/ = 1 , . . . , 6, s/i, = 1 , . . . , 6 and s/ ^ Sh- The costs of the links are given as
c(Si,Si)
c(Si,S2)
C[SUS^)
c(Si,S4)
c{Sx,Ss)
c{S\,S6)
c[S2,Sx)
C(S2,S2)
^(52,53)
C(S2,S4)
c[Si,Ss)
c[S2,S(,)
C(S5,S2)
^(55,53)
C(S5,S4)
c[Ss,Ss)
ciSs.S^)
c(S6,Si)
c{Se,Si)
C(S6,S3)
C(S6,S4)
c{S6,Ss)
c[S(,,S(,)
0
8
3
0
0
0
8
0
1
5
7
0
3
1
0
2
7
0
0
5
2
0
7
8
0
7
7
7
0
7
0
0
0
8
7
0
735
S3
Sb
hi
hi
hi
h2
Step 1. Apply the Floyd-Warshall algorithm to find the all pairs shortest
paths for the graph shown in Fig. 9. Then
P = { ( S i , S3, S i ) , ( S i , S3), ( S i , S3, S4), ( S i , S3, S5), ( S i , S3, S4, S6)y (S2, S3), ( s i , S3, S4),
te,
55), (S2, S3, S4, S6), (S3, S4), (S3, S5), (S3, S4, Ss), (S4, S5), (S4, S6), (S5, S6)}.
i/(s2) = 4 X 5 = 20,
1/(53) = 4 x 11 = 44,
i/(s4) = 4 X 8 = 32,
v(s5) = 4 X 5 = 20,
i/(s6) = 4 x 5 = 20,
Go to Step 2.
S^^p 2. Traverse every shortest path in P and find
v(s2) = 4 x 1 = 4 , z;(s4) = 4 x 2 = 8, v(s5) = 4 x 3 = 12, v(s6) = 4 x 2 = 8
and then viss) = max{4, 8,12, 8} = 12. Set L = L U {55} = {S3, S5} and P =
{(s4, $6)}' Go to Step 2.
S^^p 2. Traverse every shortest path in P and find
v(s4) = 4 x 1 = 4 , v(s6) = 4 x 1 = 4
and then v(s4) = max{4,4} = 4. Set L = L U {S4} = {S3, S4, $5} and P = 0. Go
to Step 2.
Step 2. Since P = 0 (or \L\ = 3), and then Stop.
The optimal solution for Example 1 is L = {S3, S4, S5}. The switches S3, S4,
and S5 are selected as the CLSs. A virtual shortest path tree rooted at ^i,;^ for
routing CL data traffic is shown in Fig 10.
736
F I G U R E 10
The optimal virtual path connections for all switch pairs are
P = {(Sly 53, S2), ( S i , S3), ( S i , S3, S4), ( S i , S3, S5), ( S i , S3, S4, S6), (S2, S3), (S2, S3, S4),
te,
S5), (S2, S3, S4, S6), (S3, S4), (S3, S5), (S3, S4, Ss), (S4, S5), (S4, Se), (S5, S6)}.
737
(0,0,0,0,0,0)
C>.____^
include
nodeS,
(0,0,1,0,0,0)
(0,0,1,0,1,0))
/ ^
(^
\.
include
nodeS/
exclude
^"~-~--~-,.,iiode 3
exclude
\vnode5
^^
3 J
^ ^
include y ^
node 5 ^ " ^
(0,0,1,0,-1,0)
\v
exclude
^Nnode5
(0,0,-1,0,1,0)^/^
^40,0,-1,0,-1,0)
C5)
(^
include /
node 4 /
\ (0,0,-1,0,0,0)
exclude
\ node 4
include /
node 4 /
exclude
\ node 4
r 6j
include /
node 4 /
exclude
\ node 4
(0,0,1,1,- 1 , 0 } /
r 7j
FIGURE I I
( ^)
while the right branch represents the exclusion of that switch node (not located
as a CLS). Figure 11 represents a state space tree for the network model shown
in Fig. 9. Note that many state space trees are possible for a given problem
formulation. Different trees differ in the order in which decisions are made. In
Fig. 11, we first consider node 3 (the first node obtained by Algorithm 1) and
then consider node 5 (the second node obtained by Algorithm 1). That is, the
order of nodes for consideration is the same as the order of nodes put in set L
in Algorithm 1.
For each node / in the state space tree, we maintain its lower bound lb(/)
and an array ai(Sj), / = 1 , . . . , |\^|, to record their status, where
I, if node sy is included,
1, if node sy is excluded,
0, otherwise (node Sj is in the don't care condition).
where
(Si,Sh)=
M/i
Y^
Y^
Mgiayghb)
aN{si) b^N{sh)
738
JANANDTSAI
switching node, e.g., switch node 3, is set to be included (i.e., set ^1(53) = 1 and
ai(sj) = ao(sj)y V/, / ^ 3). On the other hand, in the right child, switch node 3
is set to be excluded (i.e., set ^2(^3) = 1 and ai(sj) = ao(sj),yj, j ^^ 3). The
following child nodes will be further individually split into two nodes in the
same way, e.g., include switch node 5 and exclude switch node 5. The status of
the other nodes inherits their parent's status (i.e., ai(sk) = CLp{i){sk)', where node /
is a child of p(/)). A node is said to be a leaf node in state space tree if its status
has p number of I's. For example, node 7 in Fig. 11 is a leaf node. In this way
a state-space tree can be generated.
For every nonleaf node, a lower bound lb(/) can be found as follows. If there
always exists at least one unexcluded intermediate switching node on its shortest
paths for all node pairs, then the lower bound lb(/) for node / in state space tree is
just the same as its parent node (e.g., the lower bound of the left child will always
inherit from its parent). Otherwise, the lb(/) for node / should be updated. That
is, if all the intermediate switching nodes along a gateway pair's shortest path
are excluded, another shortest path with an unexcluded node lying on it should
be found instead of the original one and compute its lower bound. For example,
in Fig. 11, node 2 has the status (^2(^1), aiisi)^..., a2{se)) = (0, 0, 1, 0, 0, 0),
then the virtual connection from si to S5 cannot be si, S3, S5 but si, S2, S5. This
is because switch node 3 is excluded, i.e., ^2(^3) = 1- Thus, the shortest path
from Si to Sj with restriction (^/(si), ai(s2),..., ^/(S|y^|)) can be determined by
min{(i(s/, m) + J(m, Sj)[iai(m) ^ - 1 } .
To facilitate the generation of the state space tree, a data structure heap,
called live-node heap, is used to record all live nodes that are waiting to be
branched. The search strategy of this branch-and-bound is least cost first search.
That is, the node, say node x, selected for the next branch is the live node
whose Ib(jc) value is the smallest among all of the nodes in the live-node heap.
(Note that the minimal value is at the top of the heap.) Two child nodes of
node X are then generated. Traversing the state space tree begins from root.
Initially, apply the greedy algorithm (Algorithm 1) and obtain a current optimal value UB. When a complete solution (leaf node y) is reached, compute
its total transmission cost TTC^ and compare TTCy with that of the current optimal value UB. If TTCy < UB then the current optimal solution will
be replaced, i.e., UB ^f- TTC^. If a node v in live-node heap satisfies lb(i/) >
UB, then node v is bounded since further branching from v will not lead to
a better solution. When live-node heap becomes empty, we obtain the optimal solution with optimal value UB. The detailed algorithm is presented as
follows.
ALGORITHM 2.
L , p),
begin
Apply Algorithm 1 to find L and its TTC
UB ^ TTC
Set (ao(si), aoisi),..., ao(s\v,\)) = (0, 0 , . . . , 0) and lb(0) = Zss
live-node heap Q ^^ {0}
while Q / 0 do
begin
739
Q^Q-{k\lh(k)>m,ykeQ}
end
740
at Vr[0] = ^i5 goes to Vr[i]^ and then goes to i/;.[2],..., f = Vrik]- If P is the set of
all simple vi to t/ paths in G, then it is easy to see that every path in P - {pi}
differs from pi in exactly one of the following k ways:
(1) It contains the edges ( r [ l ] , r [ 2 ] ) , . . . , (r[k - l],r[k]) but not (r[0],r[l]).
(2) It contains the edges (r[2], r [ 3 ] , . . . , (r[i^ - 1], r[fe]) but not (r[l], r[2]).
(k) It does not contain the edge
(r[kl],r[k]).
More compactly, for every path p in P - {pi}, there is exactly one ;, 1 <
/ < k, such that p contains the edges (r[/], r[j + 1 ] ) , . . . , (r[k 1], r[k]) but
not(r[/-l],r[/]).
The set of paths P {pi} may be partitioned into k disjoint sets
P^^^,..., P^^) with set P^^^ containing all paths in P - {pi} satisfying condition / above, 1 < j < k.
Let p^^^ be a shortest path in P^^^ and let q be the shortest path
from vi to Vr[j] in the digraph obtained by deleting from G the edges
(^[/ - I]? ^[/D? (^[/jj ^[/ + l])^ 5 (^[^ - Ijj ^W)- Then one readily obtains
p(/) = (J, r [/], r [/ + 1 ] , . . . , r [fe] = w. Let p^^^ have the minimum length among
p^^\ ..., p^^\ Then p^'^ also has the least length amongst all paths in P {p/}
and hence must be a second shortest path. The set P^^^ {p^^^} may now be
partitioned into disjoint subsets using a criterion identical to that used to partition P {pi}. If p^^^ has k edges, then this partitioning results in k disjoint
subsets. We next determine the shortest paths in each of these k subsets. Consider the set Q, which is the union of these k shortest paths and the paths
p^^\ . . . , p^^~^\ . . . , p^^K The path with minimum length in Q is a third shortest path. The corresponding set may be further partitioned. In this way we
may successively generate the vi to i/ paths in nondecreasing order of path
length.
At this point, an example would be instructive. The generation of the vi to
f 6 path of the graph is shown in Fig. 12.
2. The Heuristic Algorithm
741
shortest
path
cost
included
V1,V3,V5,V6
V1,V2,V4,V6
V1,V2,V3,V5,V6
14
V1,V2,V3,V4,V6
V1,V3,V4,V5,V6
F I G U R E 12
16
new path
none
(V5,V6)
(V3,V5) (V5,V6)
none
(V5,V6)
{V3,V5)
(V1,V3)
Vl,V3,V4,V6-9
Vl,V2,V5,V6-12
V1,V2,V3,V5,V6-14
none
(V4,V6)
(V3,V4)(V4,V6)
(V5,V6)
(V4,V5) (V5,V6)
(V3,V4) (V5,V6)
{V1,V3) (V5,V6)
Vl,V2yV4,V6-13
V1,V2,V3,V4,V6-15
(V5,V6)
(V5,V6)
(V2,V5) (V5,V6)
(V3,V5)
{V2,V5) (V3,V5)
(V1,V2)(V3,V5)
(V4,V6)
(V4,V6)
(V2,V4) (V4,V6)
(V3,V4)(V5,V6)
(V2,V4) (V3,V4) (V5,V6)
(VI,V2) (V3,V4) (V5,V6)
(V3,V5) (V5,V6)
(V3,V5) (V5,V6)
(V2,V3)(V3,V5){V5,V6)
(VI, V3)
(V2,V3)(V1,V3)
(V1,V2)(V5,V6)
(V3,V4) (V4,V6)
(V3,V4) (V4,V6)
(V2,V3) (V3,V5) (V5,V6)
(V1,V3)(V5,V6)
(V2,V3) (VI,V3) (V5,V6)
(VI,V2) (VI,V3) (V5,V6)
{V5,V6)
(V5,V6)
(V4,V5) (V5,V6)
(V3,V4)(V4,V5){V5,V6)
(V2,V5)
(V4,V5)
(V3,V4)
(VI,V3)
V1,V3,V4,V6
V1,V2,V5,V6
excluded
(V3,V5)
(V2,V5) (V3,V5)
(V2,V5) (V3,V5)
(V2,V5) (V3,V5)
V1,V3,V4,V5,V6-16
Vl,V2,V4,V5,V6-20
Vl,V2,V3,V4,V5,V6-22
Step 4. Traverse all the pairs' legal paths in the sorted order. Whenever
passing by a switching node s,, add w(gs,t, gu,v) [d(gs,ty Si) + d(s,, g,^)} to z/(s,).
Whenever going through a link (/,;"), subtract x(i, j) from w(gs^f>gu,v)' W the
latter pairs cannot go through their legal shortest paths for the sake of congestion (the residual bandwidth is not enough), then use the algorithm of the
previous subsection to find the legal detouring paths.
Step 5. Select a switching node s* with a maximum value of v(si). Traverse
all those pairs' legal shortest path passed by s*, subtracting b(i, j) from traffic
flow w whenever going through the link (/, / ) .
Step 6. Mark all these pairs and set x(i^ j) to &(/, /) for all links (/, / ) .
Step 7. Go to step 2 for all those unmarked pairs until p CLSs are selected
or all the pairs are marked. If the p CLSs are selected before all the pairs are
marked, the remaining unmarked pairs' legal shortest paths can be searched
like the method described above subsequentially.
742
(10)
m(si) =
Note that if more traffic traverses the switch node s/, it is more preferable to
select Sj, On the other hand, if the cost of the connection from s/ is higher, it
should be avoided. Therefore, the preference is to place the CLS function with
the switch node with the higher node-metric value.
In this phase, the node-metric m(s/) is computed for each node s/. Then sort
m(s/). Si G Vs such that
m(s(i)) >m(s(2)) >"->m(s(|^,|)).
For example, consider an ATM network as given in Example 1. Then Ng =
{Si,S2, . . . 5 S 6 } :
T =
0
4
4
4
4
4
4
0
4
4
4
4
4
4
0
4
4
4
4
4
4
0
4
4
4
4
4
4
0
4
'0
4~
8
4
3
4
and C =
4
0
4
0
0
0
8
0
1
5
7
0
3
1
0
2
7
0
0
5
2
0
7
8
0
7
7
7
0
7
0
0
0
8
7
0
743
Apply the Floyd-Warshall algorithm to find the all pairs' shortest paths as
P = { ( 5 1 , S3, S i ) , ( S i , S3), ( S i , S3, S4), ( S i , S3, S5), ( S i , S3, S4, S6), (S2, S3), (S2, S3, S4),
(S2, S5), (S2, S3, S4, Se), (S3, S4), (S3, S5), (S3, 54, S^), (S4, S5), (S4, 55), (S5, S6)}.
Thus,
E/^2^(^i>^/) ^ 4 + 4 + 4 + 4 + 4 ^ 2 0
E-=2C(VP(,,.,,) " 4 + 3 + 5 + 10 + 13 - 35In this way, we obtain m(s2) = f|, ni(s3) = | | , m{s^) = | j , miss) = |^, m(s6) =
I I . Sort m(s/), / = 1 , 2 , . . . , 6. We have the sequence m(s3), m(s4), MS2), m(si),
m(s5),m(s6).
2. Clustering Phase
In the clustering phase, there are three steps (i.e., clustering, center finding,
and cost computing) in an iteration and | Vs \ iterations in the algorithm. The
steps in iteration / can be described as follows.
a. Clustering Step (The Switch Nodes are Divided into i Clusters.)
The / nodes S(i), S(2),..., S(/) with the highest / node metrics are selected.
Let S = {s(i), S(2),..., S(/)} and R= Vs S. Initially, set / clusters as S(s{k)) =
{s(^)},^ = 1 , . . . , / . For each node Sr e R^ find / shortest paths VP(5^,s(i))5
VP(sS(2))5 . . , VP(s,,5(,)) from Sr to S(i), S(2),..., S(/), respectively, and then node Sj
such that C(VP*,^,,^)) = min{C(VP(,,,,,,)), C(VP(ss(,))), , C(VP(,,,,^,))}. Thus,
we assign node Sr to cluster S(sj) i.e., S(sj) = S(sj) U [Sr). This is because switch
node Sr has a minimum shortest path to s/. By assigning all nodes in jR, we have
/ clusters S(s(i)), S(s(2)),..., S(s(/)) and each cluster has at least one node.
For example, consider iteration 2 ( / = 2). Then S = {S3,S4} and R =
{si,S2,S5,S6}. For si, find the shortest paths VP(si,s3) = (si, S3) and
VP(si,s4) = (si,S3,S4) and compute their costs. We have C(VP(5i,s3)) = 3,
and C(VP(si,s4)) = 5. Thus, set S(s3) = {53} U {si}. In this way, we obtain 2
clusters, Sfe) = {si,S2,S3} and S(s4) = {S4,S5,S6}.
b. Center-Finding Step (For Each Cluster, a Switch
Node is Chosen as a CLS.)
For each cluster S(s(^)), we consider every node in the cluster is a candidate
for CLS. Thus, for each s e S(s()^)), we find the shortest paths from s to all the
nodes in S(s()^))\{s} and sum up all shortest paths' cost as its weight. A node in
S(s{k)) with the least weight is selected as a CLS. Thus, a configuration with /
CLSs is constructed.
For example, in iteration / = 2, two clusters, S(s3) = {si, S2, S3} and S(s4) =
{S4, S5, S6}, are obtained. By computing all pairs' shortest paths for cluster S(s3),
node S3 is chosen as a CLS. Similarly, node $5 is selected as a CLS for cluster
S(s4). Figure 13 shows the CLS locations for two clusters.
744
Cluster 2
Cluster 1
Note that in the previous step a configuration / is found such that local transmission cost is minimized. In this step the total cost will be computed. We assume that / CLSs are fuUymeshed by connecting every two CLSs
using their shortest path. Thus, a virtual topology for an ATM network is
defined as:
1. Every cluster forms a rooted tree with a CLS as its root.
2. The CLSs are fully meshed.
For example, configuration / = 2 is shown as in Fig. 14.
Under this virtual topology, there exists a unique path between any gateway pair (gij^ghk)' Then the total cost for configuration / can be determined as
^gii.ghkeVg
Vs,e5
where Cci(g/;, ghk) is the transmission cost for gateway pair (g/y, ghk) and Q(s^)
is the cost of CLS s/.
745
746
Destination
0-
92,1
F I G U R E 15
94.1
(g2,iy g4,i) from g2,i to S3 and from S3 to ^44 (i.e., the intermediate endpoint is
on S3). The allocated bandwidth for them is proportional to w(g2^i, ^4,1).
If the flow passed by more than one intermediate CLS, then it can
have more than one intermediate endpoint. For example, the CL traffic flow
from g2,i to g6,i will go through the virtual path {(^2,1,52), (s2,S3), (S3,sii),
(siiySy), (S7, s^), (S6,g6,i)} with two CLSs, i.e., S3 and sy, lying on it. In this
case, we have three choices to set up the VPCs, as shown in Fig. 16. The first
choice is to have both S3 and sy as the intermediate endpoints, the second choice
is to only have S3, and the third is to have sy. Since the last two choices have
the same effect, we can consider it has only two alternatives: have one or more
than one CLS as the intermediate endpoint.
The first alternative (having only one CLS as the intermediate endpoint) has
the advantage of less delay but is defective is scalability. On the other hand, the
second alternative (having more than one CLS as the intermediate endpoints)
will have the benefits of bandwidth efficiency and scalability (a side effect of
statistical multiplexing) but will suffer from more delay. To take scalability into
Destination
choice (1)
Source
CLS
CLS
Destination
choice (2)
Source
CLS
CLS
Destination
[93(sT)(sT)Q(J)(sj)1^
choice (3)
F I G U R E 16
747
account is to consider the limited number of VPIs an ATM switch can provide.
The size of a VPI field in a UNI cell header is only 8 bits (12 bits for NNI).
An example of this limitation occurs in an 8-CLS network where 16 nodes are
connected to a switch S. Since VP is unidirectional, the number of VPs going
through S may be up to 2 x 16 x 8 = 256, which is the maximum available
VPIs without any one remaining for CO traffic flow.
The tradeoffs are very similar to that between the direct and indirect approaches for CL data service involving balancing efficiency with delay. The
problem can be solved easily by a hybrid approach for setting up the proper
VPs. We can choose a threshold carefully such that the delay on CLSs can be
reduced as much as possible. If the intensity of traffic flow is greater than the
predefined threshold, then we can adopt the first alternative; otherwise, we can
adopt the latter one. In the case of having only one CLS as the intermediate
endpoint, for the sake of load balancing, we prefer to select the one with the
less number of endpoints on it. On the other hand, a maximum value on the
number of CLS hops for each VPC should also be restricted to satisfy the delay
constraint.
First, we chose a threshold T carefully for the bandwidth allocation. If
w(gs^t,gu,v) > T then we adopt the first alternative; otherwise, we adopt the
later one. Second, we concentrate all the VPCs with the same endpoints to one
VPC. Since the bursty nature of CL data traffic may cause underutilization of
the bandwidth allocated to VPCs, we can preallocate less bandwidth, say, 70
or 80%, for the aggregated flows of those original VPCs. If a burst exceeds the
preallocated bandwidth at a front endpoint, it is buffered in the CLS's memory
for a short term. When the queue length or data rate exceeds a certain threshold
(e.g., in the rush hours [16]), bandwidth is renegotiated, to add a given increment to the preallocated one. An algorithm should be well defined to request
additional increments while progressively large queue depths or data rates are
reached. Bandwidth is later released while utilization goes below a specified
threshold. Consequently, in addition to the efficient VPI usage, the hybrid approach will also result in a higher utilization and flexibility. The appropriate
choice of T depends critically on the traffic characteristic. Thus, the parameters
must be tuned to the particular environment.
By backtracking the virtual path connection for each gateway pair
{ga,b'>gc,d)'> the CLSs M[l], M [ 2 ] , . . . , M[n] consecutively being passed by the
VPC of {ga,b'> gc,d) can be obtained. Let <^ /, / ^ denote the VP for the shortest
path from / to /. The following pseudocode illustrates how to layout a virtual
overlay network.
0
1
2
3
4
5
6
7
8
748
9
10
11
12
13
14
0
1
2
3
4
5
6
7
749
REFERENCES
1. CCITT Recommendation 1.211, B-ISDN service aspects, CCITT SGXVIII, TD42,1992.
2. CCITT Recommendation 1.364, Support of broadband connectionless data service on B-ISDN,
CCITT SGXVIII, TD58,1992.
3. lisaku, S., and Ishikura, M. ATM network architecture for supporting the connectionless
service. In IEEE INFOCOM, pp. 796-802,1990.
4. Ujihashi, Y., Shikama, T , Watanabe, K., and Aoyama S. An architecture for connectionless
data service in B-ISDN. In IEEE GLOBECOM, pp. 1739-1743,1990.
5. Kawasaki, T , Kuroyanagi, S., Takechi, R., and Hajikano, K. A study on high speed data
communication system using ATM. In IEEE GLOBECOM, pp. 2105-2109, 1991.
6. Fioretto, G., Demaria, T , Vaglio, R., Forcina, A., and Moro, T. Connectionless service handling
within B-ISDN. In IEEE GLOBECOM, pp. 217-222, 1991.
7. Landegem, T. V., and Peschi R. Managing a connectionless virtual overlay network on top of
an ATM Network. In IEEE ICQ pp. 988-992, 1991.
8. Box, D. R, Hong, D. P., and Suda, T. Architecture and design of connectionless data service for
a public ATM network. In IEEE INFOCOM, pp. 722-731, 1993.
9. Cherbonnier, J., Boudec, J.-Y. L., and Truong, H. L. ATM direct connectionless service. In ICC,
pp. 1859-1863,1993.
10. Sistla, S., Materna, B., Petruk, M., and Gort, J. E. Bandwidth management for connectionless
services in a B-ISDN environment. In ITS, pp. 21.1.1-21.1.6, 1990.
11. Lai, W. S. Broadband ISDN bandwidth efficiency in the support of ISO connectionless network
service. In ICC, p. 31.4.1,1991.
12. Gerla, M., Tai, T. Y, and Gallassi, G. Internetting LAN's and MAN's to B-ISDN's for connectionless traffic support. IEEE J. Selected Area Commun. 11(8): 1145-1159,1992.
13. Kariv, O., and Hakimi, S. L. An algorithmic approach to network location problems. Part 2:
The p-medians. SIAM], AppL Math. 37(3): 539-560,1979.
INTEGRATING DATABASES,
DATA COMMUNICATION, AND
ARTIFICIAL INTELLIGENCE
FOR APPLICATIONS IN
SYSTEMS MONITORING AND
SAFETY PROBLEMS
PAOLO SALVANESCHI
MARCO LAZZARI
ISMES, via Pastrengo 9, 24068 Seriate BG, Italy
Ten years ago our group started applying AI techniques to improve the ability of automatic systems to provide engineering interpretations of data streams
failure [...] are still extremely difficult or inaccessible for analytical modeling" and "in most cases, computation of the overall probability of failure of a given dam would require so many assumptions, affected by such a high degree of uncertainty, that the final figure would not be of any practical value for project design and a judgement of its safety" [1]. This implies that a safety assessment must fuse numerical analysis, probabilistic assessment, and engineering judgement based on experience.
AI concepts and technologies can assist engineers in safety management by providing additional components to the existing information system, such as real-time interpretation systems linked to the data acquisition units and intelligent databases supporting the off-line analysis [2-4].
It is important to consider the uncertainty involved in any decision about
technological systems; these problems may be regarded as cases of incompleteness, arising from the nature of the systems to be assessed, since there is no
guarantee that all possible behaviors can be identified. This incompleteness is
related to the distinction between open-world and closed-world assumptions.
In civil engineering systems, the closed-world assumption that all possible behaviors can be identified and modeled is useful in only very restricted domains.
It is important to acknowledge that unforeseen events may occur. In the
case of dam safety, for example, the worst problem at a dam may often not
be the dam itself but a lack of knowledge thereof [5]. The literature is rich in
examples of failures resulting from unexpected combinations of causes, some of
which had not been previously envisaged. The failure of the Malpasset dam was
an exemplary case, where an unforeseen combination of the uplift pressure on
the dam, due to the reservoir, and some geological structures in the foundation
generated a collapse and subsequent flood, devastating the downstream valley
and killing about 400 people.
The more a structure is investigated, the more likely it is that a potential or actual problem will be discovered.
In general, the need for monitoring systems increases with the degree of
uncertainty associated with properties of materials and structures, and with
the importance and potential risk of the system to be controlled. However,
an abundance of data coming from monitoring systems, inspections, and tests
raises its own problems in terms of evaluation, interpretation, and management.
The first problem regards the availability and cost of the expertise required for
the interpretation of the data. Data are only as effective as the designer's or
owner's ability to interpret them and respond accordingly. The second problem
arises from the volume of data, which can be too large to be easily handled.
Knowledge-based systems (KBS) may assist with such problems in the general context of decision support for safety management.
C. Technical Environments for Safety Management
Figure 1 shows a typical socio-technical environment for safety management,
based on our experience on dam safety management: a data-acquisition system
can get data from monitoring instruments installed on a dam; it communicates
with an interpretation system which may provide on-line evaluation of the data,
and with a local or remote archive for storing the data; these archives may be
FIGURE 1 A typical socio-technical environment for safety management: data acquisition (Indaco), interpretation (Mistral, Kaleidos, Eydenet), the measurement database (Midas), databases of documents and tests, and the Damsafe and DoDi systems.
UNIX, VMS, and DOS environments. INDACO, a PC-based system, has been
installed at 35 sites for dam monitoring.
The coupling of these systems has limits at two different levels, corresponding to two different kinds of user requirements for improvement, which we have addressed by exploiting artificial intelligence techniques: the local level (management of warnings) and the central level (periodical safety evaluation or deep analysis on demand).
At the local level, monitoring systems currently available enable two kinds
of checks on the values gathered by each instrument:
comparison of the measured quantity and of its rate of change with
preset threshold values;
comparison of the measured quantity with the corresponding value
computed by a reference model.
Therefore, these checks deal neither with more than one instrument at a
time nor with more than one reading at a time for each instrument. In addition, any behavior (either of the structure or of the instruments) not consistent
with the reference model generates a warning message. Because of the limited
interpretation skills of the on-site personnel, false alarms cannot be identified
and, hence, they require expert attention.
At this level, AI may contribute by collecting the expert knowledge related to data interpretation and delivering it through a system linked to the existing monitoring system. The system can evaluate the measurements and classify and filter the anomalies by using different types of knowledge (e.g., geometrical and physical relationships). It can take into account the whole set of measurements to identify the state of the dam and to explain it. This enables on-line performance of part of the expert interpretation, which reduces the requests for expert intervention and the associated costs and delays, and increases confidence in the safety of the dam.
At the central level, the existing system contains quantitative data coming from the monitoring systems. Nevertheless, the safety evaluation also requires additional types of information (qualitative data coming from visual inspections, results of experimental tests, design drawings, old interpretations) and different types of knowledge (empirical relationships, numerical models, qualitative models), related to different areas of expertise (structural behavior, hydrology, geology).
As a consequence, a supporting system is needed to help people manage
such heterogeneous data and the complexity of their evaluation. We have therefore exploited artificial intelligence techniques that can provide new ways of
modeling the behaviors of the physical systems (e.g., a qualitative causal net
of possible physical processes). This modeling approach is a useful way to
integrate different types of knowledge by providing a global scenario to understand the physical system behavior. Moreover AI may be helpful in collecting
and formalizing different types of knowledge from different experts for data
interpretation and evaluation of the dam state.
To improve the capabilities of the information system in accordance with
the above-stated requirements, we developed two systems using AI concepts
and technologies.
Users can be warned of alarm situations via the display of INDACO, and
may draw diagrams of the quantities acquired, as well as of their time trends
(yearly, monthly, or daily).
INDACO runs on personal computers, and several secondary storage devices, from removable devices to LAN servers, can be connected to it to store
data. Moreover, it can provide data via a standard telephone line, wide area
network, or satellite or radio link. In such a way, the acquisition system can
be integrated with other data processing systems, which exploit INDACO as a
continuous data source.
INDACO has been installed at several dozen sites, where it provides its support in several different situations. First of all, it is used as a real-time alarm system for early detection of anomalous phenomena: as soon as an instrument reads an anomalous value, users are informed by the acquisition system and can react appropriately.
Its archive, on the other hand, is used to provide data for off-line analysis and to support expert interpretation of the physical processes going on. Finally, the availability of communication facilities allows the coordinated management of distributed data from a remote control center: in this way, data collected by instruments located over a wide area may be transferred to a single master station, where they can be jointly used by safety managers.
III. ADDING INTELLIGENCE TO MONITORING
We have discussed above the problems arising from the limits of real-time
acquisition systems, and we have introduced the requirements that led to
the development of Mistral, a decision support system aimed at interpreting data coming from acquisition systems of dams. From that original system, we have derived similar tools for monuments (Kaleidos) and landslides
(Eydenet).
In fact, the Mistral system is packaged in a way that enables us to deliver
solutions for specific problems. A so-called solution for specific business needs
is composed of:
a context of application providing a set of requirements;
a collection of software components that can be used as building blocks
to be adapted and integrated into a software product;
a technological environment (hardware and system software
environment, common software architecture, standard communication
mechanisms) that enables the integration of the existing components;
and
the ability to deliver adaptation and integration services based on a
specific know-how both of software technology and engineering
solutions.
The context of application is the management of the safety of structures.
Data about structural behavior are collected through tests and visual inspections, while automatic instrumentation and data-acquisition systems are used
for real-time monitoring. The interpretation of such data is not easy owing to
different factors, such as the large amount of data, the uncertainty and incompleteness of information, the need for engineering judgement, knowledge of the
by safety experts on the grounds of their knowledge of the structure and of past
measurements, the latter are purely regarded as instrumental thresholds, used
to verify the correct behavior of the instruments.
Empirical and model-based interpretation. Essentially, the interpretation
is a process of evidential reasoning, which transforms data states into states of
the physical system and interprets them in terms of alarm states. The interpretation may be based on three types of knowledge [7]:
1. Empirical knowledge. States of sensors (derived from comparisons between measurements and thresholds) may be combined through sets of rules. The rules may be applied to groups of instruments of the same type, providing a global index that expresses an anxiety level based on past empirical experience. Rules may also be applied to sensors of different types belonging to the same part of the physical system (e.g., a dam block). This may suggest the existence of dangerous phenomena in that part of the physical system (a minimal sketch of such a rule-based combination follows this list).
2. Qualitative models. Qualitative relationships between measured variables may be evaluated to provide evidence of physical processes (e.g., excessive rotation of a dam block).
3. Quantitative models. Quantitative models may be used, in cooperation
with thresholds, to provide states of sensors (e.g., relations between cause and
effect variables based on statistical processing of past data or deterministic
modeling approaches).
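As a minimal sketch of the empirical layer, not taken from Mistral itself, the following Python fragment combines threshold-derived sensor states into a group index and into a cross-sensor block warning; the instrument groups, thresholds, and weights are invented for illustration.

# Minimal sketch (hypothetical names, weights, and thresholds): combine
# per-sensor alarm states, derived from threshold checks, into an
# empirical "anxiety" index for a group of instruments.

# State of a sensor: 0 = normal, 1 = warning, 2 = anomaly (toy scale).
def sensor_state(value, warn_threshold, alarm_threshold):
    if value >= alarm_threshold:
        return 2
    if value >= warn_threshold:
        return 1
    return 0

# Empirical rule applied to a group of instruments of the same type:
# a weighted combination of states, where weights express the assumed
# reliability/significance of each instrument.
def group_index(states, weights):
    return sum(w * s for s, w in zip(states, weights)) / sum(weights)

# Rule applied to sensors of different types on the same dam block:
# the block is flagged if plumb lines and piezometers are jointly high.
def block_warning(plumb_index, piezo_index):
    return plumb_index > 1.0 and piezo_index > 1.0

if __name__ == "__main__":
    plumb_states = [sensor_state(v, 3.0, 5.0) for v in (2.1, 4.2, 5.6)]
    piezo_states = [sensor_state(v, 10.0, 15.0) for v in (12.0, 16.0)]
    plumb_idx = group_index(plumb_states, [1.0, 0.8, 1.0])
    piezo_idx = group_index(piezo_states, [1.0, 1.0])
    print(plumb_idx, piezo_idx, block_warning(plumb_idx, piezo_idx))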
Case-based interpretation. Analogical reasoning is used to retrieve, given
the qualitative description of the state of a structure, the closest-matching cases
stored in a case base, which can help safety managers to interpret the current
situation. Usually, situations stored in the reference database are those more
deeply investigated by experts, because of some singularities, and are enriched
with experts' comments; therefore, at run-time, the analogical reasoning makes
it possible to understand whether the current situation is similar to a past one,
and the comments of the latter can address the investigation of the current one.
This makes it possible to record and manage special behaviors that cannot be
adequately managed by explicit models.
Two different approaches were adopted for developing these tools. The first approach is based on numerical/symbolic processing: several metrics were defined to compute the similarity of a pair of cases. These metrics span from simple norms within the hyperspace defined by the state indexes to more complex functions that also take into account the gravity (severity) of the cases, in a fashion similar to the law of gravitation. In this way, instead of computing the distance between two situations, their reciprocal attraction is evaluated. The similarity between the situations is then established by checking their attraction against an adequate threshold system. This approach led to the development of a tool (DejaViewer) that encompasses these different metrics. The second approach is based on the application of neural networks to implement a sort of associative memory that makes it possible to compute the similarity of a pair of cases [8].
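The following is a rough sketch of the first approach under our own simplifying assumptions; it is not DejaViewer's actual metric. Two situations are described by a vector of state indexes plus a severity term, and their mutual attraction (severity product divided by squared distance, by analogy with the law of gravitation) is compared against a threshold.

import math

# Sketch of a gravitation-like similarity between two monitoring situations.
# Each case is (state index vector, severity); the formula is an assumption.
def attraction(case_a, case_b, eps=1e-6):
    xa, severity_a = case_a
    xb, severity_b = case_b
    d2 = sum((a - b) ** 2 for a, b in zip(xa, xb)) + eps  # squared distance in index space
    return severity_a * severity_b / d2

def most_similar(current, case_base, threshold):
    scored = [(attraction(current, past), past) for past in case_base]
    scored.sort(key=lambda t: t[0], reverse=True)
    # keep only past cases whose attraction exceeds the threshold
    return [(a, c) for a, c in scored if a >= threshold]

if __name__ == "__main__":
    current = ([0.2, 1.4, 0.9], 1.5)                       # (state indexes, severity)
    base = [([0.1, 1.5, 1.0], 1.2), ([2.0, 0.1, 0.0], 0.5)]
    print(most_similar(current, base, threshold=1.0))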
Empirical interpretation based on neural networks. Within Mistral, deep knowledge is usually codified by numerical algorithms, and qualitative reasoning is implemented by symbolic processing; when heuristic knowledge is involved, the shallow knowledge about the structure and the instrumentation is managed through empirical rules based on the alarm state of single instruments, taking into account their reliability and significance. Such rules are derived from the analysis of a set of exemplary cases, which makes it possible to identify the weights to be given to the parameters used by the rules. This process, time consuming and tedious for the experts, can be adequately dealt with by neural nets. Therefore, neural nets can be developed to perform the empirical evaluation of the data, achieving the same results as the symbolic processors previously used but with reduced development and tuning effort [9].
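A minimal sketch of this idea follows, assuming scikit-learn is available; the exemplary cases and the network size are invented and do not reproduce Mistral's networks or data. A small feed-forward net is trained to map per-instrument alarm states to an expert-assigned global index.

# Sketch: learn an empirical evaluation from exemplary cases instead of
# hand-tuning rule weights. Requires scikit-learn; the cases are invented.
from sklearn.neural_network import MLPRegressor

# Each row: alarm states of four instruments (0 = normal .. 2 = anomaly).
X = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [2, 1, 1, 0],
    [2, 2, 2, 1],
]
# Expert-assigned global index for each exemplary case (1 = normal .. 5 = alarm).
y = [1.0, 1.5, 2.0, 3.5, 5.0]

net = MLPRegressor(hidden_layer_sizes=(8,), solver="lbfgs",
                   max_iter=5000, random_state=0)
net.fit(X, y)

print(net.predict([[0, 1, 0, 0], [2, 2, 1, 1]]))  # estimated global indexes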
Explanation. The result of the interpretation is the identification of the
current state of the structure. From the trace of execution, using knowledge
about the behavior of the monitored structure and the instruments, the explanation module generates natural language messages. They describe the current
state of the structure and the deductions of the system.
Statistical processing. Statistical analysis of data may be performed both to provide evidence for the evaluation process, in order to spot trends in the data (e.g., the measure of a plumb line is constantly increasing), and for periodical syntheses of past evaluations (weekly reports).
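A minimal sketch of such a trend check, assuming NumPy and invented readings: the slope of a least-squares line fitted to the most recent plumb-line samples is compared with a tolerance.

import numpy as np

# Sketch: flag a steadily increasing measurement (e.g., a plumb-line reading)
# by fitting a least-squares line to the recent samples.
def increasing_trend(values, slope_tolerance):
    t = np.arange(len(values), dtype=float)
    slope, _ = np.polyfit(t, np.asarray(values, dtype=float), 1)
    return slope > slope_tolerance, slope

if __name__ == "__main__":
    readings = [4.10, 4.12, 4.15, 4.19, 4.24, 4.30]   # invented plumb-line data
    print(increasing_trend(readings, slope_tolerance=0.02))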
Charting. It is possible to select a situation from the database and show
on the screen its graphic representation.
Reporting. Reports may be extracted from the database, providing
the required detail for various types of users (e.g., safety managers or local
authorities).
GIS-based interaction. A window of the system may interface with a GIS to
exploit its cartographic facilities and offer a powerful representation tool for
navigating through data and exploiting standard features of GISs, such as zooming functions to highlight subareas to be analyzed. This may be useful when
the monitoring instruments are spread over a large area, as with environmental
monitoring (we have exploited a GIS for monitoring landslides).
Man/machine interface. This window-based interface draws on the screen graphical representations of the objects that have been assessed (e.g., the monument and its structural components) and displays them using a color scale based on the object's state, mapping the indexes on the scale from normal to very high anomaly into colors ranging from green to red, and using gray to represent missing information (for instance, malfunctioning instruments).
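The index-to-color mapping can be sketched as follows; the five-step index scale and the intermediate colors are assumptions for illustration, not the product's actual palette.

# Sketch: map a state index (1 = normal .. 5 = very high anomaly) to a color,
# using gray for missing information such as a malfunctioning instrument.
PALETTE = {1: "green", 2: "yellow-green", 3: "yellow", 4: "orange", 5: "red"}

def state_color(index):
    if index is None:                  # missing / malfunctioning instrument
        return "gray"
    return PALETTE[max(1, min(5, round(index)))]

print([state_color(i) for i in (1, 2.4, 5, None)])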
Figure 2 shows how Mistral gives the user information about its interpretation. Small square lights at the top of the screen codify the state of the instruments.
These squares lie on colored strips, whose colors represent the state of families of instruments of the same type, which are interpreted as the activation
states of relevant phenomena, such as movements with reference to plumb lines,
deformations with reference to extensometers, and seepage with reference to
piezometers.
Rectangular lights in the lower part of the screen codify the processes'
activation state, such as rotation or translation of a block. The colors of the
dam blocks (right shoulder, block 8/10, etc.) and of the strip at the bottom of
the screen (global state) summarize the state of the physical system's structural
components and the state of the whole structure.
Interactors are available to get more refined information about the dam
state, by focusing on interesting details. For instance, when a user presses the
gray button "Block 1/2" in Fig. 2, the window shown in Fig. 3 appears on the
FIGURE 2
screen: this window presents the current readings recorded by the instruments
on the block 1/2 (main section of the dam), a picture of the block, the current
state of the instruments (on the right) and of the whole block (on the left), and
an explanation of the current state.
Via the interface, the user can also activate functions, such as print screen,
and access the internal data bases.
FIGURE 3
Site                              Owner or manager                Operational from    No. of sensors
Ridracoli dam                     Romagna Acque                   1992                46
Pavia Cathedral and six towers    Lombardia Regional Authority    1994                120
Pieve di Cadore dam               ENEL                            1996                54
Valtellina landslides             Lombardia Regional Authority    1996                250
Cancano dam                       AEM                             1997                68
Valgrosina dam                    AEM                             1998                96
Fusino dam                        AEM                             1998                12
The first version of Mistral was installed in October 1992 on a PC linked to the monitoring system of the Ridracoli dam and is currently in use. After that, new applications were delivered for the Pieve di Cadore dam
(operated by ENEL, the Italian National Electrical Company) and three dams
operated by the AEM.
This case is interesting because, while in the previous cases the system is
interpreting the data from a single dam and is installed at the warden house
near the dam, in this last case the system is installed at a remote control site and
is designed to progressively host the interpretation of data coming from many
dams. Moreover, the same software is loaded on a portable computer used by
inspection personnel and may be linked to the acquisition system on site to get
data on the fly and provide interpretations.
The system obtained very rapid acceptance by the users. It reduced the effort required for the management of warnings, allowed better exploitation of the power of the monitoring system, and, most importantly, improved the quality of the safety management procedures. Users appreciated both the interpretation capabilities of the system and its man/machine interface, which highlights the system's state in a user-friendly and effective way.
At least two classes of users of the system exist:
dam managers and dam safety experts: they use Mistral as a control
panel, which shows the current state of the structure, and as a decision support
system; the internal dynamic database helps the dam managers understand the
evolution of the dam state, in order to start necessary actions (e.g., inspections,
retrofitting); and
junior safety managers: they use Mistral as a training tool, to understand
past evaluations of the behaviors of a dam.
Another application was delivered for the management of the safety of the
Cathedral of Pavia and of six towers in the same town.
On March 17, 1989, the Civic Tower of Pavia collapsed. After this event, the
Italian Department of Civil Defence required the installation of an automatic
monitoring system with the ability to automatically provide engineering interpretations, linked via radio to a control center, located at the University of Pavia.
The instrumentation installed on the Cathedral and on the towers makes it
possible to acquire the most important measurements on each monument, such
as the opening/closure of significant cracks, displacements, and stresses, and
also cause variables, such as air temperature, solar radiation, and groundwater
level (Fig. 4). In the case of anomalies the system calls by phone the local
authority and ISMES. The system is installed in Pavia and has been operational
since the beginning of 1994 [10].
Finally, the same approach has been applied to the interpretation of data
coming from the monitoring of landslides. In the summer of 1987 a ruinous
FIGURE 4
FIGURE 5
flood affected large areas of the Valtellina (Northern Italy) and caused many
landslides. On July 28 a large mass of rock, estimated to be 34 million m³,
suddenly moved down toward the Adda Valley and destroyed several villages
with casualties. As a consequence, the regional authorities and the Italian
Department of Civil Protection appointed ISMES to develop a hydrogeological
monitoring net (about 1000 sensors).
The net includes devices such as surface strain-gauges, topographical and
geodetic benchmarks, inclinometers, settlement meters, cracking gauges, rain
gauges, snow gauges, and thermometers (Fig. 5, Fig. 6, Fig. 7).
Currently, the data of the most significant instruments of the net (about
250) are processed by an instance of Mistral (called Eydenet), which supports
real-time data interpretation and analysis of data concerning the stability of
the slopes affected by the landslides. The system has been operational since
October 1996 at the Monitoring Centre for the Control of Valtellina, set up
by the Regione Lombardia at Mossini, near Sondrio; it is operated by a senior
geologist and a team of engineers of the Regione Lombardia [11].
FIGURE 6 Map display of the monitoring net (screen shot, in Italian).
FIGURE 7 Map display of the monitoring net (screen shot, in Italian).
With reference to the first aim, we believe that often very sophisticated AI
prototypes fail and are not accepted by users because their functionalities are
difficult to use. AI researchers tend to consider the intelligence of the system
their main target, while we think that our target is the intelligence of the system's
behavior, and this quality strongly depends, for interactive applications, on the
communication skills of the system, that is, on its ease of use.
Therefore, we have sometimes sacrificed clever functions that appeared too hard to use or, even better, we have delivered different releases of the same application, whose complexity, both of reasoning and of use, was progressively higher, so that the users could get accustomed to the system in its simpler version before facing more complex functionalities.
As an example, the system developed for the dam at Ridracoli was initially delivered in a version for a restricted number of instruments and with a
limited set of interpretation procedures, mainly based on the validation procedures of the acquisition system, which were already well known by the users;
we concentrated our attention on the man/machine interaction, and quickly
got the system used by the safety managers. Subsequently, we delivered three
other releases of the system, which incorporated more sophisticated numerical
procedures, evaluation of trends, syntheses of the evaluation results, graphics,
and interpretation algorithms on a larger set of instruments; these improvements were easily accepted by users on the grounds of their previous experience
with the system.
With reference to the second topic, that is, the reliability of the system, we are rather satisfied with Mistral's behavior: in six years of continuous operation,
from the first installation at Ridracoli to the latest release at our fifth site (an
improvement of Eydenet), we had to fix bugs or tune procedures no more than
five times, and extraordinary maintenance interventions are required only when
accidental crashes of the system, the acquisition devices, or the computer nets
occur.
We believe that our development process, based on the ISO 9000 standard,
is the main reason for such achievement, since it provides a very strong and clear
reference framework for deriving correct code from specifications, and facilitates all those project reviews that highlight possible defects of the production
process.
Finally, the third, essential goal was related to the interpretation and filtering capabilities of the system.
In these years the different incarnations of Mistral have evaluated several million measurements; both users and our safety experts have analyzed the system's behavior off-line, and their qualitative judgement is that Mistral works correctly.
More specifically, it never missed any significant situation, nor did it ever create unjustified alarms; it correctly filtered several accidental warnings of the acquisition systems; moreover, it pointed out some situations that were significantly different from what was expected, because the physical structures under evaluation were operated in conditions violating the assumptions of Mistral's models.
Let us have a look at some figures that describe an interesting sequence of
events that were monitored by Mistral at one of our dams. First of all, let us
stress that nothing dangerous ever happened in this period, nor do the figures
show any failure of the socio-technical apparatus that is in charge of safety management: the main point of these figures is that Mistral, even when faced with situations that it is not trained to manage, can provide useful suggestions to safety managers.
We have analyzed a set of 684 acquisitions gathered in 1997 by several dozen instruments, for a total of about 37,000 measurements. This instance of Mistral derives the global state from raw measurements through empirical and model-based reasoning, taking into account numerical and statistical processing; case-based reasoning is not applied, since our customer did not ask us to incorporate this functionality. We have evaluated the global state of the structure according to the values shown in the following table:
Level    Global state                     No. of situations    % of situations
1        normal                           426                  61
2        normal, some local warnings      230                  34
3        normal, some anomalies           20                   3
4        anomalous, warning               8                    1
5        very anomalous, alarm            0                    0
These figures are very different from those usually produced by Mistral,
which are much more concentrated on the first level of the scale.
However, an off-line analysis of the data pointed out that the warnings
were due to an extraordinary combination of events (higher level of the basin,
abnormal air temperatures), which were outside the limits of Mistral's models.
In fact, the result of Mistral's evaluations was, in those situations, a reminder for
the dam managers of the unusual, delicate, but not severe operating conditions.
Finally, if we extract from the whole data set only the measurements gathered in the period when the dam was operated under normal conditions, that is,
according to the limits we considered for tuning Mistral's models, we achieve
the following results:
Level    No. of situations    % of situations
1        416                  96
2        13                   3
3        3                    1
4        0                    0
5        0                    0
overthreshold values. According to our safety experts, such filtering was proper and correct; as a result, the dam managers were not required to react to those situations, with an appreciable reduction of management effort.
On the other hand, when Mistral highlighted some unusual situations, the
off-line analysis proved that its judgement was correct, and those situations,
although not dangerous, were suitable to be carefully taken into consideration.
As a combined result of these behaviors, the dam managers are not distracted by false alarms and can concentrate their attention and efforts on really significant situations. Moreover, the reduced number of stimuli prevents safety managers from discarding alarms because there are too many to deal with.
The structure of the database allows storage of the raw automatic or manual
readings, which are organized in columns and defined as source measurements.
They can then be processed automatically by means of formulas predefined by
the user, to derive the so-called computed values.
The advantages of this method are that the original measurements are preserved unaltered (and available for further processing), while their automatic
processing eliminates possible errors resulting from manual operations. A further advantage is that data storage is accomplished rapidly, since the measurements do not need any preliminary manual processing stage.
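A minimal sketch of this source/computed separation, with invented column names and formulas: the raw readings are left untouched, and derived columns are produced by user-defined formulas evaluated over them.

# Sketch: keep source measurements unaltered and derive "computed values"
# through predefined formulas (column names and formulas are invented).
source = {
    "plumb_raw_mm": [4.10, 4.15, 4.21],
    "temp_c":       [12.0, 12.5, 13.1],
}

# Each computed column is a function of the source columns.
formulas = {
    "plumb_corrected_mm": lambda row: row["plumb_raw_mm"] - 0.01 * (row["temp_c"] - 12.0),
}

def compute(source, formulas):
    n = len(next(iter(source.values())))
    rows = [{k: v[i] for k, v in source.items()} for i in range(n)]
    return {name: [f(r) for r in rows] for name, f in formulas.items()}

print(compute(source, formulas))   # the source columns stay untouched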
Preliminary data control (correlation, Fourier analysis, min/max values,
smoothed theoretical values, etc.) highlights anomalous values (accidental
One-shot design
Redesign of legacy systems
FIGURE 8
3. Communication mechanisms, which take the form of interfacing software components that enable the user to cooperate with the system through an object-oriented man/machine interface.
The whole system can be used in two different ways:
As a diagnostic tool: there is a sequence of operations of the reasoning agents that allows the translation of data into dam states;
As a knowledge integrator: the system facilitates the integration of information about the dam. Drawings, maps, and pictures of the dam form part of the information base.
Several databases are linked to the system: a database of past measurements
of the dam, a database of laboratory and in situ tests, and an archive of documents and cadastral information. The system functions as an integration tool
for different types of knowledge about the dam, such as theory, regulations, and
expert knowledge. In this way the system can be seen as a virtual expert that
reflects the knowledge of many different experts (civil engineers, hydrologists,
geologists, etc.) interviewed during the knowledge-gathering phase.
The structure of Damsafe is based on the object-oriented approach. Different types of knowledge are integrated using a hierarchical model describing the
main components of the system. The hierarchical structure includes two physical world models and three reasoning agents. The models make up the problem
domain, while the reasoning agents contain the knowledge required to reason
about the models. They perform a variety of tasks, the most important being
that of relating the concepts in the data world to those in the dam world.
1. Data World and Dam World
The concepts that constitute the data world are those used by engineers in
discussing and interpreting the data for dam safety. Some of these concepts are
expressed quantitatively, that is, numerically; others are expressed qualitatively.
Within this model are the features of data significant for identifying particular
behaviors and states of the dam system.
The data world contains several objects; each object represents the data
related to a single instrument of the monitoring system. These data are attributes
of the object; they can be time series of instrument readings, as well as details
of the type of variable represented. Features such as peaks, trends, steps and
plateaux, identified in different types of time series, are recorded in this model.
Each object has methods to deal with it (in the object-oriented sense), which
allow the user to access the knowledge linked to the object. In this way one
can read the values of the attributes of the object, or show a time series on the
screen. It is also possible to assign values to attributes; this allows the user to
act directly on the data world, bypassing the filtering of the reasoning agents.
The dam world contains a model of the physical world of the dam and its
environment, concepts describing the possible states of this world, and a set
of concepts modeling its possible behaviors. The physical dam model describes
the dam as a hierarchy of objects (e.g., dam body, foundation). These objects
have attributes that taken as a set, describe the state of the dam. The attributes
The model of the behaviors of the dam and its environment is a set of
processes connected in a causal network that models how the behaviors of the
dam can interlink in a causal way, resulting in various scenarios as one process
leads to another (Fig. 9 shows a fragment of the network).
The full net includes 90 different processes describing possible dam behaviors (e.g., translation, chemical degradation of foundation). We derived this network from published case studies of dam failures and accidents, and from
discussions with experts on dam design and safety. We have included the
conditions under which one process can lead to another, and documented each of these processes along with descriptions of how evidence of these processes might be manifested in the monitoring data or in visual-inspection reports.
FIGURE 9 A fragment of the causal network of processes: heavy rainfall, reservoir impounding/emptying, change in water level, change in water pressure in foundation, change in seepage around dam, deterioration of curtain wall, sealing of basin fines, clogging of rock joints with fines.
The network can be used in different ways:
As a database: Each process has attributes that describe the process itself (e.g., start time, rate of change); among these attributes, the activation
state expresses whether the process is within physiological thresholds or over
them; both users and automatic reasoners may access and modify the attributes'
values.
As a control panel of the system: Each process is represented on the
screen by a box that is highlighted whenever the system infers that the process
is active. Therefore, the highlighted boxes give the user an immediate synthetic
report on the dam's current state. Damsafe represents other attributes besides
the activation state (for example, reversibility, speed) by colored areas of the
box associated to the process.
As an inference tool: Automatic reasoners can use the causal links to build
event paths for simulating the system state's future evolution or identifying the
possible causes of an active process. In the first case the net acts as an oriented
graph, where a causal flow propagates through the net from the cause processes
(e.g., rainfall, earthquake) to the effect processes (e.g., structural cracking) via
activation rules. With these rules, Damsafe sets the state of each process on
the basis of some conditions on its input (the links from its causes) and defines
how each process influences its effects. In the second case, Damsafe applies
consistency rules to chains of processes, using the activation state of some
processes as evidence for the state of those processes that cannot have any
direct evidence from the available data (a minimal sketch of this propagation follows this list).
As a knowledge base: Each process has hypertextual links to its written
documentation that describes the process and its connections to other entities
(processes and objects). Therefore, users can easily access the system's theoretical foundations through the user interface.
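The causal net and its activation rules can be sketched roughly as follows. The process names come from the fragment of Fig. 9, while the link structure, the activation rule, and the evidence handling are our own simplifications, not Damsafe's implementation.

# Rough sketch of a causal net of processes: each process lists its causes;
# a simple activation rule propagates states from processes supported by
# evidence toward their effects.
causes = {
    "heavy rainfall": [],
    "reservoir impounding/emptying": [],
    "deterioration of curtain wall": [],
    "change in water level": ["heavy rainfall", "reservoir impounding/emptying"],
    "change in water pressure in foundation": ["change in water level"],
    "change in seepage around dam": ["change in water pressure in foundation",
                                     "deterioration of curtain wall"],
}

def propagate(evidence):
    """evidence: set of processes directly supported by monitoring data."""
    active = set(evidence)
    changed = True
    while changed:                       # forward propagation along causal links
        changed = False
        for process, cause_list in causes.items():
            # toy activation rule: a process becomes active if any cause is active
            if process not in active and any(c in active for c in cause_list):
                active.add(process)
                changed = True
    return active

print(sorted(propagate({"heavy rainfall"})))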
3. Reasoning Agents
Three reasoning agents have been developed. The first one (extractor) operates solely on the data world to manipulate data and extract features from
the data sets. It uses the graphical interface to show the user a time-series plot and to interactively identify features of the plot considered relevant to dam safety. These features are defined by qualitative and quantitative attributes (e.g., spike
length, start time) and stored within the data world. These attributes can also
be accessed and manipulated through methods of the data world.
The second reasoning agent (mapper) performs the task of interpretation
by identifying the possible behaviors of the dam in terms of a set of processes
in the causal net, and the values of various attributes of the dam, based on
evidence in the data.
This task is performed by firing production rules defined by experts, which
link data values to dam states (Table 1). These links are defined by using a
formal language designed to allow nonprogrammers to easily write and read
rules (Table 2). When a rule is fired, the state of some dam world process is
declared active and some dam world attributes receive a value. The set of active
processes linked in a causal chain is highlighted by the system and describes a
scenario that demonstrates the evolution of the dam behavior.
The third reasoning agent (enforcer) acts on the dam world to extend the
implications of the state identified by the mapper over the model of the dam
and its environment, thus highlighting possible causal chains.
4. On-the-Fly Creation of Objects and Interface
A dam's entire information set might be inserted into the system by providing Damsafe with a description written in a special language, called ADAM.
TABLE 2 A Part of the Grammar of the Rule-Based Language Used
by the Mapper
<ANiceRule> ::
(CONDITION( <Condition> ),
ASSERT( <ListOfDamProcesses> ),
SET( <SetList> ),
MAP( <MapList> ))
<Condition> ::
<ExistentialCondition> | <RelationalCondition>
<ExistentialCondition> ::
<ListOfFeatures>
<ListOfFeatures> ::
<Feature> OF <DataObject> [ OR <ListOfFeatures> ]
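As a purely illustrative example of how a mapper rule might be represented and fired, the following sketch encodes the CONDITION/ASSERT/SET slots of the grammar as plain Python data; the feature, data-object, and process names are hypothetical, and the MAP slot is omitted.

# Sketch of one mapper-style rule (hypothetical names), corresponding roughly to:
#   CONDITION( step OF plumbline_P5 OR step OF plumbline_P7 ),
#   ASSERT( rotation_of_block_1_2 ),
#   SET( rotation_of_block_1_2.start_time = feature.start_time )
rule = {
    "condition": [("step", "plumbline_P5"), ("step", "plumbline_P7")],  # OR-list of features
    "assert": ["rotation_of_block_1_2"],
    "set": lambda feature, dam: dam["rotation_of_block_1_2"].update(
        start_time=feature["start_time"]),
}

def fire(rule, data_world, dam_world):
    for feature_name, obj in rule["condition"]:
        feature = data_world.get(obj, {}).get(feature_name)
        if feature is not None:                       # existential condition met
            for process in rule["assert"]:
                dam_world[process]["active"] = True   # declare dam process active
            rule["set"](feature, dam_world)
            return True
    return False

data_world = {"plumbline_P5": {"step": {"start_time": "1992-10-01"}}}
dam_world = {"rotation_of_block_1_2": {"active": False}}
print(fire(rule, data_world, dam_world), dam_world)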
FIGURE 10 Screen shot of a DoDi Web page (in Italian): the main topics concerning the management of the ENEL dam park are organized hierarchically; a topic index summarizes the hierarchy, and a topic tree guides the user through successive levels of detail.
FIGURE 11 Screen shot of the DoDi page for the Alpe Gera dam (in Italian, with an English summary): the user can query the archives known to DoDi (measurements, documents, synthesis data); the dam is described as a rectilinear concrete gravity dam on the Mallero River, in the upper Valtellina Valley, impounding a 65 million cu.m storage reservoir at the head of a chain of hydroelectric plants.
Using the models the users may access multimedia information like texts,
drawings, images, and data records. For the current implementation simple
integration models have been used. If required, more complex object-oriented
models may be added. For instance, a dam model may describe the dam as a
hierarchy of objects (dam blocks, foundation, instrumentation system, sensors),
each object including links to the available distributed information and methods
to process it.
VI. CONCLUSIONS
We have seen how the integration of different information management technologies can help deal with safety problems. Artificial intelligence techniques
can provide powerful tools for processing information stored in archives of
safety-related data, such as measurements, documents, and test data. Internet
technologies can be successfully coupled with these data processing tools to
distribute data and knowledge among safety managers.
As a result of this process, we have shown a whole chain of software tools
currently being used to address different aspects of the safety management procedures in different fields of engineering.
The main achievements of the joint application of these software techniques consist of the automatic support of those in charge of safety management, a support that reduces the requests for expert intervention and the associated costs and delays, and increases confidence in the safety of the facility under examination.
REFERENCES
1. International Commission on Large Dams. Lessons from Dam Incidents. ICOLD, Paris, 1974.
2. Franck, B. M., and Krauthammer, T. Preliminary safety and risk assessment for existing hydraulic structures - an expert system approach. Report ST-87-05, University of Minnesota, Department of Civil and Mineral Engineering, Institute of Technology, 1987.
3. Grime, D., et al. Hydro Rev. 8, 1988.
4. Comerford, J. B., et al. Dam Engrg. 3(4): 265-275, 1992.
5. National Research Council (U.S.), Committee on the Safety of Existing Dams. Safety of Existing Dams. National Academy Press, Washington, DC, 1983.
6. Anesa, F., et al. Water Power Dam Construct. 10, 1981.
7. Salvaneschi, P., et al. IEEE Expert 11(4): 24-34, 1996.
8. Lazzari, M., et al. Looking for analogies in structural safety management through connectionist associative memories. In IEEE International Workshop on Neural Networks for Identification, Control, Robotics, and Signal/Image Processing (NICROSP '96), pp. 392-400. IEEE Computer Society, Los Alamitos, CA, 1996.
9. Brembilla, L., et al. Structural monitoring through neural nets. In Second Workshop of the European Group for Structural Engineering Applications of Artificial Intelligence (EG-SEA-AI '95), Bergamo, Italy, 1995, pp. 91-92.
10. Lancini, S., et al. Struct. Engrg. Internat. 7(4): 288-291, 1997.
11. Lazzari, M., and Salvaneschi, P. Internat. J. Natural Hazards 20(2-3): 185-195, 1999.
12. Lazzari, M., et al. Integrating object-oriented models and WWW technology to manage the safety of engineering structures. In Third Workshop of the European Group for Structural Engineering Applications of Artificial Intelligence (EG-SEA-AI '96), Glasgow, UK, 1996, pp. 93-96.
I. INTRODUCTION 784
A. Survivability 784
B. Simple Example 785
C. Techniques for Making a Reliable Communication Channel 785
D. Survival Method (Diversification): δ-Reliable Channel 786
E. Restoral Method (Reservation): m-Route Channel 787
F. Flow Network: As a Model of a Communication Network 788
G. Organization of This Chapter 789
II. FLOWS IN A NETWORK 789
A. Preliminaries 789
B. Cut and Mixed Cut 790
C. Definition of Flow 792
D. Maximum Flow 792
E. Cut-Flow 796
F. Matching in a Bipartite Graph 797
III. EDGE-δ-RELIABLE FLOW 800
A. Edge-δ-Reliable Flow 800
B. δ-Reliable Capacity of a Cut 801
C. Maximum Edge-δ-Reliable Flow 803
IV. VERTEX-δ-RELIABLE FLOW 804
A. Vertex-δ-Reliable Flow 805
B. δ-Reliable Capacity of a Mixed Cut 806
C. Maximum Vertex-δ-Reliable Flow 808
V. m-ROUTE FLOW 810
A. Edge-m-Route Flow 810
B. Maximum Edge-m-Route Flow 811
C. Vertex-m-Route Flow 816
D. Maximum Vertex-m-Route Flow 817
VI. SUMMARY 821
REFERENCES 823
WATARU KISHIMOTO
In a database and data communication network system, the network must not be
allowed to fail or the entire system might suffer serious damage. We consider the design
of a reliable data communication channel for reliable data flow in a network.
Network failures cannot be avoided entirely, so we need to design a system that will
survive them. In this chapter, using a flow network model, we consider the design of a
reliable communication channel with well-specified network survivability performance.
There are two approaches for constructing reliable communication channels, stochastic and deterministic. Here, we mainly consider the deterministic approach, in which the
performance in the event of link or node failure is designed to exceed a predetermined
level. We classify this approach as a survival method or a restoral method.
The survival method guarantees a deterministic lower bound on the channel capacity in the event of link or node failure, while the restoral method reserves a restoral
channel. With given network resources, we show methods for determining whether
the required channels exist. Moreover, when such channels do not exist, the methods
determine the locations of the bottlenecks so that we can improve the network resources
efficiently. We consider these problems as reliable-flow problems in a network.
I. INTRODUCTION
A. Survivability
In a database and data communication network system, the network must not
be allowed to fail or the entire system might suffer serious damage [1-4]. In
this chapter we consider the design of a reliable data communication channel
for reliable data flow in a network.
Network failures cannot be avoided entirely, so we need to design a system
that will survive them. With given network resources, this chapter shows methods for determining whether the required channels exist. Moreover, when such
channels do not exist, the methods determine the locations of the bottlenecks
so that we can improve the network resources efficiently [19-21]. We consider
these problems as a reliable flow problem in a network.
Usually, a network is designed to satisfy some specified performance objectives under normal conditions, without explicit consideration of network
survivability. How such a network will perform in the event of a failure is difficult to predict. A distributed database system with two or more host computers
has duplicate databases in different sites. When some data are changed in one
of those computers, that data must be transmitted to the other computers to update them. This system requires reliable communication channels for accurate
data exchange. If all the communication channels between the computers fail,
the system cannot maintain identical versions of the data in them. For such a
system, we need to specify a desirable survivability performance. Setting the objectives of survivability performance will enable us to ensure that, under given
failure scenarios, network performance will not degrade below predetermined
levels. In particular, the database system avoids the worst case, in which
just one network failure causes a system outage. A system outage caused by
a network failure is difficult to repair. With network equipment spread over a
wide area, it takes a long time to locate the failure. Then repairing the failed
equipment takes time too. A reliable communication channel with well-specified
FIGURE 1
FIGURE 2
capacity of the δ-reliable channel and the bottlenecks of the channel are useful
information for the design of the network.
E. Restoral Method (Reservation): m-Route Channel
By the restoral method, we reserve a dedicated channel for a channel or a set of channels. For this aim we may use a multiroute channel, that is, m (m ≥ 2) physically separate communication paths between two offices.
DEFINITION 2. An m-route channel is a set of m communication channels with the same channel capacity, each of which traces along a set of paths
between a specified pair of nodes, such that each pair of paths of the set is
link-disjoint.
FIGURE 3
A link failure.
FIGURE 4
occurs and stops the working channel, we can replace the failed channel with
the reserved channel, which is not affected by the link failure.
In general, for an m-route channel, if we reserve one channel and use the other m - 1 channels to work, a link failure can affect only one of the working channels, and we can replace the failed channel with the reserved channel having the same channel capacity.
Since the m-route channel of Definition 2 consists of link-disjoint paths, the definition is a link version of the m-route channel. As in the case of the link version of the δ-reliable flow, we can define a node version of the m-route channel, which consists of node-disjoint paths. The channel of Fig. 4 is not a 2-route channel of the node version.
There are a variety of uses for m-route channels. For an essential communication channel, the m-route channel is used with m - 1 channels reserved and only one channel working. For more economical use, two channels are reserved, and the other m - 2 working channels use them to restore communication. If there are several failures, this economical method might not restore communication
completely. It is important to find out to what extent communication channels
can be constructed as m-route channels.
F. Flow Network: As a Model of a Communication Network
In this section we describe the modeling of a communication network system by a flow network.
A network can be modeled by a graph in which a capacity c(e) is associated with each edge. Each link in the network, with a specific capacity, is modeled by an edge with that capacity, and each office is modeled by a vertex. Figure 5 shows a flow network corresponding to the network of Fig. 1. A communication channel between two offices corresponds to a flow along a single path between the corresponding vertices in the network. A set of communication channels between two offices corresponds to a single-commodity flow along a set of paths between two vertices. With this correspondence, a δ-reliable flow and an m-route flow are defined in the same way as a δ-reliable channel and an m-route channel, respectively.
For the design of a survivable network, the goal is to establish a δ-reliable channel or an m-route channel between two vertices according to the system requirement. The objective of this chapter is as follows: using a flow network
FIGURE 5
corresponding to the network resources available for the system, we will determine whether a δ-reliable flow or an m-route flow corresponding to the required channel exists. For this purpose we show some methods for evaluating the maximum δ-reliable flow and m-route flow. For the m-route flow we
show the procedure of the synthesis of m-route flows, which are not easy to
obtain and correspond to the assignment of the m-route channel to the data
communication network. If such a flow does not exist, with these methods we
can find the locations of the bottlenecks that limit the flow of the commodity.
Finding the bottlenecks is important in network design, so we can improve the
network resources efficiently.
G. Organization of This Chapter
Section II states the terms of a flow network, which is used as a model of a data
communication network in this chapter. Sections III and IV are devoted to the
survival method, where the δ-reliable flow corresponding to the δ-reliable channel is depicted; in Section III, the edge version of δ-reliable flow is discussed, and in Section IV, the vertex version of δ-reliable flow. In Section V, the restoral
method is taken up and m-route flows of edge version and vertex version, which
correspond to m-route channels, are described. Section VI is the summary of
this chapter.
DEFINITION 3. Let N_a = (G_a, c_a(·)) and N_b = (G_b, c_b(·)) be two networks. If G_b is a subgraph of G_a and c_b(e) ≤ c_a(e) for each edge e of G_b, then N_b is a subnetwork of N_a.
DEFINITION 4. The sum of networks N_1, N_2, ..., N_n is the network N = (G, c(·)) whose graph G has the vertex set
V = V_1 ∪ V_2 ∪ ... ∪ V_n,    (1)
the edge set
E = E_1 ∪ E_2 ∪ ... ∪ E_n,    (2)
and the edge capacities
c(e) = c_1(e) + c_2(e) + ... + c_n(e),    (3)
where c_i(e) = 0 for e ∉ E_i.
DEFINITION 5. The directed sum of N_1, N_2, ..., N_n is N′ = (G, c′(·)), obtained from N, the sum of those networks, as follows: if in N there exist pairs of edges with opposite directions between two vertices, we perform the next operation for each such pair. The smaller capacity of the two edges is subtracted from the greater capacity; the resulting value is assigned to the edge with the greater capacity, and the edge with the smaller capacity is deleted.
FIGURE 6
c(v) = min{c(I_i(v)), c(I_o(v))}.    (4)
When the capacity of a vertex is defined as above, the vertex is in practice the same as a vertex with infinite capacity. From now on, if we say N_b = (G_b, c_b(·)) is a subnetwork of N_a = (G_a, c_a(·)), then besides Definition 3 the following condition should be satisfied:
c_b(v) ≤ c_a(v) for each vertex v in G_b.    (5)
c(V_1, V_2) = Σ c(e) + Σ c(v),    (6)
where the first sum is taken over the set of all edges in (V_1, V_2) and the second over the set of all vertices in (V_1, V_2).
If V_1 ∪ V_2 = V, then the mixed cut (V_1, V_2) coincides with an ordinary cut (V_1, V_2). If s ∈ V_1 and t ∈ V_2, then (V_1, V_2) is an s-t mixed cut. A cut and a mixed cut divide a network into two parts.
For the network N_A = (G_A, c_A(·)) of Fig. 7, let V_1 = {s, v_c}. Then (V_1, V̄_1) is a cut consisting of edges (s, v_a), (s, v_b), (v_c, v_b), (v_c, t), and c(V_1, V̄_1) = 12. Let V_2 = {v_a, t}. Then (V_1, V_2) is a mixed cut consisting of edges (s, v_a), (v_c, t) and a vertex v_b, with c(V_1, V_2) = 12.
C. Definition of Flow
A flow from s to t has two types of definitions; one is defined as a function f(·) from the edge set of a network to nonnegative real numbers [32], and the other is Definition 11, which is used here.
DEFINITION 10 [25]. A directed s-t path-flow P = (G_P, c_P(·)) with a path-flow value of λ is a uniform network of value λ such that G_P is a directed path from s to t.
DEFINITION 11 [25]. A directed flow from s to t is the network F = (G_F, c_F(·)) = "directed sum of directed s-t path-flows P_1, P_2, ..., P_n." The value of F is the sum of the values of those path-flows.
FIGURE 8
(7)
(8)
and remove e.
ALGORITHM 1. For a given network N and a pair of vertices s and t, we find the maximum flow from s to t and a minimum s-t cut.
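The detailed steps of Algorithm 1 are not reproduced in this excerpt. As a stand-in, the following sketch uses the standard shortest-augmenting-path (Edmonds-Karp) computation of a maximum flow, reading a minimum s-t cut off the final residual network; the example capacities are invented and do not correspond to N_A of Fig. 7.

from collections import deque

def max_flow_min_cut(cap, s, t):
    """cap: dict {u: {v: capacity}}. Returns (flow value, list of cut edges)."""
    # Build residual capacities (adding reverse entries with capacity 0).
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual network.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        # Find the bottleneck along the path and augment.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
        flow += bottleneck
    # Vertices reachable from s in the final residual network define a minimum cut.
    reachable = set(parent)
    cut = [(u, v) for u in reachable for v in cap.get(u, {}) if v not in reachable]
    return flow, cut

capacities = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}
print(max_flow_min_cut(capacities, "s", "t"))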
FIGURE 9
E. Cut-Flow
Until now we have discussed ordinary flows, in which the capacity of a cut can be defined as the sum of the capacities of the edges in the cut. Next we introduce a cut-flow for a cut in a network. Using this concept we can define the capacity of a cut for a flow with a specified reliability against edge or node failure.
DEFINITION 13 [25]. Let (V_1, V̄_1) be a cut of N, and F_c = (G_f, c_f(·)) be a subnetwork of N such that each edge of G_f is an edge in (V_1, V̄_1). Then F_c is a cut-flow of (V_1, V̄_1), and the value of F_c is c_f(V_1, V̄_1).
Since F_c = (G_f, c_f(·)) is a subnetwork of N, c_f(e) ≤ c(e) for each edge e in G_f. The "maximum value of cut-flows of (V_1, V̄_1)" = c(V_1, V̄_1). A cut-flow of (V_1, V̄_1) is a flow between a pair of vertex sets, instead of between a single pair of vertices.
DEFINITION 14 [25]. Let (V_1, V_2) be a mixed cut of N, and F_c = (G_c, c_c(·)) be a subnetwork of N such that each edge of G_c is an edge in (V_1, V_2) and each vertex of G_c is a vertex in (V_1, V_2) or an endpoint of an edge in (V_1, V_2). Then F_c is a cut-flow of (V_1, V_2) in N and its value is c_c(V_1, V_2). For a cut-flow of a mixed cut, Eq. (4) of Definition 8 may not be satisfied.
The "maximum value of cut-flows of a mixed cut (V_1, V_2)" = c(V_1, V_2).
Let V_1 = {s, v_c}, V_2 = {v_a, t}. The mixed cut (V_1, V_2) consists of edges (s, v_a) and (v_c, t) and a vertex v_b. The cut-flow F_0 = (G_0, c_0(·)) of Fig. 13 is an example of a cut-flow of the mixed cut (V_1, V_2) in N_A = (G_A, c_A(·)) of Fig. 7. Since c_A(v_b) = 10, the capacity c_0(v_b) (= 6) satisfies c_0(v_b) ≤ c_A(v_b). However, c_0(v_b) does not satisfy Eq. (4) of Definition 8 in F_0.
A cut or a mixed cut divides a network into two parts. A cut-flow is a flow between those two parts, that is, a flow from one part of the network to the other part of the network. Obviously, the part of a flow in a network corresponding to a cut, where the capacities of edges and vertices are the same as those of the flow, is a cut-flow of the cut. By Theorem 1 (the max-flow min-cut theorem), the "minimum capacity of s-t cuts in N" is the bottleneck of the "maximum flow from s to t," which corresponds to the maximum channel capacity between two nodes. Therefore, in the succeeding sections we use cuts for specifying the bottleneck of a flow that corresponds to a channel for the survival method or restoral method.
F. Matching in a Bipartite Graph
For an undirected graph, the degree of each vertex v is the number of edges incident to v. An undirected graph is said to be regular of degree n if the degree has the same value n at each of the graph's vertices. An n-factor of an undirected graph is a spanning regular subgraph of degree n.
DEFINITION 15 [25]. A 1-factor network of value μ of an undirected network N is a uniform subnetwork of value μ such that the graph of the subnetwork is a 1-factor of the graph of the network.
798
WATARU KISHIMOTO
A graph Gb = (X^, Y^;^) of Fig. 14 is a simple example of a bipartite graph, where Xb = {^i, uj] and Yb = {w/i, W2}. In G^, edges (^i, w/i) and
(^25 ^2) form a matching with the maximum number of edges.
There are many methods for finding a matching with the maximum number of edges in a bipartite graph. We show a method using an algorithm for
obtaining the maximum flow in a network.
ALGORITHM 3. Let G == (X, Y; ) be a bipartite graph. We will find a
matching of G with the maximum number of edges.
1. For all edges (z/, w) in G where v e Xand w eY^ assign a direction from
V to w,
2. Add two vertices s and t to G. Connect s and each vertex v in X with a
directed edge (s, f). Connect t and each vertex w in Y with a directed edge (w, t),
3. Assign all edges the edge capacity 1. Let the resulting network be N^.
4. Find the maximum flow from s to ^ in N^. From the integrity theorem
of flow network theory, in such a network N^ with integer edge capacities, the
maximum flow value is an integer, and there is a maximum flow in which the
flow value at each edge is an integer. Using Algorithm 1 we can find such a
maximum flow Fm in N^.
5. In G the edges corresponding to the edges with flow value 1 in N ^ form
a maximum matching.
(end of algorithm)
Applying Algorithm 3 to bipartite graph Go of Fig. 14, we get network N ^
of Fig. 15.
Then we can find the maximum flow F^ from s to f in N ^ depicted in
Fig. 16. From this maximum flow we can get a maximum matching consisting
of (wi, wi), (u2, wi) in Go.
DEFINITION 17. An undirected network N is even of value k if each incidence cut of its vertices has the same capacity X,
1
FIGURE 15 Network Nm
799
FIGURE 16 FlowF^
800
WATARU KISHIMOTO
2
Ui)
(W.
Uo]
(W.
Ui^
(W'
3. 0< 5 < 1.
Notation
Ce:8 {Vi, V/)
a
vl/i/)
In this and the next sections we consider 5-reliable flows for the survival
method. First, in this section, we consider the case of the edge version of
5-rehable flow.
A. Edge-6-Rellable Flow
DEFINITION
18
Let a flow F =
(Gp,Cf(*)).If
CF(e)<8f
/ = value of F,
(9)
801
Let a cut-flow Fc =
(10)
802
WATARU KISHIMOTO
FIGURE 20
Let the edges of cut (Vi, \^^) in N = (G, c(*)) be ei, ei,-..,
(12)
Let P = (G',c:'(*))bea network such that G' consists of edges of (Vi, V{^)
in N = (G, c:(*)). Then F is an edge-(5-reUable cut-flow of (Vi, V^) if and only if
c^(ej) < mm{c(ej)^80},
(13)
A necessary and sufficient condition for a value 6 to be the value of an edge-(5reliable flow is
0 <^mm{c(ek),8e}.
(14)
k=i
The next method for obtaining the edge-5-reliable capacity of a cut is described. Theorem 2 shows that Algorithm 5 is correct.
ALGORITHM 5 (EVALUATION O F EDGE-5-RELIABLE CAPACITY OF A CUT).
1. The sequence Oj, for 1 < / < a, is the 0 sequence of (Vi, \^^) defined as
(15)
/-I
^(j)\c{VuV^}-Y,c(ek)l
(16)
803
It can be shown that ^T- = 0 if and only ii q < a. The q < a impHes that a
5-reliable cut-flow requires that capacity be distributed over more edges than
are in the cut, so the edge-5-reliable cut-capacity must be zero. The next lemma
shows a property of the 0 sequence.
LEMMA 2.
(18)
(19)
3. Then
80i = 0.6'12 =
7.8<8(=c(ei)),
804
WATARU KISHIMOTO
,^^.
(end of procedure)
When the procedure stops, /ik is the maximum value of edge-(5-reliable flows
from s to t. Then the minimum s-t cut in Nk is also the minimum s-t cut for
edge-5-reliable flows in N.
(end of algorithm)
THEOREM
flow.
805
A. Vertex-^-Reliable Flow
DEFINITION
21.
(21)
FIGURE 22 AflowF3.
(22)
806
WATARU KISHIMOTO
FIGURE 23 AfiowF4.
An edge failure or vertex failure causes at most the fraction 5 of a vertex(5-reliable flow to fail. Therefore, at least (1 5) (value of the flow) always
survives the failure of an edge or vertex. The flow P4 of Fig. 23 is an example of
vertex-0.6-reliable flow. Each "capacity of the edge in ^4" < 3(= 0.6 5). The
capacities of vertices Va, Vb, Vc are 1, 3, 3. Each "capacity of the vertex in F4
except s and f < 3. Therefore, F4 is a vertex-0.6-reliable flow.
B. 6-Reliable Capacity of a Mixed Cut
Although the max-flow min-cut theorem of the edge-5-reliable flow has been
proved for ordinary cuts, the theorem for the vertex-(5-reliable flow is considered
for mixed cuts.
DEFINITION 22. Let Fc = (G^, cd^)) be a cut-flow of a mixed cut {Vi, V2)
in N. Let fc be the value of f^. If
(23)
Cc(v) <8'fc.
(24)
807
c{VuV2) = J2^f,
(25)
;=i
(26)
q-l).
<)=VI,(;).^^,
k=j
/-I
= Vl/(;)
(27)
^(Vl,V2)-^^^
such that
(28)
808
WATARU KISHIMOTO
1. Delete Vj,
2. All edges that exit from Vj are transferred to exit from a new vertex
(o)
3. All edges that go into Vj are transferred to go into a new vertex Vj.
4. Connect v^' and vf with the ej that exits from vf and goes into z/y^
Set c(ej) = c(vj). The ej is the bridge edge of Vj.
(end of transformation)
An edge-5-reliable flow in N^^^ becomes a vertex-5-reliable flow in the original N b y contracting each bridge edge between Vj and Vj to the vertex Vj, The
"vertex-5-reliablecapacity of ans-^mixedcut (Vi, V2) in N " = "edge-6-reliable
capacity of the corresponding cut in N^^^". Since in N^^^ the max-flow min-cut
theorem holds for the edge-5-reliable flow, the max-flow min-cut theorem of
vertex-5-rehable flow can be obtained easily.
THEOREM 5. The maximum value of vertex-6-reliable flows from a vertex
s to a vertex t in a network equals the minimum value of vertex-S-reliable
capacities ofs-t mixed cuts.
RELIABLE FLOWS).
809
s-t mixed cut for vertex-5-reliable flow in N. Then we can select the vertex
sets Vi and V2 that represent the above set of edges and vertices as (Vi, \^)
inN.
(end of algorithm)
THEOREM
flow.
EXAMPLE 5 (AN EXAMPLE OF OBTAINING THE VERTEX-0.6-RELIABLE
FLOW). Using Algorithm 8, we shall evaluate the maximum value of edge-0.6-
{Va,t}.
(end of example)
F I G U R E 26
810
WATARU KISHIMOTO
FIGURE 27
V. m-ROUTE FLOW
In this section we consider m-route flows for the restoral method. First, we
consider the case of the edge version of m-route flow. Then we consider the
case of the vertex version of m-route flow.
A. Edge-m-Route Flow
DEFINITION 24 [25]. An elementary edge-m-route flow from s to ^ is defined as a subnetwork of N, which is the sum of m edge-disjoint s-t pathflows, each of which has the same value /x. "Path flows Pi = (G^, c:/(*)) and
Pj = (Gj^ Cj(^)) are edge-disjoint" means that Gi and G/ are edge-disjoint paths
from s to t. The value of the elementary edge-m-route flow is defined as m /x.
DEFINITION 25 [25]. An edge-w-route flow from s to ? is a subnetwork of
N such that the subnetwork is the sum of elementary edge-m-route flows from
s to t. The sum of values of those elementary edge-m-route flows is called the
value of the edge-m-route flow.
Both the networks of mi of Fig. 28a and mi of Fig. 28b are elementary
edge-2-route flows from s to t. The network MQ of Fig. 29 is the edge-2-route
flow that is the sum of mi and m2. Both values of mi and mi are 2. Therefore,
the value of MQ is 4.
The maximum value of edge-m-route flow between two vertices corresponds to the maximum capacity of m-route communication channels between
two terminals.
DEFINITION 26 [25]. li 8 = 1/m (m is a positive integer) we say an edge5-rehable flow is an edge-m-leveled flow, and the edge-5-rehable capacity of a
cut is the edge-m-route capacity of the cut.
JI^^B FIGURE 28
81
8 I2
WATARU KISHIMOTO
On the basis of these theorems, Ref. [25] shows a method for finding the
maximum edge-m-route flow between a given pair of vertices in a network. The
method finds the maximum value of edge-m-route flows and the minimum cut
by using a method for maximum edge-(5-reliable flow. Then it gives a maximum
edge-m-route flow and represents the flow as the sum of elementary edge-mroute flows.
ALGORITHM 9 [22, 23, 25]. Let s and t be the distinct vertices in a given
network N. We will find a maximum edge-m-route flow between s and t,
(30)
VF = [ / U W ,
(31)
where both Ui and wi correspond to the edge ei in Mo. The edge set Ep of Gp
consists of edges between a vertex in U and a vertex in W, We write the capacity
of edge (x, y) as cp(x, y). The capacity of each edge is defined as
Cf(ui, Wj) = the sum of values of the path-flows that pass through edge ej
just after passing through edge ei among P i , . . . , P^,
c:p(s, Wj) = the sum of values of the path-flows that pass through edge ej
directly from vertex s among P i , . . . , P^, and
Cfiui, t) = the sum of values of the path-flows that end at vertex t just
after passing through edge ei among P i , . . . , P^.
The edges assigned the capacity of 0 are removed.
4. (a) For each pair of vertices Ui and Wj in Np, cp(J(w/i)) = c^(I(ui)).
Let the value be pi, and then pi < ^' Add edges (uj, Wj) (1 < i < r) with the
capacity of ^ - pi to Np only if pi is less than ^ .
(b) Then replace vertex s by m vertices {si, s i , . . . , Sm] and ^ by m vertices
(c) Let the edges incident to the vertex s be (s, Wj^), (s, WJ2), 5 (^j t^/jj.
We will divide those edges and connect them to si, s i , . . . , Sm such that the
capacity of the incidence cut of each s/ is equal to ^ as follows.
8 I 3
CF(S,WJ.)>~,
t\
(32)
CF(SI, Wj^).
(33)
m
For S2, S35..., S; proceed in the same way.
(d) The edges incident to the vertex t are divided in the same way.
(e) Let Nu = (Gu, cu(*)) be the resulting undirected bipartite network,
where Gu = (Xu, Yu; u).
5. Let
Xu = (U - {a}) U{aua2,,.,,
a^},
(34)
Y,j = (W-{b})U{bub2,...,bm}.
(35)
(36)
where U7=it^^; J = (^i? ^2? ? tm}- Hence, each EJ^^ corresponds to a path-flow
with the value v/ from s to ^ in N|^^^ Since Ey are pair-wise edge-disjoint, N|^^^
is an elementary edge-m-route flow with the value of m v/ from s to ^ in N.
8. Let Na be the sum of those elementary edge-m-route flows
N^^\N^^\.
. .^N^^\ Consequently, NQ, is an edge-m-route flow with the value
X from s to ? in N.
(end of algorithm)
814
WATARU KISHIMOTO
F I G U R E 30
(37)
(38)
(39)
Vp = U U W.
h)},
(40)
(41)
815
&
<D
(d)P4
FIGURE 31 Path flows (a) P i. (b) Pi, (c) P3. and (d) P4.
(42)
(43)
7(3)
E;
(44)
= {(^2, m), (m, Ws), (us, W4), (U4, We), {U6, ti)}.
(45)
7(3)
816
WATARU KISHIMOTO
F I G U R E 32
8. Let K be the sum of N^^\ N^^\ and N^^l K (Fig. 36) is an edge-2route flow with the value of 8 from s to t. (In this example Na is the same as MQ.)
(end of example)
C. Vertex-m-Route Flow
We consider the case of the vertex version of m-route flow. From our viewpoint,
the definitions for vertex-m-route flows are easy to follow.
DEFINITION 27 [25]. An elementary vertex-m-route flow from s to ^ is
defined as a subnetwork of N, which is the sum of m internally disjoint s-t
path-flows, each of which has the same value /x. "Path flows P/ = (G/, c:/(*))
and Pj = (Gy, c/(*)) are internally disjoint" means that Gj and Gy are internally
disjoint paths from s to t. The value of the elementary vertex-m-route flow is
defined as m /z.
DEFINITION 28 [25]. A vertex-m-route flow from s to ^ is a subnetwork of
Nsuch that the subnetwork is the sum of elementary vertex-m-route flows from
s to t. The sum of values of those elementary vertex-m-route flows is called the
value of the vertex-m-route flow.
Both the networks of mi of Fig. 37a and mi of Fig. 37b are elementary
vertex-2-route flows from s to t. The network Mi of Fig. 38 is a vertex-2-route
FIGURE 33
817
flow that is the sum of mi and mi. Both values of mi and m2 are 2. Therefore,
the value of Mi is 4.
DEFINITION 29 [25]. If 5 = 1/m (m is a positive integer), we say a vertex5-rehable flow is a vertex-m-leveled flow, and the vertex-5-reUable capacity of
a cut is the vertex-m-route capacity of the cut.
Obviously, a vertex-m-route flow is a vertex-m-leveled flow. Like the theorem of edge-m-route flow, the next theorem is easy to follow.
THEOREM 10 [25]. The maximum value of vertex-m-leveled flows from s
to t is equal to the minimum value of vertex-m-route capacities ofs-t cuts.
F I G U R E 34
F I G U R E 35
j(3)
/M N^,
M(2) and (c) K'
Elementary edge-2-route flows (a) N (')
i ' \ (b)
F I G U R E 36
819
F I G U R E 37
820
WATARU KISHIMOTO
FLOW).
FIGURE 39
821
4. Contracting each bridge edge between Vj and vf^ in (Vi, V^) to the
vertex Vj we transform {Vi,V^) into the minimum s-t mixed cut (Vi, V2) for
vertex-(5-rehable flow in NA, where Vi = {s, Pc}^ V2 = {Va, t],
(end of example)
VI. SUMMARY
FIGURE 40
822
WATARU KISHIMOTO
F I G U R E 41
F I G U R E 42
823
with given network resources. Moreover, when such channels do not exist, we
determine the locations of the bottlenecks so that we can improve the network
resources efficiently. We considered these problems as reliable-flow problems
in a network.
REFERENCES
1. Zolfaghari, A., and Kaudel, F. J. Framework for network survivability performance. IEEE
J. Select. Areas Commun. 12: 46-51, 1994.
2. Wu, T. H. Fiber Network Service Survivability. Artech House, Norwood, MA, 1992.
3. Alevras, D., Grotschel, M., Jonas, P., Paul, U., and Wessaly, R. Survivable mobile phone
network architectures: Models and solution methods. IEEE Commun. Mag. 88-93, Mar.
1998.
4. Wilson, M. R. The quantitative impact of survivable network architectures on service availability. IEEE Commun. Mag. 122-126, May 1998.
5. Evans, J. R. Maximum flow in probabilistic graphs^The discrete case. Networks 6: 161-183,
1976.
6. Lee, S. H. ReUability evaluation of a flow network. IEEE Trans. Reliability 29(1): 24-26,
1980.
7. Shogan, A. W. Modular decomposition and reliability computation in stochastic transportation
networks having cutnodes. Networks 12: 255-275, 1982.
8. Kulkarni, V. G., and Adlakha, V. G. Maximum flow in planar networks with exponentially distributed arc capacities. Commun. Statistics-Stochastic Models 1(3): 263-289,
1985.
9. Rueger, W. J. Reliability analysis of networks with capacity-constraints and failures at branches
and nodes. IEEE Trans. Reliability 35(5): 523-528, Dec. 1986.
10. Alexopoulos C., and Fishman, G. S. Characterizing stochastic flow networks using the Monte
Carlo method. Networks 21: 775-798, 1991.
11. Alexopoulos, C , and Fishman, G. S. Sensitivity analysis in stochastic flow networks using the
Monte Carlo method. Networks 23: 605-621, 1993.
12. Alexopoulos, C. A note on state-space decomposition methods for analyzing stochastic flow
networks. IEEE Trans. Reliability 44(2): 354-357, June 1995.
13. Ball, M. O., Hagstrom, J. N., and Provan, J. S. Threshold reliability of networks with small
failure sets. Networks 25: 101-115, 1995.
14. Strayer, H. J., and Colbourn, C. J. Bounding flow-performance in probabilistic weighted networks. IEEE Trans. Reliability 46(1): 3-10, 1997.
15. Chan, Y., Yim, E., and Marsh, A. Exact and approximate improvement to the throughout of a
stochastic network. IEEE Trans. Reliability 46(4): 473-486, 1997.
16. Biggs, N. L., Lloyd, E. K., and Wilson, R. J. Graph Theory 1736-1936. Clarendon Press,
Oxford, 1976.
17. Konig, D. Uber Graphen und ihre Anwendung auf Determinantentheorie und Mengenlehre.
Math. Ann. 77: 453-465, 1916.
18. Konig, D. Theorie der Endlichen und Unendlichen Graphen. Chelsea, New York, 1950.
19. McDonald, J. C. Public network integrityAvoiding a crisis in trust. IEEE J. Select. Areas
Commun. 12: 5-12, 1994.
20. Dunn, D. A., Grover, W. D., and MacGregor, M. H. Comparison of ^-shortest paths and
maximum flow routing for network facility restoration. IEEE J. Select. Areas Commun. 12:
88-99,1994.
21. Doverspike, R. D., Morgan, J. A., and Leland, W. Network design sensitivity studies for use
of digital cross-connect systems in survivable network architectures. IEEE J. Select. Areas
Commun. 12: 69-78, 1994.
22. Egawa, Y, Kaneko, A., and Matsumoto, M. A mixed version of Menger's theorem. Combinatorica 11(1): 71-74, 1991.
23. Aggarwal, C. C , and Orlin, J. B. On multi-route maximum flows in Networks, Preprint, 1996.
824
WATARU KISHIMOTO
24. Nagamochi, H., and Ibaraki, T. Linear-time algorithms for finding a sparse i^-connected spanning subgraph of a ^-connected graph. Algorithmica 7: 583-596,1992.
25. Kishimoto, W., and Takeuchi, M. On m-route flows in a network. lEICE Trans. J-76-A: 11851200,1993. [In Japanese.]
26. Kishimoto, W. A method for obtaining the maximum multi-route flows in a network. Networks
27: 279-291,1996.
27. Kishimoto, W. Reliable flows against failures in a network. In Proceedings of the 1994 lEICE
Spring Conference, Yokohama, A-9, April 1994. [In Japanese.]
28. Kishimoto, W. ReUable flows against failures in a network. In Proc. of the 1994 lEICE Spring
Conference, A-11, April 1994. [In Japanese.]
29. Kishimoto, W. Reliable flow with failures in a network. IEEE Trans. Reliability 46(3): 308-315,
1997.
30. Kishimoto, W., and Takeuchi, M. On two-route flows in an undirected network. lEICE Technical Report, CAS90-19, DSP90-23, 1990. [In Japanese.]
31. Kishimoto, W., Takeuchi, M., and Kishi, G. Two-route flows in an undirected flow network.
lEICE Trans. ]-75-A(ll): 1699-1717, 1992. [In Japanese.]
32. Ford, L. R., Jr., and Fulkerson, D. R. Flows in Networks. Princeton Univ. Press, Princeton, NJ,
1962.
33. Korte, B., Lovasz, L., Promel, H. J., and Schrijver, A. Paths, Flows, and VLSI-Layout. SpringerVerlag, Berlin, 1990.
34. Chen, W. K. Theory of Nets: Flows in Networks. Wiley, New York, 1990.
35. Ahuja, R. K., Magnanti, T. L., and Orlin, J. B. Network Flows: Theory, Algorithms and Applications. Prentice-Hall, Englewood Cliffs, NJ, 1993.
36. Ozaki, H., and Shirakawa, I. Theory of Graphs and Networks. Corona, Tokyo, 1973. [In
Japanese.]
TECHNIQUES IN MEDICAL
SYSTEMS INTENSIVE CARE UNITS
BERNARDINO ARCAY
CARLOS DAFONTE
JOSE A. TABOADA"^
Department of Information and Communications Technologies, Universidade Da Coruha,
Campus de Elviha, 15071 A Coruha, Spain; ^Department of Electronics and Computer Science,
Universidade de Santiago de Compostela, 15782, Santiago de Compostela (A Coruha), Spain
825
826
ARCAYETAL
HOSPITAL DATABASE
S
..5?.^('?^.f5f!?f:!j.w
'!**-
^.S9!!f^M!^?fi9!i/.F!^^f^f?..?f.f??^..,
GKMc
Therapeutic actions
FIGURE I
^m^f^B^^^^MeasurementSr
827
828
A. Information Acquisition. Networic Structure
The integration of devices proper to an ICU has several approaches from the
viewpoint of network design. On the one hand, researchers have proposed ad
hoc solutions that allow the connection of a set of medical devices of one specific
trademark. It is a method that leads quickly to the implementation of this philosophy and has frequently been used as a strategy for the development of highlevel applications in these clinical environments. Unfortunately, the enormous
variability of the connection hardware and software between various medical
devices, even if they come from the same manufacturer, represents very often an
important obstacle to designers who must maintain and update the applications.
A second and more flexible solution consists of incorporating the standards
that come from the industries. Corporations, however, often do not consider the
particularities of a ICU environment, which requires a series of requirements
nonexistent in other environments. Let us mention some of these requirements,
keeping in mind the large variety and amount of generated information and
aiming at the smallest requirement of a user for routine tasks.
Apart from the characteristics common to any kind of data network, we
must consider a structure that is flexible enough to:
Allow the connection of medical devices that differ in trademark and
version, thereby facilitating expansion and updating of the system:
Allow a hot plug-in so as to modify the monitoring of a patient through
the incorporation or elimination of devices.
Reconfigurate automatically the network after the connection or disconnection of devices. This "plug and play" capacity implies that the system is able
to relate automatically each apparatus with the bed or the patient it has been
connected to, without asking any kind of special intervention from the user.
Allow the configuration of the distinct devices in the ICU by offering a
uniform interface for all of them.
Make the local network connect with other hospital networks, thus connecting their databases, in order to facilitate the future incorporation of information coming from other services such as administrative, laboratory, or image
data.
Automatic recovery in case of failure: errors should indeed have the smallest impact possible on the final behaviour of the system. An example of how this
can be achieved is the local storage of information in case of a communication
disconnection.
Indicate a deliberate disconnection in order to avoid generating alarms
associated with the malfunctioning of devices or communications.
The third and final initiative for the design of centralized systems proposes
the creation of a specific LAN (local area network) standard to perform in an
ICU environment (Fig. 2). After two members of AMIA (Gardner and Shabot)
launched the idea in 1982, a special committee of the IEEE was created in 1984
to that very purpose. The project was called IEEE P1073 Medical Information Bus (MIB) [14] and included the definition of the seven layers of the OSI
(Open System Intercommunication) model of the ISO (International Standard
Organization).
829
Devices check
Therapeutic actions
CENTRAL
INTELLIGENT
MONITORING
SYSTEM
FIGURE 2
The standards proposed for the lower levels and described in IEEE documents 1073.3.1 and 1073.4.1 were approved in 1994 and entirely adopted by
the ANSI (American National Standard Institute). It is these documents that
are being used as a basis for the development of the European CEN TC251
standard, which other countries, such as Canada and Australia, are also about
to adopt.
The higher levels are charged with the communication of the data, content,
format, and syntax of the message. Since there is such a great variety and
quantity of information, these levels present the largest number of problems
for reaching consensus on what defines a standard. While this book is being
published, researchers are currently working on a version that will be submitted
to the development teams for a vote.
Although at these levels the standard proposed has its own language for
communications, the MDDL (Medical Data Device Language), it also acknowledges the contributions of other organizations and teams dedicated to the development of medical standards; the final version will therefore accept the use
ofHL7.
Founded in 1987, the HL7 aims at the development of standards for the
electronic exchange of clinical information that, apart from being clinical, is
also administrative and economic. Its name refers to the fact that it describes
only the higher levels of the OSI model, i.e., the application level (level 7).
830
ARCAYETAL
i m
TABLE I
Health Level-7
X12N
DICOM
ASTM
NCPDP
IEEE
Clinical
Context
Object
Workgroup
We must remember that, although the standard we describe was developed by the IEEE, the urgent need for a uniformalized communication between
medical devices has pushed each country or community to develop its own
standards in this field. There have also been other proposals for standards that
uniformalize the format or the exchanges of all sorts of medical information.
Table 1 describes a few of these standards.
B. Real-Time Information Management
The process of monitoring and continuous supervision associated with an ICU
implies one very important restriction: the need to work with real-time information. It is a time aspect that affects all the levels of an information system
and, consequently, all the associated processes. We must therefore see the time
variable as another part of the system and integrate it into the various tasks:
acquisition, network management, classification, storage in the database, exploitation, design of the inferential processes, presentation, etc.
As for the temporal problem, a few of the characteristics of these systems
could be enumerated as follows [23]:
Nonmonotonicity. Conclusions valid in the present may become invalid
at a later moment, so we must provide mechanisms that maintain the coherence
in the knowledge base and in the databases.
Continuous operation. These systems must remain active over large periods of time. In order to guarantee the robustness of the system, we will have
to make full use of the memory and install mechanisms of recuperation after
errors.
Reaction capacity in the case of asynchronous events. The system needs
to be able to detain its reasoning in order to assimilate the consequences of
events such as alarms, and will do this through an occasional modification of
the scheduling proposed.
83 I
832
ARCAYETAL
This introduction of a time variable in the reasoning of the system compHcates not only the size of the tree, but also the validity of the conclusions.
Therefore, in order to treat time as one more variable of the problem, specific
reasoning strategies were developed: the temporal reasoning in critical time, the
interruptible reasoning, and most of all strategies to maintain the truth associated w^ith the nonmonotonicity of the reasoning, a primordial factor in those
systems that consider the time element.
Time is not the only parameter that influences the reasoning; the fact that
we work in a medical environment with imprecise information makes us treat
this reasoning matter with great uncertainty [50]. Finally, the possibility of
working with partial models or with heuristic decision rules puts before us the
choice of either the use of reasoning based on "deep" knowledge or that based
on "shallow" knowledge [32].
I. Temporal Reasoning
833
pulsations can be perfectly valid if they have been taken with a time interval of
15 min, but they must be considered incorrect if the measurements were made
with an interval of 5 s.
Approaching this problem in such a generic way as described above would
consume important computational resources that cannot be disposable in realtime execution systems. Besides, these systems often contain necessary mechanisms for the automatic incorporation of information and in this way simplify
the problem and allow a less rigurous focus on the temporal problem:
(1) Date stamps are numeric values in the shape of segments or instants.
This simplifies enormously the analysis of the precedency, since it will be reduced
to an algebraic analysis.
(2) Coherence of the temporal information is guaranteed, since the facts
or data are automatically introduced by the domain reasoner and hence do not
need to be verified.
It is for this reason that the majority of these systems turn toward ad
hoc solutions for solving the temporal problem, mixing deliberately the time
reasoning and the reasoning in the domain time.
2. Reasoning in Critical Time
All real-time reasoning systems face a certain limit time, a moment from
which their solutions start to lose their value. This limit can be either hard or
soft [44]: hard in cases like the triggering of an alarm after the damage that
it was supposed to avoid has been done; soft when a conclusion about, for
instance, the general condition of a patient gradually loses its initial efficiency
even though it has been valid for a certain time. For those real-time systems that
have hard limits it is fundamental to obtain a solution before reaching the limit
time. They can do this by triggering, for instance, the alarm while showing the
motive, even though it is impossible to specify why the alarm went off or how
it could be corrected.
The largest problem with expert systems is that, due to the complexity of
the space of states with which they work, they do not allow us to calculate
how long it will take them to find an answer. There exist, however, several
solutions for guaranteeing that this answer is found within the disposable time
[15].
One possibility is progressive reasoning, which implants the knowledge in
levels: the first levels are focused on finding a fast solution, the subsequent levels
on refining it. In this way, the system starts with a coarse solution, refines it
through the execution of successive levels, and stops the process when it reaches
its limit time.
Another solution presupposes that we know exactly how much the system
disposes to come to a conclusion (the sampling time of the information, for
instance), which means that we are able to lead the reasoning in various directions to reach more or less elaborated solutions. This method is equivalent to
realizing more or less drastic cutting mechanisms of the states tree [34]: if the
disposable time is larger, the cutting process will be smoother and the states
tree can be examined more deeply.
834
ARCAYETAL
3. Interruptible Reasoning
An expert system based on facts invariable with time can use the conclusions
it has obtained in a given moment as part of the database for a subsequent
reasoning cycle. In that case we call it a monotonous reasoning, and it is identical
to those reasonings whose conclusions stay valid over time.
Quite the opposite is a system whose basis varies in time and which must
therefore continuously revise its conclusions to decide whether they are still
correct. For this reason these systems incorporate in their facts base mechanisms for maintaining the truth [12], which serves to eliminate those data and
conclusions that have stopped being valid. These mechanisms can go through
the states tree, modifying the elements that depend on each new information,
or they can change their values to "unknown" so that they must be recalculated
whenever they are needed.
5. Reflexive Reasoning
83 5
be taken about what tasks to realize according to the resources of the system
(CPU occupation, disposable memory, etc.). In this way the pruning algorithm,
selected during the reasoning in critical time, would have to consider not only
the available time but also the occupation level of the system.
A particular case of reflexive systems are the introspective systems. In these
systems, the motive for reflection is to determine "what the system knows
about what it knows." An example of this behavior is the default reasoning, in
which part of the information being analyzed corresponds to generic values or
parameters that are only substituted by real data when these are available. In
order to realize such a replacement, the system obviously needs to be conscious
of the existence of such data.
6. Reasoning with Uncertainty
Rather than resolve the problem of working in real-time, the reasoning with
uncertainty allows us to work with complex problems, or with incomplete or
imprecise information. The MYCIN, for example, one of the first expert systems
in medicine, made it possible to diagnose bacterial diseases while simultaneously
taking into account its own impreciseness when coming to conclusions.
There are various methods for working with uncertainty. As we have seen
in the case of time-related systems, these methods imply the introduction of
the uncertainty's representation in the knowledge base and the capacity of the
system to propagate the uncertainty and reason with it:
Classical probabilistic logic. Basically, it supposes the application of the
Bayesian probability that generally leads to complex calculations in the propagation processes.
Certainty factors. This method was used in the MYCIN and implies that
each fact and rule of the knowledge base has a determined reliability, although
only those rules that possess a certain degree of certainty in their conditional
part are used in the analysis.
Groups and fuzzy logic. This is an intent to formalize the reasoning with
uncertainty in which, to a certain degree, each element belongs to each possible
classification set.
7. Superficial Reasoning and Profound Reasoning
836
ARCAYETAL
various cycles of testing and modifying, these kind of systems can reach a high
level of competence.
Nevertheless, the negative side is that those possibilities that have not been
considered will irrevocably lead all the systems to incorrect decisions that must
be corrected. As a result, the behavior of the systems designed accordingly
does not slowly degrade itself as the situations move beyond their knowledge
domain; quite on the contrary, each possibility not foreseen in the design of
the system provokes the appearance of incorrect answers. Another drawback
of these kinds of systems is their explanation capacity, since it only takes into
account a trace of the executed rules. This trace merely describes the evolution
of the system through the states tree and does not necessarily coincide with a
rational argumentation.
This type of knowledge description has been very frequent in so-called first
generation expert systems, because they make it possible to generate rapidly
efficient prototypes. Now researchers are working on what they call second
generation expert systems, which are systems based on profound or causal
knowledge [8] and provided with information about the interaction between
the elements that compose the domain, for instance, the causal links between
pathologies and their symptoms [38]. Ideally, the profound knowledge presupposes a theory capable of explaining everything about a concrete domain; this
explanation then serves as the basis for all actions.
The main qualities of a system with profound knowledge are the following:
More robustness. As the rules of these kinds of systems are based on a
causal model, we will need to purify the model whenever it is unable to explain
a particular case. This correction will not only concern the error that originated
the change, it may also cover other errors that initially had not been considered.
Easier maintaining of the knowledge base. The methodical representation
of the knowledge in this method implies a clearer representation and as a result
facilitates the comprehension and alteration of the knowledge base.
Reusability possibility. The same qualities that facilitate the update allow
knowledge transfer between systems that work in the same domain.
Improvement of the explanatory possibilities. The availability of profound knowledge allows us to generate causal explanations of the behavior of
the system and its actions.
This model is obviously not without inconveniences. The main drawback is
that in many cases, and especially in the field of medicine, we do not know of any
model that can be implemented. This situation has justified the use of heuristic
expert systems, i.e., based on superficial knowledge, for solving problems of
implementation. Apart from that, applications with this kind of knowledge are
much harder to develop and need more time and efforts.
C. Integration of the Intelligence
Up to this point we have treated the problem of centralized information acquisition in intensive care environments. We have discussed ways of treating the
complex problem of introducing time as a reasoning element and the capacity
of knowledge-based systems to treat complex problems using incomplete or
837
838
ARCAYETAL
are disposable in the system at that moment (network congestion, states of the
acquisition devices, capacity of the processors, etc.), analyze them contextually,
and consequently define the planning of the tasks that must be executed, always
giving priority to some and slowing down or eliminating others (Guardian).
Here, as in any information system, the generation and presentation of
reports is a fundamental aspect. Using the technique of the expert systems is
not necessary, but it allows us to present the information in an interpreted
way, generating documents that can be assimilated faster by the clinicians and
facilitating the development of systems that actively help the physician to make
decisions. A pioneer in both tasks has been the HELP [36] system.
Once we dispose of the acquired information, it is possible to analyze it
with expert systems specifically developed for the pathology or pathologies that
must be treated. Most systems have been developed at this stage, many times
ignoring the temporal problem and focusing on the knowledge engineering
tasks (MYCIN [40], DXplain [6], Internist-1 [29,30], etc.).
Since the different knowledge modules that we have been analyzing do
not constitute isolated entities, but all work toward the same purpose and on
the basis of the same data, there must be communication between them. To
this effect researchers have developed strategies like the "blackboard" used in
Guardian or distibuted architectures like that proposed in SIMON. In any case
it is important to mention that one and the same task may present both soft and
hard temporal requirements according to its particular purpose. This occurs,
for instance, with the selection of a therapy, which will be a slow process if it
concerns the recuperation of a pathology, but a very fast process if we pretend
to correct an alarm. The SEDUCI system [46,47] deals with these criteria by
proposing a development of distributed expert systems. This means that it
incorporates intelligent mechanisms in the hardware elements close to the patient (reasonings that could be called "reflexive," with less answering time) as
well as in the central system, conceived for diagnosis and selection of a therapy.
FIGURE 3
839
840
ARCAYETAL
security, and velocity that could only be improved by investing heavily in design and implementation and are therefore out of reach of most expert system
developers.
A. Intelligent/Deductive Databases
In this type of system, the deductive components have been incorporated into
the database manager system. These deductive components can improve the
functionality and efficiency of the database by allow^ing it to maintain, among
others, restrictions of integrity and optimization of queries. Besides, during the
incorporation of deductive components into the system, the knowledge rules
that interact with the stored data can be implemented in the same environment without using an external tool: this allows typical database applications
(data entry, report generation, etc) and the intelligent processing to realize tasks
together.
We can define a database as a triple [31] composed of the following elements:
(1) A theory. It consists of a finite set of axioms and meta-rules that
describe how one operates with the particular theory.
(2) Integrity constraints. These must be satisfied by a database that must
be consistent with the theory.
(3) A set of constants. A finite set over which the theory is defined.
The way in which this definition tends to be implemented allows us to
maintain it using a syntax similar to that of natural language, which means that
the integrity restrictions of the database can be expressed with knowledge rules.
These database systems also seek to improve the mechanisms for manipulating
the information they contain. Instead of using a very strict language to consult,
modify, or eliminate data, they pretend to facilitate an interaction in near natural
language by means of integrating expert systems. This kind of interaction would
avoid the use of complex semantic constructions that now hinder the control
of these data managers. Some of the capacities provide the expert systems with
the following:
1. They make it possible to design front-ends, even in natural language,
to interact with the database in a less complex manner.
2. They trigger the periodical or event-driven maintenance tasks by
means of a simple syntax, close to that of natural language.
3. They facilitate the expression of complicated queries, like transitive
and recursive ones.
Most of the relational managers that are being sold at the moment incorporate more or less sophisticated rule systems charged with triggering tasks
(triggers).
B. Expert/Database Systems with Communication
Up to this moment this has been the most frequently used type of integration
in the expert systems domain. In this case the expert systems and the database
84 I
842
ARCAYETAL
843
Identifier
Name
Enter
Exit
23.332.222
7/1/1998
8/2/1998
1/5/1999
1/10/1999
23.332.222
34.865.345X
1/18/1999 1/19/1999
34.333.212
FIGURE 4
Reason
1/19/1999 void
These time periods do not have to be the same for a single fact. In the
database we have, for instance, information on patients that were hospitalized
the year before the actual one. The valid time of these facts is somewhere
between January 1 and December 31; the transaction time, however, starts
when we insert the facts into the database.
Suppose we want to store information on the patients of an intensive
care unit. The database table shown in Fig. 4 stores the history of patients
with respect to the real world. The attributes Enter and Exit actually represent a time interval closed at its lower bound and open at its upper bound.
The patient Anton Serrapio, for instance, was treated in the hospital from
7/1/1998 until 8/2/1998 for a Craneoencephalic Traumatism and later on
he entered again the ICU on 1/19/1999; the "void" value in the exit column indicates that this patient is still being treated and that the data are actual.
A temporal DBMS should support:
A possibility in the case of relational databases is the SQL3 specification. This change proposal specifies the addition of tables with valid-time
support into SQL/temporal, and explains how to use these facilities to migrate smoothly from a conventional relational system to a temporal system.
The proposal describes the language additions necessary to add vaHd-time
support to SQL3. It formally defines the semantics of the query language by
providing a denotational semantics mapping to well-defined algebraic expressions.
This kind of database has multiple applications, among which is medical
information management systems. Nevertheless, commercial database management systems such as Oracle, Informix, etc., are, at this moment, nontemporal
DBMS since they do not support the management of temporal data. These
DBMS are usually the ones we find in hospitals and ICUs, which means that
we must use a type date supplied in a nontemporal DBMS and build temporal
support into applications (or implement an abstract data type for time); the
management of timestamps must take place inside the monitoring applications
and expert systems involved, using techniques similar to those described in the
previous section.
844
ARCAYETAL
845
Intelligent
Monilor
FIGURE 5
846
ARCAYETAL
(MEA) approach that OPS/83 inherited from its predecessors; the MEA strategy
drives the identification and satisfaction of a succession of intermediate goals
on the path to satisfying a global objective [13,18]. The inference engine of our
monitoring expert exhibits three main modifications of this strategy. Firstly,
because of the very nature of monitoring, in normal circumstances the whole
system runs in a loop rather than terminating upon satisfying a global objective.
Secondly, for many rules the element defining the logical path of reasoning is
the element representing current real time, which is updated once per second
from the inference engine itself immediately before redetermining the set of rule
instantiations (an instantiation is a rule together with a particular set of existing memory elements satisfying its conditions); in this way, rules mediating the
acquisition of data via the MIB, the DAS, or the Ethernet at specified sampling
times will be executed immediately the time condition is satisfied as long as the
rest of their conditions are also satisfied. Thirdly, the inference engine includes
a memory analysis procedure which temporarily suspends MEA-type reasoning when memory threatens to become saturated; when this happens, all the
rules whose conditions are currently satisfied are executed so that information
elements made obsolete or superfluous by their execution can be destroyed,
and this element destruction is implemented in the inference engine itself rather
than by rules.
Communication between the bedside PCs, as BCCs, and the DCCs of the
monitoring devices is governed by a protocol using MDDL, the Medical Device
Data Language, defined for this purpose in IEEE standard P1073.
The workstation runs under UNIX, which allows simultaneous multitasking operation of the central user interface, the medical reasoning systems, analysis tasks referred to by the monitoring experts, communication with the Informix database, and a central control module managing the data exchange
between various tasks.
The selection of an Informix database running under DOS is made because
it is a system with a reasonable cost and it will allow us to do a simple migration
to a UNIX platform; it can also function as an autonomous system.
Communication between the expert systems and the Informix database
is managed on a client-server basis by a module organizing data requests by
priority before passing them in the form of SQL queries to the database; problems associated with the use of external data requests on the left-hand sides of
OPS/83 rules have been circumvented by the use of C routines to pass the SQL
queries.
I. Data. Its Acquisition and Time-Dependent Aspects
847
When ample memory is available the normal garbage collection routine allows
retention of at least one already-processed Mdata element for each measured
variable.
From that point of view, a fundamental aspect is the scheduling of these
tasks: to situate in time the acquisition processes, i.e., determine their frequency,
while keeping in mind the latency of each variable, its tendency, the process
state, and the possible existence of artifacts.
These acquisition and classification rules treat the time variable as one of
their LHS elements. Time is represented in segments [2], and the granularity is
measured in seconds.
In order to establish the taking of actions in a time context, we compare
the system's "real-time" variable with the "own.time" variable of the variable
or process under analysis [46,47]. As a result, we obtain the following elements
in the working memory of the rules system: Treal, the actual time stamp; TmVi,
the measurement time of the Vi variable (and consequently of all the actions
related to Vi); T_period, the variable's latency; and TpVi, the associated process
time.
The measurement of a variable takes place in a time lapse that starts when
TmVi>Treal. Furthermore, we are certain not to lose the execution of the process [46,47], since Treal is updated in the rule election mechanism before triggering each rule. The times associated with a variable or process are dynamic
and contextual, which means that TmVi increases if the TpVi of a variable
is superior to the available segment and the measurement task does not have
priority. In this way we adapt tasks according to their latencies and priorities. A
problem that may arise is the temporal overlap of two tasks: the execution of a
certain task may cause another task to remain outside of its work interval and
as a result causality is lost. This kind of problem depends mainly on the temporal requirements of the system [23]. It is solved with the help of the structure
of the rules and the design of the inferences motor, by estabUshing priorities in
the execution of rules via the incorporation of the rules' own weight factors.
On the other hand, we must remember that in this kind of process the latencies
are considerable. To minimize hard time problems, our inference engine of this
module has been written in such a way as to (a) give data acquisition priority
over any other task when there is enough memory to store the data, and (b) give
priority to data reduction (processing) when RAM threatens to run short.
In this form, Mdata and equivalent information elements are liable to be
removed from memory as soon as the time specified in their T_destroy field
has passed. Furthermore, the high-level general data acquisition module can
decide, on the basis of the relative urgencies of different data needs, to overrun
by up to 5% the normal data acquisition interval for any given medical variable
(the time specified in the T_period field of its descriptor element). This strategy
reduces the information transmission in the bedside subnet.
Signals are acquired (Fig. 6) and analyzed by the DAS only if the system is
relatively unoccupied. Since the time taken for signal acquisition is known, this
task is only initiated if it is completed before the presumed start time of any
other task. However, even if it is found to be compatible with scheduled data
acquisition from other sources, the initiation of signal acquisition is subject to
848
ARCAYETAL
K
DDE
C>
EXPERT SYSTEM
-J..fe|JNGINE .,.r^..-.:
'''''llBMIi'lllffllllllli
Ilatabase
FIGURE 6
849
||i|||^^^^
i'^Bffttnnetfls* SB^K^Mt
Wiysjc^afts IJoHn Smjflj
M A R M S COHlFIGUrMTION
ZmM
3Q04I
h
zm\
30841
If
fiicliteely* H ^
SewinEdiy Hlipi
MoctofiiAeiy' Hlglhi
<SllllpiQ)fJHII||jh
HinnRiai
SilJMiyUiw
foisS^iii^
^Si^lrei^Uiw
7841
iHiGHi
iBKlmieiy Uiw
168P
i
FIGURE 7
The central SUN workstation, which runs under UNIX, supports the
medical reasoning expert systems, has graphics capacity for simultaneous
reproduction of six bedside PC displays, mediates access to the Informix
database, and performs any signal analyses that the bedside PCs have no time
850
ARC AY 7 AL
FIGURE 8
for. All these activities are coordinated by a central management module. The
central manager provides a wide spectrum of information distribution modes,
ranging from the broadcasting of information to all active processes to the
channeling of information from one specific process to another. In particular, in
between these extremes, it allows information to be sent to all the processes dealing with a given patient that are capable of receiving information of the type in
question; for example, warning messages generated by a bedside PC can be sent
to all the expert systems currently analyzing the situation of the corresponding
patient.
B. Database Structure
An interface, running in the database PC, mediates communication with the
database and allows SQL queries to be passed to the database from any expert system (whether it be running in the workstation or at the bedside). The
formulation of SQL queries by the user is facilitated by the user interface. In
order to facilitate the use of externally stored data by the expert system, the
tables of the external database have, in general, been defined in such a way as to
echo the structure of the expert system's data elements; that is, the structure of
the Informix database reproduces that of the set of expert systems. It contains
tables recording, for each patient and at any given time, the monitoring devices
in operation, the variables being monitored and their acquisition frequency, the
current alarm and semiquantitative labeling thresholds, and so on (Fig. 9). One
detail to be taken into account is that the entity "Connection" results from
FIGURE 9
851
Database structure.
the relationship between "Patient" and the "Parameters" obtained from the
patient.
All the information stored in the database contains an entry for the date
and the hour of the analysis; this will allow us to maintain temporal restrictions
in the expert system when we use in the reasoning information belonging to the
working memory and to the database, including temporary information from
both.
Since information such as radiographics, tomographies, magnetic resonances, and signals (ECG, EEG, capnography, etc.), in digital form, cannot
be physically stored in the database (because the PC version of the Informix
does not allow storage of binary data), a nominal indexation structure of data
files has been created, thus making it possible to recover from the database the
name of the file with the information that interests us at a given moment. The
names of the files have been codified using the patients identification number,
the date, the hour, and the type of information to avoid the possibility of files
existing with the same names. In order to facilitate access to the database and
to reduce network traffic, the image files are stored, in compressed format, in
the workstation and not in the database server.
The resulting parameters of digital signal processing (ECG, EEG, capnography, etc.) are stored in files to analyze trends and stabilities; the instantaneous
real values remain in the working memory, too, i.e., the HR at the moment,
extracted from the ECG signal.
Another very important information stored in the database is all the data
refering to the maneuvers that have been applied to the patient, the drugs that
have been provided at each moment, commentaries and diagnosis given by the
doctor, etc. All this allows us to observe, a posteriori, the patient's evolution
and to analyze the reaction to the different treatments.
All the above information is available to all the medical reasoning expert
systems running in the workstation, and a series of special functions makes
access to this information totally transparent for anyone creating a new one. As
was mentioned earlier, access to the database from the left-hand sides of expert
852
ARCAYETAL
853
PARAMETERS
Calls into
OPS'83 r u l e s
RESULTS
Execution of
SQL queries
would, in contexts such as the left-hand sides of rules, have been disallowed by
the compiler if written directly in OPS/83 code.
The arguments of the parameter-passing functions BDsym, BDint, and
BDreal are the parameters to be passed (Fig. 10); these functions return 0 if
no error has occurred and 1 otherwise (the only cause of error for a group 1
function is a parameter vector overflow resulting from an attempt to pass too
many parameters for a single SQL query; the limit implemented is 10, which is
ample for our ICU system).
The function otherBD, which passes nonquerying SQL commands effecting data insertion, deletion, updating, etc., has three string arguments, each
containing one of the segments into which the SQL command to be passed
must be split. This admittedly messy way of passing the SQL command is made
necessary by the OPS/83 limit of 128 on the number of characters in a string;
the total of 384 characters allowed by the three arguments of otherBD suffices
for the SQL commands used in our ICU system. The SQL query-passing function readBD has three string arguments fulfilling the same function as those of
OtherBD, and an additional integer argument specifying in which of the three
results vectors is the results of the query to be stored. These functions return 0 if
no error has occurred and an Informix error code otherwise (note that readBD
returns an error code if a query has an ambiguous or multiple result).
The results-reading functions readBBDs, readBBDi, and readBBDr each
have two arguments; the second specifies which of the three results vectors is
addressed (0,1, or 2), and the first specifies which of the fields of that vector
is to be read from. The values returned by these functions are of course query
field values read from a results vector.
This interface is designed to return only one value from the queries because,
due to the design of the knowledge base of the monitoring system, this only
854
ARCAYETAL
FIGURE I I
);
- We suppose MPatient.PIN=^"12-3341-51A"
call BDsym (&MPatient.PIN);
&Error = call ReadBD (|SELECT Age|,
I FROM Patient|,| WHERE PIN=%s;|,0);
if (&Error=0) {&MPatient.Age=ReadBBDi(0,0));}
- Now, we have recovered the age value
- ar)d stored it into the element MPatient
needs to recover concrete values within the database (the most recent value of
X, the tendency of X, the number of alarms of type X, etc.).
A simple example of execution of a SQL query is shown in Fig. 11.
As an example of the use of the interface in a real rule implemented in our
monitoring system, consider the following diagnostic rule for diuresis (Fig. 12).
FIGURE 12

In natural language, the rule is:

IF
The immediate objective is to diagnose a diuresis, the CVP is LOW, and diuresis = poliuria.
THEN
Diagnose: insipid diabetes.

This rule is implemented in OPS/83, as extended by the interface, as follows:

RULE DIAG_Diuresis_5
{
&1 (Task objective=|diuresis_diagnose|; type=|BD|);
&2 (Data VarName=|CVP|; Clasific=|L|);
&3 (Patient ( BDsym(@.PIN) = 0 ∧ readBD(|SELECT Clasific FROM DATA A WHERE PIN = %s AND
    VarName = "DIURESIS" AND DAY = (SELECT|, |MAX(DAY) FROM DATA WHERE PIN = A.PIN AND
    VarName = A.VarName) AND A.HOUR = (SELECT|, |MAX(HOUR) FROM DATA WHERE PIN = A.PIN AND
    VarName = A.VarName AND DAY = A.DAY);|, 0) = 0 ∧ readBBDi(0,0) = |P|));
->
OPSmsg(|Insipid diabetes|);
};

In the second pattern we verify the value of the CVP (central venous pressure). This parameter is continuously measured; thus, it never needs to be obtained from the database, as it will always exist in the working memory.

In the third pattern of the left-hand side of this rule, the condition BDsym(@.PIN) = 0 stores the current patient's identity number in the common block and checks that this operation has been performed properly; the condition readBD(...) = 0, with the SQL query shown in the figure, instructs Informix to deposit, in results vector 0, the most recent classification value recorded for that patient in the Informix table DATA (and checks that this operation has been performed properly); and the condition readBBDi(0,0) = |P| checks whether the current diuresis classification, read by readBBDi from the first field of results vector 0, is Poliuria. The %s in the readBD command is a place-holding symbol; when reconstructing the desired SQL command from their three string arguments, readBD and otherBD recognize this symbol as a request for the insertion of the next parameter from the parameter vector.
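As a rough illustration of the reconstruction step just described, the sketch below concatenates the three string segments handed to readBD or otherBD and replaces each %s with the next entry of the parameter vector, which is what would then be handed to Informix. The helper name build_sql, the buffer sizes, and the omission of error handling and of the actual ESQL/C call are assumptions, not part of the interface described in the text.

/* Hypothetical sketch: rebuild the SQL command from the three OPS/83 string
 * segments (128 characters each, 384 in total) and substitute the parameters
 * previously deposited in the parameter vector wherever "%s" appears.       */
#include <stdio.h>
#include <string.h>

#define MAX_PARAMS 10
#define SQL_LEN    512        /* assumed: enough for 384 chars plus parameters */

static char param_vec[MAX_PARAMS][128] = { "12-3341-51A" };
static int  n_params = 1;

static void build_sql(char *out, const char *s1, const char *s2, const char *s3)
{
    char joined[3 * 128 + 1];
    int  next = 0;            /* next parameter to consume                     */
    const char *p;

    snprintf(joined, sizeof joined, "%s%s%s", s1, s2, s3);
    out[0] = '\0';
    for (p = joined; *p != '\0'; p++) {
        if (p[0] == '%' && p[1] == 's' && next < n_params) {
            strncat(out, param_vec[next++], SQL_LEN - strlen(out) - 1);
            p++;              /* skip the 's'                                  */
        } else {
            size_t len = strlen(out);
            if (len + 1 < SQL_LEN) { out[len] = *p; out[len + 1] = '\0'; }
        }
    }
}

int main(void)
{
    char sql[SQL_LEN];
    build_sql(sql, "SELECT Age", " FROM Patient", " WHERE PIN=%s;");
    printf("%s\n", sql);      /* SELECT Age FROM Patient WHERE PIN=12-3341-51A; */
    return 0;
}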
It should be noted that although access to the database is clearly much slower than access to the elements stored in the working memory, the latency between the acquisition of one physiological parameter and the next (10-15 min) means that database requests do not pose a problem: it has been shown that, on average, such a request has an associated delay of 1 s, and of 3 s in the worst case.
An example of using interface functions on the right-hand side of rules for
updating an external database is shown in Fig. 13. All insertion rules (update
included) are preceded by the execution of a screening rule, so that only valid
data can enter the database.
FIGURE 13

In natural language, the rule is:

IF
The immediate objective is to diagnose a cranioencephalic traumatism, the measured Arterial Pressure is high, and the Intracranial Pressure value is unknown (it does not exist in the working memory nor in the database).
THEN
A rescaling must be done in the higher limit of normality of the Arterial Pressure, setting the limit to 160. Apply a Beta-blockers treatment until normality.

This rule is implemented in OPS/83, as extended by the interface, as follows:

RULE DIAG_Craneo_Traum_2
{
&1 (Task objective=|craneo_traum_diagnose|; type=|BD|);
&2 (Data VarName=|AP|; Clasific=|H|);
&3 (Data VarName=|ICP|; Clasific=NIL);
&4 (Patient ((BDsym(@.PIN) = 0 ∧ readBD(|SELECT Clasific, Day, Hour FROM DATA A WHERE PIN = %s
    AND VarName = "ICP" AND Day = (SELECT|, |MAX(Day) FROM DATA WHERE PIN = A.PIN AND VarName
    = A.VarName) AND A.Hour = (SELECT|, |MAX(Hour) FROM DATA WHERE PIN = A.PIN AND VarName =
    A.VarName AND Day = A.Day);|, 0) <> 0) ∨ SecondsOPS(readBBDi(1,0), readBBDi(2,0)) < TimeOPS(1200));
->
call BDsym(&4.PIN);
if (otherBD(|UPDATE CLASIFIC SET LimitSnormal = 160 WHERE PIN=%s AND VarName="AP";|, ||, ||) <> 0)
{
call errorOPS(|Err_Insert|);
}
else call OPSmsg(|Apply a Beta-blockers treatment until normality|);
};

As in the previous case, the arterial pressure is always present in the working memory, so we do not need to access the database (second pattern). However, the ICP is a parameter that can be either in the working memory or in the database. In this case, we will try to find this information in the working memory and, afterwards, if we have not found it, the system will seek it in the database using the SQL query shown in Fig. 13. If the value is not present in the database, or the value found is not valid (i.e., it does not belong to the past 20 min), then a rescaling must be done in the higher limit of normality of the arterial pressure, setting the limit to 160. This rescaling is made by updating the database classification table. It is to be noted that the last pattern that can be evaluated is the database query, so if any of the previous patterns do not match, the query will not be executed (using this strategy, the system executes only the necessary queries).
WIRELESS ASYNCHRONOUS
TRANSFER MODE (ATM) IN DATA
NETWORKS FOR MOBILE SYSTEMS
C. APOSTOLAS
Network Communications Laboratory, Department of Informatics, University of Athens,
Panepistimioupolis, Athens 15784, Greece
G. SFIKAS
R. TAFAZOLLI
Mobile Communications Research Group, Center for Communication Systems Research,
University of Surrey, Guildford, Surrey GU2 5XH, England
I. INTRODUCTION 860
II. SERVICES IN ATM WLAN 861
III. FIXED ATM LAN CONCEPT 863
A. Topology and Defined Network Interfaces 863
B. Protocol Reference Model of ATM LAN 864
C. Traffic Management in ATM Networks 867
IV. MIGRATION FROM ATM LAN TO ATM WLAN 869
A. Definition for a New PHY Layer Based on Radio 869
B. Requirement for MAC Layer Definition in the ATM WLAN PRM 869
C. Extension of C-Plane Protocols to Support User
Mobility 869
D. ATM WLAN Architecture and PRM 869
V. HIPERLAN, A CANDIDATE SOLUTION FOR AN ATM WLAN 872
A. The Physical Layer of a HIPERLAN 872
B. The Channel Access Control (CAC) Layer of a
HIPERLAN 873
C. The MAC Layer in a HIPERLAN 874
D. HIPERLAN Inability for Traffic Policing 875
E. Comparison of a HIPERLAN and Adaptive TDMA 876
VI. OPTIMUM DESIGN FOR ATM WLAN 881
A. Cellular Architecture for an ATM WLAN 881
B. Traffic Policing in the MAC of an ATM WLAN 883
VII. SUPPORT OF TCP OVER ATM WLAN 891
A. TCP Principles 891
B. Performance Issues of TCP over ATM 891
C. TCP Behavior over Wireless Links 892
I. INTRODUCTION
The concept of wireless asynchronous transfer mode (WATM) Networks has
been a topic of great research interest. The establishment of the WATM group
[1] within the ATM Forum in 1996, the work carried out by EU ACTS projects
such as WAND, SAMBA, and MEDIAN [2,3], and the initiative on the broadband radio access network (BRAN) [4] within ETSI justify the significance of
deploying a wireless ATM network to carry multimedia traffic with support of
terminal mobility or portability.
ATM networks offer certain advantages when compared to traditional
LAN technologies like Ethernet and FDDI. Adopting asynchronous time division multiplexing (ATDM), they provide quality of service (QoS) guarantees and efficient support of different application types over the same physical
medium. Moreover, wireless access eases relocation of terminals and expansion
of networks, and enables user mobility and equipment portability. Furthermore,
the wireless technology is the enabling factor for universal personal communications. Existing cellular systems deal mainly with voice communication or
offer supplementary low-bit-rate data services over a network optimally designed for voice. Moreover, DECT [5] does not offer the high transmission
rate required for broadband multimedia communications, and standards like
IEEE 802.11 [6] and HIPERLAN [7] support high bit rates, but they do not provide for mobility. Furthermore, they do not offer QoS guarantees. However,
the merging of wireless and ATM technologies could lead to universal personal multimedia communications. This concept is envisioned by the emerging
third generation communication systems like UMTS [8], IMT2000 [9], and
BRAN.
This chapter provides the migration path from existing fixed ATM LAN
standards to ATM wireless LAN (ATM WLAN), dealing mainly with the functions related to radio link access, the implications on transport protocols, and
the support of user mobility. ATM WLAN is envisioned as a system of densely
populated wireless access parts with small coverage areas and an ATM-oriented
backbone network that spans a limited geographical area, justifying the term
LAN. Of course, some of the solutions presented could be adopted in wide area
networks too.
To establish the above-mentioned migration path, one must take into
account the characteristics and requirements of the services supported in
ATM networks. The same service characteristics must be supported in the ATM
WLAN. Moreover, the protocol reference model (PRM) of conventional ATM
TABLE 1    Services to Be Supported in ATM WLAN

                                                              Call attempts   Call attempts   Call attempts
                     Bit rate                                 in small        in medium       in large
Application          (Kbps)     BER           PER             business        business        business
                                                              (50 users)      (150 users)     (4700 users)

Telephony            32, 64     10^-?         10^-3           5               40              200
Videophone           64-2000    1.3 x 10^-?   8 x 10^-6       1.5             10              50
Video conference     5000       1.8 x 10^-?   5 x 10^-6       1               1               1
MPEG1                1500       2.5 x 10^-7   9.5 x 10^-6     0.5             3               10
MPEG2                10000      1.5 x 10^-6   4 x 10^-6       0.5             3               10
Hi-Fi distribution   400        10^-6         10^-7           0.9             0.9             0.9
Low-speed file       64         10^-6         10^-6           1               10              50
High-speed file      2000       10^-6         10^-6           3               20              100
Briefly, the aim is to support telephony, fast file transfers, distributed computing, and multimedia on the desktop. The bit rate requirements for these
services combined with the expected population of ATM WLAN customers define the required transfer rates in the ATM WLAN. Moreover, the QoS requirements for the above-mentioned application classes influence the mechanisms
related to the bandwidth allocation and error control.
In Table 1 [13] the average bit rate and the bit and packet error rate (BER,
PER) requirements for some applications are presented. Moreover, the number
of call attempts per user during the busy hour is also shown. In the scope of the
ATM LAN (fixed or wireless), the term call implies not only a telephony service
but an established connection (or set of connections) carrying traffic for some
application. Based on the average bit rate requirements and the number of call
attempts in Table 1, and under the assumption that one wireless module of ATM
WLAN will service one small business (considering user population), the available bandwidth on the air interface of ATM WLAN should be around 20 Mbps.
To have a complete view of the service requirements, Table 2 shows service
end-to-end delay requirements and their jitter requirements (i.e., the difference
between the actual interarrival times of two subsequent packets of a connection
at two points of the network and those expected).
TABLE 2    Service Delay Requirements

Application                     End-to-end delay (ms)   End-to-end packet delay variation (ms)

20 Mbps HDTV video              0.8                     1
Telephony with echo canceler    <500                    130
Videophone (64 Kbps)            300                     130
MPEG1 (NTSC)                                            6.5
FIGURE 1    Fixed ATM LAN topology (public ATM switch).
the signaling, the databases, and the operations required for the
management of an ATM LAN at UNI (address allocation to terminals,
definition of an allowable number of established connections,
specification of the electrical interface, etc.), and
the functions that monitor the traffic generated by the user terminals,
avoid congestion, and allocate bandwidth efficiently while satisfying the
QoS of the active calls.
There are certain key components of the UNI functionality affecting the design
of an ATM WLAN, and they will be presented later in this section. At this point,
one could note that the design of an ATM WLAN requires the modification of
some UNI specifications (starting from the physical medium) to arrive at a wireless or mobile UNI implementation (W-UNI).
Furthermore, P-NNI specifications cover:
the organization of ATM switches into logical nodes and layers of a
hierarchical backbone network structure in terms of addressing and
routing,
the functions and the signaling related to the topology updates within
the backbone part of the ATM LAN,
the exchanged signaling and the procedures for the call routing during
the setup phase, and
the functions, database specifications, and signaling related to the
backbone network management.
In Section VIII it is shown that P-NNI protocols can be adapted to support
mobility management, in-call connection rerouting, and call setup rerouting
because of mobility.
FIGURE 2    Protocol reference model of the ATM LAN (plane management, layer management, C-plane, U-plane, higher layers, ATM).
1. Physical Layer
The physical layer exchanges with the ATM layer packets of fixed length
and structure called cells. The physical layer consists of two sublayers:
Transmission convergence sublayer, which deals with error correction
at the cell header, construction or recovery of frame structures that may
exist in the transmission link (e.g., SDH), scrambling of the cell payload
contents, and cell delineation (i.e., identification of cell boundaries
within transmitted frames), and
Physical medium-dependent sublayer which deals with the transmission
issues (e.g., modulation, line coding, bit synchronization).
At the moment, all physical layer specifications define fixed transmission media (fiber optics, twisted pairs, and coaxial cable [14]). The links are point-to-point, as shown in Fig. 1, and there is no need for multiple-access algorithms,
like those used in traditional LANs (Ethernet, Token ring, FDDI), because different transmission directions are physically separated (different cable is used
for each direction). However, in the case of an ATM WLAN the broadcast nature of radio transmission requires the implementation of a MAC scheme to
regulate the allocation of the radio channel.
2. ATM Layer
The ATM layer is the heart of the ATM LAN PRM. All the main functions
that characterize an ATM-oriented network reside in this layer. The ATM layer
exchanges with the ATM adaptation layer (AAL) fixed size (48 bytes) service
data units that form the ATM cell payload.
Moreover, the ATM layer performs bandwidth allocation and statistical
multiplexing. Bandwidth allocation is based on the ATDM concept. Generally, in ATM networks bandwidth allocation is performed over time. In every link there is a maximum transfer capability expressed in cells per second
(cells/s) that can be viewed as a continuous transmission of cell slots. The filling of cell slots by transmitted cells is performed asynchronously and on a
demand basis. There are no predefined time slots allocated to a connection.
The straightforward result is that the bandwidth allocation mechanisms can
follow the traffic fluctuations of VBR connections. Moreover, bandwidth is
not reserved for connections when it is not required, but it can be shared
among them. Assuming that the instantaneous total required bandwidth does
not exceed the total link capacity or the ATM switch capacity, ATM can support more connections over the same total network bandwidth than fixed bandwidth allocation mechanisms can. This is called the statistical multiplexing gain, and it means better bandwidth exploitation than fixed bandwidth allocation. Thus, several new applications can be supported in a bandwidth-efficient way, increasing the network capacity and possibly reducing the service cost.
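To put a number on the statistical multiplexing gain, the sketch below compares peak-rate allocation with statistical allocation for identical on-off sources, admitting sources as long as the probability of exceeding the link capacity stays below a target. The figures used (10-Mbps peak rate, 0.35 activity factor, 150-Mbps link, 1% overload probability) are illustrative assumptions and not values taken from this chapter.

/* Illustrative sketch: how many identical on-off sources fit on a link
 * under (a) peak-rate allocation and (b) statistical multiplexing with a
 * bounded overload probability.  All numbers below are assumptions.       */
#include <stdio.h>
#include <math.h>

/* P(more than k of n independent sources are active), activity p each */
static double overload_prob(int n, int k, double p)
{
    double prob = 0.0, term;
    int i, j;
    for (i = k + 1; i <= n; i++) {
        /* binomial coefficient and term computed in log space for stability */
        term = 0.0;
        for (j = 1; j <= i; j++)
            term += log((double)(n - i + j)) - log((double)j);
        term += i * log(p) + (n - i) * log(1.0 - p);
        prob += exp(term);
    }
    return prob;
}

int main(void)
{
    const double peak_mbps = 10.0, link_mbps = 150.0, activity = 0.35, eps = 0.01;
    const int max_active = (int)(link_mbps / peak_mbps);   /* 15 simultaneous peaks */
    int n = max_active;

    /* admit sources while the probability of exceeding the link stays below eps */
    while (overload_prob(n + 1, max_active, activity) < eps)
        n++;

    printf("Peak-rate allocation admits %d sources\n", max_active);
    printf("Statistical multiplexing admits %d sources (overload prob < %.2f)\n", n, eps);
    return 0;
}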
However, bandwidth allocation occurs not only in the links among network entities but also in the ATM switches, in terms of buffer space and bus-bandwidth reservation. As ATM cells flow from the user terminals deep into
the backbone network, the total traffic offered in a switch could overload it
(or some of its ports). Then the network is driven into congestion locally and
neighboring switches are affected too. To avoid such situations, traffic monitoring and control mechanisms have been introduced in the ATM layer to prevent
and react to congestion [12,15]. The main idea is that for every new connection
there is a call setup phase when the source specifies its QoS requirements and
traffic pattern, and negotiates with the network about its traffic limits. To prevent congestion, the network, executing the Call Admission Control algorithm,
decides to accept or reject the new call (based on the expected satisfaction of
the QoS parameters of the established calls and the new call) and defines the
traffic limits of the sources of an accepted call. These traffic limits are fed into
the traffic policing functions. For every generated cell in a connection these
functions decide whether the cell could drive the network to congestion (because its source exceeds its traffic limits), and whether it will be tagged as
redundant. When congestion occurs (or is about to occur), tagged cells are
discarded.
In the case of ATM WLAN, the traffic monitoring functions are of great
importance, because the radio channel is a limited and expensive resource.
Moreover, it is very important to monitor the allocation of radio channel
capacity as it has a direct impact on the QoS of the established connections.
The access of the radio link by a source must take place according to the rate
limits of the traffic source and without violating the bandwidth that should be
any time and for any duration. Like CBR, the real-time VBR service category
is intended for applications with tightly constrained delay and delay variation.
However, in this case, if traffic is emitted at the PCR for a long period the traffic
conformance is violated. The non-real-time VBR service category is intended
for applications without hard delay requirements. Unlike real-time VBR, there
are no guarantees for delay constraints. Only the expected CLR is specified. The
UBR service category is used for non-real-time applications. In the UBR case, no
QoS parameter is specified. Moreover, no traffic specification needs to be provided.
Finally, the ABR service category is intended for applications without tight delay requirements. It is the only service category where the traffic specification
can change during the call. This is achieved through a flow control mechanism
feeding information in the reverse direction, toward the traffic source. The latter is expected to use the feedback information to adjust its transmission rate
and experience low CLR.
Whenever a new connection must be established, its service category must
be defined. Moreover, a traffic contract between the traffic source and the network part must be agreed. The traffic contract consists of a traffic specification
and a QoS commitment.
The traffic specification contains the values of the parameters characterizing
a connection of a specific service class. These parameters are PCR, sustainable
cell rate (SCR), minimum cell rate (MCR), maximum burst size (MBS), and cell
delay variation tolerance (CDVT).
The QoS commitment contains the values for the QoS parameters associated with the service category of the established connection. The following QoS
parameters are defined in ATM: peak-to-peak cell delay variation, maximum
cell transfer delay, cell loss ratio, cell error ratio, severely errored cell block ratio
and cell misinsertion rate. The former three parameters are negotiated while
the other three are not. The QoS commitment means that the QoS parameters
will be satisfied for the ATM cells emitted according to the traffic specification.
The traffic policing functions monitor the ATM cell stream of a connection and
identify those cells possibly violating the agreed traffic specification.
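The chapter does not name a specific policing algorithm here, but a common realization of such a traffic policing function in ATM equipment is the generic cell rate algorithm (GCRA). The sketch below shows its virtual-scheduling form, deriving the increment from the PCR and the limit from the CDVT, and marking nonconforming cells in the way the text describes cells being tagged; the numeric values are assumptions.

/* Sketch of a GCRA-style policer (virtual scheduling form).  The GCRA itself
 * is the standard ATM traffic-policing algorithm; its use here as an example
 * and the numeric values below are assumptions, not taken from the chapter.  */
#include <stdio.h>

typedef struct {
    double tat;     /* theoretical arrival time of the next conforming cell */
    double inc;     /* increment I = 1 / PCR (seconds per cell)             */
    double limit;   /* limit L, the cell delay variation tolerance (CDVT)   */
} gcra_t;

/* returns 1 if the cell arriving at time t conforms, 0 if it should be tagged */
static int gcra_conforms(gcra_t *g, double t)
{
    if (t < g->tat - g->limit)
        return 0;                                  /* too early: tag the cell */
    g->tat = (t > g->tat ? t : g->tat) + g->inc;   /* conforming: push TAT on */
    return 1;
}

int main(void)
{
    gcra_t g = { 0.0, 1.0 / 1000.0, 0.5e-3 };      /* PCR 1000 cells/s, CDVT 0.5 ms */
    double t = 0.0;
    int i;

    /* a source briefly sending at twice its PCR: some cells get tagged */
    for (i = 0; i < 10; i++, t += 0.0005)
        printf("cell %2d at t=%.4f s -> %s\n", i, t,
               gcra_conforms(&g, t) ? "conforming" : "tagged");
    return 0;
}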
2. Traffic Policing Functions in ATM Networks
FIGURE 3    ATM WLAN architecture (private UNI, public ATM switch).
FIGURE 4    Protocol reference model of the ATM WLAN (plane management, layer management, C-plane, U-plane, higher layers).
transmitters and receivers, provided that for a given pair the same chip sequence
(code) corresponds to every information bit. Also, the code sequences assigned
to different channels must be mutually orthogonal. Following this convention,
both a frequency band and a code define a channel.
In CDMA, the number of users that can be accommodated is proportional
to the spreading factor (the ratio of the spread-spectrum signal bandwidth to the
bit rate offered) [21]. Then the required system bandwidth should be increased
compared to the channel gross bit rate by a factor equal to the spreading factor
(usually on the order of 100) and this makes CDMA prohibitive for the bit
rates required in an ATM WLAN.
Moreover, all the signals at the BS receiver should arrive with equal power.
For this reason, power control must be applied; otherwise, the power of the
received signals will depend on the distance between transmitter and receiver.
Then the transmitters of undesired signals close to the receiver interfere strongly
with the desired signal and prevent proper communication, leading to a reduction of system capacity [22].
A dynamic form of TDMA looks like the optimum solution for the multiple-access scheme of an ATM WLAN. However, HIPERLAN, the latest development in the area of wireless networking, introduces a distributed multiple-access scheme aiming to support multimedia traffic on the radio channel. In the following section it is examined whether the multiple-access scheme of a HIPERLAN copes with ATM traffic and satisfies the requirements of the ATM WLAN MAC.
FIGURE 5    HIPERLAN functions: lookup, routing information exchange, power conservation, and user data transfer.
The allocated frequency spectrum is divided into five frequency bands (each
frequency band associated with a carrier). All the nodes that belong to the
same HIPERLAN should use the same unique carrier. Packets can be relayed
to nodes residing out of the transmission range of source nodes, following the
concept of forwarding controlled by the MAC layer of a HIPERLAN.
The physical layer is also responsible for reporting an idle channel condition
to the channel access cycle layer. This happens by monitoring the received power
and detecting whether it is below a well-defined threshold for a certain period
(equal to 1700 high-bit-rate periods). The idle channel condition enables a
HIPERLAN terminal to transmit the packet with the highest channel access
priority without following the contention-based multiple-access algorithm.
B. The Channel Access Control (CAC) Layer of a HIPERLAN
The CAC layer deals with the decision of whether to transmit a packet. It is
based on a multiple-access technique called elimination-yield nonpreemptive
priority multiple access (EY_NPMA). Transmitted information is contained in
unicast (one specific destination node is defined) and multicast (destination
nodes of the packet are more than one) packets. A correct reception of a unicast packet is always followed by a transmission of an ACK (acknowledgement)
packet by the destination node. This does not happen in the case of multicast packets. A channel access cycle starts after the transmission of an ACK
packet or after the end of the expected transmission of an ACK packet. It is
assumed that all the competing terminals that have listened to, or expected, the
ACK packet, are synchronized to enter the channel access cycle. Every node
contends for only one of its packets per channel access cycle, that with the
highest access priority among the packets residing in the node. According to
the EY_NPMA principles, the channel access cycle has four phases (Fig. 6).
The first phase is priority resolution: Generally packets can have five different access priorities (0 to 4, 0 being the highest priority). The access priority of
a packet is calculated at the transmitting node, based on the remaining lifetime
and the user priority of that packet, as well as on the number of intermediate
nodes (hops) that the packet must traverse before being delivered at the final
destination. The number of hops is known and assigned by the MAC layer,
assuming that the source node has all the required information to define the
routing of its packets. During the priority resolution phase, up to five priority
time slots (each one being 256 bits long) could occur, each one corresponding to
a level of packet access priority. During this phase, the nodes that have entered
the channel access cycle schedule transmissions of a well-defined bit sequence
FIGURE 6    The phases of the EY_NPMA channel access cycle: priority resolution, elimination, yield, and packet + ACK transmission, with synchronization for the next cycle.
in the priority time slots corresponding to the access priorities of their packets,
and listen to the previous priority time slots corresponding to higher access
priorities. Transmissions in previous priority time slots disable the scheduled
transmissions in the next time slots, reject nodes with lower access priorities
from the channel access cycle, and drive EY_NPMA into its second phase. Therefore, the duration of the priority resolution phase is determined by the highest
packet access priority in every channel access cycle. Moreover, more than one
node could survive in this phase if they represent packets with the same access
priority, and this priority is the highest in the priority resolution phase.
The second phase of the channel access cycle is called elimination: In this
phase, up to 12 time slots (each one being 256 bits long) could be accessed (using
a well-defined bit sequence) by the nodes that have survived the first phase of
the channel access cycle. The transmission in these time slots is continuous and
starts always in the first time slot. The duration of every node's transmission
is based on a probability of the time slot access and is given by the binomial
distribution. Every node transmits for the decided period and then listens to the
channel to examine whether there are any nodes trying to access the channel,
too. Longer transmissions in this phase reject listening nodes from the channel
access cycle. More than one node could survive in this phase whose duration is
determined by the longest transmission (up to 12 x 256 bits).
The third phase of the channel access cycle is called yield: All the nodes
that have survived decide to sense the channel for a period that is a multiple
(1 up to 14) of a high-rate 64 bit block. The first node to transmit acquires the
channel because it disables all the listening nodes from the channel access cycle.
However, in this phase also, more than one node can survive, and this results
in packet collision.
The fourth phase is the transmission of packets by nodes that have survived
in the channel access cycle. If no collision happens (only one node survived
finally), a unicast packet could be transmitted followed by an ACK packet,
indicating whether the packet was received correctly or was damaged by radio channel
errors. It should be noted that not all the nodes listening to the transmitted
packet might be able to listen to the related ACK packet, because the destination
node could have a different coverage area than the source node. To start a new
channel access cycle, nodes should be able to be synchronized based on the end
of the ACK transmission or at the end of the expected ACK transmission.
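The following sketch plays out one EY_NPMA channel access cycle for a handful of contending nodes, following the three resolution phases exactly as described above: the highest access priority survives, the longest elimination burst survives, and the shortest yield listening period wins (with a collision if more than one node survives). The slot counts come from the text, while the node parameters and random choices are illustrative assumptions.

/* Sketch of one EY_NPMA channel access cycle: priority resolution (5 levels),
 * elimination (up to 12 slots, longest burst survives) and yield (1..14 listen
 * slots, shortest wins).  Random choices and node set-ups are assumptions.    */
#include <stdio.h>
#include <stdlib.h>

#define NODES 6

int main(void)
{
    int priority[NODES], burst[NODES], yield[NODES];
    int alive[NODES], i, best, winners;

    srand(7);
    for (i = 0; i < NODES; i++) {
        priority[i] = rand() % 5;        /* access priority 0 (highest) .. 4   */
        burst[i]    = 1 + rand() % 12;   /* elimination burst, 1..12 slots     */
        yield[i]    = 1 + rand() % 14;   /* yield listening period, 1..14      */
        alive[i]    = 1;
    }

    /* phase 1: only nodes with the highest (numerically lowest) priority survive */
    best = 4;
    for (i = 0; i < NODES; i++) if (priority[i] < best) best = priority[i];
    for (i = 0; i < NODES; i++) alive[i] = (priority[i] == best);

    /* phase 2: among the survivors, the longest elimination burst survives */
    best = 0;
    for (i = 0; i < NODES; i++) if (alive[i] && burst[i] > best) best = burst[i];
    for (i = 0; i < NODES; i++) alive[i] = alive[i] && (burst[i] == best);

    /* phase 3: the survivor that senses for the shortest period transmits first */
    best = 15;
    for (i = 0; i < NODES; i++) if (alive[i] && yield[i] < best) best = yield[i];
    for (i = 0; i < NODES; i++) alive[i] = alive[i] && (yield[i] == best);

    for (winners = 0, i = 0; i < NODES; i++) winners += alive[i];
    for (i = 0; i < NODES; i++)
        if (alive[i]) printf("node %d transmits (priority %d)\n", i, priority[i]);
    if (winners > 1) printf("-> %d survivors: the packets collide\n", winners);
    return 0;
}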
C. The MAC Layer in a HIPERLAN
as a result of the short-term variations of traffic. However, ATM networks incorporate traffic policing functions that deal with exactly this problem [12],
and such mechanisms are expected to reside in an ATM WLAN too, as they
have a straightforward impact on the quality of service offered. By modifying
the existing HIPERLAN standard, traffic policing could be applied. To do so,
it is proposed that the HIPERLAN standard be modified in order to adopt a
centralized topology according to the ATM concept. Moreover, a centralized
multiplexing protocol based on adaptive TDMA can be introduced and compared to the distributed HIPERLAN CAC in order to assist the design of the
optimum ATM WLAN MAC.
E. Comparison of a HIPERLAN and Adaptive TDMA
1. Network Configuration for ATM over a HIPERLAN
FIGURE 8    ATM over HIPERLAN: backbone ATM switches interconnected via PNNI, and the air interface.
terminals using a carrier different from those used by neighboring BSs. Because
there are three main carriers defined for HIPERLAN operation, a frequency
reuse factor of 3 could be used.
2. Modifications of HIPERLAN PRM for ATM WLAN Operation
To support adaptive bandwidth allocation and traffic policing in the wireless part of the HIPERLAN-based ATM WLAN, certain modifications are proposed in the CAC and MAC layers of a HIPERLAN. The objective is to define
a superframe structure for the transmissions on the air interface in every cell
(Fig. 9). This superframe consists of a contention-based period, where all the
HIPERLAN-specified transmissions can be supported, and a contention-free
period, where packets carrying ATM traffic are transmitted following the TDMA concept.

FIGURE 9    Superframe structure on the air interface: a contention period in which transmissions happen according to the standard HIPERLAN, a multicast packet by the BS declaring the duration of the contention period, a multicast packet by the BS scheduling transmissions in the contention-free period, and the contention-free period itself, in which HIPERLAN packets carrying ATM cells are transmitted and acknowledged (ACK).

The duration of the superframe is fixed while the boundary between its periods is variable, based on the terminal population and total
ATM traffic within a cell. To support the second period of the superframe the
HIPERLAN MAC and CAC must be slightly modified and their interface must
be extended. After the applied modifications, CAC transmits a packet in one of
the three following cases (where the first two are already defined in the existing
HIPERLAN standard):
1. The channel has been found free, the relevant indication has been given
to the MAC, and a packet has been issued at the SAP between the MAC and
the CAC.
2. A new channel access cycle has been initiated, the corresponding indication has been signaled to the MAC, and a packet has been issued at the service
access point between the MAC and the CAC. In this case the CAC will compete
for the channel entering a new channel access cycle.
3. A new time slot has begun, this event has been signaled to the MAC to
indicate the beginning of this time slot, and the MAC has delivered to the CAC
a HIPERLAN packet that carries an ATM cell in its payload. All the packets
transmitted this way are unicast packets and they are acknowledged according
to the CAC procedures. The end of the acknowledgement transmission defines
the end of the time slot.
The third case requires modifications in the CAC and MAC layers of a
HIPERLAN to support the superframe structure. For that reason, the superframe delineation function is added in the CAC layer (Fig. 10). At the beginning
of the contention-free period of a superframe, the BS transmits a packet schedule
multicast packet to allocate the time slots to active ATM connections (Fig. 9).
The reception of this multicast packet forces CAC to enter the superframe delineation function and indicate the beginning of all the time slots to the MAC.
FIGURE 10    Modified protocol reference model for the HIPERLAN-based ATM WLAN: C-plane and U-plane, with the ATM adaptation layer and ATM layer above the HIPERLAN lookup, routing information exchange, and power conservation functions, the CAC layer, and the PHY layer.
The algorithm used for the allocation of the radio channel should be flexible enough to allow bandwidth sharing among terminals according to their
instantaneous capacity requirements. Moreover, it should be able to provide
bandwidth guarantees and apply traffic policing functions to established connections. The packet scheduling function introduced in the previous section
deals with this problem. In the terminal side and for every established ATM
connection, it keeps a separate virtual buffer, and based on the occupancy of
this buffer, it reports the current bit rate of the connection to the packet scheduling function of the BS.
The calculation of the connection rate is based on the ATM cell interarrival process within a certain period. To report the latest calculated rate of
a connection, the terminal sends to the BS a resource management ATM cell.
FIGURE 11

FIGURE 12    Mean access delays (maximum allowed delay: 12 ms) for the standard HIPERLAN and for the HIPERLAN-based ATM WLAN, versus the number of users.
come across with the HIPERLAN-based ATM there is a significant improvement in user capacity. Assuming 2 % as the maximum acceptable packet loss
probability, the capacity of a standard HIPERLAN when it encapsulates ATM
is only one user (1 Mbps on average) while the modified HIPERLAN based on
TDMA can support 4 users (4 Mbps on average). Moreover, Fig. 13 shows that
the packet loss probability for conforming sources is kept constant in the case
of a HIPERLAN-based ATM WLAN when the system operates at the maximum user capacity, irrespective of the amount of excess traffic offered by nonconforming sources.
Thus, a centralized approach based on adaptive TDMA is preferred over
the contention-based multiple-access scheme when ATM traffic is supported
within a wireless system. The advantage is both quantitative in terms of user
capacity and qualitative in terms of guaranteed QoS offered to conforming
connections.
traffic contract for the connection and the route that will be followed by all the
packets in that connection unless a fault condition occurs are defined. All the
above-mentioned reasons justify the envisioned cellular system architecture for
an ATM WLAN as presented in Fig. 3.
B. Traffic Policing in the MAC of an ATM WLAN
This section proposes an optimum solution for the MAC layer of an ATM
WLAN based on adaptive TDMA and integrates the mechanisms required to
preserve QoS and support dynamic bandwidth allocation.
The objectives of the multiple-access layer regarding the QoS of connections
are:
The transmission of submitted packets within the negotiated limits of
access delay. End-to-end packet delay consists of the following components.
1. Access delay in the calling party's access network.
2. Queuing delay in the backbone network.
3. Possible access delay in the called party's access network. Maximum
access delays are defined per virtual connection during call setup. Generated
packets that have experienced access delays longer than the negotiated limit are
dropped.
The preservation of bandwidth assigned to connections in the access network. It is assumed that a part of the total bandwidth is assigned per connection.
When VBR sources transmit at a rate higher than their allocated bandwidth,
they steal bandwidth allocated to other connections. It is expected that the
backbone network of an ATM WLAN contains the mechanisms for preventing
such violations. Similar mechanisms are introduced in the access network, too.
The assignment of bandwidth only to connections with pending packets
or with active sources. This is another way to preserve QoS because available bandwidth allocated to a silent source can be given to support the excess
bandwidth of an active connection.
The completion of the above-mentioned objectives will result in the satisfaction of the defined QoS requests.
1. Traffic Patterns of Connections
The following categories of connections are assumed in the ATM WLAN
MAC based on the way that information is submitted by the corresponding
sources:
CBR connections, assumed active for the whole duration of the call. The
management of these connections is rather easy. A constant portion of the total
bandwidth, equal to the bit rate of the connection, is allocated for the whole
duration of the call. In order to preserve the total bandwidth and enable several
terminals to access the channel, the call duration could be specified.
On-off connections, characterized by two states: an idle state when no
transmission occurs and an active state when information is transmitted at a
peak rate. An adaptive multiple-access scheme could take advantage of the idle
periods of connections and multiplex sources of the same type with total peak
bandwidth more than the bandwidth of the network. The duration of active
and idle states is given usually by exponential distributions, and this provides
a mechanism for adjusting the multiplexing gain. For example, when a voice
source is characterized by one idle state with mean duration of 650 ms, an active
state with mean duration of 350 ms, and a bit rate of 32 Kbps, it is expected
that, on average, three sources could be multiplexed on a 32-Kbps channel. The
multiple-access scheme could disable active connections from using the channel
when they have exceeded a negotiated period in the active state, or when they
have exceeded a negotiated ratio of active period over silence period.
Continuous VBR connections, transmitting at a range of bit rates. Although they can be assigned an average bit rate (calculated as the bits generated over the call period), they rarely transmit at exactly this rate; most of the time, however, the current transmission rate is close to the average
bit rate. A useful piece of information is the period of time for which these
sources keep a constant rate. For example, every 33 ms there is a change in the
bit rate of a video source, when video is transmitted at a picture frame rate of
30 frames per second and the bit rate changes per image frame. In that case
a bandwidth renegotiation could take place every 33 ms. Then the bandwidth
required for a longer time period can be calculated in an initial short period
and packet scheduling can become easier. For example, in a TDMA scheme
with frame duration equal to 6 ms, the bit rate of a video source is valid for
five TDMA frames.
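The two multiplexing figures quoted above follow directly from the stated parameters, as the short sketch below reproduces: roughly three 32-Kbps voice sources per 32-Kbps channel, and a video bit rate that stays valid for about five 6-ms TDMA frames. The packaging as a program is ours; the input numbers are those given in the text.

/* Worked arithmetic for the connection categories above: how many on-off
 * voice sources fit on one 32-Kbps channel on average, and for how many
 * 6-ms TDMA frames a 30-frames/s video source keeps its reported bit rate.  */
#include <stdio.h>

int main(void)
{
    /* on-off voice: 650 ms idle, 350 ms active at 32 Kbps */
    double idle_ms = 650.0, active_ms = 350.0, peak_kbps = 32.0;
    double activity    = active_ms / (idle_ms + active_ms);   /* 0.35          */
    double mean_kbps   = peak_kbps * activity;                /* 11.2 Kbps     */
    double avg_sources = peak_kbps / mean_kbps;               /* ~2.9, i.e. 3  */

    /* continuous VBR video: new rate every image frame at 30 frames/s */
    double rate_period_ms = 1000.0 / 30.0;                    /* ~33 ms        */
    double tdma_frame_ms  = 6.0;
    double valid_frames   = rate_period_ms / tdma_frame_ms;   /* ~5.5 frames   */

    printf("voice activity factor     : %.2f\n", activity);
    printf("mean voice rate           : %.1f Kbps\n", mean_kbps);
    printf("voice sources per channel : %.1f (about 3)\n", avg_sources);
    printf("video rate valid for      : %.1f TDMA frames (about 5)\n", valid_frames);
    return 0;
}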
2. Exploiting Silence and Low-Bit-Rate Periods of Connections
When sources enter the silent period, the bandwidth assigned to them could
be reallocated to connections with pending packets. Moreover, the silent sources
should have a way of indicating their transition back to active in order to acquire
the required bandwidth. This indication should be fast enough, to enable the fast
allocation of bandwidth, although the mechanism used should preserve the
total bandwidth in the network. Let's assume that voice sources, characterized
by silences and talkspurts, are multiplexed in the network, and that the voicesource bit rate during talkspurt is 32 Kbps, while the affordable access delay
is 20 ms. Moreover, let's assume that the transmitted packet has a payload
of 384 bits (ATM cell like) and that convolutional code (2,1) is applied on the
generated voice information. Then, during talkspurts, one packet is delivered at
the MAC layer every 6 ms. If the packet experiences delay more than 20 ms it is
dropped. When a voice source becomes active at t0, it should take bandwidth to transmit its first packet not before (t0 + 6) ms and not later than (t0 + 26) ms.
In PRMA [23], transmissions take place in time slots that are divided into
traffic time slots (assigned to active connections by the base station) and reservation time slots that occur periodically (in well-defined time intervals). All
the sources entering into their active state try to transmit the first packet of the
active state in the reservation time slots following contention-based multiple-access methods like slotted ALOHA [24]. So, in PRMA it is guaranteed that
if a source becomes active, it will contend for bandwidth as soon as its first
packet is ready. If the transmission is successful, then the base station allocates
traffic channels to the active source; otherwise, subsequent attempts for transmission in the reservation slots take place. So, in PRMA it is not guaranteed
that a voice source, becoming active, will transmit the first packet within the
required time limits. This could lead to packet loss because of excessive delay
or because failure to transmit the first packet of the active state could influence
the rest of the packets in that state. Moreover, bandwidth is wasted because of
the contention-based multiple access of the reservation slots. If slotted Aloha
is used, the maximum achievable throughput in the reservation slots periods
is 40% for same sources in the network or 50% for nonuniform sources [24].
Assuming that two slots are used for reservation every 6 ms, the bandwidth
assigned for reservation slots is 128 Kbps (every slot carries 384 information
bits). The wasted bandwidth in that case ranges from 64 to 76.8 Kbps. Moreover, even if the bandwidth loss is affordable, the access delay experienced
due to the contention-based access of reservation slots could be prohibitive for
multimedia services.
However, polling could be used as another mechanism for indicating the
transition of a source from idle to active. A base station allocates periodically
traffic slots to idle sources. The time interval between two traffic slots allocated
to the same idle sources could be based on the delay requirements of the source
or on the statistics describing the source transitions. Bandwidth is lost because
the base station could poll inactive terminals. Assuming that one traffic slot is
assigned every 20 ms to an idle source (to compensate for maximum affordable
delay), the wasted bandwidth is 19.2 Kbps per idle source, i.e., three to four
times less than that in the case of PRMA. Moreover, polling guarantees stability
and upper bounded delays for the indication of active source states in contrast
with the contention in the reservation slots of PRMA.
Furthermore, short time slots, called reservation subslots, could be used
by terminals to indicate transition of the source to active or the existence of a
packet ready for transmission. Based on the information carried in the reservation subslots, the base station could assign traffic slots to sources or it could
keep excluding idle sources from the bandwidth allocation. Assuming that every
subslot contains 8 bits of information and is given to the same idle source every
16 ms, the bandwidth assigned for reservation subslots is 500 bps per idle
source. Then 128 sources could be simultaneously multiplexed to have the same
wasted bandwidth as in PRMA. Different kinds of services have different delay
requirements, so the polling interval could be different for different connections.
This could lead to a multiframe structure on the air interface that consists of different cycles corresponding to connections with different polling requirements.
A class of connections could be polled (using the subslots) every 1 ms (polling
channel: 8 Kbps per connection), another class every 6 ms (1.3 Kbps), etc.
Introducing reservation subslots, information transfer and bandwidth reservation are separated, and contention in the traffic slots is avoided.
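The sketch below reproduces the bandwidth bookkeeping of the preceding comparison, i.e., the reservation-slot bandwidth of PRMA, the bandwidth spent polling one idle source, and the bandwidth of one reservation subslot, using exactly the figures quoted above (384-bit slots, two reservation slots per 6 ms, one poll per 20 ms, an 8-bit subslot every 16 ms); only the packaging as a program is ours.

/* Worked arithmetic for the reservation-overhead comparison in the text:
 * PRMA reservation slots vs. polling vs. 8-bit reservation subslots.      */
#include <stdio.h>

int main(void)
{
    const double slot_bits    = 384.0;   /* information bits per traffic slot   */
    const double subslot_bits = 8.0;     /* bits per reservation subslot        */

    /* PRMA: two reservation slots every 6 ms */
    double prma_kbps = 2.0 * slot_bits / 6e-3 / 1e3;

    /* slotted ALOHA throughput in the reservation slots: 40%..50% useful */
    double prma_wasted_lo = prma_kbps * (1.0 - 0.50);
    double prma_wasted_hi = prma_kbps * (1.0 - 0.40);

    /* polling: one traffic slot every 20 ms per idle source */
    double poll_kbps = slot_bits / 20e-3 / 1e3;

    /* reservation subslot: 8 bits every 16 ms per idle source */
    double subslot_bps = subslot_bits / 16e-3;

    printf("PRMA reservation bandwidth : %.1f Kbps (wasted %.1f-%.1f Kbps)\n",
           prma_kbps, prma_wasted_lo, prma_wasted_hi);
    printf("Polling one idle source    : %.1f Kbps\n", poll_kbps);
    printf("One reservation subslot    : %.0f bps per idle source\n", subslot_bps);
    printf("Idle sources matching PRMA : %.0f\n",
           prma_wasted_lo * 1e3 / subslot_bps);
    return 0;
}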
The ATM WLAN will support not only temporary virtual connections
that exist for the duration of the corresponding call, but also semi-permanent
connections that could exist on a constant basis (i.e., the connection between
an X-terminal and an X-server). When semi-permanent virtual connections
are established, it is expected that they will be highly asynchronous, so it will
not be efficient to apply polling to them. It is better to provide a contentionbased mechanism for such connections. Moreover, it is more likely to have
successful transmission when many reservation subslots are available and
not when there are few and longer time slots. To indicate pending bursts, a
Figure 14 presents the air interface that supports the features presented
in the previous sections. The air interface consists of a sequence of logical
frames (Fig. 14). Every frame contains traffic time slots where the base station
and the wireless terminals transmit ATM cell-like packets, as well as reservation subslots accessed only by the wireless terminals. Reservation subslots
are used to indicate the existence of a packet burst or report the bit rate of
a source expressed in packets per frame. They are accessed following slotted
ALOHA, but periodically some of them are assigned to specific connections for
polling. For this reason, the base station indicates which subslots are used for
polling and which can be used for contention, by broadcasting a packet with
the relevant information. Moreover, the transmissions in the reservation subslots are acknowledged by the base station to indicate collisions that possibly
occurred.
Each frame consists of three groups of traffic slots. Each group has 15
traffic slots, the first accessed by the BS for broadcasting a packet, indicating
the connections that will use the other 14 time slots of the group. So each frame
has in total 45 time slots. All the signaling messages and the user information
are carried in the transmitted packets within the traffic slots. The reservation
subslots carry only short messages related to the MAC layer.
5. Mechanism for QoS Assurance
During call setup, the BS decides about the traffic pattern of each new
connection. Based on the delay requirements of the supported service, the BS
determines the polling periods that will be used for the new connection when the
corresponding source is idle (how often it will be polled if idle). Moreover, the
traffic contract is negotiated between the BS and the corresponding terminal.
For bursty sources the traffic contract specifies the number of packets that will
be guaranteed with bandwidth within a certain time interval. This value is
based on the required average bit rate declared by the terminal. For continuous
VBR sources, the traffic contract specifies the guaranteed bandwidth of the
connection based on the declared average bit rate of the source. During the call,
the traffic contract is used by the BS to assign access priorities to the generated
packets.
For highly asynchronous (on-off) connections, every new packet burst is
indicated using the reservation subslots. It is assigned with an access priority
based on its length, the time of occurrence, and the length of the previous burst
in that connection. Moreover, different packets of the burst could be assigned
with different access priorities.
For continuous VBR connections, a new bit rate is reported using a reservation subslot. The new rate is used by the BS to determine the generation
instants of the new packets in the corresponding terminal. Each new packet is
assigned with an access priority value based on the guaranteed bandwidth of
the connection and the reported bit rate.
Four levels of priority are defined in the MAC layer of the ATM WLAN:
1. Priority zero, being the highest, corresponds to a guaranteed bandwidth
of connections and is given to packets that will be definitely transmitted at the
guaranteed rate.
2. Priority one is assigned to packets carrying urgent control messages (like
handoff messages).
3. Priority two is given to packets that do not violate the traffic contract
but still correspond to a bit rate higher than the guaranteed bandwidth of
the connection. This happens when sources transmitting at a bit rate lower
than the guaranteed bandwidth start transmitting at a bit rate higher than the
guaranteed bit rate. The combination of priorities zero and two will force VBR
services to approach the guaranteed bit rate under a high traffic load in the
network, while under light traffic conditions it will provide low access
delays.
4. Priority three is given to packets violating traffic contracts. When the
network is overloaded, priority three packets are dropped and the network is
not driven into congestion. However, under low traffic load conditions, priority
three packets can be transmitted.
The priority assigned to the packets corresponding to different rates could be determined in two ways:
Statically during call setup when a set of relations of the type {rate,
priority} is defined. Then the source is not monitored during the call. For example, priority zero is always assigned to a rate up to 10 Mbps, priority two is
assigned to an excess rate (from 10 Mbps) up to 15 Mbps, and priority three
is assigned to all the excess rates above 15 Mbps.
Dynamically based on the agreed traffic contract and on the conformance
of the source to that contract. Then the base station monitors the traffic behavior of each source and relates the current state to the past history of the
traffic.
The assignment of access priorities is implemented as follows: For every
asynchronous source there is a timer to indicate the time instant when a number
of tokens, equal to the length of a conforming burst, will be generated. The
required information is retrieved from the traffic contract. Whenever a new set
of tokens is generated, tokens possibly left during the previous cycle are now
dropped. For a continuous VBR connection, state report instants are defined.
At these instants the source reports the current bit rate. At the base station, the
reported rate R is used to define the instants when a timer will indicate packet
generation at the source. If RG is the priority zero rate and RH is the priority two rate, then R = RH + RX, where RX is the priority three rate. The values RG
and RH are used to create tokens with access priorities zero and two, according
to the algorithm in Fig. 15. According to this algorithm, priority two tokens
are generated based on RH and some of them are characterized as priority zero
tokens based on rate RG. A similar approach is followed when a new packet
burst is generated.
Figure 16 shows how access priorities are assigned to packets of a virtual
connection. As mentioned, the generation of packets is estimated by the BS
based on the rate R and the instant of the state report.
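Figure 15 itself is not reproduced here, but the token-generation rule in the preceding paragraphs can be sketched as follows: the reported rate R is split into a guaranteed part RG (priority zero), a conforming excess up to RH (priority two), and a contract-violating excess RX (priority three). The frame-based packet counts and the example rates in the sketch are assumptions; only the priority rule itself comes from the text.

/* Sketch of the access-priority assignment for a continuous VBR connection:
 * the reported rate R (packets per frame) is split into a guaranteed part RG
 * (priority 0), a conforming excess RH - RG (priority 2) and a contract-
 * violating excess RX = R - RH (priority 3).  Frame-based counts and the
 * example rates are assumptions; only the priority rule comes from the text. */
#include <stdio.h>

static void assign_priorities(int reported, int guaranteed, int contract_limit)
{
    int conforming = reported < contract_limit ? reported : contract_limit; /* RH */
    int p0 = conforming < guaranteed ? conforming : guaranteed;             /* RG */
    int p2 = conforming - p0;
    int p3 = reported - conforming;                                         /* RX */

    printf("reported %2d pkts/frame -> %d at priority 0, %d at priority 2, "
           "%d at priority 3\n", reported, p0, p2, p3);
}

int main(void)
{
    const int guaranteed = 4;   /* tokens per frame at priority zero            */
    const int contract   = 6;   /* conforming limit (priority two up to here)   */
    int r;

    /* the source reports a new rate at every state report instant */
    for (r = 2; r <= 9; r++)
        assign_priorities(r, guaranteed, contract);
    return 0;
}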
The output of the mechanism for QoS assurance is a sorted list of active virtual connections, which are granted time slots on a first-come-first-served policy.
The input of the mechanism consists of the fast bandwidth requests contained
in the reservation subslots, the knowledge of the activity of the sources, the
property of whether connections have guaranteed bandwidth, and the traffic
contracts negotiated during call setup. Packet bursts with equal priorities are
sorted based on the access delay they have experienced. This access delay is
estimated by the base station based on the instant when the indication of the
burst occurred.
FIGURE 17 (a) Mean access delay and (b) packet dropping probability in ATM WLAN, versus the number of terminals.
6. Adaptive TDMA Performance
The capability of the proposed algorithm to apply traffic policing to connections is demonstrated by simulating a network with variable bit rate (VBR) video sources. The following approach was taken: Initially, the capacity of a
coverage area (in terms of number of sources) was estimated by calculating the
packet dropping probability when all the sources transmit at the same average
bit rate of 2 Mbps (Fig. 17) and conform to that average. Assuming 2 % as
an acceptable packet loss probability, the capacity is 8 VBR sources (16 Mbps
on average). Then the packet loss probability is calculated when five sources
conform to the average rate (2 Mbps VBR) while the other three transmit at
a higher average rate (2.5 to 4 Mbps per source) to simulate nonconforming
sources (Fig. 18). From Fig. 18, it is obvious that the packet loss probability of
the conforming sources remains unaffected by the presence of the nonconforming connections. Moreover, the latter experience high packet loss because the
network prevents the allocation of radio capacity to them, in order to preserve
the bandwidth and the QoS of the conforming connections. However, when no
traffic policing applies, conforming connections experience increased packet
loss, which reduces the user capacity of the system.
FIGURE 18 (a) Mean access delay and (b) packet dropping probability in ATM WLAN for conforming (2 Mbps) and nonconforming (2.5-4 Mbps) VBR sources, with and without traffic policing.
the ATM layer and decreases the throughput at the TCP layer simultaneously.
ATM cell loss because of congestion (buffer overflows) causes TCP throughput
degradation because TCP segments with lost ATM cells must be retransmitted.
Moreover, the congestion avoidance mechanism of TCP is invoked and reduces
TCP throughput. More bandwidth is wasted because of the transmission of
the nondropped (useless) ATM cells belonging to TCP segments that will be
retransmitted. The partial packet discard (PPD) [11] algorithm is proposed as
a solution but is found to be nonoptimal. However, the EPD [11] algorithm
provides optimal throughput. As soon as a buffer occupancy exceeds a certain
threshold, whole TCP segments are discarded. However, this results in unfair
sharing of bandwidth among UBR connections. Moreover, [11] investigates experimentally the performance of EPD and PPD with respect to the size of the
ATM switch buffers, TCP windows, and TCP segments. In order to improve
the TCP throughput one should avoid small ATM switch buffers, large TCP
windows, and long TCP segments. These are issues affecting the support of
TCP over an ATM WLAN.
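A minimal sketch of the EPD idea described above, assuming AAL5-style framing in which the last cell of each TCP segment is marked; the class name, the threshold handling, and the per-VC bookkeeping are illustrative rather than taken from [11] or from any particular switch implementation.

```python
# Minimal Early Packet Discard sketch at a switch output buffer (assumptions:
# AAL5 framing, one buffer, per-VC "discard the rest of this packet" flags).

class EpdBuffer:
    def __init__(self, capacity_cells, threshold_cells):
        self.capacity = capacity_cells      # hard limit of the cell buffer
        self.threshold = threshold_cells    # EPD decision threshold
        self.queue = []                     # queued cells (VC ids, for brevity)
        self.discarding = {}                # per-VC: drop the rest of this packet?

    def enqueue(self, vc, is_first_cell, is_last_cell):
        """Return True if the cell is queued, False if it is discarded."""
        if is_first_cell:
            # EPD decision point: if occupancy already exceeds the threshold,
            # drop the *whole* incoming packet instead of a random tail of it.
            self.discarding[vc] = len(self.queue) >= self.threshold

        drop = self.discarding.get(vc, False) or len(self.queue) >= self.capacity
        if not drop:
            self.queue.append(vc)

        if is_last_cell:                    # packet boundary: reset the flag
            self.discarding[vc] = False
        return not drop
```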
In [28], experimental results of TCP traffic over ATM connections are discussed. To achieve the maximum throughput, TCP segment loss must be avoided. Thus, the buffer size of the ATM switches must be equal to the sum of
the receiver window sizes of all the TCP connections. If no specific algorithms
are deployed, there is a problem of unfairness experienced by TCP connections
over UBR traffic class. It is claimed that EPD alleviates the problem of poor
TCP throughput but does not improve fairness.
C. TCP Behavior over Wireless Links
TCP deals only with congestion and does not take into account packet loss
because of increased BER. Invoking congestion avoidance algorithms to deal
with channel errors on the wireless link results in highly degraded throughput.
The existing schemes dealing with the above-mentioned problem are classified into three basic groups: end-to-end proposals, split-connection proposals,
and link-layer proposals [29]. The end-to-end protocols attempt to make the
TCP sender handle losses through the use of two techniques. First, they use some form of selective acknowledgements (SACKs) to allow the sender to recover from multiple packet losses without invoking congestion control mechanisms. This can improve TCP throughput on the order of 10-30%. Second, they attempt to have the sender distinguish between congestion
and other forms of losses using an explicit loss notification (ELN) mechanism.
Furthermore, split-connection approaches completely hide the wireless link
from the sender by terminating the TCP connection at the base station. Such
schemes use a separate reliable connection between the base station and the
destination host. The second connection can use techniques such as negative
or selective acknowledgements, rather than just standard TCP, to perform well
over the wireless link. Indirect TCP [29] follows this approach. It involves splitting each TCP connection between a sender and a receiver into two separate
connections at the base station: one TCP connection between the sender and
the base station, and the other between the base station and the receiver. However, the choice of TCP over the wireless link results in several performance
893
problems. Since TCP is not well tuned for the wireless link, the TCP sender of
the wireless connection often times out, causing the original sender to stall. In
addition, every packet must go through TCP protocol processing twice at the
base station (as compared to zero times for a nonsplit connection approach).
Another disadvantage of this approach is that the end-to-end semantics of TCP
acknowledgements is violated, since acknowledgements to packets can now
reach the source even before the packets actually reach the mobile host. Also,
since this protocol maintains a significant amount of state at the base station
per TCP connection, handoff procedures tend to be complicated and slow.
The third class of protocols, link-layer solutions, lies between the other
two classes. These protocols attempt to hide link-related losses from the TCP
sender by using local retransmissions and perhaps forward error correction
over the wireless link. The local retransmissions use techniques that are tuned
to the characteristics of the wireless link to provide a significant increase in
performance. The main advantage of employing a link-layer protocol for loss
recovery is that it fits naturally into the layered structure of network protocols.
The link-layer protocol operates independently of higher layer protocols and
does not maintain any per-connection state. The main concern about link-layer protocols is the possibility of adverse effects on certain transport layer
protocols such as TCP. The snoop protocol [29] is a link-layer protocol that
takes advantage of the knowledge of the higher layer transport protocol (TCP).
It introduces a module, called the snoop agent, at the base station. The agent
monitors each packet that passes through the TCP connection in both directions
and maintains a cache of TCP segments sent across the link that have not yet
been acknowledged by the receiver. A packet loss is detected by the arrival of
duplicate acknowledgements from the receiver or by a local timeout. The snoop
agent retransmits the lost packet if it has the packet cached and suppresses the
duplicate acknowledgements. Considering the above-mentioned classification
of protocols, the main advantage of this approach is that it suppresses duplicate
acknowledgements for TCP segments lost and retransmitted locally, thereby
avoiding unnecessary congestion control invocations by the sender.
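The following sketch captures the snoop-agent behavior summarized above: cache unacknowledged segments at the base station, retransmit locally after a few duplicate ACKs, and suppress those duplicates so the fixed sender never sees them. The class shape, the duplicate-ACK threshold of three, and the callback used to transmit over the wireless link are assumptions; the actual snoop protocol [29] also handles local timeouts and other details omitted here.

```python
# Rough sketch of the snoop-agent idea at the base station (assumed interface).

class SnoopAgent:
    def __init__(self, wireless_send):
        self.cache = {}          # seq -> segment payload, not yet ACKed
        self.last_ack = -1
        self.dup_acks = 0
        self.wireless_send = wireless_send   # callback towards the mobile host

    def from_fixed_sender(self, seq, segment):
        """Cache and forward a segment arriving from the wired side."""
        self.cache[seq] = segment
        self.wireless_send(seq, segment)

    def from_mobile_receiver(self, ack):
        """Process an ACK from the wireless side.
        Returns the ACK to forward to the sender, or None to suppress it."""
        if ack > self.last_ack:                     # new data acknowledged
            for seq in [s for s in self.cache if s < ack]:
                del self.cache[seq]
            self.last_ack, self.dup_acks = ack, 0
            return ack
        # duplicate ACK: most likely a wireless loss, so handle it locally
        self.dup_acks += 1
        if self.dup_acks >= 3 and ack in self.cache:
            self.wireless_send(ack, self.cache[ack])   # local retransmission
        return None                                    # suppress the duplicate
```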
D. TCP over ATM WLAN
Figure 19 presents the network scenario and the protocol reference model used
for the simulation of TCP over the ATM WLAN. Two ATM WLAN segments
are defined. Terminals in one segment transmit to terminals in the other segment
and vice versa. Thus all the ATM traffic must go through the two access points
and the presented backbone ATM switch.
The following operations take place in the layers of the presented protocol
reference model:
A Poisson traffic generator is considered with an average interarrival time
of 40 ms and a fixed packet length of 4096 bits. Thus, the average bit rate of
the traffic source is 100 Kbps. Furthermore, the available bandwidth for TCP
transmissions in the air interface is fixed at 1 Mbps.
TCP layer implements the slow start algorithm and congestion avoidance algorithm, as well as fast retransmit and fast recovery [10]. When a
FIGURE 19 (a) Network configuration for TCP over WATM (terminal, BS, and backbone ATM switch). (b) Protocol reference model of network entities (Application, TCP, AAL5, ATM, WATM MAC, TC, and Tx/Rx layers).
FIGURE 20 TCP layer throughput, with and without EPD, versus the number of transmitters.
Two cases were considered for simulation. The first one examines the TCP behavior as the traffic load in the network increases while the second investigates
the TCP performance when transmission errors are introduced (in terms of cell
loss rate, CLR).
1. TCP vs Traffic Load in ATM WLAN
In the first case no CLR is assumed, and the behavior of the TCP connections is examined for different traffic loads (expressed as the number of enabled
transmitters) and when EPD is active. Two main observations arise from this
case. The first is that when EPD comes into effect, there is a certain improvement
in TCP performance in terms of throughput. This is shown in Fig. 20 where
the average TCP throughput is drawn. The second observation is that there is a
limit on the traffic that can be accepted in order to achieve maximum throughput. As shown in Fig. 20, maximum throughput is achieved for seven users
while for eight and up to ten users there is a significant throughput decrease.
The same phenomenon has been reported in [11] where the TCP throughput
decreases as the number of TCP connections increases. This shows that TCP
cannot prevent congestion at the air interface and that the traffic monitoring
mechanisms presented in Section VI are necessary.
2. Effect of Packet Retransmissions on TCP over an ATM WLAN
In this case the total traffic load in the air interface is kept constant (seven
enabled traffic sources corresponding to the maximum achievable throughput),
but the CLR is varied from 0 up to 0.1. Moreover, a cell retransmission scheme
in the WATM MAC is adopted to examine the potential improvement of the
system performance. Figure 21 shows the throughput degradation in the TCP
level for different CLR values, when no cell retransmissions take place. Moreover, it is shown how retransmissions improve the throughput performance of
TCP. For low CLR and adopting retransmissions, the throughput stays close
to the maximum achievable. For the no retransmission cases the congestion
avoidance algorithm is invoked, decreasing dramatically the offered traffic in
the network.
FIGURE 21 Effect of packet retransmissions on TCP throughput, versus CLR, with retransmissions enabled and with retransmissions disabled.
FIGURE 22 (a) Mobility management scenario in ATM WLAN. (b) Backward handover protocol.
ATM switch, a new connection between the new BS and this ATM switch is
required (the path between the source and this switch does not need to be modified). This can be done from the old BS by sending a handover call setup (1a/1b)
message to the new BS. (It is assumed that either the BS or the ATM switch
keeps a record of the initial ATM setup message and can use this information
for the construction of the handover call setup message, if more than one switch
is involved.) The new BS responds with a handover call connect (3) message,
and the new connection is established. When the MS approaches the new BS it
sends to it a handover request (4) message (it includes the address of the new
BS). This causes the new BS to send a handover connection redirection (5) message to the ATM switch, to indicate that the ATM cells should now be rerouted
over the new path. The switch sends a handover call release (6a) message to
the old BS. The old BS sends back to the switch any undelivered ATM cells and
marks the end of them with a handover call release ACK (7a) message. This way
no additional cell loss will take place. A handover call terminate (7b) message
could be included in the protocol, so that the old BS could inform the MS of the
termination of their connection. When the redirection has been performed, the
ATM switch sends a handover redirection ACK (6b) message to the new BS.
The new BS sends a handover request accept (8) message to the MS. Finally,
the MS sends to the new BS a handover done (9) message, indicating that it is
ready to transmit and receive data.
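To keep the numbered messages straight, the sequence above can be restated compactly. This is only a descriptive aid; the endpoints of messages (1a/1b) and (3), which the prose leaves partly implicit (they are shown only in Fig. 22b), are marked as an interpretation rather than a quotation.

```python
# Backward handover message sequence as described in the text.
HANDOVER_SEQUENCE = [
    ("1a/1b", "old BS -> new BS (via ATM switch, per Fig. 22b)", "handover call setup"),
    ("3",     "new BS -> old BS (via ATM switch, per Fig. 22b)", "handover call connect"),
    ("4",     "MS -> new BS",                                    "handover request"),
    ("5",     "new BS -> ATM switch",                            "handover connection redirection"),
    ("6a",    "ATM switch -> old BS",                            "handover call release"),
    ("7a",    "old BS -> ATM switch",                            "handover call release ACK (after undelivered cells)"),
    ("7b",    "old BS -> MS",                                    "handover call terminate (optional)"),
    ("6b",    "ATM switch -> new BS",                            "handover redirection ACK"),
    ("8",     "new BS -> MS",                                    "handover request accept"),
    ("9",     "MS -> new BS",                                    "handover done"),
]

for number, endpoints, name in HANDOVER_SEQUENCE:
    print(f"({number}) {endpoints}: {name}")
```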
However, if the old radio link, between the MS and the old BS, fails, message
(1) cannot be sent. Upon detection of the failure, the MS sends the message (4)
to the new BS. Since the old BS can be found, a handover hint (1) message is
sent to it, from the new BS, on behalf of the MS (proxy signaling concept);
a similar message flow takes place. Because the establishment of the new link
takes some time, data may be lost during this handover execution.
B. Location Management in ATM WLAN
The previous algorithm also maintains the connection of the MS inside the ATM
WLAN while it is moving from BS 1 to BS 5 (the messages are similar to those shown in Fig. 22a). ATM switch #1, upon reception of the message (2a), finds the
new ATM switch #2 (since the address of the new BS is known) and transmits
the relevant message. Using the existing P-NNI protocol, the new switch can be
found, a new connection from the old (#1) to the new (#2) switch can be established, and the handoff can be executed, as described previously. However, if the
optimal route (either the shortest route or the route with the lowest cost that can
be chosen using P-NNI routing) between the network and the MS is required,
along with the avoidance of loops, the following algorithm could be used:
Step 1. Upon receipt of message (2a), switch #1 checks the address of
the new BS to find whether the new BS is attached to it. If true, then switch #1
establishes a new connection with the new BS (as shown in Fig. 22 b). If false,
go to step 2.
Step 2. Check whether the new BS is registered to a switch (#2 in the example) that belongs to the same lower-level PG as switch #1. If true, go to step 3; else go to step 4.
Step 3. At this stage, one of the switches in the PG is the border switch that
initially receives the traffic (ATM cells) for the MS that requested the handover.
If switch #1 is this border switch, it can send message (2a) to switch #2; the
protocol can continue as described earlier. If switch #1 is not this border switch,
it sends the message (2a) to the switch that transmits the traffic ATM cells to
it. Step 2 can then be executed at the new switch.
Step 4. In this case switch #1 is not able to locate in its PG the ATM
switch that controls the new BS. Although the new BS could be reached, using
the mechanism described in the P-NNI, it is not guaranteed that the chosen
route will be the optimal one. A centralized approach has been considered for
this case. It is assumed that the mobile agent (MA), shown in Fig. 22a, has
knowledge of the location of the MSs in the ATM LAN (such information
could be obtained from the registration process of the MSs during power-on)
and the connection point that is being used by each one of the connections (this
information is available to the MA if the ATM switch/gateway behind each connection informs the MA after the establishment of a connection). Then the ATM
switch, after receiving the message (2a) from the old BS, forwards the handover
request message (2a) to the MA. The MA requests from the switch/gateway that
carries the traffic of the connection (the user of the connection that requested
the handover) to establish a new link between itself and the new BS.
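The four steps above can be read as a forwarding decision made at each switch that receives the handover request (2a). The sketch below is one way to phrase that decision; every helper in it (is_attached, switch_in_same_peer_group, receives_traffic_for_connection, upstream_switch, mobile_agent) stands for information a real switch would obtain from P-NNI or from the MA, and the names are invented for illustration.

```python
# Hedged restatement of steps 1-4 as a forwarding decision for message (2a).
def route_handover_request(current_switch, new_bs,
                           is_attached, switch_in_same_peer_group,
                           receives_traffic_for_connection,
                           upstream_switch, mobile_agent):
    """Return (action, target) telling where the handover request goes next."""
    while True:
        # Step 1: the new BS hangs directly off this switch.
        if is_attached(current_switch, new_bs):
            return ("setup_new_connection", current_switch)

        # Step 2: is the new BS registered to a switch in the same lower-level PG?
        peer = switch_in_same_peer_group(current_switch, new_bs)
        if peer is not None:
            # Step 3: only the border switch that receives the connection's
            # traffic forwards (2a) to that peer; otherwise walk upstream and
            # repeat the check at the next switch.
            if receives_traffic_for_connection(current_switch):
                return ("forward_2a", peer)
            current_switch = upstream_switch(current_switch)
            continue

        # Step 4: the controlling switch is outside this PG; fall back to the MA.
        return ("forward_to_MA", mobile_agent(current_switch))
```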
The locality of the user movement and the distributed nature of the
P-NNI protocol have been taken into consideration. The introduction of the
MA (home/visitor location register) is necessary since the "mobile connections" have the additional requirement of location management over the "static" ones and need a temporary ATM address. This distributed location
database scheme can operate with the location management proposals in the
ATM Forum [31]. To reduce the location update traffic, the MS informs the
MA, when it moves from one switch to another. Further reduction could be obtained if the location update takes place when the MS moves to a new PG, since
the P-NNI dynamic routing is based on the traffic load of the links among the
switches. The MA is also expected to be used for the registration/deregistration
and authentication of the MS with the ATM WLAN (as in the GSM). Extension
of the handover protocol between two ATM WLANs could be implemented in
a similar way, if the two MAs exchange the messages defined in the WATM-UNI case. Partitioning the address space into mobile and fixed addresses is not necessary, since the MA hides from the rest of the world the nature of a station's address.
IX. CONCLUSION
ATM WLANs will be a key technology for the provision of multimedia services to mobile and portable terminals by wireless means. This chapter presented the migration path from existing fixed ATM LAN technologies to the
new concept of an ATM wireless LAN. The latter inherits the architecture and
control protocols of fixed ATM networks and includes the mechanisms that
characterize ATM technology and offer QoS to supported services. It was
shown that a cellular approach combined with existing ATM protocols can
support mobile users. The dynamic multiple-access scheme based on adaptive
TDMA that was proposed is able to multiplex different kinds of applications,
enable dynamic bandwidth allocation, and provide QoS to established connections. Moreover, compared to contention-based schemes it offers higher user
capacity. Finally it was shown that an ATM WLAN is able to support TCP
and alleviate its problem of low throughput, providing wireless access to the
Internet.
REFERENCES
1. ATM Forum Technical Committee. Proposal for mobile ATM specification development. ATM
Forum, Apr. 1996. Available at http://atmforum.com.
2. Special Issue on Wireless ATM, IEEE Personal Commun, 1(4): Aug. 1996.
3. Special Issue on Wireless ATM, IEEE Commun. Magazine 35(11): Nov. 1997.
4. Kruys, J. Standardisation of broadband radio access networks in ETSI's BRAN Project. In
proceedings of the EU Wireless Broadband Communications Workshop, Brussels, Belgium,
Sept. 1997.
5. DE/RES-3001-1. Radio Equipment and Systems (RES); Digital European Cordless Telecommunications (DECT) Common Interface; Part 1: Overview, ETSI TC-RES, Jan. 1995.
6. IEEE. Draft Standard IEEE 802.11, Wireless LAN, P802.11/D1, Dec. 1994.
7. ETSI PT41 and RES 10. Radio Equipment and Systems (RES); High Performance Radio Local
Area Network (HIPERLAN); Functional Specification, ETSI, Jan. 1995.
8. Buitenwerf, E., Colombo, G., Mitts, H., and Wright, P. UMTS: Fixed network issues and design
options. IEEE Personal Commun. 2(1): 28-37, 1995.
9. Telecommunication Standardization Sector of ITU. Series Q: Switching and Signaling, Intelligent Network, Framework for IMT-2000 Networks, Q.1701, Mar. 1999.
10. Stevens, W. R. TCP/IP Illustrated, Vol. 1, The Protocols. Addison-Wesley, Reading, MA, 1994.
11. Romanow, A., and Floyd, S. Dynamics of TCP traffic over ATM networks. IEEE J. Selected
Areas Commun. 13(4): 633-641, 1995.
12. ATM Forum Technical Committee. Traffic management specification, ver. 4.0. ATM Forum,
Apr. 1996. Available at http://www.atmforum.com.
13. Onvural, R. Asynchronous Transfer Mode Networks: Performance Issues. Artech House,
Norwood, MA, 1994.
14. ATM Forum Technical Committee. UNI 3.1 Specifications. ATM Forum, Sept. 1994. Available
at http://www.atmforum.com.
15. Perros, H., and Elsayed, K. Call admission control schemes: A review. IEEE Commun. Mag.
82-91, Nov. 1996.
16. ATM Forum Technical Committee. ATM user-network interface signaling specification, ver. 4.0. ATM Forum, Mar. 1995. Available at http://www.atmforum.com.
17. Washburn, K., and Evans, J. T. TCP/IP, Running a Successful Network, pp. 411-429. Addison-Wesley, Reading, MA, 1994.
18. DE/RES-3001-3. Radio Equipment and Systems (RES); Digital European Cordless Telecommunications (DECT) Common Interface; Part 3: Medium Access Control Layer, ETSI TC-RES,
Jan. 1995.
19. Maglaris, B. Performance models of statistical multiplexing in packet video communications.
IEEE Trans. Commun. 36: 834-844, 1988.
20. Bantz, D. F. Wireless LAN design alternatives. IEEE Network Mag. Mar./Apr. 1994.
21. Monogioudis, P. Spectral Efficiency of CDMA for Personal Communication Networks. Ph.D.
Thesis, University of Surrey, 1994.
22. Pahlavan, K. Wireless intraoffice networks. ACM Trans. Office Inform. Systems 6(3): 277-302,
1988.
23. Goodman, D. J. Factors affecting the bandwidth efficiency of packet reservation multiple access.
In Proc. 39th IEEE Vehicular Technology Conference, 1989.
900
APOSTOLASETAL.
24. Tanenbaum, A. Computer Networks. Prentice-Hall, Englewood Cliffs, NJ, 1988.
25. DTR/SMG-023006U. Universal Mobile Telecommunications System (UMTS); UMTS Terrestrial Radio Access (UTRA); Concept evaluation, ETSI SMG2, Dec. 1997.
26. Proakis, J. G. Digital Communications, 3rd ed. McGraw-Hill, Singapore, 1995.
27. Lakshman, T. V., and Madhow, U. The performance of TCP/IP networks with high bandwidth-delay products and random loss. IEEE/ACM Trans. Networking 5(3): 336-350, 1997.
28. Goyal, R., Jain, R., Kalyanaraman, S., et al. UBR+: Improving performance of TCP over ATM-UBR service. Available at http://www.cis.ohio-state.edu/~jain/papers/icc97.ps.
29. Balakrishnan, H., Padmanabhan, V., Seshan, S., and Katz, R. A comparison of mechanisms for
improving TCP performance over wireless Links. Available at http://daedalus.cs.berkeley.edu.
30. Toh, C.-K. The design and implementation of a hybrid handoff protocol for multimedia wireless
LANs. In First International Conference on Mobile Computing and Networking, Berkeley, CA,
1995.
31. ATM Forum Technical Committee. BTD-WATM-01.11, Draft Wireless ATM Capability Set 1
Specification, Sept. 1999. Available at http://www.atmforum.com.
SUPPORTING HIGH-SPEED
APPLICATIONS ON SingAREN'
ATM NETWORK
NGOH LEK-HENG
SingAREN, Kent Ridge Digital Labs, Singapore 119613
LI HONG-YI
Advanced Wireless Networks, Nortel Research, Nepean, Ontario, Canada K2G 6J8
I. BACKGROUND 902
II. ADVANCED APPLICATIONS O N SingAREN 903
A. Advanced Applications over High-Speed Networks 904
III. ADVANCED BACKBONE NETWORK SERVICES 905
IV. SingAREN "PREMIUM" NETWORK SERVICE 908
A. Design Goals 909
B. Desired Multicast Properties 909
C. Support for Host Mobility 910
D. Application-Oriented Traffic Aggregation 910
E. Coexist with Other Signaling Solutions 910
F. Scalable Design 910
V. KEY RESEARCH CONTRIBUTIONS 911
A. Open Signaling 911
B. Multicast as a Basis for All Connections 912
C. Dynamic Logical Multicast Grouping for the Support
of Host Mobility 912
VI. PROPOSED DESIGN 913
A. End-User Multicast Signaling Interface - Service Binding
Agent (SBA) 914
B. Switch-End Signaling Interface - Virtual Switch 914
VII. MULTICAST SERVICE AGENT (MSA) 915
A. Resource Management Agent (RMA) 915
B. Group Management Agent (GMA) 916
C. Routing Agent (RA) 917
D. Connection Management Agent (CMA) 919
E. Performance Analysis for Connection Setup 920
VIII. SCALING UP TO LARGE NETWORKS WITH MULTIPLE
MSAs 921
^ The Singapore Advanced Research and Education Network.
This chapter focuses on the design of a multicast connection service for asynchronous transfer mode (ATM) networks. In particular, the solution proposed offers what is termed the "premium backbone service" to the backbone connecting nodes of the Singapore Advanced Research and Education Network (SingAREN), a high-speed advanced broadband network serving the R&D community in Singapore and internationally. The main motivation for this work arose from the need to provide scalable quality-of-service guarantees on the SingAREN backbone to support high-speed applications. Existing
ATM signaling solutions proposed by the ATM Forum and others are not suitable in the
SingAREN heterogeneous environment with a wide variety of communication requirements. Furthermore, current solutions must be modified extensively in order to support
host mobility brought about mainly by wireless access. The work described here addresses these shortcomings by proposing an integrated signaling and connection service
architecture, and related algorithms for the setting up of ATM virtual channels across
interconnected ATM switches. Some of the important concepts elaborated in the design
include the notion of open signaling, the use of logical multicast groups to handle all
connections, traffic aggregation, and seamless support for host mobility. The proposed
design is described here using some details of a prototype implementation currently underway. A distributed routing scheme for ensuring that the proposed design is scalable
to large worldwide ATM networks, thus making it suitable for future international trials
with SingAREN's international counterparts, is presented. Finally, performance analysis
and measurements are applied to routing and connection setup in order to understand
their limitations.
I. BACKGROUND
The Singapore Advanced Research and Education Network (SingAREN) [32]
is a national initiative to create a very high-speed platform to support R&D and
advanced networking technology development, serving users from academia,
research organisations, and industry. One of the key objectives for SingAREN is
to provide technological support for Singapore ONE, the nation's commercial
ATM broadband infrastructure [33]. To help achieve this, SingAREN operates
an advanced ATM-based research backbone network linking together the networks in local universities, research organizations, and industry R&D partners.
In addition, SingAREN has its international links currently connecting similar
networks in the United States, Canada, and Japan. The entire network is used
FIGURE 1 SingAREN network infrastructure showing its connections to Singapore ONE and international links.
by the connecting partner sites to conduct a wide range of activities from testing new network software and protocols to international collaborative IP-based
application developments such as telemedicine. Figure 1 depicts the SingAREN
network infrastructure and its relationship with Singapore ONE.
Similar to many other ATM-based R&D networks elsewhere, the
SingAREN backbone network provides a basic IP connectivity to its connecting sites through a statically configured, fully meshed logical network using
ATM permanent virtual circuits (PVCs). All routing takes place at the various
IP routers, each belonging to individual organizations and that of SingAREN.
Standard routing algorithms such as the border gateway protocol (BGP) are
employed. Under this configuration, all IP packets (unicast and multicast) between sites are carried over one or more PVCs, which act as "fat pipes." Under
this arrangement, all IP packets to the same destination are carried indiscriminately over the same PVC; therefore, no link-level QoS guarantees can be provided, and multicast capabilities of the backbone switches are also not easily
exploited.
biology), and defence. Applications and content constitute the value to users
over the physical network infrastructure, which can be considered as the
"plumbing."
These advanced applications can also be grouped into several classes based
on their characteristics:
Multimedia. These refer to applications such as audio and video conferencing, distribution of multimedia content, and reliable and efficient multicasting of such content.
Interactive and collaborative environments. Although shared workspaces
such as CSCW can enable general collaboration as mentioned above, more
advanced and domain-specific applications such as the cave automatic virtual
environment (CAVE) and distributed interactive simulation (DIS) environment
need to be developed to enable new ways of visualizing data and interacting
with complex scenarios.
Time-sensitive applications. These refer to applications having a time
constraint or delay bound for transmitting and receiving data, such as interactive applications, real-time data delivery, distributed computing, remote sensing
and monitoring, and distributed control, e.g., of a scanning electron microscope. For such applications, data that arrive after the specified deadline have
significantly reduced value.
Sharing of scarce resources. These refer to new ways of connecting users
to resources geographically distant from them, such as databases and specialized equipment, e.g., in manufacturing and supercomputing. As these applications typically do not have strict timing requirements, the key networking
requirement is high bandwidth.
Networking research and middleware. These refer to new networking
protocols, software, and equipment developed to improve upon the existing
software and hardware network infrastructures, such as ATM over long distances, QoS, RSVP, IPv6, multicast, and reliable bulk transfer over high-speed
links.
Table 1 shows some examples of international collaborative projects currently supported by SingAREN.
TABLE 1 Examples of international collaborative projects supported by SingAREN: XtalBench technology, tele-education, tele-manufacturing/tele-design, tele-medicine (Kent Ridge Digital Labs with Virginia Polytechnic Institute and State University, USA), an IPv6 testbed, traffic monitoring on international links (Japan), satellite technology, and the application of high-order statistics systems to conformal radiotherapy treatment.
the use of IP multicast and appropriate audio/video tools. Works are currently
underway to explore the use of multicast in other areas such as distributed
real-time simulation. IP multicast itself is also undergoing further research,
which promises to add real-time QOS guarantees and improved scalability
[2,3,34].
ATM networks, on the other hand, developed independently of the Internet activities in the late 1980s. Realizing the importance of supporting group
communications, all switches support one-to-many (i.e., one sender and many
receivers) virtual channels (VCs) through special hardware design at the switch.
Unlike the IP multicast case, however, there are severe limitations: one-to-many VCs are unidirectional and allow only semi-static setup through
the standard signaling procedures [15]. Furthermore, all receiver VCs are expected to have the same QOS throughout the connection lifetime. Support for
many-to-many multicast is accomplished by a mesh of one-to-many VCs or
a multicast "reflector" approach [4,30], which makes relatively inefficient use
of network resources. Another major problem is that support for interswitch
ATM multicast routing is still not fully implemented, even though the signaling support for setting up one-to-many connections across multiple switches
has been proposed [5,16]. Recently, various techniques that come under the
general heading of "IP switching" or "multiprotocol label switching (MPLS)"
have also been proposed by different vendors. The key here is to provide a
more efficient support for providing IP services (including multicast) over ATM.
While these various techniques have proven to be more efficient and are gaining market acceptance, they do not interwork well with each other,
thus making them unsuitable in a multivendor environment like the SingAREN
backbone.
Perhaps an even more important reason for SingAREN to have a new
signaling and control solution is the need to support wireless networks in the
form of a wireless access network or wireless LANs connecting to the backbone.
Broadband wireless communications are becoming more and more important
due to the rapid advancements of related hardware and software that resulted
in great improvements in the multimedia capabilities of laptop PCs and even
cellular phones. Emerging technologies such as wireless ATM [17], wideband code division multiple access (WCDMA) [31], and local multipoint distribution systems (LMDS) [31] will no doubt pose new challenges to the wired
broadband backbone networks such as SingAREN in the area of interworking and service interoperability. Existing ATM signaling and control solutions
described above will have to be extensively modified in order to support issues
such as host mobility.
To overcome these problems in supporting multicast in SingAREN's ATM
networks, a new scheme for achieving an integrated ATM multicast service called
the premium service is described here. The proposed solution seeks not only to
address the current IP service limitations, but also includes support for mobile
hosts. To date, most of the components described in this chapter have been
implemented in the SingAREN network consisting of multivendor switches.
Further proof of the design's scalability in larger networks is provided through an
analytical approach. The rest of this chapter is organized as follows. Section IV
outlines the design goals of the proposed solution suitable for SingAREN; this is
908
NGOH AND LI
followed in Section V by a presentation of the key research ideas that shape this
work. The details of the proposed multicast service are described in Sections VI
and VII. In Section VIII, the issue of scalability of the proposed routing solution is addressed, together with an implementation of the signaling network.
The solution to how host mobility is supported is described in Section IX.
Finally in Section X some conclusions and future directions of this work are
outlined.
FIGURE 2
FIGURE 3 SingAREN local backbone ATM network (thick lines) and connecting sites.
A. Design Goals
The long-term objective of the work described here is for it to become the preferred alternative to standard ATM signaling in providing connection service for hosts and networks attached to SingAREN's ATM networks (see Fig. 3). The
various design goals are elaborated next.
B. Desired Multicast Properties
Given that the focus here is to address the issue of multicast at the ATM VC
level, it is appropriate to compare it with the Internet IP multicast where the
concept of group communications is applied at network host level. The works
and operational experience of IP multicast have demonstrated many desirable
properties capable of supporting a wide range of applications both now and
in the future. Some of these properties to be realized in the proposed ATM
multicast service are as follows.
Many-to-many group communications with heterogeneous QOS
guarantees at both sender and receiver ends;
Dynamic membership with leaf-initiated (senders and receivers) group
operations and QOS requests;
Multicast data filtering and merging; and
Support for simplex VC connections.
FIGURE 4 Producers and consumers connected through the switching network.
FIGURE 5 MSA components (multicast routing, resource management, connection management, and multicast group membership management, MGMM) with the SBA signaling interface, access point, and wireless terminal.
protocol (see Section VII.B). In fact, it will be explained later in this chapter that the same design, which provides multicast connection service for ATM networks
with only "fixed" hosts, can be used to support mobile hosts with virtually no
change.
155 Mbits/s, OC-3 multimode fiber) and PCs (at 25 Mbits/s, UTP-5). It has
been successfully tested on the setting up of both multicast and unicast connections, as well as running video and audio applications over these premium
service connections. Also verified is the feasibility of supporting IP over VC
multicast connections. In the rest of this section, the various components of the
proposed ATM multicast service are presented. The reader is also referred to
related papers [10,11] for a more detailed description of the proposed design.
A. End-User Multicast Signaling Interface - Service Binding Agent (SBA)
As described above, the SBA acts as the interface between the end-users and
MSA for the purpose of accessing the multicast premium service. Given the
goal of supporting the various properties similar to that of an IP multicast as
outlined in Section IV, SBA therefore supports a programmatic interface similar
to that of the IETF Resource Reservation Protocol (RSVP) [12,13]. In addition,
SBA provides the function of giving the application transparent access to the
ATM multicast service (via MSA) and, if necessary, performs service binding for
an IP-level multicast (e.g., binding an ATM VC end-point to its corresponding
IP address).
The way that SBA is implemented is clearly dependent on the intended
platform and environment. For example, SBA can be realized as a set of library
calls that can be linked into a brand new networking application, or run as
specialized "blackbox" hardware with necessary software and operating system
to provide premium service access. With the current SingAREN prototype,
a version of the SBA has been implemented and made accessible through a set of World Wide Web HTML pages. This allows each SingAREN site to access
the service via virtually any host on the network. Using the Web page, the
requesting site is able to indicate the required QOS and how the premium
service is to apply to IP traffic flows generated at either host level or application
level. Once the service has been successfully obtained, the selected traffic is transparently diverted and carried over the connection(s) set up by the premium service. The same SBA also provides mechanisms to be used to
unsubscribe from the premium service.
B. Switch-End Signaling Interface - Virtual Switch
In this section, we concentrate on the software that is added to operate alongside the ATM switches in order to provide the necessary multicast service, using
the ATM cloud as shown in Fig. 5. The approach involves adding a software
component called a "virtual switch" (VirSwt) in a host attached to each of
the switches. This software provides a common interface to switch functions
such as setting up and tearing down of VCs, as well as modifying the QOS
parameters associated with each VC. To effect the actual change in each switch,
however, it is important to note that since different switch vendors have designed their switch interface differently, to perform the above switch-related
functions on these various switches requires an appropriate switch-dependent
code. Examples of these access mechanisms are the switch-supplied serial-line
(RS-232) management software, the Simple Network Management Protocol
9 I5
(SNMP), and more recently the General Switch Management Protocol (GSMP)
[14]. VirSwts communicate directly with the MSA via the available communication channel (e.g., an IP interface) between them. Recent research activities in the area of active networks [35] also propose the use of highly flexible and programmable software components in the
switching nodes. The VirSwt proposed here clearly has the potential of being
expanded to incorporate some other functions such as intelligent data filtering
and merging of IP flows not well supported in today's ATM switch hardware
[21,23,25]. However it is not clear to the authors how these functions should
be realized effectively and implemented in VirSwt or even in MSA (see below).
For now, VirSwt will simply provide the uniform access to the native ATM
switch interfaces.
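A rough sketch of the uniform-interface idea follows: the MSA always talks to the same small set of VC operations, while a vendor-specific back end (driven over SNMP, GSMP, or a serial management line) performs the actual work. The class and method names are assumptions made for illustration and are not the actual VirSwt API.

```python
# Illustrative sketch of a "virtual switch" front end over vendor back ends.
from abc import ABC, abstractmethod

class SwitchBackend(ABC):
    """Vendor-dependent access mechanism hidden behind the VirSwt."""
    @abstractmethod
    def setup_vc(self, in_port, in_vci, out_port, out_vci, qos): ...
    @abstractmethod
    def teardown_vc(self, in_port, in_vci): ...
    @abstractmethod
    def modify_qos(self, in_port, in_vci, qos): ...

class VirtualSwitch:
    """Uniform VC operations offered to the MSA, independent of the vendor."""
    def __init__(self, backend: SwitchBackend):
        self.backend = backend

    def setup_vc(self, in_port, in_vci, out_port, out_vci, qos):
        return self.backend.setup_vc(in_port, in_vci, out_port, out_vci, qos)

    def teardown_vc(self, in_port, in_vci):
        return self.backend.teardown_vc(in_port, in_vci)

    def modify_qos(self, in_port, in_vci, qos):
        return self.backend.modify_qos(in_port, in_vci, qos)
```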
Switch type and range: FORE-100, 32-47; FORE-200WG, 256-1023; FORE-200BX, 32-47; FORE-1000, 32-47; Scorpio, 768-1535; ATML, 256-1023.
of the connection. The RMA determines whether setting up the new connection
violates the QOS guarantee of existing connections in the network domain.
The link state database also records the current connectivity information of
the network domain. Therefore, the current network topology can be derived
based on the connectivity information of the link state database. The topology
of a network can be expressed by a connectivity matrix that varies according
to the QOS requirement of a connection request. All the links that cannot
satisfy the requested QOS are pruned from the set of all possible paths for the
connection. With this reduced set of links, the routing algorithm can compute
the shortest paths for any connection request.
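The pruning step and the subsequent shortest-path computation can be sketched as follows. The per-link record (residual bandwidth and delay) and the use of delay as the link weight are assumptions about what the link state database holds, made only so the sketch is self-contained.

```python
# Sketch of pruning QOS-infeasible links, then routing over what remains.
import heapq

def prune_links(links, required_bw, max_delay):
    """links: dict mapping a directed link (u, v) to its attribute record."""
    return {e: a for e, a in links.items()
            if a["residual_bw"] >= required_bw and a["delay"] <= max_delay}

def shortest_path(links, src, dst):
    """Dijkstra over the pruned links; add both directions for duplex links."""
    adj = {}
    for (u, v), a in links.items():
        adj.setdefault(u, []).append((v, a["delay"]))
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None                       # no QOS-feasible path exists
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path))
```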
B. Group Management Agent (GMA)
The GMA provides an important function of the multicast service by managing
the membership status and member joining or leaving activities. GMA keeps a
database of multicast group names and the corresponding physical addresses
of the existing group members (see later), so that whenever a group operation is being performed, the relevant group information can be looked up or
updated accordingly. Other information kept by the GMA includes the QOS parameters that need to be guaranteed during data transmission and the user-specified filters at each VirSwt.
In a network where multiple MSA domains exist (see Section VIII), a group
management protocol to allow the respective GMAs to exchange individual
domain group membership information is required. One significant observation
made when designing a GMA is that it is also providing the required location
management function, which is needed in a mobile environment. A process
often referred to as the location registration is carried out whenever a host
moves to a new location (see Section IX for more details). This is so that it (the
mobile host) can be located and reached by the connection service subsequently.
It is observed that this same function is provided in a GMA whenever a host
"leaves" a group but "joins" it again at a new physical location, thus, leading
to the conclusion that the proposed design can be made to handle mobile hosts
with little or no change. With this intention in mind, a suitable group naming
and host addressing scheme should therefore be proposed to ensure that a GMA
also plays the role of a location manager. This is described next.
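The observation that a group database already behaves like a location register can be made concrete with a small sketch: a move is recorded as nothing more than a rejoin at the new physical address, so the same lookup serves both group management and location management. The class and its methods are invented for illustration and do not reflect the actual GMA data structures.

```python
# Sketch: a group membership database doubling as a location register.
class GroupDatabase:
    def __init__(self):
        self.groups = {}              # group name -> {member: (address, qos)}

    def join(self, group, member, address, qos=None):
        self.groups.setdefault(group, {})[member] = (address, qos)

    def leave(self, group, member):
        self.groups.get(group, {}).pop(member, None)

    def locate(self, member):
        """Location management 'for free': look the member up in any group."""
        for members in self.groups.values():
            if member in members:
                return members[member][0]
        return None

    def move(self, member, new_address):
        """A handover is just a rejoin at the new attachment point."""
        for group, members in self.groups.items():
            if member in members:
                _, qos = members[member]
                self.join(group, member, new_address, qos)
```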
C. Routing Agent (RA)
The Routing Agent computes a multicast tree that connects producer and consumers by using the link state information in RMA, group membership information, and negotiated agreement in GMA. It also offers routing options to the
receivers that may emphasize different factors on data transmission, such as reliability, real-time, or cost effective paths. When an RA computes the multicast
tree, it selects the strategic switches to invoke user-specified programs such as
filters. The results of an RA are a set of contracts for the (virtual) switches
involved in the computed multicast tree.
1. QOS-Based Routing
accurate information of the network topology and link state. This is similar to
the current link state routing protocol such as OSPF where all network nodes
can obtain the estimated current state of the entire network [22].
The network can be modeled as a graph of nodes connected by node-to-node links. Each link has two parameters, the maximum cell transfer delay (MCTD) and the residual cell rate (RCR). These values are known to the routing algorithm, and we use t and b to represent MCTD and RCR in the distance function. A simplified distance function d_ij, reflecting the transfer delay t_ij and the residual bandwidth b_ij, is defined empirically for each direct link l_ij: d_ij(t_ij, b_ij) = w_1 t_ij + w_2 e^(1/b_ij), where l_ij denotes the link from node i to node j. The exponential form of the residual bandwidth term, compared with the linear form of the transfer delay term, reflects the emphasis on bandwidth exhaustion as a much more important factor affecting the distance function. The weights w_1 and w_2 of the two factors can be changed by the users in the negotiation phase. Increasing the weight of a factor changes the emphasis of the routing algorithm. For instance, if a receiver wants to receive a real-time data stream, it can increase the weight of the transfer delay term, which leads the routing algorithm to put more emphasis on propagation delay during the routing computation.
By using the Dijkstra algorithm [19,24], the shortest path from source
node to the receiver nodes can be computed. If the paths leading to different
receivers travel along the same link, they need to be merged into a single path.
After all the common paths are merged, a multicast tree can be established.
In the merging process, QOS reservations for different receivers need to be
aggregated to form a resource reserved multicast tree. As a result, only the
essential (i.e., the minimum superset) common QOS is reserved on a link for
a common path. A filter that is split from the common link to facilitate the
selective data transmission is assigned with each outgoing path. These filters
select the requested data set for the specified receivers according to their QOS
requests. Therefore, a logical multicast tree with various QOS reservations on
different branches can be established. Note that all the actions described in
the above paragraphs, such as routing computation, resource reservation, and
filter placement, are performed within the memory of an RA (i.e., we only have a
logical multicast tree at this stage). Subsequently, the logical multicast tree will
be converted into separate contracts for all the (virtual) switches involved to
interpret and implement.
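The merging of per-receiver shortest paths into one logical tree, with the "minimum superset" of the QOS reservations on each shared link, can be sketched as follows. Representing a path as a node list and a QOS request as a single bandwidth figure is a deliberate simplification of what an RA actually stores.

```python
# Sketch: merge per-receiver paths into a logical multicast tree,
# aggregating the reservation on each shared (directed) link.

def merge_into_tree(paths_with_qos):
    """paths_with_qos: list of (path, requested_bw) pairs, one per receiver."""
    tree = {}                      # directed link (u, v) -> reserved bandwidth
    for path, bw in paths_with_qos:
        for u, v in zip(path, path[1:]):
            tree[(u, v)] = max(tree.get((u, v), 0), bw)   # aggregate QOS
    return tree

# Example: two receivers sharing the link (S, Sw1); the shared link carries
# the larger of the two reservations, the branches carry their own requests.
tree = merge_into_tree([
    (["S", "Sw1", "Sw2", "R1"], 2.0),
    (["S", "Sw1", "Sw3", "R2"], 5.0),
])
print(tree)
```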
2. Node Joining / Leaving a Multicast Tree
FIGURE 6 Node joining a multicast tree: the joining node, its attach point, and the common links that are merged.
Sw4. The unicast route from the source to receiver R4 is indicated by the dashed lines.
The attach point for the joining node is in Sw2. The unicast route from S to
R4 has a common link with the existing multicast tree at the link between Swl
and Sw2. This common path needs to be merged into the same path.
The leave operation removes the leaving node from all the multicast trees
currently sending data to it. As in the joining algorithm, the node is disconnected from each multicast tree separately. For instance, if the receiver R4 wants to leave the multicast tree as shown in Fig. 6, the leaving algorithm first disconnects R4 and then releases the resources reserved for it. The intermediate switch Sw5 checks whether other nodes are still attached to it in the multicast
tree. It will remove itself from the multicast tree if there is no group member
attached to it. Otherwise, the intermediate node frees the resources that were
reserved for the leaving node.
be sent back to the CMA to confirm the success of the connection setup. The CMA
will forward the response message to the caller after it receives the responses
from all the involved switches. In this way, the switches, producers, and receivers
can be bound into one multicast session.
E. Performance Analysis for Connection Setup
This section gives a performance analysis of the connection setup time for the proposed MSA connection setup scheme. It can be proved that the MSA's parallel connection setup scheme uses less time than the widely used hop-by-hop connection setup scheme for setting up nontrivial connections that traverse multiple hops. Since the following analysis uses terms such as eccentricity,
radius, and diameter of a graph, we give the definitions of these terms in the
next paragraph.
The topology of a network domain can be represented by a graph G(V, E), which consists of a set V of elements called vertices and a set E of edges that connect the vertices. We define a path from a vertex u to a vertex v as an alternating sequence of vertices and edges {v_1, e_1, v_2, e_2, ..., e_{k-1}, v_k}, where v_1 = u, v_k = v, all the vertices and edges in the sequence are distinct, and successive vertices v_i and v_{i+1} are the endpoints of the intermediate edge e_i. We say a pair of vertices u and v in a graph are connected if and only if there is a path from u to v. If every pair of vertices in G is connected, then we say the graph G is connected. In a connected graph, the path of least length from a vertex u to a vertex v in G is called the shortest path from u to v, and its length is called the distance from u to v. The eccentricity of a vertex v is defined as the distance from v to the most distant vertex from v. A vertex of minimum eccentricity is called a center of G. The eccentricity of a center of G is called the radius of G, denoted by R(G), and the maximum eccentricity among all the vertices of G is called the diameter, denoted by D(G). A subgraph S of a graph G(V, E) is defined as a graph S(V', E') such that V' is a subset of V and E' is a subset of E, and the endpoints of an edge in E' are also in V'. We use G-V' to denote the induced subgraph on V-V'. A border node v of a graph G(V, E) is defined as a vertex v in V that is linked to a vertex v' with v' not in V. Assume V_B is the set of border nodes of G(V, E); the set of interior nodes is then denoted by V-V_B.
The left graph in Fig. 7 shows an example graph where |V| = 8 and |E| = 15. Four of the vertices are border nodes and each of them has one exterior link. In this graph, the border nodes are V_B = {a, c, d, f} and the interior nodes are V-V_B = {b, e, g, h}. The numbers associated with the edges are the lengths, or distances, between the two nodes. The distance is used to represent the aggregated effort for transmitting data between two nodes. Based on the distance pattern of the graph, we can calculate the center of the graph as node h, with eccentricity equal to 5, and hence the radius of the graph is R(G) = 5. The diameter D(G) = 9 is determined by the most distant pair of nodes, a and d. The summarized distance information between pairs of border nodes is shown in the right graph of Fig. 7, where the distance between the border nodes is indicated by underlined numbers on the graph. The summarized distance information enables a routing algorithm to compute the paths crossing the network domain.
FIGURE 7 (left) A graph representation of a network domain. (right) The corresponding summarized graph.
Ideally, the MSA of a network domain is connected to the center node, so that the maximum distance from the MSA to any node in the domain is less than or equal to the radius of the network. Assume the time for sending a signaling message is proportional to the distance from the MSA to a node; then the time used for sending a signaling message from an MSA to a switch node is t_m = ωδ, where δ is the distance between the MSA and the node and ω is a normalizing factor that converts distance to a time scale. Assume the time for setting up a connection in any switch is identical and is denoted by t_s. The time used by the MSA to signal the switch S and to get back the response from S is 2ωδ. Therefore, the total time for setting up a connection at switch S is t_setup = 2ωδ + t_s. Since the MSA is at the center of a network domain and its distance to any node in the domain is less than or equal to R(G), we can conclude that for any node in the network domain, the connection setup time is less than or equal to 2ωR(G) + t_s. The routing contracts are sent to the switches in the domain in parallel by the CMA, so that the time used for setting up any connection in the domain depends on the switch that causes the longest setup delay. As mentioned above, this longest setup time is less than or equal to 2ωR(G) + t_s, and hence, once the configuration of a network is determined, the time used for setting up any connection within the domain is bounded by 2ωR(G) + t_s, no matter how many hops the connection traverses.
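The comparison with hop-by-hop signaling that the paragraph above alludes to can be written out as follows; the hop-by-hop expression is a sketch under the assumption that each of the h switches on the path is configured sequentially and that signaling time remains proportional to the distance traveled.

```latex
% Parallel (MSA) setup versus hop-by-hop setup, for h hops of total path length L:
\begin{align*}
  t_{\mathrm{parallel}}        &\le 2\,\omega R(G) + t_s, \\
  t_{\mathrm{hop\text{-}by\text{-}hop}} &\approx 2\,\omega L + h\, t_s .
\end{align*}
% For nontrivial connections (h > 1), the parallel scheme pays the switch
% configuration time t_s only once, which is where the saving comes from.
```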
domains. This is mainly because a single MSA will become a bottleneck when there is a large number of connection requests. In addition, routing
algorithms will become inefficient when the number of nodes is large. An ATM
network can be partitioned into nonintersecting domains, and each domain has
its own unique domain ID. Each network domain is assigned an MSA to handle
all the connection requests within the domain, and hence we name the MSA
as the domain MSA (D-MSA) in large ATM networks. The D-MSA processes
all the required routing and connection setup activities within its domain, and
it also floods the summarized reachability information to all D-MSAs in other
network domains. The reachability information includes the propagation delay
and available resource information between each pair of border nodes in the same domain, as described in the right graph of Fig. 7. Summarizing the reachability information of the border nodes hides the internal connectivity information from other D-MSAs. The D-MSAs trigger the flooding mechanism in an on-demand fashion (i.e., whenever there is a change in the summarized topological or reachability information, the flooding mechanism is triggered). With
the flooding mechanism, each D-MSA can obtain detailed topological and link
state information of its own domain plus the summarized topological and link
state information of all other domains.
Once the link state information is obtained by all D-MSAs, they can use this
information to compute routes for connection requests. In a mobility-enabled
network, the actual location of a host is determined by its current point of attachment, which is not tracked by the D-MSA of the calling party's domain. Therefore, two phases
are required for setting up a connection in our approach: The first phase is to
discover the current location of a host, and the second phase is to setup the
connections. For setting up a connection, the requests are always sent to the
D-MSA of the caller's domain. When a caller's D-MSA receives a connection
request with the names {caller, callee}, it checks the name of the callee in the
GMA to examine whether it is in the same domain as the caller. If both the
caller and the callee are in the same domain, the caller's D-MSA can handle
the request with the intradomain routing and call setup procedure as described
in Section VII.C. Otherwise, the caller's D-MSA multicasts the request to all
other D-MSAs in the network. This multicast message is extended to include
the distance information from the caller to each of the border nodes in the caller
domain. This distance information can be used for the D-MSA at the callee's
domain to perform "destination" routing computation as described in the next
section.
A. Routing and Connection Setup in Wide Area ATM Network
One of the most important routing issues is to find the shortest path from
the source to the destination. The data transmission could be bidirectional
where the exact meaning of source and destination is blurred. For simplicity,
we assume source as the caller and destination as the callee of a connection
in later discussions. In most of the hierarchically organized network routing
protocols, source-based routing algorithms such as hierarchical OSPF [22] for
the Internet and PNNI [16,20] for ATM networks are used. In these protocols,
the source router or switch normally has the detailed topological information
FIGURE 8 A multiple-domain ATM network with host a in domain A and destination hosts in domain E.
of its own domain and the summarized information of other domains. The
nonoptimal path problem arises when the source does not know the detailed
topology in the destination domain. The reason is that the shortest paths from
a source to different hosts in another domain may not necessarily travel to the
same border node at the destination domain. Figure 8 shows a multiple-domain
ATM network, where the capital letters A-E denote domain names, the digits 1-5 denote switch IDs, and the lowercase letters a-c denote host names. For instance, A_n means switch n in domain A, and D-MSA.X represents the D-MSA at domain X. Assume the source host a is in domain A, and the two destination hosts b and c are in domain E. If we use the number of hops as the distance of a path from a source to a destination, the shortest path between host a and host b is ab = {A1, A2, B5, B2, D1, D2, E2} and the distance between host a and host b is |ab| = 6. The shortest path for route ac = {A1, A2, C4, C1, C3, C2, E1} travels through domain C instead of domains B and D used for path ab. The distance between host a and host c is |ac| = 6. From this example one can find
out that if the internal topology of domain E is not known by the MSA.A, there
is no way for MSA.A to determine how to find a shortest path to an internal
host in domain E.
B. "Destination" Routing and Connection Setup
To solve the suboptimal route problem, we propose using a "destination" routing scheme, where the D-MSA at the destination domain is responsible for
setting up the connection from the source to the destination. The process for
setting up a connection has two phases, namely, the host discovery phase and
the connection setup phase. The procedures of the two phases are described as
follows:
Host discovery phase has four steps:
(1) Connection request is sent to the D-MSA of the source domain.
(2) The source D-MSA looks for the destination host in the GMA of the
source domain. If the source and the destination are in the same domain, the
D-MSA of the source is responsible for setting up the connection by using
intradomain routing and connection setup protocol described in Section VII.C.
Otherwise go to step 3.
(3) The D-MSA sends an extended connection setup(S, D) message to
all the D-MSAs in the network. S and D stand for source and destination
respectively. The setup(S, D) message has the distance information from the
source to each of the border nodes in the source domain.
(4) When the setup(S, D) message reaches the destination domain, the
D-MSA at the destination domain obtains the distance information from the
source host to the border nodes at the source domain, the detailed topology
information of its own domain, and the summarized distance information of
the intermediate domains. With all of this information, the destination D-MSA can compute the shortest path from the source to the destination, as sketched below.
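The route computation of step (4) is sketched below. This is only an assumption-laden illustration, not the chapter's implementation: the three kinds of distance information are merged into one edge-weighted graph and Dijkstra's algorithm is run over it; the {(node, node): hops} representation and the hop counts are invented and only loosely mirror the Figure 8 topology.

import heapq

def dijkstra(edges, source):
    """Standard Dijkstra over an undirected graph given as {(u, v): weight}."""
    graph = {}
    for (u, v), w in edges.items():
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    dist, prev = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, prev

def extract_path(prev, target):
    path = [target]
    while path[-1] in prev:
        path.append(prev[path[-1]])
    return list(reversed(path))

# The three inputs available to the destination D-MSA (hop counts invented):
src_to_borders = {("a", "A2"): 2}                       # carried in setup(S, D)
inter_domain = {("A2", "B5"): 1, ("B5", "B2"): 1,       # summarized distances
                ("B2", "D1"): 1, ("D1", "D2"): 1,
                ("D2", "E2"): 1}
dest_topology = {("E2", "b"): 1}                        # detailed local topology

dist, prev = dijkstra({**src_to_borders, **inter_domain, **dest_topology}, "a")
print(extract_path(prev, "b"), dist["b"])
# -> ['a', 'A2', 'B5', 'B2', 'D1', 'D2', 'E2', 'b'] 8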
Connection setup phase has four steps:
(1) The D-MSA of the destination domain computes the shortest path
(SP) from the source to the destination by using the Dijkstra algorithm and
constructs a set of routing contracts, one for each of the D-MSAs in the domains
that the SP traverses. The routing contracts specify the requirements for setting
up the partial paths within the involved domains.
(2) The D-MSA of the destination domain sends the set of routing contracts to the D-MSAs of all the domains that the SP traverses; the message
is called setup (contract). The contracts are encapsulated into a signaling packet
and the packet is transmitted from the destination D-MSA to the source D-MSA
along the signaling path that connects all the involved D-MSAs in the domains
that the SP crosses through. Each D-MSA takes the contract designated for its
own domain and forwards the rest to its upstream D-MSA.
(3) Upon receiving the routing contracts, the D-MSA of the destination domain sets up the connection from the border node to the destination host and sets up the interdomain connection to the border node of its upstream domain on the SP. Each intermediate D-MSA sets up the partial path from the upstream border node to the downstream border node on the SP within its own domain, and also sets up the path from the downstream border node of its upstream domain to the upstream border node of its own domain. The source D-MSA is responsible for setting up the connection from the source host to the border node on the SP in the source domain.
(4) A response message is sent from the destination D-MSA immediately after the setup (contract) message and travels along the same route. At each D-MSA the response waits until the tasks specified in that D-MSA's contract have been completed. When the response message reaches the source D-MSA, it is forwarded to the source host to confirm the success of the connection request.

FIGURE 9 Interaction diagram for setting up a connection from host a to host b: setup(a,b) messages and GMA queries at each MSA, route computation at MSA.E, setup (contract) messages along MSA.E-MSA.D-MSA.B-MSA.A, partial path setup (b-E2-D2, D2-D1-B2, B2-B5-A2, A2-A1-a), and response messages back toward host a.
Let's describe the connection setup process by using an example that sets up a connection from host a to host b in the multiple-domain network in Fig. 8. In this network there are five domains (i.e., domains A-E), and each of them has a D-MSA that controls connection setup in its domain. Each domain has several switches; host a is linked to switch A1 in domain A and host b is linked to switch E2 in domain E. We describe this connection
setup by using an interaction diagram with signaling descriptions.
As shown in Fig. 9, host a sends a setup(a, b) message to MSA.A. MSA.A looks up host b in the GMA of its domain. Since host b is not in domain A, a multicast message will be sent to MSA.B, MSA.C, MSA.D, and MSA.E. MSA.E then finds out that host b is in domain E after querying the GMA. MSA.E
computes the shortest path from a to b and constructs the routing contracts
for MSA.D, MSA.B, and MSA.A. The setup (contract) message is sent along the
signaling path {MSA.E, MSA.D, MSA.B, MSA.A, a}. When MSA.E completes
setting up the partial path {b, E2, D2}, it will send out the response message toward host a along the signaling path {MSA.E, MSA.D, MSA.B, MSA.A, a}.
The response message waits for each MSA on the route to complete its contract
and then is forwarded to the upstream MSA. When host a receives the response
message, it can start to transmit data to host b.
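The propagation of the setup (contract) message along the signaling path can be pictured with the sketch below; the Contract class, the list-based message layout, and the printed output are illustrative assumptions rather than the chapter's signaling format. Each D-MSA removes the contract addressed to its own domain, acts on it, and forwards the remainder upstream.

from dataclasses import dataclass

@dataclass
class Contract:
    domain: str           # D-MSA this contract is addressed to
    partial_path: list    # nodes to be connected inside that domain

def propagate_setup_contract(signaling_path, contracts):
    """Walk the signaling path from the destination D-MSA toward the source
    D-MSA; each D-MSA takes its own contract and forwards the rest."""
    remaining = list(contracts)
    actions = []
    for dmsa in signaling_path:
        mine = [c for c in remaining if c.domain == dmsa]
        remaining = [c for c in remaining if c.domain != dmsa]
        for c in mine:
            # A real D-MSA would set up VCs inside its domain here.
            actions.append((dmsa, c.partial_path))
    return actions

# Hypothetical contracts matching the Fig. 9 walkthrough (host a to host b).
contracts = [Contract("MSA.E", ["b", "E2", "D2"]),
             Contract("MSA.D", ["D2", "D1", "B2"]),
             Contract("MSA.B", ["B2", "B5", "A2"]),
             Contract("MSA.A", ["A2", "A1", "a"])]
for dmsa, path in propagate_setup_contract(["MSA.E", "MSA.D", "MSA.B", "MSA.A"],
                                           contracts):
    print(dmsa, "sets up", "-".join(path))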
C. Performance Analysis for "Destination" Routing and Connection Setup

The total connection setup time T with this scheme can be approximated as

T ≈ 2cd(R(G_l) + δ_l) + t_s + Σ_{i=1}^{m} δ_i,

where R(G_l) is the radius of the domain in which the D-MSA causes the longest delay for implementing the setup (contract), δ_i is the distance from the upstream border node of domain i to the downstream border node at domain i's upstream domain, and t_s is the time for setting up a connection in a single switch.
The "destination" routing and connection setup scheme described here is
different from other ATM network routing protocols such as the PNNI [16].
For a start, this scheme integrates multicast group management, routing, and
connection setup functions into one. It also guarantees to find the "shortest" route for setting up connections in a wide area ATM network, something which PNNI cannot achieve [20]. Furthermore, since the proposed routing and connection setup scheme uses D-MSAs to exchange link state information and
to compute routes, it effectively reduces the burden on the switches, which
were not designed for performing heavy routing computation and storing large
amounts of link state information. One may argue that using multicast messages
among D-MSAs for connection setup is too heavy in the proposed routing and
connection setup protocol. However, further optimization techniques such as
caching can be applied to the design so that the setup message can be sent
directly to the destination D-MSAs of popular destinations, without relying on
multicasting a "query" message first.
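A minimal sketch of the caching optimization just mentioned might look as follows (the class and function names are invented for the illustration, and eviction and expiry policies are ignored): the source D-MSA remembers which D-MSA answered for a given callee and sends later setup messages directly to it, falling back to the network-wide multicast on a cache miss.

class DestinationCache:
    """Remembers which D-MSA answered for a given callee so that later setup
    requests can skip the network-wide multicast (illustrative sketch only)."""

    def __init__(self):
        self._cache = {}

    def lookup(self, callee):
        return self._cache.get(callee)

    def record(self, callee, dmsa):
        self._cache[callee] = dmsa

def send_setup(callee, cache, all_dmsas, local_dmsa):
    hit = cache.lookup(callee)
    if hit is not None:
        return [("setup", hit)]                       # unicast to the cached D-MSA
    return [("setup", d) for d in all_dmsas if d != local_dmsa]   # multicast

cache = DestinationCache()
dmsas = ["MSA.A", "MSA.B", "MSA.C", "MSA.D", "MSA.E"]
print(send_setup("b", cache, dmsas, "MSA.A"))         # first call: multicast
cache.record("b", "MSA.E")                            # learned from the reply
print(send_setup("b", cache, dmsas, "MSA.A"))         # later calls: unicast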
D. Multicast Connection Setup
The procedure for setting up multicast connections is similar to the unicast
connection setup case. A multicast session consists of one data source and a
number of receivers that form a group. Each multicast session is assigned a
network-wide unique connection identifier that is used to identify this multicast connection over the whole network. When a data source requests to send
data to a group, the source D-MSA assigns a new connection_id to the multicast session. The source D-MSA multicasts a setup message to all the D-MSAs in the network. This setup message contains the group name and the unique connection_id. After receiving the setup message, each D-MSA looks up the GMA for the group members. If there are group members in its domain, the D-MSA performs a "destination" routing and connection setup procedure separately for each of the group members. When an intermediate D-MSA receives a setup (contract) message for its domain, it compares the connection_id of the new connection with the connection_ids of the existing connections node by node from the downstream border node to the upstream border node in its domain. If the new connection_id is the same as an existing connection_id, and the upstream border node of the new connection coincides with that of an existing connection with the same connection_id, then this D-MSA attempts
to identify the common node for the two connections. The paths of the two
connections can be merged from the common node to the upstream common
border node such that only one upstream subpath is used for the common path
of the two connections. If all the involved D-MSAs detect common nodes for
the new connections and merge a subpath with the existing ones whenever it
is possible, then multicast trees can be formed for multicast sessions. Figure 10
shows the merging process for an existing connection C1 = {c, h, g, f} and a new connection C2 = {e, h, g, f} that have node h as their common node and f as their common upstream border node. The subpath {h, g, f} of the new connection is merged with the existing one.

FIGURE 10 Merging a new multicast connection with an existing one that carries the same connection_id: the two paths are merged from their common node h up to the common upstream border node f.
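The subpath-merging rule can be illustrated with the sketch below (the list-of-nodes representation is an assumption made for the illustration): each path is listed from the receiver toward the common upstream border node, and the new connection reuses everything from its first node in common with the existing connection.

def merge_multicast_paths(existing_path, new_path):
    """Split a new path into the segment needing new VCs and the segment that
    can be merged with an existing connection carrying the same connection_id.
    Paths run from the receiver toward the upstream border node."""
    existing = set(existing_path)
    common = next((n for n in new_path if n in existing), None)
    if common is None:
        return new_path, []              # nothing shared: build the whole path
    idx = new_path.index(common)
    return new_path[:idx], new_path[idx:]   # (new VCs, reused subpath)

# Existing connection C1 = {c, h, g, f}, new connection C2 = {e, h, g, f}.
print(merge_multicast_paths(["c", "h", "g", "f"], ["e", "h", "g", "f"]))
# -> (['e'], ['h', 'g', 'f'])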
E. MSA Signaling Network
It should be obvious by now that the proposed solution assumes the existence of signaling and control channels over which communications between an MSA and other components, such as the SBA and the switches, are carried. These channels, which are referred to here as the "signaling network", can be provided in a variety of ways. However, given that SingAREN already supports a best-effort IP service to all hosts and switches concerned, this IP service is used here to provide the required communications. Clearly the performance (i.e., QOS) of this network
will also impact the overall performance of the proposed signaling solution.
Given that this arrangement is adequate for the current implementation, a detailed study of this impact is left for future work.
The initial tests reveal that it normally takes between 1 and 2 seconds for a requested service to be completed on the SingAREN network over a maximum of four interconnected switches. This typically involves the sending of a service request via
the SBA to the MSA, and for the MSA to calculate and effect a connection to be
set up via the virtual switch(es) before returning an acknowledgement to the requesting host. More studies are needed to verify these measurements against the analytical results presented earlier. No tests involving the wireless network have been conducted so far.
Although wireless ATM is still in its infancy, many proposals are being worked out to address the three basic mobility management issues [28,29]. For location management, an obvious way is to adopt methods similar to those used in cellular/PCS systems, which use the Home Location
Registration (HLR) and Foreign Location Registration (FLR). Some value-added techniques applied to the HLR/FLR approach to improve the performance of address resolution, such as caching and hierarchically organized address databases, have also been proposed [27]. Another approach to location management is to extend the current PNNI signaling mechanisms to integrate location discovery of a mobile terminal with connection setup. In both approaches the network is responsible for translating the original destination address (i.e., the mobile terminal's home address) used by a caller into a topologically significant address and then rerouting the call to the mobile's current location.
The major differences lie in where the address translation is performed in these
schemes. In the extended PNNI approach, address translation is performed by looking up an additional translation table at the mobility-enhanced ATM switches in the mobile's home domain, while the HLR/FLR approach requires external
services for address translation to be deployed in the ATM network. There are
several shortcomings in these proposed location management schemes: Firstly,
a mobility tracking mechanism is required for the network to trace the movement of each mobile terminal, which means additional tracking and messaging
protocols need to be deployed in the network. Secondly, the signals for setting up a connection always travel to the home location of the mobile terminal to find out the current location of the callee and are then rerouted to that location. Furthermore, both proposals need a mobility-enhanced ATM
switch to interpret the connection setup messages, which makes the signaling
software even more complex.
A. Proposed Approach
An integrated approach as proposed in this chapter, which deals with both mobile terminals and fixed terminals in a similar way, would be more appropriate
for future mobility-enabled ATM networks with large numbers of mobile terminals. It should be clear by now that the proposed multicast service solution
has some important features that make it suitable to support host mobility. For
clarity, these are further elaborated here.
(1) By associating every connection with a logical multicast group identifier, hosts can move and be connected freely anywhere in the network.
(2) The leaf-initiated connection requests supported via the SBA at both the sender and receiver ends allow maximum flexibility for each party to move independently of the other. More importantly (not mentioned previously), the SBA can further support the concept of "advance-join," which allows join requests at a new location to be initiated in advance. Using this feature in a mobile environment, a mobile host is able to request in advance that a new connection be negotiated and set up at its next location before the actual wireless handover takes place.
FIGURE 11 The mobile service binding agent (MSBA), its binding to other protocol stacks, and the wireless control channel.
(3) The multicast group management, which allows multicast group members belonging to the same group to discover each other, fits nicely into the role of location management in a mobile environment. This means that while other proposals call for a location management function to be added to the overall system design, the proposed multicast service already has efficient location management built in.
To understand how mobility can be achieved, let's use a simple example of
setting up a point-to-point connection between two mobile terminals followed
by a subsequent movement of one of them, which involves a handover. The
reader should have no problem constructing a multicast example using the
information provided here.
To begin, each MT will register (i.e., power up and determine the strongest signal) via its mobile version of the SBA, the mobile service binding agent (MSBA) shown in Fig. 11, with its respective MSA by issuing a join (or create) operation on a logical group name, which also identifies the MT uniquely. Each MSA will in
turn distribute this information to the other MSAs in the network via the group
and location management agent (details described in Section VII.B). Whenever
a connection is to be made from one MT to another, the logical group name of the
called party and the logical name of the calling party will be made known to
their respective MSAs. Other information to be supplied to the MSA via the MSBA includes the current physical location of the MT and the required QOS parameters. As soon as a suitable route is determined by the routing function, a physical VC will be set up to connect the two parties, and as soon as the connection is
made, the respective MSBAs will be notified and data can flow. Let's further
assume that one of the two MTs has been moving and that a stronger signal
"beacon" is detected by the wireless physical layer and triggers a handover. This
handover information, together with the address of the new wireless access port
(AP), will be made known to the MSBA of the MT in question via the wireless
control channel (WCNTL). Once received, the MSBA will initiate an advance-join request on the new AP. As a result, an attempt is made to set up a new multicast route from the crossover switch to the new AP. Assuming that the new VC can be set up, the requesting MSBA will be notified and will in turn initiate the wireless handover via the WCNTL.
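The advance-join handover sequence just described can be sketched as follows. All class and method names here (StubMSA, advance_join, switch_to_ap, and so on) are invented stand-ins for the MSBA's signaling toward its MSA and the air interface; they do not correspond to a real API.

class StubMSA:
    """Stand-in for the MSA side of the signaling (illustrative only)."""
    def advance_join(self, group, new_attachment):
        print(f"MSA: setting up a branch of {group} toward AP {new_attachment}")
        return True                      # assume the new VC could be set up

class StubWCNTL:
    """Stand-in for the wireless control channel to the air interface."""
    def switch_to_ap(self, ap):
        print(f"WCNTL: wireless handover to AP {ap}")

class MSBA:
    """Sketch of the advance-join handover sequence described above."""
    def __init__(self, group, msa, wcntl):
        self.group, self.msa, self.wcntl = group, msa, wcntl

    def on_stronger_beacon(self, new_ap):
        # 1. The air interface has reported a stronger beacon and the new
        #    AP's address over the WCNTL.
        # 2. Ask the MSA, in advance, for a new branch toward the new AP.
        if not self.msa.advance_join(self.group, new_ap):
            return "stay"                # keep the current AP and VC
        # 3. Only after the new VC is confirmed is the wireless handover made.
        self.wcntl.switch_to_ap(new_ap)
        return "handover-complete"

msba = MSBA(group="mt-42", msa=StubMSA(), wcntl=StubWCNTL())
print(msba.on_stronger_beacon("AP-7"))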
B. The Mobile Service Binding Agent (MSBA)
It should be clear from the above description that the only difference between
a fixed host and a wireless mobile host in the proposed design is in the MSBA
(Fig. 11). To begin, the MSBA will have an interface to the underlying radio
frequency (RF) layers. In contrast to a fixed SBA, the MSBA will perform all the functions of an SBA, i.e., formulating group connection requests on behalf of the application and communicating with its MSA, but it will also include the following additional functionality, which due to space constraints is described only briefly here.
(1) It will have an interface (via the WCNTL) to the underlying air interface so that it can be told to initiate a handover as and when the conditions
are satisfied. Furthermore, it should also be able to inform the air interface of the outcome of the higher-level handover and allow the air interface to take appropriate actions, such as switching over to the new AP. The MSBA should also be notified when the wireless handover has been completed.
(2) Given that the VCI/VPI space is managed by the respective MSAs, whenever a handover takes place the new VCs may be assigned VCI/VPI pairs different from the old ones. The MSBA is therefore required to provide the necessary mechanisms to ensure that these newly assigned VCI/VPI pairs are used by the applications nonintrusively.
(3) Given that data loss could occur during the handover, the MSBA is also responsible for providing the necessary mechanisms, such as a data buffering scheme, in order to maintain the connection QOS; a minimal illustration is sketched below. Interested readers are referred to [36] for a description of such a scheme.
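As a purely illustrative sketch of the kind of buffering referred to in item (3) (the actual scheme is described in [36]; the classes and the cell representation here are invented), cells submitted while a handover is in progress can be queued and then flushed, in order, onto the new VC once the handover completes.

from collections import deque

class HandoverBuffer:
    """Queue cells while a handover is in progress and flush them onto the
    new VC afterwards (illustrative sketch only)."""

    def __init__(self):
        self._queue = deque()
        self.in_handover = False

    def send(self, cell, vc):
        if self.in_handover:
            self._queue.append(cell)      # hold cells during the handover
        else:
            vc.transmit(cell)

    def handover_complete(self, new_vc):
        self.in_handover = False
        while self._queue:                # flush in order onto the new VC
            new_vc.transmit(self._queue.popleft())

class PrintVC:
    def __init__(self, name):
        self.name = name
    def transmit(self, cell):
        print(self.name, "->", cell)

buf = HandoverBuffer()
buf.send("cell-1", PrintVC("old-VC"))     # delivered immediately
buf.in_handover = True
buf.send("cell-2", PrintVC("old-VC"))     # buffered
buf.handover_complete(PrintVC("new-VC"))  # flushes cell-2 onto the new VC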
ACKNOWLEDGMENTS
The authors acknowledge the generous funding support for SingAREN from the National Science
and Technology Board (NSTB) and Telecommunication Authority of Singapore (TAS). Our special
thanks to Dr. Tham Chen Khong, application manager of SingAREN, for providing the main text
in Section II. Some content of this chapter first appeared in a paper [37] by the authors.
REFERENCES
1. The MBONE Information Page: http://www.mbone.com/.
2. Borden, M., et al. Integration of real-time services in an IP-ATM network architecture. IETF
RFC 1821, Aug. 1995.
3. Braden, R., et al. Integrated services in the Internet architecture: An overview. IETF RFC 1633,
Sept. 1994.
4. Laubach, M. Classical IP and ARP over ATM. IETF RFC 1577, Jan. 1994.
5. Armitage, G. J. IP multicasting over ATM networks. IEEE J. Selected Areas Commun. 15(3):
445-457, 1997.
6. Lazar, A. A., Bhonsle, S., and Lim, K. S. A binding architecture for multimedia networks. In
Proceedings of the COST-237 Conference on Multimedia Transport and Teleservices, Vienna,
Austria, Nov. 14-15, 1994.
7. Lazar, A. A., Lim, K. S., and Marconcini, F. Binding model: Motivation and description, CTR
Technical Report 411-95-17. Centre for Telecommunications Research, Columbia University,
New York, Oct. 1995.
8. Lazar, A. A., Lim, K. S., and Marconcini, F. Realizing a foundation for programmability of ATM networks with the binding architecture. IEEE J. Selected Areas Commun.
Special Issues on Distributed Multimedia Systems, 1214-1227, Sept. 1996. Also see
http://comet.ctr.columbia.edu/xbind/.
9. Chan, M. C., Huard, J. F., Lazar, A. A., and Lim, K. Service creation, renegotiation and adaptive transport for multimedia networking. In Proceedings of The Third COST 237 Workshop
on Multimedia Telecommunications and Applications, Barcelona, Spain, November 25-27,
1996.
10. Ngoh, L. H., Li, H. Y., and Pung, H. K. A direct ATM multicast service with quality of service
guarantees. In Proceedings of the IEEE International Conference on Multimedia Computing
and Systems, June 1996, pp. 54-61.
11. Li, H. Y., Pung, H. K., and Ngoh, L. H. A QoS guaranteed ATM multicast service supporting
selective multimedia data transmission. In Proceedings of the International Conference on
Computer Communications and Networks (IC3N), Oct. 1996.
12. Zhang, L., et al. RSVP: A new resource reservation protocol. IEEE Network, Sept. 1993.
Available at http://www.isi.edu/div7/rsvp/rsvp.html.
13. Braden, R., and Hoffman, D. RAPI: RSVP application programming interface, Version 5.
IETF Internet Draft, draft-ietf-rsvp-rapi-00.ps, May 1997.
14. Newman, P., Hinden, R., Liaw, F. C., Lyon, T., and Minshall, G. "General Switch Management
Protocol Specification." Ipsilon Networks, Inc.
15. ATM Forum. ATM user-network interface specification version 4.0, af-sig-0061.000, July
1996.
16. ATM Forum. Private network-to-network interface specification 1.0 (PNNI 1.0), af-pnni-0055.000, Mar. 1996.
17. Caceres, R., and Iftode, L. Improving performance of reliable transport protocols in
mobile computing environments. IEEE J. Selected Areas Commun. 13(5): 850-857,
1995.
18. Raychaudhuri, D., and Wilson, N. D. ATM-based transport architecture for multiservices wireless personal communication networks. IEEE J. Selected Areas Commun. 12(8): 1401-1413,
1994.
19. Murty, K. G. Network Programming. Prentice-Hall, Englewood Cliffs, NJ, 1992.
20. Alles, A. ATM Internetworking, white paper. Cisco Systems, Inc., Mar. 1995.
21. Shacham, N. Multipoint communication by hierarchically encoded data. In IEEE Proceedings
of INFOCOM '92, 1992, pp. 2107-2114.
22. Semeria, C., and Maufer, T. Introduction to IP multicast routing. Internet Draft, 1996.
23. Buford, J. K. Multimedia Systems. Addison-Wesley, Reading, MA, 1994.
24. Widyono, R. The design and evaluation of routing algorithms for real-time channels. Tenet
Group TR-94-024. University of California, Berkeley, 1994.
25. Yeadon, N., Garcia, F., Hutchison, D., and Shepherd, D. Continuous media filters for heterogeneous internetworking. In IS&T/SPIE International Symposium on Electronic Imaging
Special Session on Multimedia Computing and Networking, San Jose, California, Jan.
1996.
26. Bell, T. E., Adam, J. A., and Lowe, S. J. Technology 1996: Communications. IEEE Spectrum,
30-41, Jan. 1996.
27. Cho, G., and Marshall, L. F. An efficient location and routing scheme for mobile computing
environment. IEEE J. Selected Areas Commun. 13(5): 868-879, 1995.
28. Raychaudhuri, D. Wireless ATM networks: Architecture, system design and prototyping. IEEE
Personal Commun. 42-49, Aug. 1996.
29. Ayanoglu, E., Eng, K. Y., and Karol, M. J. Wireless ATM: Limits, challenges, and proposals.
IEEE Personal Commun. Aug. 1996.
30. ATM Forum. LAN emulation servers management specification 1.0, af-lane-0057.000. Mar.
1996.
31. Honcharenko, W., et al. Broadband wireless access. IEEE Commun. Mag. 20-26, Jan. 1997.
32. Ngoh, L. H., et al. SingAREN: The Singapore advanced research and education network.
IEEE Commun. Mag. 38(11): 74-82, 1998. Special issue on Advanced Telecommunication
Infrastructures in Asia. See also http://www.singaren.net.sg.
33. Singapore ONE: http://www.s-one.gov.sg.
34. Deering, S., and Hinden, R. Internet protocol version 6 (IPv6) specification. RFC 1883, Dec.
1995.
35. Tennenhouse, D. L., and Wetherall, D. J. Towards an active network architecture. ACM
Comput. Commun. Rev. 5-18, 1997.
36. Li, H. Y., et al. Support soft hand-over in wireless ATM networks using mobility-enhanced multicast service. In International Workshop on Wireless ATM, China, Feb. 1997. Also available
from the authors.
37. Ngoh, L. H., et al. An integrated multicast connection management solution for wired and
wireless ATM networks. IEEE Commun. Mag. 35(11): 52-59, 1997.
INDEX
Artificial intelligence (continued)
ease of use, 765, 767
filtering capability, 765,
767-769
function, 757
impact on existing
organization, 769
installations and
performance, 763-765
reliability, 765, 767
solutions for business
needs, 758-759
user classes, 764
weak information systems for
data management
cost considerations, 772
Internet implementation,
773-774
rationale, 771-772
strong information system
comparison, 772-773
data mining classification
utilization, 70-72
database technology prospects,
27-28
fixed-weight networks,
28
intensive care unit intelligent
monitoring system
database and expert system
integration
advantages, 838-839
communication integration,
839-842
enhanced expert system,
839
intelligent/deductive
databases, 839-840
temporal databases,
842-843
implementation
bedside communications
controller, 845
central workstation, 845,
849-850
communication between
controllers, 846
communication mechanism
in database-expert
system, 852-856
data acquisition and
time-dependent aspects,
846-849
data acquisition system,
845-848
database structure,
850-852
asynchronous transfer
mode layer, 866-867
C-plane, 867
overview, 860-861, 864
physical layer, 865
plane management, 867
specifications, 863-864
topology, 863
traffic management in
networks
policing functions, 868
service categories, 867-868
wireless network, see Wireless
asynchronous transfer
mode network
network model
bandwidth-delay product,
698
input-output dynamics
modeling with classical
control approach,
700-701
per-flow queuing advantages,
698-700
store-and-forward packet
switching, 697-698
service classes, 695-696
SingAREN high-speed
applications, see Singapore
Advanced Research and
Education Network
virtual circuits, 695, 698
ATM network, see Asynchronous
transfer mode network
Authorization, see Access control
AUTONET, network evaluation,
299
METALXASE, 485-486
object classification,
481-484
PACKAGE-PENCIL object
class, 484-486
property object, 482-484
rationale, 477, 479
window interfaces, 480
relational database management
system generation,
455-456
structure, 422-423
types, 421-422
Binary coded decimal (BCD), data
compression, 273
Bitmap, text compression for
information retrieval
compression, 624-631
hierarchical compression,
625-628
Huffman coding combination
with run-length coding,
628-631
overview, 622
run-length coding, 624-625
usefulness in information
retrieval, 622-624
Block-to-block coding, data
compression, 244
BNF, see Backus-Naur Form
BOM, see Bill of materials
BPM, see Ballistic particle
manufacturing
Branch X-change (BXC),
topological design of data
communication networks,
312-313
BXC, see Branch X-change
Compression (continued)
information theory
definitions, 236-238
theorems, 238-239
information types, 234
list update algorithms, 272-273
lossless compression, 235-236
modeling in compression
process
adaptive models
performance, 242
problems, 243
theorems, 243
classification of models
adaptive model, 235
semiadaptive scheme, 235
static modeling, 235
static models
conditioning classes
problem, 241-242
finite-context models,
239-240
grammar models, 241
Markov models, 240-241
rationale, 233-234, 573-574
run length encoding, 271-272
statistical coding
adaptive arithmetic coding,
255
arithmetic coding, 250-254,
602-604
block-to-block coding, 244
Huffman coding
adaptive Huffman coding,
254-255
array structure creation
algorithm, 247-248
binary prefix codes as
Huffman codes, 246-247
external path length, 247
memory requirements, 247
redundancy of codes,
248-250
sibling property of trees,
246
steps, 245-246
Shannon-Fano coding,
244-245
variable-to-variable coding,
244
universal coding
Elias codes, 269-270
Fibonacci codes, 270
Neuhoff and Shields code,
271
Noiseless Coding Theorem,
269
Data mining (continued)
generalized records, 49-51
support degree of
generalized record and
monotonicity, 50-51
fuzzy domain knowledge
fuzzy ISA hierarchy
transformation into fuzzy
set hierarchy, 52-54
hierarchies, 51-52
informativeness of
generalized records
informativeness measure,
61-63
overview, 58-59, 61
Shannon's entropy, 61
overview, 48
process of data
characterization, 54-58
Data structure
definition, 368
ideal characteristics, 369
linear versus nonlinear, 368
Data warehousing
abstraction levels, 31
architecture of systems, 74-75
client/server network, 32-34
distributed system, 34
sources of data, 31
system types, 32
three-tier approach, 34
DBMS, see Database management
system
Debt ratio, financial analysis, 503
Decision-support system (DSS),
components, 35-36
Deductive data model
chaining, 10-11
data storage, 12
history of development, 12
search path, 10-11
Deductive and object-oriented
database (DOOD)
declarative programming,
23-24
history of development, 8
object identity, 22-23
Delphi, fuzzy query processing, see
Fuzzy query processing
Delta-reliable flow
channel definition, 786-787
edge-δ-reliable flow
assumptions and notation,
800
capacity of a cut, 801-803
definition, 800-801
maximum flow, 803-804
vertex-δ-reliable flow
capacity of a mixed cut,
806-807
definition, 805-806
maximum flow, 808-810
DESNET
functions, 300-301
network design tool, 300
operation, 301-304
Dictionary, text compression for
information retrieval
permuted dictionary, 608-609
prefix omission method,
607-608
suffix truncation, 608
Dictionary coding
adaptive coding algorithms
LZ77, 265-267
LZ78, 267-268
LZSS, 267
LZW, 268-269
definition, 255
static dictionary coding
almost on-line heuristics,
263-265
applications, 256
edge weights, 256-257
general dictionary, 258
heuristics, 257-258
on-line heuristic algorithms,
261-263
optimal and approximate
algorithms, 258-260
text compression for
information retrieval,
604-607
Distance learning, see Multimedia
Micro-University
Distributed database
business applications, 217
classification, 19
design, 18-19
fragmentation schemes, 19
fuzzy queries in distributed
relational databases, see
Fuzzy query processing
DoDi, features, 779-780
DOOD, see Deductive and
object-oriented database
DSS, see Decision-support system
Statements
balance sheet, 500-501
cash flow statement, 501
income statement, 501
interpretation, 502-503
First-come-first-serve (FCFS),
multidatabase system query
optimization, 152-153, 158
Flow
data, see Data flow
visualization
classification of techniques,
540
examples, 541-542, 544
prospects, 546-547
segmentation of features, 541,
544-546
visualization programs, 541
Force display, see Haptic interface
Fused deposition modeling
(FDM), rapid prototyping
and manufacturing, 375
Fuzzy query processing
a-cuts operations for translation
convex versus normal fuzzy
sets, 207
examples using relational
database systems,
211-213
Fuzzy Structural Query
Language syntax,
208-211
trapezoidal fuzzy number,
208, 231
distributed relational databases
components of system
failed data generator,
230
Fuzzy Structural Query
Language statements,
229
Fuzzy Structural Query
Language translator,
229-230
remote database
connectors, 230
structural query
language-type database
management
environment, 230
data estimation
closeness degree between
fuzzy terms, 217-218,
228
failed attribute value
estimation, 219,
221-225, 227
Haptic interface
approaches
exoskeleton-type force
display, 551-552
object-oriented-type force
display, 552
tool handling-type force
display, 550-551
criteria, 553
definition, 550
development tools, 552-553
history of development, 550
library for haptics system, see
LHX
Harvest search engine, see Search
engine
Hewlett-Packard Graphics
Language (HP/GL),
computer-aided design system
interface, 380-381,404
HIPERLAN
asynchronous transfer mode
network configuration for
overlay, 876-877
bandwidth allocation, 879-880
channel access control layer,
873-874, 877-879
modifications for network
operation, 877-879
multiaccess control, 861, 872,
874-875, 877-879
objectives of standard, 872
origins, 872
physical layer, 872-873
protocol reference model, 872,
877-879
simulation results, 880-881
traffic policing limitations,
875-876
HP/GL, see Hewlett-Packard
Graphics Language
HTML, see Hypertext markup
language
Huffman coding
adaptive Huffman coding,
254-255
array structure creation
algorithm, 247-248
binary prefix codes as Huffman
codes, 246-247
external path length, 247
memory requirements, 247
redundancy of codes, 248-250
sibling property of trees, 246
steps, 245-246
text compression for
information retrieval
algorithm yielding of optimal
code, 583-585
average codeword length
minimization, 582-583
binary forests for reducing
partial-decoding table
number, 589-594
bit reference elimination,
585-588
canonical Huffman codes,
596-598
codes with radix greater than
2,594-595
combination with run-length
coding, 628-631
decompression speed, 585
Huffman coding (continued)
entropy of probability
distribution, 583
information content, 583
partial-decoding tables,
588-589
skeleton trees for fast
decoding
construction, 599-601
decoding, 598-599
space complexity, 601-602
Hyperrelation (R^)
definition, 130
operations
examples, 148-149
H-DIFFERENCE, 145-146
H-INTERSECTION, 146
H-JOIN, 144-145
H-PRODUCT, 146-147
H-PROJECTION, 143-144
H-SELECTION, 141-143
H-UNION, 145
overview, 139-141
transformation of operations,
147-148
schema and mapping
attribute-versus-attribute
conflict, 135-136
attribute-versus-table conflict,
136-137
table-versus-table conflict,
138-139
value-versus-attribute
conflict, 137
value-versus-table conflict,
137-138
value-versus-value conflict,
135
structure, 133-134
Hypertext markup language
(HTML), multimedia
database system
implementation approach,
114-117
Hypervolume
applications, 524
attribute volume, 521, 524-525
definition, 520-521
features, 523-524
geometric volume, 521,
524-525
mathematical representation,
521-523
IGES, see Initial Graphics Exchange Specification
IMMPS, see Intelligent
Multimedia Presentation
System
Income statement, financial
analysis, 501
INDACO
dam monitoring system
function, 755, 757
popularity, 758
validation and processing of
data, 757-758
Information retrieval
coordinates of word occurrence,
576-577
data compression for text
retrieval
alphabet, 579
arithmetic coding, 602-604
bitmap
compression, 624-631
hierarchical compression,
625-628
Huffman coding
combination with
run-length coding,
628-631
overview, 622
run-length coding, 624-625
usefulness in information
retrieval, 622-624
codes
binary tree relationships,
581-582
codewords, 579-581
fixed-length code, 579
instantaneous code, 581
prefix code, 581-582
uniquely decipherable code,
580-581
concordance
coordinate of occurrence,
609-610
decoding, 614-616
encoding, 614
frequent value encoding,
612
length combination
encoding, 613-614
model-based compression,
618-622
parameter setting,
616-618
sentence length, 610
variable-length fields,
611-618
dictionary
permuted dictionary,
608-609
prefix omission method,
607-608
suffix truncation, 608
dictionary coding, 604-607
Huffman coding
algorithm yielding of
optimal code, 583-585
average codeword length
minimization, 582-583
binary forests for reducing
partial-decoding table
number, 589-594
bit reference elimination,
585-588
canonical Huffman codes,
596-598
codes with radix greater
than 2, 594-595
decompression speed, 585
entropy of probability
distribution, 583
information content, 583
partial-decoding tables,
588-589
skeleton trees for fast
decoding, 598-602
overview, 573-579
test systems, 578
inverted files, 576-578
keywords and variants, 575
negative keywords, 576
query with distance constraints,
575-576
text system partitioning,
574-575
World Wide Web, see Search
engine
Information system, history of
development, 6-7
Information theory, data
compression
definitions, 236-238
theorems, 238-239
Initial Graphics Exchange
Specification (IGES),
computer-aided design system
interface, 379-380
Intelligent Multimedia
Presentation System (IMMPS)
database architecture, 337-338
data mining, 339
design and implementation,
350-353
directed acyclic graph, 335-336
frame attributes, 338
knowledge presentation, 334
navigation, 334-336
presentation windows, 336-337
rationale for development,
333-334
resource object layer, 338-339
reuse of presentation script,
340
storage management, 340
Intensive care unit (ICU)
information sources and
relations, 826
intelligent monitoring system
database and expert system
integration
advantages, 838-839
communication integration,
839-842
enhanced expert system,
839
intelligent/deductive
databases, 839-840
temporal databases,
842-843
implementation
bedside communications
controller, 845
central workstation, 845,
849-850
communication between
controllers, 846
communication mechanism
in database-expert
system, 852-856
data acquisition and
time-dependent aspects,
846-849
data acquisition system,
845-848
database structure,
850-852
OPS/83 programming, 844,
846, 852-853
integration of intelligence,
836-838
network structure, 828-830
rationale, 827
real-time information
management
critical limit time
reasoning, 833
interruptible reasoning, 834
nonmonotonous reasoning,
834
profound reasoning, 836
reasoning overview,
831-832
reflexive reasoning,
834-835
requirements, 830-831
superficial reasoning,
835-836
temporal reasoning,
832-833
uncertainty and reasoning,
835
standards for communication
between medical devices,
830
monitoring system
requirements, 826-827
patient recovery, 825
Internet, see World Wide Web
Internet Explorer
availability, 664
displays, 665
home page setting, 665
image disabUng, 670
launching, 665
search tips and features, 667-669
toolbar, 666
unique features, 667
Internet Server API (ISAPI)
overview, 639-640
performance, 642-643
Inventory control
economic purchase quantity
equations, 425-426
inventory types, 424
item types, 425
lot-sizing rules, 425
policies, independent versus
dependent, 425
Inverted file, information retrieval,
576-578
IRIS, 119-120
ISAPI, see Internet Server API
Jasmine/C
aggregate functions, 85
comparison with other
programs
advantages, 119
GemStone, 119
IRIS, 119-120
O2, 119
ORION, 119
object-oriented database
programming language
overview, 81
procedural attributes, 87
query language, 81-86
LHX (continued)
haptic renderer, 555
haptic user interface, 557
HapticWeb utilization, 571
implementation, 556
model manager, 555-556
primitive manager, 556
shared haptic world, 570
surgical simulators, 570
three-dimensional shape design
using autonomous virtual
object
autonomous free-form
surface manipulation
enemy avoidance, 568
food reaching, 568
functions of free-form
surface, 565-566
restoration, 566, 568
usability study, 569
interaction with autonomous
virtual objects, 564
shape design and artificial life,
563-564
tree-like artificial life, direct
manipulation, 564-565
visual display manager, 556
Link, types, 296
Link cost, equation, 293
Liquidity ratio, financial analysis,
502-503
LMI, see Layer manufacturing
interface
Local-area network (LAN)
asynchronous transfer mode
fixed networks
advantages, 860
protocol reference model
asynchronous transfer
mode layer, 866-867
C-plane, 867
overview, 860-861, 864
physical layer, 865
plane management,
867
specifications, 863-864
topology, 863
traffic management in
networks
policing functions, 868
service categories, 867-868
wireless network, see Wireless
asynchronous transfer
mode network
definition, 295
metropolitan-area network
interconnection, see
Connectionless server,
ATM-based B-ISDN
LZ77, data compression, 265-267
LZ78, data compression, 267-268
LZSS, data compression, 267
LZW, data compression, 268-269
requirements, 467-468
shop floor control system,
471-475
production planning and
control, 459-461
rationale for use, 457-458
scopes of object orientation,
492-494
production requirements, 418
relational database management
application
aggregation view of
subassembly, 453
bill of materials generation,
455-456
metaplanning concept, 452
module design, 452
net requirement computation,
454-455
prototype system, 455
surrounding information and
database systems
clusters of information for
integration, 445-446
computer-aided design,
446-448, 450-451
enterprise resource planning,
445
metadatabase approach,
446-448
Petri net theory, 452
supply chain management,
445
Master production scheduling
(MPS)
delivery promise, 424
final assembly schedule, 423
rough-cut capacity planning,
423
scheduled receipts, 424
time fences, 424
Material requirements planning
(MRP)
calculations, 428
components, 426
computerized systems, 428-429
example, 427
gross requirements, 427
net requirements, 427
planned order, 426-427
scheduled receipts, 427
Maximum merge scheduling
(MMS), multidatabase system
query optimization, 161-165
MDBMS, see Multimedia
database management system
MDBS, see Multidatabase system
vertex-m-route flow
definition, 816-817
maximum flow, 817, 819-821
MRP, see Material requirements
planning
MRPII, see Manufacturing
resource planning
MSA, see Multicast service agent
Multicast service agent (MSA),
SingAREN
connection management agent,
919-920
group management agent,
916-917
multiple multicast service agent
scale-up to large networks
destination routing and
connection setup,
923-925
domain multicast service
agent, 922
multicast connection setup,
927
performance analysis for
destination routing and
connection setup,
926-927
routing and connection setup
in wide area ATM
network, 922-923
signaUng network, 927-928
performance analysis for
connection setup, 920-921
resource management agent,
915-916
routing agent
node joining and leaving of
multicast trees, 918-919
QOS-based routing, 917-918
Multidatabase system (MDBS)
query optimization
algebra level optimization
flexible relation approach
comparison, 149-150
hyperrelations, 130,
133-134, 170
information capacity, 131
least upper bound,
130-133, 170
multirelational approach
comparison, 150-151
schema conformation, 130
schema conformation and
mapping, 134-139
execution strategy level
optimization
assumptions, 154
Multidatabase system (continued)
first-come-first-serve
strategy, 152-153, 158
greedy scheduling strategy,
153, 158-159
intersite operation site,
154-156
maximum merge scheduling
strategy, 161-165
participating database
system function,
151-152, 170-171
participating database
system workload sharing
with multidatabase
system, 159-161
performance study
comparing strategies,
165-169
reduced join graph, 157
sorting and merging
approach, 153
traditional database system
differences, 156-157
hyperrelational operations
examples, 148-149
H-DIFFERENCE, 145-146
H-INTERSECTION, 146
H-JOIN, 144-145
H-PRODUCT, 146-147
H-PROJECTION,
143-144
H-SELECTION,
141-143
H-UNION, 145
overview, 139-141
transformation of
operations, 147-148
levels
algebra level, 125
execution strategy level,
125-126
operation level, 125
schema level, 124
semantic inquiry
optimization levels, 125
overview, 123-124
schema conflicts
attribute-versus-attribute
conflict, 128-129
attribute-versus-table conflict,
129
hyperrelation schema and
mapping
attribute-versus-attribute
conflict, 135-136
attribute-versus-table
conflict, 136-137
table-versus-table conflict,
138-139
value-versus-attribute
conflict, 137
value-versus-table conflict,
137-138
value-versus-value conflict,
135
table-versus-table conflict,
129-130
value-versus-attribute
conflict, 129
value-versus-table conflict,
129
value-versus-value conflict,
127
semantic discrepancy, 126-127
Multilevel access control, see
Access control; Wireless
asynchronous transfer mode
network
Multimedia database management
system (MDBMS)
applications, 118
education and training
applications, see Intelligent
Multimedia Presentation
System; Multimedia
Micro-University
implementation approach
agents and related work,
109-110
program components,
117-118
text, 114-117
video, 111-114
maintenance approaches, 329
multimedia presentations, 328
networked multimedia
application issues,
104-105, 329, 331
overview, 21, 327-328
prospects for development, 120
requirements, 103-104
reusability issues, 331-333
synchronization, 329-331
system architecture
control operators, 107-108
multimedia, 105
overall architecture, 108-109,
330
structural operations,
105-106
temporal and spatial
operations, 106-107
visual query, 333
Multimedia Micro-University
(MMU)
architecture of system, 342-343
design and implementation,
353-355, 359
goals, 341-342
prospects, 350
Web document database
BLOB objects, 347-348
document references,
346-347
duplication of documents,
348-349
layers, 344-345, 349
physical location of
documents, 348
rationale, 343
reusability of documents, 347
tables
annotation table, 346
bug report table, 346
implementation table,
345
script table, 345
test record table, 346
Multipoint network
definition, 295
topology, 295-296
Nearest-neighbor gridding,
scattered data, 536
Netscape
Navigator/Communicator
availability, 664
displays, 665
home page setting, 665
image disabling, 669
launching, 665
Net Search features, 667-668
search tips, 667
Smart Browsing, 668-669
toolbar, 666
unique features, 666
Netscape Server API (NSAPI)
performance, 642-643
Service Application Function
execution, 638-639
Network failure, see Data flow
Node
costs, 296
reliability, 296
Noiseless Coding Theorem, data
compression, 269
Nonuniform rational B-splines
(NURBS) volume
derivatives, 525-527
generation
interpolated volume, 527-531
swept volume, 531-535
NSAPI, see Netscape Server API
NURBS volume, see Nonuniform
rational B-splines volume
OPNET, network evaluation, 300
ORION, Jasmine comparison, 119
LDV, 186
MLR data model, 187-188
multilevel relational data
model, 184-185
Sea View, 185-186
Web gateways
Cookies and state
management issues, 653
database connection
persistence problem, 654
generic interfaces
dynamic SQL capability,
644
Java DataBase
Connectivity, 645-646
Open DataBase
Connectivity, 645
SQL Call-Level Interface,
645
historical perspective,
643-644
measurements of time
response for different
scenarios, 648-653
protocols and interprocess
communication
mechanisms, 646-648
template-based middleware,
654-655
Relational data model
design approaches, 9
engineering application, 37
history of development, 7-8
integrity constraints, 8-9
normalization of source tables,
9
overview, 507
Reliability, networks,
296-297
Reliable data flow, see Data flow
Rendering, volumetric data
direct volume rendering,
539-540
hexahedron method, 538
image-order algorithms, 539
object-order algorithms, 539
tiny cube method, 538
vanishing cube method,
538-539
voxel analysis, 539
Resource planning, manufacturing
resource planning, 430
R^, see Hyperrelation
Rough-cut capacity planning
(RCCP), manufacturing
resource planning, 423,
430-431
concurrency control
architectures, 196-197
protocols
timestamp-ordering
algorithms, 197-199
two-phase locking
algorithms, 197-198
scope of problem, 194, 196
problem components
availability, 176-177
integrity, 176-177
secrecy, 176
Selective laser sintering, rapid
prototyping and
manufacturing, 371-372
Semantics
definition, 126
discrepancy in multidatabase
systems, 126-127
Servlet Java API, overview,
640-641
Shannon-Fano coding
data compression, 244-245
text compression for
information retrieval, 619
SIDRO, topological design of data
communication networks,
305,310-311
SIF, see Solid Interchange Format
Simulated annealing (SA),
topological design of data
communication networks,
314,322
Singapore Advanced Research and
Education Network
(SingAREN)
collaborative research,
903-904, 906
high-speed network advanced
applications, 904-906
host mobility support
connection management,
928-929
handover management,
928-929
integrated approach, 929-931
location management,
928-929
mobile service binding agent,
930-931
infrastructure of network, 903
IP multicast backbone, 905,
907
multicast service agent
advantages, 931-932
connection management
agent, 919-920
Singapore Advanced (continued)
group management agent,
916-917
multiple multicast service
agent scale-up to large
networks
destination routing and
connection setup,
923-925
domain multicast service
agent, 922
multicast connection setup,
927
performance analysis for
destination routing and
connection setup,
926-927
routing and connection
setup in wide area ATM
network, 922-923
signaling network,
927-928
performance analysis for
connection setup,
920-921
resource management agent,
915-916
routing agent
node joining and leaving of
multicast trees, 918-919
QOS-based routing,
917-918
objectives, 902-903
premium network service
application-oriented traffic
aggregation, 910
connecting sites, 908
design
component overview,
913-914
goals, 909
service binding agent,
914
virtual switch, 914-915
host mobility support, 910
multicast properties, 909
rationale for development,
907
research contributions
dynamic logical multicast
grouping, 912-913
multicast as basis for all
connections, 912
open signaling, 911-912
scalable design, 910-911
signaling solution
coexistence, 910
comparison of approaches,
320-322
concave branch elimination,
313
cut saturation, 314, 320
genetic algorithm, 314-315,
321-322
MENTOR, 314
numerical applications,
317-322
simulated annealing, 314, 322
tabu search
definition of moves,
316-317
performance, 320-322
principles, 315-316
hierarchy of networks, 289-290
notations, 292
overview of approaches, 291
representation of networks,
297, 299-300
Transport control protocol (TCP)
capabilities, 891
wireless asynchronous transfer
mode local-area network
support
overview, 861, 891
performance issues
fixed networks, 891-892
packet retransmission
effects, 895
traffic load effects, 895
wireless networks, 893-894
wireless link behavior,
892-893
TS, see Tabu search
Universal coding
Elias codes, 269-270
Fibonacci codes, 270
Neuhoff and Shields code, 271
Noiseless Coding Theorem, 269
User interface
database technology prospects,
30-31
dialogue-based applications, 31
text-based applications, 30-31
Vertex-m-route flow
definition, 816-817
maximum flow, 817,
819-821
Video, multimedia database
system implementation
approach, 111-114
Virtual reality (VR), rapid
prototyping and
manufacturing
virtual prototypes, 410-411
Virtual Reality Modeling
Language development
and advantages, 411-412
Volumetric data
flow visualization
classification of techniques,
540
examples, 541-542, 544
prospects, 546-547
segmentation of features, 541,
544-546
visualization programs, 541
gridding methods of scattered
data, 536-538
haptic data, see Haptic
interface; LHX
hypervolume
applications, 524
attribute volume, 521,
524-525
definition, 520-521
features, 523-524
geometric volume, 521,
524-525
mathematical representation,
521-523
modeling
multiresolution modeling,
519
overview, 519
rapid prototyping and
manufacturing, 412-414
scattered data modeling,
519-520
nonuniform rational B-splines
volume
derivatives, 525-527
generation
interpolated volume,
527-531
swept volume, 531-535
rendering methods, 538-540
scientific data visualization
overview, 518
segmentation of features, 520,
544-546
Wireless (continued)
location management,
897-898
optimum design
cellular architecture, 881-883
traffic policing in multiaccess
control
adaptive TDMA
performance, 890
air interface based on
adaptive TDMA,
886-887
bandwidth renegotiation,
886
connection patterns,
883-884
objectives, 883
quality of service assurance
mechanism, 887-889
silence and low-bit-rate
connection exploitation,
884-886
protocol reference model
C-plane protocol extensions,
869
fixed networks, 860-861, 864
multiaccess control
requirements, 869-870
radio transmission
considerations, 869
services to be supported,
861-862
timing requirements, 862
transport control protocol
support
overview, 861, 891
performance issues
fixed networks, 891-892
packet retransmission
effects, 895
traffic load effects, 895
wireless networks, 893-894
wireless link behavior,
892-893
World Wide Web (WWW)
browsers, see Internet Explorer;
Netscape
Navigator/Communicator
gateway specifications
Common Gateway Interface,
637-638
comparisons between
Common Gateway
Interface and server APIs,
641-643
FastCGI, 640
Internet Server API, 639-640
Netscape Server API,
638-639
programming languages, 642
Servlet Java API, 640-641
Web Application Interface,
639
growth, 664, 689
HapticWeb, 571
Multimedia Micro-University
document database
BLOB objects, 347-348
document references,
346-347
duplication of documents,
348-349
layers, 344-345, 349
physical location of
documents, 348
rationale, 343
reusability of documents, 347
tables
annotation table, 346
bug report table, 346
implementation table, 345
script table, 345
test record table, 346
multimedia database system
implementation approach,
114-117
relational database management
system gateways
Cookies and state
management issues, 653
database connection
persistence problem, 654
generic interfaces
dynamic SQL capability,
644
Java DataBase
Connectivity, 645-646
Open DataBase
Connectivity, 645
SQL Call-Level Interface,
645
historical perspective,
643-644
measurements of time
response for different
scenarios, 648-653
protocols and interprocess
communication
mechanisms, 646-648
template-based middleware,
654-655
robots for data mining
advantages and
disadvantages, 686-687
Letizia, 689
SiteHelper, 687-688
Stanford projects, 688-689
WebWatcher, 689
search engine, see Search
engine
servers
architectures, 655-656
performance evaluation
analytic workload
generation, 657
SPECWeb96, 658-659
trace-driven approach,
656-657
WebStone, 657-658
types, 679
three-tier versus two-tier
architecture, 637
universal acceptance,
635-636
weak information systems for
technical data
management, 773-781
WWW, see World Wide Web