The Architectural Design of Globe: A Wide-Area Distributed System
The spectacular growth of the Internet has the potential of connecting a billion
computers together within the next decade into an integrated distributed system
offering numerous applications for science, commerce, education, and entertain-
ment. The hardware and communications infrastructure needed is rapidly being
deployed. However, the software infrastructure is still lacking. We propose a
novel, scalable infrastructure for a massive worldwide distributed system.
At present, we are building applications on top of a limited number of com-
munication services. In the Internet, for example, this means that applications
communicate mainly through TCP connections, but otherwise have to implement
all additional services themselves, including services for naming, replication, mi-
gration, fault tolerance, and security.
As an example, consider the World-Wide Web. The Web implements its own
communication protocol, HTTP, on top of TCP. It uses a tailor-made naming sys-
tem based on URLs. Replication is supported in the form of caches that are part
of Web proxies, but cannot be used for other applications as cache coherence pro-
tocols rely on attribute fields of Web pages. Hardly any measures have been taken
to handle broken links and server crashes. Finally, security has been proposed in
the form of an extension to HTTP, but there are also proprietary solutions such
as SSL from Netscape. Other Internet applications such as e-mail and USENET
News each have their own software models and infrastructure, with no common-
ality among any of them.
As a consequence, building new wide-area applications is difficult. First, too
much effort is repeatedly spent on implementing common or standard services
that should already have been there to start with. Second, by using application-
specific services, interoperability between different applications can be difficult
or even impossible.
Instead, we propose a different approach. Rather than developing applications
directly on top of the transport layer, we want to create a software infrastructure
that provides us with a set of common distribution services. The main requirement
is that this infrastructure, or middleware, can scale to support on the order of a
billion users all over the world.
Worldwide Scalability
The real challenge is that we may eventually have to support one billion users,
each having thousands of objects, and requiring services from all over the world.
A worldwide scalable distributed system is capable of offering adequate perfor-
mance in the face of high network latencies, congestion, overloaded servers, lim-
ited resource capacity, unreliable communication, etc. To achieve worldwide scal-
ability we at least need to provide extensive support for partitioning and replicat-
ing objects. 7
Adequate support for scaling techniques is precisely what is lacking in current
middleware. DCOM, DCE, and CORBA do not provide the tools for replicating
objects. In those cases where caching or replication is supported, such as in AFS
and the Web, policies are fixed. However, efficient solutions that scale worldwide
can be found only by taking application-level consistency into account. Again,
this calls for flexibility.
be partitioned and replicated across multiple machines at the same time. However,
processes are not aware of this: state and operations on that state are completely
encapsulated by the object. All implementation aspects, including communica-
tion protocols, replication strategies, and distribution and migration of state, are
part of the object and are hidden behind its interface.
In order for a thread in a process to invoke an object’s method, it must first
bind to that object by contacting it at one of the object’s contact points. A con-
tact address describes such a contact point, specifying a network address and a
protocol through which the binding can take place. Binding results in an interface
belonging to the object being placed in the client’s address space, along with an
implementation of that interface. Such an implementation is called a local ob-
ject. This model is illustrated in Figure 1, which shows a Globe object distributed
across four address spaces.
[Figure 1 (caption not recovered). Labels: distributed object, local object, address space, network, contact point; address spaces A1 to A5.]
[Figure 2 (caption not recovered). Labels: user-defined interface, control subobject, replication subobject, replication interface, control callback interface, semantics subobject, communication subobject, local object.]
modular way, to separate issues such as replication and communication from what
the object actually does (i.e., its semantics). We distinguish the following four
subobjects, as shown in Figure 2:
A control subobject handling the flow of control within the local object
These four subobjects are designed for building scalable distributed shared ob-
jects. Of course, we also need support for security and persistence, as well as
other services, which, in our approach, are handled by separate subobjects. As
scalability is the focus of this paper, we discuss only the four listed subobjects
here.
Replication interface

Method    Description
start     Called to synchronize replicas of the semantics subobjects, obtain
          locks if necessary, etc.
invoked   Called after the control subobject has invoked a specific method at
          the semantics subobject
send      Provide marshalled arguments of a specific method, and pass
          invocation to local objects in other address spaces
finish    Called to synchronize the replicas again, release locks, etc.
In principle, all invocation requests, whether they come from the local client or
from the network, are first passed to the replication subobject before the method
is invoked at the semantics subobject. When the control subobject receives an
invocation request from the local client, it first calls start to allow the replication
subobject to synchronize the copies of the semantics subobject. For example,
the coherence protocol may require that a token is acquired before any method
invocation at the semantics subobject takes place.
The start method returns a set of actions that the control subobject should
take. The return value INVOKE tells the control subobject to invoke the method
at the semantics subobject. Likewise, SEND instructs the control subobject to
pass the marshalled arguments of the invocation to the replication subobject by
subsequently calling send. So, for example, with a replication strategy where
a method has to be invoked at all replicas, an implementation of start may return
{INVOKE, SEND}, telling the control subobject to (1) do a local invocation, and (2) pass
the marshalled invocation request so that it can be sent to the other replicas.
The final step is invoking finish, allowing the replication subobject to synchronize
the replicas again (if needed). When finish is to be invoked is again determined by
the replication subobject: it signals this by including FINISH in the actions it returns
from start or send. Invoking finish generally returns RETURN, telling the control
subobject that it can pass the return value of the method invocation to the local
client.
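The interplay between the control and replication subobjects can be made concrete with a small sketch. The following Python fragment is illustrative only: the method names start, invoked, send, and finish and the actions INVOKE, SEND, FINISH, and RETURN come from the text, whereas the class names, the JSON marshalling, and the outbox list standing in for a communication subobject are assumptions made for this example and are not part of the Globe prototype.

```python
import json
from enum import Enum


class Action(Enum):
    INVOKE = "INVOKE"   # invoke the method at the local semantics subobject
    SEND = "SEND"       # pass the marshalled request on by calling send
    FINISH = "FINISH"   # call finish next
    RETURN = "RETURN"   # hand the result back to the local client


class InvokeEverywhere:
    """Hypothetical strategy: every method is invoked at all replicas."""

    def __init__(self):
        self.outbox = []  # assumption: stands in for a communication subobject

    def start(self, method):
        return {Action.INVOKE, Action.SEND}

    def invoked(self, method, condition_failed):
        return set()

    def send(self, marshalled):
        self.outbox.append(marshalled)
        return {Action.FINISH}

    def finish(self):
        return {Action.RETURN}


class Counter:
    """Toy semantics subobject (hypothetical)."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount
        return self.value


class Control:
    """Control subobject: follows the actions returned by the replication subobject."""

    def __init__(self, semantics, replication):
        self.semantics = semantics
        self.replication = replication

    def invoke(self, method, *args):
        result = None
        actions = set(self.replication.start(method))
        if Action.INVOKE in actions:
            result = getattr(self.semantics, method)(*args)
            actions |= self.replication.invoked(method, condition_failed=False)
        if Action.SEND in actions:
            marshalled = json.dumps([method, list(args)])
            actions |= self.replication.send(marshalled)
        if Action.FINISH in actions:
            actions |= self.replication.finish()
        return result if Action.RETURN in actions else None
```

A different replication strategy would only have to return different action sets; the control logic itself stays unchanged, which is the point of separating the two subobjects.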
A distinctive feature of our model is that we allow method invocations at the
semantics subobject to block on condition failures. For example, appending data
to a bounded buffer may fail when the buffer is full. Concurrent access to the
semantics subobject is controlled by the replication subobject. After invoking a
method at the semantics subobject, the control subobject always calls invoked, informing
the replication subobject whether or not a condition failure occurred, and
passing control back to the replication subobject. If necessary, the current thread
blocks inside the replication subobject. The replication subobject can then allow
other invocations to take place, which may change the state of the semantics
subobject such that the blocked thread can later continue successfully.
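The bounded-buffer case can be sketched as follows. This is a minimal single-address-space illustration, assuming a condition variable inside the replication subobject and an exception to signal a condition failure; the names ConditionFailed, BoundedBuffer, LocalReplication, and Control are hypothetical, and a real replication subobject would also have to coordinate with remote replicas.

```python
import threading


class ConditionFailed(Exception):
    """Assumed way for a semantics method to report a condition failure."""


class BoundedBuffer:
    """Toy semantics subobject: plain state and methods, no synchronization."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    def append(self, item):
        if len(self.items) >= self.capacity:
            raise ConditionFailed
        self.items.append(item)

    def take(self):
        if not self.items:
            raise ConditionFailed
        return self.items.pop(0)


class LocalReplication:
    """Controls concurrent access and blocks threads after a condition failure."""

    def __init__(self):
        self._state_changed = threading.Condition()

    def start(self):
        self._state_changed.acquire()

    def invoked(self, condition_failed):
        if condition_failed:
            # Block here; waiting releases the lock so other invocations can run.
            self._state_changed.wait()
            return False  # tell the control subobject to retry
        self._state_changed.notify_all()  # state may have changed: wake waiters
        return True

    def finish(self):
        self._state_changed.release()


class Control:
    def __init__(self, semantics, replication):
        self.semantics = semantics
        self.replication = replication

    def invoke(self, method, *args):
        self.replication.start()
        try:
            while True:
                try:
                    result = getattr(self.semantics, method)(*args)
                    failed = False
                except ConditionFailed:
                    result, failed = None, True
                # Always report back to the replication subobject.
                if self.replication.invoked(failed):
                    return result
        finally:
            self.replication.finish()
```

Note that the semantics subobject contains no locking at all: whether and where a thread blocks is decided entirely by the replication subobject.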
Control subobject. The methods of the semantics subobject are always invoked
by the control subobject. This subobject controls two types of invocation requests:
those coming from the local client, and those coming in through the network.
The control subobject is also responsible for (un)marshalling invocation requests
that are passed between itself and the replication subobject. The interface of the
control subobject offered to the local client is the same as the (user-defined) inter-
face of the semantics subobject. In addition, it offers the callback interface to the
replication subobject shown in Table 2.
The Architectural Design of Globe: MAIN TEXT
A Wide-Area Distributed System Page 8 of 22
Table 2: The callback interface of the control subobject as used by the replication
subobject
In general, when a local client invokes a method at the control subobject, the
latter will eventually invoke that method at its local copy of the semantics sub-
object after receiving permission from the replication subobject. Remote invo-
cation requests, that is, requests that have been passed by replication subobjects
in remote address spaces, are eventually passed to the control subobject through
handle_request. The control subobject then simply does the local invocation at the
semantics subobject.
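The receiving side of a remote invocation can be sketched in a few lines. The callback is written here as handle_request, following the name used in the text; the JSON wire format, the on_network_message entry point, and the Document class are assumptions made purely for illustration.

```python
import json


class Document:
    """Toy semantics subobject (hypothetical)."""

    def __init__(self):
        self.text = ""

    def append(self, fragment):
        self.text += fragment
        return len(self.text)


class Control:
    """Control subobject: unmarshals requests and invokes the semantics subobject."""

    def __init__(self, semantics):
        self.semantics = semantics

    def handle_request(self, marshalled):
        # Callback used by the replication subobject for remote requests.
        request = json.loads(marshalled)
        method = getattr(self.semantics, request["method"])
        return json.dumps(method(*request["args"]))


class Replication:
    """Replication subobject: decides when a remote request may be applied."""

    def __init__(self, control):
        self.control = control

    def on_network_message(self, marshalled):
        # Assumed entry point; a real strategy could order or delay requests here.
        return self.control.handle_request(marshalled)
```

The replication subobject never interprets the request itself; it only forwards the marshalled bytes, so it stays independent of the user-defined interface.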
Discussion
We have chosen this organization as it provides the minimum framework for
implementing scalable distributed objects in a flexible way. A key role is reserved
for the replication subobject. In our view, the only way to achieve wide-area
scalability of distributed objects is to concentrate on the distribution of their state.
With the enormous variety of objects, it is clear that a general-purpose, “one-size-
fits-all” distribution policy will never suffice, which calls for per-object solutions.
The main role of our communication subobjects is that they provide a uniform
interface to underlying networks and operating systems concerning their commu-
Process–to–Object Binding
To communicate through a distributed object, it is necessary for a process to first
bind to that object. The result of binding is that the process can directly invoke
the object’s methods. In other words, a local object implementing an interface
of the distributed object is placed in the address space of the requesting process.
Binding itself consists roughly of two distinct phases: (1) finding the distributed
object, and (2) installing a local object. This is illustrated in Figure 3. Finding
a distributed object is separated into a name look-up and a location look-up step;
installing the local object requires that we select a suitable contact address, as well
as an implementation for that interface.
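The binding phases described above can be sketched as a single function. The sketch assumes that the naming and location services can be modelled as dictionaries and that implementations are selected by protocol name; ContactAddress, ToyProxy, and bind are hypothetical names chosen for this example and do not reflect the actual Globe interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContactAddress:
    network_address: str  # where the object can be contacted
    protocol: str         # protocol through which binding can take place


class ToyProxy:
    """Hypothetical local-object implementation for one protocol."""

    def __init__(self, address):
        self.address = address
        self.connected = False

    def make_contact(self):
        self.connected = True


def bind(name, naming_service, location_service, implementations):
    # Phase 1: find the distributed object.
    object_handle = naming_service[name]                 # name look-up
    contact_addresses = location_service[object_handle]  # location look-up

    # Phase 2: install a local object.
    address = next(
        (a for a in contact_addresses if a.protocol in implementations), None
    )
    if address is None:
        raise LookupError(f"no usable contact address for {name!r}")
    local_object = implementations[address.protocol](address)
    local_object.make_contact()
    return local_object
```

Because the name maps only to an object handle, the contact addresses registered for that handle can change without the name being affected.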
[Figure 3 (caption not recovered). Labels: client process, name, naming service (1), object handle (2), installing a local object, load & instantiate class(es) (4), make contact (5), local object.]
binding changes without notice. On the other hand, an object can update its con-
tact addresses at a location service without having to consider under which name
it can be reached by its clients.
We can now remove all location information from names, thus making it eas-
ier to realize distribution transparency. However, we do require a scalable location
service that can handle frequent updates of contact addresses in an efficient manner.
We have designed such a service 8 and are currently implementing a prototype
version that is being tested on the Internet.
Current Status
We have built an initial prototype implementation of our system, concentrating on
the support for distributed shared objects. Our initial prototype has been imple-
mented in ANSI C. We are currently developing a Java-based implementation.
Interfaces are written in an interface definition language (IDL). The proto-
type has an interface compiler which creates a C header file for each interface
definition. The interface compiler also generates skeletons for (class) object im-
plementations. A skeleton provides the necessary glue to turn a method invocation
on a (local) object into a C function call. The programmer only has to implement
one C function for each method.
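The role of a generated skeleton can be illustrated with a short sketch. The prototype's interface compiler emits C, so the Python below is only an analogy under stated assumptions: INTERFACE stands in for an IDL interface definition, and make_skeleton, register_read, and register_write are hypothetical names.

```python
INTERFACE = ("read", "write")  # assumption: stands in for an IDL interface


def make_skeleton(interface, functions):
    """Build a class whose methods forward to one plain function per method."""
    missing = [name for name in interface if name not in functions]
    if missing:
        raise TypeError("no implementation for: " + ", ".join(missing))

    def forwarder(function):
        # The glue: a method invocation becomes a plain function call on the state.
        def method(self, *args):
            return function(self.state, *args)
        return method

    def init(self, state):
        self.state = state

    namespace = {name: forwarder(functions[name]) for name in interface}
    namespace["__init__"] = init
    return type("Skeleton", (), namespace)


# The programmer writes one plain function per method.
def register_read(state):
    return state["value"]


def register_write(state, value):
    state["value"] = value


Register = make_skeleton(
    INTERFACE, {"read": register_read, "write": register_write}
)
```

A caller would then write, for instance, r = Register({"value": 0}) followed by r.write(7) and r.read(), without ever seeing the underlying functions.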
The interface compiler also generates composite objects. A composite object
encapsulates a collection of subobjects and allows them to be treated as a single
Java are generated from IDL descriptions. This approach has proven to be highly
effective, leading to well-designed subobjects. Nevertheless, the control subobject
currently has to be made by hand, which unnecessarily complicates object con-
struction. It is better to specify the semantics subobject in an Object Definition
Language (ODL), from which, together with IDL descriptions, we can generate
the control subobject. We are currently developing an ODL for Globe.
Being able to implement policies on a per-object basis proved to be highly
effective. For example, because we were initially not interested in persistence,
we used a single database to store the state of different distributed shared objects.
The problem with this approach, which is basically the same as the one followed
in CORBA, was that too many policy decisions had to be implemented outside the
control of the object being stored. Later, we decided to follow more closely the
Globe paradigm, by which each object is in full control of handling its own state.
In our current prototype, each object implements its own persistence facilities, as
well as the policies that go along with them. This approach has turned out to be much
more flexible and, in fact, easier to implement and maintain.
The performance of our prototype, which is currently dominated by the time
it takes for a process to bind to an object, confirmed that the granularity of dis-
tributed shared objects should be relatively large. For wide-area objects, network
speed and delay will additionally determine performance. Granularity is deter-
mined by the size of the semantics subobject. Unfortunately, in our model, a
replication strategy operates on the entire state as contained in this subobject.
This approach is not always appropriate. For example, when a semantics subob-
ject is built from a number of Web pages, including icons, images, etc., we would
like to apply different strategies for different parts of the subobject. Developing
each part as a separate distributed shared object has an unacceptable performance
penalty. We are currently investigating how we can support composite semantics
subobjects whose elements can have separate replication strategies.
[Figure 4: The general organization for integrating Globe Web services into the current Web. Labels: local object, HTTP connection, interface, document representative.]
The gateway accepts all URLs. Normal URLs are simply passed to exist-
ing (proxy) servers, whereas Globe URLs are used to actually bind to the named
Discussion
With the exponential growth of the Web, it is clear that we need a highly scalable
infrastructure for implementing a wide variety of applications. Globe provides
such an infrastructure.
An important aspect of our model is that partitioning, replication, and migration
of an object's state are supported on a per-object basis. Different objects can
use different strategies: each object fully contains an implementation of its own
strategy, independent of other objects. This makes it much easier to have very
different objects interoperate, for the simple reason that each hides its internals
from the other behind well-defined interfaces. More importantly, by providing
a mechanism for implementing distribution policies on a per-object basis,
we can tackle worldwide scalability. In our view, the next generation of distributed
systems will have to support a wide variety of objects that can be invoked from
anywhere. The only way to achieve worldwide scalability is to provide extensive
support for partitioning and replicating objects, and allow very different consis-
tency strategies to co-exist. 7 Globe provides this flexibility.
We have finished the initial architectural design of our system, leaving a num-
ber of subjects open for further research. For example, we are currently working
on the design of a security architecture. Furthermore, we are concentrating on specific
schemes for wide-area replication, mechanisms that support large-scale applications
composed of many distributed objects, and persistence.
Our current efforts concentrate on developing a Java-based implementation for
constructing scalable Web documents. More information on Globe can be found
at our home page http://www.cs.vu.nl/~steen/globe/.
Acknowledgments
Many ideas presented in this paper have been formed during discussions with
other participants in the Globe project: Arno Bakker, Gerco Ballintijn, Franz
Hauck, Anne-Marie Kermarrec, Ihor Kuz, and Patrick Verkaik. We gratefully
acknowledge their contribution. Leendert van Doorn is acknowledged for his con-
tributions to the detailed design and implementation of local objects. Finally, we
thank Henri Bal, Koen Langendoen and the anonymous referees for their valuable
assistance in improving this paper.
References
1. Microsoft Corporation. DCOM Technical Overview, 1996.
2. W. Rosenberry, D. Kenney, and G. Fisher. Understanding DCE. O’Reilly & Associates,
Sebastopol, CA., 1992.
3. OMG. “The Common Object Request Broker: Architecture and Specification, revision 2.0.”
OMG Document 96.03.04, Object Management Group, Mar. 1996.
4. M. Satyanarayanan. “Scalable, Secure, and Highly Available Distributed File Access.” Com-
puter, 23(5):9–21, May 1990.
5. E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns, Elements of Reusable
Object-Oriented Software. Addison-Wesley, Reading, MA., 1994.
6. N. Islam. “Customizing System Software Using OO Frameworks.” Computer, 30(2):69–78,
Feb. 1997.
7. B. Neuman. “Scale in Distributed Systems.” In T. Casavant and M. Singhal, (eds.), Readings
in Distributed Computing Systems, pp. 463–489. IEEE Computer Society Press, Los Alamitos,
CA., 1994.
8. M. van Steen, F. Hauck, P. Homburg, and A. Tanenbaum. “Locating Objects in Wide-Area
Systems.” IEEE Commun. Mag., 36(1):104–109, Jan. 1998.
9. M. van Steen, A. S. Tanenbaum, I. Kuz, and H. J. Sips. “A Scalable Middleware Solution for
Advanced Wide-Area Web Services.” In Proc. Middleware ’98, The Lake District, England,
Sept. 1998. IFIP.
The Architectural Design of Globe: SIDEBAR TEXT: An Example
A Wide-Area Distributed System Page 17 of 22
come from all over the world. In these cases, a primary–backup approach where
pages are replicated to a number of mirror sites is useful. The organization’s Web
site could be constructed as one or more Web documents, where each document
is registered at the location service with multiple contact addresses. The nearest
address is always returned to a user. Note that, in Globe, the name of a Web docu-
ment can be the same everywhere. Also, there is no need to tell the user that there
are mirror sites, and where these sites are. In contrast to personal Web documents,
site-wide caching, as is done by current Web proxies, may now be useful.
There are also Web sites whose content changes rapidly and which may require
active replication schemes. For example, Web documents of online news
providers may want to use a publish/subscribe type of replication by which sub-
scribers to a provider’s document are notified when news updates occur. This
also holds for Web documents related to conferences and other types of timely
events. In the current Internet infrastructure, automatic notification is often done
by making use of mailing lists. Such lists are highly inefficient. In Globe, notifica-
tion would be an integral part of the Web document, using a multicasting scheme
appropriate for that document. Of course, notification could be combined with
actively replicating the updates, but this may not be appropriate in all cases.
What we see here are similar Web documents that nevertheless require very different
replication strategies. Personal home pages need not be replicated, and should be
cached on a per-user basis. Organizational home pages can apply primary–backup
replication, and should be cached per site. Home pages related to timely events
may benefit from a publish/subscribe type of replication where clients are noti-
fied when updates occur. Other examples where different replication policies are
required can easily be thought of. Unfortunately, such distinctions are presently
impossible to make. In Globe, however, each Web document can use a replication
strategy tailored to its own characteristics.
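The three cases can be summarized in a small sketch in which every document carries its own policy. The class and field names are hypothetical, the subscriber callbacks stand in for whatever notification scheme a document would use, and the caching scope for timely documents is left unset because the text does not state one.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Replication(Enum):
    NONE = "not replicated"
    PRIMARY_BACKUP = "primary-backup"
    PUBLISH_SUBSCRIBE = "publish/subscribe"


class CacheScope(Enum):
    PER_USER = "per user"
    PER_SITE = "per site"


@dataclass
class WebDocument:
    name: str
    replication: Replication
    cache_scope: Optional[CacheScope]  # None: not stated in the text
    content: str = ""
    subscribers: List[Callable[[str], None]] = field(default_factory=list)

    def update(self, content):
        self.content = content
        # Only publish/subscribe documents notify their subscribers.
        if self.replication is Replication.PUBLISH_SUBSCRIBE:
            for notify in self.subscribers:
                notify(self.name)


personal = WebDocument("personal home page", Replication.NONE, CacheScope.PER_USER)
organizational = WebDocument(
    "organizational home page", Replication.PRIMARY_BACKUP, CacheScope.PER_SITE
)
timely = WebDocument("conference page", Replication.PUBLISH_SUBSCRIBE, None)
```

The policy is a property of the document rather than of the server or proxy that happens to hold it, which is what allows the three kinds of documents to coexist.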
The Architectural Design of Globe: SIDEBAR TEXT: Related Work
A Wide-Area Distributed System Page 19 of 22
Related Work
There is much academic and industrial activity on the design and implementa-
tion of shared data and objects. A shared-data model offers a small set of primi-
tives for reading and writing bytes to shared regions of storage. Typical examples
of shared-data models are network file systems and distributed shared memory
(DSM) implementations. The main problem is achieving performance and scal-
ability while keeping data consistent. Distributed shared memory and storage
systems such as Munin 1 and Khazana, 2 respectively, follow an approach similar
to Globe by attaching replication policies on a per-region basis. In most DSM
systems, performance is improved by relaxing memory consistency. 3 The main
drawback of the shared-data model is that it simply does not provide the level
of abstraction needed for developing distributed applications. Therefore, much
attention is being paid to object-based approaches.
Objects come with an architectural model that lends itself well to distributed
systems. An object can be seen as a fine-grained service provider. To most devel-
opers, this means that an object is naturally implemented through its own server
process, which handles requests from clients. This view leads to the remote-object
model in which a remote-method invocation is made transparent using RPC-like
techniques as is done in DCOM. 4 However, this approach is the major obstacle
to scaling worldwide. The problem is that remote-object invocations cannot
adequately deal with network latencies. Additional mechanisms such as object
replication and asynchronous method invocations are therefore necessary.
In the Legion system, 5 objects are located in different address spaces, and
method invocation is implemented nontransparently through message passing.
The Legion approach is one of the few which explicitly addresses wide-area scal-
ability. The Globus project has developed global pointers to support flexible im-
plementations. 6 A global pointer is a reference to a remote compute object. The
pointer identifies a number of protocols to communicate with the object, of which
one is to be selected by the client. Global pointers offer a higher degree of flexi-
bility than the Legion approach.
When it comes to distribution transparency, Legion and Globus fall short.
Transparency is explicitly addressed by object request brokers (ORBs). An ORB
is a mediator between objects and their clients. Basic ORBs provide only support
for language-independent and location-transparent method invocation. CORBA-
compliant ORBs 7 offer additional distribution services such as naming, persis-
tence, transactions, etc. Unfortunately, CORBA has not yet defined services for
transparently replicating objects, or for keeping replicas consistent.
When an ORB is responsible for distribution services, we require additional
mechanisms independent of the core object model. One such mechanism is the
subcontract used in the Spring system. 8 A subcontract implements an invocation
protocol: it describes the effect of a method invocation at the client side in terms
of the method invocation(s) at the object’s side. For example, in the case of repli-
cation, method invocation by a client may result in the invocation of that method at
each replica. Replicating the invocation is encapsulated in the subcontract and is
hidden from the client. As a general mechanism, subcontracts are too limited. For
example, it is hard to develop subcontracts that keep consistent a group of objects
that is being shared by several clients.
An alternative approach is to fully encapsulate distribution in an object, lead-
ing to a model of partitioned objects. Partitioned objects appeared in SOS in
the form of fragmented objects. 9 Globe’s distributed shared objects form another
implementation of partitioned objects, and have been derived from the Orca 10
programming language.
Fragmented objects in SOS are mostly language independent. Distribution
is achieved manually by allowing interfaces to act as object references that can
be freely copied between different address spaces. An important difference with
Globe’s distributed shared objects is that fragmented objects make use of relative
object references. In contrast, Globe’s object handles are absolute and globally
unique. Fragmented objects have not been designed for wide-area networks. For
example, there are no facilities for incorporating object-specific replication strate-
gies. Likewise, the communication objects have been designed and implemented
for local-area networks only. Furthermore, the model does not provide facilities
for implementing different coherence policies, nor does it address the problem of
platform heterogeneity.
References
1. J. Carter, J. Bennett, and W. Zwaenepoel. “Techniques for Reducing Consistency-Related
Communication in Distributed Shared Memory Systems.” ACM Trans. Comp. Syst.,
13(3):205–244, Aug. 1995.
2. J. Carter, A. Ranganathan, and S. Susarla. “Khazana: An Infrastructure for Building Dis-
tributed Services.” In Proc. 18th Int’l Conf. on Distributed Computing Systems, pp. 562–571,
Amsterdam, May 1998. IEEE.
3. J. Protić, M. Tomašević, and V. Milutinović. “Distributed Shared Memory: Concepts and
Systems.” IEEE Par. Distr. Techn., 4(2):63–79, Summer 1996.
4. Microsoft Corporation. DCOM Technical Overview, 1996.
5. A. Grimshaw and W. Wulf. “The Legion Vision of a Worldwide Virtual Computer.” Commun.
ACM, 40(1):39–45, Jan. 1997.
6. I. Foster, J. Geisler, C. Kesselman, and S. Tuecke. “Managing Multiple Communication
Methods in High-Performance Networked Computing Systems.” J. Par. Distr. Comput.,
40:35–48, 1997.
7. OMG. “The Common Object Request Broker: Architecture and Specification, revision 2.0.”
OMG Document 96.03.04, Object Management Group, Mar. 1996.
Biography
Maarten van Steen has been an assistant professor at the Vrije Universiteit in Amsterdam
since 1994. He received an M.Sc. in Applied Mathematics from Twente Uni-
versity (1983) and a Ph.D. in Computer Science from Leiden University (1988).
He has worked at an industrial research laboratory for several years in the field
of parallel programming environments. His research interests include operating
systems, computer networks, distributed systems, and distributed-software engi-
neering. Van Steen is a member of IEEE Computer Society and ACM.
Philip Homburg graduated in 1991 from the Vrije Universiteit, Amsterdam. Before
starting with his Ph.D. study, he wrote software to transparently connect multiple
Amoeba sites over the Internet as part of the Starfish project. Currently, he is a
Ph.D. student working on the overall design of Globe.
Andrew S. Tanenbaum has an S.B. from M.I.T. and a Ph.D. from the University
of California at Berkeley. He is currently a Professor of Computer Science at
the Vrije Universiteit in Amsterdam and Dean of the interuniversity computer
science graduate school, ASCI. Prof. Tanenbaum is the principal designer of three
operating systems: TSS-11, Amoeba, and MINIX. He was also the chief designer
of the Amsterdam Compiler Kit. In addition, Tanenbaum is the author of five
books and over 80 refereed papers. He is a Fellow of ACM, a Fellow of IEEE,
and a member of the Royal Dutch Academy of Sciences. In 1994 he was the
recipient of the ACM Karl V. Karlstrom Outstanding Educator Award and in 1997
he won the SIGCSE award for contributions to computer science.