The Legion Grid Portal
Anand Natrajan, Anh Nguyen-Tuong, Marty A. Humphrey, Andrew S. Grimshaw
Abstract — The Legion Grid Portal is an interface to a grid
system. Users interact with the portal, and hence a grid,
through an intuitive interface from which they can view files, submit and
monitor runs, and view accounting information. The architecture
of the portal is designed to accommodate multiple diverse grid
infrastructures, legacy systems and application-specific interfaces.
The current implementation of the Legion Grid Portal is built
with familiar web technologies on top of the Legion grid
infrastructure. The portal can be extended in a number of
directions — additional support for grid administrators, a greater number of
application-specific interfaces, interoperability between grid
infrastructures, and interfaces for programming support. The
portal has been in operation since February 2000 on npacinet, a
worldwide grid managed by Legion on NPACI resources.
I. OVERVIEW
The Legion Grid Portal is a grid computing environment
project designed to make grids accessible to users via
easy-to-use interfaces. The portal uses standard, off-the-shelf
software in conjunction with existing grid infrastructures to
facilitate access to a grid. In its current implementation, the
portal employs Legion [13] as the underlying grid
infrastructure by using Legion’s command-line tools. In other
words, when a user interacts with the browser, the appropriate
tool with the appropriate parameters is invoked at the back-end.
The portal can support the full suite of Legion command-line
tools; in practice, it supports a rich subset of the existing tools.
Particularly, it supports initiating and monitoring runs of grid
applications and accessing the Legion distributed file system.
Services such as security, scheduling, data transfer, etc. are
supported implicitly. The portal is in operation and is servicing
the needs of users of npacinet, Legion’s worldwide grid.
The Legion Grid Portal is an architecture for integrating a
number of existing technologies under a common interface.
Although the portal currently uses Legion, it could employ
Globus [11] or other grid systems as the underlying
infrastructure.
Manuscript received June 30, 2001. This work was supported in part by DARPA
(Navy) contract #N66001-96-C-8527, DOE contract DE-FD02-96ER25290,
DOE contract Sandia LD-9391, DOE contract D459000-16-3C, DARPA
contract SC H607305A, Logicon (for the DoD HPCMOD/PET program
through the NAVO MSRC) contract DAHC 94-96-C-0008, National Science
Foundation Next Generation Software grant EIA-9974968, National Science
Foundation NPACI grant ASC-96-10920, and a grant from NASA-IPG.
Anand Natrajan is with the Dept. of Computer Science at the University of
Virginia, Charlottesville, VA 22904-4740, USA (e-mail: anand@virginia.edu).
Anh Nguyen-Tuong is with the Avaki Corporation, Charlottesville, VA
22902, USA (e-mail: anh@avaki.com).
Marty A. Humphrey is with the Dept. of Computer Science at the University
of Virginia, Charlottesville, VA 22904-4740, USA (e-mail:
humphrey@cs.virginia.edu).
Andrew S. Grimshaw is with the Dept. of Computer Science at the
University of Virginia, Charlottesville, VA 22904-4740, USA (e-mail:
grimshaw@virginia.edu).
In addition, the portal can employ legacy systems, such as
databases, in order to provide users with greater functionality.
In this paper, we discuss how the portal uses an off-the-shelf,
commodity database for accounting.
Moreover, the portal can support application-specific
interfaces, called special portals. We describe how users can
access a molecular modelling package using a special portal.
An important benefit of the Legion Grid Portal is that it does
not require downloading any of the Legion software onto a user’s
machine. Since the portal operates entirely on the web server,
the client machine needs only a browser.
Future work in the Legion Grid Portal involves constructing
an increasing number of special portals, providing greater
support for grid administrators, exploring interoperability
between grid infrastructures, providing a programming
interface to grids, and providing increasing support to grid
users in the form of superschedulers, information services,
interfaces for parameter-space studies, etc.
II. ARCHITECTURE
The architecture of the Legion Grid Portal is shown in Fig. 1.
The architecture is a layered one, with the highest layer
consisting of the user and portal interfaces, the middle layer
consisting of the actual portal along with state information, and
the lowest layer consisting of the underlying system in terms of
grid infrastructures, legacy systems as well as special portals.
The component identifiers, C1-C6, are explained in Section III,
where we describe how we implemented the shaded boxes and
discuss the relationships between the components.
Fig. 1. Architecture of the Legion Grid Portal
II.A Grid Software/Services on which the Portal depends
Currently, the Legion Grid Portal depends on the Legion
infrastructure for managing a grid. Legion presents an entire
grid as a single virtual machine to users [12]. As part of this
philosophy, Legion provides a truly distributed file system.
This file system is similar to a Unix or Windows file system in
terms of command-line or programmatic access, but different
from them in terms of the manner in which its components are
located, migrated, replicated and retrieved. Moreover, a user’s
view of a distributed file system is the same no matter which
machine he uses to log on to a grid. The contents of a
distributed file system are objects, a term used to describe any
first-class entity in Legion, such as files, directories, machines,
disks, users, consoles, programs, etc. The single-machine view
of the grid is particularly attractive to the Legion Grid Portal
because it enables presenting a complex environment in a
manner familiar to most users.
The portal accesses a grid through Legion’s command-line
tools. Most of these tools are analogues of Unix tools, but
targeted towards the distributed file system of the grid. For
example, a tool called legion_ls lists the files in a directory
of a distributed file system just as ls lists the files in a
directory of a Unix file system. Likewise, legion_cat prints
the contents of a file in a distributed file system much as cat
prints the contents of a file in a Unix file system. For the sake
of differentiation, directories in a distributed file system are
called contexts and a distributed file system itself is called a
context space [15]. Some Legion tools have no Unix
analogues. For example, a tool called legion_get_acl
retrieves the permissions of an object. Likewise,
legion_list_attributes retrieves any metadata
associated with an object. Currently, Legion’s command-line
tools are essential for any and all functioning of the portal.
Most user interactions with the portal involve invoking a
command-line tool to perform the task requested by the user.
For example, when a user logs in to a grid via the portal, she is
presented with the listing of the contents of her home directory
in the distributed file system for that grid. This listing is
procured by running the tool legion_ls on her home
context. The user could then click on the name of any file or
context in her home context. If she clicks on a context, the
portal performs a legion_ls on the new context. If she
clicks on a file, the portal performs a legion_cat on that
file. The results of either command are parsed and presented to
her in an intuitive manner. For any member of a context, the
user may request other actions, for example, a listing of the
permissions, a listing of any metadata associated with it or its
physical location. Moreover, the user can traverse context
space much as she would in, say, Windows Explorer
[18], with the caveat that she can access only those objects for
which she has permissions. In addition to traversing context
space, a user logged on to a grid via the portal can submit jobs
as well. Jobs can be submitted through a general interface or
through special portals.
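As an illustration of this mapping, the following Perl sketch turns a click on a context into a legion_ls invocation and renders the result as links back into the portal. The portal.cgi entry point and the parameter names are assumptions for this example, not the portal’s actual code.

#!/usr/bin/perl
# Hypothetical sketch: turning a click on a context into a legion_ls
# invocation and rendering the result as links.  The portal.cgi entry
# point and parameter names are illustrative only.
use strict;
use warnings;
use CGI qw(escape escapeHTML);

# Run legion_ls on the requested context and return its entries.
sub list_context {
    my ($context) = @_;
    # Assumes the Legion tools are on PATH and credentials are in place.
    my @entries = `legion_ls "$context"`;
    die "legion_ls failed for $context\n" if $? != 0;
    chomp @entries;
    return @entries;
}

# Render each entry as a link that re-invokes the portal on that entry.
sub render_listing {
    my ($context, @entries) = @_;
    my $html = "<ul>\n";
    for my $entry (@entries) {
        my $target = escape("$context/$entry");
        $html .= qq{<li><a href="portal.cgi?command=legion_ls&context=$target">}
               . escapeHTML($entry) . "</a></li>\n";
    }
    return $html . "</ul>\n";
}

print render_listing('/home/demo', list_context('/home/demo'));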
The Legion Grid Portal is a convenient interface that masks
the tremendous complexity of a grid from a lay user. The portal
provides a simple and intuitive interface to a grid that lets the
user ignore details about syntax, parameters, permissions, etc.
The portal is an attractive tool for introducing users to a grid
and giving them a broad overview of its scope without
overwhelming them with too much detail. In its current form it
is very useful to novice users but less so to advanced users. As
we develop more specialised portals, advanced users will
benefit in terms of running applications as well as being able to
manage a grid better.
II.B Grid Software/Services the Portal could use
A large number of grid management tools provided by
Legion are not accessible from the portal. These tools are
invaluable to grid administrators for maintaining and
monitoring a grid. Although these tools are accessible from the
command line, they have not yet been incorporated into the
portal primarily because most of them require grid
administrator privileges. As such, they are not useful to
ordinary users, who should not be misled into trying tools
they cannot use. Therefore, an administrator would
require different interfaces from ordinary users. However,
currently, administrators are not treated any differently from
ordinary users in the portal. We are considering increasing the
security provided to administrators logging in to a grid. In
addition to grid management tools, users and administrators
alike may benefit from logs about specific objects. Currently,
these logs are not visible from the portal.
A grid managed by Legion can be configured in many
different ways, often during run-time. Administrators should
be able to exploit this flexibility without resorting to traditional
interfaces like command-line tools. They should be able to
conduct detailed investigations from the portal and should be
able to locate any and all problems from the portal. In addition,
creating new objects or services should be simple.
Legion provides other interfaces to a grid in addition to the
portal. For example, Legion provides a graphical interface for
sharing a Windows directory with other users securely, using
grid-level permissions from the context space of a grid.
Currently, such tools are stand-alone. Integrating them with the
portal would let users access context space using tools most
convenient for their needs.
II.C Grid Software/Services the Portal requires but that are
not supported by the Grid
Although not envisioned for the near future, a large number
of services and software would be attractive if incorporated
into the Legion Grid Portal. For example, users should be able
to take advantage of system services such as Network Weather
Service (NWS), perhaps using protocols such as Lightweight
Directory Access Protocol (LDAP) [14]. Users should be
provided with high-level tools, for example, engines to search
documents in the distributed file system, or graphing tools for
viewing the entire file system at a glance.
II.D Software/Services the Portal uses/requires outside the
scope of the Grid
An important task for future work in the Legion Grid Portal
is developing special portals. In particular, we would like to
incorporate tools and techniques for application users to view
their runs as they progress. For example, currently a user
issuing an Amber run can observe the molecule under study
periodically. In order to do so, the portal retrieves intermediate
files from the run (using a Legion tool called
legion_probe_run), processes them and creates a Protein
Data Bank (PDB) file using a non-Legion tool called ambpdb.
This tool is specific to this kind of application. As we develop
more special portals we expect to use an increasing number of
such application-specific tools in order to let users view runs.
III. IMPLEMENTATION
The Legion Grid Portal consists of six main components, as
shown in Fig. 2. The central component both in the figure and
in the design is the Legion Grid Portal (C2), which is a Perl
CGI script used to process most requests by users. This script
is accessed by Portal Interface (C4), which includes the entry
page for the portal, subsequent pages generated by the portal
and the user, who initiates all actions in the portal. During
normal execution, the portal generates caches and session
information that are used for authentication as well as speedier
execution. These caches and session files, as well as images
and logs accessed by the portal are the Session State (C3).
Currently, the Legacy System (C5) used in the portal is a
commodity database (see Section III.A) along with the scripts
necessary to access it. Special Portals (C6) are used to run
specific applications from the portal; since the mechanisms for
running specific applications are similar to those for running
any application in Legion, this component includes tools,
software and scripts for running specific as well as general
applications from the portal. Both of these components, C5 and
C6, are intricate enough to merit description. The Grid
Infrastructure (C1), in this case, refers to the services and tools
provided by Legion for managing a grid. The list of
components can be augmented in order to provide additional
functionality to a user. In the subsequent sub-sections, we
discuss the existing components. The purpose of these
discussions is to present the techniques involved without the
details of the specifics.
III.A Commodity Technologies/Software used
The Legion Grid Portal uses Perl [4], PHP [5], MySQL [3]
and the Common Gateway Interface (CGI) [7] mechanism for
invoking Legion commands. In CGI, the user is presented with
a form in which she can fill parameters. Alternatively, the user
may be presented with a link with the relevant parameters
enumerated. In either case, when the user submits the request,
a CGI program on the web server parses the request, retrieves
the parameters and executes the appropriate commands. In the
case of the Legion Grid Portal, the CGI program on the web
server is a Perl script. Part of the portal functionality is
implemented in PHP, e.g., accessing accounting and job
databases stored in MySQL tables. Although the PHP scripts
are part of the Legion Grid Portal, we show them as part of the
component C5 rather than C1 because these scripts were
written explicitly for the MySQL legacy system. Likewise,
scripts written for specific portals are shown in C6 rather than
C1 because they were written for specific visualisation tools.
The entire portal is implemented on a Unix operating system.
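A minimal sketch of this CGI flow, assuming the standard Perl CGI module and illustrative parameter names, is shown below; the actual portal script adds session handling, handlers and page generation on top of this pattern.

#!/usr/bin/perl
# Minimal CGI request handling with the standard CGI module.  The
# parameter names and defaults are illustrative, not the portal's
# actual interface.
use strict;
use warnings;
use CGI;

my $q       = CGI->new;                          # parses GET/POST parameters
my $command = $q->param('command') || 'legion_ls';
my $context = $q->param('context') || '/home/demo';

print $q->header('text/html');
print '<p>Would run: ', $q->escapeHTML("$command $context"), "</p>\n";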
III.B Proprietary Technologies/Software developed that can
be shared with others
The Legion Grid Portal is not based on any proprietary
technology. Source code for the portal is available to Legion
users. The algorithms and techniques within the portal are
based on standard programming practices found in Perl
documentation [10] [20].
III.C Implementation Details
In this section, we describe implementation details of the
Legion Grid Portal at an abstract level. Our intent is to
highlight some design decisions as well as present solutions to
problems that occurred during design.
III.C.1 Grid Infrastructure (C1). The underlying grid
infrastructure for the portal is Legion. Legion provides
programmatic as well as command-line interfaces to access a
grid. However, in the Legion Grid Portal, we take advantage of
the command-line interfaces only. The underlying grid
infrastructure software for the portal can be changed to any
grid system as sophisticated as Legion. For example, the portal
can be made to operate on top of Globus with the
understanding that since Globus does not have a distributed file
system, users would not benefit from a large part of the portal.
III.C.2 Legion Grid Portal (C2). The primary rôle of
component C2 is to issue Legion commands on behalf of the
user. The portal is implemented as a Perl CGI script used to
process most of the user’s requests. This script has three
requirements. Let SERVER denote the machine running the
web server, USER denote the user ID owning the script on
SERVER, and NET denote the name of the grid which the user
chooses to access.
1. The directory of USER, for example, /home/USER, must
be accessible from SERVER.
2. The directory of the Legion tree, for example, /home/
NET, must be accessible from SERVER.
3. Legion must be compiled for the architecture of SERVER
and the grid NET must be started. SERVER need not be a
Legion host.
Fig. 2. Details of Components of the Legion Grid Portal
If #1 is violated, the user’s browser cannot locate the script. If
#2 or #3 is violated, every Legion command fails and the grid
is inaccessible. Assuming familiarity with Perl, CGI and basic
Legion commands, the major steps involved in the Legion Grid
Portal are:
1. Parse the arguments.
2. Set up the Legion environment and global variables.
3. Get credentials and session id, perhaps by logging in.
4. Generate the context tree for browsing.
5. Select the appropriate handler for the command to run.
6. Check if all arguments are present for the command. If
not, generate the page to get the desired arguments and go to
step #9.
7. Run the command, displaying its status along the way.
8. Display the output and error of the command.
9. Show the generated page to the user.
Most of the steps above are straightforward and require little
detailed knowledge about Legion. Step #7 is performed by a
handler for the appropriate command. A handler is a module
written explicitly for checking the required parameters and
parsing the output and error states of a Legion command. Most
handlers are simple; however, some of them can perform
substantially complex tasks. For example, the handler for
legion_cat creates a download window wherein the
contents of the selected Legion file are shown with the
appropriate MIME type.
Likewise, the handler for
legion_run, which runs a legacy application, is expectedly
complex. Techniques for constructing handlers for additional
Legion commands are explained in the documentation for the
Legion Grid Portal. Most handlers eventually invoke a Legion
command in a standard manner. This manner involves logging
at the start of the command, periodically during the execution
of the command, and at the termination of the command.
Commands terminate normally, or because a user-specified
timeout expired.
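A hypothetical handler, written along the lines described above, might look like the following sketch for legion_cat; the logging destination and temporary-file layout are assumptions for illustration, not the portal’s actual code.

# Hypothetical handler for legion_cat, following the pattern described
# above: check arguments, log progress, run the command, and capture
# its output, error and exit status.  All file names are illustrative.
use strict;
use warnings;

sub log_status {
    my ($msg) = @_;
    open my $log, '>>', '/tmp/portal.log' or return;
    print {$log} scalar(localtime), " $msg\n";
    close $log;
}

sub handle_legion_cat {
    my (%args) = @_;
    return { status => 1, error => 'missing file path' } unless $args{path};

    log_status("legion_cat $args{path}: started");
    my $errfile = "/tmp/portal_err.$$";
    my $output  = `legion_cat "$args{path}" 2>$errfile`;
    my $status  = $? >> 8;
    log_status("legion_cat $args{path}: finished with status $status");

    my $error = '';
    if (open my $fh, '<', $errfile) {
        local $/;                                # slurp the whole error file
        $error = <$fh> // '';
        close $fh;
    }
    unlink $errfile;
    return { status => $status, output => $output, error => $error };
}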
III.C.3 Session State (C3). The session state component
consists of the various files, caches and session information
associated with a particular session initiated by a user as well
as general logs maintained by the portal.
The session information particular to the current session for a
user includes an environment cache, a credentials cache and a
session ID. These files are necessary in order to set up the
environment for a user. If a user does not use the portal (thus
interacting with Legion via its command-line interface) her
Legion environment can be set up by sourcing a setup file
called setup.sh or setup.csh provided with all Legion
installations, and then using legion_login to generate the
user’s credentials. The user’s credentials are generally stored in
a well-known file protected by the underlying operating
system’s mechanisms. The user’s environment is valid as long
as she continues to use the same terminal from which she
logged in. However, in the portal, because of the manner in
which CGI scripts are executed, each time the user executes a
new command, she gets a new terminal. Since requiring the
user to log in for every command would be inconvenient, in the
portal we manage the environment explicitly. When the user
logs in from the portal, we source the setup script provided by
Legion and log the user into the grid as well. After a successful
login, we cache copies of the user’s environment variables as
well as her credentials (similar to the MyProxy mechanism in
Globus [17]). When the user issues a command, we re-set the
environment from the cached copy and re-create the credentials
by copying the cached credentials to the appropriate well-known file. The effect is similar to the user’s having logged in
anew except that the user is unaware that it happened and no
additional Legion command is executed. This management of
the Legion user’s credentials is a security risk only if the Unix
user running the CGI script, namely, USER or nobody, cannot
be trusted.
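The caching and restoration of a user’s environment and credentials can be sketched as below; the cache directory, file names and use of the Storable module are assumptions for illustration, not the portal’s actual layout.

# Sketch of caching and restoring a user's environment and credentials,
# as described above.  The cache directory, file names and the use of
# Storable are assumptions, not the portal's actual layout.
use strict;
use warnings;
use File::Copy qw(copy);
use Storable qw(store retrieve);

my $cache_dir = '/tmp/portal_sessions';

# After a successful legion_login, remember the environment variables
# and the well-known credentials file for this session.
sub cache_session {
    my ($session_id, $cred_file) = @_;
    mkdir $cache_dir unless -d $cache_dir;
    my %env = %ENV;
    store(\%env, "$cache_dir/$session_id.env");
    copy($cred_file, "$cache_dir/$session_id.cred")
        or die "cannot cache credentials: $!";
}

# Before running a Legion command in a fresh CGI process, restore both.
sub restore_session {
    my ($session_id, $cred_file) = @_;
    my $env = retrieve("$cache_dir/$session_id.env");
    %ENV = %{$env};
    copy("$cache_dir/$session_id.cred", $cred_file)
        or die "cannot restore credentials: $!";
}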
Access to any Legion functionality, including the credentials
file is controlled by a session ID generated by the CGI script.
The session ID is a large numeric sequence of random numbers
that is extremely hard to forge. For example, in the current
implementation, the session ID is constructed by concatenating
three random floating-point numbers between 0 and 1000 with
12-digit (decimal) precision. The resulting session ID is a
sequence of 39-45 decimal digits with three periods. When the
user logs in, she is provided with a session ID that is valid as
long as she continues to interact with the portal. The session ID
is saved in a file and propagated between consecutive requests
from the same browser. If at any time, the session ID from the
browser does not match the saved session ID, the session is
terminated. Thus, a user’s session can be compromised only if
he communicates his session ID to an intruder and keeps the
session valid by interacting with the portal.
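A sketch of the session-ID scheme described above, with the validation reduced to a string comparison, follows; it is a minimal illustration rather than the portal’s exact code.

# Sketch of the session-ID scheme: three random floating-point numbers
# between 0 and 1000, each with 12-digit decimal precision, concatenated.
use strict;
use warnings;

sub new_session_id {
    return join '', map { sprintf '%.12f', rand(1000) } 1 .. 3;
}

# On each request, the ID sent by the browser must match the saved one.
sub session_valid {
    my ($from_browser, $saved) = @_;
    return defined $from_browser && defined $saved && $from_browser eq $saved;
}

print new_session_id(), "\n";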
The portal maintains a single log that contains timing
information about every command issued through the portal.
When a Legion command is invoked from the CGI script, its
status is logged periodically as well as when the command
starts and ends. The statuses of commands are shown in a
status window that a user may choose to view or ignore. By
default, the status window is displayed to the user to give him
feedback about the execution of the command.
Additionally, a user may choose to view the output and error
of every command in separate windows. The portal permits
viewing these windows in which the output and error are
presented directly from the Legion command, i.e., unprocessed
in any manner. The main window continues to show the
processed results of the same commands. The outputs and
errors of most commands issued in a session are stored in
separate files. If a user chooses to view the output and error
windows for his session and uses a browser’s navigation
buttons to view previous and next pages, the saved output and
error files are retrieved and presented correctly. Moreover, if a
user re-issues a previous command, the output and error of that
command may be retrieved from the saved files, which thus
constitute a cache. Retrieving output and error from caches
avoids invoking Legion commands, which can be slow. Since
the passage of time as well as a user’s actions may invalidate a
cache, we invalidate caches aggressively. Caches can be
invalidated explicitly by the user, from within the CGI script
and periodically by an external session timeout mechanism.
Cached files can be removed periodically by using the Unix
crontab tool after a session timeout. A crontab line
similar to the one below is used to check session files every
hour and remove those that have not been accessed recently.
0 * * * * browser_timeout
In this manner, the cached output and error of previous
commands can be purged. The purging may remove cached
credentials and session IDs as well, thus ensuring that users
who forget to log out from the portal are not compromised after
a timeout has elapsed. Files associated with runs initiated by
users are not purged unless a user explicitly requests to do so.
Thus, users may initiate runs and log out or time out, but
continue their runs uninterrupted. The results of the runs can be
accessed by logging in again.
Fig. 3. Entry Page
Fig. 4. User Page generated after login
Purging caches requires Unix permissions to delete the
associated files. Therefore, we recommend running the CGI
script with cgiwrap [16]. If the script is run without
cgiwrap, it runs as some special user, usually nobody.
Consequently, only the user nobody is permitted to remove
the cached files. However, on most web server installations,
nobody’s permissions are restricted to prevent running many
Unix commands or even logging in. If the script is run with
cgiwrap, a non-privileged user, e.g., USER, could log in and
clean up caches and session state on errors or timeouts.
cgiwrap operation can be enabled or disabled by changing
only the entry page of the portal.
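A browser_timeout-style cleanup script of the kind invoked from the crontab line above might be sketched as follows; the session directory and the two-hour idle threshold are assumptions for illustration.

#!/usr/bin/perl
# Sketch of a browser_timeout-style cleanup script, run hourly from the
# crontab line shown earlier.  The session directory and the two-hour
# idle threshold are assumptions for illustration.
use strict;
use warnings;

my $session_dir = '/tmp/portal_sessions';
my $max_idle    = 2 * 60 * 60;               # seconds of allowed inactivity

opendir my $dh, $session_dir or exit 0;
for my $file (readdir $dh) {
    next if $file eq '.' || $file eq '..';
    my $path  = "$session_dir/$file";
    my $atime = (stat $path)[8];             # last access time
    unlink $path if defined $atime && time - $atime > $max_idle;
}
closedir $dh;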
III.C.4 Portal Interface (C4). The entry page as well as the
user pages generated by the portal constitute its interface. The
generated pages contain enough information to invoke the
portal subsequently. The portal state of a user’s session is
passed between consecutive CGI invocations by normal CGI
mechanisms. These mechanisms involve either setting hidden
widgets in forms or explicitly enumerating the state in links.
Alternatively, session cookies can be employed if they are
supported by the browser [6]. The CGI script uses seven elements
to reconstruct the session state for a user. These elements are:
the Legion user name, the Legion grid name, the command to
be executed (defaults to legion_ls), the current working
context, the session ID, the timeout selected by the user and the
verbosity level for the portal.
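The following sketch shows how such elements can be propagated as hidden form fields with the standard CGI module; the field names and values are illustrative only.

# Sketch: propagating the session elements between consecutive pages as
# hidden form fields with the standard CGI module.  The element names
# and values are illustrative only.
use strict;
use warnings;
use CGI qw(:standard);

my %state = (
    user      => 'demo',
    grid      => 'npacinet',
    command   => 'legion_ls',
    context   => '/home/demo',
    session   => '1.000000000000' x 3,        # placeholder session ID
    timeout   => 120,
    verbosity => 1,
);

print header(), start_form(-action => 'portal.cgi');
print hidden(-name => $_, -default => $state{$_}, -override => 1), "\n"
    for sort keys %state;
print submit(-value => 'Go'), end_form();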
The entry page (Fig. 3) requests the Legion user name, the
Legion password, the Legion grid name, the command to be
executed, the timeout and the verbosity level from the user. In
turn, it invokes the CGI script (C2), which computes the
current context to be the user’s home context and generates a
session ID. From this point on, every generated page (Fig. 4
onwards) contains the above-mentioned seven elements from
which the user’s Legion environment can be reconstructed. For
any subsequent invocation of the CGI script, if the
Legion user name, the Legion grid name or the session ID is
absent or incorrect, the Legion Grid Portal reports an error,
terminates the session, and requests the user to re-login.
After a user logs in, the main window of his browser displays
pages generated by the portal (Fig. 4). In addition to this
window, the portal may open up to five windows on behalf of
the user. First, a control window is always opened with a panel
of buttons and links that are likely to be used often (Fig. 5). For
example, the control window has links for browsing the user’s
home context, running an application and copying files
between the grid file system provided by Legion and the user’s
file system provided by the operating system. Second, a status
window is opened by default to show the logs of commands
(see Section III.C.3 and Fig. 6). A user may elect to close and
re-open this window at any time. Third, an output window may
be opened to show the raw output of commands (Fig. 7). A
user may elect to open and close this window at any time.
Fourth, an error window may be opened to show the raw error
of commands (Fig. 8). A user may elect to open and close this
window at any time. Opening and closing the status, output and
error windows can be accomplished by setting the verbosity
level of the portal. Fifth, a download window may be opened
with the appropriate MIME type if the user decides to view the
contents of a file.
Fig. 5. Control Panel
Fig. 6. Status Window
Fig. 7. Output Window
Fig. 8. Error Window
Fig. 9. Accounting Information
III.C.5 Legacy Systems (C5). Currently, accounting and job
monitoring in the Legion Grid Portal are accomplished by
taking advantage of a commodity database system. The portal
provides the interfaces to access this legacy system through a
PHP [5] script that accesses a MySQL [3] database as well as
Legion. The PHP script executes Legion commands to obtain
resource consumption data about a user’s Legion objects
(Fig. 9). The resources contributed by the user towards a grid
can also be obtained.
Whenever a user requests accounting information, the portal
seamlessly transfers control from C2 to C5. All session
information is transferred by CGI mechanisms to the PHP
script which manages the information in the same manner as
the Perl script in C2. Subsequently, when the user returns to
non-accounting actions, control is transferred seamlessly from
C5 to C2. The user is not necessarily aware that different CGI
scripts are used to process different requests because the
session state required by the portal is small and can be
transferred easily. Our success in adding accounting
functionality to the Legion Grid Portal by way of transfer of
control between these scripts is encouraging because it implies
that we can continue to increase the functionality provided
through the portal in this manner. Moreover, the same
mechanisms used to transfer control between C2 and C5 can be
used to transfer control between the Legion Grid Portal and
other grid portals such as GridPort [1]. Thus, the portal can be
extended to provide desirable interoperability between various
grid infrastructures such as Legion and Globus.
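The hand-off from C2 to C5 can be sketched as a redirect that forwards the session elements in the query string; the accounting.php name and the parameter list are assumptions for this example.

# Sketch of handing control from the Perl CGI script (C2) to a C5
# script by forwarding the session elements in the query string.  The
# accounting.php name and the parameter list are assumptions.
use strict;
use warnings;
use CGI;

my $q     = CGI->new;
my @keys  = qw(user grid context session timeout verbosity);
my $query = join '&',
            map { $_ . '=' . CGI::escape(scalar($q->param($_)) // '') } @keys;

# Redirect the browser; the receiving script re-validates the session ID.
print $q->redirect("accounting.php?$query");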
III.C.6 Special Portals (C6). A significant task enabled by
the Legion Grid Portal is starting up runs on behalf of a user. A
command like legion_run is much more complex than
ordinary Legion commands. Moreover, since runs may execute
for a long time, the user’s browser cannot be made to wait until
the command is complete. Consequently, the handlers for
executing runs are complex.
Before starting a run, the grid portal assembles the
arguments, parameters and input files for the run. A handler for
a run command provides means for the user to input all
arguments. Therefore, it generates buttons, boxes and widgets
for arguments for the run itself and arguments to the Legion
run command (Fig. 10). Necessary arguments are checked for
existence and sanity. Reasonable defaults are chosen for the
remaining arguments. If the run requires Unix/Windows input
files, the browser can send their contents in a multipart form
(such input files cannot exceed around 5 MB in size; this
limitation is imposed by the CGI mechanism for multipart
forms). These files are saved on the web server in a subdirectory specially created for that run.
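A sketch of receiving such an input file from a multipart form and saving it in a per-run directory follows; the parameter name, directory layout and size cap are illustrative assumptions.

# Sketch of receiving a run's input file from a multipart form and
# saving it in a per-run directory on the web server.  The parameter
# name, directory layout and size cap are illustrative assumptions.
use strict;
use warnings;
use CGI;
use File::Path qw(make_path);

$CGI::POST_MAX = 5 * 1024 * 1024;            # cap uploads, cf. the limit above

my $q      = CGI->new;
my $run_id = $q->param('run_id') || time();
my $dir    = "/tmp/portal_runs/$run_id";
make_path($dir);

if (my $fh = $q->upload('input_file')) {     # undef unless a file was sent
    my $name = $q->param('input_file');      # client-supplied file name
    $name =~ s{.*[/\\]}{};                   # strip any path components
    open my $out, '>', "$dir/$name" or die "cannot save upload: $!";
    binmode $out;
    print {$out} $_ while <$fh>;
    close $out;
}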
The run script invokes Legion commands to initiate the run.
Along with initiating the run, the script records information
about the run in the database in C5. A user can access the status
of her runs from the page generated by the portal for the user
immediately after the run begins (Fig. 11, Fig. 12, Fig. 13).
Additionally, a user browsing her accounting information can
access the status of her runs from the database using interfaces
provided in C5 (Fig. 14). The user can monitor these pages
periodically to view the results of the run as it progresses. From
these pages, the user can determine the remote machine on which
the run executes, the working directory of the run and any other
information Legion provides about the run. In addition, he can
transfer intermediate files from the remote machine to the
server (and thence to his own machine) as well as transfer files
from his own machine to the remote machine (via the web
server). Transferring intermediate files out from the run is
useful for periodically viewing the run as well as
checkpointing. Transferring files in to the run is useful for
computational steering.
Fig. 10. Initiating a Run
Fig. 11. Special Portal for Amber
The run script can be used to initiate runs of well-known
applications, i.e., applications for which the binaries are
registered under well-known and widely-accessible runnable
classes [15]. For example, we have developed a portal for a
molecular modelling package called Amber and an
astronomical modelling package called Hawley-Hydro. The
portals have application-specific widgets that let users enter
input files intuitively (Fig. 11, Fig. 12). However, from the
Legion Grid Portal’s perspective, these applications are
identical to any other application. In other words, the
mechanisms to initiate an Amber or Hawley-Hydro run are
identical to those for any other run. After the run is initiated,
the portal provides an additional viewing capability for the run.
This additional view for Amber requires a plug-in called
Chime [2] to be installed on the user’s browser. With this plug-in, the user can view the progress of the run graphically in
addition to the usual view provided by Legion. Since special
portals are merely more convenient interfaces to the basic
functionality of initiating a run, creating new portals is simple.
For example, a special portal for CHARMM [9] should take
only a few hours to construct once the particulars of the
application are available.
Fig. 12. Special Portal for Hawley-Hydro
IV. SUPPORTED GRID SERVICES
The Legion Grid Portal supports a significant subset of the
capabilities and features of a grid system using Legion; the
remaining capabilities can be added with a small amount of effort.
IV.A Security
Security in the Legion Grid Portal is addressed in a number
of ways. The portal takes advantage of the security
infrastructure provided by Legion. In Legion, typically users
log on to a grid using a login-password combination. Legion
generates credentials for the user to be used for that session.
Currently, the credentials are based on public-private key pairs.
The method for generating the credentials is irrelevant to the
portal. However, the portal manages a persistent copy of the
credentials on behalf of the user.
When a user logs on to a grid from the command-line,
Legion queries his login ID and password. If the pair is valid,
Legion generates a file which stores the user’s credentials. The
file is named based on the process ID of the user’s terminal; if
the user opens another terminal, the credentials file is not valid
for the new terminal. However, in the portal, subsequent
commands are executed in different shells. Since we cannot
expect the user to supply an ID and password for every
command, and since we prefer not to store the user name and
password either on a disk on the web server or in the session
state of the browser, we manage the credentials explicitly.
Fig. 13. Special Portal for Rendering
Access to the managed credentials is moderated by a
session ID generated for every session (see Section III.C.3).
The session ID is hard, if not impossible, to duplicate.
Consequently, a user can access only his credentials. Moreover,
if the user is inactive for some time (currently, two hours), the
session ID and the saved credentials file are both invalidated.
This invalidation requires the user to log in again, but ensures
that the user’s credentials are erased if he forgets to log out or
his browser terminates unexpectedly.
A potential security risk in the portal occurs when the user
logs in for the first time. At this time, the legion_login
command is invoked with the user’s Legion login ID and
password. A Unix user on the web server could potentially
view the password in clear-text by executing a ps command
exactly during the legion_login command. Currently, we
avoid this problem by restricting the people permitted to be
Unix users on the web server. Another solution which we are
considering is to modify the legion_login command to
take not the clear-text password but the name of an encrypted
file as a password parameter. With this solution, an intruder
may be able to see the name of the file but may not be able to
access the file in any manner. However, neither solution is
secure enough if the superuser or web server user herself is the
intruder. Since every command issued via the portal executes
under the Unix ID of the web server user, this user can access
any information pertaining to any Legion user. Likewise, the
superuser on the web server can access any information
pertaining to any user. We believe that, insecure as this situation
is, it may be unavoidable. On large installations it is common
for privileged users such as the superuser to be able to access
any information pertaining to any user. A responsible choice of
privileged users and judicious encrypting of critical data may
be the only reasonable solutions.
IV.B Information Services
The portal permits accessing all information services that can
be accessed by a command-line user. For example, a portal
user can browse context space (Fig. 15), view the metadata for
any object for which he has permissions to do so (Fig. 16), or
view all of the hosts in a grid (Fig. 17, Fig. 18). In Legion,
collection objects are repositories of metadata of other objects.
For example, a certain frequently-used collection object stores
metadata about every host in the grid. This collection is queried
during the scheduling process. A command-line user may issue
one command to procure the data stored by any collection.
Currently, a portal user cannot issue such a command, although
adding such functionality to the portal is trivial. However, a
portal user can access a collection’s data implicitly, for
example, during the scheduling process.
Fig. 14. Status of a Parameter-Space Study
IV.C Scheduling
Legion supports explicit and implicit scheduling. In explicit
scheduling, a user specifies the host on which she wishes to
run. In implicit scheduling, the grid infrastructure selects the
host on which the user runs. In Legion, creating any object,
whether it be a file or an instance of a program, involves a
scheduling process. Typically, users schedule implicitly,
especially when creating non-programs. The Legion
Grid Portal supports only implicit creates for non-programs,
although adding explicit creates is a trivial task. The portal
supports both implicit and explicit creates for programs.
Therefore, users can choose exactly the resources on which
they would like to run their applications. We are investigating
mechanisms and interfaces for letting users select sets of hosts
for large runs, such as parameter-space studies. Also, we are
investigating constructing superschedulers that let users select
from a wide variety of resources.
IV.D Data Transfer
The Legion Grid Portal supports automatic data transfer for
applications insofar as it is not limited by the CGI mechanism
itself. In Legion, the commands for running applications
provide switches by which input and output files can be
specified either from the local file system or from the grid file
system provided by Legion. Moreover, applications may access
the grid file system directly, thus obviating the need for any of
these switches. The portal supports all of these modes of
operation. However, if the input/output switches are specified,
certain CGI restrictions become apparent. If the user specifies
that the input and/or output files are to be accessed from the
grid file system, then the portal has no restrictions. If the user
specifies that the input files be accessed from the local file
system, CGI provides a way to upload the files to the web
server from where they can be supplied to the appropriate
Legion command. However, CGI imposes a limit on the size of
the uploaded file, currently of the order of a few Mbytes. If the
user specifies that the output files be stored to the local file
system, then there is no mechanism within CGI to store the
files automatically because the user must authorise the storage.
Therefore, the solution in the portal is to store the file on the
web server and provide a mechanism to download it to the
user’s local file system.
Fig. 15. Browsing Contexts
IV.E Additional Grid Services
The Legion Grid Portal provides excellent facilities for
monitoring jobs, viewing jobs as they progress, transferring
intermediate files and viewing accounting information. The
Legion tools for running applications provide means by which
the status of a currently-executing job can be viewed.
Moreover, the tools provide mechanisms for sending and
retrieving intermediate files. The portal enables users to
perform these tasks. For some applications, we have
constructed special portals which have all of the functionality
associated with running any application using Legion in
addition to specific interfaces for visualising the progress of a
job. For example, using the Amber portal users can view the
progress of an Amber job graphically. After they submit a job
through the Legion Grid Portal, a new window appears in
which the intermediate state of the molecule being studied is
displayed. The portal displays this view by accessing
intermediate files generated by the application periodically and
converting them into a Protein Data Bank (PDB) file using commodity
tools. The PDB file can be viewed graphically using
the Chime viewer. Likewise, the portal also permits viewing
Hawley-Hydro jobs by accessing intermediate files and
converting them into GIF images. Finally, in the case of some
parameter-space studies, the portal provides a means of
viewing the status of each job in an abbreviated manner, giving
the user an aggregate view of the application as a whole.
Integrated with the job views are accounting views that show
the resource donation and consumption by every grid user.
Fig. 16. Metadata for a Host Object
IV.F New Grid Interfaces arising from the Portal
The original design goal of the Legion Grid Portal was to
present an intuitive front-end to the command-line tools
available to a Legion user. However, as the design of the portal
progressed, we discovered that we had to change some design
details in Legion. In particular, some of the changes were:
• The error modes of many tools were solidified and
standardised. Previously, standards for writing command-line
tools were lax; tools reported different error codes for the same
error, had different conventions for reporting outputs, had
different levels of verbosity, etc. Although standardisation of
tools is not yet complete, we are taking steps in that direction.
In particular, we have realised that Legion tools must be as
robust and standard as the Unix suite of tools, if not more so.
• Non-blocking modes of running applications were
developed. Previously, Legion users initiated runs in
“blocking” mode, i.e., the tool initiating the run would wait
until the run completed. Such a mode is highly undesirable in a
portal because most browsers will terminate a connection after
a period of inactivity. Since we cannot expect runs to
output text periodically for a browser, and we cannot expect
browsers to sustain connections regardless, we developed
tools for running applications in “non-blocking” or
asynchronous mode. In this mode, the user (or the portal on
behalf of the user) initiates the run and collects a token or ticket
that identifies the job from Legion.
Fig. 17. Viewing Hosts in a Grid
Fig. 18. Worldwide Grid managed by Legion
• Probing/monitoring runs was developed (with suggestions
from existing users). After we developed non-blocking runs,
the natural design progression was to enable probing or
monitoring runs. Using the token or ticket generated by
Legion, we were able to develop tools that report on the
progress of the run. Typically, these reports include
information about the machine on which the run is executing,
the working directory of the run, the names and sizes of the
files generated, etc. The ability to probe runs has proven to be
extremely helpful to Legion users. In the case of special
portals, the ability to probe runs has enabled the use of
visualisation tools that use intermediate files to display the
progress of runs.
• A proxy tool was designed to make command-line tools run
faster by pre-initialising the Legion library. In Legion, every
command-line tool initialises a Legion library. When
executing multiple tools frequently, as in the case of the portal,
repeated initialisations of the Legion library can represent a
large overhead. We developed a proxy tool which can initialise
the Legion library once for an entire session. Multiple tool
invocations result in connections to the proxy which can
perform the functions of many tools quickly. Preliminary
investigations have shown that the speedup in tool execution is
around 33%-50%.
• We are re-thinking mechanisms to run general parameter-space studies. By definition, parameter-space studies require
large sets of parameter values. Typically, each set of
parameters is supplied in one or more files. A user desiring to
conduct a parameter-space study must construct the sets of files
for each run in the parameter space. Constructing those files is
an application-specific task. However, submitting those files
for initiating a large number of runs can be complex. For
command-line users, Legion provides a tool called
legion_run_multi, which enables them to specify the sets
of files. For portal users, specifying sets of files can be difficult
— specifying each file singly is tedious, specifying multiple
files with wildcards is not possible because the web server and
client have different filesystems, and specifying all the files
within one single archive is difficult because of file size limits
inherent in CGI transactions. Although it seems like the only
option is for users to use Legion’s distributed file system
(which could be accessible from both the server and the client),
we are exploring methods by which the user can use her Unix/
Windows file system as well.
V. PROJECT STATUS AND FUTURE PLANS
The Legion Grid Portal has been operational since February
2000. Over time, it has acquired an increasing number of
features and undergone several changes in its look-and-feel.
The portal has been made more robust and more intuitive to the
user. The entire design of the portal has been motivated by the
desire to present users with an interface to a grid that is not
more complicated than a few mouse clicks and occasional
typing. Informal studies have shown that users can grasp
important grid concepts much more quickly through the portal
than with other interfaces. Consequently, we are increasing the
usage of the Legion Grid Portal in tutorials.
The tasks that remain for the Legion Grid Portal fall into the
following categories:
1. Increasing access to Legion functionality for lay users.
Currently, the portal enables users to access only a small albeit
critical subset of Legion. We expect that as the portal matures,
more and more Legion commands will become accessible from
the portal. Moreover, interesting new compositions of Legion
tools will become commonplace in the portal.
2. Increasing the number of special portals. Currently, we
have portals for three applications — Amber, Hawley-Hydro
and RenderGrid. Such portals are well-suited for introducing
high-performance users to grids.
3. Increasing the number of tools for administrative users.
Currently, the administrator of a grid is treated as just another
user on the portal. The tools available to such a user are
identical to the tools available to any user. We expect to add
tools that only administrative users can employ. Also, we
expect to make log files available to such users. Such an
approach will make the management of a grid intuitive and
simple to administrators.
4. Providing a programming interface for grids. Legion
provides an abstract programming model based on dataflow
graphs. This model is attractive to developers of grid services.
We expect to provide such developers with tools to construct
their services over Legion.
5. Exploring grid interoperability. The portal has the
potential to unify high-level functionality provided by different
grid infrastructures. We expect to study how the relative
strengths of different approaches to grids can be utilised within
the common interface provided by the Legion Grid Portal.
ACKNOWLEDGEMENT
We thank Michael Herrick and Mark Morgan at Avaki Corporation for their help with the design and implementation of parts of the Legion Grid Portal.
REFERENCES
[1] —, “GridPort”, gridport.npaci.edu.
[2] —, “MDL Information Systems, Inc.”, www.mdli.com.
[3] —, “MySQL”, www.mysql.com.
[4] —, “Perl Mongers”, www.perl.org.
[5] —, “PHP”, www.php.net.
[6] —, “Server-Side JavaScript Guide”, developer.netscape.com.
[7] —, “The Common Gateway Interface”, hoohoo.ncsa.uiuc.edu/cgi.
[8] —, “The Legion Manuals (v1.7)”, University of Virginia, October 2000.
[9] Brooks, B. R., Bruccoleri, R. E., Olafson, B. D., States, D. J., Swaminathan, S., Karplus, M., “CHARMM: A Program for Macromolecular Energy, Minimization, and Dynamics Calculations”, J. Comp. Chem., vol. 4, 1983.
[10] Christiansen, T., Torkington, N., Perl Cookbook, O’Reilly & Associates, ISBN: 1-56592-243-3, 1998.
[11] Foster, I., Kesselman, C., The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 1999.
[12] Grimshaw, A. S., Wulf, W. A., “The Legion Vision of a Worldwide Virtual Computer”, Comm. of the ACM, vol. 40, no. 1, January 1997.
[13] Grimshaw, A. S., Ferrari, A. J., Lindahl, G., Holcomb, K., “Metasystems”, Comm. of the ACM, vol. 41, no. 11, November 1998.
[14] Howes, T., Smith, M., LDAP: Programming Directory-Enabled Applications with Lightweight Directory Access Protocol, Macmillan Technical Publishing, ISBN: 1-57870-000-0, 1997.
[15] Natrajan, A., Humphrey, M. A., Grimshaw, A. S., “Capacity and Capability Computing in Legion”, 2001 International Conference on Computational Science, May 2001.
[16] Neulinger, N., “CGIWrap: User CGI Access”, cgiwrap.unixtools.org.
[17] Novotny, J., Tuecke, S., Welch, V., “Initial Experiences with an Online Certificate Repository for the Grid: MyProxy”, High Performance Distributed Computing 10, August 2001.
[18] Richter, J., “Custom Performance Monitoring for your Windows NT Applications”, Microsoft Systems Journal, August 1998.
[19] Snir, M., Otto, S., Huss-Lederman, S., Walker, D. W., Dongarra, J., MPI: The Complete Reference, MIT Press, 1998.
[20] Wall, L., Christiansen, T., Schwartz, R. L., Programming Perl, O’Reilly & Associates, ISBN: 1-56592-149-6, 1996.