Academia.eduAcademia.edu

MySRB & SRB: Components of a Data Grid

2002

Data Grids are becoming increasingly important in scientific communities for sharing large data collections and for archiving and disseminating them in a digital library framework. The Storage Resource Broker provides transparent virtualized middleware for sharing data across distributed, heterogeneous data resources separated by different administrative and security domains. The MySRB is a web-based interface to the SRB that provides a user-friendly interface to distributed collections brokered by the SRB. In this paper we briefly describe the use of the SRB infrastructure as tools in the Data Grid Architecture for building distributed data collections, digital libraries, and persistent archives. We also provide details about the mySRB and its functionalities.

MySRB & SRB – Components of a Data Grid Arcot Rajasekar, Michael Wan, Reagan Moore San Diego Supercomputer Center, University of California at San Diego {sekar,mwan,moore}@sdsc.edu Abstract Data Grids are becoming increasingly important in scientific communities for sharing large data collections and for archiving and disseminating them in a digital library framework. The Storage Resource Broker provides transparent virtualized middleware for sharing data across distributed, heterogeneous data resources separated by different administrative and security domains. The MySRB is a web-based interface to the SRB that provides a user-friendly interface to distributed collections brokered by the SRB. In this paper we briefly describe the use of the SRB infrastructure as tools in the Data Grid Architecture for building distributed data collections, digital libraries, and persistent archives. We also provide details about the mySRB and its functionalities. 1. Introduction The “Grid” is a term used to describe the software infrastructure that links multiple computational resources such as people, computers, sensors and data [1]. The term “Data Grid” has come to denote a network of distributed storage resources, from archival systems, to caches, to databases, that are linked using a logical name space to create global, persistent identifiers. Examples of data grids can be found in the physics community [2,3,5,6], for climate prediction [4] and for ecological sciences [7]. More recently several projects have promoted the establishment of data grids for other communities such as astronomy [8], geography, and earthquake and plate tectonic systems [9], etc. Most of these data grids are under construction and represent different proto-typical systems for building distributed data management environments. At SDSC we have developed software middleware, called the Storage Resource Broker [12], that can be used to build a data grid. The SRB together with MySRB, a web-based interface to the SRB, provides a suite of functionalities that can be used to implement data and information management systems. We discuss the SRB and MySRB infrastructure within the context of data grids, and demonstrate how data grids can be used to implement distributed data collections, digital libraries, and persistent archives. 2. Data Grid Architecture (DGA) All Data grids form the core infrastructure for building data management systems that span multiple administration domains, multiple types of storage systems, and multiple types of data access environments. We can characterize data management environments as distributed data collections, digital libraries and persistent archives. Distributed data collections provide a single name space for referencing data stored on multiple storage systems, typically within the same administration domain. Digital libraries integrate remote archival storage systems into a data collection, while providing discovery and manipulation services. Persistent archives support the migration of data collections onto new technologies, while preserving the ability to organize, discover, and access data. Each of these systems builds upon the capabilities provided by the lower level system, and all build upon data grids for managing distributed resources. We can characterize these capabilities as follows: Distributed data collection capabilities [17]: • Integrate Data Collections and Associated Metadata. As part of any DGA, we assume that the digital entities within a data collection will be described by attributes that characterize administrative, structural, provenance, and discipline-specific information. A data grid should provide mechanisms to support attributes associated with each registered digital entity. • Handle Multiplicity of Platforms, Resource & Data Types. The DGA network should handle diverse computational and storage resources. In the network, one should be able to access files on a super computer such as a IBM SP-2 or a desktop system (say an SGI) or a lap top running Linux or Windows OS. The DGA should support access from arbitrary types of compute platforms. • Seamless access to data and information stored within the DGA. The data from various collections at participating sites will be stored in archival storage systems (such as HPSS, DMF, ADSM, UniTree), file systems (Unix, NTFS, Linux), and databases (Oracle, Sybase, DB2) . Researchers at remote sites should be able to access these data as though they were accessing a local dataset, including support for reading and writing files. Digital Library capabilities [18]: • Handle Seamless Authentication. A digital library typically manages digital entities under a collection or community ID. To access data from a remote archive, the DGA should be able to manage the authentication of a user to the data handling environment, the authorization of the user for access to a digital entity, and the authentication of the data handling system to the remote archive. The DGA should be able to provide access to the user to all the storage systems with a single sign on authentication. • Virtual organization structure for data and information based on a digital library framework. Even though data will be stored at multiple sites, it would help users if the data are organized according to some logical (context-dependent) structure with an easy navigational aid. Hence, the DGA has to provide means to group data into collections (actually hierarchies of collections) and provide management facilities for the same. • Handle Dataset Scaling in size and number. The sizes and numbers of datasets involved in any DGA will grow in the coming years Hence any solution for the data grid should be scalable to handle millions of datasets, hundreds of Terabytes as well as large files that are tens of Gigabytes in size. Support is also needed for aggregating small data files into physical blocks called containers for storage into archives, and for decreasing latency when accessed over a wide area network. Persistent Archive capabilities [14,15,16] : • Replication of Data: For reasons of fault tolerance, disaster recovery and load balancing it will be useful for data to be replicated across distributed resources. Moreover, the consistency of the replicas should be maintained with very little effort on the part of the users. • Version Control: Since datasets may evolve over time, providing distributed version control will help in collaborative data sharing. This includes facilities for locking and checking out files. • Handle Access Control and provide Auditing Facilities. In some communities, data need to be guarded so that access to them is given only to selected and relevant people. Moreover, the selection should be done by the owner of the data. The DGA should be able to control access at multiple levels (collections, datasets, resources, etc) for users and user groups beyond that offered by file systems. Moreover, in some cases, it may be necessary to audit usage of the collections/datasets. Hence, auditing facilities will be needed as part of the framework. Data grids can provide support for each of the above capabilities, making it possible to extend current data collection, digital library, and persistent archive technology into distributed data management environments. 3. The Storage Resource Broker The SDSC Storage Resource Broker (SRB) is clientserver middleware that uses collections to build a logical name space for identifying distributed data [12,19]. The SRB provides a means to organize information stored on multiple heterogeneous systems into logical collections for ease of use. The SRB, in conjunction with the Meta data Catalog [13], supports location transparency by accessing data sets and resources based on their attributes rather than their names or physical locations [20]. The SRB provides access to data stored on archival resources such as HPSS, UniTree and ADSM, file systems such as the Unix File System, NT File System and Mac OSX File System and databases such as Oracle, DB2, and Sybase. The SRB provides a logical representation for describing storage systems, digital file objects, and collections and provides specific features for use in digital libraries, persistent archive systems and collection management systems. SRB also provides capabilities to store replicas of data, for authenticating users, controlling access to documents and collections, and auditing accesses. SRB also privides a facility for co-locating data together using containers. One can view containers as tarfiles but with more flexibility in accessing and updating files. The SRB can also store user-defined metadata at the collection and object level and provides search capabilities based on these metadata. SRB is a federated server system, with each SRB server managing/brokering a set of storage resources. The federated SRB implementation provides unique advantages: 1) Location transparency - Users can connect to any SRB server to access data from any other SRB server , and discover data sets by either a logical path name or by collection attributes. 2) Improved reliability and availability - data may be replicated in different storage systems on different hosts under control of different SRB servers to provide load balancing . 3) Logistical and administrative reasons - different storage systems may be run on different hosts under different security protocols, through use of a single sign-on environment and Access Control Lists maintained for each digital entity; 4) Fault tolerance – data can be accessed by the global persistent identifier, with the system automatically redirecting access to a replica on a separate storage system when the first storage system is unavailable. 5) Integrated data access – SRB provides the same mechanisms for accessing data in distributed caches and archives, making it possible to integrate access to back-up copies into the data management environment 6) Persistence – data can be replicated onto new storage systems by a recursive directory movement command, without changing the name by which the data is discovered and accessed. This makes it possible to migrate collections onto new resources without affecting access. The SRB has been implemented on multiple platforms including IBM AIX, Sun, SGI, Linux, Cray T3E and C90, Windows NT, 2000, Me, Mac OSX, etc. The SRB has been used in several efforts to develop infrastructure for GRID technologies, including the Biomedical Information Research Network (NIH) NSF/DOE Particle Physics Data Grid [2], DOE ASCI [10], NASA Information Power Grid [11] and NSF GrPhyN[5]. The SRB also has been used for handling large-scale data collections, including the 2-Micron All Sky Survey data (10 TB comprising 5 million files in a digital library), NPACI data collections, the Digital Embryo collection (a digital library of images) and LTER hyper-spectral datasets (a distributed data collection). More details on the SRB can be found at http://www.npaci.edu/DICE/SRB/. 4. MySRB– a web-based interface to the SRB MySRB is a web-oriented interface for accessing the data and metadata brokered by the SRB, that allows users to share their scientific data collections with their colleagues in a secure fashion. It provides a system where users can organize their static files and dynamic digital objects (virtual data) according to logical cataloging schemes independently of the physical location and formats of the files and also associate queriable metadata with these files. MySRB provides three primary functionalities: • collection and file management: operations collection, maintenance and deletion, operations for data creation, data ingestion, reload and registration, data replication and movement, access control, and for data deletion. Versioning and locking functions are under implementation. • metadata handling: operations for ingestion, extraction, copy, maintenance, update, and deletion of user-defined and standardized metadata. Standardized metadata might be based on lists of elements such as the Dublin Core, or might be from definitions based on Semantic Web. • access and display of files and metadata: functions for browsing files in the collection hierarchy and to search and query using system-level, user-defined and standard metadata. MySRB uses the secure-http (https) protocol with 128bit RSA authentication. Each session to MySRB is given a unique session key (stored as an in-memory cookie at the Browser). These session keys have a maximum timelimit set on them (currently 60 minutes). MySRB also performs security checks on the session keys when validating a user request. The web-browser interface for MySRB uses a splitwindow, shown in Figure 1. The small top-window is used to display metadata about data objects and collections, and the larger bottom-window is used for displaying elements in a collection or for displaying data objects accessed by the user. Hence, when a user “opens” a file, the attributes about the file are displayed along with the contents of the file. We show a screenshot of the MySRB interface in Figure 2 that is used to enter metadata. Default attribute values can be specified, as well as restricted vocabularies for attribute values, and as well as user-defined metadata. We illustrate the functionalities supported by MySRB through an exemplar scenario: Consider a curator who wants to form a new collection called “Avian Culture” under an existing “Cultures” collection. Her aim is to gather in one `folder’ all documents and multi-media available on the topic even though they might be located as distributed files, images, and movies stored on diverse mediaformats in file systems, archives, databases and websites. Some of the files would be under the control of the collection being created but others might be owned and curated by outside administrators with only links provided to them. She would also like to allow other curators to include their own materials into the collection. But she wants to have them include some minimal set of metadata based on entities defined under “MetaCore for Cultures” which she has augmented with more attributes relevant to her specialized topic. Also, she would like a set of selected users to add additional metadata for the collected items as and when they come across more information. Moreover, she would like users to add their own comments, ratings, errata and dialogues and annotations which will make the collection richer and more useful. Also an important criteria to her, is to include multi-modal relationships among the collection items so that one can link the objects in many ways for ease of browsing. Finally, she would like the public users to be able to access her collection by browsing in a pre-determined fashion (which is done by organizing the collections as subcollections as well as through multi-modal relationships), and/or search/query the collection using the rich mix of metadata based on standardized meta data, curatorial meta data, user annotations, ratings and dialogues. All the above operations are facilitated through the MySRB interface. The collections and sub-collections structure provides a means to organize the objects in a hierarchical fashion. Metadata at the collection-level can be used to provide (queriable) information about the collections and sub-collections as well as to enforce metadata that need to be provided when new items are added to the collection. Metadata at object level and collection level can be used to encode multi-modal relationships between the objects in the collection as well as to provide links to other documents and data outside the collection. The metadata stored in MySRB are not just entity-value pairs, but have a richer structure including associated ontology, units of the meta-value, groupings of the meta entities in schemas and subgroupings. (The ontology part is under development). MySRB also allows for storing annotations to capture dialogues, comments, ratings and user annotations and errata. MySRB supports a rich set of metadata management operations to help curators, users and public to ingest, maintain and access multiple kinds of metadata. A rich set of access control mechanisms provides a role-based access matrix from curator to public. The SRB facilitates federated and seamless access to remote storage from web-servers and file servers to tape archives and databases. It also provides for copying data as well as remote linking to enable read-only access to data not curated by current collection. Also, the SRB provides a feature to link objects and sub-collections across multiple collections without copying them. Finally SRB provides a replication management capability that can be used to provide fault-tolerance, load balancing and archival functionalities. 5. Features in MySRB We briefly discuss the mySRB in terms of its data movement capabilities and metadata management features Data Movement Operations: At the collection-level, a user can ingest a file into SRB or create new sub-collection through the mySRB interface. At ingestion time, the user can choose the logical resource that will be used for storing in SRB. The logical resource can be a single physical resource (say a Unix or NT file system, database, or archival file system) or it can be a logical resource that ties together two or more physical resources. In such a case, the file is replicated and stored in the underlying physical resources. For example, consider a logical resource logrsrc1 which consists of two resources: unix-sdsc, a unix file system at SDSC and hpss-caltech, a HPSS archival system at CalTech; then storing a file into logrsrc1 will ingest the file into both physical resources, unix-sdsc and hpsscaltech, synchronously and the two copies will be shown as two replicas of the same SRB object. During retrieval, the user can ask for a particular copy or or let SRB choose its own access for the file. Instead of specifying the resource, the user can specify a container when ingesting the file. In this case, the file is added to the container. Note that a container specification on ingestion overrides a resource specification. During ingestion, the user can specify the data type as well as any metadata that is required by the collection as well as a few user-defined metadata (metadata is discussed later in the section). MySRB uses the file-browse mechanism of web Browsers to identify the local file that need to be ingested. Files from Unix, Windows and Macintosh have been successfully ingested using mySRB. At this stage, only single file ingestion is supported. Apart from ingesting a file, a user can reingest a file (i.e., all metadata associated with the file by the SRB are still linked to it) and edit a file, if it is a small ASCII file (the edit facility is allowed only for a few data types). Apart from ingesting a file, a user can register SRB objects where no physical copy of the file is maintained or controlled by the SRB but a pointer to a physical location is maintained. There are five such types of objects that can be registered through mySRB: 1. A file that can exist either in a file system, an archival storage system or as a LOB in a database system. In this case, the user specifies the physical resource in which the file exist and the path name to the file in that resource. The user can perform all operations that mySRB offers, including deletion on registered files. Since the file is not fully under SRB’s control, the file size and other characteristics might change without SRB being aware of these changes. 2. A directory in a file system or an archival storage system. The user specifies the physical resource and the directory path name. The mySRB registers this path name as a ’shadow directory object’ (i.e., the cone of files under this directory is visible through this object), and provides all operations that can be performed by the user on a file except new file ingestion into the directory structure or update/deletion of files. These functionalities have some security implications but might be supported in a later version if these concerns are resolved. 3. A SQL query for a database resource. The user specifies a SQL query which can be either partial (i.e., the user can specify reminder of the query at retrieval time) or a full SQL query. Note that for security reasons, we recommend that one register only ’select’ commands and also have at least a partial query starting with select as part of the SQL. The select statement can be any query supported by the underlying database, including table joins, functions, stored-procedures, sub-queries and union queries (limitation of size might apply). The query is executed at retrieval time, and is not stored on registration. Hence the answer to the query can vary with time. During registration of the SQL, the user can specify the template to be used for pretty-printing the retrieved table. The mySRB supports three built-in templates that can be used: the first template HTMLREL, prints the result as a relational table in HTML format, the second template HTMLNEST, prints the result as a nested table in HTML, and the third template XMLREL, prints the result in XML using a simple DTD. Apart from the builtin template, the user can specify their own ’style-sheet’. In this case, the user specifies a file already in SRB as the style-sheet file. Currently the style-sheet is written in Tlanguage, an interpreted language native to SRB that supports rule-based data extraction and style-sheet for data organization. Support for other style-sheet languages such as XSLT will be provided in a later version. (Note: for additional information on T-language please refer to the primer given as part of the SRB package.) Deletion operation on this SRB object just removes the query from the SRB (and also any associated metadata and annotations) but does not change or delete tables in the underlying database. Currently mySRB does not support ingestion into databases (note that the SRB allows ingestion through command line and API), but the feature might be supported in a later version. 4. A URL. The user can specify any URL including ftp calls and cgi queries. On retrieval, the contents of the URL are retrieved and displayed. The contents of the URL are not stored in the SRB on registration. Deletion operation just removes the URL and any associated metadata from SRB and does not damage the contents of the URL at its physical location. 5. A method object or virtual data. The user can specify two types of registered method objects. The first type of method object runs an executable program that is invoked by the SRB as a remote proxy command. A proxy command is an executable that is available in the bin directory of a SRB server and is made available for execution by the SRB administrator (users have to ask a SRB administrator to place an object in a, possibly remote, SRB bin directory; this is done as a security precaution). When the method object is ’accessed’, the command is executed on the remote server and any results of that execution piped back to the browser. The user can provide command-line parameters at the invocation. As an example, one can register a method object that invokes a ’srbps’ command on a remote host. The ’srbps’ shows the process status similar to ’ps’ command in Unix, and the result is sent back to mySRB browser. The second method is an invocation of a proxy function inside SRB. One can compile user-defined functions in SRB and can invoke them using this feature. For example the metadata extraction function (explained later) is implemented in this manner. Apart from ingesting or registering files into mySRB, the user can also perform other data movement/maintenance operations. We briefly discuss them below: replicate: In mySRB, a user can replicate any file that is either ingested into the SRB or one that has been registered into SRB. Files inside a registered directory is not replicable. When replicating, the user specifies the resource that will be used for storing the replica. The new replica inherits all metadata associated with its siblings. A replica number is uniquely determined for the new replica and is displayed for the user in the listing. At this stage, mySRB does not support replication of files inside a container using this operation. Replication of a container (and its objects) is done by the SRB system using semantics associated with the logical resource specification of the container. register replicate: When a SRB object is a registered directory, URL, or SQL, then another object which has similar characteristics can be registered as a replicate. For example, if a SQL object which queries an oracle database, say dlib1, is registered in SRB, one can also register as a replicate of the first object another SQL object which queries another database, say a db2 databese dlib2. This might imply that they are equal (in some sense) and either queries will give the same result. Note that SRB does not check whether a registered replica is really an equal of the other copy. One can use this technique to register another object as a semantically equal copy of each other. For example, two SQL queries one giving an HTML output and another giving an XML output can be registered as replicas. ingest replica: For a SRB object (ingested or registered), one can ingest another file as a replicate. This is very useful when you want two different files in SRB that are syntactically different but semantically equal (eg. a tiff file and a gif file of the same image). Note that SRB does not check for syntactic or semantic equality. copy: A SRB collection, file or registered file can be copied as another SRB file or collection in another collection possibly with a new name. Currently we do not support copy of URL, SQL or method objects. The copy command does not copy any user-defined metadata or annotations for the new copy. We discuss these operations later. A copy is different from the replication operation because these two objects are considered to be entirely different and unconnected. Notions of synchronizations, trying for alternate replicas for retrieval and other operations and semantics associated with replicas do not apply to copied objects. The user specifies the new resource, path name and collection for the copy operation. move: Both files (ingested or registered) and subcollections in SRB can be moved from one collection to another. The user-defined metadata remains unchanged. This move is considered a logical move. Another type of move supported by the mySRB is a physical move of the object. This is possible only for files ingested into SRB resources (container-based files cannot be moved using this operation). In this case, the user provides the resource and path names of the new location for the file. link: One can link a SRB object (ingested or registered) in another collection. The operation is similar to soft linking in Unix. The access control of the original object is inherited by the linked object. Metadata and annotations associated with the original object can be viewed as part of the link object’s metadata but does not allow modification of the original object’s metadata. One can associate metadata and annotations with the link object apart from those available for the linked. One can also link a collection as a sub-collection of another collection. One can have more than one link to the same data (though replicas are not allowed). Chaining of links is not allowed. An attempt to link to another link object will result in a direct link to the parent object. Linking will be supported in the next version of SRB release. delete: A user can delete an ingested or registered file using the delete operation. A registered directory, SQL, URL or method object are unlinked without any deletion of the physical object. The objects or links when deleted is done one replica at a time and when the last replica is deleted all the metadata and annotations are also deleted. A linked file cannot be deleted through the link; a delete operation on a link basically performs an unlink operation. lock, pin, checkout: Using mySRB, an object can be locked so that operations on it are restricted. Two types of locks are supported: a ’shared’ lock which locks the object from being written to by any user other than the locking user but reads from the object and associated metadata are allowed, and ’exclusive’ lock which allows no interactions with the object. A lock placed by a user has an expiry date at which time it gets unlocked. A userdriven unlock operation is also supported. Pin operation makes sure that a SRB object does not get deleted from a particular resource. This is useful for pinning a file in a cache resource from being purged by SRB when performing cache management. An expiry time is also associated with pins and an explicit unpin operation is also supported. Checkout and checkin operations provide very crude forms of version control in mySRB. A checkout by a user disallows any changes to be made to that object and when checkin occurs, the older version of the object is still maintained as an earlier version with a distinct version number. Note that this is a very rudimentary version control and will be improved in later versions. These operations are implemented in mySRB but are not available in the current version of the SRB (1.1.8); they will be supported in the next version of the SRB release. Metadata Operations: The mySRB interface provides a very rich set of operations for creating, maintaining, viewing and searching different types of metadata for SRB objects as well as collections. There are five types of metadata in mySRB: 1. 2. 3. 4. 5. system-defined metadata, user-defined metadata, type-oriented (domain-oriented) metadata, file-based metadata, and annotations and commentary metadata. The system-defined metadata is created and maintained by the SRB system and the user can view them and also use them in their search mechanism. User-defined and type-oriented metadata for SRB files and objects are descriptive in nature and are made of name, value and units triplets. The metadata for a SRB collection can be of two types: descriptive and structural. Descriptive metadata are tripletswhich describe the content of the collection where as the structural metadata describe metadata that is required/suggested by the collection creator/curator for objects ingested or registered in the collection. The structural metadata has two additional parameters, default values and comments for explaining the metadata and its requirements. For these type of metadata, one can associate either no default values, or one default value or a set of reserved keywords which appear as a drop-down list in mySRB. Also the creator can designate zero or more of these metadata as mandatory wherein the file ingestor is required to provide a value for the metadata. There are four ways of associating user-defined metadata in mySRB. The first method allows the user to associate metadata when ingesting or registering an object, or when creating a new sub-collection. The second method is to invoke the insert metadata function which provides a form for the operation. This operation can be performed as many times as required and hence there is no limits for the number of metadata associated with a SRB object or collection. The third method is to copy metadata from other SRB objects or collections. The fourth method is to extract metadata from an extraction method associated with the data-type of the file. The metadata can be extracted from the object itself (eg. FITS files, HTML files) or one can extract the metadata from a second SRB object and associate the metadata to the first object (eg. AMICO image metadata with XML metadata files, or DICOM image metadata from separate header files). One can associate more than one metadata extraction method for a data-type and the user is allowed to choose one at the time of metadata creation. If necessary, more than one method can be applied to the same object to extract different metadata. Metadata extraction methods can be written in Tlanguage, which has a simple form of rules for identifying metadata values and associating them with metadata names. The type-oriented metadata are pre-defined sets of metadata elements that can be associated with the SRB objects through their data types or for all SRB objects. For example, Dublin Core metadata can be associated with any SRB object and an entry form for Dublin Core can be invoked when needed. Data-type designated metadata can be ingested for SRB objects of particular type and can be done through forms, by copying from other objects and/or by extracting through metadata extraction methods. User-defined metadata and typeoriented metadata can be ingested only by users who have ’ownership’ permission for the SRB object or collection. File-based metadata is as name suggests a file in SRB that is associated as a metadata-carrying file for another SRB object. This metadata is used only for viewing and cannot take part in querying (at the current time). One can associate the same file to be a metadata file for more than one SRB object. Currently triplets are the only form of metadata supported in this manner. XML-based metadata will be supported in a later release. Annotations and commentary metadata are useful for associating free-form metadata to a SRB object. They can be used for providing notes, comments, errata, queries and answers, annotations, memoranda, etc. These have a type/location associated with them and the timestamp and the annotation writer’s name. Unlike other types of metadata, the annotations and commentary can be inserted by any user with a read permission on the object. Having seen all the different types of metadata ingestion methods, one can be very creative in the type of metadata being ingested. Currently one can associate a URL as a metadata and if the URL is designated as being of ’inlineable’ type then the mySRB shows the contents of the URL. One can also associate other SRB objects as related to this object and in that case, a reference is provided as a clickable hot-link in mySRB. If this SRB object isdesignated as ’inlineable’, mySRB shows the content of the object. This is useful when showing thumbnail images for larger images or when showing some properties that are stored in a database. Other creative modes of metadata support will be implemented in future mySRB releases. Metadata in mySRB can be viewed in two ways. When a user selects an object for viewing, then the associated metadata is shown in a split screen on the browser. Hence the user can see both the data and the metadata at the same time. In the case of collections, the user-defined metadata is shown in one part of the screen and the collection listing with some of the system-metadata is shown in the other part of the screen. In the second method, the user can select to just view the metadata for an object. In the case of collections users have a choice of seeing the descriptive, structural or all metadata. Architecture for building distributed data collections, digital libraries, and persistent archives. The importance of metadata in SRB comes from the queriability of the metadata. MySRB provides a query interface where one can either query only the userdefined and type-defined metadata or query also with annotations and some system-defined metadata. When a user selects the mySRB icon in a collection-page of mySRB, it opens a new page with a set of query conditions where each condition has four parts: a metadata name part which is a drop-down menu containing all the metadata names that are queryable in that collection and every collection in the hierarchy under the collection. Hence, one can query across collections by being above the collections. The second part is a comparison operator where one can choose =,>,<,<=,>=,<>,like, not like, etc. The third part is a text box where the user can provide values for the comparison condition. The fourth part is a checkbox where a user can check if the user wants to see the values for a metadataname in the query result listing. One can check the box of a metadata name without using it as part of any query condition. The query is taken as a conjunctive query in the current implementation, i.e., an AND of all the condition is used for search purposes. The result of the query is a listing of SRB objects that satisfy the search conditions. 7. References Apart from creation, viewing and querying of metadata, MySRB provides functionality for updating, copying and deleting user-defined metadata and annotations. Apart from these operations, the MySRB interface provides additional functionalities such as user registration, access to resource, user and container metadata, ability to navigate the collection hierarchy and on-line help. The MySRB interface is available at https://srb.npaci.edu/mySRB.html. [1] Foster, I., and Kesselman, C., (1999) “The Grid: Blueprint for a New Computing Infrastructure,” Morgan Kaufmann. [2] PPDG, (1999) “The Particle Physics Data Grid”, (http://www.ppdg.net/, http://www.cacr.caltech.edu/ppdg/). [3] Hoschek, W., Jaen-Martinez, J., Samar, A., Stockinger, H., and Stockinger, K. (2000) “Data Management in an International Data Grid Project,” IEEE/ACM International Workshop on Grid Computing Grid’2000, Bangalore, India 17-20 December 2000. (http://www.eudatagrid.org/grid/papers/data_mgt_grid2000.pdf). [4] Hammond, S., (1999). "Prototyping an Earth System Grid", at the Workshop on Advanced Networking Infrastructure Needs in Atmospheric and Related Sciences, National Center for Atmospheric Research, Boulder CO, 03 June 1999. (http://www.scd.ucar.edu/css/esg/presentations/nlanr/inde x.htm). [5] GriPhyN, (2000) “The Grid Physics Network”, (http://www.griphyn.org/proj-desc1.0.html). [6] NEES, (2000) “Network for Earthquake Engineering Simulation”, (http://www.eng.nsf.gov/nees/). [7] KNB, (1999) “The Knowledge Network for Biocomplexity”, ( http://knb.ecoinformatics.org/). [8] NVO, (2001) “National Virtual Observatory”, (http://www.srl.caltech.edu/nvo/). 6. Conclusion Data Grids are becoming increasingly important in scientific communities for sharing large data collections and for archiving and disseminating them in a digital library framework. The Storage Resource Broker provides transparent virtualized middleware for sharing data across distributed, heterogeneous data resources separated by different administrative and security domains. The MySRB is a web-based interface to the SRB that provides a user-friendly interface to distributed collections brokered by the SRB. In this paper we saw brief descriptions of the use of the SRB and the MySRB infrastructure as potent tools in the Data Grid [9] EarthScope, (2001) “EarthScope”, ( http://www.earthscope.org/). [10] ASCI, (1999) “Accelarated Strategic Computing Initiative”, A DOE Project, (http://www.llnl.gov/asci/). [11] IPG, (2000) “Information Power Grid”, A NASA Project, ( http://www.ipg.nasa.gov/). [12] SRB, (2001) “Storage Resource Broker, Version 1.1.8”, SDSC (http://www.npaci.edu/dice/srb). [13] MCAT, (2000) “MCAT: Metadata Catalog », SDSC (http://www.npaci.edu/dice/srb/mcat.html). [14] Rajasekar, A., R. Marciano, R. Moore, (1999), “Collection Based Persistent Archives,” Proceedings of the 16th IEEE Symposium on Mass Storage Systems, March 1999. [15] Moore, R., C. Baru, A. Gupta, B. Ludaescher, R. Marciano, A. Rajasekar, (1999), “Collection-Based longTerm Preservation,” GA-A23183, report to National Archives and Records Administration, June, 1999. [16] Moore, R., C. Baru, A. Rajasekar, B. Ludascher, R. Marciano, M. Wan, W. Schroeder, and A. Gupta, (2000), “Collection-Based Persistent Digital Archives – Parts 1& 2”, D-Lib Magazine, April/March 2000, http://www.dlib.org/ [17] Moore, R., (2001a) “Knowledge-based Grids,” Proceedings of the 18th IEEE Symposium on Mass Storage Systems and Ninth Goddard Conference on Mass Storage Systems and Technologies, San Diego, April 2001. [18] Moore, R., (2001b) “Knowledge-Based Data Management for Digital Libraries”, NIT2001, Beijing, China, May 2001 [19] Moore R., and A. Rajasekar, (2001) “Data and Metadata Collections for Scientific Applications”, High Performance Computing and Networking (HPCN 2001), Amsterdam, NL, June 2001. [20] Baru, C., R, Moore, A. Rajasekar, M. Wan, (1998) “The SDSC Storage Resource Broker,” Proc. CASCON'98 Conference, Nov.30-Dec.3, 1998, Toronto, Canada. Figure 1: SRB Main page showing the Collections with different objects and Operations Figure 2: File Ingestion Page with Metadata for Dublin Core Attributes and other user-defined attributes.