Cif Applications Crystallography Open Database - An Open-Access Collection of Crystal Structures
Journal of Applied Crystallography
ISSN 0021-8898

Crystallography Open Database – an open-access collection of crystal structures
(a) Institute of Biotechnology, Graiciuno 8, LT-02241 Vilnius, Lithuania
(b) CRISMAT-ENSICAEN, Université de Caen Basse-Normandie, Bd. M. Juin, 14050 Caen, France
(c) Department of Geosciences, University of Arizona, Tucson, Arizona 85721-0077, USA
(d) School of Chemical, Biological and Environmental Engineering, 315 Gleeson Hall, Oregon State University, Corvallis, OR 97331-2702, USA
(e) Departamento de Química Inorgánica, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain
(f) Department of Materials Engineering, University of Trento, via Mesiano, 77-38050 Trento, Italy
(g) Portland State University, Department of Physics, PO Box 751, Portland, OR 97207-0751, USA
(h) Université du Maine, Laboratoire des Oxydes et Fluorures, CNRS UMR 6010, Avenue O. Messiaen, 72085 Le Mans Cedex 9, France

Correspondence e-mail: grazulis@ibt.lt

This article is dedicated to Michael Berndt.
J. Appl. Cryst. (2009). 42, 726–729 Saulius Gražulis et al. Crystallography Open Database 727
cif applications
number of different structure types is close
to 30 000, the total number being attained by
adding series of isostructural virtual
compounds. For instance, there are 6400
different (Al/P)O4 compounds, and three
other series of isostructural compounds with
formulations SiO2, (Al/Si)O4 and (Al/S)O4.
Besides the possibility of searching through
the PCOD web interface, in a similar way to
the COD, 48 complete series of compounds
(characterized by the presence of the same
chemical elements) are downloadable for
prospective research.

For anybody who wishes to use the COD and the PCOD databases, the collected files are presented using standard open protocols and formats. The database can be searched online on the COD server using a simple web-based search form, and the structural results can be downloaded either one by one or in a compressed .zip file. Alternatively, the whole collection of the COD files and database tables can be downloaded from the COD web site (using the HTTP protocol) as a compressed .zip, .tar.gz or .tar.bz2 file, or updated via the rsync protocol (http://samba.anu.edu.au/rsync/) from rsync://www.crystallography.net/cod-cif and rsync://www.crystallography.net/pcod-cif so that the files can be used and examined on a user's local machine. Finally, the COD and PCOD CIFs, database dumps and web scripts are available for anonymous checkouts from the COD Subversion server (svn://www.crystallography.net/cod and svn://www.crystallography.net/pcod). From this server an interested user can reconstitute the whole COD database and the web site locally for local searches, and also browse COD deposition logs and retrieve older revisions, should they be necessary.

Figure 1
The COD deposition procedure. In this data flow diagram, circles indicate automatic processes and arrows show the data paths. As in control flow diagrams, a trapezoid indicates manual processes and a rhomb indicates a process where a decision to divert data via different paths is taken. Names after the colons in each node are the names of the Unix tools or COD-specific programs that were used for that operation. Rectangles are abstract (web) data sources – data sources depicted in pink provide crystallographic and chemical information (coordinates, symmetry data, formulae), while those depicted on a blue background provide bibliographic data. Cylinders denote internal COD disk storage facilities (databases). File extensions indicate file formats used. The .mrk file format is an intermediate format similar to XML designed for ease of parsing and editing, and used only internally by the COD deposition scripts.

To facilitate the use of the COD as a reference database, it is planned that all data published in the COD will be assigned persistent URLs. Thus, any structure deposited in the COD should be available as http://www.crystallography.net/cif/<COD number>, e.g. http://www.crystallography.net/cif/1000000.cif.

The open-access nature of the COD and the PCOD permits the creation of numerous mirrors of the COD and the PCOD. At present, three mirrors are available at http://cod.ibt.lt/, http://cod.ensicaen.fr/ and http://nanocrystallography.org/. Currently, one centralized repository is kept as an authoritative source of data, but with the growth of the databases a decentralized implementation is possible.

3.2. Future directions of COD and PCOD development

A current challenge for all crystallographic databases, including the COD, is an exponential increase in the number of determined structural data entries. Fortunately, there is plenty of room to improve the efficiency of the COD deposition procedure. The current procedure involves a step in which a COD number is assigned by the COD coordinator, and a step where the structures are checked by human depositors for possible errors. Both steps can be automated and parallelized. Finally, the structures still requiring human attention can be checked and edited in parallel by numerous COD reviewers all over the world, provided there is adequate software and enough volunteers participate in COD maintenance. Currently, the number of people contributing or willing to contribute to the development of the COD amounts to several dozen, apparently enough to provide qualified peer-review for the incoming structures. The development of the automatic data submission, annotation and CIF correction software is under way. Calculation of powder patterns is implemented for the PCOD data in the Match! software (http://www.crystalimpact.com/match/match18.htm).

For researchers who wish to publish their structure-related work, most journals require the deposition of structures with a crystallographic database and ask for the database accession number as proof of deposition. For such structures, a special deposition status, 'on hold until publication', will be introduced. The structures submitted to the COD with the 'on hold' flag will be included in the
COD SQL database where their cell constants, composition, symmetry and authorship will be indicated. A COD number will be assigned to the structure and returned to the author, and will be visible through the search interface of the COD. The atomic coordinates themselves, however, will not be released to the public until either the publication describing them appears, the authors inform the COD team that the coordinates should be released, or one year elapses from the original deposition of the CIFs. If the structure is not published within one year, an e-mail will be sent to the depositing author asking whether the structure should be released or withdrawn.

At present, one of the main limitations of the functionality of the COD is the absence of a substructure search engine. In organic and metal–organic chemistry, the best way of defining similarity between structures is generally the presence of a common group of atoms chemically linked in the same way: this is what we call a 'substructure'. For performing this task with COD data, we need to represent the chemical connectivity of the structures included in the COD in a suitable format, provide a tool for the user to input into the COD the definition of the substructure and finally employ a search–match engine that compares the user input against the COD data. A specialized chemical format such as CML (http://en.wikipedia.org/wiki/Chemical_Markup_Language) or SMILES (http://www.opensmiles.org/) could be used for encoding the necessary information, with molecules already 'grown' across any possible crystallographic symmetry elements and with the possible presence of chemically identical but crystallographically different moieties simplified. The tools for user-friendly structure input and search are available under both free and commercial licenses (http://xdrawchem.sourceforge.net/, http://www.cambridgesoft.com/software/ChemOffice/, http://sourceforge.net/projects/joelib/, http://sourceforge.net/projects/cdk/). The remaining task is the integration of these tools with the COD.

The authors and the COD Advisory Board are grateful to the International Union of Crystallography for permission to automate the download of published supplementary data and bibliographic records of the IUCr online journals. We thank the Mineralogical Society of America, the Mineralogical Association of Canada, Laboratoire de Cristallochimie et Physicochimie du Solide, Laboratoire de Cristallographie et Sciences des Matériaux (CRISMAT), Laboratoire des Oxydes et Fluorures, Institut de Physique de la Matière Condensée and Portland State University for donating original coordinate files, collected or determined in their premises. We acknowledge the numerous volunteers who helped to establish the COD data collection by donating data from their private collections of structural data. Many thanks to Patrick Ducrot from the Ecole Nationale d'Ingénieurs de Caen (ENSICAEN), France, for establishing and maintaining the mirror COD web site, and to Amber Lauer of Portland State University and Boris Dušek of the Charles University in Prague for setting up and maintaining the Portland State University COD mirror. The authors thank Virginijus Šikšnys from the Institute of Biotechnology (Vilnius) for comments and discussion of the manuscript. The work of JB and SG during the summer of 2007 was partly supported by the Lithuanian Science Council Student Research Fellowship Award. The COD Advisory Board thanks Crystal Impact GbR for financial support of the open access publication of this paper.
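
As an illustration of the access routes described in the text (persistent CIF URLs and the two rsync modules), the following Python sketch builds the URL for a given COD number and the command line for mirroring a module. It is a minimal sketch under stated assumptions, not part of the COD software: the function names `cod_cif_url` and `rsync_mirror_command` are hypothetical, the rsync options are one reasonable choice for mirroring, and the code only constructs strings and does not contact the server.

```python
from typing import List

COD_HOST = "www.crystallography.net"  # host name taken from the text


def cod_cif_url(cod_number: int) -> str:
    """Return the persistent CIF URL for a COD number (hypothetical helper)."""
    if cod_number <= 0:
        raise ValueError("COD numbers are positive integers")
    return f"http://{COD_HOST}/cif/{cod_number}.cif"


def rsync_mirror_command(module: str, destination: str) -> List[str]:
    """Build an rsync command line that mirrors one COD rsync module.

    `module` must be "cod-cif" or "pcod-cif", the two modules named in the text.
    """
    if module not in ("cod-cif", "pcod-cif"):
        raise ValueError(f"unknown rsync module: {module}")
    # Assumed options: -a keeps the directory tree and timestamps, -v lists
    # transferred files, --delete removes local files that vanished upstream.
    source = f"rsync://{COD_HOST}/{module}/"
    return ["rsync", "-av", "--delete", source, destination]


if __name__ == "__main__":
    print(cod_cif_url(1000000))
    print(" ".join(rsync_mirror_command("cod-cif", "./cod-cif/")))
```

The returned list can be passed to `subprocess.run` unchanged; keeping command construction separate from execution lets the mirroring step be checked without network access.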
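
The planned 'on hold until publication' status in Section 3.2 amounts to a small decision rule, which can be sketched in Python as follows. This is only one reading of the text, not the COD implementation: the `Deposition` record, its field names, the `hold_action` function and its return strings are all hypothetical, and 'one year' is assumed to mean 365 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta

HOLD_LIMIT = timedelta(days=365)  # assumption: "one year" read as 365 days


@dataclass
class Deposition:
    """Hypothetical record for a structure deposited with the 'on hold' flag."""
    cod_number: int
    deposited_on: date
    published: bool = False          # the describing publication has appeared
    release_requested: bool = False  # the authors asked for the release


def hold_action(dep: Deposition, today: date) -> str:
    """Decide what to do with the atomic coordinates of an on-hold structure."""
    if dep.published or dep.release_requested:
        return "release"
    if today - dep.deposited_on >= HOLD_LIMIT:
        # Unpublished after one year: e-mail the depositing author and ask
        # whether the structure should be released or withdrawn.
        return "ask_author"
    return "keep_on_hold"


if __name__ == "__main__":
    dep = Deposition(cod_number=1000000, deposited_on=date(2009, 1, 15))
    print(hold_action(dep, date(2009, 6, 1)))
```

In this reading the cell constants, composition, symmetry and authorship stay searchable in every state; only the release of coordinates depends on the returned action.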