Academia.eduAcademia.edu

Diatom identification: a double challenge called ADIAC

1999, Proceedings 10th International Conference on Image Analysis and Processing

This paper introduces the project ADIAC (Automatic Diatom Identi cation and Classi cation), which started in May 1998 and which is nanced by the European MAST (Marine Science and Technology) programme. The main goal is to develop algorithms for an automatic identi cation of diatoms using image information: both valve shape (contour) and ornamentation. The paper presents the goals of the project as well as rst results on shape modeling and contour extraction. Public data are available in order to create student projects beyond the ADIAC partnership. For further information see http://www.ualg.pt/adiac

Diatom Identication: a Double Challenge Called ADIAC Hans du Buf, University of Algarve, Faro, Portugal Micha Bayer and Stephen Droop, Royal Botanic Garden Edinburgh, U.K. Ritchie Head and Steve Juggins, University of Newcastle, U.K. Stefan Fischer and Horst Bunke, University of Berne, Switzerland Michael Wilkinson and Jos Roerdink, University of Groningen, The Netherlands José Pech-Pacheco and Gabriel Cristóbal, Instituto de Optica (CSIC), Madrid, Spain Hamid Shahbazkia and Adrian Ciobanu, University of Algarve, Faro, Portugal Abstract: This paper introduces the project ADIAC (Automatic Diatom Identication and Classication), which started in May 1998 and which is nanced by the European MAST (Marine Science and Technology) programme. The main goal is to develop algorithms for an automatic identication of diatoms using image information: both valve shape (contour) and ornamentation. The paper presents the goals of the project as well as rst results on shape modeling and contour extraction. Public data are available in order to create student projects beyond the ADIAC partnership. For further information see http://www.ualg.pt/adiac 1 Introduction ADIAC is the acronym of the project Automatic Diatom Identication and Classication. In phycological research the word identication refers to what in pattern recognition is meant by classication, whereas the phycological meaning of classication is the establishment of the class-forming rules. In order to avoid a confusion, we will apply the phycological interpretation. Since this is the rst scientic publication by most of the project's partners, and because our goal is to promote diatom identication as a new and challenging area in pattern recognition, we start with explaining the history of the project, diatom research and the goals. Sections 2 to 4 present rst results on image databases, shape modeling and contour extraction. 1.1 History: from hobby to profession The ADIAC project was born in May 1998 but its conception took place a few years before at the Musée National de l'Histoire Naturelle de Paris, in Simone Servant's oce to be precise, although she seemed not quite aware of the very fact. So what happened? Microscopy and a general interest in Nature's richness in morphology in various areas such as zoology, geology and botany being one of my hobbies, I (HdB) came to collect antique microscope slides, the production of which had become an art that bloomed in the 19th century throughout Europe. Occasionally strolling around Paris in search for slide collections at the few scientic instruments antique shops, I came into contact with Bernard Coupel of the Vaast bookshop, rue Jussieu, who happened to purchase part of the Tempère collection including microscopes and very special ornamental slides made by J. Tempère, one of France's greatest diatomists. Bernard Coupel showed me a few of these artistic preparations with many dierent diatoms arranged like owers and beautifully coloured because of the light diraction, slides that were not for sale of course given their uniqueness [1]. He advised me to see Simone Servant at the nearby museum. She told me about the diatom collections there, the research and the existence of a journal named Diatom Research. I had seen a computer in her oce and because of my background a very natural question that I asked her was about the application of computers to diatom recognition. I cannot remember her precise answer but I grasped that she did not really know what I was talking about, but then and there I had an idea that I could not put aside; an idea that has now materialised into a three-year, seven-partner and 1.25 million Euro project with a main funding from the DG XII's MAST programme. 1.2 What are diatoms? Diatoms are unicellular algae related to brown algae (Phaeophyta, e.g. seaweeds like Fucus and Laminaria ), yellow-green (Xanthophyta ) and golden-brown (Chrysophyta ) algae, but not at all related to red, green or blue-green algae. Almost all need sunlight to grow, and live almost anywhere where there is enough light and moisture: in the water column of the sea, lakes and rivers, in sediments underneath and at the edge of wa- in archaeological, geological and forensic research. 1.3 Diatom research Diatom-based research ranges from purely systematic (classication and evolutionary studies) to purely applied, where diatoms are used as an analytical tool in ecological, geological, climatological, geographical, archaeological or forensic research. There are no clear boundaries, but many of those who use diatoms as a tool have more interest in (and are more skilled in) the application for which they are using the diatoms than in the diatoms themselves. In addition, many of these applications require identication of a large range of diatom species. This combination of factors (identication of many species by non-specialists) has led to a need for quick and eective identication aids. Unfortunately, written oras are not adequate, since they are slow to produce and update as new information becomes available and as classications change. A computer-based system, however, would have many advantages: identication would no longer require the level of human expertise that it does at present (although it would still require human expertise to interpret the computers' results), and a computer-based system could be kept completely up-to-date with respect to the publication of new species and other changes in the classication, especially if the reference database were kept centrally and accessed via the Web. Figure 1: SEM (scanning electron microscope) image of Diploneis heemskerkiana (top) and the optical microscopy (DIC) equivalent (bottom). ter bodies, and also near the surface of damp soils. Estimates vary widely, but there may be as many as 200,000 species in the world [3], making them the second most diverse group of plants after the owering ones. They are ecologically very important and contribute around 20% of the world's carbon xation, which makes them more productive than all the world's rainforests. There are three aspects of diatom biology that make them important well beyond their intrinsic appeal (see Fig. 1) and their contribution to world ecology. First, they have a cell wall made of silica that is very resistant to decay  the cell walls can survive in lake- and seabed sediments for thousands and even millions of years after the cell itself has died. Second, the ornamentation of the cell wall is highly specic, and most diatoms can be identied to species level on the basis of the cell wall alone. Third, each diatom species tends to be able to survive and grow only in a relatively narrow range of ecological conditions. These three factors together mean that diatoms can be used for ecological monitoring, for reconstructing past environments, and 1.4 ADIAC goals The main goal is of course the development of a complete software system for a completely unsupervised diatom identication using only image information. This aim is going too far because there are thousands of different taxa and the identication rules are sometimes not very clear. Also, in some cases one valve view is not enough and additional information is required. Nevertheless, ADIAC is the rst European project devoted to diatom identication on the basis of both valve contour and ornamentation. Explicit goals are: (1) to develop image databases with dierent discrimination complexities, (2) to develop methods for an automatic slide scanning on microscopes with motorised stages, (3) to develop methods for obtaining a complete, graphical, diatom-valve description, (4) to develop an identication system using for example graph matching that can produce a sorted list with best matches, (5) to test all methods using the image databases and (6) to integrate the methods into taxonomic and ecological database systems. The second goal is illustrated by Fig. 2, which shows a low-magnication overview of a strewn slide with many diatoms and biological debris. Only the automi- tact us and to let us know what their applications are and what they would expect. ADIAC will organise several workshops to which interested researchers will be invited, and we hope to establish active collaborations beyond the ADIAC project partnership. Images, publications, references etc can be found in the ADIAC webpages at URL http://www.ualg.pt/adiac and mirrored at http://www.rbge.org.uk/adiac. These contain all addresses, telephone numbers and related webpages of all partners plus additional links. Contacts, for example for establishing student projects or participation in ADIAC workshops and meetings, are not limited to the Coordinator. Main contact: Hans du Buf (coordinator), University of Algarve, Vision Laboratory, Faculty of Exact Sciences and Humanities, Campus de Gambelas - UCEH, 8000 Faro, Portugal; Tel: +351 89 800900 ext 7761; Fax: +351 89 818560; Email: dubuf@ualg.pt 2 Image databases Figure 2: Low-magnication image of a strewn slide. sation of the scanning process, i.e. by marking only the positions of almost complete diatoms and a subsequent high-magnication image capture at each marked position using also autofocusing, would already result in a tremendous saving of labour. The saving of labour by the complete processing as proposed and studied by ADIAC would be much more, and most researchers involved with diatoms could spend much more time doing their own work in geology, climatology, etc. Therefore, the ultimate goal will be to install analysis and identication software together with a huge diatom database on one or more central servers, which would allow researchers to send images by email and to receive automatically a sorted list of best matches. In other words, ADIAC and subsequent eorts are expected to carry diatom research well into the 20th century. 1.5 An open pilot study During the three ADIAC years it is expected to realise state-of-the-art algorithms and huge image databases. However, it is the rst project in which diatom contour and ornamentation will be explored for an identication. Hence, it is expected that the project will need a continuation to further improve the analysis tools, but mainly to establish image databases that contain all diatom species relevant for certain applications/sites. It is therefore extremely important that ADIAC creates a database with potential users by contacting researchers who could prot from the project. Because this is a very time-consuming task, we also invite individual researchers, institutions and companies to con- A database of ca 1200 digital images has already been created, which is likely to grow to ca 10,000 by the end of the project. Images are produced using a number of techniques: most are captured directly from the microscope by one of two digital cameras; others are photographed using monochrome lm and the developed negatives are scanned using a slide-scanner. Irrespective of their size, diatoms are captured using the maximum magnication that will allow for the entire specimen to be photographed at a resolution of 10 pixels per micron or better. This resolution is more or less the minimum that can capture all the resolution of which an optical microscope is capable. Photographs of diatoms usually include extraneous material or illumination artefacts that detract from their quality. The use of digital media means that such imperfections can be removed relatively easily (depending on their severity), providing their source is understood. Preprocessing involves removal of imperfections from the two main sources: those that are part of the magnication, illumination and imaging systems (and nothing to do with the specimens themselves), and those that are intrinsic to the specimen preparations (such as other diatoms, girdle bands or other debris that interfere with the diatom to be photographed). Ultimately, the storage of images will be in conjunction with a taxonomic database which is being developed for specialised diatom use [2]. Until the structure of the database is completed, images are kept in more or less unstructured folders, and the accompanying information (identication, provenance, slide number, microscope stage coordinates, pixel size and shape) is kept in a separate index le. Please refer to the project web site for the full methodology for specimen preparation and image capture. Figure 3: Dark and bright contours in the grey-level image (left), the double contour after a local thresholding (middle) and the double contour eliminated (right). 3 Contour extraction and shape analysis In a brighteld-microscopic image, a diatom's silica frustule leaves a dark outline. The organic material, that still remains on the frustule after cleaning it, leaves a very light signature (halo) outside the dark contour because of diraction, see Fig. 3 (left). We use this information in order to develop a low-level and dataoriented contour extraction. The idea is to threshold the grey-level image, to label the connected elements and to follow the external outline to obtain the contour. As the illumination variance is not regular in these images, a local thresholding must be used. However, a drawback of a local thresholding is that the bright part creates a double contour in the binarised image (Fig. 3 middle). To overcome this, rst we nd the Otsu threshold between the histogram's maximum and the histogram's brightest element. Next we set the value of all pixels above this threshold to the value of the threshold, and only then the local thresholding is applied to the image (Fig. 3 right). After the binarisation the connected elements are labeled and only the most central and largest element is chosen as a rst contour candidate. The contour is obtained by following the external part of the chosen element in the clockwise sense, using a always turn left algorithm. If the contour is not closed then the algorithm tries to connect the rst element to another element in the neighbourhood. The result goes through a correction process that eliminates concave but thin deformations and that lls in the internal gaps. If necessary the nal contour is rotated to a horizontal position by using symmetry axes and moments before analysing the precise shape. Valve shapes are roughly divided between centric (circular, triangular and square, the latter two normally cusped with rounded vertices) and pennate (elliptical including specic (a)symmetries). A standard approach is to study the Fourier series of the closed contours for discriminating these forms. This approach has been studied before by modeling pennate Tabellaria valves with up to 20 Fourier descriptors [4]. Because we don't know a priori how well we can separate and describe dierent pennate shapes with a xed or variable number of Fourier descriptors, and because it may be better to apply later a best-tting Figure 4: Synthesised diatom contours; see text. ellipse approach but using other mathematical functions, we studied a few alternatives. As far as we know only fourth-order Cassinian curves and two half ellipses glued together go beyond normal ellipses. Assuming a normalisation in terms of rotation (horizontal shapes) and size, Cassinian curves can approach panduriform shapes with two parameters and glued ellipses can approach semilanceolate shapes also with two parameters. Coming back to the rst pass in the Fourier shape analysis, i.e. the construction of two discrete parametric and periodic signals X (t) and Y (t) following an ellipse in a horizontal position, these signals are pure sines with a phase oset. We noticed that (a)symmetric deformations from a pure ellipse, as found with many pennate diatom shapes, aect only Y (t) and the dierences with a pure ellipse can be described by adding symmetrically Gaussian functions or derivatives of Gaussians to Y (t). After some experiments we found that many shapes can be described with a small number of parameters. An example is the sigmoid lanceolate shape, which can be modeled by X (t) = a cos(t) and Y (t) = b sin(t) + cfexp(?t2 =d) ? exp(?(t ?  )2 =d). Figure 4 shows some synthesised forms. These are, with the number of parameters between parentheses: left column top-to-bottom: bilobate (8), clavate (5), crescentic (5) and sigmoid lanceolate (5); middle column: semilanceolate (5), without name (8) and without name (6); right column: auricular (10) and spatulate (5). In conclusion, it may be possible to apply a best-tting ellipse algorithm, followed by a best-tting other-shapewith-few-parameters in order to determine the best parameters to be used in the identication process. 4 Diatom isolation As can be seen in Fig. 2, in many cases diatoms are connected to debris or partially occluded. Hence, we need algorithms that can isolate individual diatoms and their contours. Contour extraction is done in two steps. In a preprocessing step initial contours are extracted using a conventional edge-following algorithm Figure 5: Initial contours of two overlapping diatoms. like Canny's. Then the object contours are extracted by using the best-tting ellipse and a subsequent contour following in the elliptical polar-transformed image [5,6]. Figure 5 shows an edge-detection result of an image that contains two diatoms and debris plus a scale bar. Only the initial contours containing more than 200 pixels are shown. Due to small gaps in the edge image the diatom in the centre of the image is described by two contours. The contour of the second diatom is only partly detected since it is not sharply focused and occluded. Beginning with the contour of maximum length, the best-tting ellipse is determined for each initial contour. A general conic can be represented by an implicit second order polynomial G(a; x) = a  x = Ax2 + By2 + Cxy + Dx + Ey + F = 0. G(a; xi ) is called the algebraic distance of a point (x; y) to the conic G(a; x) = 0. The tting of a general conic can be done by minimising nPNthe sum of2squared o algebraic distances arg mina i=1 G(a; xi ) of the curve to the N data points xi . In order to force the conic to be an ellipse, the constraint B 2 ? 4AC < 0 has to be fullled. The ellipse-tting problem can be solved by using a generalised eigensystem. The polynomial coecients A; : : : ; F are computed by determining the eigenvector corresponding to the smallest eigenvalue. For the computation of the polar-transformed image the parametric ellipse form x = xc + a cos and y = yc + b sin is relevant. Here (xc ; yc ) represents the centroid, a; b the radii and the rotation angle. These parameters can be calculated using the polynomial coecients. Results of ellipse tting for the diatoms in the example image are shown in Fig. 6. Using the parameters of the best-tting ellipse, the elliptical polar coordinate transform is computed. This polar transform is based on sampling counterclockwise the original gray level image along scan beams with length r. The transform is done in discrete rotation Figure 6: Examples of best-tting ellipses. steps h = 1o . The polar-transformed image coordinates (r; h) are computed using the image coordinates (x; y) according to x = (m=a)  r  cos h and y = (m=b)  r  sin h, where m is used to normalise the position of the contour. In practice m is xed to one quarter of the maximum distance from the center of the ellipse to the borders of the image. Furthermore, to compensate for the rotation of the ellipse, each point is rotated around the angle using xr = x cos ? y sin + xc and yr = x sin + y cos + yc . In the polar-transformed image the problem of contour extraction reduces to the extraction of a nearly straight line from the top to the bottom. We apply a depth-rst search algorithm which evaluates the gray level changes along the path. One result can be seen in Fig. 7. The back-transformed contours of the isolated diatoms are shown in Fig. 8. Both diatom contours have been accurately isolated. 5 Conclusions We have presented a brief introduction to diatom research and the ADIAC project, as well as rst results on contour extraction and shape analysis. Actually, there is much more progress, but six pages are not sucient to present all work done; please refer to the webpages. ADIAC being a rst pilot project to study a completely computerised diatom identication on the basis of im- Figure 7: Contour detection in a polar-transformed image around the centre of the best-tting ellipse. age information, it is very likely to be continued with the aims of further improving identication accuracy and establishing huge databases on one or more central servers. Because most information, including image databases and identication rules, even image analysis and identication software, will become publicly available, we hope that many MSc and PhD projects will be created, as has happened before in the area of ngerprint analysis. This will be a fruitful collaboration as well as competition, i.e. the best algorithms will survive. But an international competition is not a problem because it serves only one goal: the creation of the best identication system which facilitates research in all diatom applications. Finally, that an automatic diatom identication is a new challenge in pattern recognition will not be a surprise, given the complexity and diversity of the diatom shapes and ornamentations. But why is ADIAC a double challenge? As in many pattern-recognition problems, the sometimes very subtle dierences in shape and ornamentation between dierent taxa hamper a clear distinction, even for diatomists. In other words, the second challenge is to try to establish exact class-forming rules, which is a non-trivial task in phycology given the inexact way that biological classications work: i.e. to be able to make an identication even if some of the classicatory information is absent, misobserved or contradictory. Acknowledgement: ADIAC is funded by the European MAST (Marine Science and Technology) programme, contract MAS3-CT97-0122. Figure 8: Back-transformed contours of the isolated diatoms. References [1] Champreux, F. (1989) Les diatomées et la diatomite. Minéraux et Fossiles, Vol. 15 No. 169, pp. 7-15. [2] Droop, S.J.M., Sims, P.A., Mann, D.G. and Pankhurst, R.J. (1993) A taxonomic database and linked iconograph for diatoms. In: van Dam, H. (ed.) Proc. Twelfth Int. Diatom Symposium, Renesse, The Netherlands, 30 August - 5 September 1992. Hydrobiologia, Vol. 269/270, pp. 503-508. [3] Mann, D.G. and Droop, S.J.M. (1996) Biodiversity, biogeography and conservation of diatoms. In: Kristiansen, J. (ed.) Biogeography of Freshwater Algae. Hydrobiologia, Vol. 336, pp. 19-32. [4] Mou, D. and Stoermer, E.F. (1992) Separating Tabellaria (Bacillariophyceae) shape groups based on Fourier descriptors. J. of Phycology, Vol. 28 (3), pp. 386-395. [5] Niemann, H., Bunke, H., Hofmann, I., Sagerer, G. Wolf, F. and Feistel, H. (1985) A knowledge based system for analysis of gated blood pool studies. PAMI 7(3), pp. 246-259. [6] Puli, M., Fitzgibbon, A.W. and Fisher, R.B. (1996) Ellipse-specic direct least-squares tting. IEEE International Conference on Image Processing.