Registration For Correlative Microscopy Using Image Analogies
Tian Cao (1), Christopher Zach (3), Shannon Modla (4), Debbie Powell (4), Kirk Czymmek (4), and Marc Niethammer (1,2)
(1) UNC Chapel Hill, (2) BRIC, (3) Microsoft Research Cambridge, (4) University of Delaware
Abstract. Correlative microscopy is a methodology combining the functionality of light microscopy with the high resolution of electron microscopy and other microscopy technologies for the same biological specimen. In this paper, we propose an image registration method for correlative microscopy, which is challenging due to the distinct appearance of biological structures when imaged with different modalities. Our method is based on image analogies and allows images of a given modality to be transformed into the appearance space of another modality. Hence, the registration between two different types of microscopy images can be reduced to a mono-modality image registration. We use a sparse representation model to obtain image analogies. The method makes use of representative corresponding image training patches of two different imaging modalities to learn a dictionary capturing appearance relations. We test our approach on backscattered electron (BSE) Scanning Electron Microscopy (SEM)/confocal and Transmission Electron Microscopy (TEM)/confocal images and show improvements over direct registration using a mutual-information similarity measure to account for differences in image appearance.
1 Introduction
Correlative microscopy integrates different microscopy technologies, including conventional light, confocal, and transmission electron microscopy [1], for the improved examination of biological specimens. For example, fluorescent markers can be used to highlight regions of interest, combined with an electron-microscopy image to provide high-resolution structural information of the regions. Such joint analysis requires the registration of multi-modal microscopy images. This is a challenging problem due to (large) appearance differences between the image modalities. Fig. 1 shows an example of correlative microscopy for a confocal/TEM image pair.

A solution for registration for correlative microscopy is to perform landmark-based alignment, which can be greatly simplified by adding fiducial markers [2]. However, fiducial markers cannot easily be added to some specimens; hence an alternative image-based method is needed. This can be accomplished in some cases by appropriate image filtering. This filtering is designed to only preserve information which is indicative of the desired transformation, to suppress spurious image information, or to use knowledge about the image formation process to convert an image from one modality to another. For example, multichannel microscopy images of cells can be registered by registering their cell segmentations [3]. However, such image-based approaches are highly application-specific and difficult to devise for the non-expert.
Fig. 1: Example of correlative microscopy. The goal is to align (b) to (c).

In this paper we therefore propose a method inspired by early work on texture synthesis in computer graphics using image analogies [4]. Here, the objective is to transform the appearance of one image to the appearance of another image (for example, transforming an expressionistic into an impressionistic painting). The transformation rule is learned based on example image pairs. For image registration this amounts to providing a set of (manually) aligned images of the two modalities to be registered, from which an appearance transformation rule can be learned. A multi-modal registration problem can then be converted into a mono-modal one. The learned transformation rule is still highly application-specific; however, it only requires manual alignment of sets of training images, which can easily be accomplished by a non-expert in image registration.

Arguably, transforming image appearance is not necessary when using an image similarity measure which is invariant to the observed appearance differences. In medical imaging, mutual information (MI) [5] is the similarity measure of choice for multi-modal image registration. We show for two correlative microscopy example problems that MI registration is indeed beneficial, but that registration results can be improved by combining MI with an image analogies approach. To obtain a method with better generalizability than standard image analogies [4], we devise an image analogies method using ideas from sparse coding [6], where corresponding image patches are represented by a learned basis (a dictionary). Dictionary elements capture correspondences between image patches from different modalities and therefore allow one modality to be transformed into another.

This paper is organized as follows: Sec. 2 describes the image analogies method with sparse coding and our numerical solution approach. Image registration results are shown and discussed in Sec. 3. The paper concludes with a summary of results and an outlook on future work in Sec. 4.
2 Image Analogies
The objective for image analogies [4] is to create an image $B'$ from an image $B$ with a similar relation in appearance as a training image set $(A, A')$. Fig. 2 shows an image analogies example. The standard image analogies algorithm [4] achieves the mapping between $B$ and $B'$ by looking up, for each image location, the best-matching patches between $A$ and $B$; these then imply the patch appearance for $B'$ from the corresponding patch of $A'$ ($A$ and $A'$ are assumed to be aligned). The best patches are smoothly combined to generate the overall output image $B'$. To avoid costly lookups and to obtain a more generalizable model with noise-reducing properties, we propose a sparse coding image analogies approach.
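To make the lookup step concrete, the following is a minimal sketch of a brute-force patch-based analogy in Python/NumPy. It is a simplification and not the algorithm of [4], which additionally uses multi-scale features and a coherence search. The function names `extract_patches` and `patch_lookup_analogy` and the parameters `size` and `step` are hypothetical and chosen for illustration only.

```python
import numpy as np

def extract_patches(img, size, step=1):
    """Return all size x size patches (flattened, one per row) and their top-left coordinates."""
    h, w = img.shape
    coords = [(r, c) for r in range(0, h - size + 1, step)
                     for c in range(0, w - size + 1, step)]
    patches = np.stack([img[r:r + size, c:c + size].ravel() for r, c in coords])
    return patches, coords

def patch_lookup_analogy(A, A_prime, B, size=5, step=2):
    """Match each patch of B to its nearest patch in A (SSD), copy the
    corresponding patch of A_prime, and average overlapping patches."""
    pa, _ = extract_patches(A, size, step)
    pap, _ = extract_patches(A_prime, size, step)   # aligned with pa by construction
    pb, coords_b = extract_patches(B, size, step)
    # Squared Euclidean distances between every B patch and every A patch.
    d2 = (pb ** 2).sum(1)[:, None] - 2.0 * pb @ pa.T + (pa ** 2).sum(1)[None, :]
    nearest = d2.argmin(axis=1)
    out = np.zeros(B.shape, dtype=float)
    weight = np.zeros(B.shape, dtype=float)
    for k, (r, c) in enumerate(coords_b):
        out[r:r + size, c:c + size] += pap[nearest[k]].reshape(size, size)
        weight[r:r + size, c:c + size] += 1.0
    # Pixels not covered by any patch (possible at the border) stay zero.
    return out / np.maximum(weight, 1.0)
```

Every patch of $B$ triggers a search over all patches of $A$, which is the costly lookup that motivates the sparse coding formulation.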
Fig. 2: Result of image analogy. Panels: $A$ (training TEM), $A'$ (training confocal), $B$ (input TEM), $B'$ (output confocal). Based on a training set $(A, A')$, an input image $B$ can be transformed to $B'$, which mimics $A'$ in appearance.
2.1
Sparse representation is a technique to reconstruct a signal as a linear combination of a few basis signals from a typically over-complete dictionary. A dictionary is a collection of basis signals. The number of dictionary elements in an over-complete dictionary exceeds the dimension of the signal space (here the dimension of an image patch). Suppose a dictionary $D$ is pre-defined. To sparsely represent a signal $x$, the following optimization problem is solved [7]:

$$\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_0, \quad \text{s.t.} \quad \|x - D\alpha\|_2 \le \epsilon, \tag{1}$$

where $\alpha$ is a sparse vector that explains $x$ as a linear combination of columns in dictionary $D$ with error $\epsilon$, and $\|\cdot\|_0$ indicates the number of non-zero elements in the vector $\alpha$. Solving (1) is an NP-hard problem. One possible solution of this problem is based on a relaxation that replaces $\|\alpha\|_0$ by $\|\alpha\|_1$, where $\|\cdot\|_1$ is the $L_1$ norm of a vector, resulting in the optimization problem [7]

$$\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_1, \quad \text{s.t.} \quad \|x - D\alpha\|_2 \le \epsilon. \tag{2}$$

The equivalent Lagrangian form is

$$\hat{\alpha} = \arg\min_{\alpha} \lambda\|\alpha\|_1 + \|x - D\alpha\|_2^2, \tag{3}$$

which is a convex optimization problem that can be solved efficiently [6, 8]. We adapt this formulation for our sparse coding image analogy method and learn the dictionary $D$ directly from aligned sets of training images.

2.2 Image Analogies with Sparse Representation Model
For image registration of correlative microscopy images, given two training images $A$ and $A'$ from different modalities, we can transform image $B$ to the other modality by synthesizing $B'$. Consider the sparse, dictionary-based image denoising/reconstruction $u$ given by minimizing

$$E(u, \{\alpha_i\}) = \frac{\gamma}{2}\int (Lu - f)^2\,dx + \frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{2}\|R_i u - D\alpha_i\|_V^2 + \lambda\|\alpha_i\|_1\right), \tag{4}$$

where $f$ is the given (potentially noisy) image, $D$ is the dictionary, $\{\alpha_i\}$ are the patch coefficients, $R_i$ selects the $i$-th patch from the image reconstruction $u$, $\gamma, \lambda > 0$ are balancing constants, $L$ is a linear operator (e.g., describing a convolution), and the norm is defined as $\|x\|_V^2 = x^T V x$, where $V > 0$ is positive definite. Unlike most work in sparse coding, we do not first compute the coefficients $\alpha_i$ independently per patch and then average the results [7]. Instead, we jointly optimize for the coefficients and the reconstructed/denoised image. Formulation (4) can be extended to image analogies by minimizing

$$E(u^{(1)}, u^{(2)}, \{\alpha_i\}) = \frac{\gamma}{2}\int \left[(L^{(1)}u^{(1)} - f^{(1)})^2 + (L^{(2)}u^{(2)} - f^{(2)})^2\right] dx + \frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{2}\left\| R_i \begin{pmatrix} u^{(1)} \\ u^{(2)} \end{pmatrix} - \begin{pmatrix} D^{(1)} \\ D^{(2)} \end{pmatrix}\alpha_i \right\|_V^2 + \lambda\|\alpha_i\|_1\right), \tag{5}$$

where we have a set of two images $\{f^{(1)}, f^{(2)}\}$, their reconstructions $\{u^{(1)}, u^{(2)}\}$ and corresponding dictionaries $\{D^{(1)}, D^{(2)}\}$. Note that there is only one set of coefficients $\alpha_i$ per patch, which indirectly relates the two reconstructions. This is similar to estimating a super-resolution image from a low-resolution one [7]. Patch-based (non-sparse) denoising has also been proposed for the denoising of fluorescence microscopy images [9]. A conceptually similar approach using sparse coding and image patch transfer has been proposed to relate different magnetic resonance images in [10]. However, that approach does not address dictionary learning or spatial consistency in the sparse coding stage. Our approach addresses both and learns the dictionaries $D^{(1)}$ and $D^{(2)}$ jointly.

2.3 Sparse Coding
Assuming that the two dictionaries $\{D^{(1)}, D^{(2)}\}$ are given, the objective is to minimize (5). However, unlike for image denoising, when computing image analogies only one of the images, $f^{(1)}$, is given, and we seek a reconstruction of both: a denoised version $u^{(1)}$ of $f^{(1)}$ as well as the corresponding analogous denoised image $u^{(2)}$ (without knowledge of $f^{(2)}$). Hence, for sparse coding, (5) simplifies to

$$E(u^{(1)}, u^{(2)}, \{\alpha_i\}) = \frac{\gamma}{2}\int (L^{(1)}u^{(1)} - f^{(1)})^2\,dx + \frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{2}\left\| R_i \begin{pmatrix} u^{(1)} \\ u^{(2)} \end{pmatrix} - \begin{pmatrix} D^{(1)} \\ D^{(2)} \end{pmatrix}\alpha_i \right\|_V^2 + \lambda\|\alpha_i\|_1\right), \tag{6}$$

which is a denoising of $f^{(1)}$ inducing a denoised reconstruction of the sought-for image $u^{(2)}$. The problem is convex (for given $D^{(i)}$), which allows a globally optimal solution to be computed. Sec. 2.5 describes our numerical solution approach.

2.4 Dictionary Learning
Given sets of training patches $\{p_i^{(1)}, p_i^{(2)}\}$, we want to estimate the dictionaries themselves as well as the coefficients $\{\alpha_i\}$ for the sparse coding. The problem is non-convex (bilinear in $D$ and $\alpha_i$). The standard solution approach [7] is alternating minimization, i.e., solving for $\alpha_i$ keeping $\{D^{(1)}, D^{(2)}\}$ fixed and vice versa. Two cases need to be distinguished: (i) $L$ locally invertible and (ii) $L$ not locally invertible (e.g., due to convolution). We only consider local dictionary learning here, with $L$ and $V$ set to identities (see footnote 1). We assume that the training patches $\{p_i^{(1)}, p_i^{(2)}\} = \{f_i^{(1)}, f_i^{(2)}\}$ are unrelated, non-overlapping patches. Then the dictionary learning problem decouples from the image reconstruction and requires minimization of

$$E_d(D, \{\alpha_i\}) = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{2}\left\| \begin{pmatrix} f_i^{(1)} \\ f_i^{(2)} \end{pmatrix} - \begin{pmatrix} D^{(1)} \\ D^{(2)} \end{pmatrix}\alpha_i \right\|_2^2 + \lambda\|\alpha_i\|_1\right) = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{2}\| f_i - D\alpha_i \|_2^2 + \lambda\|\alpha_i\|_1\right), \tag{7}$$

where $f_i$ and $D$ denote the stacked patches and the stacked dictionaries, respectively. The image analogy dictionary learning problem is identical to the one for image denoising. The only difference is a change in dimension for the dictionary and the patches (which are stacked up for the corresponding image sets).

2.5 Numerical Solution
Sparse Coding. We use the simultaneous-direction method of multipliers (SDMM) [8, 11], which allows us to simplify the optimization problem by breaking it into simpler subproblems.
Footnote 1: Our approach can also be applied to operators $L$ which are not locally invertible. However, this complicates the dictionary learning.
Introducing auxiliary variables, the energy can be written as

$$E = f_D^{(1)}(v^{(1)}) + f_D^{(2)}(v^{(2)}) + \frac{1}{N}\sum_{i=1}^{N}\left[ f_i^{(p)}\!\left(\begin{pmatrix} v_i^{(1)} \\ v_i^{(2)} \end{pmatrix}, \begin{pmatrix} w_i^{(1)} \\ w_i^{(2)} \end{pmatrix}\right) + f_i^{(s)}(q_i)\right] + f^{(q)}(q),$$

$$\text{s.t.} \quad \begin{cases} v^{(1)} = L^{(1)}u^{(1)} \\ v^{(2)} = L^{(2)}u^{(2)} \\ v_i^{(1)} = R_i u^{(1)} \\ v_i^{(2)} = R_i u^{(2)} \end{cases} \quad \text{and} \quad \begin{cases} w_i^{(1)} = D^{(1)}\alpha_i \\ w_i^{(2)} = D^{(2)}\alpha_i \\ q_i = W_i\alpha_i \\ q = W\alpha \end{cases}, \tag{8}$$

with the individual terms

$$f_D^{(k)}(v^{(k)}) := \frac{\gamma}{2}\|v^{(k)} - f^{(k)}\|_2^2 \;\; (k = 1, 2), \qquad f_i^{(p)}(v_i, w_i) := \frac{1}{2}\|v_i - w_i\|_V^2, \qquad f_i^{(s)}(q_i) := \lambda\|q_i\|_1,$$

and where $f^{(q)}(q)$ penalizes the norm $\|q\|_2$ of the stacked, transformed coefficients. Here we introduced separate copies of the transformed image reconstructions $u^{(1)}$ and $u^{(2)}$ as well as of the patch coefficients, and $\alpha$ denotes the stacked-up coefficients of all patches (which allows imposing spatial coherence onto the $\alpha_i$ through $W$ if desired). Following [11] we can use SDMM to solve (8). For the dictionary-based sparse coding we have three sets of transformed variables: $u^{(1)}$, $u^{(2)}$ and the $\alpha$ copies. The images may even be of different dimensionalities (for example, when dealing with a color and a gray-scale image). In our implementation of SDMM, we use $L^{(1)} = L^{(2)} = I$ and $W_i = W = I$.

Dictionary learning. We use a dictionary-based approach and hence need to be able to learn a suitable dictionary from the data. We use alternating optimization. Assuming that the coefficients $\alpha_i$ and the measured patches $\{p_i^{(1)}, p_i^{(2)}\}$ are given, we compute the current best least-squares solution for the dictionary as

$$D = \left(\sum_{i=1}^{N} p_i\alpha_i^T\right)\left(\sum_{i=1}^{N} \alpha_i\alpha_i^T\right)^{-1}, \tag{9}$$

where $p_i$ and $D$ denote the stacked patches and dictionaries. The optimization with respect to the $\alpha_i$ terms follows (for each patch independently) the SDMM algorithm. Since the local dictionary learning approach assumes that the patches to learn the dictionary from are given, the only terms remaining from Eq. (8) are $f_i^{(s)}$ and $f_i^{(p)}$, with the measured patches taking the place of the $v_i$. Hence the problem completely decouples with respect to the coefficients $\alpha_i$ and we obtain

$$E = \frac{1}{N}\sum_{i=1}^{N}\left[ f_i^{(p)}\!\left(\begin{pmatrix} w_i^{(1)} \\ w_i^{(2)} \end{pmatrix}\right) + f_i^{(s)}(q_i)\right], \quad \text{s.t.} \quad w_i^{(1)} = D^{(1)}\alpha_i, \;\; w_i^{(2)} = D^{(2)}\alpha_i, \;\; q_i = \alpha_i. \tag{10}$$

Sparse coding. Sparse coding follows the same numerical solution approach as dictionary learning. However, since the dictionaries are known at the sparse coding stage, no alternating optimization is necessary and we can simply solve for $u^{(1)}$ and $u^{(2)}$ using SDMM. The difference is that for sparse coding for image analogies the measurement of the second image, $f^{(2)}$, is unknown. Hence, $f_D^{(2)}(v^{(2)})$ is absent from the optimization and the reconstructed $u^{(2)}$ is the prediction.
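The alternating scheme of Sec. 2.4 and Sec. 2.5 can be sketched for the local case ($L = V = I$, independent patches) as follows. This is a minimal illustration and not the implementation used in the experiments: it swaps SDMM for iterative soft-thresholding (ISTA) to compute the coefficients, normalizes the dictionary atoms after each update, and adds a small ridge term to the normal equations of Eq. (9). These choices, as well as all function and parameter names, are assumptions made for the example.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(D, X, lam, n_iter=200):
    """Minimise 0.5*||x - D a||_2^2 + lam*||a||_1 for every column x of X."""
    L = np.linalg.norm(D, 2) ** 2 + 1e-12      # Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ A - X)
        A = soft_threshold(A - grad / L, lam / L)
    return A

def update_dictionary(P, A, ridge=1e-8):
    """Least-squares update D = (P A^T)(A A^T)^{-1}, cf. Eq. (9).
    The ridge term is an assumption added for numerical stability."""
    G = A @ A.T + ridge * np.eye(A.shape[0])
    return np.linalg.solve(G, A @ P.T).T

def learn_coupled_dictionary(P1, P2, n_atoms, lam, n_outer=20, seed=0):
    """Alternate between coefficients and dictionary on stacked patch pairs.
    P1, P2: arrays of shape (patch_dim, n_patches) with corresponding columns."""
    rng = np.random.default_rng(seed)
    P = np.vstack([P1, P2])                    # stack the two modalities
    D = P[:, rng.choice(P.shape[1], n_atoms, replace=False)].copy()
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(n_outer):
        A = ista(D, P, lam)                    # coefficients, dictionary fixed
        D = update_dictionary(P, A)            # dictionary, coefficients fixed
        D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    d1 = P1.shape[0]
    return D[:d1], D[d1:]                      # D^(1), D^(2)

def predict_second_modality(D1, D2, P1_new, lam):
    """Code new modality-1 patches with D^(1) only and synthesise modality-2 patches."""
    return D2 @ ista(D1, P1_new, lam)
```

The prediction step mirrors the absence of $f_D^{(2)}(v^{(2)})$ in sparse coding: only the first modality constrains the coefficients, and the second dictionary turns them into the analogous patch. Unlike the joint formulation (6), this sketch treats patches independently and leaves the blending of overlapping patches to the caller.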
3 Results
We (i) reconstruct the missing analogous image and (ii) consistently denoise the image to be registered with. We consider affine registration in our experiments, but the method is applicable to other transformation models. The key is that the training image pairs represent the expected appearance variations well.

3.1 Data

We use four pairs of 2D correlative SEM/confocal images containing 100 nm gold fiducials. The confocal image is the same in the four datasets, and the SEM images are from the same area as the confocal image but for different views and magnifications. We also have six pairs of TEM/confocal images of mouse brains with resolutions of 582.24 pixels per μm and 7.588 pixels per μm, respectively.

3.2 Registration of SEM/confocal images (with fiducials)

Pre-processing. The confocal image is denoised by the sparse representation-based denoising method [7]. We use a landmark-based registration on the fiducials to obtain the gold standard alignment result.

Image Analogies (IA) Results. We applied the standard image analogies method and our method. We trained the dictionaries using a leave-one-out method. In both image analogy methods we use 10 × 10 patches, and in our proposed method we randomly sample 20000 patches and learn 800 dictionary elements in the dictionary learning phase. We choose $\lambda = 0.2$ and $\gamma = 1$ in (6). As Fig. 3 shows, both IA methods can reconstruct the confocal image very well, but our proposed method preserves more structure than the standard IA method.

Image Registration Results. We resampled the estimated confocal images with up to 600 nm (15 pixels) in translation in the x and y directions (at steps of 1 pixel) and 15° in rotation (at steps of 1 degree) with respect to the gold standard alignment. Then we registered the resampled estimated confocal images to the corresponding original confocal images. Tab. 1 summarizes the registration results over all these experiments. Our method outperforms the standard image analogy method as well as a direct use of mutual information on the original images in terms of registration accuracy. Both image analogy methods achieve subpixel accuracy.

3.3 Registration of TEM/confocal images (without fiducials)
Pre-processing. We extracted the corresponding region of the confocal image and resampled both confocal and TEM images to an intermediate resolution. The final resolution is 14.52 pixels per μm, and the image size is about 200 × 200 pixels. The datasets are already roughly registered based on manually labeled landmarks with a similarity transformation model.

Fig. 3: Results of estimating a confocal image (b) from an SEM image (a) using the standard image analogy method (c) and our proposed sparse image analogy method (d).

Table 1: Registration errors in translation and rotation (translations $t_x$ and $t_y$ are in nm, pixel size is 40 nm; rotation $r$ is in degrees; $RMS = \sqrt{t_x^2 + t_y^2}$).

Case  Method                  r      std_r  t_x     t_y      RMS         std_RMS
1     Our method              0.171  0.191  14.687  28.451   33.5482     6.4561
1     Standard IA             0.134  0.252  15.26   27.677   32.6751     8.4876
1     Original SEM/confocal   0.401  0.157  30.584  85.708   94.2085     8.0601
2     Our method              0.165  0.258  15.537  26.462   30.6862     6.5831
2     Standard IA             0.268  0.212  14.756  28.238   32.0217     6.8241
2     Original SEM/confocal   0.557  0.530  56.392  70.312   90.5242     6.2284
3     Our method              0.246  0.537  19.924  80.512   83.7206     7.1757
3     Standard IA             0.368  0.511  20.548  79.821   84.7861     6.8433
3     Original SEM/confocal   0.368  0.372  33.452  109.054  114.469378  9.3514
4     Our method              0.226  0.583  17.069  19.024   26.3190     6.3156
4     Standard IA             0.232  0.640  13.954  25.35    29.9319     6.2327
4     Original SEM/confocal   1.27   0.776  46.278  58.724   75.3439     5.4435

Image Analogies Results. We tested the standard image analogy method and our proposed sparse method. For both image analogy methods we use 15 × 15 patches, and for our method we randomly sample 20000 patches and learn 900 dictionary elements in the dictionary learning phase. We choose $\lambda = 0.01$ and $\gamma = 1$ in (6). The image analogies results in Fig. 4 show that our proposed method preserves more local structure than the standard image analogy method.

Fig. 4: Result of estimating the confocal image (b) from the TEM image (a) for the standard image analogy method (c) and the proposed sparse image analogy method (d), which shows better preservation of structure.

Image Registration Results. We manually determined 10 to 15 corresponding landmark pairs on each dataset to establish a gold standard for registration. The same type and magnitude of shifts and rotations as for the SEM experiment are applied. The image registration results based on both image analogies methods are compared to the landmark-based image registration results using mean absolute errors (MAE) and standard deviations (STD) of the absolute errors on all the corresponding landmarks. We use both the sum of squared differences (SSD) and mutual information (MI) as similarity measures. The registration results are displayed in Tab. 2. The landmark-based image registration result is the best result achievable given the affine transformation model. We show the results for both image analogy methods as well as using the original TEM/confocal image pairs (see footnote 2).

Tab. 2 shows that the MI-based image registration results are similar among the three methods and also close to the landmark-based registration results (the best registration results). For SSD-based image registration, our proposed method is more robust than the other two methods for the current datasets. For example, using the standard image analogies method results in large MAE values in case 3 and case 5, while using the original TEM/confocal images for registration results in large MAE values in case 2 and case 6. While our method does not currently give the best results for all the cases available to us, it appears to be the most consistent, with results close to the best among all the methods investigated for all cases.

Table 2: Image registration results (in μm, pixel size is 0.069 μm). The landmark result is reported once per case.

                Our method       Standard IA      Original TEM/confocal  Landmark
Case  Measure   MAE     STD      MAE     STD      MAE     STD            MAE     STD
1     SSD       0.3174  0.2698   0.3119  0.2622   0.3353  0.2519         0.2705  0.1835
1     MI        0.3146  0.2657   0.3036  0.2601   0.5161  0.2270
2     SSD       0.3912  0.1642   0.3767  0.2160   2.5420  1.6877         0.3091  0.1594
2     MI        0.4473  0.1869   0.4747  0.3567   0.4140  0.1780
3     SSD       0.4381  0.2291   1.8940  1.0447   0.4063  0.2318         0.3636  0.1746
3     MI        0.3864  0.2649   0.4761  0.2008   0.4078  0.2608
4     SSD       0.4451  0.2194   0.4416  0.2215   0.4671  0.2484         0.3823  0.2049
4     MI        0.4554  0.2298   0.4250  0.2408   0.4740  0.2374
5     SSD       0.3271  0.2505   1.2724  0.6734   0.7204  0.3899         0.2898  0.2008
5     MI        0.3843  0.2346   0.4175  0.2429   0.4030  0.2519
6     SSD       0.7832  0.5575   0.7169  0.4975   2.2080  1.4228         0.3643  0.1435
6     MI        0.7259  0.4809   1.2772  0.4285   0.7183  0.4430

Footnote 2: We inverted the grayscale values of the original TEM image for SSD-based image registration of the original TEM/confocal images.
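The error measures reported in Tab. 1 and Tab. 2 can be computed along the following lines. This sketch assumes 2D landmarks stored as N × 2 arrays and an affine transform given by a 2 × 2 matrix `A` and a translation `t`, and it interprets the absolute landmark error as the Euclidean distance between corresponding landmarks. The function names are hypothetical, and the least-squares fit only illustrates how a landmark-based affine gold standard could be obtained.

```python
import numpy as np

def apply_affine(A, t, pts):
    """Map N x 2 points through x -> A x + t."""
    return pts @ A.T + t

def fit_affine(moving_pts, fixed_pts):
    """Least-squares affine transform mapping moving_pts onto fixed_pts."""
    X = np.hstack([moving_pts, np.ones((len(moving_pts), 1))])
    sol, *_ = np.linalg.lstsq(X, fixed_pts, rcond=None)
    return sol[:2].T, sol[2]                  # matrix A, translation t

def landmark_errors(A, t, moving_pts, fixed_pts, pixel_size=1.0):
    """Mean and standard deviation of landmark distances, scaled by pixel_size."""
    d = np.linalg.norm(apply_affine(A, t, moving_pts) - fixed_pts, axis=1)
    d = d * pixel_size
    return d.mean(), d.std()

def rms_translation_error(tx, ty):
    """RMS translation error sqrt(tx^2 + ty^2), as in the caption of Tab. 1."""
    return np.sqrt(tx ** 2 + ty ** 2)
```

With `pixel_size=0.069` the landmark errors are expressed in μm, matching the units of Tab. 2.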
4 Conclusion
We developed a multi-modal registration method for correlative microscopy. The method is based on image analogies with a sparse representation model. It estimates the transformation from one modality to another based on training datasets of two different modalities. Our image registration results suggest that the sparse image analogy method can improve registration accuracy. Our future work includes additional validation on a larger number of datasets from different modalities. Our goal is also to estimate the local quality of the image analogy result. This quality estimate could then be used to weight the registration similarity metrics to focus on regions of high confidence. We will also apply our sparse image analogy method to 3D images, which is straightforward.
Acknowledgments. This research is supported by NSF EECS-1148870, NSF EECS-0925875, NIH NIHM 5R01MH091645-02 and NIH NIBIB 5P41EB002025-28.
References
1. Caplan, J., Niethammer, M., Taylor II, R.M., Czymmek, K.J.: The power of correlative microscopy: multi-modal, multi-scale, multi-dimensional. Current Opinion in Structural Biology (2011)
2. Fronczek, D., Quammen, C., Wang, H., Kisker, C., Superfine, R., Taylor, R., Erie, D.A., Tessmer, I.: High accuracy FIONA-AFM hybrid imaging. Ultramicroscopy. Elsevier (2011)
3. Yang, S., Kohler, D., Teller, K., Cremer, T., Le Baccon, P., Heard, E., Eils, R., Rohr, K.: Nonrigid registration of 3-D multichannel microscopy images of cell nuclei. Image Processing, IEEE Transactions on. 17(4), 493–499 (2008)
4. Hertzmann, A., Jacobs, C.E., Oliver, N., Curless, B., Salesin, D.H.: Image analogies. In: the 28th annual conference on Computer graphics and interactive techniques, pp. 327–340. ACM (2001)
5. Wells III, W.M., Viola, P., Atsumi, H., Nakajima, S., Kikinis, R.: Multi-modal volume registration by maximization of mutual information. Medical Image Analysis. 1(1), 35–51 (1996)
6. Bruckstein, A.M., Donoho, D.L., Elad, M.: From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Review. 51(1), 34–81 (2009)
7. Elad, M.: Sparse and redundant representations: from theory to applications in signal and image processing. Springer Verlag (2010)
8. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Machine Learning. 3(1), 1–123 (2010)
9. Boulanger, J., Kervrann, C., Bouthemy, P., Elbau, P., Sibarita, J.B., Salamero, J.: Patch-based nonlocal functional for denoising fluorescence microscopy image sequences. Medical Imaging, IEEE Transactions on. 29(2), 442–454 (2010)
10. Roy, S., Carass, A., Prince, J.: A Compressed Sensing Approach for MR Tissue Contrast Synthesis. In: Information Processing in Medical Imaging. 371–383 (2011)
11. Combettes, P.L., Pesquet, J.C.: Proximal splitting methods in signal processing. Fixed-Point Algorithms for Inverse Problems in Science and Engineering. pp. 185–212. Springer (2011)