
Automatic Registration of Images with Inconsistent Content Through Line-Support Region Segmentation and Geometrical Outlier Removal

Ming Zhao, Yongpeng Wu, Senior Member, IEEE, Shengda Pan, Fan Zhou, Bowen An, André Kaup, Fellow, IEEE
arXiv:2204.00832v1 [eess.IV] 2 Apr 2022

This work was supported in part by the National Natural Science Foundation of China under Grants 61302132, 41701523, and 61504078, the Shanghai Educational Development Foundation under Grant 13CG51, and the Scientific Research Foundation of Guangxi Education Department under Grant YB2014207.

M. Zhao, S. Pan, F. Zhou, and B. An are with the Department of Information Engineering, Shanghai Maritime University, Shanghai, 201306, China (e-mail: mingzhao@shmtu.edu.cn).

Y. Wu is with the Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China (e-mail: yongpeng.wu@sjtu.edu.cn).

A. Kaup is with the Chair of Multimedia Communications and Signal Processing, Friedrich-Alexander University Erlangen-Nürnberg, Cauerstr. 7, 91058 Erlangen, Germany (e-mail: andre.kaup@fau.de).

Abstract: The implementation of automatic image registration is still difficult in various applications. In this paper, an automatic image registration approach through line-support region segmentation and geometrical outlier removal (ALRS-GOR) is proposed. This new approach is designed to address the problems associated with the registration of images with affine deformations and inconsistent content, such as remote sensing images with different spectral content or noise interference, or map images with inconsistent annotations. To begin with, line-support regions, namely straight regions whose points share roughly the same image gradient angle, are extracted to address the issue of inconsistent content in images. To alleviate the incompleteness of line segments, an iterative strategy with multi-resolution is employed to preserve global structures that are masked at full resolution by image details or noise. Then, Geometrical Outlier Removal (GOR) is developed to provide reliable feature point matching, based on affine-invariant geometrical classifications of corresponding matches initialized by SIFT. The candidate outliers are selected by comparing the disparity of accumulated classifications among all matches, instead of relying only on local geometrical relations as conventional methods do. Various image sets have been considered in this paper for the evaluation of the proposed approach, including aerial images with simulated affine deformations, remote sensing optical and synthetic aperture radar images taken in different situations (multispectral, multisensor, and multitemporal), and map images with inconsistent annotations. Experimental results demonstrate the superior performance of the proposed method over the existing approaches for the whole data set.

Index Terms: Linear features, scale invariant feature transform, feature point matching, image segmentation, automatic image registration.

I. INTRODUCTION

Image registration is a vital yet challenging task, which aims at aligning two or more images with overlapping scenes captured at different times, by different sensors, or from different viewpoints. It has been widely applied in many fields, such as computer vision, remote sensing, medical image analysis, and pattern matching, but it is far from being commonly automated [1], [2].

Automatic image registration is still challenging due to the following particular difficulties:
• Images to be registered are usually acquired by different sensors or from different viewpoints, which causes geometrical deformations such as translation, rotation, scaling, and shearing. The scenes in the reference images do not always appear in the corresponding sensed images.
• Spectral content differences and illumination changes usually exist in multispectral, multisensor, and multitemporal images. The inconsistent spectral content increases the difficulty of corresponding feature matching in automatic registration.
• Particular interferences cause the scene content to be inconsistent between the images to be registered. For example, the speckle noise inevitably present in SAR images makes feature extraction and identification difficult. In map images, the interest icons and texts of street names are rendered for better visualization and therefore do not always undergo the same transformation as the whole map image [3], [4].

Numerous methods have been proposed for the highly desired goal of automatic image registration. These methods can be generalized into two major categories: intensity-based and feature-based [2], [5]. Intensity-based methods compare the similarity between pixel intensities to determine the image alignment. Widely used similarity measures include the normalized cross-correlation coefficient [6], mutual information [7], [8], and maximum likelihood [9]. However, these intensity-based methods are computationally expensive. Moreover, their performance declines significantly when applied to images with significant geometrical deformations, images acquired by sensors with different modalities, or images taken under different illumination conditions. Feature-based methods attempt to extract salient features from the images to be registered and establish corresponding matches between these features. Salient points, lines, curves, edges, line intersections, and regions around each feature are the most commonly used image features [10]. Feature-based methods are capable of handling significant geometric inconsistency between scenes. Moreover, they have low implementation complexity, since only a limited number of pixels are associated with the extracted features. Although feature-based methods are effective for most homologous image registration, they have limited performance when directly applied to images with illumination changes, differences in spectral content, or inconsistent objects.

As a widely used local feature descriptor, the Scale Invariant Feature Transform (SIFT) has proved to be a powerful feature point matching approach for images with geometrical deformations, noise, and, to a certain extent, illumination changes [11]. Various adapted versions of SIFT have been proposed to improve its performance, such as Principal Component Analysis (PCA)-SIFT [12], Bilateral Filter SIFT (BF-SIFT) [13], and Adaptive Binning SIFT (AB-SIFT) [14]. Nevertheless, SIFT-like methods cannot easily produce meaningful matching results when directly applied to images with significantly different spectral contents. Moreover, SIFT feature points tend to concentrate on scene areas with salient texture details, such as the interest icons and texts of street names in map images [3]. However, these features are usually inconsistent between images with different views. To preserve salient and consistent features, a segmentation stage is a good way to exclude the effects of illumination changes and inconsistent content on feature point extraction and matching [15].

Image segmentation partitions an image into regions according to given criteria and transforms the image into a binary image to distinguish objects from the background. Various image segmentation methods have been proposed, such as feature space clustering, region-based approaches, edge detection approaches, and histogram thresholding [16–20]. However, they have scarcely been adopted in image registration, with the following exceptions. Goshtasby et al. [21] proposed a region refinement to obtain similar corresponding closed-boundary regions by iterative threshold segmentation. The correspondences are determined between the centers of gravity of closed-boundary regions according to the clustering technique [22]. Since the performance of clustering-based matching depends highly on the corresponding region samples, the method in [21] is only appropriate for simple image contents. Troglio et al. [23] proposed a region-based approach to extract ellipsoidal features for planetary image registration. The watershed segmentation algorithm is adopted to identify the structures of rocks and craters according to the intensity gradients, and the optimal transformation matrix for registration is obtained by a genetic algorithm. However, this segmentation method is only adequate for simple ellipsoidal objects, such as rocks and craters entirely contained in the images. Gonçalves et al. [24] developed an automatic image registration method called HAIRIS based on histogram-based image segmentation. This method utilizes a relaxation parameter on the histogram mode delineation for segmentation. The objects extracted at the segmentation stage are characterized by four attributes, which allow for an adequate morphological description. Then, the transformation parameters are determined by restricting possible values on a statistical basis. Although it achieves subpixel accuracy, HAIRIS only applies to the registration of image pairs with geometrical differences in rotation and translation. Morago et al. [25] presented a contextual framework using an ensemble feature. The surrounding regions of keypoints are described in terms of salient structural features and rich texture information by line segment extraction and histograms of oriented gradients (HOG), respectively. Maximally stable extremal regions (MSER) are adopted to determine the neighborhood sizes. Iterative and global refinement stages using corner, edge, and gradient information across the entire image planes are implemented after combining several local keypoint and regional template matching techniques. If only a small percentage of the identified ensemble features are actually inliers, it is very unlikely that a correct image alignment will be found.

Regarding feature correspondence techniques that support the initial SIFT matches, various approaches have explored the geometrical relations between feature points to solve the feature point matching problem. A popular group of methods based on geometric transformation models formulates this problem in terms of a correspondence matrix between initial corresponding feature points with parametric or nonparametric geometrical constraints. Examples of this strategy include the classical RANSAC algorithm, which typically relies on parametric models, and Vector Field Consensus (VFC) [26], robust point matching via L2E [27], and Locally Linear Transforming (LLT) [28], which rely on nonparametric models. Ma et al. [28] developed a local geometrical constraint to preserve local structures among neighboring SIFT points. The basic idea in [28] is to formulate the feature point matching problem as a maximum-likelihood estimation of a Bayesian model with hidden/latent variables indicating whether matches in the putative sets are inliers or outliers. It is worth mentioning that the transformation between non-rigid images is modeled in a reproducing kernel Hilbert space (RKHS), and a sparse approximation is applied to the transformation, which reduces the computational complexity of the method to linearithmic. Another group of graph-based methods for feature point matching explores the similarity of graph structures between feature points to reject outliers. Aguilar et al. [29] proposed a point matching method named Graph Transformation Matching (GTM) based on finding consensus K Nearest Neighbor (KNN) graphs. This method iteratively eliminates dubious matches by selecting the maximal disparities of edges connecting KNN points. Izadi et al. [30] proposed the Weighted Graph Transformation Matching (WGTM) algorithm, which not only adopts KNN as the geometrical relation but also uses the angular distances between the edges that connect a feature point to its KNN as weights. Besides KNN, graph structures such as bilateral KNN [31], Delaunay triangulation [32], and triangle area [33] have also been explored. Candidate outliers are distinguished from initial matches by comparing their corresponding graph structures. The graph-based methods mentioned above are invariant to translation, scaling, and rotation, but not to shear deformations. As a result, they have limited performance for images with shear deformations. Moreover, only the local geometrical relations of each candidate are considered in these methods. Therefore, they may easily fail to obtain reliable matches when outliers have the same local structures. Also, inliers with outliers in their local structures are likely to be mistakenly removed.

In this paper, we propose an automatic image registration approach through line-support region segmentation and geometrical outlier removal (ALRS-GOR). The main contributions of this paper are as follows:
1) The line-support region, namely a straight region whose points share roughly the same image gradient angle, is first explored to segment the images to be registered, which alleviates the challenges of inconsistent content in image registration.
2) Geometrical Outlier Removal (GOR) is developed to eliminate outliers and preserve inliers based on affine-invariant geometrical classifications of candidate matches. Directed edges connecting any two feature points are used to classify all initial feature points according to their locations. The candidate outliers are selected by comparing the disparity of accumulated classifications for each matched pair.
3) To deal with the incompleteness of detected line segments, an iterative strategy with multi-resolution is employed to preserve global structures that are masked at full resolution by image details or noise.

ALRS-GOR allows for the registration of image pairs with affine deformations and inconsistent content, such as
multispectral remote sensing images and map images with inconsistent annotations. The experimental results demonstrate the superior performance of the proposed method over the existing designs for the whole dataset.

The remainder of this paper is organized as follows: Section II explains the motivation for each process of the proposed approach. Section III describes the proposed approach in detail. Section IV presents the performance evaluation of the proposed algorithm and illustrates experimental results with representative applications in image registration. Finally, Section V presents the concluding remarks.

II. MOTIVATION

In this section, we explain the motivation for the proposed registration approach. First, we state the reason for using line-support regions to segment images before feature point extraction, and explain the benefit of the iterative strategy that re-segments and re-matches features at multiple resolutions. Then, we describe the advantage of the proposed geometrical outlier removal for feature point matching.

A. Motivation of Utilizing Iterative Strategy of Line-support Regions as Segmentation

Image segmentation used as a preceding step for image registration simplifies the image representations and significantly reduces the inconsistent appearance of the same scene in the images to be registered. In contrast to other features, linear segments offer important information about the geometrical contents of images. Also, elaborate shapes in scenes can be easily analyzed and detected through basic line segments.

To the best of our knowledge, the line segment detector (LSD) has mainly been applied to extract linear features [34]. It aims to detect line segments in images based on the observation that most shapes admit an economical description by straight lines. The line segments are extracted from line-support regions, whose points roughly share the same image gradient angle. Although line segments describe linear features well, they are not sufficient to support SIFT extraction and matching, because it is difficult to extract SIFT feature points from line segments with limited texture details. In this paper, we adopt the line-support region as the segmentation stage for image registration for the following reasons:
1) Line-support regions are extracted based on the similarity of gradients rather than pixel intensities. Regardless of different pixel intensities, corresponding regions with inconsistent spectral content have similar gradients. Therefore, the corresponding line-support regions from multispectral/multisensor images or multitemporal images with illumination changes can be identified and matched.
2) Image details without linear features are discarded during line segment detection. As shown in [34] and [3], the subsequent feature matching is not affected by unexpected details, such as speckle noise in SAR images or inconsistent annotations in map images.
3) Line-support region extraction is efficient and fully automatic, requiring no human assistance.

However, the line segments detected by LSD are incomplete, in the sense that line features with details or noise are easily detected as overlapping segments or a few broken segments. It has been shown in [34] that analysis at a coarser resolution by LSD helps to detect global structures that are masked at full resolution by image details or noise. Coarser-resolution analysis can provide better initial conditions for feature extraction and matching, and also filters out slightly different details in the corresponding images. This is because line-support regions are extracted based on the similarity of gradients, which are easily affected by noise. The noise in the full-resolution images is reduced at coarser scales, and the coarser images also lose some details through decimation. The multi-resolution strategy has been explored in current state-of-the-art registration methods to decompose an image into fine and coarse details based on scale. For instance, wavelet and shearlet transforms are multi-resolution, so images can be decomposed into subimages with features of progressively finer scales. Different levels of wavelets and shearlets in multi-resolution pyramids are used both for invariant feature extraction and for representing images at multiple spatial resolutions to accelerate registration and increase the robustness of the algorithms [35–38]. In this paper, we propose an iterative strategy to ensure the accuracy of registration by downsampling the image to a coarser resolution and then re-segmenting and re-matching until an expected accuracy is achieved. The implementation of the iterative strategy is described in detail in Section III-C.

Fig. 1 displays an example of extracting line segments and line-support regions from an aerial image pair of a circular road at different resolutions. At the first iteration, the original image pair of 700×700 pixels is downsampled to 350×350 pixels; that is, the x-axis and y-axis are each reduced to 50% of the original size. As demonstrated in Fig. 1 (c), (f), (i), and (l) with 175×175 pixels, the original images are downsampled twice, to 25%×25% of the original size. It can be observed that the incompleteness of line segment extraction can be alleviated by downsampling the original images. Most isolated fragments disappear when the images are downsampled to a coarser resolution. Besides that, many overlapping or broken segments extracted in the original images are merged into complete segments in the downsampled images.

B. Motivation of Utilizing SIFT with Geometrical Outlier Removal as Feature Point Matching

Geometrical distortions always exist in images acquired by different sensors or from different viewing angles. According to the comparison by Mikolajczyk [39], SIFT is one of the most powerful intensity-based descriptors for local interest regions; it is invariant to image scaling and rotation, and partially invariant to intensity changes and shear deformations.

However, even after matching candidates are identified by SIFT matching, many false SIFT matches arise between similar local features of different scenes, which leads to an incorrect geometrical correction [40]. Therefore, a reliable outlier removal is needed to ensure the accuracy of feature point matching. In this paper, a geometrical outlier removal is proposed based on the premise that feature points clustered on the same side of a line remain on the same side after the geometrical transformation. Directed edges ⟨p_i ↦ p_j⟩ are used to connect any two feature points p_i and p_j. The remaining feature points are classified according to their locations relative to these directed edges, i.e., lying on the left side of ⟨p_i ↦ p_j⟩, on the right side of ⟨p_i ↦ p_j⟩, or exactly on the directed edge. Then, the unreliable candidate matches are excluded iteratively based on the disparity of the corresponding classifications.

Fig. 2 provides a demonstration of geometrical outlier removal with rotated point sets. If the feature points are all correctly matched, the classifications by any directed edges should
be identical. Let {p_1, p_2, p_3, p_4, p_5, p_6} and {q_1, q_2, q_3, q_4, q_5, q_6} be the corresponding matched feature point sets, where each p_i is correctly matched to q_i. Any pair of corresponding directed edges ⟨p_i ↦ p_j⟩ and ⟨q_i ↦ q_j⟩, ∀i, j ∈ {1, 2, 3, 4, 5, 6}, divides the remaining corresponding feature points of the respective point sets into the same classifications. An example of the classifications for ⟨p_1 ↦ p_4⟩ and ⟨q_1 ↦ q_4⟩ is shown in Fig. 2 (a): both {p_2, p_3, p_5} and {q_2, q_3, q_5} are located on the right side of ⟨p_1 ↦ p_4⟩ and ⟨q_1 ↦ q_4⟩, while {p_6} and {q_6} are on the left side. However, the classifications of the corresponding feature points will differ once outliers are added. Fig. 2 (b) shows the geometrical graphs with the addition of the outlier (p_7, q_7), displayed in red. Only the outlier (p_7, q_7) lies on different sides of ⟨p_1 ↦ p_4⟩ and ⟨q_1 ↦ q_4⟩, while the remaining corresponding points lie on the same side. Compared to directed edges established by inliers, the disparity of feature point classifications increases when the directed edges are established by outliers. An example of the classifications for ⟨p_1 ↦ p_7⟩ and ⟨q_1 ↦ q_7⟩ is shown in Fig. 2 (c): {p_2, p_3, p_4, p_5} and {q_2, q_3} are located on the right side of ⟨p_1 ↦ p_7⟩ and ⟨q_1 ↦ q_7⟩, respectively, while {p_6} and {q_4, q_5, q_6} are on the left side. Since the classifications by directed edges established by outliers differ more than those by directed edges established by inliers, we propose an iterative outlier removal method, GOR, in this paper. The disparities of classifications are accumulated over all directed edges associated with each corresponding feature point. The matches with the maximum disparity are selected as candidate outliers and removed at each iteration. The implementation of feature point matching with GOR is described in detail in Section III-B.

[Figure panels (a)-(l): line segments and line-support regions of the reference and sensed images at 700×700, 350×350, and 175×175 pixels.]
Fig. 1: Demonstration of iterative line segment detection and line-support regions. (a) and (d) Line segments from the reference and sensed images of 700×700 pixels. (b) and (e) Line segments from the reference and sensed images of 350×350 pixels. (c) and (f) Line segments from the reference and sensed images of 175×175 pixels. (g) and (j) Line-support regions of the reference and sensed images of 700×700 pixels. (h) and (k) Line-support regions of the reference and sensed images of 350×350 pixels. (i) and (l) Line-support regions of the reference and sensed images of 175×175 pixels.

III. METHODOLOGY

A. Line-Support Region Segmentation

Given an input image, a line-support region is defined as a straight region whose points roughly share the same image gradient angle. Line-support regions are extracted by grouping connected pixels that share the same gradient angle up to a certain tolerance. Each region R starts with one randomly selected pixel p_i. The region angle θ_R, orthogonal to the gradient angle, is defined as the level-line angle [34]:

$$ \theta_R = \arctan\left( \frac{\sum_{p_i \in R} \sin(\theta_{p_i})}{\sum_{p_i \in R} \cos(\theta_{p_i})} \right). \tag{1} $$

Then, adjacent pixels p_i whose level-line orientations equal the region angle up to a certain precision τ are added to the region R:

$$ R \leftarrow R \cup \left\{ p_i \;\middle|\; \left| \arctan\left( \frac{\sin\theta_{p_i}}{\cos\theta_{p_i}} \right) - \arctan\left( \frac{\sum_{p_i \in R} \sin(\theta_{p_i})}{\sum_{p_i \in R} \cos(\theta_{p_i})} \right) \right| < \tau \right\}. \tag{2} $$

The process is repeated until no new pixels can be added. The value of τ is the angle tolerance used in the search for line-support regions. A small value of τ leads to an over-partition of line segments, while a large one results in large regions. As suggested by Burns et al. [34], τ is set to 22.5° in this paper, which corresponds to eight different angle bins. There is no theory behind this parameter value, but it is supported by the results on thousands of images in [34].

In this paper, we use the extracted line-support regions to represent the segmentation of the scene. The line segment associated with each line-support region is approximated by a rectangle L_i = (o_i, θ_i, l_i, w_i), which is determined by its center point o_i, region angle θ_i, length l_i, and width w_i. In the resulting binary image, pixels belonging to the rectangular approximations of line-support regions are assigned the value 1, and all other pixels are assigned 0.

B. SIFT Feature Point Matching with Geometrical Outlier Removal

1) Initial Correspondence Obtention: The initial SIFT correspondences are established from the segmented line-support regions of the reference and sensed images. SIFT matching includes five major steps: scale-space extrema detection, keypoint localization, orientation assignment, keypoint description, and keypoint matching. As mentioned previously, SIFT matching is effectively implemented through the nearest neighbor approach. The
nearest neighbor is defined as the keypoint with the minimum Euclidean distance between 128-element SIFT descriptor vectors. The effective measure for matching is the ratio d_ratio between the distance to the nearest neighbor and the distance to the second-nearest neighbor. More details about SIFT can be found in [11].

Since the images are segmented into line-support regions, correct matches are highly likely to occur only between keypoints with the same binary value. Therefore, the initial matching on line-support regions can be simplified by only matching keypoints that have the same pixel value in the two corresponding feature point sets.

2) Geometrical Outlier Removal: Incorrect matches may still exist after SIFT matching, as mentioned before. Therefore, we propose a reliable geometrical outlier removal to exclude false matches in the segmented images. Let {p_i} and {q_i}, i = 1, 2, ..., N, be the two initial corresponding sets of matched keypoints belonging to the reference and sensed segmented images, respectively. For each targeted correspondence (p_i, q_i), there exist N − 1 directed edges ⟨p_i ↦ p_j⟩ and ⟨q_i ↦ q_j⟩ from (p_i, q_i) to every other corresponding keypoint pair (p_j, q_j). There are generally three cases for classifying the N keypoints in the feature point sets, based on the relations between the keypoints and the directed edges. Taking ⟨p_i ↦ p_j⟩ as an example, all keypoints can be classified into three groups, i.e., lying on the left side of ⟨p_i ↦ p_j⟩, on the right side of ⟨p_i ↦ p_j⟩, or on the directed edge. A quick test to distinguish the location relation between ⟨p_i ↦ p_j⟩ and p_k is provided as follows:

$$ \det(p_i \mapsto p_j, p_k) = \begin{vmatrix} x_i & x_j & x_k \\ y_i & y_j & y_k \\ 1 & 1 & 1 \end{vmatrix}, \tag{3} $$

where (x, y) denotes a keypoint location and |·| denotes the determinant of the matrix. The three relations mentioned above are distinguished by the sign of det(p_i ↦ p_j, p_k), which is positive, negative, or zero. We use diff_{i↦j}(k) to measure the disparity of classifications for each keypoint pair (p_k, q_k) by the corresponding directed edges ⟨p_i ↦ p_j⟩ and ⟨q_i ↦ q_j⟩:

$$ \mathrm{diff}_{i \mapsto j}(k) = \begin{cases} 0, & \operatorname{sign}\big(\det(p_i \mapsto p_j, p_k)\big) = \operatorname{sign}\big(\det(q_i \mapsto q_j, q_k)\big), \\ 1, & \text{otherwise}. \end{cases} \tag{4} $$

If all the keypoints in {p_i} and {q_i} are matched correctly, diff_{i↦j}(k) should be zero, since the location relationships are identical. Thus, the candidate outlier i_out, which achieves the maximum accumulated difference of classifications, is selected:

$$ i_{\mathrm{out}} = \arg\max_{i = 1, 2, \ldots, N} \sum_{j=1}^{N} \sum_{k=1}^{N} \mathrm{diff}_{i \mapsto j}(k). \tag{5} $$

Once the candidate outliers are identified, all directed edges related to them are removed accordingly. A new iteration then begins with the reduced set of residual correspondences, whose keypoints are reclassified through their location relations. The iteration stops when diff_{i↦j}(k) = 0, ∀i, j, k ∈ {1, 2, ..., N}, which indicates that there is no difference between the corresponding location relations and no candidate outlier needs to be removed.

[Figure: point sets {p_1, ..., p_7} and {q_1, ..., q_7} with directed edges. Panel classifications (reference | sensed): (a) left: p_6 | q_6; on: p_1, p_4 | q_1, q_4; right: p_2, p_3, p_5 | q_2, q_3, q_5. (b) left: p_6, p_7 | q_6; on: p_1, p_4 | q_1, q_4; right: p_2, p_3, p_5 | q_2, q_3, q_5, q_7. (c) left: p_6 | q_4, q_5, q_6; on: p_1, p_7 | q_1, q_7; right: p_2, p_3, p_4, p_5 | q_2, q_3.]
Fig. 2: Demonstrations of geometrical graphs with rotations. (a) The classifications by the directed edges ⟨p_1 ↦ p_4⟩ and ⟨q_1 ↦ q_4⟩ without outliers. (b) The classifications by the directed edges ⟨p_1 ↦ p_4⟩ and ⟨q_1 ↦ q_4⟩ with the outlier (p_7, q_7). (c) The classifications by the directed edges ⟨p_1 ↦ p_7⟩ and ⟨q_1 ↦ q_7⟩ with the outlier (p_7, q_7).

C. Iterative Strategy of ALRS-GOR

As analyzed in Section II-A, an iterative strategy with multi-resolution is proposed to detect global structures that are masked at full resolution by image details or noise. Moreover, slight differences between corresponding images with inconsistent content can be filtered out at coarser resolutions. The framework of the proposed iterative strategy is shown in Fig. 3. First, the reference and sensed images at full resolution are segmented by line-support regions. Then, the corresponding matches are obtained from the segmented images by SIFT with geometrical outlier removal. The root mean square error (RMSE) at the current resolution, Ē(L), is adopted to estimate the accuracy of the corresponding matches:

$$ \bar{E}(L) = 2^{L} \times \sqrt{ \frac{1}{N} \sum_{k=1}^{N} \left\| T(p_k, \theta^{*}) - q_k \right\|_{2}^{2} }, \tag{6} $$

where T(·) is the transformation model, and the parameters θ* are estimated from the residual matches through the common
6
model parameter estimation approach, the least squares method (LSM) [41]. Here, $L$ is the number of iterations.

Given an original image, the coarser image at iteration $L$ is generated by averaging the full-resolution image over $2^L \times 2^L$ windows. Feature re-extraction and re-matching are then performed on the coarser images. The multiresolution framework iterates until $\bar{E}(L) < \varepsilon$.

Fig. 3: The framework of ALRS-GOR. (Flowchart: line segment detection → line-support region segmentation → SIFT initial matching → classify all keypoints through the relations between the target point and the directed edges → compute the disparity of classification $\mathrm{diff}_{i \mapsto j}$ for each corresponding match → while the disparity is nonzero, remove the candidate outliers with the maximum accumulated difference of classifications → estimate the transformation parameters by LSM with the residual correspondences → if the stopping condition on $\bar{E}$ is not met, reduce the reference and sensed images in size by $2^L$ and repeat.)

D. Complexity Analysis

In this section, we discuss the execution complexity of the proposed methodology. Line-support region segmentation can be divided into two stages, i.e., computing the gradient angles and summing the regions. Computing the gradient angles takes $O(m)$ time, where $m$ is the number of image pixels. Summing the regions is proportional to the total number of pixels involved in all regions, which also requires $O(m)$. Let $n$ be the number of corresponding matches initialized by SIFT from the line-support region images. The first iteration of geometrical outlier removal breaks down into classifying feature points and searching for outliers. Classifying feature points involves two stages, i.e., establishing $n(n-1)/2$ directed edges and distinguishing the location relations of the remaining $(n-2)$ points with respect to each edge. This is the most time-consuming step of the proposed algorithm. In implementing this step, we update the directed edges and location relations in each iteration instead of re-establishing and re-distinguishing them, which reduces the time complexity of classifying feature points to $O(n^2 \log n)$. Searching for the candidate outliers with the maximum accumulated disparity of classifications requires $O(n)$. The number of iterations depends on the number of initial matches and the percentage of outliers.

IV. EXPERIMENTS AND ANALYSIS

In this section, we present experiments, run on a laptop with a 2-GHz CPU and 8-GB RAM (Intel Core i5), that validate the effectiveness of the proposed methodology. The experimental dataset is composed of three image sets (see Table I), i.e.,
1) ImgSet1: simulated images with different affine deformations, including rotation, scaling, and shear transformations;
2) ImgSet2: multispectral remote sensing images, including multispectral images with similar patterns, multispectral images with significant spectral difference, multispectral images with temporal difference, and multispectral images with SAR noise;
3) ImgSet3: images with inconsistent annotations, including navigation maps, computer-generated graphics maps, and maps overlaid with satellite images.

First, the performance of feature point extraction and matching is validated with the simulated images in ImgSet1. ImgSet2-1 and ImgSet2-2, which have poor initial inliers, are used to demonstrate the necessity of the iterative strategy. Then, the proposed approach is applied to the registration of images with affine transformation and inconsistent content, including the real remote sensing images in ImgSet2 with different spectral content or speckle noise and the map images in ImgSet3 with inconsistent annotations. Finally, the proposed registration approach is compared with other representative registration methods in terms of registration accuracy and computational complexity.

A. Feature Point Extraction and Matching Comparison

To evaluate the effectiveness of feature extraction and point matching under affine distortions, the two aerial images ImgSet1-1 and ImgSet1-2 are simulated with different affine transformations, namely rotation, scale, and shear deformations. The initial SIFT feature points are extracted and matched with the ratio threshold $d_{ratio} = 0.8$ for the original images and for the segmented line-support regions (LSR), respectively. The subsequent outlier removal is applied to the initial sets generated from the segmented line-support regions. The feature point matching of the proposed approach is compared with three matching methods, RANSAC, WGTM (K = 10), and LLT (K = 15), in terms of recall and precision [42]. Here, recall measures the proportion of the actual (initially correct) matches that are correctly retained, and precision measures the proportion of correct matches among the residual matches:

$$\mathrm{recall} = \frac{\text{residual correct matches}}{\text{initial correct matches}} \tag{7}$$

$$\mathrm{precision} = \frac{\text{residual correct matches}}{\text{residual matches}} \tag{8}$$

For the aerial image of ImgSet1-1, a segment of 512×512 pixels from the original image is selected as the reference image. A segment of the same size, taken after a simulated clockwise rotation of 120° and a simulated scale factor of 0.8, is used as the sensed image. As shown in Fig. 4 (a) and (b), the line segments extracted from the reference and sensed images are highlighted in green. The segmented line-support regions are depicted in Fig. 4 (c) and (d). The initial correspondences established between the line-support regions, shown in Fig. 4 (e), consist of 11 inliers and 79 outliers. The matching performance of RANSAC, WGTM, LLT, and GOR is demonstrated in Fig. 4 (f)-(i). Inliers of the residual correspondences are depicted by yellow lines, whereas outliers are represented by red lines. All outliers in the initial correspondences are removed by WGTM and GOR. Despite preserving as many inliers as GOR, RANSAC is unable to remove a large number of outliers. WGTM removes
not only all of the outliers but also many inliers. LLT preserves more inliers than the others. However, some stubborn outliers with similar local structures are still preserved by LLT.

TABLE I: SPECIFICATIONS OF IMAGE DATASET FOR EXPERIMENTS

| Pair | Sensor/platform | Size (pixels) | Spatial resolution | Date | Description |
|---|---|---|---|---|---|
| ImgSet1-1 | Aerial | 512×512 | 20 m | 1977 | Aerial image for simulation, from USC, San Diego |
| ImgSet1-2 | Aerial | 400×400 | 15 m | 1977 | Aerial image for simulation, from USC, airport |
| ImgSet2-1 | Landsat TM band 5 | 600×600 | 30 m | 1986 | Multispectral images with similar patterns, from UCSB |
| | Landsat TM band 7 | 600×600 | 30 m | 1988 | |
| ImgSet2-2 | SPOT band 3 | 256×256 | 20 m | 1995 | Multispectral images with small sparkling features, Brasilia, Brazil |
| | Landsat TM band 4 | 256×256 | 30 m | 1994 | |
| ImgSet2-3 | SPOT panchromatic mode | 669×539 | 10 m | unknown | Multispectral images with significant spectral difference, location unknown |
| | SPOT XS3 band | 329×278 | 20 m | unknown | |
| ImgSet2-4 | ASTER L1B band 1 | 512×512 | 15 m | 1999 | Multispectral images with speckle noise, Tokyo Bay, Japan |
| | PALSAR fine mode | 512×512 | 18 m | 2006 | |
| ImgSet2-5 | Orthophotograph green band | 512×512 | 1 m | unknown | Multispectral images with temporal difference, Porto, Portugal |
| | IKONOS panchromatic mode | 512×512 | 1 m | unknown | |
| ImgSet3-1 | Google map | 700×700 | 200 m | 2016 | Google navigation map for mobile phone, auto-rotated with direction changes, Erlangen, Germany |
| | Google map | 700×700 | 200 m | 2016 | |
| ImgSet3-2 | Google map | 650×650 | 100 m | 2016 | Google map with computer-generated graphics, zoomed in, Shanghai Oriental Pearl TV Tower, China |
| | Google map | 650×650 | 50 m | 2016 | |
| ImgSet3-3 | Google map | 650×650 | 1000 m | 2016 | Maps overlaid with real satellite images, rotated 90° and zoomed in, Erlangen, Germany |
| | Google map | 650×650 | 500 m | 2016 | |

Fig. 4: Feature point matching results for ImgSet1-1 with rotation and scale deformations. (a) Line segments of the reference image. (b) Line segments of the sensed image. (c) Line-support regions of the reference image. (d) Line-support regions of the sensed image. (e) Initial correspondences by SIFT. (f) Point correspondences by RANSAC. (g) Point correspondences by WGTM. (h) Point correspondences by GOR. (i) Point correspondences by LLT.

Fig. 5: Feature point matching results for ImgSet1-2 with shear deformations. (a) Line segments of the reference image. (b) Line segments of the sensed image. (c) Line-support regions of the reference image. (d) Line-support regions of the sensed image. (e) Initial correspondences by SIFT. (f) Point correspondences by RANSAC. (g) Point correspondences by WGTM. (h) Point correspondences by GOR. (i) Point correspondences by LLT.

For the aerial image of ImgSet1-2, the original image sheared in both the horizontal (h) and vertical (v) directions with the factors h = 0.1 and v = 0.1 is regarded as the sensed image. The detected line segments are shown in Fig. 5 (a) and (b). The 34 inliers and 144 outliers matched by SIFT are used as the initial correspondences. Compared with RANSAC and WGTM, LLT and GOR perform well in preserving inliers. Several outliers that are close to the correct locations have not been removed by LLT. The comparative results in Fig. 5 (f)-(i) indicate the advantage of the proposed GOR matching on sheared deformations.

Table II summarizes the results of SIFT feature point extraction and matching for ImgSet1, ImgSet2, and ImgSet3. SIFT feature points are extracted and matched from the original images and from the line-support regions (LSR-SIFT), respectively. Compared with SIFT applied to original images with inconsistent content, LSR-SIFT provides many more initial inliers. The comparative feature point matching results of RANSAC, WGTM, LLT, and GOR are presented in the right part of Table II. It can be observed that the proposed GOR is superior to RANSAC and WGTM in removing more outliers while preserving sufficient inliers, especially for shear deformations. RANSAC generally provides the recall values closest to those of GOR, but it degenerates much more seriously when the proportion of initial outliers is large. This is because RANSAC estimates the parameters of the transformation model from a set of correspondences containing outliers, so it can produce reasonable results only when the proportion of outliers stays within certain limits. WGTM performs well in removing outliers for images with rigid transformation. However, the performance of WGTM degenerates rapidly when dealing with shear-transformed images. This is because shear deformations make the geometrical distances and angles between vectors of corresponding matches inconsistent; accordingly, the angular distances between vectors of corresponding matches utilized in WGTM change under sheared deformations. Compared with the other three algorithms, LLT usually achieves high recall. However, the precision of LLT degrades when the initial correspondence set contains a large percentage of outliers. This is because the local structures among neighboring feature points are easily affected by a large number of outliers: the local geometrical constraint is confused by outliers with similar neighboring feature points and by inliers with neighboring outliers.

B. Iterative Strategy

As shown in Table II, feature point matching without the iterative strategy is effective for most of the image sets, except for ImgSet2-1 and ImgSet2-2. In both of these examples, all three matching methods are incapable of excluding outliers, given the poor initial inliers at full resolution. Meaningful matching results are achieved with the iterative strategy when the original images of ImgSet2-1 and ImgSet2-2 are downsampled to 50%×50% of their original sizes.

The first considered image pair, ImgSet2-1, is obtained from Landsat TM band 5 and band 7 with the same size of 600×600 pixels. Beyond the spectral difference and displacement between the two images, similar patterns clearly appear throughout the scene. At the first iteration, at full resolution, a large number of isolated line segments are extracted from the original images, as shown in Fig. 6 (c) and (d). Most line-support regions are isolated and similar in appearance, which makes it quite difficult to provide sufficient initial inliers at full resolution. As shown in Fig. 6 (e), 201 pairs of corresponding matches are initialized by SIFT, with only 10 inliers. At the second iteration, the reference and sensed images are downsampled to 50%×50% of their original size, i.e., to 300×300 pixels. The structures of the line-support regions in the second iteration are preserved much more completely. The number of inliers in the initial set increases to 13 out of 68 initial correspondences. Finally, 9 pairs of corresponding matches without any outliers are obtained by the proposed method with two iterations.

The second considered image pair, ImgSet2-2, is composed of two images with the same size of 256×256 pixels, from SPOT band 3 (0.78-0.89 µm) and Landsat TM band 4 (0.76-0.90 µm), respectively, over an urban area in Brasilia, Brazil. Although the water body is completely preserved in the first iteration, a large number of isolated line segments are extracted due to the presence of small sparkling features around the water body, as shown in Fig. 7 (c) and (d). At the second iteration, the salient structure of the water body is still preserved, while the number of isolated segments extracted from the downsampled images decreases significantly. This drastically reduces the outliers initialized by SIFT in Fig. 7 (k). Finally, 11 pairs of corresponding matches without any outliers are obtained by the proposed method with two iterations.

C. Registration for Multispectral Remote Sensing Images

The image pair of ImgSet2-3 is composed of a segment of 669×539 pixels from a panchromatic SPOT image (with a spatial resolution of 10 m) and a segment of 329×278 pixels from a SPOT image of the XS3 (near-IR) band (with a spatial resolution of 20 m). Beyond the scaling deformation between these images, significant differences exist within the same scene. For example, water appears bright in the panchromatic image but dark in the XS3 band. The extracted line segments are superimposed on the reference and sensed images in Fig. 8 (a) and (b). The main structures of the multispectral contents are preserved in the corresponding line-support regions, as shown in Fig. 8 (c) and (d). Nine correct matches are preserved by GOR in Fig. 8 (e). The coefficients of the affine transformation model are estimated by the least squares method (LSM) [41] using the residual corresponding matches. The checkerboard mosaiced image in Fig. 8 (f) shows that the features of the two multispectral images, such as the river, are precisely overlapped.

The image pair of ImgSet2-4 consists of two images with the same size of 512×512 pixels, taken by the ASTER L1B band 1 and PALSAR fine-mode sensors and covering Tokyo Bay, Japan. A substantial disparity in visual appearance can be observed between the optical and SAR images. The SAR image from PALSAR shown in Fig. 9 (b) is inevitably contaminated by speckle noise and by scattered signals from the earth's surface, which makes multispectral image registration more challenging. As illustrated in Fig. 9 (c) and (d), the line-support regions from both images preserve similar shapes. The registered image indicates that image features such as the coasts of the bay are precisely overlapped.

The image pair of ImgSet2-5 consists of an orthophotograph in the green band and a panchromatic image, both covering part of the city of Porto with the same size of 512×512 pixels. Significant changes exist between these images because of spectral differences and a one-year temporal difference. For example, several new buildings appear in the sensed image that cannot be found in the reference image. These temporal differences make the situation more complicated. As shown in Fig. 10 (c) and (d), most structures of the urban roads are preserved in the segmented line-support regions. Ten corresponding inliers are matched by GOR, and the checkerboard mosaiced image is shown in Fig. 10 (f).

D. Registration for Images with Inconsistent Annotation

ImgSet3-1 comprises two navigation maps generated by Google Maps for the Android mobile system, with the same location but different orientations. The map rotates globally clockwise, except for the bus icons and the texts of street names, which stay nearly horizontal for better visualization.

ImgSet3-2 consists of two Google Maps images with computer-generated graphics, obtained by searching for the place "Oriental Pearl TV Tower of Shanghai". The sensed map image is obtained by zooming in on the reference map to a larger scale. Note that most of the point-of-interest icons and Chinese characters remain the same size in the scaled map. Moreover, several new icons and texts of street names appear in the scaled map.

ImgSet3-3 consists of maps overlaid with real satellite images, generated by Google Maps. The maps cover part of the city of Erlangen, Germany. The reference map is the top view of Google Maps with a spatial resolution of 1000 m. The sensed map is obtained by rotating the reference map
90° clockwise and scaling it to a spatial resolution of 500 m. The icons and texts are not transformed exactly along with the map.

TABLE II: THE COMPARISON RESULTS OF FEATURE POINT MATCHING WITH OUTLIER REMOVAL

| Image pair | Iter. | SIFT initial matches | SIFT correct matches | LSR-SIFT initial matches | LSR-SIFT correct matches | RANSAC recall | RANSAC precision | WGTM recall | WGTM precision | LLT recall | LLT precision | GOR recall | GOR precision |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ImgSet1-1 | 1 | 243 | 175 | 90 | 11 | 0.24 | 0.45 | 0.36 | 1.00 | 0.91 | 0.42 | 0.64 | 1.00 |
| ImgSet1-2 | 1 | 372 | 204 | 178 | 34 | 0.76 | 0.70 | 0.12 | 0.67 | 0.97 | 0.85 | 0.47 | 1.00 |
| ImgSet2-1 | 1 | 520 | 126 | 201 | 10 | 0.10 | 0.04 | 0.00 | 0.00 | 0.60 | 0.22 | 0.20 | 0.25 |
| ImgSet2-1 | 2 | 170 | 38 | 68 | 13 | 0.92 | 0.60 | 0.62 | 0.23 | 0.92 | 0.86 | 0.77 | 1.00 |
| ImgSet2-2 | 1 | 24 | 7 | 174 | 29 | 0.10 | 0.05 | 0.21 | 0.60 | 0.83 | 0.35 | 0.03 | 0.13 |
| ImgSet2-2 | 2 | 19 | 5 | 39 | 17 | 0.59 | 0.27 | 0.41 | 1.00 | 0.88 | 1.00 | 0.65 | 1.00 |
| ImgSet2-3 | 1 | 136 | 10 | 76 | 19 | 0.85 | 0.50 | 0.53 | 0.83 | 0.79 | 0.71 | 0.53 | 1.00 |
| ImgSet2-4 | 1 | 18 | 3 | 61 | 23 | 0.65 | 0.36 | 0.26 | 1.00 | 0.70 | 1.00 | 0.39 | 1.00 |
| ImgSet2-5 | 1 | 158 | 4 | 90 | 32 | 0.44 | 0.45 | 0.13 | 1.00 | 0.66 | 0.78 | 0.34 | 1.00 |
| ImgSet3-1 | 1 | 95 | 19 | 288 | 110 | 0.87 | 0.79 | 0.29 | 0.82 | 0.67 | 0.59 | 0.34 | 1.00 |
| ImgSet3-2 | 1 | 60 | 12 | 109 | 48 | 0.25 | 0.26 | 0.15 | 1.00 | 0.60 | 0.88 | 0.23 | 1.00 |
| ImgSet3-3 | 1 | 498 | 54 | 251 | 33 | 0.67 | 0.56 | 0.12 | 0.50 | 0.61 | 0.74 | 0.52 | 1.00 |

Fig. 6: Iterative line segment extraction and matching for ImgSet2-1. (a) Line segments of the reference image at the first iteration. (b) Line segments of the sensed image at the first iteration. (c) Line-support regions of the reference image at the first iteration. (d) Line-support regions of the sensed image at the first iteration. (e) Initial correspondences by SIFT at the first iteration. (f) Point correspondences by GOR at the first iteration. (g) Line segments of the reference image at the second iteration. (h) Line segments of the sensed image at the second iteration. (i) Line-support regions of the reference image at the second iteration. (j) Line-support regions of the sensed image at the second iteration. (k) Initial correspondences by SIFT at the second iteration. (l) Point correspondences by GOR at the second iteration.

Fig. 7: Iterative line segment extraction and matching for ImgSet2-2. (a) Line segments of the reference image at the first iteration. (b) Line segments of the sensed image at the first iteration. (c) Line-support regions of the reference image at the first iteration. (d) Line-support regions of the sensed image at the first iteration. (e) Initial correspondences by SIFT at the first iteration. (f) Point correspondences by GOR at the first iteration. (g) Line segments of the reference image at the second iteration. (h) Line segments of the sensed image at the second iteration. (i) Line-support regions of the reference image at the second iteration. (j) Line-support regions of the sensed image at the second iteration. (k) Initial correspondences by SIFT at the second iteration. (l) Point correspondences by GOR at the second iteration.

The examples of the registration process for ImgSet3-1, ImgSet3-2, and ImgSet3-3 are shown in Fig. 11-13, respectively. Unlike real images, 2-D map images with computer-generated graphics are mainly composed of linear features, which makes line segment detection much easier. Therefore, most of the line graphics in both map images are extracted as line segments, as shown in Fig. 11 (a)-(b) and Fig. 12 (a)-(b). The streets can be represented by the line-support regions in Fig. 11 and Fig. 12. The map overlaid with a real satellite image in ImgSet3-3 is a complicated registration case due to the scale, rotation, and inconsistent text annotations. As for the remote sensing images, the line-support regions extracted for ImgSet3-3 are shown in Fig. 13 (c) and (d). For the map images in ImgSet3, it is worth mentioning that most of the text annotations are excluded from the line-support regions. Relying on the well-extracted line-support regions, sufficient inliers are obtained by SIFT matching equipped with GOR, as shown in Fig. 11-13 (e). Visual inspection of the registered images in Fig. 11-13 (f) suggests that the registration is valid
and accurate for map images with affine transformation and inconsistent annotations. The registration results presented in Table III also verify the effectiveness of the proposed method for registering map images.

Fig. 8: Image registration for ImgSet2-3. (a) Line segments of the reference image. (b) Line segments of the sensed image. (c) Line-support regions of the reference image. (d) Line-support regions of the sensed image. (e) Point correspondences by GOR. (f) Checkerboard mosaiced image.

E. Comparisons with Other Automatic Image Registration Methods

The proposed methodology for automatic image registration is compared with three automatic registration methods on the image sets described and demonstrated above. The first method, SIFT-MI [40], designs a coarse-to-fine scheme based on SIFT and mutual information (MI). The coarse process adopts a SIFT approach equipped with an outlier removal step that discards outliers scattered away from the cluster of the scale histogram. The subsequent fine-tuning process maximizes the mutual information. The second method, SIFT-WGTM, adopts SIFT matching as the initial set of matches and then utilizes Weighted Graph Transformation Matching (WGTM) [30] for outlier rejection. IS-SIFT [15] is the third considered method, which combines Otsu's thresholding-based image segmentation and SIFT with bivariate histogram-based outlier removal. The performance of these methods is evaluated through the following measures proposed in [43]: $N_{red}$ (number of redundant points), $RMS_{all}$ (RMSE considering all residual points together), $RMS_{LOO}$ (RMSE of the residual points computed with the leave-one-out method), and $BPP(2.0)$ (proportion of bad points with a residual norm higher than 2.0). The obtained registration results are presented in Table III, where failure cases are marked with '-'.

It can be observed that ALRS-GOR generally outperforms the other three registration methods, in particular for remote sensing images with inconsistent spectral content and for map images with inconsistent annotations. Although SIFT-WGTM achieves accuracy comparable to ALRS-GOR for images with rigid transformation, its performance is limited for ImgSet1-2 with its sheared transformation. This is because the angular distance adopted in WGTM is invariant to scaling and rotation deformations,
except for sheared transformation.

Both SIFT-MI and SIFT-WGTM rely on SIFT initial matches from the original images. They cannot deal with more complex situations, such as images with significant spectral differences and inconsistent annotations. The main reason is that initial SIFT matching on inconsistent content cannot provide sufficient correct matches, and the performance of the subsequent outlier removal decreases with fewer initial inliers. By adopting Otsu's thresholding-based method for segmentation, IS-SIFT achieves acceptable results for remote sensing images with small differences in spectral content and for map images with few annotations. However, significant differences in image content cannot be excluded by the coarse thresholding-based segmentation, so IS-SIFT is generally not precise enough in more complex situations, such as optical-SAR image pairs with noise or map images with significantly inconsistent text.

Regarding computational efficiency, feature matching is the most time-consuming stage of SIFT-based registration methods, and its cost depends on the number of detected feature points. Therefore, SIFT-MI and SIFT-WGTM are computationally more expensive than IS-SIFT and ALRS-GOR, because they require much more time to detect and match feature points in the original images, which contain more texture details than the segmented images. Compared with IS-SIFT, the proposed method has acceptable computational costs, with the exceptions of ImgSet2-1 and ImgSet2-2, where re-extraction and re-matching in the multi-resolution strategy require additional computation.

F. Sensitivity to the Stopping Condition Threshold (ε)

The proposed ALRS-GOR relies on the stopping threshold ε to terminate the iterative process. The value of ε depends on the required registration accuracy. The following experimental results provide a brief analysis of the sensitivity of the algorithm to the stopping condition threshold ε.
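To make the stopping rule concrete, here is a minimal NumPy sketch of its two ingredients: building the coarser image by averaging over $2^L \times 2^L$ windows, and evaluating $\bar{E}(L)$ of (6) for an affine model $T$. It is illustrative only and rests on assumptions. The 2×3 affine parameterization of $T$, the cropping of incomplete border windows, and the helper names `block_average`, `apply_affine`, `scaled_rmse`, and `should_stop` are hypothetical and are not taken from the authors' implementation. The factor $2^L$ presumably expresses coarse-level residuals in full-resolution pixels, so that a single ε applies at every level.

```python
import numpy as np


def block_average(img: np.ndarray, level: int) -> np.ndarray:
    """Coarser image at iteration `level`: mean over non-overlapping 2^level x 2^level windows.

    Assumption: trailing rows/columns that do not fill a whole window are cropped.
    """
    w = 2 ** level
    rows = (img.shape[0] // w) * w
    cols = (img.shape[1] // w) * w
    blocks = img[:rows, :cols].astype(float).reshape(rows // w, w, cols // w, w)
    return blocks.mean(axis=(1, 3))


def apply_affine(A: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """T(p, theta) for a 2 x 3 affine matrix A applied to N x 2 points (assumed model)."""
    return pts @ A[:, :2].T + A[:, 2]


def scaled_rmse(A: np.ndarray, p: np.ndarray, q: np.ndarray, level: int) -> float:
    """E_bar(L) = 2^L * sqrt(mean_k ||T(p_k, theta*) - q_k||^2), cf. Eq. (6)."""
    residuals = apply_affine(A, p) - q
    return float(2 ** level * np.sqrt(np.mean(np.sum(residuals ** 2, axis=1))))


def should_stop(A: np.ndarray, p: np.ndarray, q: np.ndarray, level: int, eps: float) -> bool:
    """Stopping rule of the multiresolution loop: E_bar(L) < eps."""
    return scaled_rmse(A, p, q, level) < eps
```

With this form, a smaller ε can only be met by a lower residual error, which forces more outlier-removal and resolution iterations. This is consistent with the precision/recall/time trade-off reported for the different threshold settings.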
Fig. 14 shows the precision and recall values for the three image datasets with different settings of ε, i.e., ε = 2.0, ε = 1.0, and ε = 0.5. It can be observed from Fig. 14 (a)-(c) that smaller values of ε lead to higher precision but lower recall. Achieving a smaller RMSE requires more iterations, in which more outliers are identified; accordingly, more inliers are also removed during the outlier removal iterations. In addition, a smaller stopping threshold, with its additional re-extraction and re-matching, inevitably increases the running time of the proposed algorithm. Therefore, setting ε involves a trade-off among precision, recall, and time complexity.

V. CONCLUSION AND FUTURE WORK

This paper has reported a novel automatic registration approach for images with affine transformation and inconsistent content. The approach consists of an iterative line-support region segmentation and SIFT matching equipped with geometrical outlier removal. First, line-support regions are proposed to extract linear features. Next, feature point extraction and matching on the segmented images are implemented by a SIFT approach equipped with geometrical outlier removal. The proposed feature matching method, GOR, rejects outliers and preserves inliers by comparing the disparity of geometrical relationships. Furthermore, an iterative strategy with multi-resolution is employed to alleviate the incompleteness of line segments. In our experiments, we tested the proposed method on image pairs in the following situations: aerial images with simulated affine deformations, multispectral remote sensing image pairs, and 2-D map image pairs. The experimental results show that the proposed method clearly improves the registration accuracy and achieves acceptable computational efficiency. Our future work includes incorporating appropriate clustering techniques into outlier removal for more effective feature point matching. For the registration of images with differences in terrain height, the affine transformation model is not suitable, and more appropriate transformation models should be considered to handle the influence of terrain height differences.
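The geometrical-relationship test summarized above can be illustrated with a short sketch. Under stated assumptions, it classifies each keypoint by the sign of the orientation determinant $\det(p_i \mapsto p_j, p_k)$ and counts a disparity whenever that sign differs from the sign of $\det(q_i \mapsto q_j, q_k)$. It accumulates these disparities per match over all $n(n-1)/2$ edges, then repeatedly removes the match with the largest accumulated disparity until none remains, following the loop of Fig. 3. Several choices here are assumptions made for illustration: the 0/1 disparity count, the removal of a single candidate per pass, and the names `orientation`, `accumulated_disparity`, and `gor_filter`. The sketch also recomputes every relation on each pass, so it costs $O(n^3)$ per pass rather than the incremental $O(n^2 \log n)$ update described in the complexity analysis. It further assumes an orientation-preserving transformation (no reflection), as with the rotation, scale, and shear deformations considered in this paper.

```python
from itertools import combinations

import numpy as np


def orientation(a, b, c) -> int:
    """Sign of det([[xa, ya, 1], [xb, yb, 1], [xc, yc, 1]]).

    +1: c lies left of the directed edge a -> b, -1: right of it, 0: collinear.
    """
    det = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return int(np.sign(det))


def accumulated_disparity(p, q, active):
    """For each active match k, count edges (i, j) on which p_k and q_k are classified differently."""
    idx = sorted(active)
    score = {k: 0 for k in idx}
    for i, j in combinations(idx, 2):          # n(n-1)/2 directed edges i -> j
        for k in idx:
            if k in (i, j):
                continue
            if orientation(p[i], p[j], p[k]) != orientation(q[i], q[j], q[k]):
                score[k] += 1                  # hypothetical 0/1 disparity diff_{i->j}(k)
    return score


def gor_filter(p, q):
    """Drop the match with the largest accumulated disparity until all disparities are zero."""
    active = set(range(len(p)))
    while len(active) >= 3:
        score = accumulated_disparity(p, q, active)
        worst = max(score, key=score.get)      # ties resolved by the lowest index
        if score[worst] == 0:
            break
        active.remove(worst)
    return sorted(active)
```

For example, with six points mapped by a 90° rotation, a scale factor of 2, and a shift, corrupting the sixth correspondence gives it the largest accumulated disparity. It is removed first, after which all disparities vanish and the five consistent matches are returned.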
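TABLE III: QUANTITATIVE COMPARISONS OF REGISTRATION RESULTS (-: THIS METHOD WAS NOT ABLE TO REGISTER THIS PAIR OF IMAGES.)

| Pair | AIR method | Nred | RMS_all | RMS_LOO | BPP(2.0) | Time (s) |
|---|---|---|---|---|---|---|
| ImgSet1-1 | SIFT-MI | 17 | 0.510 | 0.493 | 0.412 | 39 |
| ImgSet1-1 | SIFT-WGTM | 32 | 0.356 | 0.307 | 0.313 | 33 |
| ImgSet1-1 | IS-SIFT | 5 | 0.593 | 0.614 | 0.200 | 27 |
| ImgSet1-1 | ALRS-GOR | 7 | 0.466 | 0.392 | 0.143 | 14 |
| ImgSet1-2 | SIFT-MI | - | - | - | - | - |
| ImgSet1-2 | IS-SIFT | 29 | 22.041 | 20.730 | 0.069 | 15 |
| ImgSet1-2 | ALRS-GOR | 16 | 0.289 | 0.305 | 0.000 | 19 |
| ImgSet2-1 | SIFT-MI | 28 | 20.147 | 20.166 | 20.071 | 274 |
| ImgSet2-1 | SIFT-WGTM | 33 | 0.205 | 0.273 | 0.030 | 60 |
| ImgSet2-1 | IS-SIFT | 7 | 0.392 | 0.361 | 0.000 | 48 |
| ImgSet2-1 | ALRS-GOR | 10 | 0.168 | 0.252 | 0.000 | 56 |
| ImgSet2-2 | SIFT-MI | 4 | 0.493 | 0.607 | 0.250 | 49 |
| ImgSet2-2 | IS-SIFT | 9 | 0.308 | 0.214 | 0.222 | 37 |
| ImgSet2-2 | ALRS-GOR | 11 | 0.275 | 0.209 | 0.000 | 54 |
| ImgSet2-3 | SIFT-MI | - | - | - | - | - |
| ImgSet2-3 | SIFT-WGTM | - | - | - | - | - |
| ImgSet2-3 | IS-SIFT | 14 | 0.702 | 1.680 | 0.143 | 32 |
| ImgSet2-3 | ALRS-GOR | 10 | 0.318 | 0.475 | 0.100 | 29 |
| ImgSet2-4 | SIFT-MI | - | - | - | - | - |
| ImgSet2-4 | SIFT-WGTM | - | - | - | - | - |
| ImgSet2-4 | IS-SIFT | 7 | 39.728 | 27.632 | 0.429 | 22 |
| ImgSet2-4 | ALRS-GOR | 9 | 0.459 | 1.091 | 0.111 | 17 |
| ImgSet2-5 | SIFT-MI | - | - | - | - | - |
| ImgSet2-5 | IS-SIFT | 17 | 0.655 | 0.620 | 0.118 | 37 |
| ImgSet2-5 | ALRS-GOR | 11 | 0.274 | 0.233 | 0.000 | 28 |
| ImgSet3-1 | SIFT-MI | - | - | - | - | - |
| ImgSet3-1 | SIFT-WGTM | 14 | 0.281 | 0.326 | 0.071 | 49 |
| ImgSet3-1 | IS-SIFT | 26 | 0.626 | 0.654 | 0.115 | 32 |
| ImgSet3-1 | ALRS-GOR | 37 | 0.304 | 0.380 | 0.054 | 34 |
| ImgSet3-2 | SIFT-MI | - | - | - | - | - |
| ImgSet3-2 | IS-SIFT | - | - | - | - | - |
| ImgSet3-2 | ALRS-GOR | 11 | 0.162 | 0.187 | 0.000 | 21 |
| ImgSet3-3 | SIFT-MI | - | - | - | - | - |
| ImgSet3-3 | SIFT-WGTM | - | - | - | - | - |
| ImgSet3-3 | IS-SIFT | - | - | - | - | - |
| ImgSet3-3 | ALRS-GOR | 17 | 0.493 | 0.417 | 0.118 | 40 |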
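The accuracy measures in Table III can be approximated with a short sketch. Following the descriptions given with the comparison, it fits an affine model by least squares (LSM) to a set of control-point pairs. From that fit it reports RMS_all (residual RMSE over all points), RMS_LOO (RMSE in which each point's residual comes from a model fitted without that point), and BPP(r) (fraction of points whose residual norm exceeds r, taken here from the all-points fit). These computational details, in particular which residuals enter BPP, are our assumptions and may differ from the exact definitions in [43]. Nred is not computed, and the function names are hypothetical.

```python
import numpy as np


def fit_affine_lsm(p: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Least-squares 2 x 3 affine A with q ~ A @ [x, y, 1]^T (needs >= 3 non-collinear points)."""
    X = np.hstack([p, np.ones((len(p), 1))])       # N x 3 design matrix
    sol, *_ = np.linalg.lstsq(X, q, rcond=None)    # 3 x 2 least-squares solution
    return sol.T


def residual_norms(A: np.ndarray, p: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Euclidean residual ||A(p_k) - q_k|| for every control point."""
    return np.linalg.norm(p @ A[:, :2].T + A[:, 2] - q, axis=1)


def rms_all(p: np.ndarray, q: np.ndarray) -> float:
    """RMSE of all residuals from one fit on all points."""
    r = residual_norms(fit_affine_lsm(p, q), p, q)
    return float(np.sqrt(np.mean(r ** 2)))


def rms_loo(p: np.ndarray, q: np.ndarray) -> float:
    """Leave-one-out RMSE: each point is checked against a fit that excludes it."""
    errs = []
    for i in range(len(p)):
        keep = np.arange(len(p)) != i
        A = fit_affine_lsm(p[keep], q[keep])
        errs.append(residual_norms(A, p[i:i + 1], q[i:i + 1])[0])
    return float(np.sqrt(np.mean(np.square(errs))))


def bpp(p: np.ndarray, q: np.ndarray, r: float = 2.0) -> float:
    """Bad point proportion: share of residual norms above r (assumed: all-points fit)."""
    res = residual_norms(fit_affine_lsm(p, q), p, q)
    return float(np.mean(res > r))
```

Because each leave-one-out residual equals the ordinary least-squares residual divided by $(1 - h_{kk})$, where $h_{kk}$ is the point's leverage, RMS_LOO is never smaller than RMS_all. A large gap between the two therefore flags control points that dominate the fit.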
Fig. 14: Performance comparisons with different stopping thresholds ε. (a) ε = 2.0. (b) ε = 1.0. (c) ε = 0.5. (Each panel plots the precision and recall values (%) for image pairs 1-1, 1-2, 2-1, 2-2, 2-3, 2-4, 2-5, 3-1, 3-2, and 3-3.)
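The precision and recall values in Fig. 14 and Table II follow (7) and (8). A small sketch is given below. It assumes that matches are identified by integer indices and that ground-truth correctness is decided by a residual tolerance under a known transformation. The tolerance value and the helper names `correct_matches` and `recall_precision` are hypothetical.

```python
import numpy as np


def correct_matches(A_true: np.ndarray, p: np.ndarray, q: np.ndarray, tol: float = 2.0) -> set:
    """Indices k with ||A_true(p_k) - q_k|| <= tol (2 x 3 affine and pixel tolerance are assumed)."""
    pred = p @ A_true[:, :2].T + A_true[:, 2]
    return set(np.flatnonzero(np.linalg.norm(pred - q, axis=1) <= tol).tolist())


def recall_precision(residual, initial_correct):
    """Eq. (7) and (8); `residual` holds the indices kept after outlier removal."""
    residual, initial_correct = set(residual), set(initial_correct)
    residual_correct = residual & initial_correct
    recall = len(residual_correct) / len(initial_correct) if initial_correct else 0.0
    precision = len(residual_correct) / len(residual) if residual else 0.0
    return recall, precision
```

For instance, keeping 7 of 11 correct matches and no incorrect ones gives recall 7/11 ≈ 0.64 and precision 1.0.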
R EFERENCES voronoi integrated spectral point matching,” IEEE Trans.

Geosci. Remote Sens., vol. 53, no. 11, pp. 6058–6072,
[1] B. Zitoza and B. Flusser, “Image registration methods: Nov. 2015.
a survey,” Image Vis. Comput., vol. 21, no. 11, pp. 977– [11] D. G. Lowe, “Distinctive image features from scale-
1000, Jul. 2003. invariant keypoints,” Int. J. Comput. Vis., vol. 60, no. 2,
[2] A. Wong and D. A. Clausi, “ARRSI: Automatic regis- pp. 91–110, Nov. 2004.
tration of remote-sensing images,” IEEE Trans. Geosci. [12] F. Dellinger, J. Delon, Y. Gousseau, J. Michel, and F.
Remote Sens., vol. 45, no. 5, pp. 1483–1493, May. 2007. Tupin, “SAR-SIFT: A SIFT-Like algorithm for SAR
[3] G. Yammine, E. Wige, F. Simmet, D. Niederkorn, and images,” IEEE Trans. Geosci. Remote Sens., vol. 53,
A. Kaup, “Novel similarity-invariant line descriptor and no. 1, pp. 453–465, Jan. 2015.
matching algorithm for global motion estimation,” IEEE [13] S. H. Wang, H. J. You, and K. Fu, “BFSIFT: A novel
Trans. Circuits Syst. Video Technol., vol. 24, no. 8, pp. method to find feature matches for sar image registra-
1323–1335, Aug. 2014. tion,” IEEE Geosci. Remote Sens Lett., vol. 9, no. 4, pp.
[4] G. Yammine, E. Wige, D. Niederkorn, and A. Kaup, “A 649–653, Oct. 2012.
novel similarity-invariant line descriptor for geometric [14] A. Sedaghat and H. Ebadi, “Remote sensing image
map registration,” in Proc. IEEE ICIP, Melbourne, Sep. 2013, pp. 3017–3021.
[5] Z. Wu and A. Goshtasby, “Adaptive image registration via hierarchical Voronoi subdivision,” IEEE Trans. Image Process., vol. 21, no. 5, pp. 2464–2473, May 2012.
[6] W. Shi, F. Z. Su, R. R. Wang, and J. F. Fan, “A visual circle based image registration algorithm for optical and SAR imagery,” in Proc. IEEE IGARSS, Munich, Germany, Jul. 2012, pp. 2109–2112.
[7] H. Chen, P. Varshney, and M. Arora, “Mutual information based image registration for remote sensing data,” Int. J. Remote Sens., vol. 24, no. 18, pp. 3701–3706, Feb. 2004.
[8] S. Suri and P. Reinartz, “Mutual-information-based registration of TerraSAR-X and Ikonos imagery in urban areas,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 2, pp. 939–949, Feb. 2010.
[9] W. Li and H. Leung, “A maximum likelihood approach for image registration using control point and intensity,” IEEE Trans. Image Process., vol. 13, no. 8, pp. 1115–1127, Aug. 2004.
[10] H. Sui, C. Xu, J. Y. Liu, and F. Hua, “Automatic optical-to-SAR image registration by iterative line extraction and

matching based on adaptive binning SIFT descriptor,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 10, pp. 5283–5293, Oct. 2015.
[15] H. Gonçalves, L. Corte-Real, and J. A. Gonçalves, “Automatic image registration through image segmentation and SIFT,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 7, pp. 2589–2600, Jul. 2011.
[16] H. D. Cheng, “Color image segmentation: advances and prospects,” Pattern Recognit., vol. 34, no. 12, pp. 2259–2281, Dec. 2001.
[17] O. J. Tobias and R. Seara, “Image segmentation by histogram thresholding using fuzzy sets,” IEEE Trans. Image Process., vol. 11, no. 12, pp. 1457–1465, Dec. 2002.
[18] S. K. Pal, A. Ghosh, and B. U. Shankar, “Segmentation of remotely sensed images with fuzzy thresholding, and quantitative evaluation,” Int. J. Remote Sens., vol. 21, no. 11, pp. 2269–2300, Feb. 2000.
[19] N. R. Pal and S. K. Pal, “A review on image segmentation techniques,” Pattern Recognit., vol. 26, no. 9, pp. 1277–1294, Feb. 1993.
[20] D. Mahapatra and Y. Sun, “Integrating segmentation information for improved MRF-based elastic image registration,” IEEE Trans. Image Process., vol. 21, no. 1, pp. 170–183, Jan. 2012.
[21] A. Goshtasby, G. C. Stockman, and C. V. Page, “A region-based approach to digital image registration with subpixel accuracy,” IEEE Trans. Geosci. Remote Sens., vol. GE-24, no. 3, pp. 390–399, May 1986.
[22] G. Stockman, S. Kopstein, and S. Benett, “Matching images to models for registration and object detection via clustering,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-4, no. 3, pp. 229–241, May 1982.
[23] G. Troglio, J. Le Moigne, J. A. Benediktsson, G. Moser, and S. B. Serpico, “Automatic extraction of ellipsoidal features for planetary image registration,” IEEE Geosci. Remote Sens. Lett., vol. 9, no. 1, pp. 95–99, Jan. 2012.
[24] H. Gonçalves, J. A. Gonçalves, and L. Corte-Real, “HAIRIS: A method for automatic image registration through histogram-based image segmentation,” IEEE Trans. Image Process., vol. 20, no. 3, pp. 776–789, Mar. 2011.
[25] B. Morago, G. Bui, and Y. Duan, “An ensemble approach to image matching using contextual features,” IEEE Trans. Image Process., vol. 24, no. 11, pp. 4474–4487, 2015.
[26] J. Ma, J. Zhao, T. Tian, A. L. Yuille, and Z. Tu, “Robust point matching via vector field consensus,” IEEE Trans. Image Process., vol. 23, no. 4, pp. 1706–1721, Apr. 2014.
[27] J. Ma, W. Qiu, J. Zhao, Y. Ma, A. L. Yuille, and Z. Tu, “Robust L2E estimation of transformation for non-rigid registration,” IEEE Trans. Signal Process., vol. 63, no. 5, pp. 1115–1129, Mar. 2015.
[28] J. Ma, H. Zhou, J. Zhao, Y. Gao, J. Jiang, and J. Tian, “Robust feature matching for remote sensing image registration via locally linear transforming,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 12, pp. 6469–6481, Dec. 2015.
[29] W. Aguilar, Y. Frauel, F. Escolano, M. E. Martinez-Perez, A. Espinosa-Romero, and M. A. Lozano, “A robust graph transformation matching for non-rigid registration,” Image Vis. Comput., vol. 27, no. 7, pp. 897–910, Jun. 2009.
[30] M. Izadi and P. Saeedi, “Robust weighted graph transformation matching for rigid and nonrigid image registration,” IEEE Trans. Image Process., vol. 21, no. 10, pp. 4369–4382, Oct. 2012.
[31] M. Zhao, B. W. An, Y. P. Wu, and C. Q. Lin, “Bi-SOGC: A graph matching approach based on bilateral KNN spatial orders around geometric centers for remote sensing image registration,” IEEE Geosci. Remote Sens. Lett., vol. 10, no. 6, pp. 1429–1434, Nov. 2013.
[32] M. Zhao, B. W. An, Y. P. Wu, B. Y. Chen, and S. L. Sun, “A robust Delaunay triangulation matching for multispectral/multidate remote sensing image registration,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 4, pp. 711–715, Apr. 2015.
[33] Z. L. Song, S. G. Zhou, and J. H. Guan, “A novel image registration algorithm for remote sensing under affine transformation,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 8, pp. 4895–4912, Aug. 2014.
[34] R. G. von Gioi, J. Jakubowicz, J. M. Morel, and G. Randall, “LSD: A fast line segment detector with a false detection control,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 4, pp. 722–732, Apr. 2010.
[35] I. Zavorin and J. Le Moigne, “Use of multiresolution wavelet feature pyramids for automatic registration of multisensor imagery,” IEEE Trans. Image Process., vol. 14, no. 6, pp. 770–782, Jun. 2005.
[36] J. Le Moigne, W. J. Campbell, and R. F. Cromp, “An automated parallel image registration technique based on the correlation of wavelet features,” IEEE Trans. Geosci. Remote Sens., vol. 40, no. 8, pp. 1849–1864, Aug. 2002.
[37] J. M. Murphy, J. Le Moigne, and D. J. Harding, “Automatic image registration of multimodal remotely sensed data with global shearlet features,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 3, pp. 1685–1704, Mar. 2016.
[38] J. M. Murphy and J. Le Moigne, “Shearlet features for registration of remotely sensed multitemporal images,” in Proc. IEEE IGARSS, Milan, Jul. 2015, pp. 1084–1087.
[39] K. Mikolajczyk and C. Schmid, “A performance evaluation of local descriptors,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 10, pp. 1615–1630, Oct. 2005.
[40] M. G. Gong, S. Z. Zhao, L. C. Jiao, D. Y. Tian, and S. Wang, “A novel coarse-to-fine scheme for automatic image registration based on SIFT and mutual information,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 7, pp. 4328–4338, Jul. 2014.
[41] S. Umeyama, “Least-squares estimation of transformation parameters between two point patterns,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 13, no. 4, pp. 376–380, Apr. 1991.
[42] Z. X. Liu, J. B. An, and Y. Jing, “A simple and robust feature point matching algorithm based on restricted spatial order constraints for aerial image registration,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 5, pp. 514–527, May 2012.
[43] B. Zitoza and B. Flusser, “Measures for an objective evaluation of the geometric correction process quality,” IEEE Geosci. Remote Sens. Lett., vol. 6, no. 2, pp. 292–296, Apr. 2009.

Ming Zhao received the B.S. degree in telecommunication engineering from Wuhan University, Wuhan, China, in July 2007, and the Ph.D. degree in physical electronics from the Shanghai Institute of Technical Physics, Chinese Academy of Sciences (CAS), Shanghai, China, in June 2012. She is currently an associate professor with Shanghai Maritime University, China. From 2015 to 2016, she was a Visiting Scholar with Friedrich-Alexander University Erlangen-Nürnberg, Germany. Her research interests include remote sensing image processing, pattern recognition, and computer vision.
Yongpeng Wu (S’08–M’13) received the B.S. degree in telecommunication engineering from Wuhan University, Wuhan, China, in July 2007, and the Ph.D. degree in communication and signal processing from the National Mobile Communications Research Laboratory, Southeast University, Nanjing, China, in November 2013.
Dr. Wu is currently a Tenure-Track Associate Professor with the Department of Electronic Engineering, Shanghai Jiao Tong University, China. Previously, he was a Senior Research Fellow with the Institute for Communications Engineering, Technical University of Munich, Germany, and a Humboldt Research Fellow and Senior Research Fellow with the Institute for Digital Communications, University of Erlangen-Nürnberg, Germany. During his doctoral studies, he conducted collaborative research at the Department of Electrical Engineering, Missouri University of Science and Technology, USA. His research interests include massive MIMO/MIMO systems, physical layer security, signal processing for wireless communications, and multivariate statistical theory.
Dr. Wu was awarded the IEEE Student Travel Grant for the IEEE International Conference on Communications (ICC) 2010, the Alexander von Humboldt Fellowship in 2014, the Travel Grant for the IEEE Communication Theory Workshop 2016, the Excellent Doctoral Thesis Award of the China Communications Society 2016, and the Excellent Editor Award of IEEE Communications Letters 2017. He was an Exemplary Reviewer of the IEEE Transactions on Communications in 2015 and 2016. He is the lead guest editor of the special issue “Physical Layer Security for 5G Wireless Networks” of the IEEE Journal on Selected Areas in Communications. He is currently an editor of IEEE Access and IEEE Communications Letters. He has been a TPC member of various conferences, including Globecom, ICC, VTC, and PIMRC. Dr. Wu is an expert in physical layer security for 5G wireless networks and has extensive experience in handling technical papers.

André Kaup (M’96–SM’99–F’13) received the Dipl.-Ing. and Dr.-Ing. degrees in electrical engineering from Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen University, Aachen, Germany, in 1989 and 1995, respectively.
He was with the Institute for Communication Engineering, RWTH Aachen University, from 1989 to 1995. He joined the Networks and Multimedia Communications Department, Siemens Corporate Technology, Munich, Germany, in 1995 and became Head of the Mobile Applications and Services Group in 1999. Since 2001 he has been a Full Professor and the Head of the Chair of Multimedia Communications and Signal Processing, University of Erlangen-Nuremberg, Erlangen, Germany. From 1997 to 2001 he was the Head of the German MPEG delegation. From 2005 to 2007 he was a Vice Speaker of the DFG Collaborative Research Center 603. From 2015 to 2017 he served as Head of the Department of Electrical Engineering and Vice Dean of the Faculty of Engineering. He has authored around 350 journal and conference papers and has over 70 patents granted or pending. His research interests include image and video signal processing and coding, and multimedia communication.
André Kaup is a member of the IEEE Multimedia Signal Processing Technical Committee, a member of the scientific advisory board of the German VDE/ITG, and a Fellow of the IEEE. He served as an Associate Editor for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY and was a Guest Editor for IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING. From 1998 to 2001 he served as an Adjunct Professor with the Technical University of Munich, Munich. He was a Siemens Inventor of the Year 1998 and received the 1999 ITG Award. He has received several best paper awards, including the Paul Dan Cristea Special Award from the International Conference on Systems, Signals, and Image Processing in 2013. His group won the Grand Video Compression Challenge at the Picture Coding Symposium 2013 and he received the Teaching Award of the Faculty of Engineering in 2015.
Shengda Pan received the Ph.D. degree in circuits and systems from the University of Chinese Academy of Sciences, Shanghai, China, in 2013. He is currently a lecturer with Shanghai Maritime University, Shanghai, China. His research interests include remote sensing image processing and pattern recognition.

Fan Zhou is a lecturer with Shanghai Maritime University, Shanghai, China. He received the Ph.D. degree from the School of Remote Sensing and Information Engineering, Wuhan University, in 2014. His research interests lie in computer vision, photogrammetry, and remote sensing.

Bowen An received the M.S. degree in communication and information engineering from Wuhan University, Wuhan, China, in April 2004, and the Ph.D. degree in circuits and systems from the Shanghai Institute of Technical Physics, Chinese Academy of Sciences (CAS), Shanghai, China, in July 2006. He is currently a full professor with Shanghai Maritime University, China. His research interests include photoelectric signal acquisition and remote sensing image processing.

