Unsupervised Matching of Deformable Objects
by Agglomerative Correspondence Clustering

Authors

Created Jul 25, 2009
Updated Jul 18, 2011

Abstract

Object-based image matching between unsupervised real-world images is studied in which multiple deformable objects or patterns appear with severe clutter and occlusion. Direct image matching in this unsupervised setup has been a challenging open problem in computer vision research. In such cases, establishing reliable feature-level correspondences is hindered by numerous outliers and ambiguous matches, thereby requiring an approach interleaved with discovering high-level correspondences. In the present paper, a simultaneous feature matching and clustering method is introduced to address the interconnected problems of feature matching, object matching, and outlier elimination in an effective and integrated way. The problem of object-based image matching is formulated as an unsupervised multi-class clustering problem on a set of candidate feature correspondences. A robust dissimilarity measure based on local invariant features is developed, and a novel matching algorithm are proposed in the framework of hierarchical agglomerative clustering. The algorithm leverages both photometric and geometric consistency of features, and handles a significant amount of outliers and deformation, as well as multiple clusters. The experimental evaluation of feature matching, object recognition, and object-based image matching demonstrates that the proposed method is a powerful tool for a wide range of real-world image matching problems.

Paper

Journal version 2011 paper (under review)
ICCV 2009 paper (pdf, 1.5MB)

Presentation

ICCV 2009 pdf file (poster), 2.14MB

ICCV 2009 pdf file (slides), 2.90MB

The comparative experiment on object recognition is summarized in here.

Dataset

The ETHZ toys dataset used in our paper are available from the link below.
http://www.vision.ee.ethz.ch/~vferrari/datasets.html

The images used in Sec 5.2 of ICCV paper can be available here (zip, 6.85MB).

Code

MATLAB code (zip, 5.5MB)
Updated Jul 18, 2011:
Feature extraction, dissimilarity computation, and
our upgraded algorithm (with intra-cluster 1-to-1 constraints) are all included.

Citation

Minsu Cho, Jungmin Lee, Kyoung Mu Lee. Unsupervised Matching of Deformable Objects by Agglomerative Correspondence Clustering, (under review)

Minsu Cho, Jungmin Lee, Kyoung Mu Lee. Feature Correspondence and Deformable Object Matching via Agglomerative Correspondence Clustering, IEEE International Conference on Computer Vision (ICCV) 2009

Overview

Agglomerative process of the proposed algorithm. Each correspondence is represented by the pair of dots with the same color. In (a), all initial candidate correspondences are shown. In (b), (c) and (d), only the clusters having more than one correspondence are shown for observing growing clusters. Features in each cluster is delineated by their convex hull. In (d), two largest clusters, red and green, detects two true object correspondences. The final result is shown above.

Object-based Image Matching Experiment

Here, we show unsupervised object-based image matching experiments performed on challenging real images. (Refer to our papers for several other experiments.) For evaluation of object-based image matching, 10 image pairs of different objects were collected from several public datasets. As shown in the Figures below, the dataset contains various deformed objects as well as repetitive patterns. For each pair, we generated candidate correspondences using the MSER[1] and hessian-affine detector[2] and the SIFT descriptor[2]. According to the distance of 128-dim SIFT descriptor[3], the best 1000 candidate matches were collected , allowing multiple correspondences for each feature. Then, the ground truths were manually labeled for all the candidates. To measure the dissimilarity between two candidate region correspondences (i,a) and (j,b), we adopted the affine transfer error function d_ia;jb proposed in our paper. For evaluation, the recall and precision rates are computed for each image, and the result of ACC was compared with those of SM[4], GS[5], and UNN[3] as a baseline. As shown in the results, ACC achieves successful many-to-many object matching under intra-cluster one-to-one constraint, yielding even higher recall rate than the other methods. The average performances of all the methods on the dataset are reported in Table 1. ACC significantly outperforms SM, GS, and UNN in both recall and precision rates. Note that, because the ground truths are generously labeled allowing one-to-many or many-to-one relationships, the overall recall rates are underestimated. Without any matching constraint, ACC achieves an even better recall rate (0.834) due to the generous ground truths; however, it drastically decreases the precision rate (0.479). In contrast, when using the naive one-to-one constraint, ACC produces a significantly worse recall rate (0.267) with a comparable precision (0.729) on this challenging dataset.

[1] J. Matas, O. Chum, M. Urban and T. Pajdla. "Robust Wide Baseline Stereo from Maximally Stable Extremal Regions," BMVC 2002.
[2] K. Mikolajczyk and C. Schmid, "Scale and affine invariant interest point detectors," IJCV 2004.
[3] David G. Lowe. "Object Recognition from Local Scale-Invariant Features", ICCV 1999.
[4] M. Leordeanu and M. Hebert, "A spectral technique for correspondence problems using pairwise constraints," CVPR 2005.
[5] H. Liu and S. Yan, "Common visual pattern discovery via spatially coherent correspondences," CVPR 2010.

Methods	ACC	UNN [3]	SM [4]	GS [5]
Recall (%)	61.3 %	5.9 %	40.1 %	47.7 %
Precision (%)	70.8 %	22.4 %	17.9 %	20.5 %

Table 1. Matching performance on the real image dataset
In the following figures, the algorithm, true matches per detected matches, are captioned. This experiment and the dataset are designed for producing the challenging feature matching problems where unary local features are very ambiguous.


image pair	1000 initial matches, 41 true	UNN: 7 true / 13 detected

ACC: 27 true / 31 detected	SM: 17 true / 156 detected	GS: 27 true / 155 detected

image pair	1000 initial matches, 138 true	UNN: 2 true / 15 detected

ACC: 89 true / 107 detected	SM: 46 true / 248 detected	GS: 56 true / 254 detected

image pair	1000 initial matches, 67 true	UNN: 5 true / 70 detected

ACC: 49 true / 49 detected	SM: 36 true / 375 detected	GS: 41 true / 376 detected

image pair	1000 initial matches, 57 true	UNN: 3 true / 20 detected

ACC: 55 true / 55 detected	SM: 39 true / 178 detected	GS: 48 true / 180 detected

image pair	1000 initial matches, 58 true	UNN: 5 true / 41 detected

ACC: 45 true / 84 detected	SM: 32 true / 137 detected	GS: 40 true / 135 detected

image pair	1000 initial matches, 110 true	UNN: 0 true / 8 detected

ACC: 41 true / 95 detected	SM: 27 true / 176 detected	GS: 30 true / 182 detected

image pair	1000 initial matches, 104 true	UNN: 3 true / 15 detected

ACC: 51 true / 71 detected	SM: 43 true / 249 detected	GS: 42 true / 255 detected

image pair	1000 initial matches, 127 true	UNN: 3 true / 28 detected

ACC: 61 true / 183 detected	SM: 25 true / 166 detected	GS: 28 true / 178 detected

image pair	1000 initial matches, 110 true	UNN: 8 true / 24 detected

ACC: 53 true / 70 detected	SM: 37 true / 225 detected	GS: 37 true / 225 detected

image pair	1000 initial matches, 215 true	UNN: 14 true / 24 detected

ACC: 106 true / 177 detected	SM: 63 true / 205 detected	GS: 70 true / 212 detected