Unsupervised Matching of Deformable Objects
by Agglomerative Correspondence Clustering
Authors
Created Jul 25, 2009Updated Jul 18, 2011
Abstract
Object-based image matching between unsupervised real-world images is studied in which multiple deformable objects or patterns appear with severe clutter and occlusion. Direct image matching in this unsupervised setup has been a challenging open problem in computer vision research. In such cases, establishing reliable feature-level correspondences is hindered by numerous outliers and ambiguous matches, thereby requiring an approach interleaved with discovering high-level correspondences. In the present paper, a simultaneous feature matching and clustering method is introduced to address the interconnected problems of feature matching, object matching, and outlier elimination in an effective and integrated way. The problem of object-based image matching is formulated as an unsupervised multi-class clustering problem on a set of candidate feature correspondences. A robust dissimilarity measure based on local invariant features is developed, and a novel matching algorithm are proposed in the framework of hierarchical agglomerative clustering. The algorithm leverages both photometric and geometric consistency of features, and handles a significant amount of outliers and deformation, as well as multiple clusters. The experimental evaluation of feature matching, object recognition, and object-based image matching demonstrates that the proposed method is a powerful tool for a wide range of real-world image matching problems.
Paper
Journal version 2011 paper (under review)ICCV 2009 paper (pdf, 1.5MB)
Presentation
ICCV 2009 pdf file (poster), 2.14MBICCV 2009 pdf file (slides), 2.90MB
Dataset
http://www.vision.ee.ethz.ch/~vferrari/datasets.html
Code
MATLAB code (zip, 5.5MB)Updated Jul 18, 2011:
Feature extraction, dissimilarity computation, and
our upgraded algorithm (with intra-cluster 1-to-1 constraints) are all included.
Citation
Minsu Cho, Jungmin Lee, Kyoung Mu Lee. Unsupervised Matching of Deformable Objects by Agglomerative Correspondence Clustering, (under review)
Minsu Cho, Jungmin Lee, Kyoung Mu Lee. Feature Correspondence and Deformable Object Matching via Agglomerative Correspondence Clustering, IEEE International Conference on Computer Vision (ICCV) 2009
Overview
Object-based Image Matching Experiment
Here, we show unsupervised object-based image matching experiments performed on challenging real images. (Refer to our papers for several other experiments.) For evaluation of object-based image matching, 10 image pairs of different objects were collected from several public datasets. As shown in the Figures below, the dataset contains various deformed objects as well as repetitive patterns. For each pair, we generated candidate correspondences using the MSER[1] and hessian-affine detector[2] and the SIFT descriptor[2]. According to the distance of 128-dim SIFT descriptor[3], the best 1000 candidate matches were collected , allowing multiple correspondences for each feature. Then, the ground truths were manually labeled for all the candidates. To measure the dissimilarity between two candidate region correspondences (i,a) and (j,b), we adopted the affine transfer error function dia;jb proposed in our paper. For evaluation, the recall and precision rates are computed for each image, and the result of ACC was compared with those of SM[4], GS[5], and UNN[3] as a baseline. As shown in the results, ACC achieves successful many-to-many object matching under intra-cluster one-to-one constraint, yielding even higher recall rate than the other methods. The average performances of all the methods on the dataset are reported in Table 1. ACC significantly outperforms SM, GS, and UNN in both recall and precision rates. Note that, because the ground truths are generously labeled allowing one-to-many or many-to-one relationships, the overall recall rates are underestimated. Without any matching constraint, ACC achieves an even better recall rate (0.834) due to the generous ground truths; however, it drastically decreases the precision rate (0.479). In contrast, when using the naive one-to-one constraint, ACC produces a significantly worse recall rate (0.267) with a comparable precision (0.729) on this challenging dataset.
[1] J. Matas, O. Chum, M. Urban and T. Pajdla. "Robust Wide Baseline Stereo from Maximally Stable Extremal Regions," BMVC 2002.
[2] K. Mikolajczyk and C. Schmid, "Scale and affine invariant interest point detectors," IJCV 2004.
[3] David G. Lowe. "Object Recognition from Local Scale-Invariant Features", ICCV 1999.
[4] M. Leordeanu and M. Hebert, "A spectral technique for correspondence problems using pairwise constraints," CVPR 2005.
[5] H. Liu and S. Yan, "Common visual pattern discovery via spatially coherent correspondences," CVPR 2010.
Methods | ACC | UNN [3] | SM [4] | GS [5] |
Recall (%) | 61.3 % | 5.9 % | 40.1 % | 47.7 % |
Precision (%) | 70.8 % | 22.4 % | 17.9 % | 20.5 % |
In the following figures, the algorithm, true matches per detected matches, are captioned. This experiment and the dataset are designed for producing the challenging feature matching problems where unary local features are very ambiguous.