Unsupervised Matching of Deformable Objects
by Agglomerative Correspondence Clustering


Created Jul 25, 2009
Updated Jul 18, 2011


Object-based image matching between unsupervised real-world images is studied in which multiple deformable objects or patterns appear with severe clutter and occlusion. Direct image matching in this unsupervised setup has been a challenging open problem in computer vision research. In such cases, establishing reliable feature-level correspondences is hindered by numerous outliers and ambiguous matches, thereby requiring an approach interleaved with discovering high-level correspondences. In the present paper, a simultaneous feature matching and clustering method is introduced to address the interconnected problems of feature matching, object matching, and outlier elimination in an effective and integrated way. The problem of object-based image matching is formulated as an unsupervised multi-class clustering problem on a set of candidate feature correspondences. A robust dissimilarity measure based on local invariant features is developed, and a novel matching algorithm are proposed in the framework of hierarchical agglomerative clustering. The algorithm leverages both photometric and geometric consistency of features, and handles a significant amount of outliers and deformation, as well as multiple clusters. The experimental evaluation of feature matching, object recognition, and object-based image matching demonstrates that the proposed method is a powerful tool for a wide range of real-world image matching problems.

paper thumbnail


Journal version 2011 paper (under review)
ICCV 2009 paper (pdf, 1.5MB)


ICCV 2009 pdf file (poster), 2.14MB

ICCV 2009 pdf file (slides), 2.90MB

  • The comparative experiment on object recognition is summarized in here.


  • The ETHZ toys dataset used in our paper are available from the link below.
  • The images used in Sec 5.2 of ICCV paper can be available here (zip, 6.85MB).


    MATLAB code (zip, 5.5MB)
    Updated Jul 18, 2011:
    Feature extraction, dissimilarity computation, and
    our upgraded algorithm (with intra-cluster 1-to-1 constraints) are all included.


    Minsu Cho, Jungmin Lee, Kyoung Mu Lee. Unsupervised Matching of Deformable Objects by Agglomerative Correspondence Clustering, (under review)

    Minsu Cho, Jungmin Lee, Kyoung Mu Lee. Feature Correspondence and Deformable Object Matching via Agglomerative Correspondence Clustering, IEEE International Conference on Computer Vision (ICCV) 2009


    Agglomerative process of the proposed algorithm. Each correspondence is represented by the pair of dots with the same color. In (a), all initial candidate correspondences are shown. In (b), (c) and (d), only the clusters having more than one correspondence are shown for observing growing clusters. Features in each cluster is delineated by their convex hull. In (d), two largest clusters, red and green, detects two true object correspondences. The final result is shown above.

    Object-based Image Matching Experiment

    Here, we show unsupervised object-based image matching experiments performed on challenging real images. (Refer to our papers for several other experiments.) For evaluation of object-based image matching, 10 image pairs of different objects were collected from several public datasets. As shown in the Figures below, the dataset contains various deformed objects as well as repetitive patterns. For each pair, we generated candidate correspondences using the MSER[1] and hessian-affine detector[2] and the SIFT descriptor[2]. According to the distance of 128-dim SIFT descriptor[3], the best 1000 candidate matches were collected , allowing multiple correspondences for each feature. Then, the ground truths were manually labeled for all the candidates. To measure the dissimilarity between two candidate region correspondences (i,a) and (j,b), we adopted the affine transfer error function dia;jb proposed in our paper. For evaluation, the recall and precision rates are computed for each image, and the result of ACC was compared with those of SM[4], GS[5], and UNN[3] as a baseline. As shown in the results, ACC achieves successful many-to-many object matching under intra-cluster one-to-one constraint, yielding even higher recall rate than the other methods. The average performances of all the methods on the dataset are reported in Table 1. ACC significantly outperforms SM, GS, and UNN in both recall and precision rates. Note that, because the ground truths are generously labeled allowing one-to-many or many-to-one relationships, the overall recall rates are underestimated. Without any matching constraint, ACC achieves an even better recall rate (0.834) due to the generous ground truths; however, it drastically decreases the precision rate (0.479). In contrast, when using the naive one-to-one constraint, ACC produces a significantly worse recall rate (0.267) with a comparable precision (0.729) on this challenging dataset.

    [1] J. Matas, O. Chum, M. Urban and T. Pajdla. "Robust Wide Baseline Stereo from Maximally Stable Extremal Regions," BMVC 2002.
    [2] K. Mikolajczyk and C. Schmid, "Scale and affine invariant interest point detectors," IJCV 2004.
    [3] David G. Lowe. "Object Recognition from Local Scale-Invariant Features", ICCV 1999.
    [4] M. Leordeanu and M. Hebert, "A spectral technique for correspondence problems using pairwise constraints," CVPR 2005.
    [5] H. Liu and S. Yan, "Common visual pattern discovery via spatially coherent correspondences," CVPR 2010.

    Methods ACC UNN [3] SM [4] GS [5]
    Recall (%) 61.3 % 5.9 % 40.1 % 47.7 %
    Precision (%) 70.8 % 22.4 % 17.9 % 20.5 %
    Table 1. Matching performance on the real image dataset

    In the following figures, the algorithm, true matches per detected matches, are captioned. This experiment and the dataset are designed for producing the challenging feature matching problems where unary local features are very ambiguous.

    image pair
    1000 initial matches, 41 true
    UNN: 7 true / 13 detected
    ACC: 27 true / 31 detected
    SM: 17 true / 156 detected
    GS: 27 true / 155 detected
    image pair
    1000 initial matches, 138 true
    UNN: 2 true / 15 detected
    ACC: 89 true / 107 detected
    SM: 46 true / 248 detected
    GS: 56 true / 254 detected
    image pair
    1000 initial matches, 67 true
    UNN: 5 true / 70 detected
    ACC: 49 true / 49 detected
    SM: 36 true / 375 detected
    GS: 41 true / 376 detected
    image pair
    1000 initial matches, 57 true
    UNN: 3 true / 20 detected
    ACC: 55 true / 55 detected
    SM: 39 true / 178 detected
    GS: 48 true / 180 detected
    image pair
    1000 initial matches, 58 true
    UNN: 5 true / 41 detected
    ACC: 45 true / 84 detected
    SM: 32 true / 137 detected
    GS: 40 true / 135 detected
    image pair
    1000 initial matches, 110 true
    UNN: 0 true / 8 detected
    ACC: 41 true / 95 detected
    SM: 27 true / 176 detected
    GS: 30 true / 182 detected
    image pair
    1000 initial matches, 104 true
    UNN: 3 true / 15 detected
    ACC: 51 true / 71 detected
    SM: 43 true / 249 detected
    GS: 42 true / 255 detected
    image pair
    1000 initial matches, 127 true
    UNN: 3 true / 28 detected
    ACC: 61 true / 183 detected
    SM: 25 true / 166 detected
    GS: 28 true / 178 detected
    image pair
    1000 initial matches, 110 true
    UNN: 8 true / 24 detected
    ACC: 53 true / 70 detected
    SM: 37 true / 225 detected
    GS: 37 true / 225 detected
    image pair
    1000 initial matches, 215 true
    UNN: 14 true / 24 detected
    ACC: 106 true / 177 detected
    SM: 63 true / 205 detected
    GS: 70 true / 212 detected