Co-Recognition of Image Pairs
by Data-Driven Monte Carlo Image Exploration



We introduce a new concept of co-recognition for object-level image matching between an arbitrary image pair. Our method augments putative local region matches to reliable object-level correspondences without any supervision or prior knowledge of the common objects. It provides the number of reliable common objects and the dense correspondences between the image pair. In this paper, a generative model for co-recognition is presented. For inference, we propose data-driven Monte Carlo image exploration, which clusters and propagates local region matches by Markov chain dynamics. The global optimum is reached through the guiding force of our data-driven sampling and posterior probability model. In the experiments, we demonstrate the power and utility of the method on image retrieval and on unsupervised recognition and segmentation of multiple common objects.



ECCV 2008 paper. (pdf, 23.5MB)


pdf file (poster), 2.44MB

pdf file (slides), 8.53MB


Minsu Cho, Young Min Shin, and Kyoung Mu Lee. "Co-Recognition of Image Pairs by Data-Driven Monte Carlo Image Exploration", European Conference on Computer Vision (ECCV), 2008.



Segmentation accuracy

The segmentation accuracies reported in the paper on the above dataset are as follows.

Measure            Mickey's   Minnie's   Jigsaws   Toys     Books    Bulletins   Average
Hit Ratio          80.7 %     83.2 %     80.0 %    83.5 %   94.6 %   91.2 %      85.5 %
Background Ratio   20.6 %     37.4 %     22.8 %    25.2 %   11.8 %   16.8 %      22.4 %

The detailed explanation is presented in our paper.
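
As a rough illustration of how two such scores can be computed, the sketch below assumes that the hit ratio is the fraction of ground-truth object pixels covered by the result, and that the background ratio is the fraction of result pixels that fall outside the ground truth. These definitions, the function name, and the toy masks are assumptions made for illustration; the paper remains the authoritative source.

```python
def segmentation_ratios(ground_truth, result):
    """Return (hit_ratio, background_ratio) for two binary masks.

    Both masks are equal-sized nested lists of 0/1 values.
    Assumed definitions (not taken from this page):
      hit ratio        = |GT and Result| / |GT|
      background ratio = |Result minus GT| / |Result|
    """
    gt_pixels = 0
    result_pixels = 0
    overlap = 0
    for gt_row, res_row in zip(ground_truth, result):
        for gt, res in zip(gt_row, res_row):
            gt_pixels += gt
            result_pixels += res
            overlap += gt * res  # 1 only where both masks are set

    # Guard against empty masks to avoid division by zero.
    hit_ratio = overlap / gt_pixels if gt_pixels else 0.0
    background_ratio = (
        (result_pixels - overlap) / result_pixels if result_pixels else 0.0
    )
    return hit_ratio, background_ratio


# Toy 2x4 masks (hypothetical data, not from the dataset).
ground_truth = [[1, 1, 0, 0],
                [1, 1, 0, 0]]
result = [[0, 1, 1, 0],
          [1, 1, 0, 0]]

hit, background = segmentation_ratios(ground_truth, result)
print(f"hit ratio = {hit:.2f}, background ratio = {background:.2f}")
```

Under these assumed definitions, a higher hit ratio and a lower background ratio both indicate a better segmentation.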

A Video Clip of Co-Recognition Process

Co-Recognition is developed for recognizing and segmenting multiple common objects in an arbitrary image pair without any supervision. For an overview, we provide a video clip that shows the co-recognition process. (Click on the image on the left to download.)

A Comparative Example with CoSegmentation [8]

Fig. 5 in the cosegmentation paper [8] (below) showed that cosegmentation can be used as a more robust similarity measure than conventional global histogram matching.

However, our co-recognition provides a considerably better measure than both cosegmentation and global histogram matching. See below.

Moreover, their method segments only one common region, whereas ours handles multiple common object-level regions and provides dense correspondences (see the figure below), which their method does not. Thus, our method gives a more discriminative, accurate, and powerful measure.
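
For readers unfamiliar with the global histogram matching baseline discussed in this comparison, the following is a minimal sketch of one common variant, histogram intersection over grayscale intensities. The exact histogram measure used in [8] is not stated on this page, so the choice of intersection, the bin count, the function names, and the toy pixel lists are all assumptions for illustration.

```python
def gray_histogram(pixels, bins=4, max_value=256):
    """Normalized histogram of integer intensities in [0, max_value)."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // max_value] += 1  # map intensity to its bin
    total = len(pixels)
    return [c / total for c in counts]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Two toy "images" given as flat lists of intensities (hypothetical data).
image_a = [0, 10, 70, 130, 200, 250, 255, 64]
image_b = [5, 20, 30, 100, 140, 150, 220, 240]

h_a = gray_histogram(image_a)
h_b = gray_histogram(image_b)
print("similarity:", histogram_intersection(h_a, h_b))
```

Because such a measure pools pixels over the whole image, it ignores where the common content lies, which is the limitation that region-level measures such as cosegmentation and co-recognition address.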

A Comparative Example with Co-Saliency matching [10]

Fig. 1 in the Co-Saliency matching paper [10] (left) demonstrated that co-saliency matching improves on naive local features by integrating a segmentation cue. However, note that their method does not cover all the common regions and that the region correspondences are not very accurate.

In contrast, our co-recognition generates denser and more accurate region matches, and the segmentation result is also better. See the figure below.
In addition, we performed the same experiment as in [10]: place recognition on the ICCV 2005 contest images. The results are as follows. (For details of the experiment, see Section 6.2 of [10].)
As the figure above shows, our method outperforms co-saliency matching [10] by a large margin in this experiment (on both Test4 and Final5).

Comparison with Yuan et al.'s Common Visual Pattern Discovery [12]

The figure above shows the input image pair and the result of Yuan et al. [12], borrowed from Figs. 1 and 6 of [12]. Their result is very coarse. Moreover, it provides neither object-level correspondences nor dense correspondences. The hit ratio and background ratio are 0.76 and 0.29, respectively.

Our result above shows more accurate segmentation, together with object-level correspondences and their dense correspondences. The hit ratio and background ratio are 0.91 and 0.17, respectively.

Comparison with Ferrari et al.'s Image Exploration [6]

Ferrari et al.'s image exploration [6] was proposed for standard view-based object recognition. It therefore requires model views, each containing a single object, together with a foreground mask. In contrast, our co-recognition problem deals with multiple common objects without a foreground mask. For comparison, we applied our method to their model and test images. Note that their setting is only a sub-problem of ours. See the figure below.

The figure above shows the model image, the test image, and the result of image exploration, borrowed from [6].

In the figure above, the left image shows our co-recognition result without the foreground mask; it is comparable to the result of [6], even though no foreground mask is used. For a fair comparison, we also produced a result using the foreground mask. With the mask, our data-driven Monte Carlo image exploration shows better quality than the image exploration of [6].


This research is supported in part by: