Landmark Recognition and Retrieval: From 2D to 3D

Xian Xiao 1,2, Changsheng Xu 1,2, Jinqiao Wang 1,2, Min Xu 1,3

1 National Lab of Pattern Recognition, Institute of Automation, CAS, Beijing 100190, China
2 China-Singapore Institute of Digital Media, Singapore, 119615, Singapore
3 University of Technology, Sydney, 123 Broadway, NSW 2007, Australia

Email: {xxiao, csxu, jqwang}@nlpr.ia.ac.cn, [email protected]

ABSTRACT
Existing landmark retrieval methods do not provide a comprehensive solution that lets users view a landmark from different angles. In this paper, we propose a novel approach to reconstruct and retrieve 3D landmark models by direct 2D to 3D matching. In the offline module, we first propose an attention-based 3D reconstruction method to reconstruct sparse 3D landmark models. Second, we construct a textured 3D landmark model for each sparse 3D landmark model. Finally, a 3D landmark recognizer is built for each landmark based on its 3D landmark model. In the online module, query images are recognized by the 3D landmark recognizers using a 2D to 3D matching approach. For each recognized query image, a 3D landmark model and a 3D landmark texture model are presented as the query result. Experimental results demonstrate the effectiveness of the proposed approach.

Categories and Subject Descriptors I.4.8 [Scene Analysis]: Object recognition

General Terms Algorithms, Performance, Experimentation

Keywords Landmark retrieval, 3D Reconstruction, Matching

1. INTRODUCTION
The proliferation of social media sharing websites (e.g., Flickr, Facebook, and YouTube) has led to an enormous number of sightseeing images and videos being uploaded and shared. Most commercial search engines retrieve landmarks by matching query keywords to user tags. However, because user tags are sparse and subjective, text-based search generally cannot satisfy user needs, even with the help of sophisticated natural language processing. Visual content-based retrieval offers a new perspective on landmark search. Compared with general image search by visual content, landmark search has its own peculiarities. First, a landmark is unique, and all pictures of a landmark are shot in the same scene. Second, pictures of the same scene can differ in presentation style because of the circumstances of capture, including lighting, viewpoint, zoom, and occlusion. These peculiarities make visual landmark retrieval a very challenging problem. The state-of-the-art content-based methods [1][2][3] for landmark retrieval can only return the images most similar to the query image; they cannot retrieve images of the same landmark with different presentation styles.

As discussed above, neither text-based retrieval nor content-based retrieval obtains satisfactory results. In this paper, we attempt to improve the experience of landmark retrieval in two steps: 1) building 3D landmark models using landmark images collected from the web; 2) retrieving a 3D landmark model using an unlabeled landmark image as the query. Since we use text keywords of landmarks to search for the landmark images from which the 3D landmark models are reconstructed, each 3D landmark model has a text label. Therefore, we only use images as queries in our work.

Figure 1. Framework of reconstructing and retrieving 3D landmark models.

Fig. 1 illustrates the framework of the proposed approach, which consists of two independent modules: offline and online.

The offline module consists of four steps:

1) Iconic image selection: Landmark images are grouped into clusters by k-means on the global GIST descriptor. The images closest to the cluster centers are selected as iconic images.

2) Attention-based 3D landmark reconstruction: The visual attention regions of the selected iconic images are detected and used to reconstruct a sparse 3D model of each landmark with the structure-from-motion method.

3) Landmark texture model generation: By projecting the 3D points of a 3D model onto the iconic images, we obtain the distribution of 3D points in each iconic image. This distribution is used to select an image that covers the entire landmark. We use the texture of the selected image as the 3D surface to produce a 3D landmark texture model.

4) Landmark recognizer construction: Each 3D point in a 3D landmark model corresponds to several 2D SIFT feature points, which are extracted from different iconic images. Using all SIFT features corresponding to 3D points, a k-dimensional tree (KD-tree) is constructed to achieve fast landmark image recognition. The KD-tree of a specific landmark is called the 3D recognizer of that landmark.
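The recognizer construction in step 4 can be illustrated with a minimal sketch. It assumes that each 3D point carries a list of descriptors and that a plain KD-tree with exact nearest-neighbor search is sufficient; the function names, the dictionary layout, and the toy 3-dimensional descriptors (standing in for 128-dimensional SIFT vectors) are hypothetical and are not taken from our implementation.

```python
import math


def build_kdtree(entries, depth=0):
    """Build a KD-tree from (descriptor, point_id) pairs.

    Each node splits on one coordinate, cycling through the dimensions.
    """
    if not entries:
        return None
    axis = depth % len(entries[0][0])
    entries = sorted(entries, key=lambda e: e[0][axis])
    mid = len(entries) // 2
    return {
        "entry": entries[mid],
        "axis": axis,
        "left": build_kdtree(entries[:mid], depth + 1),
        "right": build_kdtree(entries[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return (distance, point_id) of the stored descriptor closest to query."""
    if node is None:
        return best
    desc, point_id = node["entry"]
    dist = math.dist(desc, query)
    if best is None or dist < best[0]:
        best = (dist, point_id)
    diff = query[node["axis"]] - desc[node["axis"]]
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


def build_recognizer(point_descriptors):
    """point_descriptors: {3D point id: [descriptors seen in iconic images]}."""
    entries = [(tuple(d), pid)
               for pid, descs in point_descriptors.items()
               for d in descs]
    return build_kdtree(entries)


# Toy 3-D descriptors standing in for 128-D SIFT vectors (illustrative values).
toy_model = {
    0: [(0.0, 0.0, 0.0), (0.1, 0.0, 0.1)],
    1: [(1.0, 1.0, 1.0), (0.9, 1.0, 1.1)],
    2: [(0.0, 1.0, 0.0)],
}
recognizer = build_recognizer(toy_model)
dist, point_id = nearest(recognizer, (0.95, 1.0, 1.0))
print(point_id)
```

Storing the 3D point id next to each descriptor means that a descriptor lookup directly yields a 2D to 3D correspondence. For high-dimensional descriptors, an approximate search structure would likely be preferred over the exact search sketched here.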

In the online module, we recognize a query image with the 3D landmark recognizers. The matching points between the query image and each 3D landmark recognizer are obtained by direct 2D to 3D matching. We select the landmark with the largest number of correct matching points as the retrieval result. For each recognized query image, a 3D landmark model and a 3D landmark texture model are presented.
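The online decision rule can be sketched as follows. This is an illustration under stated assumptions rather than our exact procedure: a match is accepted with a nearest-neighbor distance ratio test, a brute-force search stands in for the KD-tree lookup, and the `ratio` and `min_matches` thresholds as well as all names and toy values are hypothetical.

```python
import math


def match_points(query_descs, recognizer, ratio=0.8):
    """Return the set of 3D point ids matched by the query descriptors.

    recognizer: list of (descriptor, point_id) pairs for one landmark.
    A match is accepted when the nearest descriptor is clearly closer than
    the nearest descriptor of a different 3D point (ratio test, an assumption).
    """
    matched = set()
    for q in query_descs:
        ranked = sorted((math.dist(q, d), pid) for d, pid in recognizer)
        if not ranked:
            continue
        best_dist, best_pid = ranked[0]
        # Closest descriptor that belongs to another 3D point, if any.
        rival = next((dist for dist, pid in ranked if pid != best_pid), None)
        if rival is None or best_dist < ratio * rival:
            matched.add(best_pid)
    return matched


def recognize(query_descs, recognizers, min_matches=2):
    """Pick the landmark whose recognizer yields the most matched 3D points."""
    counts = {name: len(match_points(query_descs, rec))
              for name, rec in recognizers.items()}
    best = max(counts, key=counts.get)
    # Treat the query as a non-landmark image when too few points match.
    return (best if counts[best] >= min_matches else None), counts


# Toy 2-D descriptors; each entry is (descriptor, 3D point id).
recognizers = {
    "tower": [((0.0, 0.0), 0), ((0.0, 1.0), 1), ((1.0, 0.0), 2)],
    "bridge": [((5.0, 5.0), 0), ((5.0, 6.0), 1), ((6.0, 5.0), 2)],
}
query = [(0.05, 0.0), (0.0, 0.95), (5.1, 5.0)]
landmark, counts = recognize(query, recognizers)
print(landmark, counts)
```

Counting distinct 3D points rather than raw descriptor matches keeps several descriptors of the same 3D point from inflating the vote for one landmark.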

Compared with the existing approaches, the contributions of our work are summarized as follows: 1) we build a 3D landmark recognizer for each landmark to recognize unlabeled landmark images by direct 2D to 3D matching and retrieve a 3D landmark model; 2) we enrich the experience of landmark retrieval by returning 3D landmark texture models.

Copyright is held by the author/owner(s). J-HGBU’11, December 1, 2011, Scottsdale, Arizona, USA. ACM 978-1-4503-0998-1/11/12.

2. EXPERIMENTS
We conduct two experiments to validate the effectiveness of our proposed approach. In the first experiment, the proposed recognition approach is compared with a classifier based method [6] and a threshold based method [7]. The method in [6] detects landmark regions in the training images and extracts local features (SIFT) from these regions to train recognizers, which avoids noisy and redundant information when training classifiers and improves recognition performance. The method in [7] constructs an iconic graph to recognize query images with the global GIST feature. The second experiment illustrates several landmark retrieval results using landmark images as queries.

Our experimental dataset covers 217 landmarks. A total of 167,201 images are collected from Google Images and Flickr using the landmark names [4] as keywords. After data filtering, we cluster the images of each landmark using k-means with k = 50, a value set experimentally. From each cluster of each landmark, an iconic image is selected to build the 3D landmark model and the 3D landmark recognizer. We then randomly select one landmark image other than the iconic image from each cluster of each landmark to construct a test dataset, which yields 10,731 images. In addition, 20,000 non-landmark images are added, so the final test dataset consists of 30,731 images.
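The per-landmark clustering and iconic image selection can be sketched as below. This is a simplified illustration: it uses plain Lloyd iterations with a deterministic, evenly spaced initialization (an assumption, not necessarily the initialization we used), and toy 2-dimensional vectors stand in for GIST descriptors. All names and values are hypothetical.

```python
import math


def assign(points, centers):
    """Group point indices by their nearest center."""
    clusters = [[] for _ in centers]
    for idx, p in enumerate(points):
        nearest = min(range(len(centers)),
                      key=lambda c: math.dist(p, centers[c]))
        clusters[nearest].append(idx)
    return clusters


def kmeans(points, k, iterations=20):
    """Plain Lloyd iterations with evenly spaced initial centers."""
    centers = [points[i * len(points) // k] for i in range(k)]
    dim = len(points[0])
    for _ in range(iterations):
        clusters = assign(points, centers)
        for c, members in enumerate(clusters):
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(
                    sum(points[i][d] for i in members) / len(members)
                    for d in range(dim))
    return centers, assign(points, centers)


def select_iconic(points, k):
    """Return, for each non-empty cluster, the index of the image closest to its center."""
    centers, clusters = kmeans(points, k)
    return [min(members, key=lambda i: math.dist(points[i], centers[c]))
            for c, members in enumerate(clusters) if members]


# Toy 2-D vectors standing in for GIST descriptors of one landmark collection.
descriptors = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.3, 0.3),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (10.3, 10.3),
]
iconic = select_iconic(descriptors, k=2)
print(iconic)
```

In the experimental setting above, `k` would be 50 per landmark, and test images would then be drawn at random from the non-iconic members of each cluster.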

2.1 Landmark Recognition
The performance of landmark recognition is evaluated by plotting a recall/precision curve over the test images ordered from the highest to the lowest score. Fig. 2 compares three landmark recognition methods: 1) our proposed approach, 2) the classifier based method [6], and 3) the iconic graph based method [7]. As shown in Fig. 2, our proposed method achieves the best performance. A likely reason is that our method avoids noisy and redundant information and uses 2D to 3D matching to obtain accurate feature matches.
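A recall/precision curve of this kind can be computed as in the following sketch. It assumes that each test image has a recognition score and a flag saying whether it was assigned the correct landmark, with non-landmark images always flagged as incorrect; the function name and the toy scores are hypothetical.

```python
def precision_recall_curve(results, num_landmark_images):
    """results: (score, is_correct) per test image; returns [(recall, precision)].

    Images are ranked from highest to lowest score, and one curve point is
    emitted after each rank position.
    """
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    curve, correct = [], 0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        recall = correct / num_landmark_images
        precision = correct / rank
        curve.append((recall, precision))
    return curve


# Toy scores: four landmark images exist, three of them are recognized correctly.
toy_results = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.1, False)]
curve = precision_recall_curve(toy_results, num_landmark_images=4)
for recall, precision in curve:
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

Recall is normalized by the number of landmark images in the test set, so adding non-landmark images can only lower precision, not recall.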

2.2 3D Landmark Retrieval
Some retrieval results using the same query images are illustrated in Fig. 4, where we provide a 3D landmark model and a 3D landmark texture model for each recognized query image. The 3D models corresponding to a query image are constructed in the offline module. Compared with Fig. 3, the offline-constructed 3D models let us view the landmark from any viewpoint and at any scale by rotating and zooming the model in 3D space, which makes for an attractive landmark retrieval experience.

3. CONCLUSIONS
In this paper, we have presented a novel approach for 3D landmark model reconstruction and retrieval. To achieve high landmark retrieval performance, we build 3D landmark recognizers that recognize unlabeled landmark images by 2D to 3D matching. The retrieved 3D landmark model and 3D landmark texture model enrich the experience of landmark retrieval. Experimental results on landmark recognition and landmark retrieval have demonstrated the effectiveness of our method.

In the future, we will investigate landmark rendering and interactive landmark touring in virtual reality to further complement and enhance the proposed approach.

Figure 2. Overall performance of landmark image recognition.

Figure 4. Our landmark retrieval results.

4. ACKNOWLEDGEMENT
The research is supported by the National Natural Science Foundation of China (Grant No.: 60970092, 60905008, 60833006, 61003161) and the National Basic Research Program (973) of China under contract No. 2010CB327905.

5. REFERENCES
[1] Y. Avrithis, Y. Kalantidis, G. Tolias, E. Spyrou. Retrieving landmark and non-landmark images from community photo collections. In ACM MM, 2010.

[2] Y. Kalantidis, G. Tolias, Y. Avrithis, M. Phinikettos, E. Spyrou, P. Mylonas, S. Kollias. VIRaL: Visual image retrieval and localization. In Multimedia Tools and Applications, 2011.

[3] E. Gavves, C.G.M. Snoek. Landmark image retrieval using visual synonyms. In ACM MM, 2010.

[4] http://en.wikipedia.org/wiki/List_of_landmarks

[5] X. Xiao, C.S. Xu, Y. Rui. Video based 3D reconstruction using spatio-temporal attention analysis. In ICME, 2010.

[6] X. Xiao, C.S. Xu, J.Q. Wang. Landmark image classification using 3D point clouds. In ACM MM, 2010.

[7] X. Li, C. Wu, C. Zach, S. Lazebnik, J.M. Frahm. Modeling and recognition of landmark image collections using iconic scene graphs. In ECCV, 2008.
