Project n° 1 Topic: Image and video processing

Plant identification for ImageClef contest 2013

Context: Plant species identification is usually a very difficult task, even for professionals (such as farmers or wood exploiters) or for botanists themselves. Using image retrieval technologies is nowadays considered by botanists as a promising direction for reducing this taxonomic gap. Recently, a number of works have been proposed for plant species identification based on image retrieval. In order to evaluate the performance of these works, ImageClef has organized a plant identification task since 2011.

Figure 1: Images of different organs of the plant, with different views and two background conditions (SheetAsBackground and NaturalBackground), in the ImageClef 2013 database

The main novelties of the ImageClef plant identification task 2013, compared to last year, are the following [1]:
- More species: the number of species will be about 250 this year, which is an important step towards covering the entire flora of a given region.
- Multi-view plant retrieval vs. leaf-based retrieval: query and test pictures will now cover different organs or views of the individual plants, such as leaf, flower, stem, fruit and the entire plant, and not only their leaves.

The performance of a plant identification method will be evaluated by the following metric:


S = (1/U) * Σ_{u=1}^{U} (1/P_u) * Σ_{p=1}^{P_u} (1/N_{u,p}) * Σ_{n=1}^{N_{u,p}} S_{u,p,n}

where
U: number of users (who have at least one image in the test data)
P_u: number of individual plants observed by the u-th user
N_{u,p}: number of pictures taken from the p-th plant observed by the u-th user
S_{u,p,n}: score between 0 and 1, equal to the inverse of the rank of the correct species (for the n-th picture taken from the p-th plant observed by the u-th user)

The Computer Vision Department of the MICA institute participated in this task in 2013 and obtained a good result for leaf images with sheet as background [2]. Our work is based on interest points (SURF) and the Bag of Words technique. However, for natural-background images the results are still limited, mainly because many interest points are detected in regions that do not contain the object of interest. The main objective of this work is to improve our algorithm.
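The metric is straightforward to compute once the rank of the correct species is known for every test picture. A minimal sketch in C++ (assuming those per-picture ranks are already available; the names are illustrative):

```cpp
#include <vector>

// ranks[u][p][n] = rank of the correct species for the n-th picture of the
// p-th plant observed by the u-th user (1 = best); 0 means "not retrieved".
double plantScore(const std::vector<std::vector<std::vector<int>>>& ranks) {
    double sumU = 0.0;
    for (const auto& user : ranks) {              // U users
        double sumP = 0.0;
        for (const auto& plant : user) {          // P_u plants of this user
            double sumN = 0.0;
            for (int r : plant)                   // N_{u,p} pictures of this plant
                sumN += (r > 0) ? 1.0 / r : 0.0;  // S_{u,p,n} = 1 / rank
            sumP += sumN / plant.size();
        }
        sumU += sumP / user.size();
    }
    return sumU / ranks.size();                   // average over users
}
```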

Work:
Theory:
• Study the plant identification task of ImageClef 2013
• Improve the existing algorithm for plant identification

Practice:
• Develop and evaluate the proposed algorithm with the ImageClef 2013 dataset
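As background for the practice work: the existing approach represents an image by SURF interest points quantized into a Bag of Words histogram. A minimal OpenCV sketch of that pipeline (assuming the opencv_contrib xfeatures2d module; the file names, hessian threshold and vocabulary size are illustrative, not MICA's actual settings):

```cpp
#include <string>
#include <vector>
#include <opencv2/opencv.hpp>
#include <opencv2/xfeatures2d.hpp>

int main() {
    cv::Ptr<cv::xfeatures2d::SURF> surf = cv::xfeatures2d::SURF::create(400.0);

    // 1) Pool SURF descriptors from training images and cluster them
    //    into a visual vocabulary.
    cv::BOWKMeansTrainer trainer(500);                       // 500 visual words
    for (const std::string& path : {"leaf1.jpg", "leaf2.jpg"}) {
        cv::Mat img = cv::imread(path, cv::IMREAD_GRAYSCALE);
        std::vector<cv::KeyPoint> kps;
        cv::Mat desc;
        surf->detectAndCompute(img, cv::noArray(), kps, desc);
        if (!desc.empty()) trainer.add(desc);
    }
    cv::Mat vocabulary = trainer.cluster();                  // k-means clustering

    // 2) Represent a query image as a histogram of visual words.
    cv::BOWImgDescriptorExtractor bow(surf, cv::DescriptorMatcher::create("FlannBased"));
    bow.setVocabulary(vocabulary);
    cv::Mat query = cv::imread("query.jpg", cv::IMREAD_GRAYSCALE);
    std::vector<cv::KeyPoint> kps;
    surf->detect(query, kps);
    cv::Mat bowHist;
    bow.compute(query, kps, bowHist);  // this histogram feeds the classifier
    return 0;
}
```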

Requirements: This subject is dedicated to Vietnamese as well as foreign students at the Master's level in the Signal and Image Processing option. Students with a fairly good knowledge of image processing and C++ programming are preferred.

Student profile:

• Engineering student (final 5th year) or Master's student in computer science
• Vietnamese or French student (or from other countries)

Supervisors: This internship will take place at MICA, under the supervision of

• Le Thi Lan, Researcher/Ph.D. at MICA, Vietnam, email: [email protected]

References
1. http://www.imageclef.org/2013/plant
2. Le Thi Lan, Pham Ngoc Hai, MICA at ImageClef 2013 Plant Identification Task, CLEF 2013 conference, Valencia.


Project n° 2 Topic: Image and video processing

Plant identification system for Android

Context: Developing a plant identification system for a personal computer with average hardware power is a challenge. Building such a system for mobile devices is an even more sophisticated process, as these devices have many limitations in terms of size, hardware functionality, storage, etc. Even though many works have been proposed for automatic plant identification, there are few plant identification applications on the market. To the best of our knowledge, Leafsnap [1] is the first automatic plant identification application. However, this application is dedicated to iOS users and works with tree species of the Northeastern United States. Today, the huge number of Android users makes Android an interesting market for plant identification applications. At MICA, we have developed a plant identification system for Android [2]. Figure 1 and Figure 2 show the architecture and some screenshots of this system.

Figure 1: Leaf-based identification system for Android

Figure 2: Some screenshots of the leaf-based plant identification application for Android: (a) leaf image captured by the user, (b) identification results, including the scientific name, common name and other information of the identified plant, (c) results screen with more images and information of the identified plant.

However, the current system has the following limitations:
• All image processing modules run on the server;
• The system supports only plant identification based on leaf images;
• The metadata for each species is still simple.

The main objective of this project is to improve the existing system by:
• Analyzing and proposing a relevant architecture
• Deploying plant identification based on other parts of the plant, such as the flower and fruit
• Extending the metadata of the species
• Evaluating the proposed system with the ImageClef 2013 database
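For the identification itself, reference [2] combines the Bag of Words representation with supervised learning. A minimal sketch of such a supervised stage with OpenCV's ml module (the random data here is a placeholder; real rows would be BoW histograms with species labels):

```cpp
#include <cstdio>
#include <opencv2/opencv.hpp>

int main() {
    // One Bag of Words histogram per row, one species label per row.
    cv::Mat features(100, 500, CV_32F);
    cv::randu(features, cv::Scalar(0), cv::Scalar(1));  // placeholder features
    cv::Mat labels(100, 1, CV_32S);
    for (int i = 0; i < labels.rows; ++i)
        labels.at<int>(i) = i % 10;                     // 10 placeholder species

    cv::Ptr<cv::ml::SVM> svm = cv::ml::SVM::create();
    svm->setType(cv::ml::SVM::C_SVC);
    svm->setKernel(cv::ml::SVM::RBF);
    svm->train(features, cv::ml::ROW_SAMPLE, labels);   // fit on BoW histograms

    float species = svm->predict(features.row(0));      // classify a query histogram
    std::printf("predicted species id: %.0f\n", species);
    return 0;
}
```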


Requirements: This subject is dedicated to Vietnamese as well as foreign students at the Master's level in the Signal and Image Processing option. Students with a fairly good knowledge of image processing and C++ programming are preferred.

Student profile:

• Engineering student (final 5th year) or Master's student in computer science
• Vietnamese or French student (or from other countries)

Supervisors: This internship will take place at MICA, under the supervision of

• Le Thi Lan, Researcher/Ph.D. at MICA, Vietnam, email: [email protected]

References
1. Neeraj, K., et al., Leafsnap: A Computer Vision System for Automatic Plant Species Identification, in ECCV 2012, Springer, pp. 502-516.
2. Quang-Khue Nguyen, Thi-Lan Le, Ngoc-Hai Pham, Leaf based plant identification system for Android using SURF features in combination with Bag of Words model and supervised learning, ATC 2013.


Project n° 3 Topic: Image Processing and Computer Vision

Building an image database for plant species in Vietnam

Context: If agricultural development is to be successful and biodiversity is to be conserved, then accurate knowledge of the identity, geographic distribution and uses of plants is essential. Unfortunately, such basic information is often only partially available to professional stakeholders, teachers, scientists and citizens, and is often incomplete for the ecosystems that possess the highest plant diversity. Moreover, simply identifying plant species is usually a very difficult task, even for professionals (such as farmers or wood exploiters) or for botanists themselves. Using image retrieval technologies is nowadays considered by botanists as a promising direction for reducing this taxonomic gap. In order to evaluate image retrieval technologies for plant species identification, since 2011 ImageClef has built a common database of plant species images for its plant identification task [1]. These images are taken by different people, and each image has an XML metadata file (see Figure 1 and Figure 2). However, this task contains only images of plants in France.

Figure 1: Image of the flower

<?xml version="1.0" encoding="UTF-8"?>
<Image>
  <FileName>29300.jpg</FileName>
  <IndividualPlantId>1883</IndividualPlantId>
  <Date>16/08/09</Date>
  <Locality>France - Châteauneuf-De-Randon</Locality>
  <GPSLocality>
    <Longitude>3.675263</Longitude>
    <Latitude>44.641705</Latitude>
  </GPSLocality>
  <Author>Gregoire Duche</Author>
  <Organization>Tela Botanica</Organization>
  <Type>NaturalBackground</Type>
  <Content>Flower</Content>
  <ClassId>Epilobium angustifolium</ClassId>
  <Taxon>
    <Regnum>Plantae</Regnum>
    <Class>Equisetopsida C. Agardh</Class>
    <Subclass>Magnoliidae Novák ex Takht.</Subclass>
    <Superorder>Rosanae Takht.</Superorder>
    <Order>Myrtales Juss. ex Bercht. &amp; J. Presl</Order>
    <Family>Onagraceae Juss.</Family>
    <Genus>Epilobium L.</Genus>
    <Species>Epilobium angustifolium L.</Species>
  </Taxon>
  <VernacularNames>Fireweed</VernacularNames>
  <Year>ImageCLEF2013</Year>
</Image>

Figure 2: The XML metadata file of the image in Figure 1

Vietnam is a country with great biodiversity. Our work aims at building a plant identification system for Vietnamese users. In order to do this, we have to build an image database of plant species in Vietnam; this is the main objective of this project.

Work:
Theory:

• Define the plant species of interest in Vietnam
• Define the metadata for each plant image

Practice:
• Collect the image database and metadata of the defined plant species
• Evaluate the database
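Each collected image should carry a metadata record like the XML in Figure 2. A minimal sketch of reading one such record in C++, assuming the tinyxml2 library (an assumption; any XML parser would do, and the file name comes from the example above):

```cpp
#include <cstdio>
#include <tinyxml2.h>

int main() {
    tinyxml2::XMLDocument doc;
    if (doc.LoadFile("29300.xml") != tinyxml2::XML_SUCCESS) return 1;

    const tinyxml2::XMLElement* image = doc.FirstChildElement("Image");
    if (!image) return 1;
    // Pull a few fields used for training/evaluating an identification system.
    const char* species = image->FirstChildElement("ClassId")->GetText();
    const char* organ   = image->FirstChildElement("Content")->GetText();
    const char* type    = image->FirstChildElement("Type")->GetText();
    std::printf("species=%s organ=%s background=%s\n", species, organ, type);
    return 0;
}
```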

Requirements: This subject is dedicated to Vietnamese as well as foreign students at the Master's level in the Signal and Image Processing option. Students with a fairly good knowledge of image processing and C++ programming are preferred.
Student profile:
- Engineering student (final 5th year) or Master's student in computer science
- Vietnamese or French student (or from other countries)

Supervisors:

Thi-Lan Le, Computer Vision Department, MICA. Email: [email protected]

References:

[1] http://www.imageclef.org/2013


Project n° 4

Topic: Image and video processing

Human detection and tracking from 3D camera network

Context: Human detection and tracking is an essential requirement for surveillance systems employing one or more sensors together with computer systems. Nowadays, many applications are enabled by human tracking, such as security and surveillance (recognizing people, providing a better sense of security using visual information) and traffic management (analyzing flow, detecting accidents). Tracking humans is also of interest for activity monitoring and gait analysis. With the limited field of view (FOV) of video cameras, it is necessary to use multiple, distributed cameras to completely monitor a site. Typically, surveillance applications have multiple video feeds presented to a human observer for analysis. However, the ability of humans to concentrate on multiple videos simultaneously is limited. Therefore, there has been an interest in developing computer vision systems that can analyze information from multiple cameras simultaneously and possibly present it in a compact symbolic fashion to the user.

Figure 1: Multiple people detection and tracking

Figure 2: Information acquired from Kinect Sensor: RGB, Depth and Skeleton


At MICA, we have implemented human detection and tracking from one camera. The problem is, when more than one camera observes the same environment, how to determine which camera gives the best view of the environment and/or how to combine the information coming from multiple cameras in order to best detect and track humans in the scene. In addition, in this project we would like to use 3D cameras, more specifically Kinect sensors. These sensors provide not only RGB but also depth information, which makes detection and tracking more robust.

Works to accomplish:
Theory:

• Study methods for fast human detection and tracking from an RGB-D camera
• Study methods for fusing visual information from multiple cameras

Practice:
• Design a system for human detection and tracking from multiple 3D cameras (a sketch of the per-camera detection step follows this list). The system needs to provide the following functionalities:
o Manage and display visual information from the required cameras
o Detect humans in each frame of a required camera
o Track people in real time
o Fuse detection and tracking results from several cameras
• Test and evaluate the system in real conditions
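As a starting point for the per-camera detection step, a minimal OpenCV sketch using the default HOG pedestrian detector on the RGB stream (an illustration only; the project may instead rely on depth or skeleton data from the Kinect SDK):

```cpp
#include <vector>
#include <opencv2/opencv.hpp>

int main() {
    cv::HOGDescriptor hog;
    hog.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());

    cv::VideoCapture cap(0);                       // one camera of the network
    cv::Mat frame;
    while (cap.read(frame)) {
        std::vector<cv::Rect> people;
        hog.detectMultiScale(frame, people);       // detect humans in the frame
        for (const cv::Rect& r : people)
            cv::rectangle(frame, r, cv::Scalar(0, 255, 0), 2);
        cv::imshow("detections", frame);
        if (cv::waitKey(1) == 27) break;           // Esc to quit
    }
    return 0;
}
```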

Requirements:
Knowledge (preferable):
• Image processing, computer vision, sensor networks
Skills:
• C/C++ language, Microsoft Visual Studio
Student profile:

• Engineering or Master's student in computer science
• Vietnamese or French student (or from other countries)

Contacts and Supervisors:

• Dr. Thanh-Hai Tran, email: [email protected]

References:
[1] Sohaib Khan, et al., Human Tracking in Multiple Cameras, in ICCV.
[2] Taketoshi Mori, et al., Multiple Persons Tracking with Data Fusion of Multiple Cameras and Floor Sensors Using Particle Filters, in Workshop on Multi-camera and Multi-modal Sensor Fusion Algorithms and Applications, 2007.
[3] M. Luber, L. Spinello, and K. O. Arras, Learning to Detect and Track People in RGBD Data, Workshop on RGB-D: Advanced Reasoning with Depth Cameras, 2011.
[4] http://home.iitk.ac.in/~akar/cs397/Skeletal%20Tracking%20Using%20Microsoft%20Kinect.pdf
[5] http://dlab.sit.kmutt.ac.th/event/InterConfer/Orasa/human%20gesture%20recognition-JCSSE12-orasa.pdf


Project n° 5 Topic: Image and video processing

Dynamic hand gesture recognition from Kinect sensor

Context: Humans are capable of recognizing patterns like hand gestures after seeing just one example. Can machines do that too? In this project, we will perform hand gesture recognition with one-shot learning, using visual information captured from a Kinect sensor, which provides RGB and depth images.

Figure 3: Color rendering of depth images from the gesture challenge database, recorded with a Kinect™ camera

Figure 4: The two kinds of information provided by the Kinect sensor (depth, left; RGB, right)

Figure 5: Some examples of gestures


The objective is, given a sequence of frames of a certain dynamic hand gesture, to recognize which type of hand gesture it belongs to. In fact, Microsoft has organized a contest (CHALEARN) about this problem. CHALEARN is a challenge that focuses on recognizing gestures from video data recorded by a Microsoft Kinect™ camera, which provides both RGB images and depth images obtained from an infrared sensor. The contest is organized in two rounds, one in conjunction with the CVPR conference (Providence, Rhode Island, USA, June 2012) and another with the ICPR conference (Tsukuba, Japan, November 2012). In this project, we propose to use data from the contest for training and testing the algorithm. The CHALEARN dataset is very impressive; we summarize some information about it here. The videos portray a single user in front of a fixed camera, interacting with a computer by performing gestures to play a game, remotely control appliances or robots, or learn to perform gestures from educational software. The organizers collected a large dataset of gestures using the Microsoft Software Development Kit (SDK) interfaced to Matlab.

Figure 6: Examples of dynamic hand gestures in the dataset

Last year, we worked on this problem. We tried several methods for hand representation, such as DTW (Dynamic Time Warping), PCA, and GIST on the Motion History Image (MHI) of the hand gesture sequence, and tested them on a subset of the overall database. This year, we would like to improve the algorithm (DTW or using kernel descriptors) and test it on the entire database.
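For reference, DTW aligns two gesture sequences of different lengths and returns a dissimilarity score. A minimal, self-contained C++ sketch (the per-frame feature vector and Euclidean frame distance are illustrative assumptions; in practice the frames could carry MHI or GIST descriptors):

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

using Frame = std::vector<double>;                  // one feature vector per frame

double frameDist(const Frame& a, const Frame& b) {  // Euclidean distance
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

// Classic dynamic-programming DTW: D[i][j] is the best alignment cost of the
// first i frames of x against the first j frames of y.
double dtw(const std::vector<Frame>& x, const std::vector<Frame>& y) {
    const double INF = std::numeric_limits<double>::infinity();
    std::vector<std::vector<double>> D(x.size() + 1,
                                       std::vector<double>(y.size() + 1, INF));
    D[0][0] = 0.0;
    for (size_t i = 1; i <= x.size(); ++i)
        for (size_t j = 1; j <= y.size(); ++j)
            D[i][j] = frameDist(x[i - 1], y[j - 1]) +
                      std::min({D[i - 1][j], D[i][j - 1], D[i - 1][j - 1]});
    return D[x.size()][y.size()];                   // smaller = more similar gesture
}
```

A one-shot recognizer would then label a query sequence with the class of its nearest training example under this distance.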

Work:
Theory:
• Study dynamic hand gesture recognition methods


• Propose a method for dynamic hand gesture recognition from the Kinect sensor
Practice:

• Temporal segmentation
• Gesture representation
• Machine learning for hand gesture recognition
• Test and evaluate with the CHALEARN data

Requirements: This subject is dedicated to Vietnamese as well as foreign students at the Master's level in the Signal and Image Processing option. Students with a fairly good knowledge of image processing and C++ programming are preferred.

Student profile:

• Engineering student (final 5th year) or Master's student in computer science
• Vietnamese or French student (or from other countries)

Supervisors: This internship will take place at MICA, under the supervision of

• Tran Thi Thanh Hai, Researcher/Ph.D. at MICA, Vietnam, email: [email protected]

References
[1] Z. Ren, J. Meng, and J. Yuan, Depth Camera Based Hand Gesture Recognition and its Applications in Human-Computer-Interaction, in 8th International Conference on Information, Communications and Signal Processing (ICICS), 2011, pp. 1-5.
[2] Z. Ren, et al., Robust Hand Gesture Recognition with Kinect Sensor, in MM '11: Proceedings of the 19th ACM International Conference on Multimedia, 2011, pp. 759-760.
[3] Poonam Suryanarayan, Anbumani Subramanian, and Dinesh Mandalapu, Dynamic Hand Pose Recognition Using Depth Data, International Conference on Pattern Recognition, 2010.
[4] Michael Van den Bergh and Luc Van Gool, Combining RGB and ToF Cameras for Real-time 3D Hand Gesture Interaction, in IEEE Workshop on Applications of Computer Vision (WACV), 2011, pp. 66-72.
[5] M. R. Malgireddy, I. Nwogu, and V. Govindaraju, A Temporal Bayesian Model for Classifying, Detecting and Localizing Activities in Video Sequences, in CVPR Workshop on Gesture Recognition, 2012.
[6] C. Keskin, et al., Randomized Decision Forests for Static and Dynamic Hand Shape Classification, in CVPR Workshop on Gesture Recognition, 2012.
[7] D. Wu, F. Zhu, and L. Shao, One Shot Learning Gesture Recognition from RGBD Images, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2012, pp. 7-12.
[8] Yannick L. Gweth, Christian Plahl, and Hermann Ney, Enhanced Continuous Sign Language Recognition Using PCA and Neural Network Features, in CVPR Workshop on Gesture Recognition, 2012.
[9] I. Guyon, et al., ChaLearn Gesture Challenge: Design and First Results, in CVPR Workshop on Gesture Recognition, 2012.


Project n° 6 Topic: Image and video processing

Vision based localization

Context: Localizing humans and objects in the environment is an essential task for many applications, such as navigation or services adapted to the user profile. This project is set in the context of the VLIR project, in which we want to help visually impaired people navigate in their environment using visual sensors.

Figure 7: Visual information for navigational aid

Figure 8: Example of generated map using visual sensor for localization


The blind person, carrying a camera phone, captures the visual information surrounding him. This information will be analyzed (probably by a remote machine) and a characterization of the environment will be described to the person. In addition, we will use the information from the camera to determine the path and localize the position of the person.

Work:
Theory:

• Study localization methods
• Propose a localization method using multiple mobile camera sensors

Practice:
• Implement the method
• Test under real conditions with real blind persons
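State estimation is central to the localization methods in the references below (Kalman filtering, SLAM, visual odometry). A minimal constant-velocity Kalman filter sketch for 2D position tracking with OpenCV's cv::KalmanFilter (the measurement source, e.g. visual odometry, is left abstract, and the noise parameters are illustrative):

```cpp
#include <opencv2/opencv.hpp>

int main() {
    // State: [x, y, vx, vy]; measurement: [x, y].
    cv::KalmanFilter kf(4, 2, 0);
    kf.transitionMatrix = (cv::Mat_<float>(4, 4) <<
        1, 0, 1, 0,
        0, 1, 0, 1,
        0, 0, 1, 0,
        0, 0, 0, 1);                                // constant-velocity model
    cv::setIdentity(kf.measurementMatrix);          // we observe position only
    cv::setIdentity(kf.processNoiseCov, cv::Scalar::all(1e-4));
    cv::setIdentity(kf.measurementNoiseCov, cv::Scalar::all(1e-2));

    for (int t = 0; t < 100; ++t) {
        cv::Mat prediction = kf.predict();          // predicted position of the user
        // Fake position measurement; in practice this comes from the camera.
        cv::Mat z = (cv::Mat_<float>(2, 1) << t * 0.1f, 0.0f);
        cv::Mat estimate = kf.correct(z);           // fuse prediction + measurement
    }
    return 0;
}
```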

Requirements: This subject is dedicated to Vietnamese as well as foreign students at the Master's level in the Signal and Image Processing option. Students with a fairly good knowledge of image processing and C++ programming are preferred.

Student profile:

• Engineering student (final 5th year) or Master's student in computer science
• Vietnamese or French student (or from other countries)

Supervisors: This internship will take place at MICA, under the supervision of

• Tran Thi Thanh Hai, Researcher/Ph.D. at MICA, Vietnam, email: [email protected]
• Nguyen Quoc Hung, Ph.D. student at MICA

References
G. Bishop and G. Welch (2001). An Introduction to the Kalman Filter. University of North Carolina, Chapel Hill, NC 27599-3175.
Hugh Durrant-Whyte and Tim Bailey (2006). "Simultaneous Localisation and Mapping (SLAM): Part I - The Essential Algorithms." IEEE Robotics & Automation Magazine, 13(2).
N. Muhammad, D. Fofi, et al. (2009). "Current state of the art of vision based SLAM." Image Processing: Machine Vision Applications II, Proceedings of the SPIE, Volume 7251, article id. 72510F, 12 pp.
Stephen Se, David G. Lowe, et al. (2005). "Vision-Based Global Localization and Mapping for Mobile Robots." IEEE Transactions on Robotics, 21(3): 364-.
David Van Hamme, Peter Veelaert, Wilfried Philips: Robust Visual Odometry Using Uncertainty Models. ACIVS 2011: 1-12.
David Van Hamme, Peter Veelaert, Wilfried Philips: Robust Monocular Visual Odometry by Uncertainty Voting. Intelligent Vehicles Symposium 2011: 643-647.


Project n° 7 Topic: Image and video processing

Human activity recognition with Kinect device

Context: Activity recognition is a problem that has received a lot of attention in recent years. To recognize an activity, its features must first be extracted. An important property of activity features is the dynamic nature of activities (activities usually take place over a certain period of time). To describe this dynamic nature, features such as the MHI (Motion History Image) or MEI (Motion Energy Image) are often extracted. Once these features are extracted, recognition algorithms are applied to classify an observed action into a set of previously known actions.

Figure 1: MHI features extracted from a video [1]

The Kinect device, introduced together with the Xbox, has completely changed how games and entertainment are conceived. Nowadays, thanks to its ability to provide multiple sources of information, the Kinect is increasingly considered for application in many fields serving human life, such as healthcare and education. This project aims at using the Kinect to recognize everyday human activities.


Figure 2: The Kinect and the information acquired from it
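To make the MHI idea concrete: each pixel of the MHI stores the time of the most recent motion at that location, and older motion fades out. A minimal OpenCV sketch computing it by hand from frame differencing (the threshold, frame rate and camera source are illustrative assumptions):

```cpp
#include <algorithm>
#include <opencv2/opencv.hpp>

int main() {
    const float DURATION = 1.0f;                  // seconds of motion history kept
    cv::VideoCapture cap(0);                      // e.g. the Kinect RGB stream
    cv::Mat frame, gray, prev, motion, mhi;

    for (double t = 0.0; cap.read(frame); t += 1.0 / 30.0) {
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        if (prev.empty()) {
            prev = gray.clone();
            mhi = cv::Mat::zeros(gray.size(), CV_32F);
        }
        cv::absdiff(gray, prev, motion);          // which pixels changed?
        cv::threshold(motion, motion, 30, 255, cv::THRESH_BINARY);
        prev = gray.clone();

        mhi.setTo((float)t, motion);              // stamp moving pixels with time t
        mhi.setTo(0.0f, mhi < (float)(t - DURATION));  // forget old motion
        cv::imshow("MHI", mhi / std::max((float)t, 1e-6f));
        if (cv::waitKey(1) == 27) break;          // Esc to quit
    }
    return 0;
}
```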

Works to accomplish:
Theory:

• Study motion features
• Study classification methods for motion features

Practice:
• Implement one method for human activity recognition using the Kinect
• Test and evaluate

Requirements: This subject is dedicated to Vietnamese as well as foreign students at the Master's level in the Signal and Image Processing option. Students with a fairly good knowledge of image processing and C++ programming are preferred.

Student profile:

• Engineering student (final 5th year) or Master's student in computer science
• Vietnamese or French student (or from other countries)

Supervisors: This internship will take place at MICA, under the supervision of

• Le Thi Lan, Researcher/Ph.D. at MICA, Vietnam, email: [email protected]
• Tran Thi Thanh Hai, Researcher/Ph.D. at MICA, Vietnam, email: [email protected]

References
[1] Md. Atiqur Rahman Ahad, J. K. Tan, H. Kim, Motion history image: its variants and applications, Machine Vision and Applications.


Project n° 8 Topic: Image Processing and Computer Vision

Casting Yourself in Emoticons by using Facial Synthesis & Expression Techniques

Context: The rapid development of social networks such as Facebook on mobile devices (smartphones, PDAs, notebooks) attracts a huge number of members, particularly teenagers. To make communication between members more attractive, applications such as chatting, updating status and uploading avatars in social networks usually provide a special language called the emoticon language. This language expresses statuses such as sadness, happiness or wondering. In these applications, emoticons are usually simplified to a few sketches of a human face; thus, they do not present the characteristics of the member himself. On the other hand, almost all mobile devices now come with a front-facing camera, so a user can easily capture his own face. Utilizing this capability, we propose research that allows a user to create an emoticon language of himself. The main technologies are combinations of facial caricature and facial synthesis methods. Facial caricature/facial illustration techniques (Fig. 1) have been proposed and evaluated in several studies [1,2]. However, these techniques have two main limitations:
1. They utilize only one facial image captured from a single view. Because of this, the illustrated image cannot express the face from different viewpoints, such as looking down, looking up or side views.
2. Current facial caricature/facial illustration techniques cannot create faces that include dynamic expressions, such as winking eyes or an open mouth while smiling.
Therefore, this research will extend current facial caricature/facial illustration techniques by utilizing facial synthesis and facial animation techniques.

Facial synthesis techniques have been developed in the field of computer graphics. For example, to synthesize a face from different viewpoints and to express dynamic activities of the eyes and mouth, the work in [3] utilized facial images captured from two or more poses (see Fig. 2). However, [3] required marking control points between (at least) two frames; this may be tedious or inconvenient for users, and may impact the quality of the synthesized images. Deploying the algorithms of [3] on mobile devices can overcome this problem, because human interaction is easier there than on a desktop PC. Therefore, the purpose of this research is to take advantage of mobile devices (such as the touch screen) in order to create synthesized images more attractively.

Work:
Theory:

• Study facial caricature/facial illustration techniques

Fig. 1: Facial caricature techniques. (a) A neutral face captured from a front-facing camera. (b) A facial caricature of (a). (Images are adopted from [1])


• Study the combination of facial caricature and facial synthesis to present human status with dynamic expressions and multiple view directions

Practice:
• A method to take advantage of facial synthesis on mobile devices
• A method to embed the results into applications on social networks

Fig. 2: The work in [3] synthesizes images with different viewpoints and various statuses. (a) Facial images captured from two different viewpoints. (b) Control points marked on a reference image. (c) The synthesized images with different viewpoints and various statuses. (Images are adopted from [3])
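Control-point-based synthesis like [3] typically triangulates the marked points and warps each triangle from the source configuration to the target one. A minimal OpenCV sketch of that warp step for a single triangle (the coordinates are illustrative; a full morph repeats this over a Delaunay triangulation of the control points, and this is not the authors' implementation):

```cpp
#include <vector>
#include <opencv2/opencv.hpp>

int main() {
    cv::Mat src = cv::imread("face.jpg");
    cv::Mat warped;

    // One pair of corresponding triangles of control points (assumed values).
    cv::Point2f from[3] = { {100, 120}, {180, 120}, {140, 200} };
    cv::Point2f to[3]   = { { 95, 125}, {185, 118}, {140, 215} };

    cv::Mat A = cv::getAffineTransform(from, to);   // 2x3 affine map
    cv::warpAffine(src, warped, A, src.size());

    // Keep only the warped triangle, as a per-triangle morph would.
    cv::Mat mask = cv::Mat::zeros(src.size(), CV_8U);
    std::vector<cv::Point> tri = { cv::Point(to[0]), cv::Point(to[1]), cv::Point(to[2]) };
    cv::fillConvexPoly(mask, tri, 255);
    cv::Mat out = cv::Mat::zeros(src.size(), src.type());
    warped.copyTo(out, mask);
    cv::imwrite("warped_triangle.jpg", out);
    return 0;
}
```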

Requirements: This subject is dedicated to Vietnamese as well as foreign students at the Master's level in the Signal and Image Processing option. Students with a fairly good knowledge of image processing and C++ programming are preferred.
Student profile:
- Engineering student (final 5th year) or Master's student in computer science
- Vietnamese or French student (or from other countries)

Supervisors:

Vu Hai, Computer Vision Department, MICA. Email: [email protected]

References:

[1]. Junfa Liu et al., "Creative Cartoon Face Synthesis System for Mobile Entertainment", PCM 2005, LNCS 3768, pp. 1027-1038.
[2]. B. Gooch et al., "Human Facial Illustrations: Creation and Psychophysical Evaluation", ACM Trans. Graph. 23(1), 2004, pp. 27-44.
[3]. Y. Mukaigawa et al., "Facial Animation from Several Images", Proc. of the International Symposium on Real-Time Imaging and Dynamic Analysis (ISPRS'98), Part 5, pp. 906-911, June 1998.



Project n° 9 Topic: Image processing and Computer Vision

Multiple Objects Detection and Segmentation in an Image Sequence based on Coherent Probabilistic Model

Context: Image sequences captured by a video camera attached to a mobile device usually contain background regions such as buildings, trees and lakes, and dynamic objects such as means of transportation (cars, moving subjects) (see Fig. 1). This research proposes algorithms to automatically detect and segment foreground objects from backgrounds in the captured image sequences.

Fig. 1: Two images captured at times t1 and t2. The research will propose algorithms to automatically detect and segment static objects (buildings, trees) and dynamic objects (moving cars).

Object detection and multi-class segmentation are great challenges in computer vision and have received a lot of attention in related fields. Common approaches scan a window region to extract informative features; based on the extracted features, they use classifiers to decide whether an object is present. The detected regions are then passed to segmentation methods to separate the object from the background regions. The limitation of these approaches is that the candidate regions usually include pixel data of other objects, or cover only part of the object to be detected; they therefore often produce unreliable results. Recently, the work in [1] proposed a new approach to object detection and multi-class segmentation in a still image. The authors proposed a model with three layers (see Fig. 2a) that embeds both the detection and segmentation tasks, using an energy function to formulate the scene structure and the semantics of objects. Fig. 3 shows several attractive results of [1]: the proposed algorithm is able to detect and segment various objects (people, cars) from the background (sky, buildings, trees). The work in [1] detects and segments objects in a still image. However, image sequences captured from a camera attached to a mobile device usually contain moving objects; therefore, besides detection, we need to track objects in the collected image sequences. There are two approaches to extending the work in [1]:

- Extend the formulation of [1] with a time dimension
- Utilize the concept of blob tracking proposed in [2] (see Fig. 2b); the work in [2] detects objects as 3-D blobs using spatio-temporal features, and some results of blob tracking are shown in Fig. 3b. A simplified sketch of the blob idea follows this list.
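A simplified, purely spatial illustration of blob extraction with OpenCV background subtraction and connected regions (the actual work in [2] builds spatio-temporal 3-D blobs; the video file name and area threshold are assumptions):

```cpp
#include <vector>
#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cap("street.avi");             // assumed input sequence
    cv::Ptr<cv::BackgroundSubtractor> bg = cv::createBackgroundSubtractorMOG2();
    cv::Mat frame, fg;

    while (cap.read(frame)) {
        bg->apply(frame, fg);                        // foreground mask
        cv::threshold(fg, fg, 200, 255, cv::THRESH_BINARY);  // drop shadow pixels
        std::vector<std::vector<cv::Point>> contours;
        cv::findContours(fg, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
        for (const auto& c : contours)
            if (cv::contourArea(c) > 500)            // one blob = one moving object
                cv::rectangle(frame, cv::boundingRect(c), cv::Scalar(0, 0, 255), 2);
        cv::imshow("blobs", frame);
        if (cv::waitKey(30) == 27) break;            // Esc to quit
    }
    return 0;
}
```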


Work:
Theory:

• Study the algorithms in [1] and [2]
• Extend the model of [1] with temporal features
• Extend the model of [1] with the concept of blob tracking

Practice:
• Video preprocessing techniques: video stabilization, normalizing images under different lighting conditions
• Implement the algorithms in [1] and [2] using C++
• Evaluate the results of [1] with/without temporal features

Requirements: This subject is dedicated to Vietnamese as well as foreign students at the Master's level in the Signal and Image Processing option. Students with a fairly good knowledge of image processing and C++ programming are preferred.
Student profile:
- Engineering student (final 5th year) or Master's student in computer science
- Vietnamese or French student (or from other countries)

Fig. 2: (a) The three-layer model proposed in [1]. (b) Blob tracking proposed in [2].

Fig. 3: (a) Results of object detection and segmentation in [1]. In each panel, left: original image; right: the detected and segmented objects with red/green boundaries. (b) Results of blob tracking in [2]: a car enters/exits the captured frames. (Images are adopted from [1] and [2])



Supervisors: Vu Hai, Computer Vision Department, MICA. Email: [email protected]

References:
[1]. S. Gould et al., "Region-based Segmentation and Object Detection", in Proceedings of NIPS 2009.
[2]. H. Greenspan et al., "A Probabilistic Framework for Spatio-Temporal Video Representation & Indexing", in Proceedings of ECCV 2002.


Project n° 10 Topic: Image and video processing

Multiclass object category recognition

Context:

Recently, object (object category) recognition has attracted much attention from the research community. The standard approach to object recognition is to compute pixel attributes in small windows around (a subset of) pixels. For example, the gradient orientation and magnitude attributes in SIFT, one of the most successful features in computer vision, are computed from 5x5 image windows. A key question for object recognition is then how to measure the similarity of image patches based on the attributes of the pixels within them, because this similarity measure is used in a classifier such as a support vector machine (SVM). Techniques based on histogram features such as SIFT or HOG discretize the individual pixel attribute values into bins and then compute a histogram over the discrete attribute values within a patch. The similarity between two patches can then be computed from their histograms. Unfortunately, the binning restricts the similarity measure and introduces quantization errors, which limit the recognition accuracy.

In [1], the authors highlight the kernel view of SIFT, HOG and bags of visual words, and show that histogram features are a special, rather restricted case of efficient match kernels. This novel insight allows the design of a family of kernel descriptors. Kernel descriptors avoid the need for pixel attribute discretization and are able to generate rich patch-level features from different types of pixel attributes. Here, the similarity between two patches is based on a kernel function, called a match kernel, that averages over the continuous similarities between all pairs of pixel attributes in the two patches. This approach has obtained very good results on several databases. In the framework of the MOST project [2,3], we have developed an advertising service based on image content. In order to generate the advertisement string automatically, we need to recognize the object category. For this, several features (e.g. HOG, Haar, GIST) and classification methods (SVM, kNN) have been evaluated. However, the results are still limited. In this work, we would like to apply the advances of kernel descriptors in order to get better results.

Figure 1: Advertising service based on image content
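To make the match kernel described above concrete: instead of comparing binned histograms, it averages a continuous kernel over all pairs of pixel attributes in the two patches. A self-contained C++ sketch (the Gaussian kernel, the attribute vectors, and gamma are illustrative assumptions):

```cpp
#include <cmath>
#include <vector>

using Attribute = std::vector<double>;              // one attribute vector per pixel

// Continuous similarity between two pixel attributes (Gaussian kernel).
double gaussian(const Attribute& a, const Attribute& b, double gamma) {
    double d2 = 0.0;
    for (size_t i = 0; i < a.size(); ++i) d2 += (a[i] - b[i]) * (a[i] - b[i]);
    return std::exp(-gamma * d2);
}

// K(P, Q) = (1 / |P||Q|) * sum over p in P, q in Q of k(a_p, a_q):
// no binning, so no quantization error.
double matchKernel(const std::vector<Attribute>& P,
                   const std::vector<Attribute>& Q, double gamma = 1.0) {
    double sum = 0.0;
    for (const auto& p : P)
        for (const auto& q : Q)
            sum += gaussian(p, q, gamma);
    return sum / static_cast<double>(P.size() * Q.size());
}
```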

Work:
Theory:

• Study multiclass object category recognition
• Propose a new object category recognition algorithm based on kernel descriptors

Practice:
• Evaluate the object category recognition algorithm on common datasets such as ImageNet and our own database
• Deploy the object recognition module in the advertising service based on image content


Requirements: This subject is dedicated to Vietnamese as well as foreign students at the Master's level in the Signal and Image Processing option. Students with a fairly good knowledge of image processing and C++ programming are preferred.

Student profile:

• Engineering student (final 5th year) or Master's student in computer science
• Vietnamese or French student (or from other countries)

Supervisors: This internship will take place at MICA, under the supervision of

• Le Thi Lan, Researcher/Ph.D. at MICA, Vietnam, email: [email protected]

References
1. Liefeng Bo, Xiaofeng Ren and Dieter Fox, Depth Kernel Descriptors for Object Recognition, in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), September 2011.
2. Đồng Văn Thái, Object detection in images with application to advertising systems (Phát hiện đối tượng trên ảnh ứng dụng trong các hệ thống quảng cáo).
3. Quoc-Hung Nguyen, Thi Thanh-Hai Tran, Thi-Lan Le, Hai Vu, Ngoc-Hai Pham, Quang-Hoan Nguyen, Object classification: A comparative study and applying for advertisement services based on image content, Journal of Science and Technology Technical Universities, ISSN: 0868-3980, 2013.