Unsupervised Cleaning for Web Image Retrieval … Cao, Yang Peng, Mustaqim Moulavi, Harsh Bhavsar Unsupervised Cleaning for Web Image Retrieval Title PowerPoint Presentation Author

QUICK DESIGN GUIDE (--THIS SECTION DOES NOT PRINT--)

This PowerPoint 2007 template produces a 36x48

inch professional poster. You can use it to create

your research poster and save valuable time placing

titles, subtitles, text, and graphics.

We provide a series of online tutorials that will

guide you through the poster design process and

answer your poster production questions.

To view our template tutorials, go online to

PosterPresentations.com and click on HELP DESK.

When you are ready to print your poster, go online

to PosterPresentations.com.

Need Assistance? Call us at 1.866.649.3004

Object Placeholders

Using the placeholders

To add text, click inside a placeholder on the poster

and type or paste your text. To move a placeholder,

click it once (to select it). Place your cursor on its

frame, and your cursor will change to this symbol

Click once and drag it to a new location where you

can resize it.

Section Header placeholder

Click and drag this preformatted section header

placeholder to the poster area to add another

section header. Use section headers to separate

topics or concepts within your presentation.

Text placeholder

Move this preformatted text placeholder to the

poster to add a new body of text.

Picture placeholder

Move this graphic placeholder onto your poster, size

it first, and then click it to add a picture to the

poster.

RESEARCH POSTER PRESENTATION DESIGN © 2012

www.PosterPresentations.com

Student discounts are available on our Facebook page.

Go to PosterPresentations.com and click on the FB icon.

QUICK TIPS (--THIS SECTION DOES NOT PRINT--)

This PowerPoint template requires basic PowerPoint

(version 2007 or newer) skills. Below is a list of

commonly asked questions specific to this template.

If you are using an older version of PowerPoint some

template features may not work properly.

Template FAQs

Verifying the quality of your graphics

Go to the VIEW menu and click on ZOOM to set your

preferred magnification. This template is at 100%

the size of the final poster. All text and graphics will

be printed at 100% their size. To see what your

poster will look like when printed, set the zoom to

100% and evaluate the quality of all your graphics

before you submit your poster for printing.

Modifying the layout

This template has four different

column layouts. Right-click

your mouse on the background

and click on LAYOUT to see the

layout options. The columns in

the provided layouts are fixed and cannot be moved

but advanced users can modify any layout by going

to VIEW and then SLIDE MASTER.

Importing text and graphics from external sources

TEXT: Paste or type your text into a pre-existing

placeholder or drag in a new placeholder from the

left side of the template. Move it anywhere as

needed.

PHOTOS: Drag in a picture placeholder, size it first,

click in it and insert a photo from the menu.

TABLES: You can copy and paste a table from an

external document onto this poster template. To

adjust the way the text fits within the cells of a

table that has been pasted, right-click on the table,

click FORMAT SHAPE then click on TEXT BOX and

change the INTERNAL MARGIN values to 0.25.

Modifying the color scheme

To change the color scheme of this template go to

the DESIGN menu and click on COLORS. You can

choose from the provided color combinations or

create your own.

© 2013 PosterPresentations.com 2117 Fourth Street , Unit C Berkeley CA 94710 [email protected]

● Image retrieval cleaning aims at improving

the accuracy of web image search and

retrieval. It first removes the irrelevant images

and then re-orders the rest of the images using

the confidence score to be relevant images. A

practicable and efficient cleaning method

should be highly accurate without manual

labeling and pre-processing.

● In this project, a novel algorithm for

unsupervised image retrieval cleaning is

proposed. We delved into the data distribution

and devised three stages noise robust

approaches to solve the problem.

INTRODUCTION

FRAMEWORK

● Graph Building

X={x1,…,xn} represents the features of

retrieved images, the edge matrix W is built as

wij=exp(-||xi-xj||2/2σ2) if i≠j and wii=0. W is

normalized as S=D-1/2WD-1/2, and D is

diagonal with dii=∑j=1:n wij.

● Category-level outlier filtering

X is mapped to a new feature space as

g(xi)=F*(i,∙)T, where F*=(I-αS)-1[y1,…,yi,...yn]

=(I-αS)-1, yi is a column vector with yi=1 and

yothers=0. xi is warped as the i-th row of matrix

F*, and a dominant score is computed as to

sum all dimensions of this row.

Spectral clustering is implemented on g(X) for

k clusters. Clusters with the lowest mean

dominant score are considered as noise.

● Dataset-level noise absorption

There are total t keywords-queries denoted as

Q={q1,…,qi,…,qt} and the retrieval dataset

X(Q)={X(1),…,X(i),…,X(t)}. Linear kernel SVM

classifier is trained for every two categories qi

and qj. The confidence score 𝐶𝑖 𝑥𝑎𝑖

of an

image 𝑥𝑎𝑖

to be the relevant image in its

category is 𝐶𝑖 𝑥𝑎𝑖

= min𝑗≠𝑖

D𝑖, j 𝑥𝑎𝑖

. If the

score is low, 𝑥𝑎𝑖

is regarded as noise to be

filtered and absorbed by qj.

● Category-level confident sample selection

For each category, we want to select the data

points with high density score, i.e. confidence,

and discard the data points with low density

score. In order to calculate the density score,

we use elastic net SVM regression to estimate

the density of data points. The density score of

x is a function of 𝑤𝑇𝑥. We use a modified

version of SMO (Sequential Minimization

Optimization) to train the model.

ALGORITHMS

Figure: Parallel MapReduce based SVM on

Amazon EC2 clusters.

RESULTS

CONCLUSION In this project, we propose a new unsupervised

cleaning algorithm to improve the accuracy of

web image retrieval, by three level irrelevant

images filtering and relevant samples

selection. We warp data into a noise resistant

spectral space to cluster and remove isolated

noise. Then a specially designed cross

category noise absorption algorithm is

proposed to make the dataset cleaner. Finally,

confident samples used for re-ordering are

selected by a regression formulation with

linear kernel and sparsity constraints. Our

approach demonstrates the best performance

among all the recent methods in experimental

comparison on INRIA dataset.

REFERENCES

ACKNOWLEDGEMENTS Great thanks to Dr. Daisy Wang for providing us with an

opportunity to work on this project and present this poster.

Thanks to Amazon for providing the AWS credits. These

credits were utilized for training and testing the SVM

classifiers on Amazon EC2 clusters. We also thank Yottamine

Analitics whose services were used for implementing SVM

classifier on Amazon EC2 clusters.

Many categories of images are retrieved by

different textual-keywords via search engines

and image sharing sites. First, graphs are built

on pairwise images to explore the in-category

data distribution. Isolated nodes are regarded

as irrelevant images. Then, classifiers are

trained on pairwise categories. Images that

classified to other categories are also treated as

irrelevant images. After we remove all

irrelevant images, for each category, images in

dominant clusters, which are most likely to be

relevant images, are used as manifold learning

pseudo queries for re-ordering.

Figure: The flowchart of our three stages

algorithm.

[Spectral Clustering] A. Ng, M. Jordan, and Y. Weiss. On

spectral clustering: Analysis and an algorithm. In NIPS, 2002.

[Elastic Net SVM] G.-B. Ye, Y. Chen, and X. Xie. Efficient

variable selection in support vector machines via the

alternating direction method of multipliers. In ICAIS, 2011.

[SMO] J. Platt, and et al. Sequential minimal optimization: A

fast algorithm for training support vector machines. In

technical report Microsoft Research, 1998.

[INRIA dataset and LogClass] J. Krapac, M. Allan, J. Verbeek,

and F. Juried. Improving web image search results using

query-relative classifiers. In CVPR, 2010.

[Mrank] D. Zhou, J. Weston, A. Gretton, O. Bousquet, and B.

Sch¨olkopf. Ranking on data manifolds. In NIPS, 2004.

[LabelDiag] J. Wang, Y. Jiang, and S. Chang. Label diagnosis

through self tuning for web image search. In CVPR, 2009.

[SpecFilter] W. Liu, Y. Jiang, J. Luo, and S. Chang. Noise

resistant graph ranking for improved web image search. In

CVPR, 2011.

Figure: Examples in INRIA dataset: Top-15

ranked images using (a) the search engine and

(b) our approach.

Figure: INRIA dataset: MAP of ranking order

results over 353 categories.

Department of Computer & Information Science & Engineering, University of Florida

Chen Cao, Yang Peng, Mustaqim Moulavi, Harsh Bhavsar

Unsupervised Cleaning for Web Image Retrieval

http://www.facebook.com/pages/PosterPresentationscom/217914411419?v=app_4949752878&ref=ts

Documents

Unsupervised Cleaning for Web Image Retrieval … Cao, Yang Peng, Mustaqim Moulavi, Harsh Bhavsar Unsupervised Cleaning for Web Image Retrieval Title PowerPoint Presentation Author