Logo mining application on Flickr ® By: Ximing Hou

courses.cecs.anu.edu.au/courses/CS_PROJECTS/11S1/Final... · 2012-01-03




Internet and web-based applications are widely used

Enormous volume of data transmitted

◦ 1.2 ZB in 2010, 35 ZB expected in 2020 (1 ZB = 10¹² GB)

Main contribution: Logo on the Map System (LMS)

Three experiments conducted with LMS

Team project with Zilong Wang

◦ Zilong is responsible for developing the visual matching algorithm

◦ I am in charge of the LMS application development


Algorithm

◦ Based on SIFT (Scale-Invariant Feature Transform)

Programming

◦ MATLAB

◦ PHP, XML, JavaScript, MySQL

Other web sources

◦ Flickr® and Google® APIs


Why Flickr ®

◦ Billions of web images

◦ Tagged by millions of viewers

◦ Provides an API

Why Google ®

◦ Widely used

◦ Can label a position on the map

◦ Provides an API


Description of LMS

◦ Web-based application

◦ Extracts large amounts of image data from Flickr®

◦ Cleans the image data

◦ Labels the location of each image on a dynamic map


Basic architecture

[Figure: Picture Extraction Module, Logo Matching Module and Logo Geography Labeling Module (numbered 1 to 3) around a shared Database, split into Back-End and Front-End, connected to the Flickr® website and the Google® website]


Database structure

◦ Background: the Flickr® photo URL format

http://farm{farm-id}.static.flickr.com/{server-id}/{id}_{secret}_[mstzb].jpg

Each Flickr® picture URL is built from {farm-id}, {server-id}, {id} and {secret}

The suffix [mstzb] selects the picture size

◦ Database details

MySQL

Table pic stores the standard logos

Table photo stores the web images from Flickr®
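Here is a minimal Python sketch that rebuilds a picture's URL from the four fields stored for it, following the template above. The function name photo_url is hypothetical. The template does not say that the [mstzb] suffix can be omitted, so treating it as optional (no suffix for the default size) is an assumption. The example values come from the first photo in the sample XML response returned by the picture extraction module.

```python
# The [mstzb] letters are the size suffixes listed in the URL template.
SIZE_SUFFIXES = frozenset("mstzb")

def photo_url(farm_id, server_id, photo_id, secret, size=None):
    """Rebuild a Flickr static image URL from the fields stored for a photo.

    Assumption: size=None omits the suffix and requests the default size.
    """
    base = f"http://farm{farm_id}.static.flickr.com/{server_id}/{photo_id}_{secret}"
    if size is None:
        return base + ".jpg"
    if size not in SIZE_SUFFIXES:
        raise ValueError(f"unknown size suffix: {size!r}")
    return f"{base}_{size}.jpg"

# Fields of the first photo in the sample XML response.
print(photo_url(6, 5224, 5677848399, "e55f3f02ec", size="m"))
# http://farm6.static.flickr.com/5224/5677848399_e55f3f02ec_m.jpg
```

Storing the four fields instead of the full URL keeps the table small, and any size can be requested later.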


Picture extraction module

◦ Input a name and the number of pictures

◦ Send a request to Flickr®

◦ Receive the XML response from Flickr®

◦ Parse the XML

◦ Store the image information in the database

◦ Classify the pictures by name

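[Figure: Picture Extraction Module: a Graphical User Interface takes the name and amount, requests go over the Internet to the Flickr® API and website, an XML Reader parses the response, and the results are stored in the Database]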
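The last two extraction steps store the parsed records and classify them by search name. The sketch below shows them in Python, with SQLite standing in for MySQL so that it runs on its own. The photo table's columns (including tag, the search name), the helper names and the sample records are all hypothetical.

```python
import sqlite3

# Hypothetical schema: SQLite stands in for the MySQL database used by LMS.
SCHEMA = """
CREATE TABLE photo (
    id     TEXT PRIMARY KEY,   -- Flickr photo id
    owner  TEXT,
    secret TEXT,
    server TEXT,
    farm   TEXT,
    title  TEXT,
    tag    TEXT NOT NULL       -- search name used to classify the photo
)
"""

def store_photos(conn, tag, photos):
    """Insert parsed photo records, labelling each one with its search name."""
    rows = [(p["id"], p["owner"], p["secret"], p["server"], p["farm"], p["title"], tag)
            for p in photos]
    # INSERT OR IGNORE skips photos already stored by an earlier request.
    conn.executemany("INSERT OR IGNORE INTO photo VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()

def count_by_tag(conn):
    """Return {search name: number of stored photos}."""
    return dict(conn.execute("SELECT tag, COUNT(*) FROM photo GROUP BY tag"))

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Made-up records in the shape produced by parsing the Flickr XML response.
fake = [
    {"id": "1", "owner": "u1", "secret": "s1", "server": "1", "farm": "1", "title": "a"},
    {"id": "2", "owner": "u2", "secret": "s2", "server": "1", "farm": "1", "title": "b"},
]
store_photos(conn, "subway", fake)
store_photos(conn, "subway", fake[:1])  # duplicate id is ignored
print(count_by_tag(conn))  # {'subway': 2}
```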


Picture extraction module (Cont.)

◦ Sending the request:

http://api.flickr.com/services/rest/?method=flickr.photos.search&api_key={api_key}&tags={tags}&per_page={per_page}&has_geo=1&in_gallery=true&sort=interestingness-desc

◦ Returned XML:

<?xml version="1.0" encoding="utf-8" ?>
<rsp stat="ok">
<photos page="1" pages="7449" perpage="2" total="14898">
<photo id="5677848399" owner="62409281@N08" secret="e55f3f02ec" server="5224" farm="6" title="IMG_0934" ispublic="1" isfriend="0" isfamily="0" />
<photo id="5639118139" owner="50831163@N07" secret="8efce2761d" server="5182" farm="6" title="Australian Open" ispublic="1" isfriend="0" isfamily="0" />
</photos>
</rsp>

◦ Retrieve the picture information from the XML
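The request and parsing steps can be sketched with the Python standard library alone. urlencode builds the flickr.photos.search query from the parameters in the URL above, and ElementTree reads the attributes of each <photo> element. The function names are hypothetical and "YOUR_API_KEY" is a placeholder. Network access is left out, so the sketch runs offline against the sample response.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

REST_ENDPOINT = "http://api.flickr.com/services/rest/"

def search_url(api_key, tags, per_page):
    """Build the flickr.photos.search request with the parameters used by LMS."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "tags": tags,
        "per_page": per_page,
        "has_geo": 1,                    # only photos with a geotag
        "in_gallery": "true",
        "sort": "interestingness-desc",
    }
    return REST_ENDPOINT + "?" + urlencode(params)

def parse_photos(xml_text):
    """Return (total matches, list of <photo> attribute dicts) from a response."""
    root = ET.fromstring(xml_text)
    if root.get("stat") != "ok":
        raise RuntimeError("Flickr returned an error response")
    photos = root.find("photos")
    return int(photos.get("total")), [dict(p.attrib) for p in photos.iter("photo")]

# Sample response from the slide.
SAMPLE = """<?xml version="1.0" encoding="utf-8" ?>
<rsp stat="ok">
  <photos page="1" pages="7449" perpage="2" total="14898">
    <photo id="5677848399" owner="62409281@N08" secret="e55f3f02ec" server="5224"
           farm="6" title="IMG_0934" ispublic="1" isfriend="0" isfamily="0" />
    <photo id="5639118139" owner="50831163@N07" secret="8efce2761d" server="5182"
           farm="6" title="Australian Open" ispublic="1" isfriend="0" isfamily="0" />
  </photos>
</rsp>"""

print(search_url("YOUR_API_KEY", "starbucks", 200))
total, photos = parse_photos(SAMPLE)
print(total, [p["id"] for p in photos])
```

Parsing into plain dicts keeps the XML reader separate from the database code that stores the records.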


Logo matching module

◦ Image data cleaning

Use the SIFT algorithm to keep only the pictures that contain the target logo

Example: searching for Subway®
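The slides do not detail the improved SIFT matcher. The sketch below therefore shows only the standard descriptor-matching step from Lowe's SIFT work. A logo keypoint is matched to its nearest photo keypoint only when that neighbour is clearly closer than the second-nearest one; this is the ratio test, used here with Lowe's suggested threshold of 0.8. The sketch assumes the descriptors (128-dimensional vectors in real SIFT) have already been extracted by some SIFT implementation. The contains_logo rule and its min_matches threshold are hypothetical.

```python
import math

def ratio_test_matches(logo_desc, photo_desc, ratio=0.8):
    """Return (logo index, photo index) pairs that pass Lowe's ratio test."""
    matches = []
    if len(photo_desc) < 2:
        return matches  # the test needs a nearest and a second-nearest neighbour
    for i, d in enumerate(logo_desc):
        # Distances from this logo descriptor to every photo descriptor, closest first.
        ranked = sorted((math.dist(d, p), j) for j, p in enumerate(photo_desc))
        (d1, j1), (d2, _) = ranked[0], ranked[1]
        if d1 < ratio * d2:
            matches.append((i, j1))
    return matches

def contains_logo(logo_desc, photo_desc, min_matches=10, ratio=0.8):
    """Hypothetical decision rule: keep the photo if enough keypoints match."""
    return len(ratio_test_matches(logo_desc, photo_desc, ratio)) >= min_matches

# Toy 2-D descriptors for illustration only.
logo = [(0.0, 0.0), (10.0, 10.0)]
photo = [(0.0, 0.1), (10.0, 10.2), (50.0, 50.0)]
print(ratio_test_matches(logo, photo))            # [(0, 0), (1, 1)]
print(contains_logo(logo, photo, min_matches=2))  # True
```

The brute-force nearest-neighbour search is quadratic. It is fine for a sketch, but a real matcher over many photos would typically use an approximate nearest-neighbour index.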


Logo matching module

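[Figure: Logo Matching Module: information on raw pictures from the Database, the raw pictures and the standard logo feed the SIFT algorithm, which writes updated picture information back to the Database; the Picture Extraction Module, XML Reader, Flickr® API and Flickr® website appear upstream]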

◦ Retrieve web images from the database

◦ Use the improved SIFT algorithm to clean the image data

◦ Update the cleaned image data in the database
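The three steps above amount to a loop over the database. The sketch below uses SQLite as a stand-in for MySQL. The clean column, the clean_photos name and the is_logo_photo callback (which would wrap a visual matcher such as SIFT) are hypothetical, and the lambda in the usage line is only a placeholder for that matcher.

```python
import sqlite3

def clean_photos(conn, is_logo_photo):
    """Check every unchecked photo with the visual matcher and store the verdict.

    Returns the number of photos kept (clean = 1).
    """
    # Fetch first so the table is not modified while a cursor is open over it.
    pending = conn.execute("SELECT id FROM photo WHERE clean IS NULL").fetchall()
    for (photo_id,) in pending:
        verdict = 1 if is_logo_photo(photo_id) else 0
        conn.execute("UPDATE photo SET clean = ? WHERE id = ?", (verdict, photo_id))
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM photo WHERE clean = 1").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photo (id TEXT PRIMARY KEY, tag TEXT, clean INTEGER)")
conn.executemany("INSERT INTO photo (id, tag) VALUES (?, ?)",
                 [("a", "subway"), ("b", "subway"), ("c", "subway")])
# Stand-in matcher: pretend only photo "b" lacks the logo.
print(clean_photos(conn, lambda photo_id: photo_id != "b"))  # 2
```

Rejected photos are flagged rather than deleted, so a later run with a better matcher could re-check them.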


Logo matching module

[Figure: pictures before matching and after matching]

All noisy data are removed


Logo geography labeling module

◦ The data set may still be huge after cleaning, e.g. for major cities:

◦ New York City, London, Tokyo, Paris, Hong Kong, Chicago, Los Angeles, Singapore, Sydney, Seoul, Brussels, San Francisco, Washington, D.C., Toronto, Beijing, Berlin, Madrid, Newcastle, Vienna, Boston, Frankfurt, Shanghai, Buenos Aires, Stockholm, Zurich, Moscow, Barcelona, Dubai, Rome, Amsterdam, Mexico City…

◦ Ambiguous city names such as Newcastle

Newcastle: UK or Australia?
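The flickr.photos.search request used by the picture extraction module asks only for geotagged photos (has_geo=1). One way to avoid the Newcastle ambiguity is therefore to label each photo by its coordinates rather than by a place name. The sketch below is not from the slides. It assigns a photo to the nearest city in a small hypothetical list, using the haversine great-circle distance. The city coordinates are approximate and the 50 km cut-off is an assumption.

```python
import math

# Hypothetical gazetteer: (label, latitude, longitude) in degrees, approximate.
CITIES = [
    ("Newcastle, UK", 54.97, -1.61),
    ("Newcastle, Australia", -32.93, 151.78),
    ("Sydney, Australia", -33.87, 151.21),
    ("London, UK", 51.51, -0.13),
]

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_city(lat, lon, cities=CITIES, max_km=50.0):
    """Label a geotagged photo with the closest listed city, or None if no
    listed city lies within max_km."""
    label, dist = min(
        ((name, haversine_km(lat, lon, clat, clon)) for name, clat, clon in cities),
        key=lambda t: t[1],
    )
    return label if dist <= max_km else None

print(nearest_city(54.98, -1.60))    # Newcastle, UK
print(nearest_city(-32.92, 151.75))  # Newcastle, Australia
```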


Logo geography labeling module

◦ Retrieve the geographic information of the web images from the database

◦ Display it on the map
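The slides list PHP, XML and JavaScript for this step. The sketch below is written in Python rather than PHP. It turns rows of (id, title, latitude, longitude) read from the database into an XML document that a map page could load and draw as markers with the Google® Maps JavaScript API. The markers_xml name, the element and attribute names, and the sample row are hypothetical.

```python
import xml.etree.ElementTree as ET

def markers_xml(rows):
    """Serialise (photo id, title, latitude, longitude) rows as a <markers> document."""
    root = ET.Element("markers")
    for photo_id, title, lat, lon in rows:
        # ElementTree escapes special characters in titles automatically.
        ET.SubElement(root, "marker", id=str(photo_id), title=title,
                      lat=f"{lat:.6f}", lng=f"{lon:.6f}")
    return ET.tostring(root, encoding="unicode")

# Hypothetical row as it might come back from the photo table.
print(markers_xml([("1001", "Example photo", -35.2809, 149.13)]))
```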


Logo geography labeling module

The exact position of every picture can be seen on the map


Test the advantage of visual matching in web mining with LMS

◦ Textual keyword search works for textual web sources

◦ For web images

The same image is described differently by different people

Keyword search loses accuracy

Use visual matching search instead

◦ Example: Starbucks®


Test the advantage of visual matching in web mining with LMS

◦ Search Flickr® with the textual keyword “starbucks”

◦ Download 200 pictures (52.6% are not pictures of Starbucks®)

◦ Clean the data with the logo matching module. As fractions of the 200 pictures: TP (Starbucks® pictures kept) = 0.325, FN (Starbucks® pictures removed) = 0.135, TN (other pictures removed) = 0.540, FP (other pictures kept) = 0

Accuracy = (TP+TN)/(TP+TN+FP+FN) = (0.325+0.540)/1 = 86.5%

Specificity = TN/(TN+FP) = 0.540/(0.540+0) = 100%

Precision = TP/(TP+FP) = 0.325/(0.325+0) = 100%

Recall = TP/(TP+FN) = 0.325/(0.325+0.135) = 70.6%

◦ Visual matching suits image search better than textual keyword search
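The four formulas above can be checked in a few lines of Python. The function name binary_metrics is hypothetical, and the inputs are the fractions given on the slide. Printed to two decimals, recall comes out as 70.65%, which the slide truncates to 70.6%.

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, specificity, precision and recall from confusion-matrix entries
    (counts or fractions of the data set both work)."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

# Fractions of the 200 "starbucks" pictures, taken from the slide.
for name, value in binary_metrics(tp=0.325, tn=0.540, fp=0.0, fn=0.135).items():
    print(f"{name}: {value:.2%}")
```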


Effect of brand name on web searching

◦ Some brand names have more than one meaning

◦ Four brand logos in two groups

◦ Download 50, 100 and 200 pictures of each brand

Group 1: Starbucks®, McDonald's®

Group 2: Subway®, Apple®


Effect of brand name on web searching

◦ P(brand) = share of the downloaded pictures that actually show the brand

Pictures    P(starbucks)    P(mcdonalds)    P(apple)     P(subway)
50          46% (23/50)     26% (13/50)     4% (2/50)    0% (0/50)
100         49%             22%             5%           0%
200         47.5%           18%             4.5%         0%
Average     47.5%           22%             4.5%         0%

◦ When searching by keyword, brand names with a unique meaning give much more accurate results than ambiguous brand names
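Each P value is the number of relevant pictures divided by the number downloaded (for example 23/50 = 46%). The Average row is the plain mean over the three sample sizes. The short Python sketch below reproduces that row from the table's percentages; the names SHARES and average_shares are hypothetical.

```python
from statistics import mean

# Share of downloaded pictures that show each brand, per sample size (from the table).
SHARES = {
    "starbucks": {50: 0.46, 100: 0.49, 200: 0.475},
    "mcdonalds": {50: 0.26, 100: 0.22, 200: 0.18},
    "apple": {50: 0.04, 100: 0.05, 200: 0.045},
    "subway": {50: 0.0, 100: 0.0, 200: 0.0},
}

def average_shares(shares):
    """Mean share per brand over all sample sizes (the table's Average row)."""
    return {brand: mean(by_size.values()) for brand, by_size in shares.items()}

print(23 / 50)  # share for Starbucks with 50 pictures
for brand, p in average_shares(SHARES).items():
    print(f"P({brand}) = {p:.1%}")
```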


LMS research on people’s interest in different picture topics over time

◦ People’s interests change over time

2001: 9/11 attacks

2008: Olympics

2010: Michael Jackson

◦ Two experiments:

Shanghai EXPO (2010)

Vancouver Winter Olympic Games (2010)

Download 200 pictures for each from Flickr® and summarize their time distribution with LMS
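To summarize the time distribution, each downloaded photo needs a date. The sketch below assumes the search also requested Flickr's date_taken extra, returned as a "YYYY-MM-DD HH:MM:SS" string. It counts photos per year, which is the data behind a per-year histogram. The function name and the timestamps are made up.

```python
from collections import Counter

def year_histogram(dates_taken):
    """Count photos per year from 'YYYY-MM-DD HH:MM:SS' strings, oldest year first."""
    counts = Counter(int(d[:4]) for d in dates_taken)
    return dict(sorted(counts.items()))

# Made-up timestamps standing in for the downloaded photos.
sample = ["2010-05-01 10:00:00", "2010-10-16 18:30:00", "2009-12-31 23:59:59"]
for year, n in year_histogram(sample).items():
    print(year, "#" * n)  # one '#' per photo: a text histogram
```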


LMS research on people’s interest in different picture topics over time

◦ Histogram of each result

◦ Statistics of the web picture counts collected with LMS give the expected result

[Figure: histogram "Number of pictures for EXPO from 2002 to 2011 in Shanghai" (x-axis: year, 2002 to 2011; y-axis: number of pictures, 0 to 200)]

[Figure: histogram "Number of pictures on Olympics in Vancouver" (x-axis: year, 1968 to 2011; y-axis: number of pictures, 0 to 60)]


LMS is an effective tool for web image mining

◦ Data classification

◦ Data cleaning

◦ Visualization of the geographical distribution

Future work

◦ Install on smartphones (e.g. iPhone®)

◦ Data analysis on image authors


Thanks to

◦ Uwe R. Zimmer: project meetings, project document

◦ Lexing Xie: project supervision, technical support

◦ Zilong Wang: technical support, cooperation

◦ Everyone attending today’s presentation
