TOMCCAP: Browse by Chunks



ACM Transactions on Multimedia Computing, Communications and Applications

2011, Volume 7S, Number 1


Special Section on ACM Multimedia 2010 Best Paper Candidates

Article 20: Introduction (2 pages). S. Shirmohammadi, J. Luo, J. Yang, A. El Saddik

Article 21: A Holistic Approach to Aesthetic Enhancement of Photographs (21 pages). S. Bhattacharya, R. Sukthankar, M. Shah

Article 22: Using Rich Social Media Information for Music Recommendation via Hypergraph Model (22 pages). S. Tan, J. Bu, C. Chen, B. Xu, C. Wang, X. He

Article 23: A Cognitive Approach for Effective Coding and Transmission of 3D Video (21 pages). S. Milani, G. Calvagno

Article 24: Video Accessibility Enhancement for Hearing-Impaired Users (19 pages). R. Hong, M. Wang, X.-T. Yuan, M. Xu, J. Jiang, S. Yan, T.-S. Chua

Special Issue on Social Media

Article 25: Introduction (2 pages). S. Boll, R. Jain, J. Luo, D. Xu

Article 26: Exploiting Online Music Tags for Music Emotion Classification (16 pages). Y.-C. Lin, Y.-H. Yang, H. H. Chen

Article 27: Automatic Creation of Photo Books from Stories in Social Media (18 pages). M. Rabbath, P. Sandhaus, S. Boll

Article 28: Recognition of Adult Images, Videos, and Web Page Bags (24 pages). W. Hu, H. Zuo, O. Wu, Y. Chen, Z. Zhang, D. Suter

Article 29: SCENT: Scalable Compressed Monitoring of Evolving Multirelational Social Networks (22 pages). Y.-R. Lin, K. S. Candan, H. Sundaram, L. Xie

Article 30: Browse by Chunks: Topic Mining and Organizing on Web-Scale Social Media (18 pages). J. Sang, C. Xu

Article 31: Mining Flickr Landmarks by Modeling Reconstruction Sparsity (22 pages). R. Ji, Y. Gao, B. Zhong, H. Yao, Q. Tian

Article 32: Contextual Tag Inference (18 pages). M. I. Mandel, R. Pascanu, D. Eck, Y. Bengio, L. M. Aiello, R. Schifanella, F. Menczer

Article 33: VlogSense: Conversational Behavior and Social Attention in YouTube (21 pages). J.-I. Biel, D. Gatica-Perez


ACM, 2 Penn Plaza, Suite 701, New York, NY 10121-0701

Tel.: (212) 869-7440; Fax: (212) 869-0481

Home Page: http://tomccap.acm.org/

    Editor-in-Chief

Ralf Steinmetz Technische Universität Darmstadt / Darmstadt, Germany / http://www.kom.e-technik.tu-darmstadt.de/People/Staff/Ralf_Steinmetz/ralf_steinmetz.html / email: [email protected]

    Associate Editors

    Kiyoharu Aizawa University of Tokyo / Tokyo, Japan / email: [email protected]

    Grenville Armitage Swinburne University of Technology / Melbourne, Australia / http://caia.swin.edu.au/cv/garmitage / email: [email protected]

    Susanne Boll University of Oldenburg / Oldenburg, Germany / http://medien.informatik.uni-oldenburg.de / email: [email protected]

Wolfgang Effelsberg University of Mannheim / Mannheim, Germany / http://www.informatik.uni-mannheim.de / email: [email protected]

    Abdulmotaleb El Saddik University of Ottawa / Ottawa, Canada / email: [email protected]

Gerald Friedland University of California / Berkeley, CA / http://www.icsi.berkeley.edu/~fractor/homepage/About_Me.html / email: [email protected]

    Carsten Griwodz University of Oslo / Oslo, Norway / http://www.simula.no/portal_memberdata/griff / email: [email protected]

    Mohamed Hefeeda Simon Fraser University / Surrey, BC V3T 0A3, Canada / http://www.cs.sfu.ca/~mhefeeda / email: [email protected]

Mohan S. Kankanhalli National University of Singapore / Singapore / http://www.comp.nus.edu.sg/%7Emohan/ / email: [email protected]

    Karrie Karahalios University of Illinois / Urbana-Champaign, IL / email: [email protected]

    Rainer Lienhart University of Augsburg / Augsburg, Germany / http://www.lienhart.de/ / email: [email protected]

Ketan Mayer-Patel University of North Carolina / Chapel Hill, NC / http://www.cs.unc.edu/%7Ekmp / email: [email protected]

    Klara Nahrstedt University of Illinois / Urbana-Champaign, IL / http://cairo.cs.uiuc.edu/%7Eklara/home.html / email: [email protected]

    Thomas Plagemann University of Oslo / Oslo, Norway / http://heim.ifi.uio.no/%7Eplageman / email: [email protected]

    Yong Rui Microsoft Research / Redmond, WA / http://research.microsoft.com/%7Eyongrui / email: [email protected]

Shervin Shirmohammadi University of Ottawa / Ottawa, Ontario, Canada / http://www.site.uottowa.ca/%7Eshervin/ / email: [email protected]

    Hari Sundaram Arizona State University / Tempe, AZ / http://www.public.asu.edu/%7Ehsundara / email: [email protected]

Svetha Venkatesh Curtin University of Technology / Australia / http://www.computing.edu.au/%7esvetha/ / email: [email protected]

Michelle X. Zhou IBM Research Almaden / San Jose, CA / email: [email protected]

Roger Zimmermann National University of Singapore / Singapore / http://www.comp.nus.edu.sg/%7Erogerz/roger.html / email: [email protected]

Information Director

    Lasse Lehmann AGT Group (R&D) GmbH / Darmstadt, Germany / email: [email protected]

Sebastian Schmidt Technische Universität Darmstadt / Darmstadt, Germany / http://www.kom.tu-darmstadt.de/en/kom-multimedia-communications-lab/people/staff/sebastian-schmidt / email: [email protected]

    Headquarters Staff

    Laura Lander Journal Manager

    Irma Strolia Editorial Assistant

    Media Content Marketing Production

The ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP) (ISSN: 1551-6857) is published quarterly in Spring, Summer, Fall, and Winter by the Association for Computing Machinery (ACM), 2 Penn Plaza, Suite 701, New York, NY 10121-0701. Printed in the U.S.A. POSTMASTER: Send address changes to ACM Transactions on Multimedia Computing, Communications and Applications, ACM, 2 Penn Plaza, Suite 701, New York, NY 10121-0701.

    For manuscript submissions, subscription, and change of address information, see inside back cover.

Copyright © 2011 by the Association for Computing Machinery (ACM). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permission to republish from: Publications Department, ACM, Inc. Fax +1 212-869-0481 or email [email protected].

For other copying of articles that carry a code at the bottom of the first or last page or screen display, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.

Cover images from A Holistic Approach to Aesthetic Enhancement of Photographs, by S. Bhattacharya, R. Sukthankar, and M. Shah, in this issue.

ACM Transactions on Multimedia Computing, Communications and Applications


    ACM Transactions on Multimedia Computing, Communications and Applications

    http://tomccap.acm.org/

Guide to Manuscript Submission

Submission to the ACM Transactions on Multimedia Computing, Communications and Applications is done electronically through http://acm.manuscriptcentral.com. Once you are at that site, you can create an account and password with which you can enter the ACM Manuscript Central manuscript review tracking system. From a drop-down list of journals, choose ACM Transactions on Multimedia Computing, Communications and Applications and proceed to the Author Center to submit your manuscript and your accompanying files.

You will be asked to create an abstract that will be used throughout the system as a synopsis of your paper. You will also be asked to classify your submission using the ACM Computing Classification System through a link provided at the Author Center. For completeness, please select at least one primary-level classification followed by two secondary-level classifications. To make the process easier, you may cut and paste from the list. Remember, you, the author, know best which area and sub-areas are covered by your paper; in addition to clarifying the area where your paper belongs, classification often helps in quickly identifying suitable reviewers for your paper. So it is important that you provide as thorough a classification of your paper as possible.

The ACM Production Department prefers that your manuscript be prepared in either LaTeX or MS Word format. Style files for manuscript preparation can be obtained at the following location: http://www.acm.org/pubs/submissions/submission.htm. For editorial review, the manuscript should be submitted as a PDF or PostScript file. Accompanying material can be in any number of text or image formats, as well as software/documentation bundles in zip or tar-gzipped formats.

Questions regarding the editorial review process should be directed to the Editor-in-Chief. Questions regarding the post-acceptance production process should be addressed to the Journal Manager, Laura Lander, at [email protected].

    Subscription, Single Copy, and Membership Information.

    Send orders to:

ACM Member Services Dept.
General Post Office
PO Box 30777
New York, NY 10087-0777

    For information, contact:

Mail: ACM Member Services Dept., 2 Penn Plaza, Suite 701, New York, NY 10121-0701

Phone: +1-212-626-0500
Fax: +1-212-944-1318
Email: [email protected]
Catalog: http://www.acm.org/catalog

Subscription rates for ACM Transactions on Multimedia Computing, Communications and Applications are $40 per year for ACM members, $35 for students, and $140 for nonmembers. Single copies are $18 each for ACM members and $40 for nonmembers. Your subscription expiration date is coded in four digits at the top of your mailing label; the first two digits show the year, the last two show the month of expiration.

About ACM. ACM is the world's largest educational and scientific computing society, uniting educators, researchers and professionals to inspire dialogue, share resources and address the field's challenges. ACM strengthens the computing profession's collective voice through strong leadership, promotion of the highest standards, and recognition of technical excellence. ACM supports the professional growth of its members by providing opportunities for life-long learning, career development, and professional networking.

    Visit ACM's Website: http://www.acm.org.

Change of Address Notification: To notify ACM of a change of address, use the addresses above or send an email to [email protected].

Please allow 6-8 weeks for new membership or change of name and address to become effective. Send your old label with your new address notification. To avoid interruption of service, notify your local post office before change of residence. For a fee, the post office will forward 2nd- and 3rd-class periodicals.


Browse by Chunks: Topic Mining and Organizing on Web-Scale Social Media

JITAO SANG and CHANGSHENG XU, Institute of Automation, China and China-Singapore Institute of Digital Media, Singapore

The overwhelming number of Web videos returned from search engines makes effective browsing and search a challenging task. Rather than a conventional ranked list, it becomes necessary to organize the retrieved videos in alternative ways. In this article, we explore the issue of topic mining and organizing of the retrieved web videos in semantic clusters. We present a framework for clustering-based video retrieval and build a visualization user interface. A hierarchical topic structure is exploited to encode the characteristics of the retrieved video collection, and a semi-supervised hierarchical topic model is proposed to guide the topic hierarchy discovery. Carefully designed experiments on a web-scale video dataset collected from video sharing websites validate the proposed method and demonstrate that clustering-based video retrieval is practical for facilitating effective browsing.

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – Clustering

    General Terms: Algorithms, Design, Experimentation, Performance

    Additional Key Words and Phrases: Hierarchical topic model, search result clustering, semisupervised learning, social media,topic mining, video retrieval

    ACM Reference Format:

Sang, J. and Xu, C. 2011. Browse by chunks: Topic mining and organizing on web-scale social media. ACM Trans. Multimedia Comput. Commun. Appl. 7S, 1, Article 30 (October 2011), 18 pages. DOI = 10.1145/2037676.2037687 http://doi.acm.org/10.1145/2037676.2037687

    1. INTRODUCTION

With the development of multimedia technology and the increasing proliferation of social media in Web 2.0, an overwhelming volume of professional and user-generated videos has been posted to video sharing websites. YouTube,1 one of the most popular video sharing websites, announced that its users upload about 65,000 new videos and view more than 100 million videos each day. To detect and track hot events or topics, more and more people prefer to search and watch videos on the web, which is timely and

1http://www.youtube.com.

This work was supported by the National Natural Science Foundation of China (Grant No. 90920303) and the 973 Program (Project No. 2012CB316304).

Authors' address: J. Sang and C. Xu (corresponding author), National Lab of Pattern Recognition, Institute of Automation, CAS, Beijing 100190, China; email: {jtsang, csxu}@nlpr.ia.ac.cn.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or [email protected].

© 2011 ACM 1551-6857/2011/10-ART30 $10.00

    DOI 10.1145/2037676.2037687 http://doi.acm.org/10.1145/2037676.2037687

ACM Transactions on Multimedia Computing, Communications and Applications, Vol. 7S, No. 1, Article 30, Publication date: October 2011.


    30:2 J. Sang and C. Xu

Fig. 1. An example page from YouTube for the query 9/11 attack. 7,800 videos are returned. Alternative search options are also shown.

convenient. With the explosion of shared videos, a heavy demand has emerged to provide users an effective way to retrieve and access videos of interest. The goal of this work is to offer a novel topic mining and organizing solution and build a visualization user interface that displays topics as hierarchical semantic clusters, which facilitates users browsing the retrieved videos and locating interesting ones.

Conventional video search engines order the retrieved videos according to their relevance to the query. When a user issues a query, search engines return a ranked list including hundreds or thousands of matches. Users have to painstakingly browse through the long list to judge whether the results match their requirements and then locate the interesting videos. One question naturally arises: in addition to a ranked list, is there any more effective way to organize the retrieved videos?

Clustering and visualizing the returned videos into semantically consistent groups offers an alternative solution. Clustering the retrieved videos can help users get a quick overview of the retrieved video set and thus locate interesting videos more easily. YouTube provides several options that allow users to filter search results by Upload date, Category, Duration, and Features (see Figure 1). While these coarse groups involve generic categories of the videos, they provide users little information to understand the internal configuration and semantic meaning of the returned video collection. There have also been research attempts [Liu et al. 2008b; Ramachandran et al. 2009] on employing clustering to assist video retrieval. The strategy was to build a static clustering of the entire collection and then match the query to the cluster centroids. This is so-called preretrieval clustering. From the perspective of feature selection, preretrieval clustering is based on features that are frequent in the whole collection but irrelevant to the query, whereas postretrieval clusters are tailored to the characteristics of the query and make use of query-specific features. We cannot assume clustering to play a one-size-fits-all classification role. Therefore, it is more reasonable to apply clustering as a postprocessing step. In this article we propose a postretrieval web video clustering method for cluster-based video retrieval (see Figure 2 for an illustration).

Our method is motivated by the following observation. Simply taking a glance at the example in Figure 1, we find that almost all the returned videos contain words like 9/11, attack, terrorism, WTC, etc. This phenomenon implies that although diverse topics are involved in the retrieved video


Fig. 2. Visualization of the user interface of cluster-based video retrieval: (top) The user submits a query; the underlying topic hierarchy is exploited and displayed on the left as a complementary view to the conventional flat list. (bottom) When a user chooses one subtopic (video cluster), the included videos are shown on the right, ordered by their relevance to this subtopic as computed by Equation (2).

collection, they usually share one common topic referred to in the query, and we refer to the shared topic as the parent topic. We elaborate this idea with the same example in Figure 2 (top). On the left, the circle illustrates the latent semantic structure of the retrieved video collection. Each color along the circle indicates a subtopic, annotated with a tag cloud of its top eight most probable terms. The length of each arc is proportional to the number of videos belonging to its subtopic. The parent topic at the root node of the hierarchy is located in the center. Each retrieved video can be viewed as a combination of the parent topic and one child topic (subtopic), which here can be enumerated as Live attack and rescue video, Domestic and international response afterwards, Investigation, and The Else: long-term effect and memorial.

Inspired by this, we extend the hierarchical topic model [Blei et al. 2010] to exploit a two-level topic tree in the retrieved video collection and cluster the collected videos into the leaf-level subtopics. Compared with flat-structure-based clustering methods (e.g., k-means, LDA), utilizing the hierarchical topic model prevents the shared topic from being mixed into other topics and thus improves clustering performance. Furthermore, we encode the consistency between the query and the root-level


topic (we denote it as the query-root-topic knowledge in this paper) as the prior information to form a semi-supervised hierarchical topic model.

Since there are no ready metrics for evaluating the performance of cluster-based video retrieval, we refer to text search result clustering and employ objective metrics as well as user study tasks to assess the performance of the proposed method.

    2. RELATED WORK

In this section, we review previous research on Web video mining and search result clustering. Its relation to our work is also discussed.

    2.1 Web Video Mining

In an effort to keep up with the tremendous growth of Web videos, much work has targeted analyzing web video content and structure to help users find desired information more efficiently and accurately.

Topic detection and tracking (TDT), first proposed in the 1990s for news documents, has attracted increasing attention for web video analysis [Liu et al. 2008a; Yuan et al. 2008; Cao et al. 2010; Yuan et al. 2010]. By automatically filtering out topic candidates and tracking hot topics, TDT strives to organize large-scale Web videos into topics, helping users and advertisers efficiently browse and track the evolution of topics. Other work that clusters the whole collection into semantic topics [Liu et al. 2008b; Ramachandran et al. 2009] can also be grouped into this category. We notice that in TDT, video clustering is performed in advance and on the whole document collection, and the number of topics is predefined. Since the web is a dynamic environment, static, precomputed clusters have to be constantly updated.

Near-duplicate video clustering and elimination [Cheung and Zakhor 2004; Wu et al. 2007] is another way to help users retrieve and access web videos. With the explosion of the web video pool, video search engines tend to return similar or near-duplicate videos together in the result lists. Clustering the search results according to their content and visual similarities is considered a practical way to facilitate fast browsing. However, video clips in the same near-duplicate cluster are basically derived from the same original copy, so this approach cannot be used for topic-level browsing and fails to solve the problem we bring forward in this paper.

2.2 Search Result Clustering

Search result clustering [Carpineto et al. 2009], clustering the raw result set into different semantic groups, has been investigated in the text retrieval [Cutting et al. 1992; Zamir and Etzioni 1998; Kummamuru et al. 1998] and image retrieval fields [Cai et al. 2004; Jing et al. 2006]. By grouping the results returned by a conventional search engine into labeled clusters, it allows better topic understanding and favors systematic exploration of the search results. The work in this paper can be regarded as video search result clustering.

To the best of our knowledge, until now the only work addressing the problem of video search result clustering is Hindle et al. [2010]. They clustered the top returned videos based on visual similarity of low-level appearance features and textual similarity of term vector features. Their clustered video groups are near-duplicate-like. Our experiments demonstrate that the size of the clusters derived by their method is much smaller than the cluster size of the underlying subtopic and the number of clusters is relatively large.

We notice that most previous search result clustering methods are devoted to solving the ambiguity problem resulting from nonspecific queries. These queries mostly involve general objects or names, and the cluster labels correspond to alternative interpretations of the query. For example, the query apple


    Fig. 3. System framework of video search result clustering.

with interpretations of computer, iPod, logo, and fruit [Cai et al. 2004]; the query sting with interpretations of musician, wrestler, and film [Hindle et al. 2010]. In this article we focus on more complex queries concerning political and social events or issues. The semantic clusters inside the returned videos are diverse aspects of the query-corresponding events (e.g., the query 9/11 attack, see Figure 2) or different viewpoints on controversial issues (e.g., the query abortion, with opposing viewpoints of pro-life and pro-choice). In this case, a few general terms are insufficient for users to understand the subtopics; a subtopic is best described by a set of representative keywords. In this paper, we introduce a topic model to describe each subtopic with a probability distribution over terms in a large vocabulary.

In addition, motivated by the observation that the returned results share one common topic, we explicitly incorporate this basic characteristic into the clustering process and exploit the inherent hierarchical topic structure.

    3. FRAMEWORK

In this article, we propose a hierarchical-topic-model-based framework for clustering-based video retrieval. The framework contains two steps: query expansion and hierarchical-topic-model-based topic hierarchy discovery. The input of our algorithm is web videos collected from video sharing websites, and the output is the generated video clusters as well as the topic hierarchy. This is shown in Figure 3. When a video sharing website (e.g., YouTube, Metacafe, Vimeo, etc.) captures a query submission from a user, the search engine returns a raw ranked list of videos. Metadata around each video are collected and represented as a document-term matrix.
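The metadata-to-matrix step can be sketched as follows. The toy metadata strings are illustrative, not the authors' data; a real system would aggregate titles, descriptions, tags, and comments per video:

```python
from collections import Counter

# Toy metadata strings for three retrieved videos (title + tags).
videos = [
    "9/11 attack live footage wtc terrorism rescue",
    "9/11 memorial ceremony memorial tribute",
    "9/11 investigation report investigation commission",
]

# Shared vocabulary over the retrieved collection.
vocab = sorted({term for doc in videos for term in doc.split()})
index = {term: j for j, term in enumerate(vocab)}

# Document-term matrix: one row per video, one column per vocabulary term.
dtm = []
for doc in videos:
    counts = Counter(doc.split())
    dtm.append([counts.get(term, 0) for term in vocab])

print(len(dtm), len(vocab))  # rows = videos, columns = vocabulary size
```

In practice the matrix is sparse and a compressed representation (or a library vectorizer) would be used instead of dense lists.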

Hierarchical Latent Dirichlet Allocation (hLDA) [Blei et al. 2010; Blei et al. 2004] is a generalization of the (flat) Latent Dirichlet Allocation (LDA) model [Blei et al. 2003]. We employ hLDA for unsupervised discovery of the topic hierarchy in the retrieved video collection. To effectively incorporate query-relevant terms into the root topic, we employ association mining as well as WordNet conceptual relations between words to expand the query words, resulting in a seed word set. The seed word set is viewed as supervision information (query-root-topic knowledge), and an extension to the standard hLDA, semi-supervised hLDA (SShLDA), is proposed to guide the inference of the topic hierarchy.


Fig. 4. Hierarchy relations of the word attack in WordNet 3.0.

After probabilistic inference of the topic model, each video is assigned a single path from the root node to a leaf node. Videos assigned to the same path are grouped together to form a cluster, and the subtopics in the leaf nodes constitute the descriptions of the corresponding video clusters.
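The path-to-cluster grouping can be sketched as follows; the path assignments below are hypothetical stand-ins for the model's inference output, with root topic 0 and one leaf subtopic per path:

```python
from collections import defaultdict

# Hypothetical inference output: each video id mapped to its sampled
# root-to-leaf path (root topic 0, then a leaf subtopic id).
path_of = {
    "vid_a": (0, 1),  # e.g., live attack and rescue
    "vid_b": (0, 1),
    "vid_c": (0, 2),  # e.g., investigation
    "vid_d": (0, 3),  # e.g., long-term effect and memorial
}

# Videos sharing a path form one cluster; the leaf topic describes it.
clusters = defaultdict(list)
for vid, path in path_of.items():
    clusters[path].append(vid)

for path, vids in sorted(clusters.items()):
    print("subtopic", path[-1], "->", vids)
```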

The contributions of this article are summarized as follows. (1) We propose a novel solution framework for clustering-based video retrieval; a hierarchical topic model is introduced to explore the inherent hierarchical topic structure in the retrieved video collection. (2) Query-root-topic knowledge is incorporated to guide the topic hierarchy discovery, and a semi-supervised extension to the standard hierarchical topic model is presented. (3) For cluster representation, topics characterized by term distributions are utilized to deal with complex queries on political and social events or issues.

    4. QUERY EXPANSION

Query expansion (QE) is the process of reformulating a seed query to improve retrieval performance in information retrieval operations. For Web search engines, query expansion involves evaluating a user's input and expanding the search query to match additional documents. In our case, we employ query expansion, combining WordNet and association mining, to extend the query terms into a seed word set S = {s1, . . . , sC}, which composes the root topic of the derived topic hierarchy.

WordNet [Miller et al. 1990] is an online lexical dictionary which describes word relationships along three dimensions: hypernym, hyponym, and synonym. It is organized conceptually. As shown in Figure 4, fight is a hypernym of the verb attack and bombing is a hyponym of the noun attack. Gong et al. [2005] utilized WordNet's noun hypernym/hyponym and synonym relations between words to expand queries. To avoid bringing in noisy terms, they supplemented their method with a term semantic network to filter out low-frequency and unusual words. According to our mechanism for incorporating the supervision information (detailed in Section 5), adding noisy words not included in the vocabulary will not detract from the topic modeling process. This means we are allowed to extend the query as much as we can, on the condition that no words concerned with subtopics are mixed in. Therefore, we exclude words having hyponym or troponym relations to the query in WordNet. In addition, instead of removing unusual words, we employ association mining and add high-frequency words into the seed word set.
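The expansion rule described here (follow synonym and hypernym edges; exclude hyponyms and troponyms) can be sketched with a toy lexical table standing in for WordNet; the entries are illustrative, and a real system would query WordNet itself:

```python
# Toy stand-in for WordNet: each entry lists synonym, hypernym, and
# hyponym sets for a term.
LEXICON = {
    "attack": {
        "synonyms": {"assault", "onslaught", "onset"},
        "hypernyms": {"fight", "struggle", "operation", "event"},
        "hyponyms": {"bombing", "raid"},  # deliberately never added
    },
}

def expand_query(terms):
    """Expand query terms along synonym and hypernym edges only.

    Hyponyms/troponyms are excluded so that words tied to specific
    subtopics do not leak into the root-topic seed set."""
    seed = set(terms)
    for t in terms:
        entry = LEXICON.get(t, {})
        seed |= entry.get("synonyms", set())
        seed |= entry.get("hypernyms", set())
    return seed

seed = expand_query(["911", "attack"])
print(sorted(seed))
```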

We utilize WordNet as the basic rule to extend the query along two dimensions: hypernym and synonym relations. The original query 9/11 attack, for instance, may be expanded to include 911 attack assault aggress assail fight struggle contend onslaught onset attempt operation approach event. Since WordNet has narrow coverage for domain-specific queries [Chandramouli et al. 2008], we use association rules to exploit collection-dependent word relationships. We examine the vocabulary and add the words with both the top 10 highest confidence and support with the original query words into the query expansion. The final seed word set of the query 9/11 attack may be S = {911 attack assault aggress


Fig. 5. (a) LDA graphical model. (b) Hierarchical LDA graphical model. (c) Semi-supervised hierarchical LDA graphical model. A parameter controls the strength of our constraint derived from the seed set. The proposed SShLDA differs from standard hLDA in the way w is generated.

assail fight struggle contend onslaught onset attempt operation approach event wtc world trade center terrorist terrorism 9-11}.
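The association-mining step (adding terms that co-occur with the query with high support and confidence) can be sketched over a toy collection; the documents and the number of kept terms below are illustrative, not the paper's data:

```python
from collections import Counter

# Toy metadata documents, each reduced to a set of terms.
docs = [
    {"attack", "wtc", "terrorism"},
    {"attack", "wtc", "rescue"},
    {"attack", "memorial"},
    {"wtc", "terrorism", "attack"},
]
query = "attack"

n_docs = len(docs)
with_query = [d for d in docs if query in d]

# Count co-occurrences of every other term with the query word.
candidates = Counter()
for d in with_query:
    candidates.update(d - {query})

scored = []
for term, co in candidates.items():
    support = co / n_docs              # fraction of docs containing both
    confidence = co / len(with_query)  # P(term | query)
    scored.append((term, support, confidence))

# Keep the strongest associations as additions to the seed word set.
scored.sort(key=lambda x: (-x[2], -x[1]))
print(scored[:2])
```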

    5. SEMI-SUPERVISED HIERARCHICAL TOPIC MODEL

We begin by briefly reviewing LDA and the standard hLDA. Then we introduce our extension to hLDA, SShLDA, and derive the parameter estimation and prediction algorithms. We describe the models using the original terms documents (in our case, each video corresponds to one document) and words, as used in the topic model literature.

    5.1 Latent Dirichlet Allocation and Hierarchical Topic Model

Suppose we have a corpus of M documents, {w_1, w_2, . . . , w_M}, containing words from a vocabulary of V terms. Further, we assume that the order of words in a particular document is ignored. This is a bag-of-words model.

    LDA.The Latent Dirichlet Allocation model [Blei et al. 2003] assumes that documents are generatedfrom a set ofK(Kneeds to be predefined) latent topics. In a document, each word wi is associated witha hidden variable zi {1, . . . ,K} indicating the topic from which wi was generated. The probability ofword wi is expressed as

    P(wi)=K

    j=1

    P(wi |zi = j)P(zi = j), (1)

    where P(wi|zi = j) = ij is a probability of word wi in topic j and P(zi = j) = j is a document specificmixing weight indicating the proportion of topic j.

LDA treats the multinomial parameters φ and θ as latent random variables sampled from Dirichlet priors with hyperparameters η and α, respectively. The corresponding graphical model is shown in Figure 5(a).
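Equation (1) is a plain mixture computation; a minimal sketch with hypothetical parameters φ and θ for one document:

```python
def word_probability(word, theta, phi):
    """Eq. (1) sketch: P(w) = sum_j P(w | z = j) P(z = j), where theta holds
    the document's topic proportions and phi[j][w] = P(w | z = j)."""
    return sum(theta[j] * phi[j].get(word, 0.0) for j in range(len(theta)))

# Hypothetical 2-topic model (values made up for illustration).
phi = [{"attack": 0.6, "rescue": 0.4}, {"attack": 0.1, "news": 0.9}]
theta = [0.7, 0.3]
print(word_probability("attack", theta, phi))  # 0.7*0.6 + 0.3*0.1 = 0.45
```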

Hierarchical LDA. The LDA model we have described has a flat topic structure: each document is a superposition of all K topics with document-specific mixture weights. The hierarchical LDA model organizes topics in a tree of fixed depth L. Each node in the tree has an associated topic, and each document is assumed to be generated by topics on a single path from the root to a leaf through the tree. Note that all documents share the topic associated with the root node; this feature of hLDA is consistent with the characteristics of the search result collection we mentioned in Section 1.

30:8 J. Sang and C. Xu

The merit of the hLDA model is that both the topics and the structure of the tree are learnt from the training data. This is achieved by placing a nested Chinese restaurant process (nCRP) [Teh et al. 2006] prior on the tree structure. nCRP specifies a distribution on partitions of documents into paths in a fixed-depth L-level tree. To generate a tree structure from nCRP, assignments of documents to paths are sampled sequentially, where the first document forms an initial L-level path, i.e., a tree with a single branch. The probability of creating novel branches is controlled by the parameter γ, where smaller values of γ result in a tree with fewer branches.

In hLDA, each document is assumed to be drawn from the following process.

i. Pick an L-level path c_d from the nCRP prior: c_d ~ nCRP(γ).

ii. Sample an L-dimensional topic proportion vector θ_d ~ GEM(m, π).

iii. For each word w_{d,n} ∈ w_d: (a) Choose a level z_{d,n} ∈ {1, . . . , L} ~ Discrete(θ_d); (b) Sample a word w_{d,n} ~ Discrete(φ_{c_d, z_{d,n}}), which is parameterized by the topic at level z_{d,n} on the path c_d.

The corresponding graphical model is shown in Figure 5(b). Further details of hLDA can be found in Blei et al. [2010].
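The nCRP path-sampling step above can be sketched as follows. This is a simplified illustration under our own data-structure assumptions: it only tracks per-branch document counts, not the full hLDA state, and the root (shared by all documents) is left implicit.

```python
import random

def ncrp_sample_path(tree, depth, gamma, rng=random):
    """Nested-CRP sketch: at each node, follow an existing child with
    probability proportional to its document count, or open a new child
    with probability proportional to gamma. `tree` maps a path tuple to
    {child_id: document count}."""
    path = ()
    for _ in range(depth - 1):          # root level is shared, so depth-1 choices
        children = tree.setdefault(path, {})
        total = sum(children.values()) + gamma
        r = rng.random() * total
        chosen = None
        for child, count in children.items():
            r -= count
            if r <= 0:
                chosen = child
                break
        if chosen is None:              # landed in the gamma mass: new branch
            chosen = len(children)
        children[chosen] = children.get(chosen, 0) + 1
        path = path + (chosen,)
    return path

rng = random.Random(0)
tree = {}
paths = [ncrp_sample_path(tree, depth=3, gamma=1.0, rng=rng) for _ in range(20)]
print(len(set(paths)))  # number of distinct root-to-leaf paths grown
```

Smaller `gamma` makes the new-branch mass smaller, hence fewer branches, matching the text.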

    5.2 Semi-Supervised Hierarchical LDA Model

When we utilize the hierarchical topic model for the video clustering task, one subtopic corresponds to one cluster. The cluster membership of each video is decided by its posterior path assignment c_d. The videos in a cluster are sorted by their proportion on the subtopic, computed as:

proportion_d = ( Σ_{w_{d,n} ∈ w_d} |z_{d,n} = 2| ) / N_d,   (2)

where | · | is the indicator function, the numerator accumulates the words allocated at the leaf level, and N_d denotes the number of words in document d.
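Equation (2) amounts to counting leaf-level allocations per video; a small sketch (the video IDs, per-word level allocations, and two-level tree are made up for illustration):

```python
def leaf_proportion(level_alloc, leaf_level=2):
    """Eq. (2) sketch: fraction of a document's words allocated to the leaf
    level of its path; level_alloc holds z_{d,n} for each word."""
    if not level_alloc:
        return 0.0
    return sum(1 for z in level_alloc if z == leaf_level) / len(level_alloc)

# Hypothetical cluster: (video_id, per-word level allocations).
cluster = [("v1", [1, 2, 2, 2]), ("v2", [1, 1, 2, 1]), ("v3", [2, 2, 1, 2])]
ranked = sorted(cluster, key=lambda v: leaf_proportion(v[1]), reverse=True)
print([vid for vid, _ in ranked])  # videos with stronger subtopic focus first
```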

To incorporate the query-root-topic knowledge into the hierarchical topic modeling, we propose an extension to the standard hLDA, which we call the Semi-Supervised Hierarchical LDA model (SShLDA). The supervised information we add is the seed word set derived from query expansion, S = {s_1, . . . , s_C}. We jointly model the documents and the seed word set in order to guide the discovery of the topic hierarchy, so that the words in the seed word set will have high probability in the root topic and low probability in the subtopics.

We first explain how query-root-topic knowledge can be incorporated into the topic modeling process. In the standard hLDA, the topic level allocation z_{d,n} for word n in document d is a latent variable and needs to be inferred through the model learning process. Assume we had the supervised information of z_{d,n}, that is, the topic level allocation for a given word in a given document. This can be seen as similar to semi-supervised learning with labeled features [Druck et al. 2008]. In our case, we denote it as the hard constraint when the seed set words are restricted to appear only in the root topic. In practical applications, each word tends to be generated from every topic with different probabilities. Therefore, we relax this strong assumption. Instead of providing the topic level allocation z_{d,n} for each seed word, we modify the generative process of standard hLDA so that sampling seed words from the root topic and from the subtopics will have different probabilities.

Specifically, the proposed SShLDA differs from hLDA in the way w_{d,n} is generated. The generative process of SShLDA is:

i. Pick an L-level path c_d from the nCRP prior: c_d ~ nCRP(γ).

ii. Sample an L-dimensional topic proportion vector θ_d ~ GEM(m, π).

iii. For each word w_{d,n} ∈ w_d: (a) Choose a level z_{d,n} ∈ {1, . . . , L} ~ Discrete(θ_d); (b) Sample a word w_{d,n} ~ Constraint(λ, z_{d,n}) · Discrete(φ_{c_d, z_{d,n}}).

The corresponding graphical model is shown in Figure 5(c). Constraint(λ, z_{d,n}) is the soft constraint function defined as follows:

Constraint(λ, z_{d,n}) = { λ · 1(w_{d,n} ∈ S) + 1 − λ,   if z_{d,n} = 1;
                           λ · 1(w_{d,n} ∉ S) + 1 − λ,   if z_{d,n} ≠ 1,   (3)

where 1(·) is an indicator function and λ (0 ≤ λ ≤ 1) is the strength parameter of the supervision. λ = 0 reduces to standard hLDA and λ = 1 recovers the hard constraint.

This formulation provides us with a flexible way to insert prior domain knowledge into the inference of latent topics through different definitions of the constraint function; for instance, with prior information on the latent subtopics, S can be set independently for each specific subtopic.
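The soft constraint of Equation (3) is a two-case weighting; a direct transcription (the seed set here is illustrative):

```python
def constraint(word, level, seed_set, lam):
    """Eq. (3) sketch: soft constraint on sampling word w_{d,n}.
    lam = 0 recovers standard hLDA; lam = 1 is the hard constraint."""
    if level == 1:                       # root topic favors seed words
        return lam * (word in seed_set) + (1.0 - lam)
    return lam * (word not in seed_set) + (1.0 - lam)  # subtopics disfavor them

S = {"attack", "wtc", "terrorist"}       # seed set from query expansion
print(constraint("wtc", 1, S, 0.5))      # root, seed word -> weight 1.0
print(constraint("wtc", 2, S, 0.5))      # subtopic, seed word -> weight 0.5
print(constraint("rescue", 2, S, 1.0))   # hard constraint, non-seed word ok
```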

    5.3 Inference and Learning

Having the SShLDA model, we need to perform posterior inference [Bishop 2006], that is, to invert the generative process of documents described above to estimate the hidden topical structure. We modify the Gibbs sampling algorithm of hLDA to approximate the posterior for the SShLDA model.

The goal is to obtain samples from the posterior distribution of the latent tree structure T, the level allocations z of all words, and the path assignments c of all documents, conditioned on the observed collection w and the seed word constraint S. In a Gibbs sampler, each latent variable is iteratively sampled conditioned on the observations and all the other latent variables. Collapsed Gibbs sampling [Liu 1994] is employed, in which we marginalize out the topic parameters φ and the per-document topic proportions θ_d to speed up the convergence. Therefore, the posterior we need to approximate is p(c_{1:D}, z_{1:D} | γ, m, π, η, λ, w_{1:D}), where γ and η are the hyperparameters of the nCRP and the topic-word distribution, {m, π} are the stick-breaking parameters for the topic proportions, and λ controls the strength of the seed word set constraint. These parameters can be fixed according to the analysis and prior expectation about the data, which will be discussed in the Experiments section.

The state of the Markov chain for a single document is illustrated in Figure 6. (The assignments are taken at the approximate mode of the SShLDA posterior conditioned on the search result metadata collection of the query 9/11 attack.) For each document, the Gibbs sampler is divided into two steps: resample the per-word level allocations to topics z_{d,n}, and resample the per-document paths c_d.

Sampling Level Allocations. Given the current path assignments, we need to re-sample the level allocation variable z_{d,n} for word n in document d:

p(z_{d,n} | z_{−(d,n)}, c, w, m, π, η) ∝ p(w_{d,n} | z, c, w_{−(d,n)}, η) p(z_{d,n} | z_{d,−n}, m, π),   (4)

where z_{−(d,n)} and w_{−(d,n)} are the vectors of level allocations and observed words leaving out z_{d,n} and w_{d,n}, respectively, and z_{d,−n} denotes the level allocations in document d, leaving out z_{d,n}. This is the same notation as in Blei et al. [2010].

The first term in Equation (4) is the probability of a given word based on a possible assignment. As in standard hLDA, it is assumed that the topic parameters are generated from a symmetric Dirichlet distribution; thus the frequency of seeing word w_{d,n} allocated to the topic at level z_{d,n} of the path c_d is:

p(w_{d,n} | z, c, w_{−(d,n)}, η) ∝ #[z_{−(d,n)} = z_{d,n}, c_{z_{d,n}} = c_{d,z_{d,n}}, w_{−(d,n)} = w_{d,n}] + η,   (5)

where #[·] counts the elements of an array satisfying the given condition. Let

q_{d,n} = #[z_{−(d,n)} = z_{d,n}, c_{z_{d,n}} = c_{d,z_{d,n}}, w_{−(d,n)} = w_{d,n}] + η.
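Putting Equations (3)-(5) together, one step of the level resampling can be sketched as follows. This is a simplified illustration: the smoothed counts, the level prior, and the two-level tree are hypothetical stand-ins for the full sampler state, not the authors' implementation.

```python
import random

def sample_level(word, levels, counts, seed_set, eta, lam, prior, rng=random):
    """Sketch of re-sampling z_{d,n} (Eq. 4): the likelihood term is the
    smoothed topic count (Eq. 5), multiplied by the soft constraint (Eq. 3)
    and a level prior. `counts[l][w]` holds current word counts in the topic
    at level l of the document's path; `prior[l]` stands in for
    p(z_{d,n} = l | z_{d,-n}, m, pi)."""
    weights = []
    for l in levels:
        likelihood = counts[l].get(word, 0) + eta        # q_{d,n} of Eq. (5)
        if l == 1:
            cons = lam * (word in seed_set) + (1 - lam)  # Eq. (3), root case
        else:
            cons = lam * (word not in seed_set) + (1 - lam)
        weights.append(likelihood * cons * prior[l])
    total = sum(weights)
    r = rng.random() * total                             # draw from the discrete posterior
    for l, w in zip(levels, weights):
        r -= w
        if r <= 0:
            return l
    return levels[-1]

counts = {1: {"attack": 30}, 2: {"rescue": 12}}          # hypothetical state
prior = {1: 0.5, 2: 0.5}
z = sample_level("attack", [1, 2], counts, {"attack"}, eta=0.5, lam=0.5, prior=prior)
print(z)
```

A seed word with large root counts is, as intended, almost always re-allocated to the root level.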


Table I. Collected Video Sharing Web Sites Dataset Information

ID  Query                      Videos retrieved  Videos collected  Vocabulary  Total words
1   9/11 attack                8,361             791               2140        38747
2   gay rights                 602,885           799               2048        35538
3   abortion                   66,606            797               1770        33144
4   Iraq war invasion          4,425             702               1778        36760
5   Beijing Olympics           202,511           787               1718        32370
6   Israel palestine conflict  252,746           798               1814        38499
7   US president election      36,037            731               1792        33249

    6. EXPERIMENTS

Among the different types of metadata around a video, the title, tags, and description are the most likely to be informative in revealing its semantic meaning. There may be possibilities for mining other metadata (e.g., comments), but we leave that for future research. We present two experiments to demonstrate the performance of the proposed clustering-based video retrieval framework. First, we turn to text search result clustering and evaluate subtopic reach time against state-of-the-art algorithms on a benchmark dataset. Then we assess the retrieval effectiveness on a web-scale video dataset collected from video sharing websites.

    6.1 Dataset

Text subtopic retrieval dataset. We utilized a benchmark text search result clustering evaluation dataset, AMBIENT.2 AMBIENT consists of 44 topics, each with a set of subtopics and a list of 100 search results with corresponding subtopic relevance judgments. The topics were selected from the list of ambiguous Wikipedia entries. The 100 search results associated with each topic were collected from Yahoo, and their relevance to each of the Wikipedia subtopics was manually assessed.

Video sharing Web site dataset. Since the goal of this paper is to present a clustering-based browsing algorithm for Web video retrieval, it is important to devise methods for evaluating its performance on real video sharing websites. After careful examination of the hottest topics on Youtube, Google Zeitgeist, and Twitter, we selected seven social and political topics as queries. We issued these queries to Youtube, Metacafe, and Vimeo, and crawled the top 500, 150, and 150 returned videos (where available) for the experiments, respectively. We focused on the topmost search results to avoid bringing in too many unrelated videos. Videos with no tags were filtered out. The videos collected for each query form a video set. The queries and information about the corresponding video sets are listed in Table I.

    6.2 Parameter Settings

The work in Hindle et al. [2010] (we refer to it as BCS) has a similar motivation, but our work differs from theirs in several aspects: 1) BCS employs a flat-structure clustering algorithm; 2) BCS uses the cluster centroid to represent the cluster and provides no mechanism for deriving cluster labels. Since this is the work most relevant to ours, we ran their method on our dataset as a comparison. The most important parameters for BCS are the weights of the adopted features: visual, tag, title, and description. Affinity propagation (AP) and normalized cut (NC) are utilized as the clustering algorithms, and the authors demonstrated that AP generally outperforms NC. Therefore, we fixed the set of feature weights showing the best performance with AP clustering: visual 0.3, tag 0.49, title 0.07, description 0.14.

To further evaluate the advantage of exploiting a hierarchical topic structure, we also implemented LDA and compared it with hLDA and SShLDA. Topic models make assumptions about the topic structure through the settings of hyperparameters. We empirically fixed the hyperparameters according to the prior expectation about the data. The hyperparameter η controls the smoothing/sparsity of the topic-word distribution. A small η encourages more words to have high probability in each topic. (For LDA, it requires fewer topics to explain the data. For hLDA and SShLDA, it leads to a small tree with compact topics.) Guided by this, we empirically chose a relatively small value and set η = 0.5. Both hLDA and SShLDA have an additional hyperparameter, the nCRP parameter γ, which decides the size of the inferred tree. As in Blei et al. [2004], we set γ = 1 to reduce the likelihood of choosing new paths when traversing the nCRP.

2http://credo.fub.it/ambient.

Fig. 7. Average subtopic number error as λ changes.

    when traversing nCRP.Dirichlet prior hyperparameter for LDA and the GEM parameters m, for hLDA and SShLDA

    jointly control over the mixing of document-topic distribution. For LDA, our goal is to group documentsinto topic-specific clusters according to the dominant topic proportions. Therefore, is fixed to a valuemuch larger than 1 ( = 50) to encourage high mixing of topics. For hLDA and SShLDA, GEM pa-rametersm, reflect the stick-breaking distribution. We set m to be a small value (m = 0.1), and theposterior is more likely to assign more words to the leaf level of the inferred tree. Setting variance to be a small value ( = 10) means that the word allocation adheres to the parameter settings, thusaccelerates the convergence speed.

    For the choice of supervision strength parameter , we divided the AMBIENT dataset into two sub-sets: one consisting of 10 topics for the determination ofand one consisting of 34 topics for evaluatingthe clustering performance. We assume that appropriate brings no perturbation to the hierarchicaltopic discovery process and the derived topic tree should be consistent with the latent hierarchicalstructure. Therefore, we analyzed the error between the subtopic number of ground truth and the de-rived subtopic number over the different values of(see Figure 7). = 0.5 achieves the least error.Therefore, we fixed = 0.5 in the following experiments. In fact, we also compared the retrieval per-formance with respect to various in Section 6.4, and found that the performances for the differentqueries share a similar variation pattern: the results deteriorate as approaches 0 or 1, and there islittle difference when [0.3, 0.6]. Therefore, for practical implementation where a training set is notavailable, is suggested to set as 0.5.
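The λ selection above reduces to minimizing the average absolute subtopic-number error over the held-out topics; a sketch with made-up counts (the candidate grid and error values are hypothetical):

```python
def select_lambda(candidates, ground_truth_counts, derived_counts):
    """Sketch of the lambda selection in Section 6.2: pick the lambda whose
    derived subtopic counts deviate least, in mean absolute value, from
    the ground truth over the held-out topics. `derived_counts[lam]` is
    the list of subtopic counts inferred with that lambda."""
    def avg_error(lam):
        pairs = zip(derived_counts[lam], ground_truth_counts)
        return sum(abs(d - g) for d, g in pairs) / len(ground_truth_counts)
    return min(candidates, key=avg_error)

truth = [3, 4, 5]                                # hypothetical held-out topics
derived = {0.1: [6, 7, 9], 0.5: [3, 4, 6], 0.9: [1, 2, 2]}
print(select_lambda([0.1, 0.5, 0.9], truth, derived))  # 0.5 has the least error
```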

    6.3 Experiments on a Text Subtopic Retrieval Dataset

We first performed experiments on AMBIENT. To evaluate the retrieval performance of search result clustering, we borrowed the metric of subtopic reach time (SRT) [Carpineto et al. 2009], which models the time taken to locate a relevant document. For each query's subtopic, the subjects first select the most appropriate label (or topic representation) created by the clustering algorithm. The SRT value is then computed by summing the number of clusters and the position of the first relevant result in the selected cluster. For instance, the SRT for the subtopic Live attack and rescue video in Figure 2(b) would be 5, given by the number of clusters (4) plus the position of the first relevant search result (Never before seen Video of WTC 9/11 attack) within the selected cluster (1).


Table II. Comparison of subtopic reach time of state-of-the-art text search result clustering algorithms with LDA, hLDA, and SShLDA on the AMBIENT text collection.

CREDO  Lingo  Lingo3G  STC    TRSC   LDA    hLDA  SShLDA
14.96  15.05  13.11    15.82  17.46  15.73  12.7  10.92

    When no appropriate cluster fits the subtopic at hand, or the selected cluster does not contain anyrelevant result, SRT is given by the number of clusters plus the position of the first result relevant tothe subtopic in the ranked list.
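The SRT computation, including the fallback to the ranked list, can be sketched as follows (the cluster contents and document IDs are hypothetical):

```python
def subtopic_reach_time(clusters, selected, relevant, ranked_list):
    """Sketch of subtopic reach time [Carpineto et al. 2009]: number of
    clusters + 1-based position of the first relevant result in the
    user-selected cluster; falls back to the original ranked list when the
    selected cluster contains no relevant result (or none was selected)."""
    n = len(clusters)
    if selected is not None:
        for pos, doc in enumerate(clusters[selected], start=1):
            if doc in relevant:
                return n + pos
    # Fallback: first relevant result in the original ranked list.
    for pos, doc in enumerate(ranked_list, start=1):
        if doc in relevant:
            return n + pos
    return None

clusters = [["d3", "d7"], ["d1", "d5"], ["d2"], ["d4"]]
ranked = ["d1", "d2", "d3", "d4", "d5"]
print(subtopic_reach_time(clusters, selected=1, relevant={"d5"},
                          ranked_list=ranked))  # 4 clusters + position 2 = 6
```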

We noticed that Hindle et al. [2010] adopt visual features in the clustering process, so it would not be fair to examine their method on the text-only AMBIENT dataset. Therefore, we only compare the results (averaged over the test set of 34 queries) of LDA, hLDA, and the proposed SShLDA with state-of-the-art text search clustering algorithms in Table II (the results of the text search clustering algorithms are taken from Carpineto et al. [2009]). Four graduate students participated in the user study as subjects. The best performance is achieved by SShLDA, followed by hLDA, owing to the separation of the shared common topic from the subtopics. It is interesting to note that the SRT for LDA is relatively high. The topics included in AMBIENT are mostly general terms, e.g., Eos, Cube, B-52, so the descriptive power of topic models for complex queries cannot be exerted.

    6.4 Experiments on Video Sharing Web Sites

Visualization of the discovered subtopics. We visualize the discovered subtopics of the video collections for the test queries in Figure 13. For the query 9/11 attack, the subtopics derived from LDA and the topic hierarchies derived from hLDA and SShLDA are presented together for comparison. LDA mixes common words like attack, 911, September, terrorist into different subtopics and fails to discover the shared topic. The topic hierarchy recovered by hLDA finds the shared topic at the root level. However, without a constraint on the topic distribution over the seed word set, words describing the shared topic, for instance, wtc, terrorist, 11, attack, also appear in subtopics. This contaminates the subtopics and limits their power for subevent or viewpoint detection. Incorporating the supervision information, SShLDA prevents seed words from being generated by the subtopics and results in a topic hierarchy with concise subtopics focusing on the refined themes.

Comparing different clustering methods. For evaluation, human assessors created ground-truth subtopic themes after browsing the retrieved videos of each query's video set. For example, the subtopic themes inside the video collection derived from the query abortion are summarized as pro-abortion, anti-abortion, and neutral. Videos are manually labeled as belonging to a certain subtopic (cluster). The ground-truth subtopic number and the subtopic (cluster) numbers derived by BCS, hLDA, and SShLDA for the test queries are shown in Figure 8(left). We can see that all three models fail to recover the ground-truth subtopic number for some video sets. The reason is that the ground-truth subtopic themes created by subjective assessment may not reflect the nature of the video set, especially when unrelated noisy videos are involved. We also notice that SShLDA and hLDA perform better than BCS. The BCS curve lies high above the ground truth. This is due to its duplicate-clustering-like mechanism, which results in small duplicate video clusters.

We first compare the SRT of LDA, hLDA, and SShLDA on the collected video dataset in Figure 8(right). The result is consistent with the experimental result on the AMBIENT dataset: SShLDA and hLDA achieve lower SRT than LDA.

In addition to SRT, which assesses the retrieval performance, we use four criteria to quantify the clustering quality: purity [Tan et al. 2005], F measure, cluster description readability, and computational efficiency. Figure 9(left) shows the cluster purity for BCS, LDA, hLDA, and SShLDA. We find that


Fig. 8. (left) The ground-truth subtopic number and automatically derived cluster number for test queries. (right) Subtopic reach time (SRT) for test queries.

Fig. 9. (left) Purity rates. (right) F1 measure for test queries.

BCS noticeably outperforms the other algorithms. However, high purity is easy to achieve when the number of clusters is large, so we cannot use purity to trade off the quality of the clustering against the number of clusters. A measure that makes this trade-off is the F measure [Steinbach et al. 2000]. We evenly penalize false negatives and false positives, i.e., the F1 measure (Figure 9(right)). BCS performs poorly on the F1 measure, even much worse than LDA. The reason is that BCS focuses on clustering duplicate or near-duplicate videos, which limits the cluster size and forces a considerable number of semantically similar videos to be assigned to different clusters.
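Purity and an F1 measure can be sketched as follows. The F1 here uses the common pairwise formulation, where false positives and false negatives are counted over video pairs; the paper may instead compute the cluster-class F measure of Steinbach et al. [2000], so treat this as one reasonable reading.

```python
from collections import Counter
from itertools import combinations

def purity(clusters, labels):
    """Purity: each cluster votes for its majority ground-truth label."""
    total = sum(len(c) for c in clusters)
    hit = sum(Counter(labels[v] for v in c).most_common(1)[0][1] for c in clusters)
    return hit / total

def pairwise_f1(clusters, labels):
    """Pairwise F1: a pair is positive when both videos share a label,
    and predicted positive when they land in the same cluster."""
    assign = {v: i for i, c in enumerate(clusters) for v in c}
    tp = fp = fn = 0
    for a, b in combinations(assign, 2):
        same_cluster = assign[a] == assign[b]
        same_label = labels[a] == labels[b]
        tp += same_cluster and same_label
        fp += same_cluster and not same_label
        fn += same_label and not same_cluster
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Hypothetical clustering: "anti" videos split across two clusters.
clusters = [["v1", "v2"], ["v3"], ["v4"]]
labels = {"v1": "pro", "v2": "pro", "v3": "anti", "v4": "anti"}
print(purity(clusters, labels), pairwise_f1(clusters, labels))
```

Note how the example has perfect purity yet an imperfect F1: over-splitting is punished only by the recall term, which is exactly the BCS failure mode discussed above.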

The quality of the cluster description is crucial to the usability of clustering-based video retrieval. If a cluster cannot be described, it is presumably of no value to the user. BCS employs the cluster centroid as the cluster representation, which lacks a real description and is of little use for helping the user understand the cluster content and locate interesting videos. The cluster description readability is evaluated as follows. Each cluster's corresponding subtopic, characterized by its top 5 most probable words, was shown to the participants together with the top 3 ranked videos in that subtopic. The participants were asked to evaluate the cluster description readability in two respects: whether the topic description itself is sensible, comprehensive, and compact (question 1), and whether the topic description is consistent with the representative videos (question 2). For each question, participants rated from 1 to 5, where 5 is best. The average ratings are shown in Figure 10. The proposed SShLDA shows superiority in generating meaningful cluster descriptions, especially sensible, comprehensive, and compact representations (question 1). We note that the ratings for query 5, Beijing Olympics, are relatively low. The retrieved video set of Beijing Olympics involves diverse events and subtopics, for instance, the opening ceremony, game videos, athlete interviews, the torch relay, etc. The discovered


Fig. 10. Mean ratings of cluster description readability for (left) Question 1 and (right) Question 2.

Table III. Average Time Cost of Different Clustering Algorithms

              BCS  LDA  hLDA  SShLDA1  SShLDA2
Time Cost (s) 0.7  3.5  6.8   4.7      6.1

Fig. 11. Mean rating scores for Youtube and our method.

Fig. 12. Subtopic reach time as the strength parameter λ changes.

topic structure is sparse and less meaningful. Besides, some unrelated videos regarding the issue of Tibet are also included.

For clustering-based video retrieval, the clustering is performed online, which requires a short response time. We focus on the efficiency of the clustering algorithms and do not consider


Fig. 13. Discovered subtopics from the video collections of seven queries from Youtube. (a) 9/11 attack, a comparison between LDA, hLDA, and SShLDA; for SShLDA, we also present 2 video examples having the largest proportion associated with the topics. (b) gay rights; (c) abortion; (d) Iraq war invasion; (e) Beijing Olympics; (f) Israeli Palestine conflicts; (g) US president election.


the video acquisition time cost. We assume that the visual features used in BCS are extracted offline, and we take no account of text preprocessing time. Table III reports the time cost of the clustering algorithms. (SShLDA1 denotes the clustering time cost only; SShLDA2 also includes the query expansion time using a locally stored WordNet.) Since BCS uses AP for clustering, it achieves a lower time cost than the generative topic models. The speedup of SShLDA over hLDA arises because the incorporated prior guides the seed words to be gradually generated from the root topic, which speeds up the convergence process. We noticed that the computational cost increases dramatically when dealing with large-scale web videos, and we will pursue this direction in future work.

Clustering versus ranked lists. To compare the proposed clustering-based video retrieval with existing video search engines, for instance, Youtube, we designed a specific task. The task assumes the participant is a news editor who wants to introduce a hot event or topic to users from all sides by searching for 10 Web videos. Participants chose Youtube or the proposed clustering-based interface to complete the task, in random order. After the task, participants were required to select from four options for both systems: very satisfied (4), somewhat satisfied (3), unsatisfied (2), and very unsatisfied (1). The average ratings are shown in Figure 11. For five out of seven test queries, participants prefer the proposed clustering-based method to the ranked-list-based search engine.

Clustering performance with respect to the strength parameter λ. To analyze the influence of the strength parameter λ on the clustering performance, we evaluated the SRT while tuning λ ∈ [0, 1] in steps of 0.1. From the results illustrated in Figure 12, we draw three observations: 1) As λ changes from 0 to 1, the retrieval performance of the different queries varies similarly, with query 1 varying slightly differently. A rough conclusion is that different datasets share a common pattern for choosing λ. 2) The results deteriorate dramatically when λ = 1, which verifies our assumption that a hard constraint is not practical. 3) While the results deteriorate as λ approaches 0 or 1, there is little difference for λ ∈ [0.3, 0.6]. This means that the incorporation of prior knowledge is effective and our algorithm does not heavily depend on the choice of the strength parameter. A chart of the subtopics is given in Figure 13.

    7. CONCLUSIONS

In this article, we have presented a hierarchical topic model based framework for clustering-based web video retrieval. Instead of showing a long ranked list of videos, we explore the hierarchical topic structure in the retrieved video collection and present users with videos organized into semantic clusters. Experiments demonstrate the effectiveness of the proposed method.

In the future, we will improve our current work along three directions. 1) Unrelated videos in retrieved video collections affect the clustering performance. We will develop a noise-aware hierarchical topic model to reduce the influence of noise as well as remove unrelated videos. 2) Some summary videos cover various aspects of the query-related topic; for instance, an introductory video may describe the 3 main viewpoints on the issue of abortion: pro-life, pro-choice, and neutral. In this case, the video cannot be grouped into a single subtopic. SShLDA needs to be extended to a multipath assignment version: each document exhibits multiple paths through the tree, and the topic depth L can vary from document to document. 3) So far our experiments have been based on textual analysis and consider no visual information. Web videos carry rich visual content, and visual information provides important clues for video clustering that should not be ignored. We are now working towards incorporating visual information into the hierarchical topic modeling framework.

    REFERENCES

    BISHOP, C. M. 2006. Pattern Recognition and Machine Learning. Springer.

BLEI, D., NG, A., AND JORDAN, M. 2003. Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993-1022.

BLEI, D. M., GRIFFITHS, T. L., AND JORDAN, M. I. 2010. The nested chinese restaurant process and bayesian nonparametric inference of topic hierarchies. J. ACM 57, 2, 1-30.

BLEI, D. M., GRIFFITHS, T. L., JORDAN, M. I., AND TENENBAUM, J. 2004. Hierarchical topic models and the nested chinese restaurant process. In Advances in Neural Information Processing Systems. MIT Press, 17-24.

CAI, D., HE, X., LI, Z., MA, W. Y., AND WEN, J. R. 2004. Hierarchical clustering of www image search results using visual, textual and link information. In Proceedings of the ACM Multimedia Conference (MM). 952-959.

CAO, J., NGO, C.-W., ZHANG, Y.-D., ZHANG, D.-M., AND MA, L. 2010. Trajectory-based visualization of web video topics. In Proceedings of the ACM Multimedia Conference (MM). 1639-1642.

CARPINETO, C., OSINSKI, S., ROMANO, G., AND WEISS, D. 2009. A survey of web clustering engines. ACM Comput. Surv. 41, 3, 1-38.

CHANDRAMOULI, K., KLIEGR, T., NEMRAVA, J., SVATEK, V., AND IZQUIERDO, E. 2008. Query refinement and user relevance feedback for contextualized image retrieval. In Visual Information Engineering, Xi'an, China, 452-458.

CHEUNG, S. S. AND ZAKHOR, A. 2004. Fast similarity search and clustering of video sequences on the world-wide-web. IEEE Trans. Multimedia 7, 3, 524-537.

CUTTING, D. R., PEDERSEN, J. O., KARGER, D. R., AND TUKEY, J. W. 1992. Scatter/gather: a cluster-based approach to browsing large document collections. In Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 318-329.

DRUCK, G., MANN, G., AND MCCALLUM, A. 2008. Learning from labeled features using generalized expectation criteria. In Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 595-602.

GONG, Z., CHEANG, C. W., AND U, L. H. 2005. Web query expansion by wordnet. In Proceedings of the International Conference on Database and Expert Systems Applications (DEXA). Springer-Verlag, 166-175.

HINDLE, A., SHAO, J., LIN, D., LU, J., AND ZHANG, R. 2010. Clustering web video search results based on integration of multiple features. In Proceedings of the International World Wide Web Conference (WWW), 121.

JING, F., WANG, C., YAO, Y., DENG, K., ZHANG, L., AND MA, W. Y. 2006. Igroup: web image search results clustering. In Proceedings of the ACM Multimedia Conference (MM). 377-384.

KUMMAMURU, K., LOTIKAR, R., AND ETZIONI, O. 1998. Web document clustering: A feasibility demonstration. In Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 46-54.

LIU, J. 1994. The collapsed gibbs sampler in Bayesian computations with application to a gene regulation problem. J. Amer. Stat. Assoc. 89, 958-966.

LIU, L., RUI, Y., SUN, L.-F., YANG, B., ZHANG, J., AND YANG, S.-Q. 2008b. Topic mining on web-shared videos. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 2145-2148.

LIU, L., SUN, L.-F., RUI, Y., SHI, Y., AND YANG, S.-Q. 2008a. Web video topic discovery and tracking via bipartite graph reinforcement model. In Proceedings of the International World Wide Web Conference (WWW). 1009-1018.

MILLER, G. A., BECKWITH, R., FELLBAUM, C., GROSS, D., AND MILLER, K. 1990. Introduction to WordNet: An On-line Lexical Database. Vol. 3. Oxford University Press.

RAMACHANDRAN, C., MALIK, R., JIN, X., GAO, J., AND HAN, J. 2009. Videomule: a consensus learning approach to multi-label classification from noisy user-generated videos. In Proceedings of the ACM Multimedia Conference (MM).

STEINBACH, M., KARYPIS, G., AND KUMAR, V. 2000. A comparison of document clustering techniques. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 35-42.

TAN, P., STEINBACH, M., AND KUMAR, V. 2005. Introduction to Data Mining. Addison Wesley.

TEH, Y. W., JORDAN, M. I., BEAL, M. J., AND BLEI, D. M. 2006. Hierarchical dirichlet processes. J. Amer. Stat. Assoc. 101, 476, 1566-1581.

WU, X., HAUPTMANN, A. G., AND NGO, C.-W. 2007. Practical elimination of near-duplicates from web video search. In Proceedings of the ACM Multimedia Conference (MM). 218-227.

YUAN, J., LUO, J., AND WU, Y. 2010. Mining compositional features from gps and visual cues for event recognition in photo collections. IEEE Trans. Multimedia 12, 7, 705-716.

YUAN, J., MENG, J., WU, Y., AND LUO, J. 2008. Mining recurring events through forest growing. IEEE Trans. Circuits Syst. Video Techn. 18, 11, 1597-1607.

ZAMIR, O. AND ETZIONI, O. 1998. Web document clustering: A feasibility demonstration. In Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, 46-54.

    Received September 2010; revised March 2011; accepted July 2011
