Topic Evolution in a Stream of Documents
André Gohr
Leibniz Institute of Plant
Biochemistry, IPB, Germany
Alexander Hinneburg
Martin-Luther-University
Halle-Wittenberg, Germany
René Schult and Myra Spiliopoulou
Otto-von-Guericke-University Magdeburg, Germany
Abstract
Document collections evolve over time: new topics emerge and old ones decline. At the same time, the terminology evolves as well. Much literature is devoted to topic evolution in finite document sequences assuming a fixed vocabulary. In this study, we propose "TopicMonitor" for the monitoring and understanding of topic and vocabulary evolution over an infinite document sequence, i.e. a stream. We use Probabilistic Latent Semantic Analysis (PLSA) for topic modeling and propose new folding-in techniques for topic adaptation under an evolving vocabulary. We extract a series of models, on which we detect index-based topic threads as human-interpretable descriptions of topic evolution.
1 Introduction
Scholars, journalists and practitioners of many disciplines use the Web for regular acquisition of information on topics of interest. Although powerful tools have emerged to assist in this task, they do not deal with the volatile nature of the Web. The topics of interest may change over time, as may the terminology associated with each topic. This renders text mining models obsolete and profile-based filters ineffective. We address this problem with a method that monitors the evolution of topic-word associations over a document stream. Our "TopicMonitor" discovers topics, adapts them as documents arrive and detects topic threads, while considering the evolution of terminology.
Most of the studies on topic evolution can be categorized into methods for (1) finite sequences and (2) streams. Methods of the first category observe the arriving documents as a finite sequence and infer model parameters under the closed world assumption. Essentially, the document collection is assumed to be known. The vocabulary over all documents induces a fixed feature space. The second category makes the open world assumption, thus allowing that new documents arrive, new words emerge and old ones are forgotten. This implies a feature space for documents that changes over time.
Stream-based approaches are intuitively closer to the volatile nature of the Web. However, evolutionary topic detection methods, including recent ones based on Probabilistic Latent Semantic Analysis and Latent Dirichlet Allocation [11, 4, 18], all assume that the documents form a finite sequence. To deal with a document stream, they take a retrospective view of the world: at each new timepoint, all information seen so far is used to re-build an extended model. This perspective has two disadvantages. First, it only works if the document stream is slow enough that retrospection of the complete past is feasible. Second, the feature space grows with time, thus giving rise to the curse of dimensionality problem: the number of data points needed for a reliable estimate (of any property) grows exponentially with the number of features. Since a stream cannot be slow and fast at the same time, evolutionary topic monitoring requires adaptation to both changes in the data distribution and in the vocabulary/feature space.
TopicMonitor deals with these problems by adapting the feature space and the underlying model gradually. We invoke Probabilistic Latent Semantic Analysis (PLSA) in a window that slides across the stream: as the window slides from one timepoint to the next, we delete outdated words and documents, incorporate new documents into the previous model and then adapt the old model to new words, deriving the current model of each timepoint.
Gradual model adaptation brings an inherent advantage over re-learning at each discrete timepoint: not only is the model as a whole adapted; rather, each discrete topic built at timepoint t_i transforms naturally into its follow-up topic at t_{i+1}. Hence, the adaptation process leads to a separation of topics into distinct index-based topic threads over time. These threads provide a comprehensive summary of topic and terminology evolution. In Section 3.2 we describe how we extend conventional PLSA for dynamic data, explain our incremental model adaptation over a document stream and show how index-based topic threads are built.

859 Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

The evaluation of our approach is not trivial, because it involves comparing methods that learn on different feature spaces. In Section 4 we describe our evaluation framework and the reference method for our experiments, which we present in Section 5.
2 Related Work
Many studies on topic evolution derive topics by document clustering. Most of them consider a fixed feature space over the document stream: Morinaga and Yamanishi [12] use Expectation-Maximization to build soft clusters. A topic consists of the words with the largest information gain for a cluster. Aggarwal and Yu [1] trace droplets over document clusters. A droplet consists of two vectors that accommodate words associated with the cluster. The cluster evolution monitor MONIC [17] and its variants [15, 14, 13] treat topic evolution as a special case of cluster evolution over a feature space that may change, too.
Topic models like PLSA [9] and Latent Dirichlet Allocation (LDA) [5] characterize a topic as a multinomial distribution over words rather than as a cluster of documents. This is closer to the intuitive notion of "topic" in a collection. Most evolutionary topic models based on PLSA or LDA [11, 4, 18] make the closed world assumption: they observe the arriving documents as a finite sequence with a fixed vocabulary. For example, Mei and Zhai [11] assume the complete vocabulary to be known and a static background model over all timepoints. The background model propagates statistics about non-specific word usage in time (forwards and backwards).
The retrospective approach of Mei and Zhai [11] builds a PLSA model at each timepoint in addition to the background model that is fixed over all timepoints. Griffiths and Steyvers [8] build a single LDA model based on collapsed Gibbs sampling to find scientific topics. They assign temporal properties to topics, such as becoming "cold" or "hot". Incremental LDA [16] updates the parameters of the LDA model as new documents arrive, but again assumes a fixed vocabulary and forgets neither the influence of past documents nor outdated words. The distributed asynchronous version of LDA [3] allows an LDA model to be grown incrementally. However, it still assumes a static vocabulary.
AlSumait et al. [2] propose Online LDA. They extend the Gibbs sampling approach suggested by Griffiths and Steyvers to handle streams of documents. Gibbs sampling at one timepoint is used to derive the hyperparameters of the topic-word associations at the next timepoint, so that successive LDA models are coupled. New words are collected as they are seen, so AlSumait et al. do not assume that the whole vocabulary is known in advance. However, the vocabulary grows over time, so that the curse of dimensionality problem still occurs.
The topics over time model (TOT) [18] extends LDA to generate timestamps of documents. In this way, a topic is discovered in exactly those timepoints in which the vocabulary on it is homogeneous. In other words, the model associates a timepoint to some static topic.
The dynamic topic model (DTM) [4] uses a basic LDA model at each timepoint. Hyperparameters of these LDA models are propagated forward and backward in time via a state space model. A "dynamic topic" is a sequence of vectors over time. Each vector is a multinomial distribution over words of the fixed vocabulary.
TopicMonitor derives the model for the current timepoint by adapting the model of the previous timepoint to both new documents and words, while forgetting old documents and obsolete words. Closest to our work is the recently published incremental probabilistic latent semantic indexing method of Chou and Chen [7]. We additionally address the issue of overfitting by investigating the maximum a posteriori estimation of the underlying PLSA model.
Differences Between PLSA and LDA. Since PLSA and LDA are frequently juxtaposed in the topic modeling literature, we provide a brief discussion of their differences here. The PLSA approach proposed by Hofmann [9] does not use prior distributions for the model parameters. The LDA approach proposed by Blei et al. [5] uses Dirichlet priors for the model parameters: the hyperparameters are learned by Empirical Bayes from data. However, many LDA-oriented approaches [8, 3, 19, 20] based on collapsed Gibbs sampling always assume fixed hyperparameters. These variants of LDA come quite close to the maximum a posteriori (MAP)-PLSA model as we use it here. Beside inference, the only notable difference between the two approaches is that MAP-PLSA still uses empirically estimated document-probabilities¹, while LDA never uses this quantity. In our approach document-probabilities are needed to account for new words, as we describe in the next section.
3 Topic Discovery With an Adaptive Method
The core of TopicMonitor is an adaptive process deriving a sequence of PLSA models over time. Each PLSA model is adapted from the previous one to comprise smooth evolution of topics over time. We use a Maximum A Posteriori (MAP) estimator of the PLSA model² to guard against overfitting, and we propose new folding-in techniques for the adaptation of PLSA models. We first present how we apply the MAP principle to estimate a PLSA model and introduce our new folding-in techniques upon it. In Subsection 3.3 we describe how our adaptive PLSA uses MAP and performs folding-in to learn a new model gradually from a previous one. The notation is summarized in Table 1.

¹Document-probabilities P(d) are denoted by δ_d later on.

Table 1: Summary of Notation

  |X|              number of elements in set X
  l                size of the sliding window in timepoints
  D, W             set of documents, words (vocabulary)
  d, w, z          denote documents, words, hidden topics
  k, i             run over topics and timepoints
  n_{d,w}          number of occurrences of the pair (d,w)
  θ = (φ, ω, δ)    PLSA model
  K                number of hidden topics of θ
  φ ∈ R^{|D|×K}    document-specific topic-mixture proportions
  ω ∈ R^{K×|W|}    topic-word associations
  δ ∈ R^{|D|}      empirical document-probabilities
  τ ∈ R^{|W|}      empirical word-probabilities
  m                EM-iterations for recalibration
  s, r             smoothing parameters controlling the prior distributions of topic-mixture proportions (s) and topic-word associations (r), respectively
3.1 MAP-PLSA and Folding-In of Documents. Conventional PLSA [9] models co-occurrences (pairs of documents and words)³ by a mixture model with K components (topics). Modeled documents are denoted by D, modeled words by W. The number of times a (document, word) pair occurs is denoted by n_{d,w}.
Assuming K hidden topics z_k (1 ≤ k ≤ K), the probability of observing document d and word w decomposes into P(d,w) = P(d) Σ_{k=1}^{K} P(w|z_k) P(z_k|d). Hence, a PLSA model θ is parametrized by a tuple (φ, ω, δ) containing:

1. ∀ d ∈ D: document-specific topic-mixture proportions φ = [φ_d] = [φ_{d,k} = P(z_k|d)]; each K-dimensional row φ_d fulfills Σ_{k=1}^{K} φ_{d,k} = 1

2. ∀ w ∈ W: topic-word associations ω = [ω_k] = [ω_{k,w} = P(w|z_k)]; each row ω_k fulfills Σ_{w∈W} ω_{k,w} = 1

3. ∀ d ∈ D: document-probabilities δ = [δ_d = P(d)]

Each row ω_k is a multinomial distribution over all modeled words and describes the k-th topic.

²A conceptually similar estimator is used in [6].
³Co-occurrence means that a particular document-ID appears together with the ID of a specific word type, stating that a word is seen in a document.
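As a concrete illustration, the decomposition and its normalization constraints can be sketched in a few lines of NumPy. This is a toy sketch under our own variable naming, not the authors' code; dimensions follow Table 1.

```python
# Toy sketch of the document-based PLSA decomposition
# P(d,w) = delta_d * sum_k phi_{d,k} * omega_{k,w}.
import numpy as np

rng = np.random.default_rng(0)
n_docs, n_topics, n_words = 4, 2, 5

phi = rng.random((n_docs, n_topics))
phi /= phi.sum(axis=1, keepdims=True)      # each row of phi sums to 1
omega = rng.random((n_topics, n_words))
omega /= omega.sum(axis=1, keepdims=True)  # each row of omega sums to 1
delta = rng.random(n_docs)
delta /= delta.sum()                       # empirical P(d) sums to 1

# joint distribution over (document, word) pairs
P = delta[:, None] * (phi @ omega)
assert np.isclose(P.sum(), 1.0)            # P(d,w) is a proper distribution
```

Because each factor is a proper (conditional) distribution, the resulting matrix P sums to one over all (d, w) pairs.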
The log-likelihood of the data D given the PLSA model θ is Σ_{d∈D} Σ_{w∈W} n_{d,w}·log P(d,w) with P(d,w) = δ_d Σ_{k=1}^{K} ω_{k,w}·φ_{d,k}. The parameters are usually estimated by maximizing the likelihood (cf. [9]). The document-probabilities δ are estimated by

(3.1)   δ_d = Σ_{w∈W} n_{d,w} / Σ_{d′∈D} Σ_{w∈W} n_{d′,w}

The EM-algorithm is used to estimate the other parameters φ, ω.
MAP estimation of φ, ω determines those parameters that maximize the log-a-posteriori probability. The following log-priors for topic-mixture proportions and topic-word associations are used:

(3.2)   log P(θ) = Σ_{d∈D} log Dir(φ_d | s) + Σ_{k=1}^{K} log Dir(ω_k | r)

The hyperparameters of the Dirichlet distributions (Dir) are determined by s > −1 and r > −1 as follows: [s + 1]_{k=1,…,K} and [r + 1]_{w∈W}. The parameters s and r are called smoothing parameters for the estimates of P(z|d) and P(w|z). Setting s = 0, r = 0 makes the MAP estimates equal to the ML estimates, because the Dirichlet priors become uniform. Note that the priors remain fixed throughout EM-learning (in contrast to LDA [5], which learns the hyperparameters itself). The EM-algorithm [10] leads to these update equations during the ⟨t+1⟩-th iteration:
E-step:

(3.3)   γ^k_{d,w} = P(z_k | w, d, θ^⟨t⟩) = ω^⟨t⟩_{k,w} · φ^⟨t⟩_{d,k} / Σ_{k′=1}^{K} ω^⟨t⟩_{k′,w} · φ^⟨t⟩_{d,k′}

M-step:

(3.4)   φ^⟨t+1⟩_{d,k} = ( s + Σ_{w∈W} n_{d,w} · γ^k_{d,w} ) / ( K·s + Σ_{w∈W} n_{d,w} )

(3.5)   ω^⟨t+1⟩_{k,w} = ( r + Σ_{d∈D} n_{d,w} · γ^k_{d,w} ) / ( |W|·r + Σ_{d∈D} Σ_{w′∈W} n_{d,w′} · γ^k_{d,w′} )

If n_{d,w} or γ^k_{d,w} become very small, the EM update equations (3.4) and (3.5) will be problematic, since the maximum of the posterior density might become undefined.
Figure 1: Document stream inducing a partial ordering on documents
This might result in negative parameter estimates [6]. To avoid that problem, we restrict the smoothing parameters to s ≥ 0 and r ≥ 0.
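The smoothed update equations (3.3)–(3.5) can be sketched as follows. This is a minimal NumPy sketch under our own naming and einsum formulation, not the authors' implementation.

```python
# One MAP-EM iteration for PLSA with smoothing parameters s and r.
import numpy as np

def map_em_step(n, phi, omega, s=0.1, r=0.1):
    """n: count matrix (|D| x |W|); phi: |D| x K; omega: K x |W|."""
    D, K = phi.shape
    W = omega.shape[1]
    # E-step (3.3): responsibilities gamma[d, w, k] = P(z_k | w, d, theta)
    joint = phi[:, None, :] * omega.T[None, :, :]          # (D, W, K)
    gamma = joint / joint.sum(axis=2, keepdims=True)
    # M-step (3.4): smoothed update of the topic-mixture proportions
    num_phi = s + np.einsum('dw,dwk->dk', n, gamma)
    phi_new = num_phi / (K * s + n.sum(axis=1))[:, None]
    # M-step (3.5): smoothed update of the topic-word associations
    num_omega = r + np.einsum('dw,dwk->kw', n, gamma)
    omega_new = num_omega / (W * r + np.einsum('dw,dwk->k', n, gamma))[:, None]
    return phi_new, omega_new
```

By construction, the rows of phi_new and omega_new remain proper distributions, since the Dirichlet pseudo-counts s and r appear in both numerator and denominator.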
The folding-in procedure by Hofmann [9] allows new documents to be incorporated into an already trained model. MAP folding-in of a new document d′ into an existing model θ estimates the topic-mixture proportions φ_{d′}: the EM-algorithm continues to estimate φ_{d′} while keeping the topic-word associations ω fixed (underlined in Eqs. 3.3 and 3.5). The extended model comprises the row-wise concatenation of φ_{d′} to the already derived ones: [φ φ_{d′}]. The document-probabilities are re-estimated for all modeled documents using Eq. 3.1. It is required that d′ contains some words modeled by θ, since d′ is reduced to those words for folding-in. To gradually adapt a PLSA model according to a document stream, we use MAP folding-in of newly arriving documents into an existing model, as explained in Section 3.2.
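Folding-in a single new document can be sketched as follows. This is a hedged sketch: the fixed iteration count, the uniform initialization and all names are our assumptions, and only the MAP update of the mixture (Eq. 3.4) is performed while ω stays fixed.

```python
# MAP folding-in of one new document d' into an existing PLSA model.
import numpy as np

def fold_in_document(n_new, omega, s=0.1, iters=50):
    """n_new: word counts of the new document, restricted to modeled words."""
    K, W = omega.shape
    phi_d = np.full(K, 1.0 / K)                # uniform initialization
    for _ in range(iters):
        # E-step with the topic-word associations omega held fixed
        joint = phi_d[:, None] * omega         # (K, W)
        gamma = joint / joint.sum(axis=0, keepdims=True)   # P(z_k | w, d')
        # M-step only for the new document's mixture (Eq. 3.4)
        phi_d = (s + gamma @ n_new) / (K * s + n_new.sum())
    return phi_d
```

A document whose words load heavily on one topic should receive most of its mixture mass on that topic.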
3.2 Adaptive PLSA for Topic Monitoring – Overview. Streaming documents d_1, d_2, … arrive at (possibly irregularly spaced) timepoints t_1, t_2, …. We allow that several documents arrive at the same time. Hence, time induces a partial ordering on documents. Figure 1 depicts a possible stream of documents.

Topic detection at timepoint t_i implies building a PLSA model θ_i upon all documents that arrive in the interval (t_i − l, t_i]. The parameter l determines the size of the interval and allows some documents from the near past to be taken into account. The result is a sequence of PLSA models θ_1, θ_2, …, one at each point in time.
Almost all topic monitoring methods so far, as explained in Section 2, exploit the old model θ at timepoint t_{i−1} to build θ̄ at t_i, assuming an unchanged vocabulary from one timepoint to the next. We judge this assumption questionable for real stream data, because the future vocabulary is unknown at each timepoint. If the vocabulary does change, the baseline approach is to build the models θ_i independently of each other, taking the vocabulary of (t_i − l, t_i] into account. To uncover threads of topics over time, one would measure similarity between the topics detected at successive timepoints and "connect" those being most similar according to some similarity function. For PLSA-based topic monitoring, we call this baseline "independent PLSA".

Figure 2: MAP-PLSA model allowing new documents to be folded in (left) and inverted MAP-PLSA model allowing new words to be folded in (right). Bayesian calculus is used to transform between both forms.

In contrast thereto, TopicMonitor builds the new model θ̄ by adapting the model θ of the previous timepoint to new words and new documents: it evolves the old model into the new one. Because successive PLSA models have been adaptively learned, the k-th analyzed topic at t_i evolves from the previously analyzed k-th topic at t_{i−1}. Hence, we formally define for each k an index-based topic thread that consists of the sequence of topic-word associations ω^1_k, ω^2_k, … at timepoints t_1, t_2, …. Each index-based topic thread describes the evolution of the hidden topic z_k over time.
Essential to our approach are two equivalent forms of a PLSA model, which assume the following different decompositions:

(3.6)   P(d,w) = P(d) Σ_{k=1}^{K} P(w|z_k) P(z_k|d)
(3.7)           = P(w) Σ_{k=1}^{K} P(d|z_k) P(z_k|w)

Plate models of these two forms are given in Figure 2. If a PLSA model is given in its document-based form (Eq. 3.6), new documents can conveniently be folded into that model. In the same way, new words can conveniently be folded into an existing model if the model is given in its word-based form (Eq. 3.7).
Since not only the documents but also the vocabulary changes over time, we want the adaptive process to learn the model θ̄ at timepoint t_i from the model θ at timepoint t_{i−1} by adapting to both new documents and new words. This includes incorporating new and "forgetting" outdated words. Forgetting outdated words prevents TopicMonitor from accumulating all words ever seen, and hence TopicMonitor does not unnecessarily blow up the feature space over time.
Roughly, the adaptive process that evolves the model θ̄ at time t_i from the model θ at time t_{i−1} works as follows:

1. remove documents modeled by θ but not covered by (t_i − l, t_i]
2. fold-in new documents covered by (t_i − l, t_i] into θ, using its document-based form
3. use Bayesian calculus to invert θ into its word-based form
4. remove words modeled by θ but not seen in (t_i − l, t_i]
5. fold-in words first seen in (t_i − l, t_i]
6. use Bayesian calculus to invert θ again into its document-based form
7. recalibration gives the next PLSA model θ̄

Figure 3: Overview of the steps necessary to adapt a model θ at timepoint t_{i−1} to θ̄ at timepoint t_i.

Figure 3 graphically describes the adaptive process of TopicMonitor. Mathematical details of that process are given in Section 3.3.
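The "forgetting" steps of the process (steps 1 and 4) amount to dropping rows or columns of the parameter matrices and re-normalizing where a distribution constraint must hold. A toy sketch under our own framing (which documents/words to keep is assumed given by the window bookkeeping):

```python
# Forgetting outdated documents and words in a PLSA parameter set.
import numpy as np

def forget_documents(phi, delta, keep_docs):
    """Step 1: drop topic mixtures and probabilities of expired documents."""
    phi = phi[keep_docs]
    delta = delta[keep_docs]
    return phi, delta / delta.sum()            # re-normalize empirical P(d)

def forget_words(omega, keep_words):
    """Step 4: drop associations of words unseen in the current window
    and re-normalize each topic's word distribution."""
    omega = omega[:, keep_words]
    return omega / omega.sum(axis=1, keepdims=True)
```

Re-normalization keeps each row a valid multinomial after the deletions.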
3.3 Adaptive PLSA for Topic Monitoring – Mathematics. For ease of presentation, we denote by θ̄ the PLSA model at timepoint t_i, which is to be evolved from the model θ at timepoint t_{i−1}. In addition, W̄ denotes all words seen in (t_i − l, t_i] and W all words seen in (t_{i−1} − l, t_{i−1}]. In the same way, D̄ and D are used.
Recall that a model θ = (φ, ω, δ) in its document-based form consists of three parameter sets: document-specific topic-mixture proportions φ, topic-word associations ω and document-probabilities δ. The first two parameter sets can be seen as matrices of dimensions |D| × K for φ and K × |W| for ω. Folding-in new documents d ∈ D̄ \ D translates to adding rows to the first matrix, as explained in Section 3.1. Next, the topic-mixture proportions of documents which are no longer covered by the current window at timepoint t_i are not needed and are deleted. The current document-specific topic-mixture proportions are given by φ̃ = [φ_d φ_{d′}] with d ∈ D̄ ∩ D and d′ ∈ D̄ \ D. The new document-probabilities δ̃ = [δ̃_d] with d ∈ D̄ are re-estimated as shown in Equation 3.1, but using the old vocabulary W only.
New documents may introduce new words not contained in the current vocabulary W. The topic-word associations ω of model θ describe only words w ∈ W.
To handle new words w ∈ W̄ \ W, they are folded into the current PLSA model. For that purpose the model is inverted into its word-based form by Bayesian calculus. Note that the elements of φ̃ and ω estimate the conditional probabilities P(z_k|d) and P(w|z_k), respectively, which are inverted by the following formulae: φ̃′_{d,k} = P(d|z_k) = P(z_k|d)·P(d)/P(z_k) and ω′_{w,k} = P(z_k|w) = P(w|z_k)·P(z_k)/P(w). The word-probabilities for w ∈ W are derived by marginalization: τ′_w = P(w) = Σ_{d∈D̄} Σ_{k=1}^{K} P(d)·P(z_k|d)·P(w|z_k) = Σ_{d∈D̄} Σ_{k=1}^{K} δ̃_d·φ̃_{d,k}·ω_{k,w}. The inversion of the current model into its word-based form is derived by:

φ̃′_{k,d} = φ̃_{d,k}·δ̃_d / ( Σ_{d′∈D̄} φ̃_{d′,k}·δ̃_{d′} )   and   ω′_{w,k} = ω_{k,w}·( Σ_{d′∈D̄} φ̃_{d′,k}·δ̃_{d′} ) / τ′_w

with d ∈ D̄ and w ∈ W. This inversion gives θ′ = (φ̃′, ω′, τ′). Note that documents and words have changed their roles: ω′ denotes word-specific mixture proportions and φ̃′ denotes document-topic associations. Figure 2 schematically depicts the models.
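The Bayesian inversion above is a handful of matrix operations. A sketch under our own naming (formulas follow the marginalization in the text, with P(z_k) = Σ_d P(d)·P(z_k|d)):

```python
# Invert a document-based PLSA model (phi, omega, delta) into its
# word-based form (phi', omega', tau').
import numpy as np

def invert_to_word_based(phi, omega, delta):
    # P(z_k) = sum_d P(d) P(z_k | d)
    p_z = delta @ phi                                    # (K,)
    # tau'_w = P(w) = sum_d sum_k delta_d phi_{d,k} omega_{k,w}
    tau = p_z @ omega                                    # (|W|,)
    # phi'_{k,d} = P(d | z_k) = phi_{d,k} delta_d / P(z_k)
    phi_inv = (phi * delta[:, None]).T / p_z[:, None]    # (K, |D|)
    # omega'_{w,k} = P(z_k | w) = omega_{k,w} P(z_k) / P(w)
    omega_inv = (omega * p_z[:, None]).T / tau[:, None]  # (|W|, K)
    return phi_inv, omega_inv, tau
```

Sanity checks: each row of phi_inv is a distribution over documents, each row of omega_inv a distribution over topics, and tau sums to one.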
The word-based form θ′ allows folding-in of words in a similar way as the document-based form allows folding-in of documents. New words w ∈ W̄ \ W are folded into θ′ using their occurrences in the documents of D̄. In addition, the mixture proportions of words which do not appear in W̄ are deleted.

The resulting word-specific topic-mixture proportions are given by ω̃′ = [ω′_w ω′_{w′}] with w ∈ W̄ ∩ W and w′ ∈ W̄ \ W. The word-probabilities τ̃′ = [τ̃′_w] with w ∈ W̄ are empirically re-estimated: τ̃′_w = Σ_{d∈D̄} n_{d,w} / Σ_{w′∈W̄} Σ_{d∈D̄} n_{d,w′}. Having incorporated new and deleted outdated words, the current extended model θ̃′ = (φ̃′, ω̃′, τ̃′) is inverted back into its document-based form using Bayesian calculus again.

The parameters according to new documents and new words are derived separately by folding-in. In order to couple the influence of both, new documents and new words, we run the full EM-algorithm for a small number m of iterations using all data in (t_i − l, t_i]. The result is the adapted model θ̄ = (φ̄, ω̄, δ̄) at timepoint t_i.
Successively adapting θ_i from θ_{i−1} as new data arrive gives a sequence of PLSA models θ_1, θ_2, …. That sequence models K topics at each timepoint and thus their evolution over time. Tracing the topics having the same index over time is possible because the adaptive approach evolves the k-th topic-word association ω̄_k at timepoint t_i from the k-th topic-word association ω_k at timepoint t_{i−1} without renumbering the topic indexes.
4 Evaluation Framework
Topic evolution with simultaneous adaptation to both an evolving document stream and an evolving vocabulary has not been intensively studied in the past. Thus, there is no generally accepted ground truth available that would allow a cross-model evaluation against TOT [18] or DTM [4]. Therefore, the objective of our evaluation is to show the improvements of our approach with respect to the "independent PLSA model" described in Subsection 3.2, which is the natural baseline for the problem discussed here. Independent PLSA derives for each position of the sliding window a new PLSA model from scratch. If the influence of the background component is set to zero, the approach of Mei and Zhai [11] is equal to independent PLSA. That background component uses global knowledge and thus cannot be used in a stream scenario. Hence, independent PLSA imitates the approach of Mei and Zhai as closely as possible in a stream scenario.
We use a real-world document stream, namely the proceedings of the ACM SIGIR conferences from 2000 to 2007.
4.1 Model Comparison. In the absence of ground truth, our adaptive PLSA approach can be compared to independent PLSA by computing perplexity on hold-out data. Informally, perplexity measures how surprised a probabilistic model is when seeing hold-out data. Hence, it assesses the ability of the model to generalize to unseen data.

Hold-out data are constructed by splitting the set of word-occurrences of the current data into a training part (80%) and a hold-out part (20%). We maintain two counters for occurrences of word w in document d: n_{d,w} for the training part and n̂_{d,w} for the hold-out part. Both counters sum up to the total number of occurrences of (d,w) in the given data. During training, the learning algorithm does not see that the documents actually contain some more words. Hold-out perplexity is computed on the hold-out part of the data, which is unused during training. The hold-out perplexity of an arbitrary PLSA model θ̄ at timepoint t_i is computed as:
(4.8)   perplex(θ̄) = exp( − Σ_{d∈D̄, w∈W̄} n̂_{d,w} · log( Σ_{k=1}^{K} φ̄_{d,k} ω̄_{k,w} ) / Σ_{d∈D̄, w∈W̄} n̂_{d,w} )

The computation of hold-out perplexity does not require any folding-in of new documents. Thus it is not distorted in the way explained in [19].
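Eq. (4.8) is straightforward to compute from the hold-out counts. A minimal sketch, assuming strictly positive mixture probabilities (names are ours):

```python
# Hold-out perplexity of a PLSA model, Eq. (4.8).
import numpy as np

def holdout_perplexity(n_hat, phi, omega):
    """n_hat: hold-out counts (|D| x |W|); phi: |D| x K; omega: K x |W|."""
    log_p = np.log(phi @ omega)        # log sum_k phi_{d,k} omega_{k,w}
    return np.exp(-(n_hat * log_p).sum() / n_hat.sum())
```

As a sanity check, a model that assigns a uniform probability 1/|W| to every word yields a perplexity of exactly |W|.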
First, we analyze the impact of the smoothing parameters s and r, used to define the prior distributions of the topic-mixture proportions and topic-word associations, on the generalization ability of the model. All SIGIR documents are used as a single batch. With this experiment we verify the advantage of MAP-PLSA over the maximum-likelihood estimator of PLSA.
Further, we compare the performance of adaptive PLSA and independent PLSA by average hold-out perplexity. Our adaptive method derives the model at each timepoint by adapting the model of the previous timepoint: it folds-in new documents and words, forgets old documents and words, and recalibrates the parameters. Note that both models differ only by initialization. At each timepoint, document-word pairs contained in the sliding window are held out in order to compute its perplexity. We analyze the average hold-out perplexity over all points in time versus the number of learning steps. Learning steps are either the number of EM-iterations for independent PLSA or the number of EM-recalibration steps m for adaptive PLSA. In addition, the number of EM-steps of adaptive PLSA to fold-in documents and words is also limited to m. For the purpose of a fair comparison, independent PLSA is restarted three times at each timepoint; only the best results are reported. This experiment analyzes whether coupling models by initialization indeed propagates useful information along the sequence of models and helps the models to better fit the data.
The third experiment investigates to which degree the natural order of the stream contains important latent information that helps to predict future documents. In addition, we analyze how the natural order influences the prediction power of PLSA models determined by our proposed adaptive approach compared to the independent one. We compute at each timepoint the predictive perplexity of the documents from the next timepoint. To that end, documents at time t_i have to be folded into the model computed at timepoint t_{i−1}. Afterward, these predictive perplexities are averaged over all timepoints. Averaged predictive perplexities are computed for the natural stream order and for a randomly permuted stream. If our assumptions hold, the average predictive perplexity on the permuted stream should be significantly larger. We follow the idea of half-folding-in [19] to avoid over-optimistic estimates of predictive perplexity. The idea is to fold-in new documents based on only a part of their words, while the rest is held out. The predictive perplexity of these documents is computed by (4.8) using only the held-out part unseen during folding-in.
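Half-folding-in can be sketched in a few lines. This is a hedged sketch with our own names and a simplified maximum-likelihood fold-in with fixed ω; the random half-split of occurrences follows the idea described above.

```python
# Half-folding-in [19]: fold-in a new document on half of its word
# occurrences, score predictive perplexity on the held-out half.
import numpy as np

def half_fold_in_perplexity(n_doc, omega, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    # randomly split each word's occurrences into a seen and a held-out half
    seen = rng.binomial(n_doc.astype(int), 0.5).astype(float)
    held = n_doc - seen
    K = omega.shape[0]
    phi_d = np.full(K, 1.0 / K)
    for _ in range(iters):                   # fold-in on the seen half only
        joint = phi_d[:, None] * omega
        gamma = joint / joint.sum(axis=0, keepdims=True)
        phi_d = (gamma @ seen) / seen.sum()  # ML update of the mixture
    log_p = np.log(phi_d @ omega)            # predictive word distribution
    return np.exp(-(held * log_p).sum() / held.sum())
```

Because the scored occurrences were never shown to the fold-in step, the estimate is not over-optimistic in the way full folding-in would be.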
4.2 Comparison of Topic Threads. We study the semantic meaningfulness of the index-based topic threads over time obtained by our adaptive PLSA method. We have defined index-based topic threads for a given sequence of PLSA models in Section 3.2. These topic threads are primarily defined via a technical parameter: the index of the topic.

The independent PLSA model computes topics at different timepoints independently. Thus, topic indexes at different timepoints are arbitrary. Hence, semantically close topics at different timepoints have to be matched using some similarity measure in order to define threads of topics over time. Best-match topic threads are built as follows: each topic-word association from model θ_i is matched to the topic-word association from model θ_{i+1} that is maximally similar according to cosine-similarity, subject to a minimum similarity threshold MinSim. The pairs of matching topic-word associations are called best-matchings.

Best-match topic threads constitute a more intuitive definition of topic threads that matches human intuition about meaningful semantics. If the topologies of both kinds of threads match very well, we argue that index-based topic threads indeed constitute semantic meaningfulness.
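The best-matching step can be sketched as follows (a sketch under our own naming; each topic of one model is matched to its most cosine-similar topic in the next model, subject to the MinSim threshold):

```python
# Best-matchings between the topics of two successive PLSA models.
import numpy as np

def best_matchings(omega_a, omega_b, min_sim=0.5):
    """omega_a, omega_b: topic-word association matrices (K x |W|)."""
    a = omega_a / np.linalg.norm(omega_a, axis=1, keepdims=True)
    b = omega_b / np.linalg.norm(omega_b, axis=1, keepdims=True)
    sims = a @ b.T                       # pairwise cosine similarities
    matches = []
    for k in range(sims.shape[0]):
        j = int(np.argmax(sims[k]))
        if sims[k, j] >= min_sim:        # only keep sufficiently similar pairs
            matches.append((k, j, float(sims[k, j])))
    return matches
```

Chaining these matchings across the whole model sequence yields the best-match topic threads used for comparison.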
Year  2000  2001  2002  2003  2004  2005  2006  2007
|D|     57    71    85    87   115   121   133   194

Table 2: Number of documents per year of the SIGIR data set
5 Experiments
Here, we describe details of the data used and the results of our experiments. Additionally, we demonstrate the ability of our approach to find topic threads with meaningful semantics.
5.1 Data Sets. We use a real-world data set containing the articles published at the ACM SIGIR conferences from 2000 to 2007, as they appear in the ACM Digital Library. Only titles and abstracts of the documents are considered. Stemming and stopword removal were applied to all documents. Posters are not included because they often do not have abstracts. The number of included documents per year is shown in Table 2. Further statistics of this data set are given in Figure 4. We refer to this data set as SIGIR hereafter.

We analyze the SIGIR data set at yearly timepoints. The sliding window spans the current point in time and, if it is of length l > 1, the previous l − 1 ones. The words constituting the feature space at a certain timepoint are those appearing in at least two documents covered by the respective sliding window.
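The vocabulary rule stated above (keep words that appear in at least two documents of the window) can be sketched as follows. The data layout (a list of tokenized documents) is our assumption.

```python
# Feature space of a sliding window: words with document frequency >= 2.
from collections import Counter

def window_vocabulary(docs, min_doc_freq=2):
    """docs: tokenized documents inside the current sliding window."""
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))   # document frequency, not term frequency
    return {w for w, df in doc_freq.items() if df >= min_doc_freq}
```

Counting over `set(tokens)` ensures that a word repeated many times within a single document still counts only once toward its document frequency.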
5.2 Model Comparison.
5.2.1 Effect of Priors on PLSA. In contrast to ML-PLSA, MAP-PLSA allows the specification of priors which differ from a flat Dirichlet. We analyze the impact of the smoothing parameters s and r, which define the priors for topic-mixture proportions and topic-word associations, respectively (see Section 3.1). The influence of smoothing is estimated for s, r ∈ {0.01, 0.1, 1, 10, 100}. MAP-PLSA is trained with K = 10 on the total SIGIR data without time stamps. Average perplexities (50 repetitions) on 20% hold-out data are shown in Figure 5. Curves for r ∈ {10, 100} are not shown, as they have much larger perplexities. Setting s = 0.1, r = 0.1 yields significantly lower hold-out perplexities than the other settings. Using s = 0.01, r = 0.01 yields a nearly flat Dirichlet prior, which corresponds to ML-PLSA. Thus, the results show (i) that MAP-PLSA opens room for improvement over ML-PLSA, and (ii) that a careful study to adapt the hyperparameters is necessary in order to achieve good performance.
[Figure 4: SIGIR word statistics for different window-lengths (l = 1 and l = 2): size of the vocabulary over time, showing the complete vocabulary per time point, the vocabulary used by the model, and the words in common with the previous model.]
[Figure 5: Influence of smoothing versus hold-out perplexity (lower values are better), with curves for r = 1, 0.1, 0.01. Smoothing parameters s and r control the Dirichlet hyperparameters for the estimates of P(z|d) and P(w|z), respectively.]
5.2.2 Comparison of Adaptive and Independent PLSA. In this experiment, we compare our adaptive PLSA with independent PLSA trained on each sliding window independently. Technically speaking, the two methods differ in the information used during the initialization phase: our method folds-in documents and words, while independent PLSA performs a random initialization. To avoid side-effects of randomness, we restart independent PLSA three times with different random initializations. We allow each method to perform m iterations.

Figure 6 shows the hold-out perplexities for different values of m ∈ {1, 2, 4, 8, 16, 32, 64, 128} and different lengths of the sliding window, l ∈ {1, 2}. Each individual experiment is repeated 50 times and we report averages of the hold-out perplexities. The results show that our approach needs fewer iterations to reach the minimal hold-out perplexity. More importantly, the best hold-out perplexity reached by adaptive PLSA is much better than that of independent PLSA. This demonstrates that initialization by folding-in new documents and words helps the model to better fit the data.
5.2.3 Influence of Natural Stream Order. The third experiment evaluates the effect of relying on the natural order of the document stream. We use the best parameter settings estimated in the previous hold-out experiments. We evaluate how well the topic models predict documents from the next timepoint by estimating predictive perplexity on the new documents. As explained in Section 4.1, we follow the concept of half-folding-in [19].
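A minimal sketch of how predictive perplexity can be computed from a fitted model; in the half-folding-in setup, P(z|d) would first be estimated on one half of each new document and the perplexity then evaluated on the other half. The function below is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

def perplexity(docs, p_zd, p_wz):
    """Perplexity of held-out word counts under a PLSA model (sketch).

    docs: word-count matrix, shape (D, W)
    p_zd: topic mixtures P(z|d), shape (D, K)
    p_wz: topic-word associations P(w|z), shape (K, W)
    Lower values mean the model predicts the documents better.
    """
    p_wd = p_zd @ p_wz                  # P(w|d) = sum_z P(z|d) P(w|z)
    mask = docs > 0                     # only observed words contribute
    log_lik = (docs[mask] * np.log(p_wd[mask])).sum()
    return float(np.exp(-log_lik / docs.sum()))
```

A model that assigns uniform probability over a vocabulary of W words yields perplexity W, which is the natural baseline against which the reported values can be read.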
To assess the influence of the natural stream order, we compare average predictive perplexity (50 repetitions) on the natural stream order versus a random stream order. In particular, experiments on random streams use a different order each time the experiment is repeated. In addition, we compare adaptive PLSA with independent PLSA. The results are shown in Figure 7.
In the case of l = 1, the natural order of the stream significantly helps to predict documents from the next timepoint. The adaptive process of evolving PLSA models from each other can exploit this latent information much better than independent PLSA. In the case of l = 2, the impact of the natural order is more blurred. Since SIGIR concepts do not change drastically from year to year, it is to be expected that a larger window size blurs the difference between natural order and random order.
Figure 6: Impact of the number of recalibration steps m on the achieved hold-out perplexity (lower values are better), for window lengths l = 1 and l = 2, comparing independent PLSA and adaptive PLSA.

Figure 7: Impact of natural stream order on predicting new documents from the next timepoint, using different sizes of the sliding window. The parameter l is the length of the sliding window used for training models. The star marks significant differences under a t-test with significance level 0.05.

5.3 Topic Threads and Their Semantics. In the following we experimentally analyze whether index-based topic threads are semantically meaningful, and present a concrete example of index-based topic threads determined by TopicMonitor on the SIGIR data.
5.3.1 Comparing Topic Threads. Following our evaluation framework, we compute index-based and best-match topic threads on a sequence of PLSA models obtained by TopicMonitor with 10 topics. Multinomial distributions (topic-word associations) at different timepoints are made comparable by filling in zero probabilities for the differing words. This is necessary, since we allow differing vocabularies over time. We use cosine similarity, which, in contrast to KL-divergence [11], can cope with zero probabilities. Given that sequence of PLSA models, we compute the percentage of best-matchings that connect two topic-word associations having the same index. A high value indicates that both heuristics generate threads with similar local topological structure. This means that index-based topic threads are similar to those constructed following human intuition and are thus semantically meaningful.
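The alignment and matching step can be sketched as follows; `align` and `same_index_fraction` are hypothetical helpers illustrating the zero-filling over a shared vocabulary, the cosine best-matching, and the same-index percentage, with `min_sim` mirroring the MinSim threshold.

```python
import numpy as np

def align(p_wz, vocab, union_vocab):
    """Embed topic-word distributions into a shared vocabulary,
    filling in zero probability for words absent at this timepoint."""
    out = np.zeros((p_wz.shape[0], len(union_vocab)))
    pos = {w: j for j, w in enumerate(union_vocab)}
    for j, w in enumerate(vocab):
        out[:, pos[w]] = p_wz[:, j]
    return out

def same_index_fraction(p1, vocab1, p2, vocab2, min_sim=0.0):
    """Fraction of cosine best-matches between two consecutive models
    that connect topics with the same index (illustrative sketch)."""
    union = sorted(set(vocab1) | set(vocab2))
    a = align(p1, vocab1, union)
    b = align(p2, vocab2, union)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    sims = a @ b.T                      # pairwise cosine similarities
    hits = total = 0
    for k in range(sims.shape[0]):
        j = int(np.argmax(sims[k]))     # best match for topic k
        if sims[k, j] >= min_sim:
            total += 1
            hits += int(j == k)
    return hits / total if total else float("nan")
```

Zero-filling leaves cosine similarity well defined even when vocabularies differ, which is exactly why it is preferred here over KL-divergence.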
The results are shown in Figure 8 for different lengths of the sliding window, l ∈ {1, 2}. The percentage never falls below 94.5% (l = 1), even if all best-matchings are included (MinSim = 0). This demonstrates that matching-based threads, similar to those used in [11], can be approximated by index-based threads, which are built by concatenating the topics with the same index. It further demonstrates that our initialization-based propagation of word statistics keeps the sequence of models consistent, meaning that index-based threads are similar in structure to the dynamic topics found by DTMs [4].
5.3.2 Semantics of Selected Topic Threads. In Table 3 we show an illustrative example of index-based topic threads determined using the SIGIR data. In order to produce a small example fitting the paper size, we chose the number of topics K = 5 and a sliding-window length of l = 1. Each thread consists of a sequence of topic-word associations over time having the same index. The top 30 words with the largest topic-word association are printed for each timepoint and each topic index. As seen so far, adaptive learning of the sequence of PLSA models enables semantically meaningful threads. If PLSA were applied to each sliding window anew, this continuity would not have been achieved.
Figure 8: Percentage of best-matchings with the same index (left) and total number of matching pairs (right), for l = 1 and l = 2. MinSim is the lower bound of similarity between consecutive topics of a thread.

Even without a deep inspection of the SIGIR conference subjects, we find (in the SIGIR context) meaningful index-based topic threads:
Thread 1: main theme is "Evaluation"; the early sub-area "Multilingual IR" disappears later

Thread 2: "Presentation"; later with emphasis on "Multimedia"

Thread 3: "Supervised Machine Learning"; originally "Classification"; later more elaborate aspects such as "Feature Selection" and "SVMs"

Thread 4: "Web"; originally from the viewpoints of "Information Extraction" and "Link Traversal"; later "Web Search" and "User Queries"

Thread 5: "Document Clustering"
We see some strongly evolving index-based topic threads, while others are remarkably stable. For example, Thread 5 is clearly on document clustering. Thread 3 is rather on supervised machine learning (a multi-word term that does not appear among the words), originally focusing on classification and later elaborating on specialized methods like support vector machines. Another concept of Thread 3 is feature selection. The importance of concepts changes in Thread 2, too. It is about presentation, visualization and frameworks. Later, many words are about multimedia (early years: audio), for which presentation forms are indeed expected to be important.
We study Thread 4 a bit closer in Figure 9. This index-based topic thread is about the Web, originally studied in the context of information extraction and link traversal. Later, web search and user queries become dominant.
Further experiments investigating longer and morediverse streams are planned for future work.
Figure 9: Evolution of the topic-word associations P(w|z) over time for the Web index-based topic thread (Thread 4), shown for the words link, extract, user and search.
6 Conclusions

We study topic monitoring over a stream of documents in which both topics and vocabulary may change. We waive the assumption that the complete vocabulary is known in advance. Our approach allows studying fast streams and alleviates the negative impact of large vocabularies with many words of only temporary importance.

TopicMonitor discovers and traces topics by using a new, adaptive variant of the well-known PLSA. Our adaptive PLSA derives the model at a certain timepoint by adapting the model found at the previous timepoint to new documents and unseen words.
We evaluate TopicMonitor on a document stream of ACM SIGIR conference papers (2000-2007). Our evaluation concerns the predictive performance of our approach of adaptively learning PLSA models in comparison to a baseline. Additionally, the semantic comprehensiveness of the monitored topics is analyzed. Since no ground truth for such studies exists, nor a baseline for topic monitoring with a vocabulary that is not known in advance, we introduce a new evaluation framework. The evaluation is based on the concepts of perplexity and of topic threads. The former captures the extent to which a model is able to generalize to unseen data. The latter reflects whether semantically related topics found at different timepoints are observed as part of the same "thread". Our experiments show that adapting models with TopicMonitor leads to better prediction performance than constructing models from scratch at each timepoint. In addition, the experiments indicate that index-based topic threads monitored by TopicMonitor are indeed semantically meaningful.

Our TopicMonitor is appropriate for fast streams, since it does not require pausing the stream in order to recompute the feature space before building models. We intend to study the speed and space demands of our approach and to devise methods that allow topic monitoring on very fast and large streams, e.g. online blogs. Further research issues are the identification of recurring topics and the detection of groups of correlated topics that occur at distant moments across time.

Table 3: Index-based topic threads for the SIGIR data from 2000-2007. Shown are the top-30 words per thread and timepoint. The leading words of each thread in 2000 are: Thread 1: evalu, search, queri, document, relev; Thread 2: inform, user, retriev, model, document; Thread 3: perform, queri, select, classifi, method; Thread 4: web, qualiti, page, system, document; Thread 5: document, cluster, word, method, retriev.
Acknowledgments
We thank Jan Grau for valuable discussions and com-ments on the manuscript.
References
[1] Charu C. Aggarwal and Philip S. Yu. A framework for clustering massive text and categorical data streams. In SDM, 2006.

[2] L. AlSumait, D. Barbara, and C. Domeniconi. On-line LDA: Adaptive topic models for mining text streams with applications to topic detection and tracking. In ICDM, 2008.

[3] A. Asuncion, P. Smyth, and M. Welling. Asynchronous distributed learning of topic models. In NIPS, 2008.

[4] D. Blei and J. Lafferty. Dynamic topic models. In ICML, 2006.

[5] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. J. Mach. Learn. Res., 3:993-1022, 2003.

[6] J.-T. Chien and M.-S. Wu. Adaptive Bayesian latent semantic analysis. IEEE Transactions on Audio, Speech, and Language Processing, 16(1):198-207, Jan. 2008.

[7] Tzu-Chuan Chou and Meng Chang Chen. Using incremental PLSI for threshold-resilient online event analysis. IEEE Trans. on Knowl. and Data Eng., 20(3):289-299, 2008.

[8] T. L. Griffiths and M. Steyvers. Finding scientific topics. PNAS, 101(suppl. 1):5228-5235, 2004.

[9] T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42(1):177-196, 2001.

[10] Geoffrey J. McLachlan and Thiriyambakam Krishnan. The EM Algorithm and Extensions. Wiley, 1997.

[11] Qiaozhu Mei and ChengXiang Zhai. Discovering evolutionary theme patterns from text: An exploration of temporal text mining. In SIGKDD, pages 198-207. ACM, 2005.

[12] Satoshi Morinaga and Kenji Yamanishi. Tracking dynamics of topic trends using a finite mixture model. In SIGKDD, pages 811-816. ACM, 2004.

[13] Rene Schult. Comparing clustering algorithms and their influence on the evolution of labeled clusters. In DEXA, pages 650-659, 2007.

[14] Rene Schult and Myra Spiliopoulou. Discovering emerging topics in unlabelled text collections. In ADBIS, pages 353-366, 2006.

[15] Rene Schult and Myra Spiliopoulou. Expanding the taxonomies of bibliographic archives with persistent long-term themes. In SAC, pages 627-634, 2006.

[16] Xiaodan Song, Ching-Yung Lin, Belle L. Tseng, and Ming-Ting Sun. Modeling and predicting personal information dissemination behavior. In SIGKDD, pages 479-488. ACM, 2005.

[17] Myra Spiliopoulou, Irene Ntoutsi, Yannis Theodoridis, and Rene Schult. MONIC: Modeling and monitoring cluster transitions. In SIGKDD, pages 706-711. ACM, 2006.

[18] Xuerui Wang and Andrew McCallum. Topics over time: A non-Markov continuous-time model of topical trends. In SIGKDD, pages 424-433. ACM, 2006.

[19] Max Welling, Chaitanya Chemudugunta, and Nathan Sutter. Deterministic latent variable models and their pitfalls. In SDM, 2008.

[20] Max Welling, Y. W. Teh, and B. Kappen. Hybrid variational-MCMC inference in Bayesian networks. In UAI, 2008.