
Topic Evolution in a Stream of Documents

André Gohr
Leibniz Institute of Plant Biochemistry, IPB, Germany

Alexander Hinneburg
Martin-Luther-University Halle-Wittenberg, Germany

René Schult and Myra Spiliopoulou
Otto-von-Guericke-University Magdeburg, Germany

Abstract

Document collections evolve over time, new topics emerge and old ones decline. At the same time, the terminology evolves as well. Much literature is devoted to topic evolution in finite document sequences assuming a fixed vocabulary. In this study, we propose "Topic Monitor" for the monitoring and understanding of topic and vocabulary evolution over an infinite document sequence, i.e. a stream. We use Probabilistic Latent Semantic Analysis (PLSA) for topic modeling and propose new folding-in techniques for topic adaptation under an evolving vocabulary. We extract a series of models, on which we detect index-based topic threads as human-interpretable descriptions of topic evolution.

1 Introduction

Scholars, journalists and practitioners of many disciplines use the Web for regular acquisition of information on topics of interest. Although powerful tools have emerged to assist in this task, they do not deal with the volatile nature of the Web. The topics of interest may change over time, as may the terminology associated with each topic. This renders text mining models obsolete and profile-based filters ineffective. We address this problem with a method that monitors the evolution of topic-word associations over a document stream. Our "TopicMonitor" discovers topics, adapts them as documents arrive and detects topic threads, while considering the evolution of terminology.

Most of the studies on topic evolution can be categorized into methods for (1) finite sequences and (2) streams. Methods of the first category observe the arriving documents as a finite sequence and infer model parameters under the closed world assumption. Essentially, the document collection is assumed to be known. The vocabulary over all documents induces a fixed feature space. The second category makes the open world assumption, thus allowing that new documents arrive, new words emerge and old ones are forgotten. This implies a changing feature space for documents over time.

Stream-based approaches are intuitively closer to the volatile nature of the Web. However, evolutionary topic detection methods, including recent ones based on Probabilistic Latent Semantic Analysis and Latent Dirichlet Allocation [11, 4, 18], all assume that the documents form a finite sequence. To deal with a document stream, they take a retrospective view of the world: at each new timepoint, all information seen so far is used to re-build an extended model. This perspective has two disadvantages. First, it only works if the document stream is slow enough that retrospection of the complete past is feasible. Second, the feature space grows with time, thus giving rise to the curse of dimensionality problem: the number of data points needed for a reliable estimate (of any property) grows exponentially with the number of features. Since a stream cannot be slow and fast at the same time, evolutionary topic monitoring requires adaptation to both changes in the data distribution and in the vocabulary/feature space.

TopicMonitor deals with these problems by adapting the feature space and the underlying model gradually. We invoke Probabilistic Latent Semantic Analysis (PLSA) in a window that slides across the stream: as the window slides from one timepoint to the next, we delete outdated words and documents, incorporate new documents into the previous model and then adapt the old model to new words, deriving the current model of each timepoint.

Gradual model adaptation brings an inherent advantage over re-learning at each discrete timepoint: not only is the model as a whole adapted, but each discrete topic built at timepoint t_i transforms naturally into its follow-up topic at t_{i+1}. Hence, the adaptation process leads to a separation of topics into distinct index-based topic threads over time. These threads provide a comprehensive summary of topic and terminology evolution. In Section 3.2 we describe how we extend conventional PLSA for dynamic data, explain our incremental model adaptation over a document stream and show how index-based topic threads are built.

The evaluation of our approach is not trivial, because it involves comparing methods that learn on different feature spaces. In Section 4 we describe our evaluation framework and the reference method for our experiments, which we present in Section 5.

2 Related Work

Many studies on topic evolution derive topics by document clustering. Most of them consider a fixed feature space over the document stream: Morinaga and Yamanishi [12] use Expectation-Maximization to build soft clusters. A topic consists of the words with the largest information gain for a cluster. Aggarwal and Yu [1] trace droplets over document clusters. A droplet consists of two vectors that accommodate words associated with the cluster. The cluster evolution monitor MONIC [17] and its variants [15, 14, 13] treat topic evolution as a special case of cluster evolution over a feature space that may change, too.

Topic models like PLSA [9] and Latent Dirichlet Allocation (LDA) [5] characterize a topic as a multinomial distribution over words rather than as a cluster of documents. This is closer to the intuitive notion of "topic" in a collection. Most evolutionary topic models based on PLSA or LDA [11, 4, 18] make the closed world assumption: they observe the arriving documents as a finite sequence with a fixed vocabulary. For example, Mei and Zhai [11] assume the complete vocabulary to be known and a static background model over all timepoints. The background model propagates statistics about non-specific word usage in time (forwards and backwards).

The retrospective approach of Mei and Zhai [11] builds a PLSA model at each timepoint in addition to the background model that is fixed over all timepoints. Griffiths and Steyvers [8] build a single LDA model based on collapsed Gibbs sampling to find scientific topics. They assign temporal properties to topics, such as becoming "cold" or "hot". Incremental LDA [16] updates the parameters of the LDA model as new documents arrive, but again assumes a fixed vocabulary and forgets neither the influence of past documents nor that of outdated words. The distributed asynchronous version of LDA [3] allows an LDA model to be grown incrementally. However, it still assumes a static vocabulary.

AlSumait et al. [2] propose Online LDA. They extend the Gibbs sampling approach suggested by Griffiths and Steyvers to handle streams of documents. Gibbs sampling at one timepoint is used to derive the hyperparameters of the topic-word associations at the next timepoint, so that successive LDA models are coupled. New words are collected as they are seen, so AlSumait et al. do not assume that the whole vocabulary is known in advance. However, the vocabulary grows over time, so that the curse of dimensionality problem still occurs.

The topics over time model (TOT) [18] extends LDA to generate timestamps of documents. In this way, a topic is discovered in exactly those timepoints in which the vocabulary on it is homogeneous. In other words, the model associates a timepoint with some static topic.

The dynamic topic model (DTM) [4] uses a basic LDA model at each timepoint. Hyperparameters of these LDA models are propagated forward and backward in time via a state space model. A "dynamic topic" is a sequence of vectors over time. Each vector is a multinomial distribution over the words of the fixed vocabulary.

TopicMonitor derives the model for the current timepoint by adapting the model of the previous timepoint to both new documents and words while forgetting old documents and obsolete words. Closest to our work is the recently published incremental probabilistic latent semantic indexing method of Chou and Chen [7]. We additionally address the issue of overfitting by investigating the maximum a posteriori estimation of the underlying PLSA model.

Differences Between PLSA and LDA. Since PLSA and LDA are frequently juxtaposed in the topic modeling literature, we provide a brief discussion of their differences here. The PLSA approach proposed by Hofmann [9] does not use prior distributions for the model parameters. The LDA approach proposed by Blei et al. [5] uses Dirichlet priors for the model parameters: the hyperparameters are learned by Empirical Bayes from data. However, many LDA-oriented approaches [8, 3, 19, 20] based on collapsed Gibbs sampling assume fixed hyperparameters. These variants of LDA come quite close to the maximum a posteriori (MAP)-PLSA model, as we use it here. Besides inference, the only notable difference between the two approaches is that MAP-PLSA still uses empirically estimated document-probabilities (denoted by π_d later on), while LDA never uses this quantity. In our approach document-probabilities are needed to account for new words, as we describe in the next section.

3 Topic Discovery With an Adaptive Method

The core of TopicMonitor is an adaptive process deriving a sequence of PLSA models over time. Each PLSA model is adapted from the previous one to comprise smooth evolution of topics over time. We use a Maximum A Posteriori (MAP) estimator of the PLSA model (a conceptually similar estimator is used in [6]) to guard against overfitting, and we propose new folding-in techniques for adaptation of PLSA models. We first present how we apply the MAP principle to estimate a PLSA model and introduce our new folding-in techniques upon it. In Subsection 3.3 we describe how our adaptive PLSA uses MAP and performs folding-in to learn a new model gradually from a previous one. The notation is summarized in Table 1.

Table 1: Summary of Notation

|X|                number of elements in set X
l                  size of sliding window in timepoints
D, W               set of documents, words (vocabulary)
d, w, z            denote documents, words, hidden topics
k, i               run over topics and timepoints
n_{d,w}            number of occurrences of (d,w)
θ = (γ, ω, π)      PLSA model
K                  number of hidden topics of θ
γ ∈ R^{|D|×K}      document-specific topic-mixture proportions
ω ∈ R^{K×|W|}      topic-word associations
π ∈ R^{|D|}        empirical document-probabilities
τ ∈ R^{|W|}        empirical word-probabilities
m                  EM-iterations for recalibration
s, r               smoothing parameters controlling the prior distributions of topic-mixture proportions (s) and topic-word associations (r), respectively

3.1 MAP-PLSA and Folding-In of Documents.

Conventional PLSA [9] models co-occurrences (pairs of documents and words) by a mixture model with K components (topics); co-occurrence means that a particular document ID appears together with the ID of a specific word type, stating that a word is seen in a document. Modeled documents are denoted by D, modeled words by W. The number of times a (document, word) pair occurs is denoted by n_{d,w}.

Assuming K hidden topics z_k (1 ≤ k ≤ K), the probability of observing document d and word w decomposes into P(d,w) = P(d) Σ_{k=1}^{K} P(w|z_k) P(z_k|d). Hence, a PLSA model θ is parametrized by a tuple (γ, ω, π) containing:

1. ∀ d ∈ D: document-specific topic-mixture proportions γ = [γ_d] = [γ_{d,k} = P(z_k|d)]; each K-dimensional row γ_d fulfills Σ_{k=1}^{K} γ_{d,k} = 1

2. ∀ w ∈ W: topic-word associations ω = [ω_k] = [ω_{k,w} = P(w|z_k)]; each row ω_k fulfills Σ_{w∈W} ω_{k,w} = 1

3. ∀ d ∈ D: document-probabilities π = [π_d = P(d)]

Each row ω_k is a multinomial distribution over all modeled words and describes the k-th topic.

The log-likelihood of the data D given the PLSA model θ is Σ_{d∈D} Σ_{w∈W} n_{d,w} · log P(d,w) with P(d,w) = π_d Σ_{k=1}^{K} ω_{k,w} γ_{d,k}. The parameters are usually estimated by maximizing the likelihood (cf. [9]). Document-probabilities π are estimated by

(3.1)  π_d = Σ_{w∈W} n_{d,w} / Σ_{d'∈D} Σ_{w∈W} n_{d',w}

The EM-algorithm is used to estimate the other parameters γ, ω.

MAP estimation of γ, ω determines those parameters that maximize the log-a-posteriori probability. The following log-priors for topic-mixture proportions and topic-word associations are used:

(3.2)  log P(θ) = Σ_{d∈D} log Dir(γ_d | s) + Σ_{k=1}^{K} log Dir(ω_k | r)

The hyperparameters of the Dirichlet distributions (Dir) are determined by s > -1 and r > -1 as follows: [s+1]_{k=1,...,K} and [r+1]_{w∈W}. The parameters s and r are called smoothing parameters for the estimates of P(z|d) and P(w|z). Setting s = 0, r = 0 makes the MAP estimates equal to the ML estimates because the Dirichlet priors become uniform. Note that the priors remain fixed throughout EM-learning (in contrast to LDA [5], which learns the hyperparameters itself). The EM-algorithm [10] leads to these update equations during the (t+1)-th iteration:

E-step:

(3.3)  φ_{k;d,w} = P(z_k | w, d; θ^(t)) = ω^(t)_{k,w} γ^(t)_{d,k} / Σ_{k'=1}^{K} ω^(t)_{k',w} γ^(t)_{d,k'}

M-step:

(3.4)  γ^(t+1)_{d,k} = ( s + Σ_{w∈W} n_{d,w} φ_{k;d,w} ) / ( K·s + Σ_{w∈W} n_{d,w} )

(3.5)  ω^(t+1)_{k,w} = ( r + Σ_{d∈D} n_{d,w} φ_{k;d,w} ) / ( |W|·r + Σ_{d∈D} Σ_{w'∈W} n_{d,w'} φ_{k;d,w'} )

If n_{d,w} or φ_{k;d,w} become very small, the EM update equations (3.4) and (3.5) will be problematic since the maximum of the posterior density might be undefined. This might result in negative parameter estimates [6]. To avoid that problem, we restrict the smoothing parameters to s ≥ 0 and r ≥ 0.

Figure 1: Document stream inducing partial ordering on documents

The folding-in procedure by Hofmann [9] allows new documents to be incorporated into an already trained model. MAP folding-in of a new document d' into an existing model θ estimates the topic-mixture proportions γ_{d'}: the EM-algorithm continues to estimate γ_{d'} with the topic-word associations ω held fixed in Eqs. 3.3 and 3.5. The extended model comprises the concatenation of γ_{d'} row-wise to the already derived ones: [γ γ_{d'}]. The document-probabilities are re-estimated for all modeled documents using Eq. 3.1. It is required that d' contains some words modeled by θ, since d' is reduced to those words for folding-in. To gradually adapt a PLSA model according to a document stream, we use MAP folding-in of newly arriving documents into an existing model, as explained in Section 3.2.
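A minimal sketch of these estimation and folding-in steps, assuming dense numpy arrays in the notation of Table 1 (n as the |D| x |W| count matrix, gamma as γ, omega as ω, pi as π); the function names and the dense representation are illustrative choices, not prescribed by the paper:

import numpy as np

def map_plsa_em(n, K, s=0.1, r=0.1, iters=100, seed=0):
    # MAP-PLSA via EM (Eqs. 3.3-3.5); n is a |D| x |W| matrix of counts n_{d,w}.
    rng = np.random.default_rng(seed)
    D, W = n.shape
    gamma = rng.dirichlet(np.ones(K), size=D)   # gamma[d, k] = P(z_k | d)
    omega = rng.dirichlet(np.ones(W), size=K)   # omega[k, w] = P(w | z_k)
    pi = n.sum(axis=1) / n.sum()                # pi[d] = P(d), Eq. (3.1)
    for _ in range(iters):
        # E-step (Eq. 3.3): phi[d, k, w] = P(z_k | d, w)
        phi = gamma[:, :, None] * omega[None, :, :]
        phi /= phi.sum(axis=1, keepdims=True) + 1e-12
        # M-step (Eq. 3.4): smoothed topic-mixture proportions
        gamma = s + (n[:, None, :] * phi).sum(axis=2)
        gamma /= (K * s + n.sum(axis=1))[:, None]
        # M-step (Eq. 3.5): smoothed topic-word associations
        nk = n[:, None, :] * phi                # expected counts, |D| x K x |W|
        omega = (r + nk.sum(axis=0)) / (W * r + nk.sum(axis=(0, 2)))[:, None]
    return gamma, omega, pi

def fold_in_documents(n_new, omega, s=0.1, iters=20, seed=0):
    # MAP folding-in (Section 3.1): estimate topic mixtures of new documents
    # while the topic-word associations omega stay fixed; n_new must already
    # be restricted to the words modeled by omega.
    rng = np.random.default_rng(seed)
    D_new, _ = n_new.shape
    K = omega.shape[0]
    gamma_new = rng.dirichlet(np.ones(K), size=D_new)
    for _ in range(iters):
        phi = gamma_new[:, :, None] * omega[None, :, :]
        phi /= phi.sum(axis=1, keepdims=True) + 1e-12
        gamma_new = s + (n_new[:, None, :] * phi).sum(axis=2)
        gamma_new /= (K * s + n_new.sum(axis=1))[:, None]
    return gamma_new

For clarity the responsibilities phi are materialized densely here; for realistic vocabularies one would iterate only over the nonzero entries of n.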

3.2 Adaptive PLSA for Topic Monitoring - Overview. Streaming documents d_1, d_2, ... arrive at (possibly irregularly spaced) timepoints t_1, t_2, .... We allow that several documents arrive at the same time. Hence, time induces a partial ordering on documents. Figure 1 depicts a possible stream of documents.

Topic detection at timepoint t_i implies building a PLSA model θ_i upon all documents that arrive in the interval (t_i - l, t_i]. The parameter l affects the size of the interval and allows for taking some documents from the near past into account. The result is a sequence of PLSA models θ_1, θ_2, ..., one at each point in time.

Almost all topic monitoring methods so far, as explained in Section 2, exploit the old model θ at timepoint t_{i-1} to build θ̄ at t_i, assuming an unchanged vocabulary from one timepoint to the next. We judge this assumption questionable for real stream data because the future vocabulary is unknown at each timepoint. If the vocabulary does change, the baseline approach is to build models θ_i independently of each other, taking the vocabulary of (t_i - l, t_i] into account. To uncover threads of topics over time, one would then measure similarity between the detected topics at successive timepoints and "connect" those being most similar according to some similarity function. For PLSA-based topic monitoring, we call this baseline "independent PLSA".

Figure 2: MAP-PLSA model allowing to fold-in new documents (left) and inverted MAP-PLSA model allowing to fold-in new words (right). Bayesian calculus is used to transform between both forms.

In contrast thereto, TopicMonitor builds the new model θ̄ by adapting the one of the previous timepoint, θ, to new words and new documents: it evolves the old model into the new one. Because successive PLSA models have been adaptively learned, the k-th analyzed topic at t_i evolves from the previously analyzed k-th topic at t_{i-1}. Hence, we formally define for each k an index-based topic thread that consists of the sequence of word-topic associations ω_k^1, ω_k^2, ... at timepoints t_1, t_2, .... Each index-based topic thread describes the evolution of hidden topics z_k over time.

Essential to our approach are the two equivalent forms of a PLSA model, which assume the following different decompositions:

(3.6)  P(d,w) = P(d) Σ_{k=1}^{K} P(w|z_k) P(z_k|d)

(3.7)  P(d,w) = P(w) Σ_{k=1}^{K} P(d|z_k) P(z_k|w)

Plate models of these two forms are given in Figure 2. If a PLSA model is given in its document-based form (Eq. 3.6), new documents can conveniently be folded-in into that model. In the same way, new words can conveniently be folded-in into an existing model if the model is given in its word-based form (Eq. 3.7).

Since not only the documents but also the vocabulary changes over time, we want the adaptive process to learn model θ̄ at timepoint t_i from model θ at timepoint t_{i-1} by adapting to both new documents and new words. This includes incorporating new words and "forgetting" outdated ones. Forgetting outdated words prevents TopicMonitor from accumulating all words ever seen, and hence TopicMonitor does not unnecessarily blow up the feature space over time.

Roughly, the adaptive process that evolves the model θ̄ at time t_i from the model θ at time t_{i-1} works as follows.

1. remove documents modeled by θ but not covered by (t_i - l, t_i]

2. fold-in new documents covered by (t_i - l, t_i] into θ using its document-based form

3. use Bayesian calculus to invert θ into its word-based form

4. remove words modeled by θ but not seen in (t_i - l, t_i]

5. fold-in words first seen in (t_i - l, t_i]

6. use Bayesian calculus to invert θ again into its document-based form

7. recalibration gives the next PLSA model θ̄

Figure 3 graphically describes the adaptive process of TopicMonitor. Mathematical details of that process are given in Section 3.3.

Figure 3: Overview of the steps necessary to adapt a model θ at timepoint t_{i-1} to the adapted model θ̄ at timepoint t_i.

3.3 Adaptive PLSA for Topic Monitoring - Mathematics. For ease of presentation we denote by θ̄ the PLSA model at timepoint t_i, which should be evolved from model θ at timepoint t_{i-1}. In addition, W̄ denotes all words seen in (t_i - l, t_i] and W all words seen in (t_{i-1} - l, t_{i-1}]. In the same way D̄ and D are used.

Recall that a model θ = (γ, ω, π) in its document-based form consists of three parameter sets: document-specific topic-mixture proportions γ, topic-word associations ω and document-probabilities π. The first two parameter sets can be seen as matrices of dimensions |D|×K for γ and K×|W| for ω. Folding-in new documents d ∈ D̄ \ D translates to adding rows to the first matrix, as explained in Section 3.1. Next, topic-mixture proportions of documents which are not covered anymore by the current window at timepoint t_i are not needed and are deleted. The current document-specific topic-mixture proportions are given by γ̃ = [γ_d γ_{d'}] with d ∈ D ∩ D̄ and d' ∈ D̄ \ D. The new document-probabilities π̃ = [π̃_d] with d ∈ D̄ are re-estimated as shown in Equation 3.1, but using the old vocabulary W only.

New documents may introduce new words not contained in the current vocabulary W. The topic-word associations ω of model θ describe only words w ∈ W.

To handle new words w ∈ W̄ \ W, they are folded-in into the current PLSA model. For that purpose the model is inverted into its word-based form by Bayesian calculus. Note that the elements of γ̃ and ω estimate the conditional probabilities P(z_k|d) and P(w|z_k), respectively, which are inverted by the following formulae: γ̃'_{k,d} = P(d|z_k) = P(z_k|d)P(d)/P(z_k) and ω'_{w,k} = P(z_k|w) = P(w|z_k)P(z_k)/P(w). The word probabilities for w ∈ W are derived by marginalization: τ'_w = P(w) = Σ_{d∈D̄} Σ_{k=1}^{K} P(d)P(z_k|d)P(w|z_k) = Σ_{d∈D̄} Σ_{k=1}^{K} π̃_d · γ̃_{d,k} · ω_{k,w}. The inversion of the current model into its word-based form is derived by:

γ̃'_{k,d} = ( γ̃_{d,k} · π̃_d ) / ( Σ_{d'∈D̄} γ̃_{d',k} · π̃_{d'} )    and    ω'_{w,k} = ( ω_{k,w} · Σ_{d'∈D̄} γ̃_{d',k} · π̃_{d'} ) / τ'_w

with d ∈ D̄ and w ∈ W. This inversion gives θ' = (γ̃', ω', τ'). Note that documents and words have changed their roles. Thus, ω' denotes word-specific mixture proportions and γ̃' denotes document-topic associations. Figure 2 schematically depicts the models.
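A compact numpy sketch of this inversion, under the same illustrative conventions as before (gamma of shape |D̄|×K, omega of shape K×|W|, pi of length |D̄|):

import numpy as np

def invert_to_word_based(gamma, omega, pi):
    # Bayesian inversion of a document-based PLSA model into its word-based form.
    # P(z_k) = sum_d P(d) P(z_k | d)
    p_z = (pi[:, None] * gamma).sum(axis=0)              # shape (K,)
    # tau'_w = P(w) = sum_d sum_k P(d) P(z_k|d) P(w|z_k)
    tau = p_z @ omega                                    # shape (|W|,)
    # gamma'[k, d] = P(d | z_k) = gamma[d, k] * pi[d] / P(z_k)
    gamma_inv = (gamma * pi[:, None]).T / p_z[:, None]   # shape (K, |D̄|)
    # omega'[w, k] = P(z_k | w) = omega[k, w] * P(z_k) / tau'_w
    omega_inv = (omega * p_z[:, None]).T / tau[:, None]  # shape (|W|, K)
    return gamma_inv, omega_inv, tau

Since documents and words merely exchange roles in the word-based form, the same routine, applied with the roles swapped, can serve for the inversion back to the document-based form in step 6.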

The word-based form θ' allows folding-in words in a similar way as the document-based form allows folding-in documents. New words w ∈ W̄ \ W are folded-in into θ' using their occurrences in documents of D̄. In addition, mixture proportions of words which do not appear in W̄ are deleted.
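Folding-in of new words mirrors the folding-in of documents: the topic mixture P(z_k|w') of each new word is estimated by a few EM iterations over its occurrences in the current documents while the document-topic associations P(d|z_k) stay fixed. A minimal sketch under the same illustrative conventions (the smoothing value and the count layout are assumptions, not prescribed by the paper):

import numpy as np

def fold_in_words(n_new_words, gamma_inv, s_w=0.1, iters=20, seed=0):
    # n_new_words: counts of the new words over the current documents, shape (|W_new|, |D̄|)
    # gamma_inv[k, d] = P(d | z_k), held fixed during folding-in
    rng = np.random.default_rng(seed)
    W_new, _ = n_new_words.shape
    K = gamma_inv.shape[0]
    omega_new = rng.dirichlet(np.ones(K), size=W_new)      # omega_new[w, k] = P(z_k | w)
    for _ in range(iters):
        phi = omega_new[:, :, None] * gamma_inv[None, :, :]  # (|W_new|, K, |D̄|)
        phi /= phi.sum(axis=1, keepdims=True) + 1e-12
        omega_new = s_w + (n_new_words[:, None, :] * phi).sum(axis=2)
        omega_new /= (K * s_w + n_new_words.sum(axis=1))[:, None]
    return omega_new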

The resulting word-specific topic-mixture proportions are given by ω̃' = [ω'_w ω'_{w'}] with w ∈ W ∩ W̄ and w' ∈ W̄ \ W. Word-probabilities τ̃' = [τ̃'_w] with w ∈ W̄ are empirically re-estimated: τ̃'_w = Σ_{d∈D̄} n_{d,w} / Σ_{w'∈W̄} Σ_{d∈D̄} n_{d,w'}. Having incorporated new words and deleted outdated ones, the current extended model θ̃' = (γ̃', ω̃', τ̃') is inverted back into its document-based form using Bayesian calculus again.

Parameters for new documents and for new words are derived separately by folding-in. In order to couple the influence of both, new documents and new words, we run the full EM-algorithm for a small number m of iterations using all data in (t_i - l, t_i]. The result is the adapted model θ̄ = (γ̄, ω̄, π̄) at timepoint t_i.

Successively adapting θ_i from θ_{i-1} as new data arrive gives a sequence of PLSA models θ_1, θ_2, .... That sequence models K topics at each timepoint and thus their evolution over time. Tracing the topics having the same index over time is possible because the adaptive approach evolves the k-th topic-word association ω̄_k at timepoint t_i from the k-th topic-word association ω_k at timepoint t_{i-1} without renumbering the topic indexes.

4 Evaluation Framework

Topic evolution with simultaneous adaptation to both an evolving document stream and an evolving vocabulary has not been intensively studied in the past. Thus, there is no generally accepted ground truth available which would allow a cross-model evaluation with TOT [18] or DTM [4]. Therefore, the objective of our evaluation is to show the improvements of our approach with respect to the "independent PLSA" model described in Subsection 3.2, which is the natural baseline for the problem discussed here. Independent PLSA derives a new PLSA model from scratch for each position of the sliding window. If the influence of the background component is set to zero, the approach of Mei and Zhai [11] is equal to independent PLSA. That background component uses global knowledge and thus cannot be used in a stream scenario. Hence, independent PLSA imitates the approach of Mei and Zhai as closely as possible in a stream scenario.

We use a real-world document stream, namely the proceedings of the ACM SIGIR conferences from 2000 to 2007.

4.1 Model Comparison. In the absence of ground truth, our adaptive PLSA approach can be compared to independent PLSA by computing perplexity on hold-out data. Informally, perplexity measures how surprised a probabilistic model is when seeing hold-out data. Hence, it assesses the ability of the model to generalize to unseen data.

Hold-out data are constructed by splitting the set of word occurrences of the current data into a training part (80%) and a hold-out part (20%). We maintain two counters for occurrences of word w in document d: n_{d,w} for the training part and n̂_{d,w} for the hold-out part. Both counters sum up to the total number of occurrences of (d,w) in the given data. During training, the learning algorithm does not see that the documents actually contain some more words. Hold-out perplexity is computed on the hold-out part of the data, which is unused during training. The hold-out perplexity of an arbitrary PLSA model θ̄ at timepoint t_i is computed as:

(4.8)  perplex(θ̄) = exp( - Σ_{d∈D̄, w∈W̄} n̂_{d,w} log( Σ_{k=1}^{K} γ̄_{d,k} ω̄_{k,w} ) / Σ_{d∈D̄, w∈W̄} n̂_{d,w} )

The computation of hold-out perplexity does not require any folding-in of new documents. Thus it is not distorted in the way explained in [19].
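A small sketch of the occurrence split and of Eq. (4.8), again with dense count matrices and illustrative names:

import numpy as np

def split_counts(n, holdout_frac=0.2, seed=0):
    # Split every (d, w) occurrence count into a training part and a hold-out
    # part; both parts sum to the original counts.
    rng = np.random.default_rng(seed)
    n_hold = rng.binomial(n.astype(int), holdout_frac)
    return n - n_hold, n_hold

def holdout_perplexity(n_hold, gamma, omega):
    # Eq. (4.8): perplexity of the held-out occurrences under the model.
    p_w_given_d = gamma @ omega               # P(w|d) = sum_k gamma[d,k] * omega[k,w]
    log_lik = (n_hold * np.log(p_w_given_d + 1e-12)).sum()
    return np.exp(-log_lik / n_hold.sum())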

Firstly, we analyze the impact of the smoothing parameters s and r, used to define the prior distributions of the topic-mixture proportions and topic-word associations, on the generalization ability of the model. All SIGIR documents are used as a single batch. With this experiment we verify the advantage of MAP-PLSA over the maximum-likelihood estimator of PLSA.

Further, we compare the performance of adaptive PLSA and independent PLSA by average hold-out perplexity. Our adaptive method derives the model at each timepoint by adapting the model of the previous timepoint: it folds-in new documents and words, forgets old documents and words, and recalibrates the parameters. Note that both models differ only by initialization. At each timepoint, document-word pairs contained in the sliding window are held out in order to compute its perplexity. We analyze the average hold-out perplexity over all points in time versus the number of learning steps. Learning steps are either the number of EM-iterations for independent PLSA or the number of EM-recalibration steps m for adaptive PLSA. In addition, the number of EM-steps of adaptive PLSA to fold-in documents and words is also limited to m. For the purpose of a fair comparison, independent PLSA is restarted three times at each timepoint; only the best results are reported. This experiment analyzes whether coupling models by initialization indeed propagates useful information along the sequence of models and helps the models to better fit the data.

The third experiment investigates to which degree the natural order of the stream contains important latent information that helps to predict future documents. In addition, we analyze how the natural order influences the prediction power of PLSA models determined by our proposed adaptive approach compared to the independent one. We compute at each timepoint the predictive perplexity of documents from the next timepoint. To that end, documents at time t_i have to be folded-in into the model computed at timepoint t_{i-1}. Afterward, these predictive perplexities are averaged over all timepoints. Averaged predictive perplexities are computed for the natural stream order and for a randomly permuted stream. If our assumptions hold, the average predictive perplexity according to the permuted stream should be significantly larger. We follow the idea of half-folding-in [19] to avoid over-optimistic estimates of predictive perplexity. The idea is to fold-in new documents based on only a part of their words, while the rest of them is held out. The predictive perplexity according to these documents is computed by (4.8) using only the held-out part unseen during folding-in.
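A sketch of how such a half-folding-in evaluation could look, under the same illustrative conventions (the new documents are assumed to be already reduced to the model's vocabulary):

import numpy as np

def half_folding_in_perplexity(n_next, omega, s=0.1, iters=20, seed=0):
    # Predictive perplexity with half-folding-in [19]: fold each new document in
    # on one half of its word occurrences, evaluate Eq. (4.8) on the other half.
    rng = np.random.default_rng(seed)
    n_seen = rng.binomial(n_next.astype(int), 0.5)   # half used for folding-in
    n_held = n_next - n_seen                          # half used for evaluation
    D_new, _ = n_next.shape
    K = omega.shape[0]
    gamma = np.full((D_new, K), 1.0 / K)              # start from uniform mixtures
    for _ in range(iters):                            # EM with omega held fixed
        phi = gamma[:, :, None] * omega[None, :, :]
        phi /= phi.sum(axis=1, keepdims=True) + 1e-12
        gamma = s + (n_seen[:, None, :] * phi).sum(axis=2)
        gamma /= (K * s + n_seen.sum(axis=1))[:, None]
    p_w_given_d = gamma @ omega
    log_lik = (n_held * np.log(p_w_given_d + 1e-12)).sum()
    return np.exp(-log_lik / n_held.sum())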

4.2 Comparison of Topic Threads. We study the semantic meaningfulness of index-based topic threads over time obtained by our adaptive PLSA method. We have defined index-based topic threads for a given sequence of PLSA models in Section 3.2. These topic threads are primarily defined via a technical parameter: the index of the topic.

The independent PLSA model computes topics at different timepoints independently. Thus, topic indexes at different timepoints are arbitrary. Hence, semantically close topics at different timepoints have to be matched using some similarity measure in order to define threads of topics over time. Best-match topic threads are built as follows: each topic-word association from model θ_i is matched to the topic-word association from model θ_{i+1} that is maximally similar according to cosine similarity, subject to a minimum similarity threshold MinSim. The pairs of matching topic-word associations are called best-matchings.
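As a sketch, this matching could be computed as follows, with the vocabularies of the two models aligned by filling in zero probabilities for words that only one of them covers (as in Section 5.3.1); names are illustrative:

import numpy as np

def best_matchings(omega_a, vocab_a, omega_b, vocab_b, min_sim=0.0):
    # Match each topic of model i to its most similar topic of model i+1 by
    # cosine similarity over the joint vocabulary.
    joint = sorted(set(vocab_a) | set(vocab_b))
    idx = {w: j for j, w in enumerate(joint)}
    A = np.zeros((omega_a.shape[0], len(joint)))
    B = np.zeros((omega_b.shape[0], len(joint)))
    A[:, [idx[w] for w in vocab_a]] = omega_a
    B[:, [idx[w] for w in vocab_b]] = omega_b
    A /= np.linalg.norm(A, axis=1, keepdims=True)
    B /= np.linalg.norm(B, axis=1, keepdims=True)
    sim = A @ B.T                                   # K x K cosine similarities
    pairs = []
    for k in range(sim.shape[0]):
        k_best = int(sim[k].argmax())
        if sim[k, k_best] >= min_sim:
            pairs.append((k, k_best, float(sim[k, k_best])))
    return pairs

The fraction of returned pairs whose two indices coincide is the statistic reported in Section 5.3.1.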

Best-match topic threads constitute a more intuitive definition of topic threads that matches human intuition about meaningful semantics. If the topologies of both thread types match very well, we argue that index-based topic threads are indeed semantically meaningful.

Table 2: Number of documents per year of the SIGIR data set

Year  2000  2001  2002  2003  2004  2005  2006  2007
|D|     57    71    85    87   115   121   133   194

5 Experiments

Here, we describe the data used and the results of our experiments. Additionally, we demonstrate the ability of our approach to find topic threads which have meaningful semantics.

5.1 Data Sets. We use a real-world data set, which contains articles published at the ACM SIGIR conferences from 2000 to 2007, as they appear in the ACM Digital Library. Only titles and abstracts of the documents are considered. Stemming and stopword removal was applied to all documents. Posters are not included because they often do not have abstracts. The number of included documents per year is shown in Table 2. Further statistics of this data set are given in Figure 4. We refer to this data set as SIGIR hereafter.

We analyze the SIGIR data set at yearly timepoints. The sliding window spans the current point in time and, if it is of length l > 1, the previous l - 1 ones. The words constituting the feature space at a certain timepoint are those appearing in at least two documents covered by the respective sliding window.
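A small sketch of this per-window feature space construction (function name and data layout are illustrative):

from collections import Counter

def window_vocabulary(window_docs, min_docs=2):
    # Feature space of a timepoint: words occurring in at least `min_docs`
    # documents covered by the current sliding window.
    # window_docs: list of token lists (already stemmed, stopwords removed)
    df = Counter()
    for tokens in window_docs:
        df.update(set(tokens))
    return sorted(w for w, c in df.items() if c >= min_docs)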

5.2 Model Comparison.

5.2.1 Effect of Priors on PLSA. In contrast to ML-PLSA, MAP-PLSA allows priors which differ from a flat Dirichlet to be specified. We analyze the impact of the smoothing parameters s and r, which define the priors for topic-mixture proportions and topic-word associations, respectively (see Section 3.1). The influence of smoothing is estimated for s, r ∈ {0.01, 0.1, 1, 10, 100}. MAP-PLSA is trained with K = 10 on the total SIGIR data without timestamps. Average perplexities (50 repetitions) on 20% hold-out data are shown in Figure 5. Curves for r ∈ {10, 100} are not shown, as they have much larger perplexities. Setting s = 0.1, r = 0.1 yields significantly lower hold-out perplexities than the other settings. Using s = 0.01, r = 0.01 yields a nearly flat Dirichlet prior, which corresponds to ML-PLSA. Thus, the results show: i) that MAP-PLSA opens room for improvement over ML-PLSA, and ii) that a careful study to adapt the hyperparameters is necessary in order to yield good performance.


Figure 4: SIGIR word statistics for different window lengths (l = 1 and l = 2): size of the vocabulary over time, showing the complete vocabulary per time point, the words used by the model, and the common words with the previous model.

Figure 5: Influence of smoothing on hold-out perplexity (lower values are better); hold-out perplexity is plotted over the smoothing parameter s for r ∈ {0.01, 0.1, 1}. The smoothing parameters s and r control the Dirichlet hyperparameters for the estimates of P(z|d) and P(w|z), respectively.

5.2.2 Comparison of Adaptive and Independent PLSA. In this experiment, we compare our adaptive PLSA with independent PLSA trained at each sliding window independently. Technically speaking, the two methods differ in the information used during the initialization phase: our method folds-in documents and words, while independent PLSA performs a random initialization. To avoid side effects of randomness, we restart independent PLSA three times with different random initializations. We allow each method to perform m iterations.

Figure 6 shows the hold-out perplexities for different values of m ∈ {1, 2, 4, 8, 16, 32, 64, 128} and different lengths of the sliding window: l ∈ {1, 2}. Each individual experiment is repeated 50 times and we present averages of the hold-out perplexities. The results show that our approach needs fewer iterations to reach the minimal hold-out perplexity. More importantly, the best reached hold-out perplexity for adaptive PLSA is much better than for independent PLSA. This demonstrates that the initialization by folding-in new documents and words helps the model to better fit the data.

5.2.3 Influence of Natural Stream Order. The third experiment evaluates the effect of relying on the natural order of the document stream. We use the best parameter settings estimated in the previous hold-out experiments. We evaluate how well the topic models can predict documents from the next timepoint by estimating predictive perplexity on the new documents. As explained in Section 4.1, we follow the concept of half-folding-in [19].

To assess the influence of the natural stream order we compare average predictive perplexity (50 repetitions) on the natural stream order versus a random stream order. In particular, experiments for random streams use a different order each time the experiment is repeated. In addition, we compare adaptive PLSA with independent PLSA. The results are shown in Figure 7.

In the case of l = 1, the natural order of the stream significantly helps to predict documents from the next timepoint. The adaptive process that evolves PLSA models from each other can exploit this latent information much better than independent PLSA. In the case of l = 2, the impact of the natural order is more blurred. Since SIGIR concepts do not change drastically from year to year, it is quite expected that a larger window size blurs the difference between natural order and random order.

Figure 6: Impact of the number of recalibration steps m on the achieved hold-out perplexity (lower values are better), for independent PLSA and adaptive PLSA with sliding-window lengths l = 1 and l = 2.

Figure 7: Impact of natural stream order on predicting new documents from the next timepoint, using different sizes of the sliding window: average prediction perplexity for natural versus random order, for adaptive PLSA and independent PLSA with l ∈ {1, 2}. The parameter l is the length of the sliding window used for training the models. The star marks significant differences using a t-test with significance level 0.05.

5.3 Topic Threads and Their Semantics. In the following we experimentally analyze whether index-based topic threads show semantic meaningfulness and present a concrete example of index-based topic threads determined by TopicMonitor using the SIGIR data.

5.3.1 Comparing Topic Threads. Following our evaluation framework we compute index-based and best-match topic threads on a sequence of PLSA models obtained by TopicMonitor with 10 topics. Multinomial distributions (topic-word associations) at different timepoints are made comparable by filling in zero probabilities for the differing words. This is necessary since we allow differing vocabularies over time. We use cosine similarity, which in contrast to KL-divergence [11] can cope with zero probabilities. Given that sequence of PLSA models, we compute the percentage of best-matchings which connect two topic-word associations having the same index. A high value indicates that both heuristics generate threads which have a similar local topological structure. This means that index-based topic threads are similar to those constructed following human intuition and thus are semantically meaningful.

The results are shown in Figure 8 for different lengths of the sliding window l ∈ {1, 2}. The percentage is never below 94.5% (l = 1) even if all best-matchings are included (MinSim = 0). This demonstrates that matching-based threads, similar to those used in [11], can be approximated by index-based threads, which are built by concatenating the topics with the same index. This further demonstrates that our initialization-based propagation of word statistics keeps the sequence of models consistent, meaning that index-based threads are similar in structure to the dynamic topics found by DTMs [4].

5.3.2 Semantics of Selected Topic Threads. In Table 3 we show an illustrative example of index-based topic threads determined using the SIGIR data. In order to produce a small example fitting the paper size, we chose the number of topics K = 5 and the length of the sliding window l = 1. Each thread consists of a sequence of topic-word associations over time having the same index. The top 30 words with the largest topic-word association are printed for each timepoint and for each topic index. As seen so far, adaptive learning of the sequence of PLSA models enables semantically meaningful threads. If PLSA were applied to each sliding window anew, this continuity would not have been achieved.

Figure 8: Percentage of best-matchings with the same index (left) and total number of matching pairs (right) over the threshold MinSim, for l = 1 and l = 2. MinSim is the lower bound of similarity between consecutive topics of a thread.

Even without a deep inspection of the SIGIR conference subjects, we find (in the SIGIR context) meaningful index-based topic threads:

Thread 1: main theme is "Evaluation"; the early sub-area "Multilingual IR" disappears later

Thread 2: "Presentation"; later with emphasis on "Multimedia"

Thread 3: "Supervised Machine Learning"; originally "Classification"; later more elaborate aspects such as "Feature Selection" and "SVMs"

Thread 4: "Web"; originally from the viewpoint of "Information Extraction" and "Link traversal"; later "Web Search" and "User Queries"

Thread 5: "Document Clustering"

We see some strongly evolving index-based topic threads while others are remarkably stable. For example, Thread 5 is clearly on document clustering. Thread 3 is rather on supervised machine learning (a multi-word term that does not appear among the words), originally focusing on classification and later elaborating on specialized methods like support vector machines. Another concept of Thread 3 is feature selection. The importance of concepts changes in Thread 2, too. It is about presentation, visualization and frameworks. Later, many words are about multimedia (early years: audio), for which presentation forms are indeed expected to be important.

We study Thread 4 a bit closer in Figure 9. This index-based topic thread is about the Web, originally studied in the context of information extraction and link traversal. Later, web search and user queries become dominant.

Further experiments investigating longer and more diverse streams are planned for future work.

Figure 9: Evolution of topic-word associations P(w|z) over time for the Web index-based topic thread (4); shown are the words link, extract, user and search.

Table 3: Index-based topic threads for the SIGIR data from 2000-2007. Shown are the top 30 words per thread and timepoint.

6 Conclusions

We study topic monitoring over a stream of documents in which both topics and vocabulary may change. We waive the assumption that the complete vocabulary is known in advance. Our approach allows the study of fast streams and alleviates the negative impact of large vocabularies with many words of only temporary importance.

TopicMonitor discovers and traces topics by using a new, adaptive variant of the well-known PLSA. Our adaptive PLSA derives the model at a certain timepoint by adapting the model found at the previous point in time to new documents and unseen words.

We evaluate TopicMonitor on a document stream of ACM SIGIR conference papers (2000-2007). Our evaluation concerns the predictive performance of our approach to adaptively learn PLSA models in comparison to a baseline. Additionally, the semantic comprehensiveness of the topics being monitored is analyzed. Since no ground truth for such studies exists, nor a baseline for topic monitoring with a vocabulary that is not known in advance, we introduce a new evaluation framework. The evaluation is based on the concepts of perplexity and of topic threads. The former captures the extent to which a model is able to generalize to unseen data. The latter reflects whether semantically related topics found at different timepoints are observed as part of the same "thread". Our experiments show that adaptation of models by TopicMonitor leads to better results than construction of models from scratch at each timepoint with respect to prediction performance. In addition, the experiments indicate that index-based topic threads monitored by TopicMonitor are indeed semantically meaningful.

Our TopicMonitor is appropriate for fast streams, since it does not require pausing the stream in order to recompute the feature space before building models. We intend to study the speed and space demands of our approach and to devise methods that allow topic monitoring on very fast and large streams, e.g. online blogs. Further research issues are the identification of recurring topics and the detection of groups of correlated topics that occur at distant moments across time.

Acknowledgments

We thank Jan Grau for valuable discussions and comments on the manuscript.

References

[1] Charu C. Aggarwal and Philip S. Yu. A framework for clustering massive text and categorical data streams. In SDM, 2006.

[2] L. AlSumait, D. Barbara, and C. Domeniconi. On-line LDA: adaptive topic models for mining text streams with applications to topic detection and tracking. In ICDM, 2008.

[3] A. Asuncion, P. Smyth, and M. Welling. Asynchronous distributed learning of topic models. In NIPS, 2008.

[4] D. Blei and J. Lafferty. Dynamic topic models. In ICML, 2006.

[5] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet Allocation. J. Mach. Learn. Res., 3:993-1022, 2003.

[6] J.-T. Chien and M.-S. Wu. Adaptive Bayesian latent semantic analysis. IEEE Transactions on Audio, Speech, and Language Processing, 16(1):198-207, Jan. 2008.

[7] Tzu-Chuan Chou and Meng Chang Chen. Using incremental PLSI for threshold-resilient online event analysis. IEEE Trans. on Knowl. and Data Eng., 20(3):289-299, 2008.

[8] T. L. Griffiths and M. Steyvers. Finding scientific topics. PNAS, 101(suppl 1):5228-5235, 2004.

[9] T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42(1):177-196, 2001.

[10] Geoffrey J. McLachlan and Thriyambakam Krishnan. The EM Algorithm and Extensions. Wiley, 1997.

[11] Qiaozhu Mei and ChengXiang Zhai. Discovering evolutionary theme patterns from text: an exploration of temporal text mining. In SIGKDD, pages 198-207, New York, NY, USA, 2005. ACM.

[12] Satoshi Morinaga and Kenji Yamanishi. Tracking dynamics of topic trends using a finite mixture model. In SIGKDD, pages 811-816. ACM, 2004.

[13] Rene Schult. Comparing clustering algorithms and their influence on the evolution of labeled clusters. In DEXA, pages 650-659, 2007.

[14] Rene Schult and Myra Spiliopoulou. Discovering emerging topics in unlabelled text collections. In ADBIS, pages 353-366, 2006.

[15] Rene Schult and Myra Spiliopoulou. Expanding the taxonomies of bibliographic archives with persistent long-term themes. In SAC, pages 627-634, 2006.

[16] Xiaodan Song, Ching-Yung Lin, Belle L. Tseng, and Ming-Ting Sun. Modeling and predicting personal information dissemination behavior. In SIGKDD, pages 479-488. ACM, 2005.

[17] Myra Spiliopoulou, Irene Ntoutsi, Yannis Theodoridis, and Rene Schult. MONIC: modeling and monitoring cluster transitions. In SIGKDD, pages 706-711. ACM, 2006.

[18] Xuerui Wang and Andrew McCallum. Topics over time: a non-Markov continuous-time model of topical trends. In SIGKDD, pages 424-433. ACM, 2006.

[19] Max Welling, Chaitanya Chemudugunta, and Nathan Sutter. Deterministic latent variable models and their pitfalls. In SDM, 2008.

[20] Max Welling, Y. W. Teh, and B. Kappen. Hybrid variational-MCMC inference in Bayesian networks. In UAI, 2008.
