Using Discretization and Bayesian Inference Network Learning for Automatic Filtering Profile Generation
Authors: Wai Lam and Kon Fan Low
Presenter: Kyu-Baek Hwang
Contents

- Introduction
- Overview of the approach
- Automatic document pre-processing
- Feature selection
- Feature discretization
- Learning Bayesian networks
- Experiments and results
- Conclusions and future work
Information Filtering

The Filtering Profile

- An information filtering system deals with users who have a relatively stable, long-term information need.
- An information need is usually represented by a filtering profile.
Construction of the Filtering Profile

- Collect training data through interactions with users. Ex) gathering user feedback on relevance judgments for a certain information need or topic.
- Analyze this training data and construct the filtering profile with machine learning techniques.
- Use the filtering profile to determine the relevance of a new document.
The Uncertainty Issue

- It is difficult to specify absolutely whether a document is relevant to a topic, since it may match the topic only partially. Ex) "the economic policy of government"
- A probabilistic approach is appropriate for this kind of task.
An Overview of the Approach

For each topic:
1. Transform each document into an internal form.
2. Select features.
3. Discretize the feature values.
4. Gather training data through interactions with users.
5. Learn a Bayesian network.
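The per-topic steps above can be sketched end-to-end. Everything here is illustrative, not the paper's implementation: the stop-word list is a toy one, and a crude suffix-stripping rule stands in for a real stemmer.

```python
import re

STOP_WORDS = {"the", "are", "and", "a", "of", "at"}

def preprocess(text):
    """Step 1: internal form - lowercase tokens, stop words removed,
    crude suffix-stripping in place of a real stemmer."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stem = lambda w: w[:-3] if w.endswith("ing") else (w[:-1] if w.endswith("s") else w)
    return [stem(t) for t in tokens if t not in STOP_WORDS]

def to_vector(bag, vocabulary):
    """Word-frequency vector over the selected feature terms (steps 2-3 would
    pick the vocabulary by mutual information and then discretize the counts)."""
    return [bag.count(term) for term in vocabulary]

doc = "The students are looking at the government announcements"
bag = preprocess(doc)
vec = to_vector(bag, ["student", "look", "government"])
```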
Document Representation

- All stop words are eliminated. Ex) "the", "are", "and", etc.
- The remaining words are stemmed. Ex) "looks" → "look", "looking" → "look", etc.
- A document is represented in vector form. Each element of the vector is either the word frequency or the word weight. The word weight is calculated as

$$w_i = f_i \log \frac{N}{n_i}$$

  where f_i is the frequency of term i, N is the total number of documents, and n_i is the number of documents that contain term i.
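The weight formula above is the familiar TF-IDF form and can be computed directly (the function name is illustrative):

```python
import math

def word_weight(f_i, N, n_i):
    """w_i = f_i * log(N / n_i): term frequency scaled by inverse document frequency."""
    return f_i * math.log(N / n_i)

# e.g. a term occurring 3 times in a document and appearing in 10 of 1000 documents
w = word_weight(3, 1000, 10)
```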
Word Frequency Representation of a Document

Term id   Term      Frequency
21        gover     3
17        annouc    1
98        take      3
34        student   4
…         …         …
Feature Selection

- The expected mutual information measure is given as

$$I(W_i, C_j) = \sum_{b=0,1} \left[ P(W_i = b, C_j) \log \frac{P(W_i = b, C_j)}{P(W_i = b)\,P(C_j)} + P(W_i = b, \tilde{C}_j) \log \frac{P(W_i = b, \tilde{C}_j)}{P(W_i = b)\,P(\tilde{C}_j)} \right]$$

  where W_i is a feature, C_j denotes the event that the document is relevant to topic j, and \tilde{C}_j is its complement.
- Mutual information measures the information contained in the term W_i about topic j.
- A document is represented as the feature vector $T_j = (T_{j1}, \ldots, T_{jp})$.
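A minimal sketch of estimating this measure from labelled documents, assuming a binary presence/absence feature W_i and documents represented as sets of stemmed terms (names and data are illustrative):

```python
import math

def expected_mutual_information(docs, labels, term):
    """I(W_i, C_j): sum over W_i in {absent, present} and relevance /
    non-relevance of P(W_i, C_j) * log(P(W_i, C_j) / (P(W_i) P(C_j))),
    with probabilities estimated by counting over the training documents."""
    n = len(docs)
    total = 0.0
    for present in (False, True):
        for relevant in (True, False):
            joint = sum((term in bag) == present and rel == relevant
                        for bag, rel in zip(docs, labels)) / n
            p_w = sum((term in bag) == present for bag in docs) / n
            p_c = sum(rel == relevant for rel in labels) / n
            if joint > 0.0:  # 0 * log 0 contributes nothing
                total += joint * math.log(joint / (p_w * p_c))
    return total

# Toy collection: "econ" appears exactly in the relevant documents.
docs = [{"econ", "policy"}, {"econ"}, {"sport"}, {"sport", "news"}]
labels = [True, True, False, False]
mi = expected_mutual_information(docs, labels, "econ")
```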
Discretization Scheme

- The goal of discretization is to find a mapping m such that each feature value is represented by a discrete value.
- The mapping is characterized by a series of threshold levels (0, w_1, …, w_k) where 0 < w_1 < w_2 < … < w_k.
- The mapping m has the following property:

$$m(q) = \begin{cases} 0 & \text{if } q = 0 \\ i & \text{if } w_{i-1} < q \le w_i \\ k & \text{if } q > w_k \end{cases}$$

  where q is the feature value (and w_0 = 0).
Predefined Level Discretization

- One determines the discretization level k and the threshold values in advance. Ex) Integers between 0 and 15 are discretized into three levels by the threshold values 5.5 and 10.5.
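The slide's example (thresholds 5.5 and 10.5 mapping the integers 0–15 into three levels) can be sketched with a binary search over the threshold list:

```python
from bisect import bisect_left

def discretize(q, thresholds):
    """Map feature value q to its level: the number of thresholds below it."""
    return bisect_left(thresholds, q)

# The slide's example: integers 0..15, thresholds 5.5 and 10.5, three levels.
levels = [discretize(q, [5.5, 10.5]) for q in range(16)]
```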
Lloyd's Algorithm

- Considers the distribution of feature values.
- Step 1: determine the discretization level k.
- Step 2: select the initial threshold levels (y_1, y_2, …, y_{k-1}).
- Step 3: for all i, calculate the mean feature value μ_i of the i-th region, generate all possible threshold levels between μ_i and μ_{i+1}, and select the threshold level which minimizes the following distortion measure:

$$d = \sum_i \sum_{q_j \in \text{region } i} (q_j - \mu_i)^2$$

- Step 4: if the distortion measure of the new set of threshold levels is less than that of the old set, go to Step 3.
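A sketch of the steps above for one-dimensional feature values. It replaces the exhaustive threshold search in Step 3 with the midpoint of adjacent region means, which is the squared-error minimizer under nearest-mean assignment; names and the convergence handling are illustrative assumptions.

```python
def lloyd_thresholds(values, k, iters=20):
    """Lloyd's algorithm for 1-D quantization: alternately compute the mean
    feature value of each region and move each threshold to the midpoint of
    the two adjacent means."""
    values = sorted(values)
    lo, hi = values[0], values[-1]
    # Step 2: initial thresholds split the value range evenly into k regions.
    thresholds = [lo + (hi - lo) * i / k for i in range(1, k)]
    for _ in range(iters):  # Steps 3-4: iterate while the partition improves.
        regions = [[] for _ in range(k)]
        for q in values:
            regions[sum(q > t for t in thresholds)].append(q)
        if any(not r for r in regions):
            break  # an empty region: keep the previous thresholds
        means = [sum(r) / len(r) for r in regions]
        new = [(means[i] + means[i + 1]) / 2 for i in range(k - 1)]
        if new == thresholds:
            break  # converged
        thresholds = new
    return thresholds

# Two well-separated clusters: the single threshold settles between them.
t = lloyd_thresholds([0, 1, 2, 10, 11, 12], k=2)
```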
Relevance Dependence Discretization (1/3)

- Considers the dependency between the feature and the relevance of the topic.
- The relevance information entropy is given as

$$\text{Ent}(S) = -P(C_j, S) \log P(C_j, S) - P(\tilde{C}_j, S) \log P(\tilde{C}_j, S)$$

  where S is the group of feature values.
Relevance Dependence Discretization (2/3)

- The partition entropy of the regions induced by a threshold w is defined as

$$E(w; S) = \frac{|S_1|}{|S|} \text{Ent}(S_1) + \frac{|S_2|}{|S|} \text{Ent}(S_2)$$

  where S_1 is the subset of S with feature values smaller than w and S_2 is S − S_1.
- The more homogeneous the regions, the smaller the partition entropy.
- The partition entropy controls the recursive partition algorithm.
Relevance Dependence Discretization (3/3)

- A criterion for the recursive partition algorithm: accept a cut m only if

$$\text{Gain}(m; S) > \frac{\log_2(|S| - 1)}{|S|} + \frac{\Delta(m; S)}{|S|}$$

  where Δ(m; S) is defined as

$$\Delta(m; S) = \log_2(3^k - 2) - [k \, \text{Ent}(S) - k_1 \, \text{Ent}(S_1) - k_2 \, \text{Ent}(S_2)]$$

  and k, k_1, k_2 are the numbers of relevance classes in the partitions S, S_1, and S_2 respectively.
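A sketch of the entropy computations and the stopping criterion above, assuming the MDLP-style reading of the formulas; samples are (feature value, relevant?) pairs and the names are illustrative.

```python
import math

def ent(samples):
    """Relevance information entropy Ent(S) over relevant / non-relevant."""
    n = len(samples)
    total = 0.0
    for c in (True, False):
        p = sum(rel == c for _, rel in samples) / n
        if p > 0.0:
            total -= p * math.log2(p)
    return total

def accept_cut(samples, w):
    """Accept the cut at threshold w only when the information gain exceeds
    the MDL penalty: Gain > log2(|S|-1)/|S| + Delta/|S|."""
    s1 = [s for s in samples if s[0] < w]
    s2 = [s for s in samples if s[0] >= w]
    n = len(samples)
    partition = len(s1) / n * ent(s1) + len(s2) / n * ent(s2)  # E(w; S)
    gain = ent(samples) - partition
    k = len({rel for _, rel in samples})
    k1 = len({rel for _, rel in s1})
    k2 = len({rel for _, rel in s2})
    delta = math.log2(3 ** k - 2) - (k * ent(samples) - k1 * ent(s1) - k2 * ent(s2))
    return gain > math.log2(n - 1) / n + delta / n

# A clean split: low feature values are non-relevant, high ones relevant.
samples = [(1, False), (2, False), (3, False), (8, True), (9, True), (10, True)]
ok = accept_cut(samples, 5.0)
```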
Bayesian Inference for Document Classification

- The probability of C_j given the document, by Bayes' theorem, is

$$P(C_j \mid T_{j1}, \ldots, T_{jp}) = \frac{P(T_{j1}, \ldots, T_{jp} \mid C_j) \, P(C_j)}{P(T_{j1}, \ldots, T_{jp})}$$
Background of Bayesian Networks
The process of inference is to use the evidence of some of the nodes that have observations to find the probability of some of the other nodes in the network.
[Figure: an example network with class node C and feature nodes T1–T5, encoding the joint distribution P(C, T1, T2, T3, T4, T5).]
Learning Bayesian Networks

- Parametric learning: the conditional probability for each node is estimated from the training data.
- Structural learning: best-first search with the MDL score.
- A classification-based network structure simplifies the structural learning process.
MDL Score for Bayesian Networks

- The MDL (Minimum Description Length) score for a Bayesian network B is defined as

$$L(B) = \sum_X L_{total}(X)$$

  where X is a node in the network.
- The score for each node is calculated as

$$L_{total}(T_{ji}) = L_{network}(T_{ji}) + L_{data}(T_{ji})$$
Complexity of the Network Structure

- L_network is the network description length; it corresponds to the topological complexity of the network and is computed as

$$L_{network}(T_{ji}) = \frac{\log_2 N}{2} (s_{ji} - 1) \prod_{T_{ij} \in \Pi_{T_{ji}}} s_{ij}$$

  where N is the number of training documents, s_{ji} is the number of possible states the variable T_{ji} can take, and the product runs over the parents Π_{T_{ji}} of T_{ji}.
Accuracy of the Network Structure

- The data description length is given by the following formula:

$$L_{data}(T_{ji}) = -\sum_{T_{ji}, \Pi_{T_{ji}}} M(T_{ji}, \Pi_{T_{ji}}) \log_2 \frac{M(T_{ji}, \Pi_{T_{ji}})}{M(\Pi_{T_{ji}})}$$

  where M(·) is the number of cases in the training data that match a particular instantiation.
- The more accurate the network, the shorter this length.
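A sketch of both description-length terms for a single node, assuming discrete node and parent states and simple counting of training cases; the helper names and toy data are illustrative.

```python
import math
from collections import Counter

def network_length(N, node_states, parent_state_counts):
    """L_network(T_ji) = (log2 N / 2) * (s_ji - 1) * prod over parents of s_ij:
    half a log-count of bits per free parameter of the conditional table."""
    params = node_states - 1
    for s in parent_state_counts:
        params *= s
    return math.log2(N) / 2 * params

def data_length(cases):
    """L_data(T_ji) = -sum over instantiations of M(value, parent config) *
    log2(M(value, parent config) / M(parent config)), with M(.) counted
    directly from the training cases."""
    joint = Counter(cases)                    # M(node value, parent config)
    parent = Counter(pa for _, pa in cases)   # M(parent config)
    return -sum(m * math.log2(m / parent[pa]) for (_, pa), m in joint.items())

# Toy node with one binary parent over N = 8 training cases.
cases = [(0, "a")] * 3 + [(1, "a")] + [(0, "b")] + [(1, "b")] * 3
L = network_length(8, node_states=2, parent_state_counts=[2]) + data_length(cases)
```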
The Process of Information Filtering Based on Bayesian Network Learning

1. Gather the training documents.
2. For all training documents, determine the relevance to each topic.
3. Select features for each topic (5 and 10 features were used in the experiments).
4. Discretize the feature values.
5. Learn a Bayesian network for each topic.
6. Set the probability threshold value for the relevance decision.

Each Bayesian network corresponds to a filtering profile.
Document Collections

- Reuters-21578: 29 topics. In chronological order, the first 7,000 documents were used as the training set and the remaining 14,578 documents as the test set.
- FBIS (Foreign Broadcast Information Service): 38 topics used in TREC (Text REtrieval Conference). In chronological order, the first 60,000 documents were used as the training set and the remaining 70,471 documents as the test set.
Evaluation Metrics for Information Retrieval

                         True relevant   True non-relevant
Algorithm: relevant          n1              n2
Algorithm: non-relevant      n3              n4

$$R \text{ (recall)} = \frac{n_1}{n_1 + n_3}, \qquad S \text{ (precision)} = \frac{n_1}{n_1 + n_2}$$

$$F_1 = \frac{2RS}{R + S} = \frac{2 n_1}{2 n_1 + n_2 + n_3}$$

$$\text{Utility} = A n_1 + B n_2 + C n_3 + D n_4$$

(a linear utility with coefficients A, B, C, D on the four outcome counts).
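The metrics above can be computed directly from the four contingency counts; the default utility coefficients below are illustrative placeholders, not the values used in the paper's experiments.

```python
def metrics(n1, n2, n3, n4, A=3, B=-2, C=0, D=0):
    """Recall, precision, F1 and a linear utility from the contingency table:
    n1 = relevant & retrieved, n2 = non-relevant & retrieved,
    n3 = relevant & rejected,  n4 = non-relevant & rejected.
    A, B, C, D are illustrative utility coefficients, not the paper's."""
    recall = n1 / (n1 + n3)
    precision = n1 / (n1 + n2)
    f1 = 2 * recall * precision / (recall + precision)
    utility = A * n1 + B * n2 + C * n3 + D * n4
    return recall, precision, f1, utility

r, p, f1, u = metrics(n1=8, n2=2, n3=2, n4=88)
```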
Filtering Performance of the Bayesian Network on the Reuters Collection
[results table in the original slides]

Comparison of the Bayesian Network Approach and the Naïve Bayesian Approach (Reuters)
[results tables in the original slides]

Filtering Performance of the Bayesian Network on the FBIS Collection
[results table in the original slides]

Comparison of the Bayesian Network Approach and the Naïve Bayesian Approach (FBIS)
[results tables in the original slides]
Conclusions and Future Work

- The Bayesian network approach achieves better performance than the naïve Bayesian approach.
- Future work: discretization methods, structural learning, and large data sets.