1 Introduction

The application of machine learning in the 21st century is increasingly both exciting and challenging, with many orders of magnitude more digital data available than before. The amount of text data on the internet has increased from an estimated couple of terabytes in 1997, to Twitter.com alone storing 5 gigabytes of new tweets daily [1][2]. This does not count the many private databases used in enterprise, such as the petabytes of customer and transaction information Wal-Mart Stores retains [3].
Though the collection of raw data will continue to have its challenges and costs, significant attention has now turned to the problem of utilizing all of this data. Whether the goal is indexing, data-mining, or building predictive models, today's challenge is fundamentally tied to the enormous number of observations and variables captured. This is the so-called "big data" problem.
Equally important as the algorithms or storage systems are the visualization methods. Presentation is not simply an aesthetic concern. Edward Tufte writes that "often the most effective way to describe, explore, and summarize a set of numbers, even a very large set, is to look at pictures of those numbers," and that data graphics can be both "the simplest [and] most powerful" of methods [4].
Bayesian networks enable efficient and scalable statistical and causal modeling in many applications [5], and have a natural visual representation following from their graph structure. By viewing a rendering of the graph structure, one can quickly identify potential correlations or causal relationships between variables simply from the presence of edges. Beyond this simple case, more sophisticated analysis by visual means alone is difficult, especially as the model grows in size. Viewing conditional distributions with more than a couple of parent variables quickly becomes unwieldy, and networks with upwards of 50 variables can be difficult for a user to navigate and parse visually. Recent work has focused on improving visualization and navigation in large networks of up to thousands of variables [6], but the problem is not yet solved. To understand these large, modern networks more efficiently, and in turn better utilize the wealth of data available in the era of big data, new methods of visualization are needed.

To this end, we introduce two ideas to assist the visual analysis of large Bayesian networks: inference diffs and relevance filtering.
2 Summary of Prior Work

Though creating effective ways to visualize Bayesian networks is not a new problem relative to the age of "Bayesian networks" themselves, proposed in detail by Pearl in 1988 [7], it appears to be a problem that has received relatively little attention.
Draft v5, 2013 December 9
Visualizing Inference in Large Bayesian Networks

Clifford Champion, Department of Computer Science and Engineering

In this project we address the challenge of viewing and using Bayesian networks as their structural size and complexity grows. We introduce two new visualization methods, inference diffs and relevance filtering, to enable visual analysis of information flow in these networks, and to enable direct comparison of two evidence configurations simultaneously. We implement and discuss the performance of these visualization methods on two modestly large networks built from real-world data.
While there have been advances in visualizing large graphs, such as those surveyed by Schaeffer [8], these methods depend at most on basic graph-theoretic information, such as cliques and node degree, and do not directly consider the probabilistic aspects of Bayesian networks.
Nevertheless, a variety of visual designs and principles specific to Bayesian networks have been developed or explored over time, and are briefly recounted here.

To visualize causal relationships globally, so-called temporal or causal layouts are popular: ancestors (e.g. independent variables) are placed near the top of the visual layout and descendants (e.g. dependent variables) near the bottom, for a generally downward flow of edge directions and thus a downward flow of causation. This kind of layout is often used without explicit mention and is a feature of some directed graph layout algorithms, but Zapata-Rivera et al. and Chiang et al. called out this layout explicitly [9][10].
To visualize local influence (i.e. between exactly two variables), the direction of the edge arrow is of course well-established for indicating the direction of modeled cause and effect. Beyond the edge direction, or its presence at all, Zapata-Rivera et al. explored fixed color assignments to independent variables, and color mixtures thereof for dependent variables, weighted to indicate the relative strength of influence from the parent variables [9]. Zapata-Rivera also considered varying edge lengths, so that mutually influential nodes appear nearer to one another than uninfluential ones. Further, both Zapata-Rivera et al. and Koiter explored varying edge thickness to indicate influence between parent and child variables, using thick lines for strong influence. Each discussed various analytical definitions for computing the input values needed for these visualization techniques [9][11].
To visualize conditional probability tables (CPTs), Chiang et al. proposed miniature 2D heatmaps attached to edges [10]; however, this appears to be well-defined only for children with exactly one parent each. Cossalter et al. introduced "bubble lines" connecting nodes in the network to floating CPT windows, making it easier for the user to keep their bearings while debugging CPTs in large networks. They also introduced a numerical difference view for viewing the CPTs of two variables expected to have similar local distributions [12].
To visualize the presence of evidence, common practice in the literature is to draw a double border around observed nodes (variables with evidence), or to use shading on the interior of the node. Williams and Amant experimented with different colors of shading to indicate different evidence values [13].
To visualize marginal and posterior distributions, at least three techniques have been explored. The software application Netica used rectangular nodes instead of circular ones, in order to embed bar charts of the marginal probability masses of each variable [14]. The software application BayesiaLab allowed the user to open a distribution window for each variable and compare the prior (no evidence) and posterior (with evidence) distributions of a variable as two horizontal bar charts overlaid on one another [15]. Zapata-Rivera et al. used node diameter to indicate large or small posterior probabilities for binary-valued variables, and animation thereof to indicate changes to posteriors under changing evidence [9].
To visualize local and global information simultaneously, Sundararajan et al. employed a partition and fish-eye approach to graph layout, letting the user define and inspect local areas of interest in the network while still seeing the context and structure of the full network [6].
A common trait among most of these approaches is their dependence on relatively static information about the network, whether the conditional probability tables or simple posterior or marginal distributions. Our goal is to create a visualization that captures a more dynamic view of Bayesian networks, hopefully shining new light on information flow, and to scale effectively to large networks. We outline our basic design choices next, using or iterating on prior work, and upon this foundation introduce a more dynamic approach to visualizing Bayesian networks using inference diffs.
3 Visual Foundation

3.1 Assumptions and Principles

Our approach is to define a visual language equally suited to print or personal computer, consistent with the principles proposed by Edward Tufte for "graphical displays" [4]. When a human-computer interaction is discussed, we assume one user at a time, using a mouse or a touch-screen interface, with a display size and DPI typical of an average tablet or desktop display.
Our underlying model to visualize is a Bayesian network of finitely many random variables, each having a finite event space. We assume at the very least that the user desires to view the Bayesian network structure, inspect local conditional probability distributions, see marginal or posterior distributions, inspect the event space of each variable, and otherwise clearly see the basic makeup of the network instance. These assumptions are sufficient for defining the foundation of our visual design.
To stay disciplined and focused, we will also seek to avoid "chart junk", avoid distorting data or potentially misleading the user, and avoid unnecessary ink, maximizing the "data-ink ratio" [4]. Every pixel or drop of ink should convey information, and convey it unambiguously.
3.2 Network Structure and Random Variables

The most basic information in a Bayesian network is the structure and the random variables therein. To present the structure is to present the (learned or constructed) causal influence between variables. Logically, this object is a directed acyclic graph (DAG); visually, it is traditionally a collection of labeled circles with arrows between them. We largely continue this tradition.
Random variables must be clearly identifiable, while at the same time clutter must be kept under control, otherwise it becomes noise. To this end, there are two views: structural and legend. The structural view presents a variable as a single, circumscribed capital letter, taken from the first letter of the variable's name. The legend view maps these letters to their full variable names, such as A to Age. Capital letters are used in the structural view for readability. As with previous methods, vertical ordering in the structural view is presented causally top-down to the extent possible, though this is not always achievable, especially in large networks.
To scale to large networks, we do two more things. First, where two or more random variables share the same single letter in the structural view, we suffix their names with a unique number, chosen sequentially from 1. This numerical suffix appears in subscript type in both the structural view and the legend view. Second, both views are scrollable, and both follow loosely the same top-to-bottom variable ordering.
On the appearance of random variables in the structural view, we de-emphasize the dark stroke that traditionally circumscribes the variable, as we will use this stroke to carry meaning later.
3.3 Event Value Colors

To distinguish the values of a variable, we map each value in its event space to a color; these assignments are made within each variable, not over the collection of variables.
This permits us to design a reusable, optimal color palette with minimal visual confusion within the context of a single variable. Though there is possible ambiguity when values of different variables share the same color, our design avoids this ambiguity by always framing color in the context of a specific variable. Where two or more variables share the same event space, we reuse the color mapping for stronger consistency.
To indicate what the color assignments in fact are for each value of each variable, we augment our legend view to list each event-space value with its assigned color, as seen in Figure 2.
For categorical event spaces, our color map is constructed so that no two contiguous colors are perceived as too near one another: for example, orange may follow green but may not follow brown. For ordered event spaces, the color map is constructed in the opposite fashion, by sequentially choosing neighboring hues on the color wheel, e.g. from the blue region, through yellow, to the red region.
As will be important later, we ensure any presented color order (value order) is constant, e.g. for a particular variable, blue always appears first, before orange.
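A minimal sketch of such a palette builder, assuming an HSV color model; the specific hue ranges, stepping constant, and saturation values are illustrative, not the exact palette used here:

```python
import colorsys

def palette(n, ordered):
    """Build n colors for one variable's event space. Ordered spaces walk
    the color wheel sequentially (blue region toward red); categorical
    spaces take large hue steps so contiguous entries stay far apart."""
    if ordered:
        # evenly spaced hues from blue (h=2/3) down toward red (h=0)
        hues = [2 / 3 * (1 - i / max(n - 1, 1)) for i in range(n)]
    else:
        # golden-ratio stepping keeps consecutive hues well separated
        hues = [(i * 0.618033988749895) % 1.0 for i in range(n)]
    return [colorsys.hsv_to_rgb(h, 0.65, 0.9) for h in hues]

ordered_colors = palette(5, ordered=True)      # blue ... red progression
categorical_colors = palette(5, ordered=False) # high-contrast neighbors
```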
3.4 Conditional Probability Tables

One of the practical difficulties with Bayesian networks can be the size of each conditional probability table (CPT), or local distribution. Recall that the size of a random variable's CPT is generally the number of probability weight assignments specified, which grows exponentially in the number of parent variables (the in-degree of the variable).
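This growth is easy to quantify; a small helper (ours, for illustration) counts the entries:

```python
def cpt_size(child_cardinality, parent_cardinalities):
    """Number of probability entries in a CPT: one distribution over the
    child's event space per permutation of parent values, so the total
    grows exponentially in the number of parents."""
    rows = 1
    for k in parent_cardinalities:
        rows *= k
    return rows * child_cardinality

# A binary child with three ternary parents already needs 54 entries.
print(cpt_size(2, [3, 3, 3]))  # -> 54
```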
Some tools present the CPT as a single table with columns for each permutation of parent values, but this tends to require a large horizontal scrolling area. We propose that vertical scrolling is more natural, and present our CPTs vertically. We present each conditional probability distribution for a given parent permutation as a simple vertical list of probability densities. We stack each such list vertically, and separate each by its corresponding parent value permutation, placed above it. We use our event-space color mappings here, for each probability density and each parent value, and again use the corresponding abbreviated variable names (e.g. A1).
3.5 Embedded Distributions

Viewing the marginal distribution of each variable should be convenient; that is, we want a clear way to see P(X) for each random variable X. Most tools require the user to open an additional window to see such distributions, either in tabular form or as a bar chart. We simply embed the distribution directly in the variable in the structural view. To do this we construct a pie chart using our event-space color mapping, and render pie slices proportional in size to the posterior probability mass of each
Figure 3: Conditional probability table view for a variable T2.

Figure 2: Values in each random variable's event space are mapped to a preset color palette.

Figure 4: Marginal or posterior distributions are embedded directly in the variable via an area (pie) chart.
value, starting at the 12 o'clock position and allocating slices in clockwise order.
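The slice allocation can be sketched directly; angles here are measured clockwise from the 12 o'clock position, per the convention above:

```python
def pie_slices(masses):
    """Convert probability masses to (start, end) angles in degrees,
    sweeping clockwise with 0 degrees at the 12 o'clock position."""
    total = sum(masses)
    slices, angle = [], 0.0
    for m in masses:
        sweep = 360.0 * m / total
        slices.append((angle, angle + sweep))
        angle += sweep
    return slices

print(pie_slices([0.5, 0.25, 0.25]))
# -> [(0.0, 180.0), (180.0, 270.0), (270.0, 360.0)]
```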
This highly visual approach conveniently presents an overview of the entire statistical model to the user, without their needing to inspect variables one by one in sequence, or in additional views. Where more precise numerical inspection is needed, we use an additional color-coded tabular view similar to our CPT view.
3.6 Evidence

Arguably the greatest power of a Bayesian network model, once it is trained or constructed, is in computing posterior distributions under arbitrary evidence, i.e. P(X | E). We will not explicitly discuss causal modeling and interventions (the analogue of evidence) until later, as most of our visual design for statistical models has a natural extension to causal models.
From a visualization perspective it is important that the user clearly see which variables currently have evidence and which do not, and furthermore, what specific values of evidence have been specified. To indicate that a variable has user-defined evidence, we circumscribe the variable in the structural view with a strong black stroke. Moreover, the interior of the node is colored entirely with the color associated with the evidence value, as seen in Figure 5.
Finally, all embedded distributions for non-evidence nodes are updated in the structural view to reflect each variable's new, posterior distribution. Our previous visualization, which embeds marginal distributions, is of course simply the special case of visualizing posterior distributions with an empty evidence set. We will formalize our notion of evidence in finer detail shortly.
4 Making an Inference Diff

4.1 Motivation

With a basic visual foundation established, we turn our attention to more sophisticated visual analysis methods. The first idea we introduce is the inference diff.
Inference, and with it information flow, is an important capability of Bayesian networks. Consider a large network which models the health of components in a large multi-component system. One may wish to use this network to ask which components' probabilities of failure are affected by one or more other variables, for instance ambient air temperature, and for those affected, to what degree; or the inverse, asking what the most likely environmental conditions are given a failure in one or more components. As networks grow in size, the answers to these questions can be as difficult to find as the right question to ask in the first place.
There are analytical tools such as d-separation; however, such a tool is limited in its application, largely because a Bayesian network is itself inherently limited in its ability to describe certain independence relations [16]. Recall that for a network G, its I-map I(G) may not be a minimal I-map, meaning G contains unnecessary edges and is too conservative in its conditional independence claims. Moreover, for a true joint distribution it may be impossible for any Bayesian network to be a perfect I-map (a P-map). There may also be context-specific independencies in the network, not discernible from the network structure alone. Further, the user may not need to know or care about certain dependencies if they are small or approximately independent. In each case these issues can lessen the usefulness of d-separation analysis in practice, or require more complicated analysis.
On the other hand, there is exhaustive computation, using inference algorithms to produce complete posterior distributions for some or all of the non-evidence
Figure 5: Evidence nodes are circumscribed with a strong black stroke, and colored according to their evidence. All other nodes' embedded distributions are updated to reflect their posterior distributions. For example, T1's embedded distribution now reflects P(T1 | V=v, A=a) rather than simply P(T1).
variables. Such output is much more detailed and precise, but suffers from another limitation: it is static. From it there is no direct indication of how, or to what degree, belief propagation occurred. There is simply a before and an after state of the distribution.
What we would like is a way to visualize, in an obvious way, the effects of information flow through the network. To this end we find inspiration in modern software engineering practice and so-called "diff" tools (short for "difference" tools). Reviewing the "diff" of two or more human-readable files is an everyday practice in commercial software development, generally aided by the use of color and side-by-side before-and-after views. As a means of visualizing change, diffs are highly effective and visually intuitive. Our goal is an equally effective method of viewing inference and information flow in Bayesian networks, in hopes of enabling a more powerful kind of visual analysis.
4.2 Definition

We start with a mathematical definition of an "inference diff". Given a Bayesian network B describing a probabilistic model over n random variables {Xi : i ∈ [1, n]}, each with finite event space Si, and given two evidence sets E1 and E2, each an element of the set of partial observations ∏(i=1..n) (Si ∪ {?}), we define an inference diff Δ to be the set of pairs

Δ = { (P(Xi | E1), P(Xi | E2)) : i ∈ [1, n] }.

In other words, an inference diff is the set of pairs of conditional probability distributions, one pair per random variable, conditioned on the two sets of evidence.
For example, if the random variables of the network are X, A, and B, and B takes on value b in E1, then E1 = (?, ?, b) and P(X | E1) = P(X | B=b). If an evidence set is equal to (?, ?, ..., ?) we say that it is empty. Note that we use ? to represent "unobserved" or "unspecified".
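To make the definition concrete, here is a small sketch that computes an inference diff by brute-force enumeration over a toy three-variable network (V and A are parents of T); the network, its CPT values, and all names are invented for illustration and are not from this paper's data sets:

```python
from itertools import product

# Toy network: V -> T <- A, all variables binary. '?' marks "unobserved".
P_V = {True: 0.3, False: 0.7}
P_A = {True: 0.6, False: 0.4}
P_T = {  # P(T=True | V, A)
    (True, True): 0.9, (True, False): 0.6,
    (False, True): 0.4, (False, False): 0.1,
}

def joint(v, a, t):
    pt = P_T[(v, a)]
    return P_V[v] * P_A[a] * (pt if t else 1 - pt)

def posterior(var_index, evidence):
    """P(Xi | E) by enumerating the joint over worlds consistent with E."""
    totals = {True: 0.0, False: 0.0}
    for world in product([True, False], repeat=3):
        if all(e == '?' or w == e for w, e in zip(world, evidence)):
            totals[world[var_index]] += joint(*world)
    z = totals[True] + totals[False]
    return {k: v / z for k, v in totals.items()}

def inference_diff(e1, e2):
    """The set of pairs (P(Xi|E1), P(Xi|E2)) for every variable Xi."""
    return [(posterior(i, e1), posterior(i, e2)) for i in range(3)]

# Compare the empty evidence set against observing V=True.
diff = inference_diff(('?', '?', '?'), (True, '?', '?'))
```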
4.3 Visualization

To visualize inference diffs we extend our use of pie charts. First we establish the convention that evidence set E1 is the evidence set in effect when simply using the network to view a single set of posterior results. For instance, Figure 5 represents an E1 with values for two variables, and an E2 that is empty. To augment our visualization for the case when E2 is non-empty, we introduce for each variable a "ring" chart, concentric with the variable's existing pie chart. We reuse the event-space color map established for that variable, maintain a consistent event-space ordering, and again weight the chart slices in proportion to the posterior probability masses for that variable, this time conditioned on E2.
To indicate which variables have evidence specified, we continue to use the strong black stroke, applied to the pie, the ring, or both, according to which variable and evidence set has evidence. By carefully reserving use of the black stroke earlier, we are able to apply it here in a more nuanced fashion, to help disambiguate from which evidence set a variable's evidence is specified.
This concentric design allows the user to make direct comparisons of the effects of evidence sets E1 and E2, quickly and easily. At least two classes of queries can now be performed that produce interesting visual answers. First, one can view information flow more concretely, simply by setting E1 = {?} and E2 to any partial observation. Second, one can make direct comparisons between different non-empty evidence sets, such as asking whether observing some three variables differs from observing only two of them, or asking how different it is to observe variable Xi having some value versus observing variable Xj having that value.
Figure 6: An inference diff between two evidence sets. Here variable A1 has observed values in both sets, indicated by a black stroke around both the inner circle (evidence set 1) and the outer ring (evidence set 2).
5 Relevance Filtering

While the inference diff enables direct comparison of the effects of different evidence on each variable, it does not necessarily guide the user to the variables that may interest them most. This can be a problem in large Bayesian networks, where there is simply too much information visible simultaneously, or where the user lacks familiarity with the variables in the model. What we hope to achieve is a way to guide the user to the variables they are likely to be interested in, given some provided evidence. To accomplish this we use the inference diff as our basis, and add to it relevance filtering.
5.1 Definition

We define the relevance of a random variable Xi as simply the symmetric Kullback-Leibler (KL) divergence [17] of that random variable given its inference diff. More precisely, given an inference diff Δ derived from evidence sets E1 and E2, we define the relevance r of a random variable X as

r(X) = DKL( P(X|E1) || P(X|E2) ) + DKL( P(X|E2) || P(X|E1) ).

The function DKL indicates the Kullback-Leibler divergence, with the standard definition

DKL(P || Q) = Σi ln( P(i) / Q(i) ) P(i)

for probability distributions P and Q sharing the same event space. We use the symmetric KL divergence so that our definition of relevance is also symmetric. Given two random variables Xi and Xj, if r(Xi) < r(Xj) then we say that Xj is more relevant than Xi given evidence sets E1 and E2.
This definition of relevance is chosen for its intuitive meaning. Because distributions which differ greatly have high KL divergence values, we are saying that the variables whose conditional distributions changed the most between E1 and E2 are the variables most relevant to the user.
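The relevance computation is straightforward to sketch; this assumes neither distribution assigns zero mass where the other is positive (e.g. after smoothing):

```python
from math import log

def kl(p, q):
    """Standard KL divergence over a shared finite event space."""
    return sum(p[i] * log(p[i] / q[i]) for i in p if p[i] > 0)

def relevance(p1, p2):
    """Symmetric KL divergence between a variable's two posteriors,
    i.e. r(X) = DKL(p1 || p2) + DKL(p2 || p1)."""
    return kl(p1, p2) + kl(p2, p1)

# A variable whose posterior did not move has zero relevance; a shifted
# posterior yields positive relevance.
unchanged = relevance({'a': 0.5, 'b': 0.5}, {'a': 0.5, 'b': 0.5})
shifted = relevance({'a': 0.5, 'b': 0.5}, {'a': 0.9, 'b': 0.1})
```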
5.2 Visualization

With relevance defined, we have our final mathematical tool to complete our visualization method. For large networks it can be easy for evidence sets to produce minimal differences in the posterior distributions of variables. It is with this situation in mind that we apply our definition of relevance.

To do this, we first decide which variables are relevant enough for the user. Given the current inference diff Δ, we sort the variables of Δ in descending order according to their relevance. Second, we introduce a user-configurable relevance threshold, represented as a percent value c between 0% and 100%. Finally, we tag each random variable Xi as either "relevant" or "irrelevant" according to whether Xi is in the top c percent of variables ordered by relevance.
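A sketch of this thresholding step, with the function name and example scores invented for illustration:

```python
def tag_relevant(relevances, c):
    """Tag the top c percent of variables, ranked by descending
    relevance, as relevant; all others become irrelevant."""
    ranked = sorted(relevances, key=relevances.get, reverse=True)
    keep = round(len(ranked) * c / 100.0)
    top = set(ranked[:keep])
    return {name: (name in top) for name in relevances}

scores = {'Age': 0.8, 'Income': 0.02, 'Travel': 0.4, 'Ancestry': 0.01}
print(tag_relevant(scores, 50))  # keeps the two highest-relevance variables
```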
Lastly, we adjust our visualization in the structure and legend views. Variables which are irrelevant according to threshold c are shrunk, dimmed, and have their pie and ring charts removed in the structure view. We also shorten the edge lengths between any two collapsed variables, and edges connecting at least one irrelevant variable are changed to a dotted line rendering.
The overall effect is to shrink the virtual space needed for the entire graph structure, which focuses the user's attention on the relatively small number of variables remaining. For these remaining variables we continue to show the pie and ring charts associated with the inference diff. We also remove irrelevant variables from the legend view.
Figure 7: An inference diff with relevance filtering enabled. Variables A and V contain evidence in at least one evidence set each. Compared with the structure seen in Figure 1, the variables least relevant given the evidence sets are reduced in size and visibility.
The end result is a user-controllable level of visual complexity, and a clear, concise, qualitative view of information flow where it is most impactful.
6 Data Set Examples

To test our proposed visualizations, we developed a simple application called F-Vis. In building this application we implemented our own structure learning and inference based on existing algorithms in the literature, open-sourced separately as #-A$ [18]. For graph layout we chose the library Graph# [19], making small modifications as necessary.
were most affected by the change in evidence. Because the Sugiyama layout algorithm is layered, there is also resemblance between the before and after layouts. The Traffic data set is somewhat of a special case in that all variables share the same event space, and the variables are highly correlated for identical event values.
6.2 U.S. 1990 Census Data Set

For a larger and more interesting network, we consider the 1990 U.S. Census data set.
Figure 10: Census network, zoomed out to reveal the entire structure.

Figure 11: Census network, with relevance filtering enabled for the current inference diff. The top 20% most relevant variables retain their embedded posterior distributions, while all other variables are reduced in size and visibility.
7 Future Work

7.1 Challenges and Scaling Further

Maintaining layout stability is particularly important. The user must be able to adjust relevance filtering without radical changes to the layout, otherwise the experience quickly becomes disorienting. We were able to maintain a minimally sufficient level of stability simply by using the "Efficient Sugiyama" algorithm in Graph#. Though this has some inherent stability due to its layered solution, it is not quite perfect for our needs. We would like to incorporate a customized or more sophisticated layout algorithm designed with inference diffs and relevance filtering in mind. Such an algorithm may continue to be layered, for instance with stability addressed in more detail as in Sugiyama's original works [20]; or force-directed with constraints, an area of ongoing exploration [21]. It may also be possible to combine the fish-eye techniques of Sundararajan et al., by automatically configuring their interest areas using the locations of our relevant variables after relevance filtering [6].
Our choice of color as a modality for values is challenging when scaling to large event spaces. For instance, some variables in the Census data set contained over 15 possible values. Our color mapping at present contains only a limited number of unique color values, meaning that for such variables some colors were used multiple times. Because we present a consistent ordering, both radially and in the legend view's values, ambiguity is mostly removed, but at the cost of additional mental energy on the part of the user. For categorical variables with significantly more than 15 values, the effectiveness of the color-mapped approach is expected to fall apart. For color-blind users, a limited color palette creates further difficulty in scaling. One interesting possibility may lie in collapsing colors in inference diffs, where the probability masses of certain values are small or have changed very little in the diff for that variable. Coloring continuous-valued variables is not addressed here, but may be possible as well, perhaps by bounding the event space and assigning distinct colors to special points of significance in the event space, with weighted color blending radially.
7.2 Applications to Other Graphical Models

With respect to Bayesian networks, though the models presented here are statistical rather than causal, an inference diff is possibly most useful in a network with truly causal modeling. For example, in medicine one could quickly ask the network for a visual answer to the question: given a patient with conditions X and Y (i.e. X=true, Y=true in both E1 and E2), what are the largest differences expected between prescribing treatment A versus treatment B (i.e. do(A=true, B=false) in E1 and do(A=false, B=true) in E2)?

Other probabilistic network models may benefit from inference diff and relevance filtering visualizations, such as Markov random fields, which have similar inference and belief propagation capabilities.
7.3 Unused Modalities

Instead of beginning embedded distributions' slices at the 12 o'clock position, the start angle could be varied and demarcated to carry significance of some kind. The shape of nodes in the structural view could also convey information, such as whether the node captures a causal dependency.
Figure 12: Inference diff of the variable 'means of transportation'. The diff is generated from an empty evidence set 1, and an evidence set 2 with 'income'=true.
7.4 Other Metrics for Relevance

Metrics other than KL divergence may be useful or more appropriate for relevance filtering. Regardless of the basic metric chosen, it may be useful if weights can be attached to values in the event spaces, such that large changes to the probability masses of low-weight events count less toward making that variable relevant.
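One hedged sketch of such a weighted metric, scaling each term of the symmetric KL divergence by a per-event weight; the weighting scheme here is our own illustration, not something specified above:

```python
from math import log

def weighted_relevance(p1, p2, weights):
    """Hypothetical event-weighted symmetric KL divergence: changes to
    low-weight events contribute less to a variable's relevance."""
    def wkl(p, q):
        return sum(weights[i] * p[i] * log(p[i] / q[i])
                   for i in p if p[i] > 0)
    return wkl(p1, p2) + wkl(p2, p1)

p1 = {'ok': 0.9, 'fail': 0.1}
p2 = {'ok': 0.6, 'fail': 0.4}
full = weighted_relevance(p1, p2, {'ok': 1.0, 'fail': 1.0})    # plain symmetric KL
damped = weighted_relevance(p1, p2, {'ok': 1.0, 'fail': 0.1})  # 'fail' down-weighted
```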
8 Conclusion

Visualization methods are increasingly important as the scope and quantity of data increase. The flexibility and distributed nature of Bayesian networks, and graphical models in general, make them one of many useful tools for machine learning. In this project we explored ways to visualize these networks, and introduced a number of new visualization techniques.

First, viewing the structure of a network is but half the coin. We complete the other half by placing posterior distributions directly in the network as embedded pie charts. Second, direct comparisons are incredibly useful tools, yet past visualizations have struggled to provide them. We propose inference diffs as a method for meaningful direct comparison, with concentric pie and ring charts for visualization. Finally, navigating efficiently in large Bayesian networks is generally a challenge. We introduce relevance filtering, with KL divergence as its mathematical basis, as a tool to guide the user to the variables of interest in the model.
"##endi0 "8 S#ecifying ()idence
In implementing F-Vis we sought a convenient interaction model for allowing the user to add or remove evidence. Further, we wanted the interaction to exhibit cohesiveness with the existing design language. The practical challenges are in presenting potentially many possible values efficiently (e.g. without resorting to scrolling or multi-level menus), and in providing a mechanism for choosing the target evidence set (i.e. E1 or E2).
In keeping with the use of color and circular charts, we opted for a radial menu approach, seen in Figure 13. Each node in the network is directly clickable, opening a radial menu with color-coded choices available for each possible value in the event space for that variable. The user then simply drags a value from the radial menu onto either the inner circle or the inner ring, corresponding to evidence sets 1 and 2 respectively. To remove evidence, the user performs the action in reverse, dragging from the inner circle or ring to anywhere else outside of the node.
As before, the legend view provides a reference of the event-space-to-color mapping for the user. We also fade the rest of the network to remove visual complexity while choosing evidence. Also note that because no keyboard input is necessary, this interaction model is also touch-screen friendly.
"##endi0 B8 Inference "lgorit'm sed
For inference in the above networks we implemented approximate inference using Gibbs sampling, a Markov chain method with theoretical convergence to the true posterior distribution [16]. This implementation is available publicly under our F-AI project [18]. To reduce the effects of statistical dependence from particle to particle, we retain only one out of every k adjacent particles when building new posterior distributions. To seed the Markov chain we begin with a forward sampling of the network, followed by a warmup period of 250 Gibbs samples.
Because this computational approach is iterative, we are able to regularly update the embedded distributions in F-Vis, effectively creating a real-time animation of the convergence toward the stationary distribution. We update the structural view in F-Vis approximately once every 3,000 variable-iterations, or every 60 iterations for a 50-variable network.

Figure 13: An example of the evidence menu. This variable's event space is of size six, with corresponding color-coded values available for drag-drop into evidence sets 1 or 2.
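The warm-up and particle-retention scheme described above can be sketched as a generic Gibbs loop. This is an illustrative Python sketch, not the paper's F# implementation; the toy two-variable network, the thinning factor of 10, and all names here are assumptions.

```python
import random

def gibbs(conditionals, init, n_keep, burn_in=250, thin=10):
    """Minimal Gibbs sampler sketch. `conditionals[v](state)` returns
    P(v = 1 | all other variables in `state`). The burn-in period and the
    keep-1-in-`thin` retention mirror the warm-up and particle thinning
    described in the text."""
    state = dict(init)
    kept, step = [], 0
    while len(kept) < n_keep:
        for v, cond in conditionals.items():
            state[v] = 1 if random.random() < cond(state) else 0
        step += 1
        if step > burn_in and step % thin == 0:
            kept.append(dict(state))
    return kept

# Toy network A -> B with P(A=1)=0.5, P(B=1|A=1)=0.9, P(B=1|A=0)=0.2,
# and evidence B=1. With B observed, the only free variable is A, whose
# conditional follows from Bayes' rule.
def p_a_given_b1(state):
    pa, pb_a1, pb_a0 = 0.5, 0.9, 0.2
    return pa * pb_a1 / (pa * pb_a1 + (1 - pa) * pb_a0)

random.seed(42)
samples = gibbs({"A": p_a_given_b1}, {"A": 0}, n_keep=2000)
post_a = sum(s["A"] for s in samples) / len(samples)
# Exact posterior P(A=1 | B=1) = 0.45 / 0.55, roughly 0.818
```

Because the sampler yields particles incrementally, a visualization layer can re-render the embedded distributions after every batch of kept samples, producing the convergence animation described above.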
"##endi0 &8 Learning "lgorit'm sed
To create the Bayesian networks seen, we wrote our own implementation of general structure learning and CPT learning. Our algorithm is based almost entirely on the ideas presented in section 18.4 of Koller and Friedman [16], and amounts to an iterative search over the space of possible network structures. Each possible structure has an associated score, and the goal is to find a structure of maximal score. We use the Bayesian information criterion (BIC) score for this purpose. We use a uniform Dirichlet prior to effectively adjust the sufficient statistics of the training set, which combined with the BIC score acts to control the level of complexity in the resulting network structure.
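A BIC family score of this kind can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the paper's implementation; the function name, the data layout (a list of dicts), and the choice of a single pseudo-count `alpha` are all assumptions.

```python
import math
from collections import Counter

def family_bic(data, child, parents, arity, alpha=1.0):
    """BIC-style score for one family (a child variable and its parents).
    CPT entries are estimated from counts smoothed by a uniform Dirichlet
    pseudo-count `alpha`, then penalized by the BIC complexity term.
    `data` is a list of dict rows; `arity[v]` is the number of values of v."""
    n = len(data)
    # Sufficient statistics: counts of (parent configuration, child value)
    # and of parent configurations alone.
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    k = arity[child]
    loglik = 0.0
    for (pa, _), cnt in joint.items():
        theta = (cnt + alpha) / (marg[pa] + alpha * k)  # smoothed CPT entry
        loglik += cnt * math.log(theta)
    # Independent parameters: (k - 1) per parent configuration.
    n_params = (k - 1) * max(1, math.prod(arity[p] for p in parents))
    return loglik - 0.5 * math.log(n) * n_params

# Example: a binary variable with no parents, observed as [1, 1, 0, 1].
data = [{"X": 1}, {"X": 1}, {"X": 0}, {"X": 1}]
score = family_bic(data, "X", [], {"X": 2})
```

The total network score is the sum of these family scores, which is what makes the local caching described below possible.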
The search algorithm is initialized with a fully disconnected network, and at each iteration chooses the best edge operation among a set of candidate operations. Each candidate operation is either an edge addition, edge deletion, or edge reversal, so long as the operation will not violate acyclicity. We compute the change to the total score of the network for each candidate action, and for that iteration of search choose the action with the largest improvement to the total network score. If no action is found that increases the total score, the search is halted and the most recent structure is returned.
Because each edge operation affects at most two local family scores, most family scores from iteration to iteration do not change. As further suggested in Koller and Friedman [16], we exploit this fact to cache previously computed family scores, and invalidate only when a chosen edge operation affects the associated variable of a cached family score. This makes the algorithm a dynamic programming algorithm, and significantly reduces the computational demand. Further, we also cache various sufficient statistics values associated with the training set, also speeding up the overall process.
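The greedy search with cached family scores can be sketched as below. This Python sketch is illustrative only (the paper's implementation is in F#): edge reversal is omitted for brevity, the score function is an abstract stand-in for the BIC family score, and all names are assumptions.

```python
from itertools import permutations

def hill_climb(variables, family_score):
    """Greedy structure search sketch. `family_score(child, parents)`
    returns the local score of one family. Because an add/delete edge
    operation changes only the child's parent set, only that family is
    re-scored; previously computed family scores are cached."""
    parents = {v: frozenset() for v in variables}
    cache = {}

    def fscore(child, pa):
        key = (child, pa)
        if key not in cache:
            cache[key] = family_score(child, pa)
        return cache[key]

    def creates_cycle(child, parent):
        # Adding parent -> child creates a cycle iff child is already an
        # ancestor of parent; walk up the parent pointers to check.
        stack, seen = [parent], set()
        while stack:
            v = stack.pop()
            if v == child:
                return True
            if v in seen:
                continue
            seen.add(v)
            stack.extend(parents[v])
        return False

    improved = True
    while improved:
        improved = False
        best_delta, best_op = 0.0, None
        for child, par in permutations(variables, 2):
            if par in parents[child]:
                new_pa = parents[child] - {par}       # edge deletion
            elif not creates_cycle(child, par):
                new_pa = parents[child] | {par}       # edge addition
            else:
                continue
            delta = fscore(child, new_pa) - fscore(child, parents[child])
            if delta > best_delta:
                best_delta, best_op = delta, (child, new_pa)
        if best_op:
            child, new_pa = best_op
            parents[child] = new_pa
            improved = True
    return parents

# Toy score table strongly favoring the edge A -> B.
scores = {
    ("A", frozenset()): 0.0, ("A", frozenset({"B"})): 1.0,
    ("B", frozenset()): 0.0, ("B", frozenset({"A"})): 5.0,
}
result = hill_climb(["A", "B"], lambda c, p: scores[(c, p)])
```

Note how `cache` plays the role of the family-score cache in the text: re-evaluating a structure that reuses an already-scored family costs a dictionary lookup rather than a pass over the training data.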
"##endi0 -8 Software -esign
An important philosophy in the application design was that anything that can be visualized should be visualized. The software design of F-Vis can be divided into three main modules: the statistical learning & inference, the application data model, and the graphics presentation.

Learning and inference is implemented in F# (a derivative of OCaml) on the Microsoft .NET Framework v4.5. Variables, event spaces, observations, distributions, and Bayesian networks are all represented through object-oriented types, while algorithms are generally implemented as pure functions. An important realization partway through the development process was that complexity was significantly reduced by adopting an immutable design approach for each object type. Most objects in the final implementation are in fact immutable, such that most methods that would otherwise mutate an object instead return a new instance of a derivative object. Anecdotally, when the probability distribution type was converted to be immutable, a significant speed-up in structure learning occurred, presumably due to a more hardware cache-friendly memory access pattern. All learning and inference code is available publicly on GitHub as project F-AI [18].
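The return-a-new-instance pattern described above can be illustrated in a few lines. The actual types are F#; this Python sketch with a hypothetical `Distribution` type is only an analogy for the design idea.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Distribution:
    """Immutable discrete distribution sketch: a 'mutating' operation
    returns a renormalized copy instead of modifying the receiver."""
    masses: tuple  # probability mass per event, in event-space order

    def with_mass(self, index, value):
        masses = list(self.masses)
        masses[index] = value
        total = sum(masses)
        return Distribution(tuple(m / total for m in masses))

d1 = Distribution((0.5, 0.5))
d2 = d1.with_mass(1, 1.5)        # returns a renormalized copy
assert d1.masses == (0.5, 0.5)   # the original instance is untouched
```

Because instances never change after construction, they can be shared freely across the learning, inference, and rendering threads without locking, which is consistent with the threading design described below.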
The application data model, written in C#, was an important separate layer in the development of F-Vis. The data model maintains a source of truth for the application, and manages a number of background threads for simultaneous user input and background processing. At any given moment up to four threads are active: one each for learning, inference, layout, and user input & rendering. In a model-view-controller paradigm, our data model acts as both a model and controller, translating user actions into messages for the learning, inference, and layout threads. This approach was advantageous in that it separated user-interface and rendering concerns from threading and computational concerns.
Finally, for user input and presentation we used the vector drawing features of Microsoft WPF v4.5. We implemented a number of reusable visual controls with individual responsibilities, such as rendering pie charts and ring charts, rendering vertices and edges, and routing raw user-input events to the application data model.
"cknowledgments
I'd like to acknowledge the CSE department at UC San Diego for their thoughtful curriculum design and faculty. I'd like to thank Prof. Charles Elkan for his instruction and guidance, both in the subjects of machine learning and in data graphics.
References
!1" %. Ees0, Iow %uch >nformation >s There >n the$orldM, 1997http6LLcourses.cs.washington.eduLcoursesLcse59sL'auLles0.pdf
!2" Twitter, >nc.. nformation. Hraphics ress EE;. 21
!5" Q. earl, ayesian @etwor0s, 2
!" . C. &undarara?anG F. Q. %engshoelG T. &el0er, %ulti#focus and %ulti#window Techniues for >nteractive@etwor0 +xploration, 21'
!7" Q. earl. robabilistic reasoning in intelligent systems6networ0s of plausible inference. %organ Caufmann. 1988
!8" &. +. &chaeffer. Hraph clustering. ;omputer &cience4eview 1#1. 27
!9" Q.#:. Aapata#4iveraG +. @eufeldG Q. +. Hreer.
3isuali-ation of ayesian elief @etwor0s. roceedings of>+++ 3isuali-ationR99. 1999
!1" ;.#I. ;hiangG . &haughnessyG H. EivingstonG H.Hrinstein. 3isuali-ing Hraphical robabilistic %odels.Technical 4eport 25#17, ntegration of Dnalyticsand 3isuali-ation. 211
!1'" E. $illiamsG 4. &t. Dmant. D 3isuali-ation Techniuefor ayesian %odeling. roc. of >. 3ol. .. 2
!1" @etica. http6LLwww.norsys.comLnetica.html.Dccessed 21'
!15" ayesiaEab. http6LLwww.bayesia.comL. Dccessed21'
!1" :. Coller, @. Briedman. robabilistic Hraphical %odel6rinciples and Techniues. %assachusetts >nstitute ofTechnology. 29
!17" &. Cullbac0G 4. D. Eeibler. Fn >nformation and&ufficiency. The Dnnals of %athematical &tatistics 22.1 .1951
!18" B#D>. https6LLgithub.comLduc0maestroLB#D>.Dccessed 21'
!19" HraphK. http6LLgraphsharp.codeplex.comL. Dccessed21'
!2" C. &ugiyama. Hraph :rawing and Dpplications for&oftware and Cnowledge +ngineers. $orld &cientific ub;o >nc. 22
!21" D. Crause, ;. +. Huestrin. @ear#optimal @onmyopic3alue of >nformation in Hraphical %odels. Twenty#Birst;onference on ntelligence. 212
!22" :. &hahafG D. ;hechet0aG ;. Huestrin. Eearning Thin
Qunction Trees via Hraph ;uts. >nternational ;onferenceon Drtificial >ntelligence and &tatistics. 29
!2'" ;. %ee0, . Thiesson, :. Iec0erman. The Eearning#;urve &ling %ethod Dpplied to %odel#ased;lustering. The Qournal of %achine Eearning 4esearch.22
!2" C. ache, %. Eichman. %achine Eearning4epository.