exploring enron – visualizing anlp results (an aanlp project)

Jeffrey Heer – jheer@cs.berkeley.edu


the problem

ANLP technologies are highly valuable but often less than usable and reliable…

Can be hard to make sense of results… how to go from reams of textual output to new knowledge and insight?

Completely automated processing can be dangerous! Results can be wrong or can obscure patterns, especially when trusted training data is not available.

one possible solution

Turn ANLP technologies into tools usable within exploratory data environments.

Enable users to directly visualize and analyze the results of processing, always providing access to the underlying source data.

Users can then use these tools to further their analysis, while simultaneously making their own judgments about the quality of processing results and possibly even correcting algorithms as they go.

visualize inferred social network

view message traffic and actual e-mail text

visualize clustering results – color coded to enron business e-mails

pie charts indicate categorizations of e-mail traffic

zoom and pan to explore large networks

filter network for ‘hubs’ of higher connectivity
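
A minimal sketch of what such a hub filter could look like. The slides do not say how connectivity is measured, so this assumes it means the number of distinct contacts (node degree) in the inferred network; `filter_hubs`, `min_degree`, and the sample names are hypothetical.

```python
from collections import defaultdict

def filter_hubs(edges, min_degree):
    """Keep people with at least `min_degree` distinct contacts.

    `edges` is a list of (person, person) pairs from the inferred network.
    Degree is measured once on the full graph (an assumption, not a k-core).
    Returns the set of hubs and the edges that connect two hubs.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-addressed mail when counting contacts
            neighbors[a].add(b)
            neighbors[b].add(a)
    hubs = {person for person, ns in neighbors.items() if len(ns) >= min_degree}
    kept = [(a, b) for a, b in edges if a != b and a in hubs and b in hubs]
    return hubs, kept

# Hypothetical toy network: "eve" has a single contact and is filtered out.
edges = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"),
         ("bob", "cat"), ("dan", "eve")]
hubs, kept = filter_hubs(edges, min_degree=2)
```

In an interactive tool, `min_degree` would presumably be bound to a slider so the network thins out as the threshold rises.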

filter, zoom, details on demand!

view all messages to or from a given person…

…or view all message traffic between two people.
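
The two details-on-demand views above amount to two filters over the message list. The sketch below is only an illustration: the `Message` record, its fields, and the sample names are assumptions, not the project's actual data model.

```python
from typing import NamedTuple, Tuple

class Message(NamedTuple):
    # Hypothetical record; the real corpus has more fields (date, body, ...).
    sender: str
    recipients: Tuple[str, ...]
    subject: str

def messages_involving(messages, person):
    """All messages sent by `person` or addressed to `person`."""
    return [m for m in messages
            if m.sender == person or person in m.recipients]

def messages_between(messages, a, b):
    """All messages sent from `a` to `b` or from `b` to `a`."""
    return [m for m in messages
            if (m.sender == a and b in m.recipients)
            or (m.sender == b and a in m.recipients)]

# Made-up sample data for illustration only.
inbox = [
    Message("alice", ("bob",), "status"),
    Message("bob", ("alice", "carol"), "re: status"),
    Message("carol", ("dave",), "lunch"),
]
carol_traffic = messages_involving(inbox, "carol")
alice_bob = messages_between(inbox, "alice", "bob")
alice_carol = messages_between(inbox, "alice", "carol")
```

Note that `alice_carol` is empty here: both appear on the same message, but neither wrote to the other, so no edge-level traffic exists between them.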

networks form various communities … some obvious, some not

can we process the inferred network to automatically identify communities at various granularities?

attempt social network analysis using a hierarchical agglomerative clustering approach, greedily combining groups into communities based on a criterion function that compares within-community edges against total connectivity.
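
The greedy procedure described above can be sketched as follows. The slides do not name the criterion function, so this sketch assumes a Newman-style modularity score Q (fraction of edges inside communities minus the fraction expected from total connectivity); the function names and the toy graph are hypothetical, and the brute-force search is for clarity rather than speed.

```python
from itertools import combinations

def modularity(edges, communities):
    """Q = sum over communities of (within-edge fraction - (degree share)^2).

    Assumes `edges` is a non-empty list of undirected (a, b) pairs.
    """
    m = len(edges)
    label = {n: i for i, group in enumerate(communities) for n in group}
    within = [0] * len(communities)
    degree = [0] * len(communities)
    for a, b in edges:
        degree[label[a]] += 1
        degree[label[b]] += 1
        if label[a] == label[b]:
            within[label[a]] += 1
    return sum(w / m - (d / (2 * m)) ** 2 for w, d in zip(within, degree))

def agglomerate(edges):
    """Start from singletons and repeatedly apply the merge with the best Q.

    Returns one (communities, Q) entry per level of the cluster tree.
    """
    nodes = sorted({n for e in edges for n in e})
    communities = [frozenset([n]) for n in nodes]
    history = [(list(communities), modularity(edges, communities))]
    while len(communities) > 1:
        best = None
        for i, j in combinations(range(len(communities)), 2):
            merged = [c for k, c in enumerate(communities) if k not in (i, j)]
            merged.append(communities[i] | communities[j])
            q = modularity(edges, merged)
            if best is None or q > best[0]:
                best = (q, merged)
        communities = best[1]
        history.append((list(communities), best[0]))
    return history

# Toy graph: two triangles joined by a single bridge edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
history = agglomerate(edges)
best_level, best_q = max(history, key=lambda level: level[1])
```

Each entry in `history` is one granularity of the cluster tree, so a viewer can step through the levels; on this toy graph the highest-scoring level splits the nodes into the two triangles.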

show results of community analysis at various stages of progress… allowing interactive exploration of the agglomerative cluster tree

analysis scenario

filtered graph to isolate “power players”

looked for “california” color labels on edges

found John Shelk reporting on congressional meetings to Tim Belden – all one-way e-mails

looking at Tim Belden revealed ALL one-way e-mails sent to him, no responses, etc.

seemed a bit suspicious… where is that info going?

analysis scenario

All one way e-mails to Tim Belden about various legal issues…

guilty!

future work aplenty

improved colors, filtering, and brushing
  category filtering, brushing from e-mails to graph
  histogram visualization over sliders

visualize network of messages themselves?

temporal dimension of data
  time-selection range slider
  animate evolution of the network

search search search

tie to additional analyses
  automated clustering
  finer social network analysis
  duplicate identification, acronym resolution, etc…

please send me any ideas you have to improve this!!!

jheer@cs.berkeley.edu

I’m Kenneth Lay. And I approve this message.