KM Chicago (December, 2005)
© 2003 Dow Jones Reuters Business Interactive LLC (trading as Factiva). All rights reserved.
Using New Technologies to Make Sense of Content Chaos:
Text mining and visualization
Glenn Fannick, Product Development Manager
12 December 2005
Making Sense of Content Chaos
No longer “information overload” …
… we’re awash in “content chaos”.
How difficult is it to find…?
• thought leaders in an industry
• newly hired CEOs who’ve commented on Wi-Fi
• which of your products are written about most often
• most mentioned people near Oracle
• most prolific journalists in an industry
• how much of your press coverage is negative
Three Causes of Chaos
• Blogs Mean Everyone’s a Publisher
• ‘Markets are Conversations’
• More dynamic news cycles
Everyone’s a Publisher
• Feb: Microsoft decided to break with its long-time public support for anti-discrimination legislation.
• Apr 21: Local press coverage spurred Microsoft employee bloggers to speak out.
• May 6: Steve Ballmer reversed Microsoft’s stance.
Cause #1
Markets are Conversations
• Savvy consumers do not trust corporate marketing.
• On the Web, people tell each other their opinions about products and companies.
• The most reliable information comes from peers.
• Companies must participate in the conversation or risk irrelevance.
Cause #2
[Figure: timeline charts for “maytag”, with year axes covering 1997-2005 and 2000-2008 and the months May and October labeled]
Cause #2
Shrinking News Cycle
• Newspapers continue to wane in influence
• Radio long ago filled the role of the evening newspaper
• The Web now fills the role of the morning newspaper
• This pushes newspapers into the analysis role formerly filled by the newsweeklies
• News is reported 24/7
• Web editions
• Citizen journalists
Cause #3
Managing the Chaos
• People need answers, not documents
• Trends must be discovered early
• Going beyond search
People need answers, not documents
• Articles 1-100 of about 2,343,000
• Spend more time analyzing, less time looking
• We must continue to push technology toward a point where it can provide us facts and answers, not headlines and links.
Now: Identify → Search/Gather → Analyze → Decide → Act
Goal: Identify → Find/Discover → Analyze → Decide → Act
Trends must be discovered early
• Identify the waves before they break on shore.
Principle #3
Hurricane Rita · Goldman Sachs · Florida Keys · John Roberts · Oil Prices
Using technology to power serendipity
Facts gleaned from across an entire day’s news can visually summarize an industry.
Extracted entities, phrases and events can direct users to the top newsmakers of the day.
How To Get There
How To Get There: Text Mining
Phase 1 | Classification / Taxonomy
• Metadata tags what an article is about
Phase 2 | Entity Extraction
• Extracting the billions of facts and entities stored in millions of documents
Phase 3 | Ontological Search
• Searching for concepts
text mining – n., a process of extracting information from unstructured text, drawing on practices from information retrieval, data mining, machine learning, computational linguistics and statistics.
1. Document Classification
[Diagram: Unstructured Text → Technology, Editorial Experts → Metadata]
Metadata (FII): Company Codes, Industries, Regions, Subjects
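The classification step can be illustrated with a toy keyword tagger. This is only a sketch under stated assumptions: the `TAXONOMY` table, its codes, the trigger keywords and the sample sentence are all hypothetical, and the slides do not describe how the actual classification technology works.

```python
# Hypothetical taxonomy: facet -> code -> trigger keywords (all invented).
TAXONOMY = {
    "company": {"Acme Wireless": ["acme wireless"], "Globex Coffee": ["globex"]},
    "industry": {"Telecommunications": ["wireless", "wifi"], "Retail": ["store", "retail"]},
    "region": {"United States": ["seattle", "chicago"]},
    "subject": {"Executive Appointments": ["appointed", "named ceo"]},
}


def classify(text: str) -> dict:
    """Return document-level metadata: facet -> sorted list of matching codes."""
    lowered = text.lower()
    tags = {}
    for facet, codes in TAXONOMY.items():
        # A code applies if any of its keywords occurs as a substring.
        hits = [code for code, keywords in codes.items()
                if any(kw in lowered for kw in keywords)]
        tags[facet] = sorted(hits)
    return tags


# Made-up sample sentence about a fictional company.
article = "Acme Wireless appointed a new chief to expand its wifi hotspots in Seattle."
print(classify(article))
```

Plain substring matching is the simplest possible rule; a production classifier would more likely combine tokenization, statistical models and editorial review.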
2. Entity Extraction
[Diagram: Unstructured Text → Technology, Editorial Experts → Metadata]
Found in the unstructured text: people, products, companies, events, authors
Document-level metadata (FII): Company Codes, Industries, Regions, Subjects
Sentence-level metadata (Entities): Companies, People, Brands, Relationships, Events, Authors
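Sentence-level extraction can be sketched with a dictionary (gazetteer) lookup. The gazetteer entries, the entity names and the sample text below are hypothetical, and the sentence splitter is deliberately naive; the slides do not say which extraction technique is actually used.

```python
import re

# Hypothetical gazetteer: entity type -> known names (all invented).
GAZETTEER = {
    "company": ["Acme Wireless", "Globex Coffee"],
    "person": ["Jane Doe"],
    "brand": ["HotSpot"],
}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_entities(text):
    """Return (sentence_index, entity_type, name) for every gazetteer hit."""
    found = []
    for i, sentence in enumerate(split_sentences(text)):
        for etype, names in GAZETTEER.items():
            for name in names:
                # Whole-word match so a name does not match inside a longer word.
                if re.search(r"\b" + re.escape(name) + r"\b", sentence):
                    found.append((i, etype, name))
    return found


# Made-up sample text.
text = ("Acme Wireless expanded its HotSpot service. "
        "Jane Doe said Globex Coffee stores will carry it.")
for hit in extract_entities(text):
    print(hit)
```

Keeping the sentence index with each entity is what makes later steps possible, such as pairing a person with the company mentioned in the same sentence.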
Extracting More Value from Documents
• Article receives company code for: T-Mobile USA
• But there are other companies involved
• And captures news subjects and industry
• And people and authors
• And brands and products
• And quotations
• And regions
Today’s Search
Ontological Search
• Articles containing executive appointments
• List of people and companies found in relationship to executive appointments
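A concept search of this kind might look roughly like the sketch below: the query names a concept rather than a keyword, and the result is both the matching articles and the people and companies tied to the concept. The `CONCEPTS` patterns, the `SENTENCES` records and every name in them are hypothetical stand-ins for the output of an earlier extraction step.

```python
import re

# Hypothetical concept definition: surface patterns that signal the concept.
CONCEPTS = {
    "executive appointment": [
        r"\bnamed\b.*\b(?:CEO|CFO|president)\b",
        r"\bappointed\b",
        r"\bhired as\b",
    ],
}

# Toy sentence-level records with invented entities.
SENTENCES = [
    {"doc": "a1", "text": "Acme Wireless named Jane Doe as CEO.",
     "people": ["Jane Doe"], "companies": ["Acme Wireless"]},
    {"doc": "a2", "text": "Globex Coffee reported higher sales.",
     "people": [], "companies": ["Globex Coffee"]},
    {"doc": "a3", "text": "John Roe was appointed CFO of Globex Coffee.",
     "people": ["John Roe"], "companies": ["Globex Coffee"]},
]


def concept_search(concept):
    """Return (matching doc ids, (person, company) pairs found with the concept)."""
    patterns = [re.compile(p, re.IGNORECASE) for p in CONCEPTS[concept]]
    docs, pairs = [], []
    for s in SENTENCES:
        if any(p.search(s["text"]) for p in patterns):
            docs.append(s["doc"])
            pairs.extend((person, company)
                         for person in s["people"] for company in s["companies"])
    return docs, pairs


docs, pairs = concept_search("executive appointment")
print(docs)
print(pairs)
```

The first return value corresponds to the article list and the second to the list of people and companies found in relationship to the concept.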
Re-Engineering Search Results
Concept Screen
[Chart: timeline by month, Dec through Oct]
People: David Sifry [45], Robert Scoble [30], Sergey Brin [23], John Battelle [19], Mena Trott [1]
Related companies and subjects provide: filtering, navigation and discovery.
Previous dates can be navigated.
Publications can act as filters.
People and phrases can be discovered.
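The filtering and discovery facets described above can be sketched as simple counts over tagged search results. The `RESULTS` records, field names and placeholder values are hypothetical; only the "name [count]" display style is taken from the slide.

```python
from collections import Counter

# Toy search results with invented metadata.
RESULTS = [
    {"date": "2005-09", "source": "Paper A", "people": ["Person A", "Person B"]},
    {"date": "2005-09", "source": "Blog B", "people": ["Person A"]},
    {"date": "2005-10", "source": "Paper A", "people": ["Person A", "Person C"]},
]


def facet_counts(results, facet):
    """Count how many results carry each value of a facet, most frequent first."""
    counts = Counter()
    for r in results:
        values = r[facet] if isinstance(r[facet], list) else [r[facet]]
        counts.update(set(values))  # count each value once per result
    # Sort by descending count, then by name, for a stable display order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def filter_by(results, facet, value):
    """Keep only results whose facet equals or contains the given value."""
    kept = []
    for r in results:
        values = r[facet] if isinstance(r[facet], list) else [r[facet]]
        if value in values:
            kept.append(r)
    return kept


for name, n in facet_counts(RESULTS, "people"):
    print(f"{name} [{n}]")
print(len(filter_by(RESULTS, "source", "Paper A")))
```

The same two functions cover the people list, the publication filter and the date navigation, since each is just a different facet of the result set.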
Factiva Insight: Reputation Intelligence
Questions?
Glenn Fannick, Product Development Manager
fannick.blogspot.com