© 2003 Dow Jones Reuters Business Interactive LLC (trading as Factiva). All rights reserved. Using New Technologies to Make Sense of Content Chaos: Text mining and visualization Glenn Fannick Product Development Manager 12 December 2005

Using New Technologies to Make Sense of Content Chaos: Text mining and visualization


KM Chicago (December, 2005)



Using New Technologies to Make Sense of Content Chaos:

Text mining and visualization

Glenn Fannick, Product Development Manager

12 December 2005


Making Sense of Content Chaos

No longer “information overload” …

… we’re awash in “content chaos”.


How difficult is it to find…?

• thought leaders in an industry

• newly hired CEOs who’ve commented on Wi-Fi

• which of your products are written about most often

• most mentioned people near Oracle

• most prolific journalists in an industry

• how much of your press coverage is negative


Three Causes of Chaos

• Blogs Mean Everyone’s a Publisher

• ‘Markets are Conversations’

• More dynamic news cycles


Everyone’s a Publisher

• Feb: Microsoft decided to break with its long-time public support for anti-discrimination legislation.

• Apr 21: Local press coverage spurred Microsoft employee bloggers to speak out.

• May 6: Steve Ballmer reversed Microsoft’s stance.

Cause #1


Markets are Conversations

• Savvy consumers do not trust corporate marketing.

• On the Web, people tell each other their opinions about products and companies.

• The most reliable information comes from peers.

• Companies must participate in the conversation or risk irrelevance.

Cause #2


[Figure: trend charts for “maytag”; year axes with May and October labeled]

Cause #2


Shrinking News Cycle

• Newspapers continue to wane in influence.

• Radio long ago filled the role of the evening newspaper.

• The Web now fills the role of the morning newspaper.

• This pushes newspapers into the analysis role formerly filled by the newsweeklies.

• News is reported 24/7

• Web editions

• Citizen journalists

Cause #3


Managing the Chaos

• People need answers, not documents

• Trends must be discovered early

• Going beyond search


People need answers, not documents

• Articles 1-100 of about 2,343,000

• Spend more time analyzing, less time looking

• We must continue to push technology toward a point where it can provide us facts and answers, not headlines and links.

Now: Identify → Search/Gather → Analyze → Decide → Act

Goal: Identify → Find/Discover → Analyze → Decide → Act


Trends must be discovered early

• Identify the waves before they break on shore.

Principle #3
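One way to picture "identifying the wave" is to compare today's mention count for a term against its own recent baseline. The sketch below is only an illustration under stated assumptions: the function name `rising_terms`, the z-score threshold, and all counts are hypothetical, and a simple z-score stands in for whatever trend detection a real product would use.

```python
from statistics import mean, pstdev

def rising_terms(history, today, threshold=2.0):
    """Flag terms whose count today sits well above their recent baseline.

    history: {term: [daily counts for previous days]}
    today:   {term: count today}
    A term is flagged when (today - mean) / std >= threshold.
    """
    flagged = []
    for term, count in today.items():
        past = history.get(term, [])
        if len(past) < 2:
            continue  # not enough history to form a baseline
        baseline = mean(past)
        spread = pstdev(past) or 1.0  # avoid dividing by zero on flat history
        score = (count - baseline) / spread
        if score >= threshold:
            flagged.append((term, round(score, 2)))
    # Strongest spikes first
    return sorted(flagged, key=lambda item: -item[1])

# Invented daily mention counts, for illustration only
history = {"maytag": [3, 4, 2, 3, 3], "oracle": [40, 42, 38, 41, 39]}
today = {"maytag": 12, "oracle": 41}

for term, score in rising_terms(history, today):
    print(term, score)
```

A heavily covered term with a steady count is not flagged, while a rarely mentioned term that suddenly jumps is, which is the "wave" worth catching early.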


Hurricane Rita, Goldman Sachs, Florida Keys, John Roberts, Oil Prices

Using technology to power serendipity

Facts gleaned from across an entire day’s news can visually summarize an industry.

Extracted entities, phrases and events can direct users to the top newsmakers of the day.


How To Get There


How To Get There: Text Mining

Phase 1 | Classification / Taxonomy

• Metadata tags what an article is about

Phase 2 | Entity Extraction

• Extracting the billions of facts and entities stored in millions of documents

Phase 3 | Ontological Search

• Searching for concepts


text mining – n., a process of extracting information from unstructured text, drawing on practices from information retrieval, data mining, machine learning, computational linguistics and statistics.


1. Document Classification

[Diagram labels: Unstructured Text; Technology; Editorial Experts; FII; Metadata: Company Codes, Industries, Regions, Subjects]
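As a rough illustration of document classification, the sketch below tags an article with metadata codes in the four categories named on the slide (companies, industries, regions, subjects). It is a minimal keyword-rule sketch, not the FII process: the `TAXONOMY` table, its codes and keywords, and the `classify` function are all hypothetical.

```python
# Hypothetical taxonomy rules: category -> {code: trigger keywords}.
# Codes and keywords are illustrative only, not an actual taxonomy.
TAXONOMY = {
    "company": {"T-Mobile USA": ["t-mobile"], "Oracle": ["oracle"]},
    "industry": {
        "Telecommunications": ["wireless", "carrier"],
        "Software": ["database", "software"],
    },
    "region": {"United States": ["u.s.", "washington"]},
    "subject": {"Executive Appointments": ["named ceo", "appointed"]},
}

def classify(text):
    """Return {category: sorted codes} for every code whose keyword occurs."""
    lowered = text.lower()
    tags = {}
    for category, codes in TAXONOMY.items():
        hits = [
            code
            for code, keywords in codes.items()
            if any(keyword in lowered for keyword in keywords)
        ]
        if hits:
            tags[category] = sorted(hits)
    return tags

# Invented example sentence
article = "T-Mobile USA, the wireless carrier, appointed a new chief executive."
print(classify(article))
```

Plain substring rules like these are easy to audit but brittle, which is one reason the slide pairs technology with editorial experts.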


2. Entity Extraction

[Diagram labels: Unstructured Text (people, products, companies, events, authors); Technology; Editorial Experts. Document-level metadata (FII): Company Codes, Industries, Regions, Subjects. Sentence-level metadata (Entities): Companies, People, Brands, Relationships, Events, Authors]


Extracting More Value from Documents

Article receives company code for: T-Mobile USA

But there are other companies involved

And captures news subjects and industry.

And people and authors

And brands and products

And quotations

And regions
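The idea of pulling companies, people, regions and quotations out of a single article can be sketched with name lists and one regular expression. This is a simplified stand-in, assuming small hand-built dictionaries: `GAZETTEER`, `extract`, and the example names other than T-Mobile USA are hypothetical, and a production extractor would rely on far larger dictionaries or statistical models.

```python
import re

# Hypothetical name lists (gazetteers), for illustration only
GAZETTEER = {
    "company": ["T-Mobile USA", "Example Corp"],
    "person": ["Jane Doe", "John Roe"],
    "region": ["Washington", "Chicago"],
}

# Text between straight double quotes is treated as a quotation
QUOTE = re.compile(r'"([^"]+)"')

def extract(text):
    """Return entity mentions by type, plus any quoted passages."""
    found = {}
    for entity_type, names in GAZETTEER.items():
        hits = [
            name
            for name in names
            if re.search(r"\b" + re.escape(name) + r"\b", text)
        ]
        if hits:
            found[entity_type] = hits
    quotes = QUOTE.findall(text)
    if quotes:
        found["quotation"] = quotes
    return found

# Invented sentence with fictional names
sentence = 'Jane Doe of Example Corp said in Chicago, "We are expanding coverage."'
print(extract(sentence))
```

Unlike a single document-level company code, this output works at sentence level, so one article can yield several companies, people and quotations.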


Today’s Search


Ontological Search

Articles containing executive appointments

List of people and companies found in relationship to executive appointments
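The difference between returning articles and returning the people and companies involved in a concept can be shown with a toy example. The sketch below assumes a tiny hand-written "ontology" that maps one concept to a surface pattern; `CONCEPTS`, `concept_search`, and the sample articles are all hypothetical and far simpler than a real ontology.

```python
import re

# Hypothetical mini-ontology: concept name -> patterns that express it
CONCEPTS = {
    "executive appointment": [
        re.compile(
            r"(?P<company>[A-Z]\w+(?: [A-Z]\w+)*) (?:named|appointed) "
            r"(?P<person>[A-Z]\w+ [A-Z]\w+) as (?:its )?(?:new )?"
            r"(?P<role>CEO|CFO|chairman)"
        ),
    ],
}

def concept_search(concept, articles):
    """Return (person, company, role) facts for each match of the concept."""
    facts = []
    for text in articles:
        for pattern in CONCEPTS[concept]:
            for match in pattern.finditer(text):
                facts.append(
                    (match.group("person"), match.group("company"), match.group("role"))
                )
    return facts

# Invented articles with fictional names
articles = [
    "Example Corp named Jane Doe as its new CEO on Monday.",
    "Acme Widgets appointed John Roe as CFO.",
    "Example Corp reported quarterly earnings.",
]
print(concept_search("executive appointment", articles))
```

The third article mentions a company but no appointment, so it contributes nothing: the result is a list of facts, not a list of documents.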


Re-Engineering Search Results

(Concept screen)

[Timeline axis: Dec, Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct]

People

David Sifry [45]

Robert Scoble [30]

Sergey Brin [23]

John Battelle [19]

Mena Trott [1]

Related companies and subjects provide: filtering, navigation and discovery.

Previous dates can be navigated.

Publications can act as filters.

People and phrases can be discovered.
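Once each hit carries extracted metadata, the counted lists and filters described above reduce to simple aggregation. The following is a minimal sketch under that assumption; the `results` records, publication names, and the `facet_counts` and `filter_by` helpers are hypothetical, and the counts are invented rather than taken from the concept screen.

```python
from collections import Counter

# Hypothetical search hits that already carry extracted metadata
results = [
    {"headline": "A", "people": ["David Sifry", "Robert Scoble"], "publication": "Pub X"},
    {"headline": "B", "people": ["David Sifry"], "publication": "Pub Y"},
    {"headline": "C", "people": ["Sergey Brin", "David Sifry"], "publication": "Pub X"},
]

def _values(hit, field):
    """Normalize a metadata field to a list of values."""
    value = hit[field]
    return [value] if isinstance(value, str) else list(value)

def facet_counts(hits, field):
    """Count how many hits mention each value, most frequent first."""
    counts = Counter()
    for hit in hits:
        counts.update(set(_values(hit, field)))  # count each hit once per value
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def filter_by(hits, field, value):
    """Keep only the hits whose field contains the chosen facet value."""
    return [hit for hit in hits if value in _values(hit, field)]

for name, count in facet_counts(results, "people"):
    print(f"{name} [{count}]")

print([hit["headline"] for hit in filter_by(results, "publication", "Pub X")])
```

The same two helpers serve every facet, so people, companies, subjects and publications can each act as both a discovery list and a filter.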


Factiva Insight: Reputation Intelligence


Questions?

Glenn Fannick, Product Development Manager

[email protected]

fannick.blogspot.com