
Page 1

MASS COLLABORATION AND DATA MINING

Raghu Ramakrishnan

Founder and CTO, QUIQ

Professor, University of Wisconsin-Madison

Keynote Talk, KDD 2001, San Francisco

Page 2

DATA MINING

• Is it a creative process requiring a unique combination of tools for each application?

• Or is there a set of operations that can be composed using well-understood principles to solve most target problems?

• Or perhaps there is a framework for addressing large classes of problems that allows us to systematically leverage the results of mining.

Extracting actionable intelligence from large datasets

Page 3

“MINING” APPLICATION CONTEXT

• Scalability is important.
  – But when is 2x speed-up or scale-up important? When is 10x unimportant?

• What is the appropriate measure, model?
  – Recall, precision
  – MT for search vs. MT for content conversion

Answers to these questions come from the context of the application.

Page 4

TALK OUTLINE

• A New Approach to Customer Support
  – Mass Collaboration

• Technical challenges
  – A framework and infrastructure for P2P knowledge capture and delivery

• Role of data mining
  – Confluence of DB, IR, and mining

Page 5

TYPICAL CUSTOMER SUPPORT

[Diagram: Customer reaches the Support Center or Web Support, both backed by a KB]

Page 6

TRADITIONAL KNOWLEDGE MANAGEMENT

[Diagram: CONSUMERS pose QUESTIONs; EXPERTS provide ANSWERs that go into a KNOWLEDGEBASE]

Knowledge created and structured by trained experts using a rigorous process.

Page 7

MASS COLLABORATION

[Diagram: a QUESTION goes to the mass collaboration community (experts, partners, customers, employees); the ANSWER is added to the KNOWLEDGEBASE to power SELF SERVICE]

People using the web to share knowledge and help each other find solutions.

Page 8

TIMELY ANSWERS

77% of answers are provided within 24h

[Chart: 6,845 questions, 74% answered; answers provided within 3h: 40% (2,057); within 12h: 65% (3,247); within 24h: 77% (3,862); within 48h: 86% (4,328)]

• No effort to answer each question
• No added experts
• No monetary incentives for enthusiasts

Page 9

MASS CONTRIBUTION

Users who on average provide only 2 answers provide 50% of all answers

[Chart: of all contributing users, the top users are 7% (120) and the remaining mass of users is 93% (1,503); the mass of users contributes 50% (3,329) of all 6,718 answers]

Page 10

POWER OF KNOWLEDGE CREATION

[Chart: knowledge creation through customer mass collaboration (SHIELD 1) and self-service (SHIELD 2) cuts support incidents by 85% and agent cases by 64%, leaving 5% for agent support; averages from QUIQ implementations]

Page 11

TYPICAL SERVICE CHAIN

[Diagram: Self Service (knowledge base, FAQ, auto email) handles 50%; Manual Email / Chat / Call Center handle 40% ($$); 2nd Tier Support handles 10% ($$$$)]

QUIQ SERVICE CHAIN

[Diagram: Self Service plus Mass Collaboration (QUIQ) handles 80%; Manual Email / Chat / Call Center handle 15% ($$); 2nd Tier Support handles 5% ($$$$)]

Page 12

CASE STUDIES: COMPAQ

“In newsgroups, conversations disappear and you have to ask the same question over and over again. The thing that makes the real difference is the ability for customers to collaborate and have information be persistent. That’s how we found QUIQ. It’s exactly the philosophy we’re looking for.”

“Tech support people can’t keep up with generating content and are not experts on how to effectively utilize the product … Mass Collaboration is the next step in Customer Service.”

– Steve Young, VP of Customer Care, Compaq

Page 13

“Austin-based National Instruments deployed … a Network to capture the specialized knowledge of its clients and take the burden off its costly support engineers, and is pleased with the results. QUIQ increased customers’ participation, flattened call volume and continues to do the work of 50 support engineers.”

– David Daniels, Jupiter Media Metrix

ASP 2001 “Top Ten Support Site”

Page 14

MASS COLLABORATION

Communities + Knowledge Management + Service Workflows

[Chart: Call Center, Support Knowledge Base, Support Newsgroups, and Mass Collaboration positioned by Interactions vs. Solutions and Few Experts vs. Many Experts]

Mass Collaboration: Internet-scale P2P knowledge sharing

Page 15

CORPORATE MEMORY

Untapped Knowledge in Extended Business Community

[Diagram: a Knowledgebase fed by Customers, Partners, Suppliers, and Employees]

Page 16

[Diagram: Structured User Forum with self-organizing Areas of Interest, User Acquisition, Incentive to Participate, and User-to-User, User-to-Enthusiast, and User-to-Expert Exchange]

Page 17

GOALS & ISSUES

• Interactions must be structured to encourage creation of “solutions”
  – Resolve issue; escalate if necessary
  – Capture knowledge from interactions
  – Encourage participation

• Sociology
  – Privacy, security
  – Credibility, authority, history
  – Accountability, incentives

Page 18

REQUIRED CAPABILITIES

• Roles: Credibility, administration
  – Moderators, experts, editors, enthusiasts

• Groups: Privacy, security, entitlements
  – Departments, gold customers

• Workflow: QoS, validation, escalation

Page 19

TECHNICAL CHALLENGES

Page 20

SEARCHING “PEOPLE-BASES”

[Diagram: SEARCH the knowledgebase; ROUTING and NOTIFICATION to people]

“If it’s not there, find someone who knows”
  – And get “it” there (knowledge creation)!

Page 21

QUIQ, the “Best in Class” Support Channel

[Chart: support incidents vs. agent cases by channel]
  – Call Center: 100% agent cases
  – Email Support with automated emails (source: QUIQ client information): 80% agent cases (-20%)
  – Web Self-Service (source: Association of Support Professionals): 68% agent cases (-42%)
  – Mass Collaboration (self-service, customer mass collaboration, knowledge creation): 5% agent cases (-85% support incidents, -64% agent cases)

Page 22

SEARCH AND INDEXING

• User types in “How can I configure the IP address on my Presario?”
  – Need to find most relevant content that is of high quality and is approved for external viewing, and that this user is entitled to see based on her roles, groups, and service levels.

• User decides to post question because no good answer was found in the KB.
  – Search controls when experts and other users will see this new question; need to make this real-time.
  – Concurrency, recovery issues!

Page 23

SEARCH AND INDEXING

• Data is organized into tabular channels
  – Questions, responses, users, …

• Each item has several fields, e.g., a question:
  – Author id, author status, service level, item popularity metrics, rating metrics, answer status, approval status, visibility group, update timestamp, notification timestamp, usage signature, category, relevant products, relevant problems, subject, body, responses

Which 5 items should be returned?
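To make the field list above concrete, the sketch below shows one plausible in-memory representation of a question item in such a tabular channel. The class and field names are illustrative assumptions, not QUIQ's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

# Hypothetical, simplified representation of a "question" item in the
# Questions channel; names follow the slide's field list, not a real schema.
@dataclass
class QuestionItem:
    item_id: int
    author_id: int
    author_status: str                 # e.g., "enthusiast", "expert"
    service_level: str                 # e.g., "gold"
    view_count: int                    # a popularity metric
    rating: float                      # a rating metric
    answer_status: str                 # e.g., "unanswered", "answered"
    approval_status: str               # e.g., "approved", "pending"
    visibility_group: str              # who is entitled to see the item
    updated_at: datetime               # update timestamp
    notified_at: datetime              # notification timestamp
    category: str                      # e.g., "Laptop"
    relevant_products: List[str] = field(default_factory=list)
    subject: str = ""
    body: str = ""
    response_ids: List[int] = field(default_factory=list)
```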

Page 24

RUNTIME ARCHITECTURE

[Diagram: web servers in front of a Hive Manager providing real-time indexing, caching, and alerts (cache, indexer, alerts, email); files, logs, and a DBMS on RAID storage feed a warehouse]

Page 25

LEARNING FROM ACTIVITY DATA TO KNOWLEDGE

[Diagram: the indexer issues small reads against files, logs, and the DBMS on RAID storage; the miner performs large reads/writes against the warehouse as a periodic offline activity]

Page 26

SEARCH AND INDEXING

• Question text, user attributes, system policies
• IR-style ranked output
• Search constraints:
  – Show matches; subject match twice as important
  – Show only approved answers to non-editors
  – Give preference to category Laptop
  – Give preference to recent solutions
  – Weight quality of solution

Which 5 items should be returned?
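As a rough sketch of how constraints like these could be applied to the QuestionItem sketched earlier, the function below filters on the hard criteria and folds the soft preferences into a single score. The specific weights, boosts, and the per-field relevance function are invented for illustration; the slides do not give QUIQ's actual policy.

```python
from datetime import datetime, timedelta

def score_item(item, field_relevance, is_editor, now=None):
    """Return None if the item is filtered out, else a boosted relevance score.

    `field_relevance` maps field name -> raw text-match relevance for the query
    (e.g., TF-IDF); all weights and boosts below are illustrative only.
    """
    now = now or datetime.now()

    # Hard filter: show only approved answers to non-editors.
    if not is_editor and item.approval_status != "approved":
        return None

    # Show matches only; a subject match counts twice as much as a body match.
    score = 2.0 * field_relevance.get("subject", 0.0) + field_relevance.get("body", 0.0)
    if score == 0.0:
        return None

    # Give preference to category Laptop and to recent solutions.
    if item.category == "Laptop":
        score *= 1.5
    if now - item.updated_at < timedelta(days=30):
        score *= 1.2

    # Weight by quality of the solution (here: the item's rating metric).
    return score * (1.0 + item.rating)
```

The five highest-scoring items would then answer the “which 5 items?” question above.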

Page 27

VECTOR SPACE MODEL

• Documents, queries are vectors in term space

• Vector distance from the query is used to rank retrieved documents

$$Q = (w_{q1}, w_{q2}, \ldots, w_{qt}), \qquad D_i = (w_{i1}, w_{i2}, \ldots, w_{it})$$

$$\mathrm{sim}(Q, D_i) = \sum_{k=1}^{t} w_{qk}\, w_{ik} \qquad \text{(unnormalized)}$$

The k-th term in the summation can be seen as the “relevance contribution” of term k.

Page 28

TF-IDF DOCUMENT VECTOR

$$w_{ik} = tf_{ik} \cdot \log(N / n_k)$$

where
  – $T_k$ = term $k$ in document $D_i$
  – $tf_{ik}$ = frequency of term $T_k$ in document $D_i$
  – $idf_k = \log(N / n_k)$ = inverse document frequency of term $T_k$ in collection $C$
  – $N$ = total number of documents in the collection $C$
  – $n_k$ = the number of documents in $C$ that contain $T_k$
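A minimal, self-contained sketch of the two slides above: building TF-IDF weights per document and ranking by the unnormalized dot product with the query vector. This is a toy in-memory version for illustration, not the system's indexer.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document,
    with w_ik = tf_ik * log(N / n_k) as on the TF-IDF slide."""
    N = len(docs)
    df = Counter()                       # n_k: number of documents containing term k
    for d in docs:
        df.update(set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)                  # tf_ik: frequency of term k in document i
        vectors.append({t: tf[t] * math.log(N / df[t]) for t in tf})
    return vectors

def similarity(q_vec, d_vec):
    """Unnormalized vector-space similarity: sum over terms of query weight
    times document weight (the per-term relevance contributions)."""
    return sum(w * d_vec.get(t, 0.0) for t, w in q_vec.items())

# Toy usage: rank three documents for a two-term query.
docs = [["configure", "ip", "address", "presario"],
        ["reset", "password"],
        ["presario", "battery", "life"]]
vecs = tfidf_vectors(docs)
query = {"presario": 1.0, "ip": 1.0}     # query term weights (1.0 each here)
ranking = sorted(range(len(docs)), key=lambda i: similarity(query, vecs[i]), reverse=True)
print(ranking)                           # document 0 matches both terms and ranks first
```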

Page 29

A HYBRID DB-IR SYSTEM

• Searches are queries with three parts:
  – Filter: DB-style yes/no criteria
  – Match: TF-IDF relevance based on a combination of fields
  – Quality: relevance “boost” based on a policy

Page 30

A HYBRID DB-IR SYSTEM

• A query is built up from atomic constraints using Boolean operators.

• Atomic constraint:
  – [ value op term, constraint-type ]
  – Terms are drawn from discrete domains and are of two types: hierarchy and scalar
  – Constraint-type is exact or approximate
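A sketch of what an atomic constraint and its application to a channel might look like in code. The representation, the equality-only `matches` test, and the per-field relevance function are illustrative assumptions; the slides only give the abstract form [ value op term, constraint-type ].

```python
from dataclasses import dataclass

@dataclass
class AtomicConstraint:
    field: str      # field the constraint applies to, e.g., "category"
    op: str         # operator over the term's discrete domain, e.g., "=" or "under"
    term: str       # a term from a hierarchy or scalar domain
    exact: bool     # constraint-type: True = exact, False = approximate

def matches(item, c):
    """Stand-in exact test: equality on the named field (real hierarchy and
    scalar operators would go here)."""
    return getattr(item, c.field, None) == c.term

def apply_constraint(c, items, relevance_fn):
    """Apply one atomic constraint to a set of items, returning a tagged result
    set: ({item_id: relevance}, constraint_type_is_exact)."""
    results = {}
    for item in items:
        if c.exact:
            if matches(item, c):
                results[item.item_id] = 0.0          # exact results score 0
        else:
            score = relevance_fn(item, c)            # e.g., TF-IDF on the field
            if score > 0.0:
                results[item.item_id] = score
    return results, c.exact
```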

Page 31

A HYBRID DB-IR SYSTEM

• Applying an atomic constraint to a set of items returns a tagged result set:
  – The result inherits the constraint-type
  – Each result item has a (TF-IDF) relevance score; 0 for exact

• Combining two tagged item sets using Boolean operators yields a tagged set:
  – The result type is exact if both inputs are exact, and approximate otherwise
  – Result contains intersection of input item sets if either input is exact; union otherwise
  – Each result item is tagged with a combined relevance
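Following the combination rules on this slide, here is a sketch of combining two tagged result sets produced by `apply_constraint` above. The slides do not say how the per-item relevances are combined, so simple summation is assumed here.

```python
def combine(result_a, result_b):
    """Combine two tagged result sets, each of the form ({item_id: relevance}, exact).

    Per the slide: the result is exact only if both inputs are exact; the item set
    is the intersection if either input is exact, the union otherwise; each item
    carries a combined relevance (summation assumed)."""
    scores_a, exact_a = result_a
    scores_b, exact_b = result_b

    if exact_a or exact_b:
        ids = scores_a.keys() & scores_b.keys()     # intersection
    else:
        ids = scores_a.keys() | scores_b.keys()     # union

    combined = {i: scores_a.get(i, 0.0) + scores_b.get(i, 0.0) for i in ids}
    return combined, (exact_a and exact_b)
```

Because the combination is associative and commutative (next slide), a Boolean expression over atomic constraints can be evaluated in any order, with exact sub-expressions handled DB-style and approximate ones IR-style.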

Page 32

A HYBRID DB-IR SYSTEM

• Semantics of Boolean expressions over constraints is associative and commutative

• Evaluating exact constraints and approximate constraints separately (in DB and IR subsystems) is a special case. Additionally:
  – Uniform handling of relevance contributions of categories, popularity metrics, recency, etc.

• Absolute and relative relevance modifiers can be introduced for greater flexibility.

Page 33

CONCURRENCY, RECOVERY, PARALLELISM

• Concurrency
  – Index is updated in real-time
  – Automatic partitioning, two-step locking protocol result in very low overhead
  – Relies upon post-processing to address some anomalies

• Recovery
  – Partitioning is again the key
  – Leverages recovery guarantees of DBMS
  – Approach also supports efficient refresh of global statistics

• Parallelism
  – Hash-based partitioning

Page 34

NOTIFICATION

• Extension of search: Each user can define one or more “standing searches”, and request instant or periodic notification.
  – Boolean combinations of atomic constraints.

• Major challenges:
  – Scaling with number of standing searches.
    • Requires multiple timestamps, indexing searches.
  – Exactly-once delivery property.
    • Many subtleties center around “notifiability” of updates!
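A naive sketch of the standing-search idea: each saved search is a stored predicate (a Boolean combination of atomic constraints) plus its own notification timestamp, checked against recently updated items. The names are hypothetical, and the hard parts the slide alludes to (indexing the searches themselves, exactly-once delivery, deciding which updates are “notifiable”) are deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List

@dataclass
class StandingSearch:
    user_id: int
    predicate: Callable[[object], bool]   # Boolean combination of atomic constraints
    last_notified_at: datetime            # per-search notification timestamp
    instant: bool = True                  # instant vs. periodic delivery

def run_notifications(searches: List[StandingSearch], updated_items, now, deliver):
    """Scan every standing search against every recently updated item.
    A real system must index the searches to scale with their number."""
    for s in searches:
        hits = [it for it in updated_items
                if it.updated_at > s.last_notified_at and s.predicate(it)]
        if hits:
            deliver(s.user_id, hits)       # exactly-once delivery not handled here
            s.last_notified_at = now
```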

Page 35

ROLE OF DATA MINING

Page 36

DATA MINING TASKS

• There is a lot of insight to be gained by analyzing the data.
  – What will help the user with her problem?
  – Who does a given user trust?
  – Characteristic metrics for high-quality content.
  – Identify helpful content in similar, past queries.
  – Summarize content.
  – Who can answer this question?
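As one concrete reading of the last question above (“Who can answer this question?”), a crude routing heuristic could score candidate answerers by their rating-weighted history of answers in the question's category. This is a toy illustration of the kind of mining task meant, not QUIQ's method.

```python
from collections import defaultdict

def rank_answerers(question_category, past_answers, top_n=5):
    """past_answers: iterable of (user_id, category, rating) tuples drawn from the
    warehouse. Returns the top_n user ids by rating-weighted activity in the
    question's category."""
    score = defaultdict(float)
    for user_id, category, rating in past_answers:
        if category == question_category:
            score[user_id] += 1.0 + rating        # activity plus answer quality
    return sorted(score, key=score.get, reverse=True)[:top_n]
```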

Page 37

LEVERAGING DATA MINING

• How do we get at the data?
  – Relevant information is distributed across several sources, not just the DBMS.
  – Aggregated in a warehouse.

• How do we incorporate the insights obtained by mining into the search phase?
  – Need to constantly update info about every piece of content (Qs, As, users, …)

Page 38

LEVERAGING DATA MINING

• Three-step approach:
  – Off-line analysis to gather new insight
  – Periodic refresh of indexes
  – Use insight (from KB/index) to improve search using the extended DB/IR query framework

Use mining to create useful metadata
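A self-contained sketch of the three-step loop: off-line mining produces per-item metadata, a periodic refresh pushes it into the index, and the search phase uses it as a relevance boost. Function names and the quality metric are assumptions for illustration only.

```python
def mine_quality_scores(activity_log):
    """Step 1 (off-line analysis): derive a per-item quality score from activity
    data; here simply the item's average rating."""
    totals, counts = {}, {}
    for item_id, rating in activity_log:
        totals[item_id] = totals.get(item_id, 0.0) + rating
        counts[item_id] = counts.get(item_id, 0) + 1
    return {i: totals[i] / counts[i] for i in totals}

def refresh_index(index_metadata, quality_scores):
    """Step 2 (periodic refresh): push the mined scores into the search index's
    per-item metadata."""
    for item_id, q in quality_scores.items():
        index_metadata.setdefault(item_id, {})["quality"] = q

def boosted_relevance(base_relevance, item_id, index_metadata):
    """Step 3: at query time, boost the DB/IR relevance with the mined quality."""
    quality = index_metadata.get(item_id, {}).get("quality", 0.0)
    return base_relevance * (1.0 + quality)
```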

Page 39

SOME UNIQUE TWISTS

• Identify the kinds of feedback that would be helpful in refining a search.
  – I.e., not just specific terms, but the types of concepts that would be useful discriminators (e.g., a good hierarchy of feedback concepts)

• Metrics of quality
  – Link-analysis is a good example, but what are the “links” here?

• Self-tuning searches
  – The more the knobs, the more the choices
  – Next step: self-personalizing searches?
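For the “metrics of quality” bullet above, one possible reading of link analysis in this setting is to treat “user B endorsed (accepted, rated up) an answer by user A” as a link and to iterate PageRank-style authority scores over that graph. This is purely an illustration of what the “links” might be; the slide only poses the question.

```python
def authority_scores(endorsements, iterations=20, damping=0.85):
    """endorsements: list of (endorser, answerer) user-id pairs, each meaning the
    endorser found the answerer's answer helpful. Returns a PageRank-style
    authority score per user, a toy stand-in for a quality metric."""
    users = {u for pair in endorsements for u in pair}
    if not users:
        return {}
    out_links = {u: [] for u in users}
    for src, dst in endorsements:
        out_links[src].append(dst)

    score = {u: 1.0 / len(users) for u in users}
    for _ in range(iterations):
        new = {u: (1.0 - damping) / len(users) for u in users}
        for src, targets in out_links.items():
            if targets:
                share = damping * score[src] / len(targets)
                for dst in targets:
                    new[dst] += share
            else:
                # Users who endorse no one: spread their mass uniformly.
                for u in users:
                    new[u] += damping * score[src] / len(users)
        score = new
    return score
```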

Page 40

CONCLUSIONS

Page 41

CONFLUENCES?

[Diagram: the confluence of IR SEARCH, DB QUERIES, and P2P KM]