Mining for Interactive Identification of Users’ Information Needs
Rey-Long Liu and Wan-Jung Lin (劉瑞瓏、林宛蓉)
Dept. of Information Management, Chung Hua University


Outline

Introduction
    Information Need Identification (INI): What & Why
    Interactive INI
INEED: Incremental Mining for Interactive INI
    The profile miner
    The information need identifier
Experiment
Conclusion


Introduction

Information Need Identification (INI) for
    Information portals
    Online service guidance
    Internet search engines
    People finding
Interactive INI, which needs to consider
    Precision (P)
    Precision Effectiveness (PE)
    Recall (R)
    Recall Effectiveness (RE)

[Figure: diagram of category nodes labeled C with numeric subscripts; the layout and labels are not recoverable from the extracted text.]


Introduction (Cont.)

Main Challenges
    Each information space has its own content and structure.
    Each information space is intrinsically dynamic.
    Users are often unable (or unwilling) to precisely express their information needs (INs).
        Their queries are often quite short.
    Users prefer simpler and fewer interactions.


INEED

[Figure: INEED architecture. Components: Information Provider, Information Storage, Interface, and INEED (Profile Miner, Category Profile, IN Identifier). Labeled flows: (0) Content & Taxonomy, (1) Interaction, (2) Request, (3) Information, (4) Information Required.]


The Profile Miner

Incremental profile mining

Given: the document d to be added to category c.
Effect: updating the profiles of c and related categories.
Procedure:
(1) While c is not the root of the text hierarchy, do
    (1.1) For each distinct word w in d, do
        (1.1.1) If w is not a profile term for c, add <w, s_{w,c}> to the profile of c (strength s_{w,c} is unknown);
    (1.2) For each pair <w, s_{w,c}> in the profile of c, do
        (1.2.1) $s_{w,c} = P(w \mid c) \times \left( B_c / \sum_i P(w \mid c_i) \right)$;
        (1.2.2) For each sibling b of c, update s_{w,b} in the profile of b;
    (1.3) c ← father of c.
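The procedure above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The slides do not define how P(w|c) is estimated or what B_c counts, so the sketch assumes that P(w|c) is the relative frequency of w among all words added at or below c, that B_c is the size of c's sibling group (c plus its siblings), and that the sum ranges over that group. The names `Category`, `p`, `strength`, and `add_document` are hypothetical.

```python
from collections import Counter


class Category:
    """A node in the text hierarchy (hypothetical helper class)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.term_counts = Counter()  # word counts of documents at or below this node
        self.profile = {}             # word -> strength s_{w,c}
        if parent is not None:
            parent.children.append(self)


def p(word, cat):
    """Assumed estimate of P(w|c): relative frequency of the word under cat."""
    total = sum(cat.term_counts.values())
    return cat.term_counts[word] / total if total else 0.0


def strength(word, cat):
    """s_{w,c} = P(w|c) * B_c / sum_i P(w|c_i), over cat and its siblings (assumed)."""
    group = cat.parent.children
    denom = sum(p(word, g) for g in group)
    return p(word, cat) * len(group) / denom if denom else 0.0


def add_document(words, cat):
    """Add a document (a list of words) to cat and update affected profiles."""
    c = cat
    while c.parent is not None:            # step (1): stop at the root
        c.term_counts.update(words)
        for w in set(words):               # step (1.1): register new profile terms
            c.profile.setdefault(w, None)
        for w in c.profile:                # step (1.2.1): recompute strengths of c
            c.profile[w] = strength(w, c)
        for b in c.parent.children:        # step (1.2.2): refresh the siblings
            if b is not c:
                for w in b.profile:
                    b.profile[w] = strength(w, b)
        c = c.parent                       # step (1.3): climb to the father
```

Under these assumptions a word shared evenly by all siblings gets strength 1, and a word found in only one of them gets strength B_c, which matches the later test s_{w,c} > 1 in the IN identifier.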


The Profile Miner (Cont.)

[Figure: Updating the profiles of related categories once a document is added. A new document is added to category f, and the s-values of the profile terms in the related categories are updated.]


The Profile Miner (Cont.)

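An example (a company hierarchy; each unit is followed by its description):

Managers: decision making; coordination and integration
Sales Division: market planning; product promotion
Administration Division: internal administration; performance management
R&D Division: integration assessment; process definition
Marketing Department: marketing materials; advertising and publicity
Customer Department: order management; sales analysis
Quality Assurance Department: quality maintenance; product testing
Manufacturing Department: production; design and manufacturing
Administration Department: operations management
Information Department: system planning; development and maintenance
Personnel Section: staff recruitment; talent development
Accounting Section: account management; budgeting
Cashier Section: receipts and payments
Computer Integration Section: production information; information utilization
Information Management Section: system management; office automation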


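The Profile Miner (Cont.)

Query: "Information related to production management?"

Category profiles (category: profile terms)
    Managers
    Sales Division: market, planning, sales
    Marketing Department: marketing, advertising, publicity
    Customer Department: order, management, analysis
    Administration Division: internal affairs, administration, management
    R&D Division: R&D, production, process
    Quality Assurance Department: quality, management, testing
    Information Department: information, system, construction
    Computer Integration Section: production, integration, utilization
    ...

A profile term w of category c should be
    Representative: P(w|c) is high.
    Discriminative: P(w|c) × B_c / Σ_i P(w|c_i) is high.

Strength: $s_{w,c} = P(w \mid c) \times \left( B_c / \sum_i P(w \mid c_i) \right)$

Context: "Construction and maintenance of the production management system"; "Production quality maintenance"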
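A worked instance of the strength formula may help. The numbers are hypothetical, and B_c is assumed to count the categories in the sibling group (the slides do not define it). Suppose a word w occurs in three sibling categories with P(w|c_1) = 0.30, P(w|c_2) = 0.10, and P(w|c_3) = 0.20, so that B_c = 3:

```latex
\[
s_{w,c_1} = P(w \mid c_1) \times \frac{B_c}{\sum_{i=1}^{3} P(w \mid c_i)}
          = 0.30 \times \frac{3}{0.30 + 0.10 + 0.20} = 1.5
\]
\[
s_{w,c_2} = 0.10 \times \frac{3}{0.60} = 0.5
\]
```

Under this reading, a strength above 1 means that w is more typical of the category than of an average sibling (here c_1), and a strength below 1 means the opposite (here c_2).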


The IN Identifier


The IN Identifier (Cont.)

(1) For each category c, HitScore_c ← 0;
(2) For each pair (w, c), where w is a word in the query Q and c is a category,
    (2.1) If s_{w,c} > 1 and Support(w, c) ≥ minSupport,
        (2.1.1) ns ← (s_{w,c} – 1) / (number of siblings of c);
        (2.1.2) HitScore_c ← HitScore_c + ns × TF(w, Q);
(3) S ← the set of all categories;
(4) While the target category has not been identified and interaction is still allowed, do
    (4.1) Let p1 and p2 be two pedigrees (in S) with the highest average HitScore;
    (4.2) Let t1 and t2 be the categories with the highest HitScore in p1 and p2;
    (4.3) Display t1 and t2 (and their basic information) for the user to select;
    (4.4) If either t1 or t2 is exactly the target, return the space under the target;
    (4.5) Else if neither t1 nor t2 is of interest, S ← S – {the categories under t1 and t2};
    (4.6) Else if both t1 and t2 are of interest, g ← ClimbUp(common ancestor of t1 and t2), and return the space under g;
    (4.7) Else
        (4.7.1) Let t be the category that is of interest;
        (4.7.2) If t is a leaf, g ← ClimbUp(father of t), and return the space under g;
        (4.7.3) Else S ← {the categories under t};
(5) Return S;
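Steps (1) and (2) of the procedure above, the HitScore computation, can be sketched as follows. This is an illustrative sketch only. It assumes that TF(w, Q) is the raw count of w in the query and that profiles, supports, and sibling counts are already available as plain dictionaries. The function name `hit_scores` and its parameters are hypothetical. The interactive loop of step (4) is not covered, because the slides do not define how pedigrees are formed.

```python
from collections import Counter


def hit_scores(query_words, profiles, support, num_siblings, min_support=0.001):
    """Compute HitScore_c for every category c.

    profiles:     category -> {word: strength s_{w,c}}
    support:      (word, category) -> Support(w, c)
    num_siblings: category -> number of siblings of c
    """
    tf = Counter(query_words)                 # TF(w, Q), assumed to be a raw count
    scores = {c: 0.0 for c in profiles}       # step (1)
    for c, profile in profiles.items():       # step (2)
        for w, count in tf.items():
            s = profile.get(w)
            if s is None or s <= 1:           # only terms with s_{w,c} > 1 contribute
                continue
            if support.get((w, c), 0.0) < min_support:
                continue
            # Guard for only children is an assumption; the slides do not cover it.
            ns = (s - 1) / max(num_siblings[c], 1)   # step (2.1.1)
            scores[c] += ns * count                  # step (2.1.2)
    return scores
```

Dividing by the number of siblings normalizes the strength, since under the formula for s_{w,c} a category in a large sibling group can reach a larger strength than one in a small group.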


The IN Identifier (Cont.)

[Figure: Finding two candidate categories for interaction. The hierarchy is shown with pedigrees p1 and p2 and their highest-scoring categories t1 and t2 marked.]


The IN Identifier (Cont.)

Function ClimbUp(f), where f is the category from which to start climbing
(1) If f is the root, return f;
(2) While the target category has not been identified and interaction is still allowed, do
    (2.1) f_sibling ← a sibling of f;
    (2.2) f_uncle ← a sibling of the father of f;
    (2.3) Display f_sibling and f_uncle (and their basic information) for the user to select;
    (2.4) If either f_sibling or f_uncle is exactly the target, return the target;
    (2.5) Else if neither f_sibling nor f_uncle is of interest, return f;
    (2.6) Else if both f_sibling and f_uncle are of interest,
        (2.6.1) f ← grandfather of f;
        (2.6.2) If f is the root, return f;
    (2.7) Else if f_sibling is of interest, return father of f;
    (2.8) Else return {f, f_uncle};
(3) Return f;
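ClimbUp can be sketched in Python with the user modeled as a callback. This is a sketch under several assumptions. The hierarchy is given as hypothetical `parent` and `children` dictionaries. The first available sibling and uncle are chosen, since the slides do not say which to pick. When no sibling or no uncle exists, the function returns f, a case the slides do not cover. The callback `ask` is hypothetical and returns a pair `(target, interesting)`, where `target` is the exact target if the user recognizes it (otherwise `None`) and `interesting` is the set of displayed categories the user finds of interest.

```python
def climb_up(f, parent, children, ask, max_interactions=5):
    """Generalize from category f by climbing the hierarchy."""
    if parent.get(f) is None:                      # step (1): f is the root
        return f
    for _ in range(max_interactions):              # step (2): interaction budget
        father = parent[f]
        grandfather = parent.get(father)
        siblings = [x for x in children.get(father, []) if x != f]
        uncles = ([x for x in children.get(grandfather, []) if x != father]
                  if grandfather is not None else [])
        if not siblings or not uncles:             # assumption: nothing to display
            return f
        f_sibling, f_uncle = siblings[0], uncles[0]    # steps (2.1), (2.2)
        target, interesting = ask(f_sibling, f_uncle)  # step (2.3)
        if target is not None:                     # step (2.4)
            return target
        if not interesting:                        # step (2.5)
            return f
        if f_sibling in interesting and f_uncle in interesting:
            f = grandfather                        # step (2.6.1)
            if parent.get(f) is None:              # step (2.6.2)
                return f
        elif f_sibling in interesting:             # step (2.7)
            return father
        else:                                      # step (2.8): only the uncle
            return {f, f_uncle}
    return f                                       # step (3)
```

As in the slide, the result is either a single category or, in step (2.8), a set of two categories.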


The IN Identifier (Cont.)

Generalization by climbing the hierarchy

[Figure, two panels: "Finding two categories for generalization" (showing f, f_sibling, and f_uncle) and "Possible results of generalization" (outcomes labeled with ClimbUp steps 2.4, 2.5, 2.6, and 2.7).]


Experiment

Experimental Data
    Source: Yahoo! (http://www.yahoo.com)
    Coverage: Computers & Internet, Society and Culture, and Science
    Size: 214 categories; depth: 8
    Training data: 2216 documents
    Test data: 168 queries extracted from another set of site summaries


Experiment (Cont.)

Each system could conduct at most 5 interactions for each query

System | Description | Note
INEED | As described, with two settings for minSupport: 0.001 and 0.0005. | INEED-0.001, INEED-0.0005
BruteForce | As in most search engines, the whole information space is considered (no INI is conducted). |
RandomCN | The system employs top-down navigation. At each level, two categories are randomly selected for the user to confirm. | Repeat 10 times
IdealCN | The system employs top-down navigation. At each level, the target is always in the candidates identified by the system. |
NB | The output category is determined by the conditional probabilities of the query terms occurring in the categories, with two feature set sizes: 5000 and 8000. | NB-5000, NB-8000
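For readers unfamiliar with the NB baseline, a generic multinomial Naive Bayes classifier can be sketched as follows. The slides give no implementation details, so this is not the configuration used in the experiment. Laplace smoothing, document-frequency priors, and the function names `train_nb` and `classify_nb` are assumptions, and feature selection (the 5000 and 8000 feature set sizes) is not modeled.

```python
import math
from collections import Counter


def train_nb(docs_by_category):
    """docs_by_category: category -> list of documents (each a list of words).

    Assumes every category has at least one training document.
    """
    vocab = {w for docs in docs_by_category.values() for d in docs for w in d}
    total_docs = sum(len(docs) for docs in docs_by_category.values())
    model = {}
    for c, docs in docs_by_category.items():
        counts = Counter(w for d in docs for w in d)
        log_prior = math.log(len(docs) / total_docs)
        model[c] = (log_prior, counts, sum(counts.values()))
    return model, vocab


def classify_nb(query_words, model, vocab):
    """Return the category with the highest posterior log-probability."""
    best, best_score = None, -math.inf
    for c, (log_prior, counts, total) in model.items():
        score = log_prior
        for w in query_words:
            if w in vocab:  # words never seen in training are ignored
                # Laplace-smoothed estimate of P(w|c) (an assumption)
                score += math.log((counts[w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best
```

Unlike INEED, such a classifier outputs a single category for a query and has no built-in way to ask the user for confirmation.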


Experiment (Cont.)

Precision
    BruteForce was poor.
    Interaction is good for precision.
    INEED improved 14%~20% w.r.t. NB.

[Figure: Precision (0 to 1) vs. maximum number of interactions allowed (0 to 5) for INEED-0.001, INEED-0.0005, BruteForce, RandomCN, IdealCN, NB-5000, and NB-8000.]

Recall
    INEED was good in both precision and recall.
    BruteForce and CN achieved 100% recall.
    INEED achieved 100% recall using only 2 interactions.

[Figure: Recall (0.92 to 1) vs. maximum number of interactions allowed (0 to 5) for INEED-0.001, INEED-0.0005, BruteForce, RandomCN, IdealCN, NB-5000, and NB-8000.]


Experiment (Cont.)

Precision-effectiveness
    BruteForce was excluded.
    INEED improved more (19%~32%) w.r.t. NB.
    Interactions by INEED were more effective.

[Figure: Precision-effectiveness (0 to 0.8) vs. maximum number of interactions allowed (1 to 5) for INEED-0.001, INEED-0.0005, RandomCN, IdealCN, NB-5000, and NB-8000.]

Recall-effectiveness
    INEED performed best.
    INEED improved 2%~20% w.r.t. NB.

[Figure: Recall-effectiveness (0 to 1) vs. maximum number of interactions allowed (1 to 5) for INEED-0.001, INEED-0.0005, RandomCN, IdealCN, NB-5000, and NB-8000.]


Experiment (Cont.)

Precision vs. Recall
    BruteForce and CN always achieved 100% recall.
    INEED performed best (its curve lay in the upper-right corner).

[Figure: Recall (0.92 to 1) vs. Precision (0 to 1) for INEED-0.001, INEED-0.0005, BruteForce, RandomCN, IdealCN, NB-5000, and NB-8000.]

When no interaction is allowed
    INEED improved recall by 38% w.r.t. NB.
    Precision of INEED improved 62% in the first interaction (NB improved only 29%).

[Figure: bar chart of precision and recall with no interaction; bar labels as extracted, in series order.]
System | Precision | Recall
INEED-0.001 | 0.448 | 0.64
INEED-0.0005 | 0.418 | 0.646
NB-5000 | 0.469 | 0.465
NB-8000 | 0.437 | 0.468


Experiment (Cont.)

An example:

Test query: Virtual world featuring 3-D ray-traced graphics. Wander around, meet other netizens, and try to solve some puzzles. Features animation and sound clips,

Correct target identified by INEED: Computers and Internet → Multimedia → Virtual Reality → Exhibits

Erroneous category identified by NB: Computers and Internet → Software → Operating Systems → Windows → Windows 95


Conclusion

Interactive Information Need Identification (interactive INI) as an essential component for
    Information portals
    Online service guidance
    Information retrieval
    People finding
Requirements of interactive INI, fulfilled by INEED
    Exactly identify the information space that may satisfy the user’s information needs
    Effectively interact with the user
    Intelligently reduce the user’s load in query formation and result cognition

Thanks