SMS Based FAQ Retrieval Task at FIRE 2011

Danish Contractor, IBM Research India
Ankush Mittal, College of Engineering Roorkee
Deepak P, IBM Research India
L Venkata Subramaniam, IBM Research India

COER, FIRE 2011 Proposal, Dec 2, 2011

Page 1: SMS Based FAQ Retrieval Task at FIRE 2011

COER

FIRE 2011 Proposal

Dec 2, 2011

SMS Based FAQ Retrieval Task at FIRE 2011

Danish Contractor, IBM Research India

Ankush Mittal, College of Engineering Roorkee

Deepak P, IBM Research India

L Venkata Subramaniam, IBM Research India

Page 2: SMS Based FAQ Retrieval Task at FIRE 2011

Agenda

• Motivation and Overview

• Dataset

• Participants

• Evaluation and Final Scores

Page 3: SMS Based FAQ Retrieval Task at FIRE 2011

India’s Education Pyramid and Information Access Patterns

• Internet users: 70 million

• Mobile phone users: 800 million

The mobile phone is the preferred information device for Indians

Page 4: SMS Based FAQ Retrieval Task at FIRE 2011

SMS Based FAQ Retrieval Task

The goal is to find the Question Q* in the FAQ database that best matches the SMS S

FAQ Database (sample entries)

• Q: Which insurance policies are available for cancer patients
  A: LIC has some insurance policies for cancer patients

• Q: What are the rates for roaming within India
  A: Average Roaming rates on prepaid connections are 60 Paise per minute

SMS Question: wht r d policis avlbl 4 cancar pasaints

SMS Answer: LIC has some insurance policies for cancer patients
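The matching step on this slide can be illustrated with a small sketch. It is not the method of any participating team: it assumes a simple scoring rule (for each SMS token, take the best character-level similarity to any token of the FAQ question, then average), and the names `FAQS`, `token_similarity`, `match_score` and `best_faq` are made up for this example. Only the Python standard library is used.

```python
from difflib import SequenceMatcher

# Two FAQ entries copied from the slide; a real FAQ database holds thousands.
FAQS = {
    "Which insurance policies are available for cancer patients":
        "LIC has some insurance policies for cancer patients",
    "What are the rates for roaming within India":
        "Average Roaming rates on prepaid connections are 60 Paise per minute",
}


def token_similarity(a: str, b: str) -> float:
    """Character-level similarity of two tokens, in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a, b).ratio()


def match_score(sms: str, question: str) -> float:
    """Average, over SMS tokens, of the best similarity to any question token."""
    sms_tokens = sms.lower().split()
    q_tokens = question.lower().split()
    if not sms_tokens or not q_tokens:
        return 0.0
    best = [max(token_similarity(s, q) for q in q_tokens) for s in sms_tokens]
    return sum(best) / len(best)


def best_faq(sms: str, faqs: dict) -> tuple:
    """Return the (question, answer) pair whose question scores highest."""
    question = max(faqs, key=lambda q: match_score(sms, q))
    return question, faqs[question]


question, answer = best_faq("wht r d policis avlbl 4 cancar pasaints", FAQS)
print(question)
print(answer)
```

On the slide's noisy query this picks the insurance question, because tokens such as "policis", "cancar" and "pasaints" stay close to their clean spellings. Tokens such as "4" (for "for") or "d" (for "the") gain nothing from character similarity, so a practical system would likely add an SMS-specific normalization step.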

Page 5: SMS Based FAQ Retrieval Task at FIRE 2011

FAQ Retrieval Task

Retrieve the best FAQ for a given SMS query

• Task 1: Same Language Retrieval (SMS in L1, FAQ in L1)
  – English, Hindi, Malayalam

• Task 2: Cross Language Retrieval (SMS in L1, FAQ in L2)
  – English SMS, Hindi FAQ

• Task 3: Multi Lingual Retrieval (SMS in L1/L2/L3, FAQ in L1/L2/L3)
  – English/Hindi/Malayalam SMS/FAQ

Page 6: SMS Based FAQ Retrieval Task at FIRE 2011

Details of Dataset

• FAQs
  – Collected from online resources, both govt. and private sector
  – Three languages: English, Hindi, Malayalam
  – FAQ Categories
    • Health
    • Telecom
    • Insurance
    • Railway booking
    • …………

• SMSes
  – Collected from mobile-savvy college students, online sources, and by manually perturbing questions to include common forms of noise-induced variations
  – Three languages: English, Hindi, Malayalam
  – Both in-domain and out-of-domain
    • An SMS could match a FAQ in the same language, in another language, or not at all
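Because an SMS may match no FAQ at all, a system has to be able to decline to answer. The sketch below shows one way to do that, under stated assumptions: similarity is plain token-set Jaccard overlap, and the cut-off `threshold=0.3` is an arbitrary illustrative value, not one taken from the task. The names `jaccard` and `answer_or_none` are hypothetical, and the queries are assumed to be already normalized (noisy SMS spellings would need cleaning first).

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two token sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def answer_or_none(query, faq_questions, threshold=0.3):
    """Return the best-matching FAQ question, or None for an out-of-domain query."""
    q_tokens = set(query.lower().split())
    scored = [(jaccard(q_tokens, set(f.lower().split())), f) for f in faq_questions]
    best_score, best_question = max(scored, default=(0.0, None))
    return best_question if best_score >= threshold else None


faq_questions = [
    "Which insurance policies are available for cancer patients",
    "What are the rates for roaming within India",
]

# In-domain: most tokens overlap with the roaming question.
print(answer_or_none("what are roaming rates within india", faq_questions))
# Out-of-domain: only a stray stop word overlaps, so nothing is returned.
print(answer_or_none("who won the cricket match", faq_questions))
```

The threshold trades missed in-domain answers against false matches on out-of-domain queries, so in practice it would be tuned on the training SMS set.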

Page 7: SMS Based FAQ Retrieval Task at FIRE 2011

Dataset

No. of SMS queries per task, given as (in-domain, out-of-domain)

FAQs   Language    Split   Monolingual   Cross-lingual   Multilingual
7251   English     Train   (701,370)     (291,181)       (290,170)
                   Test    (728,2677)    (37,3368)       (724,2681)
1994   Hindi       Train   (181,49)      -               (183,47)
                   Test    (200,124)     -               (200,124)
681    Malayalam   Train   (120,20)      -               (60,20)
                   Test    (50,0)        -               (50,0)

Training Dataset Release (May 2011 and July 2011)

• FAQ Dataset: FAQs in three languages

• Training SMS: SMSes in three languages

Test Data Release (August 2011)

• Test SMS: SMSes in three languages

Submissions by teams (Sept 2011)

• Top 5 FAQs for each SMS

Page 8: SMS Based FAQ Retrieval Task at FIRE 2011

Participating Teams

1. Univ of Iowa (Sanmitra Bhattacharya, Hung Tran and Padmini Srinivasan)

2. BUAP Mexico (Darnes Vilariño Ayala, David Pinto, Saúl León Silverio, Esteban Castillo and Mireya Tovar Vidal)

3. DCE Delhi (Arpit Gupta)

4. IIIT Hyderabad (Aditya Mogadala, Bhupal Reddy and Vasudeva Varma)

5. DAIICT Gandhinagar (Khushboo Singhal, Smita Kumari and Gaurav Arora)

6. DTU Delhi (Anwar Shaikh, Rajiv Ratn Shah, Mukul Jain, Mukul Rawat and Manoj Kumar)

7. Jadavpur Univ and IPN Mexico (Partha Pakray, Soujanya Poria, Sivaji Bandyopadhyay and Alexander Gelbukh)

8. DCU Dublin (Deirdre Hogan, Paul Ferguson, Hongyi Wang, Johannes Leveling and Cathal Gurrin)

9. MSRIT Bangalore (Vinayaka Dj)

10. TCS Mumbai (Arijit De)

11. SASTRA Thanjavur (Ashish Raste, Venkata Narasimhan A and Santhosh Bargav)

12. RVCE Bangalore (Nishit Shivhre)

13. IIIT Delhi (Tanushree Mishra)

Page 9: SMS Based FAQ Retrieval Task at FIRE 2011

Evaluation

• Participants to submit the top 5 FAQs for each SMS

• Accuracy and MRR based evaluation
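The slide names the two measures without defining them. A minimal sketch follows, assuming the usual definitions: accuracy as the fraction of SMS queries whose first-ranked FAQ is the correct one, and MRR (mean reciprocal rank) as the mean of 1/rank of the correct FAQ within the submitted top 5, counting 0 when it is absent. The function names and the FAQ/SMS identifiers are hypothetical, and the task's official scoring may differ in detail.

```python
def reciprocal_rank(ranked, relevant, cutoff=5):
    """1/rank of the relevant FAQ within the top `cutoff` results, else 0."""
    for rank, faq_id in enumerate(ranked[:cutoff], start=1):
        if faq_id == relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs, gold):
    """Return (accuracy, MRR) over all SMS ids listed in `gold`."""
    if not gold:
        return 0.0, 0.0
    hits = 0
    rr_total = 0.0
    for sms_id, relevant in gold.items():
        ranked = runs.get(sms_id, [])  # a missing submission counts as a miss
        if ranked and ranked[0] == relevant:
            hits += 1
        rr_total += reciprocal_rank(ranked, relevant)
    return hits / len(gold), rr_total / len(gold)


# Made-up identifiers: three SMS queries, one correct FAQ each.
gold = {"sms1": "FAQ_10", "sms2": "FAQ_20", "sms3": "FAQ_30"}
runs = {
    "sms1": ["FAQ_10", "FAQ_11", "FAQ_12", "FAQ_13", "FAQ_14"],  # correct at rank 1
    "sms2": ["FAQ_99", "FAQ_20", "FAQ_21", "FAQ_22", "FAQ_23"],  # correct at rank 2
    "sms3": ["FAQ_1", "FAQ_2", "FAQ_3", "FAQ_4", "FAQ_5"],       # not retrieved
}

accuracy, mrr = evaluate(runs, gold)
print(f"accuracy = {accuracy:.3f}, MRR = {mrr:.3f}")
```

In this toy run the reciprocal ranks are 1, 1/2 and 0, so MRR is 0.5, while accuracy is 1/3 because only the first query has the correct FAQ at rank 1.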

Page 10: SMS Based FAQ Retrieval Task at FIRE 2011

Team-Task Matrix

Team (# of runs submitted)   English Mono   Hindi Mono   Mal Mono   Cross   English Multi   Hindi Multi   Mal Multi
Iowa (19)                    ✔              ✔            ✔          ✔       ✔               ✔             ✔
BUAP (11)                    ✔              ✔            ✔          ✔       ✔               ✔             ✔
DCE (5)                      ✔              ✔            -          ✔       ✔               ✔             -
IIITH (12)                   ✔              ✔            ✔          ✔       -               -             -
DAIICT (6)                   ✔              ✔            ✔          -       -               -             -
Jadavpur-IPN (4)             ✔              ✔            -          ✔       -               -             -
DTU (4)                      ✔              ✔            -          -       -               -             -
DCU (3)                      ✔              -            -          -       -               -             -
MSRIT (3)                    ✔              -            -          -       -               -             -
TCS (2)                      ✔              -            -          -       -               -             -
SASTRA (1)                   ✔              -            -          -       -               -             -
RVCE (1)                     ✔              -            -          -       -               -             -
IIITD (1)                    ✔              -            -          -       -               -             -

13 Teams 72 Runs

9 sub-tasks

✔ score above median

Page 11: SMS Based FAQ Retrieval Task at FIRE 2011

Monolingual Task: English SMS – English FAQ

[Bar chart: one bar per team, vertical axis from 0 to 0.9]

Teams: DCU, RVCE, DCE, DTU, IIITH, Iowa, BUAP, DAIICT, TCS, SASTRA, Jadavpur, MSRIT, IIITD

SMS: 728 in-domain, 2677 out-of-domain

FAQs: 7251

High Score: 0.83

Median: 0.14

Bar labels: (506,19), (415,0), (12,58), (391,75), (0,0), (0,225), (0,0), (473,118), (553,871), (432,1512), (396,1940), (508,2307), (0,29)
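The slide does not state how the bar labels relate to the reported scores. One reading, offered here as an assumption rather than the organizers' definition, is that each label is (in-domain SMS answered correctly, out-of-domain SMS correctly rejected) and that a team's score is the sum of the two divided by the total number of test SMS. The short check below applies that reading to the labels on this slide; `LABELS` and `score` are names chosen for the example.

```python
from statistics import median

# Test-set sizes stated on the slide.
IN_DOMAIN, OUT_OF_DOMAIN = 728, 2677

# The thirteen bar labels as they appear on the slide (team order not assumed).
LABELS = [
    (506, 19), (415, 0), (12, 58), (391, 75), (0, 0), (0, 225), (0, 0),
    (473, 118), (553, 871), (432, 1512), (396, 1940), (508, 2307), (0, 29),
]


def score(pair, total):
    """Assumed score: correctly handled SMS (both kinds) over all test SMS."""
    return sum(pair) / total


total = IN_DOMAIN + OUT_OF_DOMAIN
scores = [score(pair, total) for pair in LABELS]

print(f"high score: {max(scores):.2f}")
print(f"median:     {median(scores):.2f}")
```

Under this reading the best label, (508,2307), gives 2815/3405, which rounds to 0.83, and the middle value of the thirteen is 466/3405, which rounds to 0.14. Both agree with the High Score and Median printed on the slide, which supports the interpretation without confirming it.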

Page 12: SMS Based FAQ Retrieval Task at FIRE 2011

Monolingual Task: Hindi SMS – Hindi FAQ

[Bar chart: one bar per team, vertical axis from 0 to 0.7]

Teams: DTU, DCE, DAIICT, IIITH, Iowa, BUAP, Jadavpur

SMS: 200 in-domain, 124 out-of-domain

FAQs: 1994

High Score: 0.62

Median: 0.53

Bar labels: (0,119), (153,0), (165,0), (171,2), (186,0), (111,80), (198,3)

Page 13: SMS Based FAQ Retrieval Task at FIRE 2011

Monolingual Task: Malayalam SMS – Malayalam FAQ

[Bar chart: one bar per team, vertical axis from 0 to 1]

Teams: DAIICT, IIITH, Iowa, BUAP

SMS: 50 in-domain, 0 out-of-domain

FAQs: 681

High Score: 0.94

Median: 0.90

Bar labels: (39,2), (44,0), (46,0), (47,0)

Page 14: SMS Based FAQ Retrieval Task at FIRE 2011

Crosslingual Task: English SMS – Hindi FAQ

[Bar chart: one bar per team, vertical axis from 0 to 0.7]

Teams: DCE, Iowa, BUAP, IIITH, Jadavpur

SMS: 37 in-domain, 3368 out-of-domain

FAQs: 1994

High Score: 0.65

Median: 0.0499

Bar labels: (4,159), (0,170), (5,182), (2,2206), (2,40)

Page 15: SMS Based FAQ Retrieval Task at FIRE 2011

Multilingual: English SMS – English/Hindi/Malayalam FAQ

[Bar chart: one bar per team, vertical axis from 0 to 0.6]

Teams: DCE, Iowa, BUAP

SMS: 724 in-domain, 2681 out-of-domain

FAQs: 9926

High Score: 0.52

Median: 0.15

Bar labels: (356,25), (504,17), (424,1336)

Page 16: SMS Based FAQ Retrieval Task at FIRE 2011

Multilingual: Hindi SMS – English/Hindi/Malayalam FAQ

[Bar chart: one bar per team, vertical axis from 0 to 0.7]

Teams: DCE, Iowa, BUAP

SMS: 200 in-domain, 124 out-of-domain

FAQs: 9926

High Score: 0.57

Median: 0.51

Bar labels: (113,0), (165,0), (103,83)

Page 17: SMS Based FAQ Retrieval Task at FIRE 2011

Multilingual: Malayalam SMS – English/Hindi/Malayalam FAQ

[Bar chart: one bar per team, vertical axis from 0 to 1]

Teams: Iowa, BUAP

SMS: 50 in-domain, 0 out-of-domain

FAQs: 9926

High Score: 0.88

Bar labels: (32,0), (44,0)

Page 18: SMS Based FAQ Retrieval Task at FIRE 2011

Concluding Remarks

• The mobile phone is the preferred Information Device for Indians

– SMS is the preferred mode

• The FAQ Retrieval task encourages research on systems that give access to information in FAQ databases through SMS queries

– The results are encouraging

Page 19: SMS Based FAQ Retrieval Task at FIRE 2011

Thank You!