Introduction to Q&A systems Yahoo! Answer and Naver KiN KSE 801 Uichin Lee

Page 1

Introduction to Q&A systems Yahoo! Answer and Naver KiN

KSE 801 Uichin Lee

Page 2

Online Q&A Systems

Baidu Knows

Naver Knowledge iN Naver Mobile Q&A

Closed in Sept 2011

Page 3

Naver Knowledge iN

• Naver is the largest search engine in Korea: 70% of searches (Google: 2%)

• Comprehensive portal – integrated news, blogs, ‘knowledge search’

• “Knowledge search is like oozing out knowledge in human brains to the Internet. People who know something better than others can present their know-how, skills or knowledge” --- NHN CEO Chae Hwi-young

• Knowledge-In had 60 million questions and answers as of Feb 2007

Slide from http://www.nd.edu/~netsci/TALKS/Adamic.pdf

Page 4

Culture of Generosity

• “(It is) the next generation of search… (it) is a kind of collective brain -- a searchable database of everything everyone knows. It's a culture of generosity. The fundamental belief is that everyone knows something.” -- Eckart Walther (Yahoo Research)

Slide from http://www.nd.edu/~netsci/TALKS/Adamic.pdf

Page 5

Knowledge Sharing and Yahoo Answers: Everyone Knows Something

Lada A. Adamic, Jun Zhang, Eytan Bakshy, Mark S. Ackerman

University of Michigan

WWW 2008

Original slides from http://140.116.246.92:5000/2008_WebIR_ppt/06.ppt

Page 6

What is Yahoo Answers (YA)?

• A large and diverse question-answer forum.

• YA has 25 top-level and 1002 (continually expanding) lower-level categories.

• Ask: thread (title) and content.

• Best Answers: answers rated best by the asker or voted best by YA users.

Page 7

Page 8

Yahoo! Answer Dataset

• One month of YA activity

• 8,452,337 answers to 1,178,983 questions
– 433,402 unique repliers
– 495,414 unique askers
– 211,372 users both asked and replied
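A few derived figures follow from these counts. The sketch below only does the arithmetic; the variable names are illustrative, and the total user count assumes that the 211,372 users who both asked and replied are included in both the asker and the replier totals.

```python
# Counts taken from the slide (one month of YA activity).
answers = 8_452_337
questions = 1_178_983
repliers = 433_402
askers = 495_414
both = 211_372

# Average thread length: answers per question.
answers_per_question = answers / questions

# Inclusion-exclusion, assuming "both" is counted in askers and in repliers.
total_users = askers + repliers - both

# Share of all active users who both asked and replied.
share_both = both / total_users

print(f"answers per question: {answers_per_question:.2f}")
print(f"distinct users:       {total_users:,}")
print(f"asked and replied:    {share_both:.1%}")
```

Under that assumption, roughly 7 answers arrive per question and a bit under a third of active users play both roles.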

Page 9

Characterizing YA: Posts/Replies

• Thread length (# replies per post) vs. post length (how verbose the answers are) w/ some categories named

Factual: programming, chemistry, physics: few but lengthy replies (less interactive)

Discussion: attracting many replies w/ moderate length (more interactive)

Page 10

Characterizing YA: Category Clusters

• Classified the most active categories (>1000 posted questions) into 3 clusters using k-means clustering on three metrics: thread length, content length, and asker/replier overlap

189 categories (91% of questions)

(GREEN) Discussion: Politics, Sport, Religion

(BLUE) Advice/Opinion: Fashion, Baby names, Fast Food, Dogs/Cats

(RED) Facts: Biology, Programming, Repairs
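The clustering step can be sketched as follows. This is a minimal k-means (Lloyd's algorithm) in plain Python, not the authors' code: the feature values and category names are made up for illustration, the features are assumed to be already scaled to comparable ranges, and the deterministic farthest-point initialization is a choice made here so the example is reproducible.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    pick the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers

def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm: alternate assignment and center update."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

# Hypothetical, pre-scaled features per category:
# (thread length, content length, asker/replier overlap)
features = {
    "cat_1": (0.90, 0.50, 0.80),
    "cat_2": (0.85, 0.45, 0.90),
    "cat_3": (0.50, 0.10, 0.20),
    "cat_4": (0.55, 0.15, 0.10),
    "cat_5": (0.10, 0.90, 0.10),
    "cat_6": (0.15, 0.85, 0.20),
}

labels, centers = kmeans(list(features.values()), k=3)
for name, label in zip(features, labels):
    print(name, "-> cluster", label)
```

In practice the three metrics have very different scales, so standardizing them before clustering matters; whether the original study did so is not stated on the slide.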

Page 11

Characterizing YA: Network Structure

Thread 1: Large Data, binary search or hashtable? (user A)
  Re: Large... (user B)
  Re: Large... (user C)

Thread 2: Binary file with ASCII data (user A)
  Re: File with... (user C)

[Figure: asker-replier network over users A, B, C. Node labels: Outdegree: 1, Indegree: 0; Outdegree: 0, Indegree: 2; Outdegree: 1, Indegree: 0]

[Figure: two plots of the degree CCDF, Prob(Degree >= k)]

• Heavy tailed distribution, yet diverse across different categories
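The two example threads can be turned into a small directed graph. The sketch below assumes one edge per distinct (replier, asker) pair, pointing from replier to asker; that direction is an inference, chosen because it reproduces the degree labels in the figure. The `ccdf` helper computes the empirical Prob(Degree >= k) plotted on the slide. All function names are illustrative.

```python
from collections import defaultdict

# The two example threads from the slide.
threads = [
    {"asker": "A", "repliers": ["B", "C"]},  # Thread 1
    {"asker": "A", "repliers": ["C"]},       # Thread 2
]

def build_edges(threads):
    """One directed edge per distinct (replier, asker) pair (assumed direction)."""
    edges = set()
    for t in threads:
        for replier in t["repliers"]:
            if replier != t["asker"]:
                edges.add((replier, t["asker"]))
    return edges

def degrees(edges):
    """Map each user to (outdegree, indegree)."""
    outdeg, indeg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    users = {u for e in edges for u in e}
    return {u: (outdeg[u], indeg[u]) for u in users}

def ccdf(values):
    """Empirical complementary CDF: fraction of values >= k, for each observed k."""
    n = len(values)
    return {k: sum(v >= k for v in values) / n for k in sorted(set(values))}

edges = build_edges(threads)
deg = degrees(edges)
indegree_ccdf = ccdf([ind for _, ind in deg.values()])

for user in sorted(deg):
    print(user, "outdegree:", deg[user][0], "indegree:", deg[user][1])
print(indegree_ccdf)
```

On real data the CCDF is usually drawn on log-log axes, where a heavy tail shows up as a slowly decaying curve.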

Page 12

Characterizing YA: Ego Networks

• Distinguishing an answer person from a discussion person in an online forum with an ego network (Welser 2007)

• An ego network consists of a focal node ("ego"), the nodes to which the ego is directly connected (called "alters"), and the ties among the alters, if any

[Figure panels: Programming, Marriage, Wrestling]
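Extracting an ego network from an edge list is a two-step filter: find the alters, then keep every tie whose endpoints are both the ego or an alter. A minimal sketch, with a made-up edge list and an illustrative function name:

```python
def ego_network(edges, ego):
    """Return (alters, ties) for the given ego.

    alters: nodes directly connected to ego in either direction.
    ties:   all edges among ego and its alters, including alter-alter ties.
    """
    alters = {v for u, v in edges if u == ego} | {u for u, v in edges if v == ego}
    alters.discard(ego)  # ignore self-loops
    nodes = alters | {ego}
    ties = {(u, v) for u, v in edges if u in nodes and v in nodes}
    return alters, ties

# Hypothetical directed edges between users.
edges = {("B", "A"), ("C", "A"), ("B", "C"), ("C", "D"), ("D", "E")}

alters, ties = ego_network(edges, "A")
print(sorted(alters))
print(sorted(ties))
```

Here D and E fall outside A's ego network because neither is directly tied to A, while the B-C tie is kept because both endpoints are alters.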

Page 13

Characterizing YA: Motif Analysis

• How often interactions are reciprocal (the asker becomes the replier for another question)

• How often the triads are complete (three users who have all replied to one another)
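Both quantities can be counted directly from a directed edge list. The sketch below is one possible reading, not the authors' procedure: reciprocity is the fraction of edges whose reverse edge also exists, and a triad counts as complete only if all three pairs are linked in both directions. The edge list is made up.

```python
from itertools import combinations

def reciprocity(edges):
    """Fraction of directed edges (u, v) for which (v, u) is also present."""
    edges = set(edges)
    if not edges:
        return 0.0
    return sum((v, u) in edges for u, v in edges) / len(edges)

def complete_triads(edges):
    """Triples of users in which every pair is linked in both directions."""
    edges = set(edges)
    nodes = sorted({u for e in edges for u in e})
    return [
        trio
        for trio in combinations(nodes, 3)
        if all((a, b) in edges and (b, a) in edges
               for a, b in combinations(trio, 2))
    ]

# Hypothetical edges: A, B, C all interact mutually; D only points at A.
edges = [
    ("A", "B"), ("B", "A"),
    ("B", "C"), ("C", "B"),
    ("A", "C"), ("C", "A"),
    ("D", "A"),
]

print(reciprocity(edges))
print(complete_triads(edges))
```

Enumerating all triples is cubic in the number of users, so this form only suits small graphs; a real analysis would iterate over neighbors instead.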

Page 14

Characterizing YA: Expertise Depth

• Rate 100 random questions from the Programming category on 5 levels of expertise

• Outcome: only one question (1%) requires expertise above level 3

YA is very broad but not very deep

Page 15

Expertise/Knowledge Across Categories

People who answer questions in one category are likely to answer questions in related categories

Page 16

Expertise/Knowledge Across Categories

Overlap in users who answered in one category (rows) and asked in another (columns)

Page 17

User Entropy: Category Dispersion

• Goal: measure how focused a user's answers are. Entropy is such a measure: the more concentrated a person’s answers, the lower the entropy and the higher the focus.

• For example, one user is a dog trainer and all her answers are in the Dog subcategory, so her entropy is 0.

• Another user's 40 answers are scattered among 17 of the 25 top-level categories and 26 subcategories. He posted no more than 4 answers in any one category, and his combined 2-level entropy is 5.75.

Page 18

User Entropy: Category Dispersion

• Calculate entropy at each level and then do a summation

Level 1: $H_1 = -0.3 \log(0.3) - 0.7 \log(0.7) \approx 0.61$

Level 2: $H_2 = -0.2 \log(0.2) - 0.1 \log(0.1) - 0.7 \log(0.7) \approx 0.80$

Total: $H_T = H_1 + H_2 \approx 1.41$

(natural logarithm)
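The two-level calculation can be checked with a few lines of Python. This is a sketch under stated assumptions: natural logarithms (which give the 0.61 at level 1), and a hypothetical user with 10 answers split 2/1/7 across three subcategories so that the level-1 shares are 0.3 and 0.7. Category names are placeholders.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (natural log) of a collection of category counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    # p * log(1/p) summed over categories, with p = c / total
    return sum((c / total) * math.log(total / c) for c in counts)

def two_level_entropy(answers):
    """answers: list of (top_level, subcategory) pairs, one per answer.
    Returns (H1, H2, H1 + H2)."""
    h1 = entropy(Counter(top for top, _ in answers).values())
    h2 = entropy(Counter(answers).values())
    return h1, h2, h1 + h2

# Hypothetical user: level-1 shares 0.3 / 0.7, level-2 shares 0.2 / 0.1 / 0.7.
answers = [("X", "x1")] * 2 + [("X", "x2")] * 1 + [("Y", "y1")] * 7
h1, h2, total = two_level_entropy(answers)
print(f"H1={h1:.2f}  H2={h2:.2f}  HT={total:.2f}")

# A user whose answers all fall in one subcategory has entropy 0.
focused = two_level_entropy([("Pets", "Dogs")] * 25)
print(focused)
```

Evaluated this way, the level-1 entropy is about 0.61, the level-2 entropy about 0.80, and their sum about 1.41.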

Page 19

Category Dispersion vs. Best Answers

Page 20

Predicting Best Answers

Page 21

Questions in, Knowledge iN?: A Study of Naver's Question Answering Community

Kevin K. Nam, Mark S. Ackerman, Lada A. Adamic

University of Michigan

CHI 2009

Page 22

Interactions in Naver KiN

Page 23

Research Methods

• Naver dataset (via crawling):
– Expertise score
– Reward vs. answers
– Patterns of participation (intensity, active periods)

• Focused interviews (26 KiN users):
– Motivation for participation
– Allocation of expertise

Page 24

Patterns of Participation

• Those who ask don’t answer

• Top answerers’ (called gurus) z-scores
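The slide does not define the z-score. One definition used in earlier expertise-network work by Zhang, Ackerman, and Adamic is z = (a − q) / √(a + q), where a is the number of answers and q the number of questions a user posted; it measures how far the user departs from asking and answering equally often. The sketch below assumes that definition applies here, and the example counts are made up.

```python
import math

def z_score(num_answers, num_questions):
    """z = (a - q) / sqrt(a + q); assumed definition, see the note above.
    Positive for users who mostly answer, negative for users who mostly ask."""
    n = num_answers + num_questions
    if n == 0:
        return 0.0  # no activity: treat as neutral
    return (num_answers - num_questions) / math.sqrt(n)

# Hypothetical users.
guru = z_score(50, 0)       # answers only
balanced = z_score(10, 10)  # asks as often as answers
asker = z_score(0, 16)      # asks only

print(round(guru, 2), balanced, asker)
```

Under this definition a user who only answers gets a z-score that grows with the square root of their activity, which is why heavy answerers stand out.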

Page 25

Motivation for Participation

• Altruism and helping others

– “Since I was a doctor, I was browsing the medical directories [in KiN]. I found a lot of wrong answers and information, and was afraid they would cause problems. So I thought I’d contribute in fixing it hoping that it’d be good for the society. [Sangmin]”

– “I try to answer so that regular people can share knowledge, rather than technical knowledge. ...Someone needs it, and I have the ability to do it, and it’ll be a service to society. [Mirae]”

• Business motives

– “I’ve been working as an insurance agent for 9 years. I started answering in Knowledge-iN as part of my business activity. In the evening, I answered questions to solicit potential clients.... So when I’d leave an answer, I’d say I would meet with you face-to-face to talk about more details and give you advice. [Taein]”

– Two interviewees stated that they had originally started on Naver to gain clients, but they found it to be less valuable than they had hoped. Instead, they stayed as a hobby and for altruistic reasons.

Page 26

Motivation for Participation

• Learning

– “My first intention [in answering] was to organize and review my knowledge and practice it by explaining it to others. [Taein]”

– “Answering questions helps me study. I can learn from answering [in Translation]. I get to review what I used to know such as vocabularies and idioms. [Minhyuk]”

• Hobby and personal competence

– “Yes [I answer every day]. I am addicted (laughs). [Nami]”

Page 27

Motivation for Participation

• Points

– “I don’t care about the points. [but] It’s fun to see points accumulate and my character level up [increase to the next level]. [Jeyeon]”

– “Usually questions w/ points do not seem frivolous. I feel like answering questions with points, not because of the points, but because those questions are more detailed and seek realistic help.”

Higher points elicit more answers

Point bounty for best answers

[Figures: Law category; x-axis: points posted]

Page 28

Allocation of Expertise

• Knowledge level and quality of Naver KiN:
– Useful for getting information on commonsense knowledge, current events, basic domain knowledge, advice and recommendations from people, and diverse opinions
– But users look to Internet cafes for more detailed/expert information

• Why?
– Answerers try to cover as many questions as possible (and earn points): time pressure
– Some minimize their effort on answering; still, others are willing to answer questions “slightly” beyond their expertise
– Other factors: lack of detailed information in the question, lack of sense of community

Page 29

Allocation of Expertise

• Intermittent participation

Weekly activity levels of a user

Weekly contributions averaged over users who posted > 100 answers and became active more than a year prior to the crawl.

Inactive periods are due to family obligations, loss of Internet access, etc.

Page 30

Summary

• YA/KiN:
– Participation patterns:
  • Facts vs. discussion, heavy-tailed in/out-degree distributions
  • Intermittent participation (due to personal reasons)
– Lower expertise level of answers (due to lack of motivation, lack of sense of community, etc.)
– Answers mostly focus on a few categories
– YA: best answers tend to be lengthy; KiN: best answers are generally located at the last answer position (or second to last)
– Motivation (KiN): altruism, business, learning, hobby and personal competence, points, etc.