Introduction to Q&A systems Yahoo! Answer and Naver KiN KSE 801 Uichin Lee


Online Q&A Systems

Baidu Knows

Naver Knowledge iN Naver Mobile Q&A

Closed in Sept 2011

Naver Knowledge iN

• Largest search engine in Korea - 70% of search (Google: 2%)

• Comprehensive portal – integrated news, blogs, ‘knowledge search’

• “Knowledge search is like oozing out knowledge in human brains to the Internet. People who know something better than others can present their know-how, skills or knowledge” --- NHN CEO Chae Hwi-young

• Knowledge-In had 60 million questions and answers as of Feb 2007

Slide from http://www.nd.edu/~netsci/TALKS/Adamic.pdf

Culture of Generosity

• “(It is) the next generation of search… (it) is a kind of collective brain -- a searchable database of everything everyone knows. It's a culture of generosity. The fundamental belief is that everyone knows something.” -- Eckart Walther (Yahoo Research)

Slide from http://www.nd.edu/~netsci/TALKS/Adamic.pdf

Knowledge Sharing and Yahoo Answers:

Everyone Knows Something

Lada A. Adamic, Jun Zhang, Eytan Bakshy, Mark S. Ackerman

University of Michigan

WWW 2008

Original slides from http://140.116.246.92:5000/2008_WebIR_ppt/06.ppt

What is Yahoo Answers (YA)?

• A large and diverse question-answer forum.

• YA has 25 top-level and 1002 (continually expanding) lower-level categories.

• Ask: thread (title) and content.

• Best Answers: answers rated best by the asker or voted best by YA users.

Yahoo! Answer Dataset

• One month of YA activity

• 8,452,337 answers to 1,178,983 questions

– 433,402 unique repliers

– 495,414 unique askers

– 211,372 users both asked and replied

Characterizing YA: Posts/Replies

• Thread length (# replies per post) vs. post length (how verbose the answers are), with some categories named

– Factual (programming, chemistry, physics): few but lengthy replies (less interactive)

– Discussion: many replies of moderate length (more interactive)

Characterizing YA: Category Clusters

• Grouped the most active categories (>1000 posted questions) into 3 clusters using k-means clustering on three metrics: thread length, content length, and asker/replier overlap

• 189 categories (91% of questions)

– (GREEN) Discussion: Politics, Sport, Religion

– (BLUE) Advice/Opinion: Fashion, Baby names, Fast Food, Dogs/Cats

– (RED) Facts: Biology, Programming, Repairs
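The clustering step can be sketched as follows. This is only an illustration, not the authors' code: the per-category numbers are invented, the columns are standardized first (an assumption, since the slide does not say how the metrics were scaled), and the starting centroids are fixed by hand so the run is deterministic.

```python
import math

def standardize(rows):
    """Rescale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def kmeans(points, init, iters=100):
    """Lloyd's algorithm; `init` lists the indices of the starting centroids."""
    centroids = [list(points[i]) for i in init]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

# Hypothetical values: (thread length, content length, asker/replier overlap)
categories = {
    "Politics":    (12.0, 300, 0.40),
    "Religion":    (11.0, 320, 0.45),
    "Baby names":  (9.0, 80, 0.15),
    "Fast Food":   (8.0, 90, 0.12),
    "Programming": (2.5, 700, 0.03),
    "Biology":     (3.0, 650, 0.04),
}
names = list(categories)
labels = kmeans(standardize(list(categories.values())), init=[0, 2, 4])
for name, label in zip(names, labels):
    print(name, label)
```

With these made-up values the categories pair up along the discussion, advice, and facts split described above; real data would need many more categories and several random restarts.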

Characterizing YA: Network Structure

Thread 1: Large Data, binary search or hashtable? (user A)

Re: Large... (user B)

Re: Large... (user C)

Thread 2: Binary file with ASCII data (user A)

Re: File with... (user C)

[Figure: the resulting network over nodes A, B, and C; node labels in the figure: "Outdegree: 1, Indegree: 0", "Outdegree: 0, Indegree: 2", "Outdegree: 1, Indegree: 0"]

[Figure: two degree-distribution plots; y-axis of each: CCDF, Prob(Degree >= k)]

• Heavy tailed distribution, yet diverse across different categories
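A complementary cumulative distribution function (CCDF) of this kind, Prob(Degree >= k), can be computed from a list of node degrees. The sketch below uses a made-up degree list; the function name `ccdf` is only illustrative.

```python
from collections import Counter

def ccdf(degrees):
    """Return sorted (k, P(degree >= k)) pairs for the observed degree values."""
    n = len(degrees)
    counts = Counter(degrees)
    result = []
    remaining = n  # number of nodes whose degree is >= the current k
    for k in sorted(counts):
        result.append((k, remaining / n))
        remaining -= counts[k]
    return result

# Hypothetical degrees for eight users
degrees = [0, 0, 1, 1, 1, 2, 3, 10]
for k, p in ccdf(degrees):
    print(k, p)
```

Plotting these pairs on log-log axes is the usual way to inspect a heavy tail: a few users with very large degree keep the curve from dropping off quickly.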

Characterizing YA: Ego Networks

• Distinguishing the "answer person" from the "discussion person" in online forums with an ego network (Welser 2007)

• An ego network consists of a focal node ("ego"), the nodes to which ego is directly connected (called "alters"), and the ties among the alters, if any

[Figure: example ego networks from the Programming, Marriage, and Wrestling categories]
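The ego-network definition can be sketched in a few lines. The edge list and the function name `ego_network` are hypothetical, and edges are treated as undirected here for simplicity.

```python
def ego_network(edges, ego):
    """Return (alters, ties) for `ego`, treating `edges` as undirected.

    alters: nodes directly connected to ego.
    ties:   edges whose two endpoints are both alters.
    """
    alters = set()
    for u, v in edges:
        if u == ego and v != ego:
            alters.add(v)
        elif v == ego and u != ego:
            alters.add(u)
    ties = {frozenset((u, v)) for u, v in edges
            if u in alters and v in alters and u != v}
    return alters, ties

# Hypothetical reply network
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
alters, ties = ego_network(edges, "A")
print(sorted(alters), [sorted(t) for t in ties])
```

An ego whose alters have few ties among themselves looks like a hub answering many unrelated askers; dense ties among alters suggest a discussion-style neighborhood.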

Characterizing YA: Motif Analysis

• How often interactions are reciprocal (the asker becomes the replier for another question)

• How often the triads are complete (three users who have all replied to one another)
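Both counts can be sketched on a directed edge list. The example assumes edges point from asker to replier and treats a triad as complete only when every pair has replied to each other in both directions; both choices are assumptions made for illustration, and the data is invented.

```python
from itertools import combinations

def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse (v, u) also exists."""
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0
    mutual = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)

def complete_triads(edges):
    """Triples of users in which every pair is linked in both directions."""
    edge_set = {(u, v) for u, v in edges if u != v}
    nodes = sorted({n for e in edge_set for n in e})

    def mutual(a, b):
        return (a, b) in edge_set and (b, a) in edge_set

    return [t for t in combinations(nodes, 3)
            if all(mutual(a, b) for a, b in combinations(t, 2))]

# Hypothetical asker -> replier edges
edges = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B"),
         ("A", "C"), ("C", "A"), ("A", "D")]
print(reciprocity(edges))
print(complete_triads(edges))
```

The triple loop over all node combinations is fine for a toy graph but would be too slow for the full dataset, where a motif-counting library or neighbor-set intersections would be the practical route.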

Characterizing YA: Expertise Depth

• Rated 100 random questions from the Programming category on 5 levels of expertise

• Outcome: only one question (1%) requires expertise above level 3

YA is very broad but not very deep

Expertise/Knowledge Across Categories

• People who answer questions in one category are likely to answer questions in related categories

Expertise/Knowledge Across Categories

Overlap in users who answered in one category (rows) and asked in another (columns)

User Entropy: Category Dispersion

• Goal: measure how focused a user's answers are. Entropy is such a measure: the more concentrated a person’s answers, the lower the entropy and the higher the focus.

• For example, one user is a dog trainer, and all her answers are in the Dog subcategory. Her entropy is therefore 0.

• Another user's 40 answers are scattered among 17 of the 25 top-level categories and 26 subcategories. He posted no more than 4 answers in any one category, and his combined 2-level entropy is 5.75.

User Entropy: Category Dispersion

• Calculate entropy at each level and then do a summation

Level 1: H1 = −0.3 log(0.3) − 0.7 log(0.7) ≈ 0.61

Level 2: H2 = −0.2 log(0.2) − 0.1 log(0.1) − 0.7 log(0.7) ≈ 0.80

Total: HT = H1 + H2 ≈ 1.41

(log is the natural logarithm)
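The two-level calculation can be sketched as below. The natural logarithm is assumed because it reproduces the Level 1 value on the slide; the category names and the helper `two_level_entropy` are invented for illustration, with answer counts chosen to give the slide's proportions (0.3/0.7 at the top level, 0.2/0.1/0.7 at the subcategory level).

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (natural log) of a collection of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def two_level_entropy(answers):
    """answers: list of (top_level, subcategory) pairs, one per answer."""
    level1 = Counter(top for top, _ in answers)  # counts per top-level category
    level2 = Counter(answers)                    # counts per subcategory
    return entropy(level1.values()) + entropy(level2.values())

# Hypothetical user with 10 answers
answers = ([("Pets", "Dogs")] * 2
           + [("Pets", "Cats")] * 1
           + [("Science", "Physics")] * 7)
print(round(two_level_entropy(answers), 2))

# A user who answers in a single subcategory has zero entropy at both levels.
print(two_level_entropy([("Pets", "Dogs")] * 5) == 0)
```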

Category Dispersion vs. Best Answers

Predicting Best Answers

Questions in, Knowledge iN?: A Study of Naver's Question Answering Community

Kevin K. Nam, Mark S. Ackerman, Lada A. Adamic

University of Michigan

CHI 2009

Interactions in Naver KiN

Research Methods

• Naver dataset (via crawling):

– Expertise score

– Reward vs. answers

– Patterns of participation (intensity, active periods)

• Focused interviews (26 KiN users):

– Motivation for participation

– Allocation of expertise

Patterns of Participation

• Those who ask don’t answer

• Top answerers’ (called gurus) z-scores

Motivation for Participation

• Altruism and helping others

– “Since I was a doctor, I was browsing the medical directories [in KiN]. I found a lot of wrong answers and information, and was afraid they would cause problems. So I thought I’d contribute in fixing it hoping that it’d be good for the society. [Sangmin]”

– “I try to answer so that regular people can share knowledge, rather than technical knowledge. ...Someone needs it, and I have the ability to do it, and it’ll be a service to society. [Mirae]”

• Business motives

– “I’ve been working as an insurance agent for 9 years. I started answering in Knowledge-iN as part of my business activity. In the evening, I answered questions to solicit potential clients.... So when I’d leave an answer, I’d say I would meet with you face-to-face to talk about more details and give you advice. [Taein]”

– Two interviewees stated that they had originally started on Naver to gain clients, but they found it to be less valuable than they had hoped. Instead, they stayed as a hobby and for altruistic reasons.

Motivation for Participation

• Learning

– “My first intention [in answering] was to organize and review my knowledge and practice it by explaining it to others. [Taein]”

– “Answering questions helps me study. I can learn from answering [in Translation]. I get to review what I used to know such as vocabularies and idioms. [Minhyuk]”

• Hobby and personal competence

– “Yes [I answer everyday]. I am addicted (laughs). [Nami]”

Motivation for Participation

• Points

– “I don’t care about the points. [but] It’s fun to see points accumulate and my character level up [increase to the next level]. [Jeyeon]”

– “Usually questions with points do not seem frivolous. I feel like answering questions with points, not because of the points, but because those questions are more detailed and seek realistic help.”

[Figures, Law category: "Higher points elicit more answers" and "Point bounty for best answers"; x-axis of both: points posted]

Allocation of Expertise

• Knowledge level and quality of Naver KiN:

– Useful for getting information on commonsense knowledge, current events, basic domain knowledge, advice and recommendations from people, and diverse opinions

– But users look to Internet cafes for more detailed or expert information

• Why?

– Just to cover as many questions as possible (and earn points): time pressure

– Minimizing their effort on answering; still others are willing to answer questions “slightly” beyond their expertise

– Other factors: lack of detailed information in the question, lack of sense of community

Allocation of Expertise

• Intermittent participation

Weekly activity levels of a user

Weekly contributions averaged over users who posted > 100 answers and became active more than a year prior to the crawl.

Inactive periods are due to family obligations, loss of Internet access, etc.

Summary

• YA/KiN:

– Participation patterns:

• Facts vs. discussion, heavy-tailed in/out-degree distributions

• Intermittent participation (due to personal reasons)

– Lower expertise level of answers (due to lack of motivation, lack of sense of community, etc.)

– Answers mostly focus on a few categories

– YA: best answers tend to be lengthy; KiN: best answers are generally located at the last answer position (or second to last)

– Motivation (KiN): altruism, business, learning, hobby and personal competence, points, etc.
