Wikipedia Article Curation: Understanding Quality, Recommending Tasks
Morten Warncke-Wang en:User:Nettrom / nettrom@twitter
2014-08-20
Recommending Tasks
Step 1: Interest profiling
Step 2: Find similar articles
Step 3: Filtering
Want: Tasks
Photo: "Gold Pan" by Nate Cull – CC BY
Step 4: Presentation
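The slides name the four steps but not how each one works. Below is a minimal Python sketch of the pipeline under assumptions that are not taken from the slides: the interest profile is simply the set of articles a contributor has edited (weighted by edit count), similarity is approximated by counting links out of profile articles, and a hypothetical `needs_work` dictionary stands in for whatever maintenance flags drive the filtering step. None of the function names come from SuggestBot.

```python
from collections import Counter

# Hypothetical sketch of the four steps; not SuggestBot's actual code.

def interest_profile(edit_history):
    """Step 1 (assumed): weight each edited article by how often it was edited."""
    return Counter(edit_history)

def similar_articles(profile, links, top_n=10):
    """Step 2 (assumed): score articles outside the profile by links from it."""
    scores = Counter()
    for article, weight in profile.items():
        for target in links.get(article, []):
            if target not in profile:
                scores[target] += weight
    return [title for title, _ in scores.most_common(top_n)]

def filter_tasks(candidates, needs_work):
    """Step 3 (assumed): keep only articles flagged as needing work."""
    return [title for title in candidates if title in needs_work]

def present(tasks, needs_work):
    """Step 4: pair each suggested article with the reason it needs work."""
    return [f"{title}: {needs_work[title]}" for title in tasks]

links = {"Oslo": ["Norway", "Akershus", "Bergen"], "Bergen": ["Norway", "Hordaland"]}
needs_work = {"Akershus": "stub", "Hordaland": "needs citations"}
profile = interest_profile(["Oslo", "Oslo", "Bergen"])
print(present(filter_tasks(similar_articles(profile, links), needs_work), needs_work))
```

Here a contributor who edited `Oslo` twice and `Bergen` once is offered `Akershus` before `Hordaland`, because more of their editing points at it; `Norway` scores highest but is dropped in the filtering step since nothing flags it as needing work.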
MOAR INFORMATION!
What information?
» Idea: Show viewership & quality
» Will contributors work on popular, low-quality articles first?
» Needs data:
  • Article viewership
  • Article quality
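One way to act on "popular, low-quality articles first" is a two-key sort. The sketch assumes viewership and an assessment class are already known for each candidate; the class ranking, the tie-breaking rule, and the `Article` type are illustrative choices not taken from the deck, and the view counts are made up.

```python
from dataclasses import dataclass

# Assumed ordinal scale for en-WP assessment classes, lowest (0) to highest (6).
CLASS_RANK = {"Stub": 0, "Start": 1, "C": 2, "B": 3, "GA": 4, "A": 5, "FA": 6}

@dataclass
class Article:
    title: str
    daily_views: int
    quality: str  # one of the CLASS_RANK keys

def prioritize(articles):
    """Lowest quality first; within a class, the most-viewed article first."""
    return sorted(articles, key=lambda a: (CLASS_RANK[a.quality], -a.daily_views))

# Made-up numbers for illustration only.
candidates = [
    Article("Akershus", 40, "Start"),
    Article("Hordaland", 500, "Stub"),
    Article("Finnmark", 60, "Stub"),
    Article("Norway", 9000, "B"),
]
print([a.title for a in prioritize(candidates)])
# ['Hordaland', 'Finnmark', 'Akershus', 'Norway']
```

A lexicographic sort never lets a heavily viewed Start-class article outrank an obscure stub; a weighted score combining views and quality would trade the two off instead.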
Viewership
» Not readily available to contributors
» Readily available to us:
  • Wikimedia Foundation data dumps
  • stats.grok.se
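Turning raw dumps into per-article viewership is mostly aggregation. The sketch assumes the space-separated layout of the hourly page-count files (project code, page title, view count, bytes served); summing the 24 hourly files for one day would give daily totals. The function name and parameters are illustrative, not part of any Wikimedia tool.

```python
from collections import defaultdict

def total_views(lines, project="en", titles=None):
    """Sum view counts per title from pagecounts-style dump lines.

    Assumes each line reads 'project title views bytes'; malformed lines
    and lines from other projects are skipped.
    """
    totals = defaultdict(int)
    for line in lines:
        parts = line.split()
        if len(parts) != 4 or parts[0] != project or not parts[2].isdigit():
            continue
        _, title, views, _ = parts
        if titles is None or title in titles:
            totals[title] += int(views)
    return dict(totals)

sample = ["en Oslo 12 3456", "en Oslo 8 2000", "de Oslo 5 100", "en Bergen 3 900"]
print(total_views(sample))  # {'Oslo': 20, 'Bergen': 3}
```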
Quality
» More easily available:
  • Wikipedia assessments on the talk page
  • Contributor’s own judgement
» Assessment might be lagging or absent
» Contributor judgement requires experience
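Reading the talk-page assessment programmatically is straightforward when it exists. The sketch assumes assessments appear as a `class=` parameter in WikiProject banner templates (the usual English Wikipedia convention) and recognises only the seven article classes; an empty result corresponds to an unassessed article. Because several banners can disagree, it returns every class it finds rather than picking one. Names are hypothetical.

```python
import re

# Matches the class= parameter of WikiProject banners, e.g.
# {{WikiProject Norway|class=Start|importance=Mid}}
CLASS_PARAM = re.compile(r"\|\s*class\s*=\s*([A-Za-z]+)", re.IGNORECASE)

# The seven article classes; other values (e.g. "List") are ignored.
KNOWN_CLASSES = {"fa": "FA", "a": "A", "ga": "GA", "b": "B",
                 "c": "C", "start": "Start", "stub": "Stub"}

def assessed_classes(talk_wikitext):
    """Return every recognised class found on a talk page; [] means unassessed."""
    found = []
    for raw in CLASS_PARAM.findall(talk_wikitext):
        cls = KNOWN_CLASSES.get(raw.lower())
        if cls is not None:
            found.append(cls)
    return found

talk = ("{{WikiProject Norway|class=Start|importance=Mid}}\n"
        "{{WikiProject Cities|class=c}}\n"
        "{{WikiProject Music|class=}}\n")
print(assessed_classes(talk))  # ['Start', 'C']
```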
Problem: Up-to-date Quality
Twist: Actionable Quality
Solution: Machine Learning?
ML Issue: Feature Selection
» Typical features are difficult to change:
  • Editor diversity
  • Meta-features
Photo: Oregon Dept of Transportation – CC BY
Our Approach: Actionable features
» Use features editors can act upon
» Originally 5 main features:
  • Amount of article content
  • Number of citations
  • Number of images
  • Number of wikilinks
  • Number of article sections
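A rough sketch of how the five features might be counted from an article's wikitext. The regular expressions are approximations introduced for illustration: the slides do not say how the features were extracted, or whether content is measured in characters, bytes or words (here: raw characters of wikitext, markup included). Reused named references are counted each time they occur.

```python
import re

# Regex approximations of the five actionable features. Real wikitext
# (templates, comments, nested links) is messier than these patterns assume.
REF = re.compile(r"<ref\b", re.IGNORECASE)  # <ref>...</ref> and <ref name=... />
IMAGE = re.compile(r"\[\[\s*(?:File|Image)\s*:", re.IGNORECASE)
LINK = re.compile(r"\[\[(?!\s*(?:File|Image|Category)\s*:)", re.IGNORECASE)
HEADING = re.compile(r"^(={2,})[^=].*?\1\s*$", re.MULTILINE)

def actionable_features(wikitext):
    """Count the five features in an article's wikitext (rough sketch)."""
    return {
        "content_length": len(wikitext),  # crude proxy: raw characters
        "citations": len(REF.findall(wikitext)),
        "images": len(IMAGE.findall(wikitext)),
        "wikilinks": len(LINK.findall(wikitext)),
        "sections": len(HEADING.findall(wikitext)),
    }

text = (
    "'''Oslo''' is the capital of [[Norway]].<ref>Source A</ref>\n"
    "== History ==\n"
    "Founded around [[1040]].<ref name=\"b\">Source B</ref>\n"
    "[[File:Oslo.jpg|thumb|The city]]\n"
    "=== Modern era ===\n"
    "See also [[Bergen|another city]].<ref name=\"b\" />\n"
    "[[Category:Capitals in Europe]]\n"
)
print({k: v for k, v in actionable_features(text).items() if k != "content_length"})
# {'citations': 3, 'images': 1, 'wikilinks': 3, 'sections': 2}
```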
Does It Work?
Seven classes, difficult problem
» Seven assessment classes on en-WP:
  • Featured Article
  • A-class
  • Good Article
  • B-class
  • C-class
  • Start-class
  • Stub-class
» Unclear boundaries between them
Yardstick: Random guesses
» Random: 14.3%
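The 14.3% yardstick follows from the number of classes: a guess drawn uniformly from the seven classes matches the true class with probability 1/7, whatever the distribution of true classes, because those probabilities sum to one.

```latex
P(\text{correct}) = \sum_{c=1}^{7} P(\text{true class} = c)\cdot\frac{1}{7} = \frac{1}{7} \approx 0.143 = 14.3\%
```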
Actionable Model Performance
» Random: 14.3%
» Actionable model: 42.5%
Often off by one class
» Random: 14.3%
» Actionable model: 42.5%
» Actionable model (off-by-one): 76.9%
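The off-by-one figure presumably counts a prediction as acceptable when it lands in the assessed class or an adjacent one. Treating the seven classes as an ordered scale lets one function compute both numbers; the class order follows the en-WP list above, while the function name and the example labels are hypothetical.

```python
# The seven en-WP classes, ordered from highest to lowest quality.
CLASSES = ["FA", "A", "GA", "B", "C", "Start", "Stub"]
RANK = {name: i for i, name in enumerate(CLASSES)}

def accuracy(assessed, predicted, tolerance=0):
    """Share of predictions within `tolerance` classes of the assessed class.

    tolerance=0 gives plain accuracy; tolerance=1 gives off-by-one accuracy.
    """
    if not assessed or len(assessed) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(abs(RANK[a] - RANK[p]) <= tolerance for a, p in zip(assessed, predicted))
    return hits / len(assessed)

assessed = ["FA", "B", "C", "Stub"]       # hypothetical assessed classes
predicted = ["GA", "B", "Start", "Stub"]  # hypothetical model output
print(accuracy(assessed, predicted), accuracy(assessed, predicted, tolerance=1))  # 0.5 0.75
```

Treating the classes as evenly spaced ranks is itself an assumption: a GA-to-B step and a Start-to-Stub step count the same.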
Situation Report
» Few features, much information
» Missing some aspects of information quality
Show Me The Information!
MOAR BETTER!
Improved Information
» Viewership: numbers?
» Quality: assessments and predictions?
» Turn quality features into tasks?
» Make it sortable?
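The open questions on this slide can be made concrete. The sketch turns each actionable feature that falls short of a target into a task and makes the resulting rows sortable by any column. The targets, task wording, and the `Suggestion` type are hypothetical placeholders; the deck does not say how features would map to tasks.

```python
from dataclasses import dataclass
from operator import attrgetter

# Hypothetical targets for four of the actionable features; the deck does
# not give any thresholds.
TARGETS = {"citations": 10, "images": 2, "wikilinks": 20, "sections": 4}
TASK_TEXT = {
    "citations": "Add sources",
    "images": "Add images",
    "wikilinks": "Add links to other articles",
    "sections": "Add sections",
}

@dataclass
class Suggestion:
    title: str
    daily_views: int
    predicted_class: str
    tasks: list

def tasks_for(features):
    """One task per feature below its (hypothetical) target, in TARGETS order."""
    return [TASK_TEXT[name] for name, target in TARGETS.items()
            if features.get(name, 0) < target]

def sort_suggestions(suggestions, field="daily_views", descending=True):
    """Sort by any Suggestion attribute, as a sortable table column would."""
    return sorted(suggestions, key=attrgetter(field), reverse=descending)

rows = [
    Suggestion("Hordaland", 300, "Stub", tasks_for({"citations": 1, "sections": 1})),
    Suggestion("Akershus", 900, "C",
               tasks_for({"citations": 12, "images": 0, "wikilinks": 30, "sections": 5})),
]
for row in sort_suggestions(rows):
    print(row.title, row.daily_views, row.predicted_class, row.tasks)
```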
Quality: Low, Assessed class: Unassessed, Predicted class: Stub
In summary…
» SuggestBot: recommending tasks to contributors
» Quality:
  • Actionable features
  • Predicting article quality
  • Suggesting improvement tasks
Acknowledgements
» NSF grants IIS 08-08692, 09-68483, 08-45351
» WMF and Wikimedia Deutschland
» GroupLens Research
Questions?
Wikipedia: User:Nettrom
Twitter: @nettrom
Web: http://www-users.cs.umn.edu/~morten/
GroupLens: http://www.grouplens.org/