QUALITY AND COLLABORATION
IN WIKIDATA
Elena Simperl and
Alessandro Piscopo
University of Southampton, UK
@esimperl
OVERVIEW
Wikidata is a critical AI asset in many applications
A recent Wikimedia project (launched in 2012), edited collaboratively
Our research assesses the quality of Wikidata and the link between community processes and quality
WHAT IS WIKIDATA
BASIC FACTS
Collaborative knowledge graph
100k registered users, 35M items
Open licence
RDF exports, connected to Linked Open Data Cloud
THE KNOWLEDGE GRAPH: STATEMENTS, ITEMS, PROPERTIES
Item identifiers start with a Q, property identifiers start with a P
Example statement: Q84 London → P6 head of government → Q334155 Sadiq Khan
THE KNOWLEDGE GRAPH: ITEMS CAN BE CLASSES, ENTITIES, VALUES
Example items: Q7259 Ada Lovelace, Q84 London, Q334155 Sadiq Khan, Q727 Amsterdam, Q515 city, Q6581097 male, Q59360 Labour Party, Q145 United Kingdom
Example property: P6 head of government
THE KNOWLEDGE GRAPH: ADDING CONTEXT TO STATEMENTS
Statements may include context:
Qualifiers (optional)
References (required)
Two types of references:
Internal, linking to another item
External, linking to webpage
Example: Q84 London → P6 head of government → Q334155 Sadiq Khan; qualifier: 9 May 2016; reference: https://www.london.gov.uk/...
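To make the statement structure concrete, here is a minimal Python sketch under stated assumptions: a statement is modelled as subject, property and value plus optional qualifiers and a list of references, and a reference counts as internal if it is an item ID and external if it is an http(s) URL. The class and function names, the qualifier key `date` and the bare domain URL are illustrative placeholders, not part of Wikidata's actual data model or API.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, simplified model of a statement; it is NOT Wikidata's
# actual JSON or RDF data model.
ITEM_ID = re.compile(r"^Q[1-9]\d*$")      # item identifiers start with Q
PROPERTY_ID = re.compile(r"^P[1-9]\d*$")  # property identifiers start with P


@dataclass
class Statement:
    subject: str                                    # item ID, e.g. "Q84"
    prop: str                                       # property ID, e.g. "P6"
    value: str                                      # item ID or literal value
    qualifiers: dict = field(default_factory=dict)  # optional context
    references: list = field(default_factory=list)  # item IDs or URLs

    def __post_init__(self) -> None:
        # Reject identifiers that do not follow the Q/P prefix convention.
        if not ITEM_ID.match(self.subject):
            raise ValueError(f"not an item ID: {self.subject}")
        if not PROPERTY_ID.match(self.prop):
            raise ValueError(f"not a property ID: {self.prop}")


def reference_kind(ref: str) -> str:
    """Classify a reference as internal (another item) or external (a web page)."""
    if ITEM_ID.match(ref):
        return "internal"
    if ref.startswith(("http://", "https://")):
        return "external"
    return "unknown"


# The London example; the qualifier key "date" and the bare domain URL are
# placeholders, not the real Wikidata qualifier property or reference URL.
london = Statement(
    subject="Q84",    # London
    prop="P6",        # head of government
    value="Q334155",  # Sadiq Khan
    qualifiers={"date": "9 May 2016"},
    references=["https://www.london.gov.uk/"],
)
print([reference_kind(r) for r in london.references])  # ['external']
```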
THE KNOWLEDGE GRAPH: CO-EDITED BY BOTS AND HUMANS
Human editors can register or work anonymously
Bots are created by the community for routine tasks
OUR WORK
Influence of community make-up on outcomes
Effects of editing practice on outcomes
Data quality, as a function of its provenance
THE RIGHT MIX OF USERS
Piscopo, A., Phethean, C., & Simperl, E. (2017). What Makes a Good Collaborative Knowledge Graph: Group Composition and Quality in Wikidata. International Conference on Social Informatics, 305-322, Springer.
BACKGROUND
Wikidata editors have varied tenure and interests
Group composition impacts outcomes
Diversity can have multiple effects
Moderate tenure diversity increases outcome quality
Interest diversity leads to increased group productivity
Chen, J., Ren, Y., & Riedl, J. (2010). The effects of diversity on group productivity and member withdrawal in online volunteer groups. In: Proceedings of the 28th International Conference on Human Factors in Computing Systems (CHI ’10), p. 821. ACM Press, New York, USA.
OUR STUDY
Analysed the edit history of items
Used corpus of 5000 items, whose quality has been manually assessed (5 levels)*
Edit history focused on community make-up
Community is defined as set of editors of item
Considered features from group diversity literature and Wikidata-specific aspects
*https://www.wikidata.org/wiki/Wikidata:Item_quality
RESEARCH HYPOTHESES
Activity → Outcome
H1: Bot edits → Item quality
H2: Bot-human interaction → Item quality
H3: Anonymous edits → Item quality
H4: Tenure diversity → Item quality
H5: Interest diversity → Item quality
DATA AND METHODS
Ordinal regression analysis; four models were trained
Dependent variable: quality level (5 levels) of 5000 manually labelled Wikidata items
Independent variables
Proportion of bot edits
Bot-human edit proportion
Proportion of anonymous edits
Tenure diversity: Coefficient of variation
Interest diversity: User editing matrix
Control variables: group size, item age
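Three of the independent variables and one control can be computed directly from an item's edit history. The following is a minimal Python sketch under assumptions: the edit log format is invented, tenure is measured in days, and tenure diversity (coefficient of variation, standard deviation divided by mean) is taken over distinct registered human editors only. The bot-human edit proportion and the interest-diversity matrix are not sketched, because their exact definitions are not given here.

```python
from statistics import mean, pstdev

# Hypothetical edit log for one item: (editor, is_bot, is_anonymous, tenure_days).
# Editor names and tenure values are invented for illustration.
edits = [
    ("ExampleBot", True, False, 900),
    ("Alice", False, False, 1200),
    ("Alice", False, False, 1200),
    ("203.0.113.7", False, True, 0),
    ("Bob", False, False, 30),
]


def item_features(edits):
    """Compute a few community make-up features for one item's edit history."""
    n = len(edits)
    bot_share = sum(1 for _, is_bot, _, _ in edits if is_bot) / n
    anon_share = sum(1 for _, _, is_anon, _ in edits if is_anon) / n

    # Tenure diversity as coefficient of variation (std / mean) over distinct
    # registered human editors; excluding bots and anonymous editors is an
    # assumption of this sketch.
    tenures = {name: t for name, is_bot, is_anon, t in edits
               if not is_bot and not is_anon}
    values = list(tenures.values())
    if values and mean(values) > 0:
        tenure_cv = pstdev(values) / mean(values)
    else:
        tenure_cv = 0.0

    group_size = len({name for name, *_ in edits})  # control variable
    return {"bot_share": bot_share, "anon_share": anon_share,
            "tenure_cv": tenure_cv, "group_size": group_size}


print(item_features(edits))
```

Using the population standard deviation (`pstdev`) rather than the sample one is a design choice of this sketch; either would yield a valid coefficient of variation.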
RESULTS: ALL HYPOTHESES SUPPORTED (H1, H2, H3, H4, H5)
LESSONS LEARNED
The more is not always the merrier
Bot edits are key for quality, but bots and humans together are better
Diversity matters
IMPLICATIONS
Encourage registration
Identify further areas for bot editing
Design effective human-bot workflows
Suggest items to edit based on tenure and interests
LIMITATIONS AND FUTURE WORK
▪ Measures of quality over time required
▪ Sample vs. Wikidata: the sample may not reflect Wikidata as a whole (most items are rated C or lower)
▪ Other group features (e.g., coordination) not considered
▪ No distinction between editing activities (e.g., schema vs. instances, topics)
▪ Different metrics of interest (topics, type of activity)
THE DATA IS AS GOOD AS ITS REFERENCES
Piscopo, A., Kaffee, L. A., Phethean, C., & Simperl, E. (2017). Provenance Information in a Collaborative Knowledge Graph: an Evaluation of Wikidata External References. International Semantic Web Conference, 542-558, Springer.
PROVENANCE IN WIKIDATA
Statements may include context: qualifiers (optional) and references (required)
References are either internal, linking to another item, or external, linking to a webpage
THE ROLE OF PROVENANCE
Wikidata aims to become a hub of references
Data provenance increases trust in Wikidata
Lack of provenance hinders data reuse
The quality of references is as yet unknown
Hartig, O. (2009). Provenance Information in the Web of Data. LDOW, 538.
OUR STUDY
Approach to evaluate quality of external references in Wikidata
Quality is defined by the Wikidata verifiability policy:
Relevant: support the statement they are attached to
Authoritative: trustworthy, up-to-date, and free of bias for supporting a particular statement
Large-scale (the whole of Wikidata)
Comparison of bot- vs. human-contributed references
RESEARCH QUESTIONS
RQ1 Are Wikidata external references relevant?
RQ2 Are Wikidata external references authoritative?
▪I.e., do they match the author and publisher types from the Wikidata policy?
RQ3 Can we automatically detect non-relevant and non-authoritative references?
METHODS: TWO-STAGE MIXED APPROACH
1. Microtask crowdsourcing (RQ1, RQ2)
▪Evaluate relevance & authoritativeness of a reference sample
▪Create training set for machine learning model
2. Machine learning (RQ3)
▪Large-scale reference quality prediction
STAGE 1: MICROTASK CROWDSOURCING
▪3 tasks on CrowdFlower
▪5 workers/task, majority voting
▪Test questions to select workers
Feature | Microtask | Description
Relevance (RQ1) | T1 | Does the reference support the statement?
Authoritativeness (RQ2) | T2 | Choose author type from list
Authoritativeness (RQ2) | T3.A | Choose publisher type from list
Authoritativeness (RQ2) | T3.B | Verify publisher type, then choose sub-type from list
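Aggregating the five judgements per microtask by majority voting can be sketched as follows. This is a minimal Python sketch, assuming that ties (possible in T2 and T3, where workers pick from a list of types) are left unresolved and returned as `None`; how the study actually handled ties is not stated here, and the function name is hypothetical.

```python
from collections import Counter


def majority_vote(answers):
    """Return the answer given by most workers, or None if the top answers tie."""
    counts = Counter(answers).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: no majority answer
    return counts[0][0]


# Five hypothetical worker judgements for one T1 microtask.
print(majority_vote(["yes", "yes", "no", "yes", "no"]))  # yes
```

With five workers and a binary question (T1) a tie cannot occur, so the tie branch only matters for the multiple-choice tasks.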
STAGE 2: MACHINE LEARNING (RQ3)
Compared three algorithms: Naïve Bayes, Random Forest, SVM
Features based on [Lehmann et al., 2012; Potthast et al., 2008]
Baseline: item labels matching (relevance); deprecated domains list (authoritativeness)
Features: URL reference uses, source HTTP code, statement item vector, statement object vector, author activity, subject parent class, property parent class, object parent class, author type, author activity on references
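The two baselines can be sketched as simple rules. This is a minimal Python sketch under assumptions: relevance is approximated by checking whether the item's label occurs (case-insensitively) in the text of the referenced page, and authoritativeness by checking the URL's host against a deprecated-domain list. The list contents, the function names and the choice of which label to match are hypothetical; the study's exact matching rules are not given here. Requires Python 3.9+ for `str.removeprefix`.

```python
from urllib.parse import urlparse

# Hypothetical deprecated-domain list; the study's actual list is not reproduced here.
DEPRECATED_DOMAINS = {"example-deprecated.org"}


def relevance_baseline(item_label: str, page_text: str) -> bool:
    """Predict 'relevant' if the item's label appears in the referenced page."""
    return item_label.casefold() in page_text.casefold()


def authoritativeness_baseline(url: str) -> bool:
    """Predict 'authoritative' unless the URL's host is on the deprecated list."""
    host = (urlparse(url).hostname or "").removeprefix("www.")
    return host not in DEPRECATED_DOMAINS


print(relevance_baseline("London", "Mayor of London: official site"))      # True
print(authoritativeness_baseline("https://www.example-deprecated.org/x"))  # False
```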
DATA
1.6M external references (6% of total); 1.4M of these come from two sources (protein KBs)
83,215 English-language references; sample of 2586 (99% confidence level, 2.5% margin of error)
885 assessed automatically (e.g., links not working or CSV files); the remaining 1701 were assessed by the crowd in T1
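A sample size in this range follows from a standard calculation. The sketch below uses Cochran's formula with finite population correction, n0 = z²·p(1−p)/e² and n = n0 / (1 + (n0 − 1)/N), with the worst case p = 0.5. This is an assumption: the exact procedure and settings the authors used are not stated, and this version gives a figure of roughly 2,570, close to but not identical to the 2586 reported.

```python
import math
from statistics import NormalDist


def sample_size(population: int, confidence: float, margin: float, p: float = 0.5) -> int:
    """Cochran's sample size with finite population correction (p = 0.5 is worst case)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z-score
    n0 = z ** 2 * p * (1 - p) / margin ** 2              # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / population))   # correct for finite N


print(sample_size(83_215, 0.99, 0.025))
```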
RESULTS (CROWDSOURCING): CROWDSOURCING WORKS
▪Trusted workers: >80% accuracy
▪95% of responses from T3.A confirmed in T3.B
Task | No. of microtasks | Total workers | Trusted workers | Workers’ accuracy | Fleiss’ κ
T1 | 1701 references | 457 | 218 | 75% | 0.335
T2 | 1178 links | 749 | 322 | 75% | 0.534
T3.A | 335 web domains | 322 | 60 | 66% | 0.435
T3.B | 335 web domains | 239 | 116 | 68% | 0.391
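Fleiss’ κ measures agreement among a fixed number of raters per item beyond what chance would produce: κ = (P̄ − P_e) / (1 − P_e), where P̄ is the mean observed agreement per item and P_e the agreement expected from the overall category proportions. The following is a minimal Python sketch of the standard formula; the input layout and the example ratings are hypothetical, not the study's data.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table where ratings[i][j] is the number of raters
    who put item i into category j; every row must sum to the same n."""
    N = len(ratings)
    n = sum(ratings[0])
    k = len(ratings[0])
    if any(sum(row) != n for row in ratings):
        raise ValueError("every item needs the same number of raters")
    # Share of all assignments that went to each category.
    p = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    # Observed agreement per item, then averaged over items.
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P_i) / N
    # Agreement expected by chance.
    P_e = sum(x * x for x in p)
    return (P_bar - P_e) / (1 - P_e)


# Four hypothetical T1 microtasks, five workers each: [relevant, not relevant].
print(fleiss_kappa([[5, 0], [4, 1], [1, 4], [0, 5]]))
```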
RESULTS (CROWDSOURCING): MAJORITY OF REFERENCES ARE HIGH QUALITY (RQ1, RQ2)
2586 references evaluated
Found 1674 valid references from 345 domains
Broken URLs deemed not relevant and not authoritative
RESULTS (CROWDSOURCING): HUMANS ARE BETTER AT EDITING REFERENCES (RQ1, RQ2)
RESULTS (CROWDSOURCING): DATA FROM GOVT. AND ACADEMIA (RQ2)
Most common author type (T2): organisation (78%)
Most common publisher types (T3): governmental agencies (37%), academic organisations (24%)
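RESULTS (MACHINE LEARNING): RANDOM FORESTS PERFORM BEST (RQ3)
Feature | Model | F1 | MCC
Relevance | Baseline | 0.84 | 0.68
Relevance | Naïve Bayes | 0.90 | 0.86
Relevance | Random Forest | 0.92 | 0.89
Relevance | SVM | 0.91 | 0.87
Authoritativeness | Baseline | 0.53 | 0.16
Authoritativeness | Naïve Bayes | 0.86 | 0.78
Authoritativeness | Random Forest | 0.89 | 0.83
Authoritativeness | SVM | 0.89 | 0.79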
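F1 and the Matthews correlation coefficient (MCC) are both computed from a binary confusion matrix; MCC also uses the true negatives, which makes it less sensitive to class imbalance than F1. This helps explain why a baseline can reach a moderate F1 but a low MCC. A minimal Python sketch of the standard formulas; the counts in the example are hypothetical, not taken from the study.

```python
import math


def f1_and_mcc(tp: int, fp: int, fn: int, tn: int) -> tuple:
    """F1 score and Matthews correlation coefficient from a confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention
    return f1, mcc


# Hypothetical counts for a relevance classifier.
print(f1_and_mcc(tp=40, fp=10, fn=10, tn=40))
```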
LESSONS LEARNED
Crowdsourcing + ML works!
Many external sources are high quality
Bad references are mainly non-working links; continuous control is required
Lack of diversity in bot-added sources
Humans and bots are good at different things
LIMITATIONS AND FUTURE WORK
Studies with non-English sources
New approach for internal references
Deployment in Wikidata, including changes in editing behaviour
THE COST OF FREEDOM: ON THE ROLE OF PROPERTY CONSTRAINTS IN WIKIDATA
BACKGROUND
Wikidata is built by the community, from scratch
Editors are free to carry out any kind of edit
There is tension between editing freedom and quality of the modelling
Property constraints have been introduced at a later stage
Currently 18 constraints, but they are not enforced
Hall, A., McRoberts, S., Thebault-Spieker, J., Lin, Y., Sen, S., Hecht, B., & Terveen, L. (2017, May). Freedom versus standardization: structured data generation in a peer production community. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (pp. 6352-6362). ACM.
OUR STUDY
Effects of property constraints on:
Content quality, i.e., increasing user awareness of property use
Diversity of expression
Editor behaviour, by increasing the level of conflict
▪Several claims can be expressed for a statement, thanks to qualifiers and references
Example (The cost of freedom: claims): statement Q84 London → P6 head of government, with two claims:
Q334155 Sadiq Khan, 9 May 2016, reference https://www.london.gov.uk/…
Q180589 Boris Johnson, 4 May 2008, reference https://www.london.gov.uk/…
RESEARCH HYPOTHESES
Activity → Outcome
H1: Property constraints → Property perspicuity
H2: Property constraints → Knowledge diversity
H3: Property constraints → Level of conflict
METRICS
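▪ Property perspicuity: $V = N_{\text{violations}} / N_{\text{claims}}$
▪ Knowledge diversity: $KD_{\text{score}} = N_{\text{claims}} / N_{\text{statements}}$
▪ Controversy metric, based on conflicting edits:
▪ $C_{\text{score}} = N_{\text{confl. edits}} / N_{\text{edits}}$, with $0 \le C_{\text{score}} \le 1$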
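The three metrics are simple ratios of counts. A minimal Python sketch, with hypothetical function names; what counts as a violation or a conflicting edit is taken as given here, since the counting rules are not spelled out. The example reuses the London statement, which carries two claims (Sadiq Khan and Boris Johnson) for one statement.

```python
def violation_ratio(n_violations: int, n_claims: int) -> float:
    """Property perspicuity measure V: constraint violations per claim."""
    return n_violations / n_claims


def kd_score(n_claims: int, n_statements: int) -> float:
    """Knowledge diversity: average number of claims per statement."""
    return n_claims / n_statements


def c_score(n_conflicting_edits: int, n_edits: int) -> float:
    """Controversy: share of edits that are conflicting, so it lies in [0, 1]."""
    return n_conflicting_edits / n_edits


# One statement (London, head of government) with two claims.
print(kd_score(2, 1))  # 2.0
```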
METHODS
H1: Linear trend analysis of $C_{\text{violations}}$
H2 and H3: Lagged multiple regression models to predict changes in $KD_{\text{score}}$ and $C_{\text{score}}$ between $T_{n-1}$ and $T_n$
RESULTS
H1 was supported, but limited to some constraints
12 of the 18 constraints showed significant variations over the observed time frame
The constraint with the largest variation was type (i.e., property domain)
RESULTS
H2 was rejected, but more property constraints at the beginning of a time frame led to decreased knowledge diversity
RESULTS
H3 was rejected: constraints led to fewer conflicts
LIMITATIONS
Wikidata is still at an early stage of development
Metrics need further refinement
Changes were made to constraints after our analysis, which could produce new effects
LESSONS LEARNED
Editors seem to understand meaning of property constraints
Low level of knowledge diversity and conflict overall
Non-enforcement of constraints seems to have only limited effect on community dynamics
Effects of when and how constraints are introduced have not been explored yet
CONCLUSIONS
SUMMARY OF FINDINGS
Collaboration between humans and bots is important
Tools needed to identify tasks for bots and continuously study their effects on outcomes and community
References are high quality, though biases exist in terms of choice of sources
Wikidata’s approach to knowledge engineering questions existing theoretical and empirical literature