© Author(s) of these slides, including research results from the KOM research network and TU Darmstadt; otherwise it is specified at the respective slide. 28-Dec-14
Prof. Dr.-Ing. Ralf Steinmetz KOM - Multimedia Communications Lab
Evaluating Recommender Algorithms for Learning using Crowdsourcing
Mojisola Erdt, Christoph Rensing
ICALT 2014, Athens
Source: http://www.digitalvisitor.com/cultural-differences-in-online-behaviour-and-customer-reviews/
Motivation
Learning on-the-job
§ To solve a particular problem
§ To learn about a new topic
§ Mostly web resources

Social Tagging Applications
§ Help to manage resources
§ Offer recommendations

TEL Recommender Systems
§ Recommend relevant, novel and diverse resources for a specific learning goal or activity
Evaluation Methods for TEL Recommender Systems

Offline Experiments (historical or synthetic datasets)
§ Advantages: fast, less effort, repeatable
§ Disadvantages: new, unknown resources cannot be evaluated; dependent on the dataset

User Experiments
§ Advantages: user's perspective
§ Disadvantages: a lot of effort and time; few users (ca. 40)

Real-life Testing
§ Advantages: real-life setting
§ Disadvantages: needs a substantial amount of users

Crowdsourcing
§ Advantages: fast, less effort, repeatable, user's perspective, sufficient users
§ Disadvantages: unknown users, "artificial task", spamming
Crowdsourcing Platforms

microworkers (https://microworkers.com)
§ 500,000 crowdworkers worldwide
§ Flexible forwarding to other hosting platforms
§ Since 2009

CrowdFlower (http://www.crowdflower.com)
§ 5 million crowdworkers in 208 countries
§ Gives access to other crowdsourcing platforms, e.g. Amazon MTurk
§ Since 2007
Overview
§ Motivation
§ Crowdsourcing Evaluation Concept
  § Preparation Step
  § Execution Step
§ Crowdsourcing Evaluation Results
§ Conclusion & Future Work
Crowdsourcing Evaluation Concept: Preparation Step

(Workflow diagram) Set Goal → Formulate Hypotheses → Select Topic → Create Activity Hierarchy → Create Seed Dataset → Prepare Algorithms → Generate Recommendations → Filter Duplicates → Create Questionnaire (Create Questions, Add Control Questions)
DeLFI 2013. M. Migenda, M. Erdt, M. Gutjahr, and C. Rensing
Preparation Step: Set Goal

AScore is based on Activity Hierarchies
§ Extends FolkRank by considering activities, activity hierarchies, and the current activity of the learner
EC-TEL 2012. Anjorin et al.
(Figure: example activity hierarchy. Root activity "Understanding Climate Change"; further activities include "Understanding the Carbon Footprint", "Calculating the Carbon Footprint", "Investigate the impact of Climate Change", "Analyze potential Catastrophes due to Climate Change", "Investigate causes of Climate Change", "Give an overview on the history of Global Warming", and "Determine future prognoses on Climate Change".)
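The slides do not give the AScore formulas, so as a rough, non-authoritative illustration of the idea described above (a FolkRank-style personalized ranking over the folksonomy graph, biased towards the learner's current activity and its sub-activities), here is a minimal sketch. The graph contents, edge weights, and the boost parameters are assumptions, not the published algorithm; see EC-TEL 2012, Anjorin et al. for the actual definition.

```python
# Minimal sketch of a FolkRank/AScore-style ranking (an assumption, not the
# published algorithm). FolkRank runs a personalized PageRank over the
# folksonomy graph; AScore additionally considers the learner's current
# activity and the activity hierarchy.
import networkx as nx

# Toy extended folksonomy: tag assignments plus resource-activity
# attachments and one hierarchy edge between activities.
G = nx.Graph()
G.add_edges_from([
    ("user:alice", "tag:co2"),
    ("tag:co2", "res:ipcc-report"),
    ("user:alice", "res:ipcc-report"),
    ("res:ipcc-report", "act:investigate-causes"),
    ("act:investigate-causes", "act:understanding-climate-change"),  # hierarchy
])

def ascore_like(graph, current_activity, sub_activities, boost=5.0):
    """Rank resources with a preference vector biased towards the current
    activity and its sub-activities (hypothetical weighting)."""
    pref = {n: 1.0 for n in graph}          # base preference for every node
    pref[current_activity] = boost          # emphasize the current activity
    for act in sub_activities:
        pref[act] = boost / 2               # weaker emphasis on sub-activities
    rank = nx.pagerank(graph, alpha=0.85, personalization=pref)
    resources = [n for n in rank if n.startswith("res:")]
    return sorted(resources, key=rank.get, reverse=True)

print(ascore_like(G, "act:understanding-climate-change",
                  ["act:investigate-causes"]))
```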
Preparation Step: Set Goal and Formulate Hypotheses

Set Evaluation Goals:
§ Investigate whether AScore recommends more relevant, novel and diverse learning resources for a specified topic than FolkRank.
§ Investigate whether AScore recommends more relevant, novel and diverse learning resources for sub-activities (A_Sub) than for activities higher up in the hierarchy (A_Super).

Formulate Hypotheses:
1. Relevance: AScore vs. FolkRank; A_Sub vs. A_Super
2. Novelty: AScore vs. FolkRank; A_Sub vs. A_Super
3. Diversity: AScore vs. FolkRank; A_Sub vs. A_Super
Preparation Step: Select Topic and Generate Recommendations

Generate a base graph structure for recommendations
§ 5 experts researched the topic of climate change for one hour
§ They used CROKODIL to create an extended folksonomy (users, tags, resources, activities)
§ Ca. 70 resources were tagged and attached to 8 activities
(Figure: the same activity hierarchy on Climate Change, annotated with the activities used in Experiment Spring and Experiment Autumn.)
Preparation Step: Create Questionnaire

Conduct personal research on the topic
§ Level of knowledge on this topic
§ Request to find 5 online resources relevant to this topic

10 Questions per Recommendation
§ 3 questions for each hypothesis (relevance, novelty, diversity)
§ 1 control question to detect spammers, e.g. "Give 4 keywords to summarize the recommended resource"

General Questions
§ Age, gender, level of education, and nationality
Treatment conditions (identical in Experiment Spring and Experiment Autumn):
§ AScore: A_Sub (sub-activity), A_Super (super-activity)
§ FolkRank: F_Sub (sub-activity), F_Super (super-activity)
https://www.soscisurvey.de
Crowdsourcing Evaluation Concept: Execution Step

(Workflow diagram, one cycle per iteration burst) Release next iteration burst on the Crowdsourcing Platform → Questionnaire → Results → Filter Spammers → Make Payments
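The execution step above is an iterative loop. A minimal sketch of how it might be scripted follows; the platform object, its methods, and all field names are hypothetical, since the slides describe the steps but no API. The control-question check follows the questionnaire design from the preparation step (four keywords summarizing the recommended resource).

```python
# Sketch of the execution loop (all platform calls and field names are
# hypothetical; the slides describe the steps, not an API).
def looks_like_spam(response):
    """Hypothetical control-question check: the questionnaire asked workers
    to give 4 keywords summarizing the recommended resource."""
    keywords = [k.strip() for k in response["control_keywords"].split(",")]
    return len([k for k in keywords if k]) < 4   # too few keywords -> spam

def run_iteration_bursts(platform, questionnaire_url, n_bursts, burst_size):
    """Release bursts of tasks, collect results, filter spammers, pay."""
    accepted, rejected = [], []
    for _ in range(n_bursts):
        task_id = platform.release_tasks(questionnaire_url, count=burst_size)
        for response in platform.collect_results(task_id):
            if looks_like_spam(response):
                rejected.append(response)          # spammers are not paid
            else:
                accepted.append(response)
                platform.pay(response["worker_id"])
    return accepted, rejected
```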
Execution Step: Participants and Treatment Conditions

Experiment Spring
§ AScore: A_Sub: 45, A_Super: 39
§ FolkRank: F_Sub: 39, F_Super: 36

Experiment Autumn
§ AScore: A_Sub: 80, A_Super: 73
§ FolkRank: F_Sub: 76, F_Super: 85

(Charts: participant sources CrowdFlower (32), Microworkers (35), Volunteers (92), Crowdworkers (314); filtered spammers: 243 and 549.)
Overview
§ Motivation
§ Crowdsourcing Evaluation Concept
§ Crowdsourcing Evaluation Results
  § AScore and FolkRank: Experiment Spring, Experiment Autumn
  § A_Sub and A_Super: Experiment Spring, Experiment Autumn
§ Conclusion & Future Work
Crowdsourcing Evaluation Results: Experiment Spring (AScore vs. FolkRank)

Significance Tests (p-values):
§ Hypothesis 1 (Relevance): 0.000003578 < 0.05
§ Hypothesis 2 (Novelty): 0.000001531 < 0.05
§ Hypothesis 3 (Diversity): 0.0001618 < 0.05
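The slides report p-values but do not name the statistical test used. One common choice for comparing Likert-type ratings from two independent groups such as the AScore and FolkRank conditions is the Mann-Whitney U test; the sketch below uses made-up rating arrays purely for illustration.

```python
# Hypothetical illustration; the slides do not state which test was used.
# The Mann-Whitney U test is a common choice for ordinal (Likert) ratings
# from two independent participant groups.
from scipy.stats import mannwhitneyu

ascore_relevance   = [5, 6, 4, 5, 7, 6, 5]   # made-up 7-point-scale ratings
folkrank_relevance = [3, 4, 4, 2, 5, 3, 4]

stat, p = mannwhitneyu(ascore_relevance, folkrank_relevance,
                       alternative="greater")  # H1: AScore rated higher
print(f"U = {stat}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```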
Crowdsourcing Evaluation Results: Experiment Autumn (AScore vs. FolkRank)

Significance Tests (p-values):
§ Hypothesis 1 (Relevance): 0.000001362 < 0.05
§ Hypothesis 2 (Novelty): 0.0000007654 < 0.05
§ Hypothesis 3 (Diversity): 0.00000000015 < 0.05
Execution Step: Evaluation Results

Evaluation Goals:
§ Investigate whether AScore recommends more relevant, novel and diverse learning resources for a specified topic than FolkRank.
§ Investigate whether AScore recommends more relevant, novel and diverse learning resources for sub-activities (A_Sub) than for activities higher up in the hierarchy (A_Super).

Hypotheses:
1. Relevance: AScore vs. FolkRank; A_Sub vs. A_Super
2. Novelty: AScore vs. FolkRank; A_Sub vs. A_Super
3. Diversity: AScore vs. FolkRank; A_Sub vs. A_Super

(✔ on the original slide marks the comparisons confirmed so far: AScore vs. FolkRank for all three hypotheses.)
Crowdsourcing Evaluation Results: Experiment Spring (A_Sub vs. A_Super)

Significance Tests (p-values):
§ Hypothesis 1 (Relevance): 0.0005654 < 0.05
§ Hypothesis 2 (Novelty): 0.01666 < 0.05
§ Hypothesis 3 (Diversity): 0.02176 < 0.05
Crowdsourcing Evaluation Results: Experiment Autumn (A_Sub vs. A_Super)

Significance Tests (p-values):
§ Hypothesis 1 (Relevance): 0.0005306 < 0.05
§ Hypothesis 2 (Novelty): 0.000001531 < 0.05
§ Hypothesis 3 (Diversity): 0.0000001608 < 0.05
Crowdsourcing Evaluation Results: Experiment Spring (F_Sub vs. F_Super)

(Bar chart: aggregated mean values for Hypotheses 1, 2 and 3 on a 0-7 scale, legend F_Sub and F_Super; means 3.95, 4.05, 3.97, 3.91, 3.96, 3.83.)

Significance Tests (p-values):
§ Hypothesis 1 (Relevance): 0.3023 > 0.05
§ Hypothesis 2 (Novelty): 0.5216 > 0.05
§ Hypothesis 3 (Diversity): 0.2031 > 0.05
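The aggregated means in the chart above presumably average the three questionnaire items belonging to each hypothesis. A minimal sketch of such an aggregation, with hypothetical column names and made-up ratings, might look like this:

```python
# Hypothetical aggregation: the 3 questionnaire items per hypothesis are
# averaged into one score per participant (column names are made up).
import pandas as pd

responses = pd.DataFrame({
    "rel_q1": [5, 4], "rel_q2": [6, 4], "rel_q3": [5, 5],  # relevance items
    "nov_q1": [4, 3], "nov_q2": [5, 4], "nov_q3": [4, 4],  # novelty items
})

responses["relevance"] = responses[["rel_q1", "rel_q2", "rel_q3"]].mean(axis=1)
responses["novelty"]   = responses[["nov_q1", "nov_q2", "nov_q3"]].mean(axis=1)
print(responses[["relevance", "novelty"]].mean())  # aggregated group means
```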
Crowdsourcing Evaluation Results: Experiment Autumn (F_Sub vs. F_Super)

(Bar chart: aggregated mean values for Hypotheses 1, 2 and 3 on a 0-7 scale, legend F_Sub and F_Super; means 4.04, 3.9, 4.11, 4.09, 4.07, 4.01.)

Significance Tests (p-values):
§ Hypothesis 1 (Relevance): 0.01481 < 0.05
§ Hypothesis 2 (Novelty): 0.7064 > 0.05
§ Hypothesis 3 (Diversity): 0.2881 > 0.05
Execution Step: Evaluation Results

Evaluation Goals:
§ Investigate whether AScore recommends more relevant, novel and diverse learning resources for a specified topic than FolkRank.
§ Investigate whether AScore recommends more relevant, novel and diverse learning resources for sub-activities (A_Sub) than for activities higher up in the hierarchy (A_Super).

Hypotheses:
1. Relevance: AScore vs. FolkRank; A_Sub vs. A_Super
2. Novelty: AScore vs. FolkRank; A_Sub vs. A_Super
3. Diversity: AScore vs. FolkRank; A_Sub vs. A_Super

(✔ on the original slide marks the confirmed comparisons; per the reported p-values, AScore vs. FolkRank and A_Sub vs. A_Super were significant for all three hypotheses, while F_Sub vs. F_Super reached significance only for relevance in Experiment Autumn.)
Conclusion and Future Work

Crowdsourcing can be successfully applied to evaluate TEL recommender algorithms
§ Integrate more user-centric evaluations already during the design and development of TEL recommender algorithms
§ Select the best-fitting evaluation approach

Future Work
§ Can crowdsourcing be used to evaluate other aspects of a recommender system, e.g. explanations or presentation?
§ Can more complex TEL evaluation tasks be evaluated with crowdsourcing?