Bringing the crowdsourcing revolution to research in communication disorders

Tara McAllister Byun, PhD, CCC-SLP
Suzanne M. Adlof, PhD
Michelle W. Moore, PhD, CCC-SLP

2014 ASHA Convention
Orlando, Florida




Disclosure

• The individuals presenting this information are involved in recruiting individuals to complete tasks through AMT or other online platforms.
• This session may focus on one specific approach, with limited coverage of other alternative approaches.
• Portions of the research were supported by funding from IES.
• No other conflicts to disclose.


Crowdsourcing for CSD research

Case study 3: Obtaining ratings of sentence contexts for vocabulary instruction

Suzanne Adlof


Goals

• To develop a web-based platform that provides individualized, effective vocabulary instruction to high school students
  • Individualized based on content of study and pace of study
  • Instruction includes dictionary definitions and real-world contexts
• To be able to teach any word in the English language
  • Beginning with a seed corpus of 1000 target words and >70,000 contexts
  • Want ≥ 20 good contexts per word

This research is supported by a grant from the Institute of Education Sciences: R305A130467 (Adlof, PI).


Which contexts are most “nutritious” for vocabulary instruction?

• Initial corpus of texts randomly retrieved in mass quantities from the Internet; the quality of retrieved contexts is highly variable.
• Example contexts for target word “guile”:

• There are some people, like Nathanael, who truly have no guile. They are very transparent and open. They accept people at face value and, since they have no guile themselves, are bewildered when they are faced with wickedness and deceit in others. But, truly guileless people are rare. They are both refreshing and frustrating at the same time.

• Show me the dirtpile and I will pray that the soul can take three stowaways" confuses me. What are the three stowaways? One of them could be him - like he wants to go with her, but what are the other two? Also, why does she vanish with no guile? Why would she vanish with guile?

• guile is the program, the -c switch instructs guile to evaluate the statement after the switch (similar to the -e switch for perl). The use-modules directive will ask guile to load the slib module in the ice-9 directory. After the use-modules statement is evaluated, it will proceed to call functions available through Slib, namely require and printf.


Challenges of obtaining ratings

• Scale of task: 70,000 contexts is a lot!
• Need multiple ratings of each context to ensure reliability
• Traditional lab setup:
  • 50 undergrad students rate 100 contexts per day for $8.00 each
  • 140 consecutive days & $56,000 to get 10 ratings of each context!
• AMT setup:
  • AMT workers each rate 5 contexts at a time, for 10-12 cents
  • Speed of acquisition depends on many factors, but primary factor is building up a qualified worker pool
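The lab figures above can be reproduced with a few lines of arithmetic. The sketch below is a back-of-the-envelope check, not the authors' budget: it assumes "$8.00 each" means $8 per student per day, that each AMT assignment covers 5 contexts for one payment of 10 to 12 cents, and it ignores platform fees and the cost of qualification tests. All variable names are illustrative.

```python
# Back-of-the-envelope comparison of the two setups described on the slide.
CONTEXTS = 70_000
RATINGS_PER_CONTEXT = 10
total_ratings = CONTEXTS * RATINGS_PER_CONTEXT

# Traditional lab setup (assumption: $8 is paid per student per day).
STUDENTS = 50
CONTEXTS_PER_STUDENT_PER_DAY = 100
PAY_PER_STUDENT_PER_DAY_USD = 8
lab_days = total_ratings // (STUDENTS * CONTEXTS_PER_STUDENT_PER_DAY)
lab_cost_usd = lab_days * STUDENTS * PAY_PER_STUDENT_PER_DAY_USD

# AMT setup (assumption: one payment per 5-context assignment, fees excluded).
CONTEXTS_PER_ASSIGNMENT = 5
PAY_PER_ASSIGNMENT_CENTS = (10, 12)
amt_assignments = total_ratings // CONTEXTS_PER_ASSIGNMENT
amt_cost_usd = tuple(amt_assignments * cents // 100 for cents in PAY_PER_ASSIGNMENT_CENTS)

print(f"Lab: {lab_days} days, ${lab_cost_usd:,}")
print(f"AMT: {amt_assignments:,} assignments, ${amt_cost_usd[0]:,} to ${amt_cost_usd[1]:,} in worker payments")
```

Working in whole cents keeps the AMT estimate free of floating-point rounding. The AMT range is only an estimate of worker payments under the stated assumptions; it says nothing about how long collection takes.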


Step 1: Qualification Test


2. Building a Pool of Qualified Workers

• Began posting QTs and HITs in August 2013
• Also advertised on listservs to recruit a larger pool of workers interested in language, word learning
• 2317 AMT workers have taken the QT
• 947 (41%) workers qualified


2. Worker Retention

• We have posted > 11,000 HITs for 947 qualified workers
  • (soliciting 10 ratings each for >55,000 contexts)

• 75% of all context ratings have come from 27 “very high productivity raters” who have rated > 1000 contexts each

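[Figure: "Number HITs Completed after QT". Categories: 0, 1-2, 3-10, 11-50, 51-150, 151-500, 501-1000, > 1000. Percentage labels as extracted (their mapping to categories is not recoverable from the text): 22%, 26%, 26%, 11%, 6%, 5%, 1%, 3%.]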


3. Reliability and Validity Checks

• 93 contexts each rated by expert and 10 AMT raters
• 176 AMT raters represented across contexts
• AMT average rating correlates with expert rating at r=.71, p<.001

[Figure: Expert Rating (vertical axis) plotted against Average AMT Rating (horizontal axis)]
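A validity check of this kind (average each context's AMT ratings, then correlate the averages with the expert's ratings) might be computed as in the sketch below. This is an illustration only: the function names and the toy data are hypothetical, the slides do not say which software was used, and the p-value is not computed here.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def amt_vs_expert(amt_ratings_by_context, expert_rating_by_context):
    """Correlate per-context mean AMT ratings with the expert's ratings."""
    context_ids = sorted(expert_rating_by_context)
    amt_means = [mean(amt_ratings_by_context[c]) for c in context_ids]
    expert = [expert_rating_by_context[c] for c in context_ids]
    return pearson_r(amt_means, expert)


# Hypothetical toy data: three contexts, four AMT ratings each.
toy_amt = {"c1": [1, 2, 1, 2], "c2": [3, 3, 4, 2], "c3": [4, 4, 3, 5]}
toy_expert = {"c1": 1, "c2": 3, "c3": 4}
toy_r = amt_vs_expert(toy_amt, toy_expert)
print(f"r = {toy_r:.2f}")
```

With the real data, each dictionary would hold 93 contexts and each AMT list would hold 10 ratings.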


3. Reliability and Validity Checks

• Spot checking suggests ratings are generally valid

Average AMT Rating (SD) | Context for target word “collusion”

1.5 (.53) | In his discussion of this issue in the context of the fallout from California's recent attempt at electricity deregulation, Dr. Rapp notes that claims of collusion must be reconciled with the specific market facts and regulatory rules that affect suppliers' bidding behavior and capacity decisions. This is not always easy.

2.0 (.67) | In his discussion of this issue in the context of the fallout from California's recent attempt at electricity deregulation, Dr. Rapp notes that claims of collusion must be reconciled with the specific market facts and regulatory rules that affect suppliers' bidding behavior and capacity decisions. This is not always easy.

3.0 (.94) | We provide a collusive framework with heterogeneity among firms, investment, entry, and exit. It is a symmetric-information model in which it is hard to sustain collusion when there is an active firm that is likely to exit in the near future. Numerical analysis is used to compare a collusive to a noncollusive environment.

3.7 (.48) | Some poker players think that by sharing information with their friends on Party Poker, they can gain an advantage and cheat their opponents. This is known as poker collusion, two or more players will use a chatroom, instant messages or even the telephone to tell their friends what cards they have.
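Each "Average AMT Rating (SD)" entry summarizes the ratings one context received. A minimal sketch of that summary follows, assuming the SD is the sample standard deviation; the individual ratings are not given in the slides, so the ten values below are hypothetical and chosen only to show the calculation.

```python
from statistics import mean, stdev


def summarize(ratings):
    """Return (mean, sample SD) rounded the way the table reports them."""
    return round(mean(ratings), 1), round(stdev(ratings), 2)


# Hypothetical set of 10 ratings for a single context: seven 4s and three 3s.
hypothetical_ratings = [4, 4, 4, 4, 4, 4, 4, 3, 3, 3]
avg, sd = summarize(hypothetical_ratings)
print(f"{avg} ({sd})")
```

A small SD like this one means the raters largely agreed; a larger SD flags a context that raters judged inconsistently.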


5. What we learned along the way

• Importance of clear instructions
• Importance of “customer service,” e.g., fast payment & good communication
  • TurkOpticon reviews
• Include quality control measures
  • Inter-rater agreement
  • Premier rater training
  • Attention checks
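One simple way to operationalize an inter-rater agreement check is to compare each rater's score on a context with the mean of the other raters' scores on that same context. The sketch below illustrates that idea only; it is not the procedure used in the project, which the slides do not describe. The function name, the tuple layout, and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rater_deviation(ratings):
    """Mean absolute deviation of each rater from the other raters.

    ratings: iterable of (rater_id, context_id, score) tuples.
    Returns {rater_id: mean |score - mean of other raters' scores|}.
    """
    by_context = defaultdict(list)
    for rater, context, score in ratings:
        by_context[context].append((rater, score))

    deviations = defaultdict(list)
    for entries in by_context.values():
        n = len(entries)
        if n < 2:
            continue  # no other raters to compare against
        total = sum(score for _, score in entries)
        for rater, score in entries:
            others_mean = (total - score) / (n - 1)
            deviations[rater].append(abs(score - others_mean))
    return {rater: mean(devs) for rater, devs in deviations.items()}


# Hypothetical toy data: rater "c" disagrees with "a" and "b" on both contexts.
toy_ratings = [
    ("a", "c1", 4), ("b", "c1", 4), ("c", "c1", 1),
    ("a", "c2", 2), ("b", "c2", 2), ("c", "c2", 5),
]
deviation_by_rater = rater_deviation(toy_ratings)
most_divergent = max(deviation_by_rater, key=deviation_by_rater.get)
print(deviation_by_rater, most_divergent)
```

Raters with unusually high deviation could be reviewed by hand before any decision is made, since disagreement can also mean the context itself is ambiguous.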


6. Next Steps

• Students generate their own contexts as part of instructional program
• Students rate contexts as part of instructional program
• Machine learning for automated ratings
• “Authentic artificial intelligence”