Using Human Language Technology to Improve MOOC Learning Victor Zue with Daniel Li and Hung-yi Lee Research sponsored by Quanta Computer, Inc.

drhomed.org.mo/wp-content/uploads/2018/04/Victor-Zue.pdf


Using Human Language Technology to Improve MOOC Learning

Victor Zue with Daniel Li and Hung-yi Lee

Research sponsored by Quanta Computer, Inc.


HLT Meets MOOC

• Human language integral part of education
  – Lectures, books, tutorials, discussions, Q&A, etc.

• Some challenges for MOOC
  – Heterogeneity: Student’s background (preparedness, language competence, learning style, etc.)
  – Scale: One-size-fits-all solutions would not suffice; need mass customization

• Human language technologies could help
  – Process and manage contents
  – Develop speech-based interfaces for easy access
  – But need general and scalable solutions whenever possible

MIT Computer Science and Artificial Intelligence Laboratory


Some Example Uses

• Transcription of course materials
  – Currently done by humans

• Translation into multiple languages
  – Long term research

• User identification and authentication
  – Security and privacy concerns

• Information management
  – Categorization: e.g., managing Q&A
  – Linking: e.g., different forms, different sources
  – Summarization: e.g., précis of a discussion
  – Search: e.g., dialogue-based search engine

• …



On-line and residential courses are different in many respects


Some Comparisons …

• Multiple offerings of the same course at MIT and on edX

• Class size, drop out rate, manners of interaction, etc.

[Figure: two bar-chart panels titled "On-line" and "Residential". Offerings shown: 6.00 F2013, 6.00 S2014, 6.000.1 + 6.000.2 F2014 (axis 0 to 1000); 6.00x F2012, 6.00x S2013, 6.00.1x F2013, 6.00.1x + 6.00.2x S2014 (axis 0 to 100000). Series: #threads, #threads replied by learners, #threads replied by staff, #learners who passed, enrollment.]


Why so many dropouts in MOOC?

• Browsing, heterogeneous background, commitment, insufficient help, …

Data courtesy of Prof. John Guttag


Multimedia Content Linking for MOOC

with Daniel Li


Linked Knowledge

• Lectures, slides, textbook, forum, etc., of a topic can play reinforcing roles in learning

• Link various contents of a given topic

• Create an adaptable learning environment for students to navigate freely


A Hypothesis

• Organizing and linking related contents will improve learning

[Figure: two panels, "Unlinked" and "Linked". Each shows a Video column and a Textbook column with segments numbered 1 to 4, covering the topics "The equation of the regression line", "Estimation using regression line", "R.M.S. error", "Regression – equation and estimation", and "Regression error". In the Linked panel, related video and textbook segments are connected.]


Could linking really help?

• Conducted crowdsourcing study online
  – Course: Stat2.1X from Berkeley
    • Videos, slides, textbook
  – Subjects: Amazon Mechanical Turk workers with varying background
    • Education? Experience with MOOC? Statistics?
  – Ground Truth:
    • Established by human experts
  – Measurements:
    • Information Search: How fast and accurately can subjects retrieve information
    • Concept Retention: How well can they retain information


Information Search (Speed)

[Figure: bar chart of improvement in search time (seconds) made by linking, y-axis from -20 to 80, for the groups Overall, ≥ Bachelor, ≤ Some college, MOOCs, No MOOCs, Statistics, No Statistics. ✔ marks significance (p = 0.01); three bars are marked.]

• Linking improves speed in all cases involving novice learners

• Improvement made without sacrificing accuracy


Visualization and Inference of Learning Paths from Online Courses

with Hung-yi Lee


1. Data Gathering

• Collect video clips from MOOC platforms and electronic textbooks from the Internet

[Figure label: Electronic Textbooks]


2. Key Concept Extraction

• Extract terms representing key concepts from the audio transcriptions of video clips

[Figure: pipeline. Audio Transcriptions → Part-of-Speech (POS) Tagging (pick nouns and noun phrases) → Candidates → Filtering by lexical statistics (for example, TF-IDF ……) → key concepts such as DNA, Inheritance, RNA, Protein, Adenosine triphosphate, ……]
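The filtering step above can be sketched in a few lines of Python. This is a minimal illustration, not the system described in the talk: the stopword list is a hypothetical stand-in for the POS tagger (a real pipeline would keep nouns and noun phrases), the three transcripts are invented, and the function names are assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for POS tagging: drop a few function words.
# A real system would keep only nouns and noun phrases from a tagger.
STOPWORDS = {"the", "a", "an", "of", "is", "it", "as", "and", "in",
             "to", "from", "by", "are", "about"}

def candidates(text):
    """Lowercase word tokens that survive the stopword filter."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def tfidf_scores(docs):
    """Return one {term: tf-idf score} dict per document."""
    tokenized = [candidates(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

def key_concepts(docs, top_k=3):
    """Top-k terms per document, ranked by tf-idf (ties broken alphabetically)."""
    return [
        [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]]
        for s in tfidf_scores(docs)
    ]

# Invented transcripts, for illustration only.
transcripts = [
    "DNA encodes RNA and DNA is the material of inheritance",
    "Protein is built from amino acids and protein folds",
    "The lecture is about the exam and the lecture schedule",
]
top_terms = key_concepts(transcripts, top_k=1)
```

Terms that occur in every document get a score of zero, which is how the lexical statistics push generic vocabulary out of the candidate list.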

3. Relation Extraction

• “Read” the text/transcriptions to find the relation between the key concepts

Example sentence: “As DNA encodes RNA, it is the material of inheritance.”

Pipeline: Transcriptions → Co-reference Resolution → Syntactic Parsing → Relation Extraction → relation

  – Co-reference Resolution: in the example, “it” refers to “DNA”
  – Syntactic Parsing: produces a syntactic parsing tree
  – Relation Extraction: a statistical model finds the relation between concepts on the syntactic tree [Mausam, EMNLP’12]

Result: the relation of “DNA” and “inheritance” is “is the material of”.


4. Knowledge Graph Construction

• Nodes: key concepts

• Edges: relation between the key concepts

[Figure: Knowledge Graph with nodes DNA, RNA, Protein, Genome, Gene, Nucleotide, Inheritance]
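A knowledge graph of this kind can be held in a plain adjacency map. The sketch below assumes the relation extractor emits (subject, relation, object) triples; the two triples come from the example sentence in the talk, while the function names and the choice to store each edge in both directions are assumptions made for illustration.

```python
from collections import defaultdict

def build_graph(triples):
    """Map each concept to a list of (relation, neighbor) pairs.

    Each edge is stored in both directions so that either endpoint
    can be used as the starting point of a lookup.
    """
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
        graph[obj].append((relation, subject))
    return graph

def neighbors(graph, concept):
    """Sorted list of concepts directly connected to `concept`."""
    return sorted({other for _, other in graph.get(concept, [])})

# Triples taken from the example sentence used in the relation extraction step.
triples = [
    ("DNA", "encodes", "RNA"),
    ("DNA", "is the material of", "Inheritance"),
]
graph = build_graph(triples)
dna_neighbors = neighbors(graph, "DNA")
```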

5. Prerequisite Concept Inference

• “DNA” is related to “Inheritance”, “RNA”, etc. Which one is the prerequisite concept of “DNA”?

• Analyze the positions where the concepts are mentioned the first time in a course

• “Inheritance” is the prerequisite concept of “DNA”

• “RNA” is the advanced concept of “DNA”

[Figure: knowledge graph with nodes DNA, RNA, Protein, Genome, Gene, Nucleotide, Inheritance, annotated with first mentions: “Inheritance” in Lecture 1, “DNA” in Lecture 3, “RNA” in Lecture 10.]
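The first-mention heuristic can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: the four lecture texts are invented, matching is done on whole words only, and the function names are hypothetical.

```python
import re

def first_mentions(lectures, concepts):
    """Return {concept: 1-based index of the first lecture that mentions it}."""
    first = {}
    for index, text in enumerate(lectures, start=1):
        lowered = text.lower()
        for concept in concepts:
            pattern = r"\b" + re.escape(concept.lower()) + r"\b"
            if concept not in first and re.search(pattern, lowered):
                first[concept] = index
    return first

def classify_related(concept, related, first):
    """Split related concepts into prerequisites (introduced earlier)
    and advanced concepts (introduced later).

    Raises KeyError if a concept is never mentioned in the course.
    """
    prerequisites = [c for c in related if first[c] < first[concept]]
    advanced = [c for c in related if first[c] > first[concept]]
    return prerequisites, advanced

# Invented lecture transcripts, for illustration only.
lectures = [
    "Inheritance passes traits from parents to offspring.",
    "Traits can be dominant or recessive.",
    "DNA is the material of inheritance.",
    "DNA encodes RNA.",
]
first = first_mentions(lectures, ["DNA", "RNA", "Inheritance"])
prerequisites, advanced = classify_related("DNA", ["Inheritance", "RNA"], first)
```

Here "Inheritance" is first seen before "DNA" and "RNA" after it, which mirrors the prerequisite and advanced labels in the talk's example.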

6. Merge of Courses

• Find lectures related to “DNA”

• Use vector space models to represent the audio transcriptions of each lecture
  – Compute cosine similarity between the models
  – Merge the courses with high cosine similarity

[Figure: a learner asks “What is “DNA”?”. Lectures 8-2, 8-3, 8-4 and Lectures 5-4, 5-5, 5-6, drawn from Course A and Course B, are grouped under the topics DNA Structure, DNA Replication, and Inheritance.]
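The vector space comparison can be sketched with raw term counts. The talk does not say how terms are weighted or what counts as "high" similarity, so the bag-of-words representation, the 0.5 threshold, the lecture names, and the transcript snippets below are all assumptions for illustration.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Represent a transcript as a vector of raw term counts."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def link_lectures(course_a, course_b, threshold=0.5):
    """Pairs (name_a, name_b) whose transcripts reach the similarity threshold."""
    pairs = []
    for name_a, text_a in course_a.items():
        for name_b, text_b in course_b.items():
            sim = cosine_similarity(bag_of_words(text_a), bag_of_words(text_b))
            if sim >= threshold:
                pairs.append((name_a, name_b))
    return pairs

# Hypothetical lecture names and transcript snippets.
course_a = {
    "A-1": "dna structure double helix base pairs",
    "A-2": "dna replication polymerase copies strands",
}
course_b = {
    "B-1": "double helix structure of dna base pairs",
    "B-2": "mendel inheritance peas traits",
}
linked = link_lectures(course_a, course_b)
```

In practice the counts would usually be reweighted (for example with TF-IDF) so that shared generic words such as "dna" do not dominate the score.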


7. Cannot Understand?

• Learner: “I cannot understand.”

• Show the lectures of the prerequisite concept of “DNA”

[Figure: knowledge graph with nodes DNA, Inheritance, RNA, DNA Structure, DNA Replication]


8. Learn More

• Learner: “Learn More!”

• Show the lectures of the advanced concept of “DNA”

[Figure: knowledge graph with nodes DNA, Inheritance, RNA, DNA Structure, DNA Replication]


Video Demonstration


Ongoing Research

• Automatic content linking

• Topic modeling

• Video processing

• Sentiment analysis


Summary

• We have achieved some positive results using HLT for MOOC

• Improvements, extensions, and evaluation are ongoing

• We hope to collaborate with colleagues outside of MIT to achieve greater impact


Thank You!


Multimedia Content Linking for MOOC

Daniel Li



Multimedia Content Linking

• Discover the content linking using Hidden Markov Models (HMM)

• Align textbook sections into HMM states

• Link contents assigned to the same state
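One way to read the HMM idea is as a monotonic alignment: each textbook section is a state, each transcript segment is an observation, and Viterbi decoding assigns segments to sections. The sketch below is written under several assumptions that the slides do not state: a left-to-right topology (stay or advance by one section), add-alpha smoothed unigram emissions estimated from the section text, a start in the first section, and invented example data.

```python
import math
from collections import Counter

def emission_logprob(words, section_counts, vocab_size, alpha=1.0):
    """Log-probability of a segment under an add-alpha smoothed unigram model."""
    total = sum(section_counts.values()) + alpha * vocab_size
    return sum(math.log((section_counts[w] + alpha) / total) for w in words)

def viterbi_align(segments, sections, stay_prob=0.5):
    """Assign each segment a section index; indices never decrease.

    stay_prob must lie strictly between 0 and 1.
    """
    seg_words = [s.lower().split() for s in segments]
    sec_counts = [Counter(s.lower().split()) for s in sections]
    vocab = {w for ws in seg_words for w in ws} | {w for c in sec_counts for w in c}
    n_states = len(sections)
    log_stay, log_move = math.log(stay_prob), math.log(1.0 - stay_prob)
    neg_inf = float("-inf")

    # best[t][s]: best log score of any path that ends in state s at segment t.
    best = [[neg_inf] * n_states for _ in segments]
    back = [[0] * n_states for _ in segments]
    best[0][0] = emission_logprob(seg_words[0], sec_counts[0], len(vocab))
    for t in range(1, len(segments)):
        for s in range(n_states):
            stay = best[t - 1][s] + log_stay
            move = best[t - 1][s - 1] + log_move if s > 0 else neg_inf
            prev, score = (s, stay) if stay >= move else (s - 1, move)
            if score == neg_inf:
                continue  # state s is not reachable at time t
            best[t][s] = score + emission_logprob(seg_words[t], sec_counts[s], len(vocab))
            back[t][s] = prev

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: best[-1][s])
    path = [state]
    for t in range(len(segments) - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    return path[::-1]

# Invented textbook sections and transcript segments.
sections = [
    "regression line equation slope intercept",
    "rms error residual spread",
]
segments = [
    "the equation of the regression line",
    "slope and intercept of the line",
    "the rms error measures spread",
    "residual error",
]
alignment = viterbi_align(segments, sections)
```

Segments that receive the same state index would then be linked to that textbook section, and to each other.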


Experimental Setup

• Dataset from Berkeley’s Intro to Statistics
  – Video Transcription: 7 hours/56K words
  – Slides: 184 pages/9K words
  – (Recommended) Textbook: 162 sections/42K words

• Evaluate against hand-labeled ground truth
  – Three Conditions: transcription only; transcription + # of pages of slides; transcription + slides
  – Three Models: baseline (measuring cosine similarity using simple word statistics); HMM; and HMM weighting keywords higher than other words


Preliminary Results

• HMM results better than baseline

• Even better results with Feature Selection

• Video transcription + slides yield the best performance


User Interface


Opinion Summarization

“The laments of both Andromache and Kleopatra seem to be more attuned to their own fate.”

“The result is more sacrifice, rather than redemption.”

“Zeus is by no means an omnipotent ruler.”

“compare Hesiod with Homer”

“The performance of the song in the Iliad confers the text a multi‐dimensional, almost 3D, perspective of the story.”

Course: Ancient Greek Hero

Topic Modeling


Topic Modeling

• Goal:
  – To apply statistical models for discovering the abstract "topics" that occur in a collection of documents

• Use Latent Dirichlet Allocation (LDA)
  – A generative graphical model
  – Each document (e.g., a forum post) is a mixture of a small number of topics; and each topic has probabilities of generating various words

[Figure: two example topics. CAT-Related: Milk, Meow, Kitten. DOG-Related: Bone, Bark, Puppy.]
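The generative story behind LDA can be sketched directly: draw a topic mixture for the document, then for each word position draw a topic from the mixture and a word from that topic. The code below only simulates this process with the two example topics; it does not fit a model to data, and the word probabilities, the `alpha` value, and the function names are assumptions for illustration.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, n_words, alpha, rng):
    """Generate one document under the LDA generative story.

    Returns the document's topic mixture and its list of words.
    """
    names = list(topics)
    mixture = sample_dirichlet([alpha] * len(names), rng)
    words = []
    for _ in range(n_words):
        topic = rng.choices(names, weights=mixture)[0]
        vocab, probs = zip(*topics[topic].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return dict(zip(names, mixture)), words

# Topic names and words follow the slide; the probabilities are made up.
topics = {
    "CAT-related": {"milk": 0.4, "meow": 0.3, "kitten": 0.3},
    "DOG-related": {"bone": 0.4, "bark": 0.3, "puppy": 0.3},
}
rng = random.Random(0)
mixture, words = generate_document(topics, n_words=8, alpha=0.5, rng=rng)
```

A small `alpha` tends to concentrate each document on few topics, matching the "mixture of a small number of topics" description. Fitting LDA to forum posts runs this story in reverse, inferring the mixtures and topic-word probabilities from the observed words.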


Topic Modeling

• Topic clustering on the forum of “Biology” class by LDA
  – Pset problem set week exam info practice grade due deadline complete date posted certificate
  – Research science scientific area project professional field development scientist discoveries technology
  – DNA trait dominant affected recessive linked phenotype sex breeding individual relevant inheritance color
  – Mutation gene genes related function mutations connection relation separate involved linked caused mutant mutants
  – Diseases body health cancer understanding disease research lives medicine medical
  – Molecule group structure strong functional close making large amount create separated chemical
  – Ancestors neanderthals humans modern africa neanderthal mated recent genome interbred interbreeding migrated
  – Atom bonds polar amino hydrogen acid covalent atoms form acids oxygen ionic chain carbon bonding electrons nitrogen
  – Experiment flies wings food olfaction source wing smell ability navigate winged obstacle reach
  – Organism life level diversity complex biological building appearance cellular functions process mechanisms forms living


Topic Modeling

• Topic clustering on the forum of “Ancient Greek Hero”
  – Edx online facebook courses free mooc enrolled joined university global education educational friend request fb
  – Song andromache singing hector songs narrative laments sung achilles klea andron narrator patroklos kleopatra lyre
  – Reading read texts iliad homer odyssey poetry works book nagy literature past fast poems homeric
  – Feeling thought feel agree means human thinking comment feelings mind
  – Athens greeks greece thessaloniki maria eleni naxos hellas crete eirini tirnavos evangelia hallo island
  – Death lament thetis muses funeral nereids laments dead die goddess grief lamentation lamenting sisters pindar
  – Pain achilles son mother iliad life sorrow hero kleos epic patroklos homer memory sisters pindar
  – War agamemnon achilles anger apollo kleos zeus angry achaeans athena god iliad goddess king april
  – Timeline week hours starting june beginning weeks end schedule date april
  – Heroes hero gods human heroic man humanity power fight born strong actions humans values notion modern god
  – Spanish hola mexico saludos madrid desde spain costa barcelona argentina


Topic 1 Topic 2 Topic 3


Three Related Efforts

1. How can different threads of on-line discussion be categorized?
  – Topic Modeling (Jingjing Liu; May, 2013)

2. How can we provide a choice of learning paths for students with varying backgrounds?
  – Learning Path Discovery (Hung-yi Lee)

3. How can we link the various contents of an online course to enhance learning?
  – Multimedia Content Linking (Daniel Li)


Information Search (Accuracy)

[Figure: bar chart, y-axis from -15 to 15, for the groups Overall, ≥ Bachelor, ≤ Some college, MOOCs, No MOOCs, Statistics, No Statistics. ✔ marks significance (p = 0.05); one bar is marked.]

• Linking doesn’t improve accuracy (except in one case)


Concept Retention

[Figure: bar chart of improvement in #key-terms of summary made by linking, y-axis from -0.2 to 1.2, for the groups Overall, ≥ Bachelor, ≤ Some college, MOOCs, No MOOCs, Statistics, No Statistics. ✔ marks significance (p = 0.05); one bar is marked.]

• Linking doesn’t improve retention (except in one case for novice)