
Page 1: Autograder

Autograder
RISHABH SINGH, SUMIT GULWANI, ARMANDO SOLAR-LEZAMA

Page 2: Autograder

Feedback on Programming Assignments

• Test-case based feedback
  • Hard to relate failing inputs to errors
• Manual feedback by TAs
  • Time consuming and error prone

Page 3: Autograder

6.00 Student Feedback (2013)

"Not only did it take 1-2 weeks to grade problem, but the comments were entirely unhelpful in actually helping us fix our errors. … Apparently they don't read the code -- they just ran their tests and docked points mercilessly. What if I just had a simple typo, but my algorithm was fine? ..."

Page 4: Autograder

Bigger Challenge in MOOCs

Scalability Challenges (>100k students)

Page 5: Autograder

Today’s Grading Workflow

Student’s solution:

def computeDeriv(poly):
    deriv = []
    zero = 0
    if (len(poly) == 1):
        return deriv
    for e in range(0, len(poly)):
        if (poly[e] == 0):
            zero += 1
        else:
            deriv.append(poly[e]*e)
    return deriv

TA’s feedback: replace deriv by [0]

Teacher’s Solution

Grading Rubric

Page 6: Autograder

Autograder Workflow

Student’s submission:

def computeDeriv(poly):
    deriv = []
    zero = 0
    if (len(poly) == 1):
        return deriv
    for e in range(0, len(poly)):
        if (poly[e] == 0):
            zero += 1
        else:
            deriv.append(poly[e]*e)
    return deriv

Autograder’s feedback: replace deriv by [0]

Teacher’s Solution

Error Model

Page 7: Autograder

Technical Challenges
• Large space of possible corrections
• Minimal corrections
• Dynamically-typed language

Constraint-based synthesis to the rescue

Page 8: Autograder

Running Example

Page 9: Autograder

computeDeriv

Compute the derivative of a polynomial

poly = [10, 8, 2]  # f(x) = 10 + 8x + 2x^2

=> [8, 4]          # f’(x) = 8 + 4x

Page 10: Autograder

Teacher’s solution

def computeDeriv(poly):
    result = []
    if len(poly) == 1:
        return [0]
    for i in range(1, len(poly)):
        result += [i * poly[i]]
    return result
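The teacher’s solution can be checked directly on the running example (the function is repeated here so the snippet runs standalone):

```python
def computeDeriv(poly):
    # Derivative of a polynomial given as a list of coefficients.
    result = []
    if len(poly) == 1:
        return [0]
    for i in range(1, len(poly)):
        result += [i * poly[i]]
    return result

print(computeDeriv([10, 8, 2]))  # f(x) = 10 + 8x + 2x^2  ->  [8, 4]
print(computeDeriv([7]))         # constant polynomial    ->  [0]
```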

Page 11: Autograder

Demo

Page 12: Autograder

Simplified Error Model

• return a  →  return {[0], a}
• range(a1, a2)  →  range(a1+1, a2)
• a0 == a1  →  False
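These rules can be read as rewrite rules over expressions: each rule maps a source pattern to a set of candidate replacements. A toy sketch of applying them (the rule table and helper below are illustrative, not the paper’s implementation, which rewrites ASTs rather than strings):

```python
# Toy rewriter: each rule maps a textual pattern to its candidate
# replacements; applying all matching rules yields the choice set.
RULES = [
    ("return deriv", ["return deriv", "return [0]"]),
    ("range(0,", ["range(0,", "range(1,"]),
]

def candidates(stmt):
    # Collect the original statement plus every single-rule rewrite.
    results = {stmt}
    for pattern, replacements in RULES:
        if pattern in stmt:
            for rep in replacements:
                results.add(stmt.replace(pattern, rep))
    return sorted(results)

print(candidates("return deriv"))
print(candidates("for e in range(0, len(poly)):"))
```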

Page 13: Autograder

Autograder Algorithm

Page 14: Autograder

Algorithm

.py → Rewriter → Translator → .sk → Solver → .out → Feedback

Page 15: Autograder

Algorithm: Rewriter

.py → Rewriter → Translator → .sk → Solver → .out → Feedback

Page 16: Autograder

Rewriting using Error Model

range(0, len(poly))

apply a → a+1:

range({0, 1}, len(poly))    (the first element of {…} is the default choice)


Page 18: Autograder

Rewriting using Error Model

range(0, len(poly))

apply a → a+1:

range({0, 1}, len({poly, poly+1}))

Page 19: Autograder

Rewriting using Error Model

range(0, len(poly))

apply a → a+1:

range({0, 1}, {len({poly, poly+1}), len({poly, poly+1})+1})

Page 20: Autograder

Rewriting using Error Model

def computeDeriv(poly):
    deriv = []
    zero = 0
    if ({len(poly) == 1, False}):
        return {deriv, [0]}
    for e in range({0, 1}, len(poly)):
        if (poly[e] == 0):
            zero += 1
        else:
            deriv.append(poly[e]*e)
    return {deriv, [0]}

Problem: find the program in this space that minimizes the cost metric and is functionally equivalent to the teacher’s solution
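Conceptually, the synthesizer searches this choice space for the cheapest assignment of choices that makes the student’s program agree with the teacher’s. A brute-force sketch of that search (the real tool uses constraint solving, not enumeration; the choice encoding and helper names here are illustrative):

```python
from itertools import product

def teacher(poly):
    result = []
    if len(poly) == 1:
        return [0]
    for i in range(1, len(poly)):
        result += [i * poly[i]]
    return result

def student(poly, base_ret, start, final_ret):
    # Student's program with the error-model choices made explicit.
    # (The unused 'zero' counter is dropped; test inputs below avoid
    # zero coefficients for simplicity.)
    deriv = []
    if len(poly) == 1:
        return deriv if base_ret == "deriv" else [0]
    for e in range(start, len(poly)):
        if poly[e] != 0:
            deriv.append(poly[e] * e)
    return deriv if final_ret == "deriv" else [0]

def search():
    # Try every assignment of choices; cost = number of non-default picks.
    best = None
    for b, s, f in product([0, 1], repeat=3):
        cost = b + s + f
        args = ("[0]" if b else "deriv", 1 if s else 0, "[0]" if f else "deriv")
        ok = all(student(list(p), *args) == teacher(list(p))
                 for p in ([10, 8, 2], [7], [3, 5]))
        if ok and (best is None or cost < best[0]):
            best = (cost, args)
    return best

print(search())  # minimal fix: return [0] in the base case, start range at 1
```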

Page 21: Autograder

Algorithm: Translator

.py → Rewriter → Translator → .sk → Solver → .out → Feedback

Page 22: Autograder

Sketch [Solar-Lezama et al., ASPLOS’06]

void main(int x){
    int k = ??;
    assert x + x == k * x;
}

The synthesizer fills the hole:

void main(int x){
    int k = 2;
    assert x + x == k * x;
}

A statically typed C-like language with holes
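Sketch resolves `??` with a constraint solver; a toy Python analogue (illustrative only) fills the hole by searching for a constant that makes the assertion hold on sampled inputs:

```python
def fill_hole(candidates, inputs):
    # Find a constant k such that x + x == k * x for all sampled x.
    # (Nonzero samples, since x = 0 would accept any k.)
    for k in candidates:
        if all(x + x == k * x for x in inputs):
            return k
    return None

print(fill_hole(range(10), [1, 2, 3, 7]))  # -> 2
```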

Page 23: Autograder

Translation to Sketch
(1) Handling Python’s dynamic types
(2) Translation of expression choices

Page 24: Autograder

(1) Handling Dynamic Types

A value carries a type tag plus one payload field per Python type (int, bool, list, …):

struct MultiType {
    int type;
    int ival;
    bit bval;
    MTString str;
    MTList lst;
    MTDict dict;
    MTTuple tup;
}
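In Python terms, MultiType is a tagged union. A rough, hypothetical dataclass analogue (for intuition only, not the tool’s encoding):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MultiType:
    # Tagged union: 'type' says which payload field is meaningful.
    type: str                    # "int" | "bool" | "list" | ...
    ival: Optional[int] = None
    bval: Optional[bool] = None
    lst: Optional[list] = None

def mt_from_value(v):
    # bool first: in Python, bool is a subclass of int.
    if isinstance(v, bool):
        return MultiType("bool", bval=v)
    if isinstance(v, int):
        return MultiType("int", ival=v)
    if isinstance(v, list):
        return MultiType("list", lst=[mt_from_value(e) for e in v])
    raise TypeError(v)

print(mt_from_value(1))
print(mt_from_value([1, 2]).type)  # -> list
```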

Page 25: Autograder

Python Constants using MultiType

[Diagram: the constants 1, 2, and [1,2] encoded as MultiType values — 1 and 2 with type = INT and ival set; [1,2] with type = LIST, len = 2, and lVals pointing at the two INT encodings]

Page 26: Autograder

Python Exprs using MultiType

[Diagram: x + y on INT values — x = 10, y = 5, result = 15, each an INT-tagged MultiType]

Page 27: Autograder

Python Exprs using MultiType

[Diagram: x + y on LIST values — x = [1,2], y = [3], result = [1,2,3], each a LIST-tagged MultiType]

Page 28: Autograder

Python Expressions using MultiType

[Diagram: x + y with mismatched tags — x = [3] (LIST), y = 5 (INT)]

Typing rules are encoded as constraints
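A sketch of what such a typing rule looks like operationally, using plain tagged tuples in place of MultiType (hypothetical helper, for intuition only):

```python
# Toy typed addition over tagged values ("int", v) / ("list", v).
# The tag check mirrors a typing constraint: both operands must agree.
def mt_add(x, y):
    tx, vx = x
    ty, vy = y
    if tx != ty or tx not in ("int", "list"):
        raise TypeError(f"cannot add {tx} and {ty}")
    return (tx, vx + vy)

print(mt_add(("int", 10), ("int", 5)))          # -> ('int', 15)
print(mt_add(("list", [1, 2]), ("list", [3])))  # -> ('list', [1, 2, 3])
```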

Page 29: Autograder

(2) Translation of Expression Choices

Each expression choice {e1, e2} compiles to a generator function: taking the default alternative is free, while selecting the non-default one sets a choice bit that later feeds the cost metric:

MultiType modifyMTi(MultiType a, MultiType b){
    if (??) return a;                      // default choice
    else { choicei = True; return b; }     // non-default choice
}

Page 30: Autograder

Translation to Sketch (Final Step)

harness main(int N, int[N] poly){
    MultiType polyMT = MTFromList(poly);
    MultiType result1 = computeDeriv_teacher(polyMT);
    MultiType result2 = computeDeriv_student(polyMT);
    assert MTEquals(result1, result2);
}
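The harness asserts behavioral equivalence over all bounded inputs. A Python analogue that checks a student function against the teacher’s on exhaustively enumerated small inputs (an illustrative test harness, not the Sketch encoding):

```python
from itertools import product

def teacher(poly):
    result = []
    if len(poly) == 1:
        return [0]
    for i in range(1, len(poly)):
        result += [i * poly[i]]
    return result

def equivalent(student, max_len=3, coeffs=range(-2, 3)):
    # Enumerate every coefficient list up to max_len (bounded inputs,
    # as in the Sketch harness) and compare outputs.
    for n in range(1, max_len + 1):
        for poly in product(coeffs, repeat=n):
            if student(list(poly)) != teacher(list(poly)):
                return False
    return True

print(equivalent(teacher))      # -> True
print(equivalent(lambda p: p))  # -> False
```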

Page 31: Autograder

Translation to Sketch (Final Step)

harness main(int N, int[N] poly){
    totalCost = 0;
    MultiType polyMT = MTFromList(poly);
    MultiType result1 = computeDeriv_teacher(polyMT);
    MultiType result2 = computeDeriv_student(polyMT);
    ………
    if (choicek) totalCost++;
    ………
    assert MTEquals(result1, result2);
    minimize(totalCost);    // minimum changes
}

Page 32: Autograder

Algorithm: Solver

.py → Rewriter → Translator → .sk → Solver → .out → Feedback

Page 33: Autograder

Solving for minimize(x)

Sketch uses CEGIS – multiple SAT calls
• Binary search for x – no reuse of learnt clauses
• MAX-SAT – too much overhead
• Incremental linear search – reuses learnt clauses
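The incremental strategy can be sketched abstractly: solve once, then repeatedly add the constraint cost < best until UNSAT. The solver below is a toy stub; the real benefit comes from reusing learnt clauses inside the SAT solver across these calls:

```python
def minimize_incremental(solve, upper_bound):
    # 'solve(bound)' returns a (solution, cost) with cost < bound, or None.
    # Each round tightens the bound: incremental linear search.
    best = None
    bound = upper_bound
    while True:
        found = solve(bound)
        if found is None:      # UNSAT: no cheaper solution exists
            return best
        best = found
        bound = found[1]       # the next round must beat this cost

# Toy stand-in for Sketch: feasible costs are {7, 4}.
def toy_solve(bound):
    feasible = [c for c in (7, 4) if c < bound]
    return ("sol", max(feasible)) if feasible else None

print(minimize_incremental(toy_solve, 100))  # -> ('sol', 4)
```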

Page 34: Autograder

Incremental Solving for minimize(x)

(P, x) → Sketch → solution (P1, x = 7)
add {x < 7} → Sketch → solution (P2, x = 4)
add {x < 4} → Sketch → UNSAT

Page 35: Autograder

Algorithm: Feedback

.py → Rewriter → Translator → .sk → Solver → .out → Feedback

Page 36: Autograder

Feedback Generation

• Correction rules are associated with feedback templates
• The synthesizer’s choices are extracted to fill the templates
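A sketch of that last step (the template strings and rule names below are hypothetical, not the tool’s actual wording):

```python
# Each correction rule carries a feedback template; the solver's chosen
# alternatives instantiate them. Templates here are illustrative.
TEMPLATES = {
    "return_rule": "In the return statement on line {line}, replace {old} by {new}.",
    "range_rule": "In the range() call on line {line}, increment {old} by 1.",
}

def render_feedback(choices):
    # choices: list of (rule name, template arguments) from the solver.
    return [TEMPLATES[rule].format(**args) for rule, args in choices]

msgs = render_feedback([
    ("return_rule", {"line": 4, "old": "deriv", "new": "[0]"}),
    ("range_rule", {"line": 6, "old": "0"}),
])
for m in msgs:
    print(m)
```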

Page 37: Autograder

Evaluation

Page 38: Autograder

Autograder Tool for Python

Currently supports:
• Integers, booleans, strings, lists, dictionaries, tuples
• Closures, limited higher-order functions, list comprehensions

Page 39: Autograder

Benchmarks

Exercises from the first five weeks of 6.00x and 6.00:
• int: prodBySum, compBal, iterPower, recurPower, iterGCD
• tuple: oddTuples
• list: compDeriv, evalPoly
• string: hangman1, hangman2
• arrays (C#): APCS dynamic programming (Pex4Fun)

Page 40: Autograder

Benchmark               Test Set
evalPoly-6.00                 13
compBal-stdin-6.00            52
compDeriv-6.00               103
hangman2-6.00x               218
prodBySum-6.00               268
oddTuples-6.00               344
hangman1-6.00x               351
evalPoly-6.00x               541
compDeriv-6.00x              918
oddTuples-6.00x             1756
iterPower-6.00x             2875
recurPower-6.00x            2938
iterGCD-6.00x               2988

Page 41: Autograder

Average Running Time (in s)

[Bar chart: average running time per benchmark, y-axis 0–35 s, for prodBySum-6.00, oddTuples-6.00, compDeriv-6.00, evalPoly-6.00, compBal-stdin-6.00, compDeriv-6.00x, evalPoly-6.00x, oddTuples-6.00x, iterPower-6.00x, recurPower-6.00x, iterGCD-6.00x, hangman1-6.00x, hangman2-6.00x]

Page 42: Autograder

Feedback Generated (Percentage)

[Bar chart: percentage of submissions for which feedback was generated, y-axis 0–90%, for evalPoly-6.00x, compBal-stdin-6.00, hangman2-6.00x, evalPoly-6.00, hangman1-6.00x, oddTuples-6.00x, oddTuples-6.00, iterPower-6.00x, iterGCD-6.00x, recurPower-6.00x, prodBySum-6.00, compDeriv-6.00x, compDeriv-6.00]


Page 46: Autograder

Average Performance

TestSet    Generated Feedback    Percentage    AvgTime (s)
13365      8579                  64.19%        9.91

Page 47: Autograder

Why low % in some cases?
• Completely incorrect solutions
• Unimplemented Python features
• Timeout (comp-bal-6.00)
• Big conceptual errors

Page 48: Autograder

Big Error: Misunderstanding APIs

• eval-poly-6.00x

def evaluatePoly(poly, x):
    result = 0
    for i in list(poly):
        result += i * x ** poly.index(i)
    return result
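The API misunderstanding: `poly.index(i)` returns the position of the *first* occurrence of the value `i`, not the current loop position, so repeated coefficients all receive the exponent of the first duplicate. A minimal demonstration (function repeated so the snippet runs standalone):

```python
def evaluatePoly(poly, x):
    # Student's version: poly.index(i) finds the first occurrence of the
    # value, so duplicate coefficients share one exponent.
    result = 0
    for i in list(poly):
        result += i * x ** poly.index(i)
    return result

# f(x) = 1 + x at x = 2 should be 3, but both coefficients map to exponent 0:
print(evaluatePoly([1, 1], 2))  # -> 2 (correct answer is 3)
```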

Page 49: Autograder

Big Error: Misunderstanding Spec

• hangman2-6.00x

def getGuessedWord(secretWord, lettersGuessed):
    for letter in lettersGuessed:
        secretWord = secretWord.replace(letter, '_')
    return secretWord
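The spec misunderstanding: the function should *reveal* the guessed letters and hide the rest, but this solution does the opposite, replacing every guessed letter with '_'. A demonstration next to a version matching the intended spec (my sketch of the spec, not the official solution):

```python
def getGuessedWord_student(secretWord, lettersGuessed):
    # Hides exactly the letters that were guessed -- backwards.
    for letter in lettersGuessed:
        secretWord = secretWord.replace(letter, '_')
    return secretWord

def getGuessedWord_spec(secretWord, lettersGuessed):
    # Intended behavior: show guessed letters, hide everything else.
    return ''.join(c if c in lettersGuessed else '_' for c in secretWord)

print(getGuessedWord_student("apple", ["a", "p"]))  # -> '___le'
print(getGuessedWord_spec("apple", ["a", "p"]))     # -> 'app__'
```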

Page 50: Autograder

Conclusion

A technique for automated feedback generation
• Error models + constraint-based synthesis
• Provide a basis for automated feedback for MOOCs
• Towards building a Python tutoring system

[email protected]

Thanks!