Sumit Gulwani
Programming by Examples: Applications, Algorithms & Ambiguity Resolution
Stanford Lecture, March 2017
Motivation
99% of computer users cannot program! They struggle with simple repetitive tasks.
Spreadsheet help forums
Typical help-forum interaction
300_w5_aniSh_c1_b → w5
=MID(B1,5,2)
300_w30_aniSh_c1_b → w30
=MID(B1,FIND("_",$B:$B)+1, FIND("_",REPLACE($B:$B,1,FIND("_",$B:$B), ""))-1)
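The gap between the two forum answers can be sketched in Python. This is an illustration only, not the program Flash Fill generates; the function names `fixed_mid` and `second_field` are hypothetical.

```python
def fixed_mid(cell: str) -> str:
    """Mimic =MID(B1,5,2): two characters starting at 1-based position 5."""
    return cell[4:6]


def second_field(cell: str, sep: str = "_") -> str:
    """Return the text between the first and second separator."""
    start = cell.find(sep) + 1   # just past the first separator
    end = cell.find(sep, start)  # index of the second separator, or -1
    return cell[start:] if end == -1 else cell[start:end]


print(fixed_mid("300_w5_aniSh_c1_b"))      # w5
print(fixed_mid("300_w30_aniSh_c1_b"))     # w3 (the fixed-offset formula breaks)
print(second_field("300_w30_aniSh_c1_b"))  # w30
```

The fixed-offset formula only fits the first example; the delimiter-based version generalizes to fields of any width.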
Flash Fill (Excel 2013 feature) demo
“Automating string processing in spreadsheets using input-output examples”; POPL 2011; Sumit Gulwani
Number and DateTime Transformations
Input        Output (Half hour bucket)
0d 5h 26m    5:00-5:30
0d 4h 57m    4:30-5:00
0d 4h 27m    4:00-4:30
0d 3h 57m    3:30-4:00
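For reference, a hand-written Python function consistent with the four half-hour-bucket examples might look as follows. It is a sketch of the kind of program a synthesizer would have to find, not its actual output; the name `half_hour_bucket` is hypothetical, and ignoring the day field is an assumption (every example has `0d`).

```python
import re


def half_hour_bucket(duration: str) -> str:
    """Map a duration such as '0d 5h 26m' to its half-hour bucket."""
    match = re.fullmatch(r"\s*(\d+)d\s+(\d+)h\s+(\d+)m\s*", duration)
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    # Assumption: the day component does not affect the bucket.
    hours, minutes = int(match.group(2)), int(match.group(3))
    if minutes < 30:
        return f"{hours}:00-{hours}:30"
    return f"{hours}:30-{hours + 1}:00"


print(half_hour_bucket("0d 5h 26m"))  # 5:00-5:30
print(half_hour_bucket("0d 4h 57m"))  # 4:30-5:00
```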
Synthesizing Number Transformations from Input-Output Examples; CAV 2012; Singh, Gulwani
Input       Output (Round to 2 decimal places)
123.4567    123.46
123.4       123.40
78.234      78.23
Excel/C#: #.00
Python/C: .2f
Java: #.##
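As a quick check of the Python format descriptor above, the three examples can be reproduced with the built-in `format` function. The wrapper name `round_to_two_places` is hypothetical.

```python
def round_to_two_places(value: float) -> str:
    """Format a number with exactly two digits after the decimal point."""
    return format(value, ".2f")


for sample in (123.4567, 123.4, 78.234):
    print(sample, "->", round_to_two_places(sample))
```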
To get Started!
Data Science Class Assignment
FlashExtract
Ships inside two Microsoft products:
• PowerShell: ConvertFrom-String cmdlet
• Custom Log, Custom Field
“FlashExtract: A Framework for data extraction by examples”; PLDI 2014; Vu Le, Sumit Gulwani
FlashExtract Demo
Start with: End goal:
Trifacta facilitates the task through a series of steps, suggesting small transformations at each step:
From: Skills of the Agile Data Wrangler (tutorial by Hellerstein and Heer)
1. Split on “:” Delimiter
2. Delete Empty Rows
3. Fill Values Down
4. Pivot Number on Type
FlashRelate
Table Re-formatting
FlashRelate Demo
“FlashRelate: Extracting Relational Data from Semi-Structured Spreadsheets Using Examples”; PLDI 2015; Barowy, Gulwani, Hart, Zorn
Killer Application: Data Wrangling
Data is the new oil. Data scientists spend 80% of their time wrangling data.
Extraction
• FlashExtract: Tabular data from log files [PLDI 2014]
• FlashRelate: Tabular data from spreadsheets [PLDI 2015]
Transformation
• FlashFill: Syntactic String Transformations [POPL 2011]
• Number Transformations [CAV 2012]
• Date Transformations [POPL 2016]
• Semantic String Transformations [VLDB 2012]
• Normalization Transformations [IJCAI 2015]
Formatting
• Table layout transformations [PLDI 2011]
• Repetitive editing in PowerPoint [AAAI 2014]
Programming-by-Examples Architecture
[Figure: architecture diagram. Components: Example-based Intent, DSL D, Program Synthesizer, Program set (a sub-DSL of D)]
Domain-specific Language (DSL)
• Balanced Expressiveness
  – Expressive enough to cover a wide range of tasks
  – Restricted enough to enable efficient search
• Restricted set of operators
  – those with small inverse sets
• Restricted syntactic composition of those operators
• Natural computation patterns
  – Increased user understanding/confidence
  – Enables selection between programs, editing
Flash Fill DSL (String Transformations)
top-level expr        T := if-then-else(B, C, T) | C
condition-free expr   C := Concatenate(A, C) | A
atomic expression     A := SubStr(X, P, P) | ConstantString
input string          X := …
position expression   P := K | Pos(X, R1, R2, K)
Boolean expression    B := …
Pos(X, R1, R2, K): Kth position in X whose left/right side matches with R1/R2.
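A minimal Python sketch of the position operator described above: the Kth position in X whose left side matches R1 and whose right side matches R2. Treating R1 and R2 as Python regular expressions, using 1-based K, and supporting only positive K are assumptions of this sketch; `pos` and `sub_str` are illustrative names, not the Flash Fill implementation.

```python
import re


def pos(x: str, r1: str, r2: str, k: int) -> int:
    """Return the k-th index p such that some suffix of x[:p] matches r1
    and some prefix of x[p:] matches r2 (k is 1-based)."""
    if k < 1:
        raise ValueError("this sketch only supports k >= 1")
    positions = [
        p for p in range(len(x) + 1)
        if re.search(f"(?:{r1})\\Z", x[:p]) and re.match(r2, x[p:])
    ]
    return positions[k - 1]  # raises IndexError if fewer than k positions match


def sub_str(x: str, p1: int, p2: int) -> str:
    """SubStr(X, P, P): the substring between two positions."""
    return x[p1:p2]


x = "300_w30_aniSh_c1_b"
start = pos(x, "_", "", 1)  # first position preceded by "_"
end = pos(x, "", "_", 2)    # second position followed by "_"
print(sub_str(x, start, end))  # w30
```

Because both positions are defined relative to the underscores rather than by fixed offsets, the same program also returns `w5` on `300_w5_aniSh_c1_b`.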
FlashExtract DSL (Sequence Extraction)
String → List(String)
Seq expr E              := Map(N, λz: S[z])
                         | Merge(E1, E2)
some lines N            := FilterByPosition(L, init, iter)
                         | Filter(L, λz: F[z])
                         | Filter(L, λy: F[prevLine(y)])
line filter function F[z] := Contains(z, r, K)
                         | startsWith(z, r)
all lines L             := Split(d, "\n")
substr expr S[z]        := …
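To make the shape of such a sequence-extraction program concrete, here is a small Python sketch that evaluates one program of the form Map(Filter(Split(d, "\n"), λz: startsWith(z, r)), λz: S[z]). The log text, the regex `ERROR`, and the substring function (everything after the first `": "`) are invented for illustration; the slide leaves S[z] unspecified.

```python
import re
from typing import Callable, List


def split(document: str) -> List[str]:
    """all lines L := Split(d, "\\n")"""
    return document.split("\n")


def starts_with(line: str, regex: str) -> bool:
    """line filter function: does the line start with a match of regex?"""
    return re.match(regex, line) is not None


def filter_lines(lines: List[str], keep: Callable[[str], bool]) -> List[str]:
    """some lines N := Filter(L, λz: F[z])"""
    return [line for line in lines if keep(line)]


def map_lines(lines: List[str], substring: Callable[[str], str]) -> List[str]:
    """Seq expr E := Map(N, λz: S[z])"""
    return [substring(line) for line in lines]


# Hypothetical input document and program.
log = "INFO: boot\nERROR: disk full\nINFO: retry\nERROR: timeout"
selected = filter_lines(split(log), lambda z: starts_with(z, r"ERROR"))
extracted = map_lines(selected, lambda z: z.partition(": ")[2])
print(extracted)  # ['disk full', 'timeout']
```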
Search Methodology
Goal: Set of expressions of kind $e$ that satisfy spec $\varphi$ [denoted $[e \models \varphi]$]
$e$: DSL (top-level) expression
$\varphi$: Conjunction of (input state $\sigma$, output value $v$) [denoted $\sigma \rightsquigarrow v$]
Methodology: Based on divide-and-conquer style problem decomposition.
• $[e \models \varphi]$ is reduced to simpler problems (over sub-expressions of $e$ or sub-constraints of $\varphi$).
• Top-down (as opposed to bottom-up enumerative search).
“FlashMeta: A Framework for Inductive Program Synthesis”; OOPSLA 2015; Alex Polozov, Sumit Gulwani
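Problem Reduction Rules
Let $e$ be a non-terminal defined as $e := e_1 \mid e_2$.
$[e \models \varphi] = \mathrm{Union}([e_1 \models \varphi], [e_2 \models \varphi])$
$[e \models \varphi_1 \wedge \varphi_2] = \mathrm{Intersect}([e \models \varphi_1], [e \models \varphi_2])$
An alternative strategy:
$[e \models \varphi_1 \wedge \varphi_2] = [e' \models \varphi_2]$, where $e' = \mathrm{TopSymbol}([e \models \varphi_1])$
$[e \models \varphi_1 \vee \varphi_2] = \mathrm{Union}([e \models \varphi_1], [e \models \varphi_2])$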
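The Union and Intersect reductions can be illustrated with explicit Python sets standing in for program sets. The two-operator DSL here (`Const` and an absolute-index `Sub`) and all function names are hypothetical; real synthesizers use compact program-set representations rather than enumeration, and the filtering variant is only a rough stand-in for the alternative strategy.

```python
from typing import List, Set, Tuple

Program = Tuple  # ("Const", s) or ("Sub", i, j)
Example = Tuple[str, str]  # (input state, output value)


def run(program: Program, inp: str) -> str:
    if program[0] == "Const":
        return program[1]
    return inp[program[1]:program[2]]


def learn_const(inp: str, out: str) -> Set[Program]:
    return {("Const", out)}


def learn_sub(inp: str, out: str) -> Set[Program]:
    return {("Sub", i, i + len(out))
            for i in range(len(inp) - len(out) + 1)
            if inp.startswith(out, i)}


def learn(inp: str, out: str) -> Set[Program]:
    # e := Const | Sub, so the program set is the Union over both alternatives.
    return learn_const(inp, out) | learn_sub(inp, out)


def learn_conjunction(examples: List[Example]) -> Set[Program]:
    # A conjunction of examples: Intersect the per-example program sets.
    return set.intersection(*(learn(i, o) for i, o in examples))


def learn_by_refinement(examples: List[Example]) -> Set[Program]:
    # Rough stand-in for the alternative strategy: learn from the first
    # example, then keep only programs consistent with the remaining ones.
    (first_in, first_out), rest = examples[0], examples[1:]
    return {p for p in learn(first_in, first_out)
            if all(run(p, i) == o for i, o in rest)}


examples = [("Ab cd", "Ab"), ("Xy zw", "Xy")]
print(learn_conjunction(examples))    # {('Sub', 0, 2)}
print(learn_by_refinement(examples))  # {('Sub', 0, 2)}
```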
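Problem Reduction Rules
Inverse Set: Let $F$ be an n-ary operator.
$F^{-1}(v) = \{(u_1, \ldots, u_n) \mid F(u_1, \ldots, u_n) = v\}$
$[F(e_1, \ldots, e_n) \models \sigma \rightsquigarrow v] = \mathrm{Union}(\{F([e_1 \models \sigma \rightsquigarrow u_1], \ldots, [e_n \models \sigma \rightsquigarrow u_n]) \mid (u_1, \ldots, u_n) \in F^{-1}(v)\})$
$\mathrm{Concat}^{-1}(\text{"Abc"}) = \{(\text{"Abc"}, \epsilon), (\text{"Ab"}, \text{"c"}), (\text{"A"}, \text{"bc"}), (\epsilon, \text{"Abc"})\}$
\[
\begin{aligned}
[\mathrm{Concat}(X, Y) \models (\sigma \rightsquigarrow \text{"Abc"})] = \mathrm{Union}(\{\,
& \mathrm{Concat}([X \models (\sigma \rightsquigarrow \text{"Abc"})], [Y \models (\sigma \rightsquigarrow \epsilon)]), \\
& \mathrm{Concat}([X \models (\sigma \rightsquigarrow \text{"Ab"})], [Y \models (\sigma \rightsquigarrow \text{"c"})]), \\
& \mathrm{Concat}([X \models (\sigma \rightsquigarrow \text{"A"})], [Y \models (\sigma \rightsquigarrow \text{"bc"})]), \\
& \mathrm{Concat}([X \models (\sigma \rightsquigarrow \epsilon)], [Y \models (\sigma \rightsquigarrow \text{"Abc"})]) \,\})
\end{aligned}
\]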
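The inverse set of concatenation is simply every way of splitting the output string in two. A short Python sketch (the name `concat_inverse` is hypothetical):

```python
from typing import List, Tuple


def concat_inverse(value: str) -> List[Tuple[str, str]]:
    """All pairs (u1, u2) with u1 + u2 == value, longest prefix first."""
    return [(value[:i], value[i:]) for i in range(len(value), -1, -1)]


print(concat_inverse("Abc"))
# [('Abc', ''), ('Ab', 'c'), ('A', 'bc'), ('', 'Abc')]
```

Each pair yields one sub-problem per argument, which is why operators with small inverse sets keep the search tractable: a string of length n has only n + 1 splits.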
Problem Reduction Rules
$\mathrm{Concat}^{-1}(\text{"Abc"}) = \{(\text{"Abc"}, \epsilon), (\text{"Ab"}, \text{"c"}), (\text{"A"}, \text{"bc"}), (\epsilon, \text{"Abc"})\}$
$\mathrm{IfThenElse}^{-1}(v) = \{(\mathrm{true}, v, \top), (\mathrm{false}, \top, v)\}$
The notion of inverse sets (and corresponding reduction rules) can be generalized to the conditional setting.
Consider the SubStr(x, i, j) operator:
$\mathrm{SubStr}^{-1}(\text{"Ab"}) = \ldots$
$\mathrm{SubStr}^{-1}(\text{"Ab"} \mid x = \text{"Ab cd Ab"}) = \{(0, 2), (6, 8)\}$
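Once the input string x is known, the conditional inverse of SubStr becomes a finite set of index pairs: every occurrence of the output inside x. A minimal Python sketch (the name `substr_inverse` is hypothetical; indices are 0-based with an exclusive end, matching the example above):

```python
from typing import List, Tuple


def substr_inverse(value: str, x: str) -> List[Tuple[int, int]]:
    """All (i, j) such that x[i:j] == value."""
    return [(i, i + len(value))
            for i in range(len(x) - len(value) + 1)
            if x.startswith(value, i)]


print(substr_inverse("Ab", "Ab cd Ab"))  # [(0, 2), (6, 8)]
```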
Generalizing output values to output properties
User specification
• Examples: Conjunction of (input state, output value)
• Inductive Spec: Conjunction of (input state, output property)
Search methodology
• (Conditional) Inverse sets: Back-propagation of values
• (Conditional) Witness functions: Back-propagation of properties
Problem Reduction
FlashExtract DSL
list of strings   T := Map(L, S)
substring fn      S := λy: …
list of lines     L := Filter(Split(d, "\n"), B)
boolean fn        B := λy: …
[Figure: Spec for T is reduced to Spec for L ∧ Spec for S; the results are combined (⋈)]
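One way to picture this reduction: given the document and a few example output strings for T, derive which lines L must select and which (line, substring) pairs S must satisfy. The Python sketch below is a simplification under stated assumptions (each output is attributed to the first not-yet-used line containing it); the function name and sample document are hypothetical, and this is not the witness function used by FlashExtract.

```python
from typing import List, Tuple


def decompose_map_spec(
    document: str, outputs: List[str]
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split a spec for T := Map(L, S) into a spec for L and a spec for S."""
    lines = document.split("\n")
    selected_lines: List[str] = []                 # spec for L: lines to keep
    substring_examples: List[Tuple[str, str]] = [] # spec for S: line -> substring
    cursor = 0
    for output in outputs:
        for index in range(cursor, len(lines)):
            if output in lines[index]:
                selected_lines.append(lines[index])
                substring_examples.append((lines[index], output))
                cursor = index + 1
                break
        else:
            raise ValueError(f"no remaining line contains {output!r}")
    return selected_lines, substring_examples


doc = "name: Ann\n---\nname: Bob"
spec_for_l, spec_for_s = decompose_map_spec(doc, ["Ann", "Bob"])
print(spec_for_l)  # ['name: Ann', 'name: Bob']
print(spec_for_s)  # [('name: Ann', 'Ann'), ('name: Bob', 'Bob')]
```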
Problem Reduction
SubStr grammar
substring expr   E := SubStr(y, P1, P2)
position expr    P := K | Pos(y, R1, R2, K)
[Figure: example string “Redmond, WA”; Spec for E is reduced to Spec for P1 and Spec for P2, combined with ⋈]
Programming-by-Examples Architecture
[Figure: architecture diagram. Components: Example-based Intent, DSL D, Ranking fn, Program Synthesizer, Ranked Program set (a sub-DSL of D)]
Basic ranking scheme
Prefer programs with lower Kolmogorov complexity
• Prefer fewer constants.
• Prefer smaller constants.
Input            Output
Rishabh Singh    Rishabh
Ben Zorn         Ben
• 1st Word
• If (input = “Rishabh Singh”) then “Rishabh” else “Ben”
• “Rishabh”
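The basic scheme can be sketched in Python by describing each candidate program only by the string constants it contains and comparing first the number of constants, then their total size. The dictionary layout and the tie-breaking order are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Each candidate program is summarized by the constants it uses.
candidates: Dict[str, List[str]] = {
    "1st Word": [],
    'If (input = "Rishabh Singh") then "Rishabh" else "Ben"':
        ["Rishabh Singh", "Rishabh", "Ben"],
    '"Rishabh"': ["Rishabh"],
}


def basic_rank_key(constants: List[str]) -> Tuple[int, int]:
    """Lower is better: fewer constants first, then smaller constants."""
    return (len(constants), sum(len(c) for c in constants))


ranked = sorted(candidates, key=lambda name: basic_rank_key(candidates[name]))
print(ranked[0])  # 1st Word
```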
Challenges with Basic ranking scheme
Prefer programs with lower Kolmogorov complexity
• Prefer fewer constants.
• Prefer smaller constants.
Input            Output
Rishabh Singh    Singh, Rishabh
Ben Zorn         Zorn, Ben
• 2nd Word + “, ” + 1st Word
• “Singh, Rishabh”
How to select between fewer larger constants vs. more smaller constants?
Idea: Associate numeric weights with constants.
Challenges with Basic ranking scheme
Prefer programs with lower Kolmogorov complexity
• Prefer fewer constants.
• Prefer smaller constants.
How to select between programs with the same number of same-sized constants?
Idea: Examine data features (in addition to program features)
Input                         Output
Missing page numbers, 1993    1993
64-67, 1995                   1995
• 1st Number from the beginning
• 1st Number from the end
Machine learning based ranking scheme
Rank score of a program: Weighted combination of various features.
• Weights are learned using machine learning.
Program features
• Number of constants
• Size of constants
Features over user data: Similarity of generated output (or even intermediate values) over various user inputs
• IsYear, Numeric Deviation, Number of characters
• IsPersonName
“Predicting a correct program in Programming by Example”; CAV 2015; Rishabh Singh, Sumit Gulwani
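A toy Python version of such a score, applied to the two page-number candidates ("1st Number from the beginning" vs. "1st Number from the end"). The weights below are made up for illustration, not learned, and the two features (an IsYear fraction and a spread in output length) are simplified stand-ins for the features named above.

```python
import re
from typing import Callable, List


def first_number_from_start(text: str) -> str:
    found = re.search(r"\d+", text)
    return found.group(0) if found else ""


def first_number_from_end(text: str) -> str:
    numbers = re.findall(r"\d+", text)
    return numbers[-1] if numbers else ""


def is_year(token: str) -> bool:
    """Crude IsYear feature: exactly four digits."""
    return len(token) == 4 and token.isdigit()


# Hypothetical hand-picked weights; a real system would learn them.
WEIGHTS = {"is_year": 1.0, "length_spread": -0.5}


def score(program: Callable[[str], str], inputs: List[str]) -> float:
    """Weighted combination of features computed over the program's outputs."""
    outputs = [program(text) for text in inputs]
    year_fraction = sum(is_year(o) for o in outputs) / len(outputs)
    lengths = [len(o) for o in outputs]
    length_spread = max(lengths) - min(lengths)
    return (WEIGHTS["is_year"] * year_fraction
            + WEIGHTS["length_spread"] * length_spread)


inputs = ["Missing page numbers, 1993", "64-67, 1995"]
candidates = [first_number_from_start, first_number_from_end]
best = max(candidates, key=lambda program: score(program, inputs))
print(best.__name__)  # first_number_from_end
```

Both candidates have identical program features, so only the data features separate them: taking the first number from the start yields the dissimilar outputs `1993` and `64`.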
Outline
• Core Synthesis Architecture
  – Domain-specific Language
  – Search methodology
  – Ranking function
• Next generation Synthesis
  – Interactive
  – Predictive
  – Adaptive
Programming-by-Examples Architecture
[Figure: architecture diagram. Components: Example-based Intent, DSL D, Ranking fn, Program Synthesizer, Ranked Program set (a sub-DSL of D), Test inputs, Debugging, Refined Intent, Incrementality, Intended Program in D, Translator, Intended Program in R/Python/C#/C++/…]
Interactive Debugging
Predictive
• Intended programs can sometimes be synthesized from just the input.
  – Tabular data extraction, Sort, Join
• Can save a large amount of user effort.
  – The user need not provide examples for each of tens of columns.
Adaptive
• Learn from past interactions
  – of the same user (personalized experience).
  – of other users in the enterprise/cloud.
• The synthesis sessions now require less interaction.
Programming-by-Examples Architecture
[Figure: architecture diagram. Components: Example-based Intent, DSL D, Interaction history, Learner, Ranking fn, Program Synthesizer, Ranked Program set (a sub-DSL of D), Test inputs, Debugging, Refined Intent, Incrementality, Intended Program in D, Translator, Intended Program in R/Python/C#/C++/…]
PROSE Framework
https://microsoft.github.io/prose
• Efficient implementation of the generic search methodology.
• Provides a library of reduction rules.
Role of synthesizer developer
• Implement a DSL with executable semantics for new operators.
• Implement reduction rules (inverse semantics) for some operators.
• Implement ranking strategy.
• Can also specify tactics to resolve non-determinism in search.
The PROSE Team
Vu Le
Sumit Gulwani
Daniel Perelman
Danny Simmons
Mohammad Raza
Abhishek Udupa
Ranvijay Kumar
Alex Polozov
Mark Plesko
Please apply for intern/full-time positions!
Future Directions
• Application to robotics
• Learn from usage data
• Probabilistic noise handling
• Programming using natural language
Conclusion
• Killer application: Data wrangling
  – 99% of end users are non-programmers.
  – Data scientists spend 80% of their time cleaning data.
• Algorithmic search
  – Domain-specific language
  – Divide-and-conquer methodology based on inverse functions
• Ambiguity resolution
  – Ranking
  – Interactivity
Reference: “Programming by Examples (and its applications in Data Wrangling)”, In Verification and Synthesis of Correct and Secure Systems; IOS Press; 2016 [based on Marktoberdorf Summer School 2015 Lecture Notes]