Project Summary CAREER: Program Synthesis with ... · Research Thrust 2: Synthesis with Probabilistic Objectives and Guarantees The last point of Thrust 1 dwells with probabilistic

Project Summary

CAREER: Program Synthesis with Quantitative GuaranteesLoris D’Antoni (University of Wisconsin–Madison)

Overview Software has been changing our lives for many decades, but despite the advances in pro-gramming language design, how we write programs has not changed much: programmers keep repeat-ing similar mistakes, writing similar programs, and fixing similar bugs. Program synthesis, the art ofautomatically generating programs that meet user intents, promises to increase the productivity of pro-grammers and end-users of computing devices by automating tedious, error-prone, and time-consumingtasks. Despite the practical successes of program synthesis, we still do not have systematic frameworks tosynthesize programs that are “good” according to certain metrics—e.g., produce programs of reasonablesize or with good runtime—and to understand when synthesis can result in such good programs.

We envision a world in which synthesis algorithms are more predictable and accurate. To realize thisvision, we propose to attack the problem of performing program synthesis in the presence of quantitative ob-jectives while providing quantitative guarantees on the results and on the performance of the synthesis algorithms.Concretely, we will re-design existing specification mechanisms and solving techniques for synthesis prob-lems to accommodate optimization objectives on the synthesized program. To achieve this goal, we planour investigation along two synergistic research thrusts:T1 We will investigate the use of formal methods such as weighted tree grammars, weighted logics, and

logics with probabilities to specify quantitative objectives on the syntax, semantics, and probabilisticoutcomes of the synthesized programs. We will use such formalisms to design new algorithms forsynthesizing programs that satisfy quantitative objectives on the syntax and semantics of programs.

T2 We will design new synthesis algorithms that can (i) generate programs that satisfy user-given prob-abilistic objectives, and (ii) run efficiently when operating on certain probabilistic inputs. We will drawinspiration from the theories of computational learning and random graphs to design synthesis algo-rithms with provable probabilistic guarantees.

Intellectual Merits Whereas current synthesis algorithms do not provide explicit support for quantitativerequirements over the synthesized programs, the proposed work will introduce a general frameworkfor expressing synthesis problems with quantitative objectives and new algorithms for solving problemsexpressed in such a framework. The proposed work will lead to fundamental new approaches that useweighted tree grammars to synthesize programs of minimal size; new algorithms based on weighted logicsfor incrementally solving synthesis problems in the presence of noisy data; new algorithms that leveragetechniques from computational learning theory to optimally and efficiently solve synthesis problems withprobabilistic objectives; and new analysis tools based on the theory of random graphs for reasoning aboutthe performance of synthesis algorithms that use version space algebras.Broader Impacts The proposed work will lead to more predictable, accurate, and robust synthesis al-gorithms, which will benefit widely used synthesis applications such as personalized education, net-work synthesis, programming by examples, and automated program repair. All our implementations anddatasets will be released open-source. The PI will develop a new course and a new workshop on programsynthesis and how to create visual art with it (to be held at UW-Madison). The course/workshop materialwill be made publicly available. The PI will also organize a series of lectures about the impact of pro-gram synthesis on the labor force and will lead several diversity and outreach efforts to undergraduates,women, under-represented minorities, and non-traditional students.Keywords Program synthesis; Computational learning theory; Logic; Graph theory; Automata theory.

B-1

CAREER: Program Synthesis with Quantitative GuaranteesLoris D’Antoni (University of Wisconsin–Madison)

1 IntroductionSoftware has been changing our lives for many decades, but how we write programs has not changedmuch: programmers keep repeating similar mistakes, writing similar programs, and fixing similar bugs.Program synthesis, the art of automatically generating programs that meet user intents, promises to in-crease the productivity of programmers and end-users of computing devices by automating tedious, error-prone, and time-consuming tasks. In most program synthesis problems, the user describes an intent usinghigh-level specifications such as input-output examples or logical formulae, and the synthesizer magicallyproduces a program that meets the user’s intent. For a long time, the high theoretical complexities andthe lack of practical applications prevented program synthesis from being used in practice.

Recently, thanks to the advances in decision procedures and to the advent of the “everyone wantsto be a programmer” era, the landscape has started to change and we have seen remarkable results inusing synthesis to solve practical programs such as feedback generation in introductory programmingassignments [85, 38], automating data wrangling and data extraction tasks [77, 79, 93, 53, 14, 51], gen-erating network configurations that meet user intents [89, 47], optimizing low-level code [88, 83], andmore [15, 52]. Despite these practical successes, we still do not have systematic frameworks for synthesiz-ing programs that are “good” according to certain metrics—e.g., produce programs with a reasonable sizeor with good runtime—and to understand when synthesis can result in such good programs.Goals of this proposal This proposal tackles this foundational problem and presents a multi-faceted re-search and education plan with the goals of (i) performing program synthesis in the presence of quantitativeobjectives to improve the quality of the synthesized programs, (ii) designing program synthesis algorithmsthat operate over probabilistic spaces and therefore have probabilistic guarantees, and (iii) disseminating thepower of program synthesis among students through focused workshops, new coursework, and student in-volvement. We propose new algorithmic advances that marry concepts from formal methods, computationallearning theory, and graph theory to enable new synthesis applications in domains such as automatic feed-back generation, program repair, and programming by examples. Figure 1 provides an overview of theproposed research and education activities. As indicated, our formal framework will influence our algo-rithmic design and our education activities. A primary interface between all activities will be a synthesistoolbox centered on quantitative synthesis tools built by our research group [4, 58, 3, 25].

1.1 Overview of the Research and Education PlanBefore detailing our proposed activities, we summarize our two research thrusts and our education plan.

We envision a world in which synthesis algorithms produce better programs. To realize this vision,we propose to attack the foundational problem of performing program synthesis in the presence of quantitativeobjectives, a problem that is currently governed by ad hoc approaches.

Example 1.1. Suppose we want to write a program that forwards network packets to two switches s1 ands2, and that satisfies the following requirements: (i) Certain packets must be forwarded to certain switches.(ii) To allow load to be equally distributed, the probability of forwarding traffic to switch s1 should bebetween 0.45 and 0.55 (we assume a probabilistic distribution over the input packets). (iii) Across allprograms that satisfy requirements (i) and (ii), prefer the smaller ones because they occupy less space ona switch. Existing synthesis tools do not support these heterogeneous objectives and our goal is to createsynthesis frameworks, algorithms, and tools that can handle all such requirements.

Specifically, we plan our investigation along two synergistic research thrusts:Research Thrust 1: A Quantitative Framework for Syntax-Guided Synthesis In the first thrust, we shallinvestigate formal ways to impose quantitative objectives on synthesized programs.• Weighted tree grammars for quantitative syntactic specifications In many applications, it can be useful to

add quantitative objectives on the syntax of the synthesized programs—e.g., prefer smaller programsas imposed by requirement (iii) in Example 1.1, or programs with fewer constants, which are morelikely to generalize to inputs outside of the specification [84]. For example, in our PLDI’17 paper onprogram inversion [58], we observed that for the same functional specification one synthesizer would

CAREER: Synthesis with Quantitative Guarantees C-1

Weighted tree grammars for quantitative syntactic specifications (Sec. 3.1)

Weighted logics for quantitative semantic specification (Sec. 3.2)

Probabilistic logics for probabilistic specifications (Sec. 3.3)

Research Thrust 1Quantitative framework for Syntax-Guided Synthesis

Synthesis for probabilistic programs with finitely many outputs (Sec. 4.1)

Synthesis for probabilistic programs with infinitely many outputs (Sec. 4.2)

Probabilistic foundations for programming by example (Sec. 4.3)

Research Thrust 2Synthesis with Probabilistic Objectives and Guarantees

Quantitative synthesis toolbox(Digits, Genic, ...)

Curriculum restructuring and modernization (Sec. 6.1)

Synthesis workshops for undergraduates (Sec. 6.2)

Outreach and undergraduate research (Sec. 6.3)

Education plan

Fig. 1: Overview of research thrusts and education plan. A primary interface between research findings and education plan will bea comprehensive synthesis toolbox built around our preliminary work [4, 58, 3, 25].

produce programs with thousands of terms while another one would produce more desirable programswith tens of terms. To express such requirements, we propose to investigate a formal framework forspecifying quantitative objectives over the syntax of synthesized programs based on the formalisms ofweighted tree grammars [44, 45, 65]. This model can assign weights to programs in the search space andit unifies existing techniques for assigning syntactic costs to programs [18]. Moreover, the properties ofweighted tree grammars enable new ways to improve existing synthesis engines—e.g., given a constantc > 0, one can restrict a grammar to only produce programs of cost smaller than c and this techniquecan be directly used to adapt existing synthesizers to incorporate quantitative syntactic objectives.• Weighted logics for quantitative semantic specifications Some applications require quantitative objectives

on the behavior of the program. For example, in the context of programming by examples, when noprogram in the search space is consistent with all the given examples (perhaps due to noise in thedata [78]), one might want to synthesize a program that agrees with as many examples as possible.We propose to use weighted logics, such as MaxSAT and MaxSMT [16], as a natural mechanism fordescribing quantitative properties over the program semantics. In the example, the specification wouldbe expressed as a MaxSAT formula that assigns weight 1 whenever the program is consistent with oneof the input-output examples and the synthesis problem is to find a program that maximizes the weightof the formula. Using this formal approach, we can leverage recent advances in divide and conquer forsynthesis [11] to design new synthesizers that can efficiently find optimal solutions.• Probabilistic logics for probabilistic specifications Recently, there has been interest in programs that operate

over probabilistic input distributions [5, 4, 72]. In this setting, it is useful to synthesize programs thatsatisfy some probabilistic constraints—e.g., requirement (ii) in Example 1.1—or maximize/minimizesome variable expectation—e.g., minimize the probability of failure in a network [89]. We propose to useprobabilistic variants of logics as a natural mechanism to specify such problems. While even verificationis hard in this setting, we show that there is a new opportunity to create algorithms that provideapproximate guarantees using concepts such as Probably Approximately Correct (PAC) learning [60].

Research Thrust 2: Synthesis with Probabilistic Objectives and Guarantees The last point of Thrust1 dwells with probabilistic specifications. Our second thrust is entirely dedicated to solving problems inthis realm where the goal is to design new synthesis algorithms that can (i) generate programs that satisfyprobabilistic objectives, and (ii) run efficiently when operating on certain probabilistic inputs.

• Synthesis for probabilistic programs with finitely many outputs In many applications of probabilistic pro-gramming, programs produce a finite number of possible outputs—e.g., what router to forward trafficto next based on some finite packet history—and the probabilistic objectives can require to minimizethe probability of a certain bad event—e.g., router congestion. We formally investigate the problem ofsynthesizing programs in this setting. We draw inspiration from the field of computational learningtheory [60] and show how we can reduce the complex problem of synthesizing programs that are cor-rect on potentially infinitely many inputs to the problem of synthesizing programs that are correct ona small finite number of input-output examples. In our preliminary work, we show that under certainassumptions, it is enough to synthesize programs that are consistent with polynomial many examplesto get probabilistic guarantees over the whole input distribution the program operates on [4].• Synthesis for probabilistic programs with infinitely many outputs When probabilistic programs require an

infinite output space—e.g., a program that decides the size of the TCP protocol congestion window—many of the existing tools from computational learning theory stop working, making the problem


challenging. We therefore propose to design new theoretical tools that are tailored for the types ofspecification appearing in synthesis with probabilistic guarantees. For example, we propose to leveragethe predicates appearing in the probabilistic specification of the synthesis problem to extract finitelymany classes of outputs that can be used to reduce the synthesis problem to the one with a finitenumber of outputs.• Probabilistic foundations for programming by example Most synthesis algorithms incur in large theoretical

complexities due to worst-case behavior, but perform well in practice [25]. We propose to re-analyzeexisting algorithms for programming by example under the theory of random graphs and show thatwhen the algorithms operate on inputs drawn from certain probability distributions the algorithms willbehave well with high probability. We then propose to design new algorithms that are aware of theinput distribution and can therefore perform well in practical applications.

Education Plan We will integrate our research and our education plan through the following activities.

• Curriculum development Since his arrival at the UW-Madison’s Computer Sciences department, the PIhas been teaching both graduate and undergraduate programming languages courses. We plan tomodernize programming paradigms education by focusing on program synthesis; specifically, we willcarefully design an undergraduate class that teaches synthesizer development, while retaining founda-tional traditional concepts. One of our primary goals is to inspire students to transfer synthesis ideasto their future careers in industry. (See Sec. 6 for a detailed rationale.)• “Visual art with synthesis” workshops for undergraduates To further publicize program synthesis, we will

establish annual “Visual art with synthesis” workshops for undergraduate students. The primary goalof the workshops will be to expose students to program synthesis technologies and let them use suchtechnologies in new creative ways. As detailed in Sec. 6, the PI will take concrete action in creating aninclusive environment that is welcoming to women and students from under-represented minorities.• Awareness, outreach, and undergraduate research Looking forward to the potential of automated program

synthesis, the PI is concerned about its use to replace tasks in the labor force—e.g., TAs with automaticgraders. To introduce students to this issue, we plan to organize a series of lectures that look at thisproblem from various angles [71]; we will ensure that the lectures are accessible to everyone and benefitthe campus community at large. Since the PI strongly believes that getting hands-on experience inan exciting research environment is crucial to motivate students to pursue graduate studies, we havebudgeted funds for undergraduate student support. In fact, the PI has already advised 5 undergraduatestudents on program synthesis projects during his first 2 years at UW-Madison.

1.2 PI Preparation and Preliminary WorkThe proposal outlines an ambitious research and education program and the PI is confident he will com-plete the proposed tasks during the allotted time frame (see Sec. 7 for a timeline). To better substantiatethe above claim, the PI enjoys breadth and depth of expertise in formal methods, program synthesis, andprogram verification, he has already published preliminary work related to this proposal, and he has agood record of involving undergraduate students in his research.Expertise The PI’s recent works on program synthesis have appeared at premier verification, program-ming languages, and software engineering venues—CAV [38, 4], TACAS [43], POPL [23, 89], PLDI [58],ICSE [82], and FSE [25]—showing evidence of the community’s interest in the PI’s line or research. Inaddition, the PI is an expert in formal methods [6, 27, 7, 26, 34, 29, 36, 30, 50, 32, 33, 31, 39], programverification [74, 9, 67, 28, 5], and applying synthesis to personalized education [8, 35, 37, 56, 90]. Hisdissertation won the Morris and Dorothy Rubinoff award at the University of Pennsylvania.Preliminary work The PI and his graduate students have recently published papers on synthesis forautomatic program inversion [58] and synthesis for probabilistic programs [4]. The first paper inspiredResearch Thrust 1 and motivates the need for a framework for formally reasoning about quantities inprogram synthesis. The second paper inspired Research Thrust 2 and is our first milestone on the problemof synthesizing program with probabilistic objectives. In addition, the PI has performed preliminary workon several of the proposed research tasks, which will enable rapid progress and dissemination of results.Further, as a product of his recent work, the PI’s group has built flexible synthesis tools—e.g., Genic [58],


Genesis [89], and digits [4]—which will be used as a platform for evaluation and experimentation, as wellas enablers of teaching and student involvement.Student involvement The PI is committed to enriching undergraduate and graduate student experiencethrough education, research, and outreach. For example, he has recently organized the ProgrammingLanguages Mentoring Workshop (PLMW) co-located with the Conference on Principles of ProgrammingLanguages POPL’17 [76]. In the process, he raised more than $60,000 to allow 46 graduate and under-graduate students to attend POPL 2017. Of these students, 13 were women and 5 were members of otherunder-represent minorities. The PI has also been an advocate of undergraduate student research—servingas a judge for the undergraduate and graduate student research competition at POPL 2016 [87] and work-ing with 5 undergraduate students in the last two years on developing synthesis tools.

1.3 Broader Impacts of the Proposed Work

Impacts on society Our proposed research will produce reliable synthesis mechanisms that will, amongother things, enable computer users to perform error-prone and time-consuming tasks in an automatedfashion. Thus, our work can democratize computing and make simple forms of programming moreaccessible in the era of everyone wants to be a programmer. All our tools will be made open-source.Impacts on programmers Our algorithmic investigations will deepen our understanding of programsynthesis and lead to tools that scale to larger domains, therefore increasing programmer productivity andleading to software that is correct by design. As these tools get incorporated in modern software-designenvironments, their broader impact will continue to manifest. Moreover, as students who are involvedin our outreach activities enter the workforce, we expect that they will spread the acquired knowledge tonew and exciting domains, thus affecting software development as well as society at large.Impacts on education We lay out a comprehensive education and outreach plan in Sec. 6. In our priorwork, we used program synthesis to build education tools that have helped thousands of students andmany teachers around the world, particularly in developing countries with limited access to higher edu-cation [35, 37]. We also intend to incorporate our technologies in tools for education. At the teaching level,we will amplify our impact on education through modernized curriculum, inclusive synthesis-themedworkshops, and undergraduate involvement in research. Finally, we will educate students about the po-tential of synthesis to replace jobs and the corresponding countermeasures through a series of lectures.

2 Background on Program Synthesis with Quantitative ObjectivesIn this section, we present an overview of synthesis techniques that handle quantitative objectives.Program synthesis The goal of program synthesis is to find a program in some search space that meets agiven specification, which can be given as a set of examples, a logical formula, or in other ways. Recently,many problems in this paradigm have been unified into a framework called syntax-guided synthesis(SyGuS). A SyGuS problem is specified by a context-free grammar, which describes the search space ofthe synthesizer, and a logical formula, which describes the specification. Many synthesizers now supportSyGuS inputs and can be used for any problem that can be encoded in SyGuS, making this frameworkof great practical value. Existing synthesizers can be grouped in three categories: (i) enumerative syn-thesizers, which search for a program using smart brute-force algorithms [11]; (ii) symbolic synthesizers,which solve the search problem using constraint solving [86]; (iii) stochastic synthesizers, which explorethe search space using stochastic methods such as Montecarlo Markov Chain search [83].Adding quantitative objectives The Boolean specification mechanism provided by SyGuS is powerful andcan capture the functional requirements of many synthesis problems. However, problems that also havenon-functional quantitative requirements are currently behind the ability of SyGuS—e.g., one cannot aska SyGuS solver to return the smallest program that satisfies a specification. Several synthesis approacheshave attempted to include some form of quantitative specification to express this kind of preferences.

One domain where quantitative specifications arise is automatic grading of programming assignments,where the problem is to synthesize a modification to a program written by a student to make it pass a setof test cases given by an instructor [85, 38]. Here, the quantitative objective is to find the program that issyntactically closest to the one written by the student. Existing techniques solve this problem by simplyaugment the synthesizer with some counter that keeps track of the difference between the synthesized


program and the student’s one. The synthesizer then uses a linear search to find a program of smallestcost [85]. While the problem of repairing a program can be encoded in SyGuS, it is currently not possibleto specify the requirement of producing the smallest repair. A more formal attempt at incorporatingsyntactic costs in program synthesis has been proposed by Bornholt et al. [18]. In this approach, theSyGuS search space is structured into an ordered set of individual search spaces of increasing size, whichallows modelling syntactic costs over, for example, the domain of natural numbers.

In general, existing approaches are domain specific, somewhat cumbersome, and cannot handle quan-tities pertaining to semantic properties of the synthesized program. In particular, current mechanismscannot synthesize programs that agree with as many examples as possible or programs that satisfy probabilis-tic objectives. The goal of this proposal is to design a unified framework for specifying synthesis problemswith quantitative objectives and to provide new techniques that leverage such a framework to efficientlysolve new synthesis problems. To achieve our goals, in Research Thrust 1 (Sec. 3), we propose a SyGuSframework with new formal specification mechanisms that can express complex quantitative objectivesand enable new synthesis algorithms. In Research Thrust 2 (Sec. 4), we propose to address the challengeof synthesizing programs in the presence of probabilistic objectives.

3 Thrust 1: A Quantitative Framework for Syntax-Guided SynthesisIn this thrust, we propose to address three core formal problems with the goals of defining a unifyingframework for describing synthesis problems with quantitative objectives and designing new algorithms for solvingsynthesis problems expressed in this framework.Background Existing program synthesis tools can efficiently synthesize programs that adhere to a givenfunctional specification—e.g., the program performs correctly on a given set of input/output examples.When multiple programs adhere to the specification these tools do not provide guarantees about whichof these programs is returned by the synthesizer. Solvers based on program enumeration typically outputthe syntactically smallest program [2], symbolic solvers output whatever program is produced duringthe constraint solving [86], and probabilistic synthesizers are unpredictable [83]. Certain solvers employcomplex hard-coded ranking functions that guide the search towards a “preferable” program [77], butthese functions are hard to write and are decoupled from the functional specification. In summary, existingspecification mechanisms only deal with functional requirements and do not provide explicit ways to rankprograms to express which one should be preferred if multiple solutions are possible. The lack of a formaltreatment of quantitative requirements stands in the way of designing synthesizers that are aware ofquantitative objectives and can use such objectives to perform smarter forms of synthesis. In this thrust,we attack this problem and introduce a unifying framework for synthesis problems with quantitativeobjectives together with new synthesis algorithms that take advantage of our framework.

The ideas presented in this thrust build on the SyGuS framework. A SyGuS problem is specified withrespect to a background theory T—e.g., linear arithmetic—and the goal is to synthesize a function f thatsatisfies two constraints provided by users. The first constraint describes a functional semantic property thatf should satisfy and is given as a predicate ψ( f ) def

= ∀x.φ( f , x). The second constraint limits the search spaceof f and is given as a set S of expressions specified by a context-free grammar defining a subset of all theterms in T. A solution to the SyGuS problem is an expression e in S such that the formula ψ(e) is valid.

Example 3.1. The following SyGuS problem asks to synthesize a function that computes the max of twonumbers. The syntactic constraint is given by the following context-free grammar.

Start ::= Start + Start | if(BExpr) then Start else Start | x | y | 0 | 1 BExpr ::= BExpr > BExpr | ¬BExpr | BExpr∧ BExpr

The semantic constraint is given by the following formula.

ψ( f ) def= ∀x, y. f (x, y) > x ∧ f (x, y) > y ∧ ( f (x, y) = x ∨ f (x, y) = y)

Two possible solutions are the following semantically equivalent, but syntactically different programs.max(x, y) = if(x > y) then x else y max′(x, y) = if(x > y) then x else (if(y > x) then y else x)

3.1 Weighted Tree Grammars for Quantitative Syntactic SpecificationsThe most common quantitative requirements are those that somehow restrict the size of the synthesizedprogram. For example, in synthesis from input/output examples, it is desirable to produce smaller pro-grams that are more likely to generalize to examples outside of the specification [54] and, in automated


feedback generation for programming assignments, the “repaired” program should be syntactically simi-lar to the incorrect one the student wrote so that the student can understand what mistake was made [38].SyGuS does not provide ways to specify requirements on the size of the program and existing synthesizerscannot directly support these kinds of objectives. Therefore we propose to tackle the following question.

Research question 1 Can we design a SyGuS framework that supports quantitativespecifications on the syntax of the synthesized programs?

This question poses a number of fundamental technical challenges: (i) What is a good formalismfor describing quantitative syntactic requirements? (ii) How can we solve the synthesis problem in thepresence of such requirements? (ii) Can we reuse existing SyGuS solvers to solve such problems?Action plan Since SyGuS defines the synthesis search space using context-free grammars, we propose toextend this formalism with weights to assign costs to programs in the grammar. Before starting, we decideto focus our attention on a restricted class of context-free grammars called regular-tree grammars because,to our knowledge, all the benchmarks appearing in the SyGuS competition [10] and in practical applica-tions of SyGuS operate over regular tree grammars. Moreover, it was recently shown that many synthesisproblems that are undecidable for SyGuS become decidable when using regular tree grammars [20]. A reg-ular tree grammar is a context-free grammar in which each terminal symbol is a term with an associatedarity—e.g., x + y is a term +(x, y) where the + symbol has arity 2—i.e., it has two children.

To add quantities to the picture, we adopt weighted tree grammars [45], a well-studied formalism withmany desirable properties. A weighted tree grammar (WTG) is a tree grammar in which each produc-tion is assigned a weight from a semiring (S,⊕,⊗, 0, 1). Common examples are the tropical semiring(N, min,+, ∞, 0) and the probabilistic semiring ([0..1],+, ∗, 0, 1). The cost of a derivation is the ⊗-productof all the productions involved in the derivation and the cost of a term is the ⊕-sum of the costs of all itspossible derivations. Given a weighted tree grammar G and a term f , we use cost(G, f ) to denote the costof f in G. A weighted tree grammar can, for example, compute the height of a term or the number ofoccurrences of some element in a term—e.g., the number of constants or the number of if statements. ASyGuS problem with quantitative syntactic objectives is given by a functional constraint ψ( f ), a weighted treegrammar G, and a constraint on the weight of the program—e.g., minimize(cost(G, f )) or cost(G, f ) < 10.

Example 3.2. Consider again the grammar in Example 3.1. The following weighted grammar over the(N, min,+, ∞, 0) semiring assigns to each function a cost corresponding to the number of if statements inthe function. We write RHS/k to assign cost k to a production RHS and omit the cost for productionswith cost 0.

Start ::= Start + Start| if(BExpr) then Start else Start/1| x | y | 0 | 1

BExpr ::= BExpr > BExpr| ¬BExpr| BExpr∧ BExpr

The functions max and max′ from Example 3.1 have costs 1 and 2, respectively, and max is a function thatminimizes the syntactic cost cost(G, f ). In this case, there doesn’t exist a function of cost 0.

Aside from being a natural formalism, WTGs enjoy properties that make them a good choice for ourmodel. First, WTGs are equi-expressive to weighted tree automata and have logic characterizations [24, 44,45]. Second, WTGs enjoy many closure and decidability properties—e.g., given two WTGs G1 and G2, wecan compute the grammars G1 ⊕ G2 and G1 ⊗ G2 such that, for every f , cost(G1 ⊕ G2, f ) = cost(G1, f )⊕cost(G2, f ) and cost(G1 ⊗ G2, f ) = cost(G1, f )⊗ cost(G2, f ). WTGs are also closed under tree substitutionand Kleene star. These properties open many research opportunities, some of which we discuss here.Search using closure properties Consider a WTG G that operates over the (N, min,+, ∞, 0) semiring.

Given a constant k, one can effectively construct a regular-tree grammar G(k) that only produces termsin G with weight smaller or equal than k [45]. Using this property, we can use existing SyGuS solversto find the program of smallest cost. First, we can use a solver to generate any solution to the SyGuSinstance and ignore the weight. Second, we can restrict the grammar to only produce programs withsmaller cost than the synthesized one. We can then repeat until the smallest program is found.

Combining objectives Using properties of semirings, we can combine multiple WTGs into a single gram-mar over a product semiring and generate more complex objectives—e.g., minimize one quantity andthen the other, or find a Pareto optimal program. This property can be used to define quantitativemetrics in a modular fashion and perhaps solve the synthesis problems modularly.


Minimization WTGs and weighted tree automata can be minimized [64]. This property can be used tosimplify the grammars and speed up symbolic solvers, which encode the structure of the grammar as aconstraint. Notice that this is not the case for context-free grammars, the current model of SyGuS.

Search as learning WTGs over the probabilistic semiring, also called probabilistic tree grammars (PTGs),assign probabilities to programs in the grammar. PTGs can be learned from data [19, 66] and we canuse this property to design probabilistic synthesizers for domains for which we have many examplesof good and bad programs (according to some definition). In particular, we can use the examples togenerate a PTG and use it as the search space to our SyGuS program. We can then, for example, designan enumeration based solver that instead of searching for programs in a breadth-first manner, exploresthe search space preferring programs with higher probabilities first. Using the closure under ⊕, ⊗,and intersection we can also define variants of the search problem in which we restrict the grammar tocertain templates where the solution is likely to reside, a common technique in program synthesis [86].

This foundational framework is general, enjoys desirable properties, and it opens opportunities for devel-oping more formal and well-reasoned solvers. Our formulation also opens questions about the decidabilityand complexity of SyGuS with quantitative objectives. Moreover, the formalism could be extended to morecomplex models such as attributed tree grammars [13] and capture more complex objectives.Preliminary work The PI and his student, Qinheping Hu, have compiled more than 200 benchmarksthat can be encoded in this framework from their PLDI’17 paper [58], that same paper that providedthe inspiration for the proposed framework. In the paper, SyGuS was used to invert complex bit-vectorfunctions and we observed that some solvers were producing very large outputs—e.g., CVC4 [81] oftenreturned functions with thousands of terms when a program with 7 terms existed. Qinheping has adesigned a format for enriching SyGuS with weights and has encoded the problems from our paper inthis format, giving us an initial suite of benchmarks of different natures: in some we minimize the sizeof the functions and in others we minimize the number of constant terms. In our preliminary results, wemanually restricted the SyGuS grammars to produce terms of size at most k and we showed that someof the existing SyGuS solvers [2] can find optimal solutions in the same amount of time taken withoutquantitative objectives. Finally, the PI and his student curate two automata libraries [3, 1] and alreadyhave many of the starting components needed to implement tree grammar operations. We also plan tobuild on Tiburon [66], a tool that supports training algorithms for probabilistic WTG.Related work FlashMeta [51, 77] uses ranking functions to prefer small programs. The functions arehard-coded in the synthesizer and specialized for certain problems. Synapse [18] supports syntactic costfunctions that are defined separately from the SyGuS grammar using a decidable theory. Synthesis is doneusing an iterative search where sketches of increasing sizes are given to the synthesizer. The examplesprovided in [18] can be naturally encoded in the framework we presented using appropriate choices ofsemirings. Similarly, the synthesis technique discussed in [18] can be naturally adapted to our frameworkand falls in the special case in which the semiring is over a positive countable domain. In summary, ourframework works in richer domains—e.g., probabilities—and it couples the syntactic cost directly withthe search space, therefore enabling properties like minimization and closure under semiring operations.

3.2 Weighted Logics for Quantitative Semantic SpecificationsIn some synthesis domains, it can be useful to impose quantitative objectives over semantic properties ofthe program. For example, when synthesizing spreadsheet transformations [42], the data available to thesynthesizer may be noisy and some of the examples might contain typos or have an incorrect format. Inthis case, there might not be a program in the search space that is consistent with all the examples andone might want to instead find a program that is consistent with most of the them, a very common trade-off in machine learning. Similar issues arise in quantitative program repair [38]. Existing specificationframeworks do not provide ways to specify such a requirement and existing synthesizers do not providegood support for these kinds of objectives. Therefore we propose to tackle the following question.

Research question 2 Can we design a SyGuS framework that supports quantitativespecifications over semantic properties of the synthesized programs?

This question leads to several challenges, which involve picking a formalism that is adequate to specifycommonly occurring objectives and that enables new efficient forms of synthesis.


Action plan Since SyGuS specifies semantic requirements using logic, we propose to enrich this formalismand assign costs to “subformulas” in the specification using logics with quantitative objectives [46]. In thefollowing, we only focus on the variants of such logics that are pertinent to our proposal.

One of the simplest quantitative extension of first order logic is the MaxSMT problem, which is de-fined as follows: given a set of first order formulas Ψ0, . . . , Ψn over some theory and associated weightsw1, . . . wn, find a subset M ⊆ {1, . . . n} such that (i) Ψ0 ∧

∧i∈M Ψi is satisfiable and (ii) the award ∑i∈M wi

is maximized (or minimized). The constraints Ψ1, . . . , Ψn are soft constraints while Ψ0 is a hard constraintthat must be satisfied by the solution. This quantitative extension preserves the decidability of the under-lying logic and it has many applications in programming languages [16]. When the functional requirementis given as a MaxSMT problem, the goal is to find a program in the SyGuS search space that, when re-placed in the formula, maximizes the sum of the weights. More powerful logics we plan to investigateallow linear objectives over numbers appearing in the formula while retaining decidability [63, 16].

Example 3.3. Consider the grammar from Example 3.1. If we assume programs can use at most one ifstatement—we can set this constraint using WTGs—no program can realize the following specification.

ψ( f ) def= f (2, 3) = 3∧ f (4, 1) = 4∧ f (6, 3) = 6∧ f (5, 5) = 6

If we modify the specification to a weighted one in which each conjunct has weight 1, the program max isa solution that assigns weight 3 to the formula—i.e., the maximum possible weight.

Modeling semantic objectives using weighted logics opens a variety of exciting research opportunities.Parallel incremental solvers for quantitative objectives The solutions to a max MaxSMT specification

are monotonic—i.e., a solution that satisfies a set of formulas C also satisfies all the subsets of C.Using this property, we can re-design divide-and-conquer synthesis techniques [11], which search forprograms that satisfy subformulas in a functional specification, to incrementally (and in parallel) searchfor programs that satisfy subsets of clauses with increasingly large weights until an optima is found.

Combining objectives Problems might combine syntactic and semantic costs—e.g., synthesize the small-est program that satisfies as many examples as possible. We plan to investigate variants of quantitativelogics that could help unify these different types of costs. A particular appealing logic for such a task isthe Semiring-Induced Propositional Logic [61], which assigns weights from a semiring to assignmentsof a formula, and could be used to unify the weights assigned by the formula and by WTG.

Preliminary work We have identified several problems that can be modeled in this framework and havecompiled an initial set of benchmarks from synthesis from noisy data [78] and our previous work onpersonalized education [38] where the goal is to find programs that pass a set of test cases and havesimilar semantic behavior to another program. We are also collecting problems from quantitative networkrepair [89], another domain where the PI is active. We will use these benchmarks to drive our research.Related work Classic synthesis algorithms have been adapted to work over noisy data [78, 42]. Theseapproaches are tailored to this domain and do not handle general quantitative semantic objectives.

3.3 Probabilistic Logics for Probabilistic SpecificationsIn certain applications—e.g., automated decision making [72, 4]—programs operate over inputs drawnfrom probability distributions and the synthesis specification can impose constraints on the expectedoutcomes of the program. Almost no tool support exists for specifying and solving synthesis problemswith probabilistic objectives. Therefore, we propose to tackle the following question.

Research question 3 Can we design a SyGuS framework that supports quantitativespecifications on probabilistic outcomes of the synthesized programs?

Action plan We discussed in Sec. 3.1 how WTAs can associate probabilities to terms in the SyGuS gram-mar. In this section, we propose to also allow the functional specification to use probabilistic constructsand have the following form.

ψ( f ) def= RD(x). φ( f , x)

Here, RD(x) (notation from [75]) denotes that inputs are randomly drawn from a distribution D and φcan contain expressions of the form P(p), which denote the probability of an event p.


1 Procedure digits(P,D, φ, S, n)Input : P s.t. RD(x). φ(P, x) does not holdOutput: P′ ∈ S s.t. RD(x). φ(P′, x) holds or ⊥

2 I ← n samples from D, repairs← ∅3 foreach sets I+, I− that partition I do4 P′ ← synth(I+, I−)5 if P′ 6= ⊥ and RD(x). φ(P′, x) then6 repairs← repairs∪ {P′}7 if repairs 6= ∅ then8 return P′ ∈ repairs with minimal PD(P 6= P′)9 return ⊥

Program P

Sampling fromdistribution Synthesizer

Probabilistic verifier

Input Distribution

Probabilistic Specification

Samples fromDistribution

CandidateRepair P’

Candidateaccepted/rejected

Fig. 2: Distribution-Guided Inductive Synthesis algorithm and its overview.

Example 3.4. Consider the grammar in Example 3.1 and assume again that we can use at most one ifstatement. Consider the following specification, where D(x) = D(y) = U(0, 1) says that both variables xand y are drawn independently from the uniform distribution between 0 and 1:

ψ( f ) def= RD(x, y). 0 6 f (x, y) 6 1∧P( f (x, y) = x) = 0.5∧P( f (x, y) = y) = 0.5

For example, programs that compute the max or the min of x and y are solutions to this problem.

This foundational formalism can also be extended to allow maximization of certain probabilities, open-ing new interesting algorithmic opportunities, some of which we will describe in Sec. 4.Preliminary work We have designed a probabilistic synthesis algorithm [4] for a restricted class of pro-grams expressible in the framework we just presented. We will describe this algorithm as well as newproposed ways to synthesize programs with probabilistic objectives in Sec. 4.

Related work Chistikov et al. [21] have proposed efficient approximate solutions for the problem ψ( f ) def=

RD(x). φ( f , x) where f is a Boolean function, all variables in x are Boolean and all distributions areuniform. Our goal is to tackle the general synthesis problem beyond the Boolean domain.

4 Thrust 2: Synthesis with Probabilistic Objectives and GuaranteesTo follow the last topic of Thrust 1, in this thrust, we investigate the problem of designing algorithms forsynthesizing programs that operate over probabilistic input distributions.

4.1 Synthesis for Probabilistic Programs with Finitely Many OutputsWe start by considering programs that output finitely many possible values—e.g., machine learned clas-sifiers or network switches deciding which switch to forward traffic to. In the following, we write P(x) todenote the output of P on input x and assume that the program inputs are distributed according to someprobability distribution D (recall Sec. 3.3). Without loss of generality, we assume the program returns aBoolean value as this assumption can be easily extended to finitely many values.

Research question 4 Can we solve the SyGuS problem for functions with finitely manyoutputs in the presence of probabilistic objectives?

Preliminary work In our CAV’17 paper [4], we presented a new algorithm called distribution-guidedinductive synthesis (digits), for repairing probabilistic programs, a problem that falls in the frameworkpresented in Section 3.3. The overall flow of digits is illustrated in Figure 2. Suppose we have a programP such that RD(x). φ(P, x) does not hold. The goal of digits is to construct a new program P′ that satisfies

RD(x). φ(P′, x) and is semantically close to P—i.e., PD(P 6= P′) def= P(P(x) 6= P′(x) | x∼D) is small. digits

learns the correct repair from a finite set of samples by tightly integrating the following three phases:Sampling digits begins by sampling a finite set I of program inputs from D—we call I the set of samples.

The set I is used to sidestep having to deal with arbitrary distributions directly in the synthesis process.Synthesis The second step is a synthesis phase, where digits searches for a set of candidate programs{P′1, . . . , P′k} in the search space S where each P′i classifies the set of samples I differently.

Quantitative verification Every generated candidate program P′ is checked for correctness using anautomated probabilistic inference technique and digits outputs the repair semantically closest to P.


Example 4.1. Recall Example 1.1 and the problem of synthesizing a network switch that divides trafficbetween two next-hop switches. Packets arrive following a distribution D and the switch has to decidewhether to forward each packet to switch s1 or switch s2. Assume we have a forwarding function P thatviolates the following property, which states that traffic is split almost equally among s1 and s2.

RDx. P( f (x) = s1) > 0.45∧P( f (x) = s2) > 0.45We want a program P′ that satisfies the property and, since reconfiguring switches incurs in a cost, werequire PD(P′ 6= P) to be as small as possible. digits solves this problem by sampling inputs from thepacket distribution D and synthesizing a program for each possible way to output s1 and s2 on the inputs.

To implement digits, one needs to provide two components: a) the procedure synth that producesprograms consistent with labeled examples and b) a (sound) probabilistic inference algorithm to checkwhether the synthesized program satisfies the target property and to compute PD(P 6= P′). We assumesuch components are given as other tools for such problems are available [5, 49]. Remarkably, nothingprevents the procedure synth from using the techniques for syntactic and semantic quantitative synthesispresented in Sec. 3, tying together all the quantitative objectives supported by our framework.

The digits algorithm is simple, but enjoys interesting convergence properties, which we now describein detail. First, we need to ensure that there are enough programs close to the optimal solution P∗ thatsatisfy ψ( f ); For a program P and α > 0, we define the set Bα(P) = {P′ ∈ R | PD(P 6= P′) 6 α} ofprograms close to P. We say that the pair (P, ψ) is α-robust iff ∀P′ ∈ Bα(P). ψ(P′) holds. Second, werecall the concept of Vapnik–Chervonenkis (VC) dimension from computational learning theory [60, 17],which we use to capture the expressiveness of the search space S. We say that the search space S has VCdimension k if k + 1 is the smallest number for which, for any set of samples I of size k + 1, there existstwo sets I+ and I− that partition I for which no program P′ ∈ S is consistent with I+ and I−.

Theorem 4.1 (Convergence of digits [4]). Assume (P∗, ψ) is α-robust for some α>0 and is k the VCdimension of the search space S. For all bounds 0<ε6α and δ>0, function synth, and n>poly(ε, δ, k), withprobability >1− δ digits enumerates a program P′ with PD(P∗ 6= P′)6ε and RD(x). φ(P′, x).

In words, digits converges to the optimal solution using polynomially many samples and, for searchspaces for which we manually proved upper bounds on the VC dimension—e.g., bounded loop-freeprograms—digits could compute good repairs for programs with <50 lines of code in <10 minutes.Action plan Our preliminary work opens new doors to exciting problems.From repair to synthesis Our convergence result relies on the structure of the repair problem—i.e., digits

converges depending on whether the property ψ is robust with respect to the optimal repair P∗. Our firstgoal is to adapt digits to solve synthesis problems and to identify a different condition for convergence.We conjecture that digits converges whenever the set of programs in S that satisfy ψ has non-zeromeasure, meaning that the probability of finding some program satisfying ψ is greater than 0. Webelieve this condition to be sufficient and necessary. We then plan to extend the result to synthesizingprograms that maximize or minimize certain probability expectations—e.g., minimize the probability ofnetwork failure. A unique challenge is that, unlike for functional synthesis, a probabilistic specificationtypically does not specify what the output of the program should be on a certain input.

Scaling digits digits currently scales to small numbers of samples because it tries all possible outputcombinations for the sampled inputs. We propose to modify digits to only output combinations thatare “close” to those needed to prove the probabilistic property. For example, if the goal is to synthesizea program P such that P(P(x) = true) > 0.8, it will be unlikely that, given n samples from D, a programthat returns false on all such samples (or even on 50% of them) produces a useful solution. We plan toformally derive provable ε/δ-guarantees for this kind of sampling strategies.

Measure VC dimension Our convergence bounds require knowing the VC-dimension of the search spaceand computing this quantity is a Σp

3 -hard problem [70]. Although exact solutions are hard to compute,we plan to investigate other quantities, such as the Rademacher complexity [69], which can be efficientlyestimated and could lead to similar testable convergence results.

Related work In computational learning theory [60], the goal is to provide provable guarantees for ma-chine learning algorithms. In classic machine learning there are no functional requirements imposed on


the learned program and the search space is typically well-behaved, unlike what happens in synthesis.Other techniques for synthesizing programs with probabilistic objectives have no provable guarantees [72].The marriage between synthesis and computational learning theory we propose, can explain why synthe-sis works in this domain and opens new algorithmic opportunities.

4.2 Synthesis for Probabilistic Programs with Infinitely Many OutputsWe next consider programs that output infinitely many values—e.g., a controller that sets the temperatureof an air conditioner or a program that decides the size of the TCP protocol congestion window.

Research question 5 Can we solve the SyGuS problem for functions with infinitely manyoutputs in the presence of probabilistic objectives?

Action plan The techniques we presented in Section 4.1 require the output of the program to be finite-valued so that one can try all the possible different output combinations when performing synthesis. Wepropose new fundamental ideas for using such techniques even for synthesizing programs with infinitelymany possible output values.

Output refinement As we explained earlier, a probabilistic specification might not tell us what the outputof the program should be on a certain input. While in the finite output case one can simply try allpossible outputs, this is not possible in the infinite setting. We propose to use a notion of outputrefinement in which we reduce the problem of synthesizing a program that can output infinitely manyvalues to the problem of synthesizing one that outputs finitely many. Concretely, we propose to proceedin two phases. First, we define a partitioning of the output space into finitely many classes—e.g.,positive and negative numbers. Second, we synthesize a program that symbolically outputs in thoseclasses. If the process fails, we further refine the classes into smaller ones and repeat. This approachrequires to carefully choose the classes based on the property we want our synthesized program tosatisfy. For example, if we are trying to satisfy P(P(x) > 2) > 0.5, it will be natural to, at first, partitionthe program outputs into those greater or equal than 2 and those that are not.

Convergence While the partitioning technique is sound, it is unclear whether and when it will terminateor converge to optimal solutions. To provide convergence guarantees, we plan to study new notions ofVC dimension and Rademacher complexity that are tailored to our symbolic approach. In particular,our output refinement technique induces finite sets of output values and we want to understand howto define and compute symbolic notions of the VC dimension that operate over symbolic outputs.

Preliminary work Our investigation will build on our open source implementation of digits [4].Related work We are not aware of a similar problem in classic computational learning theory. The closestis perhaps the work on neural networks with continuous output values [12]. However, the challenges weface are unique to having a combination of probabilistic and functional specifications.

4.3 Probabilistic Foundations for Programming by ExamplesThe next problem is on a somewhat different space than the two we just described and is a foundationalapproach to trying to understand certain classes of synthesis problems. In particular, we are interestedin explaining why certain synthesis algorithms based on the concept of version space algebras (VSA) [62]work in practice despite their high theoretical complexities.Background VSA synthesis techniques are used to succinctly represent the set of all expressions consistentwith a given set of examples [55]. For certain search spaces (usually called domain specific languages),one can efficiently compute the VSA representation at least in practice [55, 82, 25]. However, it is often thecase that adding more features to the language makes VSA impractical for larger numbers of examples.

Consider a language for computing string transformations that only contains the operator substring(l, r)where l and r are integers—e.g., substring(3, 5) extracts the substring of the input between the third andfifth character, while substring(1,−4) extracts the substring between the first and fourth to last character.Given a set of examples one can easily find all the substring expressions consistent with the examples; thereare in fact at most four. The problem becomes more complicated if we add an if-then-else operator to thelanguage that allows to check whether a string starts with a certain substring. Consider the input/outputexamples abc→bc, abce→bce, and ccc→ccc. In this case, a single substring expression cannot describe all


the examples, but the following program can: if(startsWith(a)) then substring(1,−1) else substring(0,−1).The attentive reader will notice that this new operator is very powerful and can generate programs con-sistent with almost any set of examples. In fact, the if-then-else construct is a very common operator insynthesis for string transformations and in VSA-based synthesizers [51, 82, 25]. Given the ubiquity of thisoperators, we propose to formally study algorithms that operate over languages containing it.

We assume that the search language has the following form E ::= if(B) then E else E | L, where B issome Boolean expression language and L is a DSL that cannot generate terms in E. For such a language,the VSA is built by synthesizing all programs in L that are consistent with every subset of the examples.Programs for different subsets are then glued together using the if-then-else construct. This approachrequires exploring exponentially many subsets, making VSA approaches theoretically impractical. Re-markably, for many practical applications and real sets of examples [25, 82], the exponential behavior doesnot manifest because, for most subsets of the examples, there is no program in L consistent with them. Weclaim that this phenomenon can be mathematically explained by understanding over what probabilisticdistributions of examples the VSA operates on. We therefore ask the following question.

Research question 6 For what input distributions do VSA algorithms perform well?

Action plan Given a set of examples I we propose to build an undirected graph G = (V, E) with nodesV = I and edges between any two examples i and i′ for which there exists a program in L (the languagewithout if-then-else) consistent with both i and i′. We can then characterize the number of subsets ofI considered by the VSA algorithm using this graph. For any edge (i, i′) the VSA clearly generates anon-empty set of programs consistent with the set {i, i′}. Similarly, for any two nodes i and i′ that arenot connected by an edge, there won’t be any set of examples I′ containing both i and i′ in the VSA.This means that the largest set of examples in the VSA is smaller than the largest clique in the graph G.Therefore, if cliques are not “too large”, the VSA algorithm will run efficiently in practice.

Probabilistic graph models We propose to use probabilistic graph models—e.g., random graphs—toexplain why VSA algorithms perform well in practice [40]. In particular, if the input distributionof a problem can be modeled using a certain type of probabilistic graph model we can analyze theexpected size of cliques, their distributions, and use them to better understand why on certain inputsthe algorithms perform well. We will show some positive results in our preliminary work.

Distribution-aware VSA If the input distribution is known a priori, we can use our probabilistic modelsto design new synthesis algorithms that are aware of such a distribution. For example, if the input isdistributed according to a power law graph [40], most of the nodes will be part of a potentially verylarge clique, causing existing algorithms to run poorly. However, an algorithm that is aware of theinput distribution will detect which cliques are the most represented ahead of time and it will avoidconsidering all its subsets. Such an algorithm will build a correct VSA with high probability.

Testing input distribution type To designing algorithms that behave well for certain input distributions,we need to devise mechanisms to detect what distribution the input examples resemble most. We willadopt and extend existing graph detection theory techniques [68] to perform this task.

Ours is a new fundamental approach to a challenging problem and no similar algorithmic analyses havebeen developed to explain why synthesis algorithms perform well in practice.Preliminary work We collected around 500 examples from the web implementation of our tool No-FAQ [25], which learns how to fix command line errors by allowing people to provide examples of fixes.NoFAQ uses VSA to synthesise programs in a language that follows the grammar we presented earlierand, based on our preliminary investigations, the collected data behaves similarly to a random graphwhere each edge appears with probability 0.05; in a random graph nodes are given and each edge isadded at random with a certain probability p. In a random graph with n nodes and edges appearing withprobability smaller than 0.5, the largest clique has size O(log n) with high probability. Therefore, even ifwe consider all subsets of each clique, we will consider at most O(n) subsets of examples. Therefore, VSAsynthesis scales linearly in the number of examples, a trend we also observed in our implementation.Related work Our problem is related to rule learning, where the goal is to learn a set of logical rules—e.g., horn clauses—that describe a certain relation [48]. Our setup differs as we consider complex synthesislanguages and probabilistic input distributions, which have not been considered in rule learning.


5 Development, Benchmark Collection, and EvaluationTool development We will implement all of our contributions on top of our existing synthesis tools.First we will develop our quantitative SyGuS framework as an open source library with APIs and parsersthat can be accessed by other tools. We will build on top of our automata libraries [3, 1] to designthe algorithms for SyGuS with syntactic costs (§ 3.1). For the semantic costs (§ 3.2), we will implementon top of our synthesizer for syntactic costs and we will leverage existing SMT solvers [41]. We willimplement the probabilistic version of SyGuS (§ 3.3) as an extension of our framework and integrate itwith our implementation of digits. The final result will be a comprehensive and extensible tool for syntaxguided synthesis with quantitative objectives.Benchmark collection and evaluation As part of our research on synthesis, we have collected a compre-hensive set of benchmarks for various quantitative synthesis problems. We have benchmarks that requiresyntactic constraints on functions used to invert string encoders [58], benchmarks with syntactic and se-mantic constraints in the domain of automatic grading for introductory programming [38, 82], simpleimperative programs with probabilistic objectives [4], and examples used in VSA-style learning [25, 82].We also collected benchmarks from the related work we presented in the proposal [18, 78] and we plan tocollect benchmarks from the SyGuS competitions and modify them to add quantitative specifications [10].

For evaluation, we will be primarily concerned with assessing the effectiveness of the individual algo-rithmic contributions using our benchmark collection. Our key metrics of interest include: (i) synthesisperformance and overhead of quantitative objectives; (ii) expressiveness and succinctness of the proposedspecification framework; (iii) scalability w.r.t. search space size, target program size, and specification size.

For synthesis problems with syntactic and semantic costs, we will compare our new approaches againstthe Synapse system [18] and tools for synthesis from noisy data [78]. Our first objective is to replicate therunning times of these tools while supporting a rich specification framework like ours. Next, we will aimat using the properties of our formalisms to further improve these results. As no tool support exists forsynthesis with probabilistic objectives, we will use the current status of our preliminary work—i.e., the tooldigits—as a baseline for our experiments. On top of our evaluation, we will use the proposed algorithmsto improve QLose [38] and Refazer [82], two of our tools for personalized education that can benefit fromquantitative objectives, and that we will use to assess the effectiveness of our algorithms in practice. Ourevaluation will proceed simultaneously with the research activities outlined in Research Thrusts 1 and 2and both the graduate and undergraduate students we budgeted support for will collaborate in designinga comprehensive evaluation plan.

6 Education Plan6.1 Curriculum Development

Rationale Since beginning his academic career at UW-Madison in 2015, the PI has taught the undergrad-uate compilers course (CS536) and the graduate course on program verification and synthesis (CS703).Since program synthesis is makings its way into industry [51, 52], we would like to also introduce under-graduate students to this emerging topic during their senior year. Learning these advanced concepts willmake senior students better prepared for graduate school and the complex programming projects we willadd to the class will also qualify them to work in industry. Since the undergraduate compilers course isnot a good platform for teaching these topics, the PI has already received approval to reintroduce anotherundergraduate course on programming paradigms (CS538). Traditionally, CS538 introduces students tofunctional and logic programming and we propose to is redesign the course by rephrasing several of thetaught topics as synthesis questions. For example, we can have the students build a small synthesizerusing logic programming. The idea is that students taking the class will learn the fundamental aspects ofprogram synthesis as well as how to use it to automate simple tasks and boost their productivity.Detailed course structure The course will comprise of many small projects that expose the featuresof different programming paradigms. Among others, we will cover functional, logic, and probabilisticprogramming and also introduce students to SAT and SMT solvers [41].• Enumerative synthesis in a lazy functional language In the first part of the course, the students will learn the

basics of a functional programming language. One of the important topics in functional programming


is laziness—i.e., the ability to evaluate a program on a “by need” fashion. To let the students betterappreciate the power of laziness we will have them build a synthesizer that implements, with andwithout laziness, enumerative search for a term that agrees with a set of examples.• Symbolic search using logic programming The second part of the course covers logic and declarative pro-

gramming. For this domain the students will have to complete a project in which they use the declara-tive aspects of logic programming to build a small symbolic synthesizer. For example, the students willbuild a simple tool for generating Boolean circuits that meet a certain functional specification [91].• Advanced paradigms: probabilistic programming In the last part of the course, the students will be exposed

to emerging programming paradigms, including probabilistic programming. By the time we will offerthe course for the first time in year 3, we expect our inference and synthesis tools for probabilistic pro-grams to be mature enough to have the students use them to complete a small project for synthesizinga simple probabilistic network switch like the one presented in Example 1.1.

Deployment plan As discussed in the project management plan (§ 7), we will offer this course for thefirst time in the third year. Over the course of this proposal’s time frame, we will teach the course threetimes, allowing us to evaluate effectiveness of material, receive feedback from students, and incrementallyrefine and improve course content.

6.2 Visual Art with Synthesis

Goals and plan We will organize student workshops where the attendees will use program synthesis tocreate forms of visual art. Our goal is to capitalize on the ability of visual arts to attract a diverse set ofstudents (potentially from different majors) and to expose them to the idea of program synthesis. Specifi-cally, we plan to hold three annual synthesis-themed workshops over the time-frame of this proposal. (Webudgeted funds for organization.) Student projects will be judged for their merit by a panel of facultyfrom departments across campus—e.g., CS and Art. Students will learn how synthesizers can automaterepetitive tasks and use this idea to generate artistic patterns from examples of interactions. We plan tohave prototypes of our synthesizers usable by the time of the first workshop, so that students can rapidlyplay with their ideas over the short course of a workshop. Before the first official workshop, we willrun a small beta workshop to ensure robustness of our synthesis infrastructure. We will also use othersynthesizers such as Prose [77] and Sketch-n-Sketch [57]. Prior to each workshop, the PI and his graduatestudents will give an introductory tutorial about program synthesis, the expectations from participants,and how to use the given tools. We will also seek organizational help from the UW Art Institute [59].Diversity plan The PI is very aware of the inclusiveness and diversity concerns surrounding computerscience events. It has been repeatedly observed in the tech community that workshops like the oneswe propose tend to isolate under-represented minorities in computer science, a phenomenon that haseven gained coverage by mainstream corporations and media [95]. The PI will take concrete measures toensure a welcoming and inclusive environment: We will (i) draw inspiration from other tech conferencesand workshops that have been successful in being more inclusive [22]; (ii) organize all workshops in closecollaboration with WACM [92], Wisconsin’s chapter of ACM Women in Computing, and the WisconsinEmerging Scholars-Computer Sciences [94], a program to recruit a broader cross-section of UW studentsto the field of computer science, particularly women and under-represented minorities; (iii) consult withthe Office for Equity and Diversity at UW-Madison [73], a vital campus resource that provides guidanceon implementing diversity strategies.

6.3 Outreach and Undergraduate Research

Undergraduate student research The PI strongly believes that getting hands-on experience in an excitingresearch environment is crucial to motivate students to pursue graduate studies and we have budgetedfunds for undergraduate students. The PI has advised 5 undergraduate students during his first 2 years atUW-Madison. Students were recruited during the PI’s undergraduate compilers course. His first advisee,Fang Wang, helped designing the experimental evaluation of new algorithm that was published at MFPSXXXIII [39] and is now a graduate student at CMU. The other students have read papers from formalmethods conferences and implemented or extended the techniques presented in the papers. Two of themare writing a paper with their findings. Leveraging the UW-Madison diversity resources mentioned above,the PI will ensure hiring of women and minorities for these positions.


Curriculum restructuring Offering of new undergraduate courseDevelopment of “Visual Art with Synthesis” workshopsPlanned “Visual Art with Synthesis” workshopsOutreach and undergraduate involvement

Weighted grammars for quantitative syntactic specificationsWeighted logics for quantitative semantic specificationProbabilistic logics for probabilistic specifications

RT2: Synthesis with Probabilistic Objectives

RT1: Quantitative framework for Syntax-Guided Synthesis

EP: Curriculum Development, Art with Synthesis, Outreach

Research and Education Thrusts Y1 Y2 Y3 Y4 Y5Planned activities

Probabilistic programs with finitely many outputs Probabilistic programs with infinitely many outputsProbabilistic foundations for programming by example

Timeline in years

Fig. 3: Project timeline

Lectures on the impact of synthesis on the workforce Looking forward at the potential of automatedprogram synthesis, the PI is concerned about its use to replace tasks in the labor force—e.g., TAs withautomatic graders. To make students aware of this issue and of the possible countermeasures, we willorganize a series of lectures that look at this problem from various angles [71], including examiningexisting jobs where machines have replaced humans. The lectures will mostly target CS seniors andgraduate students, who will soon enter the workforce. However, we will ensure that lectures are accessibleto everyone and benefit the campus community at large. Our goal is to ensure that CS students are awareof the possible future landscape and how similar phenomena were handled in the past [80]. For thelectures, we will seek experts from across campus, for example, Dr. Pilar Ossorio, who studies ethics inthe workplace, and with whom the PI is collaborating on another project on algorithmic fairness.

7 Project Management PlanFigure 3 shows our proposed timeline for the five-year time frame of the proposal. We have budgetedsupport for one graduate student who will work with the PI on the research proposed in this document.In addition, we have budgeted annual summer support for an undergraduate researcher, who will bethoroughly involved in the project, both in terms of implementation and research activities.Research Thrust 1 We will begin our research activities with the first research thrust. Specifically, wehave allocated overlapping two-year investigation periods for the first two algorithmic problems outlinedin Thrust 1. The third problem—i.e., the design of the probabilistic fragment of our framework—will startin parallel in the second year. Our goal is to lay the grounds for research Thrust 2 earlier in the timeline.Research Thrust 2 The investigation of our probabilistic algorithms will start in year 2, hand in handwith making progress in the design of the probabilistic specification mechanism from Thrust 1. We willcontinuously pursue evaluations and benchmark collection efforts throughout the project timeline.Education activities We believe that deploying a successful course will require careful planning andtool development; we thus allocate the first two years for course development. In the second year, wewill also begin planning the synthesis-themed visual art workshop, which will be held in years 3-5. Theproposed course and workshops are carefully planned in alternating terms, thus allowing a senior batchof students who have been exposed to synthesis to guide junior students. The proposed lecture series willbe scheduled throughout the project and is not shown in the timeline.Prior NSF support Loris D’Antoni is a PI on ongoing NSF Award CCF-1637516: AitF: Collaborative Re-search: Foundations of Intent-based Networking, 09/01/2016 to 08/31/2018, $339,985. This project aims todevelop techniques to automatically produce network configurations that satisfy a set of high-level poli-cies and objectives. This project will help realizing the vision of intent-based networking by allowingnetwork operators to design data and control planes declaratively and without having to provide device-specific configurations. The developed techniques will also be used to create automatic techniques toprovide feedback to students learning networking concepts. The work has already resulted in one pub-lication [89]. The PI received $20,000 through CCF-1650816 for organizing the Programming LanguageMentoring Workshop at POPL’17. This grant allowed 12 students from under-represented minorities toattend POPL’17 and hear about career opportunities in programming languages. The PI is a co-PI in theproposal SHF: Medium: Formal Methods for Program Fairness, which will start in September 2017.


References[1] AutomataDotNet. https://github.com/AutomataDotNet/.

[2] ESolver. https://github.com/abhishekudupa/sygus-comp14.

[3] symbolicautomata. https://github.com/lorisdanto/symbolicautomata/.

[4] Aws Albarghouthi, Loris D’Antoni, and Samuel Drews. Repairing decision-making programs underuncertainty. In Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Ger-many, July 24-28, 2017, Proceedings, Part I, pages 181–200, 2017. doi: 10.1007/978-3-319-63387-9_9.URL https://doi.org/10.1007/978-3-319-63387-9_9.

[5] Aws Albarghouthi, Loris D’Antoni, Samuel Drews, and Aditya Nori. Fairsquare: Probabilistic ver-ification of program fairness. In Proceedings of the 2017 ACM SIGPLAN International Conference onObject-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2017, 2017.

[6] Rajeev Alur and Loris D’Antoni. Streaming tree transducers. In Artur Czumaj, Kurt Mehlhorn,Andrew Pitts, and Roger Wattenhofer, editors, Automata, Languages, and Programming, volume 7392of Lecture Notes in Computer Science, pages 42–53. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-31584-8. doi: 10.1007/978-3-642-31585-5_8. URL http://dx.doi.org/10.1007/978-3-642-31585-5_8.

[7] Rajeev Alur, Loris D’Antoni, Jyotirmoy V. Deshmukh, Mukund Raghothaman, and Yifei Yuan.Regular functions and cost register automata. In 28th Annual ACM/IEEE Symposium on Logic inComputer Science, LICS 2013, New Orleans, LA, USA, June 25-28, 2013, pages 13–22, 2013. doi:10.1109/LICS.2013.65. URL http://dx.doi.org/10.1109/LICS.2013.65.

[8] Rajeev Alur, Loris D’Antoni, Sumit Gulwani, Dileep Kini, and Mahesh Viswanathan. Automatedgrading of dfa constructions. In Proceedings of the Twenty-Third International Joint Conference on ArtificialIntelligence, IJCAI ’13, pages 1976–1982. AAAI Press, 2013. ISBN 978-1-57735-633-2. URL http://dl.acm.org/citation.cfm?id=2540128.2540412.

[9] Rajeev Alur, Loris D’Antoni, and Mukund Raghothaman. Drex: A declarative language for efficientlyevaluating regular string transformations. SIGPLAN Not., 50(1):125–137, January 2015. ISSN 0362-1340. doi: 10.1145/2775051.2676981. URL http://doi.acm.org/10.1145/2775051.2676981.

[10] Rajeev Alur, Dana Fisman, Rishabh Singh, and Armando Solar-Lezama. Results and analysis ofSyGuS-comp’15. arXiv preprint arXiv:1602.01170, 2016.

[11] Rajeev Alur, Arjun Radhakrishna, and Abhishek Udupa. Scaling enumerative program synthesisvia divide and conquer. In Tools and Algorithms for the Construction and Analysis of Systems - 23rdInternational Conference, TACAS 2017, Held as Part of the European Joint Conferences on Theory and Practiceof Software, ETAPS 2017, Uppsala, Sweden, April 22-29, 2017, Proceedings, Part I, pages 319–336, 2017.doi: 10.1007/978-3-662-54577-5_18. URL https://doi.org/10.1007/978-3-662-54577-5_18.

[12] Martin Anthony and Peter L. Bartlett. Neural Network Learning: Theoretical Foundations. CambridgeUniversity Press, New York, NY, USA, 1st edition, 2009. ISBN 052111862X, 9780521118620.

[13] Kablan Barbar. Attributed tree grammars. Theoretical Computer Science, 119(1):3 – 22, 1993. ISSN0304-3975. doi: http://dx.doi.org/10.1016/0304-3975(93)90337-S. URL http://www.sciencedirect.com/science/article/pii/030439759390337S.

[14] Daniel W. Barowy, Sumit Gulwani, Ted Hart, and Benjamin G. Zorn. Flashrelate: extracting relationaldata from semi-structured spreadsheets using examples. In Proceedings of the 36th ACM SIGPLAN Con-ference on Programming Language Design and Implementation, Portland, OR, USA, June 15-17, 2015, pages218–228, 2015. doi: 10.1145/2737924.2737952. URL http://doi.acm.org/10.1145/2737924.2737952.


https://doi.org/10.1007/978-3-319-63387-9_9

http://dx.doi.org/10.1007/978-3-642-31585-5_8

http://dx.doi.org/10.1109/LICS.2013.65

http://dl.acm.org/citation.cfm?id=2540128.2540412


http://doi.acm.org/10.1145/2775051.2676981

https://doi.org/10.1007/978-3-662-54577-5_18

http://www.sciencedirect.com/science/article/pii/030439759390337S

http://www.sciencedirect.com/science/article/pii/030439759390337S

http://doi.acm.org/10.1145/2737924.2737952

[15] Osbert Bastani, Rahul Sharma, Alex Aiken, and Percy Liang. Synthesizing program input grammars.In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementa-tion, PLDI 2017, Barcelona, Spain, June 18-23, 2017, pages 95–110, 2017. doi: 10.1145/3062341.3062349.URL http://doi.acm.org/10.1145/3062341.3062349.

[16] Nikolaj Bjørner, Anh-Dung Phan, and Lars Fleckenstein. νz - an optimizing SMT solver. In Toolsand Algorithms for the Construction and Analysis of Systems - 21st International Conference, TACAS 2015,Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2015, London,UK, April 11-18, 2015. Proceedings, pages 194–199, 2015. doi: 10.1007/978-3-662-46681-0_14. URLhttps://doi.org/10.1007/978-3-662-46681-0_14.

[17] Anselm Blumer, A. Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Learnability and thevapnik-chervonenkis dimension. J. ACM, 36(4):929–965, October 1989. ISSN 0004-5411. doi: 10.1145/76359.76371. URL http://doi.acm.org/10.1145/76359.76371.

[18] James Bornholt, Emina Torlak, Dan Grossman, and Luis Ceze. Optimizing synthesis with metas-ketches. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Program-ming Languages, POPL ’16, pages 775–788, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-3549-2.doi: 10.1145/2837614.2837666. URL http://doi.acm.org/10.1145/2837614.2837666.

[19] Peter A. N. Bosman and Edwin D. de Jong. Learning Probabilistic Tree Grammars for Genetic Program-ming, pages 192–201. Springer Berlin Heidelberg, Berlin, Heidelberg, 2004. ISBN 978-3-540-30217-9.doi: 10.1007/978-3-540-30217-9_20. URL http://dx.doi.org/10.1007/978-3-540-30217-9_20.

[20] Benjamin Caulfield, Markus N. Rabe, Sanjit A. Seshia, and Stavros Tripakis. What’s decidable aboutsyntax-guided synthesis? CoRR, abs/1510.08393, 2015. URL http://arxiv.org/abs/1510.08393.

[21] Dmitry Chistikov, Rayna Dimitrova, and Rupak Majumdar. Approximate counting in SMT and valueestimation for probabilistic programs. In Tools and Algorithms for the Construction and Analysis ofSystems - 21st International Conference, TACAS 2015, Held as Part of the European Joint Conferences onTheory and Practice of Software, ETAPS 2015, London, UK, April 11-18, 2015. Proceedings, pages 320–334,2015. doi: 10.1007/978-3-662-46681-0_26. URL https://doi.org/10.1007/978-3-662-46681-0_26.

[22] Melissa Clark. Beyond the code: An inclusive tech conference. https://melissajclark.ca/beyond-the-code-2015/, 2015. Accessed: 2017-05-25.

[23] Robert A. Cochran, Loris D’Antoni, Benjamin Livshits, David Molnar, and Margus Veanes. Programboosting: Program synthesis via crowd-sourcing. In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2015, Mumbai, India, January 15-17,2015, pages 677–688, 2015. doi: 10.1145/2676726.2676973. URL http://doi.acm.org/10.1145/2676726.2676973.

[24] H. Comon, M. Dauchet, R. Gilleron, C. Löding, F. Jacquemard, D. Lugiez, S. Tison, and M. Tommasi.Tree automata techniques and applications. Available on: http://www.grappa.univ-lille3.fr/tata,2007. release October, 12th 2007.

[25] L. D’Antoni, R. Singh, and M. Vaughn. NoFAQ: Synthesizing Command Repairs from Examples. InFSE’17, 2017.

[26] Loris D’Antoni and Rajeev Alur. Symbolic visibly pushdown automata. In Armin Biere andRoderick Bloem, editors, Computer Aided Verification, volume 8559 of Lecture Notes in ComputerScience, pages 209–225. Springer International Publishing, 2014. ISBN 978-3-319-08866-2. doi:10.1007/978-3-319-08867-9_14. URL http://dx.doi.org/10.1007/978-3-319-08867-9_14.

[27] Loris D’Antoni and Margus Veanes. Equivalence of extended symbolic finite transducers. In Proceed-ings of the 25th International Conference on Computer Aided Verification, CAV’13, pages 624–639, Berlin,Heidelberg, 2013. Springer-Verlag. ISBN 978-3-642-39798-1. doi: 10.1007/978-3-642-39799-8_41. URLhttp://dx.doi.org/10.1007/978-3-642-39799-8_41.


http://doi.acm.org/10.1145/3062341.3062349

https://doi.org/10.1007/978-3-662-46681-0_14

http://doi.acm.org/10.1145/76359.76371

http://doi.acm.org/10.1145/2837614.2837666

http://dx.doi.org/10.1007/978-3-540-30217-9_20

http://arxiv.org/abs/1510.08393

https://doi.org/10.1007/978-3-662-46681-0_26

https://melissajclark.ca/beyond-the-code-2015/

https://melissajclark.ca/beyond-the-code-2015/

http://doi.acm.org/10.1145/2676726.2676973

http://doi.acm.org/10.1145/2676726.2676973

http://www.grappa.univ-lille3.fr/tata

http://dx.doi.org/10.1007/978-3-319-08867-9_14

http://dx.doi.org/10.1007/978-3-642-39799-8_41

[28] Loris D’Antoni and Margus Veanes. Static analysis of string encoders and decoders. In R. Giacobazzi,J. Berdine, and I. Mastroeni, editors, VMCAI 2013, volume 7737 of LNCS, pages 209–228. Springer,2013.

[29] Loris D’Antoni and Margus Veanes. Extended symbolic finite automata and transducers. FormalMethods in System Design, pages 1–27, 2015. ISSN 0925-9856. doi: 10.1007/s10703-015-0233-4. URLhttp://dx.doi.org/10.1007/s10703-015-0233-4.

[30] Loris D’Antoni and Margus Veanes. Minimization of symbolic tree automata. In Proceedings of the31st Annual ACM/IEEE Symposium on Logic in Computer Science, LICS ’16, pages 873–882, New York,NY, USA, 2016. ACM. ISBN 978-1-4503-4391-6. doi: 10.1145/2933575.2933578. URL http://doi.acm.org/10.1145/2933575.2933578.

[31] Loris D’Antoni and Margus Veanes. The power of symbolic automata and transducers. In ComputerAided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Pro-ceedings, Part I, pages 47–67, 2017. doi: 10.1007/978-3-319-63387-9_3. URL https://doi.org/10.1007/978-3-319-63387-9_3.

[32] Loris D’Antoni and Margus Veanes. Monadic second-order logic on finite sequences. In Proceedingsof the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, Paris, France,January 18-20, 2017, pages 232–245, 2017. URL http://dl.acm.org/citation.cfm?id=3009844.

[33] Loris D’Antoni and Margus Veanes. Forward bisimulations for nondeterministic symbolic finiteautomata. In Tools and Algorithms for the Construction and Analysis of Systems - 23rd InternationalConference, TACAS 2017, Held as Part of the European Joint Conferences on Theory and Practice of Soft-ware, ETAPS 2017, Uppsala, Sweden, April 22-29, 2017, Proceedings, Part I, pages 518–534, 2017. doi:10.1007/978-3-662-54577-5_30. URL http://dx.doi.org/10.1007/978-3-662-54577-5_30.

[34] Loris D’Antoni, Margus Veanes, Benjamin Livshits, and David Molnar. Fast: A transducer-basedlanguage for tree manipulation. In Proceedings of the 35th ACM SIGPLAN Conference on ProgrammingLanguage Design and Implementation, PLDI ’14, pages 384–394, New York, NY, USA, 2014. ACM. ISBN978-1-4503-2784-8. doi: 10.1145/2594291.2594309. URL http://doi.acm.org/10.1145/2594291.2594309.

[35] Loris D’Antoni, Dileep Kini, Rajeev Alur, Sumit Gulwani, Mahesh Viswanathan, and Björn Hartmann.How can automatic feedback help students construct automata? ACM Trans. Comput.-Hum. Interact.,22(2):9:1–9:24, 2015. doi: 10.1145/2723163. URL http://doi.acm.org/10.1145/2723163.

[36] Loris D’Antoni, Margus Veanes, Benjamin Livshits, and David Molnar. Fast: A transducer-basedlanguage for tree manipulation. ACM Trans. Program. Lang. Syst., 38(1):1:1–1:32, 2015. doi: 10.1145/2791292. URL http://doi.acm.org/10.1145/2791292.

[37] Loris D’Antoni, Matthew Weavery, Alexander Weinert, and Rajeev Alur. Automata tutor and whatwe learned from building an online teaching tool. Bulletin of the EATCS, 117, 2015. URL http://eatcs.org/beatcs/index.php/beatcs/article/view/365.

[38] Loris D’Antoni, Roopsha Samanta, and Rishabh Singh. Qlose: Program repair with quantitativeobjectives. In Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada,July 17-23, 2016, Proceedings, Part II, pages 383–401, 2016. doi: 10.1007/978-3-319-41540-6_21. URLhttp://dx.doi.org/10.1007/978-3-319-41540-6_21.

[39] Loris D’Antoni, Zachary Kincaid, and Fang Wang. A symbolic decision procedure for symbolicalternating finite automata. In MFPS’17, 2017.

[40] Easley David and Kleinberg Jon. Networks, Crowds, and Markets: Reasoning About a Highly ConnectedWorld. Cambridge University Press, New York, NY, USA, 2010. ISBN 0521195330, 9780521195331.

[41] Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient smt solver. In Proceedings of the Theoryand Practice of Software, 14th International Conference on Tools and Algorithms for the Construction andAnalysis of Systems, TACAS’08/ETAPS’08, pages 337–340, Berlin, Heidelberg, 2008. Springer-Verlag.ISBN 3-540-78799-2, 978-3-540-78799-0. URL http://dl.acm.org/citation.cfm?id=1792734.1792766.


http://dx.doi.org/10.1007/s10703-015-0233-4

http://doi.acm.org/10.1145/2933575.2933578

http://doi.acm.org/10.1145/2933575.2933578

https://doi.org/10.1007/978-3-319-63387-9_3

https://doi.org/10.1007/978-3-319-63387-9_3

http://dl.acm.org/citation.cfm?id=3009844

http://dx.doi.org/10.1007/978-3-662-54577-5_30

http://doi.acm.org/10.1145/2594291.2594309

http://doi.acm.org/10.1145/2723163

http://doi.acm.org/10.1145/2791292

http://eatcs.org/beatcs/index.php/beatcs/article/view/365

http://eatcs.org/beatcs/index.php/beatcs/article/view/365

http://dx.doi.org/10.1007/978-3-319-41540-6_21


[42] Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, andPushmeet Kohli. Robustfill: Neural program learning under noisy I/O. CoRR, abs/1703.07469, 2017.URL http://arxiv.org/abs/1703.07469.

[43] Samuel Drews and Loris D’Antoni. Learning symbolic automata. In Tools and Algorithms for theConstruction and Analysis of Systems - 23rd International Conference, TACAS 2017, Held as Part of theEuropean Joint Conferences on Theory and Practice of Software, ETAPS, 2017, Uppsala, Sweden, April 22-29, 2017, Proceedings, Part I, pages 173–189, 2017. doi: 10.1007/978-3-662-54577-5_10. URL http://dx.doi.org/10.1007/978-3-662-54577-5_10.

[44] Manfred Droste and Heiko Vogler. Weighted tree automata and weighted logics. Theoretical ComputerScience, 366(3):228 – 247, 2006. ISSN 0304-3975. doi: http://dx.doi.org/10.1016/j.tcs.2006.08.025.URL http://www.sciencedirect.com/science/article/pii/S0304397506005676. Automata and FormalLanguages.

[45] Manfred Droste, Werner Kuich, and Heiko Vogler. Handbook of Weighted Automata. Springer Publish-ing Company, Incorporated, 1st edition, 2009. ISBN 3642014917, 9783642014918.

[46] Didier Dubois, Lluis Godo, and Henri Prade. Weighted logics for artificial intelligence - an intro-ductory discussion. International Journal of Approximate Reasoning, 55(9):1819 – 1829, 2014. ISSN 0888-613X. doi: http://dx.doi.org/10.1016/j.ijar.2014.08.002. URL http://www.sciencedirect.com/science/article/pii/S0888613X14001364. Weighted Logics for Artificial Intelligence.

[47] Ahmed El-Hassany, Petar Tsankov, Laurent Vanbever, and Martin Vechev. Network-wide configura-tion synthesis. 2017.

[48] Johannes Fürnkranz, Dragan Gamberger, and Nada Lavrac. Foundations of Rule Learning. SpringerPublishing Company, Incorporated, 2014. ISBN 3642430465, 9783642430466.

[49] Timon Gehr, Sasa Misailovic, and Martin T. Vechev. PSI: exact symbolic inference for probabilisticprograms. In Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada,July 17-23, 2016, Proceedings, Part I, pages 62–83, 2016. doi: 10.1007/978-3-319-41528-4_4. URL https://doi.org/10.1007/978-3-319-41528-4_4.

[50] Vaibhav Gogte, Aasheesh Kolli, Michael J. Cafarella, Loris D’Antoni, and Thomas F. Wenisch. HARE:hardware accelerator for regular expressions. In 49th Annual IEEE/ACM International Symposium onMicroarchitecture, MICRO 2016, Taipei, Taiwan, October 15-19, 2016, pages 1–12, 2016. doi: 10.1109/MICRO.2016.7783747. URL https://doi.org/10.1109/MICRO.2016.7783747.

[51] Sumit Gulwani. Automating string processing in spreadsheets using input-output examples. InProceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages,POPL 2011, Austin, TX, USA, January 26-28, 2011, pages 317–330, 2011. doi: 10.1145/1926385.1926423.URL http://doi.acm.org/10.1145/1926385.1926423.

[52] Sumit Gulwani. Program synthesis. In Software Systems Safety, pages 43–75. 2014. doi: 10.3233/978-1-61499-385-8-43. URL https://doi.org/10.3233/978-1-61499-385-8-43.

[53] Sumit Gulwani. Programming by examples - and its applications in data wrangling. In DependableSoftware Systems Engineering, pages 137–158. 2016. doi: 10.3233/978-1-61499-627-9-137. URL https://doi.org/10.3233/978-1-61499-627-9-137.

[54] Sumit Gulwani. Programming by examples: Applications, algorithms, and ambiguity resolution. InAutomated Reasoning - 8th International Joint Conference, IJCAR 2016, Coimbra, Portugal, June 27 - July2, 2016, Proceedings, pages 9–14, 2016. doi: 10.1007/978-3-319-40229-1_2. URL https://doi.org/10.1007/978-3-319-40229-1_2.

[55] Sumit Gulwani, William R. Harris, and Rishabh Singh. Spreadsheet data manipulation using exam-ples. Commun. ACM, 55(8):97–105, August 2012. ISSN 0001-0782. doi: 10.1145/2240236.2240260. URLhttp://doi.acm.org/10.1145/2240236.2240260.


http://arxiv.org/abs/1703.07469

http://dx.doi.org/10.1007/978-3-662-54577-5_10

http://dx.doi.org/10.1007/978-3-662-54577-5_10

http://www.sciencedirect.com/science/article/pii/S0304397506005676

http://www.sciencedirect.com/science/article/pii/S0888613X14001364

http://www.sciencedirect.com/science/article/pii/S0888613X14001364

https://doi.org/10.1007/978-3-319-41528-4_4

https://doi.org/10.1007/978-3-319-41528-4_4

https://doi.org/10.1109/MICRO.2016.7783747

http://doi.acm.org/10.1145/1926385.1926423

https://doi.org/10.3233/978-1-61499-385-8-43

https://doi.org/10.3233/978-1-61499-627-9-137

https://doi.org/10.3233/978-1-61499-627-9-137

https://doi.org/10.1007/978-3-319-40229-1_2

https://doi.org/10.1007/978-3-319-40229-1_2

http://doi.acm.org/10.1145/2240236.2240260

[56] Andrew Head, Elena Glassman, Gustavo Soares, Ryo Suzuki, Lucas Figueredo, Loris D’Antoni, andBjörn Hartmann. Writing reusable code feedback at scale with mixed-initiative program synthesis.In Proceedings of the Fourth ACM Conference on Learning @ Scale, L@S 2017, Cambridge, MA, USA, April20-21, 2017, pages 89–98, 2017. doi: 10.1145/3051457.3051467. URL http://doi.acm.org/10.1145/3051457.3051467.

[57] Brian Hempel and Ravi Chugh. Semi-automated SVG programming via direct manipulation. InProceedings of the 29th Annual Symposium on User Interface Software and Technology, UIST 2016, Tokyo,Japan, October 16-19, 2016, pages 379–390, 2016. doi: 10.1145/2984511.2984575. URL http://doi.acm.org/10.1145/2984511.2984575.

[58] Qinheping Hu and Loris D’Antoni. Automatic program inversion using symbolic transducers. InProceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation,PLDI 2017, Barcelona, Spain, June 18-23, 2017, pages 376–389, 2017. doi: 10.1145/3062341.3062345. URLhttp://doi.acm.org/10.1145/3062341.3062345.

[59] UW Art Institute. https://artsinstitute.wisc.edu/.

[60] Michael J. Kearns and Umesh V. Vazirani. An Introduction to Computational Learning Theory. MIT Press,Cambridge, MA, USA, 1994. ISBN 0-262-11193-4.

[61] Javier Larrosa, Albert Oliveras, and Enric Rodríguez-Carbonell. Semiring-Induced Propositional Logic:Definition and Basic Algorithms, pages 332–347. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010.doi: 10.1007/978-3-642-17511-4_19. URL http://dx.doi.org/10.1007/978-3-642-17511-4_19.

[62] Tessa Lau, Steven A. Wolfman, Pedro Domingos, and Daniel S. Weld. Programming by demonstrationusing version space algebra. Mach. Learn., 53(1-2):111–156, October 2003. ISSN 0885-6125. doi:10.1023/A:1025671410623. URL http://dx.doi.org/10.1023/A:1025671410623.

[63] Yi Li, Aws Albarghouthi, Zachary Kincaid, Arie Gurfinkel, and Marsha Chechik. Symbolic opti-mization with SMT solvers. In The 41st Annual ACM SIGPLAN-SIGACT Symposium on Principles ofProgramming Languages, POPL ’14, San Diego, CA, USA, January 20-21, 2014, pages 607–618, 2014. doi:10.1145/2535838.2535857. URL http://doi.acm.org/10.1145/2535838.2535857.

[64] Andreas Maletti. Minimizing Weighted Tree Grammars Using Simulation, pages 56–68. Springer BerlinHeidelberg, Berlin, Heidelberg, 2010. ISBN 978-3-642-14684-8. doi: 10.1007/978-3-642-14684-8_7.URL http://dx.doi.org/10.1007/978-3-642-14684-8_7.

[65] Andreas Maletti. Minimizing Weighted Tree Grammars Using Simulation, pages 56–68. Springer BerlinHeidelberg, Berlin, Heidelberg, 2010. ISBN 978-3-642-14684-8. doi: 10.1007/978-3-642-14684-8_7.URL http://dx.doi.org/10.1007/978-3-642-14684-8_7.

[66] Jonathan May and Kevin Knight. Tiburon: A weighted tree automata toolkit. In Proceedings of the 11thInternational Conference on Implementation and Application of Automata, CIAA’06, pages 102–113, Berlin,Heidelberg, 2006. Springer-Verlag. ISBN 3-540-37213-X, 978-3-540-37213-4. doi: 10.1007/11812128_11.URL http://dx.doi.org/10.1007/11812128_11.

[67] David Merrell, Loris D’Antoni, and Margus Veanes. Repairing decision-making programs underuncertainty. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, IJCAI’17,2017.

[68] Benjamin A. Miller, Nadya T. Bliss, Patrick J. Wolfe, and Michelle S. Beard. Detection theory forgraphs.

[69] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. TheMIT Press, 2012. ISBN 026201825X, 9780262018258.


http://doi.acm.org/10.1145/3051457.3051467

http://doi.acm.org/10.1145/3051457.3051467

http://doi.acm.org/10.1145/2984511.2984575

http://doi.acm.org/10.1145/2984511.2984575

http://doi.acm.org/10.1145/3062341.3062345

https://artsinstitute.wisc.edu/

http://dx.doi.org/10.1007/978-3-642-17511-4_19

http://dx.doi.org/10.1023/A:1025671410623

http://doi.acm.org/10.1145/2535838.2535857

http://dx.doi.org/10.1007/978-3-642-14684-8_7

http://dx.doi.org/10.1007/978-3-642-14684-8_7

http://dx.doi.org/10.1007/11812128_11

[70] Elchanan Mossel and Christopher Umans. On the complexity of approximating the vc dimen-sion. Journal of Computer and System Sciences, 65(4):660 – 671, 2002. ISSN 0022-0000. doi: http://dx.doi.org/10.1016/S0022-0000(02)00022-3. URL http://www.sciencedirect.com/science/article/pii/S0022000002000223. Special Issue on Complexity 2001.

[71] Newsweek. How artificial intelligence and robots will radically transform the economy. http://www.newsweek.com/2016/12/09/robot-economy-artificial-intelligence-jobs-happy-ending-526467.html,2016. Accessed: 2017-05-25.

[72] Aditya V. Nori, Sherjil Ozair, Sriram K. Rajamani, and Deepak Vijaykeerthy. Efficient synthesis ofprobabilistic programs. SIGPLAN Not., 50(6):208–217, June 2015. ISSN 0362-1340. doi: 10.1145/2813885.2737982. URL http://doi.acm.org/10.1145/2813885.2737982.

[73] OED. Office for equity and diversity, university of Wisconsin-Madison. https://oed.wisc.edu, 2017.Accessed: 2017-05-25.

[74] Peter Ohmann, Alexander Brooks, Loris D’Antoni, and Ben Liblit. Control-flow recovery from partialfailure reports. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Designand Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017, pages 390–405, 2017. doi: 10.1145/3062341.3062368. URL http://doi.acm.org/10.1145/3062341.3062368.

[75] Christos H Papadimitriou. Games against nature. J. Comput. Syst. Sci., 31(2):288–301, September 1985.ISSN 0022-0000. doi: 10.1016/0022-0000(85)90045-5. URL http://dx.doi.org/10.1016/0022-0000(85)90045-5.

[76] PLMW. Programming languages mentoring workshop at POPL 2017. conf.researchr.org/track/POPL-2017/PLMW-2017, 2017. Accessed: 2017-05-25.

[77] Oleksandr Polozov and Sumit Gulwani. Flashmeta: a framework for inductive program synthesis. InProceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems,Languages, and Applications, OOPSLA 2015, part of SPLASH 2015, Pittsburgh, PA, USA, October 25-30,2015, pages 107–126, 2015. doi: 10.1145/2814270.2814310. URL http://doi.acm.org/10.1145/2814270.2814310.

[78] Veselin Raychev, Pavol Bielik, Martin Vechev, and Andreas Krause. Learning programs from noisydata. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of ProgrammingLanguages, POPL ’16, pages 761–774, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-3549-2. doi:10.1145/2837614.2837671. URL http://doi.acm.org/10.1145/2837614.2837671.

[79] Mohammad Raza and Sumit Gulwani. Automated data extraction using predictive program synthe-sis. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, SanFrancisco, California, USA., pages 882–890, 2017. URL http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/15034.

[80] Harvard Business Review. What happens to society when robots replace workers? https://hbr.org/2014/12/what-happens-to-society-when-robots-replace-workers.

[81] Andrew Reynolds, Morgan Deters, Viktor Kuncak, Cesare Tinelli, and Clark Barrett. Counterexample-Guided Quantifier Instantiation for Synthesis in SMT, pages 198–216. Springer International Publishing,Cham, 2015. ISBN 978-3-319-21668-3. doi: 10.1007/978-3-319-21668-3_12. URL http://dx.doi.org/10.1007/978-3-319-21668-3_12.

[82] Reudismam Rolim, Gustavo Soares, Loris D’Antoni, Oleksandr Polozov, Sumit Gulwani, Rohit Gheyi,Ryo Suzuki, and Björn Hartmann. Learning syntactic program transformations from examples. InProceedings of the 39th International Conference on Software Engineering, ICSE 2017, Buenos Aires, Ar-gentina, May 20-28, 2017, pages 404–415, 2017. URL http://dl.acm.org/citation.cfm?id=3097417.

[83] Eric Schkufza, Rahul Sharma, and Alex Aiken. Stochastic program optimization. Commun. ACM, 59(2):114–122, 2016. doi: 10.1145/2863701. URL http://doi.acm.org/10.1145/2863701.




http://www.newsweek.com/2016/12/09/robot-economy-artificial-intelligence-jobs-happy-ending-526467.html

http://www.newsweek.com/2016/12/09/robot-economy-artificial-intelligence-jobs-happy-ending-526467.html

http://doi.acm.org/10.1145/2813885.2737982

https://oed.wisc.edu

http://doi.acm.org/10.1145/3062341.3062368

http://dx.doi.org/10.1016/0022-0000(85)90045-5

http://dx.doi.org/10.1016/0022-0000(85)90045-5

conf.researchr.org/track/POPL-2017/PLMW-2017

conf.researchr.org/track/POPL-2017/PLMW-2017

http://doi.acm.org/10.1145/2814270.2814310

http://doi.acm.org/10.1145/2814270.2814310

http://doi.acm.org/10.1145/2837614.2837671

http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/15034

http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/15034

https://hbr.org/2014/12/what-happens-to-society-when-robots-replace-workers

https://hbr.org/2014/12/what-happens-to-society-when-robots-replace-workers

http://dx.doi.org/10.1007/978-3-319-21668-3_12

http://dx.doi.org/10.1007/978-3-319-21668-3_12


http://doi.acm.org/10.1145/2863701

[84] Rishabh Singh and Sumit Gulwani. Predicting a correct program in programming by example. InComputer Aided Verification - 27th International Conference, CAV 2015, San Francisco, CA, USA, July 18-24, 2015, Proceedings, Part I, pages 398–414, 2015. doi: 10.1007/978-3-319-21690-4_23. URL https://doi.org/10.1007/978-3-319-21690-4_23.

[85] Rishabh Singh, Sumit Gulwani, and Armando Solar-Lezama. Automated feedback generation forintroductory programming assignments. In Proceedings of PLDI’13, pages 15–26, New York, NY, USA,2013. ACM. ISBN 978-1-4503-2014-6. doi: 10.1145/2462156.2462195. URL http://doi.acm.org/10.1145/2462156.2462195.

[86] Armando Solar-Lezama. Program sketching. International Journal on Software Tools for TechnologyTransfer, 15(5):475–495, Oct 2013. ISSN 1433-2787. doi: 10.1007/s10009-012-0249-7. URL http://dx.doi.org/10.1007/s10009-012-0249-7.

[87] SRC. POPL 2016 student research competition. http://conf.researchr.org/track/POPL-2016/POPL-2016-SRC, 2016. Accessed: 2017-05-25.

[88] Venkatesh Srinivasan and Thomas W. Reps. Synthesis of machine code from semantics. In Proceedingsof the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, Portland, OR,USA, June 15-17, 2015, pages 596–607, 2015. doi: 10.1145/2737924.2737960. URL http://doi.acm.org/10.1145/2737924.2737960.

[89] Kausik Subramanian, Loris D’Antoni, and Aditya Akella. Genesis: synthesizing forwarding ta-bles in multi-tenant networks. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles ofProgramming Languages, POPL 2017, Paris, France, January 18-20, 2017, pages 572–585, 2017. URLhttp://dl.acm.org/citation.cfm?id=3009845.

[90] Ryo Suzuki, Gustavo Soares, Andrew Head, Elena Glassman, Ruan Reis, Melina Mongiovi, LorisD’Antoni, and Bjoern Hartmann. Tracediff: Debugging unexpected codebehavior using synthesizedcode corrections. In VL/HCC 2017, 2017.

[91] Takao Uehara and Nobuaki Kawato. Logic circuit synthesis using prolog. New Generation Computing,1(2):187–193, 1983. ISSN 1882-7055. doi: 10.1007/BF03037425. URL http://dx.doi.org/10.1007/BF03037425.

[92] WACM. WACM is the university of wisconsin-madison’s student chapter of acm-w, acm’s women incomputing. http://research.cs.wisc.edu/wacm/, 2017. Accessed: 2017-05-25.

[93] Xinyu Wang, Sumit Gulwani, and Rishabh Singh. FIDEX: filtering spreadsheet data using exam-ples. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming,Systems, Languages, and Applications, OOPSLA 2016, part of SPLASH 2016, Amsterdam, The Nether-lands, October 30 - November 4, 2016, pages 195–213, 2016. doi: 10.1145/2983990.2984030. URLhttp://doi.acm.org/10.1145/2983990.2984030.

[94] WESCS. Wisconsin emerging scholars-computer sciences. http://www.cs.wisc.edu/wes, 2017. Ac-cessed: 2017-05-25.

[95] IT World. Why hackathons are a boys’ club and how to change that. http://www.itworld.com/article/2700151/networking/why-hackathons-are-a-boys--club-and-how-to-change-that.html, 2014.Accessed: 2017-05-25.


https://doi.org/10.1007/978-3-319-21690-4_23

https://doi.org/10.1007/978-3-319-21690-4_23

http://doi.acm.org/10.1145/2462156.2462195

http://doi.acm.org/10.1145/2462156.2462195

http://dx.doi.org/10.1007/s10009-012-0249-7

http://dx.doi.org/10.1007/s10009-012-0249-7

http://conf.researchr.org/track/POPL-2016/POPL-2016-SRC

http://conf.researchr.org/track/POPL-2016/POPL-2016-SRC

http://doi.acm.org/10.1145/2737924.2737960

http://doi.acm.org/10.1145/2737924.2737960


http://dx.doi.org/10.1007/BF03037425

http://dx.doi.org/10.1007/BF03037425

http://research.cs.wisc.edu/wacm/

http://doi.acm.org/10.1145/2983990.2984030

http://www.cs.wisc.edu/wes

http://www.itworld.com/article/ 2700151/networking/why- hackathons- are- a- boys- - club- and- how- to- change- that.html

http://www.itworld.com/article/ 2700151/networking/why- hackathons- are- a- boys- - club- and- how- to- change- that.html

Documents

Project Summary CAREER: Program Synthesis with ... · Research Thrust 2: Synthesis with Probabilistic Objectives and Guarantees The last point of Thrust 1 dwells with probabilistic