Neurally-Guided Procedural Models: Learning to Guide Procedural Models with Deep Neural Networks

Daniel Ritchie∗   Anna Thomas∗   Pat Hanrahan∗   Noah D. Goodman∗

Stanford University

Guided (N = 10 particles) | Unguided (Equal N) | Unguided (Equal Time)

Figure 1: (Top Row) Used as an importance sampler for Sequential Monte Carlo with N = 10 particles, our neurally-guided procedural models generate shape-matching results for each of the above letters in about a second. (Middle Row) The naïve, unguided procedural model does not converge to recognizable results using the same number of particles (N = 10). (Bottom Row) The unguided model does better, but still does not reliably converge, when given the same amount of computation time as the guided model (≈ 1 sec).

Abstract

We present a deep learning approach for speeding up constrained procedural modeling. Probabilistic inference algorithms such as Sequential Monte Carlo (SMC) provide powerful tools for constraining procedural models, but they require many samples to produce desirable results. In this paper, we show how to create procedural models which learn how to satisfy constraints. We augment procedural models with neural networks: these networks control how the model makes random choices based on what output it has generated thus far. We call such a model a neurally-guided procedural model. As a pre-computation, we train these models on constraint-satisfying example outputs generated via SMC. They are then used as efficient importance samplers for SMC, generating high-quality results with very few samples. We evaluate our method on L-system-like models with image-based constraints. Given a desired quality threshold, neurally-guided models can generate satisfactory results up to 10x faster than unguided models.

CR Categories: I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling—Geometric algorithms, languages, and systems; G.3 [Probability And Statistics]: Probabilistic algorithms (including Monte Carlo); I.2.6 [Artificial Intelligence]: Learning—Connectionism and neural nets

Keywords: Procedural Modeling, Probabilistic Programming, Sequential Monte Carlo, Deep Learning, Neural Networks

1 Introduction

∗e-mail: {dritchie, thomasat, hanrahan, ngoodman}@stanford.edu

Procedural modeling is a powerful technique for creating graphics content. It facilitates efficient content creation at massive scale, such as procedural cities [Müller et al. 2006]. It can generate fine detail that would require painstaking effort to create by hand, such as decorative floral patterns [Wong et al. 1998]. It can even generate surprising or unexpected results, helping users to explore large or unintuitive design spaces [Marks et al. 1997; Ritchie et al. 2015a].

Many applications demand the ability to constrain or control procedural models: making their outputs resemble examples [Stava et al. 2014; Dang et al. 2015], fit a target shape [Prusinkiewicz et al. 1994; Talton et al. 2011; Ritchie et al. 2015b], or respect functional constraints such as physical stability [Ritchie et al. 2015a]. Bayesian probabilistic inference provides a general-purpose framework for imposing such controls: the procedural model specifies a generative prior distribution, and the constraints are encoded as a likelihood function. Samples from the posterior distribution can be drawn via approximate inference algorithms such as Markov Chain Monte Carlo (MCMC) or Sequential Monte Carlo (SMC) [Brooks et al. 2011; Doucet et al. 2001].

Unfortunately, these algorithms are often slow, requiring many samples to produce high-quality results. This limits their usability for interactive applications. Performance can be improved through clever sampler design, but this requires time and expertise and is often problem-specific [Schwarz and Wonka 2014; Zhu et al. 2012].

Sampling from constrained procedural models is challenging because the constraints implicitly define complex (often non-local) dependencies not present in the original procedural model. Can we instead make these dependencies explicit by encoding them in the models' generative logic? Such an explicit model could simply be run forward to generate constraint-satisfying results.

arXiv:1603.06143v1 [cs.GR] 19 Mar 2016


In this paper, we propose a method for automatically learning an approximation to such a perfect explicit model. Our method leverages advances in deep learning: it augments the procedural model with neural networks that control how the model makes random choices, based on what partial output the model has generated thus far. We call such a model a neurally-guided procedural model. The neural networks are expressive enough to capture many implicit dependencies induced by the constraints.

We train neurally-guided procedural models using constraint-satisfying example outputs generated via SMC. Once trained, these models can be used as intelligent SMC importance samplers. Our approach thus enables 'bootstrapping' samplers which train on their own outputs and become more efficient over time. Or, the system can invest time up-front generating and training on many examples, effectively 'pre-compiling' an efficient sampler.

We demonstrate our method through experiments with L-system-like procedural models with image-based soft constraints (Figure 1). For a given constraint satisfaction score threshold, our neurally-guided procedural model can generate results which reliably achieve that threshold using 10-20x fewer particles and up to 10x less compute time than an unguided procedural model.

In summary, our main contributions are:

• A general mathematical framework for defining and training neurally-guided procedural models that make implicit constraints into explicit generative processes.

• A specific implementation for L-system-like models with image-based constraints.

• A performance evaluation of neurally-guided image-constrained models, showing up to 10x speedups.

We give a high-level overview of our approach in Section 3 and then present the mathematical foundations of our method in Section 4. In Section 5, we describe how to implement neurally-guided procedural models with image-matching constraints. Finally, we evaluate the performance of those models in Section 6.

2 Related Work

Probabilistic Inference for Procedural Modeling Many research projects have used Bayesian probabilistic inference to control procedural models: constraining the shape of a 3D object [Talton et al. 2011; Ritchie et al. 2015b], creating functionally-plausible and aesthetically-pleasing furniture arrangements [Merrell et al. 2011; Yeh et al. 2012], coloring in patterns [Lin et al. 2013], and dressing virtual characters [Yu et al. 2012] are a few recent applications. Our work aims to make such systems more efficient: neurally-guided procedural models can capture many of the dependencies introduced by constraint likelihood functions, so samplers need fewer samples to find good results.

In recent work similar in spirit to our own, Dang and colleagues built a system which modifies a procedural grammar so that its output distribution reflects user preference scores given to example outputs [Dang et al. 2015]. Like us, they seek a model whose generative logic captures dependencies induced by a likelihood function (in their case, a Gaussian process regression over user-provided examples). Their method works by splitting non-terminal symbols in the original grammar, giving it more degrees of freedom to capture more dependencies. This approach works well for discrete dependencies, such as ensuring all floors of a building have the same architectural style. In contrast, our method captures dependencies using neural networks, making it better suited for complex, continuous constraint functions, such as shape-fitting.

Guided Procedural Modeling The shape of procedural models can be controlled using purely generative methods. The seminal work on open/environmentally-sensitive L-systems developed a formalism by which L-systems could query their spatial position and orientation [Prusinkiewicz et al. 1994; Mech and Prusinkiewicz 1996]. This ability allows them to prune their growth to an implicit surface. Recent follow-up work extends this technique to larger models by decomposing them into separate guide regions with limited interaction [Benes et al. 2011]. These guide methods were carefully designed for the specific problem of fitting procedural models to shapes. In contrast, our method learns how to guide procedural models and is generally applicable to constraints which can be expressed as a likelihood function.

Neural Networks for Procedural Modeling Previous work has found other ways to apply neural networks to procedural modeling. One recent project uses neural networks as computationally inexpensive proxies for costly scoring functions in an inverse urban procedural modeling setting [Vanegas et al. 2012]. Another uses an autoencoder network to learn a low-dimensional representation space in which it is easy to explore the variability in a procedural model's output [Yumer et al. 2015]. Our use of neural networks differs from both of the above projects, as we use them to capture constraint-induced dependencies via feedforward functions.

Neural Variational Inference Our method is also inspired by recent work in variational inference [Mnih and Gregor 2014; Rezende et al. 2014; Kingma and Welling 2014]. These algorithms use neural networks to define more expressive parametric families of probability distributions. They train stochastic deep belief networks and autoencoders, primarily modeling distributions over images for computer vision applications. Our method uses a different learning objective, and we focus on training procedural models with more complex recursive control flow.

The Neural Adaptive Sequential Monte Carlo algorithm is most similar to our method; it uses a similar learning objective and aims to train more efficient SMC importance samplers [Gu et al. 2015]. However, that work focuses on inference in time series models, such as nonlinear state space models.

3 Approach

In this section, we motivate and outline the process of creating, training, and using neurally-guided procedural models. Throughout this paper, we represent procedural models as probabilistic programs, i.e. programs that make random choices and support conditional inference queries about their distribution of outputs [Goodman and Stuhlmüller 2014].

3.1 Motivation

We motivate our approach using a simple program, chain, that recursively generates a random sequence of linear segments, constrained to match a target image. Figure 2a shows the text of this program, along with samples generated from it (drawn in black) against several target images (drawn in gray). Chains generated by running the program forward do not match the targets, since forward sampling is oblivious to the constraint. Instead, we can generate constrained samples using Sequential Monte Carlo (SMC) [Ritchie et al. 2015b]. SMC generates multiple samples, or particles, in parallel, resampling them at each step of the program to favor constraint-satisfying partial outputs. This results in final chains that more closely match the target images. However, the algorithm requires many particles—and therefore significant

Page 3: Neurally-Guided Procedural Models:earning to Guide ... · found other ways to apply neural networks to procedural model-ing. One recent project uses neural networks as computationally

function chain(pos, ang) {
  var newang = ang + gaussian(0, PI/8);
  var newpos = pos + polarToRect(LENGTH, newang);
  genSegment(pos, newpos);
  if (flip(0.5)) chain(newpos, newang);
}

[Figure 2a panels: Forward Samples; SMC Samples (N = 10)]

function chain_neural(pos, ang) {
  var newang = ang + gaussMixture(nn1(...));
  var newpos = pos + polarToRect(LENGTH, newang);
  genSegment(pos, newpos);
  if (flip(nn2(...))) chain_neural(newpos, newang);
}

[Figure 2b panels: Forward Samples; SMC Samples (N = 10)]

Figure 2: Transforming a simple linear chain model into a neurally-guided procedural model. (a) The original program. When the program's output (shown in black) is constrained to match a target image (shown in gray), forward sampling gives poor results. SMC sampling performs better but requires far more than 10 particles to achieve good results for all targets. (b) The neurally-guided program, where parameters of random choices are computed via neural networks. The neural nets receive the target image and all previous random choices as input (abstracted as "..."; see Figure 3b). Once trained, forward sampling from this program adheres closely to the target image, and SMC with 10 particles consistently produces good results.

computation—to produce acceptable results. Figure 2a shows that N = 10 particles is not sufficient.

In an ideal world, we would not need costly inference algorithms to generate constraint-satisfying results. Instead, we would have access to an 'oracle' program, chain_perfect, that perfectly fills in the target image when run forward. What form might this program take? At each step, it would need access to the target image, to know where to grow the chain next. It would also need to see the output it has already generated, to know when it has filled the target and can stop growing the chain.

Our insight is that while oracle programs such as chain_perfect can be difficult or impossible to write by hand, it is possible to learn a program chain_neural that comes close. Figure 2b shows our approach. For each random choice in the program text (e.g. gaussian, flip), we replace the parameters of that choice with the output of a neural network. This neural network's inputs (abstracted as "...") include the target image as well as the choices the program has made thus far. The network thus shapes the distribution over possible choices, guiding the program's future output based on the target image and its past output. These neural nets affect both continuous choices (e.g. angles) as well as control flow decisions (e.g. recursion): they dictate where the chain goes next, as well as whether it keeps going at all. For continuous choices such as gaussian, we also modify the program to sample from a mixture distribution. This helps the program handle situations where the constraints permit multiple distinct choices (e.g. in which direction to start the chain for the circle-shaped target image in Figure 2).
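As a concrete sketch (ours, not the paper's exact code), a guided continuous choice like the one in chain_neural above might sample from a neural-net-predicted mixture like this, assuming a WebPPL-style discrete(ws) helper that samples an index i with probability ws[i]:

function gaussMixture(params) {
  // `params` would come from a network call such as nn1(targetImage,
  // choicesSoFar) and is assumed to hold arrays {weights, means, sigmas}
  // of length K, with weights summing to one.
  var k = discrete(params.weights);              // pick a mixture component
  return gaussian(params.means[k], params.sigmas[k]); // sample that component
}

Each component can then latch onto one of the distinct continuations the constraint permits, such as the two directions the chain can travel around a circle-shaped target.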

When properly trained, a neurally-guided procedural model such as chain_neural generates constraint-satisfying results more efficiently than its unguided counterpart. Figure 2b shows example outputs from chain_neural. Forward samples adhere closely to the target images, and SMC with 10 particles is sufficient to produce chains that fully fill the target shape. The next sections of the paper describe the process of building and training these neurally-guided procedural models in more detail.

3.2 System Overview

Figure 3 shows a high-level overview of our workflow for defining, training, and using neurally-guided procedural models. It consists of the following steps:

Transform The procedural model is first transformed by inserting one neural network for each random choice in the program text and turning continuous random choices into mixture distributions (Figure 3a-b). The network receives as input the constraint (e.g. a target image) and all previously-made random choices (shown grayed out in Figure 3a-b) and outputs the parameters for the choice (e.g. Gaussian means, variances, and mixture weights). We perform this transformation manually; it could be automated via source-to-source compilation. The neural networks can capture multiple different constraints, but an appropriate architecture for them depends on the generative paradigm and the output domain of the procedural model (e.g. images, 3D models, etc.). In Section 5, we present an architecture for 2D L-system-like procedural models which generate images. In particular, we describe how our implementation converts the previous random choices into a fixed-width vector appropriate for input to a neural net.

Generate Given a constraint, such as a target image, Sequential Monte Carlo generates samples from the constrained procedural model (Figure 3c). Our system uses the version of SMC for probabilistic programs presented by Ritchie et al. [2015b], where particles are resampled after the program generates a new piece of geometry. It also uses the trained models as importance samplers for this SMC algorithm when generating final results.

Train The generated samples are then used to train the neural networks: the desired outcome is a set of network parameters that make the model more likely to generate these samples when run forward. We derive the learning objective in Section 4 and the details of our stochastic gradient learning method in Section 5. The trained neurally-guided model can then quickly generate more samples, which can serve as further training data for refining the model, if desired.


[Figure 3 diagram: (a) a procedural model with random choices x1 = 0.71, x2 = true, ..., xn = 3, where xn+1 = gaussian(µ, σ); Transform yields (b) the neurally-guided model, where xn+1 = gaussMixture(NNn+1); Generate (SMC) maps Constraints to (c) Samples, and Train (SGD) closes the loop.]

Figure 3: Overview of our approach. (a) We start with a procedural model: a program that makes a sequence of random choices x1 . . . xm. (b) The procedural model is transformed into a neurally-guided procedural model by adding a neural network at each random choice. The network predicts the parameters of the random choice as a function of the constraints and the previous random choices (shown grayed-out). (c) An SMC sampling algorithm generates samples from the constrained procedural model. A stochastic gradient learning algorithm then trains the neurally-guided procedural model to maximize the probability of generating these samples.

4 Mathematical Foundations

Having outlined our approach, we now formally define neurally-guided procedural models. For our purposes, a procedural model is a generative probabilistic model of the following form:

$$P_M(\mathbf{x}) = \prod_{i=1}^{|\mathbf{x}|} p_i(x_i;\, \Phi_i(x_1, \ldots, x_{i-1}))$$

Here, x is the vector of random choices the model makes as it executes (the dimensionality of x may be variable, as with recursive procedural models such as stochastic L-systems). The p_i are local probability distributions from which each successive random choice is drawn. Each p_i is parameterized by a set of parameters (e.g. mean and variance, for a Gaussian distribution), which are determined by some function Φ_i of the previous random choices x_1, ..., x_{i−1}. The total probability density is the product of these local probabilities, according to the chain rule.
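As a concrete instance, the chain program in Figure 2a draws an alternating sequence of angle perturbations a_i ∼ N(0, π/8) and continuation flips f_i ∼ Bernoulli(0.5), so a trace x = (a_1, f_1, a_2, f_2, ...), ending at the first false flip, has prior density

$$P_M(\mathbf{x}) = \prod_{i} \mathcal{N}(a_i;\, 0, \pi/8) \cdot \text{Bernoulli}(f_i;\, 0.5)$$

with each Φ_i here being a constant function; guiding the model amounts to letting these parameters vary with context.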

A constrained procedural model is a procedural model whose probability distribution is modulated by some likelihood function ℓ(x, c), i.e. a scoring function indicating how well an output of the model satisfies some constraint c. For example, c could be an image, with ℓ(·, c) measuring similarity to that image. By Bayes' rule:

$$P_{CM}(\mathbf{x}|c) = \frac{1}{Z} \cdot P_M(\mathbf{x}) \cdot \ell(\mathbf{x}, c)$$

where Z is a normalizing constant. The set of all constraints c supported by the procedural model forms the constraint space C (e.g. all images, all binary mask images, etc.).

A neurally-guided procedural model modifies a procedural model by, for each local probability p_i, replacing the parameter function Φ_i with a neural network:

$$P_{GM}(\mathbf{x}|c;\, \theta) = \prod_{i=1}^{|\mathbf{x}|} \tilde{p}_i(x_i;\, \text{NN}_i(x_1, \ldots, x_{i-1}, c;\, \theta))$$

The neural nets receive the previous random choice values and the constraint as input, and are themselves parameterized by θ. p̃_i is a mixture distribution if random choice i is continuous; otherwise, p̃_i = p_i.

In training a neurally-guided procedural model, our goal is to find the parameters θ such that P_GM is as close as possible to P_CM for all supported constraints. Formally, we seek to minimize the conditional KL divergence D_KL(P_CM || P_GM). Given some prior distribution P(c) over constraints c ∈ C, our optimization objective is:

$$\begin{aligned}
&\min_\theta\; D_{KL}(P_{CM} \,\|\, P_{GM}) \\
&= \min_\theta\; \mathbb{E}_{P(c)}\!\left[\, \mathbb{E}_{P_{CM}(\mathbf{x}|c)}\!\left[ \log \frac{P_{CM}(\mathbf{x}|c)}{P_{GM}(\mathbf{x}|c;\theta)} \right] \right] \\
&= \min_\theta\; \mathbb{E}_{P(c)}\!\left[\, \mathbb{E}_{P_{CM}(\mathbf{x}|c)}\!\left[ \log P_{CM}(\mathbf{x}|c) - \log P_{GM}(\mathbf{x}|c;\theta) \right] \right] \\
&= \max_\theta\; \mathbb{E}_{P(c)}\!\left[\, \mathbb{E}_{P_{CM}(\mathbf{x}|c)}\!\left[ \log P_{GM}(\mathbf{x}|c;\theta) - \log P_{CM}(\mathbf{x}|c) \right] \right] \\
&= \max_\theta\; \mathbb{E}_{P(c)}\!\left[\, \mathbb{E}_{P_{CM}(\mathbf{x}|c)}\!\left[ \log P_{GM}(\mathbf{x}|c;\theta) \right] \right] \\
&\approx \max_\theta\; \frac{1}{N} \sum_{s=1}^{N} \log P_{GM}(\mathbf{x}_s|c_s;\theta), \qquad \mathbf{x}_s \sim P_{CM}(\mathbf{x}),\; c_s \sim P(c)
\end{aligned} \tag{1}$$

In the last step, we approximate the expectations with an average over a finite set of samples x_s, c_s drawn from the constrained procedural model P_CM using SMC and the constraint prior P(c). If we view these samples as a training data set, then this optimization objective is simply maximizing the log-likelihood of the training data under the neurally-guided model P_GM.

With such a training set in hand, optimization proceeds via stochastic gradient ascent using the gradient

$$\nabla_\theta \log P_{GM}(\mathbf{x}|c;\theta) = \sum_{i=1}^{|\mathbf{x}|} \nabla_\theta \log \tilde{p}_i(x_i;\, \text{NN}_i(x_1, \ldots, x_{i-1}, c;\, \theta)) \tag{2}$$

It is worth noting that D_KL(P_CM || P_GM) is not the only measure of distance between probability distributions we could have used. In particular, several related works have used the other direction of KL divergence, D_KL(P_GM || P_CM), due to its attractive properties: it requires training samples from P_GM, which are much less expensive to generate than samples from P_CM. It is the optimization objective used in many variational inference algorithms [Wingate and Weber 2012; J. Manning and Blei 2014; Mnih and Gregor 2014] as well as the REINFORCE algorithm for reinforcement learning [Williams 1992]. When used for procedural modeling, however, this objective leads to models whose outputs lack diversity, making them unsuitable for generating visually-varied content. This behavior is due to a well-known property of the objective: minimizing it produces approximating distributions that are overly compact, i.e. concentrating their probability mass in a smaller volume of the state space than the true distribution being approximated [MacKay 2002].
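Written side by side (a restatement of the definitions above, suppressing the constraint prior and arguments for brevity), the two directions differ only in which distribution the expectation is taken under, which is why they demand different kinds of training samples:

$$D_{KL}(P_{CM} \,\|\, P_{GM}) = \mathbb{E}_{P_{CM}}\!\left[ \log \frac{P_{CM}}{P_{GM}} \right] \qquad\text{vs.}\qquad D_{KL}(P_{GM} \,\|\, P_{CM}) = \mathbb{E}_{P_{GM}}\!\left[ \log \frac{P_{GM}}{P_{CM}} \right]$$

The first (ours) heavily penalizes P_GM for assigning low probability anywhere P_CM has mass, so it is mode-covering and preserves output diversity; the second penalizes P_GM for placing mass where P_CM has little, which is the mode-seeking behavior behind the overly-compact approximations noted above.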


[Figure 4 diagram: Local State Features (the na scalar arguments of, e.g., function branch(pos, ang, width) {...}), Target Image Features (50x50 target image, convolve + downsample), and Partial Output Features (50x50 current partial output, convolve + downsample) are concatenated into nf inputs, passed through a fully-connected (FC) layer of size nf/2 with tanh, a second FC layer, and a bounds transform producing the np output parameters.]

Figure 4: Neural network architecture for image-matching procedural models. The network uses a multilayer perceptron which takes a vector of features as input and outputs the parameters for a random choice probability distribution. The input features come from three sources. Local State Features are the arguments to the function in which the random choice occurs. Target Image Features come from 3x3 pixel windows of the target image, extracted at multiple resolutions, around the procedural model's current position. Partial Output Features are analogous windows extracted from the partial image the model has generated. All of these features can be computed from the target image and the sequence of random choices made thus far.

5 Implementation

In this section, we describe an implementation of neurally-guided accumulative procedural models: models that iteratively or recursively add new geometry to a structure. Most growth models, such as L-systems, are accumulative [Prusinkiewicz and Lindenmayer 1990]. This is in contrast with other modeling paradigms: spatial subdivision, such as architectural split grammars [Müller et al. 2006]; object subdivision, such as fractal terrain [Lewis 1987]; or simulation, such as erosion-based terrain [Stava et al. 2008]. For our purposes, a procedural model is accumulative if, while executing, it provides a 'current position': a point p from which geometry generation will continue. We focus on 2D models (p ∈ R²), though the techniques we present extend naturally to 3D.

We first describe the neural network architecture used by the neurally-guided models before giving details on how we train them.

5.1 Network Architecture

Our neural networks take as input the constraint c (in this case, a target image) and all previously-made random choices, and output the parameters of a random choice. Figure 4 shows our network architecture. We use a multilayer perceptron (MLP) architecture, because it is simple, easy to scale, and is a universal function approximator [Rumelhart et al. 1986; Cybenko 1989]. Our MLP takes nf inputs, has one hidden layer of size nf/2 with a tanh nonlinearity, and has np outputs, where np is the number of parameters the random choice expects. Since some parameters are bounded (e.g. Gaussian variance must be positive), each output is remapped via an appropriate bounding transform (e.g. eˣ for non-negative parameters). We experimented with more hidden layers but found that this did not improve performance.
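As a concrete sketch of this component, the forward pass could be written as follows in plain JavaScript (the paper's system uses the adnn library referenced in Section 5.3; the parameter layout and the bounds argument here are our own assumptions):

// Sketch of the guide MLP forward pass: nf inputs, one tanh hidden
// layer of size nf/2, np outputs, each remapped by a bounding transform.
// params = {W1, b1, W2, b2}, with W1 and W2 stored as arrays of rows.
function mlpForward(params, features, bounds) {
  // Hidden layer: h = tanh(W1 * features + b1)
  var hidden = params.W1.map(function(row, j) {
    var s = params.b1[j];
    for (var i = 0; i < row.length; i++) s += row[i] * features[i];
    return Math.tanh(s);
  });
  // Output layer: y = W2 * h + b2, then bound each parameter
  return params.W2.map(function(row, k) {
    var s = params.b2[k];
    for (var j = 0; j < row.length; j++) s += row[j] * hidden[j];
    if (bounds[k] === 'nonneg') return Math.exp(s);           // e.g. variances
    if (bounds[k] === 'unit') return 1 / (1 + Math.exp(-s));  // e.g. flip probs
    return s;                                                 // unbounded params
  });
}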

The inputs for the MLP come from several sources, each providing the network with decision-critical information. All of these features can be computed from the target image and the choices-so-far; for efficiency, we compute them incrementally as the program runs.

Local State Features The first set of relevant data is the local state of the procedural model: its current position p, the current orientation of any local reference frame, its current recursion depth, etc. Our networks access this information via the arguments of the function in which the random choice occurs. We extract all na scalar arguments, normalize them to [−1, 1], and pass them to the MLP.

Target Image Features To make appropriate decisions for matching a target image, the network must have access to that image. The raw pixels provide too much data; we need to summarize the relevant image contents. Convolutional neural networks reduce an image to a fixed-width feature vector but are aimed at classification tasks: they detect features but are intentionally invariant to where those features occur [Krizhevsky et al. 2012].

Instead, we use a different, location-sensitive architecture. We extract a 3x3 window of pixels around the model's current position p. We do this at four different resolution levels, with each level computed by convolving the previous level with a 3x3 kernel and then downsampling via a 2x2 box filter. For an image with channel depth c, this results in 36c features. Together, these features summarize what the target image looks like from the procedural model's current position, where resolution decreases with distance. This architecture is similar to the foveated 'glimpses' used in recent work on neural models of visual attention [Mnih et al. 2014].
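The following plain-JavaScript sketch shows one way these multi-resolution windows could be computed; the row-major Float32Array image layout and the helper names are our assumptions, not the paper's implementation:

// Extract 3x3 windows around (x, y) at 4 resolution levels: 36 features
// for a single-channel image. Levels 1-3 are produced by three learned
// 3x3 convolutions, each followed by a 2x2 box-filter downsample.
function pyramidFeatures(img, w, h, x, y, kernels) {
  var feats = [];
  for (var level = 0; level < 4; level++) {
    for (var dy = -1; dy <= 1; dy++) {      // 3x3 window, zero-padded
      for (var dx = -1; dx <= 1; dx++) {
        var px = x + dx, py = y + dy;
        var inside = px >= 0 && px < w && py >= 0 && py < h;
        feats.push(inside ? img[py * w + px] : 0);
      }
    }
    if (level < 3) {
      img = convolve3x3(img, w, h, kernels[level]);
      img = boxDownsample2x2(img, w, h);
      w = w >> 1; h = h >> 1;
      x = x >> 1; y = y >> 1;               // position in the coarser level
    }
  }
  return feats;
}

function convolve3x3(img, w, h, k) {
  var out = new Float32Array(w * h);
  for (var y = 0; y < h; y++) for (var x = 0; x < w; x++) {
    var s = 0;
    for (var dy = -1; dy <= 1; dy++) for (var dx = -1; dx <= 1; dx++) {
      var px = Math.min(w - 1, Math.max(0, x + dx));  // clamp at borders
      var py = Math.min(h - 1, Math.max(0, y + dy));
      s += k[(dy + 1) * 3 + (dx + 1)] * img[py * w + px];
    }
    out[y * w + x] = s;
  }
  return out;
}

function boxDownsample2x2(img, w, h) {
  var w2 = w >> 1, h2 = h >> 1, out = new Float32Array(w2 * h2);
  for (var y = 0; y < h2; y++) for (var x = 0; x < w2; x++) {
    out[y * w2 + x] = (img[2*y*w + 2*x] + img[2*y*w + 2*x + 1] +
                       img[(2*y+1)*w + 2*x] + img[(2*y+1)*w + 2*x + 1]) / 4;
  }
  return out;
}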

Partial Output Features The target image features provide the network with information it needs to generate matching content with accuracy (e.g. how to stay within a target shape). However, they do not provide the information necessary to achieve completeness (e.g. how to completely fill in a target shape). To give the network this capability, we also extract multi-resolution windows from the partial output image generated by the procedural model thus far. This adds another 36c input features.

The parameters θ of this architecture consist of the weights and biases for both fully-connected layers in the MLP, as well as the kernel weights and biases for the three convolution + downsampling layers on each image. Each network typically has several thousand such parameters. For example, given a program with four local features (position x, position y, angle, width) which targets a one-channel image, the network that predicts the parameters of a four-component Gaussian mixture has 3466 parameters.

[Figure 5: Example images from our datasets — Scribbles, Glyphs, PhyloPic.]

5.2 Training

We train neurally-guided procedural models by stochastic gradient ascent using the gradient in Equation 2. Our system computes this gradient via backpropagation from the log p̃_i terms to the neural network parameters θ. We use the Adam algorithm for stochastic gradient optimization, with α = β = 0.75 and an initial learning rate of 0.01 [Kingma and Ba 2015]. We found that a mini-batch size of one worked best in our experiments: more frequent gradient updates led to faster convergence than less-frequent-but-less-noisy updates. We terminate training after 20000 gradient updates.
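Schematically, the training loop looks like the sketch below (assumptions on our part: θ is stored as a flat array, gradLogGuide is a hypothetical routine that backpropagates through the log p̃_i terms of Equation 2, and the paper's α and β are mapped onto a fixed Adam step size and its two moment-decay rates):

// Stochastic gradient ascent with Adam, mini-batch size 1.
function train(traces, theta, gradLogGuide) {
  var alpha = 0.01, b1 = 0.75, b2 = 0.75, eps = 1e-8;
  var m = theta.map(function() { return 0; });   // 1st-moment estimates
  var v = theta.map(function() { return 0; });   // 2nd-moment estimates
  for (var t = 1; t <= 20000; t++) {             // 20000 gradient updates
    var trace = traces[Math.floor(Math.random() * traces.length)];
    var g = gradLogGuide(trace, theta);          // grad of log P_GM(x|c; theta)
    for (var j = 0; j < theta.length; j++) {
      m[j] = b1 * m[j] + (1 - b1) * g[j];
      v[j] = b2 * v[j] + (1 - b2) * g[j] * g[j];
      var mHat = m[j] / (1 - Math.pow(b1, t));   // bias correction
      var vHat = v[j] / (1 - Math.pow(b2, t));
      theta[j] += alpha * mHat / (Math.sqrt(vHat) + eps);  // ascent step
    }
  }
  return theta;
}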

5.3 Implementation Details

We implemented our prototype system in the Javascript-based probabilistic programming language WebPPL [Goodman and Stuhlmüller 2014], with neural networks implemented using an open-source Javascript library for neural computation.¹ The source code for our system is available at [LINK ANONYMIZED].

¹https://github.com/dritchie/adnn

6 Experiments

In this section, we qualitatively and quantitatively evaluate how well our neurally-guided procedural models capture image-based constraints. We first describe our databases of target images before presenting the details of several experiments. All timing data reported in this section was collected on an Intel Core i7-3840QM machine with 16GB RAM running OSX 10.10.5.

6.1 Image Datasets

As shown in Equation 1, each training sample from a procedural model must be paired with a constraint c drawn from a prior P(c) over possible constraints. During training, we sample target images uniformly at random from a database of training images. In our experiments, we use the following image collections:

• Scribbles: A set of 49 binary mask images drawn by hand with the brush tool in Photoshop. These were designed to cover a range of possible shapes with thick and thin regions, high and low curvature, and different self-intersections.

• Glyphs: A subset of 197 glyphs from the FF Tartine Script Bold typeface, consisting of all glyphs which have only one foreground connected component and at least 500 foreground pixels when rendered at 129x97.

• PhyloPic: A set of 35 images from PhyloPic, a database of silhouettes for plants, animals, and other organisms.²

When using these images for training, we augment the datasets by also including a horizontally-mirrored duplicate of each image. We also annotate each image with a starting point and starting direction from which to initialize the execution of a procedural model. Figure 5 shows some representative images from each collection.

²http://phylopic.org

6.2 Shape Matching

In our first set of experiments, we train neurally-guided procedural models to capture 2D shape matching constraints, where the target shape is specified as a binary mask image. If D is the spatial domain of the image, and I(x) is the function which renders the current partial output defined by random choices x, then the likelihood function for this constraint is

$$\ell_{\text{shape}}(\mathbf{x}, c) = \mathcal{N}\!\left(\frac{\text{sim}(I(\mathbf{x}), c) - \text{sim}(\mathbf{0}, c)}{1 - \text{sim}(\mathbf{0}, c)},\; 1,\; \sigma_{\text{shape}}\right) \tag{3}$$

$$\text{sim}(I_1, I_2) = \frac{\sum_{\mathbf{p} \in D} w(\mathbf{p}) \cdot \mathbf{1}\{I_1(\mathbf{p}) = I_2(\mathbf{p})\}}{\sum_{\mathbf{p} \in D} w(\mathbf{p})}$$

$$w(\mathbf{p}) = \begin{cases} 1 & \text{if } I_2(\mathbf{p}) = 0 \\ 1 & \text{if } \|\nabla I_2(\mathbf{p})\| = 1 \\ w_{\text{filled}} & \text{if } \|\nabla I_2(\mathbf{p})\| = 0 \end{cases}$$

where N is the normal distribution. This function encourages the rendered image to be similar to the target mask, where similarity is normalized against the target's similarity to an empty image 0. Each pixel p's contribution to the similarity is weighted by w(p), determined by whether the target mask is empty, filled, or has an edge at that pixel. We use w_filled = 0.6, so that empty and edge pixels are worth 1.5 times as much as filled pixels. This encourages the program to match perceptually-important contours before filling in flat regions. We set σ_shape = 0.02 in all experiments.
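In plain JavaScript, the similarity term could be computed as below (a sketch: images are w*h row-major arrays, and the edge test ‖∇I₂(p)‖ is approximated by a 4-neighbor difference check, which is our assumption rather than the paper's exact discretization):

// Weighted pixel-wise similarity sim(I1, I2) from Equation 3.
var W_FILLED = 0.6;  // filled pixels count for less than empty/edge pixels

function weight(I2, w, h, x, y) {
  var v = I2[y * w + x];
  if (v === 0) return 1;                       // empty pixel
  // Edge test: a filled pixel with any differing 4-neighbor lies on an edge
  var onEdge = [[1,0],[-1,0],[0,1],[0,-1]].some(function(d) {
    var nx = x + d[0], ny = y + d[1];
    return nx >= 0 && nx < w && ny >= 0 && ny < h && I2[ny * w + nx] !== v;
  });
  return onEdge ? 1 : W_FILLED;                // edge vs. flat filled region
}

function sim(I1, I2, w, h) {
  var num = 0, den = 0;
  for (var y = 0; y < h; y++) for (var x = 0; x < w; x++) {
    var wp = weight(I2, w, h, x, y);
    num += wp * (I1[y * w + x] === I2[y * w + x] ? 1 : 0);
    den += wp;
  }
  return num / den;
}

// Normalized similarity (argument one to the Gaussian in Equation 3):
// (sim(I, c) - sim(empty, c)) / (1 - sim(empty, c))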

We wrote a WebPPL program which recursively generates vines with leaves and flowers and then trained a neurally-guided version of this program to capture the above likelihood. The model was trained on 10000 sample traces, each generated using SMC with 600 particles. Target images were drawn uniformly at random from the Scribbles dataset. Each sample took on average 17 seconds to generate; parallelized across four CPU cores, the entire set of samples took approximately 12 hours to generate (later in this section, we show that far fewer samples are actually needed). Training took 55 minutes in our single-threaded Javascript implementation. These times could be reduced with more efficient implementations (e.g. leveraging GPUs for training).

Figure 1 shows example outputs from the vines program. The weighting scheme of ℓ_shape causes the geometry to adhere to target shape contours, making the shape recognizable without cluttering interior regions and obscuring the vines' structural characteristics. This behavior is not easy to achieve with a purely generative space-filling approach such as environmentally-sensitive L-systems [Prusinkiewicz et al. 1994], but it is simple to specify with constraints.


[Figure 6 plots: (a) Similarity vs. Number of Particles for five conditions (All Features, + Target Image Features, + Local State Features, Constant Params, Unguided); (b) Computation Time (seconds) vs. Similarity for guided and unguided models.]

Figure 6: Performance comparison for the shape matching problem. "Similarity" is median normalized similarity to target mask, averaged over all targets in a test dataset. Lines drawn in lighter shades show 95% confidence bounds. (a) Performance as the number of SMC particles increases. The neurally-guided model achieves higher average similarity as more features are added. (b) Computation time required as desired similarity increases. The vertical gap between the two curves indicates speedup. Despite the neurally-guided model being more expensive to evaluate, it still reliably generates high-similarity results significantly faster than the unguided model.

The top row outputs were generated using 10-particle SMC with the trained neurally-guided model, which reliably produces recognizable results. In contrast, 10-particle SMC with the unguided model produces totally unrecognizable results (middle row). Because the neural networks make the guided model more computationally expensive to evaluate, a more equitable comparison is to give the unguided model the same amount of wall-clock time as the guided model—this corresponds to ∼50 particles, in this case (bottom row). While the resulting outputs fare better, the target shape is still obscured. We should also note that the unguided model is unpredictable at such low particle counts; results of even this quality took many tries to obtain at 50 particles. In practice, we find that the unguided model needs ∼200 particles to reliably match the performance of the guided model. Figure 7 shows more outputs from the vines program, and Figure 8 shows example outputs from a neurally-guided procedural lightning program.

Figure 6 shows a quantitative comparison between five different models on the shape matching task:

• Unguided: The original, unguided procedural model.

• Constant Params: The neural network for each random choice is simply a set of constant parameters; training this model finds the optimal constants. This is also known as a partial mean field approximation [Wingate and Weber 2012].

• + Local State Features: Adding the local state features described in Section 5.

• + Target Image Features: Adding the target image features described in Section 5.

• All Features: The full neural net architecture described in Section 5, including local state features, target image features, and partial output features.

We test each model on the images in the Glyphs dataset and report the median normalized similarity-to-target achieved (i.e. argument one to the Gaussian in Equation 3). Figure 6a plots this average similarity as the number of SMC particles increases. The performance of the neurally-guided models improves with the addition of more features; at 10 particles, the full model is already near the peak performance asymptote.

[Figure 9 plot: Similarity vs. Number of Particles, with and without mixture distributions.]

Figure 9: The effect of guiding continuous random choices with mixture distributions. Using 4-component mixtures for all continuous random choices provides a noticeable boost in performance.

Figure 6b shows the wall-clock time each method requires as the desired average similarity increases. The vertical gap between the two curves shows the speedup given by neural guidance, which can be as high as 10x. Note that we trained on the Scribbles dataset but tested on the Glyphs dataset; these results suggest that our models can generalize to qualitatively-different, previously-unseen images.

Figure 9 shows the benefit of using mixture distributions for continuous random choices in the guided model. The experimental setup is the same as in Figure 6. We compare a model which uses 4-component mixture distributions with a no-mixture model. The with-mixtures model provides a noticeable performance boost, which we alluded to in Section 3: when matching complex shapes with junctions and intersections, such as the crossing of the letter 't', the program benefits from modeling uncertainty at these points with multi-modal distributions. We found 4 mixture components sufficient for our examples.


[Figure 7 image grid; per-row settings (columns: Target, "Ground Truth", Guided, Unguided (Equal N), Unguided (Equal Time)):
N = 600, 38.68 s | N = 5, 0.86 s | N = 5, 0.09 s | N = 30, 0.83 s
N = 600, 30.26 s | N = 10, 1.5 s | N = 10, 0.1 s | N = 45, 1.58 s
N = 600, 33.5 s | N = 10, 1.23 s | N = 10, 0.14 s | N = 40, 1.28 s
N = 600, 25.5 s | N = 10, 1.04 s | N = 10, 0.14 s | N = 40, 1.05 s
N = 600, 25.55 s | N = 15, 1.75 s | N = 15, 0.23 s | N = 50, 1.73 s
N = 600, 20.76 s | N = 10, 0.81 s | N = 10, 0.15 s | N = 40, 0.85 s]

Figure 7: Using Sequential Monte Carlo to make a vine-growth procedural model match target images. N is the number of SMC particles used. The "Ground Truth" column shows an example result after running SMC with the unguided model with a large number of particles (N = 600); these images represent the best possible result for a given target. Our neurally-guided procedural models can generate results of nearly this quality in a couple of seconds; the unguided model struggles given the same number of particles or the same computation time.


[Figure 8 images; compute times per letter: 0.99 s, 0.81 s, 1.01 s, 1.03 s, 0.9 s, 1.16 s, 0.86 s, 1.08 s]

Figure 8: Targeting letter shapes with a neurally-guided procedural lightning program. Generated using SMC with 10 particles; compute time required is shown below each letter. Best viewed on a high-resolution display.

[Figure 10 plot: Similarity at 10 Particles vs. Number of Training Traces (10 to 2000, log scale).]

Figure 10: The effect of training set size on performance (at 10 SMC particles), plotted on a logarithmic scale. Average similarity-to-target increases sharply for the first few hundred sample training traces, then appears to plateau at around 1000 traces. Noise in the plot is due to randomness in neural net training, as different training sessions converge to different local optima of the learning objective function.

We also investigate how the number of training samples affects performance. Figure 10 plots the median similarity at 10 particles as training set size increases. Performance increases rapidly for the first few hundred samples before appearing to level off (the noise in the curve is due to randomness in neural net training initialization). This suggests that ∼1000 sample traces is sufficient, which may seem surprising, as many published deep learning systems require millions of training examples [Krizhevsky et al. 2012]. In our case, each training trace contains up to thousands of random choices, each of which provides a learning signal—in this way, the training data is "bigger" than it appears. Our implementation can generate 1000 samples in just over an hour using four CPU cores. As mentioned previously, this time could be reduced by 'bootstrapping' the system: training on smaller subsets of data and using the partially-learned model to generate further data faster.

6.3 Stylized “Circuit” Design

Thus far, we have focused on image-matching constraints. However, the architecture we have presented can learn other types of image-based constraints. In this section, we constrain the vines program to generate outputs which resemble stylized circuit designs.

Dense packing of long wire traces is one of the most striking visual characteristics of circuit boards. To achieve dense packing, we encourage the program to fill a certain percentage τ of the image (τ = 0.5 in the subsequent results). To mimic the appearance of traces, we encourage the output image to have a dense, high-magnitude gradient field, as the vines program can best achieve this result by creating many long rectilinear or diagonal edges. These constraints result in the following likelihood:

$$\ell_{\text{circ}}(\mathbf{x}) = \mathcal{N}\!\left(\text{edge}(I(\mathbf{x})) \cdot (1 - \eta(\text{fill}(I(\mathbf{x})), \tau)),\; 1,\; \sigma_{\text{circ}}\right) \tag{4}$$

$$\text{edge}(I) = \frac{1}{|D|} \sum_{\mathbf{p} \in D} \|\nabla I(\mathbf{p})\| \qquad \text{fill}(I) = \frac{1}{|D|} \sum_{\mathbf{p} \in D} I(\mathbf{p})$$

where η(x, x̄) is the relative error of x from x̄ and σ_circ = 0.01. Finally, we also include a separate term that penalizes the program for generating geometry outside the bounds of the image; this encourages the program to fill in a rectangular, "die"-like region.
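A sketch of these two statistics in plain JavaScript (gradients discretized by forward differences, which is our assumption; I is a w*h row-major array with values in [0, 1]):

// edge(I): mean gradient magnitude over the image domain.
function edgeScore(I, w, h) {
  var sum = 0;
  for (var y = 0; y < h; y++) for (var x = 0; x < w; x++) {
    var v = I[y * w + x];
    var gx = (x + 1 < w) ? I[y * w + x + 1] - v : 0;   // forward difference
    var gy = (y + 1 < h) ? I[(y + 1) * w + x] - v : 0;
    sum += Math.sqrt(gx * gx + gy * gy);
  }
  return sum / (w * h);
}

// fill(I): mean pixel intensity, i.e. the fraction of the image filled.
function fillScore(I, w, h) {
  var sum = 0;
  for (var i = 0; i < w * h; i++) sum += I[i];
  return sum / (w * h);
}

// Likelihood argument from Equation 4, with relative error |x - tau| / tau:
// edgeScore(I, w, h) * (1 - Math.abs(fillScore(I, w, h) - tau) / tau)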

To guide this program, we use the same architecture as before, minus the target image features (since there is no target image). We train the neurally-guided model using 2000 traces generated using SMC with 600 particles. Sample generation took about 10 hours on four CPU cores, and training took just under two hours. Figure 11 shows some outputs from this program, and Figure 12 shows a performance comparison between unguided and neurally-guided models for this task. As with the shape matching examples, the neurally-guided model generates high-scoring results significantly faster than the unguided model.

7 Discussion and Future Work

This paper introduced neurally-guided procedural models: constrained procedural models that use neural networks to capture constraint-induced dependencies. We developed a mathematical framework for defining and training such models. We also described a specific neural architecture for accumulative models that generate images. Finally, we evaluated the performance of neurally-guided models, demonstrating that they can generate high-quality results significantly faster than unguided models.

7.1 Limitations

One limitation of our system is its need for training data, which must be generated via expensive inference. This can be a significant up-front cost, especially for computationally-expensive models. Thus, our method is not well-suited for scenarios where the procedural model changes rapidly, such as speeding up the inner loop of a development and debugging cycle. Instead, it is best suited for scenarios where the model is fixed, such as deploying a finalized procedural model as part of a design tool. It may be particularly attractive for online, multi-user deployments, where the system can gather example results from the community, periodically retrain, and push the updated procedural model to users.


["Ground Truth" (N = 600) | Guided (N = 15) | Unguided (Equal N) | Unguided (Equal Time)]

Figure 11: Constraining the vine-growth program to generate circuit-like patterns. The "Ground Truth" outputs took around 70 seconds to generate; the outputs from the guided model took around 3.5 seconds.

[Figure 12 plots: (a) Score vs. Number of Particles; (b) Computation Time (seconds) vs. Score, for Neurally-Guided and Unguided conditions.]

Figure 12: Performance comparison for the circuit design problem. "Score" is median normalized score (i.e. argument one to the Gaussian in Equation 4), averaged over 50 runs. The neurally-guided version achieves significantly higher average scores than the unguided version given the same number of particles or the same amount of compute time.

Our method is also not well-suited for capturing hard constraints, which some visual effects necessitate (e.g. symmetries), as it requires a continuous probability for each training sample. While hard constraints can sometimes be usefully approximated with tight soft constraints, neural networks such as ours are best at approximating noisy and/or random functions, not precise, deterministic relationships. Other techniques are needed for generatively capturing these kinds of constraints.

7.2 Future Work

The neural guide architecture we presented applies to image-generating programs. How might we extend it to other output domains? The architecture must compactly represent the model's partial output at any point in the program. For 3D modeling, our architecture extends naturally to 3D, e.g. using voxel grids instead of images. For other output domains, it may be possible to develop architectures that learn a partial output state representation, as in recent work on recurrent sequence generation [Graves 2013].

Our architecture also focuses on accumulative procedural models, but applications of other generative paradigms could also benefit from neural guidance. One example is texture generation, which repeatedly generates content across its entire domain, often in a multiscale fashion. In such a setting, the guide model cannot rely on a "current position" for extracting decision-critical features. It might instead learn what parts of the current partial output are relevant, as recently-developed visual attention models learn where to look in an image to make classification decisions [Mnih et al. 2014].

Our method could also be extended to inference algorithms beyond Sequential Monte Carlo. For example, Markov Chain Monte Carlo works better when the likelihood of a partial output is not predictive of the likelihood of the final output (e.g. generating stable arches, where the structure is unstable until finished). Just as neurally-guided models are efficient importance samplers for SMC, they might also serve as efficient proposal distributions for MCMC.

By adding neural networks which predict random choice parameters, we have only explored one simple program transformation to enable constraint capture. More extensive transformations, such as changing the control flow of the program, may be necessary for capturing especially complex constraints. One step in this direction would be to combine our approach with the grammar-splitting technique of Dang et al. [2015].

Ultimately, we envision a future in which procedural models learn to encode complex constraints using purely generative methods, so that forward sampling alone produces beautiful results. The work we have presented in this paper takes a step toward that goal, but it is only the beginning. We hope that our work inspires other researchers to develop new and better neural network architectures and program transformations to attack this problem.


References

BENES, B., STAVA, O., MECH, R., AND MILLER, G. 2011. Guided Procedural Modeling. In Eurographics 2011.

BROOKS, S., GELMAN, A., JONES, G., AND MENG, X. 2011. Handbook of Markov Chain Monte Carlo. CRC Press.

CYBENKO, G. 1989. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems.

DANG, M., LIENHARD, S., CEYLAN, D., NEUBERT, B., WONKA, P., AND PAULY, M. 2015. Interactive Design of Probability Density Functions for Shape Grammars.

DOUCET, A., DE FREITAS, N., AND GORDON, N., Eds. 2001. Sequential Monte Carlo Methods in Practice. Springer.

GOODMAN, N. D., AND STUHLMÜLLER, A., 2014. The Design and Implementation of Probabilistic Programming Languages. http://dippl.org. Accessed: 2015-12-23.

GRAVES, A. 2013. Generating Sequences With Recurrent Neural Networks. CoRR abs/1308.0850.

GU, S., GHAHRAMANI, Z., AND TURNER, R. E. 2015. Neural Adaptive Sequential Monte Carlo. In NIPS 2015.

J. MANNING, R. RANGANATH, K. N., AND BLEI, D. 2014. Black Box Variational Inference. In AISTATS 2014.

KINGMA, D. P., AND BA, J. 2015. Adam: A Method for Stochastic Optimization. In ICLR 2015.

KINGMA, D. P., AND WELLING, M. 2014. Auto-Encoding Variational Bayes. In ICLR 2014.

KRIZHEVSKY, A., SUTSKEVER, I., AND HINTON, G. E. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In NIPS 2012.

LEWIS, J. P. 1987. Generalized Stochastic Subdivision. ACM Trans. Graph. 6, 3.

LIN, S., RITCHIE, D., FISHER, M., AND HANRAHAN, P. 2013. Probabilistic Color-by-Numbers: Suggesting Pattern Colorizations Using Factor Graphs. In SIGGRAPH 2013.

MACKAY, D. J. C. 2002. Information Theory, Inference & Learning Algorithms. Cambridge University Press.

MARKS, J., ANDALMAN, B., BEARDSLEY, P. A., FREEMAN, W., GIBSON, S., HODGINS, J., KANG, T., MIRTICH, B., PFISTER, H., RUML, W., RYALL, K., SEIMS, J., AND SHIEBER, S. 1997. Design galleries: A general approach to setting parameters for computer graphics and animation. In SIGGRAPH 1997.

MERRELL, P., SCHKUFZA, E., LI, Z., AGRAWALA, M., AND KOLTUN, V. 2011. Interactive Furniture Layout Using Interior Design Guidelines. In SIGGRAPH 2011.

MNIH, A., AND GREGOR, K. 2014. Neural Variational Inference and Learning in Belief Networks. In ICML 2014.

MNIH, V., HEESS, N., GRAVES, A., AND KAVUKCUOGLU, K. 2014. Recurrent Models of Visual Attention. In NIPS 2014.

MÜLLER, P., WONKA, P., HAEGLER, S., ULMER, A., AND VAN GOOL, L. 2006. Procedural Modeling of Buildings. In SIGGRAPH 2006.

MECH, R., AND PRUSINKIEWICZ, P. 1996. Visual Models of Plants Interacting with Their Environment. In SIGGRAPH 1996.

PRUSINKIEWICZ, P., AND LINDENMAYER, A. 1990. The Algorithmic Beauty of Plants. Springer-Verlag New York, Inc.

PRUSINKIEWICZ, P., JAMES, M., AND MECH, R. 1994. Synthetic Topiary. In SIGGRAPH 1994.

REZENDE, D. J., MOHAMED, S., AND WIERSTRA, D. 2014. Stochastic Backpropagation and Approximate Inference in Deep Generative Models. In ICML 2014.

RITCHIE, D., LIN, S., GOODMAN, N. D., AND HANRAHAN, P. 2015. Generating Design Suggestions under Tight Constraints with Gradient-based Probabilistic Programming. In Eurographics 2015.

RITCHIE, D., MILDENHALL, B., GOODMAN, N. D., AND HANRAHAN, P. 2015. Controlling Procedural Modeling Programs with Stochastically-Ordered Sequential Monte Carlo. In SIGGRAPH 2015.

RUMELHART, D. E., HINTON, G. E., AND WILLIAMS, R. J. 1986. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1. MIT Press, ch. Learning Internal Representations by Error Propagation.

SCHWARZ, M., AND WONKA, P. 2014. Procedural Design of Exterior Lighting for Buildings with Complex Constraints. ACM Trans. Graph. 33, 5.

STAVA, O., PIRK, S., KRATT, J., CHEN, B., MECH, R., DEUSSEN, O., AND BENES, B. 2014. Inverse Procedural Modelling of Trees. Computer Graphics Forum 33, 6.

TALTON, J. O., LOU, Y., LESSER, S., DUKE, J., MECH, R., AND KOLTUN, V. 2011. Metropolis Procedural Modeling. ACM Trans. Graph. 30, 2.

VANEGAS, C. A., GARCIA-DORADO, I., ALIAGA, D. G., BENES, B., AND WADDELL, P. 2012. Inverse Design of Urban Procedural Models. In SIGGRAPH Asia 2012.

STAVA, O., BENES, B., BRISBIN, M., AND KRIVANEK, J. 2008. Interactive Terrain Modeling Using Hydraulic Erosion. In SCA 2008.

WILLIAMS, R. J. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8.

WINGATE, D., AND WEBER, T. 2012. Automated Variational Inference in Probabilistic Programming. In NIPS 2012 Workshop on Probabilistic Programming.

WONG, M. T., ZONGKER, D. E., AND SALESIN, D. H. 1998. Computer-generated Floral Ornament. In SIGGRAPH 1998.

YEH, Y.-T., YANG, L., WATSON, M., GOODMAN, N. D., AND HANRAHAN, P. 2012. Synthesizing Open Worlds with Constraints Using Locally Annealed Reversible Jump MCMC. In SIGGRAPH 2012.

YU, L.-F., YEUNG, S.-K., TERZOPOULOS, D., AND CHAN, T. F. 2012. DressUp!: Outfit Synthesis Through Automatic Optimization. In SIGGRAPH Asia 2012.

YUMER, M. E., ASENTE, P., MECH, R., AND KARA, L. B. 2015. Procedural Modeling Using Autoencoder Networks. In UIST 2015.

ZHU, L., XU, W., SNYDER, J., LIU, Y., WANG, G., AND GUO, B. 2012. Motion-guided Mechanical Toy Modeling. In SIGGRAPH Asia 2012.