
Published as a conference paper at ICLR 2021

PRIVATE POST-GAN BOOSTING

Marcel Neunhoeffer
University of Mannheim
[email protected]

Zhiwei Steven Wu
Carnegie Mellon University
[email protected]

Cynthia Dwork
Harvard University
[email protected]

ABSTRACT

Differentially private GANs have proven to be a promising approach for generating realistic synthetic data without compromising the privacy of individuals. Due to the privacy-protective noise introduced in the training, the convergence of GANs becomes even more elusive, which often leads to poor utility in the output generator at the end of training. We propose Private post-GAN boosting (Private PGB), a differentially private method that combines samples produced by the sequence of generators obtained during GAN training to create a high-quality synthetic dataset. To that end, our method leverages the Private Multiplicative Weights method (Hardt and Rothblum, 2010) to reweight generated samples. We evaluate Private PGB on two-dimensional toy data, MNIST images, US Census data and a standard machine learning prediction task. Our experiments show that Private PGB improves upon a standard private GAN approach across a collection of quality measures. We also provide a non-private variant of PGB that improves the data quality of standard GAN training.

1 INTRODUCTION

The vast collection of detailed personal data, including everything from medical history to voting records, to GPS traces, to online behavior, promises to enable researchers from many disciplines to conduct insightful data analyses. However, many of these datasets contain sensitive personal information, and there is a growing tension between data analyses and data privacy. To protect the privacy of individual citizens, many organizations, including Google (Erlingsson et al., 2014), Microsoft (Ding et al., 2017), Apple (Differential Privacy Team, Apple, 2017), and more recently the 2020 US Census (Abowd, 2018), have adopted differential privacy (Dwork et al., 2006) as a mathematically rigorous privacy measure. However, working with noisy statistics released under differential privacy requires training.

A natural and promising approach to tackle this challenge is to release differentially private synthetic data—a privatized version of the dataset that consists of fake data records and that approximates the real dataset on important statistical properties of interest. Since they already satisfy differential privacy, synthetic data enable researchers to interact with the data freely and to perform the same analyses even without expertise in differential privacy. A recent line of work (Beaulieu-Jones et al., 2019; Xie et al., 2018; Yoon et al., 2019) studies how one can generate synthetic data by incorporating differential privacy into generative adversarial networks (GANs) (Goodfellow et al., 2014). Although GANs provide a powerful framework for synthetic data, they are also notoriously hard to train, and the privacy constraint imposes even more difficulty. Due to the added noise in the private gradient updates, it is often difficult to reach convergence with private training.

In this paper, we study how to improve the quality of the synthetic data produced by private GANs. Unlike much of the prior work that focuses on fine-tuning of network architectures and training techniques, we propose Private post-GAN boosting (Private PGB)—a differentially private method that boosts the quality of the generated samples after the training of a GAN. Our method can be viewed as a simple and practical amplification scheme that improves the distribution from any existing black-box GAN training method – private or not.


We take inspiration from an empirical observation in Beaulieu-Jones et al. (2019) that even though the generator distribution at the end of the private training may be a poor approximation to the data distribution (due to e.g. mode collapse), there may exist a high-quality mixture distribution that is given by several generators over different training epochs. PGB is a principled method for finding such a mixture at a moderate privacy cost and without any modification of the GAN training procedure.

To derive PGB, we first formulate a two-player zero-sum game, called the post-GAN zero-sum game, between a synthetic data player, who chooses a distribution over generated samples over training epochs to emulate the real dataset, and a distinguisher player, who tries to distinguish generated samples from real samples with the set of discriminators over training epochs. We show that under a "support coverage" assumption the synthetic data player's mixed strategy (given by a distribution over the generated samples) at an equilibrium can successfully "fool" the distinguisher–that is, no mixture of discriminators can distinguish the real versus fake examples better than random guessing. While the strict assumption does not always hold in practice, we demonstrate empirically that the synthetic data player's equilibrium mixture consistently improves the GAN distribution.

The Private PGB method then privately computes an approximate equilibrium in the game. The algorithm can be viewed as a computationally efficient variant of MWEM (Hardt & Rothblum, 2010; Hardt et al., 2012), which is an inefficient query release algorithm with near-optimal sample complexity. Since MWEM maintains a distribution over exponentially many "experts" (the set of all possible records in the data domain), it runs in time exponential in the dimension of the data. In contrast, we rely on the private GAN to reduce the support to only contain the set of privately generated samples, which makes PGB tractable even for high-dimensional data.

We also provide an extension of the PGB method by incorporating the technique of discriminator rejection sampling (Azadi et al., 2019; Turner et al., 2019). We leverage the fact that the distinguisher's equilibrium strategy, which is a mixture of discriminators, can often accurately predict which samples are unlikely and thus can be used as a rejection sampler. This allows us to further improve the PGB distribution with rejection sampling without any additional privacy cost since differential privacy is preserved under post-processing. Our Private PGB method also has a natural non-private variant, which we show improves the GAN training without privacy constraints.

We empirically evaluate both the Private and Non-Private PGB methods on several tasks. To visualize the effects of our methods, we first evaluate them on a two-dimensional toy dataset with samples drawn from a mixture of 25 Gaussian distributions. We define a relevant quality score function and show that both the Private and Non-Private PGB methods improve the score of the samples generated from a GAN. We then show that the Non-Private PGB method can also be used to improve the quality of images generated by GANs using the MNIST dataset. Finally, we focus on applications with high relevance for privacy protection. First we synthesize US Census datasets and demonstrate that the PGB method can improve the generator distribution on several statistical measures, including 3-way marginal distributions and pMSE. Secondly, we evaluate the PGB methods on a dataset with a natural classification task. We train predictive models on samples from Private PGB and samples from a private GAN (without PGB), and show that PGB consistently improves the model accuracy on real out-of-sample test data.

Related work. Our PGB method can be viewed as a modular boosting method that can improve on a growing line of work on differentially private GANs (Beaulieu-Jones et al., 2019; Xie et al., 2018; Frigerio et al., 2019; Torkzadehmahani et al., 2020). To obtain formal privacy guarantees, these algorithms optimize the discriminators in GAN under differential privacy, by using private SGD, RMSprop, or Adam methods, and track the privacy cost using moments accounting (Abadi et al., 2016; Mironov, 2017). Yoon et al. (2019) give a private GAN training method by adapting ideas from the PATE framework (Papernot et al., 2018).

Our PGB method is inspired by the Private Multiplicative Weights method (Hardt & Rothblum, 2010) and its more practical variant MWEM (Hardt et al., 2012), which answer a large collection of statistical queries by releasing a synthetic dataset. Our work also draws upon two recent techniques (Turner et al. (2019) and Azadi et al. (2019)) that use the discriminator as a rejection sampler to improve the generator distribution. We apply their technique by using the mixture discriminator computed in PGB as the rejection sampler. There has also been work that applies the idea of boosting to (non-private) GANs.


For example, Arora et al. (2017) and Hoang et al. (2018) propose methods that directly train a mixture of generators and discriminators, and Tolstikhin et al. (2017) propose AdaGAN, which reweights the real examples during training similarly to what is done in AdaBoost (Freund & Schapire, 1997). Both of these methods may be hard to make differentially private: they either require substantially more privacy budget to train a collection of discriminators or increase the weights on a subset of examples, which requires adding more noise when computing private gradients. In contrast, our PGB method boosts the generated samples post training and does not make modifications to the GAN training procedure.

2 PRELIMINARIES

Let X denote the data domain of all possible observations in a given context. Let p_d be a distribution over X. We say that two datasets X, X′ ∈ X^n are adjacent, denoted by X ∼ X′, if they differ by at most one observation. We will write p_X to denote the empirical distribution over X.

Definition 1 (Differential Privacy (DP) (Dwork et al., 2006)). A randomized algorithm A : X^n → R with output domain R (e.g. all generative models) is (ε, δ)-differentially private (DP) if for all adjacent datasets X, X′ ∈ X^n and for all S ⊆ R: P(A(X) ∈ S) ≤ e^ε P(A(X′) ∈ S) + δ.

A very nice property of differential privacy is that it is preserved under post-processing.

Lemma 1 (Post-processing). Let M be an (ε, δ)-differentially private algorithm with output range R and f : R → R′ be any mapping; the composition f ◦ M is (ε, δ)-differentially private.

As a result, any subsequent analyses conducted on DP synthetic data also satisfy DP.

The exponential mechanism (McSherry & Talwar, 2007) is a private mechanism for selecting among the best of a discrete set of alternatives R, where "best" is defined by a quality function q : X^n × R → R that measures the quality of the result r for the dataset X. The sensitivity of the quality score q is defined as ∆(q) = max_{r∈R} max_{X∼X′} |q(X, r) − q(X′, r)|. Then given a quality score q and privacy parameter ε, the exponential mechanism M_E(q, ε, X) simply samples a random alternative from the range R such that the probability of selecting each r is proportional to exp(εq(X, r)/(2∆(q))).
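To make the selection rule concrete, the following is a minimal sketch (not from the paper) of the exponential mechanism in Python; the candidate set, quality scores, and sensitivity are placeholders supplied by the caller.

```python
import numpy as np

def exponential_mechanism(candidates, quality_scores, sensitivity, epsilon, rng=None):
    """Sample one candidate with probability proportional to
    exp(epsilon * q / (2 * sensitivity)), as in McSherry & Talwar (2007)."""
    rng = np.random.default_rng() if rng is None else rng
    scores = np.asarray(quality_scores, dtype=float)
    # Subtract the max for numerical stability; this does not change the distribution.
    logits = epsilon * (scores - scores.max()) / (2.0 * sensitivity)
    probs = np.exp(logits)
    probs /= probs.sum()
    idx = rng.choice(len(candidates), p=probs)
    return candidates[idx]
```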

2.1 DIFFERENTIALLY PRIVATE GAN

The framework of generative adversarial networks (GANs) (Goodfellow et al., 2014) consists of two types of neural networks: generators and discriminators. A generator G is a function that maps random vectors z ∈ Z drawn from a prior distribution p_z to a sample G(z) ∈ X. A discriminator D takes an observation x ∈ X as input and computes a probability D(x) that the observation is real. Each observation is either drawn from the underlying distribution p_d or the induced distribution p_g from a generator. The training of a GAN involves solving the following joint optimization over the discriminator and generator:

min_G max_D  E_{x∼p_X}[f(D(x))] + E_{z∼p_z}[f(1 − D(G(z)))]

where f : [0, 1] → R is a monotone function. For example, in the standard GAN, f(a) = log a, and in the Wasserstein GAN (Arjovsky et al., 2017), f(a) = a. The standard (non-private) algorithm iterates between optimizing the parameters of the discriminator and the generator based on the loss functions:

L_D = −E_{x∼p_X}[f(D(x))] − E_{z∼p_z}[f(1 − D(G(z)))],   L_G = E_{z∼p_z}[f(1 − D(G(z)))]

The private algorithm for training a GAN also performs the same alternating optimization, but it optimizes the discriminator under differential privacy while keeping the generator optimization the same. In general, the training proceeds over epochs τ = 1, . . . , N, and at the end of each epoch τ the algorithm obtains a discriminator D_τ and a generator G_τ by optimizing the respective loss functions. In Beaulieu-Jones et al. (2019); Xie et al. (2018), the private optimization on the discriminators is done by running the private SGD method of Abadi et al. (2016) or its variants. Yoon et al. (2019) perform the private optimization by incorporating the PATE framework (Papernot et al., 2018). For all of these private GAN methods, the entire sequence of discriminators {D_1, . . . , D_N} satisfies privacy, and thus the sequence of generators {G_1, . . . , G_N} is also private since they can be viewed as post-processing of the discriminators. Our PGB method is agnostic to the exact private GAN training method.
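The paper treats the private optimizer as a black box (private SGD and its variants). As a rough illustration only, here is a schematic per-example clipping and Gaussian noise step in plain NumPy; the gradient inputs, clipping norm, noise multiplier, and learning rate are assumptions for the sketch, not the authors' settings.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.05, clip_norm=1.0,
                noise_multiplier=1.1, rng=None):
    """One schematic DP-SGD update: clip each per-example gradient to L2 norm
    clip_norm, average, then add Gaussian noise with standard deviation
    noise_multiplier * clip_norm / batch_size to the averaged gradient."""
    rng = np.random.default_rng() if rng is None else rng
    clipped = []
    for g in per_example_grads:                      # one gradient per private example
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(per_example_grads),
                       size=mean_grad.shape)
    return params - lr * (mean_grad + noise)
```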


3 PRIVATE POST-GAN BOOSTING

The noisy gradient updates impede convergence of the differentially private GAN training algorithm, and the generator obtained in the final epoch of the training procedure may not yield a good approximation to the data distribution. Nonetheless, empirical evidence has shown that a mixture over the set of generators can be a realistic distribution (Beaulieu-Jones et al., 2019). We now provide a principled and practical scheme for computing such a mixture subject to a moderate privacy budget. Recall that the private GAN training method produces a sequence of generators G = {G_1, . . . , G_N} and discriminators D = {D_1, . . . , D_N}. Our boosting method computes a weighted mixture of the G_j's and a weighted mixture of the D_j's that improve upon any individual generator and discriminator. We do so by computing an equilibrium of the following post-GAN (training) zero-sum game.

3.1 POST-GAN ZERO-SUM GAME.

We will first draw r independent samples from each generator G_j, and let B be the collection of the rN examples drawn from the set of generators. Consider the following post-GAN zero-sum game between a synthetic data player, who maintains a distribution φ over the data in B to imitate the true data distribution p_X, and a distinguisher player, who uses a mixture of discriminators to tell the two distributions φ and p_X apart. This zero-sum game is aligned with the minimax game in the original GAN formulation, but is much more tractable since each player has a finite set of strategies. To define the payoff in the game, we adapt the Wasserstein GAN objective since it is less sensitive than the standard GAN objective to the change of any single observation (changing any single real example changes the payoff by at most 1/n), rendering it more compatible with privacy tools. Formally, for any x ∈ B and any discriminator D_j, define the payoff as

U(x, D_j) = E_{x′∼p_X}[D_j(x′)] + (1 − D_j(x))

For any distribution φ over B, let U(φ, ·) = E_{x∼φ}[U(x, ·)], and similarly for any distribution ψ over {D_1, . . . , D_N}, we will write U(·, ψ) = E_{D∼ψ}[U(·, D)]. Intuitively, the payoff function U measures the predictive accuracy of the distinguisher in classifying whether the examples are drawn from the synthetic data player's distribution φ or the private dataset X. Thus, the synthetic data player aims to minimize U while the distinguisher player aims to maximize U.
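Assuming each discriminator is available as a callable that returns a probability, the payoffs of the post-GAN game could be tabulated as a matrix, as in the illustrative sketch below (the function name and inputs are hypothetical).

```python
import numpy as np

def payoff_matrix(discriminators, real_data, synthetic_pool):
    """U[j, i] = mean_x' D_j(x') + (1 - D_j(b_i)) for discriminator j and
    synthetic example b_i, following the Wasserstein-style payoff in the text."""
    N, B = len(discriminators), len(synthetic_pool)
    U = np.zeros((N, B))
    for j, D in enumerate(discriminators):
        real_term = np.mean([D(x) for x in real_data])        # E_{x'~p_X}[D_j(x')]
        fake_term = np.array([1.0 - D(b) for b in synthetic_pool])
        U[j] = real_term + fake_term
    return U
```

Note that the real dataset only enters through the first term, which is exactly where the privacy cost is paid in the private algorithm below.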

Definition 2. The pair (D, φ) is an α-approximate equilibrium of the post-GAN game if

max_{D_j∈D} U(φ, D_j) ≤ U(φ, D) + α,   and   min_{φ′∈∆(B)} U(φ′, D) ≥ U(φ, D) − α    (1)

By von Neumann’s minimax theorem, there exists a value V – called the game value – such that

V = min_{φ∈∆(B)} max_{j∈[N]} U(φ, D_j) = max_{ψ∈∆(D)} min_{x∈B} U(x, ψ)

The game value corresponds to the payoff value at an exact equilibrium of the game (that is, α = 0). When the set of discriminators cannot predict the real versus fake examples better than random guessing, the game value is V = 1. We now show that under the assumption that the generated samples in B approximately cover the support of the dataset X, the distinguisher player cannot distinguish the real and fake distributions much better than by random guessing.

Theorem 1. Fix a private dataset X ∈ (R^d)^n. Suppose that for every x ∈ X, there exists x_b ∈ B such that ‖x − x_b‖_2 ≤ γ. Suppose D includes a discriminator network D_{1/2} that outputs 1/2 for all inputs, and assume that all networks in D are L-Lipschitz. Then there exists a distribution φ ∈ ∆(B) such that (φ, D_{1/2}) is an Lγ-approximate equilibrium, and so 1 ≤ V ≤ 1 + Lγ.

We defer the proof to the appendix. While the support coverage assumption is strong, we show empirically that the synthetic data player's mixture distribution in an approximate equilibrium improves on the distribution given by the last generator G_N even when the assumption does not hold. We now provide a method for computing an approximate equilibrium of the game.

3.2 BOOSTING VIA EQUILIBRIUM COMPUTATION.

Our post-GAN boosting (PGB) method computes an approximate equilibrium of the post-GAN zero-sum game by simulating the so-called no-regret dynamics.


Over T rounds, the synthetic data player maintains a sequence of distributions φ_1, . . . , φ_T over the set B, and the distinguisher plays a sequence of discriminators D_1, . . . , D_T. At each round t, the distinguisher first selects a discriminator D_t using the exponential mechanism M_E with the payoff U(φ_t, ·) as the score function. This will find an accurate discriminator D_t against the current synthetic distribution φ_t, so that the synthetic data player can improve its distribution. Then the synthetic data player updates its distribution to φ_{t+1} based on an online no-regret learning algorithm–the multiplicative weights (MW) method (Kivinen & Warmuth, 1997). We can view the set of generated examples in B as a set of "experts"; the algorithm maintains a distribution over these experts and, over time, places more weight on the examples that can better "fool" the distinguisher player. To do so, MW updates the weight for each x ∈ B with

φ_{t+1}(x) ∝ φ_t(x) exp(−ηU(x, D_t)) ∝ φ_t(x) exp(ηD_t(x))    (2)

where η is the learning rate. At the end, the algorithm outputs the average plays (D, φ) for both players. We will show that these form an approximate equilibrium of the post-GAN zero-sum game (Freund & Schapire, 1997).

Algorithm 1 Differentially Private Post-GAN Boosting

Require: a private dataset X ∈ X^n, a synthetic dataset B generated by the set of generators G, a collection of discriminators {D_1, . . . , D_N}, number of iterations T, per-round privacy budget ε_0, learning rate parameter η.
Initialize φ_1 to be the uniform distribution over B.
for t = 1, . . . , T do
    Distinguisher player: Run the exponential mechanism M_E to select a discriminator D_t using quality score q(X, D_j) = U(φ_t, D_j) and privacy parameter ε_0.
    Synthetic data player: Multiplicative weights update on the distribution over B: for each example b ∈ B,
        φ_{t+1}(b) ∝ φ_t(b) exp(ηD_t(b))
end for
Let D be the discriminator defined by the uniform average over the set {D_1, . . . , D_T}, and φ be the distribution defined by the average over the set {φ_1, . . . , φ_T}.
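The following is a compact Python sketch of Algorithm 1 operating on a precomputed payoff matrix, reusing the exponential-mechanism sampling from Section 2 and the 1/n sensitivity from footnote 1. It is an illustration under simplifying assumptions (all payoffs held in memory), not the authors' released implementation.

```python
import numpy as np

def private_pgb(U, n_private, T=1000, eps0=1e-4, eta=0.5, rng=None):
    """Private post-GAN boosting on a payoff matrix U of shape (N, B), where
    U[j, i] = U(b_i, D_j). Returns the averaged synthetic-data distribution
    over B and the averaged weight placed on each discriminator."""
    rng = np.random.default_rng() if rng is None else rng
    N, B = U.shape
    phi = np.full(B, 1.0 / B)              # synthetic data player's distribution phi_1
    phi_sum = np.zeros(B)
    dis_counts = np.zeros(N)
    sensitivity = 1.0 / n_private          # the payoff changes by at most 1/n
    for _ in range(T):
        phi_sum += phi                     # average is taken over phi_1, ..., phi_T
        # Distinguisher player: exponential mechanism over the N discriminators.
        scores = U @ phi                   # q(X, D_j) = U(phi_t, D_j)
        logits = eps0 * (scores - scores.max()) / (2.0 * sensitivity)
        p = np.exp(logits)
        p /= p.sum()
        j = rng.choice(N, p=p)
        dis_counts[j] += 1
        # Synthetic data player: multiplicative weights update, eq. (2);
        # exp(-eta * U(b, D_j)) is proportional to exp(eta * D_j(b)).
        phi = phi * np.exp(-eta * U[j])
        phi /= phi.sum()
    return phi_sum / T, dis_counts / T
```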

Note that the synthetic data player's MW update rule does not involve the private dataset, and hence is just a post-processing step of the selected discriminator D_t. Thus, the privacy guarantee follows from applying the advanced composition of T runs of the exponential mechanism.1

Theorem 2 (Privacy Guarantee). For any δ ∈ (0, 1), the private MW post-amplification algorithm satisfies (ε, δ)-DP with ε = √(2T log(1/δ)) ε_0 + Tε_0(exp(ε_0) − 1).
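For intuition about the privacy cost of the boosting stage, the bound in Theorem 2 can be evaluated numerically; the parameter values below are illustrative only, not taken from the paper's experiments.

```python
import math

def pgb_epsilon(T, eps0, delta):
    """Advanced composition bound from Theorem 2 for T runs of the exponential
    mechanism with per-round budget eps0 and failure probability delta."""
    return math.sqrt(2 * T * math.log(1 / delta)) * eps0 + T * eps0 * (math.exp(eps0) - 1)

# Illustrative: with T = 1000 rounds and eps0 = 6e-4, the PGB stage costs about 0.1.
print(pgb_epsilon(T=1000, eps0=6e-4, delta=1e-6))
```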

Note that if the private GAN training algorithm satisfies (ε_1, δ_1)-DP and the Private PGB method satisfies (ε_2, δ_2)-DP, then the entire procedure is (ε_1 + ε_2, δ_1 + δ_2)-DP.

We now show that the pair of average plays form an approximate equilibrium of the game.

Theorem 3 (Approximate Equilibrium). With probability 1 − β, the pair (D, φ) is an α-approximate equilibrium of the post-GAN zero-sum game with

α = 4η + log|B|/(ηT) + 2 log(NT/β)/(nε_0).

If T ≥ n² and η = (1/2)√(log(|B|)/T), then

α = O(log(nN|B|/β)/(nε_0)).

We provide a proof sketch here and defer the full proof to the appendix. By the result of Freund & Schapire (1997), if the two players have low regret in the dynamics, then their average plays form an approximate equilibrium, where the regret of the two players is defined as

R_syn = Σ_{t=1}^T U(φ_t, D_t) − min_{b∈B} Σ_{t=1}^T U(b, D_t)   and   R_dis = max_{D_j} Σ_{t=1}^T U(φ_t, D_j) − Σ_{t=1}^T U(φ_t, D_t).

Then the approximate equilibrium guarantee directly follows from bounding R_syn with the regret bound of MW and R_dis with the approximate optimality of the exponential mechanism.

1 Note that since the quality scores from the GAN discriminators are assumed to be probabilities and the score function takes an average over n probabilities (one for each private example), the sensitivity is ∆(q) = 1/n.


Non-Private PGB. The Private PGB method has a natural non-private variant: in each round, instead of drawing from the exponential mechanism, the distinguisher player simply computes the exact best response D_t = arg max_{D_j} U(φ_t, D_j). Then if we set the learning rate η = (1/2)√(log(|B|)/T) and run for T = log(|B|)/α² rounds, the pair (D, φ) returned is an α-approximate equilibrium.

Extension with Discriminator Rejection Sampling. The mixture discriminator D at the equilibrium provides an accurate predictor of which samples are unlikely. As a result, we can use D to further improve the data distribution φ via the discriminator rejection sampling (DRS) technique of Azadi et al. (2019). The DRS scheme in our setting generates a single example as follows: first draw an example x from φ (the proposal distribution), and then accept x with probability proportional to D(x)/(1 − D(x)). Note that the optimal discriminator D* that distinguishes the distribution φ from the true data distribution p_d would accept x with probability proportional to p_d(x)/p_φ(x) = D*(x)/(1 − D*(x)). Our scheme aims to approximate this ideal rejection sampling by approximating D* with the equilibrium strategy D, whereas prior work uses the last discriminator D_N as an approximation.
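A minimal sketch of the rejection step with the mixture discriminator follows. Since the acceptance probability is only specified up to a constant, the sketch normalizes by the maximum density ratio over the pool; this is one simple choice, not the full scheme of Azadi et al. (2019), and the function and argument names are hypothetical.

```python
import numpy as np

def drs_sample(pool, phi, d_bar, n_samples, rng=None):
    """Draw from the distribution phi over the synthetic pool, then accept x with
    probability proportional to d_bar(x) / (1 - d_bar(x))."""
    rng = np.random.default_rng() if rng is None else rng
    ratios = np.array([d_bar(x) / (1.0 - d_bar(x) + 1e-8) for x in pool])
    max_ratio = ratios.max()                      # crude normalizer estimated from the pool
    accepted = []
    while len(accepted) < n_samples:
        i = rng.choice(len(pool), p=phi)
        if rng.random() < ratios[i] / max_ratio:  # accept/reject step
            accepted.append(pool[i])
    return accepted
```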

4 EMPIRICAL EVALUATION

We empirically evaluate how both the Private and Non-Private PGB methods affect the utility of the generated synthetic data from GANs. We show two appealing advantages of our approach: 1) non-private PGB outperforms the last generator of GANs, and 2) our approach can significantly improve the synthetic examples generated by a GAN under differential privacy.

Datasets. We assess our method on a toy dataset drawn from a mixture of 25 Gaussians, which is commonly used to evaluate the quality of GANs (Srivastava et al., 2017; Azadi et al., 2019; Turner et al., 2019), and on synthesizing MNIST images. We then turn to real datasets from the American Census, and a standard machine learning dataset (Titanic).

Privacy budget. For the tasks with privacy, we set the privacy budget to be the same across all algorithms. Since Private PGB requires additional privacy budget, this means that the differentially private GAN training has to be stopped earlier as compared to running only a GAN to achieve the same privacy guarantee. Our principle is to allocate the majority of the privacy budget to the GAN training, and a much smaller budget to our Private PGB method. Throughout we used 80% to 90% of the final privacy budget on DP GAN training.2

Utility measures. Utility of synthetic data can be assessed along two dimensions: general utility and specific utility (Snoke et al., 2018; Arnold & Neunhoeffer, 2020). General utility describes the overall distributional similarity between the real data and synthetic datasets, but does not capture specific use cases of synthetic data. To assess general utility, we use the propensity score mean squared error (pMSE) measure (Snoke et al., 2018) (detailed in the Appendix). Specific utility of a synthetic dataset depends on the specific use an analyst has in mind. In general, specific utility can be defined as the similarity of results for analyses using synthetic data instead of real data. For each of the experiments we define specific utility measures that are sensible for the respective example. For the toy dataset of 25 Gaussians we look at the number of high quality samples. For the American Census data we compare marginal distributions of the synthetic data to marginal distributions of the true data and look at the similarity of regression results.

4.1 EVALUATION OF NON-PRIVATE PGB

Mixture of 25 Gaussians. We first examine the performance of our approach on a two-dimensional dataset with a mixture of 25 multivariate Gaussian distributions, each with a covariance matrix of 0.0025I. The left column in Figure 1 displays the training data. Each of the 25 clusters consists of 1,000 observations.

2 Our observation is that the DP GAN training is doing the "heavy lifting". Providing a good "basis" for PGB requires a substantial privacy expenditure in training the DP GAN. The privacy budget allocation is a hyperparameter for PGB that could be tuned. In general, the problem of differentially private hyperparameter selection is extremely important and the literature is thin (Liu & Talwar, 2019; Chaudhuri & Vinterbo, 2013).


The architecture of the GAN is the same across all results.3 To compare the utility of the synthetic datasets with the real data, we inspect the visual quality of the results and calculate the proportion of high quality synthetic examples similarly to Azadi et al. (2019), Turner et al. (2019) and Srivastava et al. (2017).4

Visual inspection of the results without privacy (in the top row of Figure 1) shows that our proposed method outperforms the synthetic examples generated by the last generator of the GAN, as well as the last generator enhanced with DRS. PGB over the last 100 stored generators and discriminators trained for T = 1,000 update steps, and the combination of PGB and DRS, visibly improves the results. The visual impression is confirmed by the proportion of high quality samples. The data from the last GAN generator have a proportion of 0.904 high quality samples. The synthetic data after PGB achieve a higher score of 0.918. The DRS samples have a proportion of 0.826 high quality samples, and the combination of PGB and DRS a higher proportion of 0.874 high quality samples.5

MNIST Data. We further evaluate the performance of our method on an image generation task with the MNIST dataset. Our results are based on the DCGAN architecture (Radford et al., 2015) with the KL-WGAN loss (Song & Ermon, 2020). To evaluate the quality of the generated images we use a metric based on the Inception score (IS) (Salimans et al., 2016), where instead of the Inception Net we use an MNIST classifier that achieves 99.65% test accuracy. The theoretical best score of the MNIST IS is 10, and the real test images achieve a score of 9.93. Without privacy the last GAN generator achieves a score of 8.41; using DRS on the last generator slightly decreases the score to 8.21; samples with PGB achieve a score of 8.76; and samples with the combination of PGB and DRS achieve a similar score of 8.77 (all Inception scores are calculated on 5,000 samples). Uncurated samples for all methods are included in the Appendix.
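The MNIST analogue of the Inception score used here is exp(E_x KL(p(y|x) ‖ p(y))) computed from a digit classifier's predictions. The sketch below assumes an (n, 10) array of classifier probabilities is already available; it is not the authors' exact evaluation code.

```python
import numpy as np

def mnist_inception_score(probs, eps=1e-12):
    """probs: (n_samples, 10) array of classifier probabilities p(y|x).
    Returns exp(mean_x KL(p(y|x) || p(y))), the Inception-score analogue."""
    probs = np.clip(probs, eps, 1.0)
    marginal = probs.mean(axis=0)                       # estimate of p(y)
    kl = np.sum(probs * (np.log(probs) - np.log(marginal)), axis=1)
    return float(np.exp(kl.mean()))
```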

4.2 EVALUATION OF PRIVATE PGB

Mixture of 25 Gaussians. To show how the differentially private version of PGB improves the samples generated from GANs that were trained under differential privacy, we first re-run the experiment with the two-dimensional toy data.6 Our final value of ε is 1 and δ is 1/(2N). For the results with PGB, the GAN training contributes ε_1 = 0.9 to the overall ε and the Private PGB algorithm ε_2 = 0.1. Again, a first visual inspection of the results in Figure 1 (in the bottom row) shows that post-processing the results of the last GAN generator with Private PGB is worthwhile. Private PGB over the last 100 stored generators and discriminators trained for T = 1,000 update steps, again, visibly improves the results. Again, our visual impression is confirmed by the proportion of high quality samples. The last generator of the differentially private GAN achieves a proportion of 0.031 high quality samples. With DRS on top of the last generator, the samples achieve a quality score of 0.035. The GAN enhanced with Private PGB achieves a proportion of 0.044 high quality samples, and the combination of Private PGB and DRS achieves a quality score of 0.053.

MNIST Data. On the MNIST data, with differential privacy (ε = 10, δ = 1/(2N)) the last DP GAN generator achieves an Inception score of 8.07; using DRS on the last generator the IS improves to 8.18. With Private PGB the samples achieve an IS of 8.58, and samples with the combination of Private PGB and DRS achieve the highest IS of 8.66.7 Uncurated samples for all methods are included in the Appendix.

Private Synthetic 1940 American Census Samples. While the results on the toy dataset are encouraging, the ultimate goal of private synthetic data is to protect the privacy of actual persons in data collections, and to provide useful data to interested analysts.

3 A description of the architecture is in the Appendix. The code for the GANs and the PGB algorithm will be made available on GitHub.

4 Note that the scores in Azadi et al. (2019) and Turner et al. (2019) do not account for the synthetic data distribution across the 25 modes. We detail our evaluation of high quality examples in the Appendix.

5 The lower scores for the DRS samples are due to the capping penalty in the quality metric. Without the capping penalty the scores are 0.906 for the last generator, 0.951 for PGB, 0.946 for DRS and 0.972 for the combination of PGB and DRS.

6 To achieve DP, we trained the discriminator with a DP optimizer as implemented in TensorFlow Privacy or the Opacus library. We keep track of the values of ε and δ by using the moments accountant (Abadi et al., 2016; Mironov, 2017).

7 All Inception scores are calculated on 5,000 samples.


Figure 1: Real samples from 25 multivariate normal distributions; synthetic examples without privacy from a GAN, DRS, Non-Private PGB and the combination of PGB and DRS (top row). Synthetic examples from a GAN with differential privacy, DP DRS, Private PGB and the combination of Private PGB and DRS (bottom row).

In this section we report the results of synthesizing data from the 1940 American Census. We rely on the public use micro data samples (PUMS) as provided in Ruggles et al. (2019).8 For 1940 we synthesize an excerpt of the 1% sample of all Californians that were at least 18 years old.9 Our training sample consists of 39,660 observations and 8 attributes (sex, age, educational attainment, income, race, Hispanic origin, marital status and county). The test set contains another 9,915 observations. Our final value of ε is 1 and δ is 1/(2N) ≈ 6.3×10⁻⁶ (after DP GAN training with ε_1 = 0.8 and PGB with ε_2 = 0.2, δ_1 = 1/(2N), δ_2 = 0). The general utility scores as measured by the pMSE ratio score are 2.357 (DP GAN), 2.313 (DP DRS), 2.253 (DP PGB), and 2.445 (DP PGB+DRS). This indicates that PGB achieves the best general utility. To assess the specific utility of our synthetic census samples we compare one-way marginal distributions to the same marginal distributions in the original data. In panel (A) of Figure 2 we show the distribution of race membership. Comparing the synthetic data distributions to the true distribution (in gray), we conclude that PGB improves upon the last generator. To underpin the visual impression we calculate the total variation distance between each of the synthetic distributions and the real distribution: the data from the last GAN generator have a total variation distance of 0.58, DP DRS of 0.44, DP PGB of 0.22 and DP PGB+DRS of 0.13. Furthermore, we evaluate whether more complex analysis models, such as regression models, trained on synthetic samples could be used to make sensible out-of-sample predictions. Panel (B) of Figure 2 shows a parallel coordinate plot comparing the out-of-sample root mean squared error of regression models trained on real data and trained on synthetic data. The lines show the RMSE for predicted income for all linear regression models with three independent variables, trained on the synthetic data generated with Private PGB as compared to the last GAN generator and other post-processing methods like DRS.
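The total variation distance between two one-way marginals is half the L1 distance between the corresponding category proportions. A minimal sketch with made-up counts (the real values are not reproduced here):

```python
import numpy as np

def marginal_tv_distance(real_counts, synth_counts):
    """Total variation distance between two categorical marginals,
    given raw category counts aligned on the same categories."""
    p = np.asarray(real_counts, dtype=float); p /= p.sum()
    q = np.asarray(synth_counts, dtype=float); q /= q.sum()
    return 0.5 * float(np.abs(p - q).sum())

# Hypothetical race marginal: counts in the real data vs. a synthetic dataset.
print(marginal_tv_distance([900, 60, 40], [700, 200, 100]))
```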

8 Further experiments using data from the 2010 American Census can be found in the appendix.
9 A 1% sample means that the micro data contains 1% of the total American (here Californian) population.


Figure 2: Specific Utility of Synthetic 1940 American Census Data. Panel (A): Distribution of Race Membership in Synthetic Samples. Panel (B): Regression RMSE with Synthetic Samples.

Machine Learning Prediction with Synthetic Data. In a final set of experiments we evaluate the performance of machine learning models trained on synthetic data (with and without privacy) and tested on real out-of-sample data. We synthesize the Kaggle Titanic10 training set (891 observations of Titanic passengers with 8 attributes) and train three machine learning models (Logistic Regression, Random Forests (RF) (Breiman, 2001) and XGBoost (Chen & Guestrin, 2016)) on the synthetic datasets to predict whether someone survived the Titanic catastrophe. We then evaluate the performance on the test set with 418 observations. To address missing values in both the training set and the test set we independently impute values using the MissForest (Stekhoven & Buhlmann, 2012) algorithm. For the private synthetic data our final value of ε is 2 and δ is 1/(2N) (for PGB this implies DP GAN training with ε_1 = 1.6 and PGB ε_2 = 0.4). The models trained on synthetic data generated with our approaches (PGB and PGB+DRS) consistently perform better than models trained on synthetic data from the last generator or DRS – with or without privacy.11

ACKNOWLEDGMENTS

This work began when the authors were at the Simons Institute participating in the "Data Privacy: Foundations and Applications" program. We thank Thomas Steinke, Adam Smith, Salil Vadhan, and the participants of the DP Tools meeting at Harvard for helpful comments. Marcel Neunhoeffer is supported by the University of Mannheim's Graduate School of Economic and Social Sciences funded by the German Research Foundation. Zhiwei Steven Wu is supported in part by an NSF S&CC grant #1952085, a Google Faculty Research Award, and a Mozilla research grant. Cynthia Dwork is supported by the Alfred P. Sloan Foundation, "Towards Practicing Privacy" and NSF CCF-1763665.

10 https://www.kaggle.com/c/titanic/data
11 Table 1 in the appendix summarizes the results in more detail. We present the accuracy, ROC AUC and PR AUC to evaluate the performance.


REFERENCES

Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS '16, pp. 308–318, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4139-4. doi: 10.1145/2976749.2978318. URL http://doi.acm.org/10.1145/2976749.2978318.

John M. Abowd. The U.S. census bureau adopts differential privacy. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, London, UK, August 19-23, 2018, pp. 2867, 2018. doi: 10.1145/3219819.3226070. URL https://doi.org/10.1145/3219819.3226070.

Martin Arjovsky, Soumith Chintala, and Leon Bottou. Wasserstein GAN. CoRR, abs/1701.07875, 2017. URL http://arxiv.org/abs/1701.07875.

Christian Arnold and Marcel Neunhoeffer. Really useful synthetic data – a framework to evaluate the quality of differentially private synthetic data. arXiv preprint arXiv:2004.07740, 2020.

Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory of Computing, 8(6):121–164, 2012. doi: 10.4086/toc.2012.v008a006. URL http://www.theoryofcomputing.org/articles/v008a006.

Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, and Yi Zhang. Generalization and equilibrium in generative adversarial nets (GANs). In Doina Precup and Yee Whye Teh (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 224–232, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR. URL http://proceedings.mlr.press/v70/arora17a.html.

Samaneh Azadi, Catherine Olsson, Trevor Darrell, Ian J. Goodfellow, and Augustus Odena. Discriminator rejection sampling. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, 2019. URL https://openreview.net/forum?id=S1GkToR5tm.

Brett K. Beaulieu-Jones, Zhiwei Steven Wu, Chris Williams, Ran Lee, Sanjeev P. Bhavnani, James Brian Byrd, and Casey S. Greene. Privacy-preserving generative deep neural networks support clinical data sharing. Circulation: Cardiovascular Quality and Outcomes, 12(7):e005122, 2019. doi: 10.1161/CIRCOUTCOMES.118.005122.

Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.

Kamalika Chaudhuri and Staal A Vinterbo. A stability-based validation procedure for differentially private machine learning. In Advances in Neural Information Processing Systems, pp. 2652–2660, 2013.

Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794, 2016.

Differential Privacy Team, Apple. Learning with privacy at scale. https://machinelearning.apple.com/docs/learning-with-privacy-at-scale/appledifferentialprivacysystem.pdf, December 2017.

Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin. Collecting telemetry data privately. In Advances in Neural Information Processing Systems 30, NIPS '17, pp. 3571–3580. Curran Associates, Inc., 2017.

Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Theory of Cryptography Conference, volume 3876, pp. 265–284, 2006.

Ulfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM Conference on Computer and Communications Security, CCS '14, pp. 1054–1067, New York, NY, USA, 2014. ACM.


Yoav Freund and Robert E Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997. ISSN 0022-0000. doi: 10.1006/jcss.1997.1504. URL http://www.sciencedirect.com/science/article/pii/S002200009791504X.

Lorenzo Frigerio, Anderson Santana de Oliveira, Laurent Gomez, and Patrick Duverger. Differentially private generative adversarial networks for time series, continuous, and discrete open data. In ICT Systems Security and Privacy Protection - 34th IFIP TC 11 International Conference, SEC 2019, Lisbon, Portugal, June 25-27, 2019, Proceedings, pp. 151–164, 2019. doi: 10.1007/978-3-030-22312-0_11. URL https://doi.org/10.1007/978-3-030-22312-0_11.

Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, NIPS '14, pp. 2672–2680, Cambridge, MA, USA, 2014. MIT Press. URL http://dl.acm.org/citation.cfm?id=2969033.2969125.

Moritz Hardt and Guy N. Rothblum. A multiplicative weights mechanism for privacy-preserving data analysis. In 51st Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, October 23-26, 2010, Las Vegas, Nevada, USA, pp. 61–70, 2010. doi: 10.1109/FOCS.2010.85. URL https://doi.org/10.1109/FOCS.2010.85.

Moritz Hardt, Katrina Ligett, and Frank McSherry. A simple and practical algorithm for differentially private data release. In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States, pp. 2348–2356, 2012.

Quan Hoang, Tu Dinh Nguyen, Trung Le, and Dinh Phung. MGAN: Training generative adversarial nets with multiple generators. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rkmu5b0a-.

Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with gumbel-softmax. arXiv preprint arXiv:1611.01144, 2016.

Jyrki Kivinen and Manfred K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1–63, 1997. ISSN 0890-5401. doi: 10.1006/inco.1996.2612. URL http://www.sciencedirect.com/science/article/pii/S0890540196926127.

Jingcheng Liu and Kunal Talwar. Private selection from private candidates. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pp. 298–309, 2019.

Chris J Maddison, Andriy Mnih, and Yee Whye Teh. The concrete distribution: A continuous relaxation of discrete random variables. arXiv preprint arXiv:1611.00712, 2016.

Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, FOCS '07, pp. 94–103, Washington, DC, USA, 2007. IEEE Computer Society. ISBN 0-7695-3010-9. doi: 10.1109/FOCS.2007.41. URL http://dx.doi.org/10.1109/FOCS.2007.41.

Ilya Mironov. Renyi differential privacy. In 30th IEEE Computer Security Foundations Symposium, CSF 2017, Santa Barbara, CA, USA, August 21-25, 2017, pp. 263–275, 2017. doi: 10.1109/CSF.2017.11. URL https://doi.org/10.1109/CSF.2017.11.

Nicolas Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Ulfar Erlingsson. Scalable private learning with PATE. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, 2018. URL https://openreview.net/forum?id=rkZB1XbRZ.

Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.


Steven Ruggles, Sarah Flood, Ronald Goeken, Josiah Grover, Erin Meyer, Jose Pacas, and Matthew Sobek. IPUMS USA: Version 9.0 [dataset]. Minneapolis, MN: IPUMS, 10:D010, 2019. doi: 10.18128/D010.V9.0.

Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pp. 2234–2242, 2016.

Joshua Snoke, Gillian M. Raab, Beata Nowok, Chris Dibben, and Aleksandra Slavkovic. General and specific utility measures for synthetic data. Journal of the Royal Statistical Society: Series A (Statistics in Society), 181(3):663–688, 2018. doi: 10.1111/rssa.12358. URL https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/rssa.12358.

Jiaming Song and Stefano Ermon. Bridging the gap between f-GANs and Wasserstein GANs, 2020.

Akash Srivastava, Lazar Valkov, Chris Russell, Michael U Gutmann, and Charles Sutton. VEEGAN: Reducing mode collapse in GANs using implicit variational learning. In Advances in Neural Information Processing Systems, pp. 3308–3318, 2017.

Daniel J Stekhoven and Peter Buhlmann. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1):112–118, 2012.

Ilya Tolstikhin, Sylvain Gelly, Olivier Bousquet, Carl-Johann Simon-Gabriel, and Bernhard Scholkopf. AdaGAN: Boosting generative models. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS '17, pp. 5430–5439, USA, 2017. Curran Associates Inc. ISBN 978-1-5108-6096-4. URL http://dl.acm.org/citation.cfm?id=3295222.3295294.

Reihaneh Torkzadehmahani, Peter Kairouz, and Benedict Paten. DP-CGAN: differentially private synthetic data and label generation. CoRR, abs/2001.09700, 2020. URL https://arxiv.org/abs/2001.09700.

Ryan Turner, Jane Hung, Eric Frank, Yunus Saatchi, and Jason Yosinski. Metropolis-Hastings generative adversarial networks. In Kamalika Chaudhuri and Ruslan Salakhutdinov (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 6345–6353, Long Beach, California, USA, 09–15 Jun 2019. PMLR. URL http://proceedings.mlr.press/v97/turner19a.html.

Liyang Xie, Kaixiang Lin, Shu Wang, Fei Wang, and Jiayu Zhou. Differentially private generative adversarial network. CoRR, abs/1802.06739, 2018. URL http://arxiv.org/abs/1802.06739.

Jinsung Yoon, James Jordon, and Mihaela van der Schaar. PATE-GAN: Generating synthetic data with differential privacy guarantees. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=S1zk9iRqF7.


A PROOFS

A.1 PROOF OF THEOREM 1

Proof of Theorem 1. Note that if the synthetic data player plays the empirical distribution p_X over X, then U(p_X, D) = E_{x′∼p_X}[D(x′)] + E_{x∼p_X}[1 − D(x)] = 1 for any discriminator D ∈ D. Now let us replace each element in X with its γ-approximation in B and obtain a new dataset X_B, and let p_{X_B} denote the empirical distribution over X_B. By the Lipschitz conditions, we then have |U(p_X, D) − U(p_{X_B}, D)| ≤ Lγ. This means U(p_{X_B}, D) ∈ [1 − Lγ, 1 + Lγ] for all D. Also, for all φ ∈ ∆(B), we have U(φ, D_{1/2}) = 1. Thus, (p_{X_B}, D_{1/2}) satisfies equation 1 with α = Lγ.

A.2 PROOF OF THE APPROXIMATE EQUILIBRIUM

Proof. We will use the seminal result of Freund & Schapire (1997), which shows that if the two players have low regret in the dynamics, then their average plays form an approximate equilibrium. First, we will bound the regret of the data player. The regret guarantee of the multiplicative weights algorithm (see e.g. Theorem 2.3 of Arora et al. (2012)) gives

Σ_{t=1}^T U(φ_t, D_t) − min_{b∈B} Σ_{t=1}^T U(b, D_t) ≤ 4ηT + log|B|/η    (3)

Next, we bound the regret of the distinguisher using the accuracy guarantee of the exponential mechanism (McSherry & Talwar, 2007). For each t, we know that with probability (1 − β/T),

max_{D_j} U(φ_t, D_j) − U(φ_t, D_t) ≤ 2 log(NT/β)/(nε_0)

Taking a union bound, this accuracy guarantee holds for all t, and so

max_{D_j} Σ_{t=1}^T U(φ_t, D_j) − Σ_{t=1}^T U(φ_t, D_t) ≤ 2T log(NT/β)/(nε_0)    (4)

Then following the result of Freund & Schapire (1997), the average plays (D, φ) form an α-approximate equilibrium with

α = 4η + log|B|/(ηT) + 2 log(NT/β)/(nε_0)

Plugging in the choices of T and η gives the stated bound.

B ADDITIONAL DETAILS ON THE QUALITY EVALUATION

B.1 ON THE CALCULATION OF THE PMSE.

To calculate the pMSE, one trains a discriminator to distinguish between real and synthetic examples. The predicted probability of being classified as real or synthetic is the propensity score. Taking all propensity scores into account, the mean squared error between the propensity scores and the proportion of real data examples is calculated. A synthetic dataset has high general utility if the model can at best predict probabilities of 0.5 for both real and synthetic examples; in that case the pMSE is 0.
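A minimal sketch of the pMSE computation with a logistic-regression propensity model from scikit-learn follows; Snoke et al. (2018) also consider other classifiers and report a ratio relative to a null expectation, which this sketch omits.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pmse(real, synthetic):
    """Propensity score mean squared error: fit a classifier to separate real
    (label 1) from synthetic (label 0) rows, then compare the predicted
    propensity of being real to the overall proportion of real rows
    (0.5 when both datasets have the same size)."""
    X = np.vstack([real, synthetic])
    y = np.concatenate([np.ones(len(real)), np.zeros(len(synthetic))])
    propensity = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
    c = len(real) / len(X)
    return float(np.mean((propensity - c) ** 2))
```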

B.2 SPECIFIC UTILITY MEASURE FOR THE 25 GAUSSIANS.

In the real data, given the data generating process outlined in Section 4.1, at each of the 25 modes 99% of the observations lie within a circle with radius r = √(0.0025 · 9.21034) around the mode centroids, where 9.21034 is the critical value at p = 0.99 of a χ² distribution with 2 degrees of freedom, and 0.0025 is the variance of the spherical Gaussian.


To calculate the quality score we count the number of observations within each of these 25 circles. If one of the modes contains more points than we would expect given the true distribution, the count is capped accordingly. Our quality score for the toy dataset of 25 Gaussians can be expressed as

Q = Σ_{i=1}^{25} min(p_real^i · N_syn, N_syn^i) / N_syn,

where i indexes the clusters, p_real^i is the true proportion of points per cluster, N_syn^i the number of observations at a cluster within radius r, and N_syn the total number of synthetic examples.
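A sketch of the capped score Q, assuming the synthetic samples and the 25 mode centroids are given as 2-d NumPy arrays; variable names are our own illustration.

```python
import numpy as np

def gaussian_mixture_quality(samples, centroids, var=0.0025, p_real=None):
    """Capped proportion of high-quality samples: a sample counts for its nearest
    centroid if it lies within the 99% radius; each mode's count is capped at its
    expected share p_real[i] * n_samples, following the score Q in the text."""
    samples, centroids = np.asarray(samples), np.asarray(centroids)
    n, k = len(samples), len(centroids)
    p_real = np.full(k, 1.0 / k) if p_real is None else np.asarray(p_real)
    radius = np.sqrt(var * 9.21034)                  # chi^2_2 critical value at p = 0.99
    dists = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    within = dists[np.arange(n), nearest] <= radius
    counts = np.bincount(nearest[within], minlength=k)
    return float(np.sum(np.minimum(p_real * n, counts)) / n)
```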

C GAN ARCHITECTURES

C.1 DETAILS ON THE EXPERIMENTS WITH THE 25 GAUSSIANS.

The generator and discriminator are neural nets with two fully connected hidden layers (discriminator: 128, 256; generator: 512, 256) with Leaky ReLU activations. The latent noise vector Z is of dimension 2 and independently sampled from a Gaussian distribution with mean 0 and standard deviation 1. For GAN training we use the KL-WGAN loss (Song & Ermon, 2020). Before passing the discriminator scores to PGB we transform them to probabilities using a sigmoid activation.

C.2 GAN ARCHITECTURE FOR THE 1940 AMERICAN CENSUS DATA.

The GAN networks consist of two fully connected hidden layers (256, 128) with Leaky ReLU activation functions. To sample from categorical attributes we apply the Gumbel-Softmax trick (Maddison et al., 2016; Jang et al., 2016) to the output layer of the generator. We run our PGB algorithm over the last 150 stored generators and discriminators and train it for T = 400 update steps.

D PRIVATE SYNTHETIC 2010 AMERICAN DECENNIAL CENSUS SAMPLES.

We conducted further experiments on more recent Census files. The 2010 data is similar to the data that the American Census is collecting for the 2020 decennial Census. For this experiment, we synthesize a 10% sample for California with 3,723,669 observations of 5 attributes (gender, age, Hispanic origin, race and PUMA district membership). Our final value of ε is 0.795 and δ is 1/(2N) ≈ 1.34 × 10⁻⁷ (for PGB the GAN training contributes ε = 0.786 and PGB ε = 0.09). The pMSE ratio scores are 1.934 (DP GAN), 1.889 (DP DRS), 1.609 (DP PGB) and 1.485 (DP PGB+DRS); here PGB achieves the best general utility. For specific utility, we compare the accuracy of three-way marginals on the synthetic data to the proportions in the true data.12 We tabulate race (11 answer categories in the 2010 Census) by Hispanic origin (25 answer categories in the 2010 Census) by gender (2 answer categories in the 2010 Census), giving us a total of 550 cells. To assess the specific utility for these three-way marginals we calculate the average accuracy across all 550 cells. Compared to the true data, DP GAN achieves 99.82%, DP DRS 99.89%, DP PGB 99.89% and the combination of DP PGB and DRS 99.93%. Besides the average accuracy across all 550 cells, another interesting metric of specific utility is the number of cells in which each synthesizer achieves the highest accuracy compared to the other methods; this is the case 43 times for DP GAN, 30 times for DP DRS, 90 times for DP PGB and 387 times for DP PGB+DRS. Again, this shows that our proposed approach can improve the utility of private synthetic data.

E DETAILED RESULTS OF MACHINE LEARNING PREDICTION WITH SYNTHETIC DATA

Table 1 summarizes the results for the machine learning prediction experiment with the Titanic data. We present the accuracy, ROC AUC and PR AUC to evaluate the performance. It can be seen that the models trained on synthetic data generated with our approach consistently perform better than models trained on synthetic data from the last generator or DRS – with or without privacy. To put these values into perspective, the models trained on the real training data and tested on the same out-of-sample data achieve the scores in Table 2.

12 A task that is similar to the tables released by the Census.


Table 1: Predicting Titanic Survivors with machine learning models trained on synthetic data and tested on real out-of-sample data. Median scores of 20 repetitions with independently generated synthetic data. With differential privacy, ε is 2 and δ is 1/(2N) ≈ 5.6 × 10⁻⁴.

Without differential privacy:

                      GAN     DRS     PGB     PGB+DRS
Logit    Accuracy     0.626   0.746   0.701   0.765
Logit    ROC AUC      0.591   0.760   0.726   0.792
Logit    PR AUC       0.483   0.686   0.655   0.748
RF       Accuracy     0.594   0.724   0.719   0.742
RF       ROC AUC      0.531   0.744   0.741   0.771
RF       PR AUC       0.425   0.701   0.706   0.743
XGBoost  Accuracy     0.547   0.724   0.683   0.740
XGBoost  ROC AUC      0.503   0.732   0.681   0.772
XGBoost  PR AUC       0.400   0.689   0.611   0.732

With differential privacy:

                      DP GAN  DP DRS  DP PGB  DP PGB+DRS
Logit    Accuracy     0.566   0.577   0.640   0.649
Logit    ROC AUC      0.477   0.568   0.621   0.624
Logit    PR AUC       0.407   0.482   0.532   0.547
RF       Accuracy     0.487   0.459   0.481   0.628
RF       ROC AUC      0.512   0.553   0.558   0.652
RF       PR AUC       0.407   0.442   0.425   0.535
XGBoost  Accuracy     0.577   0.589   0.609   0.641
XGBoost  ROC AUC      0.530   0.586   0.619   0.596
XGBoost  PR AUC       0.398   0.479   0.488   0.526

Table 2: Predicting Titanic Survivors with machine learning models trained on real data and tested on real out-of-sample data.

Model     Metric     Score
Logit     Accuracy   0.764
Logit     ROC AUC    0.813
Logit     PR AUC     0.785
RF        Accuracy   0.768
RF        ROC AUC    0.809
RF        PR AUC     0.767
XGBoost   Accuracy   0.768
XGBoost   ROC AUC    0.773
XGBoost   PR AUC     0.718

F SYNTHETIC MNIST SAMPLES

Figure 3 shows uncurated samples from the last generator after 30 epochs of training, without differential privacy in Panel 3a and with differential privacy (ε = 10, δ = 1/(2N)) in Panel 3b. Figure 4 shows uncurated samples with DRS on the last generator. Figure 5 shows uncurated samples after PGB and Figure 6 shows uncurated samples after the combination of PGB and DRS. In Figure 7 we show the 100 samples with the highest PGB probabilities.


(a) Without Differential Privacy. (b) With Differential Privacy (ε = 10, δ = 1/(2N)).

Figure 3: Uncurated MNIST samples from last GAN Generator after 30 epochs.

(a) Without Differential Privacy. (b) With Differential Privacy (ε = 10, δ = 1/(2N)).

Figure 4: Uncurated MNIST samples with DRS on last Generator after 30 epochs.

(a) Without Differential Privacy. (b) With Differential Privacy (ε = 10, δ = 1/(2N)).

Figure 5: Uncurated MNIST samples with PGB after 30 epochs (without DP) and 25 epochs (with DP).


(a) Without Differential Privacy. (b) With Differential Privacy (ε = 10, δ = 1/(2N)).

Figure 6: Uncurated MNIST samples with PGB and DRS after 30 epochs (without DP) and 25epochs (with DP).

(a) Without Differential Privacy. (b) With Differential Privacy (ε = 10, δ = 1/(2N)).

Figure 7: Top 100 MNIST samples after PGB after 30 epochs (without DP) and 25 epochs (with DP).
