35
“I love working with astronomers, since their data is worthless.” 8 Jim Gray, Microso'

Large-Scale Inference in Time Domain Astrophysics

Embed Size (px)

DESCRIPTION

Presented at the 2014 Workshop on Algorithms for Modern Massive Data Sets (MMDS 2014), June 19, 2014 (Berkeley, CA): The scientific promise of modern astrophysical surveys - from exoplanets to gravity waves - is palpable. Yet extracting insight from the data deluge is neither guaranteed nor trivial: existing paradigms for analysis are already beginning to breakdown under the data velocity. I will describe our efforts to apply statistical machine learning to large-scale astronomy datasets both in batch and streaming mode. From the discovery of supernovae to the characterization of tens of thousands of variable stars such approaches are leading the way to novel inference. Specific discoveries concerning precision distance measurements and using LSST as a pseudo-spectrograph will be discussed.

Citation preview

Page 1: Large-Scale Inference in Time Domain Astrophysics

“I#love#working#with#astronomers,#since#their#data#is#worthless.”

8#Jim#Gray,#Microso'

Page 2: Large-Scale Inference in Time Domain Astrophysics

Large&Scale*Inference*in*Time&Domain*Astrophysics

Joshua'Bloom'UC'Berkeley,'Astronomy

@pro>sbMMDS#2014#8#Berkeley#8#19#June#2014

Page 3: Large-Scale Inference in Time Domain Astrophysics

c. 1890Harvard College Observatory

Page 4: Large-Scale Inference in Time Domain Astrophysics

c. 1890Harvard College Observatory

Binary stars deeplyinteresting:Reveal the fundamental properties of stars

Page 5: Large-Scale Inference in Time Domain Astrophysics

Large*Synop:c*Survey*Telescope*(LSST)*&*2020+*# Light#curves#for#800M#sources#every#3#days#####106#supernovae/yr,#105#eclipsing#binaries#####3.2#gigapixel#camera,#20#TB/night

Gaia*space*astrometry*mission*&*2015+####1#billion#stars#observed#∼70#Tmes#over#5#years#######Will#observe#20K#supernovae

Many#other#astronomical#surveys#are#already#producing#data

Astronomical*Data*Deluge

Page 6: Large-Scale Inference in Time Domain Astrophysics

Goal#all8sky#radio#map#of#“epoch#of#reionizaTon”

2006#simulaTon

AntennaData#Rate

Max#InternalData#Rate

CorrelatedData#Volume

CompressedData#Volume

PAPER8128 0.4#Tbps 210#Tbps 200#TB 30#TB

HERA8331 2.6#Tbps 3500#Tbps 2800#TB 140#TB

HERA8576 4.6#Tbps 10,000#Tbps 8400#TB 420#TB

Data*challenge

Hybrid*Compute*Solu:on:

custom/FPGA GPU cluster

Prof.#A.#Parsons#(PI;#UC#Berkeley)

Page 7: Large-Scale Inference in Time Domain Astrophysics

Bayesian FrequenTst

Theory/HypothesisDriven

DataDriven

non-parametric

parametric

Data*Inference*Space

Hardware###laptops#→#clusters/supercomputersSofware###Python/Scipy,#R,#...

Carbonware###(astro)#grad#students,#postdocs

Page 8: Large-Scale Inference in Time Domain Astrophysics

Variable Source Taxonomy: A Mess

Page 9: Large-Scale Inference in Time Domain Astrophysics

•#noisy,#irregularly#sampled

Considerable*Complica:ons*with*Time&Series*Data

Page 10: Large-Scale Inference in Time Domain Astrophysics

•#noisy,#irregularly#sampled

•#spurious#data

Considerable*Complica:ons*with*Time&Series*Data

Page 11: Large-Scale Inference in Time Domain Astrophysics

•#noisy,#irregularly#sampled

•#telltale#signature#event#may#not#have#happened#yet

•#spurious#data

Considerable*Complica:ons*with*Time&Series*Data

Page 12: Large-Scale Inference in Time Domain Astrophysics

variability*metrics:e.g.#Stetson#indices,#χ2/dof#(constant#hypothesis)

periodic*metrics:e.g.#dominant#frequencies#in#Lomb8Scargle,#phase#offsets#between#periodsshape*analysis

e.g.#skewness,#kurtosis,#Gaussianity

context*metricse.g.#distance#to#nearest#galaxy,#type#of#nearest#galaxy,#locaTon#

in#the#eclipTc#plane

Machine&Learning*Approach*to*Classifica:on

Wózniak#et#al.#2004;#Protopapas+06,#Willemsen#&#Eyer#2007;#Debosscher#et#al.#2007;#Mahabal#et#al.#2008;#Sarro#et#al.#2009;#Blomme#et#al.#2010;#Kim+11,#Richards+11

Engineered#features*homogenize#data#→#�pDescribe#Tme8domain#characterisTcs#&#context#of#a#source

p#≈#100#features#computed#in#<#1#sec#per#88core#machine#(including#periodogram#analysis)

Page 13: Large-Scale Inference in Time Domain Astrophysics

Structured Classification

Structured Classification: Let class taxonomy guide classifier.

HSC: Hierarchical single-labelclassification.

I Fit separate classifier ateach non-terminal node.

HMC: Hierarchical multi-labelclassification.

I Fit one classifier, whereL(y ,�f (x)) � wdepth

0

J. Richards Time-Series Classification 31

Structured*Learning

Richards+11

5%*gross*mis&classifica:on*

rate!

Page 14: Large-Scale Inference in Time Domain Astrophysics

Results: All-Sky Automated Survey Classifications

28-class variable star classification problem with 50,000 stars

!

!

!

!

!

!

!

!!

!

0 2 4 6 8

0.66

0.68

0.70

0.72

0.74

0.76

0.78

0.80

AL Iteration

Perc

ent A

gree

men

t with

AC

VS

!

!

!

!

!

!

!

!

!

!

0 2 4 6 8

0.15

0.20

0.25

0.30

0.35

0.40

AL Iteration

Perc

ent o

f Con

fiden

t ASA

S R

F La

bels

Off-the-shelf RFError Rate = 34.5%

RF w/ Active LearningError Rate = 20.5%

3-fold increase in classifier confidence

Note: No other method yielded improvement in classification

J. Richards Astronomical Discovery and Classification 32

Active Learning

Page 15: Large-Scale Inference in Time Domain Astrophysics

Machine*Learned*Classifica:on

258class#variable#starData:#50k#from#ASAS,#810#with#known#labels#

(Tmeseries,#colors)

PRRL#=#0.94

Richards+12

Page 16: Large-Scale Inference in Time Domain Astrophysics

Machine*Learned*Classifica:on

258class#variable#starData:#50k#from#ASAS,#810#with#known#labels#

(Tmeseries,#colors)

PRRL#=#0.94

Richards+12

74#dimensional#feature#set#for#learning

featurizaTon#is#the#bopleneck#

(but#embarrassingly#

parallel)

The Astrophysical Journal Supplement Series, 203:32 (27pp), 2012 December Richards et al.

y. W Ursae Maj.

x. Beta Lyrae

w. Beta Persei

v. Ellipsoidal

u. S Doradus

t. Herbig AE/BE

s3. RS CVn

s2. Weak−line T Tauri

s1. Class. T Tauri

r1. RCB

q. Chem. Peculiar

p. RSG

o. Pulsating Be

l. Beta Cephei

j1. SX Phe

j. Delta Scuti

i. RR Lyrae, DM

h. RR Lyrae, FO

g. RR Lyrae, FM

f. Multi. Mode Cepheid

e. Pop. II Cepheid

d. Classical Cepheid

c. RV Tauri

b4. LSP

b3. SARG B

b2. SARG A

b1. Semireg PV

a. Mira

a b1 b2 b3 b4 c d e f g h i j j1 l o p q r1 s1 s2 s3 t u v w x y

Pre

dict

ed C

lass

True Class

91 68 15 29 54 24 89 13 4 86 29 2 28 1 18 5 35 23 17 4 20 17 8 1 1 27 33 68

0.011

0.066

0.923

0.044

0.029

0.074

0.015

0.824

0.015

0.067

0.067

0.267

0.6

0.034

0.069

0.586

0.034

0.276

0.87

0.019

0.111

0.042

0.042

0.75

0.125

0.042

0.011

0.011

0.011

0.955

0.011

0.077

0.077

0.308

0.538

0.5

0.25

0.25

0.012

0.012

0.965

0.012

0.069

0.034

0.897

0.5

0.5

0.036

0.036

0.107

0.786

0.036

1

0.722

0.278

0.2

0.2

0.4

0.2

0.686

0.171

0.086

0.057

0.043

0.913

0.043

0.059

0.706

0.059

0.176

0.25

0.25

0.25

0.25

0.05

0.7

0.1

0.1

0.05

0.059

0.118

0.353

0.059

0.118

0.059

0.059

0.059

0.059

0.059

0.375

0.125

0.125

0.375

1

1

0.074

0.889

0.037

0.091

0.667

0.182

0.061

0.882

0.044

0.015

0.015

0.015

0.015

0.015

Figure 5. Cross-validated confusion matrix for all 810 ASAS training sources. Columns are normalized to sum to unity, with the total number of true objects of eachclass listed along the bottom axis. The overall correspondence rate for these sources is 80.25%, with at least 70% correspondence for half of the classes. Classes withlow correspondence are those with fewer than 10 training sources or classes which are easily confused. Red giant classes tend to be confused with other red giantclasses and eclipsing classes with other eclipsing classes. There is substantial power in the top-right quadrant, where rotational and eruptive classes are misclassifiedas red giants; these errors are likely due to small training set size for those classes and difficulty to classify those non-periodic sources.(A color version of this figure is available in the online journal.)

with any monotonically increasing function (which is typicallyrestricted to a set of non-parametric isotonic functions, such asstep-wise constants). A drawback to both of these methods isthat they assume a two-class problem; a straightforward wayaround this is to treat the multi-class problem as C one-versus-all classification problems, where C is the number of classes.However, we find that Platt Scaling is too restrictive of a trans-formation to reasonably calibrate our data and determine thatwe do not have enough training data in each class to use IsotonicRegression with any degree of confidence.

Ultimately, we find that a calibration method similar to theone introduced by Bostrom (2008) is the most effective for ourdata. This method uses the probability transformation

!pij ="pij + r(1 − pij ) if pij = max{pi1, pi2, . . . , piC}pij (1 − r) otherwise,

(4)where {pi1, pi2, . . . , piC} is the vector of class probabilitiesfor object i and r ∈ [0, 1] is a scalar. Note that the adjusted

probabilities, {!pi1, !pi2, . . . , !piC}, are proper probabilities in thatthey are each between 0 and 1 and sum to unity for each object.The optimal value of r is found by minimizing the Brier score(Brier 1950) between the calibrated (cross-validated) and trueprobabilities.14 We find that using a fixed value for r is toorestrictive and, for objects with small maximal RF probability,it enforces too wide of a margin between the first- and second-largest probabilities. Instead, we implement a procedure similarto that of Bostrom (2008) and parameterize r with a sigmoidfunction based on the classifier margin, ∆i = pi,max −pi,2nd, foreach source,

r(∆i) = 11 + eA∆i+B

− 11 + eB

, (5)

where the second term ensures that there is zero calibration per-formed at ∆i = 0. This parameterization allows the amount of

14 The Brier score is defined as B(!p) = 1/N#N

i=1#C

j=1(I (yi = j ) − !pij )2,where N is the total number of objects, C is the number of classes, andI (yi = j ) is 1 if and only if the true class of the source i is j.

11

Page 17: Large-Scale Inference in Time Domain Astrophysics

hOp://bigmacc.infoMachine8learned'Variable'Star'catalog:

Page 18: Large-Scale Inference in Time Domain Astrophysics

Doing*Science*with*Probabilis:c*Catalogs

Demographics#(with#liple#followup):####trading#high#purity#at#the#cost#of#lower#efficiency))))e.g.,)using)RRL)to)find)new)Galac9c)structure

Novelty#Discovery#(with#lots#of#followup):####trading#high#efficiency#for#lower#purity))))e.g.,)discovering)new)instances)of)rare)classes))

Page 19: Large-Scale Inference in Time Domain Astrophysics

DRAFT April 20, 2012

Discovery of Bright Galactic R Coronae Borealis and DY Persei

Variables: Rare Gems Mined from ASAS

A. A. Miller1,⇤, J. W. Richards1,2, J. S. Bloom1, S. B. Cenko1, J. M. Silverman1,

D. L. Starr1, and K. G. Stassun3,4

ABSTRACT

We present the results of a machine-learning (ML) based search for new R

Coronae Borealis (RCB) stars and DY Persei-like stars (DYPers) in the Galaxy

using cataloged light curves obtained by the All-Sky Automated Survey (ASAS).

RCB stars—a rare class of hydrogen-deficient carbon-rich supergiants—are of

great interest owing to the insights they can provide on the late stages of stellar

evolution. DYPers are possibly the low-temperature, low-luminosity analogs to

the RCB phenomenon, though additional examples are needed to fully estab-

lish this connection. While RCB stars and DYPers are traditionally identified

by epochs of extreme dimming that occur without regularity, the ML search

framework more fully captures the richness and diversity of their photometric

behavior. We demonstrate that our ML method recovers ASAS candidates that

would have been missed by traditional search methods employing hard cuts on

amplitude and periodicity. Our search yields 13 candidates that we consider

likely RCB stars/DYPers: new and archival spectroscopic observations confirm

that four of these candidates are RCB stars and four are DYPers. Our discovery

of four new DYPers increases the number of known Galactic DYPers from two

to six; noteworthy is that one of the new DYPers has a measured parallax and is

m ⇡ 7 mag, making it the brightest known DYPer to date. Future observations

of these new DYPers should prove instrumental in establishing the RCB con-

nection. We consider these results, derived from a machine-learned probabilistic

1Department of Astronomy, University of California, Berkeley, CA 94720-3411, USA

2Statistics Department, University of California, Berkeley, CA, 94720-7450, USA

3Department of Physics and Astronomy, Vanderbilt University, Nashville, TN 37235, USA

4Department of Physics, Fisk University, 1000 17th Ave. N., Nashville, TN 37208, USA

*E-mail: [email protected]

arX

iv:1

204.

4181

v1 [

astro

-ph.

SR]

18 A

pr 2

012

– 13 –

Fig. 2.— ASAS V -band light curves of newly discovery RCB stars and DYPers. Note the

di↵ering magnitude ranges shown for each light curve. Spectroscopic observations confirm

the top four candidates to be RCB stars, while the bottom four are DYPers.

– 13 –

Fig. 2.— ASAS V -band light curves of newly discovery RCB stars and DYPers. Note the

di↵ering magnitude ranges shown for each light curve. Spectroscopic observations confirm

the top four candidates to be RCB stars, while the bottom four are DYPers.

– 13 –

Fig. 2.— ASAS V -band light curves of newly discovery RCB stars and DYPers. Note the

di↵ering magnitude ranges shown for each light curve. Spectroscopic observations confirm

the top four candidates to be RCB stars, while the bottom four are DYPers.

– 13 –

Fig. 2.— ASAS V -band light curves of newly discovery RCB stars and DYPers. Note the

di↵ering magnitude ranges shown for each light curve. Spectroscopic observations confirm

the top four candidates to be RCB stars, while the bottom four are DYPers.17#known#GalacTc#RCB/DY#Per

Page 20: Large-Scale Inference in Time Domain Astrophysics

E.)Ramirez?Ruiz)(UCSC)

50 100 150 200Days Since Explosion

Type Ia

NS + NS Mergers

Type IIp

NS + RSG CollisionIMBH + WD Collision

Pair Production Supernovae

-10

-12

-14

-16

-18

-20

-22M

H

z=0.45

200Mpc

-log(brightness)

Extragalac:c*Transient*Universe:*Explosive*Systems

Page 21: Large-Scale Inference in Time Domain Astrophysics
Page 22: Large-Scale Inference in Time Domain Astrophysics

strategyscheduling

observingreducTon

findingdiscovery

classificaTonfollowup

inference

Towards(a(Fully(Automated(Scien5fic(Stackfor(Transients}current

state)of)the)artstack

automated#(e.g.iPTF)#not#(yet)#automated

typing

papers

NSF/CDINSF/BIGDATA

Page 23: Large-Scale Inference in Time Domain Astrophysics

4 H. Brink et al.

Figure 1. Examples of bogus (top) and real (bottom) thumbnails.Note that the shapes of the bogus sources can be quite varied,which poses a challenge in developing features that can accuratelyrepresent all of them. In contrast, the set of real detections ismore uniform in terms of the shapes and sizes of the subtractionresidual. Hence, we focus on finding a compact set of features thataccurately captures the relevant characteristics of real detectionsas discussed in §2.2.

candidates. For every real or bogus candidate, we have atour disposal the subtraction image of the candidate (whichis reduced to a 21-by-21 pixel—about 10 times the medianseeing full width at half maximum—postage stamp imagecentered around the candidate), and metadata about thereference and subtraction images. Figure 1 shows subtrac-tion thumbnail images for several arbitrarily chosen bogusand real candidates.

In this work, we supplement the set of features devel-oped by Bloom et al. (2011) with image-processing featuresextracted from the subtraction images and summary statis-tics from the PTF reduction pipeline. These new features—which are detailed below—are designed to mimic the wayhumans can learn to distinguish real and bogus candidatesby visual inspection of the subtraction images. For conve-nience, we describe the features from Bloom et al. (2011),hereafter the RB1 features, in Table 1, along with the fea-tures added in this work. In §3.1, we critically examine therelative importance of all the features and select an optimalsubset for real–bogus classification.

Prior to computing features on each subtraction imagepostage stamp, we normalize the stamps so that their pixel

values lie between �1 and 1. As the pixel values for real can-didates can take on a wide range of values depending on theastrophysical source and observing conditions, this normal-ization ensures that our features are not overly sensitive tothe peak brightness of the residual nor the residual level ofbackground flux, and instead capture the sizes and shapes ofthe subtraction residual. Starting with the raw subtractionthumbnail, I, normalization is achieved by first subtract-ing the median pixel value from the subtraction thumbnailand then dividing by the maximum absolute value across allmedian-subtracted pixels via

IN(x, y) =

⇢I(x, y)�med[I(x, y)]max{abs[I(x, y)]}

�. (1)

Analysis of the features derived from these normalized realand bogus subtraction images showed that the transfor-mation in (1) is superior to other alternatives, such asthe Frobenius norm (

ptrace(IT I)) and truncation schemes

where extreme pixel values are removed.Using Figure 1 as a guide, our first intuition about

real candidates is that their subtractions are typically az-imuthally symmetric in nature, and well-represented by a2-dimensional Gaussian function, whereas bogus candidatesare not well behaved. To this end, we define a spherical 2DGaussian, G(x, y), over pixels x, y as

G(x, y) = A · exp

⇢�

12

(c

x

� x)2

+(c

y

� y)2

��, (2)

which we fit to the normalized PTF subtraction image, IN

,of each candidate by minimizing the sum-of-squared di↵er-ence between the model Gaussian image and the candidatepostage stamp with respect to the central position (c

x

, c

y

),amplitude A

1 and scale � of the Gaussian model. This fitis obtained by employing an L-BFGS-B optimization algo-rithm (Lu, Nocedal & Zhu 1995). The best fit scale and am-plitude determine the scale and amp features, respectively,while the gauss feature is defined as the sum-of-squared dif-ference between the optimal model and image, and corr

is the Pearson correlation coe�cient between the best-fitmodel and the subtraction image.

Next, we add the feature sym to measure the symmetryof the subtraction image. The sym feature should be smallfor real candidates, whose subtraction image tends to have aspherically symmetric residual. sym is computed by first di-viding the subtraction thumbnail into four equal-sized quad-rants, then summing the flux over the pixels in each quad-rant (in units of standard deviations above the background)and lastly averaging the sum-of-squares of the di↵erences be-tween each quadrant to the others. Thus, sym will be largefor di↵erence images that are not symmetric and will benearly zero for highly symmetric di↵erence images.

Next, we introduce features that aim to capture thesmoothness characteristics of the subtraction image thumb-nails. A typical real candidate will have a smoothly varyingsubtraction image with a single prominent peak while bogus

1 As subtraction images of real candidates can be negative whenthe brightness of the source is decreasing, we allow the Gaussianamplitude A to take on negative, as well as positive, values.

c� 2012 RAS, MNRAS 000, 1–16

“bogus”

“real”

PTF)subtrac9ons

Goal:build#a#framework#to#discover#variable/transient#sources#without#people

•#fast#(compared#to#people)•#parallelizeable•#transparent•#determinisTc•#versionable

1000)to)1)needle)in)the)haystack)problem

Discovery*Engine

Page 24: Large-Scale Inference in Time Domain Astrophysics

“Discovery”*is*Imperfect

Real or Bogus? 5

Fig. 2.— Histogram of a selection of features divided in real (purple) and bogus (cyan) populations. First two newly introduced featuresgauss and amp, the goodness-of-fit and amplitude of the Gaussian fit. Then mag ref, the magnitude of the source in the reference image,flux ratio, the ratio of the fluxes in the new and reference images and lastly, ccid, the ID of the camera CCD where the source wasdetected. The fact that this feature is useful at all is surprising, but we can clearly see that there are a higher probability of the candidatesbeeing real or bogus on some of the CCDs.

els of performance in the astronomy literature ( | joey:

add refs | ). A description of the algorithm can be foundin Breiman (2001). Briefly, the method aggregates a col-lection of hundreds to thousands of classification trees,and for a given new candidate, outputs the fraction ofclassifiers that vote real. If this fraction is greater thansome threshold ⌧ , random forest classifies the candidateas real ; otherwise it is deemed to be bogus.While an ideal classifier will have no missed detections

(i.e., no real identified as bogus), with zero false positives(bogus identified as real), a realistic classifier will typi-cally o↵er a trade-o↵ between the two types of errors. Areceiver operating characteristic (ROC) curve is a com-monly used diagram which displays the missed detectionrate (MDR) versus the false positive rate (FPR) of a clas-sifier6. With any classifier, we face a trade-o↵ betweenMDR and FPR: the larger the threshold ⌧ by which wedeem a candidate to be real, the lower the MDR buthigher the FPR and vice versa. Varying ⌧ maps out theROC curve for a particular classifier, and we can com-pare the performance of di↵erent classifiers by comparingtheir cross-validated ROC curves: the lower the curve thebetter the classifier.A commonly used figure of merit (FoM) for selecting

a classifier is the so-called Area Under the Curve (AUC,Friedman et al. (2001)), by which the classifier with min-imal AUC is deemed optimal. This criterion is agnosticto the actual FPR or MDR requirements for the problemat hand, and thus is not appropriate for our purposes. In-deed, the ROC curves of di↵erent classifiers often cross,so that performance in one regime does not necessarilycarry over to other regimes. In the real–bogus classifica-tion problem, we instead define our FoM as the MDR at1% FPR, which we aim to minimize. The choice of thisparticular value for the false positive rate stems from apractical reason: we don’t want to be swamped by boguscandidates misclassified as real.Figure 3 shows example ROC curves comparing the

performance on pre-split training and testing sets includ-ing all features. With minimal tuning, Random Forestsperform better, for any position on the ROC curve, than

6 Note that the standard form of the ROC is to plot the falsepositive rate versus the true positive rate (TPR = 1-MDR)

SVM with a radial basis kernel, a common alternativefor non-linear classification problems. A line is plottedto show the 1% FPR to which our figure of merit is fixed.

Fig. 3.— Comparison of a few well known classification algo-rithms applied to the full dataset. ROC curves enable a trade-o↵between false positives and missed detections, but the best classi-fier pushes closer towards the origin. Linear models (Logistic Re-gression or Linear SVMs) perform poorly as expected, while non-linear models (SVMs with radial basis function kernels or RandomForests) are much more suited for this problem. Random Forestsperform well with minimal tuning and e�cient training, so we willuse those in the remainder of this paper.

3. OPTIMIZING THE DISCOVERY ENGINE

With any machine learning method, there are aplethora of modeling decisions to make when attempt-ing to optimize predictive accuracy on future data. Typ-ically, a practitioner is faced with questions such as whichlearning algorithm to use, what subset of features to em-ploy, and what values of certain model-specific tuning pa-rameters to choose. Without rigorous optimization of themodel, performance of the machine learner can be hurtsignificantly. In the context of real–bogus classification,this could mean failure to discover objects of tremen-dous scientific impact. In this section, we describe severalchoices that must be made in the real–bogus discoveryengine and outline how we choose the optimal classifica-

Real or Bogus? 5

Fig. 2.— Histogram of a selection of features divided in real (purple) and bogus (cyan) populations. First two newly introduced featuresgauss and amp, the goodness-of-fit and amplitude of the Gaussian fit. Then mag ref, the magnitude of the source in the reference image,flux ratio, the ratio of the fluxes in the new and reference images and lastly, ccid, the ID of the camera CCD where the source wasdetected. The fact that this feature is useful at all is surprising, but we can clearly see that there are a higher probability of the candidatesbeeing real or bogus on some of the CCDs.

els of performance in the astronomy literature ( | joey:

add refs | ). A description of the algorithm can be foundin Breiman (2001). Briefly, the method aggregates a col-lection of hundreds to thousands of classification trees,and for a given new candidate, outputs the fraction ofclassifiers that vote real. If this fraction is greater thansome threshold ⌧ , random forest classifies the candidateas real ; otherwise it is deemed to be bogus.While an ideal classifier will have no missed detections

(i.e., no real identified as bogus), with zero false positives(bogus identified as real), a realistic classifier will typi-cally o↵er a trade-o↵ between the two types of errors. Areceiver operating characteristic (ROC) curve is a com-monly used diagram which displays the missed detectionrate (MDR) versus the false positive rate (FPR) of a clas-sifier6. With any classifier, we face a trade-o↵ betweenMDR and FPR: the larger the threshold ⌧ by which wedeem a candidate to be real, the lower the MDR buthigher the FPR and vice versa. Varying ⌧ maps out theROC curve for a particular classifier, and we can com-pare the performance of di↵erent classifiers by comparingtheir cross-validated ROC curves: the lower the curve thebetter the classifier.A commonly used figure of merit (FoM) for selecting

a classifier is the so-called Area Under the Curve (AUC,Friedman et al. (2001)), by which the classifier with min-imal AUC is deemed optimal. This criterion is agnosticto the actual FPR or MDR requirements for the problemat hand, and thus is not appropriate for our purposes. In-deed, the ROC curves of di↵erent classifiers often cross,so that performance in one regime does not necessarilycarry over to other regimes. In the real–bogus classifica-tion problem, we instead define our FoM as the MDR at1% FPR, which we aim to minimize. The choice of thisparticular value for the false positive rate stems from apractical reason: we don’t want to be swamped by boguscandidates misclassified as real.Figure 3 shows example ROC curves comparing the

performance on pre-split training and testing sets includ-ing all features. With minimal tuning, Random Forestsperform better, for any position on the ROC curve, than

6 Note that the standard form of the ROC is to plot the falsepositive rate versus the true positive rate (TPR = 1-MDR)

SVM with a radial basis kernel, a common alternativefor non-linear classification problems. A line is plottedto show the 1% FPR to which our figure of merit is fixed.

Fig. 3.— Comparison of a few well known classification algo-rithms applied to the full dataset. ROC curves enable a trade-o↵between false positives and missed detections, but the best classi-fier pushes closer towards the origin. Linear models (Logistic Re-gression or Linear SVMs) perform poorly as expected, while non-linear models (SVMs with radial basis function kernels or RandomForests) are much more suited for this problem. Random Forestsperform well with minimal tuning and e�cient training, so we willuse those in the remainder of this paper.

3. OPTIMIZING THE DISCOVERY ENGINE

With any machine learning method, there are aplethora of modeling decisions to make when attempt-ing to optimize predictive accuracy on future data. Typ-ically, a practitioner is faced with questions such as whichlearning algorithm to use, what subset of features to em-ploy, and what values of certain model-specific tuning pa-rameters to choose. Without rigorous optimization of themodel, performance of the machine learner can be hurtsignificantly. In the context of real–bogus classification,this could mean failure to discover objects of tremen-dous scientific impact. In this section, we describe severalchoices that must be made in the real–bogus discoveryengine and outline how we choose the optimal classifica-

Brink+2012

Real#and#Bogus#objects#in#our#training#set#of#78k#detecTons,#428dimensional#image#and#context#features#on#each#candidate

but)some)classifiers)work)beNer)than)others

Page 25: Large-Scale Inference in Time Domain Astrophysics

Caffe:#ConvoluTonal#Architecture#for#Fast#Feature#EmbeddingC++/CUDA#framework#for#deep#learning#&#vision

http://www.nersc.gov/users/computational-systems/testbeds/dirac/node-and-gpu-configuration/

Learning*without*Feature*Engineering?

“Dirac”50 node cluster

with NVIDIA Fermi GPU

https://github.com/BVLC/caffe/pulse

Code

“Carver”IBM iDataPlex 9,984 cores

Hardware

Page 26: Large-Scale Inference in Time Domain Astrophysics

PTF11kly)(SN)2011fe)

©Peter)Nugent

Supernova#Discovery#in#the#Pinwheel#Galaxy####11#hr#afer#explosion

nearest(SN(Ia(in(>3(decadesDiscovered'by'our'machine8learning'framework

in#PTF:#>10,000#events#in#>#0.2#PB#of#imaging#→#50+#journal#arTcles

Page 27: Large-Scale Inference in Time Domain Astrophysics

– 13 –

Fig. 3.— Constraints on mass, effective temperature, radius and average density of the

primary star of SN 2011fe. The shaded red region is excluded from non-detection of an

optical quiescent counterpart in Hubble Space Telescope (HST) imaging. The shaded green

region is excluded from considerations of the non-detection of a shock breakout at early

times, taking the least constraining Rp of the three model in Table 1. The blue region is

excludes by the non detection of a quiescent counterpart in Chandra X-ray imaging. The

location of the H, He, and C main sequence is shown, with the symbol size scaled for different

primary masses. Several observed WD and NSs are shown. The primary radius in units of

R⊙ is shown for Mp = 1.4M⊙.

Bloom+2012

Page 28: Large-Scale Inference in Time Domain Astrophysics

Real8Tme#ClassificaTons...

Page 29: Large-Scale Inference in Time Domain Astrophysics

Real8Tme#ClassificaTons...

Page 30: Large-Scale Inference in Time Domain Astrophysics

LETTERdoi:10.1038/nature13304

A Wolf–Rayet-like progenitor of SN 2013cu fromspectral observations of a stellar windAvishay Gal-Yam1, I. Arcavi1, E. O. Ofek1, S. Ben-Ami1, S. B. Cenko2, M. M. Kasliwal3, Y. Cao4, O. Yaron1, D. Tal1, J. M. Silverman5,A. Horesh4, A. De Cia1, F. Taddia6, J. Sollerman6, D. Perley4, P. M. Vreeswijk1, S. R. Kulkarni4, P. E. Nugent7, A. V. Filippenko8

& J. C. Wheeler5

The explosive fate of massive Wolf–Rayet stars1 (WRSs) is a keyopen question in stellar physics. An appealing option is that hydro-gen-deficient WRSs are the progenitors of some hydrogen-poorsupernova explosions of types IIb, Ib and Ic (ref. 2). A blue object,having luminosity and colours consistent with those of some WRSs,has recently been identified in pre-explosion images at the locationof a supernova of type Ib (ref. 3), but has not yet been conclusivelydetermined to have been the progenitor. Similar work has so faronly resulted in non-detections4. Comparison of early photometricobservations of type Ic supernovae with theoretical models suggeststhat the progenitor stars had radii of less than 1012 centimetres, asexpected for some WRSs5. The signature of WRSs, their emissionline spectra, cannot be probed by such studies. Here we report thedetection of strong emission lines in a spectrum of type IIb super-nova 2013cu (iPTF13ast) obtained approximately 15.5 hours afterexplosion (by ‘flash spectroscopy’, which captures the effects of thesupernova explosion shock breakout flash on material surroundingthe progenitor star). We identify Wolf–Rayet-like wind signatures,suggesting a progenitor of the WN(h) subclass (those WRSs withwinds dominated by helium and nitrogen, with traces of hydrogen).The extent of this dense wind may indicate increased mass loss fromthe progenitor shortly before its explosion, consistent with recenttheoretical predictions6.

Wolf–Rayet stars are massive stars stripped of their outer, hydrogen-rich envelope. These stars blow strong, hydrogen-poor winds. The innerpart of the wind engulfing the star is dense and optically thick, andefficiently absorbs the ionizing continuum from the hot stellar surface.Farther from the star, the density drops and the wind becomes opticallythin in the continuum, leading to a rich emission spectrum of recom-bination lines. Detailed models of such spectra can be constructed1,7 anddepend essentially on only three parameters: the effective temperature,Teff, in the line-forming region, a normalized radius, Rt (a combinationof the stellar radius, luminosity, mass loss rate and wind terminalvelocity7), and the chemical composition, Z, of the wind (assumed tobe uniform, spherical and of a constant mass loss rate). The compositionof the wind determines Wolf–Rayet spectral classes: stars with dominantHe and N lines belong to the WN class (with those also showing tracesof H usually denoted as WN(h)), stars with strong carbon lines belongto the WC class and rare (and possibly hotter) stars with oxygen-richspectra belong to the WO class1.

Shortly after a WRS explodes as a supernova, the outer parts of thewind (which in some cases1 extend to radii of .1013 cm) that have notyet been swept up by the expanding supernova ejecta will emit strongrecombination lines in response to ionizing flux released by the explosionshock breakout from the stellar surface. We estimate an increase in ion-izing luminosity by a factor of order 102–104, assuming an initial absolutemagnitude range of 22.5 mag , M , 210 mag for the exploding WRS1

and a typical early-time luminosity of M 5 212.5 mag for the resultingsupernova3,5,8. The effective temperature, Teff, will also change, beingvery high (.105 K) shortly after explosion9 and decreasing as theshocked supernova ejecta cool. The radiation illuminating the surviv-ing Wolf–Rayet wind will thus effectively vary monotonically throughthe range of temperatures in WRS line-forming regions. Because thewind parameters (composition, mass loss rate and terminal velocity)do not change, the measured line spectrum observed shortly after explo-sion should be similar to that of a WRS with the spectral class of theexploding progenitor (because the spectral classes mainly reflect thewind composition). The high wind densities around WRSs (with electrondensities of ne 5 1011–1012 cm23) imply short recombination times10,trec < 3.9 3 1012(ne/cm23)21(T/104 K)0.85 s, typically of a few minutesfor Wolf–Rayet densities and temperatures, and so the emitted spectrumwill promptly react to the rapidly evolving supernova radiation field.

We obtained rapid spectroscopic observations of the recent type IIbsupernova SN 2013cu (iPTF13ast) shortly after shock breakout (byflash spectroscopy; see Methods). This event was first detected bythe Intermediate Palomar Transient Factory (iPTF) survey11 on 2013May 3.18 (we express dates in coordinated universal time (UTC)), photo-metrically confirmed 5.8 h later and promptly identified by an on-dutyastronomer who triggered rapid follow-up observations12, including anoptical spectrum obtained just 4 h later. Analysis of the early-time lightcurve of this supernova (Extended Data Fig. 1) suggests that it explodedon May 2.93, implying that the first iPTF detection and the first spec-trum correspond to only 5.7 h and 15.5 h after the explosion, respect-ively. A full description of the supernova and its evolution will bereported in a forthcoming publication (A.G.-Y. et al., manuscript inpreparation). We note that this event was independently observed bythe MASTER survey on May 5.3 (,2.3 d after explosion), and it wasassigned the name SN 2013cu following spectroscopic confirmation13.

Our first spectrum of SN 2013cu reveals a continuum and emissionlines that bear a striking resemblance to spectra of WRSs (Fig. 1a).According to accepted Wolf–Rayet terminology1, the spectrum is clas-sified as subclass WN6(h) (Fig. 1a, blue trace); the relative strength ofnitrogen to carbon lines precludes a WC classification, and the absenceof any high-excitation oxygen lines is inconsistent with a WO star. Thestronger lines (Ha, Hb, N IV l7,115 (7,115 A wavelength) and He II

l5,411) exhibit a complex profile (Fig. 2) consisting of a relatively broadbase (, 2,500 km s21 full width at zero intensity (FWZI)) on whichprominent narrow, unresolved lines (FWZI < 3A; velocity dispersion,,150 km s21) are superimposed. This is consistent with predictionsfor Wolf–Rayet pre-supernova wind velocities14,15, although we cannotexclude the possibility that at least some of the observed line broadeningis produced by electron scattering rather than genuine velocity disper-sion. To the best of our knowledge, no similar spectra of a stripped(H-poor) supernova have been acquired previously. Wolf–Rayet-like

1Department of Particle Physics and Astrophysics, Weizmann Institute of Science, Rehovot 76100, Israel. 2Astrophysics Science Division, NASA Goddard Space Flight Center, Mail Code 661, Greenbelt,Maryland 20771, USA. 3Observatories of the Carnegie Institution for Science, 813 Santa Barbara Street, Pasadena, California 91101, USA. 4Cahill Center for Astrophysics, California Institute of Technology,Pasadena, California 91125, USA. 5Department of Astronomy, University of Texas, Austin, Texas 78712, USA. 6The Oskar Klein Centre, Department of Astronomy, Stockholm University, AlbaNova, 10691Stockholm, Sweden. 7Physics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA. 8Department of Astronomy, University of California, Berkeley, California 94720-3411, USA.

2 2 M A Y 2 0 1 4 | V O L 5 0 9 | N A T U R E | 4 7 1

Macmillan Publishers Limited. All rights reserved©2014

−5 0 5 10 15 20 25 30 35 40

−19

−18

−17

−16

−15

−14

−13

MJD−56414.93 [days]

Abso

lute

mag

nitu

de

SN2013cu r3σ upper limitsSwift U,UVW1parabolic fit

0 6 12 18 24 30 36 42 48

171819202122

Time since explosion [hours]

Obs

erve

d m

agni

tude

Keck

Keck

Extended Data Figure 1 | The r-band light curve of SN 2013cu. A parabolicmodel of the flux–time (red solid curve) describes the pre-peak data (1serror bars) very well. Backward extrapolation indicates an explosion date ofUTC 2013 May 2.93 6 0.11 (MJD 5 56414.93; 5.7 h before the first iPTFdetection; see inset); we estimate the uncertainty from the scatter generated by

modifying the subset of points used in the fit. Our first Keck spectrum wasobtained about 15.5 h after explosion (vertical dotted line). Early Swiftultraviolet photometry (diamonds) places a lower limit of T 5 25,000 K on theblack-body temperature measured 40 h after explosion.

RESEARCH LETTER

Macmillan Publishers Limited. All rights reserved©2014

expected at this radius, and require it to be lower than t 5 1 for the lineemission to escape. We find that this self-consistency requirement placesa lower limit of r 5 23 1014 cm on the radius of the line-formation region,and implies substantial mass loss rates, _Mw0:03M8 yr{1. If we inter-pret the disappearance of essentially all emission lines from our day-6spectrum (Fig. 1b) as evidence that the wind was swept up by the expand-ing ejecta (moving at 104 km s21), the radius of the line-emitting regionmust be r , 5.23 1014 cm, which is fully consistent with our estimates.

We can then calculate the total H mass by integrating over r:

Mtot~0:006 _M!

(0:01M8 yr{1

)" #

| vw!

(500 km s{1)$ %{1 r

!(1015 cm)

$ %{1M8

This indicates a range of 0:008M8vMtotv0:0035M8 for the range ofpermitted H masses. Assuming that the typical H abundance for WN(h)stars (,20%) applies, the total wind mass (dominated by He) can beestimated to be several times larger than these values. Detailed simula-tions19 show that as little as 0:1M8 of He-dominated CSM would resultin strong spectroscopic interaction signatures (that we do not observe),consistent with our derived total masses.

We conclude that we have directly detected a Wolf–Rayet-like windfrom the supernova progenitor with a WN(h) spectral class, indicatinga low H mass fraction. Assuming that the wind composition we measurerepresents the surface composition of the progenitor star, our observa-tions indicate that some members of the spectroscopic WN(h) Wolf–Rayet class explode after having lost most of their hydrogen envelope,exposing the CNO-processed, N-rich He layer below. Analysis ofphotometric and spectroscopic follow-up observations (A.G.-Y. et al.,manuscript in preparation) confirms that the explosion was indeed a

supernova of type IIb (Fig. 1c), as expected if the progenitor was amassive star that lost all but e 0:1M8 of its H envelope20.

Our observations have interesting implications. First, we note thatthe derived values of the mass loss rate and emission-line-region sizeare quite extreme compared with known Wolf–Rayet observations andradiatively driven models21, including models with clumpy, inflatedatmospheres22. This suggests that the mass loss rate from the progen-itor star may have increased shortly (of order 1 yr for the assumed velo-cities) before its explosion. Interestingly, such pre-supernova activity maybe explained by recent wave-driven models6, or a more extreme envelopeinflation22 may be indicated. These data can thus provide a key diagnosticof the final stages of nuclear core burning in massive stars, which arecurrently poorly understood, with possible implications for the explo-sion mechanism itself. In any case, the star probably exploded inside athick wind, and the explosion shock may have broken out from theopaque inner wind rather than from the hydrostatic surface of the star9.

Our finding is in general accord with some previous work on type IIbsupernova progenitors. Direct imaging of the progenitor of SN 2008ax(ref. 23) is consistent with a WN(h) progenitor. Furthermore, increasedmass loss during the final year before explosion may inflate the appar-ent photospheric radius of the pre-supernova star, making stars withcompact cores appear to have extended (low-mass) envelopes22 and pos-sibly reconciling the conflicting findings about the progenitor of thetype IIb supernova SN 2011dh (refs 24–27). Regardless of the exact mech-anism, our observations suggest that substantial Wolf–Rayet-like windspre-date at least some type IIb supernovae. A strong metallicity depen-dence of this process may explain the trend in the type IIb/type Ibsupernova number ratio with host-galaxy metallicity28. Future studiesof numerous additional supernova progenitors via their spectroscopic

Rel

ativ

e flu

x

Relative velocity (km s–1)

He II O5,412

He I O6,678

[S II] O6,716 N IV O7,123

N IV O7,109N IV O5,047

[O III] O5,007

[O III] O4,959

Hα O6,563

Hβ O4,861 C IV O5,801

He I O7,065

He I O5,876 He II O6,891

–2,000 –2,000 –2,0000 2,000 2,000 2,0000 0

[S II] O6,731

Figure 2 | Emission line velocity structure at 15.5 h. The strongest lines(Ha, Hb, He II l5,411 and the N IV l7,115 complex) show broad wings

extending out to ,2,500 km s21. Other weaker lines are narrowand unresolved.

LETTER RESEARCH

2 2 M A Y 2 0 1 4 | V O L 5 0 9 | N A T U R E | 4 7 3

Macmillan Publishers Limited. All rights reserved©2014

Last%month...

Page 31: Large-Scale Inference in Time Domain Astrophysics

Machine*Learning*Workflows*for*the*Sake*of*Science

Page 32: Large-Scale Inference in Time Domain Astrophysics

Berkeley Institute for

Data Science

Berkeley Institute for

Data Science

http://bitly.com/bundles/fperezorg/1

“Bold new partnership launches to harness potential of data scientists and big data”

Founded#in#December#2013#as#a#result#of#a#year+#long#naTonal#selecTon#process$37.8M#over#5#years,#along#with#University#of#Washington#&#NYU

‣ An#accelerator#for#data8driven#discovery‣ An#agent*of*change#in#the#modern#university#as#Data#Science#takes#hold

‣ An#incubator#for#the#next#generaTon#of#Data#Science#technology#and#pracTce

Page 33: Large-Scale Inference in Time Domain Astrophysics

Par5ng(Thoughts

• Astronomy’s#data#deluge#demands#an#abstracTon#of#the#tradiTonal#roles#in#the#scienTfic#process.#

• Parallel,#Distributed#CompuTng/Algorithms#and#Machine#Learning#at#the#heart#of#what#we#do

• Berkeley#InsTtute#for#Data#Science#(BIDS)#to#be#an#intersecTon#point#between#physical#science#&#computaTonal/algorithmic#efforts

MMDS#2014#8#Berkeley#8#June#19#2014

Page 34: Large-Scale Inference in Time Domain Astrophysics

@pro>sb

Thank*you.

MMDS

Page 35: Large-Scale Inference in Time Domain Astrophysics

Turning*Imagers*into*Spectrographs

Miller,#JSB+14

Data:#5000#variables#in#SDSS#Stripe#82#with#spectra###~80#dimensional#regression#with#Random#Forest

Time#variability#+#colors#→#fundamental#stellar#parameters