Upload
others
View
10
Download
0
Embed Size (px)
Citation preview
Cour
tesy
of D
. Cam
pbel
l-Wils
on
SNLS supernovae photometric classification
with machine learningAnais Möller
CAASTRO, Australian National University
Statistical Challenges in Astronomy, Carnegie Mellon University June 2016
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
SNe Ia cosmology
SNIa cosmology advantages:
• orthogonal constraints to
CMB and BAO
• sensitive to dark energy properties
2
Betoule et al. 2014
Chapter 2. Cosmological Observable: Supernovae 19
Figure 2.1 Left, schematic light curves of SNe Ia, Ib/c, II-P and II-L. Right, spectrallines that allow to classify the same types of SNe, obtained in 1999. [8]
Figure 2.2 Galaxy with SNIa 04D1dc explosion on the left side of its center.
2.2 Supernovae type Ia
Type Ia SNe have very homogeneous spectral and photometric properties. Therefore
their light curves and spectra can be rather reproducible. They are transient events
which last circa 60 days and their maximum luminosity is equal or higher to their host
galaxy luminosity. They occur in galaxies such as ours once every century. An SNIa
explosion is illustrated in Figure 2.2.
SNIa have quite homogeneous properties. However there is a dispersion, for SNeIa on
the same redshift, of around 30% in magnitude. We will show that half of this variability
can be correlated to measurable features on the SNIa lightcurve in subsection § 2.2.3.
Also, there are peculiar SNeIa that can be identified by spectral properties and can be
either sub or super luminous relative to the bulk of the distribution.
Type Ia SNe
• very luminous • homogeneous spectral and photometric
properties• characteristic light curves-> we assume reproducible luminosities
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
SNIa cosmology challenges
more:
not enough resources for spectroscopy (DES,LSST)
-> photometric classification
we want : - more SNe Ia - precise measurements e.g. improving systematics. - host galaxy - modelling / templates - …
e.g. JLA Betoule et al. 2014
how to classify?:
- features: some type of fitter
- redshift: photometric/ host-galaxy spectroscopic
- classification algorithm
we need to trust our sample:
- high purity sample
- without sacrificing stats (efficiency)
from simulations to data
from a classified sample to cosmology
the unexpected
SN photometric classification challenges
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
arX
iv:1
001.
5210
v6 [
astro
-ph.
IM]
27 A
pr 2
010
Challenge Released on Jan 29, 2010. Last update: September 24, 2013Preprint typeset using LATEX style emulateapj v. 11/10/09
SUPERNOVA PHOTOMETRIC CLASSIFICATION CHALLENGE
Richard Kessler,1,2 Alex Conley,3 Saurabh Jha,4 Stephen Kuhlmann5
Challenge Released on Jan 29, 2010. Last update: September 24, 2013
ABSTRACT
We have publicly released a blinded mix of simulated SNe, with types (Ia, Ib, Ic, II) selected inproportion to their expected rate. The simulation is realized in the griz filters of the Dark EnergySurvey (DES) with realistic observing conditions (sky noise, point spread function and atmospherictransparency) based on years of recorded conditions at the DES site. Simulations of non-Ia type SNeare based on spectroscopically confirmed light curves that include unpublished non-Ia samples donatedfrom the Carnegie Supernova Project (CSP), the Supernova Legacy Survey (SNLS), and the SloanDigital Sky Survey-II (SDSS–II). We challenge scientists to run their classification algorithms andreport a type for each SN. A spectroscopically confirmed subset is provided for training. The goalsof this challenge are to (1) learn the relative strengths and weaknesses of the different classificationalgorithms, (2) use the results to improve classification algorithms, and (3) understand what spectro-scopically confirmed sub-sets are needed to properly train these algorithms. The challenge is availableat www.hep.anl.gov/SNchallenge, and the due date for classifications is May 1, 2010.Subject headings: supernova light curve fitting and classification
1. MOTIVATION
To explore the expansion history of the universe, in-creasingly large samples of high quality SNe Ia lightcurves are being used to measure luminosity distances asa function of redshift. With increasing sample sizes, thereare not nearly enough resources to spectroscopically con-firm each SN. Currently, the world’s largest samples arefrom the Supernova Legacy Survey (SNLS: Astier et al.(2006)) and the Sloan Digital Sky Survey-II (SDSS-II:Frieman et al. (2008)), each with more than 1000 SNe Ia,yet less than half of their SNe are spectroscopically con-firmed. The numbers of SNe are expected to increasedramatically in the coming decade: thousands for theDark Energy Survey (DES: Bernstein et al. (2009)) anda few hundred thousand for the Panoramic Survey Tele-scope and Rapid Response System (Pan-STARRS)6 andthe Large Synoptic Survey Telescope (LSST: Ivezic et al.(2008); LSST Science Book (2009)). Since only a smallfraction of these SNe will be spectroscopically confirmed,photometric identification is crucial to fully exploit theselarge samples.In the discovery phase of accelerated cosmologi-
cal expansion, results were based on tens of high-redshift SNe Ia, and some samples included a signif-icant fraction of events that that were not classifiedfrom a spectrum (Riess et al. 1998; Perlmutter et al.1999; Tonry et al. 2003; Riess et al. 2004). While hu-man judgment played a significant role in classifyingthese SNe without a spectrum, more formal methods of
1 Department of Astronomy and Astrophysics, The Univer-sity of Chicago, 5640 South Ellis Avenue, Chicago, IL 60637
2 Kavli Institute for Cosmological Physics, The University ofChicago, 5640 South Ellis Avenue Chicago, IL 60637
3 Center for Astrophysics and Space Astronomy, Universityof Colorado, Boulder, CO, 80309-0389, USA
4 Department of Physics and Astronomy, Rutgers University,136 Frelinghuysen Road, Piscataway, NJ 08854
5 Argonne National Laboratory, 9700 S. Cass Avenue,Lemont, IL 604376 http://pan-starrs.ifa.hawaii.edu/public
photometric classification have been developed over thepast decade: Poznanski et al. (2002); Dahlen & Goobar(2002); Sullivan et al. (2006); Johnson & Crotts (2006);Poznanski et al. (2007); Kuznetsova & Connolly (2007);Rodney & Tonry (2009). Some of these methods havebeen used to select candidates for spectroscopic follow-up observations, but these methods have not beenused to select a significant photometric SN Ia sam-ple for a Hubble diagram analysis. In short, cosmo-logical parameter estimates from much larger and re-cent surveys are based solely on spectroscopically con-firmed SNe Ia (SNLS: Astier et al. (2006), ESSENCE:Wood-Vasey et al. (2007), CSP: Freedman et al. (2009),SDSS-II: Kessler et al. (2009)).The main reason for the current reliance on spectro-
scopic identification is that vastly increased spectroscopicresources have been used in these more recent surveys.In spite of these increased resources, more than half ofthe discovered SNe do not have a spectrum and there-fore photometric methods will eventually be needed toclassify the majority of the SNe. There are two difficul-ties limiting the application of photometric classification.First is the lack of adequate non-Ia data for training al-gorithms. Many classification algorithms were developedusing non-Ia templates7 constructed from averaging andinterpolating a limited amount of spectroscopically con-firmed non-Ia data, and therefore the impact of the non-Ia diversity has not been well studied. The second diffi-culty is that there is no standard testing procedure, andtherefore it is not clear which classification methods workbest.To aid in the transition to using photometric SN-
classification, we have released a public “SN Photomet-ric Classification Challenge” to the community, hereaftercalled SNPhotCC. The SNPhotCC consists of a blinded mixof simulated SNe, with types (Ia, Ib, Ic, II) selected inproportion to their expected rate. The challenge is forscientists to run their classification algorithms and re-
7 http://supernova.lbl.gov/nugent/nugent templates.html
Table 5 shows the list of groups and participants, indicateswhich challenge(s) were taken, and indicates if SN photo-zestimates were given. The average processing time is also givenfor each method, and these times vary from 1 s to greater than200 s per SN using similar processors. A brief description foreach method is given in the Appendix.
Among the participants, four general strategies were used toclassify SNe. The first and simplest strategy was to fit each lightcurve to an SN Ia model and use the “duck test” philosophy: if itlooks like a duck (i.e., an SN Ia) and quacks like a duck, then itis a duck. Selection cuts, mainly on the minimum χ2, were usedto determine which SNe are type Ia, and there was no attemptto classify a subtype for non-Ia. This strategy was used byGonzalez, Portsmouth-χ2, and SNANA cuts.
The second strategy compares each light curve against bothSN Ia and non-Ia templates and uses the Bayesian probabilitiesto determine the most likely SN type. Poz2007 used the simplestBayesian implementation, with a single Ia and non-Ia template.Belov & Glazov and Sako used SN Ia templates that depend onstretch and extinction and also used several non-Ia templates.Sako included eight non-Ia templates from the SDSS-II,although there was no coordination between his template devel-opment for classification and the development of templates forthe SNPhotCC. Rodney used a variant of this technique by ac-counting for the fact that templates from observed SNe do not
form a complete set. MGU+DU (Mahatma Gandhi Universityand Delhi University) used another variation by using slopes(mag/day) at four different epochs and comparing with slopesexpected for type Ia and non-Ia SNe.
The third strategy used spectroscopically confirmed SNe Ia toparametrize a Hubble diagram and then identified SN Ia as thoseSNe that lie near the expected Hubble diagram. Portsmouth-Hubble used a high-order polynomial to define the Hubble dia-gram, while JEDI-Hubble used the kernel density estimationtechnique.
In the last strategy (InCA and JEDI-KDE), each light curvewas fit with a parametric function such as a spline, and the fittedparameters were used for statistical inferences. Light-curve fit-ting parameters such as stretch and color were not expli-citly used.
4. EVALUATING THE SNPhotCC
Ideally, we would like to assign a single number, or figure ofmerit (FoM), for each SNPhotCC submission. We begin the dis-cussion by considering a measurement of the SN Ia rate basedon photometric identification. After selection requirements havebeen applied, let N true
Ia be the number of correctly typed SNe Ia,and let N false
Ia be the number of non-Ia that are incorrectly typedas an SN Ia. A simple classification FoM is the square of the S/N
TABLE 5
LIST OF PARTICIPANTS IN THE SNPHOTCC
Participants Abbreviationa Classified +Zb/noZc SN zphd CPUe Description (strategy class)f
P. Belov and S. Glazov . . . . . . . . . . Belov & Glazov Yes/no No 90 Light-curve χ2 test against Nugent templates (2)S. Gonzalez . . . . . . . . . . . . . . . . . . . . . . Gonzalez Yes/yes No 120 Cuts on SiFTO fit χ2 and fit parameters (1)InCA groupg . . . . . . . . . . . . . . . . . . . . . . InCA No/yes No 1 Spline fit and nonlinear dimensionality reduction (4)JEDI grouph . . . . . . . . . . . . . . . . . . . . . . JEDI-KDE Yes/yes No 10 Kernel density evaluation with 21 parameters (4)JEDI group . . . . . . . . . . . . . . . . . . . . . . . JEDI boost Yes/yes No 10 Boosted decision trees (4)JEDI group . . . . . . . . . . . . . . . . . . . . . . . JEDI-Hubble Yes/no No 10 Hubble-diagram KDE (3)JEDI group . . . . . . . . . . . . . . . . . . . . . . . JEDI combo Yes/no No 10 Boosted decision trees and Hubble KDE (3 & 4)MGU+DU groupI . . . . . . . . . . . . . . . . MGU+DU-1 No/yes No <1 Light-curve slopes and neural network (2)MGU+DU group . . . . . . . . . . . . . . . . . MGU+DU-2 No/yes No <1 Light-curve slopes and random forests (2)Portsmouth groupj . . . . . . . . . . . . . . . . Portsmouth χ2 Yes/no No 1 SALT-II χ2
r and false discovery rate statistic (1)Portsmouth group . . . . . . . . . . . . . . . . Portsmouth-Hubble Yes/no No 1 Deviation from parametrized Hubble diagram (3)D. Poznanski . . . . . . . . . . . . . . . . . . . . . Poz2007 RAW Yes/no Yes 2 SN automated Bayesian classifier (SN–ABC) (2)D. Poznanski . . . . . . . . . . . . . . . . . . . . . Poz2007 OPT Yes/no Yes 2 SN-ABC with cuts to optimize CFoM-Ia (2)S. Rodney . . . . . . . . . . . . . . . . . . . . . . . . Rodney Yes/yes Yes 230 SN ontology with fuzzy templates (2)M. Sako . . . . . . . . . . . . . . . . . . . . . . . . . . Sako Yes/yes Yes 120 χ2 test against grid of Ia/II/Ibc templates (2)S. Kuhlmann and R. Kessler . . . . . SNANA cuts Yes/yes Yes 2 Cut on MLCS fit probability, S/N, and sampling (1)
a Groups are listed alphabetically by abbreviation.b Classifications included for SNPhotCC/HOSTZ.c Classifications included for SNPhotCC/noHOSTZ.d Photo-z estimates included.e Average processing time per SN (seconds) using similar 2–3 GHz cores.f From § 3, strategy classes are 1) selection cuts, 2) Bayesian probabilities, 3) Hubble-diagram parametrization, and 4) statistical inference.g International Computational Astrophysics Group (http://www.incagroup.org); J. Richards D. Homrighausen, C. Schafer, and P. Freeman.hJoint Exchange and Development Initiative (http://jedi.saao.ac.za); J. Newling, M. Varuguese, B. Bassett, R. Hlozek, D. Parkinson, M. Smith, H. Campbell, M.
Hilton, H. Lampeitl, M. Kunz, and P. Patel.I S. Philip, V. Bhatnagar, A. Singhal, A. Rai, A. Mahabal, and K. Indulekha.j H. Campbell, B. Nichol, H. Lampietl, and M. Smith.
SUPERNOVA PHOTOMETRIC CLASSIFICATION CHALLENGE 1423
2010 PASP, 122:1415–1431
This content downloaded from 90.35.23.152 on Mon, 13 Apr 2015 06:10:14 AMAll use subject to JSTOR Terms and Conditions
arX
iv:1
208.
1264
v3 [
astro
-ph.
CO]
15 N
ov 2
012
Mon. Not. R. Astron. Soc. 000, 1–8 (2012) Printed 16 November 2012 (MN LATEX style file v2.2)
A simple and robust method for automated photometricclassification of supernovae using neural networks
N.V. Karpenka1⋆, F. Feroz2 and M.P. Hobson2
1The Oskar Klein Centre for Cosmoparticle Physics, Department of Physics, Stockholm University, AlbaNova, SE-106 91 Stockholm, Sweden2Astrophysics Group, Cavendish Laboratory, J.J. Thomson Avenue, Cambridge CB3 0HE, UK
Accepted 2012 November 14. Received 2012 October 11; in original form 2012 August 7
ABSTRACTA method is presented for automated photometric classification of supernovae (SNe) as Type-Ia or non-Ia. A two-step approach is adopted in which: (i) the SN lightcurve flux measure-ments in each observing filter are fitted separately to an analytical parameterised function thatis sufficiently flexible to accommodate vitrually all types of SNe; and (ii) the fitted functionparameters and their associated uncertainties, along with the number of flux measurements,the maximum-likelihood value of the fit and Bayesian evidence for the model, are used asthe input feature vector to a classification neural network (NN) that outputs the probabilitythat the SN under consideration is of Type-Ia. The method is trained and tested using datareleased following the SuperNova Photometric Classification Challenge (SNPCC), consistingof lightcurves for 21,319 SNe in total. We consider several random divisions of the data intotraining and testing sets: for instance, for our sample D1 (D4), a total of 10 (40) per cent ofthe data are involved in training the algorithm and the remainder used for blind testing of theresulting classifier; we make no selection cuts. Assigning a canonical threshold probability ofpth = 0.5 on the network output to class a SN as Type-Ia, for the sample D1 (D4) we ob-tain a completeness of 0.78 (0.82), purity of 0.77 (0.82), and SNPCC figure-of-merit of 0.41(0.50). Including the SN host-galaxy redshift and its uncertainty as additional inputs to theclassification network results in a modest 5–10 per cent increase in these values. We find thatthe quality of the classification does not vary significantly with SN redshift. Moreover, ourprobabilistic classification method allows one to calculate the expected completeness, purityand figure-of-merit (or other measures of classification quality) as a function of the thresh-old probability pth, without knowing the true classes of the SNe in the testing sample, as isthe case in the classification of real SNe data. The method may thus be improved further byoptimising pth and can easily be extended to divide non-Ia SNe into their different classes.
Key words: methods: data analysis – methods: statistical – supernovae: general
1 INTRODUCTION
Much interest in supernovae (SNe) over the last decade has beenfocused on Type-Ia (SNIa) for their use as ‘standardizable’ can-dles in constraining cosmological models. Indeed, observations ofSNIa led to the discovery of the accelerated expansion of the uni-verse (Riess et al. 1998; Perlmutter et al. 1999), which is usuallyinterpreted as evidence for the existence of an exotic dark en-ergy component. Ongoing observations of large samples of SNIaare being used to improve the measurement of luminosity dis-tance as a function of redshift, and thereby constrain cosmologicalparameters further (e.g., Kessler et al. 2009; Benitez-Herrera et al.2012; Sullivan et al. 2011; Conley et al. 2011; March et al. 2011)and improve our knowledge of dark energy (e.g., Mantz et al.
⋆ E-mail: [email protected]
2010; Blake et al. 2011). Moreover, the gravitational lensing ofSNIa by foreground cosmic structure along their lines-of-sight hasbeen used to constrain cosmological parameters (Metcalf 1999,Dodelson & Vallinotto 2006, Zentner & Bhattacharya 2009) andthe properties of the lensing matter (Rauch 1991, Metcalf & Silk1999, Jonsson et al. 2007, Kronborg et al. 2010, Jonsson et al.2010a, Jonsson et al. 2010b, Karpenka et al. 2012). In addition totheir central role in cosmology, the astrophysics of SNIa is also ofinterest in its own right, and much progress has been made in under-standing these objects in recent years (e.g. Hillebrandt & Niemeyer2000).
Other types of SNe are also of cosmological interest. Type IIPlateau Supernovae (SNII-P), for example, can also be used as dis-tance indicators, although only for smaller distances and to loweraccuracy than SNIa. Compared to SNIa, however, for which thereis still uncertainty regarding the progenitor system, SNII-P explo-
Draft version March 4, 2016Preprint typeset using LATEX style emulateapj v. 5/2/11
PHOTOMETRIC SUPERNOVA CLASSIFICATION WITH MACHINE LEARNING
Michelle Lochner1, Jason D. McEwen2, Hiranya V. Peiris1, Ofer Lahav1, Max K. Winter1
1Department of Physics and Astronomy, University College London, Gower Street, London WC1E 6BT, UK and2Mullard Space Science Laboratory, University College London, Surrey RH5 6NT, UK
Draft version March 4, 2016
ABSTRACT
Automated photometric supernova classification has become an active area of research in recentyears in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) andthe Large Synoptic Telescope (LSST), given that spectroscopic confirmation of type for all super-novae discovered with these surveys will be impossible. Here, we develop a multi-faceted classificationpipeline, combining existing and new approaches. Our pipeline consists of two stages: extractingdescriptive features from the light curves and classification using a machine learning algorithm. Ourfeature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more inde-pendent techniques fitting parametric models to curves, to a completely model-independent waveletapproach. We cover a range of representative machine learning algorithms, including naive Bayes,k -nearest neighbors, support vector machines, artificial neural networks and boosted decision trees.We test the pipeline on simulated multi-band DES light curves from the Supernova PhotometricClassification Challenge. Using the commonly-used area under the Receiver Operating Characteristiccurve (AUC) as a metric, we find that the SALT2 fits and the wavelet approach, with the boosteddecision trees algorithm, each achieves an AUC of 0.98, where 1 represents perfect classification. Wefind that a representative training set is essential for good classification, whatever the feature set oralgorithm, suggesting that spectroscopic follow-up is best focused on fewer objects at a broad range ofredshifts than on many bright objects. Importantly, we find that by using either one of the two bestfeature extraction methods (SALT2 model fits and wavelet decomposition) and a boosted decisiontree algorithm, accurate classification is possible purely from light curve data, without the need forany redshift information.Subject headings: methods:data analysis — cosmology:observations — supernovae:general
1. INTRODUCTION
Astronomy is entering an era of deep, wide-field sur-veys and massive datasets, which requires the adoptionof new, automated techniques for data reduction andanalysis. In the past, supernova datasets were smallenough to allow spectroscopic follow-up for the major-ity of objects, confirming the type of each. Only typeIa’s are currently used for cosmology, and the type isof course required for astrophysical modeling and stud-ies. With the onset of surveys such as the Dark En-ergy Survey (DES) (Dark Energy Survey Collaboration2005; Dark Energy Survey Collaboration et al. 2016) andthe upcoming Large Synoptic Survey Telescope (LSST)(LSST Science Collaboration 2009), only a small fractionof the dataset can be spectroscopically followed up. Thecurrent commonly used dataset, the Joint Light curveAnalysis (JLA) (Betoule et al. 2014), contains only 740supernovae, while DES is expected to detect thousands(Bernstein et al. 2012) and LSST hundreds of thousands(LSST Science Collaboration 2009) of supernovae. Thusalternative approaches to supernova science must be de-veloped to leverage these largely photometric datasets.Supernova cosmology is possible without strictly know-
ing the supernova type using, for example, Bayesianmethods (M. Kunz, B. Bassett and R. Hlozek 2007; R.Hlozek et al. 2012; J. Newling et al. 2012; Knights et al.2013; Rubin et al. 2015). However, these techniques ben-efit from having a reasonable probability for the type of
Contact email: [email protected]
each object in the dataset, so some form of probabilis-tic classification is useful. Additionally, studies of core-collapse supernovae and other transients rely on goodphotometric classification. Further, the observing strat-egy for LSST has not yet been finalized and the e↵ectof observing strategy on supernova classification has notyet been established. Here we outline a multi-facetedpipeline for photometric supernova classification. In fu-ture work, we will apply it to LSST simulations to under-stand the e↵ect of observing strategy on classification.Current photometric supernova classification tech-
niques focus on empirically-based template fitting (Sakoet al. 2008; Sako et al. 2014). However, in the past fewyears there have been several innovative techniques pro-posed to address this problem (see Kessler et al. (2010b)and references therein).Here, we apply machine learning to this problem, as a
well-established method of automated classification usedin many disciplines. As astronomical datasets becomelarger and more di�cult to process, machine learninghas become increasingly popular (Ball & Brunner 2010;Bloom & Richards 2012). Machine learning techniqueshave been proposed as a solution to an earlier step inthe supernova pipeline, that of classifying transients fromimages (du Buisson et al. 2015; Wright et al. 2015). Ma-chine learning is also already being employed at somelevel for photometric supernova classification in the SloanDigital Sky Survey (SDSS) (Frieman et al. 2008; Sakoet al. 2008), using the parameters from template-fittingas features (Sako et al. 2014).
arX
iv:1
603.
0088
2v1
[ast
ro-p
h.IM
] 2
Mar
201
6
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
SNLS
• based on the Canada France Hawaii Telescope
• MegaCam : 36 CCD mosaic
• 4 broadband filters g,r,i,z
• 4 fields of 1 square degree
• rolling search mode
• spectroscopic follow up (Keck, Gemini and VLT)
• observations: 2003-2008 - SNLS3 analysed and published - SNLS5 currently being processed
(complete SNLS data set)
the SuperNova Legacy Survey
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
SNLS3
contains large spectroscopically and photometrically identified SNe samples
A&A 534, A43 (2011)
0
5
10
15
20
25
30
35
19 20 21 22 23 24 25 26Fitted peak iM magnitude
Ent
ries
Unidentified sampleIdentified sample
EntriesMeanRMS
310 23.47 0.67
EntriesMeanRMS
175 22.60 0.73
0
5
10
15
20
25
30
35
40
45
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6Host photometric redshift
Ent
ries
Unidentified sampleIdentified sample
EntriesMeanRMS
310 0.86 0.22
EntriesMeanRMS
175 0.60 0.21
Fig. 19. Distributions of the SALT2 fitted iM peak magnitude (top) andof the host galaxy photometric redshift (bottom) for the identified (ingreen) and unidentified (in black) subsamples of photometrically se-lected SNe Ia.
Table 5. Contamination from core-collapse supernovae.
Redshift Events CCzgal < 0.4 49 4.4± 1.10.4 < zgal < 0.8 196 11.1± 2.50.8 < zgal 240 2.1± 0.5
Notes. The table shows the number of events passing the photometricSN Ia selection and the estimated CC contamination in three bins ofhost galaxy photometric redshift.
confirmation8, called hereafter the “unidentified sample”. Thedistribution of the peak magnitude in iM for the two subsam-ples is given in Fig. 19. The unidentified sample is about onemagnitude deeper on average than the identified sample, as ex-pected from the constraints to obtain a spectrum for the identi-fied SNe Ia. This translates into an average redshift of 0.60 forthe identified set and of 0.86 for the additional set.
5.1. Comparison of bright events
At bright magnitudes, iM < 23, there are 110 identified eventsand 56 unidentified ones, all but 5 events found in the real timeanalysis. We traced back the reasons why these bright eventsmissed a spectroscopic identification. A third of them had beendetected in the real-time analysis, declared (from photometry) asprobable SNe Ia and sent for spectroscopy. Insufficient quality
8 Note that event SNLS 06D2bo has been included in the unidentifiedsample.
Table 6. Comparison of events with and without spectroscopic identifi-cation.
Parameter Identified Unidentified KSsample sample proba
C 0.00 ± 0.01 0.02 ± 0.02 0.40X1 0.23 ± 0.08 −0.01 ± 0.13 0.30
Notes. The mean values of SALT2 colour and X1 are given for photo-metrically selected events with iM < 23, split into subsamples with andwithout spectroscopic identification. Numbers in the last column are theKolmogorov-Smirnov probabilities that the two distributions arise fromthe same parent distribution.
of the spectrum, however, resulted in a spectroscopic redshift(usually from the host galaxy) but no typing of the supernova.Another third were also declared as probable SNe Ia but couldnot be scheduled for spectroscopic follow-up near maximumlight. The last third were usually less convincing candidates ac-cording to their real-time, and thus partial, light curves.
The characteristics of the two subsamples of bright eventswere compared, based on their complete light curves as recon-structed in this analysis. Their colour and X1 distributions werefound to be compatible, as summarised in Table 6. We also com-pared the colour-magnitude and X1-magnitude relations in thetwo subsamples. For this purpose, the distance modulus of eachevent was defined from the SALT2 fitted B-band peak magnitudem∗B, X1 and colour C as:
µB = m∗B − M + αX1 − βC (2)
using values of M, α and β introduced in Sect. 3.3.1. Residualswere then computed from these distance moduli by subtracting5 log[P(z)] where P(z) is a third-degree polynomial that we fittedon the total photometric sample in order to describe the meanredshift dependence of the distance modulus with no assump-tion on a specific cosmology model. The Hubble diagram resid-uals without the colour or X1 term in the distance modulus arerepresented in Fig. 20. The figure shows the fit of the colour-magnitude and X1-magnitude relations from the full sample ofbright events, as fits from the two subsamples were found to beindistinguishable.
In conclusion, when restricted to the same range of iM peakmagnitudes, the unidentified and identified samples of events donot exhibit significant differences.
5.2. Comparison of full samples
We now consider the full set of photometrically selected SNe Iaas an extension of the identified subsample towards fainterevents, and estimate the impact on distance moduli of usingthe limited sample of spectroscopically identified SNe Ia. TheMalmquist bias due to spectroscopic sample selections is animportant issue in cosmology fits and was thoroughly studiedin SNLS with Monte Carlo simulations. The results, reportedin Perrett et al. (2010) were used to correct SN Ia distance mod-uli in SNLS 3-year cosmological analyses (Guy et al. 2010;Conley et al. 2011). The aim here is to check whether we canmeasure this bias directly from data and with an analysis com-pletely independent from that used to define the 3-year SNLScosmological sample.
To do so, ideally one would compare average distance mod-uli at a given redshift, measured from the identified subsampleand from the whole photometric sample. Our statistical samplebeing limited, we compare Hubble diagram residuals in large
A43, page 16 of 21
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
SNLS pipelinesstandard SNLS approach e.g. Astier et al. 2005, Guy et al. 2010, used in JLA Betoule et al. 2015
deferred photometric pipeline
developed in the SNLS Saclay group (France) • no spectroscopy required • differed detection of all kinds of transient events • larger number of detections • larger redshift coverage • sensitive to other SN types
• real time detection, uses spectroscopy for typing and redshift
G. Bazin et al. A&A 534, A43 (2011) Photometric selection of Type Ia supernovae in the Supernova Legacy Survey. G.Bazin et al. A&A 499, Issue 3,(2009) The core-collapse rate from the Supernova Legacy Survey.
• 0.15 < z <1.1
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
SNe Ia
photometrySN-like
photometry core-collapse SNe
transient events
photometry
pre-processed CFHT images
SNLS photometric pipeline
Preselection cuts (SN-like): -One significant flux variation
SN-like variation Star rejection
-Quality cuts on light curves
SNLS3 original pipeline 300,000 -> 1,500
Chapter 4. SNLS deferred photometric analysis 47
(a)
(b)
(c)
Figure 5.1 Example of using subtraction for detecting SNe Ia: (5.1a) Explosion ofan SNIa (04D1dc) in a galaxy. The image of the galaxy before explosion (5.1b) issubtracted from the current image 5.1a to obtain an image of our transient object(5.1c). The area shown here represents a very small fraction of the current image.
were chosen from the first and second season depending on the field. This set of images
were co-added to obtain a better signal-to-noise ratio but also to ensure complete field
coverage. They were also cleaned for resampling defects. An example of a reference
image can be seen in Figure 5.2.
Subtractions were done using the TRITON package [71] based on the algorithm by
Alard and Lupton [72]. To compare a current image to a reference image, it is necessary
to adapt the PSF of the reference image to the one of the current image. For each
image a determination of the sky background and a convolution kernel was performed.
The convolution kernel translates di↵erent PSFs: the shape adapts the seeing between
images, while the norm of kernel allows to adapt the flux scale between the two images.
The importance of adapting PSFs for subtractions is illustrated in Figure 5.3.
Both the sky background and the convolution kernel were computed independently on
eight non-overlapping tiles for each CCD. The number of tiles was determined taking
into account two criteria: the optimization of subtraction for spatial variations of the
background and kernel which requires small tiles and the necessity of a su�ciently large
number of bright events on a tile to compute the convolution kernel. These criteria
constrain the size, and therefore the number, of the tiles.
Bright objects were selected in both reference and current image tiles by applying a
detection threshold at 2� w.r.t. the sky background using the TERAPIX tool SExtractor
[73]. This utility extracts coordinates for detected objects as well other information as
flux and background estimations. The convolution kernel and sky background were then
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
SNe IaSN-like
core-collapse SNe
SN classification
How? 1. extract information:
A. light-curve features B. redshift
2. classification strategy
I. sequential cuts with host-galaxy photometric z & SALT2
III. supervised learning with host-galaxy photometric z & SALT2
II. supervised learning with SN z & general fitter
G. Bazin et al. A&A 534, A43 (2011)
Möller et al. in preparation (2016)
good z
bad z
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
How:1. extract information:
A. light-curve features SALT2: C, x1, magnitudes,chi2. B. redshift host-galaxy photometric:
Ilbert et al. 2006: 520,000 galaxies with an AB magnitude brighter than 25 in iM . ~4% resolution , catastrophic assignment ( ∆z/(1 + z) > 0.15 ): 3.7%
method IG. Bazin et al. A&A 534, A43 (2011)
J. Guy et al, SNLS Collaboration: SALT2 9
phase-10 0 10 20 30 40
U*
-19-18
-17-16
phase-10 0 10 20 30 40
U
-20-19-18-17-16
phase-10 0 10 20 30 40
B
-19
-18
-17-16
phase-10 0 10 20 30 40
V
-19
-18
-17
-16
phase-10 0 10 20 30 40
R
-19
-18
-17
-16
phase-10 0 10 20 30 40
I
-19
-18
-17
-16
Fig. 2 The U∗UBVRI template light curves obtained after thetraining phase for values of x1 of -2, 0, 2 (corresponding tostretches of 0.8, 1.0, and 1.2; dark to light curves) and null BVcolor excess. U∗ is a synthetic top hat filter in the range 2500–3500 Å. The shaded areas correspond to the one standard devia-tion estimate as described in section 6.1.
)Å (λ4000 6000 8000
B
- A
λA
-0.2
-0.1
0
0.1
0.2 B V
Fig. 3 The color law c × CL(λ) as a function of wavelength fora value of c of 0.1 (solid line). The dashed curve represents theextinction with respect to B band, (Aλ − AB), from Cardelli et al.(1989) with RV = 3.1 and E(B − V) = 0.1, and the dotted lineis the color law obtained with SALT (very close to the resultobtained here).
Guy et al. 20082. selection cuts: - sampling - poor fit
3. classification strategy: sequential cuts
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
Chapter 4. SNLS deferred photometric analysis 61
(a)
(b)
Figure 5.8 SALT2 parameters plot for color and stretch. X1 vs. C is shown in 5.8aand illustrate the way the X1-C cut was chosen (shown as a red ellipse). As an exampleof the e↵ect of the color cut, the di↵erence between redshift assignement as a functionof C is seen in 5.8b with red lines showing the extreme value of color cuts. In both plotssynthetic SNe Ia (blue dots) and data (other symbols) are shown after sampling and�2 constraints. Spectroscopically identified SNeIa are represented with green circlesand spectroscopically identified core collapse events with red triangles. For Figure 5.8bpink crosses show events rejected by �2 cuts in unfitted bands [66].
spectroscopic redshifts.
Color-magnitude diagrams
The last criteria for rejecting core collapse events were based on color-magnitude dia-
grams. These can be seen in Figure 5.9. Events entered these diagrams only when the
filters of interest were used in the fit.
• Figure 5.9a contributed only in the range 0.16 < z < 0.68.
• Figure 5.9b contributed only in the range 0.26 < z < 1.15.
Chapter 4. SNLS deferred photometric analysis 62
(a)
(b)
(c)
Figure 5.9 Color-magnitude diagrams using SALT2 fitted magnitudes for events thatpassed all cuts up to color and stretch cuts. Spectroscopically identified SNeIa arerepresented with green circles, core collapse events with red triangles and syntheticSNe Ia as blue dots. Cuts are shown using red lines. Magnitude ranges (1�) areindicated for 0.1 redshift bins centered in the given value (vertical location of segmentsis arbitrary). [66].
• Figure 5.9c contributed only in the range z > 0.26.
Since core collapse events have more g than SNe Ia, the (g � i) vs. g diagram proved
helpful in eliminating these.
It must be noted that all cuts were necessary to disentangle type Ia from core collapse
SNe. Results for SNLS3 data are presented in Figure 5.10. Using the MC, the core-
collapse contamination in the photometric sample was found to be of 4%. The average
SNIa e�ciency in the selection was 67% using well sampled light curves of bright SNe
Ia events as given by Bazin et al. [66]. To emphasize the e↵ect of redshift assignment
over the average e�ciency, if all events were assigned a host galaxy redshift the global
e�ciency for bright events would be 80%.
spectroscopic SNe Ia
simulated SNe Iaspectroscopic CC
SNLS3 dataG. Bazin et al. A&A 534, A43 (2011)
photometric host galaxy z + SALT2 + sequential cuts
method I
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
Chapter 7. SNe classification 124
8.4.1 zgal + SALT2 + sequential cuts: SNLS3
For consistency, I reprocessed statistics for the SNLS3 analysis. All results are in agree-
ment with published results [66] with the exception of the core-collapse contamination.
This triggered a reevaluation of results and we found a divergence due to an error in the
normalization procedure. Once corrected, the core-collapse contamination was reeval-
uated to be 5 ± 1% for the original analysis. Results for my computation are shown
in Table 8.4. The residual di↵erence may be attributed to host-extinction included in
the original analysis (to obtain the observed rate), while my computation takes the rate
corrected for extinction.
Simulation
purity contamination e�ciency
SNe Ia bad redshift SNe Ia core-collapse SNIa
94.4± 0.5% 0.65± 0.08% 4.9± 0.5% 29.9± 0.3%
SNLS3 data
# events # spectroscopic SNe Ia # spectroscopic CC # photometric CC
486 175 0 0
Table 8.4 SNLS3 analysis (zgal + SALT2 + sequential cuts): statistics for SNLS3data and purities, e�ciencies from simulation. The indicated SNIa e�ciency is theglobal e�ciency and includes weights and e�ciency of host-galaxy assignment. Thecore-collapse photometric sample was determined by [35].
The purity and contamination distributions as a function of host-galaxy redshift can
be seen in Figure 8.10. The contamination of SNe Ia with bad redshift assignment
increases slightly with redshift but remains small (below 0.5%) whatever the redshift.
The core-collapse contamination is dominated by non-plateau events and decreases with
redshift. The contamination statistics is very low and therefore we can’t draw any
conclusion about the trends, those results are consistent with a flat distribution (within
error bars).
the core-collapse photometric sample was determined by Bazin et al. 2009 in a specific analysis.
photometric host galaxy z + SALT2 + sequential cuts
method I
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
supervised learning
• decision trees make successive rectangular cuts in the parameter space
• supervised classification methods
• simulations to train
• improving model:
• boosting: making the model more complex AdaBoost, XGBoost
• averaging methods: Random Forest, Bagging
• scikit-learn implementation
Chapter 7. SNe classification 111
events are reweighted to put more emphasis on poorly classified events. Then, with the
reweighted data, the training is redone. This is an iterative procedure where the final
classification is averaged over all decision trees. There are many boosting techniques,
but I centered on AdaBoost which is a largely used boosting method.
Figure 8.2 Graphic representation of a simple decision tree. The root node containsan event with all its information and is our starting point. A sequence of binary splitsusing variables xi, xj , xk is applied to data. At each split, the variable that gives thebest separation is used to discriminate between signal and background. The end nodesare labelled ”S” and ”B” for signal and background classifcation [92].
In order to train BDTs, one has to choose the algorithm’s hyperparameters. These
include the maximum depth of a tree, the number of trees used for boosting and the
minimum size of a node. Choices can be done using overtraining tests and applying
classification to independent known samples and studying their statistical evolution.
Moreover, data used for training (simulations) and application (data) must be prepro-
cessed to eliminate data overflows that may yield missclassifications.
Supervised learning methods can be implemented using di↵erent toolkits such as Python’s
scikitlearn and ROOT’s TMVA. Although I performed tests with both, my classifica-
tion is based on the TMVA toolkit. Possible extensions of this work include exploring
other packages as scikitlearn which can mix several supervised learning methods and
can optimize hyperparameters in an automatic way.
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
redshift + features BDT responseN-dimensional problem 1-dimensional problem
supervised learning
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
How:1. extract information:
A. light-curve features SALT2: C, x1, magnitudes and chi2 of the fit. B. redshift host-galaxy photometric
2. selection cuts: - sampling - poor fit
3. classification strategy: XGB DT
method II
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
purity 98.0 +- 0.3efficiency 38.2 +- 0.7
host-galaxy photometric z, SALT2 fit, XGB
BUT host-galaxy redshift assignment (zgal) efficiency is 83%Gupta et al. 2016
Bazin et al. 2011
method II
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
How:1. extract information:
A. redshift SN-redshift B. light-curve features general fitter
2. selection cuts (adapted to our analysis): - sampling - poor fit
3. classification strategy: supervised learning
Möller et al. (2016) in preparation
method III
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
SN redshift
• estimated directly from SN light curves.
light curves are fitted using SALT2 searching minimisation of the reduced χ2
1. for successive values of the redshift separated by δz = 0.1 with color and stretch set with gaussian priors.
2. setting the color and stretch parameters free and performing a new redshift scan around first step’s redshift with the same step δz as above.
• set up on SNLS3 data
• ~2% resolution , catastrophic assignment ( ∆z/(1 + z) > 0.15 ): 1.4%
Palanque-Delabrouille et al. 2010
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
SN photometric redshift algorithm uses SALT2. Advisable to use a light curve fitter which is:
• independent from SALT2
• able to fit other types of SNe. e.g. the empirical model used for SN-like selection.
fk = Ak e�(t�tk0 )/⌧kfall
1 + e�(t�tk0 )/⌧krise
+ ck
tkmax
= tk0 + ⌧krise
ln(⌧kfall
/⌧krise
� 1)
k= filter
general SN fitter
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
How:1. extract information:
A. light-curve features general fitter B. redshift SN-redshift
2. selection cuts (adapted to our analysis): - sampling - poor fit
3. classification strategy: supervised learning
method III
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
spectroscopic SNe Ia
visually non SN-likespectroscopic CC
all events
visually longextreme z
preliminarydata
method III
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
How:1. extract information:
A. light-curve features general fitter B. redshift SN-redshift
2. selection cuts (adapted to our analysis): - sampling - poor fit
3. classification strategy: supervised learning
- setting parameters - selecting features - crossvalidation
using scikit-learn packages
-Random Forest -AdaBoost -XGBoost (extreme gradient boosting)
. http://xgboost.readthedocs.org/
method III
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
preliminarysimulations
method III
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
we want to compare classified samples -> 95% purity
preliminarysimulations
method III
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
preliminary
simulations
data
95% purityAdaBoost Random Forest XGBoost
total e�ciency Ia 36.9± 0.6 32.4± 0.7 41.1± 0.7purity Ia 95.6± 0.5 95.6± 0.4 95.3± 0.4
contamination Ia bad z 0.53± 0.09 0.29± 0.07 0.60± 0.09contamination CC 3.9± 0.4 4.1± 0.4 4.0± 0.4
Table 1: Estimated total e�ciency, purity and contamination from simulated SNe for dif-ferent boosting methods with a given purity of 95%.
and photometric samples and 3 in the photometrically identified sample. One last event was378
classified photometrically, therefore no spectroscopic redshift was available.379
Several core-collapse events were found to be shared in the di↵erent classified samples.380
Since the spectroscopically and core-collapse samples are not complete and have di↵erent381
selection criteria, we can not conclude on the statistical significance of this variation.382
The purity applies on the total size of the classified as type Ia sample. To compare383
the three methods we use the ratios between classified as type Ia samples and the estimated384
e�ciency. The ratio XGB with RF shows agreement between the two samples. However,385
any ratio with AdaBoost shows a disagreement where the estimated e�ciency ratio is much386
smaller than the ratio found in the data. This discrepancy remains unexplained.387
Figure 6: Venn diagram for the photometrically classified SNLS3 data samples by the threedi↵erent algorithms (RF in green, XGB in pink, Ada in blue) when requiring a purity of 95%in the sample. 443 are the common events found in the three samples.
5.1.1 Comparison with the SNLS3 sub-luminous and peculiar SN Ia samples388
SNLS detected 8 SNe classified as peculiar type Ia after spectroscopy, this sample includes389
super-Chandrasekhar type Ia and 1991T-like [20–23]. A photometrically classified sample of390
18 sub-luminous SNe Ia at z < 0.6 was obtained in [24]. The SN-like sample (the starting391
point of our classification) contains 11 sub-luminous events and 5 peculiar events.392
– 12 –
AdaBoost Random Forest XGBoostphotometric sample 478 549 670spectroscopic Ia 166 198 223photometric Ia 318 364 444
spectroscopic CC 2 2 3photometric CC 1 1 6
Table 2: Photometrically classified samples from SNLS3 data for di↵erent boosting methodswith a given purity of 95%. Subsamples of events identified by spectroscopy or previousphotometric pipelines are shown [15], [6], [7].
For all classification methods, the same 3 peculiar events are contained in our photomet-393
rically classified sample. None of them exhibit any sign of peculiarity in their light curves.394
The super-Chandrasekhar type Ia and the 1991T-like object are rejected by our classification395
methods.396
Sub-luminous supernovae are found in our classified samples. In the case of Random397
Forest and AdaBoost classifications, 4 events are in the classified sample while 8 are included398
in the XGB sample. Despite our methods were not trained for disentangling normal type399
Ia and sub-luminous SNe, our photometric classification appears to have some e�ciency on400
sub-luminous SNe Ia as well.401
5.2 E�ciency evolution with redshift402
The total e�ciency as a function of generated redshift is shown in Figure 8 for all classification403
methods. The higher e�ciency at low redshift can be attributed to an easier classification404
for brighter SNe Ia. The SN Ia classification e�ciency varies from one algorithm to the405
other, XGB being the best performing method over the whole redshift range. Interestingly406
AdaBoost and XGB di↵erences are quite homogeneous which can be attributed to the simi-407
larities of their optimization methods.408
The high number of spectroscopic type Ia in the XGB classification of SNLS3 data may409
be linked to the high e�ciencies found at low-redshift (z < .7). The other two methods have410
high e�ciencies at low-redshift but their spectroscopic samples are either too low or two high411
when compared between each other.412
5.3 Evolution of purity with redshift413
We study the evolution of the purity and contamination with respect to generated redshift414
and the SN-redshift for each algorithm in Figure 9.415
The contamination by type CC SNe is higher at lower generated redshift but remains416
small (below 15%) whatever the method and redshift. Comparing the contamination as a417
function of generated redshift and measured photometric redshift, there is a migration of418
low-z events towards higher redshifts. This is attributed to incorrect redshift assignments for419
some core-collapse events that can be seen in Figure 2.420
The contamination by type Ia with incorrect redshift assignment increases at higher421
generated redshift. Those events are assigned a variety of SN-redshifts. The overall contam-422
ination stays well below 1%.423
– 13 –
method III
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
AdaBoost Random Forest XGBoostphotometric sample 478 549 677spectroscopic Ia 166 198 223photometric Ia 318 364 444
spectroscopic CC 1 0 2photometric CC 1 1 6
Table 2: Photometrically classified sample for di↵erent boosting methods with a given purityof 95%. Subsamples of events identified by spectroscopy or previous photometric pipelinesare shown [14], [6], [16].
optimization methods.367
The high number of spectroscopic type Ia in the XGB and RF classification of SNLS3368
data may be linked to the high e�ciencies found at low-redshift (z < .7). Although AdaBoost369
has a similar simulated eficiency at low-redshifts, the number of spectroscopic type Ia SNe370
in the SNLS3 classified sample is relatively low when compared to the other methods.371
Figure 7: Selection e�ciency from synthetic SN Ia light curves as a function of the generatedredshift, for di↵erent classification methods with a given purity of 95%.
5.2 Evolution of purity with redshift372
We study the evolution of the purity and contamination with respect to generated redshift373
and the SN-redshift for each algorithm in Figure 9.374
The contamination by type CC SNe is higher at lower generated redshift but remains375
small (below 15%) whatever the method and redshift. Comparing the contamination as a376
– 13 –
simulations
method III
preliminarysimulations
method IIISN-z, general fit, XGB 98% purity
datacut SNLS3 events spectroscopic photometric simulated
in sample Ia CC Ia CC Ia%SN-like 1483 246 42 486 109 50selected 1193 238 30 481 77 47classified 529 205 1 374 1 35
Table 3: E↵ect of selection cuts and classification (XGB) over SNLS3 data and synthetictype Ia. Classification is done using XGB setting 98% purity requirement.
6 Choosing a method: XGB with high purity 98%424
The best performing algorithm was found to be XGB with a high achievable purity and425
e�ciency. We chose to select our photometric sample with this algorithm fixing a BDT426
response threshold such that we obtain a 98.0 ± 0.3% purity and 34.7 ± 0.7% e�ciency427
sample.428
We now study in detail the classified sample by this method.429
6.1 E↵ect of selection cuts and classification430
The e↵ect of the cuts is given for the data and synthetic SNe Ia in Table 3. The selection431
cuts are shown to reduce the spectroscopic and photometric type Ia subsamples by 3.3%432
and 1% respectively. The core-collapse SNe are mainly discarded through classification. The433
core-collapse events in this sample have both incorrect SN-redshifts. The photometric sample434
contains 6 sub-luminous and 3 peculiar type Ia SNe from samples introduced in Section 5.1.1.435
6.2 Classification and SN-redshifts436
We investigate the e↵ect of SN-redshifts on the SN classification. Figure 10 shows the com-437
parison between spectroscopic and SN redshifts for the classified sample. Contamination by438
core-collapse is mostly due to events that have an incorrect SN-redshift.439
Interestingly, those core-collapse SNe that were assigned a good SN-redshift (see Figure440
2) were not classified as type Ia SNe by our method. A core-collapse event that has correct441
SN-redshift is an event that has colors and photometry consistent with type Ia (SN-redshifts442
are obtained using a modified SALT2 template based on type Ia SNe). We highlight this and443
attribute it to the features obtained using the general SN fitter.444
The distribution of classified events over SN-redshift is shown in Figure 11. This dis-445
tribution peaks at higher SN-redshifts when compared to the spectroscopically identified446
sample. There is a large overlap between events in both samples and no particular trend over447
SN-redshift is seen.448
The distribution of the SN-photometric redshift for the photometric sample classified449
in [6] and the one of this work are shown in Figure 11. The new classification provides more450
z > 0.7 events while maintaining the number of events at low redshift and therefore a large451
fraction of the spectroscopic sample.452
6.3 E�ciency evolution: classification, selection and total453
E�ciency estimations can be done taking into account all or some steps on the SN pipeline. In454
Figure 12 we evaluate the evolution of e�ciency and purity taking into account: classification455
only, classification and selection cuts and the complete pipeline. E�ciencies are computed456
taking into account SN rates.457
– 17 –
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
preliminary simulations
simulations data
method III
Anais Möller, ANUStatistical Challenges in Modern Astronomy, June 2016
how to classify? 3 DT algorithms
different features
purity and efficiency high purity 95-98% purity, high efficiency
from simulations to data - selection cuts necessary
1st application to high-z data - unexpected sim->data
- rely on training
from a classified sample to cosmology - SNLS3 V. Ruhlmann-Kleider
- and…
new questions SNIa diversity
method to be applied for SNLS5
take away
stats ML