

Discovery Mission 10 - Launched 2009 - http://Kepler.NASA.gov

BAYESIAN COMPUTATION OF KEPLER DR25 VETTING COMPLETENESS AND RELIABILITY 

Impact of Reliability on Occurrence Rates: GK Dwarf Stars

Components of an Occurrence Rate Calculation

Abstract

Exoplanet occurrence rates based on Kepler’s final exoplanet catalog (DR25) are of great interest to the exoplanet community. Several occurrence rates have been published in the literature with widely varying results, particularly for long-period, small planets at the Kepler detection limit. Previous occurrence rate computations have typically accounted for detection completeness (the fraction of true planets that are detected), but to date few have taken full advantage of the DR25 products to compute vetting completeness (the fraction of detected true planets that are vetted as planet candidates) or reliability (the fraction of planet candidates that are true planets). The DR25 release includes several data products that support the computation of vetting completeness and reliability, including synthetic transit injection to provide a population of “true” planets, and scrambled and inverted data to provide a known population that mimics false positives. Using these products to compute vetting completeness and reliability is challenging, presenting issues of small statistics and sensitivity to data gridding.

We present a new Bayesian approach to computing both vetting completeness and reliability. This approach is based on casting both problems as binomial point processes with selection probabilities that depend on several parameters, fitted via MCMC to appropriate DR25 synthetic data. We find that this approach can detect small signals, and is robust against small statistics and data gridding. We briefly discuss how our results are used in an occurrence rate calculation.

Steve Bryson (1), Jeff Coughlin (1,2), Natalie Batalha (1,6), Chris Burke (2,3), Susan Mullally (2,4), Douglas Caldwell (1,2), Jessie Christiansen (2,5), and the KEPLER TEAM
(1) NASA Ames Research Center, (2) SETI Institute, (3) MIT, (4) STScI, (5) Caltech, (6) UC Santa Cruz

Using a Binomial Model to Infer Fractions

Both completeness and reliability can be reduced to a problem of determining fractions. In the case of completeness, it is the fraction of actual PCs that are vetted as PC. In the case of reliability, two fractions are involved: the fraction of observed TCEs vetted as false positives, and the fraction of true false positives that are correctly vetted (see below). These fractions can be modeled as binomial point processes: if the fraction of "successes" is a function r(p, m) of period p and multiple event statistic m (MES, a Kepler-specific measure of SNR), then the probability of obtaining c successes in n measurements is given by the binomial distribution

P(c | n, θ) = (n choose c) r(p, m; θ)^c (1 − r(p, m; θ))^(n−c)

where θ is a set of parameters that describe the function. We infer each fraction from data using the binomial distribution as the likelihood; which data we use depends on which fraction we are measuring. In all cases, we partition period-MES space with a grid and set n to the number of TCEs in each grid cell. Then c is the number of "successes", where success is defined by the kind of fraction we are measuring. The θ that best explains the data is inferred via conventional MCMC Bayesian inference with a binomial likelihood. This approach allows the use of fine grids even when some grid cells contain no TCEs or no "successes". We find that the results are robust and not strongly dependent on grid resolution.
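As a concrete illustration of this inference, here is a minimal pure-Python sketch: a random-walk Metropolis sampler with broad uniform priors, fitting a simple logistic success fraction r(m) in MES alone to binned (n, c) counts. The real analysis fits a richer r(p, m) over period and MES; the function names, parameterization, and synthetic-data setup here are illustrative, not taken from the poster.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def log_binom_like(theta, cells):
    """Binomial log-likelihood summed over grid cells.

    Each cell is (m, n, c): MES bin center, number of TCEs, number of
    "successes". theta = (a, b): slope and midpoint of a logistic success
    fraction r(m). The binomial coefficient is constant in theta, so it
    is dropped."""
    a, b = theta
    total = 0.0
    for m, n, c in cells:
        r = sigmoid(a * (m - b))
        r = min(max(r, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += c * math.log(r) + (n - c) * math.log(1.0 - r)
    return total

def metropolis(cells, n_steps=20000, seed=0):
    """Random-walk Metropolis sampler with broad uniform priors on theta."""
    rng = random.Random(seed)
    theta = (1.0, 10.0)
    ll = log_binom_like(theta, cells)
    chain = []
    for _ in range(n_steps):
        prop = (theta[0] + rng.gauss(0.0, 0.1), theta[1] + rng.gauss(0.0, 0.5))
        if 0.0 < prop[0] < 10.0 and 0.0 < prop[1] < 50.0:  # uniform prior support
            ll_prop = log_binom_like(prop, cells)
            if math.log(rng.random()) < ll_prop - ll:
                theta, ll = prop, ll_prop
        chain.append(theta)
    return chain
```

Feeding the sampler synthetic cells drawn from a known logistic recovers the input parameters; note that cells with c = 0 or c = n pose no problem for the likelihood, which is what allows fine grids.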

The Kepler science pipeline first identifies Threshold Crossing Events (TCEs), which are potential transiting exoplanet detections. The automated Robovetter then applies a uniform set of criteria to determine whether each TCE is a planet candidate (PC) or a false positive. Measurements of completeness, the fraction of actual planets that are detected, and reliability, the fraction of identified planet candidates that are actually true planets, are used to infer the true population of exoplanets underlying the detected planet candidates. Completeness has two parts: TCE detection efficiency, the fraction of true planets that are detected, and Robovetter vetting efficiency, the fraction of detected true planets vetted as PC.

For vetting completeness we use simulated transits injected into observed data, so every detection matching an injected transit is a true PC. We model the fraction of injected transits vetted as PC using a product of rotated logistic functions. We use expected MES because we are predicting completeness for all target stars.
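The "product of rotated logistic functions" can be sketched as below. The (slope, offset, rotation angle) parameterization is one plausible reading of that phrase, not the poster's exact functional form.

```python
import math

def rotated_logistic(p, m, k, x0, phi):
    """Logistic along the direction rotated by angle phi in (period, MES)
    space: x = p*cos(phi) + m*sin(phi)."""
    x = p * math.cos(phi) + m * math.sin(phi)
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))

def vetting_completeness(p, m, params):
    """Product of rotated logistics. params is a list of (k, x0, phi)
    triples; each factor can carve off one edge of the PC region in
    period-MES space."""
    out = 1.0
    for k, x0, phi in params:
        out *= rotated_logistic(p, m, k, x0, phi)
    return out
```

With a single factor at phi = π/2 this reduces to an ordinary logistic in MES, rising from near 0 at low MES to near 1 at high MES; additional factors at other angles let the fitted completeness roll off along tilted directions in the period-MES plane.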

Reliability is measured by manipulating the observed data so that real transits are not detected, but instrumental false positives have the same detection statistics as the observed data. This is done in two ways:

- Inversion: the observed light curves are inverted, so real transits appear as increases in flux. Inverted light curves should preserve the statistics of false alarms due to rolling bands.

- Scrambling: the observed light curves are scrambled at the quarterly observation level, so real transits no longer fall on a regular ephemeris over the 4-year data duration. This approach preserves the statistics of transit detections due to noise fluctuations and pixel sensitivity dropouts.

TCEs found in the inverted and scrambled data sets are, by definition, false alarms.

[Figure: example light curves from the observed, scrambled, and inverted data sets.]

The inverted and scrambled data sets primarily create instrumental false alarms, which are identified by the Robovetter with the "not transit like" (NTL) flag. We define reliability as the fraction of PCs that are true planets. However, we do not know which PCs are true planets. Fortunately, we can approximately express reliability in terms of quantities we can measure: the observed NTL fraction F_obs, and the effectiveness E, the fraction of TCEs in the inverted/scrambled data that were correctly vetted as NTL:

The vetting completeness is multiplied into each star’s detection completeness contour and summed (as required by a Poisson-based occurrence rate calculation).
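Schematically, treating each star's detection completeness contour and the vetting completeness as functions evaluated at the same grid point (a simplification for illustration), the summed completeness entering the Poisson rate is:

```python
def summed_completeness(p, m, star_detection_contours, vetting):
    """Total completeness at grid point (period p, MES m): the vetting
    completeness multiplied into each star's detection completeness,
    then summed over the target stars."""
    return sum(vetting(p, m) * det(p, m) for det in star_detection_contours)
```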

We separately measure F_obs and E in the same way as vetting completeness: in each grid cell, we set n = # of TCEs and c = # of TCEs vetted as NTL, using the observed data for F_obs and the inverted/scrambled data for E.

The NTL reliability of each PC is multiplied by the false positive probability of Morton et al. (2016), which accounts for astrophysical false positives, giving the total reliability.
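In code, this combination might look like the following sketch. It uses the poster's approximate relation R ≈ 1 − (F_obs/(1 − F_obs))·((1 − E)/E) for NTL reliability, and assumes the astrophysical term enters as a product with (1 − FPP), where FPP is the Morton et al. (2016) false positive probability; the function names are illustrative.

```python
def ntl_reliability(f_obs, e):
    """Approximate NTL reliability from the observed NTL fraction f_obs and
    the vetting effectiveness e (the fraction of true false alarms
    correctly vetted as NTL):
        R = 1 - (f_obs / (1 - f_obs)) * ((1 - e) / e)."""
    return 1.0 - (f_obs / (1.0 - f_obs)) * ((1.0 - e) / e)

def total_reliability(f_obs, e, fpp):
    """Total reliability, assuming the NTL reliability combines with the
    astrophysical false positive probability fpp as a product with
    (1 - fpp) -- an assumption for this sketch."""
    return ntl_reliability(f_obs, e) * (1.0 - fpp)
```

Note the limiting behavior: with perfect effectiveness (E = 1) every false alarm is caught, so the NTL reliability is 1; as E drops, uncaught false alarms contaminate the PC sample and reliability falls.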

In each grid cell, we set n = # of TCEs in that cell and c = # of vetted PCs. We then perform an MCMC inference using the binomial likelihood (with broad uniform priors) to determine the logistic function parameters θ.

We illustrate the use of vetting and reliability with a Poisson-based occurrence rate calculation following Burke et al. (2015), comparing results with and without consideration of reliability.

- Stellar population: GK dwarf stars using Gaia-derived stellar properties from Berger et al. (2018), supplemented with consistently fitted radius and log g from Berger et al. (in prep).

- Planet population: DR25 PCs identified by the Robovetter with radius between 0.5 and 2 Earth radii and period between 50 and 400 days. The planet radii have been recomputed based on the new stellar properties.

- Rate model: a product of coordinate-aligned power laws, following Burke et al. (2015).

Reliability is implemented by running many occurrence rate calculations, with PCs included with a probability given by their total reliability.
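This Monte Carlo treatment of reliability can be sketched as follows: each of many catalog draws keeps a PC with probability equal to its total reliability, and the occurrence-rate fit is then run on every draw (names illustrative).

```python
import random

def sample_catalogs(pcs, reliabilities, n_draws, seed=0):
    """Draw n_draws catalogs; each PC is kept in a draw with probability
    equal to its total reliability. Running the occurrence-rate fit on
    every draw propagates reliability into the final rate posterior."""
    rng = random.Random(seed)
    return [
        [pc for pc, r in zip(pcs, reliabilities) if rng.random() < r]
        for _ in range(n_draws)
    ]
```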

[Figures: distributions of occurrence rates from the θ posteriors, with and without reliability; the Burke (2015) and Burke (2015) "high reliability" values are marked for comparison.]

Burke 2015 baseline occurrence rate: without reliability 0.167 (+0.033/−0.028); with reliability 0.135 (+0.032/−0.026). Reliability has a < 1σ impact for this population.

ζ⊕ (planets within 20% of Earth's size and orbital period): without reliability 0.027 (+0.016/−0.010); with reliability 0.013 (+0.010/−0.006). Reliability has a > 1σ impact for this population.

R ≈ 1 − (F_obs / (1 − F_obs)) · ((1 − E) / E)