Benchmark Tests for Markov Chain Monte Carlo Fitting of Exoplanet Eclipse Observations
Justin C. Rogers, Mercedes Lopez-Morales, Daniel Apai, Elisabeth Adams
aa r X i v : . [ a s t r o - ph . E P ] F e b Benchmark Tests for Markov Chain Monte Carlo Fitting ofExoplanet Eclipse Observations
Justin Rogers , , Mercedes L´opez-Morales , , D´aniel Apai , Elisabeth Adams e-mail: [email protected] Received October 24, 2018
ABSTRACT
Ground-based observations of exoplanet eclipses provide important clues tothe planets’ atmospheric physics, yet systematics in light curve analyses are notfully understood. It is unknown if measurements suggesting near-infrared fluxdensities brighter than models predict are real, or artifacts of the analysis pro-cesses. We created a large suite of model light curves, using both synthetic andreal noise, and tested the common process of light curve modeling and parame-ter optimization with a Markov Chain Monte Carlo (MCMC) algorithm. Withsynthetic white-noise models, we find that input eclipse signals are generally re-covered within 10% accuracy for eclipse depths greater than the noise amplitude,and to smaller depths for higher sampling rates and longer baselines. Red-noisemodels see greater discrepancies between input and measured eclipse signals, of-ten biased in one direction. Finally, we find that in real data, systematic biasesresult even with a complex model to account for trends, and significant falseeclipse signals may appear in a non-Gaussian distribution. To quantify the biasand validate an eclipse measurement, we compare both the planet-hosting starand several of its neighbors to a separately-chosen control sample of field stars.Re-examining the Rogers et al. (2009) Ks-band measurement of CoRoT-1b findsan eclipse 3190 +370 − ppm deep centered at φ me =0 . +0 . − . . Finally, we pro-vide and recommend the use of selected datasets we generated as a benchmark Johns Hopkins University, Department of Physics and Astronomy, 366 Bloomberg Center, 3400 N.Charles Street, Baltimore, MD 21218, USA Carnegie Institution of Washington, Department of Terrestrial Magnetism, 5241 Broad Branch Rd. NW,Washington D.C., 20015-1305, USA Harvard-Smithsonian Center for Astrophysics, 60 Garden St., Cambridge, MA 02138, USA Department of Astronomy, The University of Arizona, and Department of Planetary Sciences, TheUniversity of Arizona, Tucson, AZ, USA
Subject headings: binaries:eclipsing – planetary systems – stars:individual (CoRoT–1) – techniques: photometric – methods: analytical – methods: statistical
1. Introduction
Eclipses of transiting exoplanets, detected as a dip in flux while the planet passes be-hind the star, provide a wealth of information about the planets’ atmospheric properties.The depth of an eclipse in a given wavelength passband provides information about theplanet’s thermal or reflected emission, or both, relative to the flux of the host star. There-fore, an eclipse detection, even in a single passband, can constrain the day-side tempera-ture and atmospheric energy circulation of the planet at a given atmospheric height. Ameasurement at optical wavelengths can also place an upper limit on the planet’s albedo(see e.g. L´opez-Morales & Seager 2007). Multiple photometric detections at different wave-lengths, or equivalently a spectrum, can further identify the albedo and energy trans-fer, and provide information about the chemical composition of the planets’ atmospheres,their vertical temperature profiles, and the presence or absence of thermal inversion layers(e.g. Burrows et al. 2008; Fortney et al. 2008).To date, more than a dozen exoplanets have published eclipse detections. Most of thoseare from the
Spitzer Space Telescope , including eclipses in the four mid-infrared bands of theIRAC instrument at 3.6, 4.5, 5.6 and 8 µ m (e.g. Charbonneau et al. 2005; Machalek et al.2008; Knutson et al. 2009), at 16 µ m with IRS (e.g. Deming et al. 2006; Grillmair et al.2008), and at 24 µ m with MIPS (e.g. Deming et al. 2005; Charbonneau et al. 2008). Re-cently, optical and near-IR eclipse detections have also been made with the Hubble Space Tele-scope (Swain et al. 2009), the
CoRoT and
Kepler missions (Alonso et al. 2009; Snellen et al.2009; Borucki et al. 2009; Coughlin & L´opez-Morales 2012), and also with ground-based tele-scopes (Sing & L´opez-Morales 2009; de Mooij & Snellen 2009; Gillon et al. 2009; Rogers et al.2009; Anderson et al. 2010; Gibson et al. 2010; L´opez-Morales et al. 2010; Croll et al. 2010a,b;Smith et al. 2011; Croll et al. 2011; C´aceres et al. 2011; de Mooij et al. 2011; Zhao et al.2012b,a; Crossfield et al. 2012).The optical and near-IR detections are of particular interest, since they can be madefrom the ground, can provide information about the planetary albedos (optical observations),and because they are closest to the blackbody emission peaks of hot jupiter exoplanets. Thislast factor is important because it allows an estimation of the bulk of the energy emitted by 3 –those planets. One feature observed in the ground-based detections to date is that severaleclipses appear too deep (i.e. the planet is too bright) in the near-IR (around 2.0 µ m),which suggests that the planets are hotter than predicted by standard models. This hasbeen seen for CoRoT-1b in the eclipse observations by two independent teams at similarwavelengths, i.e. in Ks-band (2.2 µ m) (Rogers et al. 2009) and in NB2090 band (2.09 µ m)(Gillon et al. 2009), for WASP-19b in NB2090 band (Gibson et al. 2010), for WASP-12bin Ks-band (Croll et al. 2011), for HAT-P-1b in Ks-band (de Mooij et al. 2011), and mostrecently for WASP-3b in Ks-band (Zhao et al. 2012a). These results may be attributed toa real physical effect, suggesting alternate atmospheric condition scenarios – for example,unexpected chemical abundances or non-LTE conditions. However, because the signal-to-noise ratios are low, any bias or systematic effects in the analytical methods used to detectand model the eclipses could dramatically change the eclipse measurements and conclusions.The most common methods used to measure physical characteristics from transit andeclipse light curves involve constructing a parameterized model of the light curve and its sys-tematic effects, and then using a routine – typically a Markov Chain Monte Carlo (MCMC)algorithm – to derive the best-fitting set of parameters according to a simple criterion such asthe χ or Bayesian Information Criterion value (e.g. Harrington et al. 2007). Some alterna-tives exist; for example, SysRem (Tamuz et al. 2005), wavelet-based algorithms (Carter & Winn2009), non-parametric algorithms (Waldmann 2011), and Gaussian processes (Gibson et al.2012). These are discussed in Section 2.2.In this work, however, we focus on what remain the most common methods for measuringthe signals and potential systematics associated with exoplanet eclipse observations from theground, i.e. MCMC algorithms, and present benchmark data sets to test the effectivenessof these methods. In Section 2 we describe the characteristics of a typical ground-basedeclipse observation and the data modeling and signal extraction process. In Section 3 wedescribe our model implementation to test potential systematic effects associated with thesteps described in Section 2. Section 4 presents the results of the tests for synthetic data,both with and without red noise, and in Section 5 we show the results of those same testsusing real datasets. In Section 6 we discuss the measurement of real eclipses and our re-evaluation of the depth and uncertainty of the published Ks-band eclipse of CoRoT–1b. Ourconclusions and the implications for future exoplanet eclipse searches are summarized inSection 7. Appendix A provides a collection of benchmark data sets that can be shared fortesting and verifying eclipse modeling and analysis processes. 4 –
2. Secondary Eclipse Observations and Detection Methods
The methods described in this section focus on ground-based observations of secondaryeclipses, which are becoming the primary means of exploring the optical and near-IR portionsof exoplanetary emission spectra, but are currently made at only modest confidence levels(typically less than 10 σ ). However, this same description may also apply to primary transitobservations and to observations from space, if the significance levels of the signals are low.Detections of eclipses in the optical and near-IR are presently limited to “very hotJupiters” (VHJs), i.e. Jupiter-size exoplanets which reside very close to their host stars (lessthan 3-day orbits) and are expected to be tidally locked, with typical atmospheric day-sidetemperatures around 1500–2500 K. The large radii and hot day-sides make these VHJs themost favorable exoplanets for detection. However, even in these cases, the eclipse signals ofVHJs are typically very small ( < < “red” noise, caused mainly by systematic trends related to changing atmosphericand instrumental parameters (Pont et al. 2006). Although noise levels should be improvedby collecting a large number of data points during one or multiple eclipse epochs, this is thecase if the only noise present is random (Poisson or white noise). In practice, the red noiselimits the noise reduction, which in turn limits the confidence of the eclipse measurement.This explains all of the published ground-based detections being at the modest 3-10 σ con-fidence level, except for the K S -band detection of WASP-12b by Croll et al. (2011) — byobserving the target for 6.2 hours with no dithering and 22 stable reference stars, this eclipsewas measured to a remarkable 25- σ level. New techniques for detrending and removal ofsystematics have significantly lowered the amplitude of residual red noise with respect to theunbinned dispersion due to Poisson noise, but some level of residuals are still typical, as seenin most of the published detection papers. Even if the instrumental and other systematicsare well-accounted for, intrinsic stellar noise itself can contribute above the Poisson level(Gilliland et al. 2011). A typical exoplanet eclipse observation consists of a series of short exposures – a “lightcurve” – in a single night over several hours (enough to capture the entire eclipse andsuitable baseline). The most critical attributes determining whether the eclipse signal can 5 –be extracted are the depth of the signal (i.e. the planet-to-star flux ratio) and the noise levels(both red and white components). The chief goals are to minimize the point-to-point fluxdispersion of the light curve and to obtain as many data points as possible, in order to bestdetect a difference between the in- and out-of-eclipse flux levels. These goals are complicated,however, as each characteristic of the observation set – comparison stars, airmass, ditheringstrategy, phase coverage, sampling rate – contributes to the level and structure of the noise.There are a number of observational observational techniques employed to optimize the lightcurve for detection, whether from a space-borne dedicated instrument (see e.g. Gilliland et al.2011 for the
Kepler process), or using existing ground-based facilities.Most VHJ eclipses last between 1.5 and 2.5 hours, the time it takes for the planet to passbehind the star from the observer’s viewpoint. Table 1 lists the parameters that determinethis duration for CoRoT-1b (Barge et al. 2008; Bean 2009), a VHJ that we have targeted forsecondary eclipse detection, and a synthetic planet that we discuss in Section 4. T is thetotal time from the beginning of the eclipse’s ingress (first contact) to the end of its egress(fourth contact), i.e. the “Full Duration” of the eclipse; T is the “Flat-Bottom Duration”– the time from second to third contact during which that the planet is entirely eclipsed bythe star (i.e. when the light curve is flat and at minimum flux).While it is possible to measure the depth of an eclipse with only partial coverage (e.g. theWASP-12b J-band measurement by Croll et al. 2011), the optimal scenario is to observe acontinuous window including the entire eclipse and some out-of-eclipse baseline both beforeand after. If the total number of baseline observations is less than the number of observationsduring full eclipse, the confidence level of the eclipse detection will be limited by the baseline’snoise statistics. Obtaining a baseline at least as long as the eclipse duration will provide abetter opportunity to make a detection, with the noise statistics only limited by the numberof in-eclipse points.Obtaining many data points requires a sampling rate as fast as possible. There isa competing interest, however, in obtaining enough counts per frame to limit the photonnoise in the target and comparison stars. An established strategy is to set the exposuresto be just long enough that the dispersion is dominated by correlated noise rather thanphoton statistics. For ground-based observations, this typically means obtaining at least10 integrated photons per frame (in both the target and comparisons) to limit the photonnoise to 10 − , approximately equal to the contribution level of the red noise, readout errors,flat-fielding errors, etc. Longer exposures, limited by the same systematic noise factors, willnot substantially improve the photometric precision. For near-IR observations, particularlyK-band, there are also sky background concerns to consider – long exposures will saturate,and the faster sky variability is another motivation to have a fast sampling rate. To avoid 6 –saturation but still obtain enough photon counts, it is sometimes necessary to defocus thestar images and spread the signal over more pixels on the chip. This allows the collectionof more photons and can also reduce the sensitivity to small shifts in the stars’ locationson the chip, but the greater number of pixels used will also contribute more noise. Moderninstruments can employ other methods of improving the observing cadence, such as the useof faster-readout subframes or routines designed to optimize the duty cycle. For example, theeclipse of OGLE-TR-56b (Sing & L´opez-Morales 2009) was detected with an E2V camerafeaturing frame-transfer technology allowing almost no readout time and near 100% dutycycle, and the WASP-4b detection (Caceres et al. 2011) used a datacube mode that canstore 250 images at a time with no readout overhead until the end. Taking all of this intoconsideration, the ideal sampling rate is set anywhere from one point every couple of minutesto as many as 100 points per minute, depending on the telescope’s aperture and the apparentbrightness of the host and comparison stars.One of the most important systematic noise minimization strategies is to have at leastone nearby star suitable to be used as a comparison for differential photometry. An idealcomparison star should be non-variable and similar to the target in brightness (to achievegood photon noise in the same exposure time as the target, without saturating) and color(to limit differential atmospheric extinction). Better still is to have many comparison stars,although this is limited by the instrument’s field of view and the brightness of the target.Measuring the target system’s flux relative to the comparison stars, rather than the absoluteflux count of the target, substantially reduces the atmospheric effects, including airmass andthin clouds. To minimize chromatic effects, it is also desirable to limit the observations toan airmass less than ∼ Even with all the noise-minimization techniques described in the previous section, asignificant amount of red noise systematics will likely remain in the observed light curve. 7 –The next (and most difficult) step is to model and remove these systematic trends; otherwisethey can confound the extraction of the eclipse signal.The amount of correlated noise in a data series can be estimated by calculating thedispersion σ N when binning the residuals (i.e. the difference between the observed data andthe best-fit model) into N points. For an entirely random noise distribution (white noise), σ N will decrease as N − / , but in the presence of red noise it instead follows: σ N = σ w N + σ r (1)where σ w and σ r are the amplitudes of white and red noise, respectively. Thus, the contri-bution of the white noise can be reduced by binning over larger samples, but the red noisecannot. Over an eclipse light curve, the red noise often contributes 20 to 50% as much noiseas the unbinned white noise level before detrending (Pont et al. 2006). The white and rednoise amplitudes can be estimated for any data series by finding the σ w and σ r values thatbest suit this equation.The red noise is not completely unexplained, however. In practice, the detected fluxis often found to correlate with various effects of the atmosphere and instrument (seee.g. Harrington et al. 2007). The observed light curve can be thought of as the productof three components: the flux reduction by the actual eclipse, the intrinsic random whitenoise (photon noise, read noise, etc.), and the red noise resulting from systematic correla-tions. The goal is to remove the systematic trends and recover the basic parameters thatdescribe the shape of the eclipse signal: the eclipse depth D e , which represents the emis-sion of the planet in the observed passband, and the mid-eclipse phase φ me , which providesinformation about the eccentricity.There are several approaches to model and remove these systematics, and several so-phisticated algorithms have been proposed and tested. For example, for fields with manystars, algorithms like SysRem (Tamuz et al. 2005) can automatically measure trends andremove them from the light curve to reduce the red noise (e.g. Sing & L´opez-Morales 2009).Carter & Winn (2009) introduced a wavelet-based method to estimate the parameters. In it,the noise is treated as the sum of time-correlated and uncorrelated components, and the like-lihood function is calculated with a fast wavelet transform. Waldmann (2011) has proposeda non-parametric algorithm to blindly de-convolve all non-Gaussian signals from the eclipselight curve. This method is best suited for simultaneous observations at different wave-lengths (e.g. binned spectroscopic observations); most photometric detections are based on asingle light curve, or curves of eclipses on different nights. Both of these alternative methodspreferentially remove the correlated noise without making use of the measured atmosphericand instrumental quantities that are often used to seek correlations and systematic effects. 8 –In another method, Gibson et al. (2012) have used Gaussian processes for regression, whichdefine a distribution over function space, and maintain some advantages of both the linearparameter model and the Waldmann algorithm. Their technique makes use of additionalmeasured parameters, but then finds the distributions of systematics in a non-parametricway. Most common, however, is to simply look for correlations between observed flux andmeasured atmospheric or instrumental parameters, and remove them to reveal the signal.The most basic method is to use only the out-of-eclipse baseline points of the light curves,i.e. the portions that are expected to be constant in time, to manually look for and removetrends one at a time, beginning with the most strongly correlated. For example, if thebaseline flux appears to correlate strongly with airmass, one can fit a linear trend betweenairmass and the differential flux in the baseline, and then remove this trend from the entirelight curve. This process can be repeated iteratively with other parameters until the trendsremaining in the baseline are minimized. Since these fits are done using only the out-of-eclipseportions of the light curve, systematics in the in-eclipse data are removed by interpolation ofthe out-of-eclipse fits. Many of the ground-based eclipse detections (e.g. Rogers et al. 2009;Croll et al. 2011) have been made in this manner.Some weaknesses of looking for trends only in the out-of-eclipse portions of the lightcurve are that it requires a prior guess of which points are not in the eclipse, and it does notmake use of the large amount of in-eclipse data, in which other trends could be present. Themanual removal of one correlation at a time can also be problematic in cases where thereis not a dominant trend, as the removal of a second trend can reintroduce correlations withthe first parameter.Arguably a better approach is to consider the effects of the correlation trends andthe shape of the eclipse signal simultaneously together in a model. This can be done byconstructing a correlation function C q ( q ) with each independent variable q one might pickout from the observations (e.g. airmass, position on the detector, sky brightness, exposuretime, shape of the images, etc.), and then combining those functions with the eclipse lightcurve F ecl to form a more detailed model for the observed flux: F obs = F ecl Y q C q ( q ) (2)At first-order one may use linear C q functions, although in fact the best correlations maybe non-linear. This has been demonstrated to be a useful technique, simultaneously ac-counting for the physical effect of the system and the atmospheric and instrumental effects(Harrington et al. 2007).A challenging aspect of this method, however, is the need to test a large number of free 9 –parameters, without initially knowing which parameters are the most essential. Testing acomprehensive grid of possible combinations of these parameters can be time-prohibitive;instead, a method is needed to efficiently find the set of parameters, with uncertainties, thatbest models the observed light curve. A few different algorithms are used to achieve this,but the most common is the MCMC, which we discuss in Section 3.2.We consider this last approach to be the most robust; therefore it is the one we havefocused on in the rest of this work. Our implementation is discussed in detail in Section 3.
3. Model Implementation and Systematic Effects Tests
To test the analysis process which extracts eclipse detections from photometric lightcurves, we set out to first generate a variety of test datasets (discussed in Sections 4 and5), and then evaluate them using our own adaptation of a multi-parameter model of theeclipse plus correlation trends, and MCMC routines to find the best-fit set of parameters.The goal is to determine how well this type of model and the MCMC process measure aninput eclipse of known depth and phase placement. If the process is working effectively,the best-fit solution should recover the same correlation and eclipse signal that was input.Under a range of noise levels, systematics, and other conditions, how close does the MCMCcome to recovering a signal of a given depth? At what noise limits do we no longer detectthe input signal? Are there any systematic errors in the measurement? In order to answerthese key questions, we required our own implementation of an observed light curve modeland a process for fitting the model with the best parameters.
We began constructing our model of the light curve of a secondary eclipse of a typicalVHJ with a simple eclipse curve - the light curve that the system’s flux would produce in theabsence of any noise. This shape is determined by a number of star-planet system parameters,generally constrained by transit and radial velocity observations to better precision thanachievable from eclipse observations: stellar radius R ∗ , planet radius R P , orbital period P ,semi-major axis a , inclination i , and eccentricity e . We selected for these quantities valuescomparable to those of CoRoT-1b and other representative VHJs, as given in Table 1.Similarly to the transit models in Mandel & Agol (2002), the eclipse curve was generatedby simulating the motion of the planet relative to the star, and calculating from this thefraction of the planet’s surface area eclipsed P ecl as a function of orbital phase φ . The 10 –eclipse window for a transiting VHJ is typically a small enough portion of the period thatorbital eccentricity affects only φ me , the phase at which the middle of the eclipse is centered,which is directly related to e cos ω , where ω is the argument of the periastron. We beganby calculating the function r ( φ ), the separation between star and planet disk centers in theplane of the sky, in units of R ∗ : r ( φ ) = aR ∗ [1 − sin i cos ( φ − φ me )] / , (3)and from there we found the portion of the planet that is eclipsed as a function of r : P ecl ( r ) = r ≥ p πp ( θ − sin θ cos θ ) + π ( θ − sin θ cos θ ) 1 − p < r < p r ≤ − p (4)where p is the unitless ratio R P /R ∗ , and the angles θ and θ are defined by:cos θ = 1 + r − p r ; cos θ = r + p − rp . (5)Because it is the planet and not the star that is eclipsed, stellar limb darkening does notaffect the eclipse shape, and the portion of the light curve when the planet is fully obscuredwill be flat. The flux decrement, or depth D e , of the eclipse depends on the brightness(thermal and reflected emission) of the planet. For our purposes, we considered the planetto be seen as a uniformly bright disk, allowing us to model the eclipse shape as F ecl ( φ ) = F b ∗ { D e ∗ [1 − P ecl ( φ, φ me )] } (6)where F b is a normalizing baseline flux parameter. P ecl ( φ, φ me ) was found using Equations3, 4, and 5, the mid-eclipse phase φ me , and the other fixed parameters, and then used with D e to model the expected flux from the star and planet throughout the eclipse (flux of 1 infull eclipse, 1+ D e when the planet is fully visible, normalized to the brightness of the staralone). Thus, with all the other star and planet parameters held constant, the model for theeclipse shape depends only on F b , D e , and φ me .Next, in order to model the systematic effects, we allowed possible relationships betweenthe detected flux and various independent variables measured in the dataset, constructingfor each variable a simple linear correlation function: C q ( q ) = 1 + m q q (7)where m q is the correlation slope factor between a given independent variable q and the flux.Each linear function could in principle also have an intercept b q , but these can easily becombined into the overall baseline F b , so they are not needed in each correction curve. 11 –Note that this linear function is merely the simplest possible correlation one could devise.For measurements such as pointing/pixel effects, it is an approximation for effects that couldbe nonlinear. Thus, this model can be expanded to include polynomial or other correlationfunctions, e.g. a quadratic function C q ( q ) = 1 + m q q + m q q . Later, we explored this in our“real data” tests (Section 5.3).We considered several variables that might be available and relevant: the airmass ( z ),the full-width half-maximum size of the star images ( w ), their x- and y- positions on the chip( x , y ), sky brightness ( s ), and time (or, equivalently, orbital phase φ ). Correlations withother characteristics can of course also be modeled as needed, but the model parameters weselected for this work are these, also listed in Table 2.Some of these ( φ , z ) are universal in each dataset, but the rest ( m w , m x , m y , m s )are measured separately for each star in the field. As long as a variable exhibits the samebehavior over time for each star (e.g. if one star moves three pixels in the x-direction and itsFWHM increases by 10%, the same happens to all other stars in the field), a single set ofvalues can be used to determine that trends with that variable. Even though each star maybe affected differently by changes in one parameter, the differential flux can be modeled witha single correlation. However, if the behavior of the same variable differs greatly from starto star, it may be better to use a different set of measurements, and model parameters, foreach star (e.g. if there is a different sky brightness behavior in different parts of the field, use s , s , etc. rather than just one s – and increase the number of corresponding slope factors m s ).In addition, as discussed in Section 2.1, a single dataset may comprise observations frommultiple offset / dither positions, each of which can exhibit different trends. For example, x , y , and w each affect the distribution of photons across different pixels on the detector, sotheir effect on the measured flux will not be the same for different offset positions. The twoprimary parameters ( D e , φ me ) in Table 2 are global to the entire dataset, defining the shapeand placement of the eclipse in the light curve, but the rest can have different behavior atdifferent offset positions; thus, we use a separate set of these parameters to apply to thepoints at each offset position.All the correlation functions for the independent variables are then multiplied by thenoiseless secondary eclipse light curve (Equation 6) to form a more accurate model for theobserved flux of the target planet-star system as a function of orbital phase φ , and theadjustable model parameters: F mod = F ecl ( φ, F b , D e , φ me ) Y q C q ( q ) . (8) 12 –By finding the parameters that produce a single best-fit match of the model in Equa-tion 8 to the observed light curve, we derive the key values we are seeking: the eclipse’sdepth and central phase. With differently-behaving parameters, multiple stars and offset positions, the numberof parameters required to produce the full light curve model can begin to grow rapidly,and testing even a sparse grid of possible combinations can become prohibitive. To moreefficiently sample the parameter space and find the best solution with uncertainties, weimplemented two separate MCMC algorithms, each developed independently to ensure thatwe were not seeing the effects of a problematic procedure.The first MCMC process, which we called “MCMC-A”, begins by generating a set ofinitial guesses for all the parameters, and in each step randomly selects one parameter toperturb by a small amount, or “jump.” Each step generates a model light curve for thisset of parameters including the jump, and calculates an information criterion, or “fitness”,for example, the χ value between that model and the observed points (see Section 3.3).The new fitness is compared to the fitness at the previous step, and then the jump is eitheraccepted or rejected, following the Metropolis-Hastings algorithm: If the new fitness is lower(i.e. better) than the previous one, the jump is accepted, and the chain proceeds to thenext step with a set of parameter values including that new jumped-to value; if the newfitness is worse than the previous one, the jump may still be accepted, with a probability P jump = e − ∆ χ , where ∆ χ is the difference in fitness caused by this jump. The secondoption prevents the solution from remaining in a local minimum in the parameter space. Ifthe jump is rejected, the pre-jump parameters are carried over to the next step. This processof random jumps being accepted or rejected is repeated up to four million times, resultingin a distribution of values for each parameter from all the steps of the MCMC. To removebiases from the starting position, we discard the first 20% of the steps, and then examine theremaining distribution for each parameter. These values are then binned into a histogram,and the midpoint of the most heavily-populated bin (i.e. the mode average) is taken to bethe best-fit value of that parameter, with the errors set such that the central 68% of thepoints fall within the 1- σ limits.For every eclipse we tested, MCMC-A first ran two short (10,000-step) MCMC trials toadjust the jump size for each parameter, i.e. the amplitude of the random-normal distributionfrom which each jump is selected. If a jump size were set too small, the distribution wouldbe too diffuse; if the jumps were too large, the chain would not jump enough. After setting 13 –jump sizes for each parameter that would result in 30 - 40% of jumps being accepted, we ranfifteen separate MCMC’s – three starting from each of five different selected initial parameterguesses, in order to test the effects that the starting position would have on the chain. Werepeated each test with two different information criteria used for the Metropolis-Hastingstest: χ and the Bayesian Information Criterion (described in Section 3.3). All of the datadescribed in the following sections (synthetic and real datasets) were analyzed with thisMCMC-A implementation.We also tested a representative sample of light curves with a second, independent MCMCanalysis process by a different team member, using the model described in Carter et al.(2011). This analysis, which we called MCMC-B, used a Gibbs sampler to vary a randomparameter for each step, before also using the Metropolis-Hastings jump acceptance criterion.Before each long chain, a sequence of shorter sequences was run to tune the step sizes foreach parameter to between 20-40%. For each test, one chain of 1,000,000 values was run,with the first 5% discarded, and the median and standard deviation of the truncated chainfor each parameter are reported as the best-fit values and errors.When run independently on the same datasets, the two MCMC processes producedsimilar results within their reported uncertainties. Table 3 compares the results between thetwo processes on a sample of eight selected synthetic light curves. The first three columnsdescribe the characteristics of each dataset: the amplitude σ r and pattern of the red noiseused in the light curve (see Section 4.4 and Figure 1), the length of the observing window,and the sampling rate of the observations. Columns 4 through 6 list the eclipse depth thatis put into the light curve initially, and recovered by each MCMC process, where “Fit (A)”and “Fit (B)” correspond to MCMC-A and MCMC-B respectively. Columns 7 through 9show the inputs and results for the mid-eclipse phase. All eight tests used the χ informationcriterion, and had an input white noise amplitude σ w set to 1000 ppm. The best-fit solutionsfor eclipse depth D e and mid-eclipse phase are also shown in Figure 2. From these testswe concluded that both independently-developed MCMCs produce consistent results. Thusconfirming the reliability of each, we proceeded to use MCMC-A in the remainder of thiswork. When including parameters in the model, we do not initially know which are relevantand which will simply complicate the process without helping to model the systematics; forexample, one light curve may have a strong dependence on the position of the stars on thechip, but little to no effect from the changing airmass. In such a case, allowing a dependence 14 –on airmass only increases the uncertainty on the other parameters without improving thefit. Thus, a quantitative measure is needed to describe both the goodness of fit and theusefulness of the free parameters. For that purpose, in addition to the basic χ value, wealso tested an alternative, the Bayesian Information Criterion (BIC), that takes into accountthe number of parameters used. The BIC is defined as: BIC = χ + k ln n (9)where n is the number of observed points and k is the number of nonzero parameters. Thepurpose of the BIC is to introduce a preference for simpler models over more complex modelsthat may be “overfitting” the data with more free parameters than needed. A complex model(i.e. one with a higher number of nonzero parameters k ) will still be preferred in some cases,but only if it significantly improves the fit to overcome the greater k ln n term. For example,removing one parameter from the model (i.e. setting it to zero) reduces the BIC value by ln ( n ), so if doing so does not raise the χ value by more than that amount, the solution withone fewer parameter is preferred.The MCMC will not naturally jump a parameter to exactly zero, so in order to im-plement the BIC, we set a threshold range for each parameter such that jumps to valuesnear zero would be set to zero to minimize the BIC. We found that using a threshold largerthan a few times a parameter’s adjusted jump size would raise rather than lower the BIC.Thus, we set the threshold for each parameter equal to the jump size (previously adjustedas described in Section 3.2).
4. Synthetic Datasets
As discussed in Section 2.1, the recovery of a secondary eclipse signal by the modelingand analysis process depends on the depth of the eclipse and the noise levels, as well asother aspects of the data set such as phase coverage and sampling rate. Before beginning onreal data, which has unknown noise patterns, we wanted to test our method on completelysynthetic datasets with known and controlled noise sources. To do this, and investigate howeach characteristic might affect the process’s results, we created a wide variety of syntheticeclipse light curves, and then ran them through the MCMC-A process to see what depthsand phases would be measured. Each test dataset consisted of a secondary eclipse curve witha controlled eclipse depth D e , and mid-eclipse phase φ me , combined with a noise profile. 15 – We began by creating a model “Exoplanet X” similar to many detected VHJs. Theplanet and star radii, orbital period, semimajor axis, and inclination, which constrain theduration and shape of the eclipses, are listed in Table 1. The period, 25 hours, was selectedfor an easy conversion between time and phase – every hour of observation corresponds toa span of 0.04 in phase. This configuration results in a flat-bottom portion of the eclipse( T ) spanning 94 minutes, with ingress and egress lasting 18 minutes each, for a full eclipseduration ( T ) of 130 minutes.For most of the trials, we used an observation window beginning two hours before andending two hours after the mid-eclipse time (240 minutes total, extending from phase 0.42 to0.58). This provided 55 minutes of baseline on each side of the eclipse, giving a total baselineof 110 minutes, which is slightly longer than the time spent in the flat-bottom portion of theeclipse shape. We also tested runs with shorter and longer baselines (15, 94, and 188 minuteson each side; total durations of 160, 318, and 506 minutes), and a 240-minute series withthe mid-eclipse phase offset to 0.475, to provide a baseline that was short (17.5 minutes) onone side and long on the other.We then set up an array of observation points across the selected window with a givensampling rate, and a small random error (normally distributed; σ =0.1 s) added to the timingof each point to simulate imperfect data spacing. We tested sampling rates of 25, 75, and250 points per hour (100, 300, and 1000 data points, respectively, in the four-hour observingwindow). Table 4 lists all the values we adjusted and tested in this section.Next, using these phase points and the planet model, we built the (noiseless) secondaryeclipse shape with the same function (Equation 6) used for that part of the light curve model.Given all the fixed star and planet values, and a set baseline flux F b = 1, the only adjustableparameters that would adjust the noiseless eclipse shape were D e and φ me .Finally, we generated a noise profile to add to the eclipse curve, with either white noiseonly (Section 4.3) or a combination of white and red noise (Section 4.4). An example noiselesseclipse shape is shown in panel (a) of Figure 3, along with different noise models, constructedlight curves, and their best-fit model solutions. As described in Section 3.2, the MCMC-A process ran fifteen Markov chains for eachlight curve evaluated, starting from five distinct sets of model parameter values. Only the 16 –first four parameters ( D e , φ me , F b , and m φ ) from Table 2 were used in fitting the syntheticeclipses, as values for the other measured atmospheric and detector effects did not exist. Forreal datasets (Section 5), we would include all of the parameters.As an example, Table 5 shows the results for a complete set of fifteen trials on one lightcurve: 75 data points per hour in a four-hour observing window, with a synthetic ExoplanetX eclipse of depth D e =4000 parts per million (ppm), centered at phase φ me =0.500, and rednoise (Pattern 3; see Section 4.4 and Figure 1) equal in magnitude to the white noise (1000ppm), and run to optimize the χ information criterion. All of these aspects are discussedin Section 4.4, and this is also the light curve shown in panel (d) of Figure 3.The trials are divided into five subsets of three trials, each one set starting from differentinitial guesses for D e and φ me . The trials are identified in the first column by their subset’sstarting point and then the trial number within that subset, e.g. 1-2 for the second trial atthe first starting position. The second and third columns provide the initial guesses for eachsubset. Columns four through six show the best-fit results for the eclipse depth, mid-eclipsephase, and baseline flux (phase slope m φ is not shown). The final two columns show twomeasures of the model’s fitness – the χ value and the root mean square of the residuals.The results among each set of trials were generally in good agreement and independent ofthe point from which the chain was started, though occasionally the series from one startingpoint would find a less optimal solution, which was reflected by a higher dispersion of theresiduals between the observed and model points. For example, in the set of trials shown inTable 5, the recovered values from four of the five starting points were in agreement witheach other within their errors, and had very similar χ and residual RMS values, suggestingthat the space they populated represented the global best fit. Meanwhile, the dispersionsof the residuals from starting point 4 were much worse, indicating that the MCMC foundonly a local minimum. Of the fifteen trials, we adopted the solution with the lowest residualdispersion to provide the model light curve and best-fit parameters. The last row of thetable lists the key results from this example’s best fit (trial 5-1): eclipse depth (adjusted forthe baseline), and mid-eclipse phase. This solution differs from the input depth by 360 ppm(7.3 σ , but only 9% of the input depth), and the input mid-eclipse phase by 0.00084 (3.2 σ ),equivalent to 76 s for the orbit of Exoplanet X. Both of these values suggest that the errorsmay be underestimated. 17 – In the simplest scenario, we generated normally-distributed white noise to add to thenoiseless eclipse curves. Because the detectability would be dependent on the depth-to-noiseratio, we chose to keep the white noise amplitude σ w (i.e. the RMS of the normal distribution)fixed at 0.1%, or 1000 ppm, and test different eclipse depths, beginning at 5000 ppm (fivetimes the white noise level) and continuing down to 50 ppm, as well as different values forthe other attributes of the light curves (see Table 2). Panel (b) in Figure 3 shows a whitenoise pattern and the combined light curve (eclipse plus noise) with its best-fit model.Figure 4 shows the difference between the eclipse depths that were input into the testsand those that were measured by the analysis, as a percentage of the input depths. We testedthe range of eclipse depths for four different observing window lengths, and using both the χ and Bayesian information criteria. Each of these sets is shown in a separate plot, withthe three different sampling rates in different colors. At the highest input depths, both theeclipse depth and mid-eclipse phase that were recovered were in close agreement with thevalues from the input models, as expected; as the model depths were decreased, these valuesbegan to disagree by greater amounts. By looking at each series starting with the deepestinput and moving to the shallower ones, we were able to examine under what conditions thesignals continued to be recovered successfully.To quantify the results, we calculated the percent difference between the recovered andinput values at each eclipse depth, beginning with the deepest and looked for a point belowwhich we could no longer recover the eclipse signal reliably (for our purposes, we referred toa recovered eclipse depth within 10% of the input depth as “successful”). We then defineda “critical depth” D e,crit as the input depth of the last successful recovery before the second time the difference was greater than 10% (to avoid being confounded by one outlier point).We found that the mid-eclipse phases were also recovered accurately for tests above thisthreshold – 63% contained the input φ me within 1 σ of the calculated values and errors; 91%within 2 σ – but diverged as the input depths became shallower. Above the critical depth, wecalculated a “dispersion” as the average of the absolute values of the errors after discardingthe highest and lowest values, and considered whether the deviations were systematic orrandom in nature. Figure 4 illustrates the accuracy of the recovered eclipse depths for allthe synthetic, white noise only tests, and the critical depths and average errors for everycombination of observing window, sampling rate, and information criterion are listed in thecolumns in Table 6 labeled “ σ r =0”.Typically, the signals were recovered in agreement with the inputs as long as the depthwas at or above the white noise level (1000 ppm). The only instances where this was not thecase were at the lowest sampling rate (25 points per hour), where there were fewer points in 18 –the light curve; however, even in these instances the results were not off by more than about15% until we passed 1000 ppm. At depths below that level, the process was much less likelyto recover the input depth, although the observing window and sampling rate did affect this.For the standard 240-minute observing block, D e,crit was at or near the white noise level(800 – 1000 ppm) when using both the χ and BIC criteria. We also found that, while thesampling rate had little effect on the critical depth reached, it did affect the dispersion of theresults for the points above D e,crit . As we increased the sampling rate, the recovered depthsclustered closer to the input values. This showed, as expected, that a faster sampling rateat the same noise level will return a more accurate measurement.In the case of the shorter (160-minute) observing windows, eclipse depths were recov-ered less accurately, and the critical depth was reached sooner. For the longer windows(particularly the longest, 506 minutes) and at the faster sampling rates, accurate depthswere recovered from eclipses with signals as low as 250 to 300 ppm, i.e. a factor of threeto four times shallower than the white noise level. This demonstrated that baselines thatsignificantly exceed the duration of the eclipse are preferable.In all cases tested, there were no systematic biases toward measuring the eclipses to bedeeper or shallower eclipses than the inputs; rather, the errors appeared random. Next, we created an additional component of the noise model to simulate the effects ofred noise. Carter & Winn (2009) took a multi-step approach to synthesizing red noise: first,they established the covariance conditions for wavelet and scaling coefficients for the caseof combined white and red noise. Then they drew a sequence of 1024 independent randomvariables, set such that their wavelet and scaling coefficients would obey these conditions,and finally, took the inverse Fast Wavelet Transform of the sequence to generate a red noisesignal.We constructed our initial models in a simpler fashion, by adding together four sinewaves with periods equal to 2, 1, 0.5, and 0.25 times the four-hour observation window(i.e. P n = 480, 240, 120, or 60 minutes): R ( φ ) = X n =1 A n sin ( φ/P n + ω n ) (10)This model has eight adjustable parameters (amplitude A n and phase angle ω n for eachcomponent). We chose sets of parameter values to create three distinct red noise models, 19 –shown as Patterns 1, 2, and 3 in Figure 1, and measured the RMS of the points in eachpattern. Then to use a given red-noise pattern, we scaled the amplitude to set the RMS toa desired red noise level σ r . We began with a red noise amplitude of σ r = 200 ppm (20% of σ w , a typical level for the red noise found in differential light curves), but also tested higherred noise levels of 500 ppm and 1000 ppm.While we anticipated that our models using a few frequencies would produce similareffects as typical observed noise, real systematics can include power at all frequencies, andincreasing for longer wavelengths. Thus, in addition to the initial patterns, we generatedBrownian noise (1 /f ) models by integrating series of Gaussian white noise. Two suchpatterns are shown in Figure 1 as Patterns 4 and 5. These were scaled and added into thelight curves the same way as the sinusoidal patterns.For each test, we then combined the red-noise pattern with the randomized white noisepattern (always kept at σ w = 1000 ppm) and eclipse shape, and ran the resulting light curvethrough the MCMC and modeling process as before in the white noise tests. An examplelight curve (noise and eclipse signal) for Pattern 3, σ r =1000 ppm, and a synthetic eclipse of D e = 4000 ppm is shown in panel (c) in Figure 3.Figure 5 shows the recovery results for the light curves with a red noise component(Pattern 3) of amplitude 500 ppm, i.e. 50% of the white noise level, added to the raw eclipseshape and white noise. These are subdivided by observing window length, informationcriterion, and sampling rate. In all of these tests, we saw little difference between the χ and BIC results, due to the fact that these models had fewer free parameters than the realdata tests with instrumental and atmospheric measurements. In Section 5.2, we will notethat this is no longer the case.We calculated critical depths and error dispersions in the same manner as for the whitenoise tests (Section 4.3); these results for every observing window / sampling rate / informa-tion criterion combination are also given in Table 6 in the columns marked “ σ r =500 ppm”.As in the white-noise case, the accuracy of both the depth and mid-eclipse phase measure-ments diverged as the input depths became smaller, but this began to happen at depthsgreater than the white noise level, or even the combined point-to-point uncertainty from thewhite and red noise together. The dispersions above the critical depths were greater as well,and we saw stronger systematic errors than in the white noise tests.In the series of 240-minute tests for all three red noise levels, the recovered eclipse depthswere within ∼
10% of the input depths down to near the white noise level, yet the best-fitmodels consistently measured eclipse signals slightly deeper than the inputs, as seen in thesecond row of Figure 5. The critical depths for this baseline ranged from 600 ppm (60% of 20 –the white noise level) to as large as 3300 ppm (over three times the white noise amplitude)for σ r =200 ppm and 500 ppm. For inputs above the critical depths, both information criteriamethods resulted in eclipse measurements 3 to 6% deeper than the input depths. For the σ r =1000 ppm series, the critical depths were all 2300 ppm or higher, and the dispersion wasgreater (5 to 9%). With the shorter 160-minute baseline, there was a stronger bias towardsdeeper detections than the inputs, but the critical depths were much higher as well (2000 to4000 ppm).Baselines of 318 minutes showed improvement in D e,crit and dispersion, but when thebaseline was extended to 506 minutes, the accuracy became systematically worse again, witheach series tending to measure the eclipse signals around 5% shallower than the inputs for σ r =200 ppm, and even more so for the higher levels of red noise. Above the critical depths,these deviations were greater than the error bars on the depths measured by the MCMC,particularly for the cases with faster sampling rates.The strongest biases were seen in the tests with the most red noise ( σ r =1000, equal to σ w ). Especially in the 160- and 240-minute baselines, the recovered eclipses were offset fromthe input values by more than 10% even for the highest input depths (see again Table 6).These trends illustrate the red noise floor explained by Equation 1 – increasing the numberof points reduces the error from the white noise, but not from the red noise. As expected,the critical depths and dispersions increased with higher red noise amplitudes.Examining the shape of red noise Pattern 3 in Figure 1 indicated why the differentbaselines showed different systematic biases. For the longest baseline (506 minutes, extendingfrom phase 0.33 to 0.67), the full red noise pattern was lower on both ends and high in themiddle, which counteracted the inserted eclipse shape. When the ends were clipped to ashorter phase coverage, that influence was removed, and when clipped even further (to 240or 160 minutes), the dip at phase 0.51-0.52 became a signal that was added to the eclipsedepth. With Pattern 1, the longest-baseline set showed a bias recovering deeper eclipsesthan input; with Pattern 2 there was little bias either way. These also suggest that with along observing window it may be prudent to attempt fits with only the necessary portion ofit, since extended baselines can have trends that confound the eclipse measurement.The effects of the Brownian noise patterns (4 and 5) on one series (four-hour observingwindow, sampling rate of 250 points per hour) are shown in Figure 6: for both χ andBIC, Pattern 5 behaves similarly to the simpler Pattern 3, though with a slightly higherpositive bias at most depths. Pattern 4 acts in the opposite direction with similar amplitudes,resulting in slightly lesser measured eclipse depths. They did not show qualitatively differentresults than the sinusoidal noise patterns. 21 –A key finding was that the resulting uncertainties from the MCMC process far under-predicted the actual offsets between input and measured eclipse depths. For example, in thePattern 4, σ r =500 ppm, χ series, the measured depths differed from the input depths by2% to 10% above D e,crit , but the estimated errors from the MCMC were between 0.7% and3%, a factor of 2 - 4 smaller.These results showed that the unknown structure of the red noise, even in the baselineportions of the light curve, can introduce in addition to the random error an additionalsystematic bias affecting all the measurements on a series of tests with a fixed noise patternin a certain way. However, in practice on a real dataset, this bias is only a once-observedeffect, which amounts to a greater uncertainty on the actual depth of the eclipse event. Thiswas reflected by the disagreements between the input and calculated eclipse depths beinggreater than the error bars found by the MCMC process.
5. Real Data Systematics
After a thorough exploration of synthetic light curves, we began to test the process onreal data, which would have measurable white and red noise levels, but unknown systematicsand trends. We took some photometry from a previously observed exoplanet field to betreated as the noise and, as in the previous section, added a range of simulated eclipsesignals to see which could be recovered.
The dataset we used for these tests was the photometry from the CoRoT–1b field, takenin Ks-band with APO/NICFPS, on UT 15 January, 2009, and published in Rogers et al.(2009) (hereafter R09). This set included 758 images alternating between two offset po-sitions 15 arcseconds apart. However, finding the differential photometric precision to besubstantially worse at high airmass, we removed the points at z > σ from their neighbors within 20 minutes, leaving 691data points. In addition to the target, there were four stars of relatively similar magnitude(between 0.95 and 1.8 times the target flux) in the same area of the chip, and many morestars in the field. We assigned each star a number from 0 through 13, with 0 being the target,CoRoT-1, and 1 through 4 the bright nearby reference stars used in R09. The remainingimages spanned 293 minutes, with a small gap (8.8 minutes) in the coverage near phase 0.55,resulting in a cadence of approximately 145 points per hour. The observing window and 22 –sampling rate for our real dataset tests were fixed by those characteristics of the datasetitself, unlike in the case of the synthetic tests.To produce light curves for use with our synthetic eclipse tests, we selected bright, stablefield stars one at a time to treat as though they were the target, and performed differentialphotometry with respect to other nearby stars in the field, using the comparison stars whichproduced the lowest flux RMS in the differential light curves. To avoid issues with the actualeclipse in the data, we excluded the main target star CoRoT-1 from this process. For eachlight curve, we then measured the amount of red and white noise as the values for σ w and σ r that best fit the binned dispersions to Equation 1, finding that both were greater than in thesynthetic noise tests. For example, the noise levels for star 1’s differential light curve werefound to be σ w = 4580 ppm and σ r = 1540 ppm. We then also made the same measurementsusing just the out-of-eclipse baseline points, since for a measurement on the target light curvewe would want to use only these points to judge the noise structure. In most cases the levelswere very similar to those in the full light curve. We repeated this for several stars in thefield, numbered 1, 2, 3, 4, 8, and 13, in order to test the systematics of each. For stars 1 and4, the best comparisons and the measured red and white noise for each (in ppm) light curveare given in the third and fourth columns of Table 7 (identified as “1-a” and “4-a”).The differential flux for light curve 1-a is shown at the top of Figure 7 in blue and red,with each color representing one of the two offset positions. Below it are the other measuredcharacteristics that we used for de-correlation: the airmass ( z ), full width at half-maximumof the star images ( w ), displacement of the stars on the chip in the x- and y-directions ( x , y ),and the sky brightness ( s ). In some datasets we have seen these attributes behave differentlyfor each star; In this particular dataset, however, all of these comparison stars were localizedto the same area, and we tested that the values behaved similarly for each star. Therefore,we were able to use single arrays for each variable φ , z , w , x , y , and s , although each variablestill required two different slope coefficients – one for the behavior at each offset position.In total, the light curve model required 16 parameters (see Table 2: F b , m φ , m z , m w , m x , m y , and m s for each of the two offset positions, along with the global parameters D e and φ me . When using the χ value as the fitness criterion, the models would make use of all16 parameters, allowing correlations with each variable / offset position combination. In theBIC runs, however, we expected that most of the best-fit models would use only the mostinfluential parameters, with the others fixed to zero for a better BIC value. 23 – The resulting light curves from each test star’s differential photometry were then usedas the noise components of new eclipse light curves to be tested. As before in Section 4,we constructed a noiseless eclipse shape for each given depth and mid-eclipse phase, withthe same star and planet parameters of the hypothetical Exoplanet X (Table 1). We thencombined this signal with the real noise profiles to form light curves to be analyzed by ourmodeling and parameter optimization processes, and then ran the tests for a range of eclipsedepths, two different mid-eclipse phases (0.5 and 0.475), and both the χ and BIC informationcriteria. While the models with χ optimization used all sixteen model parameters, the BICruns typically chose only nine to twelve, omitting some of the correlation slope parameters(see Table 7).The behavior of the recovered eclipse signals can be seen in Figure 8. In most cases,as with the synthetic tests, the recovered eclipse parameters were close to the input valuesfor large input eclipse depths, but began to deviate more at depths lower than the whitenoise level. Unlike in the synthetic tests, which had a newly-generated random white noisecomponent for each trial, the noise structure was the same in every run for a given targetstar. This generally resulted in calculated eclipse depths consistently higher or lower thanthe input depth, with the systematic bias getting worse as the input model depths decreased.For each test star’s series, both the uncertainties on the depth measurements and thedifferences between the measured and input depths were substantially higher than in thesynthetic-noise tests. This led to higher values for the critical depth D e,crit , ranging from2000 to 9500 ppm, and in some cases (e.g. for star 4) the recovered eclipse depths werefar outside the 10% range for all depths tested. This was partially due to the point-to-point uncertainty being greater, as evidenced by the higher measured white and red noiseamplitudes, shown in each window of Figure 8 as vertical gray and red bars. Notably, therewas far more difference between the χ and BIC results for these series than for the syntheticmodel tests, because there were far more parameters that could be set to zero for differentpreferred solutions.For the best-behaved test stars (2 and 3), the critical depths occurred at or below thewhite noise amplitude. In these well-behaved series we saw better behavior from the BICthan from the χ models. However, the others (1, 4, 8, and 13) had strong biases from theirinputs at greater depths, and had mixed results comparing χ and BIC. Star 8, for example,behaved well with the BIC to below the level of σ w before getting abruptly worse, while for χ it differed by more than 20% even from input signals greater than 10 ppm.The disagreements between the reported errors and the actual offsets between input 24 –and measured eclipse depths varied substantially as well. For example, for Star 1 and Star4 using the BIC, the offsets were 3 - 4 times greater than the error bars, whereas for Star 1using χ and Star 2 using either criterion, the error bars appeared to be appropriately-sized,containing many of the offsets within 1 σ .Recalling from the tests in Section 4.4 that different baselines can affect the depthmeasurement, we also tried clipping the baseline on two of the light curves: one well-behavedseries (star 2), and one poorly-behaved one (star 4). In addition to running a series with allthe available points, we ran tests with only the points between phases 0.444 and 0.556 (572points over 241 minutes). The series for star 2 was made significantly worse by removingthe extra baseline, but interestingly, the tests for star 4 resulted in values closer to the inputdepths when using χ (though still mostly outside the 10% error threshold) while the BICresults remained about the same. The results for these two clipped light curves can be seenin the bottom row of Figure 8.Even if we account for the different noise sources and consider the eclipse depths relativeto the noise levels, it is apparent that systematic biases exist, and are different for each choiceof target star and the amount of baseline. In practice, it appears that each star’s light curvehas its own systematics relative to the others, which can lead to biases in a measured eclipsesignal. As discussed in Section 3.1, some of the parameters may have non-linear effects; e.g. al-though we modeled them as such, there is no reason that the relationship between fluxsystematics and position on the chip, or the size and shape of the star images, should behavelinearly. To explore this, we added the use of a quadratic fit to the x- and y-displacementparameters, since those are values that may most likely have non-linear shapes. This in-creased the number of model parameters by two per offset position, with two coefficients inthe x and y models rather than one.Figure 9 shows the results in the same format as Figures 4, 5, and 8, with the calculateddepth offsets versus the input depths for synthetic eclipses put into the light curve using Star2 from the CoRoT-1 Ks-band field. The green and pink points with solid lines represent theoffsets with linear correlations only (same as the second plot in Figure 8, but using the eclipsemodel of CoRoT-1b instead of Exoplanet X), while the blue and orange points with dashedlines show the results using quadratic correlation models for the positional displacement.In these tests we found that the quadratic models underestimated the eclipse depths for 25 –all input depths, meaning that they were both more complex and less accurate. However,that may be the case only for this specific data set, and knowing that using a quadratic ormore complicated fit model in place of a linear correlation might affect the depth estimation,it is recommended to at least perform this test on some of the field stars.
6. Identifying and Measuring Real Eclipses
Having tested the MCMC analysis described in the previous sections on a wide rangeof test eclipse light curves with both real and synthetic noise patterns, we then applied theprocess to re-evaluate the CoRoT-1b Ks-band eclipse depth measurement in R09. As we sawin Section 5, each light curve had its own systematics that caused a different bias for eachstar when treated as the target. We measured the eclipse signals in the target light curveas well as other field stars in multiple ways, in order to establish a corrected, more robustvalue for the true eclipse depth.
The main analysis of R09 optimized the differential photometry between the target starCoRoT-1 and four nearby reference stars (those numbered 1, 2, 3, and 4 in this analysis), andthen tested the fitting and removal of trends manually, one at a time. That analysis foundthat correcting for a correlation between the out-of-eclipse differential flux and the FWHMof the star images was sufficient to reach a detection of an eclipse of depth D e = 3360 ± φ me = 0 . +0 . − . .For consistency in re-measuring the eclipse, we took the weighted differential light curvewith respect to the same four comparison stars (without adding any eclipse signal). Theresulting light curve had a flux RMS of 5390 ppm, and measured white and red noise levels(fitting to Equation 1) of σ w =7320 ppm, σ r =1230 ppm. Looking only at the out-of-eclipsebaseline points, however, the noise levels drop, with the red noise component nearly vanish-ing: σ w =6610 ppm, σ r =170 ppm.We then ran the MCMC-A analysis process to find the best-fit model incorporating asecondary eclipse signal – now using the previously-measured characteristics for the planetCoRoT-1b by its host star (see Table 1) rather than “Exoplanet X” – and allowing corre-lations with any of the other variables in Table 2. The best-fit eclipse model using the χ criterion included an eclipse 3240 +330 − ppm deep, centered at phase 0 . +0 . − . ; with theBIC, the best-fit model had D e = 3150 +380 − ppm, φ me = 0 . +0 . − . . These results, all 26 –agreeing with the previously published values within the reported errors of R09, are listedin Table 7 in the column marked “0-a”.The small uncertainties on φ me appear to suggest an eccentricity that is nonzero at the4 to 5- σ level; however, the phase values for the real datasets were calculated based on theephemeris from Bean (2009), derived from multiple transit timing observations: T (RJD)=54159 . ± . p = 1 . ± . φ me to 0.50410 +0 . − . ( χ ) and0.50426 +0 . − . (BIC). Both of these now show a nonzero eccentricity at only a 2.1- σ level,and place the R09 value of φ me =0.5022 at 1 σ away from the new measurements.Because the two information criteria methods gave nearly identical results, we deter-mined it best to adopt average values of the two solutions, with the error bars adjusted toinclude the spread of both. Thus, we state as our re-measurement of the R09 eclipse thevalues D e =3190 +370 − and φ me =0.50418 +0 . − . .We also tried constructing other light curves using different combinations of stars 1through 4 as the comparison flux for the target CoRoT-1. The measurements resulting fromone comparison star were inconsistent, sometimes detecting a negative depth or stronglyoffset phases, and high levels of red noise in the residuals. However, there was better consis-tency in the tests that used two or more reference stars, with a most of the best-fit eclipsemodels having a depth between 2800 and 3500 ppm and a central phase of 0.502 to 0.504.Although these measurements produced eclipse attributes consistent with the results ofR09, we saw in the analysis for Section 5.2 that it was common to find eclipse solutions ofsignificantly non-zero depth in the light curves of other field stars with no synthetic eclipseadded. Two examples of this are seen in the best-fit results for light curves 1-a and 4-a,run through the same process as the target (the third and fourth columns of Table 7), andexamining the results from test target stars 1, 2, 3, 4, 8, and 13, we noted two importantfindings. First, the detected depths appeared to be spread in a non-Gaussian distribution;rather than finding many of the depths close to zero with few outliers, they were clusteredclose to the red noise amplitude in both the positive and negative direction. Second, mostof the spurious detections were not centered at phases close to 0.5 – only one of the positiveeclipse values (4-a with BIC) had a φ me within even 5 σ of 0.5. These spurious detectionsmake it difficult to determine how much a detected eclipse signal in the target light curve isthe result of systematic effects versus an actual measurement of the planet flux. In particular,since the reliability of the measurements is poorest for eclipse signals below the noise levels,eclipses of small depth are very difficult to measure. 27 – In all of the real noise tests, the systematics were produced by a combination of thetarget star’s light curve and the composite flux of the different reference stars, so a bias couldresult from either the target or the reference star flux. To create a more controlled datasetto compare the detection from CoRoT-1 with the spurious detections from the other teststars, we prepared an external “Group B” of stars (numbers 5, 6, 7, 8, 9, 11, 12, and 13 inthe field), and combined their fluxes into a single composite light curve that we could use totest each star from “Group A” (stars 1, 2, 3, 4, and CoRoT-1) in a consistent manner. Withthese we created five new differential light curves, “0-b” through “4-b”. Since we were notusing the optimal set of reference stars, their noise levels were approximately 1.5 to 2 timeshigher than for the “A” set of light curves.We then ran these light curves through the MCMC-A process as before; the right-handside of Table 7 lists three of them – 0-b, 1-b, and 4-b – with their calculated white andred noise amplitudes (in ppm) and the results fo their best-fit solutions. We saw similarbehavior to what appeared in the Group A light curve results: eclipse depths at large values,and mid-eclipse phases often far from 0.5. The depths of these spurious eclipse detectionswere more consistent in these tests than in the Group A tests, clustered mainly at 4100 to4800 ppm, and with only one negative depth detection among them. The measurement ofthe eclipse on the target light curve 0-b did not plainly stand out from the rest; its depthusing χ was only slightly greater than that for 4-b, and while its φ me values were the closestto 0.5, they were not much closer than the values found for 1-b.The eclipse depths found at large phase offsets, however, tell us little about the biasthat the comparison set may contribute when combined with a real eclipse. When a planet’sorbital eccentricity is known from other measurements, as is the case with CoRoT-1b, thatinformation can be used to better rule out false eclipse detections. We expected the trueeclipse to be centered close to φ =0.5 (recall R09 measured 0 . +0 . − . , and the detectionsin other bands either had similar results or assumed a circular orbit), and thus tried runningadditional MCMCs with that parameter fixed. Table 8 lists these results for fixing φ me toboth 0.5 and 0.5022, and in each of these cases, we saw new trends. Since the routine couldno longer fit deep eclipses at phases far from 0.5, the eclipses measured in the light curvesfrom stars 1 through 4 were shallower than before. There also were no negative depthsmeasured, and the eclipse signal for light curve 3-b was nearly zero.The greatest depth measurements were now by a significant margin those for the targetlight curve 0-b, ranging from 4310 +700 − to 5780 +380 − ppm. These values are significantly higherthan in R09; however, there was a greater amount of noise in the comparison flux (compositeof the eight “Group B” stars), than in the optimal reference composite from stars 1 through 28 –4 (“Group A”), and the uniformly positive spurious detections in 1-b through 4-b suggestthat some of this depth is a bias in the comparison light curve. Conveniently, we can lookat these results for the other light curves with the same comparisons to determine what biasthey may be introducing.For each fixed φ me and information criterion, we estimated the Group B bias contributionwith a weighted average of the eclipse depths found for target stars 1 through 4. Theweighting factors were the same that were used for the composite comparison flux fromGroup A: 40% for 2-b, 22% for 3-b, and 19% each for 1-b and 4-b. Then to determine acorrected depth measurement for the real eclipse, we simply subtracted the bias measurementfrom the best-fit 0-b light curve result. The final, corrected D e values for each fixed-phase testare given (in ppm) in the last row of Table 8. These values end up in fairly close agreementwith one another as well as with the R09 value. In particular, the best-fit depths using χ arenearly identical (3210 +560 − ppm at φ me =0.5, 3240 +530 − ppm at φ me =0.5022), suggesting thatthe measured flux decrement is not strongly dependent on the exact placement of the mid-eclipse phase in these trials. The BIC results are less consistent with one another (2650 +760 − ppm at φ me =0.5, 3750 ±
620 ppm at φ me =0.5022), but remain in agreement within theirerrors of the χ results, the values in Section 6.1, and the original measurement in R09.These additional test results served to reinforce that the measurements of CoRoT-1b arethe most consistent in our photometric dataset; the eclipse was found at close to the samedepth and phase placement by a number of different means. The red noise was eliminatedor nearly eliminated in many of the residuals, indicating that the process did a good job ofremoving trends.
7. Summary & Discussion
Much of our understanding of exoplanet atmospheres is based on a small number ofemission detections, making it crucial to know how accurate those observations are. Espe-cially when using ground-based instruments, correlated “red” noise complicates obtaininga reliable measurement. Measurements subject to red noise require an optimization of theobservations to limit the systematic effects, and then the identification, modeling, and re-moval of the trends from the resulting light curves. Simultaneously considering the eclipseshape and correlation trends together in a single model often requires a large number ofparameters to be optimized, making use of statistical tools like the Bayesian InformationCriterion (BIC) and a Markov Chain Monte Carlo (MCMC) algorithm.To test the reliability of commonly used analysis methods, we created a sample of 29 –hundreds of test eclipse light curves with both synthetic and real noise and ran them throughthe processes we developed for modeling (the combination of the eclipse signal and trends)and parameter optimization (MCMC). To simulate red noise, we used both simple sinusoidalpatterns and stochastic Brownian noise, observing similar results with each. Another concernregarding the synthetic models was that the noise amplitude was time-invariant, whereasin actual data sets the changes in airmass or sky brightness, or instrumental changes likedefocusing will change the noise level over the course of the observation. This assumptionof stationarity is used in all of the current modeling approaches, but nevertheless may leadto an underestimation of the effects of real noise. A selection of these data sets are madeavailable in Appendix A for testing other analysis methods. From our tests and results, wedetermined some lessons for both the observation and analysis process.In the case where only white noise is present in the light curves, our tests find for eachsampling rate, baseline, and information criterion a critical input depth dividing the results.Eclipse signals with input depths greater than this value are generally recovered within 10%without any systematic biases, while below it the eclipse depth measurements differ from theinput depths by more than their measured uncertainties. This critical depth can be foundby beginning with synthetic eclipses of a great depth relative to the white noise level, andtesting lesser depths until they are no longer measured within the desired accuracy. We findthat the sampling rate and baseline are both important considerations: a longer baselineand a faster cadence improve the limits to which one can make a successful measurementand reduce the dispersion of the errors.When red noise is present in the light curve, the dispersion of results is greater and thecritical depth is reached at a higher input value; more importantly, we find that several ofthe series exhibit systematic biases. Whereas in the white noise tests, the discrepancy be-tween the input and measured eclipse depths and mid-eclipse phases was randomly positiveor negative, structure in the red noise often results in measured eclipses that are consistentlydeeper or shallower than the inputs, or centered at a particular phase offset. The uncer-tainties reported by the MCMC also may underestimate the depth errors by a factor of twoto four. We find that baselines that significantly exceed the duration of the eclipse are stillusually preferable; however, in certain cases the structure of the noise can negatively affectthe best-fit solutions. Including all available baseline may not be optimal for real datasetseither, particularly if some of it is noisier (e.g. taken at high airmass or different weatherconditions). In cases when this is possible, additional tests should be done with a clippedbaseline.Examining datasets with real noise, we find unexplained or under-modeled systematicsto be more prevalent still, which can lead to large biases or even completely spurious de- 30 –tections. Even with a detailed model to account for a wide variety of trends in the lightcurves, we find that discrepant results between input and output signals appear when usingdifferent field stars for photometric comparison. For this reason, confirming a measurementusing different sets of comparison stars is advised. Much larger biases can accumulate onshallow eclipses than on deeper ones, and each star shows its own systematics and biasesthat sometimes cannot be removed by de-correlation techniques. The actual depth biasescan be anywhere from about the same magnitude as the measured uncertainties to three tofour times as large, reinforcing the common attitude of preferring substantially more thana 3- σ detection. A significant finding is that spurious eclipse detections measured on fieldstars do not follow a Gaussian distribution, but rather cluster around the level of the lightcurve’s red noise, with depths in both the positive and negative direction.This uncertainty of what is a real eclipse signal and what is a false detection fromred noise poses a serious problem to the measurement of eclipses, and further tests shouldbe used to determine the nature of the biases and verify the detections. One way to doso is to design a control set of comparison stars and determine a bias level by combiningthe spurious detections of other stars in place of the target. If the expected mid-eclipsetime (i.e. eccentricity) is well-known, this value can be fixed to better determine the noisestructure’s contribution to the measurement of the actual eclipse depth. The establishedbias level can then be corrected for in the eclipse measurement of the actual target to verifyits depth.These additional tests and verification processes are particularly advisable in the caseof low-confidence ( < σ ) detections, i.e. most of the ground-based measurements that havebeen published. In order to claim a confident detection, we recommend the following criteriabe met: • A sufficient level of significance on the measured signal (commonly adopted as ∼ σ ),and agreement using different comparison stars and/or methods • Mid-eclipse phase consistent with other studies of the eccentricity • A relatively small amount of red noise in the residuals • Spurious eclipse signals with similar confidence levels and phase placements as thetarget not seen in the light curves of other nearby starsThis set of criteria is stricter than adopted in the literature, but ensures more reliabledetections and depth measurements. We provide as an example the tests performed on theCoRoT-1 data from R09. Whereas the tests on other stars in the field produce a distribution 31 –of eclipse detections with positive and negative depths and a range of mid-eclipse phasessuggesting strongly nonzero eccentricities, the measurements of the eclipse in the CoRoT-1light curve agree closely in depth and with a mid-eclipse phase close to 0.5. Testing differentlight curves for the target using different reference stars, the results are not strongly affectedas long as at least two of the bright, nearby comparison stars are used. When using a separateset of less-ideal reference stars, a deeper eclipse is measured, but this bias is estimated andremoved effectively by finding the depth measured in other stars with the same referenceset, fixed to the same mid-eclipse phase. Through this variety of tests, we establish the R09detection to be robust and consistent.Given the difficulties and complications of measuring secondary eclipse signals in typicalground-based data, we caution that both published and working detections be checked toensure that they represent an actual eclipse measurement and are not the products of unre-solved biases. If the published eclipse depths (and, subsequently, planetary thermal emissionlevels) are adjusted, the best-fitting models and conclusions about atmospheric composi-tion and structure can be dramatically altered. We recommend that these analysis routinesand tests be applied to current and future eclipse light curves, as a means to evaluate andminimize the presence of systematics in the reported eclipse signals.This research has been supported by the National Science Foundation through grantAST-0908278. Critical support was provided by the Space Telescope Science Institute Di-rector’s Discretionary Research Fund D0101.90131. We thank the scientific editor and theanonymous referee for insightful comments and suggestions, and Veselin Kostov for helpfuldiscussion.
REFERENCES
Alonso, R., Guillot, T., Mazeh, T., Aigrain, S., Alapini, A., Barge, P., Hatzes, A., & Pont,F. 2009, A&A, 501, L23Anderson, D. R., Gillon, M., Maxted, P. F. L., Barman, T. S., Collier Cameron, A., Hellier,C., Queloz, D., Smalley, B., & Triaud, A. H. M. J. 2010, A&A, 513, L3+Barge, P., Baglin, A., Auvergne, M., Rauer, H., L´eger, A., Schneider, J., Pont, F., Aigrain,S., Almenara, J.-M., Alonso, R., Barbieri, M., Bord´e, P., Bouchy, F., Deeg, H. J., LaReza, D., Deleuil, M., Dvorak, R., Erikson, A., Fridlund, M., Gillon, M., Gondoin, P.,Guillot, T., Hatzes, A., Hebrard, G., Jorda, L., Kabath, P., Lammer, H., Llebaria, A., 32 –Loeillet, B., Magain, P., Mazeh, T., Moutou, C., Ollivier, M., P¨atzold, M., Queloz,D., Rouan, D., Shporer, A., & Wuchterl, G. 2008, A&A, 482, L17Bean, J. L. 2009, A&A, 506, 369Borucki, W. J., Koch, D., Jenkins, J., Sasselov, D., Gilliland, R., Batalha, N., Latham,D. W., Caldwell, D., Basri, G., Brown, T., Christensen-Dalsgaard, J., Cochran, W. D.,DeVore, E., Dunham, E., Dupree, A. K., Gautier, T., Geary, J., Gould, A., Howell,S., Kjeldsen, H., Lissauer, J., Marcy, G., Meibom, S., Morrison, D., & Tarter, J. 2009,Science, 325, 709Burrows, A., Budaj, J., & Hubeny, I. 2008, ApJ, 678, 1436C´aceres, C., Ivanov, V. D., Minniti, D., Burrows, A., Selman, F., Melo, C., Naef, D., Mason,E., & Pietrzynski, G. 2011, A&A, 530, A5Caceres, C., Ivanov, V. D., Minniti, D., Burrows, A., Selman, F., Melo, C., Naef, D., Mason,E., & Pietrzynski, G. 2011, ArXiv e-printsCarter, J. A., Rappaport, S., & Fabrycky, D. 2011, ApJ, 728, 139Carter, J. A. & Winn, J. N. 2009, ApJ, 704, 51Charbonneau, D., Allen, L. E., Megeath, S. T., Torres, G., Alonso, R., Brown, T. M.,Gilliland, R. L., Latham, D. W., Mandushev, G., O’Donovan, F. T., & Sozzetti, A.2005, ApJ, 626, 523Charbonneau, D., Knutson, H. A., Barman, T., Allen, L. E., Mayor, M., Megeath, S. T.,Queloz, D., & Udry, S. 2008, ApJ, 686, 1341Coughlin, J. L. & L´opez-Morales, M. 2012, AJ, 143, 39Croll, B., Albert, L., Lafreniere, D., Jayawardhana, R., & Fortney, J. J. 2010a, ApJ, 717,1084Croll, B., Jayawardhana, R., Fortney, J. J., Lafreni`ere, D., & Albert, L. 2010b, ApJ, 718,920Croll, B., Lafreniere, D., Albert, L., Jayawardhana, R., Fortney, J. J., & Murray, N. 2011,AJ, 141, 30Crossfield, I. J. M., Barman, T., Hansen, B. M. S., Tanaka, I., & Kodama, T. 2012, ApJ,760, 140 33 –de Mooij, E. J. W., de Kok, R. J., Nefs, S. V., & Snellen, I. A. G. 2011, A&A, 528, A49+de Mooij, E. J. W. & Snellen, I. A. G. 2009, A&A, 493, L35Deming, D., Harrington, J., Seager, S., & Richardson, L. J. 2006, ApJ, 644, 560Deming, D., Seager, S., Richardson, L. J., & Harrington, J. 2005, Nature, 434, 740Fortney, J. J., Lodders, K., Marley, M. S., & Freedman, R. S. 2008, ApJ, 678, 1419Gibson, N. P., Aigrain, S., Pollacco, D. L., Barros, S. C. C., Hebb, L., Hrudkov´a, M.,Simpson, E. K., Skillen, I., & West, R. 2010, MNRAS, 404, L114Gibson, N. P., Aigrain, S., Roberts, S., Evans, T. M., Osborne, M., & Pont, F. 2012, MNRAS,419, 2683Gilliland, R. L., Chaplin, W. J., Dunham, E. W., Argabright, V. S., Borucki, W. J., Basri,G., Bryson, S. T., Buzasi, D. L., Caldwell, D. A., Elsworth, Y. P., Jenkins, J. M.,Koch, D. G., Kolodziejczak, J., Miglio, A., van Cleve, J., Walkowicz, L. M., & Welsh,W. F. 2011, ApJS, 197, 6Gillon, M., Demory, B., Triaud, A. H. M. J., Barman, T., Hebb, L., Montalb´an, J., Maxted,P. F. L., Queloz, D., Deleuil, M., & Magain, P. 2009, A&A, 506, 359Grillmair, C. J., Burrows, A., Charbonneau, D., Armus, L., Stauffer, J., Meadows, V., vanCleve, J., von Braun, K., & Levine, D. 2008, Nature, 456, 767Harrington, J., Luszcz, S., Seager, S., Deming, D., & Richardson, L. J. 2007, Nature, 447,691Knutson, H. A., Charbonneau, D., Cowan, N. B., Fortney, J. J., Showman, A. P., Agol, E.,Henry, G. W., Everett, M. E., & Allen, L. E. 2009, ApJ, 690, 822L´opez-Morales, M., Coughlin, J. L., Sing, D. K., Burrows, A., Apai, D., Rogers, J. C.,Spiegel, D. S., & Adams, E. R. 2010, ApJ, 716, L36L´opez-Morales, M. & Seager, S. 2007, ApJ, 667, L191Machalek, P., McCullough, P. R., Burke, C. J., Valenti, J. A., Burrows, A., & Hora, J. L.2008, ApJ, 684, 1427Mandel, K. & Agol, E. 2002, ApJ, 580, L171Pont, F., Zucker, S., & Queloz, D. 2006, MNRAS, 373, 231 34 –Rogers, J. C., Apai, D., L´opez-Morales, M., Sing, D. K., & Burrows, A. 2009, ApJ, 707,1707Sing, D. K. & L´opez-Morales, M. 2009, A&A, 493, L31Smith, A. M. S., Anderson, D. R., Skillen, I., Collier Cameron, A., & Smalley, B. 2011,MNRAS, 416, 2096Snellen, I. A. G., de Mooij, E. J. W., & Albrecht, S. 2009, Nature, 459, 543Swain, M. R., Vasisht, G., Tinetti, G., Bouwman, J., Chen, P., Yung, Y., Deming, D., &Deroo, P. 2009, ApJ, 690, L114Tamuz, O., Mazeh, T., & Zucker, S. 2005, MNRAS, 356, 1466Waldmann, I. P. 2011, ArXiv e-printsZhao, M., Milburn, J., Barman, T., Hinkley, S., Swain, M. R., Wright, J., & Monnier, J. D.2012a, ApJ, 748, L8Zhao, M., Monnier, J. D., Swain, M. R., Barman, T., & Hinkley, S. 2012b, ApJ, 744, 122
This preprint was prepared with the AAS L A TEX macros v5.2.
35 –Table 1: Characteristics of Exoplanet X and CoRoT-1b.
References. (1) See Section 4.1(2) Barge et al. 2008 (3) Bean 2009Parameter Exoplanet X CoRoT–1bStar Radius R ∗ [ R Sun ] 1.0 0.95Planet Radius R P [ R Jup ] 1.5 1.49Period P [days] 1.041667 1.508956Semimajor Axis a [AU] 0.01978 0.02540Inclination i [degrees] 88.0 85.1Eccentricity e T [minutes] 130.15 150.32 T [minutes] 94.13 106.56Reference 1 2, 3Table 2: Parameters Used for Light Curve ModelsModel Parameter SymbolEclipse Depth D e Mid-Eclipse Phase φ me Baseline Flux Level F b Phase Slope m φ Airmass Slope m z FWHM Slope m w X-displacement Slope m x Y-displacement Slope m y Sky Brightness Slope m s Table 3: Results from Two Independent MCMC Methods σ r (ppm) Obs. Win Samp. Rate Eclipse Depth D e (ppm) Mid-eclipse phase φ me (pattern) (minutes) (points/hour) Input Fit (A) Fit (B) Input Fit (A) Fit (B)0 240 25 1300 1318 +141 − ±
210 0.500 0.50093 +0 . − . ± +79 − ±
99 0.500 0.50123 +0 . − . ± +31 − ±
65 0.500 0.49914 +0 . − . ± +130 − ±
119 0.500 0.50176 +0 . − . ± +88 − ±
120 0.500 0.49940 +0 . − . ± +72 − ±
119 0.500 0.50207 +0 . − . ± +100 − ±
120 0.500 0.49929 +0 . − . ± +51 − ±
66 0.500 0.49994 +0 . − . ± σ w (ppm) 1000Red Noise Pattern (See Fig. 1) 0, 1, 2, 3Red Noise Level σ r (ppm) 200, 500, 1000Observing Window (minutes) 160, 240, 318, 506Input Mid-Eclipse Phase φ me .475, .500Sampling Rate (points / hour) 25, 75, 250Input Eclipse Depth D e (ppm) 50, 100, 150, 200, 250, 300, 400, 500,600, 800, 1000, 1300, 1600, 2000, 2300,2600, 3000, 3300, 3600, 4000, 4500, 5000Information Criterion χ , BIC Table 5: Example Results of MCMC TrialsMCMC Starting Point Recovered Values ResidualsTrial D e (ppm) φ me D e (ppm) φ me F b χ RMS (ppm)1-1 -744 0.50311 4390 +24 − . +0 . − . . +0 . − . +31 − . +0 . − . . +0 . − . +47 − . +0 . − . . +0 . − . +65 − . +0 . − . . +0 . − . +58 − . +0 . − . . +0 . − . +24 − . +0 . − . . +0 . − . +53 − . +0 . − . . +0 . − . +35 − . +0 . − . . +0 . − . +30 − . +0 . − . . +0 . − . − +98 − . +0 . − . . +0 . − . − +154 − . +0 . − . . +0 . − . − +113 − . +0 . − . . +0 . − . +51 − . +0 . − . . +0 . − . +22 − . +0 . − . . +0 . − . +25 − . +0 . − . . +0 . − . +51 − . +0 . − . Table 6: Synthetic Series Critical Depths and DispersionsObs. Sampling σ r =0 σ r =200 ppm σ r =500 ppm σ r =1000 ppmWindow Rate Info. D e,crit Disp. D e,crit Disp. D e,crit Disp. D e,crit Disp.(minutes) (pts./hr.) Crit. (ppm) (%) (ppm) (%) (ppm) (%) (ppm) (%)160 25 χ > χ > χ > > > χ
800 3.84 2000 5.59 2300 5.56 3000 5.94240 75 χ χ
800 2.41 600 4.01 2000 4.14 2300 8.59240 25 BIC 800 5.98 3300 4.78 1300 5.90 > χ χ
500 3.41 1300 1.96 500 3.38 1300 3.65318 250 χ
500 2.91 400 1.59 250 2.86 800 2.86318 25 BIC 1600 3.20 600 4.37 2600 5.78 1600 4.75318 75 BIC 800 4.62 500 3.75 800 4.37 600 3.44318 250 BIC 250 3.59 800 2.99 800 2.65 1000 2.92506 25 χ
800 4.82 800 4.72 2300 6.29 > χ
300 3.96 1300 4.83 3600 8.98 > χ
300 2.80 1600 4.75 3000 7.66 > > > > Table 7: Real Noise Data Sets and ResultsLight Curve 0-a 1-a 4-a 0-b 1-b 4-bTarget CoRoT-1 Star 1 Star 4 CoRoT-1 Star 1 Star 4Comp. Stars 1,2,3,4 2,3,4,13 1,2,3,8,13 Group B Group B Group B σ w (all) 7318 4582 4550 8265 8982 9283 σ r (all) 1225 1535 1691 2257 1451 5437 σ w (out) 6608 4462 4378 7446 8451 9329 σ r (out) 171 1609 1903 1527 1878 3688 Best-fit Model Results ( χ ) D e (ppm) 3237 +327 − − +513 − +495 − +375 − +436 − +542 − φ me . +0 . − . . +0 . − . . +0 . − . . +0 . − . . +0 . − . . +0 . − . Nonzero params 15 16 16 16 16 16 σ w (residuals) 6544 5660 5505 7177 7951 7773 σ r (residuals) 125 0 606 0 0 646 Best-fit Model Results (BIC) D e (ppm) 3150 +383 − − +829 − +354 − +434 − +311 − +457 − φ me . +0 . − . . +0 . − . . +0 . − . . +0 . − . . +0 . − . . +0 . − . Nonzero params 15 10 9 12 8 8 σ w (residuals) 6411 5617 5498 7280 8056 7944 σ r (residuals) 0 0 781 0 335 522 Table 8: Independent Reference Set (Group B) Tests and Corrections φ me fixed to 0.5000 φ me fixed to 0.5022 Target Best-Fit D e ( χ ) Best-Fit D e (BIC) Best-Fit D e ( χ ) Best-Fit D e (BIC)CoRoT-1 4992 +430 − +699 − +426 − +375 − Star 1 3861 +332 − +280 − +452 − +306 − Star 2 1618 +268 − +340 − +290 − +696 − Star 3 101 +462 − +36 − +427 − +24 − Star 4 2003 +427 − +322 − +460 − +527 − Avg. 1-4 1784 +362 − +285 − +391 − +514 − Corrected 3208 +562 − +755 − +528 − +622 −
42 –Table 9: Benchmark Data Set AttributesName Type σ w σ r Obs. Win Samp. Rate D e φ me S1 Synth 1000 0 240 75 4000 0.500S2 Synth 1000 0 240 75 3300 0.475S3 Synth 1000 0 160 75 1000 0.500S4 Synth 1000 0 506 75 1600 0.500S5 Synth 1000 200 240 75 2000 0.500S6 Synth 1000 1000 240 250 4000 0.500S7 Synth 1000 200 318 25 2600 0.500R1 Star 2 6188 2239 293 141 4000 0.500R2 Star 0 7318 1225 293 141 1500 0.500R3 Star 0 7318 1225 293 141 0 0.500Table 10: Data Set S1: Synthetic Noise (Full Table Online)Phase O.Flux Error E.Shape N.Base W.Noise R.Noise0.42000 1.00452 0.00099 1.0040 1.00052 0.00052 0.000000.42053 1.00393 0.00099 1.0040 0.99993 -0.00007 0.000000.42107 1.00404 0.00099 1.0040 1.00004 0.00004 0.000000.42161 1.00460 0.00099 1.0040 1.00060 0.00060 0.000000.42214 1.00330 0.00099 1.0040 0.99930 -0.00070 0.00000... ... ... ... ... ... ...
Table 11: Data Set R1: Real Noise (Full Table Online)Phase O.Flux Error E.Shape N.Base O.Pos Airm. FWHM x Disp y Disp Sky0.43242 1.00245 0.00418 1.0040 0.99846 0 1.4050 10.6442 4.9667 -1.8051 -38.1740.43279 1.00067 0.00405 1.0040 0.99668 1 1.4023 10.6731 5.8475 -2.4504 35.4850.43294 1.00390 0.00411 1.0040 0.99990 1 1.4010 10.6792 5.8099 -2.4756 14.2240.43315 0.99956 0.00415 1.0040 0.99557 0 1.3995 10.6743 5.3403 -2.8960 -7.6870.43331 0.99974 0.00415 1.0040 0.99576 0 1.3983 10.6720 4.6221 -2.7726 -2.658... ... ... ... ... ... ... ... ... ... ... 44 –
A. Benchmark Data Sets
Given the biases and complications in detecting exoplanet eclipses, and the diversity ofroutines developed to make these measurements, there is a strong need for common referencedatasets containing realistic noise levels and patterns against which different teams can testthe performance of their routines. Such datasets are not commonly available; thus, weprovide here a set of models to serve this purpose for the exoplanet communityWe selected a number of representative light curves from our studies that can be usedas benchmark testing for any routines designed to fit and measure eclipses: ten datasets thatspan the full breadth of our tests, both the synthetic noise models (white and red noise) andthe real data systematics (from the CoRoT-1 Ks-band photometry). Various baselines from160 to 506 minutes and sampling rates from 25 to 250 points per hour are featured. Thedepth and central phase of the eclipses that are input are provided, so that any team cancheck the results from their analysis routine against them.Table 9 shows the characteristics of each dataset: the type of noise (synthetic or fromone of the real stars’ differential light curves), white and red noise amplitude σ w and σ r (inppm), observing window (in minutes), sampling rate (in points per hour), and the inputeclipse signal’s depth (in ppm) and mid-eclipse phase. For the synthetic datasets, σ w and σ r are the input values that we started with; for the datasets with real noise, these values weremeasured by fitting to Equation 1. These datasets are made available in machine-readableASCII format on the Vizier Data Service. The first seven sets, labeled S1 through S7, are synthetic light curves constructed inthe manner described in Section 4.1. Sets S1 through S4 contain white noise only, while S5through S7 also include red noise components (S5 and S6 use Pattern 3, S7 uses Pattern 4;see Figure 1). Sets S1 and S6 are also the data shown in panels (b) and (c), respectively, ofFigure 3. The first several lines of set S1 are shown in Table 10 as an example to illustratethe format and data types. The first three columns show the orbital phase at which eachobservation occurs, the observed flux, and the flux uncertainty. Only these three columnsare necessary for testing purposes; the remaining four columns simply separate the differentcomponents of the construction: the noiseless eclipse shape, the combined noise base, andthe white and red noise components. A greater number of significant digits are given in theonline tables.The three remaining datasets R1, R2, and R3 use the noise and systematics from the For the time being, they can be found at ;this is to be replaced later with Vizier url and citation, when the paper is accepted.
45 –Fig. 1.— Five red noise models we used for the synthetic light curve tests. Patterns 1,2, and 3 were constructed by adding together four sinusoidal waves of different periods,amplitudes, and phase angles. Patterns 4 and 5 were constructed by integrating a randomnormal distribution to form Brownian noise. Each selected noise model was normalized tothe desired red noise amplitude, and then combined with white noise and a simulated eclipsecurve to form a synthetic light curve. 46 –Fig. 2.— A comparison of the results from each of the two independently developed MCMCanalysis processes we tested on eight selected synthetic light curves. Identical sets of lightcurves, covering a range of eclipse depth, baseline length, sampling rate, and red noiseamplitude and pattern, were run through each analysis, in order to compare the results toeach other and to the input values. The x-positions of each result are offset for purposes ofclarity. The results from both MCMCs were consistent with each other, and are also shownin Table 3 along with a description of the light curves. 47 –Fig. 3.— Examples of construction of eclipse light curves: a 4000-ppm deep eclipse with nonoise at all (a), and with each of the three types of noise models (shown in red): White noiseonly (b), white noise plus red noise (c), and real noise systematics from NICFPS Ks-bandphotometry (d). For each case, we combined the eclipse shape and noise into a light curveto model with the MCMC and recover the best-fit depth and mid-eclipse phase. 48 –Fig. 4.— The behavior of the best-fit recovered eclipse signals as a function of input depth,for the tests with white noise only. The gray vertical bar shows the white noise level of σ w =1000 ppm. The accuracy is expressed as a percent difference between the input andrecovered depth. Each row of plots represents a different baseline length, and the colorsshow the different sampling rates. The plots in the left column use the χ informationcriterion, while those in the right column use the BIC. The x-positions of the symbols areslightly offset in order to distinguish the data points and error bars. Following each series ofpoints from left to right, the critical depth is defined as the point at which the depth percenterrors leave the ±
10% range for the second time. 49 –Fig. 5.— The behavior of the best-fit recovered eclipse signals as a function of input depth,for the tests with white noise and red noise (Pattern 3) added. The amplitudes of these noisepatterns ( σ w =1000 ppm, σ r =500 ppm) are shown by the vertical gray and red bars. Theaccuracy is expressed as a percent difference between the input and recovered depth. Eachrow represents a different baseline length, and the colors red, green, and blue represent thedifferent sampling rates. The plots in the left column use the χ information criterion, whilethose in the right column use the BIC. The x-positions of the symbols are slightly offset inorder to distinguish the data points and error bars. 50 –Fig. 6.— Comparison of the two different types of red noise models we used in these analyses:a sum of sinusoids (Pattern 3), and two Brownian random walks (Patterns 4 and 5, seeFigure 1), with amplitudes of σ w =1000 ppm, σ r =500 ppm, shown by the vertical gray andred bars. These tests all used the 240-minute observing window with a cadence of 250 pointsper hour. The accuracy is expressed as a percent difference between the input and recovereddepth. 51 –Fig. 7.— The light curve and trends from the real dataset (CoRoT-1b field in Ks). Thetop set of points (red and blue; each represents one of the two offset positions) shows thedifferential flux between Comparison Star 1 and the others in the field, to be used as realnoise for the tests in Sections 5 and 6. The points are in 15-minute bins for clarity. Below arecurves (normalized, not to scale with one another) showing the other parameters measuredto look for correlation. 52 –Fig. 8.— The depth recovery results from the combination of real noise and synthetic eclipsesof varying input depths. The dark green squares represent the results using χ , while thepink diamond points represent the results using the BIC. The measured white and red noiseamplitudes are indicated by the vertical gray and red bars, respectively. Each of the top sixpanels uses a different star as the target, while the The bottom row shows results from twolight curves – one from a well-behaved series (Star 2), and one from a poorly-behaved series(Star 4) – clipped to only 50 minutes before and after the eclipse. 53 –Fig. 9.— Comparison of the results from using the linear and quadratic models for depen-dence on x- and y-displacement, for both χ (top panel) and BIC (bottom panel). Each ofthese are performed using the Star 2 light curve, with synthetic eclipses of varying depthsinput. The measured white and red noise amplitudes are indicated by the vertical gray andred bars, respectively. 54 –photometry of the CoRoT-1 field in Ks. Set R1 treats comparison star 2 as the target andadds a 4000-ppm eclipse, modeled using the characteristics of “Exoplanet X” from Table 1.Refer to that table and Section 4.1 for all the other parameters that are needed to reconstructthe eclipse signal in light curve. This is the light curve shown in panel (d) of Figure 3; itis also equivalent to the eclipseless light curve 2-a described in Section 5.1 with a 4000-ppmeclipse shape added. Sets R2 and R3 use the light curve from the actual target star, with setR2 having an extra 1500-ppm eclipse added on top of the true signal. If a routine is workingproperly it should find an eclipse 1500 ppm deeper for R2 than for R3. Table 11 shows thefirst several lines of set R1. The data to fit is again in the first three columns, but the realnoise sets include the other variables – offset position, z , w , x , y , and ss