A new approach to observational cosmology using the scattering transform
DD RAFT VERSION J UNE
16, 2020Typeset using L A TEX twocolumn style in AASTeX63
A new approach to observational cosmologyusing the scattering transform
Sihao Cheng ( 程 思 浩 ), Yuan-Sen Ting ( 丁 源 森 ), Brice M´enard ( 梅 卜 瑞 ), and Joan Bruna ( 步 岩岸 ) Department of Physics and Astronomy, The Johns Hopkins University, 3400 N Charles Street, Baltimore, MD 21218, USA Institute for Advanced Study, Princeton, NJ 08540, USA Department of Astrophysical Sciences, Princeton University, Princeton, NJ 08544, USA Observatories of the Carnegie Institution of Washington, 813 Santa Barbara Street, Pasadena, CA 91101, USA Research School of Astronomy & Astrophysics, Australian National University, Cotter Rd., Weston, ACT 2611, Australia Courant Institute of Mathematical Sciences, New York University, New York Center for Data Science, New York University
Abstract
Parameter estimation with non-Gaussian stochastic fields is a common challenge in astrophysics and cosmol-ogy. In this paper we advocate performing this task using the scattering transform, a statistical tool rooted in themathematical properties of convolutional neural nets. This estimator can characterize a complex field, withoutexplicitly computing higher-order statistics, thus avoiding the high variance and dimensionality problems. Itgenerates a compact set of coefficients which can be used as robust summary statistics for non-Gaussian infor-mation. It is especially suited for fields presenting localized structures and hierarchical clustering, such as thecosmological density field.To demonstrate its power, we apply this estimator to the cosmological parameter inference problem in thecontext of weak lensing. Using simulated convergence maps with realistic noise, the scattering transformoutperforms the power spectrum and peak counts, and is on par with the state-of-the-art CNN. It retains theadvantages of traditional statistical descriptors (it does not require any training nor tuning), has provablestability properties, allows to check for systematics, and importantly, the scattering coefficients are inter-pretable. It is a powerful and attractive estimator for observational cosmology and, in general, the study ofphysically-motivated fields.
1. Introduction
Non-Gaussian fields are ubiquitous in astrophysics. Ana-lyzing them is challenging as the dimensionality of their de-scription can be arbitrarily high. In addition, there is usuallylittle guidance on which statistical estimator will be most ap-propriate for parameter inference. In this paper, we advocateusing a novel approach, called the scattering transform (Mal-lat 2012) for the analysis of such fields and, in particular,the matter distribution in the Universe, a highly studied non-gaussian field.In many fields of astrophysics and, in particular, in cosmol-ogy, extracting non-Gaussian information has been attemptedthrough N -point correlation functions. (e.g., Bernardeauet al. 2002; Takada & Jain 2003; Semboloni et al. 2011; Fuet al. 2014) and polyspectra, their Fourier equivalents (e.g.,Sefusatti et al. 2006). However, being high powers of the [email protected] input field, these statistics suffer from an increasing varianceand are not robust to outliers in real data, making them gradu-ally less informative (Welling 2005). In addition, the numberof configurations to consider for N -point functions explodeswith the number of points used. As a result, information ishighly diluted among coefficients, and it becomes a real chal-lenge to efficiently extract information with N -point corre-lation functions. Other methods, including using topologi-cal properties such as Minkowski functionals (Mecke et al.1994; Hikage et al. 2003; Shirasaki & Yoshida 2014; Kra-tochvil et al. 2012), and using biasing properties such ascounts of clusters, peaks, and voids (Jain & Van Waerbeke2000; Marian et al. 2009; Kratochvil et al. 2010; Liu et al.2015a,b; Pisani et al. 2019), have also been considered. How-ever, in the cosmological context, these excursions into non-Gaussian signal analyses have had limited impact in improv-ing existing constraints on cosmological parameters so far.Recently, convolutional neural nets (CNNs, e.g., LeCunet al. 1998) have claimed supremacy in a wide variety ofapplications aimed at extracting information from complex a r X i v : . [ a s t r o - ph . C O ] J un C HENG , T
ING , M ´
ENARD & B
RUNA data. They have also shown promises to efficiently retrievecosmological information well beyond second-order statis-tics (e.g., Gupta et al. 2018; Ribli et al. 2019a,b). While thepotential of this method is enormous, it also comes with anumber of issues. To precisely and robustly estimate cos-mological parameters, CNNs require a large training set. Inaddition, when applied to real data, systematic errors not in-cluded in the training process of CNN can hardly get checkedand controlled, whereas for traditional statistics, a simple χ test can do so. As such, the use of CNNs in real data comeswith limitations regarding interpretability and validity.In this paper, we advocate using a different approach calledthe scattering transform to efficiently and robustly extractstatistical information from non-Gaussian fields . The setof operations performed by a scattering transform has closesimilarities with those built in CNNs, but the scattering trans-form does not require any training and, like traditional statis-tics, it generates coefficients with proved properties. It cantherefore hopefully overcome the aforementioned limitationsencountered with CNNs. In section 2, we introduce the scat-tering transform, present intuitive understanding of its coef-ficients, and visualize its key properties. In section 3 and 4,we demonstrate the power of the scattering transform to in-fer cosmological parameters ( Ω m and σ ) in the context ofweak lensing using simulated convergence maps. As we willshow, it outperforms the power spectrum and peak counts,and is on par with the state-of-the-art CNN. Finally, we com-ment on the attractive properties of the scattering transformin Section 5 and conclude in Section 6.
2. The scattering transform
In this section, we present the scattering transform and in-tuitive interpretations of its coefficients. The scattering trans-form generates a compact set of coefficients that capture sub-stantial non-Gaussian information beyond the power spec-trum. In contrast to N -point correlation functions, the scat-tering coefficients are all proportional to the input data, anddo not suffer from the increasing variance issue. Thus, thescattering coefficients, which form a representation of the in-put field, can be used to extract non-Gaussian informationefficiently and robustly. This is particularly attractive from adata analysis point of view. The scattering transform was originally proposed by Mal-lat (2012) as a tool for signal processing to extract informa-tion from high-dimensional data with provable stability prop-erties. The scattering coefficients can be calculated analyti- This work was done simultaneously and independently of that presentedin Allys et al. (2020), where the authors apply a different but related tech-nique, the wavelet phase harmonic, to slices of the matter density field. cally and possess many attractive properties including trans-lational invariance, non-expanding variance, and linearizeddependence on deformation (Mallat 2012). Interestingly, thescattering transform has also provided key insights into deci-phering the remarkable behavior and performance of CNNs.A counterintuive feature of CNNs is that the convolution,though restricting the flexibility of the neural net, dramati-cally boosts its performance on many types of data. In ad-dition, a given CNN architecture can often be re-purposedfor very different tasks. These facts suggest that a certainmathematical structure enables efficient information extrac-tion from a wide range of complex data. Understanding thisstructure may dramatically simplify the costly training pro-cess required when using neural networks.The scattering transform has been successfully used inmany areas, including audio signal processing (Anden &Mallat 2011, 2014), image classification (Bruna & Mallat2012), texture classification (Sifre & Mallat 2013), materialscience (Hirn et al. 2017; Eickenberg et al. 2018; Sinz et al.2020), multifractal analysis in turbulence and finance (Brunaet al. 2015) or graph-structured data (Gama et al. 2018). Sev-eral of these examples reached state-of-the-art performancecompared to the CNNs in use at the time. In astrophysics, apioneer application has been performed by Allys et al. (2019)to analyze the interstellar medium.The scattering transform can be used with two possiblegoals in mind: representing a specific realisation of a field(with a classification goal) or characterising the global sta-tistical properties of the field. The narratives in these tworegimes are slightly different (Mallat 2012; Bruna & Mallat2012). We will focus on the latter one, which is relevant tocosmological applications.
Here, we present the formulation of the scattering trans-form in the context of characterizing random fields (Mallat2012). We focus on the 2-dimensional case in this study,but it can be directly generalized to any other dimensionality.For clarity, we will attach the notation ( x, y ) for the spatialdependence of a field only when it is first introduced.To extract information from an input field, the scatteringtransform first generates a series of new fields by recursivelyapplying two operations: a wavelet convolution and a modu-lus. The expected values of all these fields are then defined asthe scattering coefficients and used to characterize statisticalproperties of the original field. This hierarchical structure,the use of convolution and modulus (a non-linear operator)are all elements found in the architecture of CNNs.Mathematically, given an input field I ( x, y ) , the scatter-ing transform generates a set of 1st-order fields I ( x, y ) byconvolving it with a family of wavelets ψ j,l ( x, y ) and then BSERVATIONAL COSMOLOGY WITH THE SCATTERING TRANSFORM Figure 1.
Illustration of the scattering transform on a weak lensing map. The azimuthal resolution is set to be L = 4. For clarity, we only showresults using wavelets with orientation indices l = 1 and l = 1, and several selected scale indices j and j . In the top left corner of each panel,we show the mean value of that field. They are the scattering coefficients ( S , S , S ) of the input field. For a convenient display, the bluenumbers are 10 times the coefficients derived from the lensing map. E.g., the S coefficient of this lensing map is actually 0.00267. The colorbar ranges for I , I , I fields are adjusted separately for better visualization. C HENG , T
ING , M ´
ENARD & B
RUNA
Figure 2.
Upper panel : profile of a Morlet wavelet ( j = 6, l = 0,image size 512 ×
512 pixels) in the real space and another one ( j =1, l = 1) in Fourier space. The center of the Fourier space representszero frequency. Lower panel : radial frequency profiles of a fam-ily of wavelets. Dilating/contracting (by factor of 2) and rotating(by π /L) one wavelet give the whole family of wavelets used in thescattering transform. taking the modulus: I ≡ | I (cid:63) ψ j ,l | , (1)where I represents a group of fields labelled by the waveletindex j , l . This process is illustrated in the first two rows ofFigure 1 with a weak lensing convergence field. Wavelets arelocalized oscillations and band-pass filters. Figure 2 showsthe profile of Morlet wavelets in real space and Fourier space.Morlet wavelets are used in our study and described in Ap-pendix A. In general, a family of wavelets covers the wholeFourier space. They all have the same shape but differentsizes and orientations, labelled by j and l respectively. Theycan all be generated through dilating and rotating a prototypewavelet. In the scattering transform, the convention is to usea dilation factor of 2, such that for a pixelized field, the sizeof a wavelet ψ j,l in the real space is around j pixels.Having created a set of 1st-order fields, one can then iteratethe same process to create 2nd-order fields I ( x, y ) : I ≡ | I (cid:63) ψ j ,l | = || I (cid:63) ψ j ,l | (cid:63) ψ j ,l | , (2)where I represents a group of fields labelled by the two setsof wavelet index j , l and j , l . An illustration of thesefirst two orders of scattering transform is shown in Figure 1.Higher-order fields can be created with further iterations, butwe will only use the scattering transform up to 2nd order in this study. We note that the iterations are not commutative,so maintaining the order of wavelets is important.If the input field I is homogeneous , then all the gener-ated fields I n remain homogeneous. Therefore, the expectedvalues of their intensity can be used as translation-invariantdescriptors of the input field: S ≡ (cid:104) I (cid:105) (3) S j ,l ≡ (cid:104) I j ,l (cid:105) = (cid:104) (cid:12)(cid:12) I (cid:63) ψ j ,l (cid:12)(cid:12) (cid:105) (4) S j ,l ,j ,l ≡ (cid:104) I j ,l ,j ,l (cid:105) = (cid:104) (cid:12)(cid:12)(cid:12)(cid:12) I (cid:63) ψ j ,l (cid:12)(cid:12) (cid:63) ψ j ,l (cid:12)(cid:12) (cid:105) . (5)These expected values S n are called the n th-order scatteringcoefficients. Due to homogeneity, these expected scatteringcoefficients can be estimated by taking the spatial average ofa single realization: ˆ S n = (cid:104) I n (cid:105) x,y , (6)where ˆ S n is an unbiased estimator of S n and (cid:104)(cid:105) x,y representsthe spatial average of a field. The number of scattering coefficients is determined by thenumber of wavelets used. Setting J different scales ( J can-not exceed the side length of the field) and L different ori-entations results in J × L different wavelets used in total. Ifall combinations of wavelets are used, then the total numberof coefficients at the n th-order will be J n L n . When consid-ering an isotropic field, which is the case of interest in cos-mology, the scattering coefficients can be further reduced. Toconstruct isotropic statistics, we simply average over all ori-entation indices, which reduces the number of coefficients byan order of L n and creates a more compact and robust set ofstatistical descriptors. We thus define our reduced scatteringcoefficients as s ≡ S (7) s j ≡ (cid:104) S j ,l (cid:105) l (8) s j ,j ≡ (cid:104) S j ,l ,j ,l (cid:105) l ,l , (9)where S n represents the standard scattering coefficients, s n represents our reduced coefficients, and (cid:104)(cid:105) l denotes an aver-age over orientation indices. The reduced coefficients s n canalso be understood as the expected value of some ‘reduced’fields (cid:104) I n (cid:105) l ,..,l n , which are the average of I n with differentwavelet orientation indices l , .., l n . We illustrate these ‘re-duced’ fields in Figure 3, where information is condensed, Following the convention in cosmology, we use the word “homogeneous”to mean “stationary”, the latter of which is more commonly used in math-ematics. To follow the convention in cosmology, we use slightly different notationsfrom Mallat (2012): we use S n to represent the expected values, whichcharacterize properties of a random field and which Mallat denotes as S n ;we use ˆ S n to represent S n ’s estimators calculated from spatial average,which Mallat directly denotes as S n . BSERVATIONAL COSMOLOGY WITH THE SCATTERING TRANSFORM J + J coefficients. As a result, probing the fullrange of scales for an image with 512 ×
512 pixels ( J = 8)yields in total 73 reduced scattering coefficients.In general, performing azimutal averages over all l and l leads to information loss. To preserve more isotropicinformation, one could include l − l as an index of thereduced 2nd-order scattering coefficients (Bruna & Mallat2012; Allys et al. 2019) or apply the scattering strategy againto rotation (Sifre & Mallat 2013). In the weak lensing studypresented below, however, this additional information is notimportant as the fields considered do not present substantialanisotropic structures. We do not take it into account as itdoes not improve the performance of our analysis.Having introduced the mathematical formulation, we willpresent in the next section some intuitive understanding ofthe key operations involved in the scattering transform. The core operation I → | I (cid:63) ψ j,l | employed by the scat-tering transform comprises two steps: a convolution by acomplex-valued wavelet and a modulus operation. In short,the wavelet convolution selects scales, and the modulus con-verts oscillations into their local strength.Let us discuss the wavelet convolution first. As a waveletis a band-pass filter, the wavelet convolution selects Fouriermodes around a central frequency and coarsely separates in-formation of different scales. Due to the locality of wavelets,which is related to their logarithmic spacing and widths inFourier space, the scattering coefficients are Lipschitz con-tinuous to deformation, meaning that similar fields differingby a small deformation are also similar in the representationformed by scattering coefficients (Mallat 2012), and there-fore the scattering characterization is a stable one. Fouriercoefficients, in contrast, are unstable to deformation in highfrequencies.Another key idea of the scattering transform is to generatefirst-order statistics, in contrast to higher-order moments andcorrelation functions, which multiply an increasing numberof field intensities and cause instability to outliers. Being alinear operator, the wavelet convolution certainly keeps the“first-order” property. However, for a homogeneous randomfield, convolution alone cannot extract information beyondthe mean of the original field (cid:104) I (cid:105) , because the expected valueoperator commutes with all linear transformations. Extract-ing more information requires non-linear operations. For ex-ample, N -point correlation functions use multiplication offield intensities as the non-linear operation. The scatteringtransform, on the other hand, employs the modulus operation, which is a natural choice to preserve the desired property ofworking with first-order statistics (Mallat 2012).As the modulus is taken in the real space and is non-linear,its behavior in Fourier space is not simple. Nevertheless, wecollected some intuitive understandings and present them inAppendix B for interested readers. The scale separation and phase removal in the scatteringtransform remind us of the power spectrum, which can be de-fined in a similar form to the 1st-order scattering coefficients S = (cid:104)| I (cid:63) ψ |(cid:105) , P ( (cid:126)k ) ∝ (cid:104)| I (cid:63) ψ (cid:48) | (cid:105) with ψ (cid:48) = e − i(cid:126)k · (cid:126)x . (10)The differences between the two estimators S and P ( k ) arejust the choice of the norm (L versus L ) and the convo-lution kernel ( ψ or ψ (cid:48) ). Therefore, the 1st-order scatteringcoefficients are qualitatively similar to the power spectrum.Both of them characterize the distribution of fluctuations, orthe strength of clustering, as a function of scale.However, in the case of the power spectrum, the convo-lution kernel (a sine function) is completely de-localized inreal space. Thus, the power spectrum’s version of I fields( | I (cid:63) ψ (cid:48) | ) lose all spatial information. In contrast, the useof localized wavelets in the scattering transform allows I to preserve spatial information, as shown in Figure 1 and 3.According to the analogy with the power spectrum, the meanof an I field characterizes the average amplitude of Fouriermodes selected by the wavelets, whereas the spatial distri-bution of fluctuations in I , missing in the power spectrumanalog, in turn encodes the phase interaction between thoseFourier modes. This information can be extracted by apply-ing the scattering operations once again, I → I = | I (cid:63) ψ | ,and then measuring the 2nd-order scattering coefficients S .Following the power spectrum analogy, S coefficients re-semble the power spectrum of I fields and measure cluster-ing properties on I . Because I fields highlight the regionswhere fluctuation around a scale is strong (or where cluster-ing happens around a scale), the 2nd-order coefficients canbe understood as measuring the clustering of structures high-lighted in I , i.e., the “clustering of (clustered) structures”.This leads to an interesting intuition: we need two pointsto describe the scale of one structure and an additional twopoints for another one. Therefore, the second-order scatter-ing coefficients S , which measure the clustering of sizedfeatures, include information up to the classical 4-point cor-relation function. In general, an n th-order scattering co-efficient S n will contain information up to n -point func-tion of the input field. This intuition is consistent with themathematical derivation by Mallat (2012). By this “hierar-chical clustering” design, the scattering transform expansionquickly includes information from higher-order statistics. C HENG , T
ING , M ´
ENARD & B
RUNA
Figure 3.
The scattering transform of three fields ( I ) with indistinguishable power spectra. We have averaged over angular indice l and l .Row 1 shows a realization of convergence maps in cosmology (Ω m , σ ) = (0.292, 0.835), row 2 shows cosmology (Ω m , σ ) = (0.566, 0.520),row 3 is for a Gaussian random field with the same (2D) power spectrum as row 1. It can be seen by eye that the intensity of the last column (cid:104) I (cid:105) l ,l , which corresponds to an s coefficient and measures the clustering strength of structures selected by I , is significantly different fromeach other, while their power spectra are indistinguishable.
3. Application to weak lensing cosmology
We now show that the scattering transform can be a power-ful tool in observational cosmology to extract non-Gaussianinformation from the matter density field. To illustrate thispoint, we consider an application with 2-dimensional fields:we show how well cosmological parameters can be con-strained using the scattering coefficients of weak lensing con-vergence maps κ ( (cid:126)θ ) or, equivalently, measurements of cos-mic shear. Being projections of the density field along theline-of-sight, these maps present an appreciable level of non-gaussianities on scales smaller than a few degrees, reflectingthe non-linear growth of matter fluctuations. For the nec-essary background on cosmology with gravitational lensing,we refer the reader to reviews (Kilbinger 2015; Mandelbaum2018).We explore the use of our reduced scattering coefficientson simulated weak lensing convergence maps to infer Ω m and σ and compare their performance with that of the powerspectrum. We also compare our results with that of a state-of-the-art CNNs by Ribli et al. (2019b) and peak count statistics. We use mock convergence maps generated by theColumbia Lensing team and described in Zorrilla Matillaet al. (2016) and Gupta et al. (2018). The maps are pro-duced through ray-tracing to redshift z = 1 in the output ofdark-matter-only N -body simulations for a set of Λ -CDMcosmologies. These cosmologies differ only in two param-eters: the present matter density relative to the critical den-sity Ω m , and a present normalization of the power spectrum σ . Other cosmological parameters are fixed: baryon density Ω b = 0.046, Hubble constant h = 0.72, scalar spectral index n s = 0.96, effective number of relativistic degrees of freedom( n eff = 3.04), and neutrino masses ( m ν = 0.0). The dark en-ergy density is set so that the universe is spatially flat, i.e., Ω Λ = − Ω m .For each cosmology, there are 512 convergence maps with3.5 × field of view, allowing us to sample cosmicvariance. These maps were also used by Ribli et al. (2019b).To compare our results to Ribli et al. (2019b), we use the http://columbialensing.org BSERVATIONAL COSMOLOGY WITH THE SCATTERING TRANSFORM pixelmaps to a 512 resolution with 0.41 arcmin per pixel. In practice, convergence or shear estimates are obtainedfrom measurements of galaxy shapes, with a level of noisethat depends on the galaxy ellipticity distribution and theirnumber density on the sky. To first order, background galax-ies used for shear measurements have a wide range of red-shifts and are not correlated. The noise can be well approx-imated as Gaussian white noise. Its contribution to the con-vergence maps can be modelled (van Waerbeke 2000) as σ noise = σ (cid:15) n g A pix , (11)where σ (cid:15) is the intrinsic variance of ellipticity of galaxies,which is taken to be 0.4 , n g is the number density of back-ground galaxies, A pix is the area per pixel, which is 0.1682arcmin . For some existing and on-going surveys such asCFHTLenS, KiDS450, and DES, n g is around 10 arcmin − (Kilbinger et al. 2013; Abbott et al. 2018); for some upcom-ing surveys we expect substantially higher densities: n g ∼
25 arcmin − for LSST , n g >
30 arcmin − for Euclid , and n g ∼ − for the planned WFIRST survey. forfuture optimal case, n g ∼
100 arcmin − .After adding noise, we also smooth the maps. As the powerof Gaussian white noise is distributed mostly at high frequen-cies, smoothing the convergence maps can help to increasethe signal-to-noise of specific estimators. By default, we donot smooth the noiseless maps, and we perform a σ = 1 ar-cmin (2.44 pixel) Gaussian smoothing on noisy maps. : For each 3.5 × conver-gence field in each cosmology, we apply the scattering trans-form up to 2nd order using the ‘kymatio’ python package (Andreux et al. 2020) and then calculate the reduced coef-ficients ( s , s , s ) as defined in Section 2.2. To probe theavailable range of scales, we set J = 8 and L = 4 in thescattering transform, i.e., we use 8 scales spaced logarith-mically with central wavelengths between 1.2 arcmin and 75arcmin and 4 azimuthal orientations, resulting in 32 differentwavelets used in total.By default, the ‘kymatio’ package only calculate the 2nd-order coefficients with j > j , because the coefficients with j ≤ j is mainly determined by the property of wavelets but https://sci.esa.int/euclid https://wfirst.gsfc.nasa.gov not the input field, as illustrated by the upper-right sketch ofFigure 4. The mathematical reasoning for this property canbe found in Appendix B. To demonstrate these coefficients’behavior, we modified the ‘kymatio’ code to calculate them,and show them together with the coefficients with j > j in Figure 4. Nevertheless, we checked that they do not con-tribute to constraining cosmological parameters, and there-fore in our inference analysis, we only use 2nd-order coef-ficients with j > j , which yields an even more compactset of 1 + 8 + 28 = 37 scattering coefficients used for ourcosmological inference. Power spectrum : For the same set of input fields, we alsocompute the power spectrum and peak count statistics us-ing the publicly available ‘LensTools’ python package (Petri2016). The power spectrum is calculated within 20 bins in therange 100 ≤ l ≤ Peak count : In our analysis, a peak is defined as a pixelwith higher convergence ( κ ) than its eight neighbors. Then,peaks are binned by their κ values and counted in each bin.We adopt a binning similar to that in Liu et al. (2015a). Weuse 20 bins in total, including 18 bins linearly spaced be-tween κ = –0.02 and 0.12, one bin for peaks below –0.02,and one bin above 0.12. For reference, κ = 0.12 correspondsto a significance of peak ν ≡ κ/σ noise around 7 when n g = 30.Although using more bins for very high peaks ( κ > s , s , and power spectra must be positive for a non-trivialfield, we consider their logarithm to better satisfy a multi-variate Gaussian likelihood. To perform the cosmological in-ference analysis with the three methods introduced above, weuse •
37 scattering coefficients •
20 power spectrum coefficients •
20 peak count coefficients. https://lenstools.readthedocs.io C HENG , T
ING , M ´
ENARD & B
RUNA
Figure 4.
Upper-left Panel : The fiducial cosmology (black) and two other cosmologies on the ( Ω m , σ ) plane. Upper-right Panel : Illustrationof reduced scattering coefficients s ( j ) and s ( j , j ) for a single j scale. Lower Panel : The power spectrum and scattering coefficients forthe three cosmologies in noiseless case. The first row presents coefficients in the fiducial cosmology, and the second row shows changes ofcoefficients ( ∆ coef.) relative to the fiducial one when we move to the two other cosmologies. Error bars and gray shaded regions show cosmicvariance, i.e., the intrinsic variability among realizations. The 1st-order scattering coefficients behave similarly to the power spectrum, whilethe 2nd-order scattering coefficients can break the Σ degeneracy, along which non-Gaussianity of weak lensing field changes.
4. Results
In this section, we examine the distribution and cosmo-logical sensitivity of scattering coefficients, and present theirconstraining power for two cosmological parameters, Ω m and σ . We show that the scattering coefficients provide sub-stantially more information than the power spectrum and ison a par with CNN. In Figure 4, we present the distributions of reduce scatter-ing transform in the noiseless case together with the powerspectrum. In the first row, we show the values for a fiducialcosmology that has the Planck cosmology of Ω m = 0.309and σ = 0.816 (Planck Collaboration et al. 2016). The ex-pected values of these descriptors are estimated by averaging over different realizations of a given cosmology. Error bars,which are the sample standard deviation of the realizations,represent the cosmic variance in this noiseless case. We cansee the similarity between the power spectrum and s coeffi-cients, as they have similar physical meanings (Section 2.4).We can also see the different behaviors of s coefficients for j < j and j > j , as discussed in Section 3.3.Then, we investigate the cosmological sensitivity of thepower spectrum and scattering coefficients. The power spec-trum is known to be mostly sensitive to one combination ofthe cosmological parameters, namely Σ ≡ σ (cid:18) Ω m . (cid:19) a , (12)with a around 0.6 (e.g., Kilbinger 2015), but can hardlydistinguish cosmologies with the same Σ , as illustrated in BSERVATIONAL COSMOLOGY WITH THE SCATTERING TRANSFORM Σ degeneracy. Gray ar-eas indicate cosmic variance of the fiducial cosmology. Asexpected, the 1st-order scattering coefficients show a cosmo-logical sensitivity similar to that of the power spectrum, be-cause both of them measure the strength of fluctuations as afunction of scale.The 2nd-order scattering coefficients, on the other hand,characterize the spatial distribution of sized fluctuations. Tomake the 2nd-order scattering coefficients less correlatedwith the 1st-order ones, here we present de-correlated 2nd-order coefficients s /s , as each s ( j , j ) is proportionalto the corresponding s ( j ) according to their definitions(Bruna et al. 2015). These s /s exhibit particularly highsensitivity to cosmological change along the Σ degeneracy.In addition, they are indifferent to the other direction of cos-mological change, which means they provide a piece of in-formation roughly orthogonal to that carried by the 1st-ordercoefficients s or the power spectrum. In noisy cases, thoughthe information from s /s is not orthogonal to s anymore,we have checked that they still provide substantial sensitivityalong the Σ degeneracy. Due to this additional sensitivity,the scattering transform can be used to better constrain cos-mological parameters than the power spectrum. We now present the cosmological constraints set by thescattering coefficients measured from a single 3.5 × field. For reference, we note that LSST will generate about2,000 times more data, leading to constraints about timestighter than the numbers presented below. In this study, weonly probe the constraints on Ω m and σ and leave the workof using scattering coefficients to constrain the dark energyequation of state parameter w or neutrino mass M ν to futurestudy. Cosmological inference is just another aspect of thecosmological sensitivity problem examined in the previoussubsection. The Fisher inference formalism we use in thisstudy is described in Appendix C.We first present results in the noiseless case. In Figure 5we demonstrate the 1 σ Fisher forecast of Ω m and σ usingall scattering coefficients (red ellipse) and power spectrum(gray ellipse). The scattering coefficients provide a dramat-ically tighter constraint than the power spectrum. We alsoshow a break-down of this constraining power into contri-butions from 1st-order (blue ellipse) and 2nd-order (orangeellipse) coefficients alone. As expected, the 1st-order coeffi-cients ( s ) and power spectrum set similar constraints. Theslight difference of ellipse orientation originates from the dif- Figure 5.
The 1 σ Fisher forecast of cosmological parameters froma 3.5 × noiseless convergence map with 0.41 arcmin /pixel resolution. The de-correlated 2nd-order scattering coefficients s /s provide critical information to break the Σ degeneracy alongwhich the power spectrum cannot distinguish, therefore drasticallyimprove the constraint. ference between the L and L norms used by the scatteringtransform and the power spectrum. The de-correlated 2nd-order scattering coefficients ( s /s ) provide a strong con-straint along the Σ degeneracy, consistent with our cosmo-logical sensitivity discussion in Section 4.1.We note that the 0th-order coefficient s (the mean of a3.5 × field) has strong correlation with other scatter-ing coefficients, which is a sign of being in the non-linearregime and having non-Gaussianity (e.g., Li et al. 2020).Therefore, although the expected value of s is identicallyzero in all cosmology, combining s with other coefficientshelps substantially to tighten the constraints on cosmologi-cal parameters. However, this piece of information may notscale as fast with the increasing field of view as small scaleinformation, because in real data each patch of 3.5 × fields on the sky are not independent.To be more quantitative, we compare different methods us-ing the reciprocal of the area of their 1 σ Fisher forecast el-lipses on the ( Ω m , σ ) plane as the figure of merit (FoM). Inthe noiseless case, combining all scattering coefficients ( s , s , s ) leads to a constraint that is 14 times tighter than thatof the power spectrum, 5 times tighter than peak count statis-tics, and 3.3 times tighter than the joint constraint from powerspectrum and peak count.0 C HENG , T
ING , M ´
ENARD & B
RUNA
Table 1.
Comparison of the constraining power for ( Ω m , σ ) between different methods, with a single 3.5 × convergence map.Methods Ω m – σ Figure of Merit a n g = 10 arcmin − n g = 30 arcmin − n g = 100 arcmin − noiseless noiseless (no smoothing)scattering transform: s + s + s
50 140 329 1053 3367scattering transform: s + s
21 55 133 492 565scattering transform: s + s
39 91 181 446 1720power spectrum P( l )
20 40 67 104 253peak count 30 89 162 170 667CNN (Ribli et al. 2019b) (44) (121) (292) (1201) ( - ) a The figure of merit is defined as the reciprocal of the 1 σ confident area based on Fisher matrix (or the 68% posterior contour, in parentheses)on the ( Ω m , σ ) plane. The convergence maps are smoothed with σ = 1’ Gaussian filter except for the case shown in the last column with nosmoothing. Figure 6.
The 1 σ Fisher forecast of cosmological parameters ( Ω m and σ ) from different descriptors of a 3.5 × convergence mapsmoothed with σ = 1’ Gaussian filter. The scattering coefficients have comparable performance as a state-of-the-art CNN (Ribli et al. 2019b) atall noise levels, and 3–5 times better than the power spectrum depending on the noise level. We then compare the performance of the scattering trans-form to a state-of-the-art CNN analysis by Ribli et al.(2019b). To perform a meaningful comparison, as done bythese authors, we use noiseless convergence maps convolvedwith a σ = 1 arcmin Gaussian smoothing. Interestingly, wefind that the scattering coefficients extract a similar amountof cosmological information to the CNN trained in Ribli et al.(2019b). The corresponding figures of merit are shown in Ta-ble 1.We now consider convergence fields in the presence ofgalaxy shape noise. As the noise level increases, small-scalestructures, which carry plenty of cosmological information,get erased. As a result, the constraining power of the scat-tering coefficients (as well as other methods) degrades. In Figure 6 we show the Fisher forecast of Ω m and σ from a3.5 × convergence map under three noise levels, us-ing the scattering coefficients and the power spectrum. Wealso show the posterior constraints from CNNs trained by(Ribli et al. 2019b) on the same simulations. The figuresof merit for these methods, together with the peak countmethod, are listed in Table 1. Again, we find that the scat-tering transform not only outperforms the power spectrumand peak count, but also provides cosmological constraintson a par with state-of-the-art CNNs.To summarize: we have demonstrated the power of thescattering transform for cosmological parameter inferencewith weak lensing data. For simplicity, we focused on theconvergence field but a similar analysis can also be per- BSERVATIONAL COSMOLOGY WITH THE SCATTERING TRANSFORM
5. Discussion
When performing inference, we usually need to estimatea set of physical parameters underlying a field from a finitenumber of realizations. Ideally, one would like to use thelikelihood function of the field itself but this is often out ofreach. It is typically not analytically tractable or too costlyto sample through numerical simulations. Therefore, for theinference problem to be feasible, it is reduced towards a setof statistical descriptors, rather than the field itself. In otherwords, it requires a ‘statistical’ compression of the data.In the case of Gaussian random fields, we know that thepower spectrum provides a sufficient statistic, i.e. a loss-less compression of information. However, when there isno known parametric statistical model of the field, estimat-ing valid probability distributions in the high-dimensionalregime is a challenging task. A powerful framework is givenby maximum-entropy regularised estimation, which looks forthe most ‘non-committal’ statistical model under the con-straints of a ‘feature vector’ of sufficient statistics (Jaynes1957). There is thus a tension in the design of such vectorof sufficient statistics (Bruna & Mallat 2019): On the onehand, one would like the features to be efficiently estimatedfrom the available samples, so that the corresponding statisti-cal model is robust under resampling. In other words, typicalsamples from the true distribution should remain typical un-der the estimated statistical model. On the other hand, thefeatures should be descriptive enough so that they introduceenough constraints, i.e., typical samples from the estimatedmodel should also be typical under the true distribution.Viewed in this direction, modern overparametrised CNNssearch for the most informative compression and representa-tion to the inference problem through a training optimization,at the expense of defining brittle statistical models (Bruna &Mallat 2019). Traditional statistical approaches, on the otherhand, design the compression/representation based on someproperties of the field expected from domain knowledge. Forexample, the peak count statistic is used in weak lensing asconvergence maps display distinct halos. N -point correlationfunctions, closely related to perturbation theory and conve-nient for analytical prediction, include/exclude dimensionsof variations by truncating a series expansion, which makesthem good descriptors for fields slightly deviating from a Figure 7.
Dependence of the ( Ω m , σ ) constraints with differentmethods on galaxy shape noise. The figure of merit (FoM) is definedas the 1 σ confident area on the ( Ω m , σ ) plane. Note that the CNNresult (Ribli et al. 2019b) is reported in terms of posterior, whileothers are Fisher forecast. We have checked that for the scatteringtransform, the posterior FoM is only slightly lower than the Fisherforecast reported here and almost identical to the CNN result. Gaussian one. However, a highly non-Gaussian field requiresusing larger N . As the number of coefficients and the com-plexity of configurations increase rapidly with N , N -pointcorrelation functions quickly become an inefficient or non-robust representation of the original field.In physics, we are actually not interested in performing in-ference with any non-Gaussian field, but only the subset offields that are generated by physical interactions. Such fieldstypically display localized, coherent structures. These struc-tures also act as bricks and then form larger structures. So, inorder to characterize them, it is important to take these prop-erties into account. One needs an appropriate or “tailored”representation rather than a fully generic one. As we willshow in the next section, the design and operations of thescattering transform leads to an efficient and robust represen-tation for such fields. Efficiency: two binning strategies make the scatteringtransform efficient.(i) Spatial/frequency binning: the use of wavelets balancesthe resolution in real and frequency space. As a result, thescattering transform can capture localized information froma large range of scales with a compact set of coefficients. Forexample, in our case, the weak lensing information is com-2 C
HENG , T
ING , M ´
ENARD & B
RUNA pressed into 37 coefficients, a number that is much smallerthan typical bi-spectrum descriptors, while achieving state-of-the-art CNN-like constraint on cosmological parameters.(ii) High-order information binning: after selecting struc-tures of given sizes through wavelet convolution and modulusoperations in the 1st order, the scattering transform then se-lects structures ‘assembled’ by these 1st-order structures inthe next order. This hierarchical design allows the n th or-der scattering coefficients to quickly include information of n -point functions.These two binnings do not lose much information because(i) interesting phenomena in physics often appear in multiplescales, not all squeezed at one scale, and (ii) in physical worldstructures do often form hierarchically. Robustness: (i) As opposed to the N -point correlation function ap-proach, which requires multiplying an increasing number offield fluctuations and causes high variance, the scatteringtransform is a first-order statistic which avoids amplifyingthe process variability. As a result, it yields low variance de-scriptors and is less sensitive to outliers.(ii) The locality of wavelets, which is related to their loga-rithmic spacing and widths in frequency space, introduce sta-bility to deformations, a desired property of robust descrip-tors which classical N -point functions do not have. Interpretability: as discussed in Section 2.4, the scatter-ing coefficients have an intuitive interpretation. They de-scribe clustering properties of the field in the following way:(i) The 1st-order scattering coefficients are similar to acoarsely binned power spectrum, which characterize the clus-tering strength at different scales j . As the scattering trans-form uses an L norm as opposed to an L norm, the ra-tio between s coefficients and the power spectrum providesa measure of sparsity of the field. This explains why inFigure 5 the constraints from 1st-order coefficients and thepower spectrum are slightly different, and just combiningthese two can also provide a stronger constraint on cosmol-ogy than using power spectrum alone.(ii) The 2nd-order scattering coefficients characterize theclustering strength of j -scale structures separated by j -scales. In other words, these coefficients characterize the“clustering of structures”, selected over a given frequencyrange. Their departure from their Gaussian counterparts isa robust measure of the strength of non-gaussianities. The scattering transform and CNNs share a number ofproperties. Both of them have hierarchical layers with con-volutions and non-linear operations. The flexibility of CNNsleads to a higher performance for fields with much highercomplexity, such as images from the world around us, but atthe expense of generating less robust statistical models. The cosmological density field is not as complex as random im-ages of rabbits. The simplicity of the structures encounteredin weak lensing maps and their regularity as a function ofscale, imply that a descriptor using convolutions by a sim-ple wavelet can extract a substantial amount of information.As shown by Ribli et al. (2019b), a CNN trained on conver-gence maps internally generates convolution filters similar tothe Morlet wavelet.While the generation of these CNN filters requires a largenumber of simulations as a training set, the scattering trans-form which has preset filters does not require any learningphase. In addition, the choice of CNN architecture can mod-ify the results substantially, as can be seen for example inthe comparison between results of Ribli et al. (2019b) andGupta et al. (2018). As such, CNNs usually require much,and often ad-hoc, fine-tuning. The scattering transform, onthe other hand, is not subject to these sources of variability.It requires the use of simulations only to probe the cosmicvariance of the descriptors.Another advantage of the scattering transform over CNNsis the possibility to investigate systematic effects in the data.Analyzing the scattering coefficients not only returns thebest-fitting cosmological parameters but also allows for san-ity checks and the evaluation of goodness of fit. In contrast,CNNs currently provide little opportunity to verify the ro-bustness of their outcomes. As it is most of the time virtuallyimpossible to precisely simulate real data, the fidelity of sim-ulated training sets is limited and the impact of systematiceffects can be difficult to assess.
The non-linear gravitational evolution of density fluctua-tions in the universe gives rise to halos, which are virializedsystems locally bound by gravity. As highlighted by Ribliet al. (2019b), a substantial amount of non-Gaussian cosmo-logical information can be extracted from these features. Thepeak count method makes use of this domain knowledge andis designed to capture the abundance of halos. However, itdoes not characterize the spatial distribution of these halos,which is also sensitive to cosmological parameters. On theother hand, the scattering transform extracts a comprehen-sive information of the abundance, profile, and distribution ofhalos by highlighting structures of particular scales and thencharacterizing their clustering at other scales, as describedin Section 2.4. In the limit of small j and large j , the2nd-order scattering coefficients can be simply understoodas a measure of the “two-halo term” in the halo model. Thehalo abundance and profile also leave their imprint on s co-efficients at different j and j scales. In this manner, thescattering transform provides a non-parametric description inone-halo, two-halo, and the transitional regime where halosoverlap and are forming larger halos. BSERVATIONAL COSMOLOGY WITH THE SCATTERING TRANSFORM
6. Conclusion
Characterizing arbitrary non-Gaussian fields is challeng-ing as the dimensionality of their description can be arbitrar-ily high. The subset of fields relevant in physics, however,tends to be more constrained as they typically display local-ized, coherent structures. In the cosmological context, thematter density field presents another characteristic property:hierarchical clustering. An efficient statistical descriptor ofthe cosmological density field would ideally be able to natu-rally capture these two properties.In this paper, we advocate the use of the scattering trans-form, a statistical estimator designed to extract informationfrom a complex field with provable stability properties.It involves operations similar to those found in convolu-tional neural nets (CNNs): it uses wavelets, which balancethe spatial and frequency resolution, as its convolutional ker-nel and the modulus operation as its non-linear operation.However, in contrast to CNNs, the scattering transform does not require any training. This statistical estimator can gener-ate a compact set of robust coefficients that forms a represen-tation of the input field and can be used as efficient summarystatistics for non-Gaussian information.We applied the scattering transform to the cosmologicalparameter inference problem in the context of weak lensingdata. For simplicity, we focused on the convergence fieldbut a similar analysis can also be performed on the shearfield. We used simulated convergence maps generated byray-tracing N -body simulation results (Zorrilla Matilla et al.2016; Gupta et al. 2018) and measure their scattering coeffi-cients to infer the cosmological parameters Ω and σ . Forsimulated data with galaxy shape noise, the scattering trans-form outperforms the power spectrum, peak counts and is onpar with state-of-the-art CNNs.As described in section 5.2, the scattering transform pos-sesses a series of attractive properties for parameter estima-tion, especially: it is efficient, robust, and interpretable. Aswe have shown, the 1st-order scattering coefficients charac-terize the clustering strength at different scales and are sim-ilar to the power spectrum; the 2nd-order scattering coeffi-cients efficiently extract non-Gaussian information by char-acterizing the clustering of structures of different sizes. Ingeneral, the n th-order scattering coefficients include infor-mation of up to n points functions, conveniently summa-rized in a compact set of coefficients, whose distribution istypically Gaussian (a desired property for likelihood esti-mation). Similarly to classical statistical estimators (not re-quiring training nor tuning), the scattering transform offersthe possibility to investigate for systematic errors potentiallypresent with real data.In this paper we focused on applications of the scatteringtransform for cosmological parameter inference with weaklensing data. Using it with existing and upcoming surveys (e.g. DES, LSST, Euclid , WFIRST ) can be great interest toimprove constraints and/or provide consistency checks. Nat-urally, the scattering transform is an attractive estimator formany other applications: in observational cosmology, astro-physics and beyond.
Acknowledgments
We thank the Columbia Lensing group for making theirsuite of simulated maps available, and NSF for supportingthe creation of those maps through grant AST-1210877 andXSEDE allocation AST-140041. We also thank Dezs˝o Riblifor useful discussions. YST is supported by the NASA Hub-ble Fellowship grant HST-HF2-51425.001 awarded by theSpace Telescope Science Institute. This work is partially sup-ported by the Alfred P. Sloan Foundation, NSF RI-1816753,NSF CAREER CIF 1845360, NSF CHS-1901091, SamsungElectronics, and the Institute for Advanced Study.
Appendix A Morlet Wavelets
Wavelets are localized oscillations. If we simply use aGaussian envelop to modulate a plane wave, then we obtaina Gabor function in arbitrary dimensions G ( (cid:126)x ) = 1 (cid:112) | Σ | e − (cid:126)x T Σ − (cid:126)x/ e i (cid:126)k · (cid:126)x , (A1)where Σ is the covariance matrix describing the size andshape of the Gaussian envelop, and (cid:126)k determines the fre-quency of the modulated oscillation. To keep maximum sym-metry, usually Σ is selected to have only 1 eigen-value differ-ent from the others, and (cid:126)k to be along that eigen-direction.Thus we denote the eigen-value along (cid:126)k by σ and the othereigen-value by σ /s . The parameter s is also the ratio oftransverse to radial width of the wavelet in Fourier space.The Fourier transform of a Gabor function is simply aGaussian filter centered at (cid:126)k , ˜ G ( (cid:126)k ) = e − ( (cid:126)k − (cid:126)k ) T Σ ( (cid:126)k − (cid:126)k ) / . (A2)Wider envelop in real space makes narrower filter in Fourierspace. Note that the product k σ determines the number ofoscillations within ± π ≈ standard deviation of the Gaus-sian envelop and allows for a trade off between spatial andfrequency resolution.Unfortunately, a Gaussian profile in Fourier space does notgo to zero at 0 frequency. This contradicts the admissibilityof wavelet which requires wavelets to strictly be band-passfilters, not low-pass filters. Therefore, a small correction isrequired. A simple solution is to introduce an offset, β , be-fore the modulation. In Fourier space this is equivalent to4 C HENG , T
ING , M ´
ENARD & B
RUNA subtracting another Gaussian profile centered at 0 to cancelout the 0-frequency contribution. Families of wavelets cre-ated in this way are called Morlet wavelets. Formally, ψ ( (cid:126)x ) = 1 (cid:112) | Σ | e − (cid:126)x T Σ − (cid:126)x/ (cid:16) e i (cid:126)k · (cid:126)x − β (cid:17) , (A3)where β = e − (cid:126)k T Σ (cid:126)k / is determined by the admissibilitycriterion. Its Fourier transform is ˜ ψ ( (cid:126)k ) = ˜ G ( (cid:126)k ) − βe − (cid:126)k T Σ (cid:126)k/ . (A4)In our study, which is a 2-dimension case, we follow thesettings used in the ‘kymatio’ package mentioned in Sec-tion 3.3, σ = 0 . × j k = 3 π × j s = 4 /L , (A5)where σ is in unit of pixels, j is an integer starting from 0,and k is always between 0 and 1. This choice allow a familyof Morlet wavelets best covers the whole Fourier space, witha dyadic sequence of scales ( j ). Within the wavelet envelop,there are about 2 cycles of oscillations, because k σ ≈ . Appendix B Scattering transform in Fourier space
The modulus and expectation operations of the scatteringtransform are defined in the real space and do not have simpleFourier counterparts, so in general it is easier to understandit in the real space. However, as the Morlet wavelets looksimpler in Fourier space (they are band-pass filters), it is alsoenlightening to collect some intuitions of the scattering trans-form from the Fourier side.In general, the modulus operation will mix Fourier modesand change the frequency distribution of a field. In par-ticular, if the field in question has been convolved with acomplex-valued wavelet that has a single peak in Fourierspace, then the energy will be redistributed mainly towardslower frequencies. I.e., the typical frequency of | I (cid:63) ψ | islower than I (cid:63) ψ . This can be revealed by first writing | I (cid:63) ψ | as (cid:112) ( I (cid:63) ψ )( I (cid:63) ψ ) ∗ , where ∗ stands for complex conjugate,and then Taylor expanding this square root with powers of ( I (cid:63) ψ )( I (cid:63) ψ ) ∗ (Mallat 2010). For the Morlet wavelets usedin scattering transform, the central frequency (in terms ofwavenumber) of the wavelet ψ is roughly k , and the half-width of the wavelet in Fourier space (in the radial direction)is around /σ . So, the leading term of the Taylor expansion,which is proportional to ( I (cid:63) ψ )( I (cid:63) ψ ) ∗ , will have an half-width around √ /σ in Fourier space and its centroid shiftedback to 0 from k , as a multiplication in the real space cor-responds to a convolution in Fourier space. This means that the typical frequency of | I (cid:63) ψ | is lower than I (cid:63) ψ , because √ /σ ≈ . k < k , according to Equation A5.In other words, the core operation I → | I (cid:63) ψ | expresseshigh frequency information of I n in terms of lower fre-quency components including the 0-frequency component inthe next-order fields I n +1 . As the 0-frequency componentis translation invariant, it can be directly used as a statisticaldescriptor of the original field.In addition to the qualitative understandings we have pre-sented above, it will be valuable to understand more quanti-tatively the Fourier behavior of the scattering coefficients. Appendix C Cosmological inference framework
In this appendix we describe the Fisher forecast formal-ism used to infer the cosmological parameters in this study.According to the Cram´er–Rao inequality, the variance ofany unbiased estimator ˆ θ for model parameters θ cannotbe smaller than the inverse of the Fisher information matrix I ( θ ) of the model: cov ( ˆ θ ) ≥ I ( θ ) − . (C6)Elements of the Fisher matrix is defined as I m,n ( θ ) ≡ − (cid:28) ∂ ln p ( x | θ ) ∂θ m ∂ ln p ( x | θ ) ∂θ n (cid:29) , (C7)where x is the observable, p is the likelihood function, (cid:104)·(cid:105) is the expectation over x . In our cosmological case, θ rep-resents cosmological parameters, θ = (Ω m , σ ) , and x rep-resents the statistical descriptors such as the scattering coef-ficients. The function p ( x | θ ) is called the likelihood of θ when x is fixed, and is called the probability density func-tion (PDF) of x when θ is fixed. In our study, we assumethat given any cosmology θ , the PDF of statistical descrip-tors x is Gaussian: p ( x | θ ) ∝ (cid:112) | C | exp [ −