Toward an Optimal Sampling of Peculiar Velocity Surveys For Wiener Filter Reconstructions
Mon. Not. R. Astron. Soc., 1–12 (2017). Printed 13 November 2018 (MN LaTeX style file v2.2)
Jenny G. Sorce, Yehuda Hoffman, Stefan Gottlöber
Leibniz-Institut für Astrophysik, 14482 Potsdam, Germany
Université de Strasbourg, CNRS, Observatoire astronomique de Strasbourg, UMR 7550, F-67000 Strasbourg, France
Racah Institute of Physics, Hebrew University, 91904 Jerusalem, Israel
ABSTRACT
The Wiener Filter (WF) technique enables the reconstruction of density and velocity fields from observed radial peculiar velocities. This paper aims at identifying the optimal design of peculiar velocity surveys within the WF framework. The prime goal is to test the dependence of the quality of the reconstruction on the distribution and nature of data points. Mock datasets, extending to 250 h⁻¹ Mpc, are drawn from a constrained simulation that mimics the local Universe to produce realistic mock catalogs. Reconstructed fields obtained with these mocks are compared to the reference simulation. Comparisons, including residual distributions, cell-to-cell and bulk velocities, imply that the presence of field data points is essential to properly measure the flows. The fields reconstructed from mocks that consist only of galaxy cluster data points exhibit poor quality bulk velocities. In addition, the quality of the reconstruction depends strongly on the grouping of individual data points into single points to suppress virial motions in high density regions. Conversely, the presence of a Zone of Avoidance hardly affects the reconstruction. For a given number of data points, a uniform sample does not score any better than a sample whose number of data points decreases with distance. The best reconstructions are obtained with a grouped survey containing field galaxies: assuming no error, they differ from the simulated field by less than 100 km s⁻¹ up to the extreme edge of the catalogs or up to a distance of three times the mean distance of data points for non-uniform catalogs. The overall conclusions hold when errors are added.
Key words: Techniques: radial velocities, Cosmology: large-scale structure of universe, Methods: numerical
Reconstructing the local Large Scale Structure is essential in order to analyze the distribution of matter and to understand the motions ruling the local Universe, namely to study the local dynamics. Several methods and algorithms have been devised for that purpose over the last three decades (e.g. Dekel et al. 1990; Bertschinger et al. 1990; Nusser & Dekel 1992; Lavaux 2008; Kitaura 2013; Jasche & Wandelt 2013; Wang et al. 2013; Lavaux 2016). Leading efforts in that direction are the POTENT analysis (Dekel et al. 1999) and the Wiener Filter (WF) Bayesian methodology (Zaroubi et al. 1995, 1999). These techniques can usually be applied either to radial peculiar velocity catalogs or to redshift surveys. While the latter are easily acquired, they provide a biased account of the matter distribution. On the other hand, radial peculiar velocity acquisition constitutes a real observational challenge but, unlike redshifts, peculiar velocities are a better tracer of the total (including the dark component) mass distribution. In addition, peculiar velocities are linear on scales of a few megaparsecs and are correlated over large distances. It follows that the reconstruction of the Large Scale Structure from radial velocities constitutes a challenging and appealing task to observers and theorists alike.

Determination of galaxy peculiar velocities is uncertain primarily due to the distance measurement (i.e. the Hubble flow subtraction). Extragalactic distance measurement is exceptionally difficult and telescope time consuming. Hence, strategically, the optimal survey must be defined to plan the observations accordingly. In this paper, we focus on the particular aspect of data sampling within the framework of the WF reconstruction. We address questions such as: Should field galaxies be the subject of all the efforts or should galaxy clusters be the focus? What are the merits, if any, of using clusters as individual data points?
To what extent does the Zone of Avoidance (ZOA) degrade the quality of the reconstructions (Hoffman 1994)? Should one opt for a uniform spatial sampling?

The modus operandi of this paper is to build mock catalogs out of a constrained simulation of the local Universe (e.g. Gottlöber et al. 2010; Sorce et al. 2014a). The advantage of using a constrained simulation is that the simulation reproduces all the known major structures and therefore realistic mock datasets can be constructed. Dark matter halos and sub-halos are used as proxies for clusters and galaxies. To dedicate our attention entirely to the sampling issues, neither statistical nor systematic errors are added to distances and peculiar velocities of halos in a first pass. In order to build the mock catalogs, given the constrained nature of the simulation, the observer is assumed to stand in the center of the box where the origin of the Supergalactic coordinate system is positioned. The second peculiar velocity catalog (cosmicflows-2, Tully et al. 2013) of the cosmicflows project is used here as our benchmark for peculiar velocity surveys. It has already been used in a series of papers to study the local Large Scale Structure, to set up initial conditions for constrained simulations and to estimate the bulk flow (e.g. Tully et al. 2014; Hoffman et al. 2015; Sorce 2015; Pomarède et al. 2015; Watkins & Feldman 2015; Sorce et al. 2016b; Carlesi et al. 2016b,a; Sorce et al. 2016a; Hoffman et al. 2016).

In its modern form the WF is formulated as a Bayesian linear estimator based on the derivation of correlation functions, correlation matrices and their inverses, assuming a prior cosmological model. It is optimal in the case of Gaussian random fields (Zaroubi et al. 1995, and see Appendix A for detailed equations). The Large Scale Structure in the linear regime is one such field, which explains why the WF has proved to be such a vital tool for reconstructing the Large Scale Structure.
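For orientation, the generic form of such a linear estimator can be written compactly. The expression below is the standard WF form found in the literature (e.g. Zaroubi et al. 1995) rather than a transcription of Appendix A, and the symbols are notation assumed here: $\mathbf{d}$ is the vector of observed radial velocities, $\mathbf{u}$ the underlying signal at the data positions, $\boldsymbol{\epsilon}$ the noise (taken to be uncorrelated with the signal) and $\mathbf{s}$ the field to be reconstructed.

```latex
\begin{equation*}
\mathbf{s}^{\rm WF} = \langle \mathbf{s}\,\mathbf{d}^{\dagger} \rangle \,
                      \langle \mathbf{d}\,\mathbf{d}^{\dagger} \rangle^{-1} \mathbf{d},
\qquad
\mathbf{d} = \mathbf{u} + \boldsymbol{\epsilon},
\qquad
\langle \mathbf{d}\,\mathbf{d}^{\dagger} \rangle =
\langle \mathbf{u}\,\mathbf{u}^{\dagger} \rangle +
\langle \boldsymbol{\epsilon}\,\boldsymbol{\epsilon}^{\dagger} \rangle
\end{equation*}
```

The correlation matrices are computed from the assumed prior power spectrum, which is why the estimator depends on the sampling only through the positions and errors of the data points.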
It is then natural to study the optimal sampling strategy within the WF framework. The WF is a useful estimator in the case of very noisy, sparse, inhomogeneously distributed datasets with incomplete coverage, namely catalogs like peculiar velocity surveys. It can handle all the sampling issues previously described and this paper aims at quantifying to what extent it does so. However, there is one more issue that needs to be tackled: the grouping or collapse of all the data points that belong to one given large halo (cluster) into one point. This procedure is used to suppress non-linear virial motions that are not accounted for in the WF. This paper tests the importance of removing non-linear motions.

The paper is organized as follows. In the second section, different mock catalogs are drawn from the simulation; they are distinguished by 1) the number of data points as a function of distance, 2) the presence or not of a ZOA, 3) the selection of clusters only, 4) the mimicking of a grouping scheme while keeping isolated field galaxies. In the third section, the WF technique is applied to these different mocks and the reconstructed velocity fields are compared to the velocity field of the reference simulation. In particular, reconstructed dipole and monopole terms are compared to those of the simulation. Before concluding, a brief excursion into the regime of radial peculiar velocity catalogs with uncertainties is undertaken.

To be able to meet the requirements of the benchmark catalog in the tests, a constrained simulation of the local Universe is made in the context of the CLUES project (Gottlöber et al. 2010) following the process described in Sorce et al. (2014a); Sorce (2015). This simulation is 500 h⁻¹ Mpc wide and contains 1024³ particles. The Planck cosmology framework (Ω_m = …, Ω_Λ = …, … = …, σ₈ = …) is used. In the supergalactic system of coordinates, the XY plane is shown in the top left panel of Figure 1.
The velocity (black arrows) and density (black contours, green color for the mean) fields of the reference simulation are represented. In that simulation all the major attractors and voids of the local Universe are represented: the Shapley supercluster in the top left, the Coma supercluster in the middle top, the Virgo supercluster in the middle of the box and the Perseus Pisces supercluster on the right side of Virgo.

The second generation catalog of galaxy distances built by the Cosmicflows collaboration is a large publicly released catalog of radial peculiar velocities that will be considered as the benchmark to build the various mocks. Published in Tully et al. (2013), it contains more than 8,000 accurate galaxy peculiar velocities. Distance measurements come mostly from the Tully-Fisher (Tully & Fisher 1977) and the Fundamental Plane (Colless et al. 2001) methods. Cepheids (Freedman et al. 2001), Tip of the Red Giant Branch (Lee et al. 1993), Surface Brightness Fluctuation (Tonry et al. 2001), supernovae of type Ia (Jha et al. 2007) and other miscellaneous methods also contribute to this large dataset though to a minor extent (∼12% of the data). Our primary interest lies in the total number of points and the distribution of these points, as we consider the ideal case of perfect mock catalogs without any error, and as a result without any bias. The grouped version of cosmicflows-2 is useful for the grouped mocks. It contains 552 groups and 4303 single galaxies, i.e. a total radial peculiar velocity count of 4855.

We use the Amiga halo finder (Knollmann & Knebe 2009) to build a list of halos from our reference simulation. Halos are then selected and prepared to match cosmicflows-2 on different aspects:
• Mocks with a uniform distribution only reproduce the total number of data points in cosmicflows-2 and extend to the same maximum distance as cosmicflows-2 (approximately 250 h⁻¹ Mpc). Note that the uniformity of the distribution gives rise to mocks that are de facto already grouped in the sense that there is not much to group when data points are uniformly distributed and sparse.
• For mocks with an inhomogeneous distribution we seek to have a similar distribution of data points (number and spatial coverage) as cosmicflows-2. A histogram with a bin size of 20 h⁻¹ Mpc is derived for the observational cosmicflows-2 catalog, providing the number of measurements in each bin or 20 h⁻¹ Mpc shell. For each 20 h⁻¹ Mpc shell, the same number of halos as found in cosmicflows-2 is selected randomly in the list of halos.
• For grouped mocks, in the case of multiple halos in the same region, the most massive halo is selected to mimic the grouping applied to cosmicflows-2. This ensures that 1) the velocity of the halo is free of non-linear motions that could be induced by more massive nearby halos and 2) the center of mass of the region corresponds approximately to the center of mass of the halo. Subsequently, its position and velocity are directly given by the halo finder.
This procedure permits mimicking a group catalog without an extra layer of complexity based on the selection of a grouping algorithm and all its implications, namely defining the galaxies belonging to the group and the velocity and position to be attributed to the group. Note that for the grouped mock obtained with a uniform distribution, only the most massive halos (by extension clusters) are selected. This is only a subtle difference but with non-negligible effects on the results.
• To reproduce the ZOA, domains are identified as similar to the ZOA and every halo is removed from those. This zone is defined as a cone with the apex at the center of the box (where the observer is assumed to lie), with an aperture angle of 20° and assuming the same orientation within the XYZ volume as the observational one in the supergalactic XYZ volume.

Figure 1. [Nine panels in the XY plane, X and Y in h⁻¹ Mpc: Ref. Simu, IG-1, IG-2, IGZ-1, IGZ-2, I-1, UG-1, U-1, UZ-1.] Top left panel: Density (black, green color for the mean density) and velocity (black arrows) fields of the reference simulation in the XY supergalactic plane. Other panels: Reconstructed overdensity (black and green contours) and velocity (black arrows) fields obtained with the Wiener Filter technique applied to one mock of each category. The properties of these mocks and the corresponding initials are explained in Table 1 in the same order. The Large Scale Structure of the reference simulation is overall reconstructed in every case.

Category | Uniform distribution | Inhomogeneous distribution | ZOA | Ungrouped | Grouped + field galaxies | Clusters
IG-X     | -                    | ✓                          | -   | -         | ✓                        | -
IGZ-X    | -                    | ✓                          | ✓   | -         | ✓                        | -
I-X      | -                    | ✓                          | -   | ✓         | -                        | -
UG-X     | ✓                    | -                          | -   | -         | -                        | ✓
U-X      | ✓                    | -                          | -   | ✓         | -                        | -
UZ-X     | ✓                    | -                          | ✓   | ✓         | -                        | -

Table 1. Different mock types: 'U' stands for a uniform distribution; these mocks match cosmicflows-2 (either grouped 'G' or ungrouped, although a uniform distribution gives rise to an almost grouped catalog) solely by the number of data points. Note that only clusters are selected in the case of the UG mocks while in the case of U mocks, the selection is completely random. 'I' stands for inhomogeneous distribution, namely not only the number of data points is mimicked but also their distribution as a function of the distance. 'Z' stands for the presence of a zone devoid of data mimicking the Zone of Avoidance. Finally 'X' can take the value 1 or 2 depending on whether there is once or twice the number of data points in cosmicflows-2, grouped and ungrouped respectively. For each category 5 mocks that differ in the 'randomly' chosen halos are built. Note however that, since there are approximately as many halos as measurements in the center of the box because of the simulation resolution, the number of possible variations is smaller when also mimicking the distribution of halos in cosmicflows-2, especially when doubling the number of measurements.

Figure 2. [Eight panels in the XY plane, SGX and SGY in h⁻¹ Mpc: IG-1, IG-2, IGZ-1, IGZ-2, I-1, UG-1, U-1, UZ-1.] Distributions of selected halos in the XY supergalactic plane in a 10 h⁻¹ Mpc thick slice of one mock per category. From left to right, top to bottom: distribution similar to the grouped version of cosmicflows-2 without a Zone of Avoidance (ZOA) and with the same number of data points (IG-1), with twice the number of data points and still without ZOA (IG-2), with the same number of data points and a ZOA (IGZ-1), with twice the number of data points and with a ZOA (IGZ-2); distribution matching that in ungrouped cosmicflows-2 without ZOA (I-1); uniform distribution matching only the number of points in the grouped version of cosmicflows-2 and selecting only massive halos (UG-1), uniform distribution matching the number of points in ungrouped cosmicflows-2 without (U-1) and with a ZOA (UZ-1).

The different types of mocks are summarized in Table 1. We produce five mocks of each type, i.e. within a category, mocks differ from one another only by the 'randomly' chosen data points. In the case of the inhomogeneous distribution the variance is limited close to the center of the box since there are almost as many halos to select as halos available despite the resolution of the simulation. We also double the number of data points to see the influence of having more observations in the grouped mocks mimicking the grouped version of cosmicflows-2 in terms of distribution of data points (note that the variance in the center of the box is then reduced some more). This permits us to determine whether there is an advantage in having additional points. Since the perfect case without error is considered, there will be no quantitative measurement of the fact that having more points has the unconditional advantage of decreasing the errors.
On the other hand, a group with no error on its velocity and position can already be considered as the result of the grouping of an infinite number of measurements of the group members.

In Figure 2, the resulting list of halos for each mock type (one mock per category is represented) is plotted as black dots in a 10 h⁻¹ Mpc thick slice in the XY supergalactic plane.
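The shell-by-shell selection used for the inhomogeneous mocks can be sketched as follows. This is a minimal illustration and not the code used for the paper: the function name `sample_like_survey`, its arguments and the use of NumPy are assumptions made here; only the 20 h⁻¹ Mpc bin width, the 250 h⁻¹ Mpc extent and the random draw of as many halos as measurements per shell come from the text.

```python
import numpy as np

def sample_like_survey(halo_dist, survey_dist, bin_width=20.0, d_max=250.0, seed=0):
    """Return indices of halos drawn so that every distance shell holds as many
    halos as the survey has measurements (fewer if the shell runs out of halos).
    Hypothetical helper: distances are in h^-1 Mpc from the box center."""
    rng = np.random.default_rng(seed)
    halo_dist = np.asarray(halo_dist, dtype=float)
    # Shell edges every `bin_width`, covering at least d_max.
    edges = np.arange(0.0, d_max + bin_width, bin_width)
    # Number of survey measurements in each shell.
    wanted, _ = np.histogram(survey_dist, bins=edges)
    # Shell index of every halo (-1 or len(wanted) if outside the range).
    shell = np.digitize(halo_dist, edges) - 1
    chosen = []
    for i, n_wanted in enumerate(wanted):
        candidates = np.flatnonzero(shell == i)
        n_take = min(int(n_wanted), candidates.size)
        if n_take > 0:
            # Random draw without replacement inside the shell.
            chosen.append(rng.choice(candidates, size=n_take, replace=False))
    return np.concatenate(chosen) if chosen else np.empty(0, dtype=int)
```

Changing `seed` gives the different 'randomly' chosen realizations of one category. Close to the center, where a shell holds barely more halos than requested, the realizations necessarily overlap, which is the limited variance noted above.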
We apply the WF technique to the different mocks using the Planck power spectrum as a prior. The main calculations of the method are given in Appendix A. A boxsize of 500 h⁻¹ Mpc and grids with 256³ cells are used. The resulting overdensity and velocity fields are plotted in the XY supergalactic plane in Figure 1 with one mock of each category together with those of the reference simulation. Overdensity fields are shown as black contours (the green color represents the null-field contour) while the black arrows represent the velocity fields. Some differences can be observed between the different WF reconstructions although overall they all reproduce the Large Scale Structure of the reference simulation, confirming that the WF is a powerful technique.

Figure 3. [Axes: shell radius in h⁻¹ Mpc; σ in km s⁻¹, with σ/σ_v on the right axis. Legend: IG-1, IG-2, IGZ-1, IGZ-2, I-1, UG-1, U-1, UZ-1.] Means (lines) and standard deviations (error bars) of 1-σ scatters obtained comparing the velocity field of the reference simulation and that obtained with the WF technique applied to the different mocks as a function of the 'middle' radius of the compared shells, i.e. grid cells at a distance d such that R_mid − 25 ≤ d < R_mid + 25 h⁻¹ Mpc are compared. Each colored linestyle corresponds to a mock type where the short names refer back to Table 1.

To assess to what extent a reconstruction is in agreement with the reference simulation, we proceed as follows:
• production of a cell-to-cell comparison between the velocity field of the reference simulation and that of the reconstruction. Namely all the cell values from field 1 are plotted against those of field 2. Using a cloud-in-cell scheme with a 256³ grid for the simulation to match the grid size of the WF reconstructions, the cell sizes are 1.95 h⁻¹ Mpc, i.e. slightly smaller than the linear threshold.
• derivation of the one sigma (1-σ) scatter of this cell-to-cell comparison around the 1:1 relation; if the fields are identical, the points (cell field 1; cell field 2) all lie on the 1:1 relation.
• repetition of the process with the four reconstructions obtained with the other mocks in the same category as the first one.
• computation of the mean and standard deviation of the 1-σ scatters.
Not to bias results towards a better agreement with mocks having more data points in the center, the cell-to-cell comparisons are made within shells of 50 h⁻¹ Mpc thickness (thus the center of the box is not always accounted for). The derived mean 1-σ scatters are plotted together with their standard deviation (bars) in Figure 3 as a function of the 'middle' radius R_mid such that compared shells consist of cells at distances between R_mid − 25 and R_mid +
25 h⁻¹ Mpc from the center of the box. Each color represents one type of mock as given by the legend and the meaning of the short names is described in Table 1.

The first observation is that the standard deviations of the 1-σ scatters are quite small from one reconstruction to another whatever shell is considered: from 0.2 to 39 km s⁻¹. The majority is around 1-2 km s⁻¹ for the non-uniformly distributed mocks, with the exception of the ungrouped mocks that have standard deviations between 16 and 29 km s⁻¹. The ungrouped (but almost grouped) mocks with the uniform distribution present standard deviations of 8-10 km s⁻¹. The catalogs of clusters result in the maximum standard deviations with values between 9 and 39 km s⁻¹. Standard deviations are the smallest for mocks mimicking the grouped version of cosmicflows-2 not only in terms of number of points but also in their distribution. That, on average over all the shells, the highest standard deviations are those observed for the mocks mimicking an ungrouped configuration confirms the influence of non-linear motions on a linear technique such as the WF. More precisely, from one selection to another the non-linear motions are more or less important, implying that they do not affect the reconstruction to the same degree.

As a second observation, one can note that the presence of a ZOA does not impact the reconstruction severely, if it does at all, as already shown by Hoffman (1994): the 1-σ scatters are about the same when comparing mocks with and without ZOA. They are about 85 ± … km s⁻¹ for the largest shells and decrease, with the size of the shells, down to 35-40 ± … km s⁻¹ when comparing reconstructions from inhomogeneous mocks.
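The shell-wise comparison described above can be illustrated with a short sketch. It is not the authors' pipeline: the function names and array layout are hypothetical, and the 1-σ scatter is taken here to be the root-mean-square difference about the 1:1 relation over all cells and components in the shell, which is one plausible reading of the procedure.

```python
import numpy as np

def cell_distances(n, box_size):
    """Distance of every cell center from the box center, shape (n, n, n)."""
    x = (np.arange(n) + 0.5) * box_size / n - box_size / 2.0
    X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
    return np.sqrt(X**2 + Y**2 + Z**2)

def shell_scatter(v_ref, v_rec, box_size=500.0, r_mid=100.0, half_width=25.0):
    """RMS cell-to-cell difference between two velocity grids of shape
    (3, n, n, n), restricted to cells with
    r_mid - half_width <= d < r_mid + half_width (all in h^-1 Mpc)."""
    d = cell_distances(v_ref.shape[1], box_size)
    in_shell = (d >= r_mid - half_width) & (d < r_mid + half_width)
    diff = (v_rec - v_ref)[:, in_shell]  # shape (3, number of cells in shell)
    return float(np.sqrt(np.mean(diff**2)))

def summarize(scatters):
    """Mean and standard deviation of the scatters of the mocks of one category."""
    s = np.asarray(scatters, dtype=float)
    return s.mean(), s.std()
```

Calling `shell_scatter` for each of the five reconstructions of a category and passing the results to `summarize` gives one point and its error bar in a plot of the kind shown in Figure 3.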
For uniform mocks, the 1-σ scatters are more stable across the different shells with values about 80 ± … km s⁻¹, in agreement with the fact that they are homogeneously distributed. When comparing the smallest shells, the effect of the non-linear clustering captured only by the simulation is visible on small scales and explains why 1-σ scatters increase when comparing small volumes. This is the limitation of the method used here to compare the simulation with non-linear motions to the linear WF reconstructions. Still, this method is efficient enough to reach our goal.

A slight but negligible (< … km s⁻¹) improvement is observed, only in the outer shells, when using twice the number of points as that in cosmicflows-2. This implies that increasing the density of data points improves the reconstruction only weakly, which confirms the stability of the WF technique. For the ungrouped and inhomogeneous catalogs, non-linear motions scramble the signal so that reconstructions reach an accuracy of about 100 ± 20 km s⁻¹ at best.

With regard to the catalogs of clusters, they present the largest 1-σ scatters of all the tested mock catalogs, in agreement with expectations: they do not contain galaxies in the field to measure the gravitational field due to the attraction of large attractors like clusters. Such catalogs allow the velocity field to be recovered only with an accuracy of about 160 ± 10 km s⁻¹ for the largest shells, up to 200 ± 39 km s⁻¹ for the inner shells.

As a final note, the uniformly distributed catalogs (except those of clusters) result on average in slightly better reconstructions than the inhomogeneous grouped catalogs only for the largest shells. Still, because the variance of their 1-σ scatters is larger, it includes the mean 1-σ scatters and the variances obtained for the reconstructions made with the non-uniformly distributed grouped mocks. Namely, with less than 2% of the data in the largest compared shells (at R_mid = 225 h⁻¹ Mpc), the catalogs reproducing the distribution of the second catalog of the Cosmicflows project are able to do as well as the uniformly distributed catalogs that have about 30% of data in that region.

To give another quantitative aspect of the mean 1-σ scatters, their ratio to the variance of the simulated field (σ/σ_v) is added as a right axis in Figure 3. It permits the evaluation of the difference between the simulated and reconstructed fields, which is seen to be smaller than the variance of the simulated field itself in all cases but the catalogs of clusters.

Residuals between simulated and reconstructed fields can be studied in more detail. Figure 4 shows the mean properties (and their standard deviation) of the residual distributions in the same shells as described above. From top to bottom, the averages and standard deviations of the mean, standard deviation, skewness and kurtosis of the residual distributions per mock type are plotted. (Note that skewness and kurtosis are employed here and in the rest of the paper for the third and fourth standardized moments; more precisely, kurtosis is used for 'excess kurtosis'.) Unconditionally, inhomogeneous grouped mocks result in quasi-Gaussian residual distributions (skewness and kurtosis are almost null: about 0.1 or less in absolute value and about 1 or less respectively) with the smallest standard deviation (about 80 to 170 km s⁻¹) and a mean close to zero (less than 10 km s⁻¹ in absolute value), i.e. reconstructions are not biased towards an infall or outflow onto the local Volume.
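The four summary statistics of a residual distribution can be computed as in the sketch below, which follows the convention stated above (third standardized moment for skewness, fourth standardized moment minus 3 for excess kurtosis). The function name is hypothetical and the input is assumed to be the array of residuals in one shell.

```python
import numpy as np

def residual_moments(residuals):
    """Mean, standard deviation, skewness and excess kurtosis of a set of
    residuals (e.g. simulated minus reconstructed velocities in one shell)."""
    r = np.asarray(residuals, dtype=float).ravel()
    mean = r.mean()
    sigma = r.std()
    z = (r - mean) / sigma              # standardized residuals
    skewness = np.mean(z**3)            # third standardized moment
    excess_kurtosis = np.mean(z**4) - 3.0  # zero for a Gaussian
    return mean, sigma, skewness, excess_kurtosis
```

For a Gaussian distribution both the skewness and the excess kurtosis vanish, which is the sense in which the inhomogeneous grouped mocks are described as quasi-Gaussian.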
On the contrary, all the other mock types (homogeneous or ungrouped) present on average a distribution of residuals with a negative mean of about 50 km s⁻¹, implying that Wiener Filter reconstructed velocities are biased towards larger values with respect to the velocities of the reference simulation. Although catalogs of clusters result in residual distributions with the largest standard deviation on average (more than 300 km s⁻¹), these distributions do not appear to be markedly asymmetric (skewness about 0.3 in absolute value). However, they are quite flat (kurtosis about 5). Finally, homogeneous and inhomogeneous catalogs that are ungrouped lead to residual distributions that, in addition to being quite flat, are asymmetric (skewness about −1 to −2). These observations hold for all the shells considered here. To summarize, inhomogeneous grouped catalogs result in non-biased reconstructions with on average an almost Gaussian distribution of errors presenting the smallest scatter.

Next, the reconstructions obtained in one category are compared between themselves to estimate directly the variance due to the selection of data points. We repeat the same cell-to-cell analysis but now WF reconstructions are compared two by two instead of a comparison between a reconstructed field and the field of the reference simulation. The means (lines) and standard deviations (error bars) of the 1-σ scatters are shown in Figure 5 as a function of R_mid of the shells within which the WF reconstructions are compared. Each color represents one type of mock as given by the legend and the meaning of the short names is explained in Table 1.

Again, the smallest 1-σ scatters are obtained for the non-uniformly distributed mocks (about 35 ± … km s⁻¹ for the largest shell down to 15 ± … km s⁻¹ for the inner shell) and the largest ones are obtained for the mocks of clusters (more than 200 ± 30 km s⁻¹). These observations are again in agreement with expectations as the clusters do not feel their own gravitational field inducing the actual flows in the field. Thus from one selection of clusters to another, a large variation in the reconstructed field is expected. While again the variance between the reconstructions from mocks with and without a ZOA is not drastically different, there is a clear importance of eliminating virial motions when a large number of data is available in a region: larger scatters are measured for the catalogs mimicking the ungrouped version of cosmicflows-2 than for those mimicking the grouped version. There are benefits in having a higher density of data in the center of the box rather than a uniform distribution only when the catalogs are grouped. Note that the non-uniformly distributed catalogs present reconstructions in better agreement with each other than those obtained with homogeneous datasets, also when considering the largest shells, which confirms the large scale correlation of the velocities. Namely if there is less variance in the center of the box, there is also less variance in the outer part of the box. We reiterate that this is true only if the catalog is freed of non-linear motions (i.e. this observation is not true for the reconstructions obtained with the ungrouped inhomogeneous mocks).

That the mean 1-σ scatter of the inner shells for the inhomogeneous catalogs with a ZOA and twice the number of points is smaller than for the other inhomogeneous catalogs is only an artifact due to the impossibility of significantly varying the points in the center of the box. The ratio of the mean 1-σ scatters to the variance of the simulated field is again added as a right axis in Figure 5.
The same conclusion as before can be reached: only the catalogs of clusters present ratios larger than 1.

To summarize, catalogs reproducing the grouped version of cosmicflows-2 in most aspects are those resulting in reconstructions presenting the most neutral and Gaussian distribution of errors with respect to the reference simulation and the smallest scatters, not only with the reference simulation but also between themselves, even at large radii, i.e. where the randomness of the data point selection is higher and where there are fewer data points than in uniformly distributed catalogs. This is in agreement with the fact that peculiar velocities are correlated on large scales. Then as long as the center is well constrained, a good accuracy at farther distances can be reached.

To go further into the comparison, in the next subsection, we propose to establish the accuracy of the moments of the velocity fields obtained with the WF technique.

Monopole and dipole components of the velocity fields are defined with the Taylor expansion of the velocity field v at first order. The latter can be written as follows:

$$ v_\alpha(\mathbf{r}) = v_\alpha(\mathbf{r}_0) + \left.\frac{\partial v_\alpha}{\partial r_\beta}\right|_{\mathbf{r}_0} dr_\beta \qquad (1) $$

where α is x, y or z. The trace of the deformation tensor $\partial v_\alpha / \partial r_\beta$ in equation 1 is the monopole term: $\mathrm{Tr}(\Sigma) = -\nabla \cdot \vec{v} / H(z)$ (e.g. Libeskind et al. 2014). Considering the full three dimensional (3D) velocity field, evaluated on a Cartesian grid, the bulk velocity vector is the volume weighted mean velocity field within a top-hat sphere of radius R to denote a sharp cut-off. The bulk velocity is defined for instance by Hoffman et al.
(2015)as: V WFbulk ( R ) = π R Z r < R v WF ( r ) d r (2)We derive the monopole and dipole components of the referencesimulation and of the WF reconstructed velocity fields at di ff erent radii.Means (lines) and standard deviations (error bars) of monopole (top)and dipole (middle) values of reconstructed fields divided by thoseof the reference simulation as a function of the radius (distance from c (cid:13) , 1–12 F and Datasets -100 0 100 m e a n ( k m s - ) Distribution of Errors: PropertiesDistribution of Errors: PropertiesDistribution of Errors: PropertiesDistribution of Errors: PropertiesDistribution of Errors: PropertiesDistribution of Errors: PropertiesDistribution of Errors: PropertiesDistribution of Errors: Properties
IG-1IG-2IGZ-1IGZ-2I-1UG-1U-1UZ-1
200 400 σ ( k m s - ) IG-1IG-2IGZ-1IGZ-2I-1UG-1U-1UZ-1 -3 -1 1 s k e w n e ss IG-1IG-2IGZ-1IGZ-2I-1UG-1U-1UZ-1 -1 Mpc) 5 15 20 k u r t o s i s IG-1IG-2IGZ-1IGZ-2I-1UG-1U-1UZ-1
Figure 4.
Properties of the distribution of residuals between simulated and reconstructed fields. Namely the average properties of the distribution of errors in thereconstructions are shown. From top to bottom, averages and standard deviations of the mean, standard deviation, skewness and kurtosis of the residual distributionsin shells of ‘middle’ radius R mid , i.e. the residuals in cells at a distance d such that R mid -25 d < R mid +
25 h − Mpc are considered. Each colored linestyle correspondsto one mock type, whose short names are given in Table 1. -1 Mpc) σ ( k m s - ) σ / σ v σ / σ v σ / σ v σ / σ v σ / σ v σ / σ v σ / σ v σ / σ v IG-1IG-2IGZ-1IGZ-2I-1UG-1U-1UZ-1
Figure 5.
Means (lines) and standard deviations (error bars) of 1- σ scatters obtained comparing between themselves reconstructed velocity fields of a same givenmock category as a function of the ‘middle’ radius of the compared shells, i.e. cells at a distance d such that R mid -25 d < R mid +
25 h − Mpc are compared. Eachcolored linestyle corresponds to one mock type, whose short names are given in Table 1. the center of the box) are derived and reported on Figure 6 for eachmock category (one color and linestyle per category). Mocks with non-uniform distributions have velocity moments closer to the referencesimulation in agreement with the fact that their velocity fields are thebest match to the reference velocity field. For the dipole term, this isnot applicable to ungrouped mocks because of the non-linear motions.More precisely, the non-uniformly distributed mocks give recon-structions with monopole terms consistent with those of the simulation:Their proportionality factor is between 0.5 and 2.0 on a large range of radii. Regarding the uniformly distributed mocks, they have monopoleterms that can vary grandly around that of the reference simulations ;especially in the case of the catalogs of clusters, where not only themonopole terms between the reconstructions obtained with the di ff er-ent mocks di ff er completely (standard deviations as high as 10), butalso the ratio of the simulated to reconstructed monopole terms canreach values as high as 16. Note that on the Figure, the appropriaterange chosen for visibility does not extend to 16.Regarding the dipole terms, the ratio between values found with c (cid:13) , 1–12 Sorce et al. -1 Mpc)-4-2024 m o n o p o l e W F / m o n o p o l e S i m u IG-1IG-2IGZ-1IGZ-2I-1UG-1U-1UZ-1 -1 Mpc)110 d i p o l e W F / d i p o l e S i m u IG-1IG-2IGZ-1IGZ-2I-1UG-1U-1UZ-1 -1 Mpc)0.00.20.40.60.81.0 c o s ( θ ) IG-1IG-2IGZ-1IGZ-2I-1UG-1U-1UZ-1
Figure 6.
Mean (lines) and standard deviation (error bars) of the monopole (top) and dipole (middle) terms of the reconstructed velocity fields for each mock category divided by the monopole and dipole values of the field of the reference simulation as a function of the distance from the center of the box (observer's location). Means (lines) and standard deviations (error bars) of the dot product or alignment (bottom) between the simulated and reconstructed dipoles as a function of the distance from the center of the box. The identification of mock categories with the letters is given in Table 1.
the reconstructions and those of the simulation is between 1.0 and 1.3 out to 200 h⁻¹ Mpc for the grouped non-uniformly distributed mocks and presents very small variances (less than 0.3). Thus the WF reconstruction in that case gives a correct bulk flow up to 200 h⁻¹ Mpc, in agreement with the claims of Hoffman et al. (2015). On the other hand, the ungrouped inhomogeneous mocks give values that are too large already at small radii (a factor of 2 at 60 h⁻¹ Mpc, up to a factor of 6 at 200 h⁻¹ Mpc). As for the homogeneous mocks, they start by giving reconstructions with reasonable dipole values (for radii up to 30 h⁻¹ Mpc) but quickly reach values 2-3 times too large, with variances (about 0.4-0.5) larger than for the reconstructions obtained with the inhomogeneous mocks.

To investigate further the accuracy of the reconstructed dipole, we look at the alignment between the simulated and the reconstructed vectors in the bottom panel of Figure 6. They are extremely well aligned for all the inhomogeneous grouped catalogs up to distances of 200 h⁻¹ Mpc, with an almost nonexistent scatter. In contrast, the vectors are largely misaligned, with large discrepancies and scatters, for the other mocks. For the catalogs of clusters, the angle between the simulated and reconstructed vectors spans a large range of values and the two vectors are never aligned, even in the center of the box. This clear misalignment implies that clusters are not good tracers of the large scale velocity dipole. It sheds new light on the misalignment between the CMB and the vector inferred from clusters highlighted by Lauer & Postman (1994). In view of the results of this paper, field galaxies could reconcile the discrepancy between the directions of the dipoles.
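The dipole comparison discussed above (amplitude ratio and alignment) can be illustrated with a minimal sketch. It assumes that the dipole term is taken as the mean velocity vector of all equal-volume grid cells within a sphere of radius R centered on the observer; the exact definition used for Figure 6 is not restated in this section, and the function names `bulk_velocity` and `dipole_comparison` as well as the toy fields are hypothetical.

```python
import numpy as np

def bulk_velocity(positions, velocities, radius):
    """Mean velocity vector of the cells within `radius` of the origin.

    Assumption: equal-volume grid cells, so a plain mean over cells is a
    volume average. positions and velocities have shape (N, 3).
    """
    dist = np.linalg.norm(positions, axis=1)
    inside = dist <= radius
    return velocities[inside].mean(axis=0)

def dipole_comparison(positions, v_sim, v_wf, radius):
    """Return (|B_WF| / |B_sim|, cos(theta)) for one sphere radius."""
    b_sim = bulk_velocity(positions, v_sim, radius)
    b_wf = bulk_velocity(positions, v_wf, radius)
    n_sim = np.linalg.norm(b_sim)
    n_wf = np.linalg.norm(b_wf)
    ratio = n_wf / n_sim
    cos_theta = np.dot(b_sim, b_wf) / (n_sim * n_wf)
    return ratio, cos_theta

# Toy fields (not the paper's data): a random field with a net flow along x,
# and a "reconstruction" with the same direction but 20% too large.
rng = np.random.default_rng(0)
positions = rng.uniform(-250.0, 250.0, size=(5000, 3))   # h^-1 Mpc
v_sim = rng.normal(0.0, 300.0, size=(5000, 3)) + np.array([200.0, 0.0, 0.0])
v_wf = 1.2 * v_sim
ratio, cos_theta = dipole_comparison(positions, v_sim, v_wf, radius=200.0)
print(ratio, cos_theta)
```

In this toy case the two dipoles are parallel by construction, so the alignment is perfect while the amplitude ratio departs from unity, which is the situation the middle and bottom panels of Figure 6 are designed to separate.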
A detailed study of the impact of uncertainties and their distribution in radial peculiar velocity catalogs used to reconstruct the local Universe is beyond the scope of this paper. Nevertheless, it is interesting to check to what extent our findings hold for observational catalogs with errors. Although any observer aims at providing error-free catalogs, the latter are always affected by statistical and systematic errors. To simplify the problem, we disregard the systematic errors. In light of ongoing and future all sky surveys (e.g. ASKAP, SKA, WISE, Pan-STARRS) this assumption is relatively realistic: the risk of direction-dependent errors for a given catalog will no longer be problematic. Although statistical errors will decrease dramatically, they will still be present and cannot be strictly disregarded. However, their distribution in a given catalog of data can be quite complex. They depend on multiple factors such as the distribution of object distances to the observer, the type of observations, or more precisely of distance indicators used, as well as the number of distance measurements for a given object.

To be realistic without adding an extra layer of complexity, we settle for a Gaussian distribution of fractional errors on distances with a 15% scatter, regardless of the mock type considered. The choice of the 15% value is based on the fact that errors in the benchmark catalog range mostly between ∼10 and 20%. This distribution of errors, which are proportional to the distances, allows us to take into account the fact that observations of objects further away are usually more uncertain than those of nearby objects. However, imposing the 15% scatter whatever the mock type removes the benefit of having more data points or more observations per object, which would reduce the global uncertainties. In addition, it does not take into account the fact that inhomogeneous catalogs have a smaller mean distance than homogeneous catalogs, which in principle could lead to a smaller scatter of the fractional error distribution (more accurate distance indicators are usable nearby). In light of these observations, results obtained with mocks where errors have been added give a hint as to whether or not the findings hold, but any result should be considered carefully before drawing firm conclusions.

Subsequently, we add errors to distances (thus to derived radial peculiar velocities) in every mock following the prescription described above. Namely, for each mock, an array of random values drawn from a Gaussian with σ = 0.15 is built. Distances with errors are distances without errors multiplied by the random values plus one. Peculiar velocities with errors are derived with these new uncertain distances. Note that we choose to use a different random array for every mock rather than fixing the random array for a given mock type.
Consequently, the variance is increased between the mocks of the same type. Besides, fixing the random array only for a given mock type might seem unfair: we would need to fix it for the whole set of mocks. However, such a choice would aggravate the effects described above: there is a priori no reason for catalogs with a larger mean distance to have the same σ (and a fortiori the exact same) Gaussian distribution of fractional errors as catalogs with a smaller mean distance.

WF reconstructions are obtained with the 40 mocks with errors and compared to the reference simulation as before. The average 1-σ scatters and their standard deviations are plotted in Figure 7. Results are as follows:
• The first, completely expected, observation is the increase of the 1-σ scatters with respect to the case without errors: they are all above 100 km s⁻¹. The second observation is the decoupling of the 1-σ scatters obtained with the mocks that have twice as many data points as cosmicflows-2 from those obtained with mocks that have the same number of points. The trend is contrary to our expectation since the 1-σ scatters are slightly larger in the former case (about 150 km s⁻¹) than in the latter case (about 130 km s⁻¹). This is typically one result that has to be considered carefully. Indeed, the fact that a larger number of measurements decreases the global error has not been taken into consideration here.
• Inhomogeneous catalogs with a ZOA result in reconstructions with slightly higher 1-σ scatters (about 170 km s⁻¹) than those without, confirming the weak impact of the ZOA on the reconstructions. This is further motivation for recent efforts made to observe close to the ZOA (Sorce et al. 2012, 2014b; Neill et al. 2014) or even in the ZOA (Kraan-Korteweg et al. 1994; Donley et al. 2006; Williams et al. 2014; Staveley-Smith et al. 2016; Said et al. 2016a,b; Ramatsoku et al. 2016).
• The 1-σ scatters obtained with the ungrouped inhomogeneous catalogs without ZOA are higher (about 150 km s⁻¹) than those obtained with the grouped inhomogeneous mocks with once the number of points and without ZOA (about 130 km s⁻¹). This is in agreement with previous conclusions: removing non-linear motions is essential. Note that such a result is not trivial since we applied the same 1-σ Gaussian distribution of errors to both mock types whereas, by definition, the global error should have been smaller after grouping.
• This additionally explains why the 1-σ scatters are similar for the ungrouped inhomogeneous catalogs without ZOA and the grouped inhomogeneous catalogs with ZOA. Had we considered a larger σ for the distribution of errors in the ungrouped cases, the 1-σ scatters would have been higher, in agreement with previous observations without errors.
• Finally, the 1-σ scatters are the largest (up to 300 km s⁻¹) for reconstructions obtained with the uniformly distributed mocks, except at large radii. The 1-σ scatters could have been smaller considering that clusters might benefit from multiple measurements, hence smaller errors. However, such a statement has to be counterbalanced by the fact that nearby measurements benefit in general from higher accuracy due to more precise distance indicators. In addition, even if the 1-σ scatters could be slightly decreased, they are twice as large as those obtained with other mocks.

To summarize, our previous conclusions based on mocks without error stand: the inhomogeneous grouped mock catalogs constitute an excellent sampling for the Wiener Filter technique.
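The error prescription used in this section (fractional Gaussian errors on distances, propagated to radial peculiar velocities) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the Gaussian is taken to be centered on zero, the radial peculiar velocity is assumed to follow v = cz − H0 d with an error-free observed redshift, and distances are in h⁻¹ Mpc so that H0 = 100 km s⁻¹ per h⁻¹ Mpc. The function name `add_distance_errors` and the toy catalog are hypothetical.

```python
import numpy as np

H0 = 100.0  # km/s per h^-1 Mpc; assumed because distances are in h^-1 Mpc

def add_distance_errors(dist, v_rad, frac_sigma=0.15, rng=None):
    """Perturb distances by fractional Gaussian errors and re-derive velocities.

    dist  : error-free distances (h^-1 Mpc)
    v_rad : error-free radial peculiar velocities (km/s)
    Assumption: v_rad = cz - H0 * dist, with cz unaffected by the errors.
    """
    rng = np.random.default_rng() if rng is None else rng
    cz = H0 * dist + v_rad                                # observed redshift velocity
    frac = rng.normal(0.0, frac_sigma, size=dist.shape)   # fractional errors
    dist_err = dist * (1.0 + frac)                        # distances with errors
    v_err = cz - H0 * dist_err                            # velocities with errors
    return dist_err, v_err

# Toy catalog (not the paper's mocks); a new random array is drawn per mock,
# which here amounts to passing a differently seeded generator for each mock.
rng = np.random.default_rng(1)
dist = rng.uniform(10.0, 250.0, size=10000)
v_rad = rng.normal(0.0, 300.0, size=10000)
dist_err, v_err = add_distance_errors(dist, v_rad, rng=rng)
print(np.std(dist_err / dist - 1.0))
```

Under these assumptions the velocity error equals H0 times the distance error, so it grows linearly with distance: a 15% error at 100 h⁻¹ Mpc corresponds to 1500 km s⁻¹, far above typical peculiar velocities.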
Figure 7.
Means (lines) and standard deviations (error bars) of 1-σ scatters obtained by comparing the velocity field of the reference simulation and those obtained with the WF technique applied to the different mocks with errors, as a function of the 'middle' radius of the compared shells, i.e. grid cells at a distance d such that R_mid − 25 ≤ d < R_mid + 25 h⁻¹ Mpc are compared. Each colored linestyle corresponds to one type of mock. Short names of mock types are given in Table 1.
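The shell-by-shell comparison summarized in the caption above can be sketched as below. The 1-σ scatter is assumed here to be the standard deviation of the cell-to-cell velocity residuals (all three components pooled) within a shell of half-width 25 h⁻¹ Mpc; the paper may define it differently, for instance as a scatter around the 1:1 relation. The function name `shell_scatter` and the toy fields are hypothetical.

```python
import numpy as np

def shell_scatter(positions, v_ref, v_rec, r_mid, half_width=25.0):
    """Standard deviation of velocity residuals in one spherical shell.

    Cells are selected with r_mid - half_width <= d < r_mid + half_width,
    where d is the distance to the box center (taken as the origin).
    Returns NaN if the shell contains no cell.
    """
    dist = np.linalg.norm(positions, axis=1)
    in_shell = (dist >= r_mid - half_width) & (dist < r_mid + half_width)
    if not in_shell.any():
        return np.nan
    residual = (v_rec - v_ref)[in_shell]
    return residual.std()

# Toy fields (not the paper's data): the "reconstruction" is the reference
# field plus Gaussian noise of 100 km/s per component.
rng = np.random.default_rng(2)
positions = rng.uniform(-250.0, 250.0, size=(20000, 3))   # h^-1 Mpc
v_ref = rng.normal(0.0, 300.0, size=(20000, 3))
v_rec = v_ref + rng.normal(0.0, 100.0, size=(20000, 3))
scatter = shell_scatter(positions, v_ref, v_rec, r_mid=100.0)
print(scatter)
```

Repeating the call for a range of `r_mid` values and for each mock of a category, then taking the mean and standard deviation over mocks, gives curves of the kind shown with lines and error bars in the figure.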
The Wiener Filter (WF) permits reconstructing velocity fields from catalogs of radial peculiar velocities. These fields are pathways to study the local Large Scale Structure and its dynamics. Thus, identifying the survey sampling that leads to the best reconstruction in the WF framework is essential. Applying the WF technique to different mock catalogs built out of a reference simulation, we are able to determine the optimal sampling of the observational dataset to recover the best quality reconstruction with the WF technique. By extension, it offers the possibility to design strategically the observational plan to measure peculiar velocities and thus build surveys.

We compare the reference simulation with WF reconstructions obtained from datasets drawn from that very same constrained simulation: catalogs of clusters, grouped catalogs (the latter differing from the former by the presence of field galaxies, while galaxies in high density regions are collapsed into one point), and non-uniformly and homogeneously distributed datasets. The comparisons include residual distributions, cell-to-cell and bulk velocities. We find that the best quality reconstruction with a fixed number of data points is obtained with catalogs that present the following properties:
• they are grouped, i.e. non-linear motions are suppressed. It is an expected result as the WF is a linear method: degrading the information contained in the dataset is essential to preserve the quality of the reconstruction,
• they also contain field galaxies to measure flows due to the gravitational field induced by large structures like clusters,
• they are non-uniformly distributed, in particular they present a decreasing number of data points with the distance from the observer.

Note that, at a constant number of points, there is no particular advantage for a survey to be uniformly distributed, especially when the object of study is the bulk flow.
As a matter of fact, non-uniformly distributed datasets, as described above, result in reconstructions as good as those from homogeneously distributed catalogs, provided that galaxies have been properly grouped in clusters when appropriate. For instance, a catalog where 50% of the data are within 60 h⁻¹ Mpc and 98% within 150 h⁻¹ Mpc allows the reconstruction of the velocity field overall up to 250 h⁻¹ Mpc as well as a uniformly distributed catalog does over the same range of distances. The velocity field reconstruction is even better in the inner part of the box when applying the WF to the former rather than the latter, and the bulk flow can be studied with greater accuracy.

Quantitatively, the WF technique, when applied to a non-uniformly distributed grouped catalog of radial peculiar velocities without error, is able to provide a velocity field in agreement with the reference field (from which the data are drawn) at better than 100 km s⁻¹ up to distances equal to three times the mean distance of the data. The dipole (including its direction) and monopole terms of the velocity field are in excellent agreement with those of the reference field (less than 2% difference) up to 200 h⁻¹ Mpc. This is expected due to the large scale correlation of the velocities. The worst agreement is found for the catalogs of clusters, suggesting that such catalogs are not good probes of the large scale velocity field within the WF framework. In particular, they result in a complete misalignment between the simulated and the reconstructed dipole vectors. In addition, the particular random selection of points matters little provided that the sample covers as many regions as possible.
Namely, for a given type of catalog, the resulting reconstructions vary little from one random selection of points to another.

To dedicate our attention entirely to the sampling issues, neither statistical nor systematic errors were added to catalogs in a first pass. However, observational catalogs are not exempt from biases and errors. With a simple but realistic model of fractional errors on distances and a careful consideration of the results in light of the chosen error model, we show that our conclusions are unchanged when adding errors to the mocks. A more sophisticated error model applied to the optimal catalog as described above, typically with a density of measurements declining from the center to the outer part and with properly grouped cluster galaxies, demonstrates that the WF, applied to the catalogs where the biases have been minimized (see e.g. Sorce 2015, for a detailed study), approximates the underlying velocity field at 100-150 km s⁻¹, in agreement with the findings obtained with the error model used here.

To summarize, this analysis lays the basis to design strategically future observational surveys and to build catalogs. Such surveys will lead to high quality WF reconstructions of the Large Scale Structure that in turn will lead to optimal studies of the local dynamics.

ACKNOWLEDGEMENTS

APPENDIX A: THE WIENER-FILTER
The Wiener Filter technique is the optimal minimal variance estimator given a dataset and an assumed prior power spectrum (Zaroubi et al. 1995, and references therein). The overdensity $\delta^{\rm WF}$ and velocity $\mathbf{v}^{\rm WF}$ fields of the Wiener Filter are expressed in terms of the following correlation matrices. For a list of $M$ constraints $c_i$:

$$\delta^{\rm WF}(\mathbf{r}) = \sum_{i=1}^{M} \langle \delta(\mathbf{r})\, c_i \rangle \sum_{j=1}^{M} \langle c_i c_j + \epsilon_i^2 \delta_{ij} \rangle^{-1} (c_j + \epsilon_j) \tag{A1}$$

$$v^{\rm WF}_{\alpha} = \sum_{i=1}^{M} \langle v_{\alpha}(\mathbf{r})\, c_i \rangle \sum_{j=1}^{M} \langle c_i c_j + \epsilon_i^2 \delta_{ij} \rangle^{-1} (c_j + \epsilon_j) \quad {\rm with}\ \alpha = x, y, z \tag{A2}$$

where $c_i + \epsilon_i$ are mock or observational constraints plus their uncertainties, and errors are assumed to be statistically independent. Schematically, the signal is smoothed by a factor inversely proportional to the errors. The constraints can be either densities or velocities. $\langle AB \rangle$ notations stand for the correlation functions involving the assumed prior power spectrum.

The associated correlation functions are given by:

$$\langle \delta(\mathbf{r}')\, v_{\alpha}(\mathbf{r}' + \mathbf{r}) \rangle = \frac{\dot{a} f}{(2\pi)^3} \int_0^{\infty} \frac{i k_{\alpha}}{k^2} P(k)\, e^{-i \mathbf{k} \cdot \mathbf{r}}\, {\rm d}\mathbf{k} = -\dot{a} f\, r_{\alpha}\, \zeta(r) \tag{A3}$$

$$\langle v_{\alpha}(\mathbf{r}')\, v_{\beta}(\mathbf{r}' + \mathbf{r}) \rangle = \frac{(\dot{a} f)^2}{(2\pi)^3} \int_0^{\infty} \frac{k_{\alpha} k_{\beta}}{k^4} P(k)\, e^{-i \mathbf{k} \cdot \mathbf{r}}\, {\rm d}\mathbf{k} = (\dot{a} f)^2\, \Psi_{\alpha\beta} \tag{A4}$$

where $P$ is the assumed prior power spectrum.

Since data sample a typical realization of the prior model, i.e. the power spectrum, $\chi^2/{\rm d.o.f}$ should be close to 1, where $\chi^2 = \sum_{i=1}^{M} \sum_{j=1}^{M} (c_i + \epsilon_i) \langle c_i c_j + \epsilon_i^2 \delta_{ij} \rangle^{-1} (c_j + \epsilon_j)$ and d.o.f is the number of degrees of freedom. However, data include non-linearities which are not taken into account in the model. Consequently, a non-linear sigma ($\sigma_{NL}$) is required ($\langle c_i c_j \rangle + \delta^{k}_{ij} \epsilon_j^2 + \delta^{k}_{ij} \sigma_{NL}^2$) to compensate for the non-linearities and drive $\chi^2/{\rm d.o.f}$ closer to 1. Data dominate the reconstruction in regions where they are dense and accurate. In contrast, when they are noisy and sparse, the reconstruction is a prediction based on the assumed prior model.

REFERENCES
Bertschinger E., Dekel A., Faber S. M., Dressler A., Burstein D., 1990, ApJ, 364, 370
Carlesi E., Hoffman Y., Sorce J. G., Gottlöber S., Yepes G., Courtois H., Tully R. B., 2016a, MNRAS, 460, L5
Carlesi E. et al., 2016b, MNRAS, 458, 900
Colless M., Saglia R. P., Burstein D., Davies R. L., McMahan R. K., Wegner G., 2001, MNRAS, 321, 277
Dekel A., Bertschinger E., Faber S. M., 1990, ApJ, 364, 349
Dekel A., Eldar A., Kolatt T., Yahil A., Willick J. A., Faber S. M., Courteau S., Burstein D., 1999, ApJ, 522, 1
Donley J. L., Koribalski B. S., Staveley-Smith L., Kraan-Korteweg R. C., Schröder A., Henning P. A., 2006, MNRAS, 369, 1741
Freedman W. L. et al., 2001, ApJ, 553, 47
Gottlöber S., Hoffman Y., Yepes G., 2010, ArXiv e-prints: 1005.2687
Hoffman Y., 1994, in Astronomical Society of the Pacific Conference Series, Vol. 67, Unveiling Large-Scale Structures Behind the Milky Way, Balkowski C., Kraan-Korteweg R. C., eds., p. 185
Hoffman Y., Courtois H. M., Tully R. B., 2015, MNRAS, 449, 4494
Hoffman Y., Nusser A., Courtois H. M., Tully R. B., 2016, MNRAS, 461, 4176
Jasche J., Wandelt B. D., 2013, MNRAS, 432, 894
Jha S., Riess A. G., Kirshner R. P., 2007, ApJ, 659, 122
Kitaura F.-S., 2013, MNRAS, 429, L84
Knollmann S. R., Knebe A., 2009, ApJS, 182, 608
Kraan-Korteweg R. C., Cayette V., Balkowski C., Fairall A. P., Henning P. A., 1994, 67, 99
Lauer T. R., Postman M., 1994, ApJ, 425, 418
Lavaux G., 2008, Physica D Nonlinear Phenomena, 237, 2139
Lavaux G., 2016, MNRAS, 457, 172
Lee M. G., Freedman W. L., Madore B. F., 1993, ApJ, 417, 553
Libeskind N. I., Hoffman Y., Gottlöber S., 2014, MNRAS, 441, 1974
Neill J. D., Seibert M., Tully R. B., Courtois H., Sorce J. G., Jarrett T. H., Scowcroft V., Masci F. J., 2014, ApJ, 792, 129
Nusser A., Dekel A., 1992, ApJ, 391, 443
Planck Collaboration et al., 2014, A&A, 571, A16
Pomarède D., Tully R. B., Hoffman Y., Courtois H. M., 2015, ApJ, 812, 17
Ramatsoku M. et al., 2016, MNRAS, 460, 923
Said K., Kraan-Korteweg R. C., Jarrett T. H., Staveley-Smith L., Williams W. L., 2016a, MNRAS, 462, 3386
Said K., Kraan-Korteweg R. C., Staveley-Smith L., Williams W. L., Jarrett T. H., Springob C. M., 2016b, MNRAS, 457, 2366
Sorce J. G., 2015, MNRAS, 450, 2644
Sorce J. G., Courtois H. M., Gottlöber S., Hoffman Y., Tully R. B., 2014a, MNRAS, 437, 3586
Sorce J. G., Courtois H. M., Tully R. B., 2012, AJ, 144, 133
Sorce J. G., Gottlöber S., Hoffman Y., Yepes G., 2016a, MNRAS, 460, 2015
Sorce J. G. et al., 2016b, MNRAS, 455, 2078
Sorce J. G., Tully R. B., Courtois H. M., Jarrett T. H., Neill J. D., Shaya E. J., 2014b, MNRAS, 444, 527
Staveley-Smith L., Kraan-Korteweg R. C., Schröder A. C., Henning P. A., Koribalski B. S., Stewart I. M., Heald G., 2016, AJ, 151, 52
Tonry J. L., Dressler A., Blakeslee J. P., Ajhar E. A., Fletcher A. B., Luppino G. A., Metzger M. R., Moore C. B., 2001, ApJ, 546, 681
Tully R. B., Courtois H., Hoffman Y., Pomarède D., 2014, Nature, 513, 71
Tully R. B. et al., 2013, AJ, 146, 86
Tully R. B., Fisher J. R., 1977, A&A, 54, 661
Wang H., Mo H. J., Yang X., van den Bosch F. C., 2013, ApJ, 772, 63
Watkins R., Feldman H. A., 2015, MNRAS, 447, 132
Williams W. L., Kraan-Korteweg R. C., Woudt P. A., 2014, MNRAS, 443, 41
Zaroubi S., Hoffman Y., Dekel A., 1999, ApJ, 520, 413
Zaroubi S., Hoffman Y., Fisher K. B., Lahav O., 1995, ApJ, 449, 446
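As a numerical companion to Appendix A, the Wiener Filter estimator of equations (A1) and (A2) and the associated χ² per degree of freedom can be sketched with plain linear algebra. This is a toy illustration under stated assumptions, not the implementation used in the paper: the prior correlation matrices are supplied directly rather than computed from a power spectrum, the error term is taken as the error variances on the diagonal, the number of degrees of freedom is taken to be the number of constraints, and the exponential correlation function of the example is arbitrary. The function name `wiener_filter` is hypothetical.

```python
import numpy as np

def wiener_filter(cross_cov, prior_cov, data, err):
    """Wiener Filter estimate in the matrix form of equations (A1)-(A2).

    cross_cov : (N, M) prior correlations <field(r) c_i> at N positions
    prior_cov : (M, M) prior correlations <c_i c_j>
    data      : (M,) constraints including their errors, c_j + eps_j
    err       : (M,) 1-sigma uncertainties, assumed statistically independent
    Returns the estimated field (N,) and chi^2 divided by M.
    """
    noisy_cov = prior_cov + np.diag(err**2)      # <c_i c_j> + eps_i^2 delta_ij
    weights = np.linalg.solve(noisy_cov, data)   # inverse matrix applied to data
    field = cross_cov @ weights
    chi2_per_dof = (data @ weights) / data.size  # assumption: d.o.f = M
    return field, chi2_per_dof

# Toy 1D example: five constraints with an arbitrary exponential correlation.
x = np.arange(5.0)
prior_cov = np.exp(-np.abs(x[:, None] - x[None, :]) / 2.0)
data = np.array([1.0, 0.5, -0.3, 0.2, 0.8])

# Error-free data: evaluated at the constraint positions, the WF returns the data.
field_exact, _ = wiener_filter(prior_cov, prior_cov, data, np.zeros(5))
# Very noisy data: the estimate falls back to the prior mean, i.e. zero.
field_noisy, _ = wiener_filter(prior_cov, prior_cov, data, np.full(5, 1e3))
print(field_exact, field_noisy)
```

The two limiting cases mirror the closing remark of the appendix: accurate data dominate the reconstruction, whereas noisy data leave it to the assumed prior model.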