[PDF] Three-dimensional shapelets and an automated classification scheme for dark matter haloes

Abstract

We extend the two-dimensional Cartesian shapelet formalism to d-dimensions. Concentrating on the three-dimensional case, we derive shapelet-based equations for the mass, centroid, root-mean-square radius, and components of the quadrupole moment and moment of inertia tensors. Using cosmological N-body simulations as an application domain, we show that three-dimensional shapelets can be used to replicate the complex sub-structure of dark matter halos and demonstrate the basis of an automated classification scheme for halo shapes. We investigate the shapelet decomposition process from an algorithmic viewpoint, and consider opportunities for accelerating the computation of shapelet-based representations using graphics processing units (GPUs).

Full PDF

MMon. Not. R. Astron. Soc. , 1–19 (2011) Printed 16 November 2018 (MN L A TEX style ﬁle v2.2)

Three-dimensional shapelets and an automatedclassiﬁcation scheme for dark matter haloes (cid:63)

C.J. Fluke † , A.L. Malec , P.D. Lasky , , , B.R. Barsdell Centre for Astrophysics & Supercomputing, Swinburne University of Technology, PO Box 218, Hawthorn, Victoria, 3122, Australia Theoretical Astrophysics, Eberhard Karls University of T¨ubingen, T¨ubingen 72076, Germany School of Physics, University of Melbourne, Parkville, VIC 3010, Australia

Accepted 18 December 2011

ABSTRACT

We extend the two-dimensional Cartesian shapelet formalism to d -dimensions. Con-centrating on the three-dimensional case, we derive shapelet-based equations for themass, centroid, root-mean-square radius, and components of the quadrupole momentand moment of inertia tensors. Using cosmological N -body simulations as an applica-tion domain, we show that three-dimensional shapelets can be used to replicate thecomplex sub-structure of dark matter halos and demonstrate the basis of an auto-mated classiﬁcation scheme for halo shapes. We investigate the shapelet decomposi-tion process from an algorithmic viewpoint, and consider opportunities for accelerat-ing the computation of shapelet-based representations using graphics processing units(GPUs). Key words: methods: data analysis – methods: analytical – (cosmology:) dark matter– (cosmology:) large-scale structure of Universe

Complex, three-dimensional structures abound in astron-omy on all scales from “ﬂuﬀy” dust aggregrates in molec-ular clouds (Ossenkopf 1993; Stepnik et al. 2003), to cos-mological large-scale structure that has been described as“sponge-like” (Gott, Dickinson & Melott 1986), or a “skele-ton” (Sousbie et al. 2008) of clusters, ﬁlaments and voids(Barrow, Bhavsar & Sonoda 1985; White et al. 1987).While aspects of these structures can be expressed interms of simple, geometrically-motivated properties such astheir triaxiality or quadrupole moment, these quantities arenot able to capture the higher order complexity of the trueshape. The challenge, therefore, is to provide an accuratedescription of an arbitrary three-dimensional (3-d) shape,possibly over many physical length scales, in the hope thatthis can lead to improved theoretical or analytical insightinto the structure in question.The human visual system is more than capable of iden-tifying structures and sub-structures for an individual 3-dobject, but such qualitative interpretations only have lim-ited use – it is not practical to attempt a classiﬁcation ofshapes by eye when there are many thousands of objects (cid:63) † cﬂ[email protected] to inspect. The preferred alternative is an automated ap-proach including: • decomposition via an appropriate basis set (e.g. Fourieranalysis, wavelet transformations); • partitioning [e.g. Voronoi tesselation - see Icke & vande Weygaert (1987) for an early cosmological application]; • the use of minimal spanning trees to identify connectedstructures (Barrow et al. 1985; Pearson & Coles 1995); • Minkowski functionals [which return global geometricproperties such as volume, surface area and edge density –Mecke, Buchert & Wagner (1994); Sahni, Sathyaprakash &Shandarin (1998)]; and • segementation [e.g. “dendrograms” used by Goodmanet al. (2009) to identify self-gravitating structures in molec-ular clouds].The approach we present in this paper is the exten-sion of the two-dimensional (2-d) shapelet method (Refregier2003) to three dimensions. Shapelets are sets of orthonormalbasis functions based on the Hermite polynomial solutionsof the quantum harmonic oscillator (QHO). Simple analyticforms can be derived for the physical properties of 3-d struc-tures (e.g. centre of mass, root-mean-square radius and the Although, if there are enough individual eyes available to as-sist, then this approach is feasible, as the Galaxy Zoo project( ) has demonstrated.c (cid:13) a r X i v : . [ a s t r o - ph . C O ] D ec C.Fluke et al. components of the quadrupole moment and moment of inter-tia tensors), which can be eﬃciently calculated in shapeletspace.In astronomy, 2-d shapelets have been applied to imagesimulation (Massey et al. 2004; Ferry et al. 2008), the mor-phological classiﬁcation of galaxies (Kelly & McKay 2004;Andrae, Jahnke & Melchior 2011) and sunspots (Young etal. 2005), and weak gravitational lensing (Refregier & Ba-con 2003). The latter includes the measurement of shear(Kuijken 2006), ﬂexion (Goldberg & Bacon 2005), point-spread function modelling and deconvolution (Melchior etal. 2009; Paulin-Henriksson, Referegier & Amara 2009), andweak lensing by large-scale structure from the FIRST ra-dio survey (Chang, Refregier & Helfand 2004). Massey etal. (2007) investigated weak lensing with polar shapelets(Massey & Refregier 2005), a form more suitable for imageswith rotational symmetry. Further properties of shapelets,including integral relations and convolution sums are pre-sented in Coﬀey (2006).The importance of the shapelet approach lies not somuch in the basis functions, but in the simplifed compu-tation of quantities relating to shape and structure thatcan be determined once a shapelet decomposition has beenobtained. These analytic quantities are expressed as linearsums of weighted shapelet states, greatly reducing the cal-culation complexity compared to (numerically) solving therelated integral formulations.Shapelet decomposition is not without its problems [seeBerry, Hobson & Withington (2004) for an extensive discus-sion]. Melchior, Meneghetti & Bartelmann (2007) examinedthe limitations of shapelet image analysis in cases where theorthonormality condition [see equation (6) below] fails, andproposed a decomposition procedure that preserves physi-cal properties of images. Melchior et al. (2010) and Bosch(2010) considered problems with using circular Gaussian ba-sis functions to model galaxies with high ellipticity or a largeS´ersic index. Ngan et al. (2009) proposed an alternative or-thonormal basis based on the S´ersic proﬁle (hence S´ersiclets)for use in weak lensing analysis. While helping to avoid is-sues with poor shape recovery from overﬁtting low signal-to-noise galaxies, and ﬁtting with too many degrees of freedom,S´ersiclets do not possess the analytic properties of shapelets,and the basis functions must be generated numerically. In-deed, it is the existence of analytic functions that has mo-tivated our choice of 3-d Cartesian shapelets as an appro-priate tool for quantifying properties of three-dimensionalstructures.The remainder of this paper is set out as follows. In Sec-tion 2, we present the mathematics of 3- and d -dimensionalCartesian shapelets. New analytic expressions are presentedfor several important physical properties of 3-d structuresin Section 3. In Section 4, we describe issues relating toimplementing an eﬃcient 3-d shapelet decomposition code.We highlight the inherent high-degree of paralellism in theshapelet decomposition algorithm, which makes it a promis-ing target for graphics processing units. In Section 5, wepresent ﬁrst applications of 3-d shapelets to problems incosmological simulations, with an emphasis on studying sub-structure in dark matter halos, demonstrating how an auto-mated shape classiﬁer can work in shapelet space. We endwith a summary and outlook for 3-d shapelets in astronomyin Section 6. In this section we present the Cartesian shapelet formalism.For full details of the one- and two-dimensional cases, andapplications, see Refregier (2003).

The one-dimensional (1-d) shapelet functions are B n ( x ; β ) ≡ β − / φ n ( β − x ) , (1)where β is a scaling length, n is a non-negative integer and φ n ( x ) ≡ (cid:16) n π / n ! (cid:17) − / H n ( x ) e − x / (2)with H n ( x ) the n -th order Hermite polynomial. Higher ordershapelets can be obtained using the recursion relation (seeAppendix A for some useful expressions): B n ( x ; β ) = (cid:18) xβ (cid:19) (cid:114) n B n − ( x ; β ) − (cid:114) n − n B n − ( x ; β ) (3)where B ( x ; β ) = β − / π − / e − x / β , (4)and B ( x ; β ) = √ xβ B ( x ; β ) . (5)The 1-d shapelets form an orthonormal basis, satisfying: (cid:90) ∞−∞ B n ( x ; β ) B m ( x ; β )d x = δ nm (6)where δ nm is the Kronecker delta symbol. Shapelets aresmooth and continuously diﬀerentiable everywhere. Theshapelet coeﬀecients for a suﬃciently well-behaved 1-d func-tion, f ( x ), are found through the integral: f n = (cid:90) ∞−∞ f ( x ) B n ( x ; β )d x (7)allowing the function to be re-written as a sum of (weighted)shapelets: f ( x ) = ∞ (cid:88) n =0 f n B n ( x ; β ) . (8)As we show in Section 4, the calculation of f n poses themain computational challenge. In practice, n is limited to n (cid:54) n max and the integral of equation (7) is calculatedover a ﬁnite volume. However, the orthonormality condi-tion assumes inﬁnite support - so power from higher ordershapelets may be lost, and the orthonormality requirementmay no longer strictly hold if the integration region is toosmall (Melchior et al. 2007). Using the orthonormality of 1-d shapelet functions, the basisfunctions for 2-d shapelets are (Refregier 2003): B , n ( x ; β ) ≡ β − φ , n ( β − x ) . (9)where φ , n ( x ) ≡ φ n ( x ) φ n ( x ) (10) c (cid:13) , 1–19 hree-dimensional shapelets Figure 1.

Examples of three-dimensional Cartesian shapelets ( β = 1). Top row: (left) n = (0 , , n = (0 , , n = (2 , , n = (1 , , f max , and generate 10 equally spacediso-surfaces over the range ( − f max , f max ). Individual isosurfaces are coloured with a two-ended intensity colour map: blue → black → orange. with x = ( x , x ) and n = ( n , n ). Consequently, the ex-tension to 3-d Cartesian shapelets is now almost trivial: B , n ( x ; β ) ≡ β − / φ , n ( β − x ) . (11)where φ , n ( x ) ≡ φ n ( x ) φ n ( x ) φ n ( x ) (12)with x = ( x , x , x ) and n = ( n , n , n ). The 3-d Carte-sian shapelet coeﬀecients have the form: f , n = (cid:90) V f ( x ) B , n ( x ; β )d x (13)with the integration occuring over the inﬁnite volume of thedomain, V , and the 3-dimensional shapelet decompositionis: f ( x ) = ∞ (cid:88) n ,n ,n f , n B , n ( x ; β ) . (14) c (cid:13) , 1–19 C.Fluke et al.

We present examples of 3-d Cartesian shapelets in Fig. 1,using equally-spaced isosurfaces, and a two-ended intensitycolour-map ranging from blue (negative values) to black(zero) to orange (positive values).Two further useful quantities are the characteristic ra-dius of a 3-d shapelet: θ , max ≈ β ( n max + 3 / / , (15)and the size of small scale oscillatory features: θ , min ≈ β ( n max + 3 / − / . (16)These expressions are based on well known quantum me-chanics results for the QHO, and are the 3-d versions ofthe expressions presented in Refregier (2003). They providea starting point for determining appropriate decompositionparameters, as discussed in Section 4.2. d -dimensional Cartesian shapelets It is straightforward to infer that the d -dimensional gener-alisation of the shapelet basis functions is: B d, n ( x ; β ) ≡ β − d/ φ d, n ( β − x ) (17)with φ d, n ( x ) ≡ d (cid:89) i =1 φ n i ( x i ) . (18)We can then write a general orthonormality condition: (cid:90) V B d, n ( x ; β ) B d, m ( x ; β )d d x = d (cid:89) i =1 δ n i m i , (19)the shapelet coeﬀecients have the form: f d, n = (cid:90) V f d ( x ) B d, n ( x ; β )d d x (20)and the d -dimensional shapelet decomposition is: f d ( x ) = ∞ (cid:88) n ,n ,...,n d =0 f d, n B d, n ( x ; β ) . (21)In d -dimensions, the characteristic sizes are: θ d , max ≈ β ( n max + d/ / (22)and θ d , min ≈ β ( n max + d/ − / . (23)We note that Coﬀey (2006) refers to the d -dimensionalsolutions of the harmonic oscillator, but does not presentspeciﬁc d -dimensional results in the form we use. Refregier (2003) demonstrates how analytic expressions canbe obtained for common properties of 2-d images. We nowderive analytic expressions for physical properties of 3-dstructures using 3-d Cartesian shapelets, and their gener-alisation to d -dimensions. The zeroth moment, M , of an arbitary (well-behaved) func-tion, f ( x ), in three dimensions is M ≡ (cid:90) V f ( x )d x. (24)Writing this in terms of the shapelet coeﬃcients, using equa-tion (14), and the orthonomality condition, equation (6): M = ∞ (cid:88) n ,n ,n f , n (cid:90) ∞−∞ B n d x (cid:90) ∞−∞ B n d x (cid:90) ∞−∞ B n d x (25)= π / β / (cid:88) n ,n ,n f , n U , n W , n , (26)where U n ,n ,n ≡ (3 − n − n − n ) / , (27)and W n ,n ,n ≡ (cid:20)(cid:18) n n / (cid:19) (cid:18) n n / (cid:19) (cid:18) n n / (cid:19)(cid:21) / , (28)are factors that recur in the analytic expressions to follow.We have used the integral property [see equation (17) ofRefregier (2003)] for even n : J n ≡ (cid:90) ∞−∞ B n ( x ; β )d x = (cid:16) − n π / β (cid:17) / (cid:18) nn/ (cid:19) / , (29)while for odd n , the integrals in equation (25) vanish as B n ( x ; β ) is an odd function.For applications in image processing, Refregier (2003)identiﬁes total ﬂux, F , with the 2-d zeroth moment. In 3-d,a more natural association might be made with total mass, M , for an object with density ﬁeld, f ( x ) = ρ ( x ). The centroid position of a 3-d object is:ˆ x i ≡ M (cid:90) V x i f ( x )d x (30)for i = 1 , ,

3. The orthonormality condition enables us towrite the series expansion as (for clarity, we only show resultsfor ˆ x ):ˆ x = 1 M ∞ (cid:88) n ,n ,n f , n J n J n (cid:90) ∞−∞ x B n d x . (31)Using the recursion relation, equation (A3), and the factthat (cid:90) ∞−∞ d B n i d x i d x i = 0 , (32)gives the intermediate resultˆ x = √ βM ∞ (cid:88) n ,n ,n f , n J n J n (cid:90) ∞−∞ √ n + 1 B n +1 d x . (33)With the notation introduced above, we haveˆ x = π / β / M (cid:88) n even (cid:88) n ,n f , n √ n + 1 U n ,n ,n W n +1 ,n ,n (34)and similar results for ˆ x and ˆ x . c (cid:13) , 1–19 hree-dimensional shapelets The root-mean-square (RMS) radius of a 3-d object is: r ≡ M (cid:90) V x f ( x )d x, (35)where x = | x | = (cid:112) x + x + x , gives an estimate of thephysical extent of the object under investigation. Substitut-ing equation (14) into the above, and using equation (6): r = 1 M (cid:88) n ,n ,n f , n (cid:20) J n J n (cid:90) ∞−∞ x B n d x (36)+ J n J n (cid:90) ∞−∞ x B n d x + J n J n (cid:90) ∞−∞ x B n d x (cid:21) From equation (A4) and noting that (cid:90) ∞−∞ d B n i d x i d x i = 0 , (37)we have r = 2 π / β / M (cid:88) n ,n ,n f , n ( n + n + n + 3 / × U n ,n ,n W n ,n ,n . (38) The quadrupole moment tensor is: Q ij ≡ (cid:90) V f ( x ) (cid:0) x i x j − x δ ij (cid:1) d x, (39)which is symmetric and traceless, so that there are onlyﬁve independent elements. Performing the same calculationsas in the previous section, the diagonal components of thequadrupole moment tensor are: Q = 2 π / β / (cid:88) n ,n ,n f , n (2 n − n − n ) U n ,n ,n W n ,n ,n . (40) Q and Q have a similar form. The oﬀ-diagonal compo-nents are Q = 3 π / β / (cid:88) n ,n even (cid:88) n f , n (cid:112) ( n + 1)( n + 1) (41) × U n ,n ,n W n +1 ,n +1 ,n and similarly for the other Q ij with i (cid:54) = j . For the special case where f ( x ) = ρ ( x ) represents a mass-density ﬁeld, we can calculate the moments of interia. Themoment of interia tensor describes all moments of interiaof an object about diﬀerent axes of rotation, usually calcu-lated with respect to the centre of mass of the object. Incomponent form: I ij ≡ (cid:90) V f ( x )( x δ ij − x i x j )d x. (42)In coeﬀecient space, the diagonal elements of the interia ten-sor are: I = 2 π / β / (cid:88) n ,n ,n f , n ( n + n +1) U n ,n ,n W n ,n ,n (43) and similarly for I and I . The oﬀ-diagonal elements are I = − π / β / (cid:88) n ,n even (cid:88) n f , n (cid:112) ( n + 1)( n + 1) (44) × U n ,n ,n W n +1 ,n +1 ,n and similarly for the remaining elements. Refregier (2003) demonstrates how shapelet coeﬀecients aremodiﬁed under a general coordinate transformation in termsof a set of operators generating rotation, convergence, shearand translation. As we have not used the operator formula-tion explicitly elsewhere in the present work, we choose notto introduce this approach now. Instead, we treat simple co-ordinate transformations in terms of a modiﬁcation of theintegral in equation (13).Consider an arbitrary (small) coordinate transforma-tion: x → x (cid:48) = (1 + Ψ ) x + (cid:15) (45)where (cid:15) is a translation, and Ψ is a 3 × f T , n , we must solve the integral: f T , n (cid:39) (cid:90) V f ( x − Ψ x − (cid:15) ) B , n ( x ; β )d x (46)for each n , which is ﬁrst order in Ψ . We introduce trans-formed coordinates, and a new set of shapelet basis func-tions, B , n ( x ; β ) → B , n ( x (cid:48) − Ψ x (cid:48) − (cid:15) ) . (47)In general, the relevant integral expressions for transformedcoordinates must be calculated numerically. We can gaininsight into the eﬀect of simple transformations by consid-ering the eﬀect of translations and dilations on the shapeletground state, B , n =(0 , , ( x ; β ). The eﬀect of a (small) translation, (cid:15) = ( (cid:15) , (cid:15) , (cid:15) ), on theshapelet coeﬃcients is: f T , n (cid:39) (cid:90) V (cid:48) f ( x (cid:48) ) B , n ( x − (cid:15) )d x (cid:48) . (48)As an example, we solve this for the n -tuples: (0 , , , ,

0) and (2 , , f T , n =(0 , , = e − (cid:15) / β e − (cid:15) / β e − (cid:15) / β (49) f T , n =(1 , , = − (cid:15) β √ e − (cid:15) / β e − (cid:15) / β e − (cid:15) / β (50) f T , n =(2 , , = (cid:15) β √ e − (cid:15) / β e − (cid:15) / β e − (cid:15) / β . (51)(52)As expected, shapelet power is transformed from the groundstate to higher-order shapelet terms. In all cases, if any ofthe (cid:15) i = 0, then the orthornormality condition, equation (6),prevails. c (cid:13)000

Next, we consider a transformation that is a pure dilation: Ψ = κ =  κ κ

00 0 κ  (53)where all the | κ i | (cid:28)

1. Transformed shapelet coeﬀecientsare: f T , n (cid:39) (cid:82) V (cid:48) f ( x (cid:48) ) B , n ( x (cid:48) − Ψ x (cid:48) )d x (cid:48) (1 + κ )(1 + κ )(1 + κ ) (54)Using the ground state shapelet and the same n -tuples aspreviously, we ﬁnd: f T , n =(0 , , = 2 / [2 + κ (2 + κ )] − / (55) × [2 + κ (2 + κ )] − / [2 + κ (2 + κ )] − / f T , n =(1 , , = 0 (56) f T , n =(2 , , = 2 κ (2 + κ ) [2 + κ (2 + κ )] − / (57) × [2 + κ (2 + κ )] − / [2 + κ (2 + κ )] − / Since the ground state is a symmetric shape, under a dila-tion, the odd shapelet coeﬃcients vanish.

The same approach can be used for rotations about the co-ordinate axes, which are deﬁned in terms of the standard3 × R ( θ ) =  θ − sin θ θ cos θ  , (58)and similarly for rotations about the x -axis, R ( θ ), and x -axis, R ( θ ). A sequence of rotations can be combinedinto a single general rotation matrix, R x ( θ ). The coordi-nate transformations for rotations are tractable but morecomplex algebraically than for translations and dilations –equations (48) and (54). Rather than providing a generalanalytic form for the rotations, we instead demonstrate theresulting change in amplitude of shapelet coeﬀecients underan abitrary rotation in Section 5.2, in particular Figs. 6-8. d -dimensional expressions We can use the results from the previous sub-sections toobtain analytic expressions in d -dimensions. The zeroth mo-ment is: M = π d/ β d/ (cid:88) n ,n ,...,n d f d, n U d, n W d, n (59)where now U n ,n ,...,n d = 2 ( d − (cid:80) di =1 n i ) (60)and W n ,n ,...,n d = (cid:34) d (cid:89) i =1 (cid:18) n i n i / (cid:19)(cid:35) / . (61) The centroid is:ˆ x = π d/ β ( d +2) / M (cid:88) n even (cid:88) n ,...,n d f d, n (cid:112) ( n + 1) (62) × U n ,n ,...,n d W n +1 ,n ,...,n d , and similarly for ˆ x , . . . , ˆ x d . Finally, with x = | x | = (cid:112) x + x + . . . + x d , we have the d -dimensional RMS ra-dius: r = 2 π d/ β d/ M U n ,...,n d W n ,...,n d (63) × even (cid:88) n ,...,n d f d, n (cid:18) n + n + . . . + n d + d (cid:19) . We do not attempt to derive d -dimensional equivalentsof the quadrupole moment or moment of inertia tensors,as these are more natural quantities in three-dimensions.However, the generalised approach we have demonstratedcan be applied to other properties deﬁned as d -dimensionalintegrals of f d ( x ). Before we can use the analytic expressions of Section 3to study three-dimensional objects, we need to obtain theshapelet coeﬃcients. In this section, we discuss some of theissues in implementing an eﬀecient 3-d shapelet decomposi-tion code.

In applications to image simulation (Massey et al. 2004;Young et al. 2005) and gravitational lensing (Refregier & Ba-con 2003; Goldberg & Bacon 2005; Kuijken 2006), shapeletquantities are calculated for a pixel grid of image intensities,which is often obtained as a ‘postage stamp’ region selectedfrom a larger image. For the 3-d case, we use a regular cubicmesh of voxels (volume elements).The discrete sampling of the 3-d structure onto a meshmeans we need to integrate each shapelet term over the phys-ical size of a voxel, under the assumption that the data valuein the voxel is constant. This is valid for data that is alreadyon a grid (e.g. from a mesh-based simulation), and can beachieved for point-based data by smoothing to the grid withan appropriate smoothing scheme.For integration over a ﬁnite cubic volume, ˆ V , over spa-tial range x min to x max (and similarly for y and z ), equation(13) is replaced by a summation over N g voxels: f , n = N g ,N g ,N g (cid:88) i,j,k f ijk (cid:90) ˆ V ijk B n ( x )d x (64)where our grid-based 3-d shape has a constant value in eachvoxel, f ijk . The volume is assumed to be suﬃciently largethat the f ijk → I n ( i ) = (cid:90) ba B n ( x )d x. (65) c (cid:13) , 1–19 hree-dimensional shapelets where the index, 1 (cid:54) i (cid:54) N g , speciﬁes the one-dimensionalvoxel coordinate, and hence the integration limits on theboundaries of the i th voxel are: a = x min + ( i − x (66) b = a + ∆ x, (67)with cell width∆ x = x max − x min N g . (68)This allows us to write equation (64) as a sum over all voxels: f , n = N g ,N g ,N g (cid:88) i,j,k f ijk I n ( i ) I n ( j ) I n ( k ) (69)providing a set of shapelet coeﬀecients that are used to cal-culate the analytic quantities of Section 3.Equation (65) has recursion solutions I n ( i ) = − β (cid:114) n [ B n − ( x )] ba + (cid:114) n − n I n − ( i ) (70)with I ( i ) = (cid:114) βπ / (cid:20) erf (cid:18) xβ √ (cid:19)(cid:21) ba (71) I ( i ) = − β √ B ( x )] ba . (72) A key problem is the choice of parameters, ( β , n max , x c ),to perform an optimal shapelet decomposition. We use thenotation x c to refer to the best-ﬁtting object centroid, as op-posed to the shapelet reconstructed value, ˆ x . A good choiceof parameters will ensure compact representation of the orig-inal data in coeﬃcient space, while retaining high accuracy.Well chosen parameters will also exclude any noise that maybe present in the data. As Melchior et al. (2007) highlightedfor the 2-d case, shapelet decompositions may appear goodvisually, so it is important to deﬁne an appropriate goodnessof ﬁt, particularly as shapelet space can be highly degener-ate. The β parameter is the characteristic scale of the objectto be decomposed. Increasing β has the eﬀect of increasingthe amplitude of the shapelets and dilating them along allcoordinate axes. Changing the amplitude of the shapeletshas no eﬀect on the optimisation as the obtained coeﬃcientssimply scale in proportion to the change in amplitude, i.e. β is a one-dimensional spatial parameter.The maximum number of coeﬃcients needed relates tothe complexity of the data. A value of n max that is too lowwill likely result in loss of information regarding the small-est features; if n max is too high, noise and arbitrary high-frequency variations will be reproduced. Moreover, with in-creasing n max , the range of β and x c values that give vi-able solutions increases. This is because the additional co-eﬃcients can compensate for a poor choice of β and x c . Itis therefore important that a minimum optimal n max value There is an error in the factors of β in equation (32) of Massey& Refregier (2005), which is corrected in the arXiv version of theirpaper: arXiv:astro-ph/0408445. is used, while not resulting in signiﬁcant loss of structuralinformation, along with the optimal β and x c values.To determine appropriate n max and β values, we solvefor the two unknown quantities in equations (15) and (16): n max = θ , max θ , min −

32 (73)and β = (cid:112) θ , max θ , min . (74)Consider a voxel grid centred on the coordinate originwith major axis length, x max = − x min , which is taken tobe twice the maximum particle distance, θ , max , from thecoordinate origin. In this case, the cell width is:∆ x = 2 x max N g . (75)Choosing θ , min = ∆ x/

2, it follows that n max = ( N g − / β = x max / (cid:112) N g . (77)For speciﬁc applications, convergence studies may be a moreappropriate way to select initial estimates for n max and β ,and the size of the data ‘padding’ region.To minimise the number of evaluations, and avoid someof the issues of generating shapelet coeﬀecients with toomany orders, we impose the constraint [see Section 3.1 ofRefregier (2003)]:0 (cid:54) ( n + n + n ) (cid:54) n max . (78)This constraint means that the total number of shapeletterms to be evaluted for a given n max is: N eval = 16 ( n max + 1)( n max + 2)( n max + 3) . (79)This last equation is the d = 3 version of the more generalresult: N eval = (cid:18) n max + dd (cid:19) (80)to obtain the unique set of n values satisfying:0 (cid:54) d (cid:88) m n m (cid:54) n max . (81)Coeﬃcient-based measurements may produce inaccu-rate results in cases when the chosen n max , β and x c val-ues result in a reconstructed shape that is truncated by thebounding cube of the original data grid (in other words,when the reconstructed shape is bigger than the originaldata). Ensuring that the parameter bounds previously out-lined are not traversed, i.e. through the use of the paddingregion, will prevent this from occurring. Moreover, estimatesof the ˆ x and r may fail if M = 0, since they dependon the reciprocal of the zeroth moment. This may occur forvalues of β that are too large.Further discussion of strategies for optimal shapelet de-composition are beyond the scope of this paper – see Massey& Refregier (2005) for an approach based on the steepest de-scent method. We now investigate the decomposition processfrom an algorithmic viewpoint, and consider opportunities c (cid:13)000

The algorithm for obtaining a shapelet decomposition for avoxellated structure is:(i) Choose the target grid resolution, N g , and desired n max , which constrain the initial choice of β .(ii) Generate an array of shapelet amplitude estimates, f , n , with N eval entries (i.e. the minimum number that mustbe calculated), and initialise to zero-values.(iii) Calculate the 1-dimensional I n ( i ) terms for all ordersup to n max , resulting in n max N g stored values of I n ( i ).(iv) Loop over the elements of the n vector, subject tothe constraint of equation (78), then:(a) For each set of n values, loop over N g cells withindices ( i, j, k ) and calculate the quantity: f , n := f , n + I n ( i ) × I n ( j ) × I n ( k ) × f ijk . (82)(v) Output the shapelet amplitudes for further processingand analysis.The process for reconstructing a three-dimensionalshape from its shapelet coeﬀecients proceeds as follows:(i) Create an empty shape, ˆ f ( x ), with dimensions N g ,and zero all ˆ f ijk values.(ii) Loop over n vector, subject to constraint of equation(78), calculating:ˆ f ( x ) := ˆ f ( x ) + B , n ( x ; β ) . (83)An optional ﬁlter can be applied in the reconstructionby only adding the contributions from shapelet terms where f , n meets a prescribed criteria. Such an approach may beuseful for removing noise, or to investigate the dependenceof the analytic solutions on a particular shapelet order – seethe example application in Section 5.3.Obtaining a shapelet decomposition of a voxellatedstructure involves computing equation (69) for all N eval coef-ﬁcients. This computation is both very regular and abundantin inherent parallelism – two traits that suggest a strongsuitability for implementation on many-core computing ar-chitectures such as graphics processing units (GPUs).GPUs were originally developed to accelerate the ren-dering of three-dimensional graphics through the use of acustom processor with a highly parallel architecture. GPUsare now capable of supporting general (i.e. non-graphics)computations through the use of software platforms suchas the Compute Uniﬁed Device Architecture (CUDA) fromNVIDIA or implementations of the OpenCL standard.We can assess the suitability of the shapelet algorithmfor a GPU implementation by using an algorithm analysisapproach similar to that of Barsdell, Barnes & Fluke (2010),who noted that the most important considerations for analgorithm on a GPU are: massive parallelism, branching,arithmetic intensity and memory access patterns. To begin with, we assess the amount of parallelism in the shapelet de-composition problem. For simplicity, we assume that data isplaced inside a bounding box such that all of the dimensionsare the same – if there are fewer grid points along one axis,these must be zero-padded to the maximum grid scale.The computation of equation (69) involves a summa-tion over the three coordinate dimensions, i, j, k , for eachshapelet coeﬃcient deﬁned by n , n , n . The computationover n , n , n is therefore entirely (or embarassingly ) paral-lel, as each coeﬃcient can be computed independently. Thesummation over voxels also exhibits inherent parallelism,but requires some coordination between elements. For thisreason we will ﬁrst consider parallelising the shapelet algo-rithm only over the shapelet coeﬃcients, and will assumethe summations are performed sequentially.Parallelising the problem over the shapelet coeﬃcientsdeﬁned by n , n , n allows a maximum of N eval paral-lel threads to work on the problem simultaneously. For n max = 20, this is 1771 threads. While this is likely to exceedthe number of physical processor cores in any current hard-ware architecture, modern GPUs often require an order ofmagnitude more threads in ﬂight before their full potentialis reached. It is therefore likely that some of the summationwill need to be parallelised in addition to the evaluation ofthe shapelet coeﬃcients. One such approach would be tocompute sums over slices of the data volume in parallel, be-fore combining them in a second stage of computation. Thiswould increase the number of threads by a factor of N g ,which would almost certainly saturate the available hard-ware performance.The next concern is branching, which occurs when par-allel threads execute diﬀering instructions as a result of aconditional statement. Besides the application of the con-straint n + n + n < n max , the shapelet decompositionalgorithm does not require any branching operations. Wetherefore conclude that this factor will not signiﬁcantly in-ﬂuence performance on a GPU.Arithmetic intensity is the ratio of arithmetic opera-tions to memory-access operations. A high arithmetic in-tensity means that the GPU’s instruction hardware will befully utilised; a low intensity means that getting data frommemory to the processors will be a bottle-neck and perfor-mance will be limited. The total input data to the shapeletalgorithm scales as O ( N g ), while the computation scalesas O ( N eval N g ). This implies a very high theoretical peakarithmetic intensity of O ( N eval ) ≈ O (1000). This would beachieved by re-using input data f ijk for the computationof many n , n , n values. Assuming such behaviour couldbe eﬀected, the performance would be limited by the arith-metic throughput of the hardware, and we would expect tosee very good performance on a GPU.In practice, the re-use of data is achieved through theexploitation of a cache , which is an area of very fast memoryin which small amounts of data can be stored. On NVIDIAGPUs, the speciﬁc cache we refer to is known as shared mem-ory . By loading a block of f ijk data into shared memory,threads can re-use the data multiple times before having toload another block in. As all of the operations on the in-put data f ijk scale as O ( N g ), there is no diﬀerence betweencacheing a block of any particular shape; for simplicity wetherefore consider cacheing a simple one-dimensional blockof data in the i dimension. In this setup, the j and k indices c (cid:13) , 1–19 hree-dimensional shapelets can remain constant during the computation of the block,which allows the value I n ( j ) I n ( k ) to be pre-computed andstored locally before computation of the block begins. If, inaddition, the value of n is made to remain constant over thelocal group of threads, then the values f ijk I n ( i ) can be pre-computed and stored in shared memory. The computationby each thread of the block of i values then only involves themultiplication and accumulation of two pre-computed val-ues. Multiplication followed by addition also happens to bethe fastest operation available on current GPU hardware.The last concern is the memory access pattern exhibitedby the algorithm. Fortunately, the regularity of the compu-tation means that data are typically accessed in an alignedand contiguous fashion, and there should therefore be noissues in achieving a high memory throughput.To reduce the computational overhead in the evalua-tion of equation (69), the integrals I n can be pre-computedonce for each input shape and stored in look-up tables. Therecursion relation, equation (70), makes it practical to eval-uate and store all shapelet orders up to n max for N g gridcells along one dimension. This involves only O ( n max N g )terms, and could be computed on the CPU without impact-ing on the overall performance of the algorithm. A furtheradvantage of using the recursion relations is that suﬃcientnumerical precision can be maintained, even for high n val-ues. If we calculated each shapelet term independently fromequation (2), then the pre-factors, (2 n n !) − / tend to zerovery rapidly, and for n >

30, cannot be stored suﬃcientlyaccurately in single precision. This requirement is of rele-vance to GPU implementation, as the greatest processingspeed-ups oﬀered by the current generation of GPUs is forsingle precision, and reduces the overall memory required bystoring as 32-bit rather than 64-bit values.Given the strong degree of parallelism exhibited by thealgorithm, the ability to eﬃciently cache the input data andtake advantage of a very high arithmetic intensity, the abilityto pre-compute the shapelet integral terms, and the fact thatthe core of the algorithm can be reduced to simple multiply-add operations, we conclude that an implementation of theshapelet decomposition algorithm on a GPU would likelyachieve a level of performance very near the peak capabil-ity of the hardware. Shapelet decomposition thus stands tobeneﬁt signiﬁcantly from current trends in commodity com-puting hardware, and may have an additional advantageover related methods that are unable to take advantage ofmassively-parallel architectures.The extension of the above algorithm analysis to d -dimensional shapelet decompositions should be straightfor-ward, and we expect the conclusions to remain unchanged;however, implementation complexity is likely to increase,particularly in the general case.For the application domain we now explore, viz. 3-dCartesian shapelet representations of simulated dark mat-ter haloes, we have used a CPU-only implementation of thedecomposition algorithm. If the only use of the shapelet approach was to calculate theanalytic expressions of Section 3, then it would be a some-what ineﬀecient one, compared to direct numerical integra-

Figure 2.

To select an appropriate β scale for classiﬁcation of haloshapes, we calculate the average peak signal-to-noise ratio, (cid:104) P S (cid:105) ,over 200 input haloes (markers; black solid line). Dashed linesrepresent the one-standard deviation error range. On the basis ofthis analysis, we choose β scale = 0 .

8, which presents a reasonablecomprimise to a full optimisation process. tion of equations (24), (30), (35), (39), and (42). The beneﬁtof the shapelet decomposition is that we now have additionalinformation concerning the shape. Opportunities for clas-sifying three-dimensional structures based on the shapeletterms may be made through identiﬁcation of the dominantshapelet terms, or by investigating relative weights of par-ticular shapelet orders. In this section, we demonstrate howthree-dimensional shapelet analysis of dark matter halossuggests a new method for automatically classifying halotypes.

For some time, it has been known that Cold Dark Matter(CDM) cosmologies predict the formation of triaxial haloes(on average), with a slight preference for prolate haloes overoblate ones (Davis et al. 1985; Barnes & Efstathiou 1987;Frenk et al. 1988; Dubinski & Carlberg 1991; Dubinski 1994;Cole & Lacey 1996; Jing & Suto 2002; Kasun & Evrard 2005;Bailin & Steinmatz 2005; Oguri et al. 2005; Allgood et al.2006; Knebe & Wießner 2006; Kuhlen, Diemand & Madau2007). These studies include measuring the distribution ofhalo triaxalities, studying the eﬀects of baryons (which tendto reduce the triaxiality compared to dark matter only mod-els), and investigating the relationships between halo shapesand angular momentum.The purely triaxial treatment of dark matter haloesoverlooks another well-established result from CDM simula-tions: individual haloes do not have a smooth density proﬁle– they contain sub-structure (Lacey & Cole 1993; Moore etal. 1999; Ghigna et al. 2000). While the triaxial nature ofdark matter haloes can be expressed empirically (e.g. Jing& Suto 2002), quantifying the sub-structure remains a chal-lenge. A shapelet-space representation of dark matter haloesprovides a potential solution. c (cid:13)000

Figure 3.

The 12 most massive haloes from most massive (A; top left) to least massive (L; bottom right). Each panel comprises (left)input dark matter halo and (right) shapelet reconstructed halo, displayed as volume renderings of the logarithmic density. The coordinateranges in each panel are not equal, but have been selected for clarity based on ∆ x for each halo. Shapelet parameters were N g = 51, n max = 24, and β values are in Table 1. The strong similarity between the input and shapelet reconstructed versions is apparent. To demonstrate our approach, we use a sample of 200candidate dark matter haloes selected from a cosmological N -body simulation performed with GADGET-2 (Springel2005). The cosmological parameters were Ω = 0 .

27, Λ =0 . h = 0 .

71 and σ = 0 .

9, and candidate haloes were iden-tiﬁed using the SubFind groupﬁnder (Springel et al. 2001).Using particle number, N p , as a proxy for mass, wepay particular attention to the twelve most massive haloes,haloes A–L, and the twelve least massive, haloes M-X, fromthe sample. We consider these two-subsets as being represen-tative of typical halo shapes and presence of sub-structure,along with limiting any mass-dependent biases that may oc-cur. For each halo, the triaxality, T , is calculated using theapproach described in Appendix B, and tabulated in Table1. Further quantities presented in this table are describedbelow.Of the twelve ‘heavy’ haloes, two are oblate ( T (cid:54) / T (cid:62) /

3) and two are triaxial (1 / < T < / N g = 51, n max = 24 and β = β scale x max / (cid:112) N g . To select an appropriate β scale for classiﬁcation of halo shapes, we deﬁnea ﬁtness estimator in terms of the peak signal-to-noise ratio: P s = 20 log (cid:20) Max( f ijk ) √ M s (cid:21) , (84)where Max( f ijk ) is the maximum value in the volume, andthe mean-square error is: M s = 1 N N g (cid:88) i,j,k =1 | f ijk − ˆ f ijk | . (85)We calculate the average peak signal-to-noise ratio, (cid:104) P S (cid:105) ,over 200 input haloes (Figure 2 – markers; black solidline); dashed lines represent the one-standard deviation errorrange. On the basis of this analysis, we choose β scale = 0 . ρ ijk . To deal with the large dynamicrange in ρ ijk , the input shape is actually: f ijk = log (1 + ρ ijk ) . (86) c (cid:13) , 1–19 hree-dimensional shapelets Figure 4.

The 12 least massive haloes from most massive (M; top left) to least massive (X; bottom right). Each panel comprises (left)input dark matter halo and (right) shapelet reconstructed halo, displayed as volume renderings of the logarithmic density. The coordinateranges in each panel are not equal, but have been selected for clarity based on ∆ x for each halo. Shapelet parameters were N g = 51, n max = 24, and β values are in Table 1. The strong similarity between the input and shapelet reconstructed versions is apparent. Since each halo has a diﬀerent mass and hence physi-cal extent, the β value for each halo is diﬀerent – see Ta-ble 1. The other columns in this table are: the cell-width,∆ x , as deﬁned in equation (75); the maximum voxel valuefrom the input shape, I max , and the minimum and max-imum shapelet-recovered values, S min and S max , respec-tively. To enable quantitative comparisons between the in-put and reconstructed shapes we compute the quantititesΣ I = (cid:80) i,j,k f ijk and Σ S = (cid:80) i,j,k ˆ f ijk , and P s . Numericaltesting, where reconstructions were optimised by hand, sug-gested that Σ I ∼ Σ S and P s (cid:62)

45 (Figure 2) represented agood shapelet ﬁt for the grid resolution used.The m -th most dominant shapelet component of the re-construction has n = D m , with amplitude f , D m = f m, max .Except where indicated, D = (0 , , D . The ﬁnal two columns of Table 1 represent theresult of ‘by-eye’ classiﬁcations of the spatial characteristicsof each halo, C ( I ), and the shapelet proﬁles, C ( S ), into thethree halo classes – see Section 5.2 below.Figs. 3 and 4 show the results of the shapelet decom-position. For each halo, the left-hand panel shows the inputshape, and the right-hand panel is reconstructed in shapeletspace. Each image pair presents two-dimensional projectionsof fully three-dimensional, volume rendered structures. Vi-sual comparsion of pairs of images suggests that, qualita-tively, Cartesian shapelets represent an appropriate basis set for decomposition of dark matter haloes. Quantitatively,we ﬁnd that: | ( S max − S min ) /I max − | (cid:54) , (87) | Σ S / Σ I − | (cid:54) , (88)and P S (cid:62)

45, so that even without a halo-speciﬁc optimisi-ation, there is excellent agreement between the input haloand its Cartesian shapelet reconstruction.

The 3-d shapelet approach provides a means to check theoutcome of halo ﬁnding algorithms by identifying classesin shapelet space without needing to visually inspect en-tire halo-candidate catalogues. For 21 of the 24 haloes inTable 1, the dominant component is D = (0 , ,

0) – inmost cases, the zeroth-order shape has a high amplitude,which is not unexpected for haloes centred on the coor-dinate origin. Three haloes (I, S and V), however, receivetheir maximal contribution from a higher-order shapelet, D = ( n , , , n (cid:62)

1. We can use information on the rela-tive contributions of shapelet orders higher than the zerothorder term to enable a shapelet-based classiﬁcation of darkmatter halo shapes.Fig. 5 shows three characteristic patterns in shapelet c (cid:13)000

Table 1.

Summary of halo properties and shapelet decomposition parameters for the sample of 24 dark matter haloes, classiﬁed intoheavy (A-L) and light (M-X) samples. Table columns are: halo identiﬁer; number of particles in halo, N p , a proxy for halo mass; halotriaxiality, T , determined from particle positions; scale parameter, β , used for decomposition; the cell-width, ∆ x , as deﬁned in equation(75); the maximum voxel values from the input shape, I max , and the shapelet-recovered minimum and maximum voxel values, S min and S max ; and Σ I = (cid:80) i,j,k f ijk and Σ S = (cid:80) i,j,k ˆ f ijk are used to characterise the recovered shapes, along with the peak signal-to-noise, P S . The m -th most dominant shapelet component has n = D m , with amplitude f , D m = f m, max , and m = 1 , , . . . ; except whereindicated, D = (0 , , D . Halo classes are assigned ‘by eye’ through inspection of the real-space, C I ,and shapelet-space, C S , representations – see Section 5.2 for details.Halo N p T β ∆ x I max Σ I S min S max Σ S P S f , max D D C I C S A 1425030 0.239 0.85 0.42 4.64 2734.9 -0.22 4.51 2739.8 47.27 15.307 (2,0,0) 2 2B 62492 0.753 0.22 0.11 3.18 2089.5 -0.07 3.11 2093.6 51.29 1.279 (0,2,0) 2 2C 47535 0.707 0.18 0.09 3.17 2455.0 -0.10 3.06 2457.2 50.27 0.900 (2,0,0) 2 2D 45760 0.406 0.17 0.09 3.02 2245.5 -0.07 2.94 2245.1 51.38 0.826 (2,0,0) 2 2E 43091 0.973 0.24 0.12 2.88 1465.3 -0.08 2.81 1465.8 49.73 0.848 (2,0,0) 3 3F 39700 0.871 0.20 0.10 3.01 1715.0 -0.07 2.91 1715.3 51.06 0.905 (2,0,0) 2 2-3G 37735 0.816 0.17 0.08 2.96 2132.9 -0.08 2.89 2133.5 51.01 0.753 (2,0,0) 2 2-3H 35417 0.694 0.21 0.11 2.94 1406.5 -0.08 2.86 1406.5 51.00 0.891 (2,0,0) 2 2I 28290 0.963 0.15 0.07 2.74 1934.3 -0.06 2.62 1933.4 49.38 0.372 (2,0,0) (0,0,0) 3 3J 27189 0.659 0.18 0.09 2.64 1567.1 -0.06 2.55 1568.0 51.98 0.744 (2,0,0) 2 2-3K 21336 0.766 0.12 0.06 2.53 1863.9 -0.07 2.45 1858.4 47.46 0.313 (2,0,0) 3 3-2L 20476 0.144 0.16 0.08 2.63 1184.9 -0.06 2.53 1185.3 54.14 0.569 (2,0,0) 2 2M 864 0.868 0.04 0.02 1.39 230.83 -0.02 1.30 229.85 52.35 0.018 (1,0,0) 2 2-1N 855 0.934 0.05 0.02 1.24 252.54 -0.02 1.15 252.41 51.01 0.016 (2,0,0) 2 2-1O 845 0.716 0.04 0.02 1.15 261.30 -0.02 1.07 261.67 50.01 0.011 (2,0,0) 3 3-2P 834 0.760 0.04 0.02 0.77 271.24 -0.02 0.75 270.65 45.70 0.014 (2,0,0) 2 1Q 829 0.818 0.05 0.02 1.21 237.36 -0.03 1.14 236.67 50.82 0.017 (0,1,0) 2 2R 824 0.859 0.05 0.02 1.09 241.40 -0.02 1.02 240.89 50.11 0.020 (2,0,0) 3 2S 821 0.984 0.06 0.03 1.14 212.54 -0.02 1.09 212.77 52.70 0.024 (2,0,0) (4,0,0) 3 3T 816 0.713 0.04 0.02 1.37 227.73 -0.02 1.27 228.45 53.16 0.018 (2,0,2) 1 1U 797 0.422 0.04 0.02 1.14 229.69 -0.02 1.07 229.43 51.36 0.018 (2,0,0) 2 2-1V 794 0.847 0.05 0.02 1.09 226.23 -0.03 1.02 227.14 49.75 0.013 (4,0,0) (2,0,0) 3 3W 788 0.876 0.04 0.02 1.32 228.40 -0.02 1.25 227.70 52.35 0.017 (1,0,0) 1 1X 778 0.512 0.04 0.02 1.26 214.21 -0.02 1.18 213.40 51.27 0.018 (1,0,0) 1 1 space, consistent with the general appearance of the haloesin Figs. 3 and 4. For each halo, all amplitudes are plottedin index order, with n value varying most rapidly, then n ,and ﬁnally n . The light grey vertical lines indicate valuesof ( n , ,

0) where n = 0 , , . . . , n max ; for n max = 24, thereare N eval = 2925 shapelet coeﬃcients. Shapelet amplitudes,represented by vertical black line segments, are plotted as w n = f , n /f , max , with the six most-dominant shapelet or-ders numbered and coloured red. We propose the followingthree classes: • Class 1 : Halo T (top panel) has a central core, but nosigniﬁcant sub-structure. In shapelet space, it is dominatedby the zeroth-order shapelet, with low amplitudes for higherorders. • Class 2 : Halo D (middle panel) has a central core, andobvious sub-structure. Here, the zeroth-order shapelet againdominates, but there are several higher order shapelets withamplitudes (cid:46) f , max . • Class 3 : Halo I (bottom panel) has signiﬁcant sub-structure and no central core. The zeroth shapelet is nolonger always the dominant term, and there are severalshapelet orders with amplitudes ∼ f , max .The initial alignment of each halo with the x -axis is ap-parent, with obvious contributions from shapelet orders n = ( n , , f , max occuring for the zeroth-order shapelet, retainthis behaviour, while the power in higher shapelet orders isdistributed away from the ( n , ,

0) values. This is not unex-pected from the behaviour of 2-d shapelets under rotations(see Refregier 2003). Rotation of Halo 102 (Fig. 8), withtwo clear components, results in variation in the highest-amplitude shapelet coeﬀecients, suggesting the following fea-tures for identiﬁcation of haloes of this type: either f , max oc-curs for a shaplet order other than the zeroth-order, or thereare one or more shapelet orders with amplitudes (cid:38) . f , max .We use this heuristic to now attempt a purely (by-eye)shapelet-based selection of haloes with clear multiple sub-structures (Class 3). We apply the shapelet decompositionwith the same input parameters as used throughout this ini-tial implementation, to a total of 176 haloes. Particle countsfor this new set of haloes are in the range 865 (cid:54) N p (cid:54) c (cid:13) , 1–19 hree-dimensional shapelets Figure 5.

Three characteristic shapelet-space representations of dark matter haloes. Shapelet coeﬃcient amplitudes are plotted inindex order, with n value varying most rapidly, then n , and ﬁnally n – the light grey vertical lines indicate values of ( n , , n = 0 , , . . . , n max . Shapelet amplitudes, represented by vertical black line segments, are w , n = f , n /f , max , with the six mostdominant shapelet orders numbered and coloured red. (Top) Halo T, Class 1 – central core, no signiﬁcant sub-structure – dominated byzeroth-order shapelet, low-amplitude for higher orders. (Middle) Halo D, Class 2 – central core, signiﬁcant sub-structure – f max occursat n = (0 , ,

0) and several higher order shapelets have amplitudes ∼ f max . Bottom) Halo I, Class 3 – signiﬁcant sub-structure, nocentral core – zeroth-order shapelet is not the dominant term (although in other Class 3 haloes, it can still be dominant), several orderswith amplitudes ∼ f max . expected spatial characteristics. Visual inspection of the re-maining 132 haloes suggests a futher 10 haloes that shouldhave been identiﬁed from their shapelet representations. Inall cases, reinvestigation of the shapelet distribution revealedthat they were very close to meeting the criteria for a Class3-halo. While a more robust approach to classiﬁcation is re-quired for a full implementation (e.g. using an appropriately-sized training set and the construction of a decision tree orneural network classiﬁer), our results do suggest that thereis beneﬁt to performing classiﬁcation of dark matter haloshapes in shapelet space.The existence of multiple cores in the dark matterhaloes has implications for computation of halo triaxiality –the classiﬁcation of ‘heavy’ haloes E, F, I, K as prolate based purely on the principle moments of inertia is somewhat mis-leading; in each case, an argument could be made that anisolated group has not been identiﬁed using, in this case,the Subﬁnd algorithm. Our application to 176 haloes identi-ﬁes 44(+10) prolate haloes where the inferred triaxality wasbased on counting potentially distinguishable sub-haloes asa single halo. A shapelet-based automated classiﬁer providesa method of identifying such haloes without needing to vi-sually inspect each halo. As an example of quantitative analysis in shapelet space,we calculate the moment of inertia tensors from equation c (cid:13)000

Figure 6.

Halo 100, N p = 2068, T = 0 .

80 (prolate), Class 2. The left-hand column shows four real-space conﬁgurations of the halo,with arbitrary rotations about the centre-of-mass. The right-hand column shows the corresponding shapelet-space conﬁguration; w n = f , n /f , max , and the horizonal axis represents the sequential coeﬃcients, n – see Section 5.2. General properties of the distribution ofshapelet amplitudes are preserved regardless of orientation. (43)–(45) and hence triaxiality, T . In Fig. 10, we plot theshapelet-based triaxiality, T S , against the value determinedfrom the original particle positions, T , for the sample of 176intermediate mass haloes. A least-squares ﬁt to the data(solid line) gives T S = 0 . T + 0 .

02 with the Pearson co-eﬀecient, r = 0 .

98. In Fig. 11, we plot the ratio of T S /T against the particle number, N p , which suggests that thereis a slightly larger scatter for the lower mass haloes. We ﬁndthat (cid:104) T s /T (cid:105) = 0 . ± .

12, where the error is the samplestandard deviation. Even though we have not performed aper-halo optimisation for ( β , n max , x c ), the shapelet-basedanalytic result does indeed provide a very good estimatorfor the halo triaxiality. We have extended the two-dimensional Cartesian shapeletformalism of Refregier (2003) to three dimensions, deriv-ing analytic expressions for the zeroth moment, object cen-troid, root-mean-square radius, and the components of thequadrupole moment and moment of inertia tensors. We alsopresented generalisations to d -dimensions.Further work is necessary to develop a robust and sys-tematic optimisation strategy for the decomposition pa-rameters, and the development of specﬁc applications forthe three-dimensional shapelet technique requires such astrategy. There are also opportunities to develop the for-malism further, speciﬁcally extending it to include spher-ical shapelet functions [c.f. the alternative presentation of c (cid:13) , 1–19 hree-dimensional shapelets Figure 7.

Halo 101, N p = 2047, T = 0 .

92 (prolate), Class 1. The left-hand column shows four real-space conﬁgurations of the halo,with arbitrary rotations about the centre-of-mass. The right-hand column shows the corresponding shapelet-space conﬁguration; w n = f , n /f , max , and the horizonal axis represents the sequential coeﬃcients, n – see Section 5.2. General properties of the distribution ofshapelet amplitudes are preserved regardless of orientation. two-dimensional Cartesian shapelets as polar shapelets byMassey & Refregier (2005)].The shapelet decomposition algorithm exhibits at-tributes that make it an ideal target for implementation onmodern, massively-parallel GPUs. Our algorithm analysisdemonstrates that the computation is entirely (or embarass-ingly) parallel; has minimal or no branching; maintains ahigh ratio of arithmetic operations to memory-access oper-ations; and has a memory access pattern that will result inaligned or contiguous access to memory, required for achiev-ing a high memory throughput. With our proposed schemeof precomputing shapelet voxel-integral terms, the computa-tion reduces to a parallel series of multiply-add operations,which are almost ideal for GPUs – we anticipate achievingclose to peak processing performance. Signiﬁcantly reduc-ing the computation time for the shapelet decomposition, compared to CPU, means that more processing time is thenavailable for optimisation.As an example application, we have demonstrated howthree-dimensional shapelets can be used to study the com-plex sub-structures of dark matter haloes from cosmological N -body simulations, including providing an alternative ap-proach to classifying the properties of haloes. Our prelimi-nary investigation suggests that halo triaxiality measuredpurely from the moment of inertia tensor may be incor-rect due to limitations of group ﬁnders that are not ableto separate out what may be truly distinct sub-clumps. Im-provements to our current ‘by eye’ approach to classiﬁcationcould include development of a decision tree or neural net-work classiﬁer, or the use of principle component analysis tosigniﬁcantly reduce the number of shapelet terms requiredfor classiﬁcation (Kelly & McKay 2004). c (cid:13)000

Figure 8.

Halo 102, N p = 2005, T = 0 .

91 (prolate), Class 3. The left-hand column shows four real-space conﬁgurations of the halo,with arbitrary rotations about the centre-of-mass; the right-hand column shows the corresponding shapelet-space conﬁguration; w n = f , n /f , max , and the horizonal axis represents the sequential coeﬃcients, n – see Section 5.2. General properties of the distribution ofshapelet amplitudes are preserved regardless of orientation. The shapelet formalism is virtually unexplored in thethree-dimensional domain, oﬀering opportunities for the fur-ther development of a methodology that can be used toquantify and analyse complex three-dimensional structures.Future applications of the three-dimensional shapelet te-chinique may include classiﬁcation and parameterisation ofsources identiﬁed in H i spectral line data cubes; studying theshapes of voids in cosmological simulations (by consideringan inverted density ﬁeld); and the possibility to generatemock dark matter haloes through an extensive study of thedistribution of shapelet amplitudes as a function of massand triaxiality. ACKNOWLEDGMENTS

This research was supported under the Australian ResearchCouncil’s Discovery Projects funding scheme (project num-ber DP0665574). PL is supported by the Alexander vonHumboldt Foundation. CJF is grateful to Michael Vannerand Toﬀa Beer for their contributions to this work. Wethank Chris Power for providing the dark matter halo sam-ple, David Bacon for early discussions on shapelets, and ourreferee for his insightful comments. Three-dimensional vi-sualisation was conducted with the S2PLOT progamminglibrary (Barnes et al. 2006). c (cid:13) , 1–19 hree-dimensional shapelets Figure 9.

From a visual investigation of 176 haloes in shapelet coeﬀecient space, 44 were selected as having obvious sub-structure andno central core. 36 of these haloes are shown here. Visual inspection of the remaining 130 haloes in real space suggests that an additional10 should have been identiﬁed as Class 3. Further inspection in shapelet space conﬁrmed the limitation of a ‘by-eye’ classiﬁer.

REFERENCES

Allgood B., Flores R.A., Primack J.R., Kravtsov A.V.,Wechsler R.H., Faltenbacher A., Bullock J.S., 2006, MN-RAS, 367, 1781Andrae, R., Jahkne, K., Melchior, P., 2011, MNRAS, 411,385Bailin J., Steinmatz M., 2005, ApJ, 627, 647Barrow J.D., Bhavsar S.P., Sonoda D.H. 1985, MNRAS,216, 17Barnes D.G., Fluke C.J., Bourke P.D., Parry O.T., 2006,PASA, 23, 82 Barnes, J., Efstathiou, G., 1987, ApJ, 319, 575Barsdell, B.R., Barnes D.G., Fluke C.J., 2010, MNRAS,408, 1936Berry R.H, Hobson M.P., Withington S., 2004, MNRAS,354, 199Bosch, J., 2010, AJ, 140, 870Chang T.-C., Refregier A., Helfand D.J., 2004, ApJ, 617,794Coﬀey M.W., 2006, JPhys A: Math Gen, 39, 877Cole S., Lacey C., 1996, MNRAS, 281, 716Davis, M., Efstathiou, G., Frenk, C.S., White, S.D.M., c (cid:13)000

Figure 10.

The shapelet-based triaxiality, T S , plotted againstthe value determined from the original particle positions, T , forthe sample of 176 intermediate mass haloes. A least-squares ﬁtto the data (solid line) gives T S = 0 . T + 0 .

02 with the Pearsoncoeﬀecient, r = 0 . Figure 11.

The ratio of the shapelet-based triaxiality to theparticle-based value, T S /T , plotted against the number of parti-cles, N p , in each of 176 intermediate mass haloes. We ﬁnd that (cid:104) T s /T (cid:105) = 0 . ± .

12, where the error is the sample standarddeviation.

APPENDIX A: HERMITE POLYNOMIALS

We collect here a number of key expressions relating to Her-mite polynomials, which prove useful in deriving the analyticproperties of Section 3.Expressing the Hermite polynomials via the Rodriguesformula H n ( x ) = ( − n e x d n dx n (cid:16) e x (cid:17) , (A1)one can show the important recursion relation H n +1 ( x ) = 2 xH n ( x ) − dH n ( x ) dx , (A2) c (cid:13) , 1–19 hree-dimensional shapelets which further implies the shaplet basis functions satisfy: (cid:112) n + 1) βB n +1 ( x ; β ) = (cid:18) x − β dd x (cid:19) B n ( x ; β ) (A3)and (cid:18) x − β d d x (cid:19) B n ( x ; β ) = (2 n + 1) β B n ( x ; β ) , (A4)which is the eigenvalue equation. Calculating the derivativeterms, we have the further recurrence relations: H n +1 ( x ) = 2 xH n ( x ) − nH n − ( x ) . (A5) APPENDIX B: TRIAXIALITY AND HALOROTATIONS

Triaxiality of a dark matter halo is most easily expressedin terms of the principle moments of inertia. The princi-ple moments, ( I , I , I ), and the associated principle axes,( e , e , e ), are the eigenvalues and eigenvectors of the mo-ment of inertia tensor, ˆ I , respectively. Approximating anarbitrary halo as a triaxial ellipsoid of the form x a + y b + z c = 1 (B1)with c (cid:54) b (cid:54) a , then I = M b + c ) (B2) I = M a + c ) (B3) I = M a + b ) (B4)and I (cid:54) I (cid:54) I . Moreover, we can calculate the triaxialityparameter (Franx, Illingworth & de Zeeuw 1991): T = a − b a − c = I − I I − I , (B5)enabling us to classify haloes as oblate ( T (cid:54) / / < T < / T (cid:62) / a = b = c ) to have T ≡ x, y, z ), from the cosmological simulation, we make use ofthis to simplify the computation of ˆ I . Speciﬁcally, we com-pute elements of ˆ I from particle positions, using standardeigensystem routines from the GNU Scientiﬁc Library tosolve for the principle eigenvectors and eigenvalues. To en-able comparisons between halos, we rotate each halo so thatits principle axes are aligned with the Cartesian axes. First,we deﬁne an orthogonal coordinate system with unit vectordirections e , e and n = e ⊗ e . (B6)Although e is orthogonal to e and e , explictly calculat-ing the cross-product ensures that we have a right-handedcoordinate system. The Euler angles are: θ = sin − ( − e ,z ) (B7) θ = tan − ( e ,z / n z ) (B8) θ = tan − ( e ,x / y ) . (B9) We use the standard C-function atan2 , which returns theprinciple value tan − ( y/x ), for calculating θ and θ , and isrecommended for converting between rectangular and polarcoordinates.Next, we build a general 3 × R =  C y C z + S x S y S z C x S z − S y C z + S x C y S z − C y S z + S x S y C z C x C z S y S z + S x C y C z C x S y − S x C x C y  , (B10)where C x = cos( θ x ), S x = sin( θ y ) and similarly for y and z ,from which we can determine the inverse matrix, R − , mosteasily via the transpose, R T , and the adjunct matrix of R T .Each particle position, p , in the halo is now rotated aroundthe origin to new coordinates: p (cid:48) = R − p . (B11) c (cid:13)000