Shape Analysis of HII Regions -- II. Synthetic Observations
Justyn Campbell-White, Ahmad A. Ali, Dirk Froebrich, Alfred Kume
MNRAS 000, 1–23 (2020) Preprint 12 June 2020 Compiled using MNRAS LaTeX style file v3.0
Justyn Campbell-White,⋆ Ahmad A. Ali, Dirk Froebrich, Alfred Kume
SUPA, School of Science and Engineering, University of Dundee, Nethergate, Dundee DD1 4HN, U.K.
Centre for Astrophysics and Planetary Science, The University of Kent, Canterbury, CT2 7NH, U.K.
Department of Physics and Astronomy, University of Exeter, Stocker Road, Exeter EX4 4QL, U.K.
School of Mathematics, Statistics and Actuarial Sciences, The University of Kent, Canterbury, CT2 7FS, U.K.
Accepted XXX. Received YYY; in original form ZZZ
ABSTRACT
The statistical shape analysis method developed for probing the link between physical parameters and morphologies of Galactic H II regions is applied here to a set of synthetic observations (SOs) of a numerically modelled H II region. The systematic extraction of H II region shape, presented in the first paper of this series, allows for a quantifiable confirmation of the accuracy of the numerical simulation, with respect to the real observational counterparts of the resulting SOs. A further aim of this investigation is to determine whether such SOs can be used for direct interpretation of the observational data, in a future supervised classification scheme based upon H II region shape. The numerical H II region data were the result of photoionisation and radiation pressure feedback of a 34 M⊙ star, in a 1000 M⊙ cloud. The SOs analysed herein comprised four evolutionary snapshots (0.1, 0.2, 0.4 and 0.6 Myr), and multiple viewing projection angles. The shape analysis results provided conclusive evidence of the efficacy of the numerical simulations. When comparing the shapes of the synthetic regions to their observational counterparts, the SOs were grouped in amongst the Galactic H II regions by the hierarchical clustering procedure. There was also an association between the evolutionary distribution of regions and the respective groups. This suggested that the shape analysis method could be further developed for morphological classification of H II regions by using a synthetic data training set, with differing initial conditions of well-defined parameters.

Key words: H II regions – methods: statistical, data analysis – radio continuum: ISM – hydrodynamics – radiative transfer

Since the advance in high performance computing in the latter part of the 20th century, astrophysicists have utilised these tools to perform numerical simulations of all aspects of the Universe. These range from modelling cloud collapse, star formation (SF) and feedback (e.g. Robitaille 2011; Bethell et al. 2007; Steinacker et al. 2005), to galaxy formation and evolution (e.g. Williams et al. 2019; Baes et al. 2011), to entire cosmological models that reflect the largest scale structure astronomers have ever observed (e.g. Springel 2005). Radiation plays an important role in astrophysics. The transport of radiation through the interstellar medium (ISM) is therefore one of the most fundamental processes to be considered when modelling stellar objects and galactic structures. Analysing the radiation from an object not only tells us about the nature of the radiation source, but also the medium through which it has travelled to reach us. Interstellar dust therefore also plays an important role in the study of radiation, since it scatters and re-radiates UV through to IR photons (Weingartner & Draine 2001).
E-mail: [email protected]
Quantifiable results can be obtained directly from numerical simulations and compared with observational results, such as the stellar initial mass function (Padoan et al. 1997). However, there exist many important reasons why the production of synthetic observations (SOs) from the numerical models is necessary (see the extensive review by Haworth et al. 2018). In the last decade, a number of radiative transfer (RT) models have been used to generate SOs of the numerical simulations they relate to (Steinacker et al. 2013). The RT codes work by sampling the simulation at every grid point, for the given dimensionality of the simulations. Given some density and temperature, the emissivity can be computed, which is then integrated to obtain the flux. Flux images can be generated from any viewing angle the user specifies. Such SOs are referred to as ‘ideal synthetic observations’ (Koepferl & Robitaille 2017), which must then be further processed in order to account for observational effects when detecting and processing astronomical radiation. Such resultant ‘realistic’ SOs are then directly comparable to their real observational counterparts. This allows us to test observational diagnostics that are well defined in the simulations, and hence produce bespoke models for direct interpretation of the observational data. In this work, we use the statistical shape analysis method of H II regions, developed in Campbell-White et al. (2018), to directly compare realistic SOs of an H II region produced by the numerical simulations in Ali et al. (2018) to radio continuum observational data.

H II regions are the result of photoionisation from massive stars (> 8 M⊙). Due to their significant role in providing feedback to the giant molecular clouds (GMCs) in which these stars are born, they have been extensively modelled in order to probe the varying physical processes and mechanisms associated with such feedback, such as their role in altering gas dynamics and star formation. H II regions have been shown to be fundamental in calculations of the observed Galactic star formation efficiency (SFE) (Krumholz 2015). Photoionisation feedback from H II regions can have both negative and positive effects in terms of the local SFE of GMCs. Dale et al. (2007a,b) showed via numerical models of GMCs irradiated by ionising stars, both internally and externally, that some stars formed earlier, compared to the control runs without photoionisation feedback. Furthermore, evidence for triggered star formation was noted, such that the overall SFE of the cloud was increased. Conversely, simulations by Walch et al. (2013) found that although triggering was effective on small timescales, larger timescales resulted in a reduced SFE due to the dispersal of the gas, a further feedback mechanism of H II region evolution. More recently, the ionising radiation models of Geen et al. (2017) displayed a low SFE that was consistent with the Galactic observations (of the order of a few percent, Lada & Lada 2003).

In Campbell-White et al. (2018, hereafter, Paper I) we successfully applied our shape analysis methodology to a selection of 1.4 GHz radio continuum images of H II regions from the MAGPIS survey (Helfand et al. 2006). The mathematical description of the shape of each H II region was systematically extracted from the contoured radio continuum images. By determining the local curvature values along each H II region boundary, curvature distributions were obtained and compared pairwise using the Anderson-Darling non-parametric test statistic.
The resulting test statistic distance matrix was then the subject of hierarchical clustering, allowing for the identification of groups of H II regions that share a common morphology. From investigation of potential associations between assigned group and physical parameters, the results showed evidence for H II regions of a given shape to have similar dynamical ages. We also found indication of ionising cluster mass to be associated with the shape groupings. We suggested that the application of this shape analysis method to SOs of H II regions would not only give a direct quantifiable test of the efficacy of the SOs, but also allow us to further refine the methodology and reduce errors with a well defined sample set. SOs from differing initial conditions could then be used as a training set in a machine learning supervised classification scheme of H II regions, via this shape analysis method.

This work is organised as follows: The numerical simulation of the H II region, and the corresponding SOs that are analysed in this investigation, from Ali et al. (2018), are detailed in section 2. In section 3, we cover the shape extraction from the SOs and compare the shapes of these synthetic H II regions to those of the MAGPIS observational sample from Paper I. In section 4 we investigate how different parameters such as noise and projection angle influence the identified shape of the H II regions and their resultant groupings. We also discern whether such SOs can be used as a training set in a supervised morphological classification scheme of H II regions and discuss further potential applications of the methodology. Our summaries and conclusions are given in section 5.

Table 1. Table 2 from AHD18: Initial parameters of the massive star in the numerical simulation.
Parameter                Value
Mass                     33.7 M⊙
Luminosity               … × 10^… L⊙
Radius                   ….59 R⊙
Effective temperature    41 189 K
Ionizing flux (hν ≥ …)   … × 10^… s^−1

Figure 1. Figure 1 from AHD18: Positions of stars at the onset of feedback, with stellar mass in colour scale, overlaid on column density in greyscale (both are logarithmic). The most massive star is 33.7 M⊙ in red. The second highest is … M⊙. The third is … M⊙. The least massive is ….82 M⊙.

The synthetic observations of H II regions used in this work are from the numerical simulations of Ali et al. (2018, hereafter, AHD18), specifically the model including both photoionisation and radiation pressure. This simulation was performed using the Monte Carlo radiative transfer (MCRT) and hydrodynamics (HD) code TORUS (Harries et al. 2019). The comprehensive level of detail in the radiative transfer means the resulting H II region has accurate temperatures, ion fractions, and size.

For each hydrodynamical time step, an MCRT calculation was carried out in order to compute photoionisation balance (giving ion fractions and electron densities), thermal balance (giving gas and dust temperatures), and radiation pressure. Photon wavelengths lie in the range 100 to … Å and include the stellar radiation field as well as the diffuse field from gas/dust. The gas heating rates are from photoionisation heating of H and He, while the gas cooling rates involve recombination lines of H and He, collisionally excited metal forbidden lines, and free–free emission. Dust is heated by photon absorption and cooled by blackbody emission. Gas and dust temperatures are linked by a term accounting for collisional heating between the two. The included atomic species are H I–II, He I–III, C I–IV, N I–III, O I–III, Ne I–III, and S I–IV (the total abundance of each element remains constant throughout). The model also contains silicate dust grains with a median size of 0.12 µm; dust can attenuate ionising photons, reducing the size of H II regions compared to models without dust (Haworth et al. 2015).

Figure 2. Overview of the 20 cm, 1.4 GHz synthetic observations from the numerical simulation in Ali et al. (2018). Three snapshots of φ = p = 0, θ = t = 30, 60 and 90 deg projection angle are shown for the four evolutionary time-steps of 0.1, 0.2, 0.4 and 0.6 Myr. Coordinates are shown in pc scale.

The initial condition is a spherical cloud with a uniform density inner core extending to half the sphere radius, with the density in the outer half going as r^(−…). The density outside the sphere is 1% of the density at the edge of the sphere. The sphere has a total mass M = 1000 M⊙, radius R = 2.… pc, and mean surface density Σ = 0.01 g cm^−2. The 3D grid is Cartesian, uniform, and fixed with … cells. The physical size of the grid is 15.5 pc in each axis, yielding a resolution of 0.06 pc per cell. The cloud evolves without stars under a seeded turbulent velocity field and self-gravity for 75% of the mean free fall time of the initial sphere. This is the time at which Krumholz et al. (2011) found the SFE to be 10%. At this time, stars are added from a random sampling of the Salpeter (1955) IMF, such that the cumulative stellar mass is 10% of the cloud mass (100 M⊙), and at least one massive star is present. This results in a massive star of mass 33.7 M⊙, which is placed at the cloud’s most massive clump, with 28 other stars placed according to a probability density function proportional to ρ^(…). The distribution of stars at this stage is shown in Fig. 1, overlaid on column density. The radiation field is then switched on and the simulation evolves until all of the mass leaves the grid. The stars evolve using Schaller et al. (1992) tracks, with stellar spectra interpolated from atmospheric models by Lanz & Hubeny (2003) and Kurucz (1991). The initial mass, luminosity, radius, effective temperature and ionising photon rate of the massive star are listed in Tab. 1.

The ionising photon rate of the massive star is slightly larger than the mean of the normally distributed values of MAGPIS H II regions considered in Paper I (log N_ly of 48.87 and 48.40, respectively), but is well within the maximum observed log N_ly value of 49.77. We also stated in Paper I that our estimates of the ionising photon rate are lower limits, due to only considering the mass within the H II region shape boundary considered for analysis. In terms of ionising flux and mass, the numerically modelled H II region from AHD18 is therefore representative of an example H II region from the MAGPIS sample considered for comparison in this work.
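As a rough illustration of the stellar population step described above (random sampling of a Salpeter IMF until the stellar mass reaches 10% of the cloud mass, with at least one massive star present), the following sketch may help. It is not the AHD18 implementation: the mass limits of 0.8 and 120 M⊙, the choice to redraw the whole population when no star exceeds 8 M⊙, and the function names are all assumptions made for illustration.

```python
import random


def draw_salpeter_mass(rng, m_min, m_max, alpha=2.35):
    """Draw one mass from dN/dm proportional to m**(-alpha) (inverse-transform sampling)."""
    a = 1.0 - alpha
    u = rng.random()
    return (m_min**a + u * (m_max**a - m_min**a)) ** (1.0 / a)


def sample_cluster(target_mass, m_min=0.8, m_max=120.0, massive_limit=8.0,
                   seed=1, max_tries=10_000):
    """Add stars until the summed mass first reaches target_mass.

    The mass limits and the redraw-until-massive rule are assumptions,
    not values or procedures taken from AHD18.
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        masses = []
        while sum(masses) < target_mass:
            masses.append(draw_salpeter_mass(rng, m_min, m_max))
        # Accept only realisations containing at least one massive star.
        if max(masses) > massive_limit:
            return masses
    raise RuntimeError("no realisation contained a massive star")


# 10% of a 1000 Msun cloud, as in the text.
masses = sample_cluster(target_mass=100.0)
print(len(masses), round(sum(masses), 1), round(max(masses), 1))
```

Because the loop stops only once the target is reached, the summed mass slightly overshoots 100 M⊙; a production code would need to decide how to treat the final star.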
The turbulence in the simulation also results in an inhomogeneous density distribution and an off-centre massive star, meaning there could be differences in the shape of the ionised region when viewed from different projections, which will be investigated in detail in this work.

We note from Fig. 3 in AHD18 that from the onset of feedback (t = 0) to … Myr, there is a steady mass flow, with mass beginning to leave the simulation grid at ∼… Myr […] the overall mass flux begins to decrease. Spikes in the distribution correspond to removal of the clumps. The size of the spikes grows with time, as the densest clumps are the last to leave the grid. By ∼… ⟨t_ff⟩, all the mass has left the 15.5 pc grid. The peak value of ionised mass is 440 M⊙ at 0.5 Myr. The peak ionised mass fraction, which is just under 40% of the total mass, is reached at 0.6 Myr. At this time, the fraction of volume ionised is ∼… […]

[…] TORUS to produce free–free continuum images using the previously calculated temperature and density as inputs. The free–free emission coefficient in the radio regime is

$$ j_\nu = 5.\ldots \times 10^{-\ldots}\, T^{-1/2}\, n_e^{2}\, g_\nu \qquad (1) $$

(Rybicki & Lightman 1979) for gas temperature T and electron density n_e, where the Gaunt factor g_ν is approximately

$$ g_\nu = \frac{\sqrt{3}}{\pi} \left[ \ln\left( \frac{T^{3/2}}{\nu} \right) + 17.\ldots \right] \qquad (2) $$

(Osterbrock & Ferland 2006). We use ν = 1.4 GHz (λ = 20 cm).

Snapshots can be taken at given simulation ages and from any φ and θ spherical viewing angles. We chose to consider simulation ages of 0.1, 0.2, 0.4 and 0.6 Myr. For each of these respective ages, 18, 22, 19 and 18 projections were included, resulting in 77 synthetic observations of the numerically modelled H II region. The lower limit of the simulation ages was selected in order to produce an SO of a diffuse observed H II region, rather than a compact or ultra-compact H II region, which would be observed at earlier ages. We chose to include the 0.4 and 0.6 Myr ages since they take place at key stages in the mass flow of the simulation grid, hence representing more evolved, late stage H II regions.

Figure 2 shows 12 example SOs of the 20 cm radio continuum emission produced by the simulations. Three example projections are shown for each of the evolutionary stages (ages given in Myr). In this example, the φ angle is kept fixed (labelled p in the figure headings) and the θ angle is 30, 60 and 90 deg (labelled t in the figure headings). The axes for each image are in parsecs. In the following section we detail how the shapes were extracted in a manner complementary to that carried out for the MAGPIS observational data in Paper I.

The purpose of investigating the SOs in this work is to test the efficacy of the simulations by comparing them to the MAGPIS observations (from Paper I) and then see how different parameters may influence the shape. The extraction of the shape of the H II regions hence needed to be performed in the same manner as it was for the MAGPIS observations. We therefore needed to process the ‘ideal’ SOs into realistic SOs, with properties matching the observational MAGPIS sample. To achieve this, we converted the intensity units of the SOs from MJy/sr to those used in the MAGPIS observations, Jy/beam, whilst accounting for the different pixel scales.
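The unit conversion described above can be sketched in a few lines. This is a generic illustration of the standard relations (small-angle pixel scale, Gaussian beam solid angle Ω = π θ_maj θ_min / (4 ln 2)), not the authors' code. The 0.06 pc cell size is from the simulation; the 6.2 kpc distance is the one the paper adopts when placing SOs in MAGPIS tiles; the 6″ circular beam is an assumed placeholder, since the beam size is not quoted in the text.

```python
import math

ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi


def pixel_scale_arcsec(pixel_pc, distance_pc):
    """Angular size of one pixel (small-angle approximation)."""
    return pixel_pc / distance_pc * ARCSEC_PER_RAD


def beam_solid_angle_sr(fwhm_major_arcsec, fwhm_minor_arcsec):
    """Solid angle of an elliptical Gaussian beam, in steradians."""
    major = fwhm_major_arcsec / ARCSEC_PER_RAD
    minor = fwhm_minor_arcsec / ARCSEC_PER_RAD
    return math.pi * major * minor / (4.0 * math.log(2.0))


def mjy_per_sr_to_jy_per_beam(intensity_mjy_sr, beam_sr):
    """Convert surface brightness from MJy/sr to Jy/beam."""
    return intensity_mjy_sr * 1.0e6 * beam_sr


scale = pixel_scale_arcsec(0.06, 6200.0)   # 0.06 pc cells seen from 6.2 kpc
beam = beam_solid_angle_sr(6.0, 6.0)       # ASSUMED 6 arcsec circular beam
factor = mjy_per_sr_to_jy_per_beam(1.0, beam)
print(f"{scale:.2f} arcsec/pixel, {factor:.3e} Jy/beam per MJy/sr")
```

With these inputs the pixel scale comes out very close to 2″ per pixel, which matches the MAGPIS pixel scale quoted in the paper.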
The unit conversion allowed artificial noise to be added to the SOs, so that the contouring procedure that identifies the ‘edge’ of the H II regions could be performed exactly as it was in Paper I.

For the MAGPIS data, the shape of each H II region was extracted using an image contouring procedure. After applying sigma clipping to all of the pixel values, which removed the signal from the ionised emission, the mean and standard deviation were taken from the remaining clipped values. This provided the contour value to apply to the original images to systematically define the H II region boundaries. In order to carry out this procedure on the SOs, artificial noise was introduced to the SOs by taking a random value from a Gaussian distribution with mean, µ, and standard deviation, σ, from one of the MAGPIS tiles, after sigma clipping. The standard probability density function for the Gaussian distribution was used:

$$ P(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^{2}/(2\sigma^{2})} \qquad (3) $$

Figure 3 shows an example of one of the 0.2 Myr SOs. The left panel shows the original SO; the middle and right panels show the same SO with the random Gaussian noise added to each pixel. The middle panel has its Gaussian profile taken from the MAGPIS tile centred at l = 12.43°, b = -0.04°, with µ = 0.959 mJy/beam and σ = 0.389 mJy/beam. The right panel’s Gaussian noise profile follows the MAGPIS tile centred at l = 30.25°, b = 0.24°, with µ = 0.830 mJy/beam and σ = 0.345 mJy/beam. The contours for the two SOs with Gaussian noise are determined by taking the clipped mean plus 1σ, the same way in which each contour was calculated for each of the H II regions in Paper I. The level for the contour of the original SO in the left panel is the same as the middle panel, since the pixel values of the SO had already been converted to match those of the MAGPIS data. There is not much visual difference between the three contours. The original SO has a much smoother contour, as expected.
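The noise injection and contour-level estimate described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the paper: the iterative 3σ clipping rule, the toy Gaussian blob standing in for an SO, and all function names are assumptions. Only the noise parameters (µ = 0.959, σ = 0.389 mJy/beam) are taken from the text.

```python
import numpy as np


def sigma_clipped_stats(values, n_sigma=3.0, max_iter=10):
    """Mean and standard deviation after iterative sigma clipping."""
    data = np.asarray(values, dtype=float).ravel()
    for _ in range(max_iter):
        mean, std = data.mean(), data.std()
        keep = np.abs(data - mean) <= n_sigma * std
        if keep.all():
            break
        data = data[keep]
    return data.mean(), data.std()


def add_noise_and_contour_level(image, mu, sigma, rng):
    """Add Gaussian noise per pixel; return noisy image and clipped mean + 1 sigma."""
    noisy = image + rng.normal(mu, sigma, size=image.shape)
    clipped_mean, clipped_std = sigma_clipped_stats(noisy)
    return noisy, clipped_mean + clipped_std


# Toy stand-in for an SO: a bright Gaussian blob in mJy/beam (hypothetical data).
yy, xx = np.mgrid[0:128, 0:128]
toy_so = 20.0 * np.exp(-((xx - 64) ** 2 + (yy - 64) ** 2) / (2.0 * 5.0**2))

rng = np.random.default_rng(0)
noisy, level = add_noise_and_contour_level(toy_so, mu=0.959, sigma=0.389, rng=rng)
region_mask = noisy > level   # pixels inside the 'edge' contour
print(f"contour level = {level:.3f} mJy/beam")
```

Because the clipping removes most of the bright emission, the recovered level stays close to µ + σ of the injected noise, which is why the same contour level can be applied consistently across SOs that share a noise profile.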
In Fig. 3, small perturbations along the boundaries of each of the Gaussian profile SOs are noted; they are approximately the same size. Whilst some extrusions along the boundary appear in the SOs with the Gaussian profiles, the intrusion on the original SO on the upper right side is smoothed out by the Gaussian noise.

The conversion of MJy/sr to Jy/beam was suitable for the 0.1 and 0.2 Myr SOs. However, for the 0.4 and 0.6 Myr SOs, using the actual conversion factor resulted in the emission from the SOs being drowned out by the introduction of the Gaussian noise from the MAGPIS data distributions. This is explained by referring back to the bulk properties and the volume-average electron density of the simulations (Figs. 3 and 9 of AHD18, respectively), which show that after 0.4 Myr, mass starts to leave the grid of the simulation and the electron density decreases. Therefore, the integrated radio continuum intensity at 20 cm obtained from the radiative transfer code, which is used to produce the SO, is less than should be expected. The intensity of these older SOs in Jy/beam was hence artificially boosted by a factor of ∼… for the 0.4 Myr SOs and ∼… for the 0.6 Myr SOs. These values were determined such that a single contoured central region, representative of the original SO before the noise was introduced, was obtained by the automated contouring procedure. These were minimum values for the single central region to be obtained and for the shape to not substantially change with further increase. Since the contour level is determined from the image tile, as the intensity boosting increases, the distribution of pixel values changes; therefore the final contour achieved is still consistent with further intensity boosting. We tested this for the 0.6 Myr regions using a factor of ∼… and ∼…, with the resulting amount of variation along the boundaries being well within the spatial sampling resolution of the shape analysis (c.f. Sec. 3.2).

Figure 3. Comparison images of example SO projection at 0.2 Myr. Left: Original SO. Middle: Same SO with random Gaussian noise added to each pixel value. Right: As middle but with a different Gaussian distribution used. t and p are the θ and φ projection angles of the SO, respectively. The blue contours (for the Gaussian noise tiles) are the ‘edge’ identified from the clipped mean pixel values plus 1σ. The contour in the original SO tile uses the threshold level from the middle tile.

Figure 4. Overview of the addition of random Gaussian noise to the same SOs shown in Fig. 2. Contours are shown at a constant level of 1σ plus the clipped mean of the values from the Gaussian distribution introduced. Coordinates are shown in pc scale.

Figure 5. Boundaries of an example synthetic observation H II region for each of the four ages in the sample (those in the second row of Fig. 4): (a) 0.1 Myr SO shape, (b) 0.2 Myr SO shape, (c) 0.4 Myr SO shape, (d) 0.6 Myr SO shape. Points signify the approximately equally spaced interpolation spline knots, where the curvature was calculated.

Figure 4 shows a summary of the addition of a Gaussian noise profile to the same 12 example SOs shown in Fig. 2. The images for all 77 SOs with this Gaussian noise profile are shown in Appendix C. The noise profile used was that of the middle panel in Fig. 3, with the two younger SOs having the actual intensity conversion factor used and the older two having their emission boosted as described above. It is clear from this summary that there is a higher amount of perturbation along the H II region boundaries as the age of the SOs increases. It appears that the contoured shape changes more with each projection for the 0.4 and 0.6 Myr regions than for the 0.1 and 0.2 Myr regions (however, larger perturbations are noted for some of the different projections for the earlier ages in the full overview in Appendix C). The axes of the plots are given in spatial parsec scale, hence the effective radius of each H II region boundary also increases with the age of the SO. For the remainder of this study, only one Gaussian noise profile was used for all of the 77 SOs. This was because the H II region shape did not change by a substantial amount with the introduction of Gaussian noise profiles with different distributions from the MAGPIS data. This was numerically assessed with the shape comparison method utilised in this work and Paper I, resulting in lower pairwise test scores than for any other comparison thus far.
Having the same noise profile also meant that the contour level applied to each SO was consistent, since this is calculated from the noise itself, distinguishing the signal of the H II region from the background noise.

(Appendix C is available in the online version of this paper.)

The first test of the SOs was to directly compare them to the MAGPIS observational sample from Paper I. The further steps to extract and compare the shapes of the regions were carried out in the same manner as before. To summarise: interpolation splines were fitted to the region boundaries identified from the contouring procedure, with interpolation knot intervals of ∼… pc (see Fig. 5); then the local curvature values were calculated at each knot. The empirical distribution functions (EDFs) of curvature values were then statistically compared pairwise, using the two-sided Anderson-Darling (A-D) test statistic (Anderson & Darling 1952; Pettitt 1976). The A-D test returns a dissimilarity measure between the pair of samples, whereby the null hypothesis that the samples are drawn from the same parent population is rejected for large test result scores. After applying a Euclidean distance transformation to the A-D test scores, hierarchical clustering was performed on the distance matrix of H II region shape distances using Ward’s agglomerative method (Ward & Joe 1963; Murtagh & Legendre 2014). The resulting hierarchical structure was then investigated using the dendrogram graphical representation.

In this primary investigation of how the shapes of the SO H II regions compare to those of the MAGPIS H II regions, the 12 example SOs from Fig. 4 were considered, along with the 76 MAGPIS H II regions from Paper I. Figure 6 shows the resulting dendrogram from the hierarchical clustering. The branches of the SOs are highlighted by colours that correspond to the age of the SO: 0.1 Myr in red, 0.2 Myr in pink, 0.4 Myr in blue and 0.6 Myr in green.
The first clear result is that the shapes of the SO H II regions are grouped in amongst the MAGPIS H II regions, with none appearing as outliers. This result confirms that these numerical simulations are producing H II regions that are representative of what we observe in our Galaxy, since we ensured that the shape of the SO H II regions was extracted and quantified in exactly the same way as the MAGPIS sample.

The next point to note from Fig. 6 is that the three different projections considered for each SO age are not all grouped together in the dendrogram. The 0.1 Myr projections are close to each other on the dendrogram; however, one of them belongs to a different parent group than the other two. Similarly, for the 0.2 and 0.4 Myr projections, two each belong to the same parent group, with the third positioned a few groups away, respectively. The greatest spread in group allocation is seen for the 0.6 Myr projections. These results show that for each of the SO projections, there is a MAGPIS H II region that shares the most similar shape, such that the projections are having a significant influence on the identified shape of the region. This will be further investigated and discussed in section 4.

Since the dendrogram represents the results of the hierarchical procedure, which is a bottom-up approach to forming groups, excluding part of the sample does not change the resulting groupings. This was tested by excluding the 0.4 and 0.6 Myr regions in Fig. 6 and rerunning the clustering procedure; the resulting structure and groups match those presented here. The group structure that would be obtained from cutting the dendrogram at a height that intersects seven branches (i.e., seven groups) is 95% concurrent with the groupings identified in Paper I, with a notable exception being the outlier H II region labelled ‘12.432’ being joined to a different group here.
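The two quantities underlying the comparison, boundary curvature and a pairwise two-sample A-D score, can be sketched as below. This is an illustration under stated assumptions, not the paper's code: curvature is estimated here with periodic finite differences on the boundary points rather than from fitted interpolation splines, and the A-D score uses the rank form commonly attributed to Pettitt (1976), with no special handling of ties. The function names and the toy circle and ellipse are hypothetical.

```python
import numpy as np


def boundary_curvature(x, y):
    """Signed curvature along a closed boundary (periodic finite differences).

    The points must trace the boundary once, without repeating the first point.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = (np.roll(x, -1) - np.roll(x, 1)) / 2.0
    dy = (np.roll(y, -1) - np.roll(y, 1)) / 2.0
    ddx = np.roll(x, -1) - 2.0 * x + np.roll(x, 1)
    ddy = np.roll(y, -1) - 2.0 * y + np.roll(y, 1)
    return (dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5


def two_sample_ad(sample_x, sample_y):
    """Two-sample Anderson-Darling statistic (rank form, ties not treated)."""
    x = np.asarray(sample_x, dtype=float)
    y = np.asarray(sample_y, dtype=float)
    n, m = x.size, y.size
    total = n + m
    order = np.argsort(np.concatenate([x, y]), kind="stable")
    from_x = order < n                   # True where the pooled value came from x
    m_i = np.cumsum(from_x)[:-1]         # number of x values among the i smallest
    i = np.arange(1, total)
    return float(np.sum((m_i * total - n * i) ** 2 / (i * (total - i))) / (n * m))


theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
circle_k = boundary_curvature(5.0 * np.cos(theta), 5.0 * np.sin(theta))
ellipse_k = boundary_curvature(8.0 * np.cos(theta), 3.0 * np.sin(theta))
print(two_sample_ad(circle_k, ellipse_k))
```

A circle has constant curvature 1/R, so its curvature distribution is a spike, while the ellipse has a broad distribution; the A-D score between the two is correspondingly large compared with the score between two similar shapes.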
Figure 6. Dendrogram of the MAGPIS sample of H II regions from Paper I with the 12 example SOs with added Gaussian noise (those shown in Fig. 4). The dendrogram represents the results from applying hierarchical clustering of the shape data of each H II region. The branches of the H II regions are labelled by their Galactic longitude (for the MAGPIS sample) or an ID number (for the SO sample). The branches of the SOs are coloured by their age: 0.1 Myr in red, 0.2 Myr in pink, 0.4 Myr in blue and 0.6 Myr in green. The horizontal axis represents the height computed from the agglomerative clustering method.
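The clustering step that produces such a dendrogram can be sketched with SciPy. This is a hedged illustration: interpreting the "Euclidean distance transformation" as Euclidean distances between rows of the pairwise score matrix is an assumption, as are the function name and the toy 6×6 score matrix; only the use of Ward's method and the idea of cutting the tree into a fixed number of groups come from the text.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cluster_shapes(score_matrix, n_groups):
    """Ward clustering of regions from a symmetric matrix of pairwise scores."""
    scores = np.asarray(score_matrix, dtype=float)
    # ASSUMPTION: distance between two regions = Euclidean distance between
    # their rows of test scores.
    distances = pdist(scores, metric="euclidean")
    tree = linkage(distances, method="ward")
    # Cut the tree so that at most n_groups groups are formed.
    labels = fcluster(tree, t=n_groups, criterion="maxclust")
    return tree, labels


# Toy score matrix: two blocks of mutually similar shapes (hypothetical values).
toy_scores = np.full((6, 6), 10.0)
toy_scores[:3, :3] = 0.5
toy_scores[3:, 3:] = 0.5
np.fill_diagonal(toy_scores, 0.0)

tree, labels = cluster_shapes(toy_scores, n_groups=2)
print(labels)
```

The returned linkage array could be passed to `scipy.cluster.hierarchy.dendrogram` for plotting, and `n_groups=7` would correspond to the seven-group cut discussed in the text.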
Figure 7. Example MAGPIS image tile and H II region G030.252+… […]

Whilst the results of the previous subsection show that the numerical simulations are producing well representative SOs, the introduction of noise to the SOs in order to extract the shape can be developed further. The distribution of noise from radio interferometry images does follow a Gaussian distribution; however, it is not completely homogeneous across the image tiles. Artefacts from the reduction process and emission from fore- and background sources each contribute to the non-homogeneity of the noise distributions. We stated earlier in this section that small changes to the Gaussian mean and standard deviation of the noise profile had minimal influence on the shape of the SO H II region; however, we did propose in Paper I that observational noise may be a significant contribution to the observed and extracted shape we obtain using our methods.

In order to see how much of an effect the noise distributions from observations have on the shape of the SOs, we inserted the SO data directly into the MAGPIS tiles used in Paper I. The intensity converted SO pixel values were added to those in an area of the image tile deemed to contain only image noise and away from the signal of the H II region. An example of this is shown in Fig. 7, where one of the 0.2 Myr projections (upper-right) is inserted into the MAGPIS tile containing H II region G030.252+… […] H II region in the SO is at a distance of 6.2 kpc, then this corresponds to an angular pixel scale of 2″ per pixel, the same as the MAGPIS tiles. This meant that the smoothing factor applied to the contours was uniform for both the MAGPIS and the SO H II regions. As before, the contours shown are calculated from the clipped mean plus one standard deviation of the entire tile, which identifies both H II regions well.

Figure 8. SOs inserted into six different MAGPIS tiles to show how different noise profiles influence the extracted shape: (a) 0.1 Myr SOs, (b) 0.2 Myr SOs, (c) 0.4 Myr SOs, (d) 0.6 Myr SOs. For each age, the same projection is shown, along with the 1σ plus clipped mean contours. Axes are in pc, with a pixel scale of 0.06 pc/pixel. The corresponding angular sizes of the cutouts for increasing SO age are 2.8′, 3.9′, 6.7′ and 7.8′, respectively. Linear intensity scaling with 95% maximum is used for each tile, to highlight the differences in noise structure.

Table 2. Summary of the different MAGPIS Noise Profile (NP) properties that the SOs were inserted into.
NP   l [°]    b [°]    µ [mJy/beam]   σ [mJy/beam]
1    20.99    0.09     0.120          0.242
2    12.43    -0.04    0.945          0.367
3    43.76    0.06     0.105          0.263
4    20.22    0.11     0.329          0.222
5    30.25    0.24     0.829          0.343
6    41.93    0.04     0.144          0.265

Figure 8 shows an overview of an example SO projection for each of the four ages that has been inserted into six different MAGPIS tiles. The example SOs used for each age are those in the second row of Fig. 2 (the same ones used in Fig. 5 to exemplify the interpolation spline fitting to the boundary contours). The MAGPIS tiles that the SOs were inserted into will be referred to as the noise profile (NP). Details of which MAGPIS tile each NP represents are given in Tab. 2. The means and standard deviations were calculated from the sigma clipped MAGPIS tiles, with the contour level applied in each of the images then taken as the mean plus one standard deviation, as before. The example NPs are each from MAGPIS H II region tiles that were sorted into different groups by the shape analysis clustering in Paper I. We surmised there that the observational noise may be influencing the shape obtained, hence the reasoning for selecting tiles from different resulting groups. From the 76 H II region tiles in our preceding study, the range of mean intensity values was 0.1 – 0.945 mJy/beam. The range of standard deviations in the tiles was 0.2 – 0.4 mJy/beam, which concurs with the variation in the RMS noise of the overall MAGPIS survey of ∼… […] H II region can be seen in the bottom right corner. Also, as with the Gaussian noise examples shown previously, the 0.4 and 0.6 Myr SOs have had their emission boosted by the same amount as before (factor of ∼ ∼
70, respectively). As previously explained, this enabled the central part of the H II region to be contoured appropriately, accounting for the lower mass and electron density remaining in the simulation grid.
The change in the noise structure and intensity is apparent from Fig. 8, where the image tiles are each linearly scaled with a maximum intensity limited to 95%, to visually highlight the differences between the NPs. NPs 3 and 6 appear to have the most ‘salt and pepper’-like noise, similar to that seen in the random Gaussian distributions used previously. The examples shown for the 0.6 Myr SOs appear to differ the least between NPs, although this may be because the regions themselves are larger and it is harder to visually discern the differences along the boundaries. It is worth remembering here that whilst these contours represent the systematically defined region boundaries, the shapes that are compared in the subsequent analyses are quantified from the shape landmarks, which are given by the interpolation spline fitting (see Fig. 5). Since we are still interested in the direct comparison of these SOs to the MAGPIS H II regions, the spline sampling remained at ∼ . pc. Therefore, each of the boundaries shown here was under-sampled, with small perturbations along the contour smoothed out by the splines, with the level of smoothing proportional to the spatial size of the regions. Nevertheless, the differences in the contoured H II region shapes seen here, resulting from the different NPs, will enable us to investigate how such changes in NP affect the final comparison of the shape that we perform in the analysis. This will be discussed in detail in Sec. 4.
For the first three ages, NP 5 results in a spurious extrusion on the left side of the H II region.
This is an example of where there may be underlying signal in what was thought to be only background noise, and thus it is having a clear effect on the identified shape. We are only able to identify this since we have prior knowledge of the original SO data, along with the other NPs for comparison. If this were a real observation it would be systematically included as is, with a larger image tile used to ensure the contour is closed. For the rest of this analysis and the purposes of testing, however, we chose to exclude NP 5 and consider the remaining five profiles as our representative observational noise.
Figure 9 shows the resulting dendrogram from applying the clustering algorithm to the 76 MAGPIS H II regions, along with the 20 example SO H II regions (from Fig. 8), whose shapes were extracted from the five NPs detailed here. As with the previous dendrogram of the Gaussian noise profiles, the branches of the SO H II regions are coloured by their age: 0.1 Myr in red, 0.2 Myr in pink, 0.4 Myr in blue and 0.6 Myr in green. Figure 9 shows that all of the 0.4 Myr SO H II regions belong to the same parent branch, along with three of the five 0.6 Myr SO regions. The other two 0.6 Myr regions are paired together in a separate group. There are only a few examples where SOs of the same age are paired together in this manner. As with the Gaussian noise profiles, most of the SO H II regions are paired with one of the MAGPIS H II regions. There is also a 97% concurrence between the groups identified from this dendrogram and those identified from the MAGPIS sample in Paper I. This further confirms the efficacy of the simulations for producing representative SOs.
Having shown that the insertion of the SOs into different NPs from the MAGPIS data leads to H II region shapes that are mathematically similar to what we observe in the MAGPIS sample, we can now further investigate the properties of the SOs and how they may influence the obtained shapes.
Whilst we are only considering SOs from one set of initial conditions, i.e., star cluster mass, ambient density, etc., introduction of the MAGPIS NPs essentially expands our number of observations at each simulation time-step. For the 77 projections, across the four ages, with five NPs, we arrive at a sample of 385 individual H II regions. These variables are hence the parameters that we will discuss further in this section. In addition to showing that shape analysis can be used as a tool to confirm the reliability of synthetic data, a further aim of this investigation is to see whether the SOs could be used as a training set for supervised classification of H II regions via their shapes. We therefore need to understand how each of these parameters is affecting the obtained shapes and defining the groups.
A potential variable that we are deferring to future work is distance. In this investigation, all SO H II regions are assumed to be at the same distance from the observer, ensuring that the shape could be extracted in exactly the same manner as the MAGPIS H II regions from Paper I, with the corresponding pixel scales matching those used for the contouring parameters. We found in Paper I that using an angular sampling scale for the shape analysis resulted in biased results, with nearby regions that possess higher angular resolution data being grouped together by the procedure. Furthermore, we discussed there how variations in the determined distance (due to the many errors associated with kinematic distance estimations) are directly related to variations in the spatial sampling scale used for the shape extraction. We are planning a detailed investigation into how distance influences the shape analysis methodology and feasibility for future applications. Brief details of this and other planned future work are discussed at the end of this section.
Figure 9.
Dendrogram of the MAGPIS sample of H II regions from Paper I with the 20 example SOs that were inserted into the MAGPIS NPs (shown in Fig. 8, excluding NP 5). The dendrogram represents the results from applying hierarchical clustering to the shape data of each H II region. The branches of the 20 SOs are coloured by their age: 0.1 Myr in red, 0.2 Myr in pink, 0.4 Myr in blue and 0.6 Myr in green.
Figure 10.
Dendrogram resulting from applying hierarchical clustering to the shape data of the sample of 385 SO H II regions inserted into MAGPIS NPs. As with the previous dendrograms, the branches are coloured by their age: 0.1 Myr in red, 0.2 Myr in pink, 0.4 Myr in blue and 0.6 Myr in green. The three groups displayed in Fig. 11 are indicated by the respective group numbers at the first split into three. Six groups are delineated by the dashed red boxes and labelled 1 through 6. 20 further groups as seen in Fig. 12 are delineated by the green boxes with the first and last labelled 1 and 20, respectively.
Figure 11.
Bar chart showing the ages of the SOs that have been assigned to the first three groups in the dendrogram in Fig. 10, at a cut height of e.g. 300. Axis labels: Age [Myr]; Number of Regions.
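As an illustration of how groups such as those in Figs. 10 and 11 follow from cutting a dendrogram at a chosen height, the sketch below agglomerates a small set of items and stops merging once the next merge would occur above the cut. It is a minimal sketch under stated assumptions: the average-linkage criterion, the function names and the toy distance values are hypothetical choices made for illustration, and are not the clustering configuration or the shape distances used in this work (those are described in Paper I).

```python
from itertools import combinations


def average_linkage(dist, a, b):
    """Mean pairwise distance between the members of clusters a and b."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))


def cut_dendrogram(dist, height):
    """Merge the closest pair of clusters until the next merge would occur
    above `height`, then return one group label (1, 2, ...) per item.
    Average linkage is an assumption made for this example."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(dist, clusters[p[0]], clusters[p[1]]),
        )
        if average_linkage(dist, clusters[a], clusters[b]) > height:
            break  # every remaining merge lies above the cut
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(dist)
    for group, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = group
    return labels


# Toy (hypothetical) one-dimensional "shapes"; the distance is |x_i - x_j|.
toy = [0, 1, 2, 50, 51, 90]
dist = [[abs(x - y) for y in toy] for x in toy]

low_cut = cut_dendrogram(dist, height=10)   # lower cut: more groups
high_cut = cut_dendrogram(dist, height=45)  # higher cut: fewer groups
```

Raising the cut height can only merge groups, never split them, which is why a higher cut on the dendrogram corresponds to fewer groups.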
Figure 10 shows the dendrogram resulting from applying the hierarchical clustering method to the shape distances of the 385 SO H II region ‘observations’ across the different projections and NPs. As with the previous dendrograms, the branches of the SOs are coloured by their age: 0.1 Myr in red, 0.2 Myr in pink, 0.4 Myr in blue and 0.6 Myr in green. It is clear from this dendrogram that there is a divide between groups containing the 0.1 and 0.2 Myr regions (early-type regions) and those containing the 0.4 and 0.6 Myr regions (late-type regions). This is displayed concisely in Fig. 11 by a bar chart of the number of regions of a given age for the first three distinct groups. These groups are obtained by cutting the dendrogram at a height of e.g. 400. In this situation, group 1 contains mostly early-type regions with some late-type regions. Group 3 hosts exclusively early-type regions and group 2 hosts mainly late-type regions with some 0.2 Myr regions being included. In terms of grouping the synthetic H II regions purely by age, the dendrogram in Fig. 10 shows that only a few of the smaller groups are all the same colour, that is, regions of the same age. A cut on the dendrogram resulting in six groups is shown by the dashed red boxes. This is the maximum height (i.e., fewest number of groups) required to produce groups that each contain either entirely late- or entirely early-type regions, with only a few exceptions of mixing. If the cut were slightly higher, groups 1 and 2 would be the first to merge. We can investigate the division of ages between groups further by taking a lower cut with more resulting groups; this is represented in Fig. 10 by the green boxes.
Figure 12 shows the ages and effective radii of the SO H II regions across 20 groups from the dendrogram. Here, the mean number of regions per group is ∼ 20. It is clear that even with this many groups, it is still most common for a group to contain a mix of both early-type ages or of both late-type ages. The few exceptions are group 8, with mostly 0.2 Myr regions and one 0.6 Myr region; group 14, with only 0.4 Myr regions; and groups 6 and 11, which contain only 0.6 Myr regions. Increasing the number of groups further (to e.g. 40 groups) results in the same pattern of the grouping of early- and late-type regions, with a few more instances of exclusively differentiating the respective individual ages. This result shows that there is a lot of similarity between the shapes within the early-type, and within the late-type, SO H II regions, an observation we can also make from considering the interpolation splines which define the shape in Fig. 5.
A noteworthy point here is that the late-type regions are those which had their emission artificially boosted before being inserted into the MAGPIS tiles, to account for the loss of mass and lower density within the simulation grid. Whilst this could be the defining factor for the differentiation between these regions' shapes, we do still see associations between some of the early-type regions and these late-type regions. Referring to the overview of H II region shapes from the purely Gaussian distributions in Appendix C, we see that even with the more uniform noise, certain projections from the 0.2 Myr SOs feature boundaries with more perturbations. Such examples are those likely to be grouped with the late-type regions, based upon what we already know from how the grouping procedure works (Paper I). Furthermore, we do still see from Fig. 12 both the crossover and distinction in the obtained groups between the 0.4 and 0.6 Myr shapes. For the purpose of further investigations, we will maintain that the late-type regions are thus representative of their Galactic counterparts. We will, however, return to this in future work when considering simulations from a larger grid.
Figure 12 also shows that there is not a clear distinction in region radii by group, apart from that seen in the main split into early- and late-type regions (possessing small and large radii, respectively). In fact, some of the groups that host both early- and late-type regions display a large spread in region radii. The results from the MAGPIS data in Paper I showed that one of the identified groups comprised exclusively small regions. The fact that this result is not seen here could reaffirm the notion that those regions from the previous work had similar shapes because of their young ages, and not purely their small sizes.
Figure 12.
Distribution of SO effective radii and age for 20 groups from the hierarchical clustering of the shape data. Axis labels: Radius [pc]; SO Age [Myr].
The remaining parameters to investigate, in terms of if and how they influence the H II region shapes, are the noise profile and projection. Figure 13 shows the distribution of NPs across the six groups identified by the red boxes on the dendrogram. It appears that there is no clear preference for regions belonging to a given NP to be placed in a particular group. Group 1 has slightly more regions from NP 1 than the other NPs. Group 4 has fewer regions from NP 4 than any of the other NPs, whilst group 5 shows the opposite result. Considering the ages along with the NPs, the regions from NP 6 that appear in group 1 are only 0.2 Myr old, and all but one of the regions from NP 1 in group 2 are 0.6 Myr old. The majority of NPs in each group, however, are associated with at least two of the ages, again, split by early- and late-type ages. Similar results to these are obtained by considering each of the θ and φ projection angles. There is no clear preference for a given observation angle to result in regions being assigned to the same group.
Figure 13.
Bar chart showing the distribution of MAGPIS noise profiles hosting the 385 SOs, for the six groups delineated in Fig. 10. Axis labels: Noise Profile; Number of Regions; SO Age [Myr].
Another way to discern whether the NP or the projection angle has more of an influence on the shapes and obtained groups is to consider, for a given NP, how many projections of each snapshot age are grouped together and, conversely, for a given projection angle, how many of the five NPs for each age are grouped together. These results are shown in Fig. 14. For the six groups described previously, the top panel shows, for a given projection angle and age, what fraction of the five NPs are placed in a given group. The bottom panel shows, for a given NP and age, what fraction of the 18-22 different projection angle SOs are grouped together. The respective x-axes have not been labelled since it is only the relative distributions we are interested in. Mean values for each group are indicated by the dashed black line.
The top panel of Fig. 14 shows that it is most likely for two of the NPs for a given age and projection to be grouped together, suggesting that the NP is having a large influence on the extracted shape of the H II region. For the examples where three or four of the NPs are grouped together, there is no preference for this to occur in a given group. There is only one situation where all five of the NPs are grouped together, that is for one of the 0.6 Myr projections in group 4.
The bottom panel shows some slight differences in the fraction of projections for a given NP and age that are grouped together. Group 1 has, on average, 33.5% of the total number of projections for a given NP grouped together, with a slight preference for up to half of the 0.2 Myr projections to be grouped together. In group 6, the average is again 33.5% of projections; however, NP 6 for the 0.1 Myr regions includes 70% of the different projections. For the late-type regions, group 4 shows the highest average number of projections per NP, followed by the 0.4 Myr regions in group 1 and the 0.6 Myr regions in group 2. The average number of given projections per group is only slightly higher for the early-type regions. This is an interesting result since the early-type regions are much more spherically symmetric and less perturbed than the late-type regions. This again shows that the curvature method for representing the region shapes is robustly quantifying the regions based on their boundaries.
The result that, in some groups, ∼ of the different projections of a given age and NP are grouped together shows that changes in the viewing angle may not influence the shape of the H II region substantially. Whilst this may be due to the initial condition of spherical symmetry throughout the numerical simulations, this is still a result that can only be achieved via study of the SOs. These are also likely the regions that remain the most spherically symmetric across the different projection angles, i.e., those that do not possess any intruding or protruding features. Previous investigations into inferring the projection angle of H II regions relative to the observer have focused on such ‘cometary’ and ‘champagne flow’ features, to see whether environmental properties that influence these inhomogeneities can be determined from the H II regions themselves (e.g. Yorke et al. 1983; Steggles et al. 2017). Since we do only have the one vantage point for our observational data, the requirement for classification is still to be able to carry this out in conjunction with the information we can collect. Therefore, in the present work, having the different projection angles of the SOs essentially gives us different observations of H II regions that share the same initial conditions but can be thought of as evolving differently due to differences in ambient density or ISM structure along our line of sight.
Figure 14.
Bar charts showing the respective influence of changing the NP or the projection angle of the SO on the resulting group. Ages are as with previous plots: 0.1 Myr in red, 0.2 Myr in pink, 0.4 Myr in blue and 0.6 Myr in green. Top: Distribution of different NPs for a fixed age and projection. Each bar represents a given SO projection angle and age, with the fraction of NPs belonging to the respective group shown. Bottom: Distribution of different projection angles for a fixed age and NP. Each bar represents a given SO age and NP, with the fraction of projection angles belonging to the respective group shown. Mean values in each group are shown by the dashed black lines. Axis labels: Unique Projection Angle per Age against Fraction of Noise Profiles (top); Noise Profile per Age against Fraction of Projection Angles (bottom).
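The fractions plotted in Fig. 14 amount to a simple tally: fix two of the SO labels (age and projection, or age and NP) and count how the remaining variants are distributed across the groups. The sketch below illustrates this bookkeeping. The record layout, the function name `group_fractions` and the example group assignments are hypothetical and chosen only for illustration; they are not the values behind Fig. 14.

```python
from collections import Counter, defaultdict


def group_fractions(records, fixed_keys):
    """For each combination of the fixed labels, return the fraction of the
    matching records that fall in each dendrogram group."""
    counts = defaultdict(Counter)
    for rec in records:
        key = tuple(rec[k] for k in fixed_keys)
        counts[key][rec["group"]] += 1
    return {
        key: {group: n / sum(tally.values()) for group, n in tally.items()}
        for key, tally in counts.items()
    }


# Hypothetical assignments: one 0.1 Myr projection seen through five NPs.
records = [
    {"age": 0.1, "projection": "p01", "np": 1, "group": 3},
    {"age": 0.1, "projection": "p01", "np": 2, "group": 3},
    {"age": 0.1, "projection": "p01", "np": 3, "group": 1},
    {"age": 0.1, "projection": "p01", "np": 4, "group": 6},
    {"age": 0.1, "projection": "p01", "np": 6, "group": 3},
]

# Top-panel style: fix age and projection, let the NP vary.
by_projection = group_fractions(records, ("age", "projection"))
# Bottom-panel style: fix age and NP, let the projection vary.
by_np = group_fractions(records, ("age", "np"))
```

In this toy case three of the five NPs place the projection in the same group, so the largest entry of `by_projection[(0.1, "p01")]` is three fifths.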
However, investigating further how the noise structure of our observations influences the shape we observe is an important aspect towards refining an observational morphological classifier.
The manner by which the H II region shapes were extracted from both the MAGPIS tiles and the SOs was by analysing the background noise from the radio continuum images. By removing the radio signal and setting a threshold level that was above the remaining noise profile, the boundary of each region could be defined by the contouring procedure. This led to a systematically defined data sample, whereby the H II regions were each extracted such that their signal levels should be consistent across the Galactic Plane. Nevertheless, one could argue that if one of the H II regions were placed in a different area of the Galaxy, defining the boundary from the background radio noise in the vicinity could lead to the shape being different. The SOs allowed us to test this by doing exactly that. We have seen here that these different NPs, defined from the MAGPIS tiles, are influencing the shape. The results also show, however, that this is not a systematic influence, and SO H II region shapes from given noise profiles are not consistently being paired together by the hierarchical clustering. We carried out some further investigation into a systematic influence by the different NPs using the ordination visualisation of multi-dimensional scaling (MDS, detailed in Appendix A), but reached the same conclusion as we do in this subsection: that the influence is not systematic. This is discussed further in Appendix A and Sec. 4.3, where we suggest how we may overcome issues with noise in our future work.
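The test described above (adding the SO pixel values to a noise-only area of a tile, then contouring at the clipped mean plus one standard deviation of the whole tile) can be sketched as follows. This is a minimal illustration rather than the pipeline used in this work: the 3σ clipping limit, the iteration cap, the function names, the uniform toy source and the synthetic Gaussian tile (given the mean and standard deviation listed for NP 1 in Tab. 2) are all assumptions made for the example.

```python
import random
import statistics


def sigma_clipped_stats(values, n_sigma=3.0, max_iter=10):
    """Mean and standard deviation after iteratively rejecting values that
    lie more than n_sigma standard deviations from the mean.
    The 3-sigma default is an assumption, not a value from the paper."""
    kept = list(values)
    for _ in range(max_iter):
        mean = statistics.fmean(kept)
        std = statistics.pstdev(kept)
        clipped = [v for v in kept if abs(v - mean) <= n_sigma * std]
        if len(clipped) == len(kept):
            break  # nothing rejected: converged
        kept = clipped
    return statistics.fmean(kept), statistics.pstdev(kept)


def insert_source(tile, source, row0, col0):
    """Return a copy of `tile` with `source` added pixel by pixel, its
    top-left corner placed at (row0, col0)."""
    out = [row[:] for row in tile]
    for i, src_row in enumerate(source):
        for j, value in enumerate(src_row):
            out[row0 + i][col0 + j] += value
    return out


def contour_level(tile):
    """Clipped mean plus one clipped standard deviation of the whole tile."""
    mean, std = sigma_clipped_stats(v for row in tile for v in row)
    return mean + std


rng = random.Random(42)
# Synthetic noise tile [mJy/beam]; mean and sigma as listed for NP 1 in Tab. 2.
noise_tile = [[rng.gauss(0.120, 0.242) for _ in range(40)] for _ in range(40)]
# Toy 10 x 10 "SO" of uniform brightness (hypothetical value).
source = [[10.0] * 10 for _ in range(10)]

tile = insert_source(noise_tile, source, row0=15, col0=15)
level = contour_level(tile)
mask = [[value > level for value in row] for row in tile]  # True above the level
```

Because the clipping rejects the bright inserted pixels, the level is set by the surrounding noise alone, which is what makes the extracted boundary sensitive to the tile that the source is placed in.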
Although we may not be able to fully quantify a systematic effect that the different noise profiles or projections are having on the SO H II region shapes, the SOs still have much better constraints than the Galactic observations in terms of initial conditions and properties. We have also shown here, via the shape analysis method, that they are explicitly representative of the Galactic H II regions. Considering the sample of 385 SOs, we can say with certainty which evolutionary stage each SO is at. The different projections and noise profiles essentially give us many more observations of the given age. This is similar to what we observe along the Galactic Plane, with no preference for H II regions of a given age to be at a given place along the Plane (Anderson et al. 2014). Ultimately, for a training set of SOs, we would also require SOs of differing initial conditions. Using the data we have analysed in this work, however, we can test whether the SO data sample can be used to infer the ages of the MAGPIS observational sample.
The initial conditions of the numerical simulations, which produce the SOs analysed here, involve ionisation and radiation pressure feedback from a star of mass 33.7 M⊙, with an ionising photon rate of N ly = × s − , or log N ly = 48.86 (AHD18). Since the observational sample we considered in Paper I covers a range of masses (∼ 17 - 45 M⊙), we have selected a mass-limited sample from the MAGPIS H II regions to use in the following test. The limit used was 48.3 < log N ly < M⊙ (Weidner & Vink 2010), and was around the mean of the normally distributed values for the MAGPIS sample (Fig. 8, Paper I). The lower limit was taken to account for photon leakage out of the system, along with the mass leaving the simulation grid and decreasing density. The resulting SOs may hence be representative of regions of lower masses. This resulted in a subsample of 26 MAGPIS regions, with ages ranging between 0.1 Myr and 1.9 Myr. Whilst this age range of the MAGPIS subsample covers a considerably larger range than the SOs considered in this work, there is not a one-to-one correspondence between the two ages used. The ages from the SO snapshots begin with t = 0 when feedback starts in the simulation, whereas the ages calculated for the MAGPIS sample are only an estimation of the dynamical ages. These estimates involve assumptions regarding the surrounding ISM and require accurate distances to the regions. The dynamical age then considers the observed expansion with respect to the theoretical Strömgren radius. Therefore, for the purpose of this test, we can consider the respective age ranges and distributions of both the SOs and MAGPIS sample, to see how they compare.
Figure 15.
Dendrogram resulting from applying hierarchical clustering to the shape data of the sample of 385 SO H II regions along with 26 of the MAGPIS H II regions. The branches are coloured by their age for the SO data: 0.1 Myr in red, 0.2 Myr in pink, 0.4 Myr in blue and 0.6 Myr in green; the MAGPIS H II regions are in cyan.
Figure 16.
H II region ages for six groups identified from Fig. 15. Groups are numbered 1 through 6 corresponding to top down on the dendrogram, as with previous figures. The number of SO H II regions in each group is shown by the orange bars. For comparison, the fraction of MAGPIS H II regions per group is shown by the cyan bars. Axis labels: Age [Myr]; Fraction of MAGPIS Regions; Number of SO Regions.
Using the data set of 385 SOs described in Sec. 4.1, we inserted the 26 mass-limited MAGPIS H II region shapes into the SO sample, and looked to see where the MAGPIS H II regions would be placed in relation to the SOs in the resulting group structure. Figure 15 shows the dendrogram resulting from the hierarchical clustering procedure for the SO training data and the MAGPIS target data. Branches are coloured as before - 0.1 Myr in red, 0.2 Myr in pink, 0.4 Myr in blue and 0.6 Myr in green - with the addition of the MAGPIS H II regions in cyan. The introduction of the MAGPIS regions has changed the ordering of the six delineated groups, with mostly late-type regions shown in the top two groups and mostly early-type regions in the rest. The ordering of the final groups is arbitrary. It is the resulting group associations and hierarchy that matter. The MAGPIS H II regions appear slotted in amongst the SO H II regions. This is the same result of good correspondence that was seen with the data the other way around in Sec. 3.2. There are two MAGPIS regions, paired together, joined to the bottom group. These two may represent slight outliers since they join to the adjacent group at a substantial height.
We will discuss this in more detail later in the section.
Figure 16 shows the respective ages of the SO and MAGPIS H II regions that are grouped into the six groups in Fig. 15. In order to compare the respective age distributions of the data, the number of SO H II regions is shown in orange, with the fraction of MAGPIS H II regions per group shown in cyan. As seen previously with the SO data, there is a clear division between early- and late-type regions, with group 3 showing the most crossover between ages. Following from the respective age discrepancies mentioned previously, we can split the MAGPIS H II regions into early- and late-types by considering those with ages less than 1 Myr to be early-type and those greater than 1 Myr to be late-type.
Group 4 hosts exclusively early-type SOs with a mean age of 0.14 Myr. 75% of the MAGPIS regions assigned to this group are also early-type. Group 3 has the largest mix of early- and late-type SOs, but with a majority of 0.2 Myr SOs and a mean age of 0.25 Myr. Two thirds of the MAGPIS regions in group 3 are also early-type, with the remainder late-type, showing a good agreement in spread with the SO data. Group 5 has exclusively early-type regions for both the SO and MAGPIS data. The same result is seen for group 6. Group 2 hosts a majority of late-type SOs, with a mean age of 0.5 Myr. The two MAGPIS regions assigned to this group are also late-type. 75% of the MAGPIS regions in group 1 are late-type, in good agreement with the late-type assignment of the SOs. In terms of this distribution of relative ages for each sample, these results are promising for the prospect of using the SOs as a training set for supervised classification. We see here that even with only one parameter of investigation, the evolutionary stage of the regions, we have good agreement between the SOs and the MAGPIS observed sample. We do not, however, suggest that one set of initial conditions substantially represents the entirety of the age distribution of the observational sample.
The identified groups from Paper I were shown to have a spread of mass ranges, which is why we used the mass-limited sample here.
In addition to these results, Appendix B shows an overview of the mass-limited MAGPIS H II region sample images and shapes that were assigned to each SO group. We can see from the figures in Appendix B that the MAGPIS regions sorted into each of the test groups appear to share similar morphological features. This reaffirms the notion that the shape analysis and statistical methods employed here are performing as expected. Group 3 is the largest group, showing the most visual differences between MAGPIS regions. This would be the first group to split if the cut were made lower on the dendrogram in Fig. 15, which would separate the more uniform regions from the more perturbed. The first two MAGPIS regions shown in the images for group 6 are those located at the edge of the dendrogram. They appear to each host at least one tight point of inflection, which would result in a large outlier in the curvature distributions. This is the likely reason why they are slightly apart from the rest of the data. The comparatively smooth sections along the rest of these regions' boundaries are likely why they were grouped with the other MAGPIS region, and the corresponding SO regions, in group 6.
The notion of using the SOs as a training set in supervised morphological classification of H II regions would require an input sample with many varied initial conditions and known parameters. We have shown throughout this investigation that the SOs produced in AHD18 are well representative of their Galactic counterparts. Furthermore, we can see here that the SOs of different ages can be used to suggest whether a Galactic H II region is early- or late-stage. With different initial masses and ambient densities, we could further refine the parameters of each training set group, and repeat this investigation with a correspondingly larger sample of Galactic H II regions.
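The age comparison in this test rests on two labelling rules taken from the text: SOs of 0.1 and 0.2 Myr are early-type and those of 0.4 and 0.6 Myr are late-type, while MAGPIS regions are split at a dynamical age of 1 Myr. A sketch of how the SO members of a group could then suggest a label for the observed regions placed in it is given below. The majority-vote rule, the function names, the treatment of an age of exactly 1 Myr as late-type and the example group contents are all hypothetical; they illustrate the idea of a training set and are not the classification scheme or the data of this work.

```python
from collections import Counter


def so_type(age_myr):
    """SO snapshots: 0.1 and 0.2 Myr are early-type, 0.4 and 0.6 Myr late-type."""
    return "early" if age_myr <= 0.2 else "late"


def magpis_type(age_myr):
    """MAGPIS regions: dynamical age below 1 Myr is early-type, else late-type
    (placing exactly 1 Myr in the late class is an assumption)."""
    return "early" if age_myr < 1.0 else "late"


def suggest_type(so_ages_in_group):
    """Majority SO type within a group (an assumed, illustrative rule)."""
    votes = Counter(so_type(age) for age in so_ages_in_group)
    return votes.most_common(1)[0][0]


def agreement(so_ages_in_group, magpis_ages_in_group):
    """Fraction of the MAGPIS regions in a group whose type matches the
    type suggested by the SOs in that group."""
    suggested = suggest_type(so_ages_in_group)
    matches = [magpis_type(age) == suggested for age in magpis_ages_in_group]
    return sum(matches) / len(matches)


# Hypothetical group: mostly early-type SOs plus four observed regions.
so_ages = [0.1, 0.1, 0.2, 0.2, 0.2, 0.4]
magpis_ages = [0.3, 0.6, 0.8, 1.4]

group_label = suggest_type(so_ages)
group_agreement = agreement(so_ages, magpis_ages)
```

With these toy inputs the group is labelled early-type and three of the four observed regions agree with that label.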
The foremost proposed application of this work is to increase the number of parameters investigated by using SOs from numerical simulations of differing initial masses, ambient densities, environmental conditions and temperatures. If groupings similar to those shown in this investigation are seen for the larger sample set, we will be one step closer to a thorough morphological classification scheme. For example, we might find that the groups from the hierarchical procedure still differentiate between early- and late-type H II regions, but also between intermediate-mass and high-mass ionising sources, a result that was indicated for the Galactic regions in Paper I. Furthermore, with a different initial condition set-up, and a larger simulation grid, we would not be required to boost the emission of the regions in the SOs at later times, as we did in this investigation. We plan to extend this work to the simulations of Ali & Harries (2019), which feature a 10 M⊙ cloud. This could lead to regions that are even more representative of the observed sample. We also have further simulations currently in preparation that feature clouds of different metallicity environments, together with differing ambient densities. Another factor to consider for future SOs used for comparisons is the discretisation of parameters. A more continuous sample of ages is well within reason and is only limited by the simulation time-steps. Varying parameters such as initial mass or electron density would be more computationally expensive, but as the data become available, it will be useful to have a shape analysis tool ready for the analysis and comparison.

In terms of the observational sample of H II regions, a further application would be to compare samples from different radio continuum surveys. The work carried out here regarding a systematic effect of the noise profiles on the H II regions was inconclusive, i.e., if there is a systematic effect, it was not revealed by our investigation.
However, comparing different observational data with different noise profiles may lead to progress in this area. Similar to how we considered the same SO with a different noise profile, we could consider the same Galactic H II region from different surveys in the same manner. A survey with complementary coverage to the MAGPIS survey that could be used is The HI/OH/Recombination line survey of the inner Milky Way (THOR) (Beuther et al. 2016). In addition to this, whilst we have only considered radio continuum images of H II regions in this work, a full classification scheme based on their morphologies should also consider data from further wavelength ranges, such as MIR. This would be the first step towards evolving this method into a multi-variate classifier.

The unsupervised clustering analysis utilised herein from Paper I has potential to be adapted to a machine learning (ML) algorithm. One requirement of ML is a large training set, so that the algorithm can learn the classes. This work has shown good potential for SOs of corresponding astrophysical data to be used as the training set for future applications. If the hierarchical clustering procedure were able to decide whether to continue with certain groups or reject them based upon predefined criteria, the resulting training set could be more accurate for investigating parameters of the observed samples. For example, late-type regions being assigned to known early-type groups could be excluded from the procedure and reassessed. The ML process could also consider the MDS investigation (Appendix A) of the different NPs on the fly, rejecting any data with large ordination differences.
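One possible form of such a predefined criterion is sketched below. It is an illustration only and not part of the present analysis: the function name, the region identifiers and the 0.75 threshold that defines an "early-type group" are assumptions, while the early (0.1 and 0.2 Myr) and late (0.4 and 0.6 Myr) SO ages follow the snapshots used in this work.

```python
from collections import defaultdict


def flag_for_reassessment(members, late_from_myr=0.4, early_group_fraction=0.75):
    """Return ids of late-type SOs that sit in early-type dominated groups.

    members: iterable of (region_id, group_label, age_in_Myr) for SOs whose
        ages are known from the simulation snapshots.
    late_from_myr: ages at or above this value count as late-type.
    early_group_fraction: hypothetical threshold; a group whose early-type
        fraction reaches it is treated as a known early-type group.
    """
    by_group = defaultdict(list)
    for region_id, group, age in members:
        by_group[group].append((region_id, age))

    flagged = []
    for regions in by_group.values():
        n_early = sum(age < late_from_myr for _, age in regions)
        if n_early / len(regions) >= early_group_fraction:
            # Late-type members of an early-type group are set aside.
            flagged.extend(rid for rid, age in regions if age >= late_from_myr)
    return sorted(flagged)


# Hypothetical toy assignment, NOT results from the paper.
toy_members = [
    ("so01", 4, 0.1), ("so02", 4, 0.1), ("so03", 4, 0.2), ("so04", 4, 0.6),
    ("so05", 2, 0.4), ("so06", 2, 0.6), ("so07", 2, 0.2),
]
print(flag_for_reassessment(toy_members))
```

The flagged regions would then be removed from the training set and re-examined, rather than discarded outright.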
The clustering procedure and future ML adaptations would also be useful for comparing different methods of shape extraction and description. With future clustering and classification methods, we could use the observed properties of the H II regions (such as ionised mass) together with quantised shape to build a multi-variate descriptor to replace the shape-distance scores, which can then be compared and clustered (we applied such methods as a multi-variate descriptor of light-curve variations in Froebrich et al. 2018). This could be the natural evolution of this work, after the next steps of increasing the sample size of both the SO and observational samples.

Only one shape descriptor is used in this work so far, the curvature distributions obtained from the contoured boundaries of the H II regions. Whilst we have shown the potential associations between morphology of H II regions and physical properties, there is no clear one-to-one correspondence between the extracted shapes and the properties considered thus far. Our original reasoning for selecting the contouring/curvature descriptor was for automation and bias reduction. However, given that we have seen, here and in our previous work, potential for shape to be used as an indicative measure, future work should also further investigate the differences between different shape descriptors. We should also consider how to properly utilise angular resolution scales when distances are unknown. This avenue could have potential applications for reducing kinematic distance errors in observational data. Different image analysis/quantisation techniques considered for future work include convolutional/artificial neural networks and self-organising maps.
The comparison of such methods with the shape analysis methods used so far would be useful for establishing which method to adopt for future classification requirements, potentially based on synthetic data as the training set.

There is also further potential for this shape analysis and clustering method to be applied to other astronomical phenomena. We have quantitatively shown here that modern SOs are producing well representative H II regions; we could therefore begin to investigate how different observations and simulations perform. For example, H II regions could be compared to supernova remnants, which are visually similar in the radio continuum images (e.g. Green 2009). It would be interesting to discover whether our shape analysis method could tell the two nebulae apart, and if so, how it compares to established methods such as spectral energy distribution fitting. Looking further ahead, as telescope and imaging techniques continue to improve, alongside computational power, there will be even more high resolution data to analyse and characterise. Statistical shape analysis could prove to be a useful tool in the era of big data.

The synthetic observations of an H II region produced in the numerical simulations of Ali et al. (2018) were analysed using the shape analysis and statistical clustering methodology developed in Campbell-White et al. (2018). The numerical H II region was the result of photoionisation and radiation pressure feedback of a 34 M⊙ star, in a 1000 M⊙ cloud. 77 SOs were considered, comprising four evolutionary snapshots (0.1, 0.2, 0.4 and 0.6 Myr), and multiple viewing projection angles. After the addition of artificial Gaussian noise, following the distribution of observational noise from one of the MAGPIS tiles, the shapes of the SO H II regions were extracted in the same manner as they were for the MAGPIS observational sample in Paper I.
The shape analysis results provided confirmation of the efficacy of the numerical simulations, such that they are quantifiably consistent, in terms of their shape, with the real observational counterparts. When considering the 76 MAGPIS H II regions from Paper I and 12 representative SO H II regions, across the four ages, the SO H II regions were placed in amongst the MAGPIS H II regions, in the resulting dendrogram from the hierarchical clustering procedure.

This result was also found when directly inserting the SO regions into different MAGPIS tiles, to represent realistic noise for the simulation images. By using five MAGPIS noise profiles for the 77
SOs, we essentially had 385 distinct observations of the numerical H II region, at the given age snapshots and projections. As with the shapes of the H II regions using the artificial Gaussian noise distribution, those from the MAGPIS NPs were grouped in amongst the MAGPIS H II regions, with the majority of synthetic regions paired with one of the MAGPIS regions. This suggested that the different projection angles and noise profiles were having a significant impact on the regions' shapes, such that the SO H II regions of the same age were not exclusively grouped with each other in the hierarchical clustering.

When considering the hierarchical clustering of the 385 SOs that had been inserted into the MAGPIS tiles, the following results were obtained:

• The determined hierarchy showed a clear divide between early- (0.1 and 0.2 Myr) and late-type (0.4 and 0.6 Myr) regions. This divide was not exclusive by age, with a low cut on the dendrogram (resulting in many groups) still producing groups that possessed a mix of both early- and late-type regions. Whilst this may be due to how the late-type regions' emission had to be artificially boosted to account for mass leaving the simulation grid, the results for the late-type regions still show the same cross over as the early-type regions. Furthermore, these late-type regions were still shown to be representative of the MAGPIS observational sample, even with this boosting. However, this is a point to return to in future work with simulations from a larger grid.
• There was no further association between the identified groupings and SO region radii, apart from the main split between the early- (small radius) and late-type (large radius) regions. This suggests that the result obtained in Paper I, pertaining to one group hosting exclusively small regions, could in fact be due to those regions all being young H II regions.
• There was no strong preference for SO regions from a given noise profile, nor given projection angle, to be assigned to specific groups. In terms of which of these parameters has more of an effect on the shape: for a given SO age and projection angle, on average two of the five NPs were grouped together in the hierarchical clustering. For a given age and NP, on average ∼ of the different projections were grouped together. This result was consistent for both the early- and late-type SO H II regions, even though the early-type regions appeared to be more spherically symmetric.
• No systematic effect due to the different NPs was found by the analysis. The hierarchical grouping of the pairwise shape distance measures revealed no preference for SOs from a given NP to be grouped together. Further investigation using multidimensional scaling ordination also revealed no such systematic influence.
• The MDS did show systematic effects for how the shape of the H II regions is extracted. Different initial contour levels (for identifying the boundary of the H II regions) changed the ordination in the MDS axes, but not the relative scores along the axes, whilst different spatial shape resolutions changed the scores along the axes but not the relative ordination positions. Higher resolutions corresponded to a larger spread in MDS scores, showing that as more detail is considered, the variances in shape as a result of the different NPs are more pronounced.

The results in this work have shown that the realistic SOs considered here are conclusively morphologically representative of the Galactic H II regions we observe in the MAGPIS radio continuum survey. To determine whether the SOs could potentially be used to construct a training set for supervised classification of H II regions, via their shapes, a mass-limited sample of the MAGPIS H II regions was considered along with the 385 SOs.
These results showed that there was good correspondence between respective early- and late-type H II regions from each sample. This suggests that there is a lot of potential for the utilisation of the SOs to construct such a training set. For this SO sample, we only investigated whether there was correspondence between the ages, since we only considered SOs from one set of initial conditions. For a larger training set, of varying masses and ambient densities, across the different evolutionary stages, the results shown here suggest that we should be able to make predictions of the physical nature of the Galactic H II regions, based upon how their shapes compare to those of the model simulations.

ACKNOWLEDGEMENTS
REFERENCES
Ali A. A., Harries T. J., 2019, MNRAS, 487, 4890
Ali A., Harries T. J., Douglas T. A., 2018, MNRAS, 477, 5422
Anderson T. W., Darling D. A., 1952, Ann. Math. Statist., 23, 193
Anderson L. D., Bania T. M., Balser D. S., Cunningham V., Wenger T. V., Johnstone B. M., Armentrout W. P., 2014, ApJS, 212, 1
Baes M., Verstappen J., De Looze I., Fritz J., Saftly W., Vidal Pérez E., Stalevski M., Valcke S., 2011, ApJS, 196, 22
Bethell T. J., Zweibel E. G., Li P. S., 2007, ApJ, 667, 275
Beuther H., et al., 2016, A&A, 595, A32
Campbell-White J., Froebrich D., Kume A., 2018, MNRAS, 477, 5486
Dale J. E., Bonnell I. A., Whitworth A. P., 2007a, MNRAS, 375, 1291
Dale J. E., Clark P. C., Bonnell I. A., 2007b, MNRAS, 377, 535
Froebrich D., et al., 2018, MNRAS, 478, 5091
Geen S., Soler J. D., Hennebelle P., 2017, MNRAS, 471, 4844
Green D. A., 2009, MNRAS, 399, 177
Harries T. J., Haworth T. J., Acreman D., Ali A., Douglas T., 2019, Astronomy and Computing, 27, 63
Haworth T. J., Harries T. J., Acreman D. M., Bisbas T. G., 2015, MNRAS, 453, 2277
Haworth T. J., Glover S. C. O., Koepferl C. M., Bisbas T. G., Dale J. E., 2018, New Astron. Rev., 82, 1
Helfand D. J., Becker R. H., White R. L., Fallon A., Tuttle S., 2006, AJ, 131, 2525
Koepferl C. M., Robitaille T. P., 2017, ApJ, 849, 3
Krumholz M. R., 2015, in Vink J. S., ed., Astrophysics and Space Science Library Vol. 412, Very Massive Stars in the Local Universe. p. 43 (arXiv:1403.3417), doi:10.1007/978-3-319-09596-7_3
Krumholz M. R., Klein R. I., McKee C. F., 2011, ApJ, 740, 74
Kurucz R. L., 1991, in Crivellari L., Hubeny I., Hummer D. G., eds, NATO Advanced Science Institutes (ASI) Series C Vol. 341, NATO Advanced Science Institutes (ASI) Series C. p. 441
Lada C. J., Lada E. A., 2003, ARA&A, 41, 57
Lanz T., Hubeny I., 2003, ApJS, 146, 417
Murtagh F., Legendre P., 2014, Journal of Classification, 31, 274
Osterbrock D. E., Ferland G. J., 2006, Astrophysics of gaseous nebulae and active galactic nuclei
Padoan P., Nordlund A., Jones B. J. T., 1997, MNRAS, 288, 145
Pettitt A. N., 1976, Biometrika, pp 161–168
Robitaille T. P., 2011, A&A, 536, A79
Rybicki G. B., Lightman A. P., 1979, Radiative processes in astrophysics
Salpeter E. E., 1955, ApJ, 121, 161
Schaller G., Schaerer D., Meynet G., Maeder A., 1992, A&AS, 96, 269
Springel V., 2005, MNRAS, 364, 1105
Steggles H. G., Hoare M. G., Pittard J. M., 2017, MNRAS, 466, 4573
Steinacker J., Bacmann A., Henning T., Klessen R., Stickel M., 2005, A&A, 434, 167
Steinacker J., Baes M., Gordon K. D., 2013, ARA&A, 51, 63
Walch S., Whitworth A. P., Bisbas T. G., Wünsch R., Hubber D. A., 2013, MNRAS, 435, 917
Ward J., Joe H., 1963, Journal of the American Statistical Association, 58, 236
Weidner C., Vink J. S., 2010, A&A, 524, A98
Weingartner J. C., Draine B. T., 2001, ApJ, 548, 296
Williams T. G., Baes M., De Looze I., Relaño M., Smith M. W. L., Verstocken S., Viaene S., 2019, MNRAS, 487, 2753
Yorke H. W., Tenorio-Tagle G., Bodenheimer P., 1983, A&A, 127, 313
APPENDIX A: FURTHER DETAILS ON NOISE PROFILES AND SELECTION CHOICES
In an attempt to determine whether we can quantify the influence of the MAGPIS noise profiles on the shape of the SO H II regions, we return to the ordination technique of multi-dimensional scaling (MDS) that was used in Paper I to check that the hierarchical clustering was properly defining groups based upon the regions' shapes. To recap: MDS reduces the dimensionality of an input distance matrix to a number of orthogonal principal coordinates, the eigenvectors of which give the ordination, while the eigenvalues give the relative importance of each axis for representing the data variation. In Paper I, we saw that there was a correspondence between the number of high curvature points along the region boundaries and the scores along axis 1 of the MDS ordination, and surmised that the variation along axis 2 was also directly resulting from features of the curvature distributions. We also showed that the ordinations on the MDS plots corresponded well with the groups from the hierarchical clustering. Here, we can apply MDS to the distance matrix of H II region shapes for a given SO across the different NPs, to both visually and quantitatively see how the resulting mathematical shapes compare.

Figure A1 shows the MDS ordinations for four of the SOs, one at each age. The numbering of the H II region shapes' points on the graphs corresponds to the NP of the SO (from Sec. 3.3), with NP = 0 corresponding to the artificial Gaussian distribution (from Sec. 3.1). In each of the MDS plots, only the six H II region shapes shown were compared pairwise using the A-D test. These ordination results are hence only showing the differences the NPs have on the shapes. In each of the four instances, for increasing SO age, axis 1 of the MDS accounts for 64, 76, 73 and 80% of the shape variability, respectively. Axis 2 accounts for 23, 18, 16 and 15%, respectively. Therefore, these two axes are sufficient for investigating the shape differences accordingly.

Figure A1. Multidimensional Scaling ordination plots (Axis 2 against Axis 1) for the shape distances of four example SO ages and projections. Labels signify NP, with 0 representing the shape obtained from the random Gaussian distribution. Point colours correspond to which group the SO H II region shapes were allocated to in the hierarchical clustering of the full dataset.

In each of the four plots, the Gaussian noise profile 0 shape is ordinated away from the other five NPs. For 0.1 Myr, all of the shapes are spread over the ordination plot. For 0.2 Myr, NP 0 is ordinated away from the origin, at approximately an equal distance from each of the other NPs. A similar looking distribution is seen for the 0.4 Myr data, with a tighter association. For the 0.6 Myr data, NPs 1, 4 and 6 are ordinated very close to one another, with NPs 0, 2 and 3 ordinated away. In each case, the points are coloured by which of the six groups the shape was sorted into from the dendrogram in Sec. 4. As expected, those ordinated close together are assigned to the same group in the larger data set. NP 3 in the 0.6 Myr data is the one example of that age assigned to a group comprising otherwise only early-type regions. As we can see here, it has the furthest distance from the other points. Another interesting note is that, for the 0.1 Myr shapes, the shape from NP 2 is ordinated close to the origin of the coordinate system. This means that this shape represents the 'average' of the sample and the other shapes all differ with respect to this shape.

These plots were produced for a number of given projections to determine whether the respective positions in the MDS space for each NP were systematic and reproducible. Unfortunately, the effect the different NPs have on the underlying shape of the SO H II region is not systematic across the ages. We do not see each of the respective NPs by age behaving in the same manner in each of the MDS plots, neither with respect to the other MAGPIS NPs, nor the Gaussian noise only shapes. The example shown for the 0.1 Myr SO shows a large spread in the MDS ordination, yet this is an example where four of the five NPs are placed in the same one of the six groups from the hierarchical clustering of the entire data-set. On the other hand, NP 1 of the 0.4 Myr data is ordinated close to NPs 4 and 3 but is placed into the same group as NPs 2 and 6. We must remember here, however, that the hierarchical groupings are for a much larger sample, such that many of the other projections will be influencing the groupings due to the agglomerative procedure.

An explanation for this non-systematic effect of the NPs is that the noise from the radio continuum images is not exactly Gaussian, i.e., it has inherent structures within, which was a motivation for using them in the manner we have here. This means that the different sizes of the SOs at each age will be influenced by the inherent noise in the continuum observations by varying amounts. An example can be seen in the results where NPs 3 and 4 are ordinated close to one another in the 0.2 and 0.4 Myr plots, but not in the other two. Whilst the MDS investigation is useful for visualising the spread in the shape data, we do not gather any summary results from this investigation for the data in its entirety.
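The ordination recapped in this appendix can be sketched with classical (principal coordinates) MDS. The block below is a minimal illustration under stated assumptions, not the implementation used for Figure A1: the function name is hypothetical, NumPy is assumed to be available, negative eigenvalues (which can arise when the input distances are not Euclidean, as may be the case for A-D based shape distances) are simply clipped to zero, and the toy distance matrix is built from made-up planar points rather than from real shape distances.

```python
import numpy as np


def classical_mds(distances, n_axes=2):
    """Principal coordinates analysis of a symmetric distance matrix.

    Returns (coords, explained): coords has shape (n, n_axes) and gives the
    ordination; explained gives each axis' share of the summed positive
    eigenvalues, i.e. its relative importance for the data variation.
    """
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram-like matrix.
    b = -0.5 * centring @ (d ** 2) @ centring
    eigvals, eigvecs = np.linalg.eigh(b)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # sort axes by importance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = np.clip(eigvals, 0.0, None)     # assumption: drop negative part
    coords = eigvecs[:, :n_axes] * np.sqrt(positive[:n_axes])
    explained = positive[:n_axes] / positive.sum()
    return coords, explained


# Hypothetical stand-in for six pairwise shape distances (one SO, six NPs).
toy_points = np.array([[0.0, 0.0], [1.0, 0.2], [0.9, 0.1],
                       [2.5, 1.0], [1.1, 0.3], [0.2, 1.5]])
toy_distances = np.linalg.norm(toy_points[:, None, :] - toy_points[None, :, :], axis=-1)

coords, explained = classical_mds(toy_distances)
print("axis shares:", np.round(explained, 3))
```

The axis shares returned here play the role of the percentages of shape variability quoted above for axes 1 and 2.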
Although it provides no summary result for the full data set, the MDS approach does show the specific influence of each NP when compared to a reference or standardised version of the shape, which may be useful in future machine learning approaches.

Another application of the MDS ordination we performed was to further investigate how the different selection choices for defining the H II region shapes affect the resulting spread in the shape data. These choices are the initial sigma level used when extracting the boundaries and the spline knot spacings for controlling the spatial resolution of the shape landmarks. So far, for all of the SO shape data, we have only considered the 1 sigma contour level, along with the 0.54 pc spline knot spacing. This was for the purpose of directly comparing the SOs to the previous results from the MAGPIS data. However, we suggested in Paper I that the SOs could provide a better test set for determining how these two selection variables influence the resulting shape. When rerunning the MDS of the shape data using contours at 0.8 and 1.2 sigma above the mean value, the positions of the ordinations change, but the overall spread in the data remains. This suggests that, as with the NPs, varying the initial sigma level is not having a systematic effect on the shape data. When rerunning the MDS with different spline resolutions, decreasing the spline interval (hence increasing the spatial resolution) results in a larger spread along the MDS axes. This was expected as more features and points for comparison are captured with a higher resolution. Furthermore, we have already shown that the number of high curvature points corresponds to the score along axis 1 of the MDS. Increasing the interval (decreasing the resolution) results in a smaller spread in the MDS ordination. Whilst this may seem like a favourable result, the level of smoothing along the boundaries is significantly increased, resulting in fewer features along the curves.
Increasing the boundary smoothing by too much thus becomes redundant for the smaller diameter regions. As found with the previous tests of this nature in Paper I, the 1 sigma, 0.54 pc interval seems to be a good intermediary between extremes. The most important aspect here (and for future work of this nature) is that the shape extraction and quantification remain consistent for the sample.

APPENDIX B: MASS-LIMITED MAGPIS SAMPLE IMAGES
The following figures show the images of the MAGPIS H II regions assigned to each of the training groups in Fig. 15. Groups are numbered top down from the dendrogram. Image tiles are scaled in degrees, with Galactic longitude and latitude shown. The contoured outlines are those obtained from the shape extraction process, with the corresponding spline interpolation points used to obtain the curvature distributions indicated by the open squares.

Figure B1. Group 3; note that region G045.204+00.744 is not shown.
Figure B2. Group 1
Figure B3. Group 2
Figure B4. Group 6
Figure B5. Group 4
Figure B6. Group 5
APPENDIX C: EXTRACTED SHAPES FROM SYNTHETIC OBSERVATIONS WITH ARTIFICIAL GAUSSIAN NOISE
Synthetic Observations (SOs) of the numerical simulation of Ali et al. (2018). The 77 images are shown with coordinates in parsecs. Headings identify the snapshot age and projection viewing angle (t = θ, p = φ). In each image, random Gaussian noise has been added to each pixel value, following the distribution from an example MAGPIS 1.4 GHz image tile. The boundary shown is the contour at 1σ above the mean noise level.
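The noise addition and boundary threshold described in this caption can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline used to produce the figures: the Gaussian blob standing in for an SO, the values of `noise_mean` and `noise_sigma`, and the function names are hypothetical, and a pixel mask is used in place of a traced contour.

```python
import numpy as np


def add_gaussian_noise(image, noise_mean, noise_sigma, rng):
    """Add independent Gaussian noise to every pixel of the image."""
    return image + rng.normal(noise_mean, noise_sigma, size=image.shape)


def region_mask(image, noise_mean, noise_sigma, n_sigma=1.0):
    """Pixels brighter than n_sigma standard deviations above the mean noise."""
    return image > noise_mean + n_sigma * noise_sigma


rng = np.random.default_rng(1)

# Hypothetical stand-in for a synthetic observation: a smooth Gaussian blob.
y, x = np.mgrid[-32:32, -32:32]
clean = 10.0 * np.exp(-(x ** 2 + y ** 2) / (2 * 8.0 ** 2))

# Assumed noise parameters; in practice these would be measured
# from an observed image tile.
noise_mean, noise_sigma = 0.5, 1.0
noisy = add_gaussian_noise(clean, noise_mean, noise_sigma, rng)

# Area (in pixels) enclosed at the three sigma levels discussed in the text.
areas = {
    level: int(region_mask(noisy, noise_mean, noise_sigma, level).sum())
    for level in (0.8, 1.0, 1.2)
}
```

The 0.8 and 1.2 sigma levels mirror those tested in the MDS investigation of Appendix A; because the thresholds are nested, the enclosed area can only shrink or stay the same as the level is raised.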
This paper has been typeset from a TEX/LATEX file prepared by the author.