[PDF] An Investigation of Spectral Line Stacking Techniques and Application to the Detection of HC 11 N

Abstract

As the inventory of interstellar molecules continues to grow, the gulf between small species, whose individual rotational lines can be observed with radio telescopes, and large ones, such as polycyclic aromatic hydrocarbons (PAHs) best studied in bulk via infrared and optical observations, is slowly being bridged. Understanding the connection between these two molecular reservoirs is critical to understanding the interstellar carbon cycle, but will require pushing the boundaries of how far we can probe molecular complexity while still retaining observational specificity. Toward this end, we present a method for detecting and characterizing new molecular species in single-dish observations toward sources with sparse line spectra. We have applied this method to data from the ongoing GOTHAM (GBT Observations of TMC-1: Hunting Aromatic Molecules) Green Bank Telescope (GBT) large program, discovering six new interstellar species. In this paper we highlight the detection of HC 11 N, the largest cyanopolyyne in the interstellar medium.

Full PDF

DDraft version September 28, 2020

Typeset using L A TEX twocolumn style in AASTeX63

An Investigation of Spectral Line Stacking Techniques and Application to the Detection of HC N Ryan A. Loomis, Andrew M. Burkhardt, Christopher N. Shingledecker,

3, 4, 5

Steven B. Charnley, Martin A. Cordiner,

6, 7

Eric Herbst,

8, 9

Sergei Kalenskii, Kin Long Kelvin Lee,

11, 2

Eric R. Willis, Ci Xue, Anthony J. Remijan, Michael C. McCarthy, and Brett A. McGuire

11, 1, 2 National Radio Astronomy Observatory, Charlottesville, VA 22903, USA Center for Astrophysics | Harvard & Smithsonian, Cambridge, MA 02138, USA Department of Physics and Astronomy, Benedictine College, Atchison, KS 66002, USA Center for Astrochemical Studies, Max Planck Institute for Extraterrestrial Physics, Garching, Germany Institute for Theoretical Chemistry, University of Stuttgart, Stuttgart, Germany Astrochemistry Laboratory and the Goddard Center for Astrobiology, NASA Goddard Space Flight Center, Greenbelt, MD 20771, USA Institute for Astrophysics and Computational Sciences, The Catholic University of America, Washington, DC 20064, USA Department of Chemistry, University of Virginia, Charlottesville, VA 22904, USA Department of Astronomy, University of Virginia, Charlottesville, VA 22904, USA Astro Space Center, Lebedev Physical Institute, Russian Academy of Sciences, Moscow, Russia Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

ABSTRACTAs the inventory of interstellar molecules continues to grow, the gulf between small species, whoseindividual rotational lines can be observed with radio telescopes, and large ones, such as polycyclicaromatic hydrocarbons (PAHs) best studied in bulk via infrared and optical observations, is slowlybeing bridged. Understanding the connection between these two molecular reservoirs is critical tounderstanding the interstellar carbon cycle, but will require pushing the boundaries of how far we canprobe molecular complexity while still retaining observational speciﬁcity. Toward this end, we presenta method for detecting and characterizing new molecular species in single-dish observations towardsources with sparse line spectra. We have applied this method to data from the ongoing GOTHAM(GBT Observations of TMC-1: Hunting Aromatic Molecules) Green Bank Telescope (GBT) largeprogram, discovering six new interstellar species. In this paper we highlight the detection of HC N,the largest cyanopolyyne in the interstellar medium. INTRODUCTIONAs molecules increase in size, detection by rotationalspectroscopy generally becomes more challenging. Inlarge molecules, there are a substantially larger num-ber of rotational energy levels over which populationis distributed, reducing the emission between any twothat give rise to an observable transition. Even at lowtemperatures the rotational partition function for suchspecies can be high, with a large number of thermallypopulated rotational levels, diluting the intensity of anygiven transition. Additionally, larger species are gener-ally less abundant than smaller species (McGuire 2018).Taken together, it is often far more diﬃcult to detectindividual rotational lines of a heavy species relativeto those of a light species even if both have identical

Corresponding author: Ryan A. Loomis, Brett A. [email protected], [email protected] dipole moments, rotational temperatures, and columndensities. Even for small PAHs, for example, the totalline intensity is diluted over potentially hundreds if notthousands of transitions, making it exceedingly diﬃcultto detect any individual line in a reasonable amount ofintegration time.Here, we describe a new method that combines thetechniques of Markov Chain Monte Carlo (MCMC) in-ference with spectral line stacking and matched ﬁlteringto counteract the eﬀects of rotational dilution, improv-ing detection eﬃciency and the characterization of weakemission from large molecules.1.1.

Molecular detection technique

MCMC inference has grown in popularity in recentyears in the astrochemical community as a tool for an-alyzing the properties of spectroscopic lines (Czekala2016; Gratier et al. 2016; Loomis et al. 2016), allowingfor straightforward characterization of parameter uncer-tainties and covariances. Similarly, line stacking and a r X i v : . [ a s t r o - ph . GA ] S e p [ ] i ω i

20 0 20Velocity (km/s)0100200300 I m pu l s e R e s p o n s e ( ) Peak Impulse Response: 258.1 S N R ( ) HC N §2.1 - The GOTHAM Dataset§2.2 - Spectral Simulator §2.3 - Data Preparation§2.4 - MCMC Fitting§2.5 - Line Stacking§2.6 - Matched Filtering ω ω ω i ω i ∗ Spectroscopic CatalogSource PropertiesTelescope Properties

Figure 1.

Schematic diagram of our method for molecular detection and characterization. In short, the GOTHAM dataset( § § § § § § matched ﬁltering techniques have regularly been appliedto improve the signal-to-noise ratio (SNR) and detectioneﬃciency of weak lines (Walsh et al. 2016; Loomis et al.2018, 2020). Here, we present a hybrid combination ofthese techniques to robustly infer the presence of largeastronomical molecules of interest in single dish spec-tra, as well as their emission parameters and associateduncertainties. In particular, this technique is ideal foridentifying and characterizing species when no individ-ual line is intense enough to be observed in a spectralline survey, but where many lines are present in the dataitself, hidden under the noise. A ﬂowchart providing anoverview of our analysis method is shown in Fig. 1, andwe explain each step of the process in the following sub-sections. 1.1.1. The GOTHAM dataset

Our method is best suited to a line-sparse single-dimensional spectral dataset, and here we investigateits application to data from the ongoing GBT large pro-gram GOTHAM. The details of these observations are presented in McGuire et al. (2020a). In short, at thetime of this analysis, the observations were ∼

30% com-plete, covering 13.1 GHz of total bandwidth between 7.8and 29.9 GHz. With a frequency resolution of 1.4 kHz(0.014-0.054 km s − ), the dataset encompasses 9.3 mil-lion channels.Despite the wide spectral range, the observations arerelatively line-sparse. A total of 632 lines are de-tected above 5 σ , yielding an eﬀective average line den-sity of 0.05 lines/MHz (1 line every 20 MHz). In addi-tion, the lines are quite narrow: ∼ − in aggre-gate, although we ﬁt the contributions of several (2-4) ∼ − components to these features. The resultis a spectrum which is sparse in ‘bright’ channels: only1 channel in every ∼ > σ above the local noiselevel, which is equivalent to a ﬁlling factor of < Spectral simulator

To infer the desired astrophysical properties (forexample, excitation temperature and column den-sity) of a given molecular species, we employ a for-ward modeling framework where spectra are itera-tively simulated in a fashion similar to the obser-vations themselves and then compared to the data.Our spectral simulator is based on the basic equa-tions of molecular excitation and radiative transfer(Liu et al. 2001; Remijan et al. 2005; Mangum &Shirley 2015); the open source code can be foundat https://github.com/ryanaloomis/spectral simulator.The simulator has three main inputs: a spectroscopiccatalog in SPCAT format from the

CALPGM suite ofprograms, (Pickett 1991; Drouin 2017) a collection oftelescope properties, and a collection of source proper-ties.The most critical telescope property is the 100 m dishsize of the GBT, required for calculating an eﬀectivebeam ﬁlling factor to account for beam dilution eﬀects.Source properties for each source component are left asfree parameters, and include eﬀective source size (usedfor calculating ﬁlling factors and assumed to be a sym-metric Gaussian), column density ( N col ), excitation tem-perature ( T ex ), source velocity ( v LSRK ), and linewidth( dv ).In our modeling of TMC-1, we have found four dis-tinct velocity components at similar velocities to thosepreviously identiﬁed (Dobashi et al. 2018, 2019) fromwhich the majority of species emit from. Source size, col-umn density, and source velocity were allowed to freelyvary for each component, and the excitation tempera-ture and linewidth are ﬁt jointly across components. Itis likely that excitation temperature and linewidth dovary slightly across the diﬀerent cloud components, butour data is not suﬃcient to constrain these diﬀerences,which we discuss in more detail later. Several speciesin our analysis were best-ﬁt by utilizing only a subsetof three of these four cloud components, and their re-sults are presented with a corresponding number of freeparameters.The spatial orientation of the four cloud componentson the sky (Fig. 2) has a pronounced impact on howtheir emission is measured by the telescope. First, sinceour dataset is only from a single pointing position andwe do not have spatial information about these cloudcomponents, we make the simplifying assumption thateach component is centered in the beam. The GaussianFWHM of the beam for the Green Bank Telescope iscalculated for a given wavelength λ and dish size D as: θ bm [ (cid:48)(cid:48) ] = 206265 × . λD (1) as documented in the GBT Proposer’s Guide (Staﬀ2020) and the corresponding beam dilution factor fora Gaussian source centered in the beam with FWHM θ source is: θ source θ bm + θ source . (2)This beam dilution factor is applied at each frequencyin the spectrum to all cloud components based on theirgiven source sizes. In reality, the sources may be un-equally distributed throughout the beam, leading tovarying beam dilution eﬀects at diﬀerent frequencies,as the source begins to exit the beam. We discuss thispoint in more detail later.In the optically thin limit, the spatial distribution ofcomponents does not strongly impact how their emissionco-adds. Thus for species ﬁrmly within the opticallythin limit, a beam diluted spectrum can be generatedfor each component and then summed. For species thatmay have lines which are more optically thick, however,the spatial distribution may have a more pronouncedeﬀect on how the emission co-adds.If two optically thick lines lie at diﬀerent velocities,and the linewidths are smaller than the separation be-tween the central velocity of the components, then thecomponents are radiative decoupled and can be added asin the optically thin case. This is the main assumptionof the Large Velocity Gradient approximation. Addi-tionally, if two optically thick components are spatiallydistinct, they will add linearly in measured intensity. Ifthere are tow co-spatial optically thick lines that overlapin velocity, however, they need to be added in τ spacebefore converting to intensity. We refer to these twolimiting cases as ‘separate components’ and ‘co-spatial’.As we lack the spatial information to disentangle themore complicated (and more likely) scenario of a situa-tion between these two limiting cases, we instead presentresults from the two limits and discuss both when rele-vant. For the ‘co-spatial’ case, it makes more sense to ﬁta common source-size across components (Fig. 2, bot-tom left), so the total number of model parameters isshrunk by three. As shown in the Supplementary Ma-terials and discussed in more detail later, a co-spatialmodel does a much better job of describing the smallerand more optically thick cyanopolyynes.1.1.3. Initial data preparation

To begin the ﬁtting process, it is ﬁrst necessary toreduce the size of the dataset that will be simulated,as generating 9.3 million channels in every step of theMCMC process would not be computationally tractable.A dataset of much more manageable size would consist

Reality: mostly separate?Reality: mostly overlapping? Four separate source sizesCo-spatial: single source size I T = B(I( τ ), SS ) + B(I( τ ), SS ) + B(I( τ ), SS ) + B(I( τ ), SS )I T = B(I( τ + τ + τ + τ ), SS c ) Figure 2.

A schematic showing two spatial distribution regimes in which the emission from TMC-1 may fall into (upperpanels) and the approximations we use in our analysis (lower panels). The FWHM primary beam of the GBT is denoted by theblack circle.

Upper left:

Emission may be mainly co-spatial, with signiﬁcant overlap between velocity components.

Upper right:

Emission may originate from spatially distinct velocity components, which are all still mostly within the primary beam.

Lowerleft:

The ‘co-spatial’ approximation, in which optical depths are added linearly before converting to intensity, and a commonsource size is ﬁt.

Lower right:

The ‘separate components’ approximation, in which each component is separately beam dilutedand then these intensities are added linearly. of only the small number of channels which are near linesof interest for a given species.Using our spectral simulator and a nominal set of tele-scope and source properties, we generate an initial simu-lation for the target species across the full bandwidth ofthe GOTHAM observations. A dish size of 100 m, sourcesize of 100 (cid:48)(cid:48) , excitation temperature of 8 K, column den-sity of 10 cm − , and linewidth of 0.37 km s − are as-sumed. As this initial simulation is only used for select-ing regions of the spectrum to perform the ﬁt on, relativeline strengths need only be approximate, and knowingthe exact source size, excitation temperature, columndensity, or linewidth is not necessary. The linewidth andexcitation temperature are estimated based on previousobservations of TMC-1 (Remijan et al. 2006; Dobashiet al. 2018, 2019), with the linewidth being large enoughto encompass all of the known multiple velocity compo-nents.Nominally the method will work when including alllines in a catalog ﬁle that fall within the range of theobservations. Indeed, for this work all lines were usedwith simple linear species, but for the analysis of speciessuch as 2-cyanonaphthalene where there are thousandsof extremely weak lines, applying a threshold signiﬁ- cantly improves the computational eﬃciency. In thesecases a threshold of 5% of the peak intensity in the ini-tial simulation was used, discarding all lines below thisthreshold as they will not contribute signiﬁcantly to theﬁnal ﬁt or stacked detection. For each remaining line,a window was generated at 5.8 ± − and appliedto the GOTHAM spectrum, yielding a ﬁnal sparse spec-trum with a much smaller datasize, as shown in Fig. 1.Within each window, a local estimate of the noise wastaken by calculating the standard deviation of all pointsless than 3.5 σ (where σ is an initial standard deviationtaken considering all points). This method reduces theimpact of any strong lines on the estimate of the localnoise. For the analysis of weaker species, a 6 σ thresh-old was then applied to block any interloping lines fromother species, preventing them from contaminating themodel ﬁt or ﬁnal stack. Interloping lines were removedfrom the windowed dataset.The ﬁnal output of this procedure is a small, sparsespectrum for each species being considered, as well as anoise spectrum of identical dimensionality.1.1.4. MCMC ﬁtting

With a reasonably sized dataset now available for agiven species, we then utilize an MCMC ﬁtting methodto derive posterior probability distributions and covari-ances for each free parameter in the spectral simulatormodel. This process is very similar to that described inLoomis et al. (2016).The degrees of freedom for each model are set by theconsiderations described earlier, with a maximum of 14free parameters. The aﬃne-invariant MCMC implemen-tation emcee (Foreman-Mackey et al. 2013) was usedwith 100 walkers run for up to 10,000 steps. Conver-gence was assessed using a Gelman-Rubin convergencediagnostic (Gelman & Rubin 1992).Parameter initialization and priors were determinedusing two well-characterized ‘template’ species. Whileinitially investigating the properties of species withinthe GOTHAM data, we found that as one might ex-pect from chemical intuition, linear species appeared toshare source properties with HC N, while cyclic speciesappeared to share source properties more similar to ben-zonitrile. These two species, which both have easilyidentiﬁed bright individual lines, were therefore ﬁt ﬁrstwith very simple priors - source velocities were forced tobe in a sequential order and all other values had physi-cal bounds set on them (e.g. positivity constraints). Anexample corner plot of the HC N ﬁt is shown in Fig. 9.The quality of these ﬁts was then assessed visually,assuring the suitability of the model for the data. Asseen in Fig. 3, these nominal ﬁt parameters reproducedall observed lines within uncertainties. The HC N andbenzonitrile posteriors were then used as priors for theirrespective template families for all values other than col-umn density, and 50th percentile values were used toinitialize walkers in a tight ball. Column densities wereinitialized via quick maximum likelihood ﬁts, holdingthe other initialized values ﬁxed.From these ﬁts for each species, we report parametersand their uncertainties using 16th, 50th, and 84th per-centile intervals (e.g. Table 2 for HC N). These intervalsare also denoted in the corner plots (e.g. Fig. 9). 50thpercentile values are used for all stacking analyses.1.1.5.

Line stacking

The posterior probability distributions from theMCMC ﬁtting describe the range of parameter valuesconsistent with the data, but are predicated on the as-sumption that our model does a good job of describingthe underlying data. This is easily justiﬁed when indi-vidual lines can be detected and compared to the modelpredictions (Fig 3), but is less easy to visualize whenindividual lines are not seen above the noise level. Cal-culating a detection signiﬁcance is therefore crucial to interpreting the MCMC constraints. To provide a visu-ally intuitive interpretation of detection signiﬁcance, webreak this process down into two steps. First, we stackall of the windowed lines which have no interlopers, andsecond, we apply the stacked best ﬁt simulation as amatched ﬁlter to the data stack.The application of line stacking techniques to increaseSNR in spectroscopic data is a well-known technique,particularly in an astrochemical context for the detec-tion of new species (Langston & Turner 2007; Walshet al. 2016; Loomis et al. 2016). Here we follow the nor-mal prescription of SNR weighted stacking of each line(see Fig. 1), but with a minor modiﬁcation for somespecies. When a species has a more complex spectrumwhere transitions are not always well separated (e.g.closely spaced hyperﬁne components), a naive stack ofevery transition will over-count the contributions fromother nearby transitions, and may also contaminate thesignal-free noise regions of the stack with signal fromthese nearby lines. To avoid these issues, we treat groupsof transitions which are blended or closely spaced (typi-cally < N linesin Fig. 3 is shown in Fig. 4. Even though each of theindividual lines were strongly detected, the overall sig-niﬁcance of the detection is greatly enhanced, now witha peak value of ∼ σ . A similar stack of our best-ﬁtmodel is overlaid in red, illustrating the quality of theﬁt. Demonstrations of the robustness of this line stack-ing method for our dataset are shown in the Supplemen-tary Materials. We additionally discuss its limitationslater, particularly with respect to source line density.1.1.6. Matched ﬁltering

As described in Loomis et al. (2018), the technique ofmatched ﬁltering ﬁrst presented by Woodward (1953)and North (1963) can be used on astronomical spectro-scopic data to optimally extract a detection signiﬁcancewhen the shape of the signal is known. In our case,the stacked line signal still retains velocity structure, asseen in Fig. 4, and is thus not yet the maximum SNRattainable.As shown in Fig. 1, we select a narrow region aroundthe stacked predicted spectrum to use as the templateﬁlter, and then cross-correlate this ﬁlter with the stackeddata spectrum, yielding an impulse response spectrum.The spectrum is then normalized by calculating the −

13 8134.5 MHz 15 −

14 8715.5 MHz 16 −

15 9296.6 MHz 17 −

16 9877.6 MHz 18 −

17 10458.6 MHz 19 −

18 11039.7 MHz33 −

32 19174.1 MHz 35 −

34 20336.1 MHz 38 −

37 22079.2 MHz 39 −

38 22660.2 MHz 40 −

39 23241.2 MHz 41 −

40 23822.3 MHz42 −

41 24403.3 MHz 43 −

42 24984.3 MHz 44 −

43 25565.3 MHz 45 −

44 26146.3 MHz 46 −

45 26727.4 MHz 47 −

46 27308.4 MHz − − )0200 T A * ( m K ) −

47 27889.4 MHz 49 −

48 28470.4 MHz 50 −

49 29051.4 MHz 51 −

50 29632.4 MHz

Figure 3.

Individual line detections of HC N in the GOTHAM data. The spectra (black) are displayed in velocity spacerelative to 5.8 km s − , and using the rest frequency given in the top right of each panel. Quantum numbers are given in the topleft of each panel, neglecting hyperﬁne splitting. The best-ﬁt model to the data, including all velocity components, is overlaidin green. Simulated spectra of the individual velocity components are shown in: blue (5.63 km s − ), gold (5.79 km s − ), red(5.91 km s − ), and violet (6.03 km s − ). See Table 2. standard deviation of the spectrum (excluding the cen-tral region where we expect to see signal) and dividingby this standard deviation (Loomis et al. 2018). Theunits of the impulse response are now σ , rather than aﬂux unit, and describe the SNR of the response. Thepeak response can therefore be thought of as a mini-mum detection signiﬁcance for the species. An exampleof this impulse response spectrum for HC N is shownin Fig. 4, where the peak detection signiﬁcance is nowalmost doubled, at 258.1 σ .With a better model and hence a better matching ﬁl-ter, the signiﬁcance of the detection could be improved,but it cannot be lower than the current peak response.We discuss this point in more detail later, along with anexploration of the eﬀects of spectroscopic catalog accu-racy on the recovered detection signiﬁcance.1.1.7. Upper limits

In cases where our matched ﬁltering analysis yields animpulse response with a signiﬁcance not large enough toclaim a detection (e.g. < σ ), we reﬁt the data using a modiﬁed MCMC process to yield more useful poste-riors on the column densities. Instead of letting all ofthe parameters described above run free, we instead ﬁxthe source sizes, velocities, and excitation temperatureto the values reported for a similar molecule, as wasdone for the priors described earlier (for example, HC Nfor linear species and benzonitrile for cyclic species).From the resultant posterior distributions, 95th per-centile conﬁdence interval values are reported as 2 σ up-per limit column densities. An example upper limit pos-terior is shown in Fig 10 for HC N, which we do not cur-rently detect above a 4 σ signiﬁcance in the GOTHAMdata.1.1.8. Broader applicability and limitations of method

Both the MCMC ﬁtting and stacking analysis pre-sented here are predicated on the assumption that signal(i.e. coherent information content) within the windoweddata being ﬁt or stacked is dominated by species of in-terest, rather than some red noise or contribution fromcompeting species. In the context of well-calibrated sin- − S N R ( σ ) HC N −

20 0 20Velocity (km/s)0100200300 I m pu l s e R e s p o n s e ( σ ) Peak Impulse Response: 277.3 σ Figure 4.

Left:

Velocity-stacked spectra of HC N in black, with the corresponding stack of the simulation using the best-ﬁtparameters to the individual lines in red. The data have been uniformly sampled to a resolution of 0.02 km s − . The intensityscale is the signal-to-noise ratio of the spectrum at any given velocity. Right:

Impulse response function of the stacked spectrumusing the simulated line proﬁle as a matched ﬁlter. The intensity scale is the signal-to-noise ratio of the response function whencentered at a given velocity. The peak of the impulse response function provides a minimum signiﬁcance for the detection of277.3 σ . gle dish spectra, this can be more simply stated as arequirement of line sparsity. Analysis of interferomet-ric data with this technique is possible, but beyond thescope of this paper. The degree of line sparsity nec-essary for a given analysis will be diﬀerent for eachspecies of interest. As discussed earlier, thresholdingdata is able to prevent the most egregious interlopinglines from contaminating an analysis, but low level lineconfusion would prevent successful stacking of the thou-sands of lines necessary to detect a species such as 2-cyanonaphthalene. In contrast, the several dozen linesof HC N would be more tolerant to a low level of lineconfusion (as each individual line of interest would bebrighter in comparison to the confusing lines).Of the handful of astronomical sources that haveyielded the vast majority of new interstellar moleculardetections, TMC-1 has by far the most sparse spectra.Application of our technique to other sources, such asSgr B2(N), IRAS 16293-2422, or Orion KL, is likely notas straightforward due to their higher line density. Amore fruitful approach may be to take inspiration fromother solutions to the analogous problems of detrendingand source separation, where advancements in Bayesianmethods such as probabilistic cataloging (Portillo et al.2017) hold promise for the bulk analysis of large datasets(Siemiginowska et al. 2019).Finally, thus far we have made the assumption thatthe spectroscopic catalogs used in our spectral simula-tor are a ﬁxed input, with no error. In reality, few largespecies of astronomical interest have precise laboratoryconstraints on their spectra, and several of our newlydetected species in GOTHAM required signiﬁcant re-ﬁnement via new laboratory spectroscopic investigation (McCarthy et al. 2020; McGuire et al. 2020a,b). To bet-ter understand the sensitivity of our stacking method tospectroscopic errors, we systematically introduced in-creasing amounts of Gaussian random noise to the rota-tional constants used to generate the catalogs for ben-zonitrile, propargyl cyanide (McGuire et al. 2020a), and2-cyanonaphthalene. A plot of the fractional level ofmodiﬁcation to the rotational constants versus the frac-tional peak ﬁlter response (normalized to the peak ﬁlterresponse for the nominal catalog) is shown in Figure 5.We ﬁnd that for all three species, a modiﬁcation of tenparts per million (ppm) is suﬃcient to eﬀectively nullifythe molecular detection. A relative precision of ∼ Detection of HC N HC N has a long and colorful history in radio as-tronomy. Three radio lines were ﬁrst reported towardIRC+10216 on the basis of a rotational constant de-rived by extrapolation from those measured experimen-

Figure 5.

Fractional modiﬁcation to rotational constantsplotted versus normalized matched ﬁlter response for threespecies – benzonitrile (blue), propargyl cyanide (red), and2-cyanonaphthalene (orange). tally for shorter members in this homologous series (Bellet al. 1982; Oka 1978). Any lingering doubt of the astro-nomical identiﬁcation appeared to be put to rest withthe observation of a fourth transition toward TMC-1 in1985 (Bell & Matthews 1985). The subsequent labora-tory detection of HC N (Travers et al. 1996), however,established that its rotational lines actually lie 0.13%lower in frequency (a shift equivalent to 13 linewidths inIRC+10216 and nearly 800 linewidths in TMC-1) rela-tive to those originally reported (Bell et al. 1982; Bell& Matthews 1985). The observed lines thus could notarise from HC N. Subsequently, two new astronomicallines were detected in TMC-1 with the NRAO 43 m radiotelescope (Bell et al. 1997), both in apparent agreementwith the laboratory rest frequencies. Albeit based onslender astronomical data, the detection of HC N inspace now appeared secure, or so it seemed. In 2016,an attempt was made to verify the detection of HC Nby analyzing archival observations towards TMC-1 withthe 100 m Green Bank Telescope (GBT; Loomis et al.2016). Even with substantially deeper integrations, noevidence was found for six consecutive transitions be-tween 12.9 and 14.6 GHz. The non-detection of HC Ntoward TMC-1 was further supported by observationsthat were unable to detect two higher frequency transi-tions in a sensitive observation in K-band with the GBT(Cordiner et al. 2017).The apparent absence of HC N in TMC-1 and cor-responding column density upper limit combined witha non-linear relationship of column density with chainlength for shorter cyanopolyynes (HC x N; Bujarrabalet al. 1981; Bell et al. 1997; Ohishi & Kaifu 1998) led Loomis et al. (2016) to hypothesize that cyclizationreactions may become important once a carbon chainreaches a critical size. If correct, formation of ring iso-mers could then directly compete with linear isomers via‘bottom-up’ pathways. The detections of benzonitrile( c -C H CN), the simplest aromatic nitrile (McGuireet al. 2018), and now individual PAHs (McGuire et al.2020b), in TMC-1 suggest that cyclic chemistry is farmore widespread at these earliest stages of star forma-tion than previously thought.With conﬁdence from the aforementioned tests thatour method is able to rigorously detect not only speciesthat show individual lines, but also those which sit belowthe visible noise, we turn back to the prior mysteriousnon-detection of HC N (Loomis et al. 2016). A similarstacking and MCMC analysis was undertaken,(Loomiset al. 2016) but with signiﬁcantly less data than waspresently available in the GOTHAM observations.Unsurprisingly, we ﬁnd that none of the brightestHC N lines are individually detected in our observa-tions (Fig. 6). Fitting for HC N using priors from ourHC N ﬁt, however, we ﬁnd column density posteriorsthat are consistent with a detection of HC N (Fig. 11and Table 1). We visualize the signiﬁcance of these pos-teriors through the same line stacking and matched ﬁlteranalysis. The line stack shown in Fig. 7 displays a ten-tative but encouraging 3.8 σ signal, and with a matchedﬁlter applied, the signal increases to a 5.0 σ detection(Fig. 7).The column density constraints from this analysisof HC N yield a total column density of 7 . +21 . − . × cm − . Three of the velocity components showwell constrained column densities, while the fourth com-ponent column density can be best viewed as an up-per limit. The total column density value is not di-rectly comparable, however, with the 2 σ upper limit of9 . × cm − from Loomis et al. (2016), as that anal-ysis did not constrain the HC N source size, insteadassuming a much larger ﬁxed source size of 6.0’ × N). As seen in Fig. 11, col-umn density is highly covariant with our derived sourcesize, and the largest contribution to the total HC N col-umn density comes from the fourth velocity componentwith a source size of ∼ (cid:48)(cid:48) . With the brightest HC Nlines originating in X-band, where the GBT beam sizeis ∼ ∼ N column density would be roughly1 . +3 . − . × cm − , entirely consistent with the upperlimit presented there of 9 . × cm − . − − )050 T A * ( m K ) Figure 6.

Individual line observations of HC N in the GOTHAM data. The spectra (black) are displayed in velocity spacerelative to 5.8 km s − , and using the rest frequency given in the top right of each panel. Quantum numbers are given in the topleft of each panel, neglecting hyperﬁne splitting. The best-ﬁt model to the data, including all velocity components, is overlaidin green. Simulated spectra of the individual velocity components are shown in: blue (5.63 km s − ), gold (5.79 km s − ), red(5.91 km s − ), and violet (6.03 km s − ). Discussion

Cyanopolyyne column densities

The prior analysis of relative cyanopolyyne columndensities synthesized both GBT observations reported inthat paper as well as previous literature values (Loomiset al. 2016). In all cases, an assumption was made thatemission ﬁlled the beam, and the individual velocitycomponents were not considered.These assumptions are reasonable for the smallercyanopolyynes – we ﬁnd that ‘co-spatial’ ﬁts to HC Nand HC N signiﬁcantly better replicate the observedline proﬁles. Although the more optically thin largercyanopolyyne species such as HC N and HC N are wellﬁt by a ‘separate components’ ﬁt, the varying sourcesizes in these ﬁts make it very diﬃcult to compare col-umn densities across the two diﬀerent ﬁtting methods.For the purposes of this comparison, we have there-fore additionally ﬁt all cyanopolyyne species with a ‘co-spatial’ method, with results presented in detail in theSupplementary Materials. The ‘separate components’and ‘co-spatial’ results for the larger species are verysimilar, as would be expected for optically thin species.Further discussion of the relative source sizes and distri- butions of the cyanopolyynes and the eﬀect on their ﬁtsis presented below.Using the column densities derived from the ﬁts pre-sented in the Supplementary Materials, an updated ver-sion of Figure 5 from Loomis et al. (2016) is shown inFig. 8, along with a comparison to predictions from achemical model, discussed in more detail in the Supple-mentary Materials. The general qualitative trend notedin that work is maintained, with a log-linear trend atsmaller sizes, and a sharp decline at HC N.1.3.2.

Spatial variations in cyanopolyyne chemistry

Previous spatially resolved observations of HC N,HC N, and HC N toward TMC-1 have shown themto be spatially extended on scales large enough to ﬁllthe GBT beam at the frequencies probed by GOTHAM(Toelle et al. 1981; Churchwell et al. 1978; Olano et al.1988). These observations were all taken at relativelycoarse spatial resolution, however, and the detailed dis-tribution of these species is unknown, as is the distribu-tion of larger cyanopolyynes such as HC N. In particu-lar, observations of cyanopolyynes at both high spectraland spatial resolution do not exist to date, making itdiﬃcult to spatially disentangle the four known velocitycomponents in TMC-1.0

Table 1. HC N best-ﬁt parameters from MCMC analysisComponent v lsr Size N † T T ex ∆ V (km s − ) ( (cid:48)(cid:48) ) (10 cm − ) (K) (km s − )C1 5 . +0 . − . +9 − . +0 . − . . +0 . − . . +0 . − . C2 5 . +0 . − . +7 − . +3 . − . C3 5 . +0 . − . +18 − . +0 . − . C4 6 . +0 . − . +10 − . +16 . − . N T (Total) †† . +21 . − . × cm − Note – The quoted uncertainties represent the 16 th and 84 th percentile (1 σ for a Gaussian distribution)uncertainties. † Column density values are highly covariant with the derived source sizes. The marginalized uncertaintieson the column densities are therefore dominated by the largely unconstrained nature of the source sizes,and not by the signal-to-noise of the observations. †† Uncertainties derived by adding the uncertaintiesof the individual components in quadrature.

Several pieces of evidence suggest that our two limit-ing sets of assumptions in the currently presented anal-ysis of cyanopolyynes are insuﬃcient, but also providesome hints at the true cyanopolyyne distribution. First,we note that ‘separate component’ ﬁts for HC N andHC N yield line proﬁles which poorly represent the data,while the ‘co-spatial’ ﬁts shown in the SupplementaryMaterials provide reasonable ﬁts to the observationalline proﬁles. This suggests that the velocity componentsare suﬃciently co-spatial that when source sizes arelarge, they overlap signiﬁcantly along the line of sight.Second, we ﬁnd that for both the ‘co-spatial’ and ‘sep-arate component’ ﬁts, the source size(s) decrease withcyanopolyyne size as previously noted (Bell et al. 1998),possibly suggesting spatially segregated chemical evolu-tion within the source. Finally, for more optically thinspecies such as HC N, ‘separate component’ ﬁts yieldwidely varying source sizes for the components. Thissuggests that the source components are not purely co-spatial, and likely have some scatter within the beam.Our beam dilution and source-size ﬁtting analysis islimited by both the sensitivity of our observations andthe assumption that each source is centrally locatedwithin the beam. It is possible that the larger specieshave a broader distribution that is not well probedby our observations, due to sensitivity limitations. Ifthe spatio-kinematic structure of the cyanopolyynes isshared by other species, it may be possible to use a sin-gle set of interferometric observations as a template tounlock the GOTHAM observations, enabling more com-plicated ﬁtting and thus better characterization of thetrue column density spatial distribution.In conclusion, we have presented a new method for ro-bustly characterizing and visualizing detections of new interstellar species in line sparse sources, even when in-dividual lines of the species are not detected. These re-sults of applying this method to the GOTHAM datasethave resulted in a total of six new interstellar specieshave been detected in TMC-1 (McGuire et al. 2020a,b;McCarthy et al. 2020; Burkhardt et al. 2020; Xue et al.2020). In particular, we have detected HC N in TMC-1and derived a column density consistent with the previ-ous upper limit presented in Loomis et al. (2016). CONTRIBUTIONSR.A.L. wrote the manuscript and developed theMCMC and spectral stacking analysis code describedhere. M.C.M. and K.L.K.L. performed the laboratoryexperiments and theoretical calculations for several ofthe catalogs used in this analysis, and helped revisethe manuscript. A.M.B and B.A.M. performed the as-tronomical observations and subsequent data reduction.E. H. determined and/or estimated rate coeﬃcients andis the originator for many of the chemical simulations.A.M.B. and C.N.S. contributed or undertook the astro-nomical modeling and simulations. E.R.W., M.A.C. andB.A.M. contributed to the design of the GOTHAM sur-vey, and helped revise the manuscript. C.X. modiﬁedand contributed the chemical networks of the relatedspecies, and helped revise the manuscript. C.X. andA.J.R. performed the astronomical observations. CODE STATEMENTAll the codes used in the MCMC ﬁttingand stacking analysis presented in this pa-per are open source and publicly available athttps://github.com/ryanaloomis/TMC1 mcmc ﬁtting. COMPETING INTERESTS STATEMENT1 − S N R ( σ ) HC N −

20 0 20Velocity (km/s)0510 I m pu l s e R e s p o n s e ( σ ) Peak Impulse Response: 5.0 σ Figure 7.

Left:

Velocity-stacked spectra of HC N in black,with the corresponding stack of the simulation using thebest-ﬁt parameters to the individual lines in red. The datahave been uniformly sampled to a resolution of 0.02 km s − .The intensity scale is the signal-to-noise ratio of the spectrumat any given velocity. Right:

Impulse response function ofthe stacked HC N spectrum using the simulated line proﬁleas a matched ﬁlter. The intensity scale is the signal-to-noiseratio of the response function when centered at a given ve-locity. The peak of the impulse response function provides aminimum signiﬁcance for the detection of 5.0 σ . The authors declare no competing interests. DATA STATEMENTThe datasets analyzed during the current studyare available in the Green Bank Telescope archive (https://archive.nrao.edu/archive/advquery.jsp),PI: B. McGuire. A user manual for theirreduction and analysis is available as well(https://greenbankobservatory.org/science/gbt-observers/visitor-facilities-policies/data-reduction-gbt-using-idl/). The complete, reduced survey data atX-band is available as Supplementary Information inMcGuire et al. (2020a). The individual portions ofreduced spectra used in the analysis of the individ-ual species presented here is available in the HarvardDataverse Archive GOTHAM Collaboration (2020). ACKNOWLEDGEMENTSThe authors thank the anonymous reviewer for theirhelpful comments. A.M.B. acknowledges support fromthe Smithsonian Institution as a Submillimeter Array(SMA) Fellow. M.C.M. and K.L.K. Lee acknowledgesupport from NSF grant AST-1615847 and NASA grant80NSSC18K0396. Support for B.A.M. during the initialportions of this work was provided by NASA throughHubble Fellowship grant

NAUTILUS v1.1 code. C.X.is a Grote Reber Fellow, and support for this workwas provided by the NSF through the Grote Reber Fel-lowship Program administered by Associated Universi-ties, Inc./National Radio Astronomy Observatory andthe Virginia Space Grant Consortium. E.H. thanks theNational Science Foundation for support through grantAST 1906489. S.B.C. and M.A.C. were supported bythe NASA Astrobiology Institute through the GoddardCenter for Astrobiology. The National Radio Astron-omy Observatory is a facility of the National ScienceFoundation operated under cooperative agreement byAssociated Universities, Inc. The Green Bank Observa-tory is a facility of the National Science Foundation op-erated under cooperative agreement by Associated Uni-versities, Inc.REFERENCES

Agndez, M., & Wakelam, V. 2013, Chemical Reviews, 113,8710. http://pubs.acs.org/doi/abs/10.1021/cr4001176Bell, M. B., Feldman, P. A., Kwok, S., & Matthews, H. E.1982, Nature, 295, 389Bell, M. B., Feldman, P. A., Travers, M. J., et al. 1997,ApJL, 483, L61 Bell, M. B., & Matthews, H. E. 1985, AstrophysicalJournals Letters, 291, L63Bell, M. B., Watson, J. K. G., Feldman, P. A., & Travers,M. J. 1998, ApJ, 508, 286Bujarrabal, V., Guelin, M., Morris, M., & Thaddeus, P.1981, A&A, 99, 239 Time (yr)10 -16 -14 -12 -10 -8 n ( x ) / n ( H ) Time (yr)10 -16 -14 -12 -10 -8 n ( x ) / n ( H ) N ( x ) c m - HC NHC NHC NHC NHC N Figure 8.

Left:

Cyanopolyyne total column densities from ﬁts using the ‘co-spatial’ approximation (see SupplementaryMaterials) are plotted against carbon-chain size, as in (Loomis et al. 2016). Points in blue are taken directly from our ﬁts,where the source size of each component was taken into account. Points in red have been adjusted back to the value which onewould calculate under the assumption that the source ﬁlls the beam. These red points are directly comparable to those shownin Figure 5 in Loomis et al. (2016).

Right:

Calculated abundances (solid lines), abundances from the co-spatial MCMC analysis(dotted lines), and best-ﬁt times (dots) for the cyanopolyynes HC n N, n ∈ [3 , , , , N and HC N are shown by the green and blue bars, respectively. Equivalent columndensities assuming N ( H ) = 10 cm − (Gratier et al. 2016; Fuente et al. 2019) are shown on the right axis. Note: error barsfor HC N - HC N are not visible at the scale used, but can be found in the tables in the Supplementary Materials.Burkhardt, A. M., Loomis, R. A., Shingledecker, C. N.,et al. 2020, Nature Astronomy, in revisionChurchwell, E., Winnewisser, G., & Walmsley, C. M. 1978,A&A, 67, 139Cordiner, M. A., Charnley, S. B., Kisiel, Z., McGuire, B. A.,& Kuan, Y.-J. 2017, The Astrophysical Journal, 850, 187Crabtree, K. N., Martin-Drumel, M.-A., Brown, G. G.,et al. 2016, The Journal of Chemical Physics, 144, 124201Czekala, I. 2016, DiskJockey: Protoplanetary disk modelingfor dynamical mass derivation, , , ascl:1603.011Dobashi, K., Shimoikura, T., Nakamura, F., et al. 2018,ApJ, 864, 82Dobashi, K., Shimoikura, T., Ochiai, T., et al. 2019, ApJ,879, 88Drouin, B. J. 2017, Journal of Molecular Spectroscopy, 340,1Foreman-Mackey, D., Hogg, D. W., Lang, D., & Goodman,J. 2013, PASP, 125, 306Fuente, A., Navarro, D. G., Caselli, P., et al. 2019, A&A,624, A105Gelman, A., & Rubin, D. B. 1992, Statistical Science, 7, 457GOTHAM Collaboration. 2020, Spectral Stacking Data forPhase 1 Science Release of GOTHAM, v4.0, HarvardDataverse, doi:10.7910/DVN/PG7BHO.https://doi.org/10.7910/DVN/PG7BHOGratier, P., Majumdar, L., Ohishi, M., et al. 2016, ApJS,225, 25Herbst, E. 1981, Nature, 289, 656 Hincelin, U., Wakelam, V., Hersant, F., et al. 2011, A&A,530, A61Jerosimi, S. V., Wester, R., & Gianturco, F. A. 2019,Physical Chemistry Chemical Physics, 21, 11405.https://pubs.rsc.org/en/content/articlelanding/2019/cp/c9cp00877bLangston, G., & Turner, B. 2007, ApJ, 658, 455Liu, S.-Y., Mehringer, D. M., & Snyder, L. E. 2001, ApJ,552, 654Loomis, R. A., ¨Oberg, K. I., Andrews, S. M., et al. 2018,AJ, 155, 182Loomis, R. A., Shingledecker, C. N., Langston, G., et al.2016, MNRAS, 463, 4175Loomis, R. A., ¨Oberg, K. I., Andrews, S. M., et al. 2020,ApJ, 893, 101Mangum, J. G., & Shirley, Y. L. 2015, PASP, 127, 266McCarthy, M. C., Lee, K. L. K., Loomis, R. A., et al. 2020,Nature Astronomy, accepted.McGuire, B. A. 2018, ApJS, 239, 17McGuire, B. A., Burkhardt, A. M., Kalenskii, S., et al.2018, Science, 359, 202McGuire, B. A., Burkhardt, A. M., Loomis, R. A., et al.2020a, Astrophysical Journal Letters, 900, L10.McGuire, B. A., Loomis, R. A., Burkhardt, A. M., et al.2020b, Science, in revision.North, D. O. 1963, Proceedings of the IEEE, 51, 1016Ohishi, M., & Kaifu, N. 1998, Faraday Discussions, 109, 205Oka, T. 1978, Journal of Molecular Spectroscopy, 72, 172 EXTENDED DATA

Figure 9.

Parameter covariances and marginalized posterior distributions for the HC N MCMC ﬁt. 16 th , 50 th , and 84 th conﬁdence intervals (corresponding to ± Table 2. HC N best-ﬁt parameters from MCMC analysisComponent v lsr Size N † T T ex ∆ V (km s − ) ( (cid:48)(cid:48) ) (10 cm − ) (K) (km s − )C1 5 . +0 . − . +3 − . +0 . − . . +0 . − . . +0 . − . C2 5 . +0 . − . +2 − . +1 . − . C3 5 . +0 . − . +7 − . +0 . − . C4 6 . +0 . − . +2 − . +1 . − . N T (Total) †† . +0 . − . × cm − Note – The quoted uncertainties represent the 16 th and 84 th percentile (1 σ for a Gaussian distribution)uncertainties. † Column density values are highly covariant with the derived source sizes. The marginalized uncertaintieson the column densities are therefore dominated by the largely unconstrained nature of the source sizes,and not by the signal-to-noise of the observations. †† Uncertainties derived by adding the uncertaintiesof the individual components in quadrature.

Figure 10.

Parameter covariances and marginalized posterior distributions for the HC N MCMC ﬁt. The 97.8 th conﬁdenceinterval (corresponding to 2 sigmas for a Gaussian posterior distribution) is shown as a vertical line. Figure 11.

Parameter covariances and marginalized posterior distributions for the HC N MCMC ﬁt. 16 th , 50 th , and 84 th conﬁdence intervals (corresponding to ± SUPPLEMENTARY INFORMATION8.1.

Demonstration of robustness

When stacking many lines across a wide bandwidth, there is a clear risk that signal from an interloping line ofa diﬀerent species may bleed into the analysis. Here, we demonstrate that in applying our technique to GOTHAMobservations of TMC-1, this risk can be mitigated, and our detections are robust.8.1.1.

Application to benzonitrile

The presence of benzonitrile in TMC-1 has been further veriﬁed by our GOTHAM observations since its initialdetection (McGuire et al. 2018). Multiple individual lines are detected in our data, and the parameters from ourMCMC ﬁt (Table 3 and Fig. 12) can be seen to well-replicate the observations (Fig. 13), motivating our use ofbenzonitrile as a template for analysis of all aromatic species in the GOTHAM data.

Table 3.

Benzonitrile best-ﬁt parameters from MCMC analysisComponent v lsr Size N † T T ex ∆ V (km s − ) ( (cid:48)(cid:48) ) (10 cm − ) (K) (km s − )C1 5 . +0 . − . +164 − . +0 . − . . +0 . − . . +0 . − . C2 5 . +0 . − . +20 − . +0 . − . C3 5 . +0 . − . +98 − . +0 . − . C4 6 . +0 . − . +101 − . +0 . − . N T (Total) †† . +0 . − . × cm − Note – The quoted uncertainties represent the 16 th and 84 th percentile (1 σ for a Gaussian distribution)uncertainties. † Column density values are highly covariant with the derived source sizes. The marginalized uncertaintieson the column densities are therefore dominated by the largely unconstrained nature of the source sizes,and not by the signal-to-noise of the observations. †† Uncertainties derived by adding the uncertaintiesof the individual components in quadrature.

These lines were then stacked and the model stack was applied to the data as a matched ﬁlter. The resultant stacksand matched ﬁlter response are shown in Fig. 14 and 15. The line detection signiﬁcance is dramatically improved,from ∼ σ in the individual lines, to 21.8 σ in the line stack, and then 39.0 σ in the matched ﬁlter response. Thevelocity structure shown in the individual lines in Fig. 13 is also more clearly seen in the stack in Fig. 14.8.1.2. Application to 2-cyanonaphthalene

Beyond the example shown for benzonitrile, where individual lines are detected above the noise, it is also possibleto apply this method to species where individual lines are not detected. The partition function of 2-cyanonaphthaleneis such that, even at the low temperatures of TMC-1 ( ∼

10 K), thousands of lines within the GOTHAM frequencycoverage have similar emission intensities, and are all quite weak. As shown in Fig. 16, none of the 36 strongest lines(based on the initial simulation parameters described in the main text) are detected in the GOTHAM data. Despitethis, a global MCMC analysis using priors from our benzonitrile ﬁt (Table 3) is able to provide useful constraints, andposterior distributions are shown in Fig. 17. The weighted stack of all observed lines is shown in Fig. 18, with a cleardetection now present at 6.9 σ , including visible velocity structure. Applying the model stack as a matched ﬁlter yieldsa 15.4 σ detection signiﬁcance (Fig. 19). Further discussion and analysis of this molecular detection is presented inMcGuire et al. (2020b). 8.1.3. Robustness to false positives

A molecular detection method must not only detect signal that is present in data, but it must also not yield falsepositives. Stacking techniques in particular are susceptible to ‘bleed-in’ from interloping lines when line density is toohigh. In our technique, we ﬁrst take a thresholding step when searching for a weak species (where no individual lines8

Figure 12.

Parameter covariances and marginalized posterior distributions for the benzonitrile MCMC ﬁt. 16 th , 50 th , and84 th conﬁdence intervals (corresponding to ± are expected to be present above the noise), discarding any windows that contain signal above 6 σ , as these windowslikely contain an interloping line.Although this method is likely to remove the most egregious interloping lines, there very well may be many linessitting below the noise that could cause false positives. Indeed, we know that there is signal below the visible noiselevel, as there are thousands of weak lines that we intend to stack for species such as 2-cyanonaphthalene. Here, it isthe extremely narrow linewidth of TMC-1 and the corresponding line sparsity discussed earlier that reduce the risk9 Figure 13.

Individual line detections of benzonitrile in the GOTHAM data. The spectra (black) are displayed in velocityspace relative to 5.8 km s − , and using the rest frequency given in the top right of each panel. Quantum numbers are given inthe top left of each panel, neglecting hyperﬁne splitting. The best-ﬁt model to the data, including all velocity components, isoverlaid in green. Simulated spectra of the individual velocity components are shown in: blue (5.63 km s − ), gold (5.79 km s − ),red (5.91 km s − ), and violet (6.03 km s − ). See Table 3. − S N R ( σ ) C H CN Figure 14.

Velocity-stacked spectra of benzonitrile in black, with the corresponding stack of the simulation using the best-ﬁtparameters to the individual lines in red. The data have been uniformly sampled to a resolution of 0.02 km s − . The intensityscale is the signal-to-noise ratio of the spectrum at any given velocity. of collisions. This is demonstrated by how sensitive our ﬁlter responses are to even minuscule ( ∼ ppm) changes torotational constants.A common method to further illustrate that we are uncovering emission from the species of interested, and not froman unexpected contaminating line or set of lines, is to jack-knife the data. For a species such as 2-cyanonaphthalene,there are thousands of lines of similar strength, and by splitting the set of analyzed lines in half we can examinewhether an outsized portion of the signal is coming from one half of the data. If the signal is real and our noise is0 −

20 0 20Velocity (km/s)02040 I m pu l s e R e s p o n s e ( σ ) Peak Impulse Response: 39.0 σ Figure 15.

Impulse response function of the stacked benzonitrile spectrum using the simulated line proﬁle as a matched ﬁlter.The intensity scale is the signal-to-noise ratio of the response function when centered at a given velocity. The peak of theimpulse response function provides a minimum signiﬁcance for the detection of 39.0 σ . − − )050 T A * ( m K ) Figure 16.

Individual line observations of 2-cyanonaphthalene in the GOTHAM data. The spectra (black) are displayed invelocity space relative to 5.8 km s − , and using the rest frequency given in the top right of each panel. Quantum numbersare given in the top left of each panel, neglecting hyperﬁne splitting. The best-ﬁt model to the data, including all velocitycomponents, is overlaid in green. Simulated spectra of the individual velocity components are shown in: blue (5.63 km s − ),gold (5.79 km s − ), red (5.91 km s − ), and violet (6.03 km s − ). mainly white, we would expect each half of the data to have roughly √ × the ﬁlter response of the full dataset. In1 Figure 17.

Parameter covariances and marginalized posterior distributions for the 2-cyanonaphthalene MCMC ﬁt. 16 th , 50 th ,and 84 th conﬁdence intervals (corresponding to ± contrast, if an interloping line was dominating the response, we would expect the two halves to have very unbalancedresponses, with one consistent with pure noise and the other close to the original ﬁlter response.A jack-kniﬁng test on 2-cyanonaphthalene is shown in Figure 20. The set of windowed 2-cyanonaphthalene lineswere split by taking every other line (treating sets of hyperﬁne components as a single line for this purpose). We ﬁndthat the signal is quite evenly split over the two datasets, illustrating that the ﬁlter response is not a false positive,and does not arise from interloping lines.8.2. Co-spatial cyanopolyyne ﬁtting results − S N R ( σ ) Figure 18.

Velocity-stacked spectra of 2-cyanonaphthalene in black, with the corresponding stack of the simulation using thebest-ﬁt parameters to the individual lines in red. The data have been uniformly sampled to a resolution of 0.02 km s − . Theintensity scale is the signal-to-noise ratio of the spectrum at any given velocity. −

20 0 20Velocity (km/s)01020 I m pu l s e R e s p o n s e ( σ ) Peak Impulse Response: 15.4 σ Figure 19.

Impulse response function of the stacked 2-cyanonaphthalene spectrum using the simulated line proﬁle as a matchedﬁlter. The intensity scale is the signal-to-noise ratio of the response function when centered at a given velocity. The peak of theimpulse response function provides a minimum signiﬁcance for the detection of 15.4 σ . HC N The best-ﬁt parameters for the MCMC analysis of HC N, under the ‘co-spatial’ approximation, are given in Table 4.The individual detected lines are shown in Figure 21, while the stacked spectrum and matched ﬁlter results are shownin Figure 22. A corner plot of the parameter covariances for the HC N MCMC ﬁt is shown in Figure 23.3 − S N R ( σ ) − S N R ( σ ) −

20 0 20Velocity (km/s)0510 I m pu l s e R e s p o n s e ( σ ) Peak Impulse Response: 10.1 σ −

20 0 20Velocity (km/s)0510 I m pu l s e R e s p o n s e ( σ ) Peak Impulse Response: 9.7 σ Figure 20.

Upper left:

Velocity-stacked spectra of 2-cyanonaphthalene in black, with the corresponding stack of the simulationusing the best-ﬁt parameters to the individual lines in red. Every other 2-cyanonaphthalene was used for this stack (split 1),and the data have been uniformly sampled to a resolution of 0.02 km s − . The intensity scale is the signal-to-noise ratio ofthe spectrum at any given velocity. Upper right:

Same as left, but for the other half of lines (split 2).

Lower left:

Impulseresponse function of the stacked spectrum (split 1) using the simulated line proﬁle as a matched ﬁlter. The intensity scale isthe signal-to-noise ratio of the response function when centered at a given velocity. The peak of the impulse response functionprovides a minimum signiﬁcance for the detection of 10.3 σ . Lower right: same as lower left, but for the other half of the lines(split 2). The peak of the impulse response function provides a minimum signiﬁcance for the detection of 9.8 σ . Table 4. HC N ‘co-spatial’ best-ﬁt parameters from MCMC analysisComponent v lsr Size N † T T ex ∆ V (km s − ) ( (cid:48)(cid:48) ) (10 cm − ) (K) (km s − )C1 5 . +0 . − . +13 − . +0 . − . . +0 . − . . +0 . − . C2 5 . +0 . − . . +0 . − . C3 5 . +0 . − . . +0 . − . C4 6 . +0 . − . . +0 . − . N T (Total) †† . +0 . − . × cm − Note – The quoted uncertainties represent the 16 th and 84 th percentile (1 σ for a Gaussian distribution)uncertainties. † Column density values are highly covariant with the derived source sizes. The marginalized uncertaintieson the column densities are therefore dominated by the largely unconstrained nature of the source sizes,and not by the signal-to-noise of the observations. †† Uncertainties derived by adding the uncertainties of the individual components in quadrature. − . . . − )02 T A * ( K ) − − − − . . . − )0 . . . T A * ( K ) − ∗ − ∗ − Figure 21.

20 0 20Velocity (km/s)02000 I m pu l s e R e s p o n s e ( σ ) Peak Impulse Response: 2940.2 σ Figure 22.

Left:

Parameter covariances and marginalized posterior distributions for the ‘co-spatial’ HC N MCMC ﬁt. 16 th , 50 th ,and 84 th conﬁdence intervals (corresponding to ± HC N The best-ﬁt parameters for the MCMC analysis of HC N, under the ‘co-spatial’ approximation, are given in Table 5.The individual detected lines are shown in Figure 24, while the stacked spectrum and matched ﬁlter results are shownin Figure 25. A corner plot of the parameter covariances for the HC N MCMC ﬁt is shown in Figure 26.

Table 5. HC N ‘co-spatial’ best-ﬁt parameters from MCMC analysisComponent v lsr Size N † T T ex ∆ V (km s − ) ( (cid:48)(cid:48) ) (10 cm − ) (K) (km s − )C1 5 . +0 . − . +8 − . +0 . − . . +0 . − . . +0 . − . C2 5 . +0 . − . . +0 . − . C3 5 . +0 . − . . +0 . − . C4 6 . +0 . − . . +0 . − . N T (Total) †† . +0 . − . × cm − Note – The quoted uncertainties represent the 16 th and 84 th percentile (1 σ for a Gaussian distribution)uncertainties. † Column density values are highly covariant with the derived source sizes. The marginalized uncertaintieson the column densities are therefore dominated by the largely unconstrained nature of the source sizes,and not by the signal-to-noise of the observations. †† Uncertainties derived by adding the uncertainties of the individual components in quadrature. − − )0100 T A * ( m K ) − − − )0200400 T A * ( m K ) − − − )05001000 T A * ( m K ) − − − − )050100 T A * ( m K ) − − − )050100 T A * ( m K ) − − − )0 . . . T A * ( K ) − − − )012 T A * ( K ) − − − − )050100 T A * ( m K ) − − − )0 . . . T A * ( K ) − − − )0 . . . T A * ( K ) − − − )0 . . . T A * ( K ) − − − )0 . . . T A * ( K ) −

10 29289.2 MHz

Figure 24.

100 0 100Velocity (km/s)0100020003000 I m pu l s e R e s p o n s e ( σ ) Peak Impulse Response: 2733.1 σ Figure 25.

Left:

Parameter covariances and marginalized posterior distributions for the ‘co-spatial’ HC N MCMC ﬁt. 16 th , 50 th ,and 84 th conﬁdence intervals (corresponding to ± HC N The best-ﬁt parameters for the MCMC analysis of HC N, under the ‘co-spatial’ approximation, are given in Table 6.The individual detected lines are shown in Figure 27, while the stacked spectrum and matched ﬁlter results are shownin Figure 28. A corner plot of the parameter covariances for the HC N MCMC ﬁt is shown in Figure 29.

Table 6. HC N ‘co-spatial’ best-ﬁt parameters from MCMC analysisComponent v lsr Size N † T T ex ∆ V (km s − ) ( (cid:48)(cid:48) ) (10 cm − ) (K) (km s − )C1 5 . +0 . − . +2 − . +0 . − . . +0 . − . . +0 . − . C2 5 . +0 . − . . +0 . − . C3 5 . +0 . − . . +0 . − . C4 6 . +0 . − . . +0 . − . N T (Total) †† . +0 . − . × cm − Note – The quoted uncertainties represent the 16 th and 84 th percentile (1 σ for a Gaussian distribution)uncertainties. † Column density values are highly covariant with the derived source sizes. The marginalized uncertaintieson the column densities are therefore dominated by the largely unconstrained nature of the source sizes,and not by the signal-to-noise of the observations. †† Uncertainties derived by adding the uncertainties of the individual components in quadrature. − − − −

16 19176.0 MHz18 −

17 20304.0 MHz 20 −

19 22559.9 MHz 21 −

20 23687.9 MHz 22 −

21 24815.9 MHz − − )012 T A * ( K ) −

22 25943.9 MHz 24 −

23 27071.8 MHz 25 −

24 28199.8 MHz 26 −

25 29327.8 MHz

Figure 27.

20 0 20Velocity (km/s)010002000 I m pu l s e R e s p o n s e ( σ ) Peak Impulse Response: 2059.4 σ Figure 28.

Left:

Parameter covariances and marginalized posterior distributions for the ‘co-spatial’ HC N MCMC ﬁt. 16 th , 50 th ,and 84 th conﬁdence intervals (corresponding to ± HC N The best-ﬁt parameters for the MCMC analysis of HC N, under the ‘co-spatial’ approximation, are given in Table 7.The individual detected lines are shown in Figure 30, while the stacked spectrum and matched ﬁlter results are shownin Figure 31. A corner plot of the parameter covariances for the HC N MCMC ﬁt is shown in Figure 32.

Table 7. HC N ‘co-spatial’ best-ﬁt parameters from MCMC analysisComponent v lsr Size N † T T ex ∆ V (km s − ) ( (cid:48)(cid:48) ) (10 cm − ) (K) (km s − )C1 5 . +0 . − . +3 − . +0 . − . . +0 . − . . +0 . − . C2 5 . +0 . − . . +1 . − . C3 5 . +0 . − . . +0 . − . C4 6 . +0 . − . . +0 . − . N T (Total) †† . +0 . − . × cm − Note – The quoted uncertainties represent the 16 th and 84 th percentile (1 σ for a Gaussian distribution)uncertainties. † Column density values are highly covariant with the derived source sizes. The marginalized uncertaintieson the column densities are therefore dominated by the largely unconstrained nature of the source sizes,and not by the signal-to-noise of the observations. †† Uncertainties derived by adding the uncertainties of the individual components in quadrature. −

13 8134.5 MHz 15 −

14 8715.5 MHz 16 −

15 9296.6 MHz 17 −

16 9877.6 MHz 18 −

17 10458.6 MHz 19 −

18 11039.7 MHz33 −

32 19174.1 MHz 35 −

34 20336.1 MHz 38 −

37 22079.2 MHz 39 −

38 22660.2 MHz 40 −

39 23241.2 MHz 41 −

40 23822.3 MHz42 −

41 24403.3 MHz 43 −

42 24984.3 MHz 44 −

43 25565.3 MHz 45 −

44 26146.3 MHz 46 −

45 26727.4 MHz 47 −

46 27308.4 MHz − − )0200 T A * ( m K ) −

47 27889.4 MHz 49 −

48 28470.4 MHz 50 −

49 29051.4 MHz 51 −

50 29632.4 MHz

Figure 30.

20 0 20Velocity (km/s)0100200300 I m pu l s e R e s p o n s e ( σ ) Peak Impulse Response: 269.1 σ Figure 31.

Left:

Parameter covariances and marginalized posterior distributions for the ‘co-spatial’ HC N MCMC ﬁt. 16 th , 50 th ,and 84 th conﬁdence intervals (corresponding to ± HC N The best-ﬁt parameters for the MCMC analysis of HC N, under the ‘co-spatial’ approximation, are given in Table 8.The individual observed lines are shown in Figure 33, while the stacked spectrum and matched ﬁlter results are shownin Figure 34. A corner plot of the parameter covariances for the HC N MCMC ﬁt is shown in Figure 35.

Table 8. HC N ‘co-spatial’ best-ﬁt parameters from MCMC analysisComponent v lsr Size N † T T ex ∆ V (km s − ) ( (cid:48)(cid:48) ) (10 cm − ) (K) (km s − )C1 5 . +0 . − . +4 − . +2 . − . . +0 . − . . +0 . − . C2 5 . +0 . − . . +2 . − . C3 5 . +0 . − . . +2 . − . C4 6 . +0 . − . . +2 . − . N T (Total) †† . +0 . − . × cm − Note – The quoted uncertainties represent the 16 th and 84 th percentile (1 σ for a Gaussian distribution)uncertainties. † Column density values are highly covariant with the derived source sizes. The marginalized uncertaintieson the column densities are therefore dominated by the largely unconstrained nature of the source sizes,and not by the signal-to-noise of the observations. †† Uncertainties derived by adding the uncertainties of the individual components in quadrature.

Chemical modeling of cyanopolyynes in TMC-1

Astrochemical models were run in order to better understand how similar the chemistry of HC N is to the smallercyanopolyynes. The

NAUTILUS -v1.1 code (Ruaud et al. 2016) was used along with a version of the KIDA 2014 network(Wakelam et al. 2015), modiﬁed as in Shingledecker et al. (2018); Xue et al. (2020) and including the HC N reactionspreviously described in Loomis et al. (2016), and available as supporting material for that work. The HC N reactionsadded in that work were based on those of HC N included in the KIDA 2012 network (Wakelam et al. 2012). Forexample, consider the following reaction: C H + CN −−→ H + HC N (3)in the network of Loomis et al. (2016), the corresponding formation route is included for HC N:C H + CN −−→ H + HC N (4)where the above reaction utilizes the same rate coeﬃcient parameters and branching fractions as in reaction (3).As is currently the case with, e.g. HC N, the role of grain-chemistry for HC N is extremely limited. Adsorptiononto grains serves mainly as a destruction pathway for the gas-phase molecule, after which it can be photodissociatedvia secondary photons into C H and C N with the same rate coeﬃcient as the analogous HC N process. There is oneformation route for HC N involving the barrierless association of H and C N that is of negligible importance on theoverall gas-phase abundances.For physical conditions, typical TMC-1 values were used, including T gas = T dust = 10 K, a gas density of 10 cm − ,and a standard cosmic ray ionization rate of 1 . × − s − . Initial elemental abundances were, with the exception ofoxygen, taken from Hincelin et al. (2011). To obtain our initial oxygen abundances, we ran simulations using a range ofvalues using x (O) ∈ [1 . × − , . × − ]. From this analysis, we found that the best agreement between calculatedcyanopolyyne abundances and our observational results was obtained using x (O) t =0 ≈ . × − , which implies aslightly carbon rich C/O ∼ .

1. In Fig. 36, model results are shown in which an initial C/O=0.7 was used, a moretypical value in simulations of TMC-1 (Hincelin et al. 2011). From a comparison of that ﬁgure with Fig. 8, one can seethat the resulting agreement is signiﬁcantly poorer. A detailed comparison between observations and our calculated6abundances, as in Agndez & Wakelam (2013), using our carbon-rich C/O=1.1 is beyond the scope of this work andwill be explored in more detail in a subsequent study. Nevertheless, a comparison between our observationally derived − − )050 T A * ( m K ) Figure 33.

20 0 20Velocity (km/s)0510 I m pu l s e R e s p o n s e ( σ ) Peak Impulse Response: 5.0 σ Figure 34.

Left:

Parameter covariances and marginalized posterior distributions for the ‘co-spatial’ HC N MCMC ﬁt. 16 th , 50 th ,and 84 th conﬁdence intervals (corresponding to ± abundance for propargyl cyanide (HCCCH CN), another species whose ﬁrst detection we report in a companion work(McGuire et al. 2020a), and calculated abundances using the same model were found to be in excellent agreement.The results of our simulations are shown in Fig. 8. There, one can see that the calculated abundances of all species,with the exception of HC N, match the derived co-spatial MCMC results to within a factor of a few at a modeltime of ∼ × yr. In our network, the chemistry of HC N largely follows that of the smaller cyanopolyynes,8 Time (yr)10 -16 -14 -12 -10 -8 n ( x ) / n ( H ) Time (yr)10 -16 -14 -12 -10 -8 n ( x ) / n ( H ) N ( x ) c m - HC NHC NHC NHC NHC N Figure 36.

Calculated abundances (solid lines), abundances from the ‘co-spatial’ MCMC analysis (dotted lines), and best-ﬁt times (dots) for the cyanopolyynes HC n N, n ∈ [3 , , , ,

11] using C/O=1.1 (solid curves) and C/O=0.7 (dashed curves).Abundance ranges from the ‘separate components’ MCMC analysis for HC N and HC N are shown by the green and blue bars,respectively. Equivalent column densities assuming N (H ) = 10 cm − are shown on the right axis. where the dominant formation routes for species of the general formula HC n N ( n ∈ [3 , , , , C n N + + e − → H + HC n N (5)H C n N + + e − → H + HC n N (6)and the neutral-neutral reaction H C n − + CN → H + HC n N . (7)Here, we ﬁnd the dissociative recombination routes to dominate at times before ∼ yr, with the neutral-neutralroute becoming the major production pathway thereafter. At all simulation times, destruction occurs mainly viareaction with carbon atoms or ions such as C + , H +3 , and HCO + . We note that the sharp decrease in all cyanopolyyneabundances observable in Fig. 8 around ∼ N abundances are within the errors of our ‘separate components’ MCMC ﬁt results, amore self-consistent comparison with the co-spatial MCMC abundances reveals that HC N is still overproduced rela-tive to the smaller cyanopolyynes. This ﬁnding suggests that the assumptions we made previously in constructing ourHC N network (Loomis et al. 2016), i.e. that its chemistry largely follows that of HC N and the other cyanopolyynes,is somehow ﬂawed. One possibility is that the proposed reactions (5), (6), and (7), in addition to producing HC N fromH C N + , H C N + , and H C , respectively, may have other eﬃcient product channels not included in our network,including perhaps the production of cyclic molecules. Additionally, recent studies have highlighted the importanceof destruction pathways in understanding the abundances of interstellar molecules (Shingledecker et al. 2019, 2020).For example, it was found by Shingledecker et al. (2019) that propadienone exhibited a unique reactivity with atomichydrogen. If HC N can similarly react eﬃciently with H, this could help explain the sharp drop in its abundancerelative to the other cyanopolyynes. Another possibility suggested initially by Herbst (1981) and later conﬁrmed byJerosimi et al. (2019) is that the longer the cyanopolyyne, the more stable the anion formed via electron association.The products of these associations, such as HC N – , lose their linear structures and are much more reactive thantheir neutral counterparts. Either of these hypothetical destruction pathways might serve as a chemical link betweenlinear carbon chain species and the cyclic aromatic molecules also observed in TMC-1 (McGuire et al. 2018; McGuireet al. 2020b,b), providing an explanation for our chemical model’s overproduction of HC N and underproduction ofthe newly detected aromatic species (Burkhardt et al. 2020; McCarthy et al. 2020; McGuire et al. 2020b).98.4.

Lines Used in MCMC Fits

Table 9 shows the total number of transitions (including hyperﬁne components) of the molecules analyzed or discussedin this paper that were covered by GOTHAM observations at the time of analysis and were above our predicted ﬂuxthreshold of 5%, as discussed earlier. Also included are the number of transitions, if any, that were coincident withinterfering transitions of other species, and the total number of lines used after excluding interlopers. Observationaldata windowed around these transitions, spectroscopic properties of each transition, and the partition function usedin the MCMC analysis are provided in the Harvard Dataverse repository (GOTHAM Collaboration 2020).

Table 9.

Total number of transitions of a given species within the range of the GOTHAM data, number of interfering lines,and total number included in MCMC ﬁt. Transitions Covered Interfering Lines Total TransitionsMolecule By GOTHAM In Data Used in MCMCHC N 6 0 6HC N 16 0 16HC N 36 0 36HC N 66 0 66HC N 19 0 19HC13