SWARM: A 32 GHz Correlator and VLBI Beamformer for the Submillimeter Array
Rurik A. Primiani, Kenneth H. Young, André Young, Nimesh Patel, Robert W. Wilson, Laura Vertatschitsch, Billie B. Chitwood, Ranjani Srinivasan, David MacMahon, Jonathan Weintroub
November 9, 2016
Rurik A. Primiani†, Kenneth H. Young†, André Young†, Nimesh Patel†, Robert W. Wilson†, Laura Vertatschitsch§, Billie B. Chitwood♭, Ranjani Srinivasan¶, David MacMahon‡ and Jonathan Weintroub†

† Harvard-Smithsonian Center for Astrophysics, Cambridge, MA 02138, USA, [email protected]
‡ University of California Berkeley, Berkeley, CA 94720, USA
§ Systems & Technology Research, Woburn, MA 01801, USA
♭ Smithsonian Astrophysical Observatory, Submillimeter Array, Hilo, HI 96720, USA
¶ Academia Sinica Institute of Astronomy and Astrophysics, Submillimeter Array, Hilo, HI 96720, USA
Received (to be inserted by publisher); Revised (to be inserted by publisher); Accepted (to be inserted by publisher)

A 32 GHz bandwidth VLBI-capable correlator and phased array has been designed and deployed at the Smithsonian Astrophysical Observatory's Submillimeter Array (SMA). The SMA Wideband Astronomical ROACH2 Machine (SWARM) integrates two instruments: a correlator with 140 kHz spectral resolution across its full 32 GHz band, used for connected interferometric observations, and a phased array summer used when the SMA participates as a station in the Event Horizon Telescope (EHT) Very Long Baseline Interferometry (VLBI) array. For each SWARM quadrant, Reconfigurable Open Architecture Computing Hardware (ROACH2) units shared under open source from the Collaboration for Astronomy Signal Processing and Electronics Research (CASPER) are equipped with a pair of ultra-fast Analog-to-Digital Converters (ADCs), a Field Programmable Gate Array (FPGA) processor, and eight 10 Gigabit Ethernet ports. A VLBI data recorder interface designated the SWARM Digital Back End, or SDBE, is implemented with a ninth ROACH2 per quadrant, feeding four Mark6 VLBI recorders with an aggregate recording rate of 64 Gbps. This paper describes the design and implementation of SWARM, as well as its deployment at SMA with reference to verification and science data.

Keywords: radio astronomy, correlator, phased array, submillimeter
1. Introduction
The Submillimeter Array (SMA) is an eight-element radio interferometer located atop Mauna Kea in Hawai'i (Ho et al., 2004). Eight six-meter dishes may be arranged into configurations with baselines as long as 509 m, producing a synthesized beam of sub-arcsecond width at 345 GHz.

The SMA is expanding the bandwidth of its receiver sets to 8 GHz in each sideband. Two receivers can be operated simultaneously. Based on nominal center frequencies, the receivers are designated as 200, 240, 345, and 400. The 200 and 240 are in opposite polarizations and can be tuned to overlap or to different bands; the same applies to the 345 and 400.

To support the upgraded receivers a new wideband, high spectral resolution correlator was needed. Scientific requirements called for 8 GHz bandwidth per sideband per polarization, 32 GHz bandwidth total, commensurate with the SMA's new receiver sets. High uniform spectral resolution of ∼140 kHz or finer over the entire band was specified to support fast spectral line surveys. Additionally, a phased array and VLBI data recorder were required to support Event Horizon Telescope (EHT) observations (Johnson et al., 2015). The full set of scientific requirements is shown in Table 1.

To meet the required specifications the SMA Wideband Astronomical ROACH2 Machine (SWARM) was envisioned, designed, and deployed. One quadrant of SWARM has 16 inputs, two receivers per SMA antenna, and can be configured to produce full Stokes polarization data over a 2 GHz usable band or a single Stokes polarization over a 4 GHz usable band. Thus, as a dual-sideband system, four quadrants of SWARM will provide a total of 32 GHz of bandwidth on the sky. (At the time of writing three of four identical SWARM quadrants have been deployed, supporting 24 GHz bandwidth. The full four-quadrant 32 GHz bandwidth system is expected to be completed by December 2016.) The system takes advantage of open source technology shared by the Collaboration for Astronomy Signal Processing and Electronics Research (CASPER) as well as a five Giga-Sample-per-second (GSps) Analog-to-Digital Converter (ADC) board designed by the Academia Sinica Institute of Astronomy and Astrophysics (ASIAA) (Jiang et al., 2014).

Table 1. SWARM's top level scientific requirements.

Feature | Specification | Remarks
Number of antennas | 8 | dual frequency or dual polarization
IF bandwidth per quadrant | 4 GHz | 2 GHz per pol. per sideband
Total sky bandwidth | 32 GHz | 8 GHz per sideband per pol.
Simultaneous receivers | 2 | dual freq. or dual pol., 230 & 345 GHz
Correlations | 128 | Full Stokes

The Digital Signal Processing (DSP) platform chosen for SWARM is the second generation Reconfigurable Open Architecture Computing Hardware (ROACH2). Each of two channels per ROACH2 samples a baseband IF from a custom Block Down-Converter (BDC). The ADCs are clocked at 4.576 GSps, thus sampling a 2.288 GHz Nyquist band corresponding to 2.000 GHz usable IF bandwidth per input (with excised guard-band). Each ROACH2 hosts one Xilinx Virtex-6 Field Programmable Gate Array (FPGA) chip which, when configured with the SWARM gateware, hosts two 32768-point channelizers and a variety of other functions including fringe tracking, de-Walshing, a full Stokes correlator, a beamformer, and packetized communication logic (for a detailed description of the gateware see Section 3).

Although this paper is primarily about the new SMA correlator, SWARM, it is important to consider the context of the upgrade and the improvements that SWARM provides. SWARM will replace the recently-coined ASIC (Application Specific Integrated Circuit) correlator, previously called simply the "correlator", which unsurprisingly was built out of ASICs.
An abridged list of SWARM benefits over the ASIC follows:

• Higher uniform spectral resolution
• No trade-off between bandwidth and high spectral resolution
• Large 2 GHz usable blocks are easier to passband calibrate, and result in superior spectra
• Built-in VLBI phased array processor and data storage, with 16× the present VLBI bandwidth
• Better SNR due to more processed bits (∼12% in principle)
• Smaller size and lower power consumption
• Use of commodity components

As the third and fourth quadrants of SWARM are installed and commissioned, more of the ASIC correlator is removed. Eventually, by the end of 2016, only SWARM will remain. This phased upgrade from the old to the new correlator is intentional and permits a smooth transition for the SMA, which must remain constantly in operation as an active facility instrument.
2. System design
The enormous computational requirements of SWARM demand a highly parallel signal processing engine. We selected the FPGA as the most appropriate technology. In particular, CASPER hardware and libraries, along with the FX correlator architecture, were qualified as the most viable design model.

CASPER's focus is on processing baselines for the very large numbers of stations common in modern low frequency radio arrays. The eight-antenna SMA is a modest size, but the extremely wide bandwidth presented an unexplored space within CASPER, and presents particular challenges. Early in the SWARM project we analyzed the resource requirements for the channelizers, which dominate the computational expense in the wideband FX architecture, and determined that the ROACH2, and in particular the Xilinx Virtex-6 SX475T FPGA, would accommodate the SWARM gateware (see Section 4.1). (For more information on CASPER, visit http://casper.berkeley.edu)

Fig. 1. Block diagram showing at the top level a quadrant of SWARM, on the right of the dotted line, in the context of legacy SMA systems on the left. There are eight ROACH2s on the left of the 10 GbE crossbar switch, which contain F- and X-engines, as well as coarse and fine delay tracking, phase control and deWalshing, a phased array summer, visibility accumulator, network logic, and assorted transposes and other memory. On the right hand side of the switch is shown the "SDBE" and Mark6 data recorder, both required for EHT VLBI.
2.1. The CASPER packetized correlator
CASPER pioneered the use of a commercial Ethernet switch as the interconnection fabric (Parsons et al., 2008). Data is packetized prior to transmission via the Ethernet switch "crossbar" from F-engine to X-engine, to VLBI recorders, etc. Figure 1 shows the architecture of a single SWARM quadrant at the top level, with the right hand side of the drawing showing the basic CASPER concept of processing engines organized around a 10 Gigabit Ethernet (GbE) switch. The usual CASPER architecture places F-engines and X-engines (described in Sections 3.2 and 3.8, respectively) on opposite sides of the switch, but in SWARM the F- and X-engines are folded back on one another, reducing the required number of ROACH2s by roughly a factor of two, with almost the same reduction in the required number of switch ports.
2.2. Digital sampling
As previously stated, we selected a CASPER-compatible 5 GSps 8-bit ADC developed by our SMA partner, ASIAA (Jiang et al., 2014), to process data in 2 GHz usable bandwidth blocks. The ASIAA ADC uses an integrated circuit ADC, the EV8AQ160 from e2v. This is a so-called quad core device, using four 1.25 GSps ADC cores interleaved to achieve the 5 GSps design rate. The device provides register controls to align the cores to reduce the impact of spurs which arise due to mis-alignment in offset, gain, phase (OGP), or threshold Integral Non-Linearity (INL). Top level specifications of the e2v are listed here (from the EV8AQ160 data sheet):

• Quad ADC with 8-bit resolution
• 5 GSps sampling rate with the four cores interleaved
• Digital interface (SPI) to set OGP and INL for individual cores
• Full power input bandwidth up to 2 GHz
• 500 mV peak-to-peak analog input
• SNR = 44 dB, ENOB = 7.1 bits at 620 MHz input frequency

In selecting the ADC, we required 2 GHz of usable bandwidth to support SWARM. A Nyquist zone up to 2.3 GHz is needed, with the upper edge of the usable 2 GHz band at 2.15 GHz. While the bandwidth of the e2v ADC in the data sheet is 2.0 GHz, our frequency response measurements show that the device responds beyond that limit, with the attenuation at 2.15 GHz about 6 dB (including any loss on the PC board). A sample rate of 4.6 GSps is thus within the maximum specified of 5 GSps.
2.2.1. Quad core calibration
Patel et al. (2014) presented a series of measurements characterizing the performance of the ASIAA ADC. Signal-to-Noise and Distortion (SINAD), Spurious Free Dynamic Range (SFDR), Noise Power Ratio (NPR) and two-tone inter-modulation distortion tests showed that this ADC meets the requirements for SWARM. Patel et al. (2014) also documents the quad core calibration methods used in characterizing the ADC using a sine wave source. One conclusion of our characterization of the ADC, however, was that the only core alignments that are critical for SWARM are offset and gain. When SWARM is installed at the SMA, the only input available without manual intervention is receiver noise. We have found that adjusting the offset and gain of the four cores to be equal using receiver noise provides adequate correction for our needs. Fig. 2 shows the autocorrelation spectra obtained with one of the ADCs with the ambient temperature calibration load inserted at the 230 GHz receiver. With the offset and gain values set to zeroes for all four cores of the ADC, a strong spur is seen near the center of the spectrum. Drifts of the cores' offsets and gains have been small and slow.
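The receiver-noise alignment described above amounts to equalizing the mean (offset) and standard deviation (gain) of each core's interleaved sample stream. A minimal sketch of that idea follows; the function name and the round-robin core ordering are illustrative assumptions, not the SMA monitor-and-control code:

```python
import numpy as np

def core_corrections(samples, n_cores=4):
    """Estimate per-core offset and gain corrections from a capture of
    receiver noise. With round-robin interleaving, core k owns every
    n_cores-th sample starting at index k."""
    per_core = [samples[k::n_cores] for k in range(n_cores)]
    offsets = np.array([c.mean() for c in per_core])
    gains = np.array([c.std() for c in per_core])
    # Align every core to the ensemble-average offset and gain.
    offset_corr = offsets.mean() - offsets
    gain_corr = gains.mean() / gains
    return offset_corr, gain_corr
```

Applying the returned corrections as `(x + offset_corr[k]) * gain_corr[k]` per core equalizes the first two moments, which is what suppresses the interleave spur at the center of the band.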
2.2.2. ADC power level
The optimal drive level into the ADC is determined by the peak of the Noise Power Ratio (NPR) curve vs. the input power. The NPR for an ideal 8-bit ADC (with only quantization noise and clipping noise) is 40.6 dB. The empirically determined NPR curve for the 5 GSps ADC boards used in SWARM can be seen in Patel et al. (2014). Patel measures the NPR curve using a tunable notch filter set to frequencies of 800 MHz, 1000 MHz, and 1750 MHz, all of which show good agreement with the theoretical curve but with peak NPR degrading slowly with frequency.
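The peak in the NPR curve reflects a balance between quantization noise (drive too low) and clipping noise (drive too high) that can be reproduced numerically. The following sketch uses an idealized uniform 8-bit quantizer driven by Gaussian noise; it illustrates the shape of the trade-off, and is not a reproduction of the notch-filter NPR measurement:

```python
import numpy as np

def quantizer_snr_db(rms_counts, n_bits=8, n=200000, seed=1):
    """SNR of an ideal uniform quantizer (1-count steps, clipped at the
    8-bit rails) driven by Gaussian noise of the given RMS in counts."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, rms_counts, n)
    lo, hi = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    err = np.clip(np.round(x), lo, hi) - x
    return 10 * np.log10(np.var(x) / np.var(err))

# Scan drive levels (RMS in counts); the SNR peaks at a moderate loading,
# well below full scale, mirroring the shape of an NPR curve.
snrs = {rms: quantizer_snr_db(rms) for rms in (4, 8, 16, 32, 64, 128)}
```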
2.3. ROACH2
The latest open-source DSP platform to come out of CASPER is the so-called ROACH2. It is built into a 1U ATX computer format and hosts a Xilinx Virtex-6 SX475T FPGA and a PowerPC. ROACH2 has two expansion connectors that are typically used to connect ADC cards. It uses the FPGA as its processing element and has additional memory for storage and a PowerPC unit for monitor and control. It also has 80 Gbps of bidirectional digital interface bandwidth.
2.4. High speed network crossbar
The CASPER correlator architecture uses processing nodes that communicate via packetized data routed through commercially available switches. These nodes could be FPGA, ASIC, GPU or CPU/multicore based, depending on the specific requirements for the node and the maturity of the instrument.
2.5. Cooling the electronics
Operation of the ROACH2 chassis at the SMA facility near the summit of Mauna Kea, at an elevation of approximately 4000 meters (13,000+ feet), presented difficulties not present in sea level laboratory testing. In particular, the FPGA die temperature quickly exceeded the 85 degrees C threshold for guaranteed timing, even at reduced clock speeds. Modifications were made to the ROACH2 chassis to divert the power supply exhaust and to increase air flow through the chassis from front to back. The cooling fan for the FPGA heat sink was increased in power and changed in orientation, and a more effective heatsink compound was utilized. (For a detailed block diagram of the ROACH2 platform, visit http://casper.berkeley.edu/wiki/ROACH2)

Modifications were also made to the 19 inch equipment racks that house SWARM hardware. The previously open racks were enclosed with front and rear doors, and refrigerated air passing from the bottom of the rack was deflected into an added front plenum, whence the cooled air was then drawn through the front of SWARM components, including the ROACH2 chassis, and exhausted out the rear, to be carried up and out of a damper controlled exhaust at the top. The combination of these modifications allowed the FPGA die temperature to remain below 85 degrees C at the full clock rate.

Fig. 3. Plan view photo of the ROACH2 platform configured for SWARM. Two 5 GSps Quad Core ADCs are plugged in to the ZDOK connectors towards the bottom, providing samples at a data rate approaching 80 Gbps. Eight 10 GbE ports on the mezzanine board towards the top provide matched data rate throughput to the network switch. Photo credit: Derek Kubo.
2.6. Real-time software
Although often overlooked and underestimated, real-time software is critical to smooth operation of an array. For SWARM, much of the pre-existing SMA software environment was adapted for the monitor and control of the new correlator. This included reuse and modification of:

• Direct Digital Synthesizer (DDS) control code that manages Walshing and fringe-rotation
• A shared-memory library for sharing values between SWARM and the SMA software environment
• The correlation plotter, for displaying SWARM data alongside the ASIC correlator data
• Data archive software, for storing SWARM data using the existing SMA data format

Additionally, some new software was developed in Python for receiving and reordering the SWARM visibility dumps as well as for VLBI phased-array calibration (see Section 5.1). As previously discussed in Section 2.2.2, the software servo for the BDC was also written to run in real-time.
3. FPGA gateware
Each ROACH2 board in SWARM contains a single FPGA connected to multiple peripherals (for more information on ROACH2 see Section 2.3). The FPGA logic is implemented using what is called a bitcode, which is essentially a binary file that encodes the configuration of logic elements on the chip and the connections between them. Typically the bitcode is generated by starting with a high-level description of the intended behavior; this is then synthesized and mapped to the logic elements provided by the FPGA. For the purposes of this document we will refer to this high-level description as gateware.

Although it is common in the engineering industry for gateware to be implemented using languages such as Verilog or VHDL, for SWARM we decided to take advantage of the large and open-source CASPER gateware library and toolflow based around the MATLAB Simulink design environment. This decision had huge advantages, including allowing us to significantly reduce development time by designing at a very high level. For example, the CASPER libraries provide parameterized blocks for a Fast Fourier Transform (FFT). What follows is a detailed description of the SWARM gateware, which is graphically described in the block diagram in Figure 4.

3.1. Selectable test signals
The SWARM gateware design features a software-selectable data source which defaults to the 8-bit data from the samplers but can be selected to be either a Gaussian noise generator, a tunable sine-wave test tone, or a summation of both; the two input paths are independently selectable. In practice the Gaussian noise is used to verify basic functionality of the system from the inputs to the outputs by first selecting the noise for all inputs, then synchronizing them across all DSP boards, and verifying perfect correlation on all baselines of the visibility output data. This first-pass test has proven to be very helpful in quickly diagnosing issues throughout the design.
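The noise-source verification reduces to a simple check: identical, synchronized noise on every input must correlate perfectly on every baseline, and any missynchronized board stands out immediately. A sketch of that logic (the array sizes and the one-sample slip are illustrative):

```python
import numpy as np
from itertools import combinations

def correlation_check(inputs):
    """Return the correlation coefficient for every baseline (input pair).
    With a shared, synchronized noise source on all inputs, every value
    should be (near-)perfect; a slipped board breaks its baselines."""
    coeffs = {}
    for i, j in combinations(range(len(inputs)), 2):
        coeffs[(i, j)] = np.corrcoef(inputs[i], inputs[j])[0, 1]
    return coeffs

rng = np.random.default_rng(42)
noise = rng.normal(size=4096)
synced = [noise.copy() for _ in range(4)]   # all boards aligned
broken = synced[:3] + [np.roll(noise, 1)]   # one board off by one sample
```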
3.2. F-engine and coarse-delay
Fundamental to the FX correlator architecture is the conversion of a sequence of discrete time domain samples, for every input, to the Fourier domain before they are cross-correlated; this is referred to as the "F-engine." In the SWARM FPGA gateware design there are two such F-engines instantiated, which separately process either two contiguous frequency bands or two orthogonal polarizations per antenna depending on the mode. Each quadrant of SWARM thus has 16 F-engines, for a total of 64 across all quadrants.

The SWARM F-engine is, in practice, a Polyphase Filter Bank (PFB) implemented with a 32768-point real-valued FFT preceded by a 4-tap Hamming-window Finite Impulse Response (FIR) filter for each polyphase component. Although the PFB provides the best isolation for narrow spectral components, the SWARM F-engine features the ability to disable the FIR at runtime for observations that would prefer a straight FFT (such as VLBI, where easy conversion back to the time domain is necessary). Both the FIR and the FFT are implemented using standard blocks from the CASPER library with a parallelization factor, i.e. "demux," of 16. Ultimately, the output of each F-engine is a complex spectrum of 16384 channels for every 32768 input time-domain samples; at a sample rate of 4576 MHz that amounts to a transformation roughly every 7 microseconds.

In order to align the F-engine windows the SWARM gateware includes a coarse-delay correction which is applied before the PFB using a buffer in the time domain. The primary purpose of the coarse-delay is to correct for the large geometric delays between antennas when tracking a celestial source. To accommodate the largest baselines of the SMA in the Very Extended (VEX) configuration the buffer is 32768 samples, coincidentally equal to one FFT window.
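The F-engine structure, a short FIR applied per polyphase branch followed by a real-input FFT, can be sketched at small scale. SWARM's actual sizes are a 32768-point transform with 4 taps; the windowed-sinc prototype filter below is a generic textbook choice, not the exact CASPER coefficients:

```python
import numpy as np

def pfb_spectra(x, n_fft=512, n_taps=4):
    """Critically-sampled polyphase filter bank: an n_taps-tap
    Hamming-windowed-sinc FIR applied per polyphase branch, followed by
    a real-input FFT over each summed branch."""
    win = (np.hamming(n_taps * n_fft) *
           np.sinc(np.arange(n_taps * n_fft) / n_fft - n_taps / 2))
    n_windows = len(x) // n_fft - (n_taps - 1)
    spectra = []
    for w in range(n_windows):
        seg = x[w * n_fft:(w + n_taps) * n_fft] * win
        # Sum the taps within each polyphase branch, then transform.
        summed = seg.reshape(n_taps, n_fft).sum(axis=0)
        spectra.append(np.fft.rfft(summed))
    return np.array(spectra)
```

Compared with a straight FFT, the FIR front end confines a narrow tone to its own channel with much lower leakage into neighbors, which is the isolation property noted above.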
Fig. 4. Block diagram representing the SWARM gateware design. The dashed, shaded region represents the FPGA on the ROACH2 platform, a Virtex-6 SX475T. Blocks fully inside this shaded region represent high-level logic within the gateware while blocks bordering it represent external interfaces (e.g. memory controllers, network ports, and busses). Dotted regions identify sub-systems referred to throughout this document with their hierarchical name. The data from each antenna flows from the ADCs on the left through the two F-engines, gets time-frequency transposed by two Quad Data Rate (QDR) chips, and is sent out over the network, returning as a subset of the band for cross-correlation.

3.3. Fine-delay, phase, and amplitude control
Directly following the F-engines are the so-called "complex gain" blocks which multiply each channel of every spectrum by a dynamic complex value. There is a single fine-delay control, a phase control, and a per-channel amplitude control. The fine delay control amounts to simply a phase-per-channel value while the phase control is a constant phase across the band (i.e. every channel gets the same phase). There is also a per-channel amplitude control implemented using a software-accessible memory bank. In practice the amplitude control is rarely used (since most amplitude bandpass variations can be calibrated using bandpass calibration sources) but could be used to optimize the secondary quantization (see Section 3.5) as well as to knock out sources of interference (such as leakage from the oscillators in the antenna electronics).
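A fine delay applied in the frequency domain is just a linear phase slope across the channels, while the phase control adds one constant angle everywhere. A sketch of such a complex-gain stage (the function and channel-frequency convention are illustrative):

```python
import numpy as np

def complex_gain(spectrum, fine_delay_samples, phase_rad, amplitude=None):
    """Frequency-domain complex gain: a residual sub-sample delay becomes
    a phase-per-channel slope, the phase control is one constant angle for
    all channels, and an optional per-channel amplitude completes it."""
    n_chan = len(spectrum)
    # Channel k of a 2*n_chan-point real FFT sits at normalized
    # frequency k / (2 * n_chan).
    slope = np.exp(-2j * np.pi * fine_delay_samples *
                   np.arange(n_chan) / (2 * n_chan))
    g = slope * np.exp(1j * phase_rad)
    if amplitude is not None:
        g = g * amplitude
    return spectrum * g
```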
3.4. Synchronization and de-Walshing
The SMA uses Walsh modulation and demodulation to reduce cross-talk within the IF/LO system as well as for sideband separation in the correlator. The modulation is applied at the LO, while within SWARM Walsh demodulation is done using the complex gain adjustments discussed in the previous Section 3.3. The modulation and demodulation are synchronized using an external signal generated by the DDS computer (the machine that handles the modulation of the LO). Within the SWARM gateware the external signal drives an arm-able internal Walsh counter which is then used to demodulate both the 0-180 degree (for cross-talk rejection) and the 90-270 degree (sideband separation) components of the Walsh pattern. The input signals are fully demodulated for one sideband, typically the USB, via phase shifts applied with the complex gain sub-system (see Section 3.3); subsequently the other sideband is "separated" in the final accumulator (see Section 3.9) by accumulating a parallel integration with a secondary modulation opposite to that of the USB (on a per-baseline basis).
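The cross-talk rejection property of the 0-180 degree component can be illustrated with Sylvester-Hadamard sign patterns: demodulating with an antenna's own pattern preserves its signal, while cross-talk carrying any other antenna's pattern averages to zero over a full Walsh cycle. A small sketch (the pattern length and row assignment are illustrative, not the SMA's actual Walsh table):

```python
import numpy as np

def walsh(row, n=16):
    """One row of a Sylvester-Hadamard matrix as a +/-1 sequence:
    mutually orthogonal switching patterns, one per antenna."""
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h[row]

def demodulated_average(signal_pattern, demod_pattern, n=16):
    """Average, over a full Walsh cycle, of a signal modulated with one
    pattern and demodulated with another: 1 if they match, 0 otherwise."""
    return np.mean(walsh(signal_pattern, n) * walsh(demod_pattern, n))
```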
3.5. Quantization to 4-bits
To reduce the memory and network-traffic requirements of the transpose and corner-turn operations (see Section 3.6) while maintaining signal-to-noise and dynamic range, it was decided to re-quantize to lower resolution after the complex gains are applied. Although the samplers themselves provide 8 bits, the data grows to 18 bits through the F-engine and the complex gain subsystems but is subsequently rounded down to 4 bits. This bitwidth is common among packetized CASPER-based FX correlators.
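The re-quantization step can be sketched as rounding each complex value to 4-bit real and imaginary parts; the scale factor, which maps the 18-bit dynamic range into the 4-bit window, is exactly what the per-channel amplitude control can help optimize. (The function and scaling convention are illustrative.)

```python
import numpy as np

def requantize_4bit(spectrum, scale):
    """Round a complex spectrum to 4-bit real and 4-bit imaginary parts
    (integer range -8..7), the style of packing used between F- and
    X-engines in packetized CASPER FX correlators."""
    def q(v):
        return np.clip(np.round(v * scale), -8, 7)
    return q(spectrum.real) + 1j * q(spectrum.imag)
```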
3.6. Time-Frequency Transposes
The F-engines' output is a continuous series of spectra; however, the X-engines, i.e. the correlators, expect a sequence of time samples for each channel (see Section 3.8). To reorder the F-engine outputs appropriately SWARM uses the high-speed QDR memory provided by the ROACH2 boards. The X-engines expect 128 time samples per channel, thus 128 spectra from each F-engine must be buffered row-wise while the per-channel data is read out column-wise. This process is effectively a matrix transpose operation where the axes being transposed are frequency and time. In practice the spectra are actually "double-buffered" in the QDR memory (for simplification of read/write addressing), thus requiring a total of 2 × 128 spectra × 16384 channels × 8 bits (4-bit real plus 4-bit imaginary) ≈ 34 Mbit per QDR chip.

3.7. Packetized Corner-turn
Once the frequency domain data has been time-frequency transposed it must be transposed in another way: frequency-antenna. On one side, each F-engine path produces a full spectrum for a single antenna, while on the opposite end a single X-engine will consume some subset of the channels (i.e. bandwidth) for all antennas. This process is commonly referred to as the "corner-turn" and is a requirement for any correlator. A corner-turn can be implemented in numerous ways. The ALMA correlator, for example, uses 16384 cables to route the data appropriately, which turned out to represent the "greatest design challenge in the system" (Escoffier et al., 2007). SWARM, on the other hand, uses what could be called the "CASPER approach" (see Section 2.1 and Figure 1), which is to use a commercial high-speed Ethernet switch and routed packets to serve the same function. Although some overhead is needed in the gateware to accommodate packet buffers, this approach has the benefit of being flexible, highly-scalable, easier to implement, and, in many cases, cheaper than other methods.
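Functionally, the corner-turn is nothing more than exchanging which axis is local: before, each board holds all channels for its own inputs; after, each board holds all antennas for one contiguous channel slice. A sketch with the network replaced by an array slice (the board count and contiguous slicing are illustrative; on SWARM the "send" is a UDP packet through the 10 GbE switch):

```python
import numpy as np

def corner_turn(spectra, n_boards=None):
    """Corner-turn as pure data movement: input indexed [antenna, channel];
    output is one block per X-engine board, each holding every antenna for
    its own contiguous channel slice."""
    n_ant, n_chan = spectra.shape
    if n_boards is None:
        n_boards = n_ant // 2       # two inputs per ROACH2 board
    per_board = n_chan // n_boards
    return [spectra[:, b * per_board:(b + 1) * per_board]
            for b in range(n_boards)]
```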
3.8. X-engine and accumulators
Once the data has been corner-turned, each processor has access to data from all antennas for a subset of the bandwidth, and baseline-based processing can begin. In particular all inputs can now be correlated by a subsystem called the X-engine. For the SWARM gateware we use the standard CASPER library X-engine block, eight per board due to the 16-fold demux (the factor of two comes from having complex-valued data after the F-engine).

Unlike in many other CASPER correlators, the SWARM X-engines are co-located with the F-engines, that is to say they use the same processing boards as the F-engines. While this has presented challenges in terms of clocking the FPGA design at high clock rates, the approach was intended to reduce the total number of ROACH2 boards (thus reducing cost) as well as to use fewer Ethernet switch ports for the corner-turn. Additionally, the corner-turn switch ports are all used full-duplex at very nearly 10 Gbps in both directions.

The SWARM X-engines compute all cross-correlation products regardless of whether the two inputs per SWARM board represent two polarizations, i.e. dual-polarization mode, or two contiguous chunks of bandwidth, i.e. single-polarization mode. So, although we consider SWARM to be an 8-element full-Stokes correlator, it could also be thought of as a 16-element single-Stokes correlator. Additionally the auto-correlations are produced, which have proven useful for calibrating the data. In total, the X-engines produce 120 complex-valued cross-correlations and 16 real-valued auto-correlations; each pair of real-valued auto-correlations can be crammed into a single complex number, thus reducing the total output components to 128.

For efficient use of resources the X-engine blocks are configured to integrate by 128 time samples, and the outputs from all eight X-engine blocks (which simultaneously compute eight different channels) are interleaved into a single stream.
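The output-component bookkeeping above follows directly from the input count and can be sketched as:

```python
def xengine_products(n_inputs=16):
    """Component count for n_inputs correlator inputs: n(n-1)/2 complex
    cross-correlations plus n real auto-correlations, with each pair of
    real autos packed into one complex number."""
    cross = n_inputs * (n_inputs - 1) // 2
    packed_autos = n_inputs // 2
    return cross, n_inputs, cross + packed_autos
```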
This data is then long-term accumulated using one QDR chip per sideband, as discussed in Section 3.4.

Note that the input window for each X-engine block is 1024 clocks (128 samples for each antenna-receiver pair) while the valid output window is 128 clocks (one clock per component). However, because we are interleaving eight blocks going into the accumulator there is no idle time available for double-buffering (though the capacity is available) and therefore the accumulations must be read out immediately upon completion. This presents a particular challenge for reading the data across an entire SWARM quadrant, the solution for which is discussed in Section 3.9.

3.9. Visibility output and interleave delay
All ROACH2s are synchronized using an external signal, and this applies to the X-engines as well. Thus, the X-engines all dump their visibility data simultaneously. While the average data rate at this point in the system is small (typical integration times are ∼30 seconds), the simultaneous transmission of this data to a single port connected to a control computer would overwhelm the limited internal memory buffer in SWARM's 10 GbE switch. The solution was to add a software-defined delay to the FPGA gateware in order to stagger the X-engine outputs. Due to the large size of the visibility data, this required using the on-board DDR3 memory, which offers 4 GB of capacity.
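The stagger logic amounts to spacing each board's dump by at least the time one dump occupies the output link, so dumps arrive at the control-computer port back-to-back instead of simultaneously. A sketch with illustrative numbers (the dump size and spacing are assumptions, not SWARM's actual values):

```python
def stagger_delays(n_boards=8, dump_bytes=32 * 2**20, line_rate_gbps=10):
    """Per-board transmit delays that serialize the visibility dumps on
    the shared output port: board k waits k dump-transmission times."""
    dump_seconds = dump_bytes * 8 / (line_rate_gbps * 1e9)
    return [board * dump_seconds for board in range(n_boards)]
```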
3.10. B-engine
Another baseline-based system is the built-in beamformer that enables the SMA to operate in a phased array mode, called the B-engine. The beamformer provides an adjustable gain per antenna, which can be used effectively as a mask, and sums all antennas (a reduction of the data rate by eight). The summed data is then sent out onto the network to the VLBI processor and recorder. To be used effectively as a phased array for VLBI, the phases need to be adjusted for each antenna in real-time using the constant phase component of the complex gain subsystem (see Section 5.1).
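The B-engine operation, a per-antenna gain (usable as a mask) and phase followed by a sum over antennas, can be sketched as follows; with the phases correctly tracking the per-antenna delays, the eight signals add coherently:

```python
import numpy as np

def beamform(spectra, gains, phases_rad):
    """B-engine style sum: apply a per-antenna real gain (zero masks an
    antenna out) and a per-antenna constant phase, then sum over antennas.
    Input is indexed [antenna, channel]."""
    w = gains[:, None] * np.exp(1j * phases_rad)[:, None]
    return (w * spectra).sum(axis=0)
```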
4. Resource utilization
Before committing to the ROACH2 we needed to be confident that the bitcode would fit the target FPGA. It was clear that the utilization would be dominated by the PFB. A hard reality is that the FPGA cannot run nearly as fast as the ADC, so it is necessary to process a number of parallel streams, the demux factor.
An early SWARM Memo explored the resource requirements of the PFB, which was vital in the decision to proceed with using the ROACH2 for the SWARM project. This section will reproduce (but not derive) the results from that Memo.

4.1. Estimation of resources
As discussed in Section 3.2, a PFB can be constructed using an FIR filter followed by a DFT which extracts the appropriate sub-bands. The DFT can be implemented using an FFT algorithm in order to take advantage of the O(N log N) optimization those algorithms afford. However as bandwidth, and therefore demux (represented here as D), grows, more samples are presented at once, which means more multipliers must be instantiated in hardware.

The FIR filter preceding the FFT uses a single real multiplier per tap, so given T taps (typical numbers are 4 to 8) and N channels, the full PFB multiplier utilization is shown below with the various components identified,

    M_PFB = D log2(N/D) [FFT] + TD [FIR] − D [optimization]    (1)

The most important thing to note from this equation is that since the pipelined stages divide their computation over N/D clock cycles, the total number of multipliers only grows with N as log N. The main contributor to multiplier utilization is instead the demux factor. Figure 5 shows a graphical representation of Equation 1. To find the total adders we go through a similar calculation, noting that the butterflies have as many adders as multipliers and that the FIR only needs to sum all taps and thus performs D(T − 1) additions,

    A_PFB = D log2(N/D) [FFT] + D(T − 1) [FIR] + D [reorder]    (2)

The added D results from extra adders in the block needing to do the final reordering of the FFT output. Within a polyphase filter-bank the multiplier and adder utilization grow significantly with the demux factor, namely as D log(N/D), whereas the amount of required memory depends critically, and linearly, on the total channels, N. Generally this implies that designs with modest bandwidth but requiring significant spectral resolution will be constrained by memory. SWARM, however, has both very large bandwidth and substantial PFB size to achieve fine spectral resolution. This formalism helped us find the appropriate combination of parameters which meet the requirements of bandwidth and spectral resolution while fitting the logic and memory available in the ROACH2's FPGA.

4.2. Demultiplexing
Experience shows that clocking an FPGA running a complex bitcode at rates approaching or exceeding 300 MHz stretches its capabilities, and those of the design tool-flow, to meet timing. Were 312 MHz achievable, however, our resource calculation shows that very substantial savings in multiplier and adder resources would result (see Figure 5). Absent this constraint it would perhaps be preferable to clock the FPGA at about 250 MHz; however, because the demux factors are quantized to radix-2 numbers, it is important to appreciate that stretching to the next demux boundary can yield significant returns in utilization.
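As a concreteness check, the resource model above can be evaluated numerically. The sketch below (Python; the function names are ours, and the model counts only the dominant DSP terms described in the text) implements Equations 1 and 2 and illustrates how steeply the multiplier count grows when stepping to the next radix-2 demux boundary at fixed N.

```python
import math

# Sketch of the resource model in Equations 1 and 2 (names hypothetical).
def pfb_multipliers(N, D, T):
    # D*log2(N/D) for the FFT butterflies, T*D for the FIR taps,
    # minus D saved by the optimization noted in Equation 1.
    return D * math.log2(N / D) + T * D - D

def pfb_adders(N, D, T):
    # Butterflies contribute as many adders as multipliers; the FIR sums
    # its T taps with D*(T-1) adders; plus D for the output reordering.
    return D * math.log2(N / D) + D * (T - 1) + D

N, T = 32768, 4  # SWARM-like parameters: 32k-point PFB, 4 FIR taps
for D in (16, 32, 64, 128):
    print(f"D={D:3d}  multipliers={pfb_multipliers(N, D, T):6.0f}  "
          f"adders={pfb_adders(N, D, T):6.0f}")
```

Doubling D roughly doubles the multiplier count while the dependence on N stays logarithmic, which is the trade-off Figure 5 visualizes.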
Implementation resources used
Ultimately the SWARM gateware described in Section 3 fit into the target FPGA with a demux factor of 16, which meant clocking the FPGA at 286 MHz. For a full list of resources used by the implementation of our gateware, divided by sub-system, see Table 2 below.
Fig. 5. Number of multipliers versus number of PFB channels for various values of the demultiplex factor (D = 16, 32, 64, 128). The dashed line represents the upper limit of available DSP slices on the FPGA present on the ROACH2. This plot shows how expensive it can be to jump to the next demux (e.g. to decrease the FPGA clock speed) while maintaining the same number of channels. Note: here we assume 8 taps in the FIR (whereas SWARM uses only 4).

Table 2. Resources used by various sub-systems of the SWARM gateware (DSP slices, slice LUTs, slice registers, and block RAM slices; availability shown for the Xilinx Virtex-6 SX475T).
5. VLBI features
SWARM supports VLBI through a built-in beamformer, a VLBI-specific packetizer called the SWARM Digital Backend (SDBE), and an off-line data preprocessing system called the Adaptive Phased-array and Heterogeneous Interpolating Downsampler for SWARM (APHIDS). This enables the SMA to participate in VLBI observations as part of the EHT.
Beamformer
The beamformer coherently adds the signals received from the target source in each antenna such that the array performs as the equivalent of a single station with a larger collecting area within the wider VLBI array. Phasing the array requires tracking all sources of delay, including fluctuations in water vapor concentration in the atmosphere. The SWARM phasing system is equipped with a real-time phasing solver that continually updates the beamforming weights to compensate for these variable delays, which manifest as variable phase errors in each antenna, over the course of the observation. Since the phased array capability is used to observe sources that are unresolved on baselines within the array, the corrective beamformer weights can be computed by extracting from the correlator output that contribution associated with a point-like source. Furthermore, as the weights are applied to the signal from each antenna before computing cross-correlations between antenna pairs (see Figure 4), the solution obtained from the correlator output for a particular integration period can also be used to calculate the average phasing efficiency over that same period. Specifically, the phasing efficiency is calculated as,

\eta_\phi = \left| \sum_i w_i \right| \Bigg/ \left( \sum_i |w_i| \right), \qquad (3)

where w_i is the complex-valued weight applied to antenna i. See Young et al. (2016) for a more detailed discussion of the phased array and a performance assessment thereof. Figure 6 shows the phasing efficiency achieved over the course of several scans during one night of the 2016 EHT campaign. For most of the scans the efficiency is well above 0.9.
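As an illustration of Equation 3, the metric can be computed directly from a set of complex weights. The sketch below (Python with NumPy; the array size and phase-error level are illustrative) assumes the absolute-value form shown above.

```python
import numpy as np

def phasing_efficiency(w):
    # eta_phi = |sum_i w_i| / sum_i |w_i| (Equation 3); equals 1 when every
    # weight has the same phase, and falls below 1 with residual phase errors.
    w = np.asarray(w, dtype=complex)
    return np.abs(w.sum()) / np.abs(w).sum()

rng = np.random.default_rng(42)
aligned = np.exp(1j * np.zeros(7))             # a perfectly phased 7-element array
jittered = np.exp(1j * rng.normal(0, 0.3, 7))  # ~17 deg rms residual phase error
print(phasing_efficiency(aligned))             # -> 1.0
print(phasing_efficiency(jittered))            # slightly below 1
```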
Lower values obtained during the scans on Cen A (just after 8:00 UT) and the first few scans on Sgr A* and NRAO 530 (from 11:00 UT) are attributed to observing at low elevation, which degrades the atmospheric phase stability. The antenna that was used as the phase reference during the observation suffered a loss of coherence from around 10:00–11:00 UT, which resulted in poorer performance for scans in that period.

SDBE
Single dish VLBI stations, specifically those used in the EHT in recent years, have a serial data pipeline for 2 GHz bands: digitization, real-time formatting to the VLBI standard, encapsulation in the VLBI Data Interchange Format (VDIF) (Whitney et al., 2009), and saving data to disk via the Mark 6 data recorder (Whitney et al., 2013). SWARM distributes the beamformer processing of 2 GHz bands across eight ROACH2 devices. These parallel data streams must be collected and formatted in real-time in order to interface with the Mark 6, in a manner similar to that implemented in the ROACH2 Digital Backend (R2DBE) which is used at other EHT sites (Vertatschitsch et al., 2015).

Utilizing the rapid development platform provided by the ROACH2, we built and tested a real-time system to collect and format "B-engine", i.e. beamformer, packets output by SWARM. The data are received on four of the eight 10 GbE ports on the SDBE. The packets are time-stamped, the frequency-domain samples are quantized from 4 bits complex to 2 bits complex, the packets are formatted with VDIF headers, and transported over UDP to the Mark 6. Since the B-engine packets are relatively small, several of these packets are bundled into each UDP packet to reduce the interrupt rate on the Mark 6 so as to avoid packet loss. This design uses all eight 10 GbE ports offered on the ROACH2, and all four 10 GbE inputs to the Mark 6. At full speed, the Mark 6 ingests 18.99 Gbps from a single quadrant of SWARM. A block diagram of the SDBE system is shown in Figure 7.

Fig. 6. Phasing efficiency measured on various sources during EHT VLBI on April 4, 2016. The horizontal axis shows time in UT and the vertical axis shows phasing efficiency. The inset histogram shows the distribution of phasing efficiency measured over this period.
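The 4-bit to 2-bit requantization step can be sketched as follows (Python with NumPy; the decision threshold and the byte-packing layout are illustrative assumptions, not the deployed SDBE parameters).

```python
import numpy as np

def requantize_2bit(x, thresh):
    # Map real-valued samples to 2-bit codes {0,1,2,3} using decision
    # levels at -thresh, 0, +thresh (applied to real and imag parts separately).
    return np.digitize(x, [-thresh, 0.0, thresh]).astype(np.uint8)

rng = np.random.default_rng(0)
z = rng.normal(size=4096) + 1j * rng.normal(size=4096)  # mock B-engine samples
thresh = 0.98  # ~optimal 4-level decision threshold in units of the noise sigma
# Pack one complex sample into a nibble: real code in the high two bits,
# imaginary code in the low two bits (this layout is our assumption).
codes = (requantize_2bit(z.real, thresh) << 2) | requantize_2bit(z.imag, thresh)
```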
Fig. 7. Block diagram demonstrating the VLBI data pipeline, from SWARM to correlatable data in Mark 6 format. The SDBE is integrated with SWARM and does the real-time processing necessary to interface with the on-site Mark 6 during an observation. After observing, the data are preprocessed offline in APHIDS prior to correlation with data from other sites.
APHIDS
The underlying data within the packets streamed from the SDBE differ from that typically employed for VLBI and expected at the EHT correlator. Specifically, other EHT sites sample a power-of-two megahertz bandwidth at the Nyquist rate and produce a digital stream of time-domain data. For SWARM the data within the SDBE packets are in the frequency domain and correspond to a sample rate different from other EHT sites. A certain amount of preprocessing is therefore necessary prior to VLBI correlation with SWARM data, and is performed within APHIDS.

This system reads SDBE data recorded to disk from a Mark 6, converts the data to time-domain at the required sample rate, requantizes to 2 bits, encapsulates in VDIF, and writes to disk on a second Mark 6. The data reformatting implements interpolation and digital filtering using a power-of-two DFT followed by a non-power-of-two inverse DFT, and is GPU accelerated using the CUDA toolset. The filtering discards excess bandwidth resulting from the higher sample rate used in SWARM relative to other sites.

Figure 8 shows a long baseline fringe detection using SWARM to the Large Millimeter Telescope (LMT) in Mexico, equipped with the ROACH2 DBE. It is typical in VLBI to search for a detection in both delay and delay-rate space; the plot shows the correlation coefficient as a function of these variables. Data was taken on 8 April 2016.

Fig. 8. VLBI fringe detection on the quasar J1512-0905 on a transcontinental baseline between SMA SWARM and the Large Millimeter Telescope (LMT).
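The core rate-conversion operation can be sketched in a few lines (Python with NumPy standing in for the CUDA implementation; the transform lengths are illustrative, chosen from SWARM's 4.576 GSps rate and the VLBI-standard 4.096 GSps rate).

```python
import numpy as np

def fft_resample(x, n_out):
    # Forward power-of-two DFT, truncate to the target bandwidth, then a
    # non-power-of-two inverse DFT at the new length; this discards the
    # excess bandwidth from SWARM's higher sample rate.
    X = np.fft.rfft(x)
    X_trunc = X[: n_out // 2 + 1]
    return np.fft.irfft(X_trunc, n=n_out) * (n_out / len(x))

fs_in, fs_out = 4576.0, 4096.0        # MHz: SWARM-like vs. VLBI-standard rates
n_in = 16384                          # power-of-two forward FFT, as in Figure 7
n_out = int(n_in * fs_out / fs_in)    # non-power-of-two inverse length
t = np.arange(n_in) / fs_in
tone = np.sin(2 * np.pi * 100.0 * t)  # 100 MHz test tone, well inside the band
resampled = fft_resample(tone, n_out)
```

The resampled tone still sits at 100 MHz when interpreted at the output rate, which is the property VLBI correlation requires.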
6. Deployment and verification
Early prototypes of SWARM were tested in the laboratory starting in 2013 with an antenna simulator, a phase-agile four-channel noise generator with a controllable ratio of correlated to uncorrelated noise in each channel. The antenna simulator can be set to Walsh the signals in the characteristic pattern used by the SMA, with both 0–180 degree and 90–270 degree cycles. The simulator could not, however, simulate the geometric delays of a real sky observation. Also, four-antenna versions of SWARM could not correlate the full bandwidth because the cross-multiplies for a single antenna are distributed across all eight ROACH2s in a full system. Nonetheless the simulator proved invaluable for testing basic functionality of SWARM in the laboratory, instead of on the telescope, which requires long distance travel, is less comfortable and efficient due to altitude, and is either constrained in time allocation or risks interfering with SMA observations.

The first eight-ROACH2 quadrant was fielded at the SMA in 2014, running at 54% of full bandwidth. Over the next approximately two years, the bandwidth was increased twice, to just over 70% and then to 90% in 2015. Also in late 2015, a second quadrant of SWARM was built and commissioned. SWARM was first used for EHT VLBI science in March and April 2015, in 70% bandwidth mode. It was used again in July 2015 as well as April and June 2016. VLBI fringe detections were obtained for all these campaigns except June 2016, when a combination of technical problems at the partner EHT site and bad weather on Mauna Kea, rather than issues with SWARM, obstructed success.
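The 0–180 and 90–270 degree Walsh cycles mentioned above can be illustrated with Hadamard-ordered Walsh functions (Python with NumPy; the SMA's actual pattern lengths, ordering, and assignment to antennas are not reproduced here).

```python
import numpy as np

def hadamard(n):
    # Build a 2^n x 2^n Hadamard matrix; its rows are Walsh functions (+/-1).
    H = np.array([[1]])
    for _ in range(n):
        H = np.block([[H, H], [H, -H]])
    return H

H = hadamard(4)                           # 16-step switching patterns
tone = np.ones(16, dtype=complex)         # stand-in for a correlated signal
# Combined modulation: the 0-180 cycle multiplies by +/-1, the 90-270 by +/-j.
pattern = H[3] * (1j ** H[5])
modulated = tone * pattern
demod = modulated * np.conj(pattern)                 # matching pattern recovers the tone
spurious = modulated * np.conj(H[4] * (1j ** H[6]))  # mismatched pattern
print(np.allclose(demod, tone))                      # -> True
print(abs(spurious.sum()) < 1e-9)                    # -> True: spurious terms cancel
```

Orthogonality of the Walsh rows is what lets the simulator verify that signals demodulated with the wrong pattern average to zero, i.e. that Walsh switching suppresses crosstalk.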
Fig. 9. Pictures of the SWARM equipment installed on Mauna Kea. From left to right, the four photos show the BDC which feeds analog baseband signals to SWARM; the front of a single quadrant of SWARM showing the 8 ROACH2 units cabled with IF, ADC clock, and other control signals entering the front of the ROACH2 chassis; the rear of a SWARM quadrant showing the 10 GbE cables which route the corner turn, visibility and B-engine data; and a rolling rack with a pair of SDBEs and Mark 6 data recorders, which record B-engine data from a pair of quadrants. The second installed SWARM quadrant is not shown. When the SMA ASIC correlator is decommissioned later in 2016, the SWARM equipment in rolling racks (BDC, SDBEs and Mark 6 recorders) will be moved to permanent equipment racks.
On 11 July 2016, with two quadrants operational, the first full bandwidth bitcode was successfully tested in connected interferometer mode. On 21 July 2016 two quadrants of SWARM running at full speed were released for science at the SMA. As of 18 October 2016 there are three quadrants of full speed SWARM in use for science, with the ASIC correlator soon to be decommissioned. See Figure 9 for photos of the SWARM equipment installed in the SMA equipment room on Mauna Kea.

SWARM always runs at full spectral resolution, resulting in data files for a night of observation (assuming the four-quadrant system) of the order of 100 GB in size. A "rechunker" program can quickly reduce the resolution of the SWARM data file for those who do not need it for their science goals. The smaller files are more manageable in general, and they load more quickly into the data reduction programs. Even so, full resolution SWARM data is archived for every track, which makes the archive a more valuable resource when the proprietary period expires and the archive becomes widely available, sometimes for science goals other than those for which the data was originally taken.
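The effect of a rechunker pass can be sketched as a vector (complex) average of adjacent channels. The sketch below (Python with NumPy) uses an illustrative averaging factor and array shape, not the actual SMA file format.

```python
import numpy as np

def rechunk(vis, factor):
    # Vector-average each group of `factor` adjacent channels; any remainder
    # channels that do not fill a full group are dropped.
    n_chan = vis.shape[-1] - vis.shape[-1] % factor
    grouped = vis[..., :n_chan].reshape(vis.shape[:-1] + (-1, factor))
    return grouped.mean(axis=-1)

vis = np.ones((28, 16384), dtype=complex)  # e.g. 28 baselines x 16k channels
low_res = rechunk(vis, 8)                  # 8x coarser resolution, 8x smaller
print(low_res.shape)                       # -> (28, 2048)
```

Averaging complex visibilities (rather than amplitudes) preserves phase, so calibration downstream behaves the same on the rechunked file.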
Line survey demonstration
The instantaneous wide bandwidth and high spectral resolution of two-quadrant SWARM allows for quick and efficient line surveys. To demonstrate this, on 14 August 2016 a verification and demonstration observation, with about an hour on source, was made of the rich forest of strong lines in Orion BN/KL. The opacity was mediocre, τ ∼ 0.2; however, the atmospheric phase was fairly stable, and the SMA was in the subcompact configuration. Given the strength of the lines, the conditions were entirely suitable for such an observation. Bandpass and flux calibration data were also taken. The calibrated spectrum is shown in Figure 10. The three panels zoom in on smaller frequency ranges from top to bottom. The red and blue section in the top panel is a single SWARM 2.0 GHz usable chunk in one sideband only, and the blue section is a particularly busy and interesting segment of the spectrum, spanning about 260 MHz and shown in detail in the bottom panel. All of the spectral detail visible in the bottom panel is available across the full spectrum in the top panel, though not well visualized there due to the compressed frequency scale.

When the planned four SWARM quadrants are completed later this year, the 8 GHz gap between lower and upper sidebands apparent in the top panel can be filled (assuming that the two 230 GHz receiver sets are tuned with exactly 8 GHz difference in sky center frequency), and a further 8 GHz contiguous added either below the LSB or above the USB, thereby providing a contiguous 32 GHz instantaneous bandwidth on the sky. It should also be noted that because SWARM samples a 2.288 GHz Nyquist band in each ADC channel, and given carefully chosen filters and local oscillators in the block down-converters which condition the IF for SWARM, there are no edge effects every 2 GHz due to bandpass skirts after the guard bands are excised. In other words, the 32 GHz contiguous instantaneous sky band of four-quadrant SWARM, when set up in this way, has near-optimal SNR anywhere in the band.
Quantitative validation
To obtain a quantitative measure of SWARM performance, we analyzed observations of the red giant star R Cas taken on 21 July 2016, to see whether we get better signal-to-noise with SWARM or ASIC. The SiO(5-4) maser line appeared in ASIC chunk s43 and SWARM chunk s50 (LSB); for this test the SWARM and ASIC correlators were configured so that both processed the portion of the IF which contained the spectral line. There was no detectable continuum, and no other lines, so this observation lends itself well to a comparison of SNR in the two systems. SNR in this context is defined as the ratio of line area to the Root-Mean-Square (RMS) of the system noise in the line-free region of the spectrum. See Figure 11 for the SiO maser line as seen by SWARM and the SMA ASIC correlator.

A short form description of the analysis is given in the following steps. Standard SMA data calibration (steps 1 to 3) used the SMA data reduction package, MIR. Python code was used to complete the analysis and estimate the SNR (steps 4 to 9).

(1) T_sys calibration was applied to the data, using SMA's logged Y-factor measurements of system temperature. This converted the raw cross-correlation coefficient to an approximate Jansky unit scale.
(2) Bandpass calibration was completed in both data sets using data taken on the quasar 3C 454.3.
(3) The s43 (ASIC) and s50 (SWARM) R Cas amplitude data were gain calibrated using MWC 349 as a calibration source.
(4) The SWARM data were vector averaged in sets of 6 channels to approximately match the ASIC resolution.
(5) RMS values of the amplitude of these spectra were calculated in the frequency range corresponding to the usable 82 MHz bandwidth of s43 (excluding the region of line emission), with s50 trimmed to the same 82 MHz to get the ASIC RMS.
(6) The average value of the amplitude in the line was calculated for each spectrum.
(7) The SNR was calculated by dividing the average line-area amplitude by the RMS.
(8) The ratio of the s50 SNR to the s43 SNR was calculated for each baseline.
(9) The average of all 28 SNR ratios from step 8 showed a ratio of 1.11 with an error of ±0.03, a ∼12% improvement in SNR over the ASIC correlator.

This analysis shows the measured SNR for SWARM is 11% ± 3% higher than for the ASIC. This non-trivial improvement allows SWARM to achieve ASIC's SNR with correspondingly less telescope time.
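Steps 4 to 7 above can be sketched on synthetic data as follows (Python with NumPy; the channel count, line placement, and noise level are mock values, and only the averaging factor of 6 follows the text).

```python
import numpy as np

def snr(spectrum, line_mask):
    # Steps 6-7: average line-region amplitude divided by the RMS of the
    # amplitude in the line-free region.
    line = np.abs(spectrum[line_mask]).mean()
    rms = np.sqrt((np.abs(spectrum[~line_mask]) ** 2).mean())
    return line / rms

rng = np.random.default_rng(1)
n = 3840                                 # mock high-resolution channel count
spec = rng.normal(size=n) + 1j * rng.normal(size=n)
spec[1896:1932] += 50.0                  # inject a strong maser-like line
# Step 4: vector-average sets of 6 channels to match a coarser resolution.
avg = spec.reshape(-1, 6).mean(axis=1)
mask = np.zeros(avg.size, dtype=bool)
mask[1896 // 6 : 1932 // 6] = True       # line region in averaged bins
print(snr(avg, mask) > 20)               # -> True for this strong mock line
```

Repeating this for both correlators' spectra on each baseline, then averaging the 28 per-baseline SNR ratios, corresponds to steps 8 and 9 of the comparison.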
Sample science data
SWARM is routinely used for science. Figure 12 shows a narrow (about 1 MHz) spectral line and line image of an HCN transition in a comet, observed by Smithsonian scientist Chunhua Qi. The line occupies about seven 130 kHz SWARM bins; the data were taken when SWARM was one step away from running at full bandwidth.
7. Conclusions
We have built and commissioned three quadrants of the SWARM system, processing 24 GHz of the eventual 32 GHz bandwidth goal. The three quadrants fully validate the SWARM design since the quadrants are essentially replicas of one another. The two SWARM instruments, the connected correlator and the phased array, have been successfully deployed for routine science, and represent the future of digital signal processing at the SMA. The older ASIC correlator will be retired in 2016, saving an order of magnitude in power used for DSP at SMA, and freeing up space in the SMA correlator room for future instrument build outs. Engineering decisions made early in the design process of SWARM that have been validated include:

• The use of quad-core ADCs in a broadband application, which was viewed as a technical risk in the CASPER community. The foundational work of Patel et al. (2014) on mitigation of distortion through quad-core alignment has allowed us to show that such devices can yield science-quality wideband astronomical data.

• The choice to build an FX correlator with high resolution spectral decomposition computed in one DSP stage, since cascading coarse and fine PFBs would cause edge effects requiring overlapping coarse channels to mitigate, creating a need for still more computation, more FPGA hardware, and complex interconnect. Early utilization estimates showed that two 32 kilopoint PFBs would fit on a single Xilinx FPGA, with X-engines co-located, along with delay and phase alignment, networking, packetizing, and transpose and buffer resources, as long as the demultiplex factor was limited to 16.

• The choice of a demultiplex factor of 16 along with the chunk bandwidth of 2.3 GHz, which necessitated an FPGA clock rate of 286 MHz and very high utilization of the various FPGA resources. Meeting timing was indeed a greater challenge than anticipated but was ultimately achieved in July 2016.

• The election to use open-source CASPER technology, including the ROACH2 and 5 GSps ADCs. The SMA internal design efforts were limited to system design, infrastructure, and the very complex, highly utilized and high performance FPGA bitcode. We did not, however, have to develop and debug DSP hardware, which would have resulted in a longer "time to science" for SWARM.

All the originally targeted goals set at project inception were achieved. SWARM is impressively full featured, compact, and economical in its power consumption, and while these desirable characteristics are in part a consequence of Moore's Law, some were met through persistent pursuit of an elegant, highly utilized, and challenging high speed FPGA design.

Though this is not the first CASPER packetized correlator, it is to our knowledge the widest bandwidth CASPER correlator deployed as an open facility instrument, further validating CASPER approaches such as the use of packet-switched Ethernet corner turners, and the benefits of open source sharing of technology within the astronomical community.
Acknowledgments
The Submillimeter Array is a joint project between the Smithsonian Astrophysical Observatory and the Academia Sinica Institute of Astronomy and Astrophysics. We are grateful for the hard work and support of numerous SMA staff, who, collectively, made SWARM possible. Development of the VLBI features of SWARM was funded with SAO Internal Research & Development funding, the NSF, and the Gordon and Betty Moore Foundation under GBMF3561. We received generous donations of FPGA chips from Xilinx, Inc. under the Xilinx University Program, also supporting EHT VLBI SWARM features. We acknowledge the EHT for providing the SWARM EHT fringe verification data, and Chunhua Qi for the HCN line and image plot in comet C/2013 X1 (PanSTARRS). SWARM has benefited from technology shared under open source license by CASPER. This research has made use of NASA's Astrophysics Data System. We acknowledge the significance that Mauna Kea has for the indigenous Hawaiian people, and are privileged to be able to locate SWARM at its summit.

References
Escoffier, R. P., Comoretto, G., Webber, J. C., Baudry, A., Broadwell, C. M., Greenberg, J. H., Treacy, R. R., Cais, P., Quertier, B., Camino, P., Bos, A., and Gunst, A. W., A&A, 462, 801 (2007).
Ho, P. T. P., Moran, J. M., and Lo, K. Y., ApJ, 616, L1 (2004).
Jiang, H., Liu, H., Guzzino, K., Kubo, D., Li, C.-T., Chang, R., and Chen, M.-T., PASP, 126, 761-768 (2014).
Johnson, M. D., Fish, V. L., Doeleman, S. S., et al., Science, 350, 1242 (2015).
Parsons, A., Backer, D., Chen, H., Droz, P., Filiba, T., MacMahon, D., Manley, J., McMahon, P., Parsa, A., Siemion, A., Werthimer, D., and Wright, M., PASP, 120, 1207-1221 (2008).
Parsons, A., et al., "A New Approach to Radio Astronomy Signal Processing," URSI GA (2005).
Patel, N. A., Wilson, R. W., Primiani, R. A., Weintroub, J., Test, J., and Young, K. H., Journal of Astronomical Instrumentation, Vol. 3, No. 1 (2014).
Sutton, E. C., Blake, G. A., Masson, C. R., and Phillips, T. G., ApJS, Vol. 58, pp. 341-378 (1985).
Vertatschitsch, L., Primiani, R. A., Young, A., et al., PASP, 127, 1226 (2015).
Whitney, A. R., Kettenis, M., Phillips, C., and Sekido, M., "VLBI Data Interchange Format (VDIF)," Proceedings of the 8th International e-VLBI Workshop, Vol. 42 (2009).
Whitney, A. R., Beaudoin, C. J., Cappallo, R. J., Corey, B. E., Crew, G. B., Doeleman, S. S., Lapsley, D. E., et al., "Demonstration of a 16 Gbps Station-1 Broadband-RF VLBI System," PASP, Vol. 125, No. 924 (2013).
Young, A., Primiani, R. A., Weintroub, J., Moran, J. M., Young, K. H., Blackburn, L., Johnson, M. D., and Wilson, R. W., "Performance Assessment of an Adaptive Beamformer for the Submillimeter Array," IEEE International Symposium on Phased Array Systems & Technology, in press (2016).
Fig. 10. The SMA with two quadrants of SWARM operational observed the forest of lines in Orion BN/KL on 14 August 2016 between 15:20 and 16:50 UT, early morning in Hawai'i. The three panels in this presentation zoom in progressively on smaller regions of the spectrum. The top panel shows 16 GHz, or 8 GHz in each sideband, with the entire band measured instantaneously. The red section is then blown up in the middle panel; this is a single SWARM 2 GHz chunk in a single sideband. The blue section is then shown in the bottom panel, covering about 260 MHz or about 1.6% of the observed bandwidth in a two-quadrant SWARM. The lines in the lowest panel marked "A" are all transitions of CH3OH. A single SMA baseline is shown, with one hour of on-source time. The line identifications are from Sutton et al. (1985).