Optimization of multi-gigabit transceivers for high speed data communication links in HEP Experiments

Abstract

The data acquisition (DAQ) architecture in High Energy Physics (HEP) experiments consists of data transport from the front-end electronics (FEE) of the online detectors to the readout units (RU), which perform online processing of the data, and then to the data storage for offline analysis. With major upgrades of the Large Hadron Collider (LHC) experiments at CERN, the data transmission rates in the DAQ systems are expected to reach a few TB/sec within the next few years. These high rates are normally associated with an increase in high-frequency losses, which lead to distortion in the detected signal and degradation of signal integrity. To address this, we have developed an optimization technique for the multi-gigabit transceiver (MGT) and implemented it on the state-of-the-art 20nm Arria-10 FPGA manufactured by Intel Inc. The setup has been validated for three available high-speed data transmission protocols, namely GBT, TTC-PON and 10 Gbps Ethernet. The improvement in the signal integrity is gauged by two metrics, the Bit Error Rate (BER) and the Eye Diagram. It is observed that the technique improves the signal integrity and reduces the BER. The test results and the improvements in the metrics of signal integrity for different link speeds are presented and discussed.


Shuaib Ahmad Khan (a,∗), Jubin Mitra (a), Tushar Kanti Das (a), Tapan K. Nayak (a,b)
(a) Variable Energy Cyclotron Centre, Homi Bhabha National Institute, Kolkata, India
(b) CERN, CH-1211 Geneva 23, Switzerland

Keywords: HEP, DAQ, Transceiver, FPGA, Signal Integrity

1. Introduction

The major goals of HEP experiments are to probe the fundamental constituents of matter and to understand the nature of the fundamental forces. Advanced research in HEP demands a progressive increase in the collision energies and beam luminosities of the particle accelerators, which are essential for accessing rare probes with extremely low cross sections [1]. The experiments are continuously upgraded with sophisticated detectors, electronics and DAQ systems [2, 3]. The DAQ architectures have been evolving continuously to cope with the demands of the experiments [4, 5]. The LHC at CERN will go through a major upgrade during the long shutdown (LS2) period, following which the beam luminosities will increase by about an order of magnitude from their present values. At the same time, the experiments at the LHC are upgrading the detector and DAQ systems to allow for faster readout of the online data.

∗ Corresponding author. Email address: [email protected] (Shuaib Ahmad Khan)

The DAQ architecture in HEP experiments consists of three general steps: (i) the data from the online detectors are transferred to the FEE through the detector backplane, (ii) the data from the FEE are transferred to the RU [6, 7, 8], and (iii) the processed data are further transferred to data storage. These steps require high-speed data communication links from one step to the next. Most DAQ systems are designed using presently available technology in such a way that they can be easily upgraded to match the requirements of the system. Since one of the major concerns is to acquire data efficiently for all the collisions, error-resilient and efficient data transmission with minimal signal attenuation is required. Signal integrity is essential for proper Clock and Data Recovery (CDR) [6, 9]. It is thus a challenge to minimize the bit error ratio (BER) and improve signal integrity at increased data rates [10].

In this manuscript we address the challenges of the high-frequency losses that arise from the high data rates of the DAQ systems in HEP experiments. Using an FPGA, we present a heuristic optimization technique to tune the parameters of multi-gigabit transceivers for the best performance in the high-speed transmission of data, trigger, timing and slow control information. The proposed technique helps to improve the system performance in terms of signal integrity and is implemented on a state-of-the-art 20nm Intel Arria-10 FPGA [11]. It uses the Intel-Altera on-die instrumentation tools [12] and does not require probing of the FPGA pins or transceiver attributes. The full setup is tested at the link rates of the high-speed communication protocols frequently used for data transmission in these experiments. The technique is useful for on-field system-level debugging, and the parameters can be reconfigured dynamically, allowing the user to configure the transceivers for optimum performance. The robustness of the optimization technique has been tested with the Pseudo Random Binary Sequence 31 (PRBS31) pattern, which represents stressed and transitional data conditions. For the statistical reliability of the tests, a large number of data vectors are acquired. Different performance indicators, such as BER and eye diagrams, have been used to verify the improvement in the quality of the data signal after the proposed optimization technique is applied.

The manuscript is organized as follows. In section 2, we present the data aggregation and processing in HEP experiments. The important constituents of the high-speed DAQ system are discussed in section 3. Details of the transceiver optimization technique with its intricate features are presented in section 4. Section 5 describes the FPGA based test setup, and section 6 discusses the methodology to implement the proposed technique and its advantages. The test results are presented and discussed in section 7. The manuscript is summarised in section 8.

Preprint submitted to Elsevier, January 10, 2019 (arXiv, physics.ins-det).

2. Data aggregation and processing

A generalised architecture for the DAQ scheme of the HEP experiments is presented in Figure 1. The FEE boards are connected to the detectors and are located in the radiation zone close to the detector, which requires custom-built radiation hard electronics. The FEE boards process the analog detector signals and convert them to digital signals. The design and specifications of these boards are unique to the individual detector system [13]. The particle detectors operate in harsh radiation zones and, in some cases, in high magnetic fields. The main data storage units, on the other hand, are kept in low radiation zones. The RUs, which are intermediaries between the FEE and the storage, can be placed either in the radiation zone of the experiment's cavern or in a low radiation zone near the data storage units. In an ideal case, placing the RUs near the detectors in the cavern minimizes the transmission latencies, but it requires custom-built radiation hard electronics, which are difficult to obtain. In order to minimize the effect of radiation, the RUs as well as the trigger system and the back-end computing nodes are kept out of the radiation zone. This makes it possible to use available electronics with high processing power and a large ecosystem, and it eases access and maintenance.

Figure 1: Basic blocks of a typical data acquisition architecture for HEP experiments. The detector and the FEE sit in the radiation zone; the RU, the trigger system and the computing nodes (servers/PCs) sit outside it, connected through data links, trigger links and DAQ links. FEE: Front End Electronics; RU: Readout Unit; T: multi-gigabit transceivers.

The RU acts as an interface between the detector data links, the trigger system, and the links to storage and computing nodes, as shown in Figure 1. The tasks performed by the FPGA based RUs depend on the detector specifications and requirements. The main tasks are data sorting, optical link handling, multiplexing and forwarding of data from different interfacing links, embedding control and trigger information, etc. [14]. These versatile functionalities require the RU to be designed on custom electronics boards with re-programmable functionality [15]. It is based on up-to-date FPGA technology with embedded on-chip transceivers. For our tests we have used the Intel Arria-10 GX FPGA based development board [11, 16]. The interfacing links of the RU and the high-speed communication protocols used for the LHC experiments in the context of the present framework are discussed in the following sections.

3. High-speed protocols

The DAQ architecture in Fig. 1 features three different interfacing links: (i) the Data link, which connects the detector FEE to the RU, (ii) the Trigger link, which connects the RU to the trigger system of the experiment, and (iii) the DAQ link, which takes the data from the RU to the storage and computing nodes. For the data link, the Gigabit Transceiver (GBT) protocol architecture [17], developed at CERN, has been found to be the most suitable. The GBT protocol supports a data transmission rate of 4.8 Gb/sec. It ensures the transmission of data from the FEE near the detectors in the high radiation zone to the RU, which is located near the counting room in a low or no radiation zone. The Trigger link uses the Timing, Trigger and Control system based on Passive Optical Networks (TTC-PON) technology [18] and operates at a rate of 9.6 Gb/sec. It ensures fixed, deterministic latency and satisfies the timing specification of the LHC.

The data packets get time-stamped in the RU. Thus the links from the RU to the computing nodes are not latency critical. It has been found that 10-Gigabit Ethernet [5], the latest promising technology option with an ample ecosystem, is the most suitable for the DAQ links in the experiments. In Table 1, we give the detailed specifications of the three interface links used in the HEP experiments for the acquisition of data.
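As a rough illustration of what these line rates mean for the transceiver, the sketch below converts each rate into the duration of one bit (the unit interval). The function name and the dictionary are illustrative choices, and the Ethernet entry uses the nominal 10 Gb/sec figure quoted above rather than any encoded line rate.

```python
# Bit period (unit interval) for the three link rates quoted in the text.
# The names below are illustrative, not part of any vendor tool.
LINE_RATES_GBPS = {
    "GBT": 4.8,
    "TTC-PON": 9.6,
    "10-Gigabit Ethernet (nominal)": 10.0,
}


def unit_interval_ps(rate_gbps: float) -> float:
    """Return the duration of one bit in picoseconds for a rate in Gb/s."""
    return 1000.0 / rate_gbps


if __name__ == "__main__":
    for name, rate in LINE_RATES_GBPS.items():
        print(f"{name}: {unit_interval_ps(rate):.1f} ps per bit")
```

Doubling the rate from 4.8 to 9.6 Gb/sec halves the time available for the signal to settle within each bit, which is why the losses discussed in the next section matter more at the higher rates.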

4. Transceiver optimization

High-speed data communication suffers from transmission losses and signal integrity issues that are not seen at normal digital signalling levels [10]. The high-frequency content of the signal is degraded by dielectric losses, the skin effect, discontinuities in connectors, reflections caused by vias, inadequately placed traces, etc. We have developed a technique to optimize the transceiver parameters accurately and to offer the best combination for a given high-speed link. This optimization of the transceiver parameters can compensate for the transmission losses [19].

For high-speed transmission channels with multi-gigabit rates, the unit interval (UI) of a data bit decreases. At high transmission rates, the PCB materials suffer from frequency dependent losses and hence become dispersive. This prevents the signal from reaching its full strength within the shrunken UI window, leading to jitter and intersymbol interference (ISI). It also disturbs the deciphering of the signal, and the extraction of the embedded clock becomes difficult at the receiver end.

An increase of the signal strength is an obvious solution to overcome the attenuation. However, the issue of high-frequency roll-off remains, and the pattern dependent jitter is aggravated. Consequently, the signal does not reach its optimal strength within the interval and may diffuse further into the next UI, leading to ISI. Increasing the signal strength also increases the overall power consumption of the transceiver, and the noise levels in the system increase proportionally. All of these lead to deteriorated metrics of signal integrity and a reduced drive length. The effects are even more evident when high-speed interfaces are used with systems that were originally designed for low bandwidth applications.

To overcome these losses, we have developed the transceiver optimization technique and an efficient methodology for the 20nm Arria-10 FPGA. This new FPGA, with its considerably large on-chip resources [11], is ideal for the processing requirements of the experiments. For the optimization, the high-frequency components in the data stream are boosted at every transition, using the digital pre-emphasis taps of the on-chip transceiver. In addition, the low-frequency components are reduced. This technique helps to achieve the same amount of emphasis with less power dissipation. The exaggerations are cancelled by the attenuation during transmission, which allows the signal to be recovered accurately.

[Figure 2 diagram: VOD driver with 2nd pre-tap, 1st pre-tap, 1st post-tap and 2nd post-tap stages (delays Z^-2, Z^-1, Z^+1, Z^+2), each with selectable +/- polarity. Z: operator for the Z-transform.]

Figure 2: Voltage output differential (VOD) and tunable pre-emphasis taps with flexible polarity in the embedded transceiver of the FPGA.

The optimization technique has been implemented on the Intel Arria-10 FPGA development board with an integrated reconfigurable transceiver architecture [11]. It incorporates additional circuitry in the buffers for equalisation and pre-emphasis. The transmitter of the embedded transceiver has five programmable drivers, as shown in Figure 2. The voltage output differential (VOD) controls the base amplitude. The four pre-emphasis taps are the 1st pre-tap, 2nd pre-tap, 1st post-tap and 2nd post-tap. These taps also include polarity settings. The post-taps are the causal taps and the pre-taps are the anti-causal taps. These multiple taps and the choice of polarity can handle the attenuating characteristics of the channel. Equalisation with DC gain and a Variable Gain Amplifier (VGA) is on the receiver side of the transceiver. There are multiple transceiver parameters, each with a large operating range, so scanning the system performance for every combination of the parameters is a time-consuming process. Our goal has been to develop an efficient technique for the optimization of the transceiver parameters such that the signals impacted by the high-frequency losses are recovered.

Table 1: Specifications of three high speed interface links, GBT [17], TTC-PON [18] and 10-Gb Ethernet.

| Parameters | GBT | TTC-PON | 10Gb Ethernet |
|---|---|---|---|
| Technology Specification | Custom | XGPON1 with modifications | 802.3ae Specification Standard |
| Designer Group | CERN | ITU-T with CERN modifications | IEEE |
| Line Rate | | | |
| Payload Rate | | | |
| Payload Size | 120 bits@40 MHz | Downstream: 192 bits@40 MHz; Upstream: 16 bits@40 MHz | 64 bits@156.25 MHz |
| Wavelength (nm) | 850 nm (Multi-mode); 1310 nm (Single-mode) | Downstream: 1577 nm; Upstream: 1270 nm | 850 nm (10GBASE-SR) |
| Network Topology | Point-to-Point | Point-to-Multipoint | Point-to-Point |
| Encoding | RS ECC with Block Interleaver | 8b/10b | 64b/66b |
| Synchronous Trigger Support | Yes | Yes | No |
| Trigger Latency | 150 ns (Optical loop-back) | 100 ns Downstream; 1.6 us Upstream | NA |

The pre-emphasis works like a Finite Impulse Response (FIR) filter with different delays, referred to as the taps, as shown in Figure 2. An FIR filter is based on a feed-forward difference equation. The pre-emphasis technique applies a delay to the signal and adds it back to the real signal with a weight and an inversion as and when required. However, depending on the peculiarities of the transmission channel, a single delay, weight and inversion may not be able to provide the required compensation. For this reason, different delays, weights and polarities are combined. In this configuration, the pre-emphasis 1st post-tap is the most useful parameter. It emphasises the bit period immediately after the transition. The generation of the differential emphasised signal by applying the unit delay of the first post-tap is shown in Figure 3, assuming a fixed VOD and a tap weight 0 < x < 1. The original positive signal Vp(T) is compared with Vp(T-1), which is the unit-delayed signal. The emphasised positive signal is the difference Vp(T) - x*Vp(T-1) between the original signal and the weighted, delayed signal. The negative signal is generated similarly. The pre-emphasised differential signal is the difference between the emphasised positive and negative signals. The effect of the 2nd post-tap after the transition, depending on the chosen polarity setting, is shown in Figure 4.

The pre-tap reduces the effect of pre-cursor ISI. Figure 5 shows the impact of the 1st pre-tap and the 2nd pre-tap on the single and double bit period, respectively, before the occurrence of a high-frequency transition, depending on the polarity. Pre-cursor ISI and post-cursor ISI are handled by the anti-causal and causal taps, respectively. However, pre-emphasis alone cannot guarantee the performance of the system, as it is implemented at the transmitter by pre-conditioning the signal before it is fed to the channel. There are high-frequency losses in the transmission channel itself. Hence equalisation is required at the receiver end. It compensates for the low-pass characteristics of the physical medium and amplifies the attenuated high-frequency components of the incoming signal. An equalizer on the receiver side lifts the contents inside a band of frequencies and attenuates the rest. The DC gain circuitry gives uniform amplification to the received spectrum. It enables the transceivers to operate over longer distances. The VGA on the receiver optimizes the signal amplitude before the CDR sampling.

[Figure 3 waveforms: original positive signal Vp(T); unit-delayed, weighted tap x*Vp(T-1); pre-emphasized positive signal Vp(T) - x*Vp(T-1); original negative signal Vn(T); pre-emphasized negative signal Vn(T) - x*Vn(T-1); pre-emphasized differential signal.]
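The 1st post-tap relation described above, Vp(T) - x*Vp(T-1), can be sketched in a few lines. This is a minimal model under stated assumptions: the bit stream is represented as ideal levels of +VOD and -VOD, the negative leg is the mirror image of the positive leg, and the first sample reuses its own value as its "previous" bit. The function name and arguments are illustrative and do not correspond to any transceiver register.

```python
def first_post_tap(bits, x, vod=1.0):
    """Differential output of a 1st post-tap pre-emphasis stage.

    bits: sequence of 0/1 values; x: tap weight with 0 < x < 1;
    vod: base amplitude. All names are illustrative assumptions.
    """
    if not 0.0 < x < 1.0:
        raise ValueError("tap weight x must satisfy 0 < x < 1")
    vp = [vod if b else -vod for b in bits]   # positive leg Vp(T)
    vn = [-v for v in vp]                     # negative leg Vn(T)

    def emphasise(v):
        # V(T) - x * V(T-1); the first sample holds its own level.
        return [v[t] - x * v[max(t - 1, 0)] for t in range(len(v))]

    pos = emphasise(vp)
    neg = emphasise(vn)
    # Differential signal: Vp(T) - x*Vp(T-1) - Vn(T) + x*Vn(T-1)
    return [p - n for p, n in zip(pos, neg)]


if __name__ == "__main__":
    print(first_post_tap([0, 0, 1, 1, 1, 0, 0], x=0.25))
```

In this model the bit right after each transition is driven harder than the base amplitude, while repeated bits are driven more weakly, which matches the boost of high-frequency content and the reduction of low-frequency content described in the text.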

Figure 3: The pre-emphasis signal generation technique at the 1st post-tap in embedded FPGA transceivers, 0 < x < 1.

[Figure 4 panels: signal with and without pre-emphasis. 2nd post-tap: Vp(T) - x*Vp(T-2) - Vn(T) + x*Vn(T-2). 1st post-tap: Vp(T) - x*Vp(T-1) - Vn(T) + x*Vn(T-1).]

Figure 4: Pre-emphasis 2nd post-tap (Inverted) compared with pre-emphasis 1st post-tap and their effect on the signal without pre-emphasis.

To achieve optimal signal integrity performance, the transmitter and receiver parameters of the transceiver on the FPGA chip augment each other and work together to compensate for the high-frequency losses. However, overcompensation degrades the signal quality and adds more jitter, which closes the eye diagram and makes it impossible for the receiver to identify the signal; it should therefore be avoided.

[Figure 5 panels: signal with and without pre-emphasis. 1st pre-tap: Vp(T) + x*Vp(T+1) - Vn(T) - x*Vn(T+1). 2nd pre-tap: Vp(T) + x*Vp(T+2) - Vn(T) - x*Vn(T+2).]

Figure 5: Pre-emphasis 1st pre-tap and the 2nd pre-tap (Inverted) and their effect on the signal without pre-emphasis.

5. Test setup

An FPGA based setup has been developed to test the effectiveness of the proposed optimization technique. The transceiver is tested for the high-speed links under stressed conditions. The setup has been used to emulate stressed high-speed link conditions and to investigate the high-frequency losses in the transmission. It determines the capability of the transceiver system to recover the data from degraded signals. Tests are performed at the system level to operate the setup at a prescribed BER equal to or better than $10^{-12}$, as per the IEEE standard.

The test setup, shown in Fig. 6, uses the Arria-10 FPGA development board (10AX115S2F45I1SG device) for the implementation and testing of the optimization technique. The FPGA development card is installed in a PCIe 16-lane slot of the server, and its power is obtained from the server motherboard. The functions and specifications of each of the components of the setup are given in Table 2.

The Intel Quartus-II platform is the firmware development package used to implement the FPGA logic design. The transmission links at the specified data rates are implemented using the Quartus-II Qsys tool. Qsys is Intel's system integration tool for the quick generation of the interconnect logic. The signal integrity of the transceiver links is validated using the Transceiver Toolkit (TTK) feature of Quartus-II, which has a GUI. The TTK is used to quickly access, tune and test the transceiver parameter settings at runtime through a combination of metrics. The TTK enables us to measure the BER and the eye diagrams and also to verify the signal integrity in external loopback mode. Details of the firmware tools, such as Quartus II, Qsys, TTK, PRBS patterns and auto-sweep features, may be found in reference [12].

[Figure 6 diagram: server motherboard with a PCIe x16 Gen 3 slot; FPGA board with the Arria 10 FPGA (user loopback logic on silicon) and PCIe connector; externally pluggable SFP+ module (Tx/Rx); optical loopback through the variable optical attenuator. Notes: optical power meter for optical power measurement (InGaAs detector, range (-70 dBm), resolution 0.01 dBm); Lucent to Ferrule (LC to FC) connector to couple the optical fibre to the power meter (50/125 um hybrid connector).]

Figure 6: Arria-10 FPGA card inserted in the PCIe x16 slot of the server. The optical signal from the externally pluggable SFP+ is looped back via the fibre equipped with the variable optical attenuator (VOA).

Table 2: Components used in the test setup, their role and specifications.

| Component | Role in test setup | Specification |
|---|---|---|
| FPGA Test Board | Integrated FPGA based design environment with embedded transceivers on silicon. PCIe connection. Slot for hot pluggable transceiver optical modules. Other accessories. | Intel Arria 10 FPGA (20nm mid-range). Transceivers up to 17.4 Gbps [11]. |
| Variable Optical Attenuator (VOA) with optical fibre | Optical power attenuation in the fibre loopback path. | Range 0 to 60 dB, accuracy +/- 0.8 dB. Fibre (850 nm): multimode 50/125 um. |
| SFP+ module | External transceiver modules to be coupled to the fibre. Laser at the transmitter and PIN diodes at the receiver ends. | Hot-pluggable footprint, up to 10 Gbps, 850 nm VCSEL laser, duplex LC connector. Link length of 300 m [20]. |
| Workstation with FPGA design platforms | FPGA board powered through the PCIe Gen3 x16 slot. Compile and generate the FPGA design with firmware development software. | PCIe Gen3 x16 slots available. Quartus-II platform installed for firmware design and generation. FPGA programmed through USB Blaster download cable. |

[Figure 7 diagram: connection layout in Qsys (Platform Designer) on the FPGA silicon: data generator (PRBS pattern), transmitter, optical link, receiver, data receiver (PRBS pattern check), with a feedback clock from core_clk_out to core_clk_in.]

Figure 7: Typical BER test loopback logic on FPGA using the Qsys tool. The serialised data is transmitted, looped back and checked for flipped bits at the receiver.

For the data loopback tests [21], a multimode optical fibre equipped with a Variable Optical Attenuator (VOA) and external pluggable SFP+ modules are used. The transmitter output is looped back to the receiving end. The received data is then verified by the data checker logic on the FPGA for any erroneous bits, as shown in Figure 7. A variety of data patterns can be used to test the signal integrity. However, in each case, a checker must be available for verification. PRBS patterns are injected into the test system because they provide long, stressing patterns with little memory consumption [22]. Another advantage of using PRBS patterns for the tests is that boundary synchronisation is not necessary at the physical layer, as the patterns are time correlated. The Intel soft logic cores are used for the PRBS data pattern generator and checker [12].

The BER measurement approach was chosen with respect to the controlled, attenuated optical power at the receiver, set with the help of the VOA. It allowed us to rapidly characterise the transceiver sensitivity, below which the embedded clock cannot be recovered from the data stream and loss of lock occurs [19]. It also determines the minimum optical power required to achieve the targeted BER for a system operating at a specified data rate. The auto-sweep feature of the TTK is used to obtain the best-performing settings of the transceiver parameters for a specified BER. This optimized set of transceiver parameters delivers the best metrics of signal integrity, including the height and width of the eye diagram. In the next section, we elaborate the methodology for the optimization of high data rate on-chip transceivers to reduce the effect of high-frequency losses.
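The generator and checker pair used in the loopback test can be sketched in software with a linear feedback shift register. The sketch below is an illustration only and is not the Intel soft core: the feedback stages (7, 6) for PRBS7 and (31, 28) for PRBS31 are the commonly used choices and are an assumption here, as are the function names.

```python
def prbs(order, tap, nbits, seed=1):
    """Generate nbits from an LFSR fed back from stages `order` and `tap`.

    order=7, tap=6 gives a PRBS7 sequence; order=31, tap=28 gives PRBS31.
    These stage choices are assumed, not taken from the test firmware.
    """
    mask = (1 << order) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("seed must be non-zero")
    out = []
    for _ in range(nbits):
        new_bit = ((state >> (order - 1)) ^ (state >> (tap - 1))) & 1
        state = ((state << 1) | new_bit) & mask
        out.append(new_bit)
    return out


def count_errors(sent, received):
    """Count flipped bits between the transmitted and looped-back data."""
    return sum(s != r for s, r in zip(sent, received))


if __name__ == "__main__":
    tx = prbs(7, 6, 1000)
    rx = list(tx)
    rx[10] ^= 1          # inject one bit error in the loopback path
    print(count_errors(tx, rx), "error(s) in", len(tx), "bits")
```

Only the shift register state has to be stored, which is why long patterns such as PRBS31 cost very little memory; the checker can regenerate the expected sequence in the same way.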

6. Methodology

The methodology to extract the optimized settings of the transceiver parameters is explained in the flowchart in Figure 8. To start with, the optimization process scans the full range of each transceiver parameter using the TTK auto-sweep feature while the rest of the parameters are set at their Intel-default values. It then records the best-performing setting for each transceiver parameter, as indicated by the eye parameters. At this point a Solution Matrix (S) for the Nth iteration is built from these values, with N set to 1. The parameters are grouped into the transmitter parameters (VOD and pre-emphasis: 1st pre-tap, 2nd pre-tap, 1st post-tap, 2nd post-tap) and the receiver parameters (DC gain, equalisation control, VGA). The transmitter and receiver parameters are then scanned again, separately, in the range -3 ≤ S ≤ 3, while the receiver and transmitter parameters, respectively, are held at the values listed in S. The best-performing cases are recorded again, S is updated with the newer values, and N is incremented by 1. The latest matrix values are assigned to the TTK and the loopback test is run. If this does not improve the metrics of signal integrity (eye diagram and BER) over those obtained at the Intel default values, the optimization loop is repeated with the adjusted S values in the defined range until an improvement in both the eye diagram and the BER is achieved.

The parameters cannot be declared optimized until a stage of degradation of the signal integrity metrics from their peak values is observed. The degradation of the metrics denotes over-compensation, and it marks the transition past the maxima of the transceiver parameters. S is then updated with the values of the best-performing case, rejecting the over-compensated value set. The final S values with the best-performing metrics are known as the Solution Space [19]. The deduced final values are fed to the transceiver for further analysis. The results are presented and discussed in the next section.

The proposed technique has definite advantages over the traditional method, in which the transceiver optimization is carried out in an extremely time-consuming way by evaluating the signal integrity for a large number of permutations and combinations of the parameters. The parameters and their possible ranges are listed in Table 3.
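The loop described in this section can be outlined as follows. This is a sketch under several assumptions rather than the procedure run in the TTK: `measure` is a hypothetical callback that returns a single figure of merit (higher is better) for a dictionary of settings, the range -3 ≤ S ≤ 3 is read as a window of three steps either side of each entry of S, each parameter in a group is scanned one at a time, and the loop stops as soon as a new matrix fails to beat the previous one.

```python
def sweep(measure, settings, name, candidates):
    """Return the value of `name` with the highest score, others fixed."""
    best_value, best_score = settings[name], measure(settings)
    for value in candidates:
        score = measure({**settings, name: value})
        if score > best_score:
            best_value, best_score = value, score
    return best_value


def optimize(measure, defaults, ranges, tx_names, rx_names,
             window=3, max_iter=20):
    """Hypothetical outline of the Solution Matrix iteration."""
    # First pass: full-range sweep of each parameter, others at default.
    solution = {
        name: sweep(measure, defaults, name, range(lo, hi + 1))
        for name, (lo, hi) in ranges.items()
    }
    best, best_score = dict(defaults), measure(defaults)
    previous_score = None
    for _ in range(max_iter):
        score = measure(solution)
        if score > best_score:
            best, best_score = dict(solution), score
        if previous_score is not None and score <= previous_score:
            break  # no further gain: keep the best matrix seen so far
        previous_score = score
        # Refine the transmitter group, then the receiver group,
        # within +/- window steps of the current matrix entries.
        for group in (tx_names, rx_names):
            for name in group:
                lo, hi = ranges[name]
                centre = solution[name]
                nearby = range(max(lo, centre - window),
                               min(hi, centre + window) + 1)
                solution[name] = sweep(measure, solution, name, nearby)
    return best, best_score
```

In a real setup `measure` would wrap a loopback run and combine eye width, eye height and BER; how those are combined into one number is left open here.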

Table 3: Transceiver parameters and range of operations for the manual optimization.

| Transceiver parameter | Range of possible values | Number of iterations required |
|---|---|---|
| Transmitter side | | |
| VOD | | |
| Pre-emphasis 1st post-tap | -31 to 31 | 63 |
| Pre-emphasis 1st pre-tap | -31 to 31 | 63 |
| Pre-emphasis 2nd post-tap | -15 to 15 | 31 |
| Pre-emphasis 2nd pre-tap | -7 to 7 | 15 |
| Receiver side | | |
| DC gain | | |
| Equalisation | | |
| VGA | | |
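To give a sense of scale for the manual approach, the short calculation below uses only the four pre-emphasis tap ranges that are legible in Table 3. It compares an exhaustive scan of every tap combination with a scan of one tap at a time; the remaining parameters would multiply the exhaustive count further.

```python
import math

# Number of settings per pre-emphasis tap, taken from Table 3.
TAP_SETTINGS = {
    "1st post-tap": 63,   # -31 to 31
    "1st pre-tap": 63,    # -31 to 31
    "2nd post-tap": 31,   # -15 to 15
    "2nd pre-tap": 15,    # -7 to 7
}

exhaustive = math.prod(TAP_SETTINGS.values())   # every combination
one_at_a_time = sum(TAP_SETTINGS.values())      # each tap swept alone

if __name__ == "__main__":
    print(f"exhaustive scan: {exhaustive} measurements")
    print(f"per-parameter sweeps: {one_at_a_time} measurements")
```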

7. Results and discussion

Results are demonstrated and validated for three different high-speed optical links: 10 Gbps links, the 4.8 Gbps GBT protocol and 9.6 Gbps TTC-PON. The test system challenges the lock and hold capability of the CDR circuit, exercises all the conceivable instances of ISI and analyses the receiver sensitivity for any probable drifts. Drifts at the receiver are caused by long imbalanced runs in the data transition pattern. The PRBS31 pattern (period $2^{31} - 1$ bits) is used because its different bit combinations induce dissimilar ISI configurations. It is required to stress the transceivers, to test any innate ISI in a transmitter, and to assess the quality of transmission. PRBS patterns have a white spectrum in the frequency domain and are injected to test the robustness of the high-speed links. For the entire analysis, PRBS31 is used to stress the system. However, the variation of the eye diagram and BER characteristics is also studied for PRBS7, PRBS9, PRBS15 and PRBS23 in addition to PRBS31.

[Figure 8 flowchart. Recoverable steps: load the developed TTK design for the specified data rate on the FPGA; select the desired PRBS pattern and the external loopback mode; start the data transmission with the TTK parameters at the Intel-Altera defaults, set N = 1 and record the eye width/height and BER; attenuate the received optical signal in steps using the VOA, recording BER versus dBm at each step until the signal is attenuated beyond the receiver CDR limit, then plot BER versus power at the Intel default and reduce the attenuation to zero; scan each transceiver parameter over its full range with the auto-sweep feature and build S.Matrix(N) from the best-performing values; group the transmitter and receiver parameters and scan each group in the range -3 <= S.Matrix(N) <= 3 while the other group is held at S.Matrix(N); set N = N + 1, assign S.Matrix(N) to the TTK, run the loopback transmission and record the eye width/height and BER; repeat, tuning S.Matrix, until the metrics degrade (over-compensation); reject the over-compensated matrix and take S.Matrix(N) as the final Solution Space; repeat the VOA attenuation run and plot BER versus power at the Solution Space; plot the Intel default settings and the Solution Space on a multivariate spider chart.]

Figure 8: Stepwise flow diagram for the transceiver optimization. Data transmission is started with the Intel default parameters and a Solution Matrix is derived to achieve the optimized signal integrity.

7.1. Eye Diagram analysis

At system startup, the transceiver parameters in the TTK are set at the default values. As the first set of analyses, changes in the eye diagram are compared for different stressed PRBS patterns. The eye height and width are plotted on a three-axis plot with the PRBS pattern on the third axis, as shown in Figure 9. It is found that PRBS31 has the most stressed eye metrics and, as anticipated, a more closed eye is observed for all three link speeds.

[Figure 9 panels: 10 Gbps, GBT 4.8 Gbps and TTC-PON 9.6 Gbps. Axes: eye width, eye height and PRBS pattern (PRBS7, PRBS9, PRBS15, PRBS23, PRBS31).]

Figure 9: Changes in the eye height and eye width with PRBS variation for optical links at three line rates.

Another important metric of signal integrity is the BER. Its measurement is statistical in nature, and the estimate is exact only if the number of tested bits tends to infinity, which is not possible in a real lab test setup. Hence, a method was proposed in reference [23] to limit the stressing time of a system to a feasible length while still measuring the BER with a high confidence level (CL). The CL quantifies the quality of the estimate as a percentage: it is the probability that the system's actual error probability is less than the specified limit. The minimum number of bits that must be tested for a BER measurement with a specific associated CL is given in equation 1:

$$ n = -\frac{\ln(1 - CL)}{BER} + \frac{\ln\left(\sum_{k=0}^{N} \frac{(n \cdot BER)^{k}}{k!}\right)}{BER}, \qquad T = \frac{n}{R} \tag{1} $$

where T is the test time needed and R is the line rate. When N = 0,

$$ n = -\frac{\ln(1 - CL)}{BER} \tag{2} $$

Here n is the total number of bits transmitted and N is the number of errors that occurred during the transmission. There is a compromise between testing time and the required accuracy of the measurement, as shown in equation 1.

For the 95 percent CL, equation 2 reduces to $n \simeq 3/BER$. Hence, to achieve a BER of $10^{-12}$ at 95 percent CL, a total of $3 \times 10^{12}$ bits need to be tested, as a rule of thumb.

The concept is further extended to find the minimum inspection time required to measure a BER of $10^{-12}$ for different CL with no errors for the GBT, TTC-PON and 10 Gbps links, as shown in Figure 10. In this paper, all the BER measurements are done for $3 \times 10^{12}$ bits to achieve 95 percent CL.

[Figure 10 plot: test time needed (secs) versus line rate (Gbps) for CL = 0.90, 0.95 and 0.99. Marked line rates: TTC-PON upstream 2.4 Gbps, GBT 4.8 Gbps, TTC-PON downstream 9.6 Gbps, 10G 10.3125 Gbps.]

Figure 10: Time to achieve BER of $10^{-12}$ for the line rates of the GBT, TTC-PON and 10 Gbps optical links at different CL.

Variation of BER at the Intel-default transceiver set is recorded with respect to the attenuation of the received optical power, following the methodology flowchart shown in Figure 8. This test is executed with the help of a VOA attached to the loopback fibre. BER variation is recorded for different PRBS patterns and plotted for the links operating at 10 Gbps, 4.8 Gbps and 9.6 Gbps, as shown in Figure 11.

[Figure 11 plots: BER (log scale) versus received optical power (dBm), with the goodness of fit ($R^2$) for each pattern.
10 Gbps: PRBS7 0.99, PRBS9 0.98, PRBS15 0.98, PRBS23 0.99, PRBS31 0.99
GBT 4.8 Gbps: PRBS7 0.98, PRBS9 0.90, PRBS15 0.98, PRBS23 0.99, PRBS31 0.96
TTC-PON 9.6 Gbps: PRBS7 0.99, PRBS9 0.99, PRBS15 0.99, PRBS23 0.98, PRBS31 0.96]

Figure 11: BER versus received optical power (dBm) for the transceiver at Intel FPGA default settings for different PRBS patterns at three line rates.

Exponential curve fitting is the best-suited approximation for the BER in the logarithmic domain [24]. A double-exponential fit function with constants is used to fit the BER data, as it provides close fits in a variety of BER plot situations. It fits the BER data using unconstrained nonlinear optimization [25]. The goodness-of-fit statistic, R-Square ($R^2$), for the different PRBS patterns is marked in Figure 11.

The test shown in Figure 11 highlights that, at a specified CL and a given received optical power, more errors are received in the transmission system when PRBS31 is injected as the test data pattern than with the other PRBS patterns. The outcome of the tests shown in Figure 9 and Figure 11 reveals the degradation of the metrics of signal integrity as the size of a unique word of data in the PRBS sequence increases. The results from these tests are as anticipated and well substantiated. They further strengthen the case for PRBS31 as a demanding test pattern for validating the proposed methodology. However, there is a crossover point for 4.8 Gbps at BER $\sim 10^{-10}$. It is kept beyond the discussion, as our region of interest is better by two orders of magnitude, which is BER $\sim 10^{-12}$.

The improvement in the system performance is marked by two metrics of signal integrity, viz. the BER and the eye diagram. The eye contour for the Intel-default settings and for the deduced optimized settings of the transceiver is captured using EyeQ (a GUI feature of TTK).
It helps to estimate and visualize the vertical and horizontal eye opening at the receiver, as shown in Figure 12. After the deduced transceiver parameter settings are applied using the proposed technique, there is a notable enhancement in the width (Horizontal Phase Step) and height (Vertical Step) of the eye diagram. Hence the quality of signal transmission is improved.

The optimized values of the transceiver parameters, known as the solution space, found from the proposed methodology for the targeted BER of $10^{-12}$, are plotted against the Intel-default set in the form of a multivariate kiviat diagram for all three link speeds in Figure 13. This allows a clear comparison of the individual parameters on each axis.

Variation in BER is plotted for the deduced solution space values of a transceiver and for the Intel-default set against different attenuation levels of input optical power at the receiver. It is shown for PRBS31 for all three links under observation in Figure 14. Further analysing the results from Figure 14, the least optical power required at the receiver to attain a preferred BER or better can be determined from the curve. It also shows that a specific marked BER is achieved at a lower optical power when the transceiver is operated at the deduced parameter values listed in the solution space than at the Intel-default set.
Taking the targeted BER of $10^{-12}$ for the optical link test as per IEEE standards as a particular case, it is achieved at lower values of the optical power with the optimized settings, and the improvement at this BER is listed quantitatively in Table 4 for the three link speeds.

[Figure 12 panels, eye opening as vertical step / horizontal phase step:
10 Gbps: 19 / 41 at the Intel FPGA default settings, 49 / 54 at the optimized FPGA settings
4.8 Gbps: 28 / 59 at the Intel FPGA default settings, 63 / 63 at the optimized FPGA settings
9.6 Gbps: 18 / 43 at the Intel FPGA default settings, 41 / 50 at the optimized FPGA settings]

Figure 12: Eye diagram at the Intel FPGA default and at the optimized settings of the transceiver.

Table 4: Comparison of optical power (dBm) to attain a BER of $10^{-12}$ for the three high speed interface links.

Protocol        With default      With optimization   Difference   Improvement
                approach (dBm)    technique (dBm)     (dBm)        (%)
10Gb Ethernet   -9.2              -10.35              -1.15        12.5
GBT             -11.9             -12.7               -0.8         6.7
TTC-PON         -6.45             -9.3                -2.85        44.1

[Figure 13 plots: one kiviat diagram each for the 10 Gbps, GBT 4.8 Gbps and TTC-PON 9.6 Gbps line rates, comparing the solution space with the Intel-default set. Axes: VOD Control, Pre-emphasis 1st Post-Tap, Pre-emphasis 1st Pre-Tap, Pre-emphasis 2nd Post-Tap, Pre-emphasis 2nd Pre-Tap, DC Gain, Equalization Control, VGA, Eye Height, Eye Width.]

Figure 13: Multivariate kiviat diagram showing the solution space and the Intel FPGA default values for three different link rates.

[Figure 14 plots: BER (log scale) versus received optical power (dBm), with the goodness of fit ($R^2$) for each curve.
10 Gbps line rate: default settings 0.99, optimized settings 0.99
GBT 4.8 Gbps line rate: default settings 0.99, optimized settings 0.98
TTC-PON 9.6 Gbps line rate: default settings 0.96, optimized settings 0.99]

Figure 14: Comparison of BER versus the received optical power for default and optimized transceiver settings separately for three line rates.

Another clear observation that emerges from the data comparison of Figure 14 is that the receiver sensitivity, below which the loss of lock occurs, is enhanced due to the reduction in the high-frequency losses with the application of the proposed optimization technique. This lowers the limit of the optical power required for proper CDR, and the signal is traceable for comparatively lower values of the received optical power. The quantitative comparisons are given in Table 5.

Table 5: Comparison of optical power for CDR for the three high speed interface links.

Protocol        With default       With optimization   Difference   Improvement
                parameters (dBm)   technique (dBm)     (dBm)        (%)
10Gb Ethernet   -14.4              -15                 -0.6         4.17
GBT             -15.34             -16.04              -0.7         4.56
TTC-PON         -11.78             -13.2               -1.42        12.05

The test results shown in Figures 13 and 14 confirm that the effect of high-frequency losses on the link performance is controlled. This is achieved after the deduced solution space values are applied to the TTK, and a significant improvement in the BER is noted at a particular received optical power. The tests and results validate the usefulness of the proposed technique to enhance the transceiver performance and the signal integrity by compensating for the high-frequency losses.
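The difference and improvement columns of Tables 4 and 5 can be checked with a few lines of arithmetic. The sketch below assumes that the percentage is the magnitude of the dBm difference relative to the default dBm value; this reading is inferred from the tabulated numbers and is not stated explicitly in the text. The linear power ratio 10^(difference/10) is added here only as the standard dB-to-ratio conversion, and the names are illustrative.

```python
# (default dBm, optimized dBm) pairs copied from Tables 4 and 5.
TABLE_4_BER_TARGET = {
    "10Gb Ethernet": (-9.2, -10.35),
    "GBT": (-11.9, -12.7),
    "TTC-PON": (-6.45, -9.3),
}
TABLE_5_CDR_LIMIT = {
    "10Gb Ethernet": (-14.4, -15.0),
    "GBT": (-15.34, -16.04),
    "TTC-PON": (-11.78, -13.2),
}


def improvement(default_dbm, optimized_dbm):
    """Return (difference in dB, percentage of default dBm, linear power ratio)."""
    difference = optimized_dbm - default_dbm
    # Assumed definition of the tabulated percentage (inferred, see lead-in).
    percent = abs(difference) / abs(default_dbm) * 100.0
    # Standard conversion of a dB difference to a ratio of optical powers.
    linear_ratio = 10.0 ** (difference / 10.0)
    return difference, percent, linear_ratio


if __name__ == "__main__":
    for title, table in (("Table 4", TABLE_4_BER_TARGET), ("Table 5", TABLE_5_CDR_LIMIT)):
        print(title)
        for protocol, (default_dbm, optimized_dbm) in table.items():
            diff, pct, ratio = improvement(default_dbm, optimized_dbm)
            print(f"  {protocol:<14s} diff={diff:6.2f} dB  {pct:5.2f} %  power ratio={ratio:.3f}")
```

Under this reading the recomputed percentages agree with the tabulated values to within rounding. The linear ratio gives a complementary view: a difference of -1.15 dB means the optimized link needs roughly 77 percent of the optical power that the default settings require.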

8. Summary

We have presented a novel transceiver optimization technique to reduce the high-frequency losses which occur due to the increased rates of data transmission in HEP experiments. The technique has been implemented on the latest 20nm Intel-Altera Arria-10 FPGA. The scheme has been tested and validated for the link rates of three high-speed communication protocols, GBT, TTC-PON and 10 Gbps Ethernet, which are most commonly used for interfacing the detector front-end electronics, trigger and DAQ systems. The proposed scheme is an optimized approach which reduces the number of iterations required.

The tests are performed with the PRBS31 pattern at a confidence level of 95 percent. There is a considerable gain in the system performance with the application of the proposed technique, as indicated by the two metrics of signal integrity, the BER and the eye diagram. The Intel FPGA default parameters and the solution space values are marked on the kiviat diagram for a fast comparison between the parameters. The results indicate that, to attain the marked BER of $10^{-12}$, the required optical power is reduced by 12.5%, 6.7% and 44.1% for 10 Gbps, GBT and TTC-PON respectively. The BER is also improved over the received range of optical power. The CDR capability of the system is also enhanced, as the least optical power required to recover the data traffic is reduced by 4.17%, 4.56% and 12.05% for 10 Gbps, GBT and TTC-PON respectively. The technique improves the signal integrity and reduces the BER. This technique is a heuristic solution and has potential for practical applications, as it provides rapid convergence of the solution space to achieve optimized transceiver settings, which makes its implementation time efficient. This transceiver optimization technique and its implementation approach would lend themselves well to users of other FPGAs that allow on-chip assessment of signal quality, such as the eye diagram.

Acknowledgement

The authors gratefully acknowledge the support of the ALICE Collaboration at CERN during the period of the research work. We thank Alex Kluge, Tivadar Kiss and Erno David of the ALICE Electronics coordination and the CRU project for their valuable help and advice. We thank Subhasis Chattopadhyay, Anurag Misra and Saurabh Srivastava for fruitful suggestions during the preparation of the manuscript.

References

[1] D. E. Morrissey, T. Plehn, T. M. Tait, Physics searches at the LHC, Physics Reports 515 (1-2) (2012) 1–113.
[2] W. K. Panofsky, Evolution of particle accelerators, SLAC Beam Line 27 (1997) 36–44.
[3] W. Smith, Trigger and data acquisition for hadron colliders at the energy frontier (2013), arXiv preprint arXiv:1307.0706.
[4] S. A. Khan, J. Mitra, E. David, T. Kiss, T. K. Nayak, A potent approach for the development of FPGA based DAQ system for HEP experiments, Journal of Instrumentation 12 (2017) T10010.
[5] J. Toledo, F. Mora, H. Müller, Past, present and future of data acquisition systems in high energy physics experiments, Microprocessors and Microsystems 27 (2003) 353–358.
[6] J. Mitra, S. A. Khan, et al., Common readout unit (CRU) - a new readout architecture for the ALICE experiment, Journal of Instrumentation 11 (2016) C03021.
[7] Gutiérrez, et al., The ALICE TPC readout control unit, in: Nuclear Science Symposium Conference Record, IEEE, Vol. 1, 2005, p. 575.
[8] S. A. Khan, J. Mitra, T. K. Nayak, Development of a high speed data acquisition system for the detectors at high luminosity LHC, in: Proceedings of the XXII DAE High Energy Physics Symposium, Springer, 2018, p. 223.
[9] B. Razavi, Challenges in the design of high-speed clock and data recovery circuits, IEEE Communications Magazine 40 (2002) 94–101.
[10] S. H. Hall, H. L. Heck, Advanced signal integrity for high-speed digital designs, John Wiley & Sons, 2011.
[11] I. Altera, Intel Arria 10 Device Overview (2018).
[12] I. Altera, Quartus Prime Standard Edition Handbook Volume 1: Design and Synthesis (2017).
[13] G. F. Knoll, Radiation detection and measurement, John Wiley & Sons, 2010.
[14] L. Li, A. M. Wyrwicz, Parallel 2D FFT implementation on FPGA suitable for real-time MR image processing, Review of Scientific Instruments 89 (9) (2018) 093706.
[15] S. G. Castillo, K. B. Ozanyan, Field-programmable data acquisition and processing channel for optical tomography systems, Review of Scientific Instruments 76 (9) (2005) 095109.
[16] Intel Corporation, Arria 10 FPGA Development Kit User Guide (2017).
[17] M. B. Marin, S. Baron, et al., The GBT-FPGA core: features and challenges, Journal of Instrumentation 10 (2015) C03021.
[18] E. Mendes, S. Baron, D. Kolotouros, C. Soos, F. Vasey, The 10G TTC-PON: challenges, solutions and performance, Journal of Instrumentation 12 (2017) C02041.
[19] I. Altera, High-Speed Link Tuning Using Signal Conditioning Circuitry in Stratix V Transceivers (2015).
[20] S. Committee, et al., SFF-8431 Specifications for Enhanced Small Form Factor Pluggable Module SFP+, revision 4.1, July 6, 2009.
[21] S. I. Green, Multichannel bit error rate tester for fiber optic transceiver testing, Review of Scientific Instruments 73 (8) (2002) 3125–3127.
[22] H. Badaoui, Y. Frignac, P. Ramantanis, B. E. Benkelfat, M. Feham, PRQS Sequences Characteristics Analysis by Auto-correlation Function and Statistical Properties, IJCSI (2010) 39.
[23] D. Mitić, A. Lebl, Ž. Markov, Calculating the required number of bits in the function of confidence level and error probability estimation, Serbian Journal of Electrical Engineering 9 (2012) 361–375.
[24] L. J. Ippolito, Appendix B: Error functions and bit error rate, Satellite Communications Systems Engineering: Atmospheric Effects, Satellite Link Design and System Performance 363–366.
[25] S. Chapra, R. P. Canale, Numerical methods for engineers: with personal computer applications.