Optimization of multi-gigabit transceivers for high speed data communication links in HEP Experiments
Shuaib Ahmad Khan, Jubin Mitra, Tushar Kanti Das, Tapan K. Nayak
Shuaib Ahmad Khan a,∗, Jubin Mitra a, Tushar Kanti Das a, Tapan K. Nayak a,b
a Variable Energy Cyclotron Centre, Homi Bhabha National Institute, Kolkata, India
b CERN, CH-1211 Geneva 23, Switzerland
Abstract
The scheme of the data acquisition (DAQ) architecture in High Energy Physics (HEP) experiments consists of data transport from the front-end electronics (FEE) of the online detectors to the readout units (RU), which perform online processing of the data, and then to the data storage for offline analysis. With major upgrades of the Large Hadron Collider (LHC) experiments at CERN, the data transmission rates in the DAQ systems are expected to reach a few TB/sec within the next few years. These high rates are normally associated with an increase in high-frequency losses, which lead to distortion in the detected signal and degradation of signal integrity. To address this, we have developed an optimization technique for the multi-gigabit transceiver (MGT) and implemented it on the state-of-the-art 20 nm Arria-10 FPGA manufactured by Intel. The setup has been validated for three available high-speed data transmission protocols, namely GBT, TTC-PON and 10 Gbps Ethernet. The improvement in signal integrity is gauged by two metrics, the Bit Error Rate (BER) and the eye diagram. It is observed that the technique improves the signal integrity and reduces the BER. The test results and the improvements in the metrics of signal integrity for different link speeds are presented and discussed.
Keywords: HEP, DAQ, Transceiver, FPGA, Signal Integrity
1. Introduction
The major goals of HEP experiments are to probe the fundamental constituents of matter and to understand the nature of the fundamental forces. Advanced research in HEP demands a progressive increase in the collision energies and beam luminosities of the particle accelerators, which are essential for accessing rare probes with extremely low cross sections [1]. The experiments are continuously upgraded with sophisticated detectors, electronics and DAQ systems [2, 3]. The DAQ architectures have been evolving continuously to cope with the demands of the experiments [4, 5]. The LHC at CERN will go through a major upgrade during the long shutdown (LS2) period, following which the beam luminosities will increase by about an order of magnitude from their present values. At the same time, the experiments at the LHC are upgrading their detector and DAQ systems to allow for faster readout of the online data.

∗ Corresponding author. Email address: [email protected] (Shuaib Ahmad Khan)
The DAQ architecture in HEP experiments consists of three general steps: (i) the data from the online detectors are transferred to the FEE through the detector backplane, (ii) the data from the FEE are transferred to the RU [6, 7, 8], and (iii) the processed data are further transferred to data storage. These steps require high-speed data communication links from one step to the next. Most DAQ systems are designed using currently available technology in such a way that they can be easily upgraded to match the requirements of the system. Since one of the major concerns is to efficiently acquire data for all the collisions, error-resilient and efficient data transmission with minimal signal attenuation is required. Signal integrity is essential for proper Clock and Data Recovery (CDR) [6, 9]. Thus it is a challenge to minimize the bit error ratio (BER) and improve signal integrity at increased data rates [10].

In this manuscript we address the challenges of high-frequency losses arising from the high data rates of the DAQ systems in HEP experiments. Using an FPGA, we present a heuristic optimization technique to tune the parameters of multi-gigabit transceivers to achieve the best performance at high speeds for the transmission of data, trigger, timing and slow control information. The proposed technique helps to improve the system performance in terms of signal integrity and is implemented on a state-of-the-art 20 nm Intel Arria-10 FPGA [11]. It uses the Intel-Altera on-die instrumentation tools [12] and does not require probing of the FPGA pins or transceiver attributes. The full setup is tested at the link rates of the high-speed communication protocols frequently used for data transmission in these experiments. The technique is useful for on-field system-level debugging, and the parameters can be reconfigured dynamically, allowing the user to configure the transceivers for optimum performance. The robustness of the optimization technique has been tested with the Pseudo Random Binary Sequence 31 (PRBS31) pattern, which represents stressed and transitional data conditions. For the statistical reliability of the tests, a large number of data vectors are acquired. Different performance indicators, such as BER and eye diagrams, have been used to verify the improvement in the quality of the data signal after the proposed optimization technique is applied.

The manuscript is organized as follows. In section 2, we present the data aggregation and processing in HEP experiments. The important constituents of the high-speed DAQ system are discussed in section 3. Details of the transceiver optimization technique with its intricate features are presented in section 4. Section 5 describes the FPGA-based test setup, and section 6 discusses the methodology to implement the proposed technique and its advantages. The test results are presented and discussed in section 7. The manuscript is summarised in section 8.

Preprint submitted to Elsevier, January 10, 2019 (arXiv, physics.ins-det)
2. Data aggregation and processing
A generalised architecture for the DAQ scheme of HEP experiments is presented in Figure 1. The FEE boards are connected to the detectors and are located in the radiation zone in proximity to the detector, requiring custom-built radiation-hard electronics. The FEE boards process the analog detector signals and convert them to digital signals. The design and specifications of these boards are unique to the individual detector system [13]. The particle detectors operate in harsh radiation zones and, in some cases, in high magnetic fields. The main data storage units, on the other hand, are kept in low radiation zones. The RUs, which are intermediaries between the FEE and storage, can be placed either in the radiation zone of the experiment's cavern or in a low radiation zone near the data storage units. In the ideal case, placing the RUs near the detectors in the cavern minimizes the transmission latencies, but it requires custom-built radiation-hard electronics, which are difficult to obtain. In order to minimize the effect of radiation, the RUs, as well as the trigger system and the back-end computing nodes, are kept out of the radiation zone. This makes it possible to use readily available electronics with high processing power and a large ecosystem, and eases access and maintenance.

[Figure 1 labels: Detector, FEE (Front End Electronics), RU (Readout Unit), Trigger System, Computing Node (Server/PCs); Data Links, Trigger Links, DAQ Links; T: Multi-Gigabit Transceivers; Radiation Zone / Outside the Radiation Zone]
Figure 1: Basic blocks of a typical data acquisition architecture for HEP experiments.
The RU acts as an interface between the detector data links, the trigger system, and the links to storage and computing nodes, as shown in Figure 1. The tasks performed by the FPGA-based RUs depend on the detector specifications and requirements. The main tasks are data sorting, optical link handling, multiplexing and forwarding of data from different interfacing links, embedding control and trigger information, etc. [14]. These versatile functionalities require the RU to be designed on custom electronics boards with re-programmable functionality [15]. It is based on up-to-date FPGA technology with embedded on-chip transceivers. For our tests we have used the Intel Arria-10 GX FPGA based development board [11, 16]. The interfacing links of the RU and the high-speed communication protocols used for the LHC experiments in the context of the present framework are discussed in the following sections.
3. High-speed protocols
The DAQ architecture in Fig. 1 features three different interfacing links: (i) the Data link, which connects the detector FEE to the RU, (ii) the Trigger link, which connects the RU to the trigger system of the experiment, and (iii) the DAQ link, which takes the data from the RU to the storage and computing nodes. For the data link, the Gigabit Transceiver (GBT) protocol architecture [17], developed at CERN, has been found to be the most suitable. The GBT protocol supports a data transmission rate of 4.8 Gb/sec. It ensures the transmission of data from the FEE near the detectors in the high radiation zone to the RU, which is located near the counting room in a low or no radiation zone. The Trigger link uses the Timing, Trigger and Control system based on Passive Optical Networks (TTC-PON) technology [18] and operates at a rate of 9.6 Gigabit per second. It ensures fixed, deterministic latency and satisfies the timing specification of the LHC.

The data packets are time-stamped in the RU. Thus the links from the RU to the computing nodes are not latency critical. It has been found that 10-Gigabit Ethernet [5], the latest promising technology option with an ample ecosystem, is most suitable for the DAQ links in the experiments. In Table 1, we give the detailed specifications of the three interface links used in HEP experiments for the acquisition of data.
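To give a sense of scale for these line rates, the duration of a single bit (the unit interval) is simply the reciprocal of the line rate. The short Python sketch below works this out for the three links. The 10.3125 Gbps figure for 10-Gigabit Ethernet is the serial line rate quoted in the Figure 10 legend of this paper; the dictionary and function names are illustrative only.

```python
# Unit interval (bit period) for the three link rates discussed in the text.
# Names are illustrative; the 10.3125 Gb/s value is the 10G line rate quoted
# in the Figure 10 legend rather than the nominal 10 Gb/s payload rate.
LINE_RATES_GBPS = {
    "GBT": 4.8,
    "TTC-PON (downstream)": 9.6,
    "10 Gb Ethernet": 10.3125,
}


def unit_interval_ps(rate_gbps: float) -> float:
    """Return the duration of one bit in picoseconds for a rate in Gb/s."""
    # 1 / (rate in Gb/s) is the bit period in ns; multiply by 1000 for ps.
    return 1e3 / rate_gbps


for name, rate in LINE_RATES_GBPS.items():
    print(f"{name:22s} {rate:8.4f} Gb/s -> UI = {unit_interval_ps(rate):7.2f} ps")
```

At 4.8 Gb/s a bit lasts roughly 208 ps, while at about 10 Gb/s it lasts under 100 ps, so the same channel loss eats a much larger fraction of each bit at the higher rates.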
4. Transceiver optimization
High-speed data communication suffers from transmission losses and signal integrity issues that are not seen at normal digital signalling levels [10]. The high-frequency content of the signal is degraded by dielectric losses, skin effect, discontinuities in connectors, reflections caused by vias, inadequately placed traces, etc. We have developed a technique to optimize the transceiver parameters accurately and offer the best combination for a given high-speed link. This optimization of the transceiver parameters can compensate for the transmission losses [19].

For high-speed transmission channels with multi-gigabit rates, the unit interval (UI) of a data bit decreases. At high transmission rates, the PCB materials suffer from frequency-dependent losses and hence become dispersive. This prevents the signal from reaching its full strength within the shrunken UI window, leading to jitter and intersymbol interference (ISI). It also disturbs the decoding of the signal, and the extraction of the embedded clock becomes difficult at the receiver end.

An increase of the signal strength is an obvious solution to overcome the attenuation. However, the issue of high-frequency roll-off remains, and the pattern-dependent jitter is aggravated. Consequently, the signal does not reach its optimal strength within the interval and may diffuse further into the next UI, leading to ISI. Increasing the signal strength also increases the overall power consumption of the transceiver, and the noise levels in the system increase proportionally. All these lead to deteriorated signal integrity metrics and a reduced drive length. The effects are even more evident when high-speed interfaces are used with systems that were originally designed for low-bandwidth applications.

To overcome these losses, we have developed the transceiver optimization technique and an efficient methodology for the 20 nm Arria-10 FPGA. This new FPGA, with considerably large on-chip resources [11], is ideal for the processing requirements of the experiments.

For the optimization, the high-frequency components in the data stream are boosted on every transition, using the digital pre-emphasis taps of the on-chip transceiver. In addition, the low-frequency components are reduced. This technique helps to achieve the same amount of emphasis with less power dissipation. The exaggerations are cancelled by the attenuation during transmission, allowing the signal to be recovered accurately.

[Figure 2 labels: VOD; 1st pre-tap, 2nd pre-tap, 1st post-tap, 2nd post-tap; delay elements Z^-2, Z^-1, Z^+1, Z^+2, each with selectable +/- polarity; Z: operator for Z-transform]
Figure 2: Voltage output differential (VOD) and tunable pre-emphasis taps with flexible polarity in the embedded transceiver of the FPGA.

Table 1: Specifications of three high speed interface links, GBT [17], TTC-PON [18] and 10-Gb Ethernet.

Parameters | GBT | TTC-PON | 10Gb Ethernet
Technology Specification | Custom | XGPON1 with modifications | 802.3ae Specification Standard
Designer Group | CERN | ITU-T with CERN modifications | IEEE
Line Rate | 4.8 Gbps | Downstream: 9.6 Gbps; Upstream: 2.4 Gbps | 10.3125 Gbps
Payload Rate | … | … | …
Payload Size | 120 bits@40 MHz | Downstream: 192 bits@40 MHz; Upstream: 16 bits@40 MHz | 64 [email protected] MHz
Wavelength (nm) | 850 nm (Multi-mode); 1310 nm (Single-mode) | Downstream: 1577 nm; Upstream: 1270 nm | 850 nm (10GBASE-SR)
Network Topology | Point-to-Point | Point-to-Multipoint | Point-to-Point
Encoding | RS ECC with Block Interleaver | 8b/10b | 64b/66b
Synchronous Trigger Support | Yes | Yes | No
Trigger Latency | 150 ns (Optical loop-back) | 100 ns Downstream; 1.6 us Upstream | NA

The optimization technique has been implemented on the Intel Arria-10 FPGA development board with an integrated reconfigurable transceiver architecture [11]. It incorporates additional circuitry in the buffers for equalisation and pre-emphasis. The transmitter of the embedded transceiver has five programmable drivers, as shown in Figure 2. The voltage output differential (V_OD) controls the base amplitude. The four pre-emphasis taps are the 1st pre-tap, 2nd pre-tap, 1st post-tap and 2nd post-tap. These taps also include polarity settings. The post-taps are the causal taps and the pre-taps are the anti-causal taps. These multiple taps and the choice of polarity can handle the attenuating characteristics of the channel. Equalisation with DC gain and a Variable Gain Amplifier (VGA) is available on the receiver side of the transceiver. There are multiple transceiver parameters, each with a large operating range, so scanning the system performance for every combination of the parameters is a time-consuming process. Our goal has been to develop an efficient technique for optimizing the transceiver parameters such that the signals impacted by the high-frequency losses are recovered.

The pre-emphasis works like a
Finite Impulse Response (FIR) filter with different delays, referred to as the taps, as shown in Figure 2. An FIR filter is based on a feed-forward difference equation. The pre-emphasis technique applies a delay to the signal and adds it back to the original signal with a weight and, where required, an inversion. However, depending on the peculiarities of the transmission channel, a single delay, weight and inversion may not be able to provide the required compensation. For this reason, different delays, weights and polarities are combined. In this configuration, the pre-emphasis 1st post-tap is the most useful parameter. It emphasises the bit period immediately after the transition. The generation of the differential emphasised signal, applying the unit delay by the first post-tap, is shown in Figure 3, assuming V_OD = … and 0 < x < 1. The original positive signal Vp(T) is compared with Vp(T-1), which is the unit-delayed signal. The emphasised positive signal is Vp(T) - x*Vp(T-1), the difference between the original signal and the weighted, delayed signal. The negative signal is generated similarly. The pre-emphasised differential signal is the difference between the pre-emphasised positive and negative signals. The effect of the 2nd post-tap after the transition, depending on the chosen polarity setting, is shown in Figure 4.

The pre-taps reduce the effect of pre-cursor ISI. Figure 5 shows the impact of the 1st pre-tap and the 2nd pre-tap on the single and double bit period, respectively, before the occurrence of a high-frequency transition, depending on the polarity. Pre-cursor ISI and post-cursor ISI are handled by the anti-causal and causal taps, respectively. However, pre-emphasis alone cannot guarantee the performance of the system, as it is implemented at the transmitter by pre-conditioning the signal before it is fed to the channel. There are high-frequency losses in the transmission channel itself. Hence equalisation is required at the receiver end. It compensates for the low-pass characteristics of the physical medium and amplifies the attenuated high-frequency components of the incoming signal. An equalizer on the receiver side lifts the content inside a band of frequencies and attenuates the rest. The DC gain circuitry gives uniform amplification to the received spectrum. It enables the transceivers to operate over longer distances. The VGA on the receiver optimizes the signal amplitude before the CDR sampling.

[Figure 3 labels: original positive signal Vp(T); unit-delayed signal Vp(T-1); weighted tap x*Vp(T-1); pre-emphasized positive signal Vp(T) - x*Vp(T-1); original negative signal Vn(T); pre-emphasized negative signal Vn(T) - x*Vn(T-1); pre-emphasized differential signal]
Figure 3: The pre-emphasis signal generation technique at the 1st post-tap in embedded FPGA transceivers, 0 < x < 1.

[Figure 4 labels: signal without pre-emphasis; signal with pre-emphasis, 1st post-tap: Vp(T) - x*Vp(T-1) - Vn(T) + x*Vn(T-1); signal with pre-emphasis, 2nd post-tap: Vp(T) - x*Vp(T-2) - Vn(T) + x*Vn(T-2)]
Figure 4: Pre-emphasis 2nd post-tap (inverted) compared with pre-emphasis 1st post-tap and their effect on the signal without pre-emphasis.

To achieve optimal signal integrity performance, the transmitter and receiver parameters of the transceiver on the FPGA chip complement each other and work together to compensate for the high-frequency losses. However, overcompensation degrades the signal quality and adds more jitter, closing the eye diagram and making it difficult for the receiver to identify the signal; hence it should be avoided.

[Figure 5 labels: signal without pre-emphasis; signal with pre-emphasis, 1st pre-tap: Vp(T) + x*Vp(T+1) - Vn(T) - x*Vn(T+1); signal with pre-emphasis, 2nd pre-tap: Vp(T) + x*Vp(T+2) - Vn(T) - x*Vn(T+2)]
Figure 5: Pre-emphasis 1st pre-tap and the 2nd pre-tap (inverted) and their effect on the signal without pre-emphasis.
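The first post-tap relation used in Figure 3, Vp(T) - x*Vp(T-1) - Vn(T) + x*Vn(T-1), can be illustrated with a few lines of Python. This is a minimal sketch under simplifying assumptions that are not from the paper: one sample per bit, unit signal amplitude, an ideal complementary negative leg (Vn = -Vp), and the sample before the first bit taken equal to the first bit. The function names are illustrative.

```python
def nrz(bits):
    """Map bits {0, 1} to NRZ levels {-1.0, +1.0} on the positive leg."""
    return [1.0 if b else -1.0 for b in bits]


def first_post_tap(vp, x):
    """Differential output with one inverted post-tap of weight x (0 < x < 1).

    Implements Vp(T) - x*Vp(T-1) - Vn(T) + x*Vn(T-1), assuming Vn = -Vp.
    The sample preceding the first bit is assumed equal to the first bit.
    """
    out = []
    prev = vp[0]
    for cur in vp:
        pos = cur - x * prev      # pre-emphasised positive leg
        neg = -cur + x * prev     # pre-emphasised negative leg (Vn = -Vp)
        out.append(pos - neg)     # differential signal
        prev = cur
    return out


bits = [0, 0, 1, 1, 1, 0, 1, 0, 0]
y = first_post_tap(nrz(bits), x=0.25)
print(y)
# [-1.5, -1.5, 2.5, 1.5, 1.5, -2.5, 2.5, -2.5, -1.5]
```

With x = 0.25, every bit that follows a transition is driven at an amplitude of 2(1 + x) = 2.5, while repeated bits settle at 2(1 - x) = 1.5. The transition bits carry the high-frequency content, so this is the boost of high frequencies and reduction of low frequencies described in the text.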
5. Test setup
An FPGA-based setup has been developed to test the effectiveness of the proposed optimization technique. The transceiver is tested on high-speed links under stressed conditions. The setup is used to emulate stressed high-speed link conditions and to investigate the high-frequency losses in the transmission. It determines the capability of the transceiver system to recover the data from degraded signals. Tests are performed at the system level to operate the setup at a prescribed BER equal to or better than $10^{-12}$, as per the IEEE standard.

The test setup, shown in Fig. 6, uses the Arria-10 FPGA development board (10AX115S2F45I1SG device) for the implementation and testing of the optimization technique. The FPGA development card is installed in the PCIe 16-lane slot of the server and is powered from the server motherboard. The functions and specifications of the components of the setup are given in Table 2.

The Intel Quartus-II platform is the firmware development package used for the FPGA logic design. The transmission links at the specified data rates are implemented using the Quartus-II Qsys tool. Qsys is Intel's system integration tool for the quick generation of the interconnect logic. The signal integrity of the transceiver links is validated using the Transceiver Toolkit (TTK) feature of Quartus-II, which has a GUI. The TTK is used to quickly access, tune and test the transceiver parameter settings at runtime through a combination of metrics. It enables us to measure the BER and the eye diagrams and to verify the signal integrity in external loopback mode. Details of the firmware tools, such as Quartus II, Qsys, TTK, PRBS patterns and the auto-sweep features, may be found in reference [12].

[Figure 6 labels: server motherboard; slot for PCIe x16 Gen 3 on motherboard; PCIe connector on FPGA board; FPGA board with Arria 10 FPGA; user loopback logic on silicon; SFP+ externally pluggable module (Tx, Rx); optical loopback with Variable Optical Attenuator; optical power meter for optical power measurement (InGaAs detector, range (-70 dBm), resolution 0.01 dBm); Lucent to Ferrule (LC to FC) connector to couple the optical fibre to the power meter (50/125um hybrid connector)]
Figure 6: Arria-10 FPGA card inserted in the PCIe x16 slot of the server. The optical signal from the externally pluggable SFP+ is looped back via the fibre equipped with the variable optical attenuator (VOA).

Table 2: Components used in the test setup, their role and specifications.

Component | Role in test setup | Specification
FPGA Test Board | Integrated FPGA-based design environment with embedded transceivers on silicon. PCIe connection. Slot for hot-pluggable transceiver optical modules. Other accessories. | Intel Arria10 FPGA (20 nm mid-range). Transceivers up to 17.4 Gbps [11].
Variable Optical Attenuator (VOA) with optical fibre | Optical power attenuation in the fibre loopback path. | Range (dB): 0 ∼ 60; accuracy: +/- 0.8 dB. Fibre (850 nm): multimode 50/…
SFP+ module | External transceiver modules to be coupled to the fibre. Laser at the transmitter and PIN diodes at the receiver ends. | Hot-pluggable footprint, up to 10 Gbps, 850 nm VCSEL laser, duplex LC connector. Link length of 300 m [20].
Workstation with FPGA design platforms | FPGA board powered through the PCIe Gen3 x16 slot. Compile and generate the FPGA design with firmware development software. | PCIe Gen3 x16 slots available. Quartus-II platform installed for firmware design and generation. FPGA programmed through USB Blaster download cable.

[Figure 7 labels: Data Generator (PRBS pattern); Transmitter; Optical link; Receiver; Data Receiver (PRBS pattern check); core_clk_out; core_clk_in; feedback clock; connection layout in Qsys (Platform Designer); FPGA (silicon)]
Figure 7: Typical BER test loopback logic on the FPGA using the Qsys tool. The serialised data is transmitted, looped back and checked for flipped bits at the receiver.

For the data loopback tests [21], a multimode optical fibre equipped with a Variable Optical Attenuator (VOA) and external pluggable SFP+ modules are used. The far end of the fibre from the transmitter is looped back to the receiver. The received data is then verified by the data checker logic on the FPGA for any erroneous bits, as shown in Figure 7. A variety of data patterns can be used to test the signal integrity. However, in each case, a checker must be available for verification. PRBS patterns are injected into the test system because a PRBS generator produces long, stressed patterns with little memory consumption [22]. Another advantage of using PRBS patterns for the tests is that boundary synchronisation is not necessary at the physical layer, as the patterns are time correlated. The Intel soft logic cores are used for the PRBS data pattern generator and checker [12].

The BER is measured as a function of the optical power at the receiver, which is attenuated in a controlled way with the help of the VOA. This allowed us to rapidly characterise the transceiver sensitivity, below which the embedded clock cannot be recovered from the data stream and loss of lock occurs [19]. It also determines the minimum optical power required to achieve the targeted BER for a system operating at a specified data rate. The auto-sweep feature of TTK is used to obtain the best performing settings of the transceiver parameters for a specified BER. This optimized set of transceiver parameters delivers the best signal integrity metrics, including the height and width of the eye diagram. In the next section, we elaborate the methodology for the optimization of high data rate on-chip transceivers to reduce the effect of high-frequency losses.
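The remark that PRBS patterns need very little memory can be made concrete with a linear feedback shift register (LFSR): the whole generator state is only as many bits as the PRBS order. The Python sketch below is an illustration, not the Intel soft core used in the tests. It assumes the commonly used feedback taps for PRBS7 (stages 7 and 6) and PRBS31 (stages 31 and 28); the bit ordering and any inversion in the actual cores [12] may differ. The checker here simply compares against a locally regenerated reference, which is a simplification of a hardware checker.

```python
def prbs_bits(order, tap, seed=1):
    """Yield PRBS bits from a Fibonacci LFSR: a[n] = a[n-order] XOR a[n-tap].

    The full generator state is just `order` bits, regardless of the
    sequence length of 2**order - 1 bits.
    """
    mask = (1 << order) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("seed must be non-zero, otherwise the LFSR stays at zero")
    while True:
        new_bit = ((state >> (order - 1)) ^ (state >> (tap - 1))) & 1
        state = ((state << 1) | new_bit) & mask
        yield new_bit


def take(gen, n):
    """Collect the next n bits from a generator."""
    return [next(gen) for _ in range(n)]


def count_bit_errors(received, reference):
    """Count positions where the received bits differ from the reference."""
    return sum(r != e for r, e in zip(received, reference))


# PRBS7 repeats after 2**7 - 1 = 127 bits.
prbs7 = take(prbs_bits(7, 6), 2 * 127)
print(prbs7[:127] == prbs7[127:])

# Loopback-style check on PRBS31 with one deliberately flipped bit.
sent = take(prbs_bits(31, 28), 1000)
received = list(sent)
received[500] ^= 1
errors = count_bit_errors(received, sent)
print(errors)
```

A PRBS31 stream is 2^31 - 1 bits long before it repeats, yet the generator above holds only 31 bits of state, which is why such patterns are convenient for long stress tests on an FPGA.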
6. Methodology
The methodology to extract the optimized settings of the transceiver parameters is explained in the flowchart in Figure 8. To start with, the optimization process scans the full range of each transceiver parameter using the TTK auto-sweep feature while the rest of the parameters are kept at their Intel-default values. It then records the best performing setting of each transceiver parameter, as indicated by the eye parameters. At this point, with the iteration counter set to N = 1, a Solution Matrix (S) for the Nth iteration is built from these best performing values, grouped into the transmitter parameters (V_OD and pre-emphasis: 1st pre-tap, 2nd pre-tap, 1st post-tap, 2nd post-tap) and the receiver parameters (DC gain, equalisation control, VGA). We then scan the transmit and receive parameters separately in the range -3 ≤ S ≤ 3 around the recorded values, while the receive and transmit parameters, respectively, are held at the values listed in S. The best performing cases are recorded again, S is updated with the new values, and N is incremented by 1. The latest matrix values are assigned to the TTK and the loopback test is run. If this does not improve the signal integrity metrics (eye diagram and BER) over those obtained with the Intel-default values, the optimization loop is repeated with the adjusted S values in the defined range until an improvement in both the eye diagram and the BER is achieved.

The parameters cannot be declared optimized until a degradation of the signal integrity metrics from their peak values is observed. The degradation of the metrics indicates over-compensation and marks the point at which the transceiver parameters have passed their optimum. S is then updated with the values of the best performing case, rejecting the over-compensated value set. The final S, with the best performing metrics, is known as the Solution Space [19]. The final values are fed to the transceiver for further analysis. The results are presented and discussed in the next section.

The proposed technique has definite advantages over the traditional method, in which the transceiver optimization is carried out in an extremely time-consuming way by evaluating the signal integrity for a large number of permutations and combinations of the parameters. The parameters and their possible ranges are listed in Table 3.
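The search loop described in this section can be sketched in Python as follows. This is only an illustration of the control flow, not the TTK procedure itself. The parameter names, ranges, default values and the `toy_score` function are all hypothetical stand-ins: in the real setup the score would come from measured eye width/height and BER. The sketch also assumes that the parameters within a group are scanned jointly in the ±3 window and that the loop stops once no further gain is seen, both of which are interpretations of the text.

```python
from itertools import product

# Hypothetical parameter ranges and defaults, for illustration only.
TX_RANGES = {"vod": range(0, 32), "post1": range(-31, 32), "pre1": range(-31, 32)}
RX_RANGES = {"dc_gain": range(0, 5), "eq": range(0, 16), "vga": range(0, 8)}
DEFAULTS = {"vod": 20, "post1": 0, "pre1": 0, "dc_gain": 0, "eq": 0, "vga": 0}
TARGET = {"vod": 28, "post1": -6, "pre1": 2, "dc_gain": 3, "eq": 9, "vga": 5}


def toy_score(p):
    """Stand-in for a measured eye metric: higher is better, 0 is the optimum."""
    mismatch = sum((p[k] - TARGET[k]) ** 2 for k in TARGET)
    # Coupling between a transmitter tap and the receiver equaliser, so that
    # sweeping one parameter at a time does not reach the optimum directly.
    coupling = (p["post1"] - TARGET["post1"]) * (p["eq"] - TARGET["eq"])
    return -(mismatch + coupling)


def sweep_each(score, start):
    """Full-range sweep of every parameter, the others held at `start`."""
    best = {}
    for name, values in {**TX_RANGES, **RX_RANGES}.items():
        best[name] = max(values, key=lambda v: score({**start, name: v}))
    return best


def refine_group(score, s, group, span=3):
    """Scan one group within +/-span of S while the other group stays at S."""
    names = list(group)
    windows = [[v for v in group[n] if abs(v - s[n]) <= span] for n in names]
    best = max(product(*windows),
               key=lambda combo: score({**s, **dict(zip(names, combo))}))
    return {**s, **dict(zip(names, best))}


def optimize(score, defaults, max_iter=10):
    """Return the best parameter set found and its score."""
    s = sweep_each(score, defaults)              # initial solution matrix
    best_s, best_val = defaults, score(defaults)
    for _ in range(max_iter):
        s = refine_group(score, s, TX_RANGES)    # transmitter side, receiver fixed
        s = refine_group(score, s, RX_RANGES)    # receiver side, transmitter fixed
        val = score(s)
        if val <= best_val:                      # no further gain: stop
            break
        best_s, best_val = s, val
    return best_s, best_val


best, value = optimize(toy_score, DEFAULTS)
print(best, value)
```

In this toy problem the one-parameter sweeps land near, but not on, the optimum because of the coupling term, and the alternating ±3 refinement of the transmitter and receiver groups closes the remaining gap in two passes.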
Table 3: Transceiver parameters and their ranges of operation for the manual optimization.

Transceiver parameter | Range of possible values | Number of iterations required
Transmitter side
VOD | … | …
Pre-emphasis 1st post-tap | -31 to 31 | 63
Pre-emphasis 1st pre-tap | -31 to 31 | 63
Pre-emphasis 2nd post-tap | -15 to 15 | 31
Pre-emphasis 2nd pre-tap | -7 to 7 | 15
Receiver side
DC gain | … | …
Equalisation | … | …
VGA | … | …
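To see why an exhaustive scan is impractical, the iteration counts in Table 3 can be multiplied out. The Python sketch below does this for the four pre-emphasis taps only, since their counts are the ones listed; including V_OD and the receiver parameters would make the exhaustive number larger still. The variable names are illustrative.

```python
from math import prod

# Number of settings per pre-emphasis tap, taken from Table 3.
TAP_SETTINGS = {
    "1st post-tap": 63,
    "1st pre-tap": 63,
    "2nd post-tap": 31,
    "2nd pre-tap": 15,
}

# Every combination of the four taps (exhaustive manual optimization).
exhaustive = prod(TAP_SETTINGS.values())

# One full-range sweep per tap with the others held fixed.
one_at_a_time = sum(TAP_SETTINGS.values())

print(exhaustive, one_at_a_time)
# 1845585 172
```

The four taps alone give about 1.8 million combinations, against 172 settings when each tap is swept on its own, which is the gap the proposed methodology exploits.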
7. Results and discussion
Results are demonstrated and validated for three different high-speed optical links: 10 Gbps links, the 4.8 Gbps GBT protocol and 9.6 Gbps TTC-PON. The test system challenges the lock-and-hold capability of the CDR circuit, exercises all conceivable instances of ISI and analyses the receiver sensitivity for any probable drifts. Drifts at the receiver are caused by long imbalanced runs in the data transition pattern. The PRBS31 pattern is $2^{31} - 1$ bits long, and its different bit combinations induce dissimilar ISI configurations. It is required to stress the transceivers, to test any innate ISI in a transmitter, and to assess the quality of transmission. PRBS patterns have a white spectrum in the frequency domain and are injected to test the robustness of the high-speed links. For the entire analysis, PRBS31 is used to stress the system. However, the variation of the eye diagram and BER characteristics is also studied for PRBS7, PRBS9, PRBS15 and PRBS23 in addition to PRBS31.

[Figure 8 flowchart steps:
- Load the developed TTK design for the specified data rate on the FPGA
- Select the desired PRBS value
- Select the loopback mode as External
- Start the data transmission (TTK parameters at Intel-Altera default)
- Record Eye Width/Height and BER at Nth iteration
- Set variable N = 1
- Scan each individual parameter of the transceiver over its full range using the auto-sweep feature of TTK (rest of the parameter values at the Intel-Altera default)
- Attenuate the received optical signal in steps using the VOA
- Is Receiver CDR locked?
- Record BER vs dBm at each attenuated step
- Signal attenuated beyond the Receiver CDR limit
- Record the best performing value of each transceiver parameter with regard to eye diagram metrics
- Develop S.Matrix(Nth) with the best performing values
- Group the transmitter parameters and the receiver parameters
- Scan the transmitter parameters in range (-3 <= S.Matrix(Nth) <= 3) while receiver parameters are at the S.Matrix(Nth)
- Scan the receiver parameters in range (-3 <= S.Matrix(Nth) <= 3) while transmitter parameters are at the S.Matrix(Nth)
- Record the best performing value of each transceiver parameter with regard to eye diagram metrics
- Set variable N = N+1
- Assign the S.Matrix(Nth) values to the TTK and run the data loopback transmission
- Record Eye Width/Height and BER at Nth iteration
- Is BER (Nth) …]
Figure 8: Stepwise flow diagram for the transceiver optimization. Data transmission is started with the Intel default parameters and a Solution Matrix is derived to achieve the optimized signal integrity.

7.1. Eye Diagram analysis

At system startup, the transceiver parameters in TTK are set at the default values. As the first set of analyses, changes in the eye diagram are compared for different PRBS stressed patterns. Eye height and width are plotted on a three-axis plot with the PRBS pattern on the third axis, as shown in Figure 9. It is found that PRBS31 gives the most stressed eye metrics and, as anticipated, a more closed eye is observed at all three link speeds.

[Figure 9 panels: 10 Gbps, GBT 4.8 Gbps and TTC-PON 9.6 Gbps; axes: PRBS, eye width, eye height; patterns: PRBS 7, PRBS 9, PRBS 15, PRBS 23, PRBS 31]
Figure 9: Changes in the eye height and eye width with PRBS variation for optical links at three line rates.
Another important metric of signal integrity is the BER. Its measurement is statistical in nature, and the estimate is ideal only if the number of tested bits tends to infinity, which is not possible in a real lab test setup. Hence, a method was proposed in reference [23] to limit the stressing time of a system to a feasible length while still measuring the BER with a high confidence level (CL). The CL, expressed as a percentage, quantifies the quality of the estimate. It is the probability that the system's actual error probability is less than the specified limit. The minimum number of bits that must be tested for a BER measurement with a specific associated CL is given in equation 1:

$$ n = -\frac{\ln(1 - CL)}{BER} + \frac{\ln\left(\sum_{k=0}^{N} \frac{(n \cdot BER)^{k}}{k!}\right)}{BER}, \qquad T = \frac{n}{R} \qquad (1) $$

T is the test time needed and R is the line rate. When N = 0, equation 1 reduces to

$$ n = -\frac{\ln(1 - CL)}{BER} \qquad (2) $$

where n is the total number of bits transmitted and N is the number of errors that occurred during the transmission. There is a compromise between the testing time and the required accuracy of the measurement, as shown in equation 1.

For a 95 percent CL, equation 2 reduces to $n \simeq 3 / BER$. Hence, as a rule of thumb, a total of $3 \times 10^{12}$ bits need to be tested to achieve a BER of $10^{-12}$ at 95 percent CL. The concept is further extended to find the minimum inspection time required to measure a BER of $10^{-12}$ for different CL with no errors for the GBT, TTC-PON and 10 Gbps links, as shown in Figure 10. In this paper, all the BER measurements are done for $3 \times 10^{12}$ bits to achieve 95 percent CL.

[Figure 10 plot: test time needed (secs) versus line rate (Gbps) for CL = 0.90, 0.95 and 0.99; marked rates: TTC-PON upstream rate = 2.4 Gbps, GBT line rate = 4.8 Gbps, TTC-PON downstream rate = 9.6 Gbps, 10G line rate = 10.3125 Gbps]
Figure 10: Time to achieve a BER of $10^{-12}$ at the line rates of the GBT, TTC-PON and 10 Gbps optical links for different CL.

The variation of the BER at the Intel-default transceiver settings is recorded with respect to the attenuation
15 −14 −13 −12 −11 −10 −9−14−12−10−8−6−4
Power (dBm) B i t E rr o r R a t i o ( B E R ) i n l og sca l e PRBS 7 (R = 0.99)PRBS 9 (R = 0.98)PRBS15 (R = 0.98)PRBS23 (R = 0.99)PRBS31 (R = 0.99)
10 Gbps −16 −15 −14 −13 −12 −11−16−14−12−10−8−6−4 Power (dBm) B i t E rr o r R a t i o ( B E R ) i n l og sca l e PRBS7 (R = 0.98)PRBS9 (R = 0.90)PRBS15 (R = 0.98)PRBS23 (R = 0.99)PRBS31 (R = 0.96) GBT 4.8 Gbps −15 −13 −11 −9 −7 −5−16−14−12−10−8−6−4−2 Power (dBm) B i t E rr o r R a t i o ( B E R ) i n l og sca l e PRBS7 (R = 0.99)PRBS9 (R = 0.99)PRBS15 (R = 0.99)PRBS23 (R = 0.98)PRBS31 (R = 0.96) TTC-PON 9.6 Gbps
Figure 11: BER versus received optical power(dBm) for transceiverat Intel FPGA default settings for di ff erent PRBS operating in threeline rates. of the received optical power; following the methodol-ogy flowchart shown in Figure 8. This test is executedwith the help of VOA attached to the loopback fibre.BER variation is recorded for di ff erent PRBS patternsand plotted for the links operating at 10 Gbps, 4.8 Gbpsand 9.6 Gbps rates as shown in Figure 11.The exponential curve fitting is the best-suited ap-proximation for the BER in logarithmic domain [24].Double exponent fit function with constants is used tofit the BER data as it provides close fits in a variety ofBER plot situations. It fits the BER data using uncon-strained nonlinear optimization [25]. The statistics forgoodness-of-fit in terms of R-Square ( R ) for di ff erentPRBS is marked in the Figure 11.The test shown in Figure 11 highlights that at a spec-ified CL higher number of errors are received in the transmission system for a given received optical power;when PRBS31 is injected as the test data pattern ascompared to the other PRBS patterns. The outcome ofthe tests shown in Figure 9 and Figure 11 revealed thedegradation of the metrics of signal integrity with the in-crease in the size of a unique word of data in the PRBSsequence. The results from these tests are as anticipatedand well substantiated. It has further strengthened theusefulness of the PRBS31 as a strenuous test pattern todemonstrate the validation of the proposed methodol-ogy. However, there is a crossover point for 4.8 Gbpsat BER ∼ − . It is kept beyond the discussion as ourregion of interest is better by two orders of magnitudewhich is BER ∼ − . The improvement in the system performance ismarked by two metrics of signal integrity viz. BERand Eye Diagram. The eye contour for the Intel-defaultsettings and at the deduced optimized settings of thetransceiver is captured using the EyeQ (a GUI feature ofTTK). 
It helps to estimate and visualize the vertical and horizontal eye opening at the receiver, as shown in Figure 12. After applying the transceiver parameter settings deduced with the proposed technique, there is a notable enhancement in the width (Horizontal Phase Step) and height (Vertical Step) of the eye diagram. Hence the quality of signal transmission is improved.

The optimized values of the transceiver parameters, known as the solution space, found with the proposed methodology for the targeted BER of $10^{-12}$, are plotted against the Intel-default set in the form of a multivariate kiviat diagram for all three link speeds in Figure 13. This allows a clear comparison of the individual parameters on each axis.

The variation in BER is plotted against the attenuation level of the input optical power at the receiver, both for the deduced solution-space values of the transceiver and for the Intel-default set. It is shown for PRBS31 for all three links under observation in Figure 14. Analysing the results in Figure 14 further, the least optical power required at the receiver to attain a preferred BER or better can be determined from the curve. The figure also shows that a given BER is achieved at a lower optical power when the transceiver is operated at the deduced parameter values listed in the solution space than at the Intel-default set.
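Reading the least optical power for a preferred BER off a measured curve can be illustrated with a minimal sketch. It uses piecewise-linear interpolation of log10(BER) against optical power, which is a simpler stand-in for the double-exponential fit used in this work. The function name and the sample points are hypothetical and are not measured data.

```python
import math


def power_for_target_ber(points, target_ber):
    """Return the optical power (dBm) at which the curve reaches target_ber.

    points: iterable of (power_dbm, ber) pairs, with BER falling as power rises.
    Interpolates linearly in log10(BER); returns None if the target BER
    is not bracketed by the supplied points.
    """
    log_target = math.log10(target_ber)
    pts = sorted(points)  # ascending optical power
    for (p0, b0), (p1, b1) in zip(pts, pts[1:]):
        l0, l1 = math.log10(b0), math.log10(b1)
        # l0 is the higher (worse) BER, l1 the lower (better) BER
        if l1 <= log_target <= l0:
            if l0 == l1:
                return p0
            frac = (l0 - log_target) / (l0 - l1)
            return p0 + frac * (p1 - p0)
    return None


if __name__ == "__main__":
    # Hypothetical sample curve, for illustration only
    curve = [(-12.0, 1e-6), (-11.0, 1e-10), (-10.0, 1e-14)]
    print(power_for_target_ber(curve, 1e-12))
```

Applying the same function to the default and to the optimized curve, and subtracting the two results, gives the kind of power difference at a fixed BER that is compared in this section.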
As a particular example, the targeted BER of $10^{-12}$ for the optical link test as per IEEE standards is achieved at lower values of the optical power, and the improvement at this BER is listed quantitatively in Table 4 for the three link speeds.

[Figure 12 panels, Vertical step / Horizontal Phase step:
10 Gbps: 19 / 41 at the Intel FPGA default settings, 49 / 54 at the optimized FPGA settings
4.8 Gbps: 28 / 59 at the Intel FPGA default settings, 63 / 63 at the optimized FPGA settings
9.6 Gbps: 18 / 43 at the Intel FPGA default settings, 41 / 50 at the optimized FPGA settings]
Figure 12: Eye diagram at the Intel FPGA default and at the optimized settings of the transceiver.
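The targeted BER quoted above is only meaningful together with the test length behind it. As a rough cross-check of equations 1 and 2, the sketch below computes the number of bits and the test time for the four line rates marked in Figure 10. Equation 1 is solved by a simple fixed-point iteration, which is an assumption of this sketch and not necessarily the procedure of reference [23]. The function names are illustrative.

```python
import math


def bits_for_confidence(ber, cl, n_errors=0, iterations=200):
    """Bits needed to claim `ber` at confidence `cl` with `n_errors` errors seen.

    Solves equation 1 by fixed-point iteration; for n_errors = 0 the sum
    equals 1 and the result is exactly equation 2.
    """
    base = -math.log(1.0 - cl) / ber  # equation 2
    n = base
    for _ in range(iterations):
        series = sum((n * ber) ** k / math.factorial(k)
                     for k in range(n_errors + 1))
        n = base + math.log(series) / ber
    return n


def required_test_time_s(n_bits, line_rate_gbps):
    """Test time T = n / R in seconds, with R given in Gbps."""
    return n_bits / (line_rate_gbps * 1e9)


# Line rates marked in Figure 10
LINE_RATES_GBPS = {
    "TTC-PON upstream": 2.4,
    "GBT": 4.8,
    "TTC-PON downstream": 9.6,
    "10G": 10.3125,
}

if __name__ == "__main__":
    for cl in (0.90, 0.95, 0.99):
        n = bits_for_confidence(1e-12, cl)
        for name, rate in LINE_RATES_GBPS.items():
            t = required_test_time_s(n, rate)
            print(f"CL={cl:.2f} {name}: {n:.3e} bits, {t:.0f} s")
```

At 95 percent CL with no errors this gives close to $3 \times 10^{12}$ bits, which corresponds to roughly 290 s at 10.3125 Gbps and roughly 1250 s at 2.4 Gbps.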
Table 4: Comparison of optical power (dBm) to attain a BER of $10^{-12}$ for the three high speed interface links.

Protocol        With default approach (dBm)   With optimization technique (dBm)   Difference (dBm)   Improvement (%)
10Gb Ethernet   -9.2                          -10.35                              -1.15              12.5
GBT             -11.9                         -12.7                               -0.8               6.7
TTC-PON         -6.45                         -9.3                                -2.85              44.1

[Figure 13 plots: one kiviat diagram each for the 10 Gbps, GBT 4.8 Gbps and TTC-PON 9.6 Gbps line rates, comparing the solution space with the Intel-default values on the axes VOD Control, Pre-emphasis 1st Post-Tap, Pre-emphasis 1st Pre-Tap, Pre-emphasis 2nd Post-Tap, Pre-emphasis 2nd Pre-Tap, DC Gain, Equalization Control, VGA, Eye Height and Eye Width]
Figure 13: Multivariate kiviat diagram showing the solution space and the Intel FPGA default values for three different link rates.

[Figure 14 plots: BER (log scale) versus received optical power (dBm), one panel per link. Goodness of fit $R^2$ for default / optimized settings: 10 Gbps 0.99 / 0.99; GBT 4.8 Gbps 0.99 / 0.98; TTC-PON 9.6 Gbps 0.96 / 0.99]
Figure 14: Comparison of BER versus the received optical power for default and optimized transceiver settings, shown separately for three line rates.

Another clear observation that emerges from the data comparison in Figure 14 is that the receiver sensitivity, below which loss of lock occurs, is enhanced owing to the reduction in the high-frequency losses with the application of the proposed optimization technique. This lowers the limit of the optical power required for proper CDR, so the signal remains traceable at comparatively lower values of the received optical power. The quantitative comparisons are given in Table 5.
Table 5: Comparison of optical power for CDR for the three high speed interface links.

Protocol        With default parameters (dBm)   With optimization technique (dBm)   Difference (dBm)   Improvement (%)
10Gb Ethernet   -14.4                           -15                                 -0.6               4.17
GBT             -15.34                          -16.04                              -0.7               4.56
TTC-PON         -11.78                          -13.2                               -1.42              12.05

The test results shown in Figures 13 and 14 confirm that the effect of high-frequency losses on the link performance is controlled. This is achieved after applying the deduced solution-space values through the TTK, and a significant improvement in the BER is noted at a given received optical power. The tests and results validate the usefulness of the proposed technique in enhancing the transceiver performance and the signal integrity by compensating for the high-frequency losses.
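The improvement columns of Tables 4 and 5 can be reproduced with a short calculation. The percentages appear to be the dBm difference taken relative to the magnitude of the default dBm value. This is an inference from the tabulated numbers rather than a definition stated in the text. The sketch also converts each dB difference into a linear power ratio, which is an added illustration and not a quantity reported in the tables.

```python
# (default dBm, optimized dBm) pairs copied from Tables 4 and 5
TABLE_4 = {
    "10Gb Ethernet": (-9.2, -10.35),
    "GBT": (-11.9, -12.7),
    "TTC-PON": (-6.45, -9.3),
}
TABLE_5 = {
    "10Gb Ethernet": (-14.4, -15.0),
    "GBT": (-15.34, -16.04),
    "TTC-PON": (-11.78, -13.2),
}


def improvement(default_dbm, optimized_dbm):
    """Return (difference in dB, percent of |default dBm|, linear power ratio).

    The percent definition is inferred from the tables (assumption).
    The linear ratio is default power divided by optimized power in mW.
    """
    diff_db = optimized_dbm - default_dbm
    percent = 100.0 * abs(diff_db) / abs(default_dbm)
    linear_ratio = 10.0 ** (-diff_db / 10.0)
    return diff_db, percent, linear_ratio


if __name__ == "__main__":
    for label, table in (("Table 4", TABLE_4), ("Table 5", TABLE_5)):
        for protocol, (default, optimized) in table.items():
            diff_db, percent, ratio = improvement(default, optimized)
            print(f"{label} {protocol}: {diff_db:.2f} dB, "
                  f"{percent:.2f} %, power ratio {ratio:.2f}")
```

Under this definition the TTC-PON entry of Table 4 evaluates to about 44.19 percent, so the tabulated 44.1 looks truncated rather than rounded. The other five entries match the tables after rounding.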
8. Summary
We have presented a novel transceiver optimization technique to reduce the high-frequency losses that arise from the increased data transmission rates in HEP experiments. The technique has been implemented on the latest 20nm Intel-Altera Arria-10 FPGA. The scheme has been tested and validated for the link rates of three high-speed communication protocols, GBT, TTC-PON and 10 Gbps Ethernet, which are most commonly used for interfacing the detector front-end electronics, trigger and DAQ systems. The proposed scheme is an optimized approach which reduces the number of iterations required.

The tests are performed with the PRBS31 pattern at a confidence level of 95 percent. There is a considerable gain in the system performance with the application of the proposed technique, as specified by the two metrics of signal integrity, the BER and the Eye Diagram. The Intel FPGA default parameters and the solution space values are marked on the kiviat diagram for fast comparison between the parameters. The results show that, to attain the targeted BER of $10^{-12}$, the required optical power is reduced by 12.5%, 6.7% and 44.1% for 10 Gbps, GBT and TTC-PON respectively. The BER is also improved over the received range of optical power. The CDR capability of the system is also enhanced, as the least optical power required to recover the data traffic is reduced by 4.17%, 4.56% and 12.05% for 10 Gbps, GBT and TTC-PON respectively. The technique improves the signal integrity and reduces the BER. It is a heuristic solution with potential for practical applications, as it provides rapid convergence of the solution space to the optimized transceiver settings, which makes its implementation time efficient. This transceiver optimization technique and its implementation approach would lend themselves well to users of other FPGAs that allow on-chip assessment of signal quality, such as the eye diagram.

Acknowledgement
The authors gratefully acknowledge the support of the ALICE Collaboration at CERN during the period of the research work. We thank Alex Kluge, Tivadar Kiss and Erno David of the ALICE Electronics coordination and the CRU project for their valuable help and advice. We thank Subhasis Chattopadhyay, Anurag Misra and Saurabh Srivastava for fruitful suggestions during the preparation of the manuscript.
References
[1] D. E. Morrissey, T. Plehn, T. M. Tait, Physics searches at the LHC, Physics Reports 515 (1-2) (2012) 1–113.
[2] W. K. Panofsky, Evolution of particle accelerators, SLAC Beam Line 27 (1997) 36–44.
[3] W. Smith, Trigger and data acquisition for hadron colliders at the energy frontier (2013), arXiv preprint arXiv:1307.0706.
[4] S. A. Khan, J. Mitra, E. David, T. Kiss, T. K. Nayak, A potent approach for the development of FPGA based DAQ system for HEP experiments, Journal of Instrumentation 12 (2017) T10010.
[5] J. Toledo, F. Mora, H. Müller, Past, present and future of data acquisition systems in high energy physics experiments, Microprocessors and Microsystems 27 (2003) 353–358.
[6] J. Mitra, S. A. Khan, et al., Common readout unit (CRU) - a new readout architecture for the ALICE experiment, Journal of Instrumentation 11 (2016) C03021.
[7] Gutiérrez, et al., The ALICE TPC readout control unit, in: Nuclear Science Symposium Conference Record, IEEE, Vol. 1, 2005, p. 575.
[8] S. A. Khan, J. Mitra, T. K. Nayak, Development of a high speed data acquisition system for the detectors at high luminosity LHC, in: Proceedings of the XXII DAE High Energy Physics Symposium, Springer, 2018, p. 223.
[9] B. Razavi, Challenges in the design of high-speed clock and data recovery circuits, IEEE Communications Magazine 40 (2002) 94–101.
[10] S. H. Hall, H. L. Heck, Advanced signal integrity for high-speed digital designs, John Wiley & Sons, 2011.
[11] I. Altera, Intel Arria 10 Device Overview (2018).
[12] I. Altera, Quartus Prime Standard Edition Handbook Volume 1: Design and Synthesis (2017).
[13] G. F. Knoll, Radiation detection and measurement, John Wiley & Sons, 2010.
[14] L. Li, A. M. Wyrwicz, Parallel 2D FFT implementation on FPGA suitable for real-time MR image processing, Review of Scientific Instruments 89 (9) (2018) 093706.
[15] S. G. Castillo, K. B. Ozanyan, Field-programmable data acquisition and processing channel for optical tomography systems, Review of Scientific Instruments 76 (9) (2005) 095109.
[16] Intel Corporation, Arria 10 FPGA Development Kit User Guide (2017).
[17] M. B. Marin, S. Baron, et al., The GBT-FPGA core: features and challenges, Journal of Instrumentation 10 (2015) C03021.
[18] E. Mendes, S. Baron, D. Kolotouros, C. Soos, F. Vasey, The 10G TTC-PON: challenges, solutions and performance, Journal of Instrumentation 12 (2017) C02041.
[19] I. Altera, High-Speed Link Tuning Using Signal Conditioning Circuitry in Stratix V Transceivers (2015).
[20] S. Committee, et al., SFF-8431 Specifications for Enhanced Small Form Factor Pluggable Module SFP+, revision 4.1, July 6, 2009.
[21] S. I. Green, Multichannel bit error rate tester for fiber optic transceiver testing, Review of Scientific Instruments 73 (8) (2002) 3125–3127.
[22] H. Badaoui, Y. Frignac, P. Ramantanis, B. E. Benkelfat, M. Feham, PRQS Sequences Characteristics Analysis by Autocorrelation Function and Statistical Properties, IJCSI (2010) 39.
[23] D. Mitić, A. Lebl, Ž. Markov, Calculating the required number of bits in the function of confidence level and error probability estimation, Serbian Journal of Electrical Engineering 9 (2012) 361–375.
[24] L. J. Ippolito, Appendix B: Error functions and bit error rate, Satellite Communications Systems Engineering: Atmospheric Effects, Satellite Link Design and System Performance 363–366.
[25] S. Chapra, R. P. Canale, Numerical methods for engineers: with personal computer applications.