Timing and characterization of shaped pulses with MHz ADCs in a detector system: a comparative study and deep learning approach
Prepared for submission to JINST
Pengcheng Ai,a Dong Wang,a,1 Guangming Huang,a Ni Fang,a Deli Xu,a Fan Zhang b

a Central China Normal University, No.152 Luoyu Road, Wuhan, Hubei 430079, P.R. China
b Hubei University of Technology, No.28 Nanli Road, Wuhan, Hubei 430068, P.R. China

1 Corresponding author.
E-mail: [email protected]
Abstract: Timing systems based on Analog-to-Digital Converters are widely used in the design of previous high energy physics detectors. In this paper, we propose a new method based on deep learning to extract the time information from a finite set of ADC samples. Firstly, a quantitative analysis of the traditional curve fitting method regarding three kinds of variations (long-term drift, short-term change and random noise) is presented with simulation illustrations. Next, a comparative study between curve fitting and the neural networks is made to demonstrate the potential of deep learning in this problem. Simulations show that the dedicated network architecture can greatly suppress the noise RMS and improve timing resolution in non-ideal conditions. Finally, experiments are performed with the ALICE PHOS FEE card. The performance of our method is more than 20% better than curve fitting in the experimental condition.

Keywords: Analysis and statistical methods; Pattern recognition, cluster finding, calibration and fitting methods; Front-end electronics for detector readout; Timing detectors

1 Introduction

Pulse timing is a common problem in high energy physics [1], optics [2], telecommunication [3] and many other applied physics disciplines. Among feasible methods, fast electronic readout systems provide a cost-effective and robust solution with relatively high timing resolution. In many engineering circumstances, we care more about availability and practicality than technical indicators, and electronic timing systems are usually good candidates for these applications.

In high energy physics, accurate timing, along with energy and position information, is needed to reconstruct collision events so as to discriminate against backgrounds [4] and identify phenomena of interest [5]. Several kinds of detectors can provide the time information. For example, Time-of-Flight detectors can measure the time of incoming events directly; Time Projection Chambers (TPC) and calorimeters can measure the pulse signal and infer the time afterwards; silicon detectors and pixel sensors can measure the hit information and offer an auxiliary time stamp, and so on. The final reconstructed event is a combination and coincidence of multiple sources of detectors.

There are two major branches of timing systems: systems based on Analog-to-Digital Converters (ADC) and systems based on Time-to-Digital Converters (TDC). In general, TDC-based systems are specialized in time measurement and can achieve a precision of tens of picoseconds [6] when configured properly. In spite of their high precision, the major drawback of TDC-based systems is that they lack the amplitude information which is critical in some applications. If both time and amplitude are of interest, ADC-based systems are good alternatives to TDC-based systems. The empirical timing precision of ADC-based systems is on the order of nanoseconds.

For ADC-based systems, a typical work flow can be described as follows. The original signal from TPCs or calorimeters is preprocessed by Charge Sensitive pre-Amplifiers (CSA) to get a step-like signal. Afterwards, this signal is fed to Front-End Electronics (FEE). The signal conditioning on the FEE board includes buffering, amplifying and bandpass filtering by CR-RC^n shapers. Finally, the signal is sampled by ADCs with the prescribed precision and data depth.
The recorded ADC samples can serve multiple purposes. For a classification task, the shaped pulse signal can be used to discriminate between particles or physical events [7-10]. For a regression task, timing or other pulse information is extracted from the digitized pulse signal [11].

To obtain the time from a finite set of ADC samples, we can use an estimated fitting function and perform curve fitting to get estimated values of the underlying parameters. Curve fitting is a standard inference method in the time domain and it shows promising properties under certain conditions (see section 3.1.2). However, its applicability and accuracy rely heavily on the fitting function and the ideal form of noise. As a result, the actual performance of curve fitting is limited by the experimental conditions of ADC-based systems [12].

Recently, deep learning [13] as a renewed machine learning technique has progressed rapidly. It has been successfully used for particle/event discrimination and identification at the pulse level [14], the pixel level [15] and the voxel (three-dimensional) level [16]. In view of the fact that neural networks are applicable to classification tasks as well as regression tasks, it is meaningful to explore the capability of deep learning in the above-mentioned pulse timing problem.

In this paper, we mainly discuss the deep learning approach to pulse timing based on a comparison between curve fitting and the proposed method. Section 2 briefly introduces the project background and the mathematical form of the researched pulse. Section 3 explains the traditional curve fitting method by theoretical analysis and simulation studies. Section 4 gives a comparative study and the details of the new deep learning approach. Section 5 discusses the experiments we conduct and shows the experimental results. Finally, a conclusion is drawn in section 6.
2 Background

The ALICE PHOS detectors [17] refer to the Photon Spectrometers designed for the ALICE experiment [18]. The detectors were produced in 2007 and scheduled for the first p+p collisions at LHC in 2008 [19]. The scintillator is made of lead tungstate crystals and is mainly used to detect high energy photons (up to 80 GeV). An Avalanche Photo-Diode (APD) receives the scintillation light and converts it to an electrical signal, which is applied to a CSA near the APD. The output of the CSA is connected to the FEE card via a flat cable.

The FEE card has 32 independent readout channels, each of which is connected to two shaper sections with high gain and low gain. The CR-RC^2 signal shapers are made up of discrete components on a 12-layer Printed Circuit Board (PCB). For each channel, there are two overlapping 10-bit ADCs at the terminations of the two shapers, which give an equivalent dynamic range of 14 bits. The sampling rate of the ADCs is fixed to 10 MS/s. The same readout plan and PCB layout were adopted by the ALICE EMCal detectors [20], the ALICE Electromagnetic Calorimeters. The major difference between the PHOS and EMCal FEE cards lies in the shaping time of the shapers. For PHOS, the designated shaping time is 1 µs; for EMCal, different resistors and capacitors are used to achieve a shaping time of 100 ns.

The CR-RC^2 shaper is a bandpass filter in the frequency domain. In the time domain, its response to an ideal step signal can be formulated as the equation below:

    f(t) = K ((t - t_0)/tau_p)^2 * e^{-2(t - t_0)/tau_p} + b,   for t >= t_0
    f(t) = b,                                                   for t < t_0        (2.1)

where t_0 is the start time and b is the pedestal. K is originally defined as Q*A/C_f, a variable related to the energy of the incoming photon, where Q is the APD charge, A is the shaper gain and C_f is the charging capacitance of the CSA. In our simulations, without changing the nature of the problem, we use K as a normalization factor for numerical purposes. tau_p is the peaking time, defined as the interval between the start of the semi-Gaussian pulse and the moment when f(t) reaches its maximum value. The relation between the shaping time tau and the peaking time tau_p is tau_p = n * tau. For the CR-RC^2 shaper structure, n equals 2, so the peaking times for the PHOS and EMCal are 2 µs and 200 ns, respectively.

Since the CR-RC^n shaper is representative of most applications in high energy physics, in the latter sections we center on the pulse function in equation 2.1 to discuss different timing methods.

3 The curve fitting method

Curve fitting is a traditional model fitting technique mainly aimed at finding parameterized mathematical relations between two or more variables. Classical linear curve fitting can be directly solved by the least squares method, and nonlinear curve fitting can be solved by the trust region and Levenberg-Marquardt methods [21]. In the pulse timing scenario, the main purpose of curve fitting is to determine the desired parameters related to the time information. In the following subsections, we analyze the curve fitting method in terms of its capability to reveal the ground-truth parameters under various conditions.
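As a concrete reference for the analysis below, the pulse function of equation 2.1 can be written in a few lines. This is a minimal sketch in Python, with the parameter names chosen to mirror the notation of the text:

    import numpy as np

    def shaped_pulse(t, K, t0, tau_p, b):
        """CR-RC^2 response to a step input (equation 2.1).

        K     -- normalization factor (related to the deposited energy)
        t0    -- start time of the pulse
        tau_p -- peaking time (tau_p = 2 * tau for a CR-RC^2 shaper)
        b     -- pedestal
        """
        t = np.asarray(t, dtype=float)
        x = (t - t0) / tau_p
        # Before the start time the output sits at the pedestal.
        return np.where(t >= t0, K * x**2 * np.exp(-2.0 * x) + b, b)

Note that the pulse peaks at t = t0 + tau_p with amplitude K * e^{-2} + b, consistent with the definition of the peaking time above.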
3.1 Theoretical analysis

3.1.1 Problem formulation

We consider the following nonlinear least squares problem:

    min_beta S = min_beta sum_{i=1}^{n} r_i^2 = min_beta sum_{i=1}^{n} [y_i - f(t_i; beta, theta)]^2        (3.1)

where S is the sum of squared residuals to minimize, r_i is the i-th residual, y_i is the i-th observed value (from the ADC), and t_i is the i-th time value. There is some noise residing in the observed value y_i, and we denote this noise term as n_i. Besides, beta are the fitting parameters and theta are the system parameters. The division into fitting parameters and system parameters is made according to our understanding of the problem and practical issues. It is not recommended to set two parameters with high correlation as fitting parameters at the same time, as this will cause instability in the fitting process.

It should be noted that the above formulation is a general framework for the fitting problem. Usually we choose a function family f(t; beta, theta_m) for curve fitting. However, f(t; beta, theta_m) is only a subset of the underlying possible functions f(t; beta, theta). We denote the reference fitting function as f(t; beta, theta_0) in section 3.1.2 and section 3.1.3.

3.1.2 The ideal condition

In this part, we assume that the selected fitting function is accurate (i.e. theta is fixed to theta_0 and theta_m = theta_0), and the noise distribution is strictly Gaussian with a fixed variance sigma^2. Under these assumptions, the distribution of the observed value can be written as:

    y_i = f(t_i; beta, theta_0) + n_i ~ N(f(t_i; beta, theta_0), sigma^2)        (3.2)

Since the Gaussian probability density is P(x | mu, sigma) = (1/(sqrt(2*pi)*sigma)) * e^{-(x - mu)^2 / (2*sigma^2)}, the corresponding log-likelihood function is:

    L(y_1, y_2, ..., y_n; beta, theta_0) = ln prod_{i=1}^{n} P(y_i | f(t_i; beta, theta_0), sigma)
                                         = -(1/(2*sigma^2)) sum_{i=1}^{n} [y_i - f(t_i; beta, theta_0)]^2 + const        (3.3)

Equation 3.3 implies that, in the ideal condition, using curve fitting to minimize the sum of squared residuals S is equivalent to maximizing the log-likelihood function of the noise distributions. In other words, curve fitting gives the maximum likelihood estimators of the fitting parameters. This claim reveals the statistical properties of the curve fitting method. It is based on a hypothesis of Gaussian noise distributions, which is a useful prior when our knowledge about the system is limited.

3.1.3 Quantitative analysis of drift, change and noise

In reality, the assumptions in section 3.1.2 are usually not valid. Variations in the fitting function and the noise make the problem much more complicated. In this paper, we consider three types of variations which are representative in high energy physics:
1. Long-term drift. This kind of variation refers to the deviation in the system parameters theta after the circuit board is fabricated. It can also represent a persistent change between two calibration runs. It affects the pulse function consistently, so the event-by-event characteristics of the ADC sampling values stay the same.

2. Short-term change. This kind of variation refers to the deviation in the system parameters theta between two events. It changes according to the current status of the detector, but its effect is near-identical for all ADC sampling values in a single event. In other words, the event-by-event characteristics change during the operation of the experiment.

3. Random noise. This kind of variation refers to the randomized noise n_i residing in the observed value y_i. It varies between ADC samples in a single event. Since it is random, the actual value of the noise is not predictable. However, its statistical features can be determined in advance.

Next, we introduce these variations into the curve fitting. We only consider variations near the reference point, so that the fitting result will not be rejected by the fitting process (i.e. without increasing the chi-square criterion significantly). When the above variations are present, by using a first-order approximation we can formulate y_i as:

    y_i = f(t_i; beta_0, theta_0) + sum_j (d f(t_i; beta_0, theta_0) / d theta_j) * Delta theta_j + n_i        (3.4)

Since we use the reference system parameters in the curve fitting, a non-ideal y_i will cause a change in the fitting parameters. By using the first-order approximation:

    f(t_i; beta, theta_0) = f(t_i; beta_0, theta_0) + sum_j (d f(t_i; beta_0, theta_0) / d beta_j) * Delta beta_j        (3.5)

Curve fitting tries to minimize the sum of squared residuals by varying beta. By applying the first-order necessary condition for a minimum, we get the following equations:

    grad_beta S = grad_beta sum_{i=1}^{n} r_i^2 = grad_beta [ sum_{i=1}^{n} (y_i - f(t_i; beta, theta_0))^2 ] = 0        (3.6)

    (J^T J) Delta beta = J^T (P Delta theta + n)        (3.7)

where

    J_ij = d f(t_i; beta_0, theta_0) / d beta_j,    P_ij = d f(t_i; beta_0, theta_0) / d theta_j

If J^T J is nonsingular, the deviation in the fitting parameters can be solved as:

    Delta beta = (J^T J)^{-1} J^T (P Delta theta + n)        (3.8)

In general, equation 3.8 is a generalization of linear curve fitting to nonlinear cases. It implies that, under first-order approximations, the deviation of the fitting parameters around the reference point is linearly dependent on the deviation of the system parameters and the random noise.

3.2 Simulation studies

To demonstrate the accuracy of first-order approximations for our pulse function, we compare the results of calculating equation 3.8 to the results of directly applying curve fitting. For the pulse function in equation 2.1, we divide the parameters in the following way without inducing a complicated function family:

    beta = {K, t_0},    theta = {tau_p, b}        (3.9)

In the following simulations, the reference system parameters are tau_p = 2.0 and b = 0.1, with K and t_0 fixed at reference values, and the pulse is sampled from t = 0 to t = 3.2 at a period of 0.1, so there are a total of 33 points (time is measured in µs, so the sampling period corresponds to the 10 MS/s of the PHOS ADCs). The value of K ensures that the amplitude is renormalized to a range in the interior of (0, 1). This parameterization is in accord with the PHOS electronics with 1 µs shaping time (section 5.1).
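A minimal sketch of the fitting procedure used in this section, reusing the shaped_pulse helper from section 2 and the parameter division of equation 3.9: only K and t_0 are free, while tau_p and b are held at their reference values. The true values K = 1.0 and t_0 = 0.5 in the usage example are illustrative placeholders.

    import numpy as np
    from scipy.optimize import curve_fit

    TAU_P_REF, B_REF = 2.0, 0.1          # reference system parameters (theta_0)

    def fit_pulse(t_samples, y_samples, K_init=1.0, t0_init=0.0):
        """Nonlinear least squares (equation 3.1) for beta = {K, t0},
        with theta = {tau_p, b} fixed at the reference values."""
        model = lambda t, K, t0: shaped_pulse(t, K, t0, TAU_P_REF, B_REF)
        popt, _ = curve_fit(model, t_samples, y_samples, p0=[K_init, t0_init])
        return popt  # fitted [K, t0]

    # Usage: 33 samples at a 0.1 us period (10 MS/s), as in the simulations.
    t = np.arange(33) * 0.1
    y = shaped_pulse(t, 1.0, 0.5, TAU_P_REF, B_REF) + 0.014 * np.random.randn(33)
    K_fit, t0_fit = fit_pulse(t, y)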
Long-term drift and short-term change

These two kinds of variations are associated with the system parameters theta. We separate tau_p and b and study their influence on the fitting parameters K and t_0 respectively. The simulation results are shown in figure 1. The solid line is calculated from first-order approximations, and the solid dots are generated from curve fitting. It can be seen that in a region near the reference point the first-order approximations are fairly accurate. This is especially true for the (t_0, tau_p) and (K, b) pairs, which have high correlations. In the other two pairs, the discrepancy between first-order approximations and curve fitting is determined by higher-order effects.

Random noise
According to equation 3.8, if the per-sample noise is Gaussian, the linear mapping will propagate the noise to the fitting parameters directly, so the distribution of the fitting parameters will also be Gaussian. On the other hand, if the per-sample noise is not Gaussian, the linear mapping will work in a similar way. In order to study the distribution of the fitting parameters in these non-Gaussian cases, we select two representative noise distributions: the crystal ball distribution [22] and the Moyal distribution [23]. The former has a long tail on the left-hand side and the latter has a long tail on the right-hand side. Their probability density functions are:

    crystal ball: f(x; beta, m) = N * e^{-x^2/2},       for x > -beta
                  f(x; beta, m) = N * A * (B - x)^{-m}, for x <= -beta        (3.10)

    Moyal: f(x) = e^{-(x + e^{-x})/2} / sqrt(2*pi)        (3.11)

For the crystal ball distribution, we choose beta = m = 3, shift its center and downscale it by 0.01. For the Moyal distribution, we shift its center and downscale it by 0.00625. In addition, in order to study the non-negative effect in detector electronics, we clip the noise to force the noise values to 0 if they become negative.

Figure 1. A gathering of figures for drift and change simulations: (a) K vs. tau_p, (b) t_0 vs. tau_p, (c) K vs. b, (d) t_0 vs. b. Each figure compares the result from first-order (linear) approximations and the result from curve fitting directly.

The simulation results are shown in figure 2. For each figure, we calculate the first-order approximations and run a Monte Carlo simulation with a volume of 1000. It can be seen that although the noise distributions have strong non-Gaussian features, the distributions of the fitting parameters have Gaussian shapes. The mean and standard deviation calculated from equation 3.8 characterize the distributions from curve fitting very well. This implies that, for a medium number of sampling points (33 in this case), the distributions of the fitting parameters accord with the law of large numbers, a statistical property of many independent random variables.

In conclusion, the first-order approximations can describe curve fitting in a simple and convenient way. Different variations can be viewed as independent forces that drive the deviation of the fitting parameters, and their relation is additive. This paves the way for the comparison in section 4.2, where we will demonstrate the potential of deep learning against this perspective.

Figure 2. A gathering of figures for noise simulations: (a) K vs. shift of clipped crystal ball, (b) t_0 vs. shift of clipped crystal ball, (c) K vs. shift of clipped Moyal, (d) t_0 vs. shift of clipped Moyal. Each figure compares the average and error band from first-order (linear) approximations with Monte Carlo simulations of curve fitting.
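Both noise models are available directly in scipy, so the clipped noise used above can be drawn as in the sketch below. The loc/scale values are illustrative placeholders for the shifted, downscaled distributions described in the text.

    import numpy as np
    from scipy.stats import crystalball, moyal

    rng = np.random.default_rng(seed=0)
    n_samples = 33

    # Crystal ball noise with beta = m = 3; loc/scale stand in for the
    # shift and 0.01 downscaling used in the simulations.
    cb_noise = crystalball.rvs(beta=3, m=3, loc=0.0, scale=0.01,
                               size=n_samples, random_state=rng)

    # Moyal noise, similarly shifted and downscaled by 0.00625.
    my_noise = moyal.rvs(loc=0.0, scale=0.00625, size=n_samples, random_state=rng)

    # Model the non-negative effect of the detector electronics by
    # clipping negative noise values to zero.
    cb_noise = np.clip(cb_noise, 0.0, None)
    my_noise = np.clip(my_noise, 0.0, None)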
4 The deep learning approach

Deep learning is a major breakthrough of recent years. It is based on neural networks, but its focus has shifted to building intricate network architectures for real-world applications (e.g. images, voice, natural language). It started with image classification tasks [24, 25] and spread to other domains of artificial intelligence [26, 27]. Furthermore, it has been applied to high energy physics in recent literature [28-31]. In the following subsections, we discuss how to use deep learning to solve the pulse timing problem.
4.1 Neural network basics

The concept of neural networks is fundamental to deep learning. The basic element of a neural network is called a neuron. A neuron has N inputs and one output, and it has N weights and one bias as its parameters. It computes the products of the inputs and the weights in an element-wise manner, adds them together with the bias, and applies a nonlinear activation function to the sum. Many such neurons can act on the same inputs and form a layer. For a neural network with one intermediate layer, the output unit is also a neuron. The only intermediate layer is also called the hidden layer.

A deep neural network usually refers to a network with more than one hidden layer. By taking the output of the former layer as the input, hidden layers can be stacked. Increasing the depth of the network gains additional power to extract structured features and reduces the number of parameters needed to approximate some functions. In general, neural networks have promising mathematical characteristics. They are supported by the universal approximation theorem [32, 33], which states that neural networks can approximate mathematical functions to arbitrary precision given enough neurons and layers.

One successful network structure is the convolutional neural network [24]. It is based on the ideas of weight sharing and shift invariance. Instead of connecting a neuron to all inputs, we compute the output of a neuron from a vicinity (e.g. a 2D patch) of the input. Besides, the weights used to produce the output are shared across different places. By taking these measures, the number of parameters in a neural network can be greatly reduced and the efficiency can be improved dramatically.

To train a neural network, we need (input, label) pairs. The input is propagated through the neural network and compared to the label to compute a loss function. Then the loss is used to update the parameters of the whole network by the back-propagation algorithm. The updating formula is usually based on the gradient descent method, i.e. moving the parameters in a direction which reduces the loss function. The loss function is usually the cross entropy along with the softmax function for a classification task [24], and the mean square error and its derivatives for a regression task.
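The neuron described above reduces to one line of arithmetic. A minimal numpy illustration (the tanh activation is just one possible choice):

    import numpy as np

    def neuron(x, w, b, activation=np.tanh):
        """One neuron: element-wise products of inputs and weights,
        summed together with the bias, then a nonlinear activation."""
        return activation(np.dot(w, x) + b)

    x = np.array([0.2, -0.5, 1.0])   # N inputs
    w = np.array([0.1, 0.4, -0.3])   # N weights
    out = neuron(x, w, b=0.05)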
4.2 Comparison with curve fitting

With the knowledge above, we are ready to discuss the potential of deep learning and compare it to the curve fitting method. The study is carried out with respect to the variations in section 3.1.3.
Long-term drift
From the analysis in section 3, the long-term drift will introduce a bias into the fitting parameters. In a large detector system, correcting the bias is a tremendous task, and even impractical in some cases. For one thing, unlike the discussion in section 3.2, system parameters are hidden in the function and sometimes have very sophisticated forms. For another, the non-uniformity of different cells makes the problem even more complicated. Furthermore, if we view the bias in the non-Gaussian noise as a kind of long-term drift, the total effect is a mixture of several aspects. To tackle the bias challenge, we can use a regression neural network to correct for the influence of the long-term drift on the fitting parameters. Without loss of generality, we can assume that the last layer of the neural network has the form y = f(x; w, b) = sum_i w_i * x_i + b. Since the last layer has a bias parameter b, a persistent shift in the system will be counteracted by the bias parameter b through the training process. As long as the training label is sufficiently accurate, the bias can be greatly reduced by the neural network.

Short-term change
For curve fitting, the short-term change has a direct impact on the precision of the fitting parameters. In equation 3.8, it can be seen that the event-by-event variations of the system parameters theta will result in fluctuations of the fitting parameters. The primary cause of this phenomenon is that curve fitting treats each set of ADC samples as an independent and complete set of features. However, different sets of ADC samples belong to the same function family, and an overall understanding of the function family is beneficial to the explanation of an individual set of features. The optimization of neural networks is such a global process, which helps to establish this overall understanding. To see this point, we can rewrite the mapping of the neural network as:

    beta' = g(f(t; beta, theta) + n; W, B)        (4.1)

where f(t; beta, theta) = (f(t_1; beta, theta), f(t_2; beta, theta), ..., f(t_n; beta, theta)) is the vector of sampling points, and W, B are the weights and biases of the neural network. When we optimize the model, the training label changes consistently with the underlying fitting parameters beta but remains the same when the system parameters theta vary. As a result of training, the weights W and biases B of the neural network follow a gradient descent direction such that the change of beta' is proportional to the change of beta but orthogonal to the change of theta. In other words, training increases the sensitivity to variations of the fitting parameters and reduces the sensitivity to variations of the system parameters. In this way, the influence of the short-term change can be greatly alleviated.

Random noise
We have already analyzed the Gaussian noise with an accurate fitting function in section 3.1.2. Here we focus on noise with more complex forms. According to the central limit theorem, the distributions of the fitting parameters take Gaussian shapes when noise is present. This is a degenerative process and can lose original information. To help understand this claim, we might think of the development of modern physics: when instrumentation was not so advanced, people could only observe macroscopic phenomena, which were normally distributed according to statistical laws; once the hardware had improved, people could measure the microscopic mechanisms, and the fine structures could be found. In our problem, curve fitting does not sufficiently utilize the information in each time point, and the loss of information cannot be retrieved. On the other hand, we already know that neural networks have fine-grained (micro) structures. This offers an opportunity to achieve better performance than curve fitting in non-Gaussian settings. Since the nonlinear mapping in the activation functions can implement a complicated function family, it is possible to use neural networks to retrieve the original information from noisy inputs.

In conclusion, deep learning is a good alternative to the traditional curve fitting method in terms of drift, change and noise when used in an appropriate way.
4.3 Network architecture and training

In this part, we discuss the implementation issues of deep learning in the specific pulse timing problem. Although neural networks are promising according to the analysis in section 4.2, this does not mean that any structure will perform well. When facing a new problem, practitioners need to customize the network structure to make it suitable for the problem settings.

We design our network architecture based on ideas from [34, 35]. A diagram of the adopted architecture is shown in figure 3. In principle, the network is comprised of two parts: a denoising autoencoder and a regression network.

The denoising autoencoder [36] is a network which tries to recover the original unstained input from its noisy version. A typical autoencoder is made up of a pyramid structure which performs feature extraction (encoding), and an inverted pyramid structure which restores the original data (decoding).

Figure 3. A diagram of the network architecture: the original inputs are interpolated to form the network inputs; the denoising autoencoder (encoder and decoder layers built from convolutions and deconvolutions, linked by skip connections) produces the network outputs; a regression network of fully-connected layers gives the final output.

We add the following features to the autoencoder prototype to improve its performance:
1. Convolution and deconvolution. In the encoder layers and decoder layers, we use convolution [24] and deconvolution [37] operations to replace fully-connected layers. These operations utilize the locality of input features and extract structured patterns from the data. In the convolution, we use many groups of parameters (each called a filter or kernel) to compute the output (called a feature map). Each filter has its own weights and bias, and it moves across the input feature map to produce a one-dimensional output. Many filters result in a feature map with many channels of one-dimensional data. In the deconvolution, the operation between the input and the output is transposed. For the same stride and padding, the output shape of a deconvolution operation is the same as the input shape of the corresponding convolution operation.

2. Skip connections. Optimizing a deep neural network suffers from the problems of vanishing/exploding gradients. Even when these problems are handled by normalization, a degradation problem still affects the performance of the model. In [25], a dedicated structure called the residual network was suggested to solve this problem. In view of this work, we implement skip connections between the encoder layers and decoder layers to overcome these issues when training a deep network. Except for the last layer, every layer in the encoder is directly copied to the corresponding layer in the decoder. At the decoding side, the channels from the encoder and the channels from the main passage of the network are concatenated. In this way, the relation between long-range layers is preserved so that it is easier for the network to learn valuable features from the input.
3. Leaky ReLU. The Rectified Linear Unit (ReLU) [38] is a kind of activation function widely used in deep learning. Since ReLUs force the output to zero when the input is negative, they block the flow of information for a considerable number of neurons in a network. The leaky ReLU [39] was proposed to solve this problem. Unlike the ReLU, the leaky ReLU has a gradual slope on the negative x-axis, so it has a non-zero gradient even when the input is negative. In our network, we use leaky ReLUs in the encoder layers.
Table 1. Specification for the denoising autoencoder.

Convolution
No. | stride | filter width | out channels | leaky ReLU
1   | 2      | 4            | 64           | No ReLU
2   | 2      | 4            | 128          | Yes (0.2)
3   | 2      | 4            | 256          | Yes (0.2)
4   | 2      | 4            | 512          | Yes (0.2)
5   | 2      | 4            | 512          | Yes (0.2)
6   | 2      | 4            | 512          | Yes (0.2)
7   | 2      | 4            | 512          | Yes (0.2)
8   | 2      | 4            | 512          | Yes (0.2)

Deconvolution
No. | stride | filter width | out channels | dropout
8   | 2      | 4            | 1024         | Yes (0.5)
7   | 2      | 4            | 1024         | Yes (0.5)
6   | 2      | 4            | 1024         | Yes (0.5)
5   | 2      | 4            | 1024         | No
4   | 2      | 4            | 512          | No
3   | 2      | 4            | 256          | No
2   | 2      | 4            | 128          | No
1   | 2      | 4            | 1            | No
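The architecture of table 1 can be sketched in PyTorch as below. This is our reading, not the authors' code: we assume the decoder channel counts in table 1 are the widths after skip concatenation, that the input is the interpolated 256-point waveform (cf. figure 3), and that padding and activation placement follow the pix2pix convention of [34].

    import torch
    import torch.nn as nn

    class DenoisingAutoencoder(nn.Module):
        """U-Net-style 1-D denoising autoencoder following table 1.
        Input and output shape: (batch, 1, 256)."""
        def __init__(self):
            super().__init__()
            enc_ch = [1, 64, 128, 256, 512, 512, 512, 512, 512]
            self.encoders = nn.ModuleList()
            for i in range(8):
                layers = []
                if i > 0:                    # layer 1 has no activation (table 1)
                    layers.append(nn.LeakyReLU(0.2))
                layers.append(nn.Conv1d(enc_ch[i], enc_ch[i + 1], 4,
                                        stride=2, padding=1))
                self.encoders.append(nn.Sequential(*layers))

            dec_out = [512, 512, 512, 512, 256, 128, 64, 1]
            self.decoders = nn.ModuleList()
            in_ch = 512                      # bottleneck width
            for i in range(8):
                layers = [nn.ReLU(),
                          nn.ConvTranspose1d(in_ch, dec_out[i], 4,
                                             stride=2, padding=1)]
                if i < 3:                    # dropout on the first three decoder layers
                    layers.append(nn.Dropout(0.5))
                self.decoders.append(nn.Sequential(*layers))
                # after concatenating the skip connection, the next input doubles
                in_ch = dec_out[i] * 2

        def forward(self, x):
            skips = []
            for enc in self.encoders:
                x = enc(x)
                skips.append(x)
            skips = skips[:-1]               # the last encoder layer is not copied
            for i, dec in enumerate(self.decoders):
                x = dec(x)
                if i < 7:                    # concatenate encoder and decoder channels
                    x = torch.cat([x, skips[-(i + 1)]], dim=1)
            return x

With 8 stride-2 convolutions, the 256-point input is reduced to a length-1 bottleneck and then restored to 256 points, so the post-concatenation decoder widths (1024, 1024, 1024, 1024, 512, 256, 128, 1) match the table.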
Specifically, the denoising autoencoder is a network with 8 convolution layers and 8 deconvolution layers, as specified in table 1. A softmax layer is used between the denoising autoencoder and the regression network.

Training such a network can be divided into the following two steps:

1. Autoencoder pre-training. It is strongly recommended to pre-train the denoising autoencoder as the first step of the training process. Based on the function of the autoencoder, we need to estimate the form of the noise and generate (noisy input, unstained input) pairs as the (input, label) to train the network (a sketch of this pair generation is given after this list). To be more specific, first we randomly generate a set of sampling points according to the pulse function. Then we add per-sample noise to the sampling points according to the probability distribution of the estimated form of the noise. If the expression of the short-term change is known, it can also be used. Actually, even a rough estimate can improve the final performance significantly (see section 5). In this stage, only simulation data is used.
2. End-to-end finetuning. After pre-training, we can use experimental data (if available) to perform an end-to-end finetuning of the whole network. A precise label indicating the ground-truth parameter is used at the far end of the network to generate a loss function. There are two options in finetuning. The first option is to keep the autoencoder unchanged and only finetune the regression network; if there are no distinct changes in the pulse function compared to the pre-training stage, this option can be used. The second option is to finetune the whole network together; in this case, the pre-trained network only works as an optimal starting point for finetuning, and the capacity of the model is larger (which also implies overfitting issues).
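Step 1 amounts to generating (noisy input, unstained input) pairs from the pulse function. A sketch under a Gaussian-noise assumption, reusing the shaped_pulse helper from section 2; the K and t0 ranges and the 0.1 µs grid are illustrative placeholders:

    import numpy as np

    def make_pretraining_pair(rng, n_points=32, ratio=8, noise_std=0.014):
        """Generate one (noisy input, clean target) pair for pre-training."""
        K = rng.uniform(0.4, 1.0)          # hypothetical range
        t0 = rng.uniform(-0.1, 0.1)        # hypothetical range
        t_coarse = np.arange(n_points) * 0.1                  # 10 MS/s samples
        t_fine = np.arange(n_points * ratio) * (0.1 / ratio)  # super-resolved grid
        clean = shaped_pulse(t_fine, K, t0, 2.0, 0.1)
        noisy = (shaped_pulse(t_coarse, K, t0, 2.0, 0.1)
                 + noise_std * rng.standard_normal(n_points))
        # The coarse noisy samples are interpolated to the label resolution
        # before entering the network (cf. figure 3).
        noisy_interp = np.interp(t_fine, t_coarse, noisy)
        return noisy_interp, clean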
4.4 Simulation studies

In this part, we run simulations of the proposed neural network regarding the variations discussed in section 3.1.3. Since the advantage of the neural network model for the long-term drift is evident from the discussion in section 4.2, we do not run simulations for this kind of variation.

In order to study the variations, first we need to generate the simulation dataset. The pulse function is the same as in section 3.2. In the following simulations, we choose K uniformly sampled in a fixed interval and t_0 uniformly sampled in an interval around zero. The reference values for tau_p and b are 2.0 and 0.1, respectively. The pulse for the noisy input (or the input with short-term change) is sampled from t = 0 to t = 3.2 at a period of 0.1. We drop the last point when training, so there are 32 points. The same pulse for the label is sampled at a super-resolution ratio of 8 in the same interval, so there are a total of 256 points. We gather the simulation samples into two separate datasets: the training dataset has 40000 samples and the test dataset has 10000 samples.

To calculate the timing resolution, we test the different methods on the test dataset and get the predicted values of the start time t_0. For curve fitting, the predicted values are the fitting parameters. For regression networks, the predicted values are the outputs of the networks. Then we use the difference between the predicted values and the ground-truth values to make a Gaussian fit. The standard deviation of the Gaussian fit is a measure of the timing resolution.

Figure 4. The simulation results of the denoising autoencoder for the short-term change. (left) A typical example of the inputs, the outputs and the targets (label) of the autoencoder for a sample chosen from the test dataset. (right) The RMS of amplitude between the inputs/outputs and the ground-truth targets over the whole test dataset, with Gaussian fits (inputs: mu = 0.01110, sigma = 0.00843; outputs: mu = 0.00310, sigma = 0.00191).
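The resolution measure described above is a Gaussian fit of the residuals; a minimal sketch:

    import numpy as np
    from scipy.stats import norm

    def timing_resolution(predicted_t0, true_t0):
        """Gaussian fit of the residuals: the standard deviation measures
        the timing resolution and the mean measures the system bias."""
        residuals = np.asarray(predicted_t0) - np.asarray(true_t0)
        mu, sigma = norm.fit(residuals)
        return mu, sigma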
Short-term change
To study the effects of the short-term change, we introduce the baseline shift, i.e. variations of the pedestal b. The baseline shift is a common type of short-term change, especially when the event rate is high so that nearby events interplay. To construct the dataset, first we add the same shift to all sampling points in an event. The shift is randomly sampled from a Gaussian distribution with mean 0.1 and standard deviation 0.014. The training targets of the denoising autoencoder are set to have the pedestal b = 0.1, which is the standard value used in curve fitting.

Table 2. Simulation results for the short-term change. The table compares different neural network models with curve fitting.

model                       | note                   | converged  | timing resolution (µs)
fitting original data       | —                      | —          | 0.01217
fitting autoencoder outputs | only base network      | —          | 0.00296
regression net v1           | base network fixed     | successful | 0.00303
regression net v2           | base network trainable | successful | 0.00182

The results are shown in figure 4 and table 2. In the left figure, we can see that although the pedestal b and the amplitude K are both random and highly correlated, the denoising autoencoder can effectively perceive the change in the pedestal and cancel it. The right figure shows the distribution of the RMS over the whole test dataset: the average RMS is reduced from 0.01110 to 0.00310, by a factor of 3.58. In the table, we compare the timing resolution achieved by curve fitting and by the neural networks. From the first two lines, it can be seen that fitting the outputs of the denoising autoencoder is better than fitting the original data, which demonstrates the effectiveness of the neural network structure. The result of regression network v1, with the base network fixed, is slightly worse than fitting the outputs of the denoising autoencoder. The best result (1.82 ns) comes from regression network v2 with the base network trainable, and it outperforms the curve fitting results significantly. This implies that, for the short-term change, when we choose a proper starting point and finetune the whole network, the result can be even better than the autoencoder alone.
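The dataset construction described above can be sketched as follows, reusing the shaped_pulse helper from section 2; the K and t0 ranges are illustrative placeholders:

    import numpy as np

    def make_shifted_event(rng, n_points=32, ratio=8):
        """One event of the short-term change dataset: a per-event pedestal,
        shared by all sampling points, replaces the reference b = 0.1."""
        b_event = rng.normal(0.1, 0.014)               # per-event baseline shift
        K, t0 = rng.uniform(0.4, 1.0), rng.uniform(-0.1, 0.1)
        t_coarse = np.arange(n_points) * 0.1
        t_fine = np.arange(n_points * ratio) * (0.1 / ratio)
        noisy_input = shaped_pulse(t_coarse, K, t0, 2.0, b_event)
        clean_target = shaped_pulse(t_fine, K, t0, 2.0, 0.1)  # standard pedestal
        return noisy_input, clean_target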
Random noise

We analyze two representative kinds of noise: the Gaussian noise and the clipped Moyal noise (see section 3.2).
In the first place, we add Gaussian noise with zero mean and 0.014 standard deviation. This introduces a noise ratio (noise std. deviation over average amplitude) that is more intense than in reality. The results are shown in figure 5 and table 3.

Figure 5. The simulation results of the denoising autoencoder for the Gaussian noise, plotted in the same way as figure 4 (inputs: mu = 0.01390, sigma = 0.00172; outputs: mu = 0.00372, sigma = 0.00155).

Table 3. Simulation results for the Gaussian noise. The table compares different neural network models with curve fitting.

model                       | note                          | converged  | timing resolution (µs)
fitting original data       | maximum likelihood estimator  | —          | 0.01206
only regression net         | no base network               | failed     | 0.26756
fitting autoencoder outputs | only base network             | —          | 0.01249
regression net v1           | base network fixed            | successful | 0.01530
regression net v2           | base network trainable        | successful | 0.01261

The right figure shows the distribution of the RMS over the whole test dataset: the average of the noise RMS is reduced from 0.01390 to 0.00372, by a factor of 3.74. In the table, we use three neural network models and compare their performance with curve fitting. Since the Gaussian noise is the most common case, in this analysis we add the regression network alone for comparison. According to section 3.1.2, fitting the original data gives the result of the maximum likelihood estimator, which is the theoretical lower bound. It can be seen that the network architecture is important to achieve optimal performance. When we use only the regression network, the model fails to converge and gives a result worse than the sampling period. However, when we use the autoencoder-regression architecture, the model converges successfully. The best neural network result comes from regression network v2 with the base network trainable, showing the advantage of model capacity in this problem.
In the second place, we analyze the clipped Moyal noise. The original Moyal distribution is shifted to location 0.004, rescaled by 0.006 and then clipped for noise generation. Again, the noise is more intense than in reality. The results are shown in figure 6 and table 4.

Figure 6. The simulation results of the denoising autoencoder for the clipped Moyal noise, plotted in the same way as figure 4 (inputs: mu = 0.01722, sigma = 0.00313; outputs: mu = 0.00093, sigma = 0.00046).

Table 4. Simulation results for the clipped Moyal noise. The table compares different neural network models with curve fitting.

model                       | note                   | converged  | timing resolution (µs)
fitting original data       | —                      | —          | 0.01203
fitting autoencoder outputs | only base network      | —          | 0.00324
regression net v1           | base network fixed     | successful | 0.00463
regression net v2           | base network trainable | successful | 0.00487

In the left figure, we can see that the unique structure of the denoising autoencoder can very well extract the clue of the ground-truth target from the noisy input. To further illustrate this, we plot the distribution of the RMS on the test dataset in the right figure: the average of the noise RMS is reduced from 0.01722 to 0.00093, by a factor of 18.52, exceeding the results of the former simulations. In the table, we compare the timing resolution between curve fitting and the neural networks. From the first two lines, it can be seen that curve fitting with the denoising autoencoder alone improves the timing resolution significantly. Besides, when the regression networks are added, the models converge successfully and show competitive results. In this case, keeping the base network fixed (regression network v1) is slightly better than making the base network trainable (regression network v2), which demonstrates the good baseline provided by the autoencoder.

To conclude the simulation results, the network architecture proposed in section 4.3 can very well tackle the non-ideal conditions. Finetuning the whole network together achieves better results than fitting the outputs of the autoencoder when the short-term change is applied, but slightly worse results when the random noise is applied. Finetuning the regression network alone can sometimes achieve better results than finetuning the whole network, especially when the base network is accurate. In experimental conditions, it is not always possible to provide exact training targets for the denoising autoencoder as in the simulations; thus, finetuning the regression network with the precise time label is vital to improve the performance of the whole network.
5 Experiments

We build a hardware test platform to study pulse timing in a real-world environment. A photograph of the platform is shown in figure 7. The test platform is based on the PHOS detector (section 2). We use a pulse generator to produce pulses with ~50 ns width and ~10 Hz frequency. This pulse signal drives an LED to produce light for the PHOS crystal. The scintillation light is collected by the APD and passed to the CSA. Then it is transmitted to the CR-RC^2 shaper on the FEE card. The output of the CR-RC^2 shaper is hardwired to the AD9656 data acquisition board, which is connected to the HPDAQ motherboard for TCP/IP communications. The AD9656 is a 4-channel ADC chip with 2.8 V dynamic range, 16-bit precision and 125 MHz sampling rate.

Figure 7. A photograph of the hardware test platform with the PHOS detector, AD9656 data acquisition board and HPDAQ.
Choosing such a high-speed ADC chip makes it possible to compare the performance of curve fitting and the neural network model with different numbers of sampling points.

To prepare the datasets, we watch two channels of signals simultaneously. One channel is the trigger signal driving the LED, and the other channel is the output of the shaper on the FEE card. We randomly choose a fixed-interval section from the most salient part of the output pulse. Then we add a label to the pulse according to the interval between the trigger signal and the selected section. This label is used to train the neural network and serves as the baseline for curve fitting. We normalize the amplitude of the ADC sampling points to a range similar to section 3.2 and section 4.4. We collect 80000 samples for the training dataset and 20000 samples for the test dataset.

5.1 1 µs shaping time

In this part, we conduct experiments with the 1 µs shaping time (2 µs peaking time), which is the ALICE PHOS specification. The sampling section has a span of 3072 ns. We choose 2^k + 1 sampling points and use linear (for k >= 2) or quadratic (for k = 1) interpolation when training the neural network. We pre-train the model under the assumption of Gaussian noise with the parameterization in section 4.4. Then we finetune the whole network using the experimental data. The base network is trainable when finetuning.

We analyze 6 different conditions with k = 1, 2, 3, 4, 5, 6. This gives an approximate sampling rate of 0.625 MHz, 1.25 MHz, 2.5 MHz, 5 MHz, 10 MHz and 20 MHz, respectively. For each condition, we perform an independent training process starting from the same pre-trained model. Then we test our model on the corresponding test dataset and make a Gaussian fit of the residuals (the difference between the regression outputs and the time labels) to get the mean and the standard deviation. The standard deviation of the Gaussian fit is a measure of the timing resolution, and the mean is a measure of the system bias. For curve fitting, we use the same sampling points and fit the residuals (the difference between the fitting parameters and the time labels) to a Gaussian distribution.

We use a batch size of 16 when training the neural network, and the training proceeds for 10 epochs. The final result and error bar (1 sigma error) for the neural network are calculated from the test results paused at even numbers of training epochs.

Figure 8. Experimental results for the 1 µs shaping time: (a) timing resolution and (b) system bias of curve fitting and the neural network versus the number of sampling points (3, 5, 9, 17, 33, 65).

The main result is shown in figure 8. In the left figure, it can be seen that the neural network works steadily better than curve fitting. With as few as 3 sampling points, the two methods already achieve relatively good performance. When the number of sampling points increases, the results improve slightly. When we use 17 or more sampling points, the performance of curve fitting hits a plateau, but the neural network still improves. The best performance achieved by the neural network is 8.22 ± 0.11 ns, which is 27.3% better than curve fitting (11.31 ns).

In the right figure, the system bias is greatly reduced by the neural network model compared to curve fitting. From direct observation, the interval between the start of the trigger signal and the start of the shaped pulse is approximately 15 sampling points (120 ns), which is close to the results from curve fitting (137.94 ns to 148.11 ns). The bias of the neural network fluctuates around the horizontal axis. Since the bias is a fixed value for a given model, it can be calibrated in the same way as for curve fitting, and the burden of calibration is considerably alleviated.
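The point-selection scheme used in both experiments can be sketched as below, under our reading that the 2^k + 1 retained points are interpolated back onto the full-rate grid before entering the network:

    import numpy as np
    from scipy.interpolate import interp1d

    def resample_event(waveform, k):
        """Keep 2**k + 1 evenly spaced points of a high-rate waveform and
        interpolate them back onto the full grid, as done before training:
        linear interpolation for k >= 2, quadratic for k = 1."""
        n = len(waveform)
        idx = np.linspace(0, n - 1, 2**k + 1).astype(int)
        kind = 'linear' if k >= 2 else 'quadratic'
        f = interp1d(idx, waveform[idx], kind=kind)
        return f(np.arange(n))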
Figure 9. Experimental results for the 100 ns shaping time: (a) timing resolution and (b) system bias of curve fitting and the neural network versus the number of sampling points (3, 5, 9, 17, 33).

5.2 100 ns shaping time

In this part, we conduct experiments with the 100 ns shaping time (200 ns peaking time), which is the ALICE EMCal specification. We replace resistors and capacitors in the CR-RC^2 shaper on the FEE card to achieve the shorter shaping time. The sampling section has a span of 256 ns. We choose 2^k + 1 sampling points with k = 1, 2, 3, 4, 5. This gives a sampling rate of 7.8125 MHz, 15.625 MHz, 31.25 MHz, 62.5 MHz and 125 MHz, respectively. Other experimental conditions and procedures are similar to section 5.1.

To determine the label for curve fitting and the neural network with a precision superior to the sampling period, we fit the trigger signal to the square pulse response of a second-order system:

    Y_step(t) = K [ 1 + (T_1/(T_2 - T_1)) e^{-(t - t_s)/T_1} - (T_2/(T_2 - T_1)) e^{-(t - t_s)/T_2} ] u(t - t_s)        (5.1)

    Y_square(t) = Y_step(t) - Y_step(t - w)        (5.2)

where u(t) is the step function, Y_step(t) is the overdamped step response of a second-order system and w is the width of the square pulse. K and t_s are the parameters to be fitted; the other parameters are fixed according to the circuit specification and the experimental observation. t_s is used as the label to judge the quality of curve fitting and to train the neural network.

The main result is shown in figure 9. In the left figure, the timing resolution has improved significantly compared to the 1 µs shaping time. Again, the neural network outperforms curve fitting. When the number of sampling points increases from 3 to 33, the precision of the neural network and of curve fitting increases slightly, and the trend gradually levels off. The neural network achieves the optimal result of 1.37 ± 0.03 ns at 17 sampling points, which is 24.7% better than curve fitting (1.82 ns).

In the right figure, the system bias of the neural network model is much smaller than that of curve fitting. Basically, curve fitting has a large system bias (90.16 ns to 91.73 ns), which is in accord with the approximately 96 ns from direct observation, but the neural network model suppresses the absolute value of the bias to less than 2 ns. This facilitates the calibration and improves the overall stability of the timing system.
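The label fit of equations 5.1 and 5.2 can be sketched as below; the time constants T_1, T_2 and the pulse width are placeholders standing in for the values fixed by the circuit specification and experimental observation (units of ns):

    import numpy as np
    from scipy.optimize import curve_fit

    T1, T2, WIDTH = 10.0, 40.0, 50.0   # placeholder circuit constants, ns

    def y_step(t, K, ts):
        """Overdamped step response of a second-order system (equation 5.1)."""
        dt = np.maximum(t - ts, 0.0)   # u(t - ts): zero before the start time
        return K * (1 + T1 / (T2 - T1) * np.exp(-dt / T1)
                      - T2 / (T2 - T1) * np.exp(-dt / T2))

    def y_square(t, K, ts):
        """Square pulse response (equation 5.2)."""
        return y_step(t, K, ts) - y_step(t - WIDTH, K, ts)

    def fit_trigger(t, samples):
        """Fit K and ts to the trigger waveform; ts serves as the time label."""
        (K, ts), _ = curve_fit(y_square, t, samples, p0=[1.0, 0.0])
        return ts

Note that the bracket in y_step evaluates to exactly zero at dt = 0, so clipping dt reproduces the step function u(t - t_s) without a separate mask.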
5.3 Discussion about the experimental results

In the above experiments, a relation between the shaping time of the shaper and the timing resolution can be observed. The experimental results show that decreasing the shaping time can potentially improve the timing resolution when other conditions are the same. In the frequency domain, a shorter shaping time means a bandpass filter with a higher cut-off frequency, so more information about the original event is kept. In the time domain, a shorter shaping time can alleviate the long-range misfit problem. To be more specific, in the experiments with 1 µs shaping time, sampling points are far away from the desired start time t_0; thus any slight discrepancy between the fitted model and the ideal model will cause a large deviation in the value of t_0. A similar issue applies to the neural network if we view the discrepancy as an intrinsic error and a source of misunderstanding. With the 100 ns shaping time, the distance between the sampling points and the start time is shortened and the long-range problem is properly handled.

On the other hand, when the shorter shaping time is used, the influence of the three kinds of variations (especially short-term change and random noise) is relatively more significant. Besides, since the width of the LED pulse is less than 50 ns, signal integrity issues (especially overshooting) affect the precision of the fitted label. As a result, the improvement of the timing resolution is worse than estimates based on a proportional hypothesis.

6 Conclusion

The classic curve fitting method uses a Gaussian noise hypothesis, and its performance is guaranteed by its statistical properties. However, when long-term drift, short-term change and random noise are present in the pulse function, the limitations of curve fitting emerge. Among the possible alternatives, neural networks show strong resistance to these three kinds of variations through their delicate structure and optimization process. Simulations and experiments demonstrate their superiority over curve fitting.

Nevertheless, neural networks have their special requirements, which pose new challenges to the design of the detector system. Since most deep learning methods are based on supervised learning, an accurate label for training is needed. Sometimes acquiring the label is not an easy task, especially when the detector system has complex geometric structures and intricate components. This raises the demand for a traceable design, i.e. a design scheme in which the timing information can be traced back internally through the calibration process. From this perspective, we sincerely hope our work will provide a new way of thinking in the future design of timing systems.

Acknowledgments
This research is supported by the National Natural Science Foundation of China (Grant Numbers 11875146, 11505074, 11605051).
References

[1] R. Grzywacz, Applications of digital pulse processing in nuclear spectroscopy, Nucl. Instrum. Meth. B (2003) 649.
[2] E. Samain, Timing of optical pulses by a photodiode in the Geiger mode, Appl. Opt. (1998) 502.
[3] C. Han, I. F. Akyildiz and W. H. Gerstacker, Timing acquisition and error analysis for pulse-based terahertz band wireless systems, IEEE Trans. Veh. Technol. (2017) 10102.
[4] ALICE collaboration, Performance of the ALICE experiment at the CERN LHC, Int. J. Mod. Phys. A (2014) 1430044.
[5] ATLAS collaboration, Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC, Phys. Lett. B (2012) 1 [arXiv:1207.7214].
[6] P. Antonioli and S. Meneghini, A 20 ps TDC readout module for the ALICE time of flight system: design and test results, in Proceedings of the 9th Workshop on Electronics for LHC Experiments, Amsterdam, The Netherlands, 29 September - 3 October 2003, pp. 311-315 [CERN-2003-006].
[7] G. Mauri, M. Mariotti, F. Casinini, F. Sacchetti and C. Petrillo, Pulse shape analysis of neutron signals in Si-based detectors, arXiv:1805.01261.
[8] K. Mahata, A. Shrivastava, J.A. Gore, S.K. Pandit, V.V. Parkar, K. Ramachandran et al., Particle identification using digital pulse shape discrimination in a nTD silicon detector with a 1 GHz sampling digitizer, Nucl. Instrum. Meth. A (2018) 20 [arXiv:1804.01985].
[9] LUX collaboration, Liquid xenon scintillation measurements and pulse shape discrimination in the LUX dark matter detector, Phys. Rev. D (2018) 112002 [arXiv:1802.06162].
[10] Y. Ashida, H. Nagata, Y. Koshio, T. Nakaya and R. Wendell, Separation of gamma-ray and neutron events with CsI(Tl) pulse shape analysis, Prog. Theor. Exp. Phys. (2018) 043H01.
[11] J. Kaspar et al., Design and performance of SiPM-based readout of PbF2 crystals for high-rate, precision timing applications, 2017 JINST P01009 [arXiv:1611.03180].
[12] P. J. Fish, Electronic noise and low noise design, Macmillan International Higher Education (2017).
[13] Y. LeCun, Y. Bengio and G. Hinton, Deep learning, Nature (2015) 436.
[14] J. Griffiths, S. Kleinegesse, D. Saunders, R. Taylor and A. Vacheret, Pulse shape discrimination and exploration of scintillation signals using convolutional neural networks, arXiv:1807.06853.
[15] MicroBooNE collaboration, A deep neural network for pixel-level electromagnetic particle identification in the MicroBooNE liquid argon time projection chamber, arXiv:1808.07269.
[16] P. Ai, D. Wang, G. Huang and X. Sun, Three-dimensional convolutional neural networks for neutrinoless double-beta decay signal/background discrimination in high-pressure gaseous Time Projection Chamber, 2018 JINST P08015 [arXiv:1803.01482].
[17] H. Muller, R. Pimenta, Z. Yin, D. Zhou, X. Cao, Q. Li et al., Configurable electronics with low noise and 14-bit dynamic range for photodiode-based photon detectors, Nucl. Instrum. Meth. A (2006) 768.
[18] ALICE collaboration, The ALICE experiment at the CERN LHC, 2008 JINST S08002.
[19] H. Torii, The ALICE PHOS calorimeter, J. Phys. Conf. Ser. (2009) 012045.
[20] A. Fantoni, The ALICE Electromagnetic Calorimeter: EMCAL, J. Phys. Conf. Ser. (2011) 012043.
[21] D. W. Marquardt, An algorithm for least-squares estimation of nonlinear parameters, J. Soc. Ind. Appl. Math. (1963) 431.
[22] J. Gaiser, Charmonium spectroscopy from radiative decays of the J/psi and psi', Appendix F, Ph.D. thesis, SLAC, 1982.
[23] C. Walck, Hand-book on statistical distributions for experimentalists, Tech. Rep., Particle Physics Group, Fysikum, University of Stockholm (1996).
[24] A. Krizhevsky, I. Sutskever and G. E. Hinton, ImageNet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems, 2012, pp. 1097-1105.
[25] K. He, X. Zhang, S. Ren and J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
[26] G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly et al., Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups, IEEE Signal Processing Mag. (2012) 82.
[27] Y. Shen, X. He, J. Gao, L. Deng and G. Mesnil, Learning semantic representations using convolutional neural networks for web search, in Proceedings of the 23rd International Conference on World Wide Web, 2014, pp. 373-374.
[28] L. de Oliveira, M. Kagan, L. Mackey, B. Nachman and A. Schwartzman, Jet-images – deep learning edition, JHEP (2016) 69 [arXiv:1511.05190].
[29] E. Racah, S. Ko, P. Sadowski, W. Bhimji, C. Tull, S.-Y. Oh et al., Revealing fundamental physics from the Daya Bay neutrino experiment using deep neural networks, in Proceedings of the 15th IEEE International Conference on Machine Learning and Applications (ICMLA), 2016, pp. 892-897.
[30] J. Renner, A. Farbin, J. M. Vidal, J. Benlloch-Rodríguez, A. Botas, P. Ferrario et al., Background rejection in NEXT using deep neural networks, 2017 JINST T01004 [arXiv:1609.06202].
[31] R. Acciarri, C. Adams, R. An, J. Asaadi, M. Auger, L. Bagby et al., Convolutional neural networks applied to neutrino events in a liquid argon time projection chamber, 2017 JINST P03011 [arXiv:1611.05531].
[32] K. Hornik, M. Stinchcombe and H. White, Multilayer feedforward networks are universal approximators, Neural Networks (1989) 359.
[33] G. Cybenko, Approximation by superpositions of a sigmoidal function, Math. Control Signal. (1992) 455.
[34] P. Isola, J.-Y. Zhu, T. Zhou and A. A. Efros, Image-to-image translation with conditional adversarial networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 5967-5976.
[35] V. Kuleshov, S. Z. Enam and S. Ermon, Audio super resolution using neural networks, arXiv:1708.00853.
[36] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio and P.-A. Manzagol, Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion, J. Mach. Learn. Res. (2010) 3371.
[37] H. Noh, S. Hong and B. Han, Learning deconvolution network for semantic segmentation, in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1520-1528.
[38] V. Nair and G. E. Hinton, Rectified linear units improve restricted Boltzmann machines, in Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 807-814.
[39] B. Xu, N. Wang, T. Chen and M. Li, Empirical evaluation of rectified activations in convolutional network, arXiv:1505.00853.
[40] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever and R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, J. Mach. Learn. Res. (2014) 1929.