A deep learning approach to multi-track location and orientation in gaseous drift chambers
Pengcheng Ai, Dong Wang, Xiangming Sun, Guangming Huang, Zili Li
Central China Normal University, No.152 Luoyu Road, Wuhan, Hubei 430079, P.R. China
Abstract
Accurately measuring the location and orientation of individual particles in a beam monitoring system is of particular interest to researchers in multiple disciplines. Among feasible methods, gaseous drift chambers with hybrid pixel sensors have great potential to realize long-term stable measurement with considerable precision. In this paper, we introduce deep learning to analyze patterns in the beam projection image and facilitate three-dimensional reconstruction of particle tracks. We propose an end-to-end neural network based on segmentation and fitting for feature extraction and regression. Two segmentation heads, named binary segmentation and semantic segmentation, perform initial track determination and pixel-track association. Then pixels are assigned to multiple tracks, and a weighted least squares fitting is implemented with full back-propagation. Besides, we introduce a center-angle measure to judge the precision of location and orientation by combining two separate factors. The initial position resolution reaches 8.8 µm for the single track and 11.4 µm (15.2 µm) for 1-3 tracks (1-5 tracks), and the angle resolution reaches 0.15° and 0.21° (0.29°), respectively. These results show a significant improvement in accuracy and multi-track compatibility compared to traditional methods.

Keywords: Multi-track location and orientation, Pixel sensors, Gaseous drift chambers, Convolutional neural networks, Deep learning, Image segmentation, Weighted least squares fitting

∗ Corresponding authors
Email addresses: [email protected] (Xiangming Sun), [email protected] (Guangming Huang)
Preprint submitted to Nuclear Instruments and Methods in Physics Research Section A, May 21, 2020

1. Introduction

In the context of high energy physics, beam physics studies the characteristics of particles with similar position and momentum under electromagnetic fields. The charged particles range from electrons, positrons and protons to some medium-weight or heavy ions. To accelerate the charged particles to relatively high energy, synchrotrons with alternating electric fields and gradually increasing magnetic fields are constructed to force the particles to speed up and deflect. Particle beams are widely used in many applied and experimental domains. For example, in the semiconductor industry, ions are accelerated to stop and implant into the substrate; in radiation medicine, hadrons are focused onto pathological tissues; in high energy physics experiments, particles with extremely high energy are collided and observed; and so on.

Under many circumstances, measuring the information (position, angle) of individual particles in a beam with high precision has great significance. For example, in large-scale accelerator equipment [1], the initial beam undergoes multiple manipulations and its profile becomes very complicated, so it is necessary to accurately measure the particles in the beam. Another example is hadron therapy [2], intensely researched in recent years. In order to focus the beam at the location of the tumor and form the Bragg peak [3], online monitoring of the location and orientation of the beam is vital to avoid damaging normal tissue. According to different applications, the required precision of the position varies from millimeter scale to micrometer scale.

Based on different detection principles, there are several methods to accurately determine the information of the beam. In [4] and [5], parallel-plate ionization chambers and strip electrodes were used to measure the ionization charges induced by the incoming particle so as to determine the incident location. Besides, diamond sensors [6] grown by chemical vapor deposition are radiation hard and have high electron and hole mobility, which makes them usable for beam measurement in extreme environments. In [7], array structures made up of scintillators and photomultiplier tubes (or silicon photomultipliers) could realize measurement with high timing resolution and medium spatial resolution. Finally, in [8], fiber optic radiation sensors transmitting fluorescent light were arranged in a grid to achieve sub-millimeter spatial resolution.

Apart from the above methods, silicon-based pixel sensors featuring excellent spatial and timing resolution, rapid response and flexible readout are good candidates in the scenario of beam monitoring. According to the relative location between the pixel sensor and the beam, the detection methods can be divided into two types: targeting and drifting. The targeting type places the pixel sensor facing the incoming beam. The particles in the beam penetrate the pixel sensor and generate charges collected by the electrodes [9]. Generally, the targeting type has rigorous demands on radiation hardening, and in most cases the incoming particles are light-weighted, because the deposition of heavy ions will cause the sensor to malfunction.
Relatively speaking, the drifting type utilizes the ionization of beam particles in the gaseous chamber, and an electric field perpendicular to the beam direction is imposed. Under the electrostatic force, ionization charges reach the pixel sensor on the detecting plane. The sensor only receives the drifting charges and measures the track of beam particles indirectly. Because the sensor is placed parallel to the beam direction, high energy particles will not run across the sensor. Hence, the radiation damage to the sensor is negligible, and there is no limitation on the kind of particles, as long as the ionization is significant enough to be detected. However, the charge cloud will diffuse in the transverse (parallel to the pixel sensor) and longitudinal (perpendicular to the pixel sensor) directions. How to tackle the diffusion and improve spatial and timing resolution is a main issue for the drifting type.

In such a measuring system based on the gaseous drift chamber and the pixel sensor, enabling multi-track location and orientation has many merits (discussed in Section 2). In order to obtain this ability, we need to intelligently figure out the presented tracks from an image generated by the pixel sensor, and output the per-track information. The procedure can be divided into a segmentation step and a regression step. Regarding the segmentation problem, much progress has been achieved in recent years because of the renowned deep learning [10] techniques. Some representative examples include: the Fully Convolutional Network [11], which replaced the traditional fully connected layers with convolution layers and used upsampling to get pixel-level classification results; SegNet [12], which constructed the encoder-decoder architecture and mapped the pooling indexes in the encoder layers to corresponding decoder layers; ENet [13], which built on the former two structures and achieved similar results with fewer parameters; and, recently, LaneNet [14], which adopted the overall architecture of ENet and used instance segmentation to assist clustering before curve fitting.

In this paper, we present a deep learning architecture to analyze the particular patterns of ionization track projections acquired from the pixel sensor. It is the most important process to reconstruct the three-dimensional (3D) track and get its location and orientation. The main contributions are listed as follows:

• We create an end-to-end neural network based on segmentation and fitting. The base network takes the encoder-decoder architecture, and the binary segmentation head and the semantic segmentation head are built upon the base network. The network can effectively extract features from the raw track image and facilitate subsequent operations.

• We invent a pixel assignment algorithm to assign the pixels on the heat-map to multiple tracks. The pixel assignment combines the results from the two heads and paves the way for the subsequent weighted least squares fitting.

• We implement the weighted least squares fitting in the software framework of deep learning. The whole network can be optimized end-to-end through back-propagation, and we show the improvement of performance by finetuning the network in an end-to-end way.

• To evaluate the results, we propose a center-angle measure (CAM) which combines the information from both the location regression and the angle regression. Based on this measure, we investigate the detection rate (for the single track) and F1-scores (for multiple tracks) versus the CAM threshold.
• Finally, to demonstrate the practicality of the method, we apply the neural network model to experimental images. The results show the correctness of the simulation process and confirm that the same model can generalize to experimental data.
2. Multi-Track Measuring System
Since 2012, the Topmetal series [15, 16, 17] has been researched and developed for beam monitoring applications. The Topmetal is a hybrid pixel sensor with position and amplitude resolution. It features an ultra-low noise level (equivalent noise charge less than 15 electrons) and high charge sensitivity (10-100 µV/electron). Around 2016, we performed a beam test using carbon ions with 80.55 MeV/u (mega electron volts per nucleon) energy at the Heavy Ion Research Facility in Lanzhou (HIRFL). With the array structure of Topmetal-II-, the spatial resolution was better than 20 µm and the angle resolution was better than 0.5° [2]. Around 2019, the same pixel sensor was tested using krypton ions with 25 MeV/u at HIRFL. After pre-selection of images, better results were achieved (4 µm and 0.15°) [18]. Based on these former experiences, we plan to develop new Topmetal chips for the Cooler Storage Ring External Target Facility Experiment (CEE). The following two sub-sections discuss the necessity of multi-track measuring and introduce the proposed multi-track measuring system.
The original beam monitoring system [18] was designed to locate single-event latchups of the device under test (DUT). The instrument is the same as Figure 2, except for the time sensitive region, which is not present on Topmetal-II-. The locating system is comprised of two independent gaseous drift chambers, each of which is filled with air at 1 atm. The pixel sensor works as the anode (0 V), and the face-to-face metal board works as the cathode (-500 V). The DUT is placed on the right side of the second drift chamber. When an ion with relatively high energy passes through the detecting instrument, it will collide with gas molecules and leave a track of electron-positive ion pairs. Under the effect of the electric field, the electrons move towards the pixel sensor anode, and positive ions move towards the metal board cathode. The ionization electrons are collected by the pixel electrodes. Charge signals are read out line by line and pixel by pixel at a working frequency of 1.5625 MHz. Finally, track projection images with a fixed refreshing rate (3.3 ms) are sent to and processed by the computer.

Strip-like patterns will appear on the track projection images (dark yellow regions in Figure 2). The width of the strip is determined by the diffusion coefficients and the drifting distance. Usually the energy of the incoming ion is high enough that colliding with gas molecules has a minor effect on its direction. If we assume the electric field is uniform and the ionization charges are distributed ideally, the center line of the strip can represent the direct projection of the traversing path of the ion. One projection line can determine a plane perpendicular to the pixel sensor anode; two such planes have an intersecting line, which can be regarded as the traversing path of the ion.

We performed a beam test with krypton ions at HIRFL. A photograph of the testing scene is shown in Figure 1. The flux ranged from 10 ions/(cm·s) to 1.× ions/(cm·s). When operating at a low flux, most images from the pixel sensor were empty with background noise, and a few had a single projection track; increasing the flux would result in more images with a single track, and images with two or more tracks started to appear (pile-up); when we continued to increase the flux, pile-ups became prevalent; at a very high flux, substantial pile-ups would make individual tracks unrecognizable.

Figure 1: The photograph of the former beam test at HIRFL with the gaseous drift chambers and the Topmetal pixel sensor.

Table 1: The relation between the quantile of the Poisson distribution and the distribution parameter λ at different detection rates. Probability distribution function: f(k) = exp(−λ) λ^k / k!

Quantile   95% detection       97% detection       99% detection
           λ        rate       λ        rate       λ        rate
k=1        0.3554   1.00       0.2676   1.00       0.1486   1.00
k=2        0.8177   2.30       0.6649   2.48       0.4361   2.93
k=3        1.3664   3.84       1.1551   4.32       0.8233   5.54
k=4        1.9702   5.54       1.7061   6.38       1.2792   8.61
k=5        2.6131   7.35       2.3005   8.60       1.7853   12.01

According to the above observations, the capability to accurately analyze the track information, especially in pile-up situations, is of great importance to improve the efficiency of the detecting instrument. Some empirical methods could very well determine the single track information. To avoid losing valuable events when pile-up happens, the system must operate at a relatively low flux. This inevitably prolongs the time needed to reach the required dose of radiation. On the other hand, if we could locate and orient multiple tracks when they are recognizable, the flux could increase significantly at the same detection rate. Assuming the incoming ions in a unit of time obey the Poisson distribution, we could argue that the distribution parameter λ is proportional to the flux.
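Under this assumption, the λ values in Table 1 follow from solving P(K ≤ k; λ) = detection rate for λ. A minimal sketch, assuming SciPy is available (the function name is illustrative):

```python
# Reproduce Table 1: find the Poisson parameter lambda such that
# P(K <= k) equals the target detection rate.
from scipy.optimize import brentq
from scipy.stats import poisson

def poisson_lambda(k, rate):
    """Largest flux parameter keeping at most k tracks per image with probability `rate`."""
    return brentq(lambda lam: poisson.cdf(k, lam) - rate, 1e-9, 50.0)

for rate in (0.95, 0.97, 0.99):
    base = poisson_lambda(1, rate)
    for k in range(1, 6):
        lam = poisson_lambda(k, rate)
        print(f"rate={rate:.0%}  k={k}  lambda={lam:.4f}  ratio={lam/base:.2f}")
```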
Table 1 shows the relation between the quantile and the distribution parameter λ at different detection rates. It can be seen that when the detection rate is fixed, λ increases with the quantile, and the increase gradually speeds up. This phenomenon is more obvious at high detection rates. Take the first line and the third line in the table as an example: at detection rates of 95%, 97% and 99%, the λ values at k=3 are 3.84, 4.32 and 5.54 times bigger than those at k=1. This demonstrates that increasing the number of recognizable pile-ups makes it possible for the system to work at higher fluxes. Meanwhile, the decrease of empty background images could also improve the efficiency of the data transmission back-end.

In Section 2.1, we introduce the detecting instrument with an emphasis on the necessity of measuring multiple tracks. However, we have not settled the issue of ambiguity when reconstructing the 3D track from coordinate plane projections. Since each track projection determines a perpendicular plane, if there are several tracks projecting on the two coordinate planes (assuming n tracks), the reconstructed 3D tracks have more than one possibility (n! in total) following the law of combination. To eliminate the ambiguity, there are some feasible methods, such as adding a third projection plane, setting a reference point outside the detecting instrument, building extra detectors for coincidence measurement, and so on.

Beyond the above-mentioned methods, we would like to solve the problem from inside the pixel sensor. This is why we propose to design
Topmetal-CEE, specially optimized for the CEE project [19]. Figure 2 shows the conceptual design of the detecting system enabled by Topmetal-CEE. The major innovation compared to what is discussed in Section 2.1 is the time sensitive region integrated with the chip. This region has excellent timing ability (better than 1 µs) and reasonable position resolution (depending on the pixel size, usually ∼ µm). When a high energy ion passes through the drift chamber, the drifting charges will trigger the time sensitive region on both sides of the pixel sensor. The timing information is thus recorded and buffered inside the chip. At fixed intervals, the recorded times are read out and used to match the tracks projecting on different coordinate planes.

Figure 2: A conceptual design of the multi-track measurement system enabled by the Topmetal-CEE pixel sensor. This figure shows two incoming particles and their ionization track projections onto the Topmetal-CEE chip. The Topmetal-CEE provides position & amplitude information (the square area) and time information (the time sensitive region).

In Figure 3, we plot a typical timeline of the readout scheme. Valid ions will come across all four time sensitive regions, so their directions have a relatively small angle with regard to the z-axis in Figure 2. The main complexities arise from the row-major order of the rolling-shutter readout. We assume synchronization is maintained between the two pixel sensors, and the spatial and amplitude readout is reset at periodical slots (only the read pointer is reset, not the signals). Since the locations of the incoming ions are completely random, it is possible to receive the track projection in the current interval but read it out in the next interval (x-z plane in Figure 3). As a result, the times read out in a slot might correspond to a track projection in the following interval; besides, the same ion might project on the two coordinate planes in two consecutive intervals. In spite of these complexities, in most cases the spatial and amplitude readout can match the times recorded by the time sensitive regions, and times between the two coordinate planes can build up a solid relation according to the kinetics of the high energy ion.

Figure 3: The timeline of the readout scheme. (a) The first ion enters the detection area and leaves tracks on the x-z plane and the y-z plane, and the crossing times are recorded by the time sensitive regions of Topmetal-CEE so as to match the tracks. (b) The second ion enters the detection area and its tracks overlie the previous ones; however, it is still distinguishable because the times recorded by the two planes can be matched. (c) The third ion enters, and it is beyond the subsequent shutters of the x-z plane while remaining in the subsequent shutters of the y-z plane; besides, its times are recorded. (d) At the slot, recorded times are read out, and the shutters are reset. (e) The next ion enters, and its track overlies the previous one on the x-z plane.

We only need to enlarge the time-space matching region in a single pixel sensor from the current interval to two consecutive intervals, and keep the two pixel sensors synchronized. As long as the multi-track location and orientation in a single image is accurate enough, the detecting system has the capability to reconstruct 3D tracks in pile-up situations. Since the final track location and orientation is obtained from the information-rich heat-map of the projection images, results with high precision can be achieved.

Another issue that needs to be explained here is the possibility of partial track images due to the rolling-shutter readout. For one thing, as stated in the previous paragraph, the direction of a valid ion has a relatively small angle with regard to the z-axis, so the probability of partial tracks is not high considering the row-major readout in the whole period. For another, if a partial track does happen in the current interval, by carefully designing the discharge circuit in each pixel, we can control the discharge time to be approximately (and optimally) one period. As a result, the partial image will become whole in the next interval, and tracks already read out will significantly dissipate. If the partial track is not evident enough to be read out in the current interval, there will be a high probability of catching the whole track in the next interval. Although switching to global-shutter readout or using multiple parallel output buffers might be more robust solutions, we believe the proposed scheme can fix the issue of partial tracks in most cases.

In the following sections, the size of the pixel array is 72 × 72, and the pixel pitch is 83.2 µm, which is the Topmetal-II- specification and is used as a reference for Topmetal-CEE in the conceptual design. In two-dimensional images within the subsequent sections, if not specified, the horizontal axis and the vertical axis represent indexes of the pixels. The same method is applicable to other reasonable sizes in the proper experimental condition.
3. Empirical Methods and Their Limitations
Figure 4: Location and orientation with the mass center method. (left) There are two tracks on the input image. (right) We manually separate the upper part and the lower part of the image, and use the mass center method individually.
3.1. Mass Center Method

In the ideal condition (the ion path is a straight line, and ionization charges distribute evenly), the center line of the track image represents the projection of the traversing path. Topmetal is a charge-collecting device: each pixel electrode collects drifting electrons and charges a capacitor, which is then read out in the form of voltage and digitized. To take advantage of the amplitude readout of Topmetal, in [18] the mass center method is used to fit the position and direction of the tracks, as illustrated in Figure 4. When there is only one track in the fitting region, the fitting procedure is as follows:

1. Process the projection image in column-major order, and find the pixel with maximum amplitude in each column as the reference point.
2. Take 5 pixels above and below the reference point. If the number of pixels is less than 5, take as many symmetrical pixels as possible.
3. Calculate the mass center of the pixels around the reference point (the mass of each pixel is its amplitude).
4. Fit a straight line according to the mass center of each column. The intercept and the slope represent the initial position and the direction, respectively.

The mass center method is based on the assumption of a symmetrical charge distribution, and takes advantage of the amplitude readout of the hybrid pixel sensor. It has been used successfully in practice. When the diffusion of the electron cloud is not too significant, the charge collection efficiency is relatively high, and the integrity of projection tracks is guaranteed, the mass center method has good accuracy.
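A minimal NumPy sketch of this procedure is given below; clipping the window at image edges is a simplification of the symmetrical-pixel rule in step 2, and the function name is illustrative:

```python
# A sketch of the mass center method for a single track.
import numpy as np

def mass_center_fit(img, half_window=5):
    """Fit (intercept, slope) from the column-wise mass centers."""
    cols, centers = [], []
    for col in range(img.shape[1]):
        column = img[:, col]
        ref = int(np.argmax(column))                 # step 1: reference point
        lo = max(0, ref - half_window)
        hi = min(len(column), ref + half_window + 1)
        weights = column[lo:hi]                      # step 2: window of pixels
        if weights.sum() <= 0:
            continue
        cols.append(col)
        centers.append(np.average(np.arange(lo, hi), weights=weights))  # step 3
    slope, intercept = np.polyfit(cols, centers, 1)  # step 4: straight-line fit
    return intercept, slope
```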
3.2. Double Edge Detection Method

Different from the mass center method, which utilizes the intensity of the input image, the double edge detection method works with the contrast at the edge locations. Since the electron cloud forms patterns with a certain width on the image sensor, there are obvious changes of intensity at the two edges of the strip. With a practical edge detection algorithm [20], we can find the two straight edges, and the center line of the two edges can represent the direct projection of the traversing path of the high energy ion. The details are as follows:

1. Process the input image with the Canny edge detection algorithm.
2. Perform a Hough transform of the edge image after the Canny algorithm, and find the distance and angle of the edge lines in the Hough space.
3. Match the distances and angles of edge lines so as to get the number of tracks.
4. Select the center line of each matched pair as the projection of the traversing path. The intercept and the slope of the center line represent the initial position and the direction, respectively.

The double edge detection method mainly uses the features of edges on the track projection image. Its computations are based on the image after the edge detection algorithm. When the structures of the tracks are integral and multiple tracks are separated by a certain distance, the double edge detection method can detect multiple tracks and give the information of each track separately.
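The pipeline can be sketched with scikit-image (an assumed stand-in for the actual implementation); the naive pairing rule and tolerances below are illustrative, not the tuned parameters discussed later:

```python
# A sketch of the double edge detection method.
import numpy as np
from skimage.feature import canny
from skimage.transform import hough_line, hough_line_peaks

def double_edge_fit(img, pair_tol_deg=2.0):
    edges = canny(img, sigma=2.0)                          # step 1: Canny
    thetas = np.linspace(-np.pi / 2, np.pi / 2, 360, endpoint=False)
    h, angles, dists = hough_line(edges, theta=thetas)     # step 2: Hough space
    _, peak_angles, peak_dists = hough_line_peaks(h, angles, dists)
    lines = sorted(zip(peak_dists, peak_angles))           # sort edges by distance
    tracks = []
    for (d1, t1), (d2, t2) in zip(lines[::2], lines[1::2]):
        if abs(t1 - t2) < np.deg2rad(pair_tol_deg):        # step 3: match a pair
            tracks.append(((d1 + d2) / 2, (t1 + t2) / 2))  # step 4: center line
    return tracks                                          # (distance, angle) pairs
```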
Figure 5: Location and orientation with the double edge detection method. (left top) The input image is the same as Figure 4. (right top) The Canny edge detection algorithm is applied to the input image. (left bottom) The Hough transform converts the edge image to the Hough space. (right bottom) Straight lines in the Hough space are matched, and the center line of each matched pair is regarded as the projection of the traversing path.

3.3. Limitations of Empirical Methods

The above two empirical methods could achieve satisfactory results in proper conditions. However, some issues with these methods are hard to resolve in practice. Here we discuss the limitations of each method.
Mass Center Method.
The original algorithm of the mass center method targets the single track case. When there are multiple tracks on the projection image, the mass center algorithm simply calculates the location of the mass center and does not have the ability to infer the number of tracks. If there is an error in the number, large deviations will make the results invalid. This is especially true when there are 3 or more tracks on the projection image. Furthermore, even if we could design an effective rule to judge the number of tracks before the mass center calculation, the amplitudes of two tracks will interplay in the overlapping regions. If we cannot separate the amplitudes of two or more tracks, the calculation of the mass center cannot proceed correctly, which will inevitably impact the final precision. The core problem of the mass center method is that it operates in the spatial domain and cannot handle the complicated situations when tracks affect each other.
Double Edge Detection Method.
For the double edge detection method, the biggest practical issue is how to determine the parameters of the algorithm. First, the Canny algorithm needs the standard deviation and thresholds of the Gaussian filters to be set. This step is relatively robust and not very parameter-sensitive. Second, the Hough transform requires divisions of the distance and the angle. If the number of divisions is too small, each "pixel" in the Hough space will be too large, which lowers the precision of the Hough transform. If the number of divisions is too large, the Hough space will be more fine-grained; however, the accumulations in each "pixel" of the Hough space will decrease, and too many "pixels" with similar accumulations will make it harder to perform non-maximum suppression (NMS). Third, in the process of NMS, there are three key parameters to be set: minimum distance, minimum angle and threshold. If these parameters are not chosen properly, either some edges will be missed, or pseudo edges will appear in the result. Finally, the core difficulty is that we want to deal with a dataset of images, or images continuously acquired from the detecting instrument. The best parameters for one image might not be suitable for another. If we cannot adapt the parameters to each image, the recognition rates and precision will without doubt shrink. For this reason, the double edge detection method is not a good choice for automatic streams of image data.

Finally, as a supplement, these two methods are empirical methods based on prior assumptions of ideal conditions. In the physical simulation and the experiment, the physical process has its complexity, which implies that conditions outside those assumptions will happen. Once they happen, empirical methods can hardly handle them. For example, if there is a "bright point" in the projection image due to an abnormal status of the pixel sensor, the mass center method will treat it as the reference point, and this will definitely influence the position of the mass center. In conclusion, empirical methods have their safe regions and behave inadequately in accounting for real-world complexities.
4. Architecture
According to the analysis in Section 3, empirical methods use fixed routines to infer the position and the direction of tracks. They cannot utilize the information in the image thoroughly and cannot handle multiple tracks in a single image reliably. By contrast, deep learning techniques based on convolutional neural networks (CNN) can extract features from the input data at different levels. At the front layers, shallow features (intensity, line segment, angle) are extracted; as the network goes deeper, the pixels in the feature map have increasingly large receptive fields, and the features become more abstract (pattern or instance). Due to the hierarchy of neural networks, they can find out the underlying relations in the input data and accomplish advanced classification or regression tasks.

The definition of beams in high energy physics indicates that most particles in the beam have similar location and momentum. To make our neural network model work in more general scenarios, we consider the multi-track problem in which the initial positions and directions of the particles change in a certain range. This actually raises the difficulty of obtaining the information of tracks individually. In the experiments (Section 5.6), it can be seen that the same neural network model works very well on real-world track projection images.

In the multi-track problem, we need to judge the number of tracks from the input projection image, and measure each track accurately. It is not only a one-to-many problem, but also an indeterminate problem. Furthermore, the principle of the beam line implies that each particle in the beam is independently and identically distributed, which means no prior knowledge about the relative locations of multiple tracks is available. As a result, it is impossible to use fixed output dimensions and allocate each track to a certain output dimension. Because of these difficulties, traditional CNN structures are not suitable for this problem. We need to explore new methods that can discover independent modes in the projection images and possess regression ability at the same time.

To tackle the multi-track problem, we create an end-to-end neural network based on segmentation and fitting. This section describes the base network, the binary segmentation, the semantic segmentation with pixel assignment, and the weighted least squares fitting separately. Finally, the configurations of the overall architecture are introduced.
The base network of the architecture is shown in Figure 6. The configuration of parameters in each layer refers to the VGG Net [21]. The base network is made up of an encoder part and a decoder part. Convolutions with kernel size 3 × 3 are used in the conv blocks of the encoder, and deconvolutions with kernel size 4 × 4, stride 2 and SAME padding are used in the deconv blocks of the decoder, so each decoder stage doubles the size of its feature map. Each conv block consists of convolution, batch normalization and ReLU activation; each deconv block consists of deconvolution, addition of the corresponding encoder stage output, batch normalization and ReLU. The encoder stages output 64, 128, 256 and 512 channels with max pooling in between, and two parallel decoder branches (for binary segmentation and semantic segmentation) upsample through stages with 256, 128 and 64 output channels.

Figure 6: Base network of the proposed network architecture. The marked addition of encoder stage outputs applies to every decoder stage except the last stage in the semantic segmentation branch.
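For concreteness, a compact tf.keras sketch of such an encoder-decoder base network is given below; layer counts and any details beyond the text and Figure 6 are assumptions, not the exact published model:

```python
# A minimal sketch of the encoder-decoder base network.
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, channels):
    x = layers.Conv2D(channels, 3, padding="same")(x)   # 3x3 conv, SAME padding
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def deconv_block(x, skip, channels):
    x = layers.Conv2DTranspose(channels, 4, strides=2, padding="same")(x)
    x = layers.Add()([x, skip])                         # add encoder stage output
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def base_network(input_shape=(72, 72, 1)):
    inp = tf.keras.Input(input_shape)
    e1 = conv_block(inp, 64)                            # encoder stage 1
    e2 = conv_block(layers.MaxPool2D()(e1), 128)        # encoder stage 2
    e3 = conv_block(layers.MaxPool2D()(e2), 256)        # encoder stage 3
    bottom = conv_block(layers.MaxPool2D()(e3), 512)
    heads = []
    for _ in range(2):                                  # binary / semantic branch
        d3 = deconv_block(bottom, e3, 256)
        d2 = deconv_block(d3, e2, 128)
        d1 = deconv_block(d2, e1, 64)
        heads.append(d1)
    return tf.keras.Model(inp, heads)
```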
In deep learning and computer vision, binary segmentation usually refers to generating a feature map with the same size as the original input; each pixel on the feature map can be 0 or 1 so as to discriminate a single class or a single instance. In our multi-track problem, we generalize the concept to also include the heat-map before the argmax operation that yields the binary map.

For the single-track case, the function of the binary segmentation is to determine the center line of the track to facilitate the Hough transform; besides, we can use the heat-map before binarization to make a weighted least squares fitting. For the multi-track case, the binary result can also be used for coarse location and orientation through the Hough transform; or we can combine the heat-map with the semantic segmentation and assign each pixel to different tracks. After that, weighted least squares fitting can be performed to provide accurate information of multiple tracks.

The flow diagram of the binary segmentation process is shown in Figure 7. The binary segmentation output of the base network has 64 channels. To fit the two binary classes, a convolution layer with kernel size 1 × 1 is used to match the number of channels.

Figure 7: The flow diagram of the binary segmentation process. Input: the binary segmentation branch in the base network and the binary label image. Output: the binary segmentation result in the test phase, or the weighted cross entropy loss in the training phase.

In the training phase, we count the occurrences of 0 and 1 in the label image and set the weight for each class in the cross entropy loss as

$$ w_i = \frac{1}{\log\left( N_i / \sum_j N_j + \epsilon \right)} \tag{1} $$

where N_i is the number of occurrences of class i, ε is a positive number slightly bigger than 1 (we set it to 1.02), and w_i is the weight for class i.

To calculate the weighted cross entropy, softmax is applied to the channel dimension of the feature map after the 1 × 1 convolution:

$$ \mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_{j=1}^{n} \exp(x_j)} \tag{2} $$

$$ L = -\sum_{m,n} \sum_{i} \mathbb{1}(y_{m,n} = i) \cdot w_i \log\left( \mathrm{softmax}(x_{m,n})_i \right) \tag{3} $$

where x_{m,n} is the vector at (m, n) in the feature map and softmax is computed along this vector. 𝟙(·) is the indicator function, which takes 1 when the condition is satisfied and 0 otherwise. The binary segmentation branch can be trained with the weighted cross entropy loss by back-propagation.

In computer vision, semantic segmentation usually refers to classifying the pixels in an image according to their class property. In our multi-track problem, the task for the semantic segmentation and the following pixel assignment is to assign the pixels of the heat-map to the most appropriate tracks so as to use the weighted least squares fitting for accurate location and orientation.

Unlike the usual setting, the semantic segmentation in our multi-track problem divides the pixels according to relative locations. To be more specific, we classify each pixel into one of five kinds of status: Off, On, Up, Down and Both. The method in detail is shown in Figure 8. In the training phase of semantic segmentation, we set a label image related to the track projection image to optimize the network parameters. In the test phase, the status of each pixel is predicted. Before the weighted least squares fitting, we use operators inside the deep learning framework to assign pixels in the heat-map to proper tracks according to the results of semantic segmentation.
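Both segmentation heads are trained with the weighted cross entropy recipe of Equations (1)-(3); a sketch in TensorFlow, where tensor shapes and function names are assumptions:

```python
# Class weighting, Eq. (1), and weighted cross entropy, Eqs. (2)-(3).
import tensorflow as tf

def class_weights(labels, num_classes, eps=1.02):
    """Eq. (1): w_i = 1 / log(N_i / sum_j N_j + eps)."""
    counts = tf.cast(tf.math.bincount(tf.reshape(labels, [-1]),
                                      minlength=num_classes), tf.float32)
    return 1.0 / tf.math.log(counts / tf.reduce_sum(counts) + eps)

def weighted_cross_entropy(logits, labels, num_classes):
    """Eqs. (2)-(3): channel-wise softmax, per-class weighted log loss."""
    w = class_weights(labels, num_classes)
    log_p = tf.nn.log_softmax(logits, axis=-1)     # Eq. (2), in log space
    one_hot = tf.one_hot(labels, num_classes)      # the indicator function
    per_pixel = -tf.reduce_sum(one_hot * log_p, axis=-1)
    return tf.reduce_sum(tf.gather(w, labels) * per_pixel)
```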
Figure 8: An example to show how binary segmentation and semantic segmentation work. We set the angles of the tracks with regard to the horizontal axis more obvious than in reality for better visualization. (left) Binary segmentation gives the center lines of the tracks as well as a heat-map for weights in the least squares fitting. (right) Semantic segmentation annotates each pixel with one of five relative locations (off, on, up, down, both), and the pixel assignment uses the annotations and the Hough transform of the binary map to assign pixels to proper tracks. After that, weighted least squares fitting can be performed.
The flow diagram of the semantic segmentation process is shown in Figure 9. The diagram is similar to Figure 7, except for the following facts: (1) batch normalization and ReLU are used before the 1 × 1 convolution; (2) the pixels are classified into five classes instead of two, and the output in the test phase is the pixel-level semantic feature.

Figure 9: The flow diagram of the semantic segmentation process. Input: the semantic segmentation branch in the base network and the semantic label image. Output: the pixel-level semantic feature in the test phase, or the weighted cross entropy loss in the training phase.

The flow diagram of the pixel assignment process is shown in Figure 10. The Hough transform of the binary map is implemented as a matrix multiplication so that it stays inside the deep learning framework:

$$ H(d, \theta) = \sum_{m,n} C(m, n, d, \theta) \cdot P(m, n) \tag{4} $$

where C represents the conversion matrix of the Hough transform, P is the binary map, and H is the result of the Hough transform.

Figure 10: The flow diagram of the pixel assignment process. The green squares represent computations with the binary segmentation branch, the yellow squares computations with the semantic segmentation branch, and the blue squares the pixel assignment routine. In the pixel assignment routine, relations between each pixel and each track center line are computed. One pixel is assigned to 0-2 tracks according to the predictions of the semantic segmentation.

Next, we select the significant regions in the Hough space by setting a threshold at half the maximum accumulation, and invoke the NMS afterwards. The outputs of NMS are converted into (intercept, slope) pairs, sorted according to the intercepts, and sent to the pixel assignment routine. On the other hand, argmax is applied to the pixel-level feature of semantic segmentation, and the one-hot results are also transferred to the pixel assignment routine.

In the pixel assignment routine, we compute the following arguments of extrema:

$$ d(i, m, n) = k_i \cdot m - n + b_i \tag{5} $$

$$ \mathrm{nn}(m, n) = \arg\min_i d(i, m, n) \quad \text{s.t.}\ d(i, m, n) \ge 0 \tag{6} $$

$$ \mathrm{pos}(m, n) = \arg\min_i d(i, m, n) \quad \text{s.t.}\ d(i, m, n) > 0 \tag{7} $$

$$ \mathrm{np}(m, n) = \arg\max_i d(i, m, n) \quad \text{s.t.}\ d(i, m, n) \le 0 \tag{8} $$

$$ \mathrm{abs}(m, n) = \arg\min_i |d(i, m, n)| \tag{9} $$

where k_i represents the slope, b_i the intercept, and d the signed vertical distance between point (m, n) and the i-th track center line. nn, pos, np and abs represent the track indexes of the nearest non-negative, the nearest positive, the nearest non-positive and the nearest absolute distance with regard to point (m, n). If no track satisfies the constraint, the value is set to -1 (invalid). Then, the masks are generated as follows:

$$ \mathrm{mask}(i, m, n) = \begin{cases} 1, & \text{if } \mathrm{abs}(m, n) = i \text{ and } I(m, n, 1) = 1 \\ 1, & \text{if } \mathrm{nn}(m, n) = i \text{ and } I(m, n, 2) = 1 \\ 1, & \text{if } \mathrm{np}(m, n) = i \text{ and } I(m, n, 3) = 1 \\ 1, & \text{if } (\mathrm{pos}(m, n) = i \text{ or } \mathrm{np}(m, n) = i) \text{ and } I(m, n, 4) = 1 \\ 0, & \text{otherwise} \end{cases} \tag{10} $$

where I represents the one-hot predictions of semantic segmentation, and the mask represents the pixels assigned to each track.
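A NumPy sketch of Equations (5)-(10) follows; the class ordering off/on/up/down/both is assumed from Figure 8, and the -1 "invalid" case is omitted for brevity:

```python
# A sketch of the pixel assignment routine.
import numpy as np

def assign_pixels(slopes, intercepts, semantic, shape):
    """Return per-track masks; `semantic` holds class ids 0..4 per pixel."""
    rows, cols = np.indices(shape)                     # n = row, m = column
    # Eq. (5): signed vertical distance to every track center line.
    d = (slopes[:, None, None] * cols + intercepts[:, None, None]) - rows
    nn = np.argmin(np.where(d >= 0, d, np.inf), axis=0)     # Eq. (6)
    pos = np.argmin(np.where(d > 0, d, np.inf), axis=0)     # Eq. (7)
    np_idx = np.argmax(np.where(d <= 0, d, -np.inf), axis=0)  # Eq. (8)
    nearest = np.argmin(np.abs(d), axis=0)                  # Eq. (9)
    masks = np.zeros((len(slopes),) + shape, dtype=bool)
    for i in range(len(slopes)):                            # Eq. (10)
        masks[i] = ((semantic == 1) & (nearest == i) |
                    (semantic == 2) & (nn == i) |
                    (semantic == 3) & (np_idx == i) |
                    (semantic == 4) & ((pos == i) | (np_idx == i)))
    return masks
```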
Figure 11: The pixel assignment based on the binary segmentation and semantic segmentation in Figure 8. (left) Superposition map of all the assigned pixels. (center) The pixels assigned to the first track. (right) The pixels assigned to the second track.
The example of the pixel assignment is shown in Figure 11. It can be seen that for most pixels the two tracks are discriminated very well. At the intersection of the two tracks, there is a "gap" because of the mutual effect in the pixel assignment. However, it is not significant when we take the overall integrity into consideration. In the training phase, weighted least squares fitting is performed and the residuals are back-propagated. The optimization process ensures that representative features are learned and the negative effect of the "gap" is reduced.
With the pixel assignment results, we can use the heat-map before binarization to make a weighted least squares fitting. Consider the following system of linear equations:

$$ X\beta = Y \tag{11} $$

where

$$ X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_m \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \quad Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} $$

and x_i and y_i are the abscissa and the ordinate of the i-th pixel involved in the fitting, and β_i is the i-th parameter to be fitted. Usually m ≫ 2, so the above linear equations are over-determined and no exact solution exists in the common case. To achieve the best solution, the following optimization problem needs to be solved:

$$ \beta = \arg\min_{Z \in \mathbb{R}^{2 \times 1}} \lVert XZ - Y \rVert^2 \tag{12} $$

There exists an analytical solution (via the pseudo-inverse matrix) to the optimization problem:

$$ \beta = (X^T X)^{-1} X^T Y \tag{13} $$

In principle, we could apply Equation (13) directly to the non-zero pixels on the binary map to solve for the parameters. However, this would only utilize the "0" and "1" of each pixel and not exploit the information thoroughly. Besides, it would not help to back-propagate through the binary segmentation branch. Therefore, we use the weighted least squares fitting and introduce weights into Equation (11):

$$ WX\beta = WY \tag{14} $$

where

$$ W = \mathrm{diag}(w_1, w_2, \ldots, w_m) $$

and w_i is the amplitude of the i-th pixel before binarization in the binary segmentation branch. As a result, the equation to solve for the fitting parameters becomes:

$$ \beta = \left( (WX)^T WX \right)^{-1} (WX)^T WY \tag{15} $$

For each track, we generate (pixel, weight) pairs after the pixel assignment. The weighted least squares of multiple tracks can be computed together in the deep learning framework, which greatly improves efficiency.

To judge the accuracy of the least squares fitting, we need to define the loss function. An intuitive method is to use the squared errors between the fitting parameters and the label. However, the impacts of the intercept and the slope have different scales, and they work together as a whole; separating them manually is inappropriate. Hence, we define the geometric loss function as follows:

$$ L = \sum_{x=0}^{W-1} \left[ (\beta_1 x + \beta_0) - (\hat{\beta}_1 x + \hat{\beta}_0) \right]^2 \tag{16} $$

where β_0 and β_1 are predictions by the weighted least squares fitting, β̂_0 and β̂_1 are ground-truth values in the label, and W is the width of the image. According to Equation (16), it is very convenient to compute the partial derivatives of the loss with regard to the fitting parameters. The core problem is how to propagate the errors to the binary segmentation branch. In Equation (15), the matrix manipulations are differentiable [24], so we can use derivation methods from linear algebra to compute the derivatives of β with regard to W. Through Cholesky decomposition and other available methods, the derivation can be implemented in deep learning frameworks. So far, we have cleared the last barrier of back-propagation; the constructed end-to-end model can be optimized as a whole.

The network comprised of the base network, the binary segmentation branch and the semantic segmentation branch is shown in Figure 12. This architecture is mainly designed for images with multiple and variable tracks, and is flexible with different configurations.
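A TensorFlow sketch of the differentiable weighted fit of Equation (15) and the geometric loss of Equation (16); names and shapes are illustrative:

```python
# Weighted least squares, Eq. (15), and geometric loss, Eq. (16).
import tensorflow as tf

def weighted_ls_fit(cols, rows, weights):
    """Solve beta = ((WX)^T WX)^{-1} (WX)^T WY for one track."""
    x = tf.stack([tf.ones_like(cols), cols], axis=1)    # X: (m, 2)
    y = tf.reshape(rows, [-1, 1])                       # Y: (m, 1)
    w = tf.linalg.diag(weights)                         # W: (m, m)
    wx, wy = w @ x, w @ y
    # tf.linalg.solve keeps the whole operation differentiable end-to-end.
    beta = tf.linalg.solve(tf.transpose(wx) @ wx, tf.transpose(wx) @ wy)
    return beta[0, 0], beta[1, 0]                       # intercept, slope

def geometric_loss(beta0, beta1, label0, label1, width=72):
    """Eq. (16): squared gap between the two lines, sampled at every column."""
    xs = tf.range(width, dtype=tf.float32)
    return tf.reduce_sum(((beta1 * xs + beta0) - (label1 * xs + label0)) ** 2)
```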
Table 2: Different configurations of the network and their application scenarios.
Configuration  Network Part                              Output                    Loss Function                                  App. Scenario
Conf (1)       Base Network, Binary Seg.                 Hough Out, Single LS Out  Binary Seg. Loss                               Single Track (Coarse)
Conf (2)       Base Network, Binary Seg.                 Hough Out, Single LS Out  Binary Seg. Loss, Single LS Loss               Single Track (Precise)
Conf (3)       Base Network, Binary Seg., Semantic Seg.  Hough Out, LS Out         Binary Seg. Loss, Semantic Seg. Loss           Multiple Tracks (Not End-to-End)
Conf (4)       Base Network, Binary Seg., Semantic Seg.  Hough Out, LS Out         Binary Seg. Loss, Semantic Seg. Loss, LS Loss  Multiple Tracks (End-to-End)
Table 2 summarizes four common configurations and their application scenarios. Conf (1) and Conf (2) are used for the single-track case, while Conf (3) and Conf (4) are used for the multi-track case. Compared to Conf (1), Conf (2) adds the single-track least squares (LS) loss, which provides more precise single-track results; similarly, Conf (4) adds the LS loss to Conf (3) for end-to-end optimization.

Figure 12: The overall architecture of the proposed network model. The shaded squares are the labels working as the reference when training. The labels of the least squares fitting are optional and can be discarded at the first stage of training; however, they are vital for end-to-end optimization. For the single-track case, pixel assignment is not necessary for least squares fitting. For the multi-track case, pixel assignment combines the two branches and provides the mask for each track.
5. Simulation and Experimental Results
To simulate the track projection images in real conditions and provide a basis for comparison between empirical methods and the neural network, we use Garfield++ [25], ROOT [26] and other software to produce high energy ion events. The whole process, from the injection of ions to the collection of ionization electrons, is simulated. The parameters of the detector are set according to [18]. Because we mainly discuss the location and orientation of tracks in projection images, only one projection plane with the pixel sensor is considered for simplicity.

The first step of the simulation is to generate configuration files related to the physical process. We use SRIM [27] to compute the statistical characteristics of GeV (giga electron volt) ions (Kr with 85.9 relative atomic mass in this simulation) passing through air. The air is represented by a gas mixture of 76% N2, 23% O2 and 1% Ar. The computed characteristics include the energy loss rate, projected range and longitudinal/lateral straggling. They provide information for the passage of the ion and the ionization along the passage. Next, we use Magboltz integrated into Garfield++ to compute the gas table in air with identical mixtures at 1 atm and room temperature. The electric field (0 ∼ 400 V/cm) is used for the gas table. Two configuration files are generated in this step and loaded in the following simulations with Garfield++.

The second step of the simulation is to generate the geometric structure, the electric field and the readout sensor. We adopt the geometry of the first detecting cell in Figure 2. The ions enter the detector from left to right. The ionization electrons drift under the horizontal electric field towards one side, and the Topmetal is located vertically. Figure 13 gives the side-view and top-view of the defined geometric structure, and the possible ranges of the tracks. In the side-view, the height and length of Topmetal are both 6 mm, and its horizontal distance to the reference point (Ref Point) is 6 mm. In the top-view, the distance from the center of the ionization region to the Topmetal plane is 3.2 mm. We define a uniform electric field (250 V/cm) between the Topmetal anode (0 V) and the metal board cathode (-575 V). The Topmetal plane is set to be the readout sensor. When electrons reach the plane, they leave the detecting area. The location of pixel readout is where the electrons leave.

Figure 13: The geometric structure defined in Garfield++. The Topmetal is placed vertically in the x-z coordinate plane, and ionization electrons drift along the y-axis. (a) A side-view of the detector geometry. (b) A top-view of the detector geometry.

The third step of the simulation is to generate the path of ions and ionization clusters. The energy of the Kr ion is fixed at 2.0087 GeV. The W value (average energy loss for a single ionization electron) is 30 eV [28], and the Fano factor is set to 0.3. The average atomic mass ratio (average Z/A) is set to 0.49919. To improve the efficiency of numerical computation, the ionization electrons appear in clusters, and the average number of electrons in a cluster is 500. When generating tracks, the initial position and angle should be specified. We use a stratified sampling strategy: the angle is divided into many equal ranges, and an equal number of examples is generated in each range. In each range, the accurate angle obeys the uniform distribution, and the possible range of initial position is constrained by the angle. Finally, the accurate initial position is sampled from the uniform distribution. This stratified sampling strategy ensures the coverage of different angles, which is vital to train robust models. After determining the initial kinetics of the ion, we generate tracks and associated clusters of electrons in the framework of Garfield++. These events are saved to a ROOT file to be used in the next step.

The final step of the simulation is to drift the electrons to the detecting plane, where they are collected by Topmetal. The physical parameters are the same as in the former two steps. The electron collection efficiency is defined as 0.11 ∼

In the preprocessing stage, we consider the digital readout of the Topmetal sensor, and generate input images and label data/images for the neural network model.

When producing images for the input, we define the foreground to be tracks from the physical simulation, and the background to be noise intrinsic to each pixel. The histogram of collected electrons is randomly picked from the ROOT file of the previous section. Then the count of electrons is converted to sampling values of analog-to-digital converters (ADC) at 16.76 electrons per unit. Multiple histograms are summed in the foreground and converted to sampling values together to generate images with multiple and variable tracks. To avoid the same foreground image being synthesized twice in the training and test datasets, we record the indexes of each generated foreground and drop the combination if there is a duplication. For the background, we collect statistics of experimental image data at extremely low event rates. According to the statistical results, we sample the noise of each pixel from the Gaussian distribution with -0.695 mean and 1.701 standard deviation. Finally, the foreground and background are added together for the final input image.

It should be noted that ADC sampling values are usually discrete by nature; however, the values on the generated input image are continuous. In practice, the projection images from experiments are also continuous because of a different mechanism. In experiments, the pedestal of each pixel is estimated by the average of ADC sampling values in a time span without any events; the ADC sampling values rise and fall in a small range, so the average is continuous and varies from pixel to pixel. The actual value when events come is the difference between the observed value and the pedestal, so it is also continuous.

To produce labels for the binary segmentation and semantic segmentation, we take out the intercept and slope along with the histogram. A necessary step is to convert the coordinate system from the physical simulation to the convention of images. Depending on single-track or multi-track, the straight lines can be one or several. For binary segmentation, we process the label image column by column, and set the pixel nearest to each straight line to value "1", otherwise value "0". For semantic segmentation, the label image is also processed in the order of columns. First, the 3 points nearest to each straight line in each column are marked On. The upper 3 points and lower 3 points with regard to On are candidates of Up and Down. If a point is both an Up and a Down candidate for different straight lines, we mark it Both in priority. The remaining candidates are marked as Up or Down. Finally, other points in the semantic label image are marked as Off. If the points are close to the edge of the image, out-of-range points are not marked.

When the above preprocessing work is done, we save the input image, the binary label image and the semantic label image to files along with the intercept and slope of each track. Besides, the indexes used in each example (from the ROOT file) are also saved.

For single-track and multi-track simulations, we use 10,000 examples for the training dataset and 2,000 examples for the test dataset. The L2 loss is used for regularization. We choose stochastic gradient descent with momentum as the optimization algorithm. The initial learning rate and the momentum coefficient are set to 0.0001 (except for 0.000005 in the multi-track end-to-end finetuning) and 0.9. The batch size is 4 in single-track and not end-to-end conditions. When training the end-to-end neural network for multi-track with the least squares loss, the batch size is 1. At different stages, we train from randomly initialized weights or on the basis of former weights (discussed in the following sections). The variance scaling initializer [29] is used for weights in the convolution layers. For weights in deconvolution layers, we compute a standard deviation according to the number of input channels, and a truncated normal initializer (dropping samples outside two sigma) is used with 0 mean and this standard deviation. The biases in convolution layers and deconvolution layers are initialized to 0. We stop training and test our model on the test dataset when the total loss has decreased substantially. The network model is implemented with the TensorFlow [30] software on a desktop computer with an NVIDIA GeForce TITAN X GPU (12 GB).
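The foreground/background synthesis described in the preprocessing stage can be sketched as follows; array names and the helper function are illustrative:

```python
# A sketch of the input-image synthesis: electron counts are scaled to
# ADC units and Gaussian per-pixel noise is added.
import numpy as np

ELECTRONS_PER_ADC = 16.76               # conversion used in the preprocessing
NOISE_MEAN, NOISE_STD = -0.695, 1.701   # from low-rate experimental statistics

def synthesize_image(track_histograms, rng=np.random.default_rng()):
    """Sum per-track electron histograms, convert to ADC, add background."""
    foreground = np.sum(track_histograms, axis=0) / ELECTRONS_PER_ADC
    background = rng.normal(NOISE_MEAN, NOISE_STD, size=foreground.shape)
    return foreground + background

# Example: two 72x72 electron-count maps piled up into one input image.
tracks = np.zeros((2, 72, 72))
image = synthesize_image(tracks)
```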
Figure 14: An example of images at different stages of the neural network model for the single-track simulation. (left top) The original input with the label (red line). (right top) The heat-map before binarization in the binary segmentation branch. (left bottom) The binary map. (right bottom) Predictions of the neural network model (blue line: weighted least squares fitting; green line: Hough output).
For the single-track simulation dataset, we create the network architecturedescribed in Section 4 for training and testing, and compare the results withthe traditional empirical methods in Section 3. We use the base network andthe binary segmentation part (Conf (1) and Conf (2) in Table 2). Based onthe heat-map before binarization, we make a weighted least squares fitting.First, we do not back-propagate through the least squares loss and performrelevant training and testing. Then the least squares loss is used to finetunethe whole network and testing is conducted after finetuning.In figure 14, we give an example showing how the network model dealswith the single-track image. The left top image is the original input image,32 able 3: The statistics of residuals of intercept and slope predictions. For the single-trackimage, five methods are analyzed. The spatial resolution is estimated based on the pixelsize of the
Topmetal-II- chip in [16].
Method Residual of Intercept Residual of Slope mean std. resolution ( µ m) mean std. resolution (degree)double edge detection 0.17711 0.62096 51.664 0.00009 0.00828 0.474mass center -0.00656 0.17119 14.243 0.00017 0.00393 0.225Hough output 0.58766 0.41500 34.528 0.00018 0.01002 0.574least squares (w/o loss) -0.01342 0.11147 9.274 -0.00002 0.00278 0.159least squares (w/ loss) -0.00087 0.10559 8.785 0.00000 0.00260 0.149 Table 4: Statistics of center measure, angle measure and CAM. For the single-trackimage, six methods are analyzed.
Method Center Measure Angle Measure CAM double edge detection 0.9849 ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± the red line in which indicates the label. After the feature extraction ofthe base network and binary segmentation, heat-map before binarization isshown in the right top image. It can be seen in the image that the ampli-tudes of pixels are highly centralized; most pixels have values close to zero,and only pixels on the center line of the track have values close to one. Inthe left bottom binary map, the features become even clearer. In the rightbottom image, we draw the input track image again and give the resultsfrom least squares fitting (blue line) and Hough transform (green line). Intotal, the outputs of the neural network are approaching the label closely.It demonstrates the effectiveness of the proposed network architecture. Bycarefully examining the results, we can see a small shift (mainly the inter-cept) of the Hough transform output compared to the label. This non-idealphenomenon is due to the discrete Hough space. In Table 4, we add an ad-justment to the Hough output to correct the shift and give the results beforeand after the adjustment.To quantify the performance of different methods, based on the inter-cept and slope predictions on the test dataset, we fit the residuals (differencefrom label) to a Gaussian distribution and list the mean and standard de-viation (std.) in Table 3. The mean values represent the system bias ofdifferent methods, which could be greatly reduced by adding/subtracting aconstant. The standard deviation could represent the resolution achieved by33he method, which is also listed beside the standard deviation. We excludesome outliers when doing statistics of the double edge detection method. Forthe intercept, the order of performance from bad to good is: double edgedetection, Hough output, mass center, least squares (w/o loss) and leastsquares (w/ loss). For the slope, the double edge detection is better than theHough output, and others are the same. It can be seen that the results fromthe least squares are better than the mass center in the single-track case, andthe least squares (w/ loss) is slightly better than least squares (w/o loss).This demonstrates the effectiveness of the least squares method and its back-propagation. When using the least squares (w/ loss), the resolution for theinitial position (intercept) and angle (slope) could be 8.785 µ m and 0.149 ◦ ,which is the best achievable result.In the above analysis, we deal with the initial position and angle sepa-rately. However, they are two sides of a single problem and cannot be totallyisolated. To quantify the precision in a systematic way, we introduce the center-angle measure (CAM) which combines both the location and orienta-tion: c = 1 − || C label − C pred || . · L label (17) a = 1 − | θ label − θ pred | ◦ (18) CAM = c · a (19)where C label and C pred represent the center positions of the label line segmentand the predicted line segment. We calculate the L2 distance between thetwo centers and divide it by half the length of the label. θ label and θ pred represent the angles of the label line and the predicted line. We calculatethe absolute difference between the two angles and divide it by 90 degrees.The center positions and angles are calculated from intercepts and slopesgenerated by different methods and recorded in the label. 
In Table 4, we gather the results of center measure, angle measure and CAM using different methods. The mean and standard deviation are calculated on the test dataset. Comparing the final CAM, the order of performance from worst to best is: Hough output (original), double edge detection, Hough output (adjusted), mass center, least squares (w/o loss) and least squares (w/ loss). This is consistent with the results in Table 3. The CAM achieved by the least squares (w/ loss) could reach 0.9977 on average.
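Since the least squares (w/ loss) results rely on the fitting stage being differentiable, a minimal closed-form sketch of a heat-map-weighted line fit is given below. The single-track NumPy form is an illustrative assumption; the actual model performs the weighted fits for multiple tracks inside the deep learning framework with full back-propagation.

    import numpy as np

    def weighted_line_fit(heatmap):
        """Closed-form weighted least squares fit of a line y = k*x + b.

        Each pixel (x, y) contributes with weight w = heatmap[y, x]. The
        closed form keeps the fit differentiable w.r.t. the weights, which
        is what allows back-propagation through the fitting stage.
        """
        h, w = heatmap.shape
        ys, xs = np.mgrid[0:h, 0:w]
        wts = heatmap.ravel().astype(float)
        x, y = xs.ravel().astype(float), ys.ravel().astype(float)

        # Weighted sums entering the normal equations.
        s, sx, sy = wts.sum(), (wts * x).sum(), (wts * y).sum()
        sxx, sxy = (wts * x * x).sum(), (wts * x * y).sum()

        denom = s * sxx - sx * sx  # assumes weights span more than one column
        k = (s * sxy - sx * sy) / denom
        b = (sy * sxx - sx * sxy) / denom
        return k, b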
Figure 15: Detection rates of different methods versus the CAM threshold for the single-track image (curves: double edge detection, mass center, Hough output before and after adjustment, and least squares fitting w/o and w/ loss). At a specific threshold, the test results with CAM above the threshold are considered as detected. We use a logarithmic x-axis in the figure.
In Figure 15, we compare the detection rates of different methods at CAM thresholds. The detection rate refers to the fraction of tested images with CAM above the threshold. The scale of the x-axis is logarithmic from 0.9 to 0.999. It can be seen that the overall trend of the curves is in good accordance with Table 4. The difference between the mass center and the least squares is not large. However, the advantage of the least squares is still noticeable, and the relative improvement is more evident at high CAM thresholds. A short sketch of this detection-rate computation is given below.
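The sketch computes a detection-rate curve from per-image CAM scores; the threshold spacing and the synthetic scores in the usage lines are illustrative assumptions, chosen only to mirror the logarithmic axis of Figure 15.

    import numpy as np

    def detection_rate(cam_values, thresholds):
        """Fraction of tested images whose CAM is above each threshold."""
        cam_values = np.asarray(cam_values, dtype=float)
        return np.array([(cam_values >= t).mean() for t in thresholds])

    # Thresholds spaced logarithmically between 0.9 and 0.999.
    thresholds = 1.0 - np.logspace(-1, -3, num=50)
    # Synthetic CAM scores, for demonstration only.
    rates = detection_rate(np.random.uniform(0.9, 1.0, size=1000), thresholds)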
Figure 16: An example of images at different stages of the neural network model for the multi-track simulation. (left top) The original input with the label (red line). (right top) The heat-map before binarization in the binary segmentation branch. (left bottom) The result of semantic segmentation (pixel classes: off, on, up, down, both). (right bottom) Predictions of the neural network model (blue line: weighted least squares fitting; green line: Hough output).

Table 5: Statistics of residuals of intercept and slope predictions. For the multi-track image, three methods are analyzed. The upper part uses 1-3 tracks, and the lower part uses 1-5 tracks. The spatial resolution is estimated based on the pixel size of the Topmetal-II- chip in [16].
                           Residual of Intercept                Residual of Slope
Method                     mean      std.     resolution (µm)   mean      std.     resolution (degree)
1-3 tracks:
Hough output               0.61478   0.43175  35.922            0.00001   0.01036  0.594
least squares (w/o loss)   0.01227   0.15250  12.688            0.00022   0.00403  0.231
least squares (w/ loss)    0.00419   0.13722  11.417            0.00003   0.00366  0.210
1-5 tracks:
Hough output               0.60142   0.44297  36.855            -0.00014  0.01086  0.622
least squares (w/o loss)   0.00916   0.19269  16.032            0.00047   0.00553  0.317
least squares (w/ loss)    -0.01206  0.18281  15.210            0.00011   0.00504  0.289

Table 6: Statistics of center measure, angle measure and CAM. For the multi-track image, four methods are analyzed. The upper part uses 1-3 tracks, and the lower part uses 1-5 tracks.
Method                     Center Measure   Angle Measure   CAM
1-3 tracks:
Hough output (original)    0.9833 ± …       …               …
Hough output (adjusted)    …                …               …
least squares (w/o loss)   …                …               …
least squares (w/ loss)    …                …               …
1-5 tracks:
Hough output (original)    …                …               …
Hough output (adjusted)    …                …               …
least squares (w/o loss)   …                …               …
least squares (w/ loss)    …                …               …

For the multi-track simulation dataset, we create the full network architecture comprised of the base network, the binary segmentation and the semantic segmentation (Conf (3) and Conf (4) in Table 2). First, training and testing are performed without the least squares loss. Next, the loss is used to finetune the network end-to-end, and testing is conducted again after finetuning.

In Figure 16, we give an example visualizing different stages of the network. The left bottom image is the result from semantic segmentation, and the other images and their annotations are the same as in Figure 14. Multi-track discrimination is more difficult than the single-track case. In this example, two tracks are close to each other and intersect at the right edge of the input image. In the right top heat-map, the intersecting region is somewhat vague; however, the "X" shape is well distinguishable. In the left bottom image, there are also some non-ideal pixels, but the overall condition is fairly good. The network predictions in the right bottom image closely approach the label. Again, the Hough output (green line) has a shift compared to the weighted least squares output (blue line), which could be fixed with an adjustment.

To set up relations between the predicted tracks and the tracks in the label, we use a greedy match algorithm (a minimal sketch is given below, after the discussion of Tables 5 and 6). First, for each example in the test dataset, we build up a CAM matrix: the numbers of rows and columns are the numbers of predicted tracks and label tracks, and each element is the CAM calculated between a prediction and a label. Each time, we select the maximum element in the CAM matrix, record the relation between the row (prediction) and the column (label), and leave out the row and the column in subsequent matching. The process continues until no element in the CAM matrix satisfies the matching condition.

In Table 5 and Table 6, we show the statistical results of the Hough output and the least squares under different conditions. Two multi-track datasets are used in this section: the 1-3 tracks dataset and the 1-5 tracks dataset. They are listed in the upper part and the lower part of the tables, respectively. For the intercept and the slope, the precision of the least squares method is higher than that of the Hough output. Compared to the least squares (w/o loss), the least squares (w/ loss) improves the precision further. It should be noted that the resolution of the initial position and angle achieved by the least squares (w/ loss) on the 1-3 tracks dataset (11.417 µm and 0.210°) is still better than that of the mass center on the single-track dataset (14.243 µm and 0.225°). This demonstrates the superiority of the network architecture. Besides, when using the 1-5 tracks dataset, the performance only deteriorates a little, and the same model is still competent in feature extraction. For the CAM results, the trend is almost the same.
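The following is a minimal NumPy sketch of the greedy match described above. The min_cam cutoff stands in for the unspecified matching condition and is an assumption of the sketch; the row/column elimination follows the procedure in the text.

    import numpy as np

    def greedy_match(cam_matrix, min_cam=0.0):
        """Greedily pair predicted tracks (rows) with label tracks (columns).

        cam_matrix[i, j] is the CAM between prediction i and label j.
        Repeatedly take the largest remaining element, record the pair,
        and remove its row and column from further matching.
        """
        cam = np.array(cam_matrix, dtype=float)
        pairs = []
        while cam.size and cam.max() > min_cam:
            i, j = np.unravel_index(np.argmax(cam), cam.shape)
            pairs.append((i, j, cam[i, j]))
            cam[i, :] = -np.inf  # leave out this prediction
            cam[:, j] = -np.inf  # leave out this label
        return pairs

Because each step takes the current maximum, a close pair of tracks cannot steal a partner already claimed by a better match, which is the point of eliminating rows and columns as the matching proceeds.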
Finally, we give the F1-macro and F1-micro curves of different methods versus the CAM threshold. We define TP_i to be the successfully matched tracks above the CAM threshold in an example, (TP_i + FP_i) to be the total predicted tracks in an example, and (TP_i + FN_i) to be the total label tracks in an example. The F1-macro and F1-micro can thus be calculated:

P_i = \frac{TP_i}{TP_i + FP_i}, \quad R_i = \frac{TP_i}{TP_i + FN_i}, \quad F_i = \frac{2 \cdot P_i \cdot R_i}{P_i + R_i}   (20)

\mathrm{F1\text{-}macro} = \frac{1}{N} \sum_{i=1}^{N} F_i   (21)

P = \frac{\sum_{i=1}^{N} TP_i}{\sum_{i=1}^{N} (TP_i + FP_i)}, \quad R = \frac{\sum_{i=1}^{N} TP_i}{\sum_{i=1}^{N} (TP_i + FN_i)}   (22)

\mathrm{F1\text{-}micro} = \frac{2 \cdot P \cdot R}{P + R}   (23)

The curves of F1-macro and F1-micro are shown in Figure 17 and Figure 18. The scale of the x-axis is logarithmic, ranging from 0.9 to 0.999. It can be seen that the least squares fitting is better than the Hough output in both figures, which is in good accordance with Table 6. The least squares (w/ loss) is slightly better than the least squares (w/o loss), and this is more obvious in the F1-micro figure. In both figures, we plot the results of the 1-3 tracks dataset and the 1-5 tracks dataset.
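For reference, the sketch below evaluates Eqs. (20)-(23) from per-example counts in NumPy; the zero-division guards for empty examples are an implementation assumption not discussed in the text.

    import numpy as np

    def f1_scores(tp, fp, fn):
        """F1-macro and F1-micro from per-example counts (Eqs. 20-23)."""
        tp, fp, fn = (np.asarray(x, dtype=float) for x in (tp, fp, fn))
        # Per-example precision, recall and F1, guarded against 0/0.
        p_i = np.divide(tp, tp + fp, out=np.zeros_like(tp), where=(tp + fp) > 0)
        r_i = np.divide(tp, tp + fn, out=np.zeros_like(tp), where=(tp + fn) > 0)
        f_i = np.divide(2 * p_i * r_i, p_i + r_i,
                        out=np.zeros_like(tp), where=(p_i + r_i) > 0)
        f1_macro = f_i.mean()  # Eq. (21): average of per-example F1 scores
        # Global precision and recall over all examples, Eqs. (22)-(23).
        p = tp.sum() / (tp.sum() + fp.sum())
        r = tp.sum() / (tp.sum() + fn.sum())
        f1_micro = 2 * p * r / (p + r)
        return f1_macro, f1_micro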
Figure 17: F1-macro of different methods versus the CAM threshold for the multi-track image. F1-macro is the average of the F1 scores of individual examples. We use a logarithmic x-axis in the figure. The suffix "m3" means the dataset with 1-3 tracks, and the suffix "m5" means the dataset with 1-5 tracks.
Figure 18: F1-micro of different methods versus the CAM threshold for the multi-track image. F1-micro is the F1 score over all the examples. We use a logarithmic x-axis in the figure. The suffix "m3" means the dataset with 1-3 tracks, and the suffix "m5" means the dataset with 1-5 tracks.
Figure 19: Test results of applying the trained network weights to experimental data. The upper three images show the single-track case, and the lower three images show the multi-track case. The predictions of least squares fitting (blue line) and Hough output (green line) are plotted.
To demonstrate the effectiveness of the proposed network architecture in experiments, we qualitatively analyze the test results with the experimental images. These images come from the experiment in [18]. We directly apply the trained network weights to initialize the model, and the results are shown in Figure 19. In these figures, the gray-scale background image is the experimental data, and the line segments are predictions of the least squares fitting (blue line) and the Hough output (green line). Based on the observation of the background track images, the variation between individual images is considerable. For example, in the right top image, the difference between the noisy background and the track is relatively small, which indicates that the increase of amplitude on the track is not obvious; in the right bottom image, the pattern of tracks is very concentrated, which may be caused by the short drifting distance of the electron cloud. In spite of these diversities and the discrepancy between simulations and experiments, the weights trained with the simulation data recognize the tracks in the experiment very well. Whether the least squares or the Hough output is used, the model correctly judges the number of tracks and gives accurate positions and angles. When we carefully examine the images, we find that the prediction of the least squares fitting is nearer to the center line of each track. This is consistent with the previous simulations.
6. Conclusions
In this paper, we mainly discuss the multi-track location and orientation problem in the gaseous drift chamber based on the Topmetal series pixel sensor. First, we briefly introduce the detecting instrument used in the previous experiment and the conceptual design implemented with the newly developed Topmetal-CEE chip. When describing the scheme of measurement, we emphasize the capability of multi-track detection enabled by the time & amplitude readout of the Topmetal-CEE.

Next, two traditional empirical methods, the mass center method and the double edge detection method, are introduced, and their limitations in the specific problem are thoroughly analyzed. These empirical methods and their limitations show the possibilities to improve the performance of location and orientation, and shed light on the direction to design novel deep learning methods.

The proposed architecture is an end-to-end neural network based on segmentation and fitting. It is comprised of the base network, the binary segmentation branch and the semantic segmentation branch. The base network adopts an encoder-decoder structure, and the two decoder parts share the encoder. The binary segmentation generates the heat-map for weighted least squares fitting and the binary map for Hough transform. The semantic segmentation maps each pixel to five possible relative locations, which paves the way for the following pixel assignment. Weighted least squares fitting is implemented inside the deep learning framework with full back-propagation compatibility. The network architecture could be trained end-to-end to eliminate irretrievable errors from stage to stage.

After that, the procedures of physical simulation and data preprocessing are described in detail. In the first place, single-track simulation is performed with comparison to empirical methods. The simulation results show that the least squares method is significantly better than the mass center method, and the least squares with loss can improve the precision further on the basis of the trained network. In the second place, multi-track simulation is conducted with the 1-3 tracks dataset and the 1-5 tracks dataset. We find that the least squares method on the 1-3 tracks dataset can still perform better than the mass center method in the single-track case. The least squares with loss has the highest precision throughout the simulations. The location resolution could achieve 8.8 µm for the single track and 11.4 µm (15.2 µm) for the 1-3 tracks (1-5 tracks), and the orientation resolution could achieve 0.15° and 0.21° (0.29°) respectively. Finally, results on the experimental data demonstrate the robustness of the physical simulation and the validity of the method in real-world experiments.

In the future, we would like to deploy the algorithm on application-specific integrated circuits or field-programmable gate arrays for on-line and on-site feature extraction. To achieve this, the optimization and quantization of the network architecture could be necessary.

Acknowledgements
This research is partly supported by the National Natural Science Foundation of China (Grant Numbers 11875146 and U1932143), and partly supported by the National Key Research and Development Program of China (Grant Number 2016YFE0100900).
References

[1] The ATLAS Collaboration, The ATLAS experiment at the CERN Large Hadron Collider, Journal of Instrumentation 3 (08) (2008) S08003. doi:10.1088/1748-0221/3/08/s08003.

[2] Z. Wang, S. Zou, Y. Fan, J. Liu, X. Sun, D. Wang, H. Kang, D. Sun, P. Yang, H. Pei, G. Huang, N. Xu, C. Gao, L. Xiao, A beam monitor using silicon pixel sensors for hadron therapy, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 849 (2017) 20-24. doi:10.1016/j.nima.2016.12.050.

[3] L. Badano, M. Benedikt, P. Bryant, M. Crescenti, P. Holy, P. Knaus, A. Maier, M. Pullia, S. Rossi, Synchrotrons for hadron therapy: Part I, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 430 (2) (1999) 512-522. doi:10.1016/S0168-9002(99)00206-5.

[4] Z. Xu, R. Mao, L. Duan, Q. She, Z. Hu, H. Li, Z. Lu, Q. Zhao, H. Yang, H. Su, C. Lu, R. Hu, J. Zhang, A new multi-strip ionization chamber used as online beam monitor for heavy ion therapy, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 729 (2013) 895-899. doi:10.1016/j.nima.2013.08.069.

[5] O. Actis, D. Meer, S. König, Precise on-line position measurement for particle therapy, Journal of Instrumentation 9 (12) (2014) C12037. doi:10.1088/1748-0221/9/12/c12037.

[6] R. S. Sussmann, CVD Diamond for Electronic Devices and Sensors, John Wiley & Sons, Chichester, United Kingdom, 2009. doi:10.1002/9780470740392.

[7] M. Alvarado, A. Ayala, M. A. Ayala-Torres, W. Bietenholz, I. Dominguez, M. Fontaine, P. González-Zamora, L. M. Montaño, E. Moreno-Barbosa, M. E. P. Salazar, L. Moreno, P. Nieto-Marín, V. Reyna Ortiz, M. Rodríguez-Cahuantzi, G. Tejeda-Muñoz, M. E. Tejeda-Yeomans, A. Villatoro-Tello, C. Zepeda Fernández, A beam-beam monitoring detector for the MPD experiment at NICA, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 953 (2020) 163150. doi:10.1016/j.nima.2019.163150.

[8] J. Son, S. Lee, Y. Lim, S. Park, K. Cho, M. Yoon, D. Shin, Development of optical fiber based measurement system for the verification of entrance dose map in pencil beam scanning proton beam, Sensors 18 (1) (2018) 227. doi:10.3390/s18010227.

[9] P. Gianotti, The PADME detector, EPJ Web of Conferences 170 (2018) 01007. doi:10.1051/epjconf/201817001007.

[10] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436-444. doi:10.1038/nature14539.

[11] E. Shelhamer, J. Long, T. Darrell, Fully convolutional networks for semantic segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (4) (2017) 640-651. doi:10.1109/TPAMI.2016.2572683.

[12] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481-2495. doi:10.1109/TPAMI.2016.2644615.

[13] A. Paszke, A. Chaurasia, S. Kim, E. Culurciello, ENet: A deep neural network architecture for real-time semantic segmentation, CoRR abs/1606.02147. arXiv:1606.02147. URL http://arxiv.org/abs/1606.02147

[14] D. Neven, B. D. Brabandere, S. Georgoulis, M. Proesmans, L. V. Gool, Towards end-to-end lane detection: an instance segmentation approach, in: 2018 IEEE Intelligent Vehicles Symposium (IV), 2018, pp. 286-291. doi:10.1109/IVS.2018.8500547.

[15] Y. Fan, C. Gao, G. Huang, X. Li, Y. Mei, H. Pei, Q. Sun, X. Sun, D. Wang, Z. Wang, Development of a highly pixelated direct charge sensor, Topmetal-I, for ionizing radiation imaging, arXiv abs/1407.3712. arXiv:1407.3712. URL http://arxiv.org/abs/1407.3712

[16] M. An, C. Chen, C. Gao, M. Han, R. Ji, X. Li, Y. Mei, Q. Sun, X. Sun, K. Wang, L. Xiao, P. Yang, W. Zhou, A low-noise CMOS pixel direct charge sensor, Topmetal-II-, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 810 (2016) 144-150. doi:10.1016/j.nima.2015.11.153.

[17] C. Gao, G. Huang, X. Sun, Topmetal-II-: a direct charge sensor for high energy physics and imaging applications, Journal of Instrumentation 11 (01) (2016) C01053. doi:10.1088/1748-0221/11/01/c01053.

[18] Z. Li, Y. Fan, Z. Wang, J. Liu, X. Sun, C. Zhao, H. Pei, D. Wang, G. Huang, D. Zhang, D. Sun, P. Yang, C. Gao, L. Xiao, A new method for directly locating single-event latchups using silicon pixel sensors in a gas detector, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 962 (2020) 163697. doi:10.1016/j.nima.2020.163697.

[19] L. Lü, H. Yi, Z. Xiao, M. Shao, S. Zhang, G. Xiao, N. Xu, Conceptual design of the HIRFL-CSR external-target experiment, Science China Physics, Mechanics & Astronomy 60 (1) (2016) 012021. doi:10.1007/s11433-016-0342-x.

[20] J. Canny, A computational approach to edge detection, IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-8 (6) (1986) 679-698. doi:10.1109/TPAMI.1986.4767851.

[21] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, in: 3rd International Conference on Learning Representations (ICLR), 2015. URL http://arxiv.org/abs/1409.1556

[22] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, in: Proceedings of the 32nd International Conference on Machine Learning (ICML'15), JMLR.org, 2015, pp. 448-456.

[23] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778. doi:10.1109/CVPR.2016.90.

[24] M. Giles, An extended collection of matrix derivative results for forward and reverse mode automatic differentiation, Report, University of Oxford (2008).

[25] H. Schindler, Microscopic simulation of particle detectors, Thesis (2012).

[26] R. Brun, F. Rademakers, ROOT: an object oriented data analysis framework, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 389 (1) (1997) 81-86. doi:10.1016/S0168-9002(97)00048-X.

[27] J. F. Ziegler, M. Ziegler, J. Biersack, SRIM: the stopping and range of ions in matter (2010), Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms 268 (11) (2010) 1818-1823. doi:10.1016/j.nimb.2010.02.091.

[28] J. M. Valentine, S. C. Curran, Average energy expenditure per ion pair in gases and gas mixtures, Reports on Progress in Physics 21 (1) (1958) 1-29. doi:10.1088/0034-4885/21/1/301.

[29] K. He, X. Zhang, S. Ren, J. Sun, Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, in: 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1026-1034. doi:10.1109/ICCV.2015.123.

[30] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. A. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, X. Zheng, TensorFlow: A system for large-scale machine learning, in: K. Keeton, T. Roscoe (Eds.), 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2016), Savannah, GA, USA, November 2-4, 2016, USENIX Association, 2016, pp. 265-283.