DIFFnet: Diffusion parameter mapping network generalized for input diffusion gradient schemes and b-values

Abstract

In MRI, deep neural networks have been proposed to reconstruct diffusion model parameters. However, the inputs of the networks were designed for a specific diffusion gradient scheme (i.e., diffusion gradient directions and numbers) and a specific b-value that are the same as the training data. In this study, a new deep neural network, referred to as DIFFnet, is developed to function as a generalized reconstruction tool of the diffusion-weighted signals for various gradient schemes and b-values. For generalization, diffusion signals are normalized in a q-space and then projected and quantized, producing a matrix (Qmatrix) as an input for the network. To demonstrate the validity of this approach, DIFFnet is evaluated for diffusion tensor imaging (DIFFnetDTI) and for neurite orientation dispersion and density imaging (DIFFnetNODDI). In each model, two datasets with different gradient schemes and b-values are tested. The results demonstrate accurate reconstruction of the diffusion parameters at substantially reduced processing time (approximately 8.7 times and 2240 times faster processing time than conventional methods in DTI and NODDI, respectively; less than 4% mean normalized root-mean-square errors (NRMSE) in DTI and less than 8% in NODDI). The generalization capability of the networks was further validated using reduced numbers of diffusion signals from the datasets. Different from previously proposed deep neural networks, DIFFnet does not require any specific gradient scheme and b-value for its input. As a result, it can be adopted as an online reconstruction tool for various complex diffusion imaging.


Juhung Park, Woojin Jung, Eun-Jung Choi, Se-Hong Oh, Dongmyung Shin, Hongjun An, and Jongho Lee

Author affiliations: Laboratory for Imaging Science and Technology, Department of Electrical and Computer Engineering, Seoul National University, Seoul, Korea; Division of Biomedical Engineering, Hankuk University of Foreign Studies, Gyeonggi-do, Korea

Corresponding Author:

Jongho Lee, Ph.D Department of Electrical and Computer Engineering, Seoul National University Building 301, Room 1008, 1 Gwanak-ro, Gwanak-gu, Seoul, Korea Tel: 82-2-880-7310 E-mail: [email protected]


INTRODUCTION

Diffusion magnetic resonance imaging (dMRI) non-invasively measures the diffusion characteristics of water molecules and has been widely applied in neuroscience and clinical practice [1], [2]. In dMRI, various microstructure diffusion models have been developed to extract complex diffusion characteristics [3]-[5]. Among them, diffusion tensor imaging (DTI) [4] and neurite orientation dispersion and density imaging (NODDI) [3] are popular models, measuring water diffusivity and tissue microstructural properties. Reconstructing a complex microstructure diffusion model (e.g., NODDI) often requires non-linear fitting, which costs substantial processing time. For example, processing a whole-brain dataset with the NODDI model takes over 10 hours, which precludes real-time processing on the scanner. To address this limitation, various methods have been proposed to reduce the computational cost [6], [7]. However, the processing still takes several minutes and comes with decreased accuracy. Hence, further efforts are necessary to reduce the processing time while maintaining accuracy.

Recently, deep learning has been widely applied to the reconstruction and processing of MRI data [8]. A deep neural network trained with a sufficient amount of data has been shown to generate highly accurate results at a significantly reduced computational cost [9], [10]. However, when a test dataset has different characteristics (e.g., resolution, signal-to-noise ratio) from the training dataset, the performance of the deep neural network degrades substantially [11], [12]. This generalization issue has been shown to be critical when applying networks in routine practice. In dMRI reconstruction, neural networks have successfully generated accurate results at a reduced processing time [13]-[15].
To our knowledge, however, the networks proposed so far require, as their input, a specific diffusion gradient scheme (i.e., specific gradient directions and number of gradients) with specific b-values that are the same as those of the training data. Such networks must be retrained whenever the input data have a different gradient scheme or b-value, which costs considerable training time and effort.

In this study, we present a new deep neural network for diffusion data reconstruction. This network, referred to as DIFFnet, is generalized for input gradient schemes and b-values by introducing an input matrix. Two DIFFnets are designed and evaluated: DIFFnetDTI for DTI and DIFFnetNODDI for NODDI. The results of DIFFnet are compared with those of the conventional fitting methods and the previously proposed neural networks. The source code of DIFFnet is available at https://github.com/SNU-LIST/DIFFnet.


METHODS

Deep neural networks: DIFFnet

The outline of DIFFnet is presented in Fig. 1. For DTI, DIFFnetDTI was designed to reconstruct the four DTI model parameters, fractional anisotropy (FA), mean diffusivity (MD), axial diffusivity (AD), and radial diffusivity (RD), from DTI data that had n diffusion gradient directions with a single b-value of b s/mm² (Fig. 1a). For NODDI, DIFFnetNODDI generates the three NODDI model parameters, intracellular volume fraction (ICVF), isotropic volume fraction (ISOVF), and orientation dispersion index (ODI), from three-shell diffusion data with three b-values (b₁, b₂, and b₃ s/mm²) and the corresponding three numbers (n₁, n₂, and n₃) of diffusion gradient directions (Fig. 1b). The networks were targeted to produce the parameter maps from any reasonable diffusion gradient schemes and b-values.

Figure 1. Overview of DIFFnet. Two DIFFnets, DIFFnetDTI for DTI and DIFFnetNODDI for NODDI, were designed. (a) DIFFnetDTI reconstructed the four DTI parameters, FA, MD, AD, and RD, from DTI data that had n diffusion gradient directions with a single b-value of b s/mm². (b) DIFFnetNODDI generated the three NODDI parameters, ICVF, ISOVF, and ODI, from three-shell diffusion data with three b-values (b₁, b₂, and b₃ s/mm²) and the corresponding three numbers (n₁, n₂, and n₃) of diffusion gradient directions. The networks were targeted to produce the parameter maps from any reasonable diffusion gradient schemes and b-values.

The network structure of DIFFnet was a modified version of the residual neural network [16], shown in Supplementary Information Fig. S1. The network was trained end-to-end using one set of diffusion-weighted signals, normalized by the signals with no diffusion weighting, as an input and the diffusion model parameters (e.g., FA, MD, AD, and RD or ICVF, ISOVF, and ODI) as a label.

To achieve generalization for gradient directions and b-values, we introduced an input matrix, "Qmatrix", which was a projected and quantized input matrix for the diffusion data. In the first step, a diffusion-weighted signal set was placed in a q-space [17] with q-vectors normalized by the b-value of 1300 s/mm² for DTI or 2300 s/mm² for NODDI [18] (Fig. 2a). From this q-space, two different input matrix formats, a three-dimensional format (Qmatrix3D) and a two-dimensional projected format (Qmatrix2D), were designed and tested. In Qmatrix3D, the q-space was quantized into qₙ bins along each of the three axes (Fig. 2c), producing a qₙ × qₙ × qₙ matrix as the input of the network (Fig. 2b); five values of qₙ (5, 10, 15, 20, and 25) were evaluated. This matrix was used for both DTI and NODDI. In Qmatrix2D for DTI, the q-space was projected onto the xy-, yz-, and xz-planes and then quantized into qₙ × qₙ bins, with the same five values of qₙ tested, producing a qₙ × qₙ × 3 matrix (Fig. 2e). For NODDI, the 2D projection was performed for each of the three shells, generating a qₙ × qₙ × 9 matrix (Fig. 2f) (see Discussion for using the qₙ × qₙ × 3 matrix in NODDI). Consequently, the gradient direction and b-value of each signal were represented by the position of the signal in Qmatrix, allowing various gradient schemes and b-values to be used for the input of DIFFnet.
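As an illustration of the projection-and-quantization step for DTI, the following is a minimal sketch and not the authors' implementation. The function name `build_qmatrix_2d`, the sqrt(b) scaling of the q-vector length, the bin mapping over [-1, 1], and the averaging of signals that share a bin are all assumptions made for this example; the released source code should be consulted for the actual choices.

```python
import numpy as np


def build_qmatrix_2d(signals, bvecs, bvals, b_norm=1300.0, qn=20):
    """Project normalized q-space samples onto the xy, yz, and xz planes and bin them.

    signals: (N,) diffusion-weighted signals already divided by the b = 0 signal
    bvecs:   (N, 3) unit gradient directions
    bvals:   (N,) b-values in s/mm^2
    """
    signals = np.asarray(signals, dtype=float)
    bvecs = np.asarray(bvecs, dtype=float)
    bvals = np.asarray(bvals, dtype=float)

    # ASSUMPTION: |q| scales with sqrt(b), so b = b_norm maps to radius 1.
    q = bvecs * np.sqrt(bvals / b_norm)[:, None]

    # Map each coordinate from [-1, 1] onto an integer bin in [0, qn - 1].
    idx = np.clip(np.floor((q + 1.0) / 2.0 * qn).astype(int), 0, qn - 1)

    total = np.zeros((qn, qn, 3))
    count = np.zeros((qn, qn, 3))
    for plane, (a, b) in enumerate([(0, 1), (1, 2), (0, 2)]):  # xy, yz, xz
        channel = np.full(len(signals), plane)
        np.add.at(total, (idx[:, a], idx[:, b], channel), signals)
        np.add.at(count, (idx[:, a], idx[:, b], channel), 1.0)

    # ASSUMPTION: signals sharing a bin are averaged; empty bins stay at zero.
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)


# Three directions along x, y, and z at b = 1300 s/mm^2 (illustrative values)
qmatrix = build_qmatrix_2d(
    signals=[0.5, 0.6, 0.7],
    bvecs=[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    bvals=[1300, 1300, 1300],
)
print(qmatrix.shape)  # (20, 20, 3)
```

Because only the position of each signal in the matrix encodes its direction and b-value, the same function accepts any number of directions. Under the same assumptions, the NODDI variant would call it once per shell with `b_norm=2300.0` and concatenate the three results along the last axis to obtain a qₙ × qₙ × 9 matrix.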

Figure 2. Design of Qmatrix for the generalization of diffusion parameter mapping for input diffusion gradient schemes and b-values. (a) A signal set was placed in the q-space with q-vectors normalized by the b-value of 1300 s/mm² for DTI or 2300 s/mm² for NODDI. From this q-space, two different input matrix formats, Qmatrix3D and Qmatrix2D, were designed. (b and c) For Qmatrix3D, the q-space was quantized into qₙ bins along each of the three axes, producing a qₙ × qₙ × qₙ matrix as the input matrix. In Qmatrix2D, the design was different for DTI and NODDI. (e) For Qmatrix2D in DTI, the signal set was projected onto the xy- (blue dots), yz- (orange dots), and xz- (green dots) planes. Each set of projected data was then quantized into qₙ × qₙ bins and concatenated, producing a qₙ × qₙ × 3 matrix. (f) For NODDI, the projection was performed on each shell, generating nine projected planes that form a qₙ × qₙ × 9 matrix.

Training dataset generation & training

The training dataset of DIFFnet was generated solely from Monte-Carlo diffusion simulation. The diffusion characteristic of protons was modeled as a tensor, which had three diffusion coefficients (d₁, d₂, and d₃) with the corresponding eigenvectors (e₁, e₂, and e₃) and was determined by either the DTI model or the NODDI model.

The diffusion tensor of the DTI simulation was determined as follows. First, d₁ was randomly chosen between 0 and 3.5×10⁻³ mm² s⁻¹. Subsequently, d₂ and d₃ were chosen between 0 and d₁. The diffusion vector e₁ was determined at random. The b-value (b) and the number of directions (n) were chosen in the ranges of 600 to 1300 s/mm² and 30 to 80, respectively.

In the NODDI simulation, intracellular, extracellular, and cerebrospinal fluid (CSF) compartments were simulated [3]. Protons were allocated to the three compartments following ICVF and ISOVF, which were determined randomly between 0 and 1. The diffusion characteristics were then set as follows. In the intracellular compartment, d₁ was fixed as the parallel diffusion coefficient d‖ (= 1.7×10⁻³ mm² s⁻¹), while the others were set to 0 [19]. For e₁, the Watson distribution was utilized, with the mean orientation μ and ODI (0 to 1) selected randomly [3], [20]. In the extracellular compartment, d₁ was set to the apparent parallel diffusion coefficient d‖′, whereas the others were set to the apparent perpendicular diffusion coefficient d⊥′, where d‖′ and d⊥′ are functions of ODI [3], [21], [22]. It was assumed that e₁ is the same as μ. Lastly, in the CSF compartment, an isotropic diffusion coefficient d_iso (= 3.0×10⁻³ mm² s⁻¹) was used for all three diffusion coefficients [3]. The ranges for b₁, b₂, and b₃ were 200 to 400, 500 to 900, and 1700 to 2300 s/mm², respectively. Those for n₁, n₂, and n₃ were 5 to 10, 25 to 50, and 50 to 100, respectively.

A total of 10 protons were generated to create a diffusion-weighted signal. Each proton was assumed to have a unit magnetization and performed random walks with a Gaussian distribution for a time step of Δt (= 0.2 ms). The diffusion simulation was conducted based on a pulsed gradient spin-echo diffusion sequence [2] with a TE of 72 ms in DTI and 95 ms in NODDI. At each time step, the phase of each spin magnetization induced by the diffusion gradient was accumulated. When the simulation reached TE, the average signal of the protons was calculated as the complex sum of all the magnetizations. Gaussian noise was added to the real and imaginary axes of the signals so that the signal-to-noise ratio (SNR) ranged between 30 and 100. The diffusion simulations were performed using MATLAB (MATLAB 2019a, MathWorks Inc., Natick, MA, USA).

For the labels of DIFFnetDTI, the simulated data were processed using the conventional method [4], [23], generating FA, MD, AD, and RD. In DIFFnetNODDI, ICVF, ISOVF, and ODI from the conventional method were used as the labels [3]. In both simulations, a total of 10 input signal sets and label model parameter pairs were produced for the training.

The training was performed on a GPU workstation (NVIDIA GeForce GTX 1080Ti GPU [NVIDIA Corp., Santa Clara, CA] with Intel Xeon CPU E5-2630 v3 at 2.40GHz [Intel Corp., Santa Clara, CA]) using TensorFlow [24]. The initial weights of the convolutional kernels were set by the Xavier initializer [25]. The batch size was 100. The loss function was defined as the mean squared error, and the Adam optimizer was utilized [26]. The initial learning rate was 10⁻³ with a decaying factor of 0.87 for each epoch. The training was stopped after 50 epochs.

MRI data acquisition & post-processing

For the validation of DIFFnetDTI and DIFFnetNODDI, two types of in-vivo data (Dataset DTI-A and Dataset DTI-B for DTI; Dataset NODDI-A and Dataset NODDI-B for NODDI) were used. Dataset DTI-A and Dataset NODDI-A were from Jung, et al. [27]. Dataset DTI-B and Dataset NODDI-B were obtained for this study to test the effects of a different diffusion gradient scheme and had different gradient directions and b-values from Dataset DTI-A and Dataset NODDI-A. All subjects (10 subjects) were scanned with a 3T MRI system (Tim Trio, SIEMENS, Erlangen, Germany) using a 32-channel phased-array head coil. The study was approved by the institutional review board.

For Dataset DTI-A, single-shell data (b = 700 s/mm² with 32 directions; b = 0 s/mm² with 13 averages) were acquired using a single-shot spin-echo echo-planar-imaging (SE-EPI) sequence. The scan parameters were TR/TE = 4000/95 ms, FOV = 192×192 mm², voxel size = 2×2 mm², slice thickness = 2 mm, multi-band factor = 2, GRAPPA factor = 2, and partial Fourier = 6/8. The dataset had five subjects.

For Dataset NODDI-A, the same data as in Dataset DTI-A were used, with the addition of b = 300 s/mm² data in 8 directions and b = 2000 s/mm² data in 64 directions.

For Dataset DTI-B, five healthy subjects were scanned. Single-shell data (b = 1000 s/mm² with 30 directions; b = 0 s/mm² with 4 averages) were acquired using a single-shot SE-EPI sequence. The scan parameters were TR/TE = 3500/72 ms, FOV = 256×256 mm², voxel size = 2×2 mm², slice thickness = 2 mm, multi-band factor = 2, GRAPPA factor = 3, and partial Fourier = 6/8.

For Dataset NODDI-B, five healthy subjects were scanned. Three-shell data (b = 300 s/mm² with 8 directions; b = 700 s/mm² with 30 directions; b = 2000 s/mm² with 60 directions; b = 0 s/mm² with 13 averages) were acquired using a single-shot SE-EPI sequence. The scan parameters were TR/TE = 3000/105 ms, FOV = 240×240 mm², voxel size = 1.5×1.5 mm², slice thickness = 2 mm, GRAPPA factor = 2, and partial Fourier = 6/8.

To compensate for EPI geometric distortion, a reversed-phase-encoding-direction scan with b = 0 s/mm² was acquired for all datasets. The b-values and the number of gradient vectors of all datasets are summarized in Table 1 (see Supplementary Information Table SI for the directions of the gradient vectors). All datasets had different gradient vector directions.

All datasets were processed as follows. TOPUP and EDDY (FSL, FMRIB, Oxford, UK) [28] were used to correct geometric distortion. A brain tissue mask was generated from the magnitude image with b = 0 s/mm² using BET (FSL, Oxford, UK) [29]. For Dataset DTI-A and Dataset DTI-B, the DTI parameters were reconstructed by least-squares fitting as references [30], [31]. For Dataset NODDI-A and Dataset NODDI-B, the NODDI parameters were reconstructed by conventional NODDI as references [3]. Additionally, the NODDI parameters were reconstructed using accelerated microstructure imaging via convex optimization (AMICO), which is commonly utilized for NODDI reconstruction because of its computational efficiency [32], [33].
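To make the DTI reference reconstruction concrete, the sketch below fits a tensor to single-shell signals by ordinary log-linear least squares and derives FA, MD, AD, and RD from its eigenvalues using their standard definitions. It is an illustrative example under stated assumptions, not the pipeline used in the study: the function name `fit_dti`, the unweighted log-linear formulation, and the synthetic tensor are all choices made here for demonstration.

```python
import numpy as np


def fit_dti(signals, bvecs, bval):
    """Log-linear least-squares tensor fit; returns FA, MD, AD, and RD.

    signals: (N,) diffusion-weighted signals divided by the b = 0 signal
    bvecs:   (N, 3) unit gradient directions
    bval:    single b-value in s/mm^2
    """
    g = np.asarray(bvecs, dtype=float)
    y = -np.log(np.asarray(signals, dtype=float)) / bval

    # Each row encodes g^T D g in terms of the six unique tensor elements.
    design = np.column_stack([
        g[:, 0] ** 2, g[:, 1] ** 2, g[:, 2] ** 2,
        2 * g[:, 0] * g[:, 1], 2 * g[:, 0] * g[:, 2], 2 * g[:, 1] * g[:, 2],
    ])
    dxx, dyy, dzz, dxy, dxz, dyz = np.linalg.lstsq(design, y, rcond=None)[0]
    tensor = np.array([[dxx, dxy, dxz], [dxy, dyy, dyz], [dxz, dyz, dzz]])

    # Eigenvalues sorted from largest to smallest.
    l1, l2, l3 = np.linalg.eigvalsh(tensor)[::-1]
    md = (l1 + l2 + l3) / 3.0
    fa = np.sqrt(1.5 * ((l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2)
                 / (l1 ** 2 + l2 ** 2 + l3 ** 2))
    return {"FA": fa, "MD": md, "AD": l1, "RD": (l2 + l3) / 2.0}


# Noise-free synthetic check with a known tensor (illustrative values, mm^2/s).
true_tensor = np.diag([1.7e-3, 0.3e-3, 0.3e-3])
dirs = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                 [1, 1, 0], [1, 0, 1], [0, 1, 1],
                 [1, -1, 0], [1, 0, -1], [0, 1, -1]], dtype=float)
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
sig = np.exp(-1000.0 * np.einsum("ni,ij,nj->n", dirs, true_tensor, dirs))
params = fit_dti(sig, dirs, 1000.0)
print(params)
```

With noise-free input the fit recovers the axial and radial diffusivities of the synthetic tensor; with real data, a mask and a guard against non-positive signals before taking the logarithm would also be needed.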

Table 1. List of the b-values and the numbers of diffusion directions in test datasets. The directions of the gradient vectors are reported in Supplementary Information Table S9.

Model    Dataset            b-value (s/mm²)     Number of directions
DTI      Dataset DTI-A      700                 32
DTI      Dataset DTI-B      1000                30
NODDI    Dataset NODDI-A    300 / 700 / 2000    8 / 32 / 64
NODDI    Dataset NODDI-B    300 / 700 / 2000    8 / 30 / 60

Evaluation

To determine the quantization size of Qmatrix (qₙ = 5, 10, 15, 20, and 25 for both Qmatrix3D and Qmatrix2D), normalized root-mean-square errors (NRMSEs) were calculated in the brain mask with respect to the reference maps. The data processing time of each quantization size was also measured. After deciding the optimum quantization size, the performance of DIFFnet was compared with that of a previously proposed neural network [14], which utilized a multi-layer perceptron (MLP). Two MLPs were designed: MLPDTI, having five fully connected layers with 32, 256, 256, 256, and 4 neurons for DTI, and MLPNODDI, having five fully connected layers with 104, 400, 400, 400, and 3 neurons for NODDI. For the input, 32 diffusion signals were used for MLPDTI, whereas 104 signals were used for MLPNODDI. If a test dataset had fewer than 32 or 104 signals, the rest of the input was zero-padded. For the training of the MLPs, datasets were generated using our Monte-Carlo diffusion simulation with the same gradient scheme as Dataset DTI-A or Dataset NODDI-A. All the other training parameters and procedures were the same as those in DIFFnet.

All the processing methods (conventional fitting, DIFFnet, and MLP, plus AMICO in NODDI) were evaluated on the four datasets. The data processing time for each method was measured. NRMSEs were estimated in the brain mask with respect to the reference maps. A Wilcoxon rank-sum test was performed on the NRMSEs between DIFFnet and MLP or between DIFFnet and AMICO for each of the model parameters. For statistical significance, a p-value threshold was set to 0.05.

To further demonstrate the generalization capability of DIFFnet for various numbers of gradient directions, DIFFnet was evaluated using five different numbers of gradient directions in all four datasets (see Supplementary Information S1 for the numbers of gradient directions and the criteria for selecting gradient directions). The performance of DIFFnet was compared with that of the conventional methods. Additionally, DIFFnetNODDI was evaluated with a two-shell NODDI protocol, which is also commonly used in practice, using six different numbers of gradient directions of Dataset NODDI-A and Dataset NODDI-B.
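The two evaluation measures can be sketched in a few lines. This is a hedged illustration only: the text does not state how the RMSE was normalized, so dividing by the root-mean-square of the reference is an assumption, and the function names and the per-subject values below are hypothetical. The rank-sum p-value is computed exactly by enumerating rank assignments, which is practical for small groups such as five subjects per dataset.

```python
import math
from itertools import combinations


def nrmse_percent(estimate, reference):
    # ASSUMPTION: the RMSE is normalized by the RMS of the reference values.
    squared_error = sum((e - r) ** 2 for e, r in zip(estimate, reference))
    squared_reference = sum(r ** 2 for r in reference)
    return 100.0 * math.sqrt(squared_error / squared_reference)


def exact_rank_sum_p(x, y):
    """Two-sided exact rank-sum p-value; assumes no tied values."""
    pooled = sorted(list(x) + list(y))
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    n, total_n = len(x), len(pooled)
    observed = sum(rank[value] for value in x)
    expected = n * (total_n + 1) / 2.0
    deviation = abs(observed - expected)

    # Count rank assignments at least as far from the expected sum as observed.
    extreme = 0
    count = 0
    for ranks in combinations(range(1, total_n + 1), n):
        count += 1
        if abs(sum(ranks) - expected) >= deviation:
            extreme += 1
    return extreme / count


# Hypothetical per-subject NRMSEs (%) for two methods, five subjects each.
method_a = [3.6, 3.7, 3.8, 3.9, 4.0]
method_b = [26.0, 27.0, 27.5, 28.0, 29.0]
p_value = exact_rank_sum_p(method_a, method_b)
print(round(p_value, 3))  # 0.008
```

With five values per group, complete separation of the two groups gives the smallest attainable two-sided exact p-value, 2/252 ≈ 0.008, so no comparison of this size can report a smaller value under an exact test.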

RESULTS

When we investigated the effects of the quantization in the q-space using a range of qₙ, the results revealed the minimum mean NRMSEs at qₙ of 20 in Qmatrix2D and 15 in Qmatrix3D (Table 2). The two formats show similar mean NRMSEs (no statistical difference), but the processing of Qmatrix2D is faster than that of Qmatrix3D (13.3 times in DIFFnetDTI and 13.0 times in DIFFnetNODDI). Hence, Qmatrix2D with qₙ of 20 is chosen as the default format for DIFFnet hereafter.

Table 2. Mean NRMSE and processing time measured using a range of qₙ, which is the number of quantization bins, in Qmatrix3D and Qmatrix2D. Qmatrix2D with qₙ of 20 is chosen as the default format for DIFFnet. Columns: qₙ; NRMSE (%) for DIFFnetDTI and DIFFnetNODDI; processing time (seconds) for DIFFnetDTI and DIFFnetNODDI.

In Fig. 3, the DTI maps of the two datasets with different gradient schemes are reconstructed by the conventional fitting, DIFFnetDTI, and MLP trained with the gradient scheme of Dataset DTI-A. DIFFnetDTI generates highly accurate parameter maps with respect to those from the conventional fitting in both datasets (NRMSE of FA: 3.73% in Dataset DTI-A and 3.89% in Dataset DTI-B). The error maps also confirm little difference between the two maps (Fig. 3). The mean processing time of DIFFnet is shorter than that of the conventional fitting (26.7 ± 1.6 s in DIFFnetDTI; 46.1 s in the conventional fitting). For Dataset DTI-A, which has the same gradient scheme as in the MLP training, MLP generates highly accurate parameter maps, showing similar NRMSEs to DIFFnet (NRMSE of FA: 3.45%). For Dataset DTI-B, which has a different gradient scheme, however, MLP reports significantly larger NRMSEs than DIFFnet (NRMSE of FA: 27.52%; Wilcoxon rank-sum test results: p = 0.008 for FA, p = 0.008 for MD, p = 0.008 for AD, and p = 0.008 for RD). The mean processing time of MLP is 10.3 s.

Figure 3. DTI maps of the two datasets with different gradient schemes. The b-value and number of the gradient vectors for each test dataset are displayed at the top. (a) The DTI maps of Dataset DTI-A (first to fourth columns) reconstructed by least-squares fitting (first row), DIFFnetDTI (second row), and MLP (third row) are shown. The error maps of DIFFnetDTI (fourth row) and MLP (last row) are also included (display range is reduced by a factor of 4 for visualization). (b) The DTI maps of Dataset DTI-B (fifth to last columns) reconstructed by least-squares fitting (first row), DIFFnetDTI (second row), and MLP (third row) are displayed along with the error maps of DIFFnetDTI (fourth row) and MLP (last row). The NRMSE is shown at the top of each error map (* denotes a statistically significant difference between the NRMSEs of DIFFnetDTI and MLP).

In the reconstruction of the NODDI maps, DIFFnetNODDI successfully generates the parameter maps of the two datasets with different gradient schemes (Fig. 4; NRMSE of ICVF: 3.95% in Dataset NODDI-A and 3.59% in Dataset NODDI-B). Compared to the results of the AMICO reconstruction (NRMSE of ICVF: 6.67% in Dataset NODDI-A and 5.81% in Dataset NODDI-B), those of DIFFnetNODDI show lower NRMSEs (Wilcoxon rank-sum test results: p = 0.008 for ICVF, p = 0.008 for ISOVF, and p = 0.095 for ODI in Dataset NODDI-A; p = 0.008 for ICVF, p = 0.008 for ISOVF, and p = 0.150 for ODI in Dataset NODDI-B), with the ODI differences not reaching the 0.05 significance threshold. This tendency can be confirmed in the error maps, which show higher errors in the results of AMICO than in those of DIFFnetNODDI, particularly in ICVF and ISOVF (Fig. 4). Another advantage of DIFFnetNODDI is reconstruction time. Compared to AMICO and NODDI, DIFFnetNODDI is approximately 8.7 times and 2240 times faster, respectively (27.8 ± 1.4 s in DIFFnetNODDI; 242.5 ± 11.8 s in AMICO; 17.3 ± 0.8 h in NODDI).

When the performance of DIFFnetNODDI is compared with that of MLP, similar trends to those in DTI are observed. For Dataset NODDI-A, which has the same gradient scheme as in the MLP training, the MLP results show similar NRMSEs to those of DIFFnetNODDI (NRMSE of ICVF: 3.88%). For Dataset NODDI-B, which has a different gradient scheme, however, MLP reports significantly larger NRMSEs (NRMSE of ICVF: 12.91%; Wilcoxon rank-sum test results: p = 0.008 for ICVF, p = 0.008 for ISOVF, and p = 0.008 for ODI). The mean processing time of MLP is 13.1 s.

Figure 4. NODDI maps of the two datasets with different gradient schemes. The b-values and number of gradient vectors for each test dataset are displayed at the top. (a) The NODDI maps of Dataset NODDI-A (first to third columns) reconstructed by NODDI (first row), DIFFnetNODDI (second row), AMICO (third row), and MLP (fourth row) are shown. The error maps of DIFFnetNODDI (fifth row), AMICO (sixth row), and MLP (last row) are also included (display range is reduced by a factor of 4). (b) The NODDI maps of Dataset NODDI-B (fourth to last columns) reconstructed by NODDI (first row), DIFFnetNODDI (second row), AMICO (third row), and MLP (fourth row) are displayed along with the error maps of DIFFnetNODDI (fifth row), AMICO (sixth row), and MLP (last row). The NRMSE is shown at the top of each error map (* denotes a statistically significant difference in NRMSEs).

When DIFFnets are further tested using smaller numbers of gradient directions for all four datasets, they successfully reconstruct the parameter maps of DTI and NODDI (Supplementary Information Fig. S2, S3, S4, and S5). Additionally, DIFFnetNODDI successfully reconstructs NODDI results from the two-shell NODDI dataset (Supplementary Information Fig. S6 and S7), supporting the generalization capability of DIFFnet.
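The NODDI speed-up factors quoted in this section follow directly from the reported mean processing times. As a quick arithmetic check (variable names are chosen here for illustration):

```python
diffnet_noddi_s = 27.8        # DIFFnet for NODDI, mean time in seconds
amico_s = 242.5               # AMICO, mean time in seconds
noddi_s = 17.3 * 3600.0       # conventional NODDI, 17.3 hours converted to seconds

speedup_vs_amico = amico_s / diffnet_noddi_s
speedup_vs_noddi = noddi_s / diffnet_noddi_s

print(f"{speedup_vs_amico:.1f}x vs AMICO")  # 8.7x vs AMICO
print(f"{speedup_vs_noddi:.0f}x vs NODDI")  # 2240x vs NODDI
```

Both ratios compare CPU-based conventional processing with GPU-based network inference, so they describe the measured setups rather than an equal-hardware comparison.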

DISCUSSION AND CONCLUSION

In this study, we developed a deep neural network, DIFFnet, to reconstruct the diffusion model parameters from diffusion-weighted signals. Unlike previously proposed deep neural networks [13]-[15], DIFFnet was targeted to generate the parameter maps from various gradient schemes and b-values. For the generalization of the input signals, Qmatrix was introduced via q-space projection and quantization. The performance of DIFFnet was evaluated in two diffusion models, DTI and NODDI, with two datasets in each model. DIFFnet reconstructed both datasets of each model successfully, whereas MLP did so only for the gradient scheme it was trained on. In the NODDI reconstruction results, DIFFnet outperformed AMICO, reporting higher accuracy. The processing time of DIFFnet was less than 30 s, suggesting that it can be used for online reconstruction.

In our diffusion simulation, the b-values and numbers of gradient directions were chosen to include commonly used scan protocols (DTI: b = 600 to 1000 s/mm², 30 to 32 gradient directions; NODDI: b = 300 to 2000 s/mm², 90 to 104 gradient directions) [27], [35]-[37]. Similarly, the normalization factors for Qmatrix (b = 1300 s/mm² in DTI and b = 2300 s/mm² in NODDI) were large enough to cover the b-values in the commonly used protocols. Nevertheless, for data with a higher b-value, additional simulation data with the corresponding b-value can be added to the training data.

In the Qmatrix of NODDI, the projection was performed for each shell, generating a qₙ × qₙ × 9 matrix. When this design was compared with the Qmatrix projected for all shells together (i.e., a qₙ × qₙ × 3 matrix), the per-shell design showed higher accuracy (NRMSE of ICVF: 3.95% for the qₙ × qₙ × 9 matrix vs. 4.23% for the qₙ × qₙ × 3 matrix; all the other parameters presented similar trends). This inferior performance of the qₙ × qₙ × 3 matrix may be explained by the different intensity ranges of the signals with different b-values: low-intensity signals from the high b-value shells may be inaccurately processed.

When comparing qₙ, the largest qₙ (= 25) provided the highest resolution in the q-space. However, the results showed the lowest NRMSEs at qₙ of 20 in Qmatrix2D and 15 in Qmatrix3D. This result may be explained by the size of the convolutional kernel in DIFFnet, which used kernels of width 7, in relation to qₙ larger than 20 (or 15).

In our DTI simulation, only d₁ was assumed to be the largest among the three diffusion coefficients. This did not lose generality despite the common assumption of d₁ > d₂ > d₃.

The computational times for conventional DTI and NODDI, including AMICO, were measured using CPU processing, whereas that of DIFFnet was measured using GPU processing. Hence, the processing times were not compared on equal hardware. The use of GPU for NODDI reconstruction, however, has not been demonstrated.

In DIFFnet, two diffusion models, DTI and NODDI, were chosen as exemplary diffusion models. Since our approach of using Qmatrix is general for any diffusion imaging, it can be applied to other diffusion models.

Acknowledgments

This research was supported by the National Research Foundation of Korea (NRF-2018R1A4A1025891 and NRF-2017M3C7A1047864), the Institute of New Media and Communications, and the Institute of Engineering Research at Seoul National University.

References

[1] D. Le Bihan, E. Breton, D. Lallemand, P. Grenier, E. Cabanis, and M. Laval-Jeantet, "MR imaging of intravoxel incoherent motions: application to diffusion and perfusion in neurologic disorders," Radiology, vol. 161, no. 2, pp. 401-407, 1986.

[2] E. O. Stejskal and J. E. Tanner, "Spin diffusion measurements: spin echoes in the presence of a time-dependent field gradient," The Journal of Chemical Physics, vol. 42, no. 1, pp. 288-292, 1965.

[3] H. Zhang, T. Schneider, C. A. Wheeler-Kingshott, and D. C. Alexander, "NODDI: Practical in-vivo neurite orientation dispersion and density imaging of the human brain," NeuroImage, vol. 61, pp. 1000-16, Jul. 2012.

[4] P. J. Basser, J. Mattiello, and D. LeBihan, "MR diffusion tensor spectroscopy and imaging," Biophys J, vol. 66, no. 1, pp. 259-67, Jan. 1994.

[5] D. S. Tuch, T. G. Reese, M. R. Wiegell, N. Makris, J. W. Belliveau, and V. J. Wedeen, "High angular resolution diffusion imaging reveals intravoxel white matter fiber heterogeneity," Magn Reson Med, vol. 48, no. 4, pp. 577-82, Oct. 2002.

[6] A. Daducci, E. J. Canales-Rodríguez, H. Zhang, T. B. Dyrby, D. C. Alexander, and J.-P. Thiran, "Accelerated Microstructure Imaging via Convex Optimization (AMICO) from diffusion MRI data," NeuroImage, vol. 105, pp. 32-44, Jan. 2015.

[7] M. Hernandez-Fernandez, I. Reguly, S. Jbabdi, M. Giles, S. Smith, and S. N. Sotiropoulos, "Using GPUs to accelerate computational diffusion MRI: From microstructure estimation to tractography and connectomes," NeuroImage, vol. 188, pp. 598-615, Mar. 2019.

[8] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural Networks, vol. 61, pp. 85-117, Jan. 2015.

[9] J. Yoon et al., "Quantitative susceptibility mapping using deep neural network: QSMnet," NeuroImage, vol. 179, pp. 199-206, Oct. 2018.

[10] J. Lee, D. Lee, J. Y. Choi, D. Shin, H. G. Shin, and J. Lee, "Artificial neural network for myelin water imaging," Magn. Reson. Med., vol. 83, no. 5, pp. 1875–1883, Oct. 2019.

[11] W. Jung, S. Bollmann, and J. Lee, "Overview of quantitative susceptibility mapping using deep learning: Current status, challenges and opportunities," NMR in Biomed., p. e4292, Mar. 2020.

[12] Golkov et al., "q-Space Deep Learning: Twelve-Fold Shorter and Model-Free Diffusion MRI Scans," IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1344-1351, May 2016.

[13] Y. Masutani, "Noise Level Matching Improves Robustness of Diffusion MRI Parameter Inference by Synthetic Q-Space Learning," in Proc. IEEE 16th Int. Symp. Biomed. Imag. (ISBI), Apr. 2019, pp. 139-142.

[14] C. Ye, Y. Cui, and X. Li, "Q-space learning with synthesized training data," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. (MICCAI), Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.

[16] P. T. Callaghan, Principles of nuclear magnetic resonance microscopy. Oxford, U.K.: Clarendon, 1991.

[17] D. S. Tuch, "Q-ball imaging," Magn. Reson. Med., vol. 52, no. 6, pp. 1358-1372, Nov. 2004.

[18] Y. Assaf and Y. Cohen, "Assignment of the water slow-diffusing component in the central nervous system using q-space diffusion MRS: implications for fiber tract imaging," Magn. Reson. Med., vol. 43, no. 2, pp. 191-199, Mar. 2000.

[19] H. Zhang, P. L. Hubbard, G. J. Parker, and D. C. Alexander, "Axon diameter mapping in the presence of orientation dispersion with diffusion MRI," NeuroImage, vol. 56, no. 3, pp. 1301-1315, Jun. 2011.

[20] M. Abramowitz and I. A. Stegun (Eds.), "Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 9th printing," New York: Dover, pp. 880, 1972.

[21] A. Szafer, J. Zhong, and J. C. Gore, "Theoretical model for water diffusion in tissues," Magn. Reson. Med., vol. 33, no. 5, pp. 697-712, May 1995.

[22] P. J. Basser and D. K. Jones, "Diffusion-tensor MRI: theory, experimental design and data analysis–A technical review," NMR in Biomed., vol. 15, no. 7-8, pp. 456-467, Dec. 2002.

[23] M. Abadi et al., "Tensorflow: A system for large-scale machine learning," in Proc. 12th USENIX Symp. OSDI, Atlanta, GA, USA, Nov. 2016, pp. 265–283.

[24] X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proc. Fourteenth Int. Conf. Artif. Intell. Statist., 2011, pp. 315-323.

[25] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proc. ICLR, et al., "Whole brain g-ratio mapping using myelin water imaging (MWI) and neurite orientation dispersion and density imaging (NODDI)," NeuroImage, vol. 182, pp. 379-388, Nov. 2018.

[27] J. L. Andersson, S. Skare, and J. Ashburner, "How to correct susceptibility distortions in spin-echo echo-planar images: application to diffusion tensor imaging," NeuroImage, vol. 20, no. 2, pp. 870-888, Oct. 2003.

[28] S. M. Smith, "Fast robust automated brain extraction," Human Brain Mapping, vol. 17, no. 3, pp. 143-155, Sep. 2002.

[29] P. J. Basser and C. Pierpaoli, "Microstructural and physiological features of tissues elucidated by quantitative-diffusion-tensor MRI," J. Magn. Reson. B, vol. 111, no. 3, pp. 209–219, June 1996.

[30] P. J. Basser, J. Mattiello, and D. LeBihan, "Estimation of the effective self-diffusion tensor from the NMR spin echo," J. Magn. Reson. Imag., Ser. B, vol. 103, no. 3, pp. 247–254, Mar. 1994.

[31] K. L. Miller et al., "Multimodal population brain imaging in the UK Biobank prospective epidemiological study," Nature Neurosci., vol. 19, no. 11, pp. 1523-1536, Sep. 2016.

[32] B. Lampinen, F. Szczepankiewicz, J. Mårtensson, D. van Westen, P. C. Sundgren, and M. Nilsson, "Neurite density imaging versus imaging of microscopic anisotropy in diffusion MRI: A model comparison using spherical tensor encoding,"

Neuroimage, vol. 147, pp. 517-531, Feb. 2017. [33]M. Akçakaya, S. Moeller, S. Weingärtner, and K. Uğurbil, "Scan-specific robust artificial-neural-networks for k-space interpolation (RAKI) reconstruction: Database-free deep learning for fast imaging,"

Magn. Reson. Med., vol. 81, no. 1, pp. 439-453, Sep. 2019. [34]P. Mukherjee, S. Chung, J. Berman, C. Hess, and R. Henry, "Diffusion tensor MR imaging and fiber tractography: technical considerations, " AJNR Amer. J. Neuroradiol., vol. 29, no. 5, pp. 843-852, May. 2008. [35]C. F. Slattery et al. , "ApoE influences regional white-matter axonal density loss in Alzheimer's disease,"

Neurobiol. Aging, vol. 57, pp. 8-17, Sep. 2017. [36]S. Groeschel et al. , "Assessing white matter microstructure in brain regions with different myelin architecture using MRI,"

PLoS One, vol. 11, no. 11, p. e0167274, Nov. 2016. 22 -

Supplementary Information

Supplementary information 1. To investigate the effects of the number of gradient directions, DIFFnet and the conventional methods were evaluated using five different numbers of gradient directions in all four datasets. For DIFFnetDTI, 12, 16, 20, 24, and 28 gradient directions with b = 700 s/mm² were tested in Dataset DTI-A, and 10, 14, 18, 22, and 26 gradient directions with b = 1000 s/mm² were tested in Dataset DTI-B. For DIFFnetNODDI, 39, 52, 65, 78, and 91 gradient directions (3, 4, 5, 6, and 7 for b = 300 s/mm²; 12, 16, 20, 24, and 28 for b = 700 s/mm²; 24, 32, 40, 48, and 56 for b = 2000 s/mm²) were used in Dataset NODDI-A, and 33, 46, 59, 72, and 85 gradient directions (3, 4, 5, 6, and 7 for b = 300 s/mm²; 10, 14, 18, 22, and 26 for b = 700 s/mm²; 20, 28, 36, 44, and 52 for b = 2000 s/mm²) were used in Dataset NODDI-B. When selecting gradient directions, the combination with the lowest condition number was chosen [4]. NRMSEs were estimated against the reference maps, which were reconstructed using the full set of gradient directions. Additionally, DIFFnetNODDI was evaluated using a two-shell protocol. In Dataset NODDI-A, 36, 48, 60, 72, 84, and 96 directions (12, 16, 20, 24, 28, and 32 for b = 700 s/mm²; 24, 32, 40, 48, 56, and 64 for b = 2000 s/mm²) were tested. In Dataset NODDI-B, 30, 42, 54, 66, 78, and 90 directions (10, 14, 18, 22, 26, and 30 for b = 700 s/mm²; 20, 28, 36, 44, 52, and 60 for b = 2000 s/mm²) were tested.
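The subset selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the condition number is computed from the standard six-column DTI design matrix built from the unit gradient vectors, and it uses an exhaustive search that is only practical for small sets (the search strategy actually used is not stated here). The function names are hypothetical; the eight directions are the b = 300 s/mm² entries listed for Dataset DTI-A/NODDI-A in Table SI.

```python
from itertools import combinations

import numpy as np


def dti_design_matrix(dirs):
    """Rows [gx^2, gy^2, gz^2, 2gxgy, 2gxgz, 2gygz] for each unit gradient vector."""
    g = np.asarray(dirs, dtype=float)
    gx, gy, gz = g[:, 0], g[:, 1], g[:, 2]
    return np.column_stack([gx**2, gy**2, gz**2, 2 * gx * gy, 2 * gx * gz, 2 * gy * gz])


def lowest_condition_subset(dirs, k):
    """Exhaustively search all k-element subsets; return (indices, condition number)."""
    best_idx, best_cond = None, np.inf
    for idx in combinations(range(len(dirs)), k):
        cond = np.linalg.cond(dti_design_matrix([dirs[i] for i in idx]))
        if cond < best_cond:
            best_idx, best_cond = idx, cond
    return best_idx, best_cond


# Eight b = 300 s/mm^2 directions (Dataset DTI-A / NODDI-A, Table SI).
directions = [
    (0.392, -0.033, -0.919),
    (-0.391, -0.421, -0.819),
    (-0.592, -0.804, 0.067),
    (0.028, -0.73, 0.683),
    (0.756, -0.646, -0.105),
    (0.954, 0.143, -0.263),
    (0.101, -0.922, -0.374),
    (-0.754, 0.244, -0.61),
]

# Assumption for illustration: keep 6 directions, the minimum for a tensor fit.
subset, cond = lowest_condition_subset(directions, k=6)
print("selected indices:", subset, "condition number:", round(float(cond), 2))
```

For the subset sizes used in the evaluation (for example, 12 of 32 directions), the number of combinations exceeds 10^8, so an exhaustive search like this one would likely be replaced by a randomized or greedy search.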


Fig. S1. Detailed structure of DIFFnet. A modified version of the residual neural network was utilized. The network consisted of five stages, followed by one average pooling layer and two fully connected layers. Each stage contained 13 convolutional layers and 4 skip connections [1]. Each convolutional layer was followed by batch normalization [2] and leaky ReLU (alpha = 0.1) [3]. The first fully connected layer had 20 nodes, and the second fully connected layer had three nodes for DTI and four nodes for NODDI.

Fig. S2. (a) DTI parameter maps of Dataset DTI-A reconstructed by least-square fitting and DIFFnetDTI using the five different numbers of diffusion gradients. (b) NRMSEs of the DIFFnetDTI and least-square-fitting results. The reference for NRMSE was the least-square-fitting result using the 32 diffusion gradient directions.
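The NRMSE comparison against a reference map can be sketched as below. The exact normalization is not specified in this section, so the sketch assumes one common convention: the L2 norm of the error divided by the L2 norm of the reference, reported as a percentage inside a brain mask. The function name and the toy arrays are hypothetical.

```python
import numpy as np


def nrmse_percent(estimate, reference, mask=None):
    """NRMSE in percent, assuming normalization by the L2 norm of the reference."""
    est = np.asarray(estimate, dtype=float)
    ref = np.asarray(reference, dtype=float)
    if mask is not None:
        m = np.asarray(mask, dtype=bool)
        est, ref = est[m], ref[m]
    return 100.0 * np.linalg.norm(est.ravel() - ref.ravel()) / np.linalg.norm(ref.ravel())


# Toy 2x2 "FA map": the estimate overshoots the reference by 10% everywhere.
reference_fa = np.array([[0.2, 0.4], [0.6, 0.8]])
estimate_fa = 1.1 * reference_fa
brain_mask = np.array([[True, True], [True, False]])

error = nrmse_percent(estimate_fa, reference_fa, brain_mask)
print(f"NRMSE = {error:.1f}%")
```

With this convention a uniform 10% overestimate gives an NRMSE of about 10% regardless of the mask, which makes the metric easy to sanity-check before applying it to full parameter maps.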


Fig. S3. (a) DTI parameter maps of Dataset DTI-B reconstructed by least-square fitting and DIFFnetDTI using the five different numbers of diffusion gradients. (b) NRMSEs of the DIFFnetDTI and least-square-fitting results. The reference for NRMSE was the least-square-fitting result using the 30 diffusion gradient directions.

Fig. S4. (a) NODDI parameter maps of Dataset NODDI-A reconstructed by NODDI fitting and DIFFnetNODDI using the five different numbers of diffusion gradients. (b) NRMSEs of the DIFFnetNODDI and NODDI fitting results. The reference for NRMSE was the NODDI fitting result using the 104 diffusion gradient directions.


Fig. S5. (a) NODDI parameter maps of Dataset NODDI-B reconstructed by NODDI fitting and DIFFnetNODDI using the five different numbers of diffusion gradients. (b) NRMSEs of the DIFFnetNODDI and NODDI fitting results. The reference for NRMSE was the NODDI fitting result using the 98 gradient directions.


Fig. S6. (a) NODDI parameter maps of Dataset NODDI-A reconstructed by NODDI fitting and DIFFnetNODDI using the two-shell protocol (b = 700 and 2000 s/mm²) with the six different numbers of diffusion gradients. (b) NRMSEs of the DIFFnetNODDI and NODDI fitting results. The reference for NRMSE was the NODDI fitting result using the 96 gradient directions (b = 700 s/mm² with 32 directions; b = 2000 s/mm² with 64 directions).


Fig. S7. (a) NODDI parameter maps of Dataset NODDI-B reconstructed by NODDI fitting and DIFFnetNODDI using the two-shell protocol (b = 700 and 2000 s/mm²) with the six different numbers of diffusion gradients. (b) NRMSEs of the DIFFnetNODDI and NODDI fitting results. The reference for NRMSE was the NODDI fitting result using the 90 gradient directions (b = 700 s/mm² with 30 directions; b = 2000 s/mm² with 60 directions).


TABLE SI
GRADIENT DIRECTIONS OF THE TEST DATASETS

Each row lists b-value (s/mm²), x, y, z for the three column groups, separated by "|".

Dataset DTI-A & Dataset NODDI-A | Dataset DTI-B | Dataset NODDI-B
b-value x y z | b-value x y z | b-value x y z
300 0.392 -0.033 -0.919 | 1000 0.802 -0.064 -0.593 | 300 -0.594 -0.804 0.025
300 -0.391 -0.421 -0.819 | 1000 -0.468 -0.55 -0.692 | 300 -0.014 -0.734 0.68
300 -0.592 -0.804 0.067 | 1000 0.601 -0.493 -0.629 | 300 0.761 -0.646 -0.062
300 0.028 -0.73 0.683 | 1000 -0.801 0.223 -0.556 | 300 0.967 0.144 -0.21
300 0.756 -0.646 -0.105 | 1000 0.38 0.404 -0.832 | 300 0.448 -0.033 -0.893
300 0.954 0.143 -0.263 | 1000 -0.103 0.449 -0.888 | 300 0.125 -0.919 -0.373
300 0.101 -0.922 -0.374 | 1000 0.749 0.363 -0.555 | 300 -0.337 -0.419 -0.843
300 -0.754 0.244 -0.61 | 1000 0.479 -0.041 -0.877 | 300 -0.709 0.244 -0.661
700 0.803 -0.064 -0.593 | 1000 0.232 -0.429 -0.873 | 700 0.542 0.726 -0.423
700 -0.468 -0.551 -0.691 | 1000 -0.207 -0.304 -0.93 | 700 0.133 0.965 -0.224
700 0.601 -0.494 -0.628 | 1000 0.119 0.751 -0.649 | 700 -0.423 -0.549 -0.721
700 -0.802 0.223 -0.554 | 1000 -0.437 0.091 -0.895 | 700 0.673 0.197 0.713
700 0.381 0.405 -0.832 | 1000 -0.499 0.515 -0.697 | 700 0.638 -0.493 -0.592
700 -0.103 0.449 -0.887 | 1000 0.058 0.049 -0.997 | 700 -0.763 0.223 -0.607
700 0.749 0.364 -0.554 | 1000 0.519 0.854 -0.029 | 700 -0.013 0.734 0.679
700 0.48 -0.041 -0.877 | 1000 0.515 0.727 -0.453 | 700 0.431 0.403 -0.807
700 0.232 -0.429 -0.873 | 1000 0.118 0.966 -0.229 | 700 -0.932 -0.161 -0.324
700 -0.207 -0.305 -0.93 | 1000 -0.952 -0.161 -0.261 | 700 -0.047 0.448 -0.893
700 0.119 0.752 -0.649 | 1000 -0.761 -0.567 -0.314 | 700 -0.739 -0.567 -0.364
700 -0.437 0.092 -0.895 | 1000 -0.698 0.654 -0.292 | 700 -0.677 0.654 -0.338
700 -0.499 0.515 -0.697 | 1000 -0.945 0.294 -0.142 | 700 -0.934 0.293 -0.204
700 0.058 0.049 -0.997 | 1000 -0.354 0.934 -0.049 | 700 -0.349 0.934 -0.074
700 0.519 0.854 -0.027 | 1000 -0.681 0.715 0.161 | 700 -0.689 0.715 0.115
700 0.516 0.727 -0.453 | 1000 0.891 -0.361 -0.277 | 700 0.781 0.363 -0.508
700 0.118 0.966 -0.228 | 1000 -0.379 -0.846 -0.375 | 700 0.906 -0.36 -0.223
700 -0.952 -0.161 -0.26 | 1000 -0.282 0.833 -0.476 | 700 -0.354 -0.845 -0.4
700 -0.761 -0.567 -0.314 | 1000 0.842 0.525 -0.122 | 700 -0.251 0.832 -0.495
700 -0.698 0.654 -0.291 | 1000 0.979 0.099 -0.18 | 700 0.848 0.525 -0.072
700 -0.946 0.293 -0.141 | 1000 0.004 0.977 0.214 | 700 0.532 -0.041 -0.846
700 -0.353 0.934 -0.048 | 1000 -0.348 0.817 0.46 | 700 0.286 -0.428 -0.858
700 -0.68 0.715 0.162 | 1000 0.715 0.197 0.671 | 700 -0.148 -0.304 -0.941
700 0.891 -0.36 -0.276 | 1000 0.03 0.732 0.681 | 700 0.988 0.099 -0.122
700 -0.379 -0.846 -0.374 | | 700 0.159 0.75 -0.642
700 -0.282 0.834 -0.474 | | 700 -0.009 0.978 0.21
700 0.842 0.525 -0.121 | | 700 -0.376 0.818 0.435
700 0.979 0.099 -0.179 | | 700 -0.378 0.091 -0.921
700 0.004 0.977 0.215 | | 700 -0.454 0.513 -0.729
700 -0.348 0.816 0.461 | | 700 0.12 0.05 -0.992
700 0.714 0.197 0.671 | | 2000 -1 0.027 0
700 0.029 0.731 0.682 | | 2000 0.047 0.014 -0.999
2000 -0.016 0.015 -1 | | 2000 0.813 0.38 0.441
2000 0.308 -0.091 -0.947 | | 2000 0.233 -0.891 -0.39
2000 -0.329 0.035 -0.944 | | 2000 0.589 -0.252 -0.768
2000 -0.056 0.318 -0.947 | | 2000 -0.112 -0.936 -0.334
2000 0.065 -0.296 -0.953 | | 2000 0.818 -0.549 0.172
2000 -0.24 -0.286 -0.928 | | 2000 0.366 -0.091 -0.926
2000 0.277 0.218 -0.936 | | 2000 0.27 0.505 -0.82
2000 0.798 0.137 -0.587 | | 2000 0.735 -0.666 -0.124
2000 0.27 -0.713 -0.647 | | 2000 -0.284 -0.557 -0.78
2000 -0.646 -0.682 -0.343 | | 2000 -0.268 0.034 -0.963
2000 0.208 -0.892 -0.402 | | 2000 0.317 0.864 -0.392
2000 0.54 -0.253 -0.803 | | 2000 -0.55 0.742 0.383
2000 -0.133 -0.937 -0.324 | | 2000 -0.072 0.59 -0.804
2000 0.218 0.506 -0.835 | | 2000 0.929 0.086 0.359
2000 -0.333 -0.558 -0.76 | | 2000 0.004 0.317 -0.949
2000 0.292 0.865 -0.409 | | 2000 -0.107 0.825 -0.556
2000 -0.122 0.591 -0.798 | | 2000 0.592 -0.79 0.158
2000 -0.142 0.826 -0.546 | | 2000 0.765 0.444 -0.467
2000 0.735 0.445 -0.512 | | 2000 0.661 0.75 0.035
2000 -0.613 -0.503 -0.61 | | 2000 -0.572 -0.502 -0.648
2000 -0.832 0.382 -0.403 | | 2000 -0.804 0.381 -0.457
2000 -0.371 0.345 -0.862 | | 2000 -0.316 0.344 -0.884
2000 -0.763 -0.186 -0.618 | | 2000 -0.722 -0.186 -0.666
2000 0.927 -0.117 -0.356 | | 2000 0.125 -0.295 -0.947
2000 -0.063 -0.797 -0.601 | | 2000 0.947 -0.117 -0.3
2000 0.769 -0.194 -0.608 | | 2000 -0.026 -0.795 -0.606
2000 -0.523 -0.249 -0.815 | | 2000 0.805 -0.194 -0.561
2000 -0.379 -0.767 -0.517 | | 2000 0.454 -0.883 -0.115
2000 -0.638 0.377 -0.671 | | 2000 -0.846 -0.51 0.154
2000 0.306 -0.465 -0.831 | | 2000 -0.41 -0.883 -0.229
2000 0.534 0.371 -0.76 | | 2000 -0.47 -0.249 -0.847
2000 -0.408 0.823 -0.395 | | 2000 -0.345 -0.766 -0.542
2000 -0.837 0.105 -0.538 | | 2000 -0.594 0.376 -0.711
2000 0.928 0.216 -0.303 | | 2000 -0.946 0.217 -0.239
2000 0.458 0.643 -0.614 | | 2000 0.356 -0.464 -0.811
2000 0.612 0.713 -0.342 | | 2000 -0.18 -0.285 -0.941
2000 0.582 0.069 -0.811 | | 2000 0.579 0.37 -0.726
2000 -0.652 0.631 -0.419 | | 2000 -0.381 0.823 -0.421
2000 -0.414 0.62 -0.666 | | 2000 -0.8 0.105 -0.591
2000 0.577 -0.523 -0.627 | | 2000 0.945 0.216 -0.247
2000 0.793 -0.456 -0.403 | | 2000 0.929 -0.363 -0.069
2000 -0.609 0.077 -0.79 | | 2000 -0.133 0.985 0.107
2000 -0.02 -0.577 -0.816 | | 2000 -0.427 -0.898 0.109
2000 0.138 0.748 -0.65 | | 2000 0.495 0.642 -0.586
2000 -0.279 0.949 -0.148 | | 2000 0.632 0.712 -0.305
2000 -0.997 0.027 0.066 | | 2000 0.63 0.069 -0.774
2000 0.827 -0.549 0.123 | | 2000 -0.155 -0.988 0.003
2000 0.726 -0.666 -0.168 | | 2000 -0.624 0.631 -0.461
2000 0.601 -0.79 0.124 | | 2000 0.334 0.218 -0.917
2000 0.661 0.75 -0.004 | | 2000 -0.828 -0.533 -0.175
2000 0.446 -0.884 -0.141 | | 2000 -0.371 0.619 -0.692
2000 -0.835 -0.51 0.209 | | 2000 0.614 -0.522 -0.591
2000 -0.424 -0.883 -0.201 | | 2000 0.817 -0.455 -0.355
2000 -0.96 0.217 -0.177 | | 2000 -0.557 0.077 -0.827
2000 0.923 -0.363 -0.124 | | 2000 0.959 0.267 0.089
2000 -0.126 0.985 0.117 | | 2000 0.025 0.955 -0.295
2000 -0.419 -0.897 0.138 | | 2000 0.031 -0.576 -0.817
2000 -0.155 -0.988 0.016 | | 2000 0.179 0.746 -0.641
2000 -0.838 -0.533 -0.12 | |
2000 0.963 0.267 0.032 | |
2000 0.007 0.956 -0.294 | |
2000 0.839 0.379 0.391 | |
2000 -0.524 0.742 0.419 | |
2000 0.949 0.086 0.302 | |
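As a quick sanity check on the tabulated values, each gradient direction should be a unit vector, so every (x, y, z) triple should have a norm close to 1 within the three-decimal rounding. A minimal sketch, using the eight b = 300 s/mm² directions listed for Dataset NODDI-B; the variable names and the tolerance are illustrative assumptions.

```python
import numpy as np

# Eight b = 300 s/mm^2 directions listed for Dataset NODDI-B in Table SI.
noddi_b_b300 = np.array([
    [-0.594, -0.804, 0.025],
    [-0.014, -0.734, 0.68],
    [0.761, -0.646, -0.062],
    [0.967, 0.144, -0.21],
    [0.448, -0.033, -0.893],
    [0.125, -0.919, -0.373],
    [-0.337, -0.419, -0.843],
    [-0.709, 0.244, -0.661],
])

# Euclidean norm of each row; deviations from 1 should only reflect rounding.
norms = np.linalg.norm(noddi_b_b300, axis=1)
max_deviation = float(np.max(np.abs(norms - 1.0)))
print("largest deviation from unit norm:", round(max_deviation, 4))
```

A deviation well above the rounding level would usually indicate a transcription error in a row rather than a property of the acquisition.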

Supplementary information references

[1] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770-778.
[2] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in Proc. ICML, 2015, pp. 448-456.
[3] A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proc. 30th ICML, 2013, pp. 1-6.
[4]