Complex Convolutional Neural Networks for Ultrasound Image Reconstruction from In-Phase/Quadrature Signal
Jingfeng Lu, Fabien Millioz, Damien Garcia, Sébastien Salles, Dong Ye, and Denis Friboulet
Abstract—A wide variety of deep learning approaches have recently been investigated to improve ultrasound (US) imaging. Most of these approaches operate on radiofrequency (RF) signals. However, in-phase/quadrature (I/Q) digital beamformers are now widely used as low-cost strategies. In this work, we leveraged complex convolutional neural networks (CCNNs) for reconstructing ultrasound images from I/Q signals. We recently described a CNN architecture, ID-Net, which exploits an inception layer devoted to the reconstruction of RF diverging-wave (DW) ultrasound images. In this work we derived the complex equivalent of this network, the complex inception for DW network (CID-Net), which operates on I/Q data. We provide experimental evidence that CID-Net yields the same image quality as that obtained from RF-trained CNNs: using only three I/Q images, CID-Net produces high-quality images competing with those obtained by coherently compounding 31 RF images. Moreover, we show that CID-Net outperforms the straightforward architecture that processes the real and imaginary parts of the I/Q signal separately, thereby indicating the importance of processing I/Q signals with a network that exploits their complex nature.
Index Terms—Deep learning, complex convolutional neural networks (CCNNs), ultrasound imaging, diverging wave, image reconstruction.
I. INTRODUCTION

Reconstructing high-quality ultrasound (US) images for ultrafast imaging using deep learning techniques has recently raised growing interest in the US community. Most existing studies operate on radio-frequency (RF) signals [1]-[4], using real-valued convolutional neural networks (CNNs). Nevertheless, it can be very advantageous to process and beamform the signals in the baseband using digital I/Q beamformers, as this greatly reduces the size, power requirements, and therefore the cost of the front end. This beamforming strategy might be preferred for compact and low-cost ultrasound imaging [5]. Yet I/Q data have rarely appeared in the pipeline of deep learning-based reconstruction.¹ In [7], Khan et al. proposed to generate I/Q signals from time-delayed RF signals using a CNN with two output channels, and thus did not take advantage of I/Q signals as the source data. In [8], Vedula et al. proposed to improve multi-line transmission (MLT) quality by reconstructing images from I/Q data; two real-valued CNNs were trained separately on the real and imaginary components of the I/Q data.

Although a complex signal can be treated as an ordered pair of real signals in a two-branch network structure, with each branch processing the real and imaginary components respectively, such a representation does not take the nature of complex calculations into account. As shown by Hirose in [9], a complex-valued model provides a more constrained system than a model based on real parameters. Thus, in this paper, inspired by the study of Trabelsi et al. [10], we propose to extend deep learning-based reconstruction to the complex domain using complex convolutional neural networks (CCNNs).

We present the complex inception for diverging wave (DW) network (CID-Net) for high-quality DW image reconstruction from I/Q data. CID-Net consists of the complex building components introduced in [10], which allow incorporating complex numbers in general frameworks for training deep neural networks. Regarding the network architecture, CID-Net maintains the architecture of the inception for DW network (ID-Net) [11], which has demonstrated the ability to reconstruct high-quality US images using RF data from DW acquisitions. We experimentally demonstrate that CID-Net: i) yields the same image quality as that obtained from the RF-trained CNN; and ii) outperforms the approach consisting in processing the real and imaginary parts of the I/Q signal separately.

J. Lu is with Metislab, School of Instrumentation Science and Engineering, Harbin Institute of Technology, Harbin, China (e-mail: [email protected]). F. Millioz, D. Garcia, S. Salles, and D. Friboulet are with the University of Lyon, CREATIS, CNRS UMR 5220, Inserm U1044, INSA-Lyon, University of Lyon 1, Villeurbanne, France. D. Ye is with the School of Instrumentation Science and Engineering, Harbin Institute of Technology, Harbin, China (e-mail: [email protected]).

¹As very recently mentioned in [6], it is interesting to note that the same issue is currently raised for the inherently complex MRI raw data.

II. METHODS
Let X ∈ ℂ^(m×w×h) be a complex-valued tensor representing a limited number (m) of beamformed I/Q images from successive DW acquisitions, each yielding w I/Q signals of length h. DW image reconstruction is modeled as an image input-output problem whose objective is to estimate a high-quality image Ŷ ∈ ℂ^(w×h) from the low-quality X. We propose to use the CID-Net with trainable complex-valued parameters Θ to seek the optimal reconstruction operator f(Θ): ℂ^(m×w×h) → ℂ^(w×h), with respect to a high-quality target image Y ∈ ℂ^(w×h) obtained from the coherent compounding of n (n ≫ m) DW acquisitions.
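For concreteness, here is a minimal shape-level sketch of this mapping in PyTorch. The dimensions w and h below are placeholders (not the paper's actual grid), and `compound` is a hypothetical stand-in for the learned operator:

```python
import torch

m, w, h = 3, 128, 1024  # m = 3 as in the experiments below; w, h are illustrative
X = torch.randn(m, w, h, dtype=torch.complex64)  # low-quality beamformed I/Q stack

def compound(x: torch.Tensor) -> torch.Tensor:
    # Plain coherent compounding (a sum over the m acquisitions) is the
    # non-learned baseline that the trained operator f(Θ) should improve on.
    return x.sum(dim=0)

Y_hat = compound(X)
assert Y_hat.shape == (w, h) and Y_hat.is_complex()
```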
A. CCNN Building Blocks

1) Complex convolution: We begin with the representation of complex convolutions, which are the basic building blocks of CID-Net. As depicted in Fig. 1, we used real-valued entities to represent the real and imaginary components of complex numbers, and performed complex convolution using real-valued arithmetic. Let us define a complex-valued data tensor X = X_r + jX_i, where j = √−1, and X_r = Re(X) and X_i = Im(X) are the real and imaginary components of X, respectively. Likewise, we represent the complex-valued weight of a convolution kernel as W = W_r + jW_i. Convolution of W with X yields

$$Z = W \ast X = (W_r + jW_i) \ast (X_r + jX_i), \tag{1}$$

which, by the distributive property of convolution, reduces to

$$Z = (W_r \ast X_r - W_i \ast X_i) + j\,(W_r \ast X_i + W_i \ast X_r). \tag{2}$$

This can be reformulated in algebraic notation as

$$\begin{bmatrix} \mathrm{Re}(W \ast X) \\ \mathrm{Im}(W \ast X) \end{bmatrix} = \begin{bmatrix} W_r & -W_i \\ W_i & \phantom{-}W_r \end{bmatrix} \ast \begin{bmatrix} X_r \\ X_i \end{bmatrix}. \tag{3}$$

The mathematical relations between the real and imaginary components of the data and the convolution kernels are thus fully reflected in this representation, in contrast to a two-branch architecture that would operate on the real and imaginary components separately.

Fig. 1. Block diagram of complex convolutions. The orange, blue, and green blocks denote the convolution kernels W, input data X, and output data Z. Both the data and the kernels are formed as the concatenation of two real-valued tensors representing the real (solid blocks) and imaginary (dotted blocks) parts.
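As a minimal sketch, Eq. (2) maps directly onto two real-valued convolution layers. The class below is illustrative; the channel counts, "same" padding, and absence of bias are assumptions, not the paper's exact settings:

```python
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """Complex convolution of Eq. (2) built from two real-valued kernels.

    conv_r and conv_i hold W_r and W_i; the input is given as the pair
    (X_r, X_i) of real tensors, as in Fig. 1.
    """
    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        self.conv_r = nn.Conv2d(in_ch, out_ch, kernel_size, padding="same", bias=False)
        self.conv_i = nn.Conv2d(in_ch, out_ch, kernel_size, padding="same", bias=False)

    def forward(self, x_r, x_i):
        # Z_r = W_r * X_r - W_i * X_i ;  Z_i = W_r * X_i + W_i * X_r
        z_r = self.conv_r(x_r) - self.conv_i(x_i)
        z_i = self.conv_r(x_i) + self.conv_i(x_r)
        return z_r, z_i
```

Note that the two real kernels are shared between the real and imaginary outputs, which is exactly the weight-sharing pattern of the matrix in Eq. (3).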
2) Activation function: The generalization of the most common activation function, the rectified linear unit (ReLU), to the complex domain is far from straightforward, as shown in [10], where three types of ReLU-based complex activations were investigated. Since the goal of this work was to build the complex equivalent of ID-Net, which uses maxout units [12] as the activation function, we focused on the design of a complex version of the maxout activation. A real-valued maxout unit takes the pixel-wise maximum values across several adjacent feature maps to achieve a nonlinear transformation. As it is unclear how to determine the maximum among complex numbers, the max operation must be redefined for a complex maxout unit. One simple solution would be to apply the maxout activation to the real and imaginary features separately. However, such an activation shares the same drawback as two-branch CNNs, i.e., it dismisses the interaction between the real and imaginary channels. We therefore devised amplitude maxout (a-maxout) units for CID-Net. As illustrated in Fig. 2, an a-maxout unit simultaneously activates both the real and imaginary elements corresponding to the element-wise maximum values across the amplitude maps.

Fig. 2. Block diagram of amplitude maxout (a-maxout) units. The solid and dotted blocks denote the real and imaginary parts of the complex data. An a-maxout unit simultaneously activates both the real and imaginary elements corresponding to the element-wise maximum values across the amplitude maps.
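A possible implementation of the a-maxout unit, assuming the feature maps are grouped in blocks of k consecutive channels (the grouping convention is an assumption):

```python
import torch

def amplitude_maxout(z_r, z_i, k):
    """Amplitude maxout (a-maxout) over groups of k feature maps.

    For each group of k maps, the per-pixel index of the map with the
    largest amplitude sqrt(z_r^2 + z_i^2) is found, and BOTH the real and
    the imaginary elements at that index are kept, preserving the phase.
    """
    b, c, hh, ww = z_r.shape                      # channels must be divisible by k
    z_r = z_r.view(b, c // k, k, hh, ww)
    z_i = z_i.view(b, c // k, k, hh, ww)
    amp = torch.sqrt(z_r ** 2 + z_i ** 2)         # amplitude maps
    idx = amp.argmax(dim=2, keepdim=True)         # winning map per pixel
    out_r = torch.gather(z_r, 2, idx).squeeze(2)  # real part of the winner
    out_i = torch.gather(z_i, 2, idx).squeeze(2)  # matching imaginary part
    return out_r, out_i
```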
3) Complex differentiability: Performing backpropagation in a complex-valued neural network requires differentiable loss functions and activations. One possibility would be to use functions that admit a complex derivative, that is, holomorphic functions (i.e., functions satisfying the Cauchy-Riemann conditions) [10]. Such a choice is however rather restrictive, and it has been shown in [10] and [13] that this restriction is not necessary.
Indeed, cost and activation functions that are differentiable with respect to the real and imaginary parts of each parameter are also compatible with backpropagation. We used the complex mean squared error as the loss function,

$$L(\Theta) = \frac{1}{n}\sum_{i=1}^{n} \lVert \hat{Y}_i - Y_i \rVert^2. \tag{4}$$

As the loss function produces a real-valued output, it is non-holomorphic, and it was optimized by backpropagation with respect to the real and imaginary parts of the convolution weights.
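In PyTorch terms, writing the loss of Eq. (4) over the real and imaginary parts makes this explicit: autograd then differentiates it with respect to each real parameter, and no holomorphy is needed. A sketch, up to the normalization used in the paper:

```python
def complex_mse(y_hat_r, y_hat_i, y_r, y_i):
    # |z|^2 = Re(z)^2 + Im(z)^2, so the loss is real-valued (non-holomorphic)
    # yet differentiable w.r.t. both the real and the imaginary weights.
    return ((y_hat_r - y_r) ** 2 + (y_hat_i - y_i) ** 2).mean()
```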
B. Network Architectures

The CID-Net architecture, provided in Table I, was derived from the ID-Net recently proposed for RF DW reconstruction. CID-Net is a fully convolutional network with five complex convolution layers, constructed using the complex convolution blocks described in Section II-A. In particular, the second-to-last layer is an inception layer, i.e., a concatenation of multi-scale convolution kernels. As demonstrated in [11], the inception layer used in conjunction with the maxout activation allows features from multiple receptive-field sizes to be captured, which helps to address the specific geometry of DW imaging.

TABLE I
ARCHITECTURE OF THE PROPOSED NETWORK

layer type    feature size   kernel size   nb. of kernels   activation
inputs        m × h × w      –             –                –
convolution   32 × h × w     …             …                a-maxout 4
convolution   32 × h × w     …             …                a-maxout 4
convolution   32 × h × w     …             …                a-maxout 4
inception     …              41 × 11       4                a-maxout 4
                             49 × 13       4                a-maxout 4
                             57 × 15       4                a-maxout 4
                             65 × 17       4                a-maxout 4
convolution   1 × h × w      …             …                –

We also trained a two-branch ID-Net model (abbreviated 2BID-Net), in which each branch was trained separately on the real and imaginary parts of the I/Q data, as well as an ID-Net model using RF data. ID-Net, 2BID-Net, and CID-Net shared the same architecture except for the number of features in each layer. For a fair comparison, 2BID-Net and CID-Net had half as many features as ID-Net, since a complex kernel and a two-branch kernel double the number of real parameters compared with a conventional convolution kernel.
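Putting the pieces together, here is a schematic sketch of a CID-Net-like stack; it reuses ComplexConv2d and amplitude_maxout from the sketches above. The kernel sizes of the plain convolution layers are assumptions, since they are not legible in Table I:

```python
import torch
import torch.nn as nn

class CIDNetSketch(nn.Module):
    """Five complex convolution layers; the fourth is the inception layer
    concatenating four branches with multi-scale kernels, each layer
    followed by a-maxout over k = 4 amplitude maps."""
    def __init__(self, m=3, k=4):
        super().__init__()
        self.k = k
        self.convs = nn.ModuleList([
            ComplexConv2d(m, 32, 9),        # 9x9 kernels are an assumption
            ComplexConv2d(32 // k, 32, 9),
            ComplexConv2d(32 // k, 32, 9),
        ])
        self.branches = nn.ModuleList([     # elongated multi-scale kernels (Table I)
            ComplexConv2d(32 // k, 4, (41, 11)),
            ComplexConv2d(32 // k, 4, (49, 13)),
            ComplexConv2d(32 // k, 4, (57, 15)),
            ComplexConv2d(32 // k, 4, (65, 17)),
        ])
        self.conv_out = ComplexConv2d(4, 1, 9)

    def forward(self, x_r, x_i):
        for conv in self.convs:
            x_r, x_i = amplitude_maxout(*conv(x_r, x_i), self.k)
        outs = [amplitude_maxout(*b(x_r, x_i), self.k) for b in self.branches]
        x_r = torch.cat([o[0] for o in outs], dim=1)   # concatenate the branches
        x_i = torch.cat([o[1] for o in outs], dim=1)
        return self.conv_out(x_r, x_i)
```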
III. EXPERIMENTS

A. Data Acquisition
A phased-array probe (ATL P4-2, bandwidth: 2-4 MHz, center frequency: 3 MHz) was interfaced with a Verasonics research scanner (Vantage 256) to perform steered DW acquisitions. For each acquisition, 31 DWs were emitted with tilt angles between ±30° in incremental steps of 2°. The DWs were transmitted at a pulse repetition frequency (PRF) of 1500 Hz. The probe was moved manually over the in-vitro or in-vivo surfaces to generate a wide range of significantly different images for a proper training of the network. The received raw data were I/Q demodulated, downsampled by a factor of 2, and beamformed using delay-and-sum (DAS) [14] to produce the beamformed I/Q data. From each acquisition, one target image Y was obtained by compounding all n = 31 beamformed images, while a small subset of m = 3 beamformed images, corresponding to the steering angles -20°, 0°, and 20°, was used as the network input X. A total of 7500 (X, Y) samples (i.e., acquisition pairs) were used in the experiment. Specifically, 1500 acquisitions were performed on in-vivo tissues (thigh muscle, finger phalanx, and liver regions), and 6000 acquisitions were performed on two in-vitro phantoms (Gammex, model 410SCG and CIRS, model 054GS).

B. Network Training

5000 (X, Y) samples were randomly selected from the entire data set as the training set, 1250 (X, Y) samples were used as an independent validation set, and the remaining 1250 (X, Y) samples were used as the testing set for evaluation. The three models (2BID-Net, CID-Net, and ID-Net) were trained with the same training setup, described as follows. The network weights were initialized with the Xavier initializer [15]. The loss was minimized using mini-batch gradient descent with the Adam optimizer [16], with a batch size of 16. The initial learning rate was set to 1 × 10⁻…, and an early-stopping strategy was employed to adjust it: the learning rate was halved if there had been no decrease in the validation loss for 20 epochs, and 40 epochs without validation-loss reduction ended the training. The trainings were performed with the PyTorch library [17] on an NVIDIA Tesla V100 GPU with 32 GB of memory, resulting in a training time of about two days.
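The learning-rate schedule described above corresponds to a plateau-based scheduler plus early stopping. A sketch follows; the helpers `train_one_epoch` and `validate`, as well as `model` and `initial_lr`, are hypothetical names:

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=initial_lr)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=20)  # halve lr after 20 flat epochs

best_val, epochs_without_improvement = float("inf"), 0
for epoch in range(10_000):                  # upper bound; early stopping ends sooner
    train_one_epoch(model, optimizer)        # hypothetical helper, batch size 16
    val_loss = validate(model)               # hypothetical helper
    scheduler.step(val_loss)
    if val_loss < best_val:
        best_val, epochs_without_improvement = val_loss, 0
    else:
        epochs_without_improvement += 1
    if epochs_without_improvement >= 40:     # 40 epochs with no improvement
        break
```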
IV. RESULTS

A. Performance of the Proposed Method
Fig. 3 shows representative test samples from the in-vivo and in-vitro scans, displayed in B-mode with a dynamic range of 60 dB. The deep learning-based models [Fig. 3 (second, third, and fourth columns)] produced better image quality than coherent compounding with the same three DWs. In particular, CID-Net (third column in Fig. 3) and ID-Net (fourth column in Fig. 3) both showed a significant improvement in contrast and anatomical structures, yielding images visually very close to the reference images (rightmost column in Fig. 3), while 2BID-Net (second column in Fig. 3) showed a noticeable difference compared with the reference images.

Fig. 3. B-mode images obtained using different methods. Top to bottom: in-vivo tissue from the thigh muscle; in-vitro tissue from the phantom (Gammex, model 410SCG); and in-vitro tissue from the phantom (CIRS, model 054GS). Left to right: compounding of 3 DWs; reconstructions by 2BID-Net (3 I/Q images), CID-Net (3 I/Q images), and ID-Net (3 RF images); and compounding of 31 DWs (reference).

We report in Table II the quantitative evaluation metrics used to assess the performance: peak signal-to-noise ratio (PSNR), structural similarity index (SSIM) [18], mutual information (MI) [19], contrast ratio (CR), contrast-to-noise ratio (CNR), and lateral resolution (LR). PSNR, SSIM, and MI were computed over the full set of testing samples, measuring the quality of the reconstruction against the reference images (i.e., images obtained through the standard compounding of 31 DWs). CR and CNR were measured on two anechoic regions (in the near field at 40-mm depth and the far field at 120-mm depth) of the images shown in Fig. 3 (second row). LR (i.e., the full width at half maximum of the point spread function) was measured on the isolated scatterers (in the near field at 20- and 40-mm depth, the middle field at 60-mm depth, and the far field at 80-, 90-, and 100-mm depth) of the images shown in Fig. 3 (third row).

TABLE II
EVALUATION METRICS OF DIFFERENT METHODS ON THE TEST DATA

model                     PSNR [dB]      SSIM          MI            CR [dB]          CNR [dB]         LR [mm]
                                                                     near     far     near    far     near   middle   far
compounding (3 DWs)       29.… ± ….35    0.… ± 0.13    0.… ± 0.17    12.24    10.54   2.94    3.02    1.05   1.54     1.94
2BID-Net (3 I/Q images)   30.… ± ….53    0.… ± 0.11    0.… ± 0.24    18.94    17.28   7.14    4.89    1.07   1.78     2.15
CID-Net (3 I/Q images)    31.… ± ….45    0.… ± 0.05    0.… ± …       21.51    18.24   8.10    6.35    1.05   1.67     2.08
ID-Net (3 RF images)      …              …             …             …        …       …       …       …      …        …
compounding (31 DWs)      –              –             –             …        …       …       …       …      …        …

Using the same three I/Q images, CID-Net yielded an improvement over 2BID-Net in terms of the PSNR, SSIM, and MI metrics (gains of 1.08 dB, 0.05, and 0.07, respectively). In the same way, CID-Net produced better contrast measures than 2BID-Net: the CR was higher in the near and far fields (gains of 2.57 dB and 0.96 dB, respectively), and the same observation held for the CNR (gains of 0.96 dB and 1.46 dB in the near and far fields, respectively). The LR associated with CID-Net was also better than that provided by 2BID-Net (decreases of 0.02 mm, 0.11 mm, and 0.07 mm in the near, middle, and far fields). Besides, CID-Net and ID-Net obtained approximately the same values for all evaluation metrics, while providing slightly lower values in CR and CNR, and lower values in LR, than those associated with the references.
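For reference, here is a sketch of how CR and CNR can be computed from envelope samples of the anechoic region and its background. These are the usual dB definitions, not formulas restated in the paper, so treat them as assumptions:

```python
import numpy as np

def contrast_metrics(env_cyst, env_background):
    """CR and CNR in dB from envelope (pre-log-compression) samples."""
    mu_c, mu_b = env_cyst.mean(), env_background.mean()
    var_c, var_b = env_cyst.var(), env_background.var()
    cr = 20 * np.log10(mu_b / mu_c)          # anechoic region: mu_b > mu_c
    cnr = 20 * np.log10(abs(mu_b - mu_c) / np.sqrt(var_c + var_b))
    return cr, cnr
```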
B. Computational Complexity and Speed
Table III gives, for each model, the number of parameters, the corresponding compounding time (i.e., the inference time on the platform described in Section III-B), and the attainable frame rate. Since the number of real parameters was the same for each network, the computation times were very close; CID-Net required a slightly higher compounding time, which is linked to the computation of the complex maxout. As a result, the attainable frame rates were also close to one another, with CID-Net reaching 1250 fps, slightly slower than 2BID-Net (1280 fps) and ID-Net (1330 fps).

TABLE III
NUMBER OF PARAMETERS, COMPOUNDING TIME, AND ATTAINABLE FRAME RATE OF 2BID-NET, CID-NET, AND ID-NET

model      number of real          compounding    attainable
           parameters [million]    time [ms]      frame rate [fps]
2BID-Net   1.7                     0.78 ± 0.03    1280
CID-Net    1.7                     0.80 ± 0.03    1250
ID-Net     1.7                     0.75 ± 0.02    1330
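A sketch of how such compounding times can be measured on a GPU follows; the model and input names are hypothetical, and the synchronization calls ensure that asynchronous CUDA kernels are included in the timing:

```python
import time
import torch

@torch.no_grad()
def benchmark(model, x_r, x_i, n_runs=100):
    """Mean inference time in ms and the corresponding frame rate in fps."""
    model.eval()
    for _ in range(10):                      # warm-up iterations
        model(x_r, x_i)
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(n_runs):
        model(x_r, x_i)
    torch.cuda.synchronize()
    ms = (time.perf_counter() - t0) / n_runs * 1e3
    return ms, 1e3 / ms                      # frame rate = 1000 / time-in-ms
```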
V. CONCLUSION

In this paper, a methodology for reconstructing ultrasound images using a complex CNN, CID-Net, was presented. A compounding operator was learned to produce high-quality images from I/Q data obtained with a small number of DW transmissions. Experiments were performed on real data from in-vitro and in-vivo scans. The experimental results showed that the proposed CID-Net offers the same image quality as the equivalent real-valued CNN trained with RF data, and outperforms the two-branch CNN architecture that processes the real and imaginary parts of the I/Q signal separately. The proposed work should promote the exploration of complex-CNN-based approaches for ultrasound imaging applications.

REFERENCES
[1] M. Gasse, F. Millioz, E. Roux, D. Garcia, H. Liebgott, and D. Friboulet, "High-quality plane wave compounding using convolutional neural networks," IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 64, no. 10, pp. 1637-1639, 2017.
[2] Z. Zhou, Y. Wang, J. Yu, Y. Guo, W. Guo, and Y. Qi, "High spatial-temporal resolution reconstruction of plane-wave ultrasound images with a multichannel multiscale convolutional neural network," IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 65, no. 11, pp. 1983-1996, 2018.
[3] Y. H. Yoon, S. Khan, J. Huh, and J. C. Ye, "Efficient B-mode ultrasound image reconstruction from sub-sampled RF data using deep learning," IEEE Transactions on Medical Imaging, vol. 38, no. 2, pp. 325-336, 2019.
[4] B. Luijten, R. Cohen, F. J. De Bruijn, H. A. Schmeitz, M. Mischi, Y. C. Eldar, and R. J. Van Sloun, "Adaptive ultrasound beamforming using deep learning," IEEE Transactions on Medical Imaging, 2020.
[5] K. Ranganathan, M. K. Santy, T. N. Blalock, J. A. Hossack, and W. F. Walker, "Direct sampled I/Q beamforming for compact and very low-cost ultrasound imaging," IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 51, no. 9, pp. 1082-1094, 2004.
[6] D. Liang, J. Cheng, Z. Ke, and L. Ying, "Deep magnetic resonance image reconstruction: Inverse problems meet neural networks," IEEE Signal Processing Magazine, vol. 37, no. 1, pp. 141-151, 2020.
[7] S. Khan, J. Huh, and J. C. Ye, "Deep learning-based universal beamformer for ultrasound imaging," in Medical Image Computing and Computer Assisted Intervention - MICCAI 2019. Springer International Publishing, 2019, pp. 619-627.
[8] S. Vedula, O. Senouf, G. Zurakhov, A. Bronstein, M. Zibulevsky, O. Michailovich, D. Adam, and D. Gaitini, "High quality ultrasonic multi-line transmission through deep learning," in International Workshop on Machine Learning for Medical Image Reconstruction. Springer, 2018, pp. 147-155.
[9] A. Hirose, "Nature of complex number and complex-valued neural networks," Frontiers of Electrical and Electronic Engineering in China, vol. 6, no. 1, pp. 171-180, 2011.
[10] C. Trabelsi, O. Bilaniuk, Y. Zhang, D. Serdyuk, S. Subramanian, J. F. Santos, S. Mehri, N. Rostamzadeh, Y. Bengio, and C. J. Pal, "Deep complex networks," in International Conference on Learning Representations, 2018.
[11] J. Lu, F. Millioz, D. Garcia, S. Salles, W. Liu, and D. Friboulet, "Reconstruction for diverging-wave imaging using deep convolutional neural networks," IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, 2020.
[12] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, "Maxout networks," in Proceedings of the 30th International Conference on Machine Learning, 2013, pp. 1319-1327.
[13] A. Hirose and S. Yoshida, "Generalization characteristics of complex-valued feedforward neural networks in relation to signal coherence," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 4, pp. 541-551, 2012.
[14] V. Perrot, M. Polichetti, F. Varray, and D. Garcia, "So you think you can DAS? A viewpoint on delay-and-sum beamforming," arXiv preprint arXiv:2007.11960, 2020.
[15] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010, pp. 249-256.
[16] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in International Conference on Learning Representations, 2015.
[17] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, "Automatic differentiation in PyTorch," in Advances in Neural Information Processing Systems Workshop, 2017.
[18] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, 2004.
[19] D. Guo, S. Shamai, and S. Verdú, "Mutual information and minimum mean-square error in Gaussian channels," IEEE Transactions on Information Theory, vol. 51, no. 4, pp. 1261-1282, 2005.