B-spline Parameterized Joint Optimization of Reconstruction and K-space Trajectories (BJORK) for Accelerated 2D MRI
Guanhua Wang, Tianrui Luo, Jon-Fredrik Nielsen, Douglas C. Noll, Jeffrey A. Fessler
IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. XX, NO. XX, XXXX 2021
Abstract — Optimizing k-space sampling trajectories is a challenging topic for fast magnetic resonance imaging (MRI). This work proposes to jointly optimize a reconstruction algorithm and sampling trajectories with respect to image reconstruction quality. We parameterize trajectories with quadratic B-spline kernels to reduce the number of parameters and enable multi-scale optimization, which may help avoid sub-optimal local minima. The algorithm includes an efficient non-Cartesian unrolled neural network-based reconstruction and an accurate approximation for backpropagation through the non-uniform fast Fourier transform (NUFFT) operator, to accurately reconstruct and backpropagate multi-coil non-Cartesian data. Penalties on slew rate and gradient amplitude enforce hardware constraints. Sampling and reconstruction are trained jointly using large public datasets. To correct the potential eddy-current effect introduced by the curved trajectory, we use a pencil-beam trajectory mapping technique. In both simulations and in-vivo experiments, the learned trajectory demonstrates significantly improved image quality compared to previous model-based and learning-based trajectory optimization methods at 20× acceleration. Though trained with neural network-based reconstruction, the proposed trajectory also leads to improved image quality with compressed sensing-based reconstruction.

Index Terms — Magnetic resonance imaging, non-Cartesian sampling, deep learning, eddy-current effect, image reconstruction
I. INTRODUCTION
Magnetic Resonance Imaging (MRI) systems acquire raw data in the frequency domain (k-space). Most scanning protocols sample data points sequentially according to a pre-determined sampling pattern. The most common sampling patterns are variants of Cartesian rasters and non-Cartesian trajectories such as radial spokes [1] and spiral interleaves [2]. The local smoothness of these patterns helps ensure that they obey hardware limits, namely the maximum gradient and slew rate that constrain the speed and acceleration when traversing k-space. These patterns also make it easy to ensure sufficient sampling densities. In recent years, hardware improvements, especially with the RF and gradient systems, enable more complex gradient waveform designs and sampling patterns. For a given readout time, optimized designs can cover a broader and potentially more useful region in k-space, reducing the overall scanning time and/or improving image quality, particularly when combined with multiple receive coils.

For fast imaging, many works focus on acceleration in the phase-encoding (PE) direction with fully sampled frequency-encoding (FE) lines [3]–[7]. Usually, there is enough time for the Δk shift in the PE direction, so gradient and slew rate constraints are readily satisfied. More general non-Cartesian trajectory designs in 2D and 3D can further exploit the flexibility in the FE direction. However, in addition to hardware physical constraints, the system is affected by imperfections such as eddy currents that cause the actual trajectory to deviate from the nominal one and introduce undesired phase fluctuations in the acquired data [8]. Some studies optimize properties of existing trajectories, such as the density of spiral trajectories [9] or the rotation angle of radial trajectories [10]. More complex waveforms, e.g., wave-like patterns [11], can provide more uniform coverage of k-space and mitigate aliasing artifacts. To accommodate the incoherence requirements of compressed sensing-based methods, [12], [13] introduce slight perturbations to existing trajectories, like radial or spiral trajectories.

This work is supported in part by NIH Grants R01 EB023618 and U01 EB026977, and NSF Grant IIS 1838179. G. Wang, T. Luo, J.-F. Nielsen, and D. C. Noll are with the Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI 48109 USA (e-mails: {guanhuaw, tianrluo, jfnielse, dnoll}@umich.edu). J. A. Fessler is with the Department of EECS, University of Michigan, Ann Arbor, MI 48109 USA (e-mail: [email protected]).
Some works also explore genetic algorithms to solve this non-convex constrained problem [14].

The recent SPARKLING method [15] considers two criteria for trajectory design: (1) the trajectory should match a pre-determined density according to a certain measure, and (2) the sampling points should be locally uniform to avoid clusters or gaps. The density and uniformity criteria are transformed into "attraction" and "repulsion" forces among the sampling points. The work uses fast multipole methods (FMM) [16] to efficiently calculate the interactions between points. Projection-based optimization handles the gradient and slew rate constraints [17]. In-vivo and simulation experiments demonstrate that this approach reduces the level of aliasing artifacts for 2D and 3D T2*-weighted imaging. However, in SPARKLING, the density is determined heuristically; determining the optimal sampling density for different protocols remains an open problem. The work also does not consider some k-space signal characteristics such as conjugate symmetry. Furthermore, the point spread function (PSF) of the calculated trajectory for high under-sampling rates is not guaranteed to be optimal for reconstruction algorithms like those based on convolutional neural networks, because the reconstruction algorithm is not part of the SPARKLING design process.

With rapid advances in deep learning and auto-differentiation software, learning-based signal sampling strategies are being investigated in multiple fields such as optics and ultrasound [18], [19]. In MRI, the majority of learning-based works have focused on sampling patterns of phase-encoding locations. Some studies formulate the on-grid sampling pattern as i.i.d. samples from a multivariate Bernoulli distribution [20], [21]. Since random sampling operations are not differentiable, different surrogate gradients, such as Gumbel-Softmax, are developed in these works.
Rather than gradient descent, [22] uses a greedy search method. [23] further reduces the complexity of greedy search by using Pareto optimization, an evolutionary algorithm for sparse regression [24]. Some works have used reinforcement learning. For example, [25] and [26] adopted a double-network setting: one network for reconstruction and the other generating a sampling pattern, where the first work used Monte-Carlo Tree Search (MCTS) and the second used Q-learning to optimize the 1-D sub-sampling. Instead of using an end-to-end CNN as the reconstruction algorithm as in other works, [27] constructs a differentiable compressed sensing reconstruction framework. [28] used an unrolled neural network as the reconstruction algorithm.

To our knowledge, PILOT [29] is the first work to optimize a 2D non-Cartesian trajectory and an image reconstruction method jointly. The training loss is the reconstruction error, since the ultimate goal of trajectory optimization is high image quality. The trained parameters were the locations of sampling points and the weights of the reconstruction neural network. Large datasets and stochastic gradient descent were used to optimize the parameters. To meet the hardware limits, a penalty was applied on the gradient and slew rate. Since the reconstruction involves non-Cartesian data, PILOT uses a (bilinear, hence differentiable) gridding reconstruction algorithm to map the k-space data into the image domain, followed by a U-net [30] to refine the gridded image data. Simulation experiments report encouraging results compared to ordinary trajectories. Nevertheless, the algorithm often gets stuck in sub-optimal local minima where the initial trajectory is only slightly perturbed yet the slew rate rapidly oscillates. To reduce the effect of initialization, the paper uses a randomized initialization algorithm based on the traveling salesman problem (TSP).
However, this initialization approach works only with single-shot long-TE sequences, limiting its utility in many clinical applications. The implementation in [29] relies on auto-differentiation to calculate the Jacobian of the non-uniform Fourier transform; here we adopt a new NUFFT Jacobian approximation that is faster and more accurate.

To overcome the limitations of previous methods and further expand their possible applications, this paper proposes an improved supervised learning workflow called B-spline parameterized Joint Optimization of Reconstruction and K-space trajectory (BJORK). Our main contributions include the following. (1) We propose to parameterize the trajectories with quadratic B-spline kernels. This reparameterization reduces the number of parameters and enables multilevel optimization, enabling non-local improvements to the initial trajectory. Moreover, the local smoothness of B-spline kernels avoids rapid slew rate oscillations. (2) We adopt an unrolled neural network reconstruction method for non-Cartesian sampling patterns. Compared to the end-to-end model implemented in previous works, the proposed approach combines the strengths of learning-based and model-based reconstruction, improving the effect of both reconstruction and trajectory learning. (3) We derive a more accurate and efficient approximation of the NUFFT Jacobian matrix. (See Supplementary Materials for derivation and validation.) (4) In addition to a simulation experiment, we also conducted phantom and in-vivo experiments with protocols that differ from the training dataset to evaluate the generalizability and applicability of the model. (5) We used a k-space mapping technique to correct potential eddy current-related artifacts. (6) Compared with SPARKLING, the proposed learning-based approach does not need to assume signal characteristics such as spectrum energy density.
Instead, the required sampling trajectories are learned from a large dataset in a supervised manner.

The remaining material is organized as follows. Section II details the proposed method. Section III describes experiment settings and control methods. Sections IV and V present and discuss the results.
II. METHODS
This section describes the proposed approach for supervised learning of the sampling trajectory and the image reconstruction method.
A. Problem formulation
Fig. 1 shows the overall workflow of the proposed approach. The goal is to optimize $\omega \in \mathbb{R}^{N_s \times N_d}$, a trainable (possibly multi-shot) sampling pattern, and $\theta \in \mathbb{R}^M$, the $M$ parameters of the image reconstruction method, where $N_s$ denotes the total number of k-space samples, and $N_d$ denotes the image dimensionality. (The results are for $N_d = 2$, i.e., 2D images, but the method is general.)

The training loss for jointly optimizing the parameters is as follows:
$$ \arg\min_{\omega \in \mathbb{R}^{N_s \times N_d},\ \theta \in \mathbb{R}^M} \; E_x \left\| f_\theta(\omega;\, A(\omega)x + \varepsilon) - x \right\| \tag{1} $$
$$ \text{s.t.}\quad \|D_1 \omega[d]\|_\infty \le \gamma \Delta t\, g_{\max}, \quad \|D_2 \omega[d]\|_\infty \le \gamma \Delta t^2\, s_{\max}, \quad d = 1, \dots, N_d, $$
where $x \in \mathbb{C}^{N_v}$ is a fully sampled reference image (batch) having $N_v$ voxels from the training data set, and $\varepsilon$ is simulated additive complex Gaussian noise. $A \in \mathbb{C}^{N_s N_c \times N_v}$ denotes the system matrix of MR imaging, where $N_c$ denotes the number of receiver coils. For multi-coil non-Cartesian acquisition, it is a non-Cartesian SENSE operator [31] that applies a pointwise multiplication of the sensitivity maps followed by a NUFFT operator (currently we do not consider field inhomogeneity). $f_\theta(\omega;\, \cdot)$ denotes an image reconstruction algorithm with parameters $\theta$ that is applied to simulated under-sampled data $A(\omega)x + \varepsilon$. As detailed in subsection II-C, we use an unrolled
neural network. The reconstruction loss $\|\cdot\|$ is a combined $\ell_1$ and squared $\ell_2$ norm. $D_1$ and $D_2$ are the first-order and second-order finite difference operators. $\Delta t$ is the dwell time and $\gamma$ denotes the gyromagnetic ratio. For multi-shot imaging, the difference operators apply to each shot individually. The constraints stand for the maximum gradient field strength ($g_{\max}$) and slew rate ($s_{\max}$). To use the stochastic gradient descent (SGD) method, we convert the box constraints into a penalty function $\phi$, which in our case is a soft-thresholding function, leading to our final optimization problem as follows:
$$ \arg\min_{\omega \in \mathbb{R}^{N_s \times N_d},\ \theta \in \mathbb{R}^M} \; E_x \left\| f_\theta(\omega;\, A(\omega)x + \varepsilon) - x \right\| + \phi_{\gamma \Delta t\, g_{\max}}(|D_1 \omega|) + \phi_{\gamma \Delta t^2 s_{\max}}(|D_2 \omega|). \tag{2} $$
We update $\theta$ and $\omega$ simultaneously for each mini-batch of training data.

Fig. 1. Diagram for the proposed approach. To optimize the sampling trajectory and the reconstruction algorithm jointly using a stochastic gradient descent (SGD)-type method, we construct a differentiable forward MRI system matrix $A(\omega)$ that simulates k-space data w.r.t. trajectory $\omega$ from ground truth images, and an unrolled neural network that reconstructs the simulated data. The reconstruction errors compared with the ground truth are used as the training loss to update the learnable parameters (the trajectory $\omega$ and the network's parameters $\theta$).

B. Parameterization and multi-level optimization
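As a concrete sketch of the quadratic B-spline trajectory parameterization $\omega[d] = B c[d]$ used in this section, the interpolation matrix $B$ can be built explicitly. The uniform knot spacing and the closed-form quadratic kernel below are standard choices and are assumptions for illustration, not the paper's exact implementation:

```python
import numpy as np

def bspline2(t):
    """Quadratic (2nd-order) B-spline kernel with support [-1.5, 1.5]."""
    t = np.abs(np.asarray(t, dtype=float))
    out = np.zeros_like(t)
    out = np.where(t <= 0.5, 0.75 - t**2, out)          # central piece
    mid = (t > 0.5) & (t <= 1.5)
    out = np.where(mid, 0.5 * (1.5 - t)**2, out)        # tail pieces
    return out

def interp_matrix(n_samples, n_coef):
    """Build B in omega = B @ c for one k-space dimension.

    The decimation rate is n_samples / n_coef; coefficient knots are
    spaced uniformly over the readout (an illustrative assumption).
    """
    decim = n_samples / n_coef
    t = np.arange(n_samples)[:, None] / decim   # sample positions in knot units
    l = np.arange(n_coef)[None, :]              # knot indices
    return bspline2(t - l)
```

For the multi-level strategy, when the decimation rate is reduced, the finer coefficient vector can be initialized by least-squares fitting the previous trajectory, e.g. `np.linalg.lstsq(interp_matrix(n, 2 * L), omega_prev)`, so each level warm-starts the next.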
We parameterize the sampling pattern with quadratic (2nd-order) B-spline kernels:
$$ \omega[d] = B c[d], \tag{3} $$
where $B \in \mathbb{R}^{N_s \times L}$ denotes the interpolation matrix, and $c[d]$ denotes the $d$th column of the coefficient matrix $c \in \mathbb{R}^{L \times N_d}$. $L$ denotes the length of $c[d]$, or the number of interpolation kernels in each dimension. Using B-spline kernels to parameterize the trajectory reduces the number of individual inequality constraints from $N_d N_s$ to $N_d L$ [32], where typically $L \ll N_s$. As shown in previous works [29], the optimized trajectories are often local minima near the initialization, only slightly perturbing the initial trajectory. We propose to use a multilevel training strategy to improve the optimization process [33]. The decimation rate $N_s/L$ is initialized with a large value (like 64). Thus, many neighboring sample points are controlled by the same coefficient. After both $c$ and $\theta$ converge, we reduce the decimation rate, typically by a factor of 2, and begin a new round of training initialized with the $\omega$ and $\theta$ of the previous round.

C. Reconstruction
In the joint learning model, we adopted a model-based unrolled neural network (UNN) approach to image reconstruction [34]–[37]. Compared to the previous joint learning model (PILOT) that used a single end-to-end network [29], an unrolled network can lead to a more accurate reconstruction [34].

A typical cost function for regularized MR image reconstruction has the form:
$$ \hat{x} = \arg\min_x \|Ax - y\|^2 + R(x). \tag{4} $$
The first term is usually called the data-consistency term; it ensures the reconstructed image is consistent with the acquired k-space data $y$. (In the training phase, $A(\omega)x + \varepsilon$ is the simulated $y$.) The regularization term $R(\cdot)$ is designed to control aliasing and noise when the data is under-sampled. By introducing an auxiliary variable $z$, we replace (4) with the following alternative:
$$ \hat{x} = \arg\min_x \min_z \|Ax - y\|^2 + R(z) + \mu \|x - z\|^2, \tag{5} $$
where $\mu > 0$ is a penalty parameter. Using an alternating minimization approach, the optimization updates become:
$$ x_{i+1} = \arg\min_x \|Ax - y\|^2 + \mu \|x - z_i\|^2, \tag{6} $$
$$ z_{i+1} = \arg\min_z R(z) + \mu \|x_{i+1} - z\|^2. \tag{7} $$
The analytical solution for the $x$ update is
$$ x_{i+1} = (A'A + \mu I)^{-1} (A'y + \mu z_i), $$
but the direct inverse is useful only for single-coil Cartesian sampling. Instead, we use a few iterations of the conjugate gradient (CG) method for the $x$ update, applying the Toeplitz embedding technique to accelerate computation of $A'A$ [38]. For a mathematically defined regularizer, the $z$ update would be a proximal operator. Here we follow previous work [39] and use a CNN-based denoiser $z_{i+1} = D_\theta(x_{i+1})$.
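One outer iteration of the alternating updates (6)–(7) can be sketched numerically as below. This is a hedged illustration: `A` is a dense matrix standing in for the SENSE/NUFFT operator, and `denoise` is a trivial placeholder for the CNN denoiser $D_\theta$, not the paper's DIDN:

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

def denoise(x):
    # Placeholder denoiser: a 3-tap moving average (illustration only,
    # standing in for the learned CNN denoiser D_theta).
    return np.convolve(x, np.ones(3) / 3, mode='same')

def unrolled_block(A, y, z, mu, n_cg=5):
    """One outer iteration: CG solve of (A'A + mu I) x = A'y + mu z
    (the data-consistency update (6)), then a denoiser step (7)."""
    n = A.shape[1]
    normal = LinearOperator(
        (n, n),
        matvec=lambda v: A.conj().T @ (A @ v) + mu * v,
        dtype=complex)
    rhs = A.conj().T @ y + mu * z
    x, _ = cg(normal, rhs, maxiter=n_cg)   # few CG iterations, as in the paper
    return x, denoise(x)
```

In practice the normal operator `A'A` would be applied via Toeplitz embedding rather than explicit matrices, and the CG loop would be unrolled inside an autodiff framework so gradients flow to both $\theta$ and $\omega$.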
To minimize memory usage and avoid over-fitting, we used the same $\theta$ across iterations, though iteration-specific networks may improve the reconstruction result [37].

For the CNN-based denoiser, we used the Deep Iterative Down-Up CNN (DIDN) [40]. As a state-of-the-art model for image denoising, the DIDN model uses less memory than popular models like U-net [30] with improved reconstruction results. In our experiments, it also led to faster convergence of training than previous denoising networks.

Since neural networks are sensitive to the scale of the input, a good and consistent initial estimate of $x$ is important. We used the following quadratic roughness penalty approach to compute an initial image estimate:
$$ x_0 = \arg\min_x \|Ax - y\|^2 + \lambda \|Rx\|^2 = (A'A + \lambda R'R)^{-1} A'y, \tag{8} $$
where $R$ denotes the $N_d$-dimensional first-order finite difference (roughness) operator. We also used the CG method to (approximately) solve this quadratic minimization problem.

D. Correction of eddy-current effect
Rapidly changing gradient waveforms may suffer from eddy-current effects, even with shielded coils. This hardware imperfection requires additional measurements and corrections so that the actual sampling trajectory is used for reconstructing real MRI data. Some previous works used a field probe and a corresponding gradient impulse response function (GIRF) model [41]. In this work, we adopted the 'k-space mapping' method [8], [42] that does not require additional hardware. Rather than mapping $k_x$ and $k_y$ separately as in previous papers, we excite a pencil-beam region using one 90° flip and a subsequent 180° spin-echo pulse [43]. We averaged multiple repetitions to estimate the actual acquisition trajectory. We also subtracted a zero-order eddy current phase term from the acquired data [8].

III. EXPERIMENTS
A. Comparison with prior art
We compared the proposed BJORK approach with the SPARKLING method for trajectory design in all experiments. The initial trajectories were identical for both BJORK and SPARKLING. SPARKLING used the default multi-level optimization strategy and parameter settings, as detailed in [44]. (The code will be available on Github if accepted.)

Both BJORK and PILOT [29] are methods for joint sampling design and reconstruction optimization. We compared three key differences between the two methods individually: (1) the accuracy of the NUFFT Jacobian matrices, as discussed in the Supplementary Materials; (2) the reconstruction method involved: our BJORK approach uses an unrolled neural network, while PILOT uses a single end-to-end reconstruction neural network in the image domain; and (3) the effect of trajectory parameterization.
To evaluate the reconstruction quality provided by different trajectories, we used two types of reconstruction methods in the test phase: an unrolled neural network (UNN) (with learned $\theta$) and compressed sensing (sparsity regularization for an orthogonal wavelet transform). For SPARKLING and standard undersampled trajectories, the unrolled neural networks $\theta$ for reconstruction were trained with the same hyper-parameters as the proposed approach. The only difference is that BJORK learns a trajectory simultaneously whereas SPARKLING uses a trajectory based on a specified sampling density. We also used compressed sensing-based reconstruction to test the generalizability of different kinds of trajectories. The penalty function is the $\ell_1$ norm of the orthogonal wavelet transform with a Daubechies 4 wavelet. The ratio between the penalty term and the data-fidelity term is 1e-7. We used the SigPy package and its default primal-dual hybrid gradient (PDHG) algorithm. This study used two evaluation metrics: the structural similarity metric (SSIM) and peak signal-to-noise ratio (PSNR) [45].

C. Trajectories
To demonstrate the proposed model's adaptability, we investigated two types of trajectory initializations: an undersampled in-out radial trajectory with a shorter readout time (≈ 2.6 ms) and an undersampled spiral trajectory with a longer readout time (≈ 16 ms). For the radial initialization, the number of spokes is 16, and each spoke has 640 acquisition points. The rotation angles are equidistant between −π/2 and π/2. For the spiral initialization, the number of shots is 8, and each leaf has around 4000 points. We used the variable-density spiral design package from [46].

For both simulation and real acquisition, the dwell time for both waveforms and data sampling is set to 4 µs, with a field-of-view (FOV) of 22 cm. The maximum gradient strength is 5 Gauss/cm, and the maximum slew rate is 15 Gauss/cm/ms.

D. Network training and hyper-parameter setting
The simulation experiments used the NYU fastMRI brain dataset to train the trajectories and neural networks [47]. The dataset consists of multiple contrasts, including T1w, T2w, and FLAIR. FastMRI's knee subset was also used in a separate training run to investigate the influence of training data on learned sampling patterns. The central region was cropped to exclude the noise in the air background. Sensitivity maps were estimated using the ESPIRiT method [48] with the central 24 phase-encoding lines, and the corresponding conjugate phase reconstruction was regarded as the ground truth $x$.

The batch size was 4. The number of blocks, or the number of outer iterations for the unrolled neural network, was 6. The weight $\mu$ in (5) could also be learned, but doing so would double the computation load with minor improvement; in the current setting, $\mu$ was set to 2. The number of training epochs was set to 10 for each level of B-spline kernel length. We used 4 levels of optimization, and the total number of epochs was 40. For training the networks with existing trajectories (radial, spiral, and SPARKLING), we also used 40 training epochs. We used the Adam optimizer [49] for both the trajectory $\omega$ and the network parameters $\theta$. The learning rate decayed linearly from 1e-3 to 0 for the trajectory update, and from 1e-5 to 0 for the network update. We did not observe obvious over-fitting on the validation set.

(The latest version of PILOT on arXiv [29, version 4] also uses trajectory parameterization and multi-level optimization, focusing on long readout cases. SigPy: https://github.com/mikgroup/sigpy; variable-density spiral package: https://mrsrl.stanford.edu/~brian/vdspiral/)

Fig. 2. PSFs of different sampling patterns. The plots show the magnitude of the row across the zero point.
TABLE I
PROTOCOLS FOR DATA ACQUISITION

Protocols for the phantom experiment:
Name         FOV (cm)  dz (mm)  Gap (mm)  TR (ms)  TE (ms)  FA   Acqs      dt (us)  Time
Radial-like  22*22*5   2        0.5       13.8     2.24     15°  16*640    4        0:05
Radial-full  22*22*5   2        0.5       13.8     2.24     15°  320*640   4        1:30

Protocols for the in-vivo experiment:
Name         FOV (cm)  dz (mm)  Gap (mm)  TR (ms)  TE (ms)  FA   Acqs      dt (us)  Time
Radial-like  22*22*4   2        0.5       318.4    3.56     90°  16*1280   4        0:05
Radial-full  22*22*4   2        0.5       318.4    3.56     90°  320*1280  4        1:40

dz: slice thickness; Gap: gap between slices; Acqs: number of shots * readout points; FA: flip angle
E. Phantom and in-vivo Experiments
Table I details the scanning protocols of the RF-spoiled gradient-echo (GRE) sequences used. For in-vivo acquisitions,
Fig. 3. The dash-dot line shows the 180°-rotated BJORK trajectory. The original and rotated trajectories have little overlap, suggesting that BJORK automatically learned a sampling pattern that exploits k-space symmetry.

a fat-saturation pulse was applied before the tip-down RF pulse. For radial-like sequences, we tested a GRE sequence with 3 different readout trajectories: standard (golden-angle) undersampled radial, BJORK initialized with undersampled radial, and SPARKLING initialized with undersampled radial. Radial-full means the fully sampled radial trajectory. We also acquired an additional dual-echo Cartesian GRE image for generating the sensitivity maps and (potentially) the B0 map. The sensitivity maps were generated by the ESPIRiT method [48]. The sequences were programmed with TOPPE [43] and implemented on a GE MR750 3.0T scanner with a Nova Medical 32-channel Rx head coil. For in-vivo imaging, to increase the SNR of the acquired signal, the trajectory had a longer readout (≈ 5 ms, 1280 sampling points), trained by the same procedure as in subsection III-D. Subjects gave informed consent under local IRB approval. For phantom experiments, we used a water phantom with 3 internal cylinders.

The k-space mapping was implemented on a water phantom. The thickness of the pencil-beam was mm × mm. The trajectory estimates were based on an average of 30 repetitions.

IV. RESULTS
1) Quantitative results:
The test set includes 950 slices. Table II shows the quantitative results (SSIM and PSNR). The proposed method shows statistically significant improvement compared with un-optimized trajectories. It also has improved reconstruction quality compared with SPARKLING when considering unrolled neural network-based reconstruction. For
CS-based reconstruction of radial-like sequences, the results for the SPARKLING trajectory were slightly better than those for the BJORK-optimized trajectory (which was not tuned for CS-based reconstruction). The reason is that CS-based reconstruction using the BJORK trajectory may have some streak artifacts that can be resolved by neural network-based reconstruction, but that pose a challenge to CS-based reconstruction. Supplementary Materials include several examples of the UNN-based reconstruction of different trajectories. Compared to the undersampled radial trajectory or the SPARKLING trajectory, the proposed method better restores details and produces lower levels of artifacts.

Fig. 2 displays point spread functions of radial-like trajectories. The BJORK PSF has a narrower central lobe than SPARKLING and much weaker streak artifacts than standard radial. Fig. 3 shows the conjugate symmetry relationship implicitly learned in the BJORK trajectory.
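PSF profiles like those in Fig. 2 can be computed by applying the adjoint transform to all-ones k-space data. The sketch below uses a slow direct NDFT for clarity (a NUFFT would be used in practice); the FOV and grid size are assumptions for illustration:

```python
import numpy as np

def psf_profile(k, fov=22.0, n=128):
    """Magnitude of the PSF along the central row of an n x n grid.

    k: (n_samples, 2) sampling locations in cycles/cm.
    Computed as the adjoint NDFT of all-ones data, normalized so a
    single DC sample yields a flat profile of 1.
    """
    x = (np.arange(n) - n // 2) * fov / n      # spatial locations in cm
    r = np.stack([x, np.zeros(n)], axis=-1)    # central row: y = 0
    phase = 2j * np.pi * (r @ k.T)             # (n, n_samples) phase matrix
    return np.abs(np.exp(phase).sum(axis=1)) / len(k)
```

Comparing the central-lobe width and the sidelobe level of such profiles across trajectories is exactly the comparison made qualitatively in Fig. 2.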
TABLE II
QUANTITATIVE RESULTS FOR SIMULATION EXPERIMENTS

SSIM (radial-like):   Standard  SPARKLING  BJORK
UNN                   –         0.958      0.963
CS                    0.911     0.958      0.924

PSNR in dB (radial-like):  Standard  SPARKLING  BJORK
UNN                        –         35.5       36.3
CS                         33.0      38.5       34.7
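For reference, the PSNR values reported above can be computed as sketched below, taking the peak as the maximum magnitude of the reference image (one common convention; SSIM is available off the shelf, e.g. `skimage.metrics.structural_similarity`):

```python
import numpy as np

def psnr(ref, test):
    """Peak signal-to-noise ratio in dB between a reference image and a
    reconstruction; the peak is max |ref| (an assumed convention)."""
    mse = np.mean(np.abs(ref - test) ** 2)
    peak = np.max(np.abs(ref))
    return 10 * np.log10(peak ** 2 / mse)
```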
2) Multi-level optimization:
Fig. 4 shows the evolution of sampling patterns using our proposed multi-level optimization. Different widths of the B-spline kernels introduce different levels of improvement as the acquisition is optimized. Also shown are the results of multi-level optimization and of a non-parametric trajectory as used in the first PILOT paper [29, version 1]. Directly optimizing sampling points seems only to introduce a small perturbation to the initialization.
3) Effect of training set:
Fig. 5 shows radial-initialized trajectories trained by BJORK with brain and knee datasets. Different trajectories are learned from different datasets. We hypothesize that the difference is related to the frequency distribution of energy, as well as the noise level, which requires further study.
4) Effect of reconstruction methods:
To test the influence of reconstruction methods on trajectory optimization, we tried a single image-domain refinement network as the reconstruction method in the joint learning model, similar to PILOT's approach. The quadratic roughness penalty reconstruction in (8) is still the network's input. The initialization of the sampling pattern is an undersampled radial trajectory. The proposed reconstruction method (unrolled neural network, UNN) improves reconstruction quality compared to a single end-to-end model, as shown in Table III. Such improvements are consistent with other comparisons between UNN methods and image-domain CNN methods using non-learned sampling patterns [34], [35], [37].
TABLE III
EFFECT OF DIFFERENT RECONSTRUCTION NETWORKS INVOLVED IN THE JOINT LEARNING MODEL

              SSIM   PSNR (dB)
UNN           –      –
Single U-net  0.933  32.7
5) Phantom and in-vivo Experiments:
Fig. 6 shows water phantom results for different reconstruction algorithms. The rightmost column is the fully-sampled ground truth (Radial-full). Note that the unrolled neural network (UNN) here was trained with the fastMRI brain dataset and did not receive fine-tuning on the phantom dataset. The BJORK-optimized trajectory leads to fewer artifacts and improved contrast for the UNN-based reconstruction.

Fig. 7 showcases one slice from the in-vivo experiment. For CS-based reconstruction, the undersampled radial trajectory exhibits stronger streak artifacts. The SPARKLING image has higher levels of ringing artifacts. CS reconstruction with the BJORK sampling pattern has less obvious artifacts, but the tissue contrast is not as good as SPARKLING's. For UNN-based reconstruction, all trajectories' results show reductions of artifacts compared to CS-based reconstruction. The proposed method restores most of the structures and fine details, with minimal remaining artifacts. The contrast between grey and white matter seems suboptimal for all three methods. The reason may be that the acceleration ratio (20×) is too high for recovering high-frequency details. Also, the short acquisition time (≈ 5 ms) and the GRE signal characteristics limit the SNR of the acquisition.

Supplementary Materials contain examples of reconstruction results before/after correction, and the measurement of actual trajectories.

V. DISCUSSION
This paper proposes an efficient learning-based framework for the joint design of MRI sampling trajectories and reconstruction parameters. Defining an appropriate objective function for trajectory optimization is an open question. We circumvented this long-standing problem by directly using the reconstruction quality as the training loss function in a supervised manner. The workflow includes a differentiable reconstruction algorithm for which the learning process obtains an intermediate gradient w.r.t. the reconstruction loss. However, solely depending on stochastic gradient descent cannot guarantee optimal results for this non-convex problem. To improve training, we adopted several techniques, including trajectory parameterization, multi-level training, warm initialization of the reconstruction network, and an accurate approximation of the NUFFT's Jacobian. Results show that these approaches can stabilize the training and avoid sub-optimal local minima, or at least provide better local minima than previous methods.

We constructed an unrolled neural network-based reconstruction for non-Cartesian MRI data. In previous work, a single end-to-end network approach could not efficiently remove aliasing artifacts. Additionally, the k-space "hard" data-consistency trick for data fidelity [50], [51] is inapplicable
Fig. 4. The evolution of the learned trajectories. Decim means $N_s/L$ in (3). Nonparametric means the locations of the sampling points are independent trainable variables, rather than being parameterized by quadratic B-spline kernels. The rightmost zoomed-in set shows the very small perturbations produced by the nonparametric approach.

Fig. 5. Trajectories learned from different datasets.

for non-Cartesian sampling. An unrolled algorithm can reach a balance between data fidelity and the de-aliasing effect across multiple iterations. For 3D trajectory design using the proposed approach, the unrolled method's memory consumption can be very large. More memory-efficient reconstruction models, such as the memory-efficient network [52], can be explored in further study.

For learning-based medical imaging algorithms, one main obstacle to clinical application is the gap between simulation and the physical world. Some factors include the following.

First, inconsistency exists between the training datasets and real-world acquisition, such as different vendors and protocols, posing a challenge to reconstruction algorithms' robustness and generalizability. Our training dataset consisted of T1w/T2w/FLAIR Fast Spin-Echo (FSE or TSE) sequences, acquired on Siemens 1.5T/3.0T scanners. The number of receiver channels includes 4, 8, and 16, etc. We conducted the in-vivo/phantom experiments on a 3.0T GE scanner equipped with a 32-channel coil. The sequence is a T1w GRE sequence that has lower SNR compared to the FSE sequences in the training set. Despite the very large differences from the training set, our work still demonstrated improved and robust results in the in-vivo and phantom experiments. We hypothesize that several factors could contribute to these results: (1) the reconstruction network uses the quadratic roughness penalized reconstruction as the initialization, normalized by the median value. Previous works typically use the adjoint reconstruction as the input of the network.
In comparison, our regularized initialization helps provide consistency between different protocols without compromising computation time or complexity much; (2) the PSF of the learned trajectory has a compact central lobe without large streak artifacts. Thus the reconstruction is essentially a de-blurring/denoising task, a local low-level problem that may require less training data than de-aliasing problems. For de-blurring of natural images, networks are usually adaptive to different noise levels and color spaces, and require small cohorts of data [53], [54]. In contrast, a CNN applied to an adjoint reconstruction must remove global aliasing artifacts, such as the streak and ringing artifacts of trajectories like radial and SPARKLING. How a neural network resolves such artifacts is still an open question, and the training may require a large amount of diverse data.

Second, it is not easy to simulate system imperfections like eddy currents and off-resonance in the training phase, yet these imperfections can greatly affect image quality in practice. We used a trajectory measurement method to correct for the eddy-current effect. Future work will incorporate field inhomogeneity into the workflow.

Furthermore, even though the BJORK sampling was optimized for a UNN reconstruction method, the results in Fig. 6 and Fig. 7 suggest that the learned trajectory is also useful with a CS-based reconstruction method or other model-based reconstruction algorithms. This approach can still noticeably improve the image quality by simply replacing the readout waveform in an existing workflow, which broadens the applicability of the proposed approach.
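As a concrete sketch of the initialization in point (1), the quadratic-roughness-penalized reconstruction normalized by its median value, the following toy uses a small dense matrix in place of the NUFFT system model. The function name, the first-difference roughness operator, and all parameter values here are our own illustrative choices, not the paper's implementation:

```python
import numpy as np

def quad_roughness_init(A, y, lam=0.1, n_iter=50):
    """Solve (A'A + lam * D'D) x = A'y by conjugate gradients, where D is a
    first-difference (roughness) operator, then normalize the result by its
    median magnitude. A small dense A stands in for the NUFFT system model."""
    n = A.shape[1]
    D = np.eye(n) - np.eye(n, k=1)          # 1D first differences
    H = A.conj().T @ A + lam * (D.T @ D)    # Hermitian positive definite
    b = A.conj().T @ y
    x = np.zeros(n, dtype=complex)
    r = b - H @ x
    p = r.copy()
    for _ in range(n_iter):                 # plain conjugate gradients
        Hp = H @ p
        alpha = (r.conj() @ r) / (p.conj() @ Hp)
        x = x + alpha * p
        r_new = r - alpha * Hp
        if np.linalg.norm(r_new) < 1e-10 * np.linalg.norm(b):
            break
        beta = (r_new.conj() @ r_new) / (r.conj() @ r)
        p = r_new + beta * p
        r = r_new
    return x / np.median(np.abs(x))

# Toy usage: 12 "k-space" samples of a 10-voxel complex image
rng = np.random.default_rng(0)
A = rng.standard_normal((12, 10)) + 1j * rng.standard_normal((12, 10))
y = A @ (rng.standard_normal(10) + 1j * rng.standard_normal(10))
x0 = quad_roughness_init(A, y)
print(np.median(np.abs(x0)))  # ~1.0 by construction of the normalization
```

In practice the dense solve would be replaced by NUFFT-based forward/adjoint operators, but the structure (regularized least-squares initialization, then median normalization) is the same.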
We plan to apply the general framework to optimize a trajectory for (convex) CS-based reconstruction and compare it to the (non-convex) open-loop UNN approach in future work.

Though the proposed trajectory is learned via a data-driven approach, it also reflects the ideas behind SPARKLING and Poisson disk sampling: sampling patterns having large gaps or tight clusters of points are inefficient, and the sampling points should be somewhat evenly distributed (but not too uniform). Furthermore, our method appears to have learned some latent characteristics, like the conjugate symmetry of these spin-echo training datasets. To combine the strengths of both methods, a promising future direction is to use SPARKLING as a primed initialization of BJORK.

The network is trained on a large public dataset. In the experiment, knee imaging and brain imaging led to different learned trajectories, demonstrating that the dataset can greatly influence the optimization results. We also implemented a complementary experiment on a smaller training set (results not shown). We found that a small subset (3000 slices) also
led to similar learned trajectories. Therefore, for organs for which a sizeable dataset is not publicly available, this approach may still work with small-scale private datasets. To examine the influence of scanner models, field strengths, and sequences, future studies should investigate more diverse datasets.

Fig. 6. Representative results of the phantom experiment using CS-based and UNN-based reconstruction algorithms. The sequences involved were radial-like GRE (detailed in Table I). The readout length was 2.56 ms, and we used 16/320 spokes for the undersampled (Radial-Under, SPARKLING, BJORK) trajectories and the fully sampled radial trajectory, respectively.

Fig. 7. Results of the in-vivo experiment. The trajectories were also radial-like (detailed in Table I). The readout time was 5.12 ms. The number of shots was 16 for the undersampled trajectories and 320 for the fully sampled radial trajectory (20× acceleration). The FOV was 22 cm.

The eddy-current effect poses a long-standing problem for non-Cartesian trajectories and impedes their widespread clinical usage. This work used a simple k-space mapping technique as the correction method. The downside of this method is its long calibration time, although it can be performed during the scanner's idle time. The method is also waveform-specific, so the correction must be repeated for each trajectory. Other methods relying on field probes can obtain a more accurate correction in less time; however, that approach demands dedicated hardware. In a future study, eddy current-related artifacts could be simulated according to the GIRF model in the training phase, so that the learned trajectory is robust to the eddy-current effect.

Aside from practical challenges with GPU memory, the general approach described here is readily extended from 2D to 3D sampling trajectories.
A more challenging future direction is to extend the work to dynamic imaging applications such as fMRI and cardiac imaging, where both the sampling pattern and the reconstruction method should exploit redundancies in the time dimension, e.g., using low-rank models [55].

ACKNOWLEDGMENT
The authors thank Dr. Melissa Haskell and Naveen Murthy from the University of Michigan for helpful advice and fruitful discussions.

REFERENCES

[1] P. C. Lauterbur, "Image formation by induced local interactions: examples employing nuclear magnetic resonance," Nature, vol. 242, no. 5394, pp. 190–191, 1973.
[2] C. B. Ahn, J. H. Kim, and Z. H. Cho, "High-speed spiral-scan echo planar NMR imaging-I," IEEE Trans. Med. Imag., vol. 5, no. 1, pp. 2–7, Mar. 1986.
[3] D. J. Larkman and R. G. Nunes, "Parallel magnetic resonance imaging," Phys. Med. Biol., vol. 52, no. 7, pp. R15–R55, Mar. 2007.
[4] Z. Wang and G. R. Arce, "Variable density compressed image sampling," IEEE Trans. Image Proc., vol. 19, no. 1, pp. 264–270, Jan. 2010.
[5] F. Knoll, C. Clason, C. Diwoky, and R. Stollberger, "Adapted random sampling patterns for accelerated MRI," Magn. Reson. Mater. Phys. Biol. Med., vol. 24, no. 1, pp. 43–50, Feb. 2011.
[6] M. Seeger, H. Nickisch, R. Pohmann, and B. Schölkopf, "Optimization of k-space trajectories for compressed sensing by Bayesian experimental design," Magn. Reson. Med., vol. 63, no. 1, pp. 116–126, 2010.
[7] N. Chauffert, P. Ciuciu, and P. Weiss, "Variable density compressed sensing in MRI. Theoretical vs heuristic sampling strategies," Apr. 2013, pp. 298–301.
[8] R. K. Robison, Z. Li, D. Wang, M. B. Ooi, and J. G. Pipe, "Correction of B0 eddy current effects in spiral MRI," Magn. Reson. Med., vol. 81, no. 4, pp. 2501–2513, 2019.
[9] J. H. Lee, B. A. Hargreaves, B. S. Hu, and D. G. Nishimura, "Fast 3D imaging using variable-density spiral trajectories with applications to limb perfusion," Magn. Reson. Med., vol. 50, no. 6, pp. 1276–1285, 2003.
[10] S. Winkelmann, T. Schaeffter, T. Koehler, H. Eggers, and O. Doessel, "An optimal radial profile order based on the golden ratio for time-resolved MRI," IEEE Trans. Med. Imag., vol. 26, no. 1, pp. 68–76, Jan. 2007.
[11] B. Bilgic et al., "Wave-CAIPI for highly accelerated 3D imaging," Magn. Reson. Med., vol. 73, no. 6, pp. 2152–2162, 2015.
[12] A. Bilgin, T. Trouard, A. Gmitro, and M. Altbach, "Randomly perturbed radial trajectories for compressed sensing MRI," in Proc. Intl. Soc. Magn. Reson. Med. (ISMRM), vol. 16, 2008, p. 3152.
[13] M. Lustig, S. Kim, and J. M. Pauly, "A fast method for designing time-optimal gradient waveforms for arbitrary k-space trajectories," IEEE Trans. Med. Imag., vol. 27, no. 6, pp. 866–873, Jun. 2008.
[14] S. Sabat, R. Mir, M. Guarini, A. Guesalaga, and P. Irarrazaval, "Three dimensional k-space trajectory design using genetic algorithms," Magn. Reson. Imag., vol. 21, no. 7, pp. 755–764, Sep. 2003.
[15] C. Lazarus et al., "SPARKLING: variable-density k-space filling curves for accelerated T2*-weighted MRI," Magn. Reson. Med., vol. 81, no. 6, pp. 3643–3661, Jun. 2019.
[16] W. Fong and E. Darve, "The black-box fast multipole method," J. Comput. Phys., vol. 228, no. 23, pp. 8712–8725, Dec. 2009.
[17] N. Chauffert, P. Weiss, J. Kahn, and P. Ciuciu, "A projection algorithm for gradient waveforms design in magnetic resonance imaging," IEEE Trans. Med. Imag., vol. 35, no. 9, pp. 2026–2039, Sep. 2016.
[18] S. Elmalem, R. Giryes, and E. Marom, "Learned phase coded aperture for the benefit of depth of field extension," Opt. Exp., vol. 26, no. 12, pp. 15316–15331, Jun. 2018.
[19] I. A. M. Huijben, B. S. Veeling, K. Janse, M. Mischi, and R. J. G. van Sloun, "Learning sub-sampling and signal recovery with applications in ultrasound imaging," IEEE Trans. Med. Imag., vol. 39, no. 12, pp. 3955–3966, Dec. 2020.
[20] C. D. Bahadir, A. Q. Wang, A. V. Dalca, and M. R. Sabuncu, "Deep-learning-based optimization of the under-sampling pattern in MRI," IEEE Trans. Comput. Imag., vol. 6, pp. 1139–1152, 2020.
[21] I. A. M. Huijben, B. S. Veeling, and R. J. G. van Sloun, "Learning sampling and model-based signal recovery for compressed sensing MRI," May 2020, pp. 8906–8910.
[22] T. Sanchez et al., "Scalable learning-based sampling optimization for compressive dynamic MRI," May 2020, pp. 8584–8588.
[23] M. V. Zibetti, G. T. Herman, and R. R. Regatte, "Fast data-driven learning of MRI sampling pattern for large scale problems," 2020. [Online]. Available: http://arxiv.org/abs/2011.02322
[24] C. Qian, Y. Yu, and Z.-H. Zhou, "Subset selection by Pareto optimization," in Proc. Intl. Conf. Neur. Info. Proc. Sys. (NeurIPS), ser. NIPS'15, Dec. 2015, pp. 1774–1782.
[25] K. H. Jin, M. Unser, and K. M. Yi, "Self-supervised deep active accelerated MRI," 2020. [Online]. Available: http://arxiv.org/abs/1901.04547
[26] D. Zeng, C. Sandino, D. Nishimura, S. Vasanawala, and J. Cheng, "Reinforcement learning for online undersampling pattern optimization," in Proc. Intl. Soc. Magn. Reson. Med. (ISMRM), 2019, p. 1092.
[27] F. Sherry et al., "Learning the sampling pattern for MRI," 2020. [Online]. Available: http://arxiv.org/abs/1906.08754
[28] H. K. Aggarwal and M. Jacob, "Joint optimization of sampling patterns and deep priors for improved parallel MRI," May 2020, pp. 8901–8905.
[29] T. Weiss, O. Senouf, S. Vedula, O. Michailovich, M. Zibulevsky, and A. Bronstein, "PILOT: physics-informed learned optimal trajectories for accelerated MRI," 2019. [Online]. Available: http://arxiv.org/abs/1909.05773
[30] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: convolutional networks for biomedical image segmentation," in Med. Imag. Comput. and Comput.-Assist. Interv. (MICCAI), 2015, pp. 234–241.
[31] K. P. Pruessmann, M. Weiger, P. Börnert, and P. Boesiger, "Advances in sensitivity encoding with arbitrary k-space trajectories," Magn. Reson. Med., vol. 46, no. 4, pp. 638–651, 2001.
[32] S. Hao, J. A. Fessler, D. C. Noll, and J.-F. Nielsen, "Joint design of excitation k-space trajectory and RF pulse for small-tip 3D tailored excitation in MRI," IEEE Trans. Med. Imag., vol. 35, no. 2, pp. 468–479, Feb. 2016.
[33] J.-F. Nielsen, H. Sun, J. A. Fessler, and D. C. Noll, "Improved gradient waveforms for small-tip 3D spatially tailored excitation using iterated local search," in Proc. Intl. Soc. Magn. Reson. Med. (ISMRM), 2016, p. 1013.
[34] H. K. Aggarwal, M. P. Mani, and M. Jacob, "MoDL: model-based deep learning architecture for inverse problems," IEEE Trans. Med. Imag., vol. 38, no. 2, pp. 394–405, Feb. 2019.
[35] Y. Yang, J. Sun, H. Li, and Z. Xu, "Deep ADMM-Net for compressive sensing MRI," in Proc. of the 30th Intl. Conf. on Neur. Info. Proc. Sys. (NIPS), Dec. 2016, pp. 10–18.
[36] K. Hammernik et al., "Learning a variational network for reconstruction of accelerated MRI data," Magn. Reson. Med., vol. 79, no. 6, pp. 3055–3071, 2018.
[37] J. Schlemper, C. Qin, J. Duan, R. M. Summers, and K. Hammernik, "Sigma-net: ensembled iterative deep neural networks for accelerated parallel MR image reconstruction," 2020. [Online]. Available: http://arxiv.org/abs/1912.05480
[38] J. A. Fessler, S. Lee, V. T. Olafsson, H. R. Shi, and D. C. Noll, "Toeplitz-based iterative image reconstruction for MRI with correction for magnetic field inhomogeneity," IEEE Trans. Sig. Proc., vol. 53, no. 9, pp. 3393–3402, Sep. 2005.
[39] K. Gregor and Y. LeCun, "Learning fast approximations of sparse coding," in Proc. of the 27th Intl. Conf. on Mach. Learn. (ICML), Jun. 2010, pp. 399–406.
[40] S. Yu, B. Park, and J. Jeong, "Deep iterative down-up CNN for image denoising," in Proc. of the IEEE Conf. on Comput. Vis. and Patt. Recog. Work. (CVPRW), 2019.
[41] S. J. Vannesjo et al., "Gradient system characterization by impulse response measurements with a dynamic field camera," Magn. Reson. Med., vol. 69, no. 2, pp. 583–593, 2013.
[42] J. H. Duyn, Y. Yang, J. A. Frank, and J. W. van der Veen, "Simple correction method for k-space trajectory deviations in MRI," J. Magn. Reson., vol. 132, pp. 150–153, May 1998.
[43] J.-F. Nielsen and D. C. Noll, "TOPPE: a framework for rapid prototyping of MR pulse sequences," Magn. Reson. Med., vol. 79, no. 6, pp. 3128–3134, 2018.
[44] C. Lazarus, "Compressed sensing in MRI: optimization-based design of k-space filling curves for accelerated MRI," Ph.D. dissertation, Université Paris Saclay (COmUE), Sep. 2018.
[45] A. Horé and D. Ziou, "Image quality metrics: PSNR vs. SSIM," in Intl. Conf. on Patn. Recog. (ICPR), Aug. 2010, pp. 2366–2369.
[46] J. H. Lee, B. A. Hargreaves, B. S. Hu, and D. G. Nishimura, "Fast 3D imaging using variable-density spiral trajectories with applications to limb perfusion," Magn. Reson. Med., vol. 50, no. 6, pp. 1276–1285, 2003.
[47] J. Zbontar et al., "fastMRI: an open dataset and benchmarks for accelerated MRI," 2018. [Online]. Available: http://arxiv.org/abs/1811.08839
[48] M. Uecker et al., "ESPIRiT - an eigenvalue approach to autocalibrating parallel MRI: where SENSE meets GRAPPA," Magn. Reson. Med., vol. 71, no. 3, pp. 990–1001, Mar. 2014.
[49] D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization," 2017. [Online]. Available: http://arxiv.org/abs/1412.6980
[50] M. Mardani et al., "Deep generative adversarial neural networks for compressive sensing MRI," IEEE Trans. Med. Imag., vol. 38, no. 1, pp. 167–179, 2018.
[51] J. Schlemper, J. Caballero, J. V. Hajnal, A. N. Price, and D. Rueckert, "A deep cascade of convolutional neural networks for dynamic MR image reconstruction," IEEE Trans. Med. Imag., vol. 37, no. 2, pp. 491–503, Feb. 2018.
[52] S. C. van de Leemput, J. Teuwen, B. van Ginneken, and R. Manniesing, "MemCNN: a Python/PyTorch package for creating memory-efficient invertible neural networks," J. Open Source Softw., vol. 4, no. 39, p. 1576, Jul. 2019.
[53] S. Nah et al., "NTIRE 2020 challenge on image and video deblurring," in Proc. of the IEEE Conf. on Comput. Vis. and Patt. Recog. Work. (CVPRW), 2020, pp. 416–417.
[54] A. Lugmayr et al., "NTIRE 2020 challenge on real-world image super-resolution: methods and results," in Proc. of the IEEE Conf. on Comput. Vis. and Patt. Recog. Work. (CVPRW), 2020, pp. 494–495.
[55] M. Jacob, M. P. Mani, and J. C. Ye, "Structured low-rank algorithms: theory, MR applications, and links to machine learning," IEEE Sig. Proc. Mag., vol. 37, no. 1, pp. 54–68, Jan. 2020.
[56] J. A. Fessler, "Model-based image reconstruction for MRI," IEEE Sig. Proc. Mag., vol. 27, no. 4, pp. 81–89, 2010.
[57] J. A. Fessler and B. P. Sutton, "Nonuniform fast Fourier transforms using min-max interpolation," IEEE Trans. Sig. Proc., vol. 51, no. 2, pp. 560–574, Feb. 2003.
[58] B. Dale, M. Wendt, and J. L. Duerk, "A rapid look-up table method for reconstructing MR images from arbitrary k-space trajectories," IEEE Trans. Med. Imag., vol. 20, no. 3, pp. 207–217, 2001.
[59] M. J. Muckley, R. Stern, T. Murrell, and F. Knoll, "TorchKbNufft: a high-level, hardware-agnostic non-uniform fast Fourier transform," in ISMRM Workshop on Data Sampling & Image Reconstruction, 2020.
APPENDIX
A. Experiments
1) Eddy-current effect:
Fig. S1 displays the CS-based reconstruction of real acquisitions according to the nominal/measured trajectories. Fig. S2 shows the measurement results of the trajectories. Using the measurement of the actual trajectory seems to mitigate the influence of eddy-current effects in the reconstruction results.
2) Simulation Experiments:
Fig. S3 shows examples of the UNN-based reconstruction for different trajectories in the simulation experiment. Compared with the undersampled radial trajectory or the SPARKLING trajectory, the proposed BJORK trajectory leads to a lower level of artifacts and better restoration of fine structures.
Fig. S1. CS-based reconstruction of a water phantom. The left column is the reconstruction with the nominal trajectory, and the right is with the measured trajectory. Reconstruction with the mapped trajectory introduces fewer artifacts.
B. Efficient backpropagation of NUFFT operators
This section describes an efficient approximation for the Jacobian of the NUFFT operation used in the MRI system model for non-Cartesian sampling. This approximation facilitates fast and accurate backpropagation methods for learning non-Cartesian k-space trajectories.

Consider the (single-coil) MRI measurement model [56]
$$ y = A x + \varepsilon, \qquad y \in \mathbb{C}^M, \; x \in \mathbb{C}^N, \; A \in \mathbb{C}^{M \times N}, $$
where $A = A(\omega)$ has elements $a_{ij} = e^{-\imath \vec{\omega}_i \cdot \vec{r}_j}$ for $\vec{\omega}_i \in \mathbb{R}^D$ and $\vec{r}_j \in \mathbb{R}^D$, where $D \in \{1, 2, 3\}$ denotes the image dimension, $\omega = [\omega^{[1]} \; \omega^{[2]} \; \cdots \; \omega^{[D]}]$ is the $M \times D$ array of all the k-space sampling locations, and $\omega^{[d]} \in \mathbb{R}^M$ denotes its $d$th column. $N$ denotes the number of voxels, and $M$ the number of k-space samples. Typically $A$ is approximated by a NUFFT operator [57].

For simplicity, consider the training loss of trajectory optimization for a single image (or a mini-batch):
$$ \mathcal{L}(\omega) = \left\| f_{\theta}(\omega; A(\omega) x + \varepsilon) - x \right\|, $$
where $x \in \mathbb{C}^N$. Note that $\mathcal{L} : \mathbb{R}^{M \times D} \mapsto \mathbb{R}$, so $\nabla \mathcal{L} \in \mathbb{R}^{M \times D}$. The update w.r.t. $\mathcal{L}(\omega)$ involves the gradients w.r.t. the NUFFT operator $A(\omega)$ and its adjoint. Previous works [29] propagate the gradient calculations through all of the NUFFT steps using the chain rule, which is potentially slow and can introduce error because of the interpolation operators. Here we investigate a different approach: we analyze on paper the gradient using the exact (slow) Fourier transform expression and then implement that expression using (fast) NUFFT approximations. This approach leads to inexact gradient computations, but gradients are often inexact in machine learning problems anyway, due to operations like mini-batches.
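For concreteness, the exact (slow) system matrix above can be written in a few lines of NumPy. This toy is our own illustration of the model, not the paper's implementation, which replaces the dense matrix with a NUFFT approximation [57]:

```python
import numpy as np

def ndft_matrix(omega, r):
    """Exact (slow) system matrix A(omega), a_ij = exp(-1j * w_i . r_j).
    omega: (M, D) k-space sampling locations; r: (N, D) voxel coordinates."""
    return np.exp(-1j * omega @ r.T)

# Tiny 1D example (D = 1): M = 6 samples of an N = 8 voxel image.
rng = np.random.default_rng(0)
r = np.arange(8.0)[:, None]                 # voxel coordinates r_j
omega = rng.uniform(-np.pi, np.pi, (6, 1))  # sampling locations w_i
x = rng.standard_normal(8) + 1j * rng.standard_normal(8)
y = ndft_matrix(omega, r) @ x               # y = A(omega) x (noiseless)
print(y.shape)  # (6,)
```

The $O(MN)$ cost of this dense matrix is exactly why the NUFFT approximation, and hence an efficient way to backpropagate through it, is needed in practice.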
1) Forward operator:
For $x$: $\frac{\partial Ax}{\partial x} = A$. For the $d$th column of $\omega$:
$$ \left[ \frac{\partial Ax}{\partial \omega^{[d]}} \right]_{il} = \frac{\partial [Ax]_i}{\partial \omega^{[d]}_l} = \frac{\partial}{\partial \omega^{[d]}_l} \sum_{j=1}^{N} e^{-\imath \vec{\omega}_i \cdot \vec{r}_j} x_j = \begin{cases} -\imath \sum_{j=1}^{N} e^{-\imath \vec{\omega}_i \cdot \vec{r}_j} x_j r^{[d]}_j, & i = l, \\ 0, & \text{otherwise,} \end{cases} $$
for $i, l = 1, \ldots, M$. Denoting the point-wise vector product by $\odot$, the above summation equals the product of the $i$th row of $A$ with $x \odot r^{[d]}$. Thus the $M \times M$ Jacobian matrix for the partial derivatives of $Ax$ w.r.t. $\omega^{[d]}$ is:
$$ \frac{\partial Ax}{\partial \omega^{[d]}} = -\imath \, \mathrm{diag}\!\left\{ A \left( x \odot r^{[d]} \right) \right\}. \tag{S.1} $$
Thus, to apply this Jacobian to a (gradient) vector $v \in \mathbb{C}^M$, as needed in a back-propagation step, we simply compute $-\imath \left( A ( x \odot r^{[d]} ) \right) \odot v$, by applying a NUFFT operation $A$ to $x \odot r^{[d]}$.
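Expression (S.1) can be sanity-checked on a tiny 1D example by comparing it against central finite differences of the exact NDFT; this is an illustrative check of ours, not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 8, 6
r = np.arange(float(N))                     # 1D voxel coordinates r_j
omega = rng.uniform(-np.pi, np.pi, M)       # 1D sampling locations w_i
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

A = lambda w: np.exp(-1j * np.outer(w, r))  # exact NDFT matrix A(w)

# Analytic M x M Jacobian of A(w)x w.r.t. w, per (S.1): -i diag{A (x * r)}
J_analytic = -1j * np.diag(A(omega) @ (x * r))

# Independent check: central finite differences of A(w)x
eps = 1e-6
J_fd = np.zeros((M, M), dtype=complex)
for l in range(M):
    wp, wm = omega.copy(), omega.copy()
    wp[l] += eps
    wm[l] -= eps
    J_fd[:, l] = (A(wp) @ x - A(wm) @ x) / (2 * eps)

err = np.max(np.abs(J_analytic - J_fd))
print(err)  # small: the analytic Jacobian matches finite differences
```

The diagonal structure of the finite-difference Jacobian also confirms that perturbing $\omega_l$ affects only the $l$th k-space sample.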
2) Adjoint operator:
For $y$: $\frac{\partial A'y}{\partial y} = A'$. For the $d$th column of $\omega$, the $N \times M$ Jacobian matrix has elements:
$$ \left[ \frac{\partial A'y}{\partial \omega^{[d]}} \right]_{jl} = \frac{\partial [A'y]_j}{\partial \omega^{[d]}_l} = \frac{\partial \sum_{i=1}^{M} e^{\imath \vec{\omega}_i \cdot \vec{r}_j} y_i}{\partial \omega^{[d]}_l} = \imath \, e^{\imath \vec{\omega}_l \cdot \vec{r}_j} y_l \, r^{[d]}_j. $$

Fig. S2. The measured influence of the eddy currents on the readout waveform. The solid line is the nominal trajectory, and the dotted line is the measurement.
Thus the Jacobian matrix is
$$ \frac{\partial A'y}{\partial \omega^{[d]}} = \imath \, \mathrm{diag}\!\left\{ r^{[d]} \right\} A' \, \mathrm{diag}\{ y \}. \tag{S.2} $$
Applying this Jacobian to a vector $v \in \mathbb{C}^M$ involves an adjoint NUFFT $A'$ of $y \odot v$.
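A small script (again ours, for illustration) can confirm that applying (S.2) matrix-free, i.e., as i r ⊙ (A'(y ⊙ v)), matches multiplying by the explicit Jacobian, using the exact NDFT matrix in place of the NUFFT:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 8, 6
r = np.arange(float(N))
omega = rng.uniform(-np.pi, np.pi, M)
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)
v = rng.standard_normal(M) + 1j * rng.standard_normal(M)

Amat = np.exp(-1j * np.outer(omega, r))     # exact NDFT matrix

# Explicit N x M Jacobian of A'y w.r.t. w, per (S.2)
J = 1j * np.diag(r) @ Amat.conj().T @ np.diag(y)

# Matrix-free application used during backpropagation: one adjoint
# NDFT/NUFFT of the Hadamard product y * v, then scaling by r
Jv = 1j * r * (Amat.conj().T @ (y * v))

print(np.max(np.abs(J @ v - Jv)))  # ~0: the two computations agree
```

The matrix-free form is the one that matters in practice, since it costs only one adjoint NUFFT plus element-wise products.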
3) Frame operator (Gram matrix):
For $x$: $\frac{\partial A'Ax}{\partial x} = A'A$. For the $d$th column of $\omega$, the elements of the $N \times M$ Jacobian matrix are
$$ \left[ \frac{\partial A'Ax}{\partial \omega^{[d]}} \right]_{kl} = \frac{\partial [A'Ax]_k}{\partial \omega^{[d]}_l} = \frac{\partial \sum_{i=1}^{M} \sum_{j=1}^{N} e^{-\imath \vec{\omega}_i \cdot ( \vec{r}_j - \vec{r}_k )} x_j}{\partial \omega^{[d]}_l} = -\imath \sum_{j=1}^{N} e^{-\imath \vec{\omega}_l \cdot ( \vec{r}_j - \vec{r}_k )} x_j \left( r^{[d]}_j - r^{[d]}_k \right) = -\imath \, e^{\imath \vec{\omega}_l \cdot \vec{r}_k} \sum_{j=1}^{N} e^{-\imath \vec{\omega}_l \cdot \vec{r}_j} x_j r^{[d]}_j + \imath \, e^{\imath \vec{\omega}_l \cdot \vec{r}_k} r^{[d]}_k \sum_{j=1}^{N} e^{-\imath \vec{\omega}_l \cdot \vec{r}_j} x_j. $$
Thus the Jacobian matrix is:
$$ \frac{\partial A'Ax}{\partial \omega^{[d]}} = -\imath \, A' \, \mathrm{diag}\!\left\{ A \left( x \odot r^{[d]} \right) \right\} + \imath \, \mathrm{diag}\!\left\{ r^{[d]} \right\} A' \, \mathrm{diag}\{ Ax \}. \tag{S.3} $$
Again, we apply this Jacobian to a vector by combining NUFFT operations with Hadamard products.

TABLE S.1
COMPUTATION TIME

                      auto-diff   approx.
  Large image - GPU   1.31 s      0.91 s
  Small image - GPU   1.29 s      0.90 s
  Large image - CPU   4.32 s      0.74 s
  Small image - CPU   0.71 s      0.36 s

Large size: 320×320; small size: 40×40.
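The Gram Jacobian (S.3) can likewise be verified numerically on a small 1D problem. As before, this is an illustrative check of ours, using the exact NDFT matrix rather than a NUFFT:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 8, 6
r = np.arange(float(N))
omega = rng.uniform(-np.pi, np.pi, M)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

A = lambda w: np.exp(-1j * np.outer(w, r))  # exact NDFT matrix A(w)

# Analytic N x M Jacobian of A'Ax w.r.t. w, per (S.3)
Aw = A(omega)
J = (-1j * Aw.conj().T @ np.diag(Aw @ (x * r))
     + 1j * np.diag(r) @ Aw.conj().T @ np.diag(Aw @ x))

# Central finite differences of A(w)'A(w)x as an independent check
eps = 1e-6
J_fd = np.zeros((N, M), dtype=complex)
for l in range(M):
    wp, wm = omega.copy(), omega.copy()
    wp[l] += eps
    wm[l] -= eps
    J_fd[:, l] = (A(wp).conj().T @ (A(wp) @ x)
                  - A(wm).conj().T @ (A(wm) @ x)) / (2 * eps)

err = np.max(np.abs(J - J_fd))
print(err)  # small: (S.3) matches finite differences
```

Both terms of (S.3) reduce to NUFFT/adjoint-NUFFT calls and diagonal scalings, which is what makes the approximation fast in the trained pipeline.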
4) Validation:
For validation, we examined a simple test case with different calculation methods. We computed
$$ \frac{\partial \| A'Ax \|}{\partial \omega^{[d]}} \quad \text{and} \quad \frac{\partial \| A'Ax \|}{\partial x}, $$
where $x$ is a cropped Shepp–Logan phantom with random additional phase in $[-\pi, \pi]$. The sampling pattern is undersampled radial spokes. The $x$ and $\omega$ displayed are one fragment of the whole vector. Three settings are compared: (1) auto-differentiation of the NUFFT, where the table-lookup operation [58] is replaced by bi-linear interpolation to enable auto-differentiation; (2) auto-differentiation of the exact non-uniform discrete Fourier transform (NDFT); and (3) our approximation. It is worth noting that there is no 'ground truth' here. Errors occur in the interpolation of (1). Setting (2) is the accurate gradient of the NDFT because the gradient calculation involves only one exponential and one sum operation, with no accumulated error. Since the NUFFT is only an approximation of the NDFT, we cannot directly regard the NDFT's Jacobian as the NUFFT's Jacobian; if the NUFFT operation is accurate enough, though, (2) can be a good approximation of the NUFFT's Jacobian matrix. As a further validation, we also compared the trajectory optimization results of (1) and (3) (approach (2) is too time-consuming for this task). The implementations of (1) and (3) were based on [59]. We jointly trained the reconstruction network and the trajectory as described in the main text, but did not parameterize the trajectory.

Fig. S4 and Fig. S5 show the gradients calculated by the different methods. For gradients w.r.t. the image $x$, the three methods lead to similar results. For gradients w.r.t. the trajectory $\omega$, auto-differentiation deviates more from the results of the exact NDFT. If random phase is not added to the complex-valued image $x$, the three methods still produce similar results, so the case displayed here is an extreme one. In Fig. S6, our approximation method leads to a learned trajectory consistent with intuition: sampling points should not be clustered or too distant from each other. The quantitative reconstruction results also demonstrate significant improvement (950 test slices; SSIM: 0.930 vs. 0.957).

Table S.1 compares the computation times of (1) and (3). The CPUs used were Intel Xeon Gold 6138 @ 2.00 GHz, and the GPU was an Nvidia RTX 2080Ti. Our method is faster than auto-differentiation, especially in CPU mode.
Fig. S3. Results from the simulation experiment using the UNN-based reconstruction algorithm for three different trajectories. The first row shows the trajectories.
Fig. S4. The gradient w.r.t. the image x. The three calculation methods lead to similar results.

Fig. S5. The gradient w.r.t. the trajectory ω.