Resolution Enhancement in Protein NMR Spectra by Deconvolution with Compressed Sensing Reconstruction
Krzysztof Kazimierczuk, Paweł Kasprzak, Panagiota S. Georgoulia, Irena Burmann, Björn M. Burmann, Linnéa Isaksson, Emil Gustavsson, Sebastian Westenhoff, Vladislav Yu. Orekhov
Krzysztof Kazimierczuk, [a]
Paweł Kasprzak, [a,b]
Panagiota S. Georgoulia, [c]
Irena Matečko-Burmann, [d,e]
Björn M. Burmann, [c,e]
Linnéa Isaksson, [c]
Emil Gustavsson, [f]
Sebastian Westenhoff, [c] and Vladislav Yu. Orekhov* [c,f]
[a] Centre of New Technologies, University of Warsaw, ul. Banacha 2C, 02-097 Warsaw, Poland
[b] Faculty of Physics, University of Warsaw, Pasteura 5, 02-093 Warsaw, Poland
[c] Department of Chemistry and Molecular Biology, University of Gothenburg, Box 465, Gothenburg 405 30, Sweden
[d] Department of Psychiatry and Neurochemistry, University of Gothenburg, Gothenburg 405 30, Sweden
[e] Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg 405 30, Sweden
[f] Swedish NMR Centre, University of Gothenburg, Box 465, 405 30, Gothenburg, Sweden. E-mail: [email protected]
Supporting information for this article is given via a link at the end of the document.
Abstract:
Multidimensional NMR spectroscopy is one of the basic tools for determining the structure of biomolecules. Unfortunately, the resolution of the spectra is often limited by inter-nuclear couplings. This limitation cannot be overcome by the common ways of increasing resolution, i.e. non-uniform sampling (NUS) followed by compressed sensing (CS) reconstruction. In this paper, we show how to enrich CS processing with virtual decoupling, leading to an increase in resolution, sensitivity, and overall quality of NUS reconstruction. A mathematical description of the decoupling-by-deconvolution approach explains the effects of noise and of modulation of the sampling schedule, and reveals the relation to the underlying assumption of CS. The gain in resolution and sensitivity is demonstrated for 3D HNCA, the basic experiment used for protein backbone assignment, applied to two large protein systems: the intrinsically disordered 441-residue Tau and a 509-residue globular bacteriophytochrome fragment.
Nuclear magnetic resonance (NMR) is among the main analytical techniques allowing atomic-level studies of proteins. The prerequisite step for most protein NMR work is a resonance-specific spectral assignment, i.e. the association of resonance frequencies with atoms in the protein amino acid chain. [1]
The HNCA [2] is by far the most sensitive and thus often the only feasible triple-resonance experiment that provides sequential connectivities between neighbouring protein residues. In principle, a sequence-specific resonance assignment could be obtained using the HNCA experiment alone. Unfortunately, low signal resolution relative to the dispersion of the protein Cα resonances results in massive ambiguity of the assignment even for relatively small proteins. For large systems with many amino acid residues, as well as for intrinsically disordered proteins (IDPs) characterized by particularly low resonance dispersion, one has to rely on additional experiments at the expense of sensitivity loss, a significant increase in measurement time, and more complicated and tedious analyses. The slow transverse relaxation of the Cα spins, which can be slowed further by deuteration, corresponds to a natural line width of 5–8 Hz even for relatively large protein systems. Unfortunately, the practical resolution in the HNCA spectra is usually almost ten times worse. Two main factors limit the resolution in the HNCA: (i) the large number of time increments in the Cα dimension needed in the 3D experiment to achieve high resolution, which leads to a measurement time that can be unaffordable because of limited sample stability and/or limited access to the NMR instrument; and (ii) the homonuclear one-bond coupling between the Cα and Cβ spins, which produces a doublet with a separation of approximately 35 Hz for every Cα signal and thus effectively broadens the spectral line. The former issue is well addressed by fast pulsing [3] and non-uniform sampling (NUS) techniques. [4] A large number of methods for handling the J(Cα–Cβ) coupling have been introduced over the last decades, including biochemical unlabelling of the β carbon atom to ¹²C, [5] constant-time evolution, [6] band-selective homonuclear decoupling, [7] and IPAP decoupling. [8] However, broad practical use of these techniques is hindered by inherent compromises in sensitivity, extra demands on sample isotope labelling, inability to deal with serine and threonine residues, and/or significant spectral distortions and artefacts. A viable alternative to these experimental approaches is virtual decoupling, i.e. post-acquisition deconvolution of the J-coupling at the signal processing stage. [9]
The aim of this communication is to investigate the possibility of effective deconvolution in compressed sensing (CS) algorithms, which are among the most powerful for NUS spectra; to propose a method of selective deconvolution of individual spectral regions; and to demonstrate the relation of the deconvolution to the cornerstone CS concept of sparseness, with the resulting benefits for the effectiveness of CS.

The fundamental relation between the NMR signal $f(t)$ detected in the time domain and the spectrum $s$ is

$$f = Fs \tag{1}$$

where $F$ is the measurement matrix composed of rows from the inverse Fourier transform matrix for every point in $f$. Thus, reconstruction of a spectrum from $f$ reduces to solving the inverse linear system in Eq. 1. For an undersampled (i.e. NUS) signal, the solution is not unique and additional constraints on the spectrum $s$ are usually imposed. For example, using generalized Tikhonov regularization, the spectrum can be obtained as

$$s = \arg\min_{x} \left( \|Fx - f\|_Q^2 + \|x\|_D^2 \right) \tag{2}$$

where, for a vector $y$ and matrix $G$, $\|y\|_G^2$ denotes the squared weighted norm $y^{\dagger} G y$, with $y^{\dagger}$ denoting the conjugate transpose of $y$; $Q = \sigma^{-2} I$ is the inverse covariance matrix of the noise in $f$, which is a multiple of the identity matrix $I$, and $\sigma$ is the standard deviation of the noise; $D$ is a diagonal matrix including the weightings of the spectrum points and the Tikhonov regularization term. As will be shown below, $Q$ is useful when dealing with the J(Cα–Cβ) coupling, while the matrix $D$ is the essential element of Iteratively Reweighted Least Squares (IRLS), one of the most popular algorithms for compressed sensing reconstruction of NUS spectra (see Supporting information). [10]
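As an illustration of Eqs. 1 and 2, the following minimal NumPy sketch builds a measurement matrix from a random NUS schedule and solves the generalized Tikhonov problem in closed form for a fixed diagonal $D$. The function names `measurement_matrix` and `tikhonov_solve`, the grid size, the schedule, and the uniform weights are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def measurement_matrix(schedule, n):
    """Rows of the n x n inverse DFT matrix selected by the NUS schedule."""
    grid = np.arange(n)
    # Entry (j, m) = exp(+2*pi*i * schedule[j] * m / n) / n, as in numpy.fft.ifft.
    return np.exp(2j * np.pi * np.outer(schedule, grid) / n) / n

def tikhonov_solve(F, f, sigma, d):
    """Minimise ||F x - f||_Q^2 + ||x||_D^2 with Q = I / sigma**2, D = diag(d)."""
    Q = np.eye(F.shape[0]) / sigma**2
    FH = F.conj().T
    # Setting the gradient to zero gives (F^H Q F + D) x = F^H Q f.
    return np.linalg.solve(FH @ Q @ F + np.diag(d), FH @ Q @ f)

rng = np.random.default_rng(0)
n, k = 64, 24                                  # illustrative grid size and NUS points
schedule = np.sort(rng.choice(n, size=k, replace=False))
F = measurement_matrix(schedule, n)

s_true = np.zeros(n, dtype=complex)
s_true[10] = 1.0                               # one spectral line
f = F @ s_true                                 # Eq. 1: f = F s

# A weak, uniform D (illustrative choice) only makes the system solvable.
s_hat = tikhonov_solve(F, f, sigma=1.0, d=np.full(n, 1e-6))
```

With such a uniform $D$ the solution reproduces the measured points but recovers the line at index 10 with only about $k/n$ of its amplitude, which illustrates why the sparsity-promoting, iteratively updated weights of IRLS are needed for an undersampled signal.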
Assuming the same value of the active J(Cα–Cβ) coupling for all signals, the experimentally measured Cα signal $f$ and the signal without the J-coupling $\tilde{f}$ are related as

$$f = C\tilde{f} \quad \text{and} \quad \tilde{f} = C^{-1} f \tag{3}$$

where $C$ is a diagonal matrix with elements $\cos(\pi J t)$ for every time point $t$ in $f$. If the points in the measured signal $f$ are corrupted by noise with inverse covariance matrix $Q = \sigma^{-2} I$, the noise in $\tilde{f}$ has the inverse covariance matrix $\tilde{Q} = \sigma^{-2} C^{\dagger} C$. Then, the decoupled spectrum is

$$\tilde{s} = \arg\min_{x} \left( \|Fx - \tilde{f}\|_{\tilde{Q}}^2 + \|x\|_D^2 \right) \tag{4a}$$

or equivalently (see Supporting information),

$$\tilde{s} = \arg\min_{x} \left( \|CFx - f\|_Q^2 + \|x\|_D^2 \right) \tag{4b}$$

The last equation shows that post-acquisition deconvolution can be achieved in IRLS, and in any other algorithm based on an equation akin to Eq. 2, e.g. Maximum Entropy [9a-d] and Multi-Dimensional Decomposition (MDD), [4c] by using the measurement matrix $CF$ instead of $F$. Finally, we note that the deconvoluted spectrum contains half as many peaks as the undecoupled spectrum. Thus, it is sparser and, in accordance with the theory of compressed sensing, [11] it requires nearly half as many measured data points for successful reconstruction. This means that virtual decoupling not only enhances spectral resolution but also provides conditions for a higher-quality CS reconstruction (see theory in SI).

Use of the deconvolution for the HNCA experiment is based on the assumption that the J(Cα–Cβ) coupling constants are nearly the same for all residues in the protein. The variation of the coupling values, ±2.5 Hz, [12] is smaller than the line width determined by the transverse relaxation of the Cα spins and thus does not pose a problem for the reconstruction (see Supporting information). However, the signals (singlets) from Gly residues, which do not have Cβ atoms, have no sparse representation in the columns of the measurement matrix $CF$. Figure 1 illustrates that this not only corrupts the Gly peaks in the deconvoluted spectrum but also affects other signals and reduces the overall quality of the reconstruction.
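The sparsity argument can be checked numerically with a small sketch: a single decaying-free line multiplied by $\cos(\pi J t)$, as in Eq. 3, appears in the Fourier spectrum as two lines at $\nu \pm J/2$. All numbers below are assumptions chosen for illustration; in particular, J is set to 32 Hz (close to the 35 Hz quoted in the text) only so that both doublet components fall exactly on the frequency grid.

```python
import numpy as np

n = 256                        # number of time-domain points (illustrative)
dt = 1.0 / 1024.0              # dwell time in s, i.e. 1024 Hz spectral width
t = np.arange(n) * dt
J = 32.0                       # Hz, assumed so that nu +/- J/2 lie on the grid
nu = 200.0                     # Hz, position of the decoupled line

f_tilde = np.exp(2j * np.pi * nu * t)      # signal without J-coupling
f = np.cos(np.pi * J * t) * f_tilde        # Eq. 3: f = C f_tilde

freqs = np.fft.fftfreq(n, d=dt)

def peak_frequencies(signal):
    """Frequencies whose magnitude exceeds 10% of the largest spectral point."""
    spectrum = np.abs(np.fft.fft(signal))
    return np.sort(freqs[spectrum > 0.1 * spectrum.max()])

singlet = peak_frequencies(f_tilde)        # one line at nu
doublet = peak_frequencies(f)              # two lines at nu - J/2 and nu + J/2
```

The coupled signal therefore has twice as many significant spectral points as the decoupled one, which is the property exploited when $F$ is replaced by $CF$.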
To tackle this, we suggest a procedure of deconvolution-IRLS (D-IRLS) with Gly-region selection, as outlined in Figure 1 (more details are found in the Supporting information). We start by reconstructing the full undecoupled spectrum using the matrix $F$. Because the Gly Cα atoms usually have distinctly different chemical shifts, with values lower than 45 ppm, we can subtract the well-reproduced signals in the Gly region from the original time-domain signal $f$. The result is then used to reconstruct the spectrum with all signals except those of Gly using Eq. 4b with the measurement matrix $CF$. Finally, the signals of Gly and of the other residues are combined into the full decoupled spectrum in the frequency domain.

Figure 1. Processing of a spectrum with region-selective deconvolution. A – measured time-domain signal that contains both a singlet and a doublet. IRLS reconstructions of A with and without deconvolution produce spectra F and B, respectively. C – the singlet (green) part of the spectrum B is converted back to the time domain using the inverse Fourier transform (IFT). D – the original signal A after subtraction of C. E – the result of the region-selective deconvolution, i.e. the combination of the IRLS processing of D (yellow) and the green part of B. The quality of both the singlet and the decoupled doublet signals in E is better than in F.

Selection of the NUS acquisition schedule has a profound effect on the reconstruction quality. As $f$ is multiplied by $C^{-1}$ in Eq. 3, the noise is amplified the most for the points in $\tilde{f}$ at times where the function $\cos(\pi J t)$ has small values (i.e. near $t = k/(2J)$, $k = 1, 3, 5, \dots$). In the weighted least squares method used to derive Eq. 4, these points are used with low weights and thus carry relatively little information. In the NUS schedule, it is logical to avoid these points and instead invest spectrometer time in more informative measurements. We used the signal-amplitude-matched NUS schedule with the sampling density corresponding to $|\cos(\pi J t)|$ and rejecting points with probability less than 0.2 [13] (Supporting Figure S1). Additionally, the schedule was in all cases relaxation-matched.

We demonstrate the new D-IRLS procedure using two representative systems: the intrinsically disordered human 441-residue Tau protein (the longest hTau40 isoform) [14] and the monomeric variant of the 509-residue globular photosensory module PAS-GAF-PHY of the Deinococcus radiodurans phytochrome (DrBphP_PSM). [15] For each protein, Figure 2 shows the traditional low-resolution 3D HNCA spectrum superimposed with the resolution-enhanced spectrum obtained using D-IRLS with Gly-region selection. For DrBphP_PSM, the two experiments were reconstructed using nearly the same number of NUS points, corresponding to the same measurement time; for Tau, the low-resolution experiment was around two times shorter. In the examples shown, the dramatically improved resolution of the D-IRLS spectrum allows us to observe sequential connectivities that are ambiguous in the traditional spectrum. Figures 2, S2, and S3 demonstrate that, in addition to the enhanced resolution, the D-IRLS spectra show higher or similar sensitivity in comparison to both the traditional low-resolution and the non-deconvoluted spectra. The peak connecting A87 and A88 in the DrBphP_PSM spectra (Figure 2B) provides a specific example of this. It is clearly seen in the 1D cross-sections of the D-IRLS spectrum. In the traditional experiment, the weak peak is completely masked by the slope of a stronger peak. In the non-deconvoluted spectrum, only one of the doublet components is present, which gives a completely wrong idea of the peak position.
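The J-modulated, relaxation-matched sampling density described above can be sketched for a single indirect dimension as follows. This is only one plausible reading of the schedule: the function name `j_modulated_schedule`, the drawing procedure, and the interpretation of the 0.2 cut-off as a threshold on $|\cos(\pi J t)|$ (as stated in the Supporting information) are assumptions, and the actual schedules were generated for the full 3D experiment.

```python
import math
import random

def j_modulated_schedule(n_grid, n_points, dt, t2, j_coupling,
                         threshold=0.2, seed=0):
    """Draw NUS grid indices with density ~ exp(-t/T2) * |cos(pi*J*t)|.

    Grid points with |cos(pi*J*t)| below `threshold` are never selected.
    """
    weights = []
    for l in range(n_grid):
        t = l * dt
        c = abs(math.cos(math.pi * j_coupling * t))
        weights.append(math.exp(-t / t2) * c if c >= threshold else 0.0)
    allowed = sum(1 for w in weights if w > 0.0)
    if n_points > allowed:
        raise ValueError("not enough admissible grid points")
    rng = random.Random(seed)
    chosen = set()
    # Weighted sampling without replacement by repeated single draws.
    while len(chosen) < n_points:
        chosen.add(rng.choices(range(n_grid), weights=weights, k=1)[0])
    return sorted(chosen)

# Illustrative numbers: 6.0 kHz spectral width, T2 = 70 ms, J = 35 Hz.
dt = 1.0 / 6000.0
schedule = j_modulated_schedule(n_grid=254, n_points=80, dt=dt,
                                t2=0.07, j_coupling=35.0)
```

The returned indices avoid the neighbourhoods of $t = k/(2J)$ for odd $k$, where the demodulated signal would be dominated by amplified noise.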
Figure 2. Several planes from the 3D HNCA spectra of A – Tau and B – DrBphP_PSM showing the assignment walk for selected residues. Overlaid blue (green) and red (purple) contour levels depict the traditional low-resolution and the high-resolution spectra of the Tau (DrBphP_PSM) protein. For the peak annotations, we use the previously published assignments. [14-15] The one-dimensional cross-sections above the spectral planes are taken from the (orange) low-resolution, (black) high-resolution deconvoluted, and (grey) high-resolution non-deconvoluted spectra.
Figure S2 shows the ¹³C/¹⁵N projections from the spectra of both studied proteins, which confirm the superior quality of the spectra reconstructed with Gly-region selective D-IRLS. While the improved resolution in the spectra is anticipated from the deconvolution, the remarkable sensitivity of the D-IRLS spectrum can be explained by the increased sparsity, which is favourable for the NUS reconstruction. To test the proposed D-IRLS method extensively, we conducted simulations using synthetic peaks added to the 3D HNCA signal of Tau. Adding simulated components with known positions and intensities to the time-domain signal makes it possible to determine the precision of the corresponding peak parameters derived from the reconstructed spectrum. [16]
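The idea of injecting a component with known parameters can be sketched in one dimension as below. The helper `inject_peak`, its arguments, and the exponential line shape are hypothetical and serve only as an illustration; the simulations actually performed are described in the Supporting information.

```python
import cmath
import math

def inject_peak(signal, times, freq_hz, amplitude, t2, j_coupling=0.0):
    """Return a copy of `signal` with one synthetic decaying line added.

    With j_coupling > 0 the line is multiplied by cos(pi*J*t), i.e. it
    shows up as a doublet in the spectrum.
    """
    out = []
    for value, t in zip(signal, times):
        line = amplitude * cmath.exp(2j * math.pi * freq_hz * t) * math.exp(-t / t2)
        out.append(value + line * math.cos(math.pi * j_coupling * t))
    return out

# Illustrative use: an empty signal on a 1 ms grid receives one coupled peak.
times = [i * 1e-3 for i in range(64)]
fid = [0j] * len(times)
fid_with_peak = inject_peak(fid, times, freq_hz=100.0, amplitude=2.0,
                            t2=0.05, j_coupling=35.0)
```

Because the frequency and amplitude of the injected line are known, the values recovered from the reconstructed spectrum can be compared directly with this ground truth.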
A detailed description of the simulations can be found in the Supporting information. The results shown in Supporting Figure S3 confirm that the peak intensities and positions are much more accurate when the D-IRLS deconvolution is augmented with the Gly-region selective procedure (Figure 1E). Notably, at low sampling levels (250 and 400 NUS points) the number of detected peaks from the lowest-intensity fraction of the injected peaks is significantly larger in the deconvoluted spectrum than in the IRLS spectrum without deconvolution. This is fully in line with the theoretical consideration that the deconvoluted spectrum is much sparser and thus can be successfully reconstructed with fewer measured points. Over a wide range of NUS levels, the cosine J-modulated sampling scheme provides spectra with comparable or somewhat better accuracy of the peak positions and intensities than the schedules matched to the exponential relaxation decay only. However, the main practical problem with the latter scheme is the necessity to increase and carefully adjust the Tikhonov regularization parameter λ in the IRLS algorithm, whereas the cosine-modulated sampling is much less demanding in this respect and thus more robust. It is also worth noting that the precision of the peak positions derived from the non-deconvoluted spectrum is somewhat better than in its decoupled counterpart, provided that both components of the doublet are detected and resolved from other peaks.

In conclusion, we proposed an efficient CS method to improve the resolution, sensitivity and quality of NUS reconstruction using virtual decoupling at the processing stage. We presented a complete mathematical description of the problem in terms of the generalized Tikhonov regularization formalism. We also showed that removing singlets from the spectrum before decoupling significantly improves the results.
The method was demonstrated on the 3D HNCA spectra of two large systems prototypical for the intrinsically disordered and globular proteins. The new CS virtual decoupling technique will enable the sequential signal assignment for many challenging proteins and will be useful for other types of NMR spectra in a variety of applications.
Acknowledgements
KK and PK thank the Foundation for Polish Science for support via the FIRST TEAM program co-financed by the European Union under the European Regional Development Fund (no. POIR.04.04.00-00-4343/17-00). BMB and SW acknowledge funding from the Knut och Alice Wallenberg Foundation. VO acknowledges support from the Swedish Research Council (Research Grant 2019-3661).
Supporting information
The Supporting information contains an extended mathematical description of D-IRLS, sample preparation and experimental details, and the results of simulations with injected peaks.
Keywords: biomolecular NMR • J-coupling • deconvolution • compressed sensing • non-uniform sampling

[1] M. Ikura, L. E. Kay, A. Bax, Biochemistry, 4659-4667.
[2] a) Z. Solyom, M. Schwarten, L. Geist, R. Konrat, D. Willbold, B. Brutscher, J. Biomol. NMR, 311-321; b) M. Salzmann, K. Pervushin, G. Wider, H. Senn, K. Wüthrich, Proc. Natl. Acad. Sci. USA, 13585-13590; c) L. E. Kay, M. Ikura, R. Tschudin, A. Bax, J. Magn. Reson., 496-514.
[3] a) E. Lescop, P. Schanda, B. Brutscher, J. Magn. Reson., 163-169; b) K. Pervushin, B. Vögeli, A. Eletsky, J. Am. Chem. Soc., 12898-12902.
[4] a) J. C. J. Barna, E. D. Laue, M. R. Mayger, J. Skilling, S. J. P. Worrall, J. Magn. Reson., 69-77; b) J. C. Hoch, A. S. Stern, M. Mobli, Vol. 26, John Wiley & Sons, Ltd, pp. 1-6; c) V. Y. Orekhov, V. A. Jaravine, Prog. Nucl. Magn. Reson. Spectrosc., 271-292; d) K. Kazimierczuk, V. Orekhov, Magn. Reson. Chem., 921-926; e) M. Bostock, D. Nietlispach, Concept Magn. Reson. A, e21438; f) S. Robson, H. Arthanari, S. G. Hyberts, G. Wagner, in Methods Enzymol., Vol. 614, Elsevier, pp. 263-291.
[5] a) D. M. LeMaster, D. M. Kushlan, J. Am. Chem. Soc., 9255-9264; b) P. E. Coughlin, F. E. Anderson, E. J. Oliver, J. M. Brown, S. W. Homans, S. Pollak, J. W. Lustbader, J. Am. Chem. Soc., 11871-11874; c) M. Kainosho, T. Torizawa, Y. Iwashita, T. Terauchi, A. M. Ono, P. Güntert, Nature, 52-57; d) P. Lundström, K. Teilum, T. Carstensen, I. Bezsonova, S. Wiesner, D. F. Hansen, T. L. Religa, M. Akke, L. E. Kay, J. Biomol. NMR, 199-212; e) K. Takeuchi, Z.-Y. J. Sun, G. Wagner, J. Am. Chem. Soc., 17210-17211; f) S. A. Robson, K. Takeuchi, A. Boeszoermenyi, P. W. Coote, A. Dubey, S. Hyberts, G. Wagner, H. Arthanari, Nat. Commun., 1-11.
[6] R. Powers, A. M. Gronenborn, G. M. Clore, A. Bax, J. Magn. Reson., 209-213.
[7] a) H. Matsuo, E. Kupče, H. Li, G. Wagner, J. Magn. Reson. B, 91-96; b) P. W. Coote, S. A. Robson, A. Dubey, A. Boeszoermenyi, M. Zhao, G. Wagner, H. Arthanari, Nat. Commun., 1-9.
[8] a) M. Ottiger, F. Delaglio, A. Bax, J. Magn. Reson., 373-378; b) P. Andersson, J. Weigelt, G. Otting, J. Biomol. NMR, 435-441.
[9] a) A. A. Bothner-By, J. Dadok, J. Magn. Reson., 540-543; b) M. A. Delsuc, G. C. Levy, J. Magn. Reson., 306-315; c) Z. Serber, C. Richter, D. Moskau, J.-M. Böhlen, T. Gerfin, D. Marek, M. Häberli, L. Baselgia, F. Laukien, A. S. Stern, J. C. Hoch, V. Dötsch, J. Am. Chem. Soc., 3554-3555; d) N. Shimba, A. S. Stern, C. S. Craik, J. C. Hoch, V. Dötsch, J. Am. Chem. Soc., 2382-2383; e) R. Kerfah, O. Hamelin, J. Boisbouvier, D. Marion, J. Biomol. NMR, 389-402.
[10] a) E. J. Candes, M. B. Wakin, S. P. Boyd, J. Fourier Anal. Appl., 877-905; b) K. Kazimierczuk, V. Y. Orekhov, J. Magn. Reson., 1-10.
[11] a) S. Foucart, H. Rauhut, Bull. Am. Math, 151-165; b) E. J. Candès, J. Romberg, T. Tao, IEEE Trans. Inf Theory, 489-509.
[12] J. M. Schmidt, M. J. Howard, M. Maestre-Martínez, C. S. Pérez, F. Löhr, Magn. Reson. Chem., 16-30.
[13] V. Jaravine, I. Ibraghimov, V. Y. Orekhov, Nat. Methods, 605-607.
[14] R. L. Narayanan, U. H. Dürr, S. Bibow, J. Biernat, E. Mandelkow, M. Zweckstetter, J. Am. Chem. Soc., 11906-11907.
[15] E. Gustavsson, L. Isaksson, C. Persson, M. Mayzel, U. Brath, L. Vrhovac, J. A. Ihalainen, B. G. Karlsson, V. Orekhov, S. Westenhoff, Biophys. J., 415-421.
[16] A. D. Schuyler, J. C. Hoch, NUScon, nonuniform sampling and reconstruction challenge in NMR spectroscopy, https://nuscon.org/.

Entry for the Table of Contents
We introduce a Compressed Sensing (CS) algorithm with virtual decoupling that increases the resolution, sensitivity, and quality of NUS reconstruction of NMR spectra, as exemplified by HNCA experiments for two large protein systems: the intrinsically disordered 441-residue Tau and a 509-residue globular bacteriophytochrome fragment. A rigorous mathematical description of the algorithm in relation to the underlying assumption of CS is presented.

Supporting information to: Resolution Enhancement in Protein NMR Spectra by Deconvolution with Compressed Sensing Reconstruction

Krzysztof Kazimierczuk, Paweł Kasprzak, Panagiota S. Georgoulia, Irena Matečko-Burmann, Björn M. Burmann, Linnéa Isaksson, Emil Gustavsson, Sebastian Westenhoff, and Vladislav Yu. Orekhov
Centre of New Technologies, University of Warsaw, Banacha 2C, 02-097 Warsaw, Poland
Faculty of Physics, University of Warsaw, Pasteura 5, 02-093 Warsaw, Poland
Department of Chemistry and Molecular Biology, University of Gothenburg, 405 30 Gothenburg, Sweden
Department of Psychiatry and Neurochemistry, University of Gothenburg, Gothenburg 405 30, Sweden
Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg 405 30, Sweden
Swedish NMR Centre, University of Gothenburg, Box 465, 405 30, Gothenburg, Sweden
Let $f$ be the subsampled (complex) NMR signal measured at $\{t_1, \dots, t_k\}$ and viewed as the column vector $f \in \mathbb{C}^k$. The sampling schedule $\{t_1, \dots, t_k\}$ is a fixed subset of the full sampling grid $\{l\Delta t : l = 0, 1, \dots, n-1\}$. The CS methodology, when applied in the NMR context, provides the algorithms and theory that enable the recovery of the NMR spectrum $s \in \mathbb{C}^n$ from the sub-sampled signal, even if $k \ll n$. The CS problem is formulated in terms of the measurement matrix $F \in M_{k \times n}(\mathbb{C})$. In the NMR application of CS, the matrix $F$ consists of the rows of the $n \times n$ inverse Fourier matrix corresponding to the sampling schedule, and we have

$$f = Fs. \tag{1}$$

Note that Eq. (1) with known $f$, unknown $s$, and $k < n$ has infinitely many solutions. The fundamental insight of the CS theory [1] specifies the NMR spectrum $s$ as the unique solution of the convex optimization problem

$$s = \arg\min_{x \in \mathbb{C}^n} \left( \|Fx - f\|_{\ell_2}^2 + \lambda \|x\|_{\ell_1} \right). \tag{2}$$

The first term in this sum promotes the consistency of the tested $x$ with the measured data, whereas the second term promotes the sparseness of $x$, and $\lambda > 0$ balances the two terms.

Eq. (2) can be modified to take better account of the measurement noise. We can consider two cases:

• Unstructured noise: the noise covariance matrix $\Sigma$ is diagonal and isotropic, $\Sigma = \sigma^2 \mathbb{1}$, where $\sigma$ is the standard deviation of the noise. In order to take the level of noise into account in the reconstruction framework (2), one sets $\lambda$ proportional to $\sigma^2$.

• Structured noise: the noise covariance matrix $\Sigma$ is not necessarily diagonal and isotropic. The $\ell_2$ norm used in the data consistency term should be replaced by its weighted version $\|Fx - f\|_Q^2$, where $Q = \Sigma^{-1}$, and for a given complex positive definite matrix $G$ and a complex vector $y$ we define $\|y\|_G^2 = y^{\dagger} G y$, where $y^{\dagger}$ is the conjugate transpose of $y$. In order to justify the proposed modification of the data consistency term, let us discuss an instructive example. Consider two independent measurements $f_1 = f(t_1)$, $f_2 = f(t_2)$ of the signal, in which the noise level of $f(t_1)$ is two times smaller than that of $f(t_2)$. The corresponding covariance matrix $\Sigma$ is of the form

$$\Sigma = \begin{bmatrix} \sigma^2 & 0 \\ 0 & 4\sigma^2 \end{bmatrix}$$

where $\sigma$ is the noise level entering $f(t_1)$. Let us denote by $\hat{f} = Fx$ the signal corresponding to the spectrum $x$. The consistency term $\|Fx - f\|_{\ell_2}^2 = |\hat{f}_1 - f_1|^2 + |\hat{f}_2 - f_2|^2$ entering the standard formulation of the CS problem is replaced by

$$\|Fx - f\|_Q^2 = (\hat{f} - f)^{\dagger} \Sigma^{-1} (\hat{f} - f) = \frac{1}{\sigma^2} |\hat{f}_1 - f_1|^2 + \frac{1}{4\sigma^2} |\hat{f}_2 - f_2|^2 .$$

Our modification introduces weights in the data consistency term that correctly reflect the noise level of the corresponding measurements. Points with larger noise enter the sum with smaller weights. The above discussion and conclusion easily generalize to a larger number of independent measurements. In the case of non-zero correlations between the noise of the measurements, $\Sigma$ must first be diagonalized in an appropriate orthonormal basis, and the above justification can then be repeated.

Let us note that for unstructured noise ($\Sigma = \sigma^2 \mathbb{1}$) we have

$$\|Fx - f\|_Q^2 = \sigma^{-2} \|Fx - f\|_{\ell_2}^2 \tag{3}$$

and thus the cases of structured and unstructured noise are consistent:

$$\arg\min_{x \in \mathbb{C}^n} \left( \|Fx - f\|_Q^2 + \lambda \|x\|_{\ell_1} \right) = \arg\min_{x \in \mathbb{C}^n} \left( \sigma^{-2} \|Fx - f\|_{\ell_2}^2 + \lambda \|x\|_{\ell_1} \right) = \arg\min_{x \in \mathbb{C}^n} \left( \|Fx - f\|_{\ell_2}^2 + \sigma^2 \lambda \|x\|_{\ell_1} \right).$$

Let us also note that in each case we can absorb $\lambda$ into $\Sigma$ by the scaling $\Sigma \to \lambda\Sigma$.

Now we describe the algorithm known as Iteratively Reweighted Least Squares (IRLS) [2, 3, 4], dedicated to the solution of

$$s = \arg\min_{x \in \mathbb{C}^n} \left( \|Fx - f\|_Q^2 + \|x\|_{\ell_1} \right). \tag{4}$$

Note first that the $\ell_1$ norm $\|x\|_{\ell_1} = \sum_{i=1}^{n} |x_i|$ can be well approximated by the weighted $\ell_2$-norm $\sum_{i=1}^{n} |w_i x_i|^2$, where the weights $w_i = |x_i|^{-1/2}$ are regularized for very small $x_i$'s.
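Both observations above, the noise-weighted data-consistency term of the two-measurement example and the representation of the $\ell_1$ norm as a weighted $\ell_2$ norm, can be verified with a few lines of arithmetic. The residual values, the vector `x`, and the helper names below are made up for illustration only.

```python
def weighted_residual(residuals, variances):
    """Sum of |r_i|^2 / variance_i, i.e. ||r||_Q^2 for diagonal Q = Sigma^{-1}."""
    return sum(abs(r) ** 2 / v for r, v in zip(residuals, variances))

def reweighted_l2(x, eps=0.0):
    """Sum of |w_i x_i|^2 with weights w_i = (|x_i| + eps)^(-1/2)."""
    return sum(abs(xi) ** 2 / (abs(xi) + eps) for xi in x if abs(xi) + eps > 0)

# Two-measurement example: the second point has twice the noise level.
sigma = 0.5
residuals = [1.0 + 1.0j, 2.0]               # hypothetical values of Fx - f
cost = weighted_residual(residuals, [sigma**2, 4 * sigma**2])

# The l1 norm equals the weighted l2 norm when the weights use the same vector.
x = [3.0, -4.0j, 0.0, 0.5]
l1 = sum(abs(xi) for xi in x)
approx = reweighted_l2(x)                   # identical to l1 for eps = 0
approx_regularized = reweighted_l2(x, eps=1e-3)
```

With `eps > 0` the weighted sum is slightly smaller than the $\ell_1$ norm, which is the price of keeping the weights finite for vanishing spectral points.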
This seemingly trivial observation is the starting point of IRLS [5], an iterative procedure that solves the quadratic problem

$$s_l = \arg\min_{x \in \mathbb{C}^n} \left( \|Fx - f\|_Q^2 + \|w_l x\|_{\ell_2}^2 \right) \tag{5}$$

where $l$ is the iteration number and $w_l$ is the weight vector corresponding to $s_{l-1}$ as described above. Thus, defining $D_l = \mathrm{diag}(d_1, \dots, d_n)$ where $d_j = (|s_{l-1,j}| + \varepsilon)^{-1}$ (we write $s_{l-1,j}$ for the $j$th component of the vector $s_{l-1}$), Eq. (5) can be written in the form of generalized Tikhonov regularization

$$s_l = \arg\min_{x \in \mathbb{C}^n} \left( \|Fx - f\|_Q^2 + \|x\|_{D_l}^2 \right) \tag{6}$$

and the latter can be solved explicitly:

$$s_l = (F^{\dagger} Q F + D_l)^{-1} F^{\dagger} Q f .$$

J-coupling in IRLS

The J-modulation of the NMR signal $f$ is represented in vector language by a complex vector $M \in \mathbb{C}^k$. For example, $M$ corresponding to the J-coupling considered in the main text (i.e. approximately the same for all components) is of the form $M(t_j) = \cos(\pi J t_j)$. The unmodulated version $\tilde{f}$ of $f$ is defined by the equality $f = M\tilde{f}$, or more precisely $f(t_i) = M(t_i)\tilde{f}(t_i)$ where $i \in \{1, \dots, k\}$. Let $C \in M_{k \times k}(\mathbb{C})$ denote the (diagonal) modulation matrix

$$C = \mathrm{diag}(M(t_1), \dots, M(t_k)).$$

The relation between $\tilde{f}$, $f$ and the spectrum $\tilde{s}$ corresponding to the unmodulated signal $\tilde{f}$ is of the form

$$f = C\tilde{f} = CF\tilde{s}.$$

Note that while the noise in the measured signal $f$ is unstructured, the noise of $\tilde{f}$ is structured (since $\tilde{f} = C^{-1} f$).

Denoting the spectrum related to the modulated signal $f$ by $s$, we have $f = Fs$. Remarkably, the spectrum $\tilde{s}$ is sparser than $s$ for typical modulations encountered in NMR. For example, in the simplest case of a one-dimensional J-modulated signal, singlets in $\tilde{s}$ are doubled in $s$, i.e. the number $\tilde{m}$ of significant elements of the spectrum $\tilde{s}$ is approximately half of the number $m$ of significant elements of $s$.
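The explicit IRLS update and the modulation matrix $C$ introduced above can be combined in a short numerical sketch. It is a toy 1D, noiseless illustration and not the authors' implementation: the function name `irls`, the values of `lam`, `eps` and `n_iter`, the grid, and the coupling of 32 Hz (chosen so that the doublet falls on grid points) are assumptions. The noise weighting $Q = \sigma^{-2}\mathbb{1}$ is absorbed into `lam`.

```python
import numpy as np

def irls(A, f, lam=1e-5, n_iter=30, eps=1e-6):
    """IRLS sketch for min ||A x - f||_2^2 + lam * ||x||_1.

    Each pass solves the generalized Tikhonov problem in closed form with
    D = diag(1 / (|x_j| + eps)) built from the previous iterate.
    """
    AH = A.conj().T
    x = AH @ f                                   # crude starting point
    for _ in range(n_iter):
        D = np.diag(1.0 / (np.abs(x) + eps))
        x = np.linalg.solve(AH @ A + lam * D, AH @ f)
    return x

rng = np.random.default_rng(1)
n, k = 128, 48                                   # grid size, NUS points
dt, J = 1.0 / 1024.0, 32.0                       # dwell time (s), coupling (Hz)
idx = np.sort(rng.choice(n, size=k, replace=False))
F = np.exp(2j * np.pi * np.outer(idx, np.arange(n)) / n) / n
C = np.diag(np.cos(np.pi * J * idx * dt))        # modulation matrix

s_true = np.zeros(n, dtype=complex)
s_true[40] = 1.0                                 # decoupled spectrum: one line
f = C @ F @ s_true                               # measured, J-modulated signal

s_coupled = irls(F, f)                           # standard measurement matrix F
s_decoupled = irls(C @ F, f)                     # measurement matrix CF
```

In this setting the reconstruction with $F$ is expected to show the two doublet components at grid points 38 and 42, whereas the reconstruction with $CF$ places a single line at grid point 40.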
The standard estimate of the number of measurements $k$ required for exact CS reconstruction is [6]

$$k \sim m \log(n). \tag{7}$$

This estimate and the preceding remark show that the number of measurements $\tilde{k}$ required for the recovery of $\tilde{s}$ is divided by a factor of 2 compared with the $k$ required for the recovery of $s$. The above observations motivate the development of a CS methodology dedicated to signals in the presence of modulations, which we formulate in what follows.

Assuming that the J-modulated (measured) signal $f$ is corrupted by unstructured noise with covariance matrix $\Sigma$, the demodulated signal $\tilde{f} = C^{-1} f$ is corrupted by structured noise with the covariance matrix $\tilde{\Sigma} = C^{-1} \Sigma C^{\dagger -1}$. In particular, the weighting matrix $\tilde{Q} = \tilde{\Sigma}^{-1}$ is equal to $C^{\dagger} Q C$ and we get

$$\|Fx - \tilde{f}\|_{\tilde{Q}}^2 = (Fx - \tilde{f})^{\dagger} C^{\dagger} Q C (Fx - \tilde{f}) = (CFx - C\tilde{f})^{\dagger} Q (CFx - C\tilde{f}) = (CFx - f)^{\dagger} Q (CFx - f) = \|CFx - f\|_Q^2 .$$

This computation shows that the solution of the minimization problem

$$\tilde{s} = \arg\min_{x \in \mathbb{C}^n} \left( \|Fx - \tilde{f}\|_{\tilde{Q}}^2 + \|x\|_{\ell_1} \right)$$

coincides with that of

$$\tilde{s} = \arg\min_{x \in \mathbb{C}^n} \left( \|CFx - f\|_Q^2 + \|x\|_{\ell_1} \right)$$

and the latter can be found by IRLS in the iterative procedure

$$\tilde{s}_l = \arg\min_{x \in \mathbb{C}^n} \left( \|CFx - f\|_Q^2 + \|x\|_{D_l}^2 \right). \tag{8}$$

In the resulting spectrum, the multiplets reflecting the modulation (e.g. doublets) are replaced by singlets. Keeping in mind that in the standard experimental setup the measurement noise of $f$ is described by $\Sigma = \sigma^2 \mathbb{1}$, and substituting $\lambda = \sigma^2$ as explained above, we observe that Eq. (8) boils down to the iterative procedure based on generalized Tikhonov regularization

$$\tilde{s}_l = \arg\min_{x \in \mathbb{C}^n} \left( \|CFx - f\|_{\ell_2}^2 + \lambda \|x\|_{D_l}^2 \right). \tag{9}$$

Remarkably, this is the IRLS algorithm in the orthodox form (cf. (2)) applied to

$$\tilde{s} = \arg\min_{x \in \mathbb{C}^n} \left( \|CFx - f\|_{\ell_2}^2 + \lambda \|x\|_{\ell_1} \right), \tag{10}$$

which in turn may be viewed as the mathematical formulation of the CS-type problem of finding a sparse spectrum $\tilde{s}$ whose consistency with the measurement vector $f$ is given by the measurement matrix $CF$: $f = CF\tilde{s}$.

Summarizing, in the above considerations we explained the following two aspects:

• we formulated the CS problem suitable for the case of noisy measurements described by structured measurement noise; we also described the solution of this problem and related it to Tikhonov regularization;

• we applied the modified CS methodology to J-modulated signals in NMR and explained the advantages of our approach compared with the orthodox version of CS.

Protein samples and NMR spectroscopy
A [U- H, N, C] labeled sample of the monomeric photosensory module (57 kDa) Dr BphP
P SM from
Deinococcus radiodurans was produced exactly as described in our previous study[7, 8]. Themonomeric variant contains three mutations, which disrupt the dimer interface: F145S, L311E,and L314E[9].A 3D BEST-TROSY-HNCA experiment [10] was recorded for the Dr BphP
P SM sample during38.5 hours with 2930 relaxation-matched NUS points (assumed sampling density decay rate T =70 ms) in the C α dimension. The spectral widths (acquisition times) were 12.8 kHz (80 ms), 2.9kHz (22 ms), and 6.0 kHz (42.4 ms), for H, N, and C spectral dimensions, respectively. Forthe processing, the original NUS data set was sub-sampled to create the following data sets: • The ”traditional” low resolution spectrum processed using IRLS algorithm without the vir-tual decoupling, 1170 NUS points (15.4 hours of measurement time) were retained from theoriginal NUS data with maximal evolution time of 14 ms for the C α dimension (FigureS1a)). • The high resolution spectra processed using IRLS (undecoupled) and the region-selectiveD-IRLS algorithm (decoupled). The original NUS data was sub-sampled down to 1200 NUSpoints (15.8 hours of measurement time). The resulting sampling probability distribution forthis spectrum corresponded to the sampling in the C dimension matched to the both relax-ation and J-coupling (T = 70 ms, J=35 Hz), i.e. proportional to exp( − t/ T ) | cos( πt/J ) | ,with additional elimination of the points whose values of | cos( πt/J ) | were less than 0.2(Figure S1b)).A [U- H, N, C] labeled sample of the longest human Tau protein isoform hTau40 with 441residues was prepared as following: full length hTau40 with an amino-terminal His -SUMO-Tag(in a modified pET28b plasmid, Genescript) was expressed in E. coli
BL21( λ DE3) Star ™ (Novagen)cells. [U- H, N, C] isotope (Merck) enriched protein was produced using 2xM9 minimal medium[11]supplemented with NH Cl and D-( H/ C)- glucose as the sole nitrogen and carbon sources, re-spectively, in D O. The cells were grown at 37 ° C until an OD ≈ ° C. Cells wereharvested by centrifugation and subsequently resuspended in lysis buffer (20mM NaPi, 500mMNaCl, pH 7.8), and lysed by an Emulsiflex C3 (Avestin) homogenizer. Cleared lysate was purifiedwith HisTrap HP column (GE Healthcare). Fractions containing hTau40 were pooled and dialyzedover-night, against human SenP1 cleavage buffer (20 mM TrisHCl, 150 mM NaCl, 1 mM DTT,pH 7.8). After dialysis SenP1 protease (Addgene µ M, flash frozen in liquid nitrogen, and stored at -80 ° C till usage.A 3D BEST-TROSY-HN C α experiment [10] for Tau40 was recorded and processed similarto the Dr BphP
PSM spectrum described above. Namely, it was acquired during 13 hours with 1552 NUS points using spectral widths (acquisition times) of 9.6 kHz (106 ms), 2.9 kHz (22 ms), and 6.0 kHz (42.4 ms) for the ¹H, ¹⁵N, and ¹³C spectral dimensions, respectively. For the processing, the originally random NUS data set was sub-sampled to create the following data sets:
• The “traditional” low-resolution spectrum, processed using the IRLS algorithm without virtual decoupling. 555 NUS points (4.6 hours of measurement time) were retained from the original NUS data, with a maximal evolution time of 14 ms for the Cα dimension (Figure S1c).
• The high-resolution spectra, processed using IRLS (undecoupled) and the region-selective D-IRLS algorithm (decoupled). The original NUS data was sub-sampled down to 1200 NUS points (10 hours of measurement time). The resulting sampling probability distribution for this spectrum corresponded to sampling in the Cα dimension matched to both relaxation and J-coupling (T₂ = 200 ms, J = 35 Hz), i.e. proportional to exp(−t/T₂) |cos(πt/J)|, with additional elimination of the points whose values of |cos(πt/J)| were less than 0.2 (Figure S1d).

Figure S1: Sampling schedules used in the original experiments (blue) and sub-sampled data (red). The original schedules include 2930 points for Dr BphP PSM and 1552 for Tau and follow the relaxation-matched NUS scheme with T₂ = 70 ms for Tau (c and d) and T₂ = 200 ms for Dr BphP PSM (a and b). Schedules used for calculation of the “low-resolution” spectra, shown in panels a) and c) for Dr BphP PSM and Tau, respectively, were created by truncating the ¹³C dimension to 14 ms, which resulted in 1170 (a) and 555 (c) points. The “J-modulated” schedules shown in panels b) and d), for Dr BphP PSM and Tau, respectively, were created by selecting 1200 points that matched the |cos(πt/J)| envelope in the ¹³C dimension (with elimination of points for which |cos(πt/J)| < 0.2).
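The J-modulated schedule selection described above can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the text: the modulation is evaluated as |cos(πJt)| with J in Hz (the dimensionally consistent reading of the cos(πt/J) expression), only the Cα dimension is considered, points are drawn from a uniform time grid rather than from the measured NUS pool, and the function names `sampling_weight` and `subsample` are hypothetical. The weighted draw without replacement uses the Efraimidis-Spirakis key method, which is a choice made here and not necessarily what the authors used.

```python
import math
import random


def sampling_weight(t, t2=0.070, j_hz=35.0, cutoff=0.2):
    """Relative sampling probability at evolution time t (in seconds)."""
    # Assumption: J modulation evaluated as |cos(pi * J * t)| with J in Hz.
    modulation = abs(math.cos(math.pi * j_hz * t))
    if modulation < cutoff:
        return 0.0  # points close to the zero crossings are eliminated
    return math.exp(-t / t2) * modulation


def subsample(times, n_keep, rng, **weight_kwargs):
    """Draw n_keep distinct times with probability proportional to the weight."""
    keyed = []
    for t in times:
        w = sampling_weight(t, **weight_kwargs)
        if w > 0.0:
            # Efraimidis-Spirakis key: a larger weight pushes the key towards 1.
            keyed.append((rng.random() ** (1.0 / w), t))
    keyed.sort(reverse=True)
    return sorted(t for _, t in keyed[:n_keep])


# Hypothetical uniform grid: 6 kHz spectral width, about 42 ms maximum evolution time.
dwell = 1.0 / 6000.0
grid = [k * dwell for k in range(255)]
schedule = subsample(grid, 60, random.Random(0))
```

In this sketch no point is ever selected within the excluded windows around t = 1/(2J) and t = 3/(2J), where the doublet signal passes through zero and a measurement would contribute mostly noise.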
Figure S2: ¹⁵N/¹³C projections of 3D HNCA spectra of a) Tau and b) Dr BphP PSM proteins. The plots show superimposed traditional “low-resolution” and high-resolution deconvoluted spectra with 14 ms and 42.4 ms of maximum evolution time in the ¹³C dimension, respectively.
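As a rough illustration of why the two maximum evolution times in Figure S2 behave so differently, the sketch below counts the resolved components of a single J-split Cα signal. It is a simplified model under stated assumptions: an on-resonance doublet with J = 35 Hz, T₂ = 70 ms, uniform sampling at a 6 kHz spectral width, and a plain discrete Fourier transform rather than the IRLS or D-IRLS reconstruction. The helper names `doublet_spectrum` and `count_peaks` are hypothetical.

```python
import math


def doublet_spectrum(t_max, j_hz=35.0, t2=0.070, sw_hz=6000.0):
    """Real DFT (on a -100..100 Hz grid) of a J-doublet FID truncated at t_max."""
    dwell = 1.0 / sw_hz
    n_points = int(round(t_max / dwell)) + 1
    freqs = [float(f) for f in range(-100, 101)]
    spectrum = []
    for f in freqs:
        acc = 0.0
        for k in range(n_points):
            t = k * dwell
            # Assumed signal model: cosine J modulation with exponential decay.
            fid = math.cos(math.pi * j_hz * t) * math.exp(-t / t2)
            first_point_scale = 0.5 if k == 0 else 1.0
            acc += first_point_scale * fid * math.cos(2.0 * math.pi * f * t)
        spectrum.append(acc * dwell)
    return freqs, spectrum


def count_peaks(spectrum, rel_threshold=0.5):
    """Count local maxima that reach rel_threshold of the tallest point."""
    top = max(spectrum)
    return sum(
        1
        for i in range(1, len(spectrum) - 1)
        if spectrum[i] > spectrum[i - 1]
        and spectrum[i] >= spectrum[i + 1]
        and spectrum[i] >= rel_threshold * top
    )


peaks_low = count_peaks(doublet_spectrum(0.014)[1])    # "low-resolution" case
peaks_high = count_peaks(doublet_spectrum(0.0424)[1])  # "high-resolution" case
```

Under these assumptions the 14 ms signal ends close to the first zero of the J modulation at 1/(2J) ≈ 14.3 ms, so the splitting stays hidden in one broad line, whereas 42.4 ms is long enough for both doublet components to appear. The latter is the situation in which deconvolution of the coupling becomes useful.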
Synthetic peaks were injected into the spectrum of Tau protein in order to estimate the accuracy of the spectral reconstructions. The comparison was performed between reconstructions obtained using two types of NUS sampling schedules:
• matched to both Cα transverse relaxation and J-modulation
• matched to the Cα relaxation only
and four calculation modes:
• traditional low-resolution spectrum with a Cα maximum evolution time of 14 ms, reconstructed using the IRLS algorithm without deconvolution of the J-coupling
• high-resolution spectrum with a Cα maximum evolution time of 42 ms, IRLS reconstruction without the deconvolution
• high-resolution spectrum with a Cα maximum evolution time of 42 ms, IRLS reconstruction with the deconvolution (D-IRLS) for the whole spectrum
• region-selective processing with IRLS for the Gly region (< 45 ppm in Cα) and D-IRLS for the rest of the spectrum (as in Figure 1 in the main text).
Each version of the reconstruction was calculated 15 times using NUS data sets sub-sampled, with the corresponding random distribution, from the larger pool of measured data (1552 flat-random NUS points in total). In each calculation, 20 peaks, including 5 Gly-type peaks, were injected with random positions (without overlap with each other or with the existing Tau protein peaks) and intensities varying in the range 0.05-1.0 of a typical medium-strong peak in the original Tau spectrum. The non-Gly signals were injected as doublets with a random J-coupling value in the range 35 ±

Figure S3: The results of simulations with synthetic peaks injected into the experimental spectrum of Tau protein. The columns correspond to: undecoupled IRLS (blue), D-IRLS with (red) and without (green) Gly-region extraction, all with the J-modulated sampling scheme, and D-IRLS with Gly-region extraction with non-modulated sampling (purple). The plots show: a) R coefficients of peak intensity reconstruction, b) number of detected weak peaks out of the 42 lowest-intensity injected peaks, c) deviation of peak position in the ¹⁵N dimension, d) deviation of peak position in the Cα dimension. Numbers at the tops of the columns indicate their exact heights.

References
[1] Simon Foucart and Holger Rauhut. A Mathematical Introduction to Compressive Sensing. Wiley, 2010.
[2] Rick Chartrand and Wotao Yin. Iteratively reweighted algorithms for compressive sensing. In
ICASSP, pages 3869–3872. IEEE, 2008.
[3] Krzysztof Kazimierczuk and Vladislav Yu. Orekhov. Accelerated NMR spectroscopy by using compressed sensing. Angewandte Chemie - International Edition, 50(24):5556–5559, 2011.
[4] Krzysztof Kazimierczuk and Vladislav Yu. Orekhov. The comparison of convex and non-convex compressed sensing applied in multidimensional NMR. Journal of Magnetic Resonance, 223(0):1–10, 2012.
[5] Emmanuel J. Candès, Michael B. Wakin, and Stephen P. Boyd. Enhancing sparsity by reweighted L1 minimization. Journal of Fourier Analysis and Applications, 14(5-6):877–905, 2008.
[6] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509, Feb 2006.
[7] Emil Gustavsson, Linnéa Isaksson, Cecilia Persson, Maxim Mayzel, Ulrika Brath, Lidija Vrhovac, Janne A. Ihalainen, B. Göran Karlsson, Vladislav Orekhov, and Sebastian Westenhoff. Modulation of Structural Heterogeneity Controls Phytochrome Photoswitching. Biophysical Journal, 118(2):415–421, 2020.
[8] BMRB entry 10.13018/bmr27783.
[9] H. Takala, A. Björling, M. Linna, S. Westenhoff, and J. A. Ihalainen. Light-induced Changes in the Dimerization Interface of Bacteriophytochrome. The Journal of Biological Chemistry, 290(26):16383–16392, 2015.
[10] Zsofia Solyom, Melanie Schwarten, Leonhard Geist, Robert Konrat, Dieter Willbold, and Bernhard Brutscher. BEST-TROSY experiments for time-efficient sequential resonance assignment of large disordered proteins. Journal of Biomolecular NMR, 55(4):311–321, 2013.
[11] Stephan B. Azatian, Navneet Kaur, and Michael P. Latham. Increasing the buffering capacity of minimal media leads to higher protein yield. Journal of Biomolecular NMR, 73(1-2):11–17, 2019.
[12] Jowita Mikolajczyk, Marcin Drag, Miklós Békés, John T. Cao, Ze’ev Ronai, and Guy S. Salvesen. Small ubiquitin-related modifier (SUMO)-specific proteases: profiling the specificities and activities of human SENPs. Journal of Biological Chemistry, 282(36):26217–26224, 2007.
[13] F. Delaglio, S. Grzesiek, G. W. Vuister, G. Zhu, J. Pfeifer, and A. Bax. NMRPipe: A multidimensional spectral processing system based on UNIX pipes.