TorchProteinLibrary: A computationally efficient, differentiable representation of protein structure
Georgy Derevyanko∗
Department of Chemistry and Biochemistry, Concordia University, Montreal, QC
[email protected]
Guillaume Lamoureux†
Department of Chemistry and Center for Computational and Integrative Biology, Rutgers University – Camden, Camden, NJ
[email protected]
Abstract
Predicting the structure of a protein from its sequence is a cornerstone task of molecular biology. Established methods in the field, such as homology modeling and fragment assembly, appear to have reached their limit. However, this year saw the emergence of promising new approaches: end-to-end protein structure and dynamics models, as well as reinforcement learning applied to protein folding. For these approaches to be investigated on a larger scale, an efficient implementation of their key computational primitives is required. In this paper we present a library of differentiable mappings from two standard dihedral-angle representations of protein structure (full-atom representation "φ, ψ, ω, χ" and backbone-only representation "φ, ψ, ω") to atomic Cartesian coordinates. The source code and documentation can be found at https://github.com/lupoglaz/TorchProteinLibrary.

∗ https://github.com/lupoglaz
† On leave from Department of Chemistry and Biochemistry, Concordia University, Montreal, QC

32nd Conference on Neural Information Processing Systems (NIPS 2018), Montréal, Canada.

Introduction

To understand the molecular details of how any living system functions, one inevitably has to know the three-dimensional structure of a large number of proteins. Despite recent progress in cryo-electron microscopy (Fernandez-Leiro & Scheres, 2016), experimental approaches to this problem are unlikely to scale up, and there is a dire need for innovation in computational methods.

State-of-the-art methods that attempt to solve the protein folding problem (Dill & MacCallum, 2012) usually rely on complex workflows that consist of multiple loosely interconnected operations (Yang et al., 2015; Raman et al., 2009). For the majority of these methods, a first step consists in generating a "rough" protein structure, using either homology modelling or some fragment-based assembly approach. The second step consists in refining this structure using one of many optimization techniques. The parameters of these two steps are usually tuned separately.

However, new computational approaches to protein folding have recently emerged, which make use of end-to-end learning. The work of AlQuraishi (AlQuraishi, 2018) attempts to learn the positions of protein backbone atoms using a BiLSTM that predicts distributions of dihedral angles and, using a differentiable transformation, converts these internal coordinates into atomic Cartesian coordinates. Other work under review (Anonymous, 2019) tries to learn the force field parameters used by a simulation of the protein folding process, using the same differentiable conversion between internal coordinates and atomic coordinates.

We anticipate that end-to-end models will become extremely important for structural biology, due to the growing amounts of data for both protein sequences and protein structures, and the necessity of relating the two.
Since the transformation between internal coordinates and atomic positions is the only known unambiguous differentiable mapping between protein sequence and protein structure, a fast implementation of this transformation is the key building block for any "sequence-to-structure" end-to-end model.

In this work we present TorchProteinLibrary, a library that implements the conversion between internal protein coordinates and atomic positions for "full-atom" and "backbone-only" models of protein structure. It also contains an implementation of the least root-mean-square deviation (LRMSD), a measure of distance in the space of protein structures that respects translational and rotational invariance.
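To make the intended use concrete, the sketch below shows how these layers could drive an end-to-end training step. The module names (FullAtomModel.Angles2Coords, RMSD.Coords2RMSD) follow the repository documentation, but the tensor shapes and return values shown here are assumptions and may differ from the released API.

    import torch
    from TorchProteinLibrary import FullAtomModel, RMSD

    sequences = ["GATTACA"]                          # batch of one sequence
    # Assumed layout: [batch, 7 dihedral angles, max sequence length].
    angles = torch.zeros(1, 7, len(sequences[0]),
                         dtype=torch.double, requires_grad=True)

    a2c = FullAtomModel.Angles2Coords()              # dihedral angles -> Cartesian coordinates
    coords, *atom_metadata, num_atoms = a2c(angles, sequences)

    lrmsd = RMSD.Coords2RMSD()                       # differentiable loss, described below
    target = coords.detach().clone()                 # stand-in for an experimental structure
    loss = lrmsd(coords, target, num_atoms).mean()
    loss.backward()                                  # gradients reach `angles` through the chain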
Full-atom model

The "full-atom" representation of protein structure specifies the positions of all non-hydrogen atoms. (Following the usual convention, hydrogen atoms are omitted from the representation because their positions are easy to infer from the rest of the molecular structure.) The layer computes the Cartesian coordinates by building a graph of transforms acting on standard coordinates of rigid groups of atoms. In that standard reference frame, the first atom of the rigid group is at the origin and any additional atom is at a position consistent with the stereochemistry of the group. The smallest rigid group consists of a single atom at the origin. Each amino acid conformation is described by a list of up to 7 dihedral angles. For example, Figure 1 shows a schematic representation of the amino acid threonine, which has 5 rigid groups in total and is parameterized by 4 transforms.

Figure 1: Example of the parameterization of threonine in terms of dihedral angles φ, ψ, ω, and χ. In the current implementation ω is fixed to π, which makes R(ω) constant. Blue boxes are aligned to local coordinate systems, in which the initial coordinates of rigid groups are defined. R′ is an out-of-plane transform that does not contain differentiable parameters.

Each transform R_i is parameterized by a dihedral angle α_i and has the form

  R_i(α_i) = R(α_i, θ_i, d_i) = R_y(θ_i) T_x(d_i) R_x(α_i)

where R_x(α_i) and R_y(θ_i) are the 4×4 rotation matrices about axes x and y, respectively, and T_x(d_i) is the 4×4 translation matrix along axis x. θ_i and d_i are fixed parameters for which we do not compute derivatives. They depend only on the type of the amino acid and its stereochemical properties. For instance, the first transform of the threonine parameterization of Figure 1 can be written as

  R(φ) = R(φ, θ_1, d_1)

where φ is the first dihedral angle (variable), θ_1 is the C-N-CA angle (fixed), and d_1 is the N-CA bond length (also fixed).

We compute the position of every atom in a rigid group by transforming its position in the original reference frame with the appropriate matrix. In short, to get the cumulative transform M_i of a node, we take the cumulative transform M_parent(i) of its parent and multiply it by the transform R_i of the current node. In the threonine example, the cumulative transforms are as follows:

  M_1 = M_0 R(φ)
  M_2 = M_0 R(φ) R′ R(χ)
  M_3 = M_0 R(φ) R(ψ)
  M_4 = M_0 R(φ) R(ψ) R(ω)

M_0 represents the cumulative transform leading to the threonine residue considered, due to all residues ahead of it in the sequence. For L-amino acids, R′ corresponds to a fixed counterclockwise rotation about the x axis, needed to properly orient the side chain. The atomic coordinates "r" are obtained by transforming the standard coordinates "r°" of each rigid group i by its corresponding cumulative transform M_i. For instance, the atomic positions of the threonine residue can be written as follows:

  r_CA  = M_1 r°_CA  = M_1 0̄
  r_CB  = M_2 r°_CB  = M_2 0̄
  r_OG1 = M_2 r°_OG1
  r_CG2 = M_2 r°_CG2
  r_C   = M_3 r°_C   = M_3 0̄
  r_O   = M_3 r°_O
  r_N   = M_4 r°_N   = M_4 0̄

where r°_t = (x°_t, y°_t, z°_t, 1)^T is the 4-component vector representing the position of atom of type t in the standard reference frame and 0̄ = (0, 0, 0, 1)^T is the 4-component vector representing the origin.

We describe the algorithm for computing the gradients of atomic positions with respect to any dihedral angle α_i by considering the transformation graph from Figure 2, corresponding to threonine. The graph has four nodes that contain the cumulative transformation matrices M_1 to M_4. To simplify the notation, we assume that each rigid group contains a single atom at position r_i. The corresponding atoms in threonine would be r_1 = r_CA, r_2 = r_CB, r_3 = r_C, and r_4 = r_N.
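To make the transform algebra concrete, the self-contained sketch below composes the elementary 4×4 transforms into the cumulative matrices M_1 to M_4 for the four nodes just described. The θ, d, and R′ values are placeholders chosen for illustration, not the library's stereochemical constants, and the code is a sketch rather than the library's own implementation.

    import math, torch

    def Rx(a):   # 4x4 rotation about x by dihedral angle a
        c, s = math.cos(a), math.sin(a)
        return torch.tensor([[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1.0]])

    def Ry(t):   # 4x4 rotation about y by bond angle t
        c, s = math.cos(t), math.sin(t)
        return torch.tensor([[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1.0]])

    def Tx(d):   # 4x4 translation along x by bond length d
        return torch.tensor([[1, 0, 0, d], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1.0]])

    def R(alpha, theta, d):      # R_i(alpha_i) = R_y(theta_i) T_x(d_i) R_x(alpha_i)
        return Ry(theta) @ Tx(d) @ Rx(alpha)

    # Dihedral angles of the residue; a single placeholder (theta, d) pair is
    # used for brevity, whereas the library fixes one pair per bond type.
    phi, psi, omega, chi = -1.0, 2.0, math.pi, 0.5
    theta, d = math.pi - 1.9, 1.46
    Rp = Rx(2.14)                                    # out-of-plane transform R' (placeholder angle)

    M0 = torch.eye(4)                                # cumulative transform of preceding residues
    M1 = M0 @ R(phi, theta, d)                       # rigid group holding CA
    M2 = M1 @ Rp @ R(chi, theta, d)                  # side-chain group (CB, OG1, CG2)
    M3 = M1 @ R(psi, theta, d)                       # group holding C and O
    M4 = M3 @ R(omega, theta, d)                     # next backbone nitrogen
    r_CA = M1 @ torch.tensor([0.0, 0.0, 0.0, 1.0])   # first atom of a group sits at its origin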
Supposing we have a function L that depends only on the coordinates of these four atoms, we can write its derivative with respect to the dihedral angle φ as:

  ∂L/∂φ = Σ_{k=1}^{4} ∂L/∂r_k · ∂r_k/∂φ

where "·" denotes the scalar product of two vectors. The derivative of the first position with respect to φ can be written as:

  ∂r_1/∂φ = M_0 (∂R(φ)/∂φ) r°_1

where r°_1 represents the position of the atom in the standard reference frame. The other three derivatives can be written as:

  ∂r_2/∂φ = M_0 (∂R(φ)/∂φ) R′ R(χ) r°_2,
  ∂r_3/∂φ = M_0 (∂R(φ)/∂φ) R(ψ) r°_3, and
  ∂r_4/∂φ = M_0 (∂R(φ)/∂φ) R(ψ) R(ω) r°_4

Figure 2: Transformation graph for the threonine example. Each node holds a cumulative transform M_i, computed during the forward pass. Rigid group coordinates associated with each node are denoted as r_i.

We can rewrite those expressions using r_1, r_2, r_3 and r_4, the atomic coordinates calculated during the forward pass:

  ∂r_1/∂φ = M_0 (∂R(φ)/∂φ) M_1⁻¹ M_1 r°_1 = M_0 (∂R(φ)/∂φ) M_1⁻¹ r_1
  ∂r_2/∂φ = M_0 (∂R(φ)/∂φ) M_1⁻¹ M_1 R′ R(χ) r°_2 = M_0 (∂R(φ)/∂φ) M_1⁻¹ M_2 r°_2 = M_0 (∂R(φ)/∂φ) M_1⁻¹ r_2
  ∂r_3/∂φ = M_0 (∂R(φ)/∂φ) M_1⁻¹ M_1 R(ψ) r°_3 = M_0 (∂R(φ)/∂φ) M_1⁻¹ M_3 r°_3 = M_0 (∂R(φ)/∂φ) M_1⁻¹ r_3
  ∂r_4/∂φ = M_0 (∂R(φ)/∂φ) M_1⁻¹ M_1 R(ψ) R(ω) r°_4 = M_0 (∂R(φ)/∂φ) M_1⁻¹ M_4 r°_4 = M_0 (∂R(φ)/∂φ) M_1⁻¹ r_4

The expression for the derivative becomes:

  ∂L/∂φ = Σ_{k=1}^{4} ∂L/∂r_k · M_0 (∂R(φ)/∂φ) M_1⁻¹ r_k

In this formula the index k iterates over the children of node 0, and the matrix M_0 (∂R(φ)/∂φ) M_1⁻¹ can be computed efficiently. This expression can be generalized to any graph without loops. For L a function of all atomic positions, the derivative with respect to any dihedral angle θ_i of node i is:

  ∂L/∂θ_i = Σ_{k ∈ children(i)} ∂L/∂r_k · M_i (∂R_i/∂θ_i) M_{i+1}⁻¹ r_k    (1)

This sum is computed during a backward depth-first propagation through the graph. The matrices F_i = M_i (∂R_i/∂θ_i) M_{i+1}⁻¹, however, are computed during the forward pass. The library presented in this work implements the forward and backward passes on CPU.

Backbone model

Another widely used protein representation is the "backbone" model, shown in Figure 3, for which we compute only three atomic positions per residue (for the CA, C, and N atoms). The backbone O atoms are omitted but their positions can be easily inferred from the positions of the other three atoms. In this reduced representation, the amino acid side chains are ignored.

The backbone model, unlike the full-atom model, can be efficiently implemented on GPU. The key to an efficient parallel implementation is that the derivatives ∂r_i/∂θ_j of the coordinates with respect to the parameters of the model can be computed independently of one another. Here we describe the detailed computation of the coordinates and write down the derivatives in terms of quantities saved during the forward pass.

Figure 3: Illustration of the parameterization of one amino acid, number j, in the backbone model.

The position of the i-th atom in the chain is:

  r_i = R_0 R_1 ⋯ R_i 0̄

where 0̄ = (0, 0, 0, 1)^T and the R_i are transformation matrices, each parameterized by a dihedral angle α:

  R(α, θ, d) = [  cos θ    sin α sin θ    cos α sin θ     d cos θ ]
               [  0        cos α         −sin α          0       ]
               [ −sin θ    sin α cos θ   cos α cos θ    −d sin θ ]
               [  0        0             0               1       ]

The sequence of transformations is defined for an amino acid sequence indexed with j ∈ [0, L), where L is the length of the sequence. For the first residue (j = 0) the peptide-bond transformation is the identity matrix; otherwise:

• C-N peptide bond between residues j−1 and j: atomic index i = 3j; transformation R_i = R(ω_j, π − θ_{C-N}, d_{C-N})
• N-CA bond of residue j: atomic index i = 3j + 1; transformation R_i = R(φ_j, π − θ_{N-CA}, d_{N-CA})
• CA-C bond of residue j: atomic index i = 3j + 2; transformation R_i = R(ψ_j, π − θ_{CA-C}, d_{CA-C})

Here the θ and d values are fixed stereochemical constants (bond angles and bond lengths). In the current implementation the angle ω_j is fixed to π, corresponding to the trans conformation of the peptide bond. A forward-pass sketch of this parameterization is given below.
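The following self-contained sketch implements the forward pass just described, using the explicit R(α, θ, d) matrix. The bond angles and lengths in GEOM are placeholders, not the library's fixed stereochemical values, and the loop is written for clarity rather than for the parallelism discussed next.

    import math, torch

    def R(alpha: float, theta: float, d: float) -> torch.Tensor:
        """The 4x4 transform written out above."""
        ca, sa = math.cos(alpha), math.sin(alpha)
        ct, st = math.cos(theta), math.sin(theta)
        return torch.tensor([[ ct, sa * st,  ca * st,  d * ct],
                             [0.0,      ca,      -sa,     0.0],
                             [-st, sa * ct,  ca * ct, -d * st],
                             [0.0,     0.0,      0.0,     1.0]])

    # Placeholder bond geometry (theta, d) per bond type: NOT the library's constants.
    GEOM = {"omega": (math.pi - 2.1, 1.33),   # C-N peptide bond
            "phi":   (math.pi - 2.0, 1.46),   # N-CA bond
            "psi":   (math.pi - 1.9, 1.52)}   # CA-C bond

    def backbone(phi, psi, omega):
        """N, CA, C positions for residues j = 0..L-1; returns a [3L, 3] tensor."""
        origin = torch.tensor([0.0, 0.0, 0.0, 1.0])
        M = torch.eye(4)                      # running cumulative transform M_i
        coords = []
        for j in range(len(phi)):
            for name, angle in (("omega", omega[j]), ("phi", phi[j]), ("psi", psi[j])):
                if not (j == 0 and name == "omega"):   # first peptide transform is the identity
                    M = M @ R(angle, *GEOM[name])
                coords.append(M @ origin)    # atoms i = 3j (N), 3j+1 (CA), 3j+2 (C)
        return torch.stack(coords)[:, :3]

    xyz = backbone(phi=[0.5, -1.2], psi=[1.0, 0.3], omega=[math.pi, math.pi])
    print(xyz.shape)                          # torch.Size([6, 3])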
During the forward pass, we save the cumulative transformation matrix for each atom:

  M_i = R_0 R_1 ⋯ R_i

Notice that atom 3j is always N, atom 3j + 1 is always CA, and atom 3j + 2 is always C. The transformation matrices R_{3j+1} and R_{3j+2} therefore depend on the angles φ_j and ψ_j, respectively. During the backward pass, we first compute the gradient of r_i with respect to each "φ" and "ψ" angle:

  ∂r_i/∂φ_j = R_0 R_1 ⋯ (∂R_{3j+1}/∂φ_j) ⋯ R_i 0̄ and ∂r_i/∂ψ_j = R_0 R_1 ⋯ (∂R_{3j+2}/∂ψ_j) ⋯ R_i 0̄

We can rewrite these expressions using the matrices M_i saved during the forward pass:

  ∂r_i/∂φ_j = M_{3j} (∂R_{3j+1}/∂φ_j) M_{3j+1}⁻¹ r_i and ∂r_i/∂ψ_j = M_{3j+1} (∂R_{3j+2}/∂ψ_j) M_{3j+2}⁻¹ r_i

The inverses of the matrices M_k have simple forms and can be computed on the fly during the backward pass. This allows all derivatives to be computed simultaneously on GPU. To compute the derivatives of a function L of the atomic coordinates with respect to the input angles, we have to calculate the following sums:

  Σ_i ∂L/∂r_i · ∂r_i/∂φ_j and Σ_i ∂L/∂r_i · ∂r_i/∂ψ_j    (2)

which can be efficiently computed on GPU.
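As a concrete check of this backward-pass formula, the self-contained sketch below compares the analytic derivative against PyTorch autograd on a toy three-transform chain. It uses a single placeholder (θ, d) pair rather than the per-bond constants, and plain tensor ops rather than the library's batched GPU kernel.

    import math, torch

    def R(alpha, theta, d):
        """4x4 backbone transform (same matrix as above); alpha may carry autograd."""
        ca, sa = torch.cos(alpha), torch.sin(alpha)
        ct, st = math.cos(theta), math.sin(theta)
        one, z = torch.ones_like(ca), torch.zeros_like(ca)
        return torch.stack([
            torch.stack([ct * one,  sa * st, ca * st,  d * ct * one]),
            torch.stack([z,         ca,      -sa,      z           ]),
            torch.stack([-st * one, sa * ct, ca * ct, -d * st * one]),
            torch.stack([z,         z,       z,        one         ]),
        ])

    theta, d = math.pi - 1.9, 1.46                   # placeholder constants
    angles = torch.tensor([0.3, -0.7, 1.1], dtype=torch.double, requires_grad=True)

    # Forward pass: save every cumulative matrix M_i.
    M, Ms = torch.eye(4, dtype=torch.double), []
    for a in angles:
        M = M @ R(a, theta, d)
        Ms.append(M)
    origin = torch.tensor([0.0, 0.0, 0.0, 1.0], dtype=torch.double)
    r = [Mi @ origin for Mi in Ms]

    # Analytic derivative of r_2 with respect to angle 0: (dR/da_0) M_0^{-1} r_2,
    # since the cumulative matrix preceding the first transform is the identity.
    dR = torch.autograd.functional.jacobian(lambda a: R(a, theta, d), angles[0].detach())
    analytic = dR @ torch.linalg.inv(Ms[0]).detach() @ r[2].detach()

    # Reference from autograd: d(sum of r_2 components)/d(angle 0).
    reference = torch.autograd.grad(r[2].sum(), angles)[0][0]
    print(torch.allclose(analytic.sum(), reference))  # True up to numerical error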
Loss function

The training of any protein structure prediction model requires some measure of how close two structures of the same sequence are. While there are multiple variations of such a measure, the most common one is the least root-mean-square deviation (LRMSD) of atomic Cartesian coordinates:
  LRMSD = min_M √[ (1/N_atoms) Σ_{i=1}^{N_atoms} |x_i − M y_i|² ]

Here, x_i and y_i are the atomic positions of the "target" and "input" structures, respectively, and the root-mean-square deviation is minimized over all possible rigid transformation matrices M. This measure is invariant with respect to rotations and translations of the two structures being compared.

TorchProteinLibrary contains an implementation of the algorithm of Coutsias, Seok and Dill (Coutsias et al., 2004). This algorithm computes the LRMSD and its derivative with respect to the coordinates of one of the structures without explicit minimization. We briefly outline the key steps of the algorithm; a detailed derivation can be found in Ref. (Coutsias et al., 2004).

We first move both target and input structures (positions x_i and y_i) so that their barycenters are at the origin, then we compute the correlation matrix R:

  R = Σ_{i=1}^{N_atoms} x_i y_i^T

Using this 3×3 matrix we compute the following 4×4 matrix T:

  T = [ R₁₁+R₂₂+R₃₃   R₂₃−R₃₂        R₃₁−R₁₃        R₁₂−R₂₁      ]
      [ R₂₃−R₃₂       R₁₁−R₂₂−R₃₃    R₁₂+R₂₁        R₁₃+R₃₁      ]
      [ R₃₁−R₁₃       R₁₂+R₂₁       −R₁₁+R₂₂−R₃₃    R₂₃+R₃₂      ]
      [ R₁₂−R₂₁       R₁₃+R₃₁        R₂₃+R₃₂       −R₁₁−R₂₂+R₃₃  ]

We then compute λ, the maximum eigenvalue of matrix T, and its associated eigenvector q. This eigenvector corresponds to the quaternion that gives the optimal rotation of one structure with respect to the other. The rotation matrix can be computed as follows:

  U = [ q₀²+q₁²−q₂²−q₃²   2(q₁q₂−q₀q₃)       2(q₁q₃+q₀q₂)     ]
      [ 2(q₁q₂+q₀q₃)      q₀²−q₁²+q₂²−q₃²    2(q₂q₃−q₀q₁)     ]
      [ 2(q₁q₃−q₀q₂)      2(q₂q₃+q₀q₁)       q₀²−q₁²−q₂²+q₃²  ]

The LRMSD is then computed using the formula:
  LRMSD = √[ ( Σ_{i=1}^{N_atoms} (|x_i|² + |y_i|²) − 2λ ) / N_atoms ]

The derivative of the LRMSD with respect to the input coordinates is computed using the formula:

  ∂LRMSD/∂x_i = (x_i − U^T y_i) / (N_atoms · LRMSD)

This expression, combined with Eq. 1 or 2, allows any sequence-based model predicting internal coordinates of proteins to be directly trained on known protein structures, using the LRMSD as a loss function.
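The following self-contained sketch reproduces this computation in plain PyTorch (a recent version providing torch.linalg); the library implements it as a fused operation with the closed-form gradient above, whereas here differentiating through the eigendecomposition would give the same gradient up to numerical error.

    import torch

    def lrmsd(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        """x, y: [N, 3] coordinates of two structures of the same sequence."""
        x = x - x.mean(dim=0)                 # move both barycenters to the origin
        y = y - y.mean(dim=0)
        R = x.t() @ y                         # 3x3 correlation matrix
        T = torch.stack([
            torch.stack([R[0,0]+R[1,1]+R[2,2], R[1,2]-R[2,1],         R[2,0]-R[0,2],         R[0,1]-R[1,0]]),
            torch.stack([R[1,2]-R[2,1],        R[0,0]-R[1,1]-R[2,2],  R[0,1]+R[1,0],         R[0,2]+R[2,0]]),
            torch.stack([R[2,0]-R[0,2],        R[0,1]+R[1,0],        -R[0,0]+R[1,1]-R[2,2],  R[1,2]+R[2,1]]),
            torch.stack([R[0,1]-R[1,0],        R[0,2]+R[2,0],         R[1,2]+R[2,1],        -R[0,0]-R[1,1]+R[2,2]]),
        ])
        lam = torch.linalg.eigvalsh(T)[-1]    # maximum eigenvalue (ascending order)
        n = x.shape[0]
        msd = ((x**2).sum() + (y**2).sum() - 2.0 * lam) / n
        return torch.sqrt(torch.clamp(msd, min=0.0))   # clamp guards tiny negative round-off

    Q, _ = torch.linalg.qr(torch.randn(3, 3, dtype=torch.double))
    Q = Q * torch.sign(torch.linalg.det(Q))   # force a proper rotation (det = +1)
    x = torch.randn(50, 3, dtype=torch.double)
    print(lrmsd(x, x @ Q + 1.0))              # ~0 for a pure rigid transformation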
Performance

Here we give estimates of the run times of the modules described above, along with a simple baseline for comparison. We performed the measurements on a machine with a Titan X (Maxwell) GPU and an Intel Core i7-5930K CPU, using PyTorch version 0.4.1 (CUDA 9.2 build).

Figure 4 shows the scaling of the computation time of the forward and backward passes for the full-atom model. We see that the computational complexity of the backward pass is O(L²), where L is the sequence length. The reason for this quadratic scaling is that we compute Eq. 1 using depth-first graph traversal. In principle, this layer can be further optimized by unfolding the graph and computing the gradients simultaneously on GPU. (We are planning to incorporate this optimization in the next version of the library.)

Figure 4: Scaling of the computation time of the forward and backward passes for the full-atom model. The batch size was set to 32 and amino acid sequences were generated at random for each measurement. We performed 10 measurements per data point and plotted the 95% confidence interval.

Figure 5 shows how the computation time scales with sequence length for the backbone protein model. While the computational complexity of the backward pass still scales as O(L²), the scaling coefficient is smaller than for the full-atom model. Here, the presence of quadratic scaling can be attributed to GPU hardware limitations. However, we expect this scaling coefficient to decrease with an increase in the number of CUDA cores, due to increasingly efficient parallelization. Another contributing factor to this scaling behavior is the computation of the sums in Eq. 2, which currently does not rely on the "reduce" algorithm for parallelization.

Figure 5: Scaling of the computation time of the forward and backward passes for the backbone model. The batch size was set to 32. We performed 10 measurements per data point and plotted the 95% confidence interval.

Finally, the computation time of the LRMSD layer is shown in Figure 6. The forward pass run time exhibits the expected linear behavior.

Figure 6: Scaling of the computation time of the forward and backward passes of the LRMSD layer. The batch size was set to 32. We performed 10 measurements per data point and plotted the 95% confidence interval.

To have a meaningful reference timescale for the computation times of the layers implemented in the library, we measured the forward and backward computation times of an LSTM model (Hochreiter & Schmidhuber, 1997) on GPU, as implemented in PyTorch (Paszke et al., 2017). The batch size of the input is 32 and the number of input features is 128. The LSTM has 256 hidden units and one layer. Figure 7 shows the scaling of the forward and backward passes of this model with the growth of the input length. We see that the LSTM and backbone models have comparable computation times.

Figure 7: Scaling of the computation time of the forward and backward passes of a one-layer LSTM on GPU. The batch size was set to 32, the number of input features was 128, and the number of hidden units was set to 256. We performed 10 measurements per data point and plotted the 95% confidence interval.

Another important concern regarding the backbone model is the numerical stability of chained matrix multiplications using single-precision arithmetic during the forward pass. To estimate the error of the computation we compared the output of a forward pass of the backbone protein model to that of the full-atom model, implemented using double-precision arithmetic on CPU.
We generate random input angles for the backbone of a protein and pass them through both the full-atom and backbone layers, then compute the distances between equivalent backbone atoms in both models. Figure 8 shows that the resulting error is negligible for protein sequences of any realistic length.

Figure 8: Scaling of the computation error of the forward pass of the backbone model. The atomic index corresponds to the atom numbers in the backbone protein model; for example, index 2100 corresponds to the CA atom of residue number 700. We performed 10 measurements per data point and plotted the 95% confidence interval.

Conclusion

In this paper we have described two differentiable representations that allow the mapping of protein sequences to protein atomic coordinates. These representations enable the development of end-to-end differentiable models and of model-based reinforcement learning (Berkenkamp et al., 2017). We believe these new approaches hold great promise for protein folding and protein modelling in general.

It should be mentioned that the scope of
TorchProteinLibrary is not limited to these layers only. To gain access to the chemical properties of protein molecules, functions mapping atomic coordinates to a scalar value have to be defined. For that purpose, we have also implemented a representation of the atomic coordinates as a density map on a three-dimensional (3D) grid. This representation can then be transformed into a scalar using 3D convolutional networks. This particular 3D representation allows one to circumvent some common problems associated with pairwise potentials: currently available protein structure datasets are often insufficient to extract meaningful pairwise potentials for all combinations of atom types. (A sketch of the density-map construction is given at the end of this section.)

Another research direction for which this library is expected to be useful is the prediction of protein-protein interactions (PPIs). We have implemented differentiable volume convolutions using cuFFT (NVIDIA, 2017). These operations are at the core of most algorithms for exhaustive rigid protein-protein docking (Katchalski-Katzir et al., 1992). By constructing a differentiable model that maps two sets of atomic coordinates to the distribution of relative rotations and translations of one set with respect to the other, one can in principle learn to dock proteins directly from experimental data.
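For concreteness, the following illustrative sketch splats atoms onto a 3D grid with Gaussian kernels, yielding a density map that a 3D convolutional network can consume. The grid size, resolution, and kernel width are arbitrary choices made for this example, not the library's defaults, and the function name is hypothetical.

    import torch

    def coords_to_density(xyz: torch.Tensor, box: int = 32,
                          resolution: float = 1.0, sigma: float = 1.0) -> torch.Tensor:
        """xyz: [N, 3] atomic coordinates (angstroms) -> [box, box, box] density."""
        axis = torch.arange(box, dtype=xyz.dtype) * resolution     # grid-cell centers
        gx, gy, gz = torch.meshgrid(axis, axis, axis, indexing="ij")
        grid = torch.stack([gx, gy, gz], dim=-1)                   # [box, box, box, 3]
        # Sum of Gaussians centered on the atoms; differentiable w.r.t. xyz.
        d2 = ((grid[None] - xyz[:, None, None, None, :]) ** 2).sum(-1)
        return torch.exp(-d2 / (2.0 * sigma ** 2)).sum(0)

    xyz = torch.rand(100, 3) * 32.0        # 100 random "atoms" inside the box
    density = coords_to_density(xyz)       # feed to a 3D convolutional network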
Acknowledgments
We thank Yoshua Bengio for hosting one of us (G.D.) at the Montreal Institute for Learning Algorithms (MILA) during some of the critical stages of the project, and for access to the MILA computational resources. This work was supported by a grant from the Natural Sciences and Engineering Research Council of Canada to G.L. (RGPIN 355789).

References
AlQuraishi, Mohammed. 2018. End-to-End Differentiable Learning of Protein Structure. bioRxiv. (https://doi.org/10.1101/265231).

Anonymous. 2019. Learning Protein Structure with a Differentiable Simulator. Under review for the International Conference on Learning Representations (ICLR). (https://openreview.net/forum?id=Byg3y3C9Km).

Berkenkamp, Felix, Turchetta, Matteo, Schoellig, Angela, & Krause, Andreas. 2017. Safe model-based reinforcement learning with stability guarantees. Pages 908–918 of: Proceedings of Neural Information Processing Systems (NIPS). (https://arxiv.org/abs/1705.08551).

Coutsias, Evangelos A, Seok, Chaok, & Dill, Ken A. 2004. Using quaternions to calculate RMSD. Journal of Computational Chemistry, 25(15), 1849–1857.

Dill, Ken A, & MacCallum, Justin L. 2012. The Protein-Folding Problem, 50 Years On. Science, 338(6110), 1042–1046.

Fernandez-Leiro, Rafael, & Scheres, Sjors HW. 2016. Unravelling biological macromolecules with cryo-electron microscopy. Nature, 537(7620), 339–346.

Hochreiter, Sepp, & Schmidhuber, Jürgen. 1997. Long short-term memory. Neural Computation, 9(8), 1735–1780.

Katchalski-Katzir, Ephraim, Shariv, Isaac, Eisenstein, Miriam, Friesem, Asher A, Aflalo, Claude, & Vakser, Ilya A. 1992. Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques. Proceedings of the National Academy of Sciences, 89(6), 2195–2199.

NVIDIA. 2017. cuFFT library. (https://developer.nvidia.com/cufft).

Paszke, Adam, Gross, Sam, Chintala, Soumith, Chanan, Gregory, Yang, Edward, DeVito, Zachary, Lin, Zeming, Desmaison, Alban, Antiga, Luca, & Lerer, Adam. 2017. Automatic differentiation in PyTorch. In: Proceedings of Neural Information Processing Systems (NIPS) Autodiff Workshop. (https://openreview.net/forum?id=BJJsrmfCZ).

Raman, Srivatsan, Vernon, Robert, Thompson, James, Tyka, Michael, Sadreyev, Ruslan, Pei, Jimin, Kim, David, Kellogg, Elizabeth, DiMaio, Frank, Lange, Oliver, et al. 2009. Structure prediction for CASP8 with all-atom refinement using Rosetta. Proteins: Structure, Function, and Bioinformatics, 77(S9), 89–99.

Yang, Jianyi, Yan, Renxiang, Roy, Ambrish, Xu, Dong, Poisson, Jonathan, & Zhang, Yang. 2015. The I-TASSER Suite: protein structure and function prediction. Nature Methods, 12(1), 7–8.