In living organisms, proteins are not only basic building blocks of life but also the catalysts of many biochemical reactions. As science and technology have advanced, researchers have gained a deeper understanding of how proteins evolve and have begun to explain their distinctive catalytic capabilities. How do these capabilities relate to protein structure?
A protein superfamily is the largest grouping of proteins for which a common ancestor can be inferred. The inference rests not only on sequence similarity but also on similarity of structure and mechanism.
Protein superfamilies are usually identified using a combination of methods. The most common is to infer homology from sequence similarity. Although sequence similarity is considered a good indicator of relatedness, it is not the only one.
Sequence similarity is one of the oldest and most commonly used methods. Amino acid sequences are generally more conserved than DNA sequences, and conserved sequence regions are in many cases related to function, especially at catalytic and binding sites.
While sequence similarity can provide clues about homology, proteins that diverged long ago may no longer show any detectable sequence similarity.
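To make "sequence similarity" concrete, the following is a minimal sketch of one simple measure, percent identity, computed over two sequences that are assumed to be already aligned (gaps written as "-"). The function name and the toy sequences are hypothetical and chosen only for illustration; real homology searches use substitution matrices and statistical scoring rather than raw identity.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gap columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy, made-up aligned sequences (not real proteins).
seq_a = "MKT-AYIAKQ"
seq_b = "MKTLAYVAKQ"
print(f"{percent_identity(seq_a, seq_b):.1f}% identity")  # 8 of 9 compared columns match
```

As proteins diverge, this number drops toward the level expected for unrelated sequences, which is why identity alone cannot establish distant homology.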
Protein structures are more strongly conserved during evolution than sequences. Even if the amino acid sequence changes significantly, the secondary structure elements and tertiary structural motifs of the protein may still be retained. Using structural alignment programs, scientists can find proteins with similar folds even when their sequences differ markedly.
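One way to compare structures without first superimposing them is to compare their internal distance matrices, which is the general idea behind distance-matrix alignment methods. The sketch below is a deliberately simplified illustration, not any program's actual scoring: it assumes the two structures have already been matched residue by residue and are given as equal-length lists of hypothetical C-alpha coordinates in ångströms. The function names are assumptions made for this example.

```python
import math


def distance_matrix(coords):
    """All pairwise Euclidean distances between 3D points."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def mean_distance_difference(coords_a, coords_b):
    """Mean absolute difference between corresponding internal distances.

    Internal distances do not change when a structure is rotated or
    translated, so a value near zero means the two shapes match.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of residues")
    n = len(coords_a)
    pairs = n * (n - 1) // 2
    if pairs == 0:
        return 0.0
    da = distance_matrix(coords_a)
    db = distance_matrix(coords_b)
    total = sum(abs(da[i][j] - db[i][j]) for i in range(n) for j in range(i + 1, n))
    return total / pairs


# Toy coordinates (made up): b is a rotated and shifted copy of a.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 3.8)]
b = [(-y + 10.0, x + 5.0, z - 2.0) for x, y, z in a]
print(mean_distance_difference(a, b))  # close to zero: same shape
```

Because the measure ignores which amino acids occupy each position, two proteins with very different sequences can still score as structurally similar.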
Within a superfamily, the catalytic mechanism of enzymes is generally conserved. Although substrate specificity may differ significantly, the catalytic residues often show a similar structural arrangement and occur in the same order in the sequence.
For example, although the catalytic triad residues in the PA clan of proteases have diverged during evolution, the members still share a similar catalytic mechanism.
Protein superfamilies represent the current limit of our ability to identify common ancestry. Many superfamilies have members in all kingdoms of life, indicating that their common ancestor was already present in the last universal common ancestor of all life (LUCA).
Most proteins have multiple domains: according to research, about 66-80% of eukaryotic proteins and 40-60% of prokaryotic proteins are multi-domain. The domains often combine in a conserved N-terminal to C-terminal order. This implies that only a relatively limited number of domain combinations have arisen naturally during evolution, yet these combinations can perform many functions.
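The notion of a domain combination with a fixed N-terminal to C-terminal order can be sketched as an ordered tuple of domain names. The example below is a minimal illustration under stated assumptions: the protein names (P1 to P3), domain labels (A, B) and coordinates are invented, and the input format is hypothetical rather than that of any particular database.

```python
from collections import defaultdict


def group_by_architecture(proteins):
    """Group proteins by their domain order from N-terminus to C-terminus.

    `proteins` maps a protein name to a list of (domain_name, start, end)
    tuples, with positions counted along the sequence.
    """
    groups = defaultdict(list)
    for name, domains in proteins.items():
        ordered = sorted(domains, key=lambda d: d[1])  # sort by start position
        architecture = tuple(d[0] for d in ordered)
        groups[architecture].append(name)
    return dict(groups)


# Hypothetical annotations: P1 and P2 share the order A-B, P3 has B-A.
toy = {
    "P1": [("B", 120, 200), ("A", 5, 100)],
    "P2": [("A", 10, 90), ("B", 95, 180)],
    "P3": [("B", 1, 80), ("A", 90, 170)],
}
for architecture, members in group_by_architecture(toy).items():
    print("-".join(architecture), members)
```

Treating the order as part of the key means A-B and B-A count as different architectures, which matches the observation that domain order tends to be conserved.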
For example, members of the α/β hydrolase superfamily share an α/β sheet and have their catalytic triad residues in the same order in the sequence, yet they carry out a variety of different catalytic reactions.
There are many striking examples among the different superfamilies. Members of the immunoglobulin superfamily share a sandwich-like structure and are involved in important recognition and adhesion processes. Likewise, members of the Ras superfamily share a common catalytic G domain, suggesting that they have similar biological functions.
To support research on protein superfamilies, the scientific community has established several databases, such as Pfam and PROSITE, which help researchers better understand the structure and function of proteins. In addition, structural alignment algorithms such as DALI are used to search for structurally homologous proteins.
Ultimately, the diversity of proteins and the evolution of their catalytic capabilities allow organisms to meet different environmental challenges. As our understanding of protein superfamilies deepens, will we discover new catalytic mechanisms and functions?