As biology and biotechnology advance, scientists increasingly recognize the importance of protein structure in revealing function and evolutionary history. Sequence similarity has traditionally been the primary method for inferring protein homology, but structural similarity has proven a more reliable indicator for distantly related proteins, especially when classifying protein superfamilies.
A protein superfamily is the largest grouping of proteins for which common ancestry can be inferred. This relationship is often inferred through structural alignment, even when there is no obvious sequence similarity.
The identification of protein superfamilies relies on a variety of methods. Structural similarity can identify many evolutionarily related members even when their sequences have diverged beyond recognition, and the functions and catalytic mechanisms of these proteins may still be conserved.
Structure is more evolutionarily conserved than sequence. As a result, proteins that have diverged over a long evolutionary period may retain highly similar structures while having almost entirely different amino acid sequences. This matters because many biological studies rely on sequence analysis to infer the function and origin of proteins, and such analyses can miss these distant relationships.
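The contrast between sequence and structure can be illustrated with a minimal sketch. It assumes two proteins whose residues have already been aligned one-to-one, and it uses toy sequences and coordinates invented for illustration, not real proteins. The function names `percent_identity` and `distance_rmsd` are hypothetical. The structural measure here is the distance RMSD (dRMSD), which compares intra-chain distances and so needs no superposition; real structural alignment tools use more elaborate methods.

```python
from itertools import combinations
from math import dist, sqrt


def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions holding the same residue.

    Assumes the two sequences are already aligned to equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(a == b and a != "-" for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def distance_rmsd(coords_a, coords_b):
    """Root-mean-square difference between corresponding intra-chain distances.

    Invariant to rotation and translation, so no superposition is needed.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two equally long chains of at least 2 residues")
    pairs = list(combinations(range(len(coords_a)), 2))
    squared = sum(
        (dist(coords_a[i], coords_a[j]) - dist(coords_b[i], coords_b[j])) ** 2
        for i, j in pairs
    )
    return sqrt(squared / len(pairs))


# Toy data (hypothetical): the sequences share no residues, while the second
# chain is the first chain rotated by 90 degrees and shifted in space.
seq_a, seq_b = "ACDE", "KLMN"
coords_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (7.6, 3.8, 0.0)]
coords_b = [(-y + 10.0, x + 5.0, z + 2.0) for x, y, z in coords_a]

print(f"sequence identity: {percent_identity(seq_a, seq_b):.1f}%")
print(f"distance RMSD: {distance_rmsd(coords_a, coords_b):.3f}")
```

In this toy case the sequences have no residues in common, yet the two chains have the same shape, which is the situation the text describes for distantly related superfamily members.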
Secondary structural elements and tertiary structural features of proteins are often highly conserved, and many catalytic mechanisms are also conserved across protein superfamilies, even though substrate specificity may vary widely.
It is estimated that about 66% to 80% of eukaryotic proteins have multiple domains, and these domains are often combined in different ways to form the so-called "domain architecture." As a result, while simple sequence comparisons may not reveal how such proteins are related, structural comparisons can reveal the implicit connections between them.
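One simple way to picture a domain architecture is as an ordered list of domain labels. The sketch below assumes each protein has already been annotated with its domains. The protein names, domain labels, and helper functions (`shared_domains`, `architecture_overlap`) are all hypothetical and do not come from any real database.

```python
def shared_domains(arch_a, arch_b):
    """Domains that occur in both architectures, ignoring order and copy number."""
    return sorted(set(arch_a) & set(arch_b))


def architecture_overlap(arch_a, arch_b):
    """Jaccard index of the two domain sets (0.0 = nothing shared, 1.0 = same set)."""
    a, b = set(arch_a), set(arch_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical annotations: each tuple lists domains from N- to C-terminus.
architectures = {
    "protein_1": ("domA", "domB", "domC"),
    "protein_2": ("domC", "domD"),
}

p1 = architectures["protein_1"]
p2 = architectures["protein_2"]
print(shared_domains(p1, p2))
print(architecture_overlap(p1, p2))
```

Two proteins that look unrelated over their full length can still share a single domain, and comparing them domain by domain makes that link visible.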
Protein superfamilies represent the current limit of our ability to identify common ancestry. They are the largest evolutionary grouping that can be identified from direct evidence. Some superfamilies have members in all kingdoms of life, suggesting that their ancestral protein was already present in the last universal common ancestor (LUCA) of all life.
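Superfamily members may occur in different species, each descending from a protein in the ancestral species (orthology), or within the same species, where they arose from gene duplication (paralogy). Structural comparison makes it more feasible to trace the origin of such genes even after their sequences have diverged.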
Of course, structurally similar proteins do not always show obvious sequence similarity. Rather than weakening the case for structural comparison, this makes protein structure an analytical tool that sits one level above sequence.
As protein structure research techniques advance, the scientific community is increasingly recognizing the importance of structure in explaining protein function, evolution, and interaction, which provides important supplementary information for genomics. Going forward, when studying the enormous variety of proteins in the world, should we focus more on their structure than on their sequence?