Featured Researches

Other Quantitative Biology

CoHSI II; The average length of proteins, evolutionary pressure and eukaryotic fine structure

The CoHSI (Conservation of Hartley-Shannon Information) distribution is at the heart of a wide class of discrete systems, defining (amongst other properties) the length distribution of their components. Discrete systems such as the known proteome, computer software and texts are all known to fit this distribution accurately. In a previous paper, we explored the properties of this distribution in detail. Here we will use these properties to show why the average length of components in general and proteins in particular is highly conserved, howsoever measured, demonstrating this on various aggregations of proteins taken from the UniProt database. We will go on to define departures from this equilibrium state, identifying fine structure in the average length of eukaryotic proteins that results from evolutionary processes.


CoHSI III: Long proteins and implications for protein evolution

The length distribution of proteins measured in amino acids follows the CoHSI (Conservation of Hartley-Shannon Information) probability distribution. In previous papers we have verified various predictions of this using the UniProt database, but here we explore a novel predicted relationship between the longest proteins and evolutionary time. We demonstrate from both theory and experiment that the longest protein and the total number of proteins are intimately related by Information Theory and we give a simple formula for this. We stress that no evolutionary explanation is necessary; it is an intrinsic property of a CoHSI system. While the CoHSI distribution favors the appearance of proteins with fewer than 750 amino acids (characteristic of most functional proteins or their constituent domains), its intrinsic asymptotic power-law also favors the appearance of unusually long proteins; we predict that there are as yet undiscovered proteins longer than 45,000 amino acids. In so doing, we draw an analogy between the process of protein folding driven by favorable pathways (or funnels) through the energy landscape of protein conformations, and the preferential information pathways through which CoHSI exerts its constraints in discrete systems. Finally, we show that CoHSI predicts the recent appearance in evolutionary time of the longest proteins, specifically in eukaryotes because of their richer unique alphabet of amino acids, and by merging with independent phylogenetic data, we confirm a predicted consistent relationship between the longest proteins and documented and potential undocumented mass extinctions.


CoHSI IV: Unifying Horizontal and Vertical Gene Transfer - is Mechanism Irrelevant?

In previous papers we have described, with strong experimental support, the organising role that CoHSI (Conservation of Hartley-Shannon Information) plays in determining important global properties of all known proteins, from defining the length distribution, to the natural emergence of very long proteins and their relationship to evolutionary time. Here we consider the insight that CoHSI might bring to a different problem, the distribution of identical proteins across species. Horizontal and Vertical Gene Transfer (HGT/VGT) both lead to the replication of protein sequences across species through a diversity of mechanisms, some of which remain unknown. In contrast, CoHSI predicts from fundamental theory that such systems will demonstrate power law behavior independently of any mechanisms, and using the UniProt database we show that the global pattern of protein re-use is emphatically linear on a log-log plot (adj. $R^2 = 0.99$, $p < 2.2 \times 10^{-16}$ over 4 decades); i.e. it is extremely close to the predicted power law. Specifically, we show that over 6.9 million proteins in TrEMBL 18-02 are re-used, i.e. their sequence appears identically in between 2 and 9,812 species, with re-used proteins varying in length from 7 to as long as 14,596 amino acids. Using (DL+V) to denote the three domains of life plus viruses, 21,676 proteins are shared between two (DL+V), 22 between three (DL+V), and 5 are shared in all four (DL+V). Although the majority of protein re-use occurs between bacterial species, those proteins most frequently re-used occur disproportionately in viruses, which play a fundamental role in this distribution. These results suggest that diverse mechanisms of gene transfer (including traditional inheritance) are irrelevant in determining the global distribution of protein re-use.
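
The statement that re-use is "linear on a log-log plot" can be illustrated with a minimal sketch of an ordinary least-squares fit in log-log space. The function name `fit_power_law` and the data below are hypothetical and synthetic; they are not taken from TrEMBL or from the paper, whose actual fitting procedure is not described in the abstract.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log10(y) = intercept + slope * log10(x).

    Returns (slope, intercept, adjusted R^2) for a single predictor.
    """
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual and total sums of squares in log space
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(lx, ly))
    ss_tot = sum((b - my) ** 2 for b in ly)
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 with one predictor: 1 - (1 - R^2)(n - 1)/(n - 2)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    return slope, intercept, adj_r2

# Synthetic illustration only: an exact power law y = 5e6 * x^(-1.5)
# sampled over 4 decades of x.
xs = [1, 10, 100, 1000, 10000]
ys = [5.0e6 * x ** -1.5 for x in xs]
slope, intercept, adj_r2 = fit_power_law(xs, ys)
```

On real count data the fitted slope estimates the power-law exponent, and an adjusted R^2 close to 1 corresponds to the near-straight line the abstract reports.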


CoHSI V: Identical multiple scale-independent systems within genomes and computer software

A mechanism-free and symbol-agnostic conservation principle, the Conservation of Hartley-Shannon Information (CoHSI), is predicted to constrain the structure of discrete systems regardless of their origin or function. Despite their distinct provenance, genomes and computer software share a simple structural property; they are linear symbol-based discrete systems, and thus they present an opportunity to test in a comparative context the predictions of CoHSI. Here, without any consideration of, or relevance to, their role in specifying function, we identify that 10 representative genomes (from microbes to human) and a large collection of software contain identically structured nested subsystems. In the case of base sequences in genomes, CoHSI predicts that if we split the genome into n-tuples (a 2-tuple is a pair of consecutive bases; a 3-tuple is a trio and so on), without regard for whether or not a region is coding, then each collection of n-tuples will constitute a homogeneous discrete system and will obey a power-law in frequency of occurrence of the n-tuples. We consider 1-, 2-, 3-, 4-, 5-, 6-, 7- and 8-tuples of ten species and demonstrate that the predicted power-law behavior is emphatically present and, furthermore, as predicted, is insensitive to the start window for the tuple extraction, i.e. the reading frame is irrelevant. We go on to provide a proof of Chargaff's second parity rule and, on the basis of this proof, predict higher order tuple parity rules which we then identify in the genome data. CoHSI predicts precisely the same behavior in computer software. This prediction was tested and confirmed using 2-, 3- and 4-tuples of the hexadecimal representation of machine code in multiple computer programs, underlining the fundamental role played by CoHSI in defining the landscape in which discrete symbol-based systems must operate.
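
The n-tuple extraction described above can be sketched as follows. This is only one reading of the abstract: it assumes the tuples are consecutive and non-overlapping, that the "start window" is a simple offset into the sequence, and that a trailing partial tuple is dropped. The names `tuple_counts` and `rank_frequency` are hypothetical, and the short sequence is a toy string, not genome data.

```python
from collections import Counter

def tuple_counts(seq, n, offset=0):
    """Count consecutive, non-overlapping n-tuples of `seq`, starting at `offset`.

    Any trailing fragment shorter than n is ignored (an assumption).
    """
    return Counter(seq[i:i + n] for i in range(offset, len(seq) - n + 1, n))

def rank_frequency(counts):
    """Return (rank, count) pairs, most frequent tuple first.

    Plotting log(count) against log(rank) is one way to look for power-law behavior.
    """
    return [(rank, c) for rank, (_, c) in enumerate(counts.most_common(), start=1)]

# Toy illustration with two different reading frames
toy = "ACGTACGTAC"
frame0 = tuple_counts(toy, 2, offset=0)  # AC, GT, AC, GT, AC
frame1 = tuple_counts(toy, 2, offset=1)  # CG, TA, CG, TA
ranks0 = rank_frequency(frame0)
```

Comparing the rank-frequency lists obtained for different `offset` values on a real genome would be one way to probe the claimed insensitivity to reading frame.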


Coherent and Noncoherent Photonic Communications in Biological Systems

The possible mechanisms of communication between distant bio-systems by means of optical and UV photons are studied. It is argued that their main production mechanism is the biochemical reactions occurring during cell division. In the proposed model the bio-systems perform such communication by radiating photons in the form of short periodic bursts, which have been observed experimentally for fish and frog eggs. For experimentally measured photon rates, the communication algorithm is supposedly similar to the exchange of binary-encoded data in computer networks via optical channels.


Comment on Activation of Visual Pigments by Light and Heat (Science 332, 1307-312, 2011)

It is known that the Arrhenius equation, based on the Boltzmann distribution, can model only part (e.g. half of the activation energy) of the discrete retinal dark noise observed for vertebrate rod and cone pigments. Luo et al. (Science, 332, 1307-312, 2011) presented a new approach to explain this discrepancy by showing that applying the Hinshelwood distribution instead of the Boltzmann distribution in the Arrhenius equation solves the problem successfully. However, a careful reanalysis of the methodology and results shows that the approach of Luo et al. is questionable and that the results found do not solve the problem completely.


Comparative Salt Tolerance Study of Some Acacia Species at Seed Germination Stage

Objective: The purpose of this study was to assess and compare the seed germination response of six Acacia species under different NaCl concentrations in order to explore opportunities for selecting and breeding salt-tolerant genotypes. Methodology: Germination of seeds was evaluated under salt stress using 5 treatment levels: 0, 100, 200, 300 and 400 mM of NaCl. Corrected germination rate (GC), germination rate index (GRI) and mean germination time (MGT) were recorded over 10 days. Results: The results indicated that germination was significantly reduced in all species as NaCl concentration increased. However, significant interspecific variation in salt tolerance was observed. The greatest variability in tolerance was observed at moderate salt stress (200 mM of NaCl), and the decrease in germination appeared to be more accentuated in A. cyanophylla and A. cyclops. A. raddiana, however, remains the most interesting: it preserved the highest percentage (GC = 80%) and velocity of germination of all the species studied, even at high salt levels. This species exhibited a particular adaptability to saline environments, at least at this stage in the life cycle, and could be recommended for plantation establishment in salt-affected areas. On the other hand, when ungerminated seeds were transferred from NaCl treatments to distilled water, they largely recovered their germination without a lag period and at high speed. This indicated that the germination inhibition was related to a reversible osmotic stress that induced dormancy rather than to specific ion toxicity. Conclusion: This ability to germinate after exposure to higher concentrations of NaCl suggests that the studied species, especially the most tolerant, could be able to germinate in salt-affected soils and could be utilized for the rehabilitation of damaged arid zones.


Comparative study of the biological activities of the aqueous extracts of two spontaneous plants harvested in the Algerian Sahara

The present study investigates the insecticidal and herbicidal effects of leaf extracts from two plants harvested in the Northern Algerian Sahara: Cleome arabica (Capparaceae) and Pergularia tomentosa (Asclepiadaceae). The efficacy of the plant extracts, obtained by the reflux extraction method, was evaluated. Phytochemical screening shows that the aqueous extract of C. arabica is remarkably rich in active principles in comparison with the extract of P. tomentosa, including flavonoids, saponosides, glycosides, terpenes, sterols, deoxyose, polyphenols and total alkaloids. Adults (imagos) of Tribolium confusum treated with aqueous extracts of C. arabica and P. tomentosa at doses of 80% to 100% show mortality rates of 73.33% to 96.67% and 36.67% to 86.67%, respectively. The lethal time 50 (TL50) for T. confusum adults was estimated at about 6.41 days for the aqueous extract of C. arabica and 6.94 days for the extract of P. tomentosa. The extracts of P. tomentosa are less toxic than the extracts of C. arabica. The allelopathic potentials of C. arabica and P. tomentosa, tested on the germination of seeds of a weed, Dactyloctenium aegyptium (Poaceae), and two cultivated species, Hordeum vulgare and Triticum durum (Poaceae), show that the inhibitory effect of the extracts of C. arabica is very highly significant. It manifests itself in the growth of the aerial and underground parts of H. vulgare and T. durum. The inhibition rate is more than 84.44% for D. aegyptium seeds treated with the different concentrations. The inhibition rates range from 75.56% to 91.11% for T. durum wheat irrigated at 80% to 100%, but are only 55.56% to 77.78% for barley seeds treated with the same concentrations (80% to 100%).


Comparing Machine Learning Algorithms with or without Feature Extraction for DNA Classification

The classification of DNA sequences is a key research area in bioinformatics as it enables researchers to conduct genomic analysis and detect possible diseases. In this paper, three state-of-the-art algorithms, namely Convolutional Neural Networks, Deep Neural Networks, and N-gram Probabilistic Models, are used for the task of DNA classification. Furthermore, we introduce a novel feature extraction method based on the Levenshtein distance and randomly generated DNA sub-sequences to compute information-rich features from the DNA sequences. We also use an existing feature extraction method based on 3-grams to represent amino acids and combine both feature extraction methods with a multitude of machine learning algorithms. Four different data sets, each concerning viral diseases such as Covid-19, AIDS, Influenza, and Hepatitis C, are used for evaluating the different approaches. The results of the experiments show that all methods obtain high accuracies on the different DNA datasets. Furthermore, the domain-specific 3-gram feature extraction method leads in general to the best results in the experiments, while the newly proposed technique outperforms all other methods on the smallest Covid-19 dataset.
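
A feature extractor built from the Levenshtein distance and randomly generated DNA sub-sequences might look like the sketch below. The abstract does not give the details of the authors' method, so this is only one plausible reading: each feature is the edit distance from the whole input sequence to one random probe. The names `levenshtein`, `random_probes` and `distance_features`, and the probe count and length, are all hypothetical.

```python
import random

def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def random_probes(k, length, seed=0):
    """Generate k random DNA strings of the given length (seeded for repeatability)."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(k)]

def distance_features(seq, probes):
    """Map a DNA sequence to a fixed-length vector of edit distances to the probes."""
    return [levenshtein(seq, p) for p in probes]

# Hypothetical usage: 8 probes of length 6 give an 8-dimensional feature vector.
probes = random_probes(k=8, length=6)
features = distance_features("ACGTTGCAAC", probes)
```

The same probes must be reused for every sequence so that the feature vectors are comparable; they can then be passed to any standard classifier.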


Comparison of SVM and Spectral Embedding in Promoter Biobricks' Categorizing and Clustering

Background: In organisms' genomes, promoters are short DNA sequences upstream of structural genes, with the function of controlling the genes' transcription. Promoters can be roughly divided into two classes: constitutive promoters and inducible promoters. Promoters with clear functional annotations are practical synthetic biology biobricks. Many statistical and machine learning methods have been introduced to predict the functions of candidate promoters. Spectral eigenmap has been shown to be an effective clustering method for classifying biobricks, while the support vector machine (SVM) is a powerful machine learning algorithm, especially when the dataset is small. Methods: The two algorithms, spectral embedding and SVM, are applied to the same dataset of 375 prokaryotic promoters. For spectral embedding, a Laplacian matrix is built from edit distances, followed by K-Means clustering. The sequences are represented by numeric vectors to serve as the dataset for SVM training. Results: SVM achieved a high prediction accuracy of 93.07% in 10-fold cross-validation for classification of promoters' transcriptional functions. Laplacian eigenmap (spectral embedding) based on edit distance may not be capable of extracting discriminative features for this task.
