Mansour Ebrahimi
University of Qom
Publication
Featured research published by Mansour Ebrahimi.
Saline Systems | 2011
Esmaeil Ebrahimie; Mansour Ebrahimi; Narjes Rahpayma Sarvestani; Mahdi Ebrahimi
Halophilic proteins can tolerate high salt concentrations. Understanding the features that underlie halophilicity is the first step toward engineering halostable crops. To this end, we examined protein features contributing to the halo-tolerance of halophilic organisms. We compared more than 850 features of halophilic and non-halophilic proteins using various screening, clustering, decision tree, and generalized rule induction models to search for patterns that code for halo-tolerance. Up to 251 protein attributes were selected by various attribute weighting algorithms as important contributors to halo-stability; of these, 14 attributes were selected by 90% of the models, and the count of hydrogen received the highest weight (1.0) in 70% of the attribute weighting models, showing the importance of this attribute in feature selection modeling. Most of the other attributes were dipeptide frequencies. The number of groups did not change when K-Means and TwoStep clustering were performed on datasets with or without feature selection filtering. Although the induced trees were not deep, their accuracies were higher than 94%, and the frequency of hydrophobic residues emerged as the most important feature for building the trees. The performance evaluations of the decision tree models had the same values, and the best percentage of correctness was recorded with the Exhaustive CHAID and CHAID models. We did not find any significant difference in the percentage of correctness, performance evaluation, or mean correctness of the various decision tree models with or without feature selection. For the first time, we analyzed the performance of different screening, clustering, and decision tree algorithms for discriminating halophilic and non-halophilic proteins, and the results showed that amino acid composition can be used to discriminate between halo-tolerant and halo-sensitive proteins.
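As an illustration of the sequence-derived attributes mentioned in the abstract above (dipeptide frequencies and the frequency of hydrophobic residues), the following is a minimal sketch of how such features might be computed from a protein sequence. It is not the authors' pipeline: the function names and the set of residues treated as hydrophobic are assumptions made for this example.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: one common choice of hydrophobic residues; the abstract does not list them.
HYDROPHOBIC = set("AILMFVW")


def dipeptide_frequencies(sequence):
    """Return the relative frequency of each of the 400 possible dipeptides."""
    seq = sequence.upper()
    # Overlapping residue pairs: positions (0,1), (1,2), ...
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Skip pairs that contain non-standard residue codes (e.g. X, B, Z)
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    counts = Counter(valid)
    total = len(valid)
    return {a + b: (counts[a + b] / total if total else 0.0)
            for a, b in product(AMINO_ACIDS, repeat=2)}


def hydrophobic_fraction(sequence):
    """Return the fraction of standard residues that fall in HYDROPHOBIC."""
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        return 0.0
    return sum(r in HYDROPHOBIC for r in residues) / len(residues)
```

Each protein then becomes a fixed-length numeric vector (400 dipeptide frequencies plus any composition features), which is the kind of input that attribute weighting, clustering, and decision tree models can consume.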
Journal of Theoretical Biology | 2014
Mohammad Reza Bakhtiarizadeh; Mohammad Moradi-Shahrbabak; Mansour Ebrahimi; Esmaeil Ebrahimie
Due to the central roles of lipid binding proteins (LBPs) in many biological processes, sequence-based identification of LBPs is of great interest. The major challenge is that LBPs are diverse in sequence, structure, and function, which results in low accuracy of sequence-homology-based methods. Therefore, there is a need to develop alternative functional prediction methods that do not depend on sequence similarity. To distinguish LBPs from non-LBPs, the performances of a support vector machine (SVM) and a neural network were compared in this study. Comprehensive protein features and various techniques were employed to create datasets. Five-fold cross-validation (CV) and independent evaluation (IE) tests were used to assess the validity of the two methods. The results indicated that the SVM outperforms the neural network. The SVM achieved 89.28% (CV) and 89.55% (IE) overall accuracy in distinguishing LBPs from non-LBPs, and 92.06% (CV) and 92.90% (IE) on average for classification of different LBP classes. Increasing the number and the range of extracted protein features, as well as optimizing the SVM parameters, significantly increased the efficiency of LBP class prediction in comparison to the only previous report in this field. Altogether, the results showed that the SVM algorithm can be run on broad, computationally calculated protein features and offers a promising tool for detecting LBP classes. The proposed approach has the potential to integrate with and improve the common sequence-alignment-based methods.
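The five-fold cross-validation and overall accuracy reported in the abstract above can be sketched as follows. This is only a hedged illustration of the evaluation scheme, not the authors' code: `k_fold_indices` and `accuracy` are hypothetical helper names, the shuffling seed is arbitrary, and the classifier itself (SVM or neural network) is left out.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Shuffle once so that folds do not follow the original sample order
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def accuracy(y_true, y_pred):
    """Overall accuracy: the fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)
```

In such a scheme, a model would be trained on each `train` split and scored on the matching `test` split, and the five accuracies averaged; an independent evaluation set is kept entirely outside this loop.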
Environmental Monitoring and Assessment | 2010
Mansour Ebrahimi; Mahnaz Taherianfard
Concentration of heavy metals in aquatic animals mainly occurs due to industrial contamination. In this study, the concentrations of four heavy metals (cadmium, lead, mercury, and arsenic) in organs of two cyprinid fish and in water collected from three sections of the Kor River, Iran were determined using the inductively coupled plasma method. Pathological and hormonal changes due to metal contamination were also measured. The concentrations of heavy metals in tissue of fish from the middle sampling zone were significantly higher (p < 0.05) than those from the other two sampling zones, whereas no significant differences (p > 0.05) were detected between the two sexes and species. High levels of metals were found in the ovaries and testes; estradiol in females and progesterone and testosterone in males from the middle study site were significantly (p < 0.05) lower than values from the other two sites. Pathological changes in blood cells, liver, and kidneys of fishes were significantly higher in highly polluted areas (middle sampling zone). These results show that industrial activities have polluted the river and that the maximum concentrations of Cd, Pb, and Hg were higher than the permissible levels for human consumption.
PLOS ONE | 2011
Mansour Ebrahimi; Amir Lakizadeh; Parisa Agha-Golzadeh; Esmaeil Ebrahimie; Mahdi Ebrahimi
The engineering of thermostable enzymes is receiving increased attention. The paper, detergent, and biofuel industries, in particular, seek to use environmentally friendly enzymes instead of toxic chlorine chemicals. Enzymes typically function at temperatures below 60°C and denature if exposed to higher temperatures. In contrast, a small portion of enzymes can withstand higher temperatures as a result of various structural adaptations. Understanding the protein attributes that are involved in this adaptation is the first step toward engineering thermostable enzymes. We employed various supervised and unsupervised machine learning algorithms as well as attribute weighting approaches to find amino acid composition attributes that contribute to enzyme thermostability. Specifically, we compared two groups of enzymes: mesostable and thermostable enzymes. Furthermore, a combination of attribute weighting with supervised and unsupervised clustering algorithms was used for prediction and modelling of protein thermostability from amino acid composition properties. Mining a large number of protein sequences (2090) through a variety of machine learning algorithms, which were based on the analysis of more than 800 amino acid attributes, increased the accuracy of this study. Moreover, these models were successful in predicting thermostability from the primary structure of proteins. The results showed that expectation maximization clustering in combination with uncertainty and correlation attribute weighting algorithms can effectively (100%) classify thermostable and mesostable proteins. Seventy per cent of the weighting methods selected Gln content and frequency of hydrophilic residues as the most important protein attributes. On the dipeptide level, the frequency of Asn-Glu was the key factor in distinguishing mesostable from thermostable enzymes.
This study demonstrates the feasibility of predicting thermostability irrespective of sequence similarity and will serve as a basis for engineering thermostable enzymes in the laboratory.
Bioinformatics and Biology Insights | 2011
E. Ashrafi; Abbas Alemzadeh; Mansour Ebrahimi; Esmaeil Ebrahimie; N. Dadkhodaei
Phytoremediation refers to the use of plants for extraction and detoxification of pollutants, providing a new and powerful weapon against a polluted environment. In some plants, such as Thlaspi spp., heavy metal ATPases are involved in overall metal ion homeostasis and hyperaccumulation. P1B-ATPases pump a wide range of cations, especially heavy metals, across membranes against their electrochemical gradients. Determination of the protein characteristics of P1B-ATPases in hyperaccumulator plants provides a new opportunity for engineering of phytoremediating plants. In this study, using diverse weighting and modeling approaches, 2644 protein characteristics of primary, secondary, and tertiary structures of P1B-ATPases in hyperaccumulator and nonhyperaccumulator plants were extracted and compared to identify differences between proteins in hyperaccumulator and nonhyperaccumulator pumps. Although the importance of the protein characteristics varied across the weighting, tree, and rule induction models, glycine count, frequency of glutamine-valine, and valine-phenylalanine count were the most important attributes, highlighted by 10, five, and four models, respectively. In addition, a precise model was built to discriminate P1B-ATPases in different organisms based on their structural protein features. Moreover, reliable models for prediction of the hyperaccumulating activity of unknown P1B-ATPase pumps were developed. Uncovering important structural features of hyperaccumulator pumps in this study has provided the knowledge required for future modification and engineering of these pumps by techniques such as site-directed mutagenesis.
SpringerPlus | 2012
Tahereh Deihimi; Ali Niazi; Mansour Ebrahimi; Kimia Kajbaf; Somaye Fanaee; Mohammad Reza Bakhtiarizadeh; Esmaeil Ebrahimie
Because a single gene can have multiple functions, finding the alternative roles of genes is a major challenge. The huge amount of available expression data and the central role of the promoter and its regulatory elements provide a unique opportunity to address this issue. The question is how expression data and promoter analysis can be applied to uncover the different functions of a gene. A computational approach is presented here based on analysis of promoter regulatory elements and coexpressed genes, as well as protein domain and PROSITE analysis. We applied our approach to thaumatin-like protein (TLP) as an example. TLPs belong to group 5 of the pathogenesis-related proteins, whose antifungal role has been demonstrated previously. In contrast, osmotin-like proteins (OLPs) are the basic form of TLPs, with a proven role only in abiotic stresses. We identified possible outstanding homologues involved in both biotic and abiotic stresses by analyzing 300 coexpressed genes for each Arabidopsis TLP and OLP in biotic, abiotic, hormone, and light microarray experiments based on mutual ranking. In addition, promoter analysis was employed to detect transcription factor binding sites (TFBs) and their differences between OLPs and TLPs. A specific combination of five TFBs was found in all TLPs, representing the key structure in the functional response of TLPs to fungal stress. Interestingly, we found the fungal response TFBs in some salt-responsive OLPs, indicating a possible role of OLPs in biotic stresses. Thirteen TFBs were unique to all OLPs and some were found in TLPs, suggesting a possible role of these TLPs in abiotic stresses. Multivariate analysis showed the possibility of estimating models for distinguishing biotic and abiotic functions of TLPs based on promoter regulatory elements. This is the first report identifying multiple roles of TLPs and OLPs in biotic and abiotic stresses.
This study provides valuable clues for screening and discovering new genes with possible roles in tolerance against both biotic and abiotic stresses. Interestingly, principal component analysis showed that promoter regulatory elements of TLPs and OLPs are more variable than protein properties, reinforcing the prominent role of promoter architecture in determining changes in gene function.
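The abstract above ranks coexpressed genes by mutual ranking but does not give a formula. The sketch below assumes the commonly used definition of mutual rank, the geometric mean of the rank of gene B in gene A's coexpression list and the rank of gene A in gene B's list; whether the authors used exactly this definition is an assumption. The gene labels and correlation values are toy data invented for illustration.

```python
import math


def rank_of(target, gene, corr):
    """Rank (1 = most strongly correlated) of `target` among the partners of `gene`."""
    partners = sorted((g for g in corr[gene] if g != gene),
                      key=lambda g: corr[gene][g], reverse=True)
    return partners.index(target) + 1


def mutual_rank(a, b, corr):
    """Geometric mean of the two directional coexpression ranks (lower = stronger)."""
    return math.sqrt(rank_of(b, a, corr) * rank_of(a, b, corr))


# Toy, symmetric correlation table between four hypothetical genes
corr = {
    "A": {"B": 0.90, "C": 0.50, "D": 0.10},
    "B": {"A": 0.90, "C": 0.95, "D": 0.20},
    "C": {"A": 0.50, "B": 0.95, "D": 0.30},
    "D": {"A": 0.10, "B": 0.20, "C": 0.30},
}
```

Here B is A's top partner, but A is only B's second partner, so `mutual_rank("A", "B", corr)` is the square root of 2, whereas B and C are each other's top partner and get a mutual rank of 1. Sorting candidate genes by this value and keeping the lowest-ranked ones is one way to pick a fixed number of coexpressed genes per query gene.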
PLOS ONE | 2014
Avat Shekoofa; Y. Emam; Navid Shekoufa; Mansour Ebrahimi; Esmaeil Ebrahimie
Prediction is an attempt to accurately forecast the outcome of a specific situation using input information obtained from a set of variables that potentially describe the situation. Prediction models can be used to project physiological and agronomic processes, bearing in mind that agronomic traits such as yield can be affected by a large number of variables. In this study, we analyzed a large number of physiological and agronomic traits with screening, clustering, and decision tree models to select the most relevant factors for increasing maize grain yield. Decision tree models (with nearly the same performance evaluation) were the most useful tools for understanding the underlying relationships among physiological and agronomic features and for selecting the most important and relevant traits (sowing date-location, kernel number per ear, maximum water content, kernel weight, and season duration) corresponding to maize grain yield. In particular, the decision tree generated by the C&RT algorithm was the best model for yield prediction based on physiological and agronomic traits, and it can be extensively employed in future breeding programs. No significant differences in the decision tree models were found when feature selection filtering was applied to the data, but a positive effect of feature selection was observed in clustering models. Finally, the results showed that the proposed modeling techniques are useful tools for crop physiologists to search through large datasets for patterns in physiological and agronomic factors, and may assist the selection of the most important traits for an individual site and field. In particular, decision tree models are the method of choice, as they can illustrate different pathways of yield increase in breeding programs through their hierarchical structure of feature ranking as well as pattern discovery via various combinations of features.
PLOS ONE | 2014
Arghavan Alisoltani; Hossein Fallahi; Mahdi Ebrahimi; Mansour Ebrahimi; Esmaeil Ebrahimie
A novel integrative pipeline is presented for discovery of potential cancer-susceptibility regions (PCSRs) by calculating the number of altered genes at each chromosomal region, using expression microarray datasets of different human cancers (HCs). Our approach consists of first predicting PCSRs and then identifying key genes in these regions to obtain potential regions harboring new cancer-associated variants. In addition to finding new cancer causal variants, another advantage of predicting such risk regions is that different types of genomic variants can be studied simultaneously by focusing on specific chromosomal regions. Using this pipeline, we extracted a number of regions with highly altered expression levels in cancer. Regulatory networks were also constructed for different types of cancers following the identification of altered mRNAs and microRNAs. Interestingly, the results showed that GAPDH, LIFR, ZEB2, mir-21, mir-30a, mir-141 and mir-200c, all located at PCSRs, are common altered factors in the constructed networks. We found a number of clusters of altered mRNAs and miRNAs on predicted PCSRs (e.g., 12p13.31) and their common regulators, including KLF4 and SOX10. Large-scale prediction of risk regions based on transcriptome data can open a window onto the comprehensive study of risk factors for cancer and other human diseases.
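The first step described in the abstract above, counting the number of altered genes at each chromosomal region, can be sketched as a simple tally. This is a hedged illustration only: the gene names, region labels, and the `min_altered` cutoff are hypothetical placeholders, and the abstract does not state how regions were actually scored or thresholded.

```python
from collections import Counter


def altered_genes_per_region(gene_to_region, altered_genes):
    """Count how many altered genes map to each chromosomal region."""
    return Counter(gene_to_region[g] for g in altered_genes if g in gene_to_region)


def candidate_regions(counts, min_altered):
    """Return regions with at least `min_altered` altered genes (hypothetical cutoff)."""
    return sorted(region for region, n in counts.items() if n >= min_altered)


# Hypothetical annotation: gene label -> chromosomal region label
gene_to_region = {
    "gene_a": "region_1",
    "gene_b": "region_1",
    "gene_c": "region_2",
    "gene_d": "region_3",
}
# Hypothetical list of genes called as altered; gene_x has no annotation and is skipped
altered = ["gene_a", "gene_b", "gene_c", "gene_x"]
counts = altered_genes_per_region(gene_to_region, altered)
```

With this toy input, `region_1` collects two altered genes and `region_2` one, so `candidate_regions(counts, 2)` keeps only `region_1`. In practice the tally would be repeated per cancer dataset before comparing regions across cancers.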
Archive | 2010
Esmaeil Ebrahimie; Mansour Ebrahimi
Finding or making thermostable enzymes has been identified as an important goal in a number of different industries. Therefore, understanding the features involved in enzyme thermostability is crucial, and different approaches have been used to extract or manufacture thermostable enzymes. Herein we examined features that contribute to the thermostability of 2,946 proteins. We used various screening techniques (anomaly detection, feature selection), clustering methods (K-Means, TwoStep cluster), decision tree models (Classification and Regression Tree, CHAID, Exhaustive CHAID, QUEST, C5.0), and generalized rule induction (association) (GRI) models to search for patterns of thermostability and to find features that contribute to enzyme thermal stability. We found that Arg as the N-terminal amino acid occurred solely in proteins working at temperatures higher than 70 °C. Fifty-four protein features were shown to be important in feature selection modeling, and the number of peer groups with an anomaly index of 2.12 declined from 7 to 2 after being run using only important selected features; however, no changes were found in the numbers of groups when K-Means and TwoStep clustering modeling was performed on datasets with/without feature selection filtering. The depth of the trees generated by various decision tree models varied from 14 (in the C5.0 model with 10-fold cross-validation and with feature selection of the dataset) to 4 (in CHAID models) branches. The performance evaluation of the decision tree models tested here showed that C5.0 was the best and the QUEST model was the worst. We did not find any significant difference in the percent of correctness, performance evaluation, and mean correctness of various decision tree models when feature selected datasets were used, but the number of peer groups in clustering models was reduced significantly (p<0.05) compared to datasets without feature selection.
In all decision tree models, the frequency of Gln was the most important feature for decision tree rule sets; moreover, in all GRI association rules (100 rules), the frequency of Gln was used in the antecedent to support the rules. The importance of Gln in protein thermostability is discussed in this paper.
PLOS ONE | 2012
Faezeh Hosseinzadeh; Mansour Ebrahimi; Bahram Goliaei; Narges Shamabadi
Rapid distinction between small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) tumors is very important in the diagnosis of this disease. Furthermore, sequence-derived structural and physicochemical descriptors are very useful for machine learning prediction of protein structural and functional classes, for classifying proteins, and for prediction performance. In this study, the classification of lung tumors based on 1497 attributes derived from structural and physicochemical properties of protein sequences (based on genes defined by microarray analysis) was investigated through a combination of attribute weighting and supervised and unsupervised clustering algorithms. Eighty percent of the weighting methods selected features such as autocorrelation, dipeptide composition, and distribution of hydrophobicity as the most important protein attributes in the classification of the SCLC, NSCLC, and COMMON classes of lung tumors. The same results were observed with most tree induction algorithms. Descriptors of hydrophobicity distribution were high in protein sequences COMMON to both groups, and the distribution of charge in these proteins was very low, showing that COMMON proteins were very hydrophobic. Furthermore, the composition of polar dipeptides was higher in SCLC proteins than in NSCLC proteins. Some clustering models (alone or in combination with attribute weighting algorithms) were nearly able to classify SCLC and NSCLC proteins. The Random Forest tree induction algorithm, evaluated with leave-one-out and 10-fold cross-validation, showed more than 86% accuracy in clustering and predicting the three different classes of lung tumors. Here, for the first time, the application of data mining tools to effectively classify three classes of lung cancer tumors, highlighting the importance of dipeptide composition, autocorrelation, and distribution descriptors, is reported.