Bahrad A. Sokhansanj
Drexel University
Publication
Featured research published by Bahrad A. Sokhansanj.
Advances in Bioinformatics | 2008
Gail Rosen; Elaine Garbarine; Diamantino Caseiro; Robi Polikar; Bahrad A. Sokhansanj
A vast amount of microbial sequencing data is being generated through large-scale projects in ecology, agriculture, and human health. Efficient high-throughput methods are needed to analyze the massive amounts of metagenomic data, which comprise all DNA present in an environmental sample. A major obstacle in metagenomics is the inability to classify fragments accurately using technology that yields short reads. We construct the unique N-mer frequency profiles of 635 microbial genomes publicly available as of February 2008. These profiles are used to train a naive Bayes classifier (NBC) that can be used to identify the genome of any fragment. We show that our method is comparable to BLAST for small 25 bp fragments but does not have the ambiguity of BLAST's tied top scores. We demonstrate that this approach is scalable to identify any fragment from hundreds of genomes. It also performs quite well at the strain, species, and genus levels and achieves strain resolution despite classifying ubiquitous genomic fragments (gene and nongene regions). Cross-validation analysis demonstrates that species-level accuracy reaches 90% for highly represented species containing an average of 8 strains. We demonstrate that such a tool can be used on the Sargasso Sea dataset, and our analysis shows that NBC can be further enhanced.
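The N-mer naive Bayes idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the add-one smoothing, the uniform prior over genomes, and the toy sequences are all assumptions made for illustration.

```python
import math
from collections import Counter


def nmer_counts(seq, n):
    """Count overlapping N-mers in a DNA string."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def train(genomes, n):
    """Build one N-mer count profile per genome.

    `genomes` maps a (hypothetical) genome name to its sequence.
    """
    vocab_size = 4 ** n  # number of possible N-mers over A, C, G, T
    models = {}
    for name, seq in genomes.items():
        counts = nmer_counts(seq, n)
        models[name] = (counts, sum(counts.values()), vocab_size)
    return models


def log_score(fragment, model, n):
    """Sum of log P(N-mer | genome) with add-one smoothing (an assumption)."""
    counts, total, vocab_size = model
    score = 0.0
    for i in range(len(fragment) - n + 1):
        nmer = fragment[i:i + n]
        score += math.log((counts.get(nmer, 0) + 1) / (total + vocab_size))
    return score


def classify(fragment, models, n):
    """Return the genome with the highest score (uniform prior assumed)."""
    return max(models, key=lambda name: log_score(fragment, models[name], n))


# Toy data, invented purely for illustration.
toy_genomes = {
    "at_rich": "ATATATATATATATATATAT",
    "gc_rich": "GCGCGCGCGCGCGCGCGCGC",
}
toy_models = train(toy_genomes, n=3)
```

Because each genome receives a real-valued score, exact ties are unlikely, which is in the spirit of the abstract's contrast with BLAST's tied top scores.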
Ageing Research Reviews | 2006
Andres Kriete; Bahrad A. Sokhansanj; Donald L. Coppock; Geoffrey B. West
The aging of an organism is the result of complex changes in structure and function of molecules, cells, tissues, and whole body systems. To increase our understanding of how aging works, we have to analyze and integrate quantitative evidence from multiple levels of biological organization. Here, we define a broader conceptual framework for a quantitative, computational systems biology approach to aging. Initially, we consider fractal supply networks that give rise to scaling laws relating body mass, metabolism and lifespan. This approach provides a top-down view of constrained cellular processes. Concomitantly, multi-omics data generation builds such a framework from the bottom up, using modeling strategies to identify key pathways and their physiological capacity. Multiscale spatio-temporal representations finally connect molecular processes with structural organization. As aging manifests on a systems level, it emerges as a highly networked process regulated through feedback loops between levels of biological organization.
Current Genomics | 2009
Gail Rosen; Bahrad A. Sokhansanj; Robi Polikar; Mary Ann Bruns; Jacob A. Russell; Elaine Garbarine; Steve Essinger; Non Yok
Traditionally, studies in microbial genomics have focused on single genomes from cultured species, thereby limiting their focus to the small percentage of species that can be cultured outside their natural environment. Fortunately, recent advances in high-throughput sequencing and computational analyses have ushered in the new field of metagenomics, which aims to decode the genomes of microbes from natural communities without the need for cultivation. Although metagenomic studies have shed a great deal of light on bacterial diversity and coding capacity, several computational challenges remain due to the massive size and complexity of metagenomic sequence data. This paper reviews current tools and techniques that address challenges in 1) genomic fragment annotation, 2) phylogenetic reconstruction, 3) functional classification of samples, and 4) interpreting complementary metaproteomics and metametabolomics data. Also surveyed are important applications of metagenomic studies, including microbial forensics and the roles of microbial communities in shaping human health and soil ecology.
Metabolomics | 2008
Geoffrey T. Gipson; Kay Tatsuoka; Bahrad A. Sokhansanj; Rachel J. Ball; Susan C. Connor
Assignment of physical meaning to mass spectrometry (MS) data peaks is an important scientific challenge for metabolomics investigators. Improvements in instrumental mass accuracy reduce the number of spurious database matches; however, this alone is insufficient for accurate, unique high-throughput assignment. We present a method for clustering MS instrumental artifacts and a stochastic local search algorithm for the automated assignment of large, complex MS-based metabolomic datasets. Artifact peaks and their associated source peaks are grouped into “instrumental clusters.” Instrumental clusters, peaks grouped together by shared peak shape in the temporal domain, serve as a guide for the number of assignments necessary to completely explain a given dataset. We refine mass-only assignments through the intersection of peak correlation pairs with a database of biochemically relevant interaction pairs. Further refinement is achieved through a stochastic local search optimization algorithm that selects individual assignments for each instrumental cluster. The algorithm works by choosing the peak assignment that maximally explains the connectivity of a given cluster. We demonstrate that this methodology provides a significant advantage over standard methods for the assignment of metabolites in a UPLC-MS diabetes dataset.
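A heavily simplified sketch of stochastic local search for this kind of assignment problem is shown here. It is not the published algorithm. It assumes each cluster has a short list of candidate metabolite names, and that an assignment is scored by counting correlated cluster pairs whose assigned metabolites appear in an interaction database. All names and data are hypothetical.

```python
import random


def score(assignment, correlated_pairs, interactions):
    """Count correlated cluster pairs explained by a known interaction."""
    return sum(
        1
        for a, b in correlated_pairs
        if frozenset((assignment[a], assignment[b])) in interactions
    )


def stochastic_local_search(candidates, correlated_pairs, interactions,
                            steps=500, seed=0):
    """Repeatedly re-assign one randomly chosen cluster to its best candidate.

    Ties between equally good candidates are broken at random, which lets
    the search drift across plateaus instead of stalling.
    """
    rng = random.Random(seed)
    assignment = {c: rng.choice(opts) for c, opts in candidates.items()}
    clusters = list(candidates)
    for _ in range(steps):
        cluster = rng.choice(clusters)
        scored = []
        for option in candidates[cluster]:
            trial = dict(assignment)
            trial[cluster] = option
            scored.append((score(trial, correlated_pairs, interactions), option))
        best = max(s for s, _ in scored)
        assignment[cluster] = rng.choice([o for s, o in scored if s == best])
    return assignment


# Toy data, invented for illustration only.
toy_candidates = {"c1": ["A", "X"], "c2": ["B", "Y"], "c3": ["C", "Z"]}
toy_pairs = [("c1", "c2"), ("c2", "c3")]
toy_interactions = {frozenset(("A", "B")), frozenset(("B", "C"))}
result = stochastic_local_search(toy_candidates, toy_pairs, toy_interactions)
```

In this toy case the assignment that explains both correlated pairs is a fixed point of the search: once it is reached, every single-cluster change lowers the score.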
Molecular BioSystems | 2008
Geoffrey T. Gipson; Kay Tatsuoka; Rachel J. Ball; Bahrad A. Sokhansanj; Michael K. Hansen; Terence E. Ryan; Mark P. Hodson; Brian C. Sweatman; Susan C. Connor
We describe a multi-platform (¹H NMR, LC-MS, microarray) investigation of metabolic disturbances associated with the leptin receptor defective (db/db) mouse model of type 2 diabetes using novel assignment methodologies. For the first time, several urinary metabolites were found to be associated with diabetes and/or diabetes progression and confirmed in both NMR and LC-MS datasets. The confirmed metabolites were trimethylamine-N-oxide (TMAO), creatine, carnitine, and phenylalanine. TMAO and phenylalanine were both elevated in db/db mice and decreased in these mice with age. Levels of both creatine and carnitine increased in diabetic mice with age, and creatine was also significantly decreased in db/db mice. Additionally, many metabolic markers were found by either NMR or LC-MS, but could not be found in both, due to instrumental limitations. This indicates that the combined use of NMR and LC-MS instrumentation provides complementary information that would be otherwise unattainable. Pathway analyses of urinary metabolites and liver, muscle, and adipose tissue transcripts from the db/db model were also performed to identify altered biochemical processes in the diabetic mice. Metabolite and liver transcript levels associated with the TCA cycle and steroid processes were altered in db/db mice. In addition, gene expression in muscle and liver associated with fatty acid processing was altered in the diabetic mice and similar evidence was observed in the LC-MS data. Our findings highlight the importance of a number of processes known to be associated with diabetes and reveal tissue-specific responses to the condition. When studying metabolic disorders such as diabetes, integrated multi-platform profiling of metabolite alterations in biofluids can provide important insights into the processes underlying the disease.
International Conference of the IEEE Engineering in Medicine and Biology Society | 2009
Xiaohua Hu; Michael K. Ng; Fang-Xiang Wu; Bahrad A. Sokhansanj
In this paper, we present a novel method to mine, model, and evaluate a regulatory system executing cellular functions that can be represented as a biomolecular network. Our method consists of two steps. First, a novel scale-free network clustering approach is applied to such a biomolecular network to obtain various subnetworks. Second, computational models are generated for the subnetworks and simulated to predict their behavior in the cellular context. We discuss and evaluate some of the advanced computational modeling approaches, in particular, state-space modeling, probabilistic Boolean network modeling, and fuzzy logic modeling. The modeling and simulation results represent hypotheses that are tested against high-throughput biological datasets (microarrays and/or genetic screens) under normal and perturbation conditions. Experimental results on time-series gene expression data for the human cell cycle indicate that our approach is promising for subnetwork mining and simulation from large biomolecular networks.
BMC Bioinformatics | 2007
Suman Datta; Bahrad A. Sokhansanj
Background: The functions of human cells are carried out by biomolecular networks, which include proteins, genes, and regulatory sites within DNA that encode and control protein expression. Models of biomolecular network structure and dynamics can be inferred from high-throughput measurements of gene and protein expression. We build on our previously developed fuzzy logic method for bridging quantitative and qualitative biological data to address the challenges of noisy, low resolution high-throughput measurements, i.e., from gene expression microarrays. We employ an evolutionary search algorithm to accelerate the search for hypothetical fuzzy biomolecular network models consistent with a biological data set. We also develop a method to estimate the probability of a potential network model fitting a set of data by chance. The resulting metric provides an estimate of both model quality and dataset quality, identifying data that are too noisy to identify meaningful correlations between the measured variables. Results: Optimal parameters for the evolutionary search were identified based on artificial data, and the algorithm showed scalable and consistent performance for as many as 150 variables. The method was tested on previously published human cell cycle gene expression microarray data sets. The evolutionary search method was found to converge to the results of exhaustive search. The randomized evolutionary search was able to converge on a set of similar best-fitting network models on different training data sets after 30 generations running 30 models per generation. Consistent results were found regardless of which of the published data sets were used to train or verify the quantitative predictions of the best-fitting models for cell cycle gene dynamics. Conclusion: Our results demonstrate the capability of scalable evolutionary search for fuzzy network models to address the problem of inferring models based on complex, noisy biomolecular data sets. This approach yields multiple alternative models that are consistent with the data, yielding a constrained set of hypotheses that can be used to optimally design subsequent experiments.
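The general shape of an evolutionary search with 30 generations of 30 models each can be sketched as below. Only those two numbers come from the abstract. The elitism scheme, the mutation operator, the fitness function, and the toy "network" encoding (a tuple of interaction signs) are assumptions for illustration and do not reproduce the paper's fuzzy model fitting.

```python
import random


def evolve(fitness, random_model, mutate,
           generations=30, population_size=30, elite=5, seed=0):
    """Generic elitist evolutionary search.

    Each generation keeps the `elite` best models unchanged and fills the
    rest of the population with mutated copies of those elites.
    """
    rng = random.Random(seed)
    population = [random_model(rng) for _ in range(population_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[:elite]
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(population_size - elite)]
        population = parents + children
    population.sort(key=fitness, reverse=True)
    return population[0], history


# Hypothetical toy problem: recover a hidden vector of interaction signs
# (+1 activation, -1 repression, 0 no interaction).
TARGET = (1, -1, 0, 1, 0, -1)


def fitness(model):
    """Negative number of mismatches against the hidden target."""
    return -sum(a != b for a, b in zip(model, TARGET))


def random_model(rng):
    return tuple(rng.choice((-1, 0, 1)) for _ in TARGET)


def mutate(model, rng):
    """Resample one randomly chosen position."""
    i = rng.randrange(len(model))
    return model[:i] + (rng.choice((-1, 0, 1)),) + model[i + 1:]


best, history = evolve(fitness, random_model, mutate)
```

Because the elites are carried over unchanged, the best fitness never decreases from one generation to the next. Running the search from several seeds gives a simple way to collect the multiple alternative best-fitting models the abstract mentions.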
BioMed Research International | 2011
Gail Rosen; Robi Polikar; Diamantino Caseiro; Steven D. Essinger; Bahrad A. Sokhansanj
High-throughput sequencing technologies enable metagenome profiling, the simultaneous sequencing of multiple microbial species present within an environmental sample. Since metagenomic data includes sequence fragments (“reads”) from organisms that are absent from any database, new algorithms must be developed for the identification and annotation of novel sequence fragments. Homology-based techniques have been modified to detect novel species and genera, but composition-based methods have not been adapted. We develop a detection technique that can discriminate between “known” and “unknown” taxa, which can be used with composition-based methods, as well as a hybrid method. Unlike previous studies, we rigorously evaluate all algorithms for their ability to detect novel taxa. First, we show that the integration of a detector with a composition-based method performs significantly better than homology-based methods for the detection of novel species and genera, with best performance at finer taxonomic resolutions. Most importantly, we evaluate all the algorithms by introducing an “unknown” class and show that the modified version of PhymmBL has similar or better overall classification performance than the other modified algorithms, especially for the species-level and ultrashort reads. Finally, we evaluate the performance of several algorithms on a real acid mine drainage dataset.
Fuzzy Systems in Bioinformatics and Computational Biology | 2009
Bahrad A. Sokhansanj; Suman Datta; Xiaohua Hu
Fuzzy logic is an effective language for models that interpret large-scale, high-throughput molecular biology experiments, including genomics, proteomics, metabolomics, and inhibitor screening. Two important principles apply for biological system modeling: (1) In the post-genome era, the development of novel molecular diagnostics and therapeutics requires interpreting the complex results of high-throughput multiplexed experiments, and a framework to efficiently and rapidly design hypothesis-driven experiments. (2) Biomolecular data are typically noisy and semi-quantitative, in particular because of the typical fluorescence output of high-throughput experiments. Fuzzy biomolecular network models coupled with hypothesis generation strategies address these needs. In this chapter, we describe an integrated, data-driven method for extracting system models from data and generating hypotheses for experimental design. The method is based on scalable, linear relationships between nodes of a biomolecular network, representing the expression of genes, proteins, and/or metabolites. Data from high-throughput experiments are fuzzified using a universal normalization method. Best-fitting models are generated through an evolutionary algorithm, and disagreements between plausible hypothetical network models are used as the basis for identifying experimental designs. The result is a modeling and simulation framework that can be easily integrated with text-based and graphical biological knowledge contained within existing literature and databases.
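As a rough illustration of what fuzzifying a measurement can look like, the sketch below maps a raw value onto three triangular fuzzy sets. The abstract does not specify the chapter's normalization method, so the min-max scaling, the low/medium/high sets, and the function name are assumptions.

```python
def fuzzify(value, lo, hi):
    """Map a raw measurement to low/medium/high membership degrees.

    The value is first min-max normalized to [0, 1] (an assumed scheme),
    then passed through triangular membership functions that sum to 1.
    """
    x = (value - lo) / (hi - lo)
    x = min(1.0, max(0.0, x))          # clip values outside [lo, hi]
    low = max(0.0, 1.0 - 2.0 * x)      # 1 at x = 0, falls to 0 at x = 0.5
    high = max(0.0, 2.0 * x - 1.0)     # 0 until x = 0.5, rises to 1 at x = 1
    medium = 1.0 - low - high          # peaks at x = 0.5
    return {"low": low, "medium": medium, "high": high}
```

For example, a reading of 2.5 on a 0 to 10 scale is half "low" and half "medium", which is how a fuzzy representation keeps semi-quantitative information that a hard threshold would discard.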
Computational Systems Biology | 2006
Avijit Ghosh; David L. Miller; Rui Zou; Bahrad A. Sokhansanj; Andres Kriete
Computational and theoretical considerations for the extension of systems biology into the spatiotemporal realm will be discussed. Both limitations and extensions of current approaches within the research community will be investigated, along with the approach taken by our group in a newly developed software package, CellSim. Application of all computational aspects described relies on cellular assays imaged by fluorescence confocal microscopy. Taken together, this extension to systems biology may answer questions about complex protein networks and the role spatial heterogeneity may play in such processes.