Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Nagiza F. Samatova is active.

Publication


Featured research published by Nagiza F. Samatova.


PLOS ONE | 2009

Impact of pretreated Switchgrass and biomass carbohydrates on Clostridium thermocellum ATCC 27405 cellulosome composition: a quantitative proteomic analysis.

Babu Raman; Chongle Pan; Gregory B. Hurst; Miguel Rodriguez; Catherine K McKeown; Patricia K. Lankford; Nagiza F. Samatova; Jonathan R. Mielenz

Background
Economic feasibility and sustainability of lignocellulosic ethanol production require the development of robust microorganisms that can efficiently degrade and convert plant biomass to ethanol. The anaerobic thermophilic bacterium Clostridium thermocellum is a candidate microorganism, as it is capable of hydrolyzing cellulose and fermenting the hydrolysis products to ethanol and other metabolites. C. thermocellum achieves efficient cellulose hydrolysis using multiprotein extracellular enzymatic complexes, termed cellulosomes.

Methodology/Principal Findings
In this study, we used quantitative proteomics (multidimensional LC-MS/MS and 15N-metabolic labeling) to measure relative changes in the levels of cellulosomal subunit proteins (per CipA scaffoldin basis) when C. thermocellum ATCC 27405 was grown on a variety of carbon sources [dilute-acid pretreated switchgrass, cellobiose, amorphous cellulose, crystalline cellulose (Avicel), and combinations of crystalline cellulose with pectin, xylan, or both]. Cellulosome samples isolated from cultures grown on these carbon sources were compared to 15N-labeled cellulosome samples isolated from crystalline cellulose-grown cultures. In total, across all samples, proteomic analysis identified 59 dockerin- and 8 cohesin-module-containing components, including 16 previously undetected cellulosomal subunits. Many cellulosomal components showed differential protein abundance in the presence of non-cellulose substrates in the growth medium. Cellulosome samples from amorphous cellulose-, cellobiose-, and pretreated switchgrass-grown cultures displayed the most distinct differences in composition compared to cellulosome samples from crystalline cellulose-grown cultures. Glycoside Hydrolase Family 9 enzymes showed increased levels in the presence of crystalline cellulose and pretreated switchgrass in particular, while GH5 enzymes showed increased levels in response to the presence of cellulose in general, whether amorphous or crystalline.

Conclusions/Significance
Overall, the quantitative results suggest a coordinated, substrate-specific regulation of cellulosomal subunit composition in C. thermocellum to better suit the organism's needs for growth under different conditions. To date, this study provides the most comprehensive comparison of cellulosomal compositional changes in C. thermocellum in response to different carbon sources. Such studies are vital to engineering a strain that is best suited to grow on specific substrates of interest and provide the building blocks for constructing designer cellulosomes with tailored enzyme composition for industrial ethanol production.


Computational Biology and Chemistry | 2006

The sorting direct method for stochastic simulation of biochemical systems with varying reaction execution behavior

James M. McCollum; Gregory D. Peterson; Chris D. Cox; Michael L. Simpson; Nagiza F. Samatova

A key to advancing the understanding of molecular biology in the post-genomic age is the development of accurate predictive models for genetic regulation, protein interaction, metabolism, and other biochemical processes. To facilitate model development, simulation algorithms must provide an accurate representation of the system, while performing the simulation in a reasonable amount of time. Gillespie's stochastic simulation algorithm (SSA) accurately depicts spatially homogeneous models with small populations of chemical species and properly represents noise, but it is often abandoned when modeling larger systems because of its computational complexity. In this work, we examine the performance of different versions of the SSA when applied to several biochemical models. Through our analysis, we discover that transient changes in reaction execution frequencies, which are typical of biochemical models with gene induction and repression, can dramatically affect simulator performance. To account for these shifts, we propose a new algorithm called the sorting direct method that maintains a loosely sorted order of the reactions as the simulation executes. Our measurements show that the sorting direct method performs favorably when compared to other well-known exact stochastic simulation algorithms.
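The core loop of the direct method, with the sorting refinement described above, can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: the function name and the reaction representation (pairs of a propensity function and a state-update function) are assumptions made for this example.

```python
import random

def sorting_direct_ssa(x, reactions, t_end, seed=0):
    """Gillespie's direct method with a loosely sorted reaction list.

    `reactions` is a list of (propensity_fn, fire_fn) pairs (a hypothetical
    representation chosen for this sketch); `x` is the mutable state vector.
    """
    rng = random.Random(seed)
    order = list(range(len(reactions)))   # loosely sorted reaction indices
    t = 0.0
    while t < t_end:
        props = [reactions[i][0](x) for i in order]
        a0 = sum(props)
        if a0 == 0:
            break                          # no reaction can fire
        t += rng.expovariate(a0)           # exponential time to next event
        r = rng.random() * a0              # select a reaction by linear search
        acc = 0.0
        for k, a in enumerate(props):
            acc += a
            if r < acc:
                break
        reactions[order[k]][1](x)          # fire the selected reaction
        if k > 0:                          # bubble it one slot toward the front
            order[k - 1], order[k] = order[k], order[k - 1]
    return x
```

After each firing, the chosen reaction is swapped one position toward the front of the list, so frequently executing reactions migrate to the head and the linear search terminates earlier on average, which is the behavior the abstract describes for models with shifting reaction execution frequencies.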


Bioinformatics | 2008

From pull-down data to protein interaction networks and complexes with biological relevance

Bing Zhang; Byung-Hoon Park; Tatiana V. Karpinets; Nagiza F. Samatova

MOTIVATION
Recent improvements in high-throughput mass spectrometry (MS) technology have expedited genome-wide discovery of protein-protein interactions by providing a capability of detecting protein complexes in a physiological setting. Computational inference of protein interaction networks and protein complexes from MS data is challenging. Advances are required in developing robust and seamlessly integrated procedures for assessing protein-protein interaction affinities, mathematically representing protein interaction networks, discovering protein complexes, and evaluating their biological relevance.

RESULTS
A multi-step but easy-to-follow framework for identifying protein complexes from MS pull-down data is introduced. It assesses interaction affinity between two proteins based on the similarity of their co-purification patterns derived from MS data. It constructs a protein interaction network by adopting a knowledge-guided threshold selection method. Based on the network, it identifies protein complexes and infers their core components using a graph-theoretical approach. It deploys a statistical evaluation procedure to assess the biological relevance of each found complex. On Saccharomyces cerevisiae pull-down data, the framework outperformed other, more complicated schemes by at least 10% in F1-measure and identified 610 protein complexes with high functional homogeneity based on enrichment in Gene Ontology (GO) annotation. Manual examination of the complexes brought forward hypotheses on causes of false identifications. Namely, co-purification of different protein complexes mediated by a common non-protein molecule, such as DNA, might be a source of false positives, while protein identification bias in pull-down technology, such as the hydrophilic bias, could result in false negatives.
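The affinity-assessment and network-construction steps described above follow a general pattern that a small sketch can illustrate. This is not the paper's exact scoring or its knowledge-guided threshold selection; the function name and the choice of cosine similarity are assumptions made for the example.

```python
import math

def copurification_network(profiles, threshold=0.5):
    """Hypothetical sketch: build a protein interaction network by
    thresholding the cosine similarity of each protein pair's
    co-purification pattern across pull-down experiments.

    `profiles` maps a protein name to its abundance vector, one entry
    per pull-down experiment.
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu and nv else 0.0

    names = list(profiles)
    edges = []
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            s = cosine(profiles[p], profiles[q])
            if s >= threshold:             # keep only high-affinity pairs
                edges.append((p, q, s))
    return edges
```

Complex discovery would then operate on the resulting weighted graph; the paper's knowledge-guided method chooses the threshold from prior biological knowledge rather than fixing it up front as this sketch does.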


international conference on parallel processing | 2011

Compressing the incompressible with ISABELA: in-situ reduction of spatio-temporal data

Sriram Lakshminarasimhan; Neil Shah; Stephane Ethier; Scott Klasky; Robert Latham; Robert B. Ross; Nagiza F. Samatova

Modern large-scale scientific simulations running on HPC systems generate data on the order of terabytes during a single run. To lessen the I/O load during a simulation run, scientists are forced to capture data infrequently, thereby making data collection an inherently lossy process. Yet, lossless compression techniques are hardly suitable for scientific data due to its inherently random nature; for the applications used here, they offer less than a 10% compression rate. They also impose significant overhead during decompression, making them unsuitable for data analysis and visualization that require repeated data access. To address this problem, we propose an effective method for In-situ Sort-And-B-spline Error-bounded Lossy Abatement (ISABELA) of scientific data that is widely regarded as effectively incompressible. With ISABELA, we apply a preconditioner to seemingly random and noisy data along the spatial resolution to achieve an accurate fitting model that guarantees a ≥ 0.99 correlation with the original data. We further take advantage of temporal patterns in scientific data to compress data by ≈ 85%, while introducing only a negligible overhead on simulations in terms of runtime. ISABELA significantly outperforms existing lossy compression methods, such as wavelet compression. Moreover, besides being a communication-free and scalable compression technique, ISABELA is an inherently local decompression method; that is, it does not decode the entire dataset, making it attractive for random access.
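ISABELA's key insight, that sorting converts noisy data into a smooth monotone curve which a compact model fits accurately, can be illustrated with a toy sketch. Plain linear interpolation between a few knots stands in for the paper's error-bounded B-spline fit, and the function names are invented for the example; the sort permutation stored alongside the fitted curve is what makes reconstruction possible.

```python
def isabela_sketch(data, n_knots=9):
    """Toy encoder: sort the data (the preconditioner), then summarize
    the resulting monotone curve with a few sampled knots."""
    n = len(data)
    perm = sorted(range(n), key=data.__getitem__)    # sort permutation
    sorted_vals = [data[i] for i in perm]
    idx = [round(k * (n - 1) / (n_knots - 1)) for k in range(n_knots)]
    knots = [(i, sorted_vals[i]) for i in idx]       # (position, value) pairs
    return perm, knots

def isabela_decode(perm, knots, n):
    """Toy decoder: interpolate between knots, then undo the sort."""
    recon_sorted = []
    k = 1
    for j in range(n):
        while k < len(knots) - 1 and knots[k][0] < j:
            k += 1
        (i0, v0), (i1, v1) = knots[k - 1], knots[k]
        t = (j - i0) / (i1 - i0) if i1 != i0 else 0.0
        recon_sorted.append(v0 + t * (v1 - v0))      # linear interpolation
    out = [0.0] * n
    for pos, orig_i in enumerate(perm):              # invert the permutation
        out[orig_i] = recon_sorted[pos]
    return out
```

In the real method the cost of storing the sort permutation compactly is a central design concern, and the B-spline fit carries an explicit error bound; this sketch ignores both and shows only the sort-then-fit idea.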


Concurrency and Computation: Practice and Experience | 2014

Hello ADIOS: the challenges and lessons of developing leadership class I/O frameworks

Qing Liu; Jeremy Logan; Yuan Tian; Hasan Abbasi; Norbert Podhorszki; Jong Youl Choi; Scott Klasky; Roselyne Tchoua; Jay F. Lofstead; Ron A. Oldfield; Manish Parashar; Nagiza F. Samatova; Karsten Schwan; Arie Shoshani; Matthew Wolf; Kesheng Wu; Weikuan Yu

Applications running on leadership platforms are increasingly bottlenecked by storage input/output (I/O). In an effort to combat the increasing disparity between I/O throughput and compute capability, we created the Adaptable IO System (ADIOS) in 2005. Focusing on putting users first with a service-oriented architecture, we combined cutting-edge research into new I/O techniques with a design effort to create near-optimal I/O methods. As a result, ADIOS provides the highest level of synchronous I/O performance for a number of mission-critical applications at various Department of Energy Leadership Computing Facilities. Meanwhile, ADIOS is leading the push for next-generation techniques, including staging and data processing pipelines. In this paper, we describe the startling observations we have made in the last half decade of I/O research and development, and elaborate on the lessons we have learned along this journey. We also detail some of the challenges that remain as we look toward the coming exascale era.


Distributed and Parallel Databases | 2002

RACHET: An Efficient Cover-Based Merging of Clustering Hierarchies from Distributed Datasets

Nagiza F. Samatova; George Ostrouchov; Al Geist; Anatoli V. Melechko

This paper presents a hierarchical clustering method named RACHET (Recursive Agglomeration of Clustering Hierarchies by Encircling Tactic) for analyzing multi-dimensional distributed data. A typical clustering algorithm requires bringing all the data into a centralized warehouse. This results in O(nd) transmission cost, where n is the number of data points and d is the number of dimensions. For large datasets, this is prohibitively expensive. In contrast, RACHET runs with at most O(n) time, space, and communication costs to build a global hierarchy of comparable clustering quality by merging locally generated clustering hierarchies. RACHET employs the encircling tactic, in which the merges at each stage are chosen so as to minimize the volume of a covering hypersphere. For each cluster centroid, RACHET maintains descriptive statistics of constant complexity to enable these choices. RACHET's framework is applicable to a wide class of centroid-based hierarchical clustering algorithms, such as centroid, medoid, and Ward's methods.
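The covering-hypersphere bookkeeping can be illustrated with a small sketch. This is a simplified stand-in for RACHET's descriptive statistics, with an invented function name: each cluster is summarized by a centroid, a covering radius, and a size, and a merge produces a sphere guaranteed to cover both children.

```python
import math

def merge_cover(c1, r1, n1, c2, r2, n2):
    """Merge two clusters, each summarized by (centroid, covering radius,
    size), into a new centroid and a hypersphere covering both children.
    A hypothetical constant-space summary in the spirit of RACHET."""
    d = len(c1)
    # size-weighted centroid of the merged cluster
    c = [(n1 * c1[i] + n2 * c2[i]) / (n1 + n2) for i in range(d)]

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # any point of child i lies within dist(c, ci) + ri of the new centroid
    r = max(dist(c, c1) + r1, dist(c, c2) + r2)
    return c, r, n1 + n2
```

Choosing, at each agglomeration stage, the pair whose merged sphere has the smallest radius (hence volume) is the encircling tactic the abstract describes, and only these constant-size summaries, never the raw points, need to cross the network.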


Nature Biotechnology | 2009

Improved genome annotation for Zymomonas mobilis

Shihui Yang; Katherine M. Pappas; Loren Hauser; Miriam Land; Gwo-Liang Chen; Gregory B. Hurst; Chongle Pan; Vassili N. Kouvelis; Milton A Typas; Dale A. Pelletier; Dawn M. Klingeman; Yun-Juan Chang; Nagiza F. Samatova; Steven D. Brown

Respective quality scores and the details of the software and parameters used in this study are available at our website (Supplementary Table 1). We have also sequenced the genome of an acetate-tolerant strain derived from Z. mobilis ZM4 ATCC31821 that was selected in another geographically separated laboratory7 and report 454 pyrosequencing and Sanger sequencing and peptide support for our changes to the ZM4 chromosome (Supplementary Table 1). In addition, the entire ZM4 pyrosequencing data set has been deposited in the National Center for Biotechnology Information (NCBI) short-read archive database (Study SRP000908). We processed the updated sequence data using the automated Oak Ridge National Laboratory (ORNL) microbial genome annotation pipeline. Finally, we examined the gene models predicted in the original GenBank annotation, the TIGR reannotation and our new reannotation, and updated the ZM4 annotation in a final manual curation step. The final curation was performed in conjunction with a defined set of criteria (available with the reannotation) and several proteomics data sets that showed peptide support for more than half of the theoretical proteome. An overview of the extensive changes made to the ZM4 chromosome based upon mass-spectrometry proteomics and pyrosequencing data and six illustrative examples are presented (Table 1 and Supplementary Fig. 1, respectively). We have converted 61 pseudogenes in the original annotation into 43 full-length coding sequences, which include predicted genes with important metabolic and physiological functions (e.g., GenBank acc. nos. for tRNA synthetases ZMO0460, ZMO0843, ZMO0845, ZMO1508, ZMO1878 and flagella gene fliF, ZMO0633) (Supplementary Table 2). Several of the updated chromosomal nucleotides are consistent with earlier ZM4 fosmid DNA sequence data (e.g., GenBank acc. no. AAG29859) and we have peptide support for 6 of our 37 newly predicted chromosomal genes (Supplementary Table 3).
We did not identify peptides corresponding to any of the putative genes that we deleted. A comprehensive comparison on a gene-by-gene basis is presented in Supplementary Table 4. We have provided our analysis to the authors of the primary genome annotation, and they are in the process of updating their GenBank submission. Plasmid DNA was also identified in our 454 pyrosequencing data.


conference on high performance computing (supercomputing) | 2005

Genome-Scale Computational Approaches to Memory-Intensive Applications in Systems Biology

Yun Zhang; Faisal N. Abu-Khzam; Nicole Baldwin; Elissa J. Chesler; Michael A. Langston; Nagiza F. Samatova

Graph-theoretical approaches to biological network analysis have proven to be effective for small networks but are computationally infeasible for comprehensive genome-scale systems-level elucidation of these networks. The difficulty lies in the NP-hard nature of many global systems biology problems that, in practice, translates to exponential (or worse) run times for finding exact optimal solutions. Moreover, these problems, especially those of an enumerative flavor, are often memory-intensive and must share very large sets of data effectively across many processors. For example, the enumeration of maximal cliques - a core component in gene expression networks analysis, cis regulatory motif finding, and the study of quantitative trait loci for high-throughput molecular phenotypes can result in as many as 3^n/3 maximal cliques for a graph with n vertices. Memory requirements to store those cliques reach terabyte scales even on modest-sized genomes. Emerging hardware architectures with ultra-large globally addressable memory such as the SGI Altix and Cray X1 seem to be well suited for addressing these types of data-intensive problems in systems biology. This paper presents a novel framework that provides exact, parallel and scalable solutions to various graph-theoretical approaches to genome-scale elucidation of biological networks. This framework takes advantage of these large-memory architectures by creating globally addressable bitmap memory indices with potentially high compression rates, fast bitwise-logical operations, and reduced search space. Augmented with recent theoretical advancements based on fixed-parameter tractability, this framework produces computationally feasible performance for genome-scale combinatorial problems of systems biology.
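The maximal-clique enumeration at the core of this framework can be illustrated with the classic Bron-Kerbosch recursion, shown here for context; the paper's contribution is the parallel, bitmap-indexed machinery around such enumeration, not this textbook algorithm. The Moon-Moser bound of 3^(n/3) maximal cliques is what drives the terabyte-scale memory requirements mentioned above.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques with basic Bron-Kerbosch recursion.

    `adj` maps each vertex to its set of neighbors. R is the growing
    clique, P the candidates that extend it, X the vertices already
    covered (used to reject non-maximal cliques).
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)              # R cannot be extended: maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)                    # v's cliques are fully explored
            x.add(v)

    expand(set(), set(adj), set())
    return cliques
```

Production implementations add pivot selection and, as in the paper, distribute the search across processors with shared memory indices, since the output alone can be exponential in the number of vertices.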


BMC Bioinformatics | 2010

A high-throughput de novo sequencing approach for shotgun proteomics using high-resolution tandem mass spectrometry

Chongle Pan; Byung H. Park; William Hayes McDonald; Patricia A. Carey; Jillian F. Banfield; Nathan C. VerBerkmoes; Robert L. Hettich; Nagiza F. Samatova

Background
High-resolution tandem mass spectra can now be readily acquired with hybrid instruments, such as LTQ-Orbitrap and LTQ-FT, in high-throughput shotgun proteomics workflows. The improved spectral quality enables more accurate de novo sequencing for identification of post-translational modifications and amino acid polymorphisms.

Results
In this study, a new de novo sequencing algorithm, called Vonode, has been developed specifically for analysis of such high-resolution tandem mass spectra. To fully exploit the high mass accuracy of these spectra, a unique scoring system is proposed to evaluate sequence tags based primarily on mass accuracy information of fragment ions. Consensus sequence tags were inferred for 11,422 spectra with an average peptide length of 5.5 residues from a total of 40,297 input spectra acquired in a 24-hour proteomics measurement of Rhodopseudomonas palustris. The accuracy of inferred consensus sequence tags was 84%. According to our comparison, the performance of Vonode was shown to be superior to the PepNovo v2.0 algorithm, in terms of the number of de novo sequenced spectra and the sequencing accuracy.

Conclusions
Here, we improved de novo sequencing performance by developing a new algorithm specifically for high-resolution tandem mass spectral data. The Vonode algorithm is freely available for download at http://compbio.ornl.gov/Vonode.


The ISME Journal | 2010

Cultivation and quantitative proteomic analyses of acidophilic microbial communities

Christopher P. Belnap; Chongle Pan; Nathan C. VerBerkmoes; Mary E. Power; Nagiza F. Samatova; Rudolf L. Carver; Robert L. Hettich; Jillian F. Banfield

Acid mine drainage (AMD), an extreme environment characterized by low pH and high metal concentrations, can support dense acidophilic microbial biofilm communities that rely on chemoautotrophic production based on iron oxidation. Field-determined production rates indicate that, despite the extreme conditions, these communities are sufficiently well adapted to their habitats to achieve primary production rates comparable to those of microbial communities occurring in some non-extreme environments. To enable laboratory studies of growth, production and ecology of AMD microbial communities, a culturing system was designed to reproduce natural biofilms, including organisms recalcitrant to cultivation. A comprehensive metabolic labeling-based quantitative proteomic analysis was used to verify that natural and laboratory communities were comparable at the functional level. Results confirmed that the composition and core metabolic activities of laboratory-grown communities were similar to a natural community, including the presence of active, low-abundance bacteria and archaea that have not yet been isolated. However, laboratory growth rates were slow compared with natural communities, and this correlated with increased abundance of stress response proteins for the dominant bacteria in laboratory communities. Modification of cultivation conditions reduced the abundance of stress response proteins and increased laboratory community growth rates. The research presented here represents the first description of the application of a metabolic labeling-based quantitative proteomic analysis at the community level and resulted in a model microbial community system ideal for testing physiological and ecological hypotheses.

Collaboration


Dive into Nagiza F. Samatova's collaborations.

Top Co-Authors

Scott Klasky (Oak Ridge National Laboratory)
William Hendrix (North Carolina State University)
Vipin Kumar (University of Minnesota)
Al Geist (Oak Ridge National Laboratory)
Byung-Hoon Park (Oak Ridge National Laboratory)
John Jenkins (North Carolina State University)
Matthew C. Schmidt (North Carolina State University)
Paul Breimyer (North Carolina State University)
Kanchana Padmanabhan (North Carolina State University)
Steve Harenberg (North Carolina State University)