Gautam B. Singh | Researchain

Publication

Featured research published by Gautam B. Singh.

Nucleic Acids Research | 1998

The Genome Sequence DataBase (GSDB): improving data quality and data access.

Carol Harger; M. P. Skupski; J. Bingham; Andrew D. Farmer; S. Hoisie; Peter Hraber; Donald Kiphart; L. Krakowski; Mia McLeod; Jolene Schwertfeger; G. A. Seluja; Adam Siepel; Gautam B. Singh; D. Stamper; Peter A. Steadman; Nina Thayer; R. Thompson; P. Wargo; Mark E. Waugh; J. J. Zhuang; P. A. Schad

In 1997 the primary focus of the Genome Sequence DataBase (GSDB; www.ncgr.org/gsdb), located at the National Center for Genome Resources, was to improve data quality and accessibility. Efforts to increase the quality of data within the database included two major projects: one to identify and remove all vector contamination from sequences in the database, and one to create premier sequence sets (including both alignments and discontiguous sequences). Data accessibility was improved during the course of the last year in several ways. First, a graphical database sequence viewer was made available to researchers. Second, an update process was implemented for the web-based query tool, Maestro. Third, a web-based tool, Excerpt, was developed to retrieve selected regions of any sequence in the database. Lastly, a GSDB flatfile that contains annotation unique to GSDB (e.g., sequence analysis and alignment data) was developed. Additionally, the GSDB web site provides a tool for the detection of matrix attachment regions (MARs), which can be used to identify regions of high coding potential. The ultimate goal of this work is to make GSDB a more useful resource for genomic comparison studies and gene-level studies by improving data quality and by providing data access capabilities that are consistent with the needs of both types of studies.

Somatic Cell and Molecular Genetics | 1998

A Matrix Associated Region Localizes the Human SOCS-1 Gene to Chromosome 16p13.13

Jeffrey A. Kramer; Mark D. Adams; Gautam B. Singh; Norman A. Doggett; Stephen A. Krawetz

The MarFinder algorithm was applied to a newly sequenced segment of 16p13.13 abutting the 3′ end of the human PRM1 → PRM2 → TNP2 locus. A candidate region of matrix attachment was identified. Subsequent biophysical analysis showed that this region was attached to the somatic nuclear matrix. Nucleotide sequence analysis also revealed the presence of a CpG island. Database queries showed that this region contained the SOCS-1 gene. Thus, the SOCS-1 gene is bounded by a somatic MAR and is just 3′ of the spermatid-expressed PRM1 → PRM2 → TNP2 domain at position 16p13.13.

IEEE Transactions on Vehicular Technology | 2009

Using Hidden Markov Models in Vehicular Crash Detection

Gautam B. Singh; Haiping Song

This paper presents a system for automotive crash detection based on hidden Markov models (HMMs). The crash pulse library used for training comprises a number of head-on and oblique angular crash events involving rigid and offset deformable barriers. Stochastic distribution characteristics of crash signals are validated to ensure conformity with the modeling assumptions. This step is achieved by analyzing the quantile-quantile (Q-Q) plot of actual pulses against the assumed bivariate Gaussian distribution. HMM parameters are next induced by utilizing the expectation-maximization (EM) procedure. The search for an optimal crash pulse model proceeds using the "leave-one-out" technique with the exploration encompassing both fully connected and left-right HMM topologies. The optimal crash pulse architecture is identified as a seven-state left-right HMM with its parameters computed using real and computer-aided engineering (CAE)-generated data. The system described in the paper has the following advantages. First, it is fast and can accurately detect crashes within 6 ms. Second, its implementation is simple and uses only two sensors, which makes it less vulnerable to failures, considering the overall simplicity of interconnects. Finally, it represents a general and modularized algorithm that can be adapted to any vehicle line and readily extended to use additional sensors.
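To make the left-right topology concrete, the following is a minimal sketch of scoring a pulse with the forward algorithm. It is not the paper's implementation: it assumes univariate Gaussian emissions (the abstract describes a bivariate Gaussian), a chain that always starts in the first state, and hand-picked parameters rather than EM-trained ones. The function names and the toy three-state model are hypothetical.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a univariate Gaussian (a simplification of the bivariate case)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    finite = [v for v in values if v != -math.inf]
    if not finite:
        return -math.inf
    peak = max(finite)
    return peak + math.log(sum(math.exp(v - peak) for v in finite))

def left_right_log_likelihood(obs, stay_prob, means, variances):
    """Forward algorithm for a left-right HMM.

    State i either stays in i (probability stay_prob[i], assumed in (0, 1])
    or advances to i + 1. The chain is assumed to start in state 0, and the
    last state should have stay_prob 1.0.
    """
    n = len(means)
    alpha = [-math.inf] * n
    alpha[0] = log_gaussian(obs[0], means[0], variances[0])
    for x in obs[1:]:
        new_alpha = []
        for j in range(n):
            # Ways to be in state j now: stayed in j, or advanced from j - 1.
            paths = [alpha[j] + math.log(stay_prob[j])]
            if j > 0:
                paths.append(alpha[j - 1] + math.log(1.0 - stay_prob[j - 1]))
            new_alpha.append(logsumexp(paths) + log_gaussian(x, means[j], variances[j]))
        alpha = new_alpha
    return logsumexp(alpha)

# Hypothetical toy models: a rising "crash" pulse versus a flat "quiet" signal.
crash_model = dict(stay_prob=[0.5, 0.5, 1.0], means=[0.0, 5.0, 10.0], variances=[1.0, 1.0, 1.0])
quiet_model = dict(stay_prob=[1.0], means=[0.0], variances=[1.0])
pulse = [0.0, 0.2, 5.1, 4.9, 10.2, 9.8]
is_crash = (left_right_log_likelihood(pulse, **crash_model)
            > left_right_log_likelihood(pulse, **quiet_model))
```

Comparing the log-likelihood under a crash model against a non-crash model is one simple decision rule; the abstract does not state which rule the authors use.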

Nucleic Acids Research | 1997

The Genome Sequence DataBase version 1.0 (GSDB): from low pass sequences to complete genomes

Carol Harger; M. P. Skupski; Ethan Allen; Christopher Clark; David Crowley; Emily Dickinson; David Easley; Ada Espinosa-Lujan; Andrew D. Farmer; Chris Fields; Leandrita Flores; Lynn Harris; Gifford Keen; Maurice Manning; Mia McLeod; John O'Neill; Maria Pumilia; Rhonda Reinert; David Rider; John Rohrlich; Yolanda Romero; Jolene Schwertfeger; Gustavo Seluja; Adam Siepel; Gautam B. Singh; Linda Smyth; D. Stamper; Judy Stein; Randy Suggs; Rajini Takkallapalli

The Genome Sequence DataBase (GSDB) has completed its conversion to an improved relational database. The new database, GSDB 1.0, is fully operational and publicly available. Data contributions, including both original sequence submissions and community annotation, are being accomplished through the use of a graphical client-server interface tool, the GSDB Annotator, and via GIO (GSDB Input/Output) files. Data retrieval services are being provided through a new Web Query Tool and direct SQL. All methods of data contribution and data retrieval fully support the new data types that have been incorporated into GSDB, including discontiguous sequences, multiple sequence alignments, and community annotation.

Archive | 2003

Statistical Modeling of DNA Sequences and Patterns

Gautam B. Singh

Of fundamental importance in bioinformatics are the mathematical models that describe biological sequences and serve as the basis for detecting patterns. Patterns at various levels of abstraction drive genomics and proteomics research. At a fine level of granularity, the patterns comprise splice sites, binding sites, and domains. These are subsequently used to define patterns at a higher level of abstraction, such as introns, exons, repetitive DNA, and locus-control regions.

SIGKDD Explorations | 2000

Discovering Matrix Attachment Regions (MARs) in genomic databases

Gautam B. Singh

Lately, there has been considerable interest in applying data mining techniques to scientific and data analysis problems in bioinformatics. Data mining research is being fueled by novel application areas that are helping the development of newer applied algorithms in the field of bioinformatics, an emerging discipline representing the integration of biological and information sciences. This is a shift in paradigm from the earlier and continuing data mining efforts in marketing research and support for business intelligence. The problem described in this paper is along a new dimension in DNA sequence analysis research and supplements the previously studied stochastic models for evolution and variability. The discovery of novel patterns from genetic databases as described is quite significant because biological patterns play an important role in a large variety of cellular processes and constitute the basis for gene therapy. Biological databases containing the genetic codes from a wide variety of organisms, including humans, have continued their exponential growth over the last decade. At the time of this writing, the GenBank database contains over 300 million sequences and over 2.5 billion characters of sequenced nucleotides. The focus of this paper is on developing a general data mining algorithm for discovering regions of locus control, i.e., those regions that are instrumental for activating genes. One type of such locus-control element is the MAR, or Matrix Association Region. Our limited knowledge about MARs has hampered their detection using classical pattern recognition techniques. Consequently, their detection is formulated by utilizing a statistical interestingness measure derived from a set of empirical features that are known to be associated with MARs.
This paper presents a systematic approach for finding associations between such empirical features in genomic sequences, and for utilizing this knowledge in detecting biologically interesting control signals, such as MARs. This computational MAR discovery tool is implemented as a web-based tool called MAR-Wiz and is available for public access. As our knowledge about the living system continues to evolve, and as the biological databases continue to grow, a pattern learning methodology similar to that described in this paper will be significant for the detection of regulatory signals embedded in genomic sequences.

Genomics | 1995

CLONEPLACER: a software tool for simulating contig formation for ordered shotgun sequencing

Gautam B. Singh; Stephen A. Krawetz

This communication describes a software tool that enables one to simulate large-scale regional mapping using an ordered shotgun sequencing approach. The analysis routines that are provided yield an estimate of the depth of coverage of the physical map, the largest contig formed, and the number of gaps remaining at any given juncture in the project. A detailed listing describing the span of each contig within the physical map is also presented. This provides an a priori means of estimating the resources that will be required to undertake any megabase mapping or sequencing project. CLONEPLACER provides the much needed guide to deriving the optimal strategy.
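The kind of estimate described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not CLONEPLACER itself: clones are assumed to have a fixed length and uniformly random start positions, two clones are assumed to join a contig only if they share at least one base, and all function names are hypothetical.

```python
import random

def merge_into_contigs(clones):
    """Merge overlapping half-open (start, end) clone intervals into contigs."""
    contigs = []
    for start, end in sorted(clones):
        if contigs and start < contigs[-1][1]:
            # Shares at least one base with the current contig: extend it.
            contigs[-1][1] = max(contigs[-1][1], end)
        else:
            contigs.append([start, end])
    return [tuple(c) for c in contigs]

def summarize(clones, region_length):
    """Report depth of coverage, largest contig, gap count, and contig spans."""
    contigs = merge_into_contigs(clones)
    covered = sum(end - start for start, end in contigs)
    gaps = 0
    position = 0
    for start, end in contigs:
        if start > position:
            gaps += 1  # uncovered stretch before this contig
        position = end
    if position < region_length:
        gaps += 1  # uncovered stretch at the end of the region
    return {
        "depth": sum(end - start for start, end in clones) / region_length,
        "fraction_covered": covered / region_length,
        "largest_contig": max((end - start for start, end in contigs), default=0),
        "gaps": gaps,
        "contigs": contigs,
    }

def simulate(region_length, clone_length, n_clones, seed=0):
    """Place n_clones fixed-length clones uniformly at random and summarize."""
    rng = random.Random(seed)
    clones = []
    for _ in range(n_clones):
        start = rng.randint(0, region_length - clone_length)
        clones.append((start, start + clone_length))
    return summarize(clones, region_length)
```

Calling `simulate` repeatedly with an increasing `n_clones` gives a rough picture of how many clones are needed before the gap count stops falling, which is the sort of a priori resource estimate the abstract refers to.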

Molecular Biotechnology | 2005

Databases, models, and algorithms for functional genomics: a bioinformatics perspective.

Gautam B. Singh; Harkirat Singh

A variety of patterns have been observed on the DNA and protein sequences that serve as control points for gene expression and cellular functions. Owing to the vital role of such patterns discovered on biological sequences, they are generally cataloged and maintained within internationally shared databases. Furthermore, the variability in a family of observed patterns is often represented using computational models in order to facilitate their search within an uncharacterized biological sequence. As the biological data is comprised of a mosaic of sequence-level motifs, it is significant to unravel the synergies of macromolecular coordination utilized in cell-specific differential synthesis of proteins. This article provides an overview of the various pattern representation methodologies and surveys the pattern databases available to molecular biologists. Our aim is to describe the principles behind the computational modeling and analysis techniques utilized in bioinformatics research, with the objective of providing the insight necessary to better understand and effectively utilize the available databases and analysis tools. We also provide a detailed review of DNA sequence-level patterns responsible for structural conformations within the Scaffold or Matrix Attachment Regions (S/MARs).

IEEE Engineering in Medicine and Biology Magazine | 2005

Functional proteomics with biolinguistic methods

Gautam B. Singh; Harkirat Singh

In this article, new algorithms for comparative proteomics using extensions of n-gram analysis are described. Results demonstrate that these algorithms are more sensitive than those currently available for both genomics and proteomics analysis, enabling a more accurate portrayal of similarity of gene function. The algorithms allow the comparison of protein sequences using biochemical properties that enable the protein molecules to fold and perform the necessary functions. The algorithms described are amenable to parallelization with effective domain database partitioning. This makes them an attractive alternative for searching protein databases by developing high-speed, functionally partitioned searches.
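As a rough illustration of n-gram comparison over biochemical properties, the sketch below maps residues to a reduced alphabet before counting n-grams and compares the resulting profiles with cosine similarity. The four-class grouping, the choice of cosine similarity, and all names are assumptions made for illustration; the abstract does not specify the article's actual extensions.

```python
import math
from collections import Counter

# Hypothetical coarse grouping of the 20 amino acids by biochemical property.
PROPERTY_CLASS = {}
for residues, label in [("AVLIMFWC", "h"),   # hydrophobic
                        ("STNQYGP", "p"),    # polar / small
                        ("KRH", "+"),        # positively charged
                        ("DE", "-")]:        # negatively charged
    for residue in residues:
        PROPERTY_CLASS[residue] = label

def ngram_profile(sequence, n=3, reduce=True):
    """Count n-grams, optionally after mapping residues to property classes."""
    symbols = [PROPERTY_CLASS.get(r, "x") if reduce else r for r in sequence.upper()]
    return Counter("".join(symbols[i:i + n]) for i in range(len(symbols) - n + 1))

def cosine_similarity(p, q):
    """Cosine of the angle between two n-gram count vectors."""
    dot = sum(count * q[gram] for gram, count in p.items())
    norm = (math.sqrt(sum(c * c for c in p.values()))
            * math.sqrt(sum(c * c for c in q.values())))
    return dot / norm if norm else 0.0

# Two toy peptides that differ only by substitutions within a property class.
a, b = "MKVLAAGIV", "LRIVAVGLI"
raw = cosine_similarity(ngram_profile(a, reduce=False), ngram_profile(b, reduce=False))
reduced = cosine_similarity(ngram_profile(a), ngram_profile(b))
```

On these toy peptides the raw residue trigrams share nothing, while the property-class trigrams coincide, which shows why a property-based alphabet can be more sensitive to functional similarity than exact residue matching.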

Automated Software Engineering | 1999

System for automated validation of embedded software in multiple operating configurations

Sridevi Lingamarla; Gautam B. Singh; John Limburg; Mary Watson; Gary Edwards; Scott Gobrogge

Embedded controllers in safety-critical applications rely on the highest software quality standards. Testing is performed to ensure that requirements and specifications are met for all the different environments in which the controllers operate. This paper describes the architecture of a system that uses a relational database for tracking tests, requirements, and configurations. The relational sub-schemas are integrated in a data warehouse and allow traceability from requirements to tests. A normalized representation of test cases enables the system to reason about the test topologies and is used for constructing clusters of similar tests. A representative test from each cluster can in turn provide a rapid estimation of the software's requirement coverage and quality.
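One way to picture clustering similar tests and picking a representative from each cluster is the sketch below. It assumes each test is reduced to a set of feature labels (requirements and configuration tags) and uses greedy leader clustering with Jaccard similarity; the paper's normalized representation and clustering method are not given in the abstract, so the technique, the threshold, and all names here are illustrative assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def cluster_tests(tests, threshold=0.5):
    """Greedy leader clustering.

    `tests` maps a test name to its set of feature labels. The first test
    placed in a cluster acts as that cluster's representative.
    Returns a list of (representative, members) pairs.
    """
    clusters = []
    for name, features in tests.items():
        for representative, members in clusters:
            if jaccard(features, tests[representative]) >= threshold:
                members.append(name)
                break
        else:
            # No existing cluster is similar enough: start a new one.
            clusters.append((name, [name]))
    return clusters

# Hypothetical test cases described by requirement and configuration tags.
example_tests = {
    "t1": {"R1", "R2", "cfgA"},
    "t2": {"R1", "R2", "cfgB"},
    "t3": {"R7", "cfgC"},
}
representatives = [rep for rep, _ in cluster_tests(example_tests)]
```

Running only the representatives gives a quick, approximate view of coverage; how well that approximates the full suite depends on the similarity threshold and on how the features are chosen.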
