Monzoorul Haque Mohammed
Tata Consultancy Services
Publications
Featured research published by Monzoorul Haque Mohammed.
Briefings in Bioinformatics | 2012
Sharmila S. Mande; Monzoorul Haque Mohammed; Tarini Shankar Ghosh
Characterizing the taxonomic diversity of microbial communities is one of the primary objectives of metagenomic studies. Taxonomic analysis of microbial communities, a process referred to as binning, is challenging for the following reasons. Primarily, query sequences originating from the genomes of most microbes in an environmental sample lack taxonomically related sequences in existing reference databases. This absence of a taxonomic context makes binning a very challenging task. Limitations of current sequencing platforms, with respect to short read lengths and sequencing errors/artifacts, are also key factors that determine the overall binning efficiency. Furthermore, the sheer volume of metagenomic datasets also demands highly efficient algorithms that can operate within reasonable requirements of compute power. This review discusses the premise, methodologies, advantages, limitations and challenges of various methods available for binning of metagenomic datasets obtained using the shotgun sequencing approach. Various parameters as well as strategies used for evaluating binning efficiency are then reviewed.
Gut Pathogens | 2011
Sourav Sen Gupta; Monzoorul Haque Mohammed; Tarini Shankar Ghosh; Suman Kanungo; G. B. Nair; Sharmila S. Mande
Background: Malnutrition, a major health problem, affects a significant proportion of preschool children in developing countries. The devastating consequences of malnutrition include diarrhoea, malabsorption, increased intestinal permeability and suboptimal immune response. Nutritional interventions and dietary solutions have not been effective for treatment of malnutrition to date. Metagenomic procedures allow one to access the complex cross-talk between the gut and its microbial flora and understand how a different community composition affects various states of human health. In this study, a metagenomic approach was employed for analysing the differences between gut microbial communities obtained from a malnourished and an apparently healthy child. Results: Our results indicate that the malnourished child gut has an abundance of enteric pathogens which are known to cause intestinal inflammation resulting in malabsorption of nutrients. We also identified a few functional sub-systems from these pathogens, which probably impact the overall metabolic capabilities of the malnourished child gut. Conclusion: The present study comprehensively characterizes the microbial community resident in the gut of a malnourished child. This study has attempted to extend the understanding of the basis of malnutrition beyond nutrition deprivation.
Bioinformatics | 2012
Monzoorul Haque Mohammed; Anirban Dutta; Tungadri Bose; Sudha Chadaram; Sharmila S. Mande
Summary: An unprecedented quantity of genome sequence data is currently being generated using next-generation sequencing platforms. This has necessitated the development of novel bioinformatics approaches and algorithms that not only facilitate a meaningful analysis of these data but also aid in efficient compression, storage, retrieval and transmission of huge volumes of the generated data. We present a novel compression algorithm (DELIMINATE) that can rapidly compress genomic sequence data in a loss-less fashion. Validation results indicate relatively higher compression efficiency of DELIMINATE when compared with popular general purpose compression algorithms, namely, gzip, bzip2 and lzma. Availability and implementation: Linux, Windows and Mac implementations (both 32 and 64-bit) of DELIMINATE are freely available for download at: http://metagenomics.atc.tcs.com/compression/DELIMINATE. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
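To make the baseline in the abstract above concrete, the following is a minimal sketch that measures how the three general-purpose compressors named there (gzip, bzip2, lzma) perform on nucleotide text. It does not implement DELIMINATE. The input is a synthetic, uniformly random sequence generated only for illustration; real genomes contain repeats and biased composition, so actual ratios will differ.

```python
import bz2
import gzip
import lzma
import random

# Synthetic input (an assumption for illustration, not real genomic data).
random.seed(0)
sequence = "".join(random.choice("ACGT") for _ in range(100_000)).encode("ascii")

# Compressed size in bytes for each general-purpose compressor.
sizes = {
    "gzip": len(gzip.compress(sequence)),
    "bzip2": len(bz2.compress(sequence)),
    "lzma": len(lzma.compress(sequence)),
}

for name, size in sizes.items():
    bits_per_base = 8 * size / len(sequence)
    print(f"{name}: {size} bytes, {bits_per_base:.2f} bits per base")
```

A four-letter alphabet needs at most 2 bits per base under a fixed-width code, so that figure is a useful yardstick when reading the printed values.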
Journal of Biosciences | 2012
Tungadri Bose; Monzoorul Haque Mohammed; Anirban Dutta; Sharmila S. Mande
Recent advances in DNA sequencing technologies have enabled the current generation of life science researchers to probe deeper into the genomic blueprint. The amount of data generated by these technologies has been increasing exponentially over the last decade. Storage, archival and dissemination of such huge data sets require efficient solutions, from both the hardware and the software perspective. The present paper describes BIND, an algorithm specialized for compressing nucleotide sequence data. By adopting a unique ‘block-length’ encoding for representing binary data (as a key step), BIND achieves significant compression gains as compared to the widely used general purpose compression algorithms (gzip, bzip2 and lzma). Moreover, in contrast to implementations of existing specialized genomic compression approaches, the implementation of BIND can handle non-ATGC and lowercase characters. This makes BIND a loss-less compression approach that is suitable for practical use. More importantly, validation results of BIND (with real-world data sets) indicate reasonable speeds of compression and decompression that can be achieved with minimal processor/memory usage. BIND is available for download at http://metagenomics.atc.tcs.com/compression/BIND. No license is required for academic or non-profit use.
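The abstract above does not spell out the ‘block-length’ encoding, so the following is only a rough sketch of one plausible reading: each base is mapped to two bits, the bits are split into two parallel streams, and each stream is stored as the lengths of its alternating runs. The 2-bit mapping and all function names are assumptions made for illustration, not the BIND implementation, and the sketch handles only uppercase A, C, G and T (unlike BIND, which the abstract says also handles non-ATGC and lowercase characters).

```python
from itertools import groupby

# Hypothetical 2-bit code; the mapping BIND actually uses is not given in the abstract.
CODE = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}
DECODE = {bits: base for base, bits in CODE.items()}


def to_bit_streams(seq):
    """Split a sequence into two parallel bit streams (first bit, second bit)."""
    high = [CODE[base][0] for base in seq]
    low = [CODE[base][1] for base in seq]
    return high, low


def block_lengths(bits):
    """Represent a bit stream as (first_bit, lengths of alternating runs)."""
    if not bits:
        return 0, []
    return bits[0], [len(list(run)) for _, run in groupby(bits)]


def from_block_lengths(first_bit, lengths):
    """Invert block_lengths: expand run lengths back into a bit stream."""
    bits, bit = [], first_bit
    for n in lengths:
        bits.extend([bit] * n)
        bit ^= 1  # runs alternate between 0 and 1
    return bits


def decode(high, low):
    """Recombine the two bit streams into the nucleotide sequence."""
    return "".join(DECODE[(h, l)] for h, l in zip(high, low))


seq = "AAAACCGGTTTT"
high, low = to_bit_streams(seq)
enc_high = block_lengths(high)  # (0, [6, 6])
enc_low = block_lengths(low)    # (0, [4, 2, 2, 4])
restored = decode(from_block_lengths(*enc_high), from_block_lengths(*enc_low))
```

The round trip is loss-less for the supported alphabet. How well run lengths compress depends on how long the runs are in real data, which this toy input does not represent.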
BMC Bioinformatics | 2011
Tarini Shankar Ghosh; Monzoorul Haque Mohammed; Hannah Rajasingh; Sudha Chadaram; Sharmila S. Mande
Background: One of the primary goals of comparative metagenomic projects is to study the differences in the microbial communities residing in diverse environments. Besides providing valuable insights into the inherent structure of the microbial populations, these studies have potential applications in several important areas of medical research like disease diagnostics, detection of pathogenic contamination and identification of hitherto unknown pathogens. Here we present a novel and rapid, alignment-free method called HabiSign, which utilizes patterns of tetra-nucleotide usage in microbial genomes to bring out the differences in the composition of both diverse and related microbial communities. Results: Validation results show that the metagenomic signatures obtained using the HabiSign method are able to accurately cluster metagenomes at biome, phenotypic and species levels, as compared to an average tetranucleotide frequency based approach and the recently published dinucleotide relative abundance based approach. More importantly, the method is able to identify subsets of sequences that are specific to a particular habitat. Apart from this, being alignment-free, the method can rapidly compare and group multiple metagenomic data sets in a short span of time. Conclusions: The proposed method is expected to have immense applicability in diverse areas of metagenomic research ranging from disease diagnostics and pathogen detection to bio-prospecting. A web-server for the HabiSign algorithm is available at http://metagenomics.atc.tcs.com/HabiSign/.
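As an illustration of the tetranucleotide usage patterns mentioned in the abstract above, the sketch below computes a plain 256-dimensional tetranucleotide frequency vector for a sequence and a simple distance between two such vectors. This corresponds to the generic frequency-based idea, not to the HabiSign signature itself, whose construction is not described here. The function names and the choice of Manhattan distance are assumptions.

```python
from collections import Counter
from itertools import product

# All 256 tetranucleotides over A, C, G, T in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freq(seq):
    """Return normalized tetranucleotide frequencies (overlapping windows).

    Windows containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[t] for t in TETRAMERS)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[t] / total for t in TETRAMERS]


def manhattan(u, v):
    """Sum of absolute differences between two frequency vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


vec_a = tetra_freq("ACGTACGTAC")
vec_b = tetra_freq("TTTTTTTTTT")
distance = manhattan(vec_a, vec_b)
```

Because no alignment is involved, the cost grows linearly with sequence length, which is what makes composition-based comparisons fast on large data sets.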
Bioinformation | 2011
Tarini Shankar Ghosh; Monzoorul Haque Mohammed; Dinakar Komanduri; Sharmila S. Mande
Given the absence of universal marker genes in the viral kingdom, researchers typically use BLAST (with stringent E-values) for taxonomic classification of viral metagenomic sequences. Since the majority of metagenomic sequences originate from hitherto unknown viral groups, using stringent E-values results in most sequences remaining unclassified. Furthermore, using less stringent E-values results in a high number of incorrect taxonomic assignments. The SOrt-ITEMS algorithm provides an approach to address the above issues. Based on alignment parameters, SOrt-ITEMS follows an elaborate work-flow for assigning reads originating from hitherto unknown archaeal/bacterial genomes. In SOrt-ITEMS, alignment parameter thresholds were generated by observing patterns of sequence divergence within and across various taxonomic groups belonging to bacterial and archaeal kingdoms. However, many taxonomic groups within the viral kingdom lack a typical Linnean-like taxonomic hierarchy. In this paper, we present ProViDE (Program for Viral Diversity Estimation), an algorithm that uses a customized set of alignment parameter thresholds, specifically suited for viral metagenomic sequences. These thresholds capture the pattern of sequence divergence and the non-uniform taxonomic hierarchy observed within/across various taxonomic groups of the viral kingdom. Validation results indicate that the percentage of ‘correct’ assignments by ProViDE is around 1.7 to 3 times higher than that by the widely used similarity based method MEGAN. The misclassification rate of ProViDE is around 3 to 19% (as compared to 5 to 42% by MEGAN), indicating significantly better assignment accuracy. ProViDE software and a supplementary file (containing supplementary figures and tables referred to in this article) are available for download from http://metagenomics.atc.tcs.com/binning/ProViDE/
Genomics | 2012
Tarini Shankar Ghosh; Purnachander Gajjalla; Monzoorul Haque Mohammed; Sharmila S. Mande
Recent advances in high throughput sequencing technologies and concurrent refinements in 16S rDNA isolation techniques have facilitated the rapid extraction and sequencing of 16S rDNA content of microbial communities. The taxonomic affiliation of these 16S rDNA fragments is subsequently obtained using either BLAST-based or word frequency based approaches. However, the classification accuracy of such methods is observed to be limited in typical metagenomic scenarios, wherein a majority of organisms are hitherto unknown. In this study, we present a 16S rDNA classification algorithm, called C16S, that uses genus-specific Hidden Markov Models for taxonomic classification of 16S rDNA sequences. Results obtained using C16S have been compared with the widely used RDP classifier. The accuracy of the C16S algorithm was observed to be consistently higher than that of the RDP classifier. In some scenarios, this increase in accuracy is as high as 34%. A web-server for the C16S algorithm is available at http://metagenomics.atc.tcs.com/C16S/.
Journal of Biosciences | 2011
Monzoorul Haque Mohammed; Sudha Chadaram; Dinakar Komanduri; Tarini Shankar Ghosh; Sharmila S. Mande
Physical partitioning techniques are routinely employed (during sample preparation stage) for segregating the prokaryotic and eukaryotic fractions of metagenomic samples. In spite of these efforts, several metagenomic studies focusing on bacterial and archaeal populations have reported the presence of contaminating eukaryotic sequences in metagenomic data sets. Contaminating sequences originate not only from genomes of micro-eukaryotic species but also from genomes of (higher) eukaryotic host cells. The latter scenario usually occurs in the case of host-associated metagenomes. Identification and removal of contaminating sequences is important, since these sequences not only impact estimates of microbial diversity but also affect the accuracy of several downstream analyses. Currently, the computational techniques used for identifying contaminating eukaryotic sequences, being alignment based, are slow, inefficient, and require huge computing resources. In this article, we present Eu-Detect, an alignment-free algorithm that can rapidly identify eukaryotic sequences contaminating metagenomic data sets. Validation results indicate that on a desktop with modest hardware specifications, the Eu-Detect algorithm is able to rapidly segregate DNA sequence fragments of prokaryotic and eukaryotic origin, with high sensitivity. A Web server for the Eu-Detect algorithm is available at http://metagenomics.atc.tcs.com/Eu-Detect/.
BMC Genomics | 2011
Monzoorul Haque Mohammed; Tarini Shankar Ghosh; Sudha Chadaram; Sharmila S. Mande
Background: Obtaining accurate estimates of microbial diversity using rDNA profiling is the first step in most metagenomics projects. Consequently, most metagenomic projects spend considerable amounts of time, money and manpower for experimentally cloning, amplifying and sequencing the rDNA content in a metagenomic sample. In the second step, the entire genomic content of the metagenome is extracted, sequenced and analyzed. Since DNA sequences obtained in this second step also contain rDNA fragments, rapid in silico identification of these rDNA fragments would drastically reduce the cost, time and effort of current metagenomic projects by entirely bypassing the experimental steps of primer based rDNA amplification, cloning and sequencing. In this study, we present an algorithm called i-rDNA that can facilitate the rapid detection of 16S rDNA fragments from amongst millions of sequences in metagenomic data sets with high detection sensitivity. Results: Performance evaluation with data sets/database variants simulating typical metagenomic scenarios indicates the significantly high detection sensitivity of i-rDNA. Moreover, i-rDNA can process a million sequences in less than an hour on a simple desktop with modest hardware specifications. Conclusions: In addition to the speed of execution, high sensitivity and low false positive rate, the utility of the algorithmic approach discussed in this paper is immense given that it would help in bypassing the entire experimental step of primer-based rDNA amplification, cloning and sequencing. Application of this algorithmic approach would thus drastically reduce the cost, time and human efforts invested in all metagenomic projects. Availability: A web-server for the i-rDNA algorithm is available at http://metagenomics.atc.tcs.com/i-rDNA/
Journal of Biosciences | 2015
Tungadri Bose; Anirban Dutta; Monzoorul Haque Mohammed; Hemang Gandhi; Sharmila S. Mande
Given the importance of RNA secondary structures in defining their biological role, it would be convenient for researchers seeking RNA data if both sequence and structural information pertaining to RNA molecules were made available together. Current nucleotide data repositories archive only RNA sequence data. Furthermore, storage formats that can frugally represent RNA sequence as well as structure data in a single file are currently unavailable. This article proposes a novel storage format, ‘FASTR’, for concomitant representation of RNA sequence and structure. The storage efficiency of the proposed FASTR format has been evaluated using RNA data from various microorganisms. Results indicate that the size of FASTR formatted files (containing both RNA sequence and structure information) is equivalent to that of FASTA-format files, which contain only RNA sequence information. RNA secondary structure is typically represented using a combination of a string of nucleotide characters along with the corresponding dot-bracket notation indicating structural attributes. ‘FASTR’, the novel storage format proposed in the present study, enables a frugal representation of both RNA sequence and structural information in the form of a single string. In spite of having a relatively smaller storage footprint, the resultant ‘fastr’ string(s) retain all sequence as well as secondary structural information that could be stored using a dot-bracket notation. An implementation of the ‘FASTR’ methodology is available for download at http://metagenomics.atc.tcs.com/compression/fastr.
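For readers unfamiliar with the conventional two-string representation that the abstract above contrasts FASTR with, here is a minimal sketch that reads a nucleotide string and its dot-bracket string and recovers the paired positions. It does not implement the FASTR encoding, which is not specified here. The example sequence and the function name are made up for illustration, and only the basic symbols `(`, `)` and `.` are supported.

```python
def parse_dot_bracket(structure):
    """Return sorted (i, j) index pairs for matching brackets in dot-bracket notation."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)  # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unsupported symbol {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# Hypothetical hairpin: a three-base-pair stem closing a four-base loop.
sequence = "GGGAAAUCCC"
structure = "(((....)))"

pairs = parse_dot_bracket(structure)
paired_bases = [(sequence[i], sequence[j]) for i, j in pairs]
```

Storing the two strings separately doubles the character count per molecule, which is the overhead a single-string format aims to avoid.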