Publication


Featured research published by Mark D'Souza.


BMC Bioinformatics | 2008

The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes

Folker Meyer; Daniel Paarmann; Mark D'Souza; Robert Olson; Elizabeth M. Glass; Michael Kubal; Tobias Paczian; Alexis Rodriguez; Rick Stevens; Andreas Wilke; Jared Wilkening; Robert Edwards

Background: Random community genomes (metagenomes) are now commonly used to study microbes in different environments. Over the past few years, the major challenge associated with metagenomics shifted from generating to analyzing sequences. High-throughput, low-cost next-generation sequencing has provided access to metagenomics to a wide range of researchers. Results: A high-throughput pipeline has been constructed to provide high-performance computing to all researchers interested in using metagenomics. The pipeline produces automated functional assignments of sequences in the metagenome by comparing both protein and nucleotide databases. Phylogenetic and functional summaries of the metagenomes are generated, and tools for comparative metagenomics are incorporated into the standard views. User access is controlled to ensure data privacy, but the collaborative environment underpinning the service provides a framework for sharing datasets between multiple users. In the metagenomics RAST, all users retain full control of their data, and everything is available for download in a variety of formats. Conclusion: The open-source metagenomics RAST service provides a new paradigm for the annotation and analysis of metagenomes. With built-in support for multiple data sources and a back end that houses abstract data types, the metagenomics RAST is stable, extensible, and freely available to all researchers. This service has removed one of the primary bottlenecks in metagenome sequence analysis – the availability of high-performance computing for annotating the data. http://metagenomics.nmpdr.org


Nature | 2003

Genome sequence of Bacillus cereus and comparative analysis with Bacillus anthracis

Natalia Ivanova; Alexei Sorokin; Iain Anderson; Nathalie Galleron; Benjamin Candelon; Vinayak Kapatral; Anamitra Bhattacharyya; Gary Reznik; Natalia Mikhailova; Alla Lapidus; Lien Chu; Michael Mazur; Eugene Goltsman; Niels Bent Larsen; Mark D'Souza; Theresa L. Walunas; Yuri Grechkin; Gordon D. Pusch; Robert Haselkorn; Michael Fonstein; S. Dusko Ehrlich; Ross Overbeek; Nikos C. Kyrpides

Bacillus cereus is an opportunistic pathogen causing food poisoning manifested by diarrhoeal or emetic syndromes. It is closely related to the animal and human pathogen Bacillus anthracis and the insect pathogen Bacillus thuringiensis, the former being used as a biological weapon and the latter as a pesticide. B. anthracis and B. thuringiensis are readily distinguished from B. cereus by the presence of plasmid-borne specific toxins (B. anthracis and B. thuringiensis) and capsule (B. anthracis). But phylogenetic studies based on the analysis of chromosomal genes bring controversial results, and it is unclear whether B. cereus, B. anthracis and B. thuringiensis are varieties of the same species or different species. Here we report the sequencing and analysis of the type strain B. cereus ATCC 14579. The complete genome sequence of B. cereus ATCC 14579 together with the gapped genome of B. anthracis A2012 enables us to perform comparative analysis, and hence to identify the genes that are conserved between B. cereus and B. anthracis, and the genes that are unique for each species. We use the former to clarify the phylogeny of the cereus group, and the latter to determine plasmid-independent species-specific markers.


Journal of Bacteriology | 2003

Experimental Determination and System Level Analysis of Essential Genes in Escherichia coli MG1655

Svetlana Gerdes; Michael D. Scholle; John W. Campbell; Gábor Balázsi; E. Ravasz; Matthew D. Daugherty; A. L. Somera; N. C. Kyrpides; I. Anderson; M. S. Gelfand; A. Bhattacharya; Vinayak Kapatral; Mark D'Souza; Mark V. Baev; Y. Grechkin; Faika Mseeh; Michael Fonstein; Ross Overbeek; Albert-László Barabási; Zoltán Oltvai; Andrei L. Osterman

Defining the gene products that play an essential role in an organism's functional repertoire is vital to understanding the system level organization of living cells. We used a genetic footprinting technique for a genome-wide assessment of genes required for robust aerobic growth of Escherichia coli in rich media. We identified 620 genes as essential and 3,126 genes as dispensable for growth under these conditions. Functional context analysis of these data allows individual functional assignments to be refined. Evolutionary context analysis demonstrates a significant tendency of essential E. coli genes to be preserved throughout the bacterial kingdom. Projection of these data over metabolic subsystems reveals topologic modules with essential and evolutionarily preserved enzymes with reduced capacity for error tolerance.


Proceedings of the National Academy of Sciences of the United States of America | 2002

The genome sequence of the facultative intracellular pathogen Brucella melitensis

Vito G. DelVecchio; Vinayak Kapatral; Rajendra Redkar; Guy Patra; Cesar V. Mujer; Tamara Los; Natalia Ivanova; Iain Anderson; Anamitra Bhattacharyya; Athanasios Lykidis; Gary Reznik; Lynn Jablonski; Niels Bent Larsen; Mark D'Souza; Axel Bernal; Mikhail Mazur; Eugene Goltsman; Eugene Selkov; Philip H. Elzer; Sue D. Hagius; David O'Callaghan; Jean-Jacques Letesson; Robert Haselkorn; Nikos C. Kyrpides; Ross Overbeek

Brucella melitensis is a facultative intracellular bacterial pathogen that causes abortion in goats and sheep and Malta fever in humans. The genome of B. melitensis strain 16M was sequenced and found to contain 3,294,935 bp distributed over two circular chromosomes of 2,117,144 bp and 1,177,787 bp encoding 3,197 ORFs. By using the bioinformatics suite ERGO, 2,487 (78%) ORFs were assigned functions. The origins of replication of the two chromosomes are similar to those of other α-proteobacteria. Housekeeping genes, including those involved in DNA replication, transcription, translation, core metabolism, and cell wall biosynthesis, are distributed on both chromosomes. Type I, II, and III secretion systems are absent, but genes encoding sec-dependent, sec-independent, and flagella-specific type III, type IV, and type V secretion systems as well as adhesins, invasins, and hemolysins were identified. Several features of the B. melitensis genome are similar to those of the symbiotic Sinorhizobium meliloti.


Journal of Bacteriology | 2002

Genome Sequence and Analysis of the Oral Bacterium Fusobacterium nucleatum Strain ATCC 25586

Vinayak Kapatral; Iain Anderson; Natalia Ivanova; Gary Reznik; Tamara Los; Athanasios Lykidis; Anamitra Bhattacharyya; Allen Bartman; Warren Gardner; Galina Grechkin; Lihua Zhu; Olga Vasieva; Lien Chu; Yakov Kogan; Oleg Chaga; Eugene Goltsman; Axel Bernal; Niels Bent Larsen; Mark D'Souza; Theresa L. Walunas; Gordon D. Pusch; Robert Haselkorn; Michael Fonstein; Nikos C. Kyrpides; Ross Overbeek

We present a complete DNA sequence and metabolic analysis of the dominant oral bacterium Fusobacterium nucleatum. Although not considered a major dental pathogen on its own, this anaerobe facilitates the aggregation and establishment of several other species including the dental pathogens Porphyromonas gingivalis and Bacteroides forsythus. The F. nucleatum strain ATCC 25586 genome was assembled from shotgun sequences and analyzed using the ERGO bioinformatics suite (http://www.integratedgenomics.com). The genome contains 2.17 Mb encoding 2,067 open reading frames, organized on a single circular chromosome with 27% GC content. Despite its taxonomic position among the gram-negative bacteria, several features of its core metabolism are similar to those of gram-positive Clostridium spp., Enterococcus spp., and Lactococcus spp. The genome analysis has revealed several key aspects of the pathways of organic acid, amino acid, carbohydrate, and lipid metabolism. Nine very-high-molecular-weight outer membrane proteins are predicted from the sequence, none of which has been reported in the literature. More than 137 transporters for the uptake of a variety of substrates such as peptides, sugars, metal ions, and cofactors have been identified. Biosynthetic pathways exist for only three amino acids: glutamate, aspartate, and asparagine. The remaining amino acids are imported as such or as di- or oligopeptides that are subsequently degraded in the cytoplasm. A principal source of energy appears to be the fermentation of glutamate to butyrate. Additionally, desulfuration of cysteine and methionine yields ammonia, H₂S, methyl mercaptan, and butyrate, which are capable of arresting fibroblast growth, thus preventing wound healing and aiding penetration of the gingival epithelium. The metabolic capabilities of F. nucleatum revealed by its genome are therefore consistent with its specialized niche in the mouth.
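The GC content quoted in the abstract above is the fraction of G and C bases among all called bases of a sequence. A minimal Python sketch of that calculation follows; the function name `gc_content` and the choice to ignore ambiguous bases such as N are assumptions made for illustration, not details taken from the paper or from the ERGO suite.

```python
def gc_content(sequence):
    """Return the fraction of G/C bases among the unambiguous bases of a DNA string.

    Assumption for this sketch: characters other than A, C, G, T (for example N)
    are skipped rather than counted in the denominator.
    """
    called = [base for base in sequence.upper() if base in "ACGT"]
    if not called:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for base in called if base in "GC")
    return gc / len(called)


if __name__ == "__main__":
    # Toy sequence, not real F. nucleatum data.
    toy = "AATTAATTGC"
    print(f"GC content: {gc_content(toy):.0%}")
```

Applied to a full chromosome read from a FASTA file, the same ratio would give a genome-wide figure comparable to the one reported in the abstract.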


Nucleic Acids Research | 2003

The ERGO™ genome analysis and discovery system

Ross Overbeek; Niels Bent Larsen; Theresa L. Walunas; Mark D'Souza; Gordon D. Pusch; Eugene Selkov; Konstantinos Liolios; Viktor Joukov; Denis Kaznadzey; Iain Anderson; Anamitra Bhattacharyya; Henry Burd; Warren Gardner; Paul Hanke; Vinayak Kapatral; Natalia Mikhailova; Olga Vasieva; Andrei L. Osterman; Veronika Vonstein; Michael Fonstein; Natalia V. Ivanova; Nikos C. Kyrpides

The ERGO (http://ergo.integratedgenomics.com/ERGO/) genome analysis and discovery suite is an integration of biological data from genomics, biochemistry, high-throughput expression profiling, genetics and peer-reviewed journals to achieve a comprehensive analysis of genes and genomes. Far beyond any conventional systems that facilitate functional assignments, ERGO combines pattern-based analysis with comparative genomics by visualizing genes within the context of regulation, expression profiling, phylogenetic clusters, fusion events, networked cellular pathways and chromosomal neighborhoods of other functionally related genes. The result of this multifaceted approach is to provide an extensively curated database of the largest available integration of genomes, with a vast collection of reconstructed cellular pathways spanning all domains of life. Although access to ERGO is provided only under subscription, it is already widely used by the academic community. The current version of the system integrates 500 genomes from all domains of life in various levels of completion, 403 of which are available for subscription.


Journal of Bacteriology | 2002

From Genetic Footprinting to Antimicrobial Drug Targets: Examples in Cofactor Biosynthetic Pathways

Svetlana Gerdes; Michael D. Scholle; Mark D'Souza; Axel Bernal; Mark V. Baev; Michael Farrell; Oleg V. Kurnasov; Matthew D. Daugherty; Faika Mseeh; Boris Polanuyer; John W. Campbell; Shubha Anantha; Konstantin Shatalin; Shamim A. K. Chowdhury; Michael Fonstein; Andrei L. Osterman

Novel drug targets are required in order to design new defenses against antibiotic-resistant pathogens. Comparative genomics provides new opportunities for finding optimal targets among previously unexplored cellular functions, based on an understanding of related biological processes in bacterial pathogens and their hosts. We describe an integrated approach to identification and prioritization of broad-spectrum drug targets. Our strategy is based on genetic footprinting in Escherichia coli followed by metabolic context analysis of essential gene orthologs in various species. Genes required for viability of E. coli in rich medium were identified on a whole-genome scale using the genetic footprinting technique. Potential target pathways were deduced from these data and compared with a panel of representative bacterial pathogens by using metabolic reconstructions from genomic data. Conserved and indispensable functions revealed by this analysis potentially represent broad-spectrum antibacterial targets. Further target prioritization involves comparison of the corresponding pathways and individual functions between pathogens and the human host. The most promising targets are validated by direct knockouts in model pathogens. The efficacy of this approach is illustrated using examples from metabolism of adenylate cofactors NAD(P), coenzyme A, and flavin adenine dinucleotide. Several drug targets within these pathways, including three distantly related adenylyltransferases (orthologs of the E. coli genes nadD, coaD, and ribF), are discussed in detail.


Proceedings of the National Academy of Sciences of the United States of America | 2002

Whole-genome comparative analysis of three phytopathogenic Xylella fastidiosa strains

Anamitra Bhattacharyya; Stephanie Stilwagen; Natalia Ivanova; Mark D'Souza; Axel Bernal; Athanasios Lykidis; Vinayak Kapatral; Iain Anderson; Niels Bent Larsen; Tamara Los; Gary Reznik; Eugene Selkov; Theresa L. Walunas; Helene Feil; William S. Feil; Alexander H. Purcell; Jean Louis Lassez; Trevor Hawkins; Robert Haselkorn; Ross Overbeek; Paul Predki; Nikos C. Kyrpides

Xylella fastidiosa (Xf) causes wilt disease in plants and is responsible for major economic and crop losses globally. Owing to the public importance of this phytopathogen we embarked on a comparative analysis of the complete genome of Xf pv citrus and the partial genomes of two recently sequenced strains of this species: Xf pv almond and Xf pv oleander, which cause leaf scorch in almond and oleander plants, respectively. We report a reanalysis of the previously sequenced Xf 9a5c (CVC, citrus) strain and the two “gapped” Xf genomes revealing ORFs encoding critical functions in pathogenicity and conjugative transfer. Second, a detailed whole-genome functional comparison was based on the three sequenced Xf strains, identifying the unique genes present in each strain, in addition to those shared between strains. Third, an “in silico” cellular reconstruction of these organisms was made, based on a comparison of their core functional subsystems that led to a characterization of their conjugative transfer machinery, identification of potential differences in their adhesion mechanisms, and highlighting of the absence of a classical quorum-sensing mechanism. This study demonstrates the effectiveness of comparative analysis strategies in the interpretation of genomes that are closely related.


Bioinformatics | 2000

PatSearch: a pattern matcher software that finds functional elements in nucleotide and protein sequences and assesses their statistical significance

Sabino Liuni; Mark D'Souza

Motivation: The identification of sequence patterns involved in gene regulation and expression is a major challenge in molecular biology. In this paper we describe a novel algorithm and the software for searching nucleotide and protein sequences for complex nucleotide patterns including potential secondary structure elements, also allowing for mismatches/mispairings below a user-fixed threshold, and assessing the statistical significance of their occurrence through a Markov chain simulation. Results: The application of the proposed algorithm allowed the identification of some functional elements, such as the Iron Responsive Element, the Histone stem-loop structure and the Selenocysteine Insertion Sequence, located in the mRNA untranslated regions of post-transcriptionally regulated genes with the assessment of sensitivity and selectivity of the searching method. Availability: A Web interface is available at: http://bigarea.area.ba.cnr.it:8000/EmbIT/Patsearch.html.
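The two ideas named in the abstract above, matching with a bounded number of mismatches and judging significance by Markov chain simulation, can be illustrated with a small Python sketch. This is not the PatSearch algorithm: it handles only fixed-length patterns with substitutions (no secondary structure elements), and every name in it (`count_hits`, `fit_markov`, `simulate`, `empirical_p_value`) is hypothetical. It assumes a first-order Markov model fitted to the input sequence itself.

```python
import random


def count_hits(seq, pattern, max_mismatches):
    """Count windows of seq that match pattern with at most max_mismatches substitutions."""
    m = len(pattern)
    hits = 0
    for i in range(len(seq) - m + 1):
        mismatches = sum(1 for a, b in zip(seq[i:i + m], pattern) if a != b)
        if mismatches <= max_mismatches:
            hits += 1
    return hits


def fit_markov(seq):
    """First-order model: map each symbol to the list of symbols observed right after it."""
    followers = {}
    for a, b in zip(seq, seq[1:]):
        followers.setdefault(a, []).append(b)
    return followers


def simulate(followers, start, length, rng):
    """Generate a random sequence of the given length from the fitted transitions."""
    alphabet = list(followers)
    out = [start]
    while len(out) < length:
        nxt = followers.get(out[-1])
        # A symbol never seen with a successor falls back to a uniform draw.
        out.append(rng.choice(nxt) if nxt else rng.choice(alphabet))
    return "".join(out)


def empirical_p_value(seq, pattern, max_mismatches, n_sim=200, seed=0):
    """Return (observed hits, fraction of simulated sequences with at least as many hits)."""
    if len(seq) < 2:
        raise ValueError("need at least two symbols to fit a first-order model")
    observed = count_hits(seq, pattern, max_mismatches)
    followers = fit_markov(seq)
    rng = random.Random(seed)
    at_least = 0
    for _ in range(n_sim):
        shuffled = simulate(followers, seq[0], len(seq), rng)
        if count_hits(shuffled, pattern, max_mismatches) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least + 1) / (n_sim + 1)


if __name__ == "__main__":
    toy = "ACGTACGAACGTTTGCA" * 3  # toy sequence, not real data
    hits, p = empirical_p_value(toy, "ACGT", max_mismatches=1)
    print(hits, p)
```

A small p-value here would suggest the pattern occurs more often than expected under the fitted background model; the number of simulations bounds how small the estimate can get.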


Nucleic Acids Research | 2006

PUMA2—grid-based high-throughput analysis of genomes and metabolic pathways

Natalia Maltsev; Elizabeth M. Glass; Dinanath Sulakhe; Alexis Rodriguez; Mustafa Syed; Tanuja Bompada; Yi Zhang; Mark D'Souza

The PUMA2 system (available at ) is an interactive, integrated bioinformatics environment for high-throughput genetic sequence analysis and metabolic reconstructions from sequence data. PUMA2 provides a framework for comparative and evolutionary analysis of genomic data and metabolic networks in the context of taxonomic and phenotypic information. Grid infrastructure is used to perform computationally intensive tasks. PUMA2 currently contains precomputed analysis of 213 prokaryotic, 22 eukaryotic, 650 mitochondrial and 1493 viral genomes and automated metabolic reconstructions for >200 organisms. Genomic data is annotated with information integrated from >20 sequence, structural and metabolic databases and ontologies. PUMA2 supports both automated and interactive expert-driven annotation of genomes, using a variety of publicly available bioinformatics tools. It also contains a suite of unique PUMA2 tools for automated assignment of gene function, evolutionary analysis of protein families and comparative analysis of metabolic pathways. PUMA2 allows users to submit batch sequence data for automated functional analysis and construction of metabolic models. The results of these analyses are made available to the users in the PUMA2 environment for further interactive sequence analysis and annotation.

Collaboration


Dive into Mark D'Souza's collaborations.

Top Co-Authors

Gary Reznik

Institut national de la recherche agronomique

Andreas Wilke

Argonne National Laboratory

Axel Bernal

University of Pennsylvania
