Publication


Featured research published by Eugene Kolker.


Nature Biotechnology | 2008

Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project

Chris F. Taylor; Dawn Field; Susanna-Assunta Sansone; Jan Aerts; Rolf Apweiler; Michael Ashburner; Catherine A. Ball; Pierre Alain Binz; Molly Bogue; Tim Booth; Alvis Brazma; Ryan R. Brinkman; Adam Clark; Eric W. Deutsch; Oliver Fiehn; Jennifer Fostel; Peter Ghazal; Frank Gibson; Tanya Gray; Graeme Grimes; John M. Hancock; Nigel Hardy; Henning Hermjakob; Randall K. Julian; Matthew Kane; Carsten Kettner; Christopher R. Kinsinger; Eugene Kolker; Martin Kuiper; Nicolas Le Novère

The Minimum Information for Biological and Biomedical Investigations (MIBBI) project aims to foster the coordinated development of minimum-information checklists and provide a resource for those exploring the range of extant checklists.


Science | 2009

Genome Project Standards in a New Era of Sequencing

Patrick Chain; Darren Grafham; Robert S. Fulton; Michael Fitzgerald; Jessica B. Hostetler; Donna M. Muzny; J. Ali; Bruce W. Birren; David Bruce; Christian Buhay; James R. Cole; Yan Ding; Shannon Dugan; Dawn Field; George M Garrity; Richard A. Gibbs; Tina Graves; Cliff Han; Scott H. Harrison; Sarah K. Highlander; Philip Hugenholtz; H. M. Khouri; Chinnappa D. Kodira; Eugene Kolker; Nikos C. Kyrpides; D. Lang; Alla Lapidus; S. A. Malfatti; Victor Markowitz; T. Metha

More detailed sequence standards that keep up with revolutionary sequencing technologies will aid the research community in evaluating data. For over a decade, genome sequences have adhered to only two standards that are relied on for purposes of sequence analysis by interested third parties (1, 2). However, ongoing developments in revolutionary sequencing technologies have resulted in a redefinition of traditional whole-genome sequencing that requires reevaluation of such standards. With commercially available 454 pyrosequencing (followed by Illumina, SOLiD, and now Helicos), there has been an explosion of genomes sequenced under the moniker “draft”; however, these can be very poor quality genomes (due to inherent errors in the sequencing technologies, and the inability of assembly programs to fully address these errors). Further, one can only infer that such draft genomes may be of poor quality by navigating through the databases to find the number and type of reads deposited in sequence trace repositories (and not all genomes have this available), or to identify the number of contigs or genome fragments deposited to the database. The difficulty in assessing the quality of such deposited genomes has created some havoc for genome analysis pipelines and has contributed to many wasted hours. Exponential leaps in raw sequencing capability and greatly reduced prices have further skewed the time- and cost-ratios of draft data generation versus the painstaking process of improving and finishing a genome. The result is an ever-widening gap between drafted and finished genomes that only promises to continue (see the figure, page 236); hence, there is an urgent need to distinguish good from poor data sets.


OMICS: A Journal of Integrative Biology | 2002

Experimental protein mixture for validating tandem mass spectral analysis

Andrew Keller; Samuel O. Purvine; Alexey I. Nesvizhskii; Sergey Stolyar; David R. Goodlett; Eugene Kolker

Several methods have been used to identify peptides that correspond to tandem mass spectra. In this work, we describe a data set of low energy tandem mass spectra generated from a control mixture of known protein components that can be used to evaluate the accuracy of these methods. As an example, these spectra were searched by the SEQUEST application against a human peptide sequence database. The numbers of resulting correct and incorrect peptide assignments were then determined. We show how the sensitivity and error rate are affected by the use of various filtering criteria based upon SEQUEST scores and the number of tryptic termini of assigned peptides.
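The evaluation described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the `Assignment` record, the `evaluate_filter` helper, and the demo scores are all hypothetical, and the thresholds are arbitrary. It assumes each peptide assignment from the control mixture carries a SEQUEST cross-correlation score, a count of tryptic termini, and a known correct/incorrect label.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    xcorr: float          # SEQUEST cross-correlation score (hypothetical values below)
    tryptic_termini: int  # number of tryptic termini of the assigned peptide: 0, 1, or 2
    correct: bool         # known from the control mixture of known proteins


def evaluate_filter(assignments, min_xcorr, min_termini):
    """Return (sensitivity, error_rate) for one filtering criterion.

    sensitivity = correct assignments that pass / all correct assignments
    error_rate  = incorrect assignments that pass / all assignments that pass
    """
    total_correct = sum(a.correct for a in assignments)
    passed = [
        a for a in assignments
        if a.xcorr >= min_xcorr and a.tryptic_termini >= min_termini
    ]
    true_pos = sum(a.correct for a in passed)
    false_pos = len(passed) - true_pos
    sensitivity = true_pos / total_correct if total_correct else 0.0
    error_rate = false_pos / len(passed) if passed else 0.0
    return sensitivity, error_rate


# Made-up assignments, for illustration only.
demo = [
    Assignment(3.2, 2, True),
    Assignment(2.8, 2, True),
    Assignment(2.1, 1, True),
    Assignment(1.9, 2, False),
    Assignment(2.6, 0, False),
    Assignment(1.2, 1, False),
]

loose = evaluate_filter(demo, min_xcorr=1.5, min_termini=0)
strict = evaluate_filter(demo, min_xcorr=2.5, min_termini=2)
print("loose filter:", loose)
print("strict filter:", strict)
```

Tightening the filter in this toy data lowers the error rate at the cost of sensitivity, which is the trade-off the abstract reports measuring.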


Standards in Genomic Sciences | 2010

Meeting Report: The Terabase Metagenomics Workshop and the Vision of an Earth Microbiome Project

Jack A. Gilbert; Folker Meyer; Dion Antonopoulos; Pavan Balaji; C. Titus Brown; Christopher T. Brown; Narayan Desai; Jonathan A. Eisen; Dirk Evers; Dawn Field; Wu Feng; Daniel H. Huson; Janet K. Jansson; Rob Knight; James Knight; Eugene Kolker; Kostas Konstantinidis; Joel E. Kostka; Nikos C. Kyrpides; Rachel Mackelprang; Alice C. McHardy; Christopher Quince; Jeroen Raes; Alexander Sczyrba; Ashley Shade; Rick Stevens

Between July 18th and 24th 2010, 26 leading microbial ecology, computation, bioinformatics and statistics researchers came together in Snowbird, Utah (USA) to discuss the challenge of how to best characterize the microbial world using next-generation sequencing technologies. The meeting was entitled “Terabase Metagenomics” and was sponsored by the Institute for Computing in Science (ICiS) summer 2010 workshop program. The aim of the workshop was to explore the fundamental questions relating to microbial ecology that could be addressed using advances in sequencing potential. Technological advances in next-generation sequencing platforms such as the Illumina HiSeq 2000 can generate in excess of 250 billion base pairs of genetic information in 8 days. Thus, the generation of a trillion base pairs of genetic information is becoming a routine matter. The main outcome from this meeting was the birth of a concept and practical approach to exploring microbial life on earth, the Earth Microbiome Project (EMP). Here we briefly describe the highlights of this meeting and provide an overview of the EMP concept and how it can be applied to exploration of the microbiome of each ecosystem on this planet.


Nucleic Acids Research | 2012

MOPED: Model Organism Protein Expression Database

Eugene Kolker; Roger Higdon; Winston A. Haynes; Dean Welch; William Broomall; Doron Lancet; Larissa Stanberry; Natali Kolker

Large numbers of mass spectrometry proteomics studies are being conducted to understand all types of biological processes. The size and complexity of proteomics data hinder efforts to easily share, integrate, query and compare the studies. The Model Organism Protein Expression Database (MOPED, http://moped.proteinspire.org) is a new and expanding proteomics resource that enables rapid browsing of protein expression information from publicly available studies on humans and model organisms. MOPED is designed to simplify the comparison and sharing of proteomics data for the greater research community. MOPED uniquely provides protein level expression data, meta-analysis capabilities and quantitative data from standardized analysis. Data can be queried for specific proteins, browsed based on organism, tissue, localization and condition and sorted by false discovery rate and expression. MOPED empowers users to visualize their own expression data and compare it with existing studies. Further, MOPED links to various protein and pathway databases, including GeneCards, Entrez, UniProt, KEGG and Reactome. The current version of MOPED contains over 43 000 proteins with at least one spectral match and more than 11 million high certainty spectra.


Journal of Bacteriology | 2003

Initial Proteome Analysis of Model Microorganism Haemophilus influenzae Strain Rd KW20

Eugene Kolker; Samuel O. Purvine; Michael Y. Galperin; Serg Stolyar; David R. Goodlett; Alexey I. Nesvizhskii; Andrew Keller; Tao Xie; Jimmy K. Eng; Eugene C. Yi; Leroy Hood; Alex F. Picone; Tim Cherny; Brian Tjaden; Andrew F. Siegel; Thomas J. Reilly; Kira S. Makarova; Bernhard O. Palsson; Arnold L. Smith

The proteome of Haemophilus influenzae strain Rd KW20 was analyzed by liquid chromatography (LC) coupled with ion trap tandem mass spectrometry (MS/MS). This approach does not require a gel electrophoresis step and provides a rapidly developed snapshot of the proteome. In order to gain insight into the central metabolism of H. influenzae, cells were grown microaerobically and anaerobically in a rich medium and soluble and membrane proteins of strain Rd KW20 were proteolyzed with trypsin and directly examined by LC-MS/MS. Several different experimental and computational approaches were utilized to optimize the proteome coverage and to ensure statistically valid protein identification. Approximately 25% of all predicted proteins (open reading frames) of H. influenzae strain Rd KW20 were identified with high confidence, as their component peptides were unambiguously assigned to tandem mass spectra. Approximately 80% of the predicted ribosomal proteins were identified with high confidence, compared to the 33% of the predicted ribosomal proteins detected by previous two-dimensional gel electrophoresis studies. The results obtained in this study are generally consistent with those obtained from computational genome analysis, two-dimensional gel electrophoresis, and whole-genome transposon mutagenesis studies. At least 15 genes originally annotated as conserved hypothetical were found to encode expressed proteins. Two more proteins, previously annotated as predicted coding regions, were detected with high confidence; these proteins also have close homologs in related bacteria. The direct proteomics approach to studying protein expression in vivo reported here is a powerful method that is applicable to proteome analysis of any (micro)organism.


Bioinformatics | 2007

A predictive model for identifying proteins by a single peptide match

Roger Higdon; Eugene Kolker

Motivation: Tandem mass spectrometry of trypsin digests, followed by database searching, is one of the most popular approaches in high-throughput proteomics studies. Peptides are considered identified if they pass certain scoring thresholds. To avoid false positive protein identification, at least two unique peptides identified within a single protein are generally recommended. Still, in a typical high-throughput experiment, hundreds of proteins are identified only by a single peptide. We introduce here a method for distinguishing between true and false identifications among single-hit proteins. The approach is based on randomized database searching and usage of logistic regression models with cross-validation. This approach is implemented to analyze three bacterial samples, enabling recovery of 68-98% of the correct single-hit proteins with an error rate of < 2%. This results in a 22-65% increase in the number of identified proteins. Identifying true single-hit proteins will lead to discovering many crucial regulators, biomarkers and other low abundance proteins. Supplementary information: Supplementary data are available at Bioinformatics online.
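The role of randomized database searching can be illustrated with a rough sketch. This is not the model from the paper, which fits logistic regression with cross-validation; it only shows the common decoy-based estimate, in which hits against a randomized database approximate the number of false hits against the real one. The function name `decoy_error_rate` and all scores are hypothetical, and the sketch assumes the randomized database is the same size as the real one.

```python
def decoy_error_rate(target_scores, decoy_scores, threshold):
    """Estimate the error rate among target hits scoring at or above `threshold`.

    Assumption: a search against a randomized (decoy) database of equal size
    produces about as many above-threshold hits as there are false hits in
    the real (target) search.
    """
    targets = sum(score >= threshold for score in target_scores)
    decoys = sum(score >= threshold for score in decoy_scores)
    if targets == 0:
        return 0.0
    # The estimate is capped at 1.0 because an error rate cannot exceed 100%.
    return min(1.0, decoys / targets)


# Made-up single-hit protein scores, for illustration only.
target_scores = [3.1, 2.9, 2.4, 2.2, 1.8, 1.5]
decoy_scores = [2.3, 1.7, 1.4, 1.1, 0.9, 0.8]

for threshold in (1.5, 2.0, 2.5):
    rate = decoy_error_rate(target_scores, decoy_scores, threshold)
    print(f"threshold {threshold}: estimated error rate {rate:.2f}")
```

A single score threshold is the simplest possible classifier; the abstract's logistic regression can be read as replacing that one threshold with a fitted combination of several features.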


OMICS: A Journal of Integrative Biology | 2009

Risk Assessment and Communication Tools for Genotype Associations with Multifactorial Phenotypes: The Concept of "Edge Effect" and Cultivating an Ethical Bridge between Omics Innovations and Society

Vural Ozdemir; Guilherme Suarez-Kurtz; Raphaëlle Stenne; Andrew A. Somogyi; Toshiyuki Someya; S. Oguz Kayaalp; Eugene Kolker

Applications of omics technologies in the postgenomics era swiftly expanded from rare monogenic disorders to multifactorial common complex diseases, pharmacogenomics, and personalized medicine. Already, there are signposts indicative of further omics technology investment in nutritional sciences (nutrigenomics), environmental health/ecology (ecogenomics), and agriculture (agrigenomics). Genotype-phenotype association studies are a centerpiece of translational research in omics science. Yet scientific and ethical standards and ways to assess and communicate risk information obtained from association studies have been neglected to date. This is a significant gap because association studies decisively influence which genetic loci become genetic tests in the clinic or products in the genetic test marketplace. A growing challenge concerns the interpretation of the large overlap typically observed in the distribution of quantitative traits in a genetic association study with a polygenic/multifactorial phenotype. To remedy the shortage of risk assessment and communication tools for association studies, this paper presents the concept of edge effect. That is, the shift in population edges of a multifactorial quantitative phenotype is a more sensitive measure (than population averages) to gauge the population level impact and, by extension, policy significance of an omics marker. Empirical application of the edge effect concept is illustrated using an original analysis of warfarin pharmacogenomics and the VKORC1 genetic variation in a Brazilian population sample. These edge effect analyses are examined in relation to regulatory guidance development for association studies. We explain that omics science transcends the conventional laboratory bench space and includes a highly heterogeneous cast of stakeholders in society who have a plurality of interests that are often in conflict. Hence, communication of risk information in diagnostic medicine also demands attention to processes involved in production of knowledge and human values embedded in scientific practice, for example, why, how, by whom, and to what ends association studies are conducted, and standards are developed (or not). To ensure sustainability of omics innovations and forecast their trajectory, we need interventions to bridge the gap between omics laboratory and society. Appreciation of scholarship in history of omics science is one remedy to responsibly learn from the past to ensure a sustainable future in omics fields, both emerging (nutrigenomics, ecogenomics), and those that are more established (pharmacogenomics). Another measure to build public trust and sustainability of omics fields could be legislative initiatives to create a multidisciplinary oversight body, at arm's length from conflicts of interest, to carry out independent, impartial, and transparent innovation analyses and prospective technology assessment.


PLOS Computational Biology | 2013

Differential Expression Analysis for Pathways

Winston Haynes; Roger Higdon; Larissa Stanberry; Dwayne Collins; Eugene Kolker

Life science technologies generate a deluge of data that hold the keys to unlocking the secrets of important biological functions and disease mechanisms. We present DEAP, Differential Expression Analysis for Pathways, which capitalizes on information about biological pathways to identify important regulatory patterns from differential expression data. DEAP makes significant improvements over existing approaches by including information about pathway structure and discovering the most differentially expressed portion of the pathway. On simulated data, DEAP significantly outperformed traditional methods: with high differential expression, DEAP increased power by two orders of magnitude; with very low differential expression, DEAP doubled the power. DEAP performance was illustrated on two different gene and protein expression studies. DEAP discovered fourteen important pathways related to chronic obstructive pulmonary disease and interferon treatment that existing approaches omitted. On the interferon study, DEAP guided focus towards a four protein path within the 26 protein Notch signalling pathway.
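The idea of finding the most differentially expressed portion of a pathway can be sketched as a search over paths in a directed acyclic graph. This is a deliberately simplified illustration, not the DEAP algorithm itself: it ignores edge types, assumes the pathway has no cycles, and scores a path by the absolute value of the summed per-node expression scores. The function `most_differential_path`, the node names, and the scores are all hypothetical.

```python
from functools import lru_cache


def most_differential_path(edges, scores):
    """Return (total, path) for the path with the largest absolute summed score.

    edges:  dict mapping each node to a list of downstream nodes (must be acyclic)
    scores: dict mapping each node to a differential expression score,
            for example a log ratio (positive = up, negative = down)
    """

    @lru_cache(maxsize=None)
    def best_from(node):
        # Track both the highest and the lowest path sum starting at `node`,
        # because a strongly negative path is also strongly differential.
        hi = (scores[node], (node,))
        lo = (scores[node], (node,))
        for nxt in edges.get(node, ()):
            n_hi, n_lo = best_from(nxt)
            cand_hi = (scores[node] + n_hi[0], (node,) + n_hi[1])
            cand_lo = (scores[node] + n_lo[0], (node,) + n_lo[1])
            if cand_hi[0] > hi[0]:
                hi = cand_hi
            if cand_lo[0] < lo[0]:
                lo = cand_lo
        return hi, lo

    best = (0.0, ())
    for node in scores:
        for total, path in best_from(node):
            if abs(total) > abs(best[0]):
                best = (total, path)
    return best


# A made-up four-node pathway, for illustration only.
edges = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
scores = {"A": 0.5, "B": 1.5, "C": -0.2, "D": 2.0}

total, path = most_differential_path(edges, scores)
print(total, path)
```

Scoring a sub-path rather than the whole pathway is what lets a small, strongly regulated portion stand out, in the spirit of the four-protein path within the 26-protein Notch pathway mentioned in the abstract.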


OMICS: A Journal of Integrative Biology | 2004

Standard Mixtures for Proteome Studies

Samuel O. Purvine; Alex F. Picone; Eugene Kolker

Mixtures of moderate complexity were formed from 23 peptides and 12 proteins digested with trypsin, all individually characterized. These mixtures were analyzed with replicates in full and windowed m/z ranges using online high-performance reverse phase liquid chromatography coupled via electrospray ionization to an ion trap mass spectrometer. The resulting spectra were searched using SEQUEST against databases of different sizes and contents and confidences of the observed identifications were evaluated by our earlier statistical model. These data were then combined with biologically derived spectral data, searched, and further evaluated. All peptides but one and all proteins were identified with high confidence. Additionally, the presence and behavior of quadruply charged peptides was analyzed. The properties of the proposed peptide and protein mixtures as well as the performance of the statistical model were carefully investigated. These mixtures mimic the complexity seen in large-scale proteomics experiments, and are proposed to serve as quality assessment standards for future proteome studies.

Collaboration


Dive into Eugene Kolker's collaborations.

Top Co-Authors

Roger Higdon, Seattle Children's Research Institute
Natali Kolker, Seattle Children's Research Institute
Elizabeth Stewart, Boston Children's Hospital
Vural Ozdemir, Amrita Vishwa Vidyapeetham
Winston Haynes, Seattle Children's Research Institute
Samuel O. Purvine, Pacific Northwest National Laboratory
William Broomall, Seattle Children's Research Institute