Publication


Featured research published by Kei-Hoi Cheung.


Nature | 1999

Large-scale analysis of the yeast genome by transposon tagging and gene disruption

Petra Ross-Macdonald; Paulo S. R. Coelho; Terry Roemer; Seema Agarwal; Anuj Kumar; Ronald Jansen; Kei-Hoi Cheung; Amy Sheehan; Dawn Symoniatis; Lara Umansky; Matthew Heidtman; F. Kenneth Nelson; Hiroshi Iwasaki; Karl Hager; Mark Gerstein; Perry L. Miller; G. Shirleen Roeder; Michael Snyder

Economical methods by which gene function may be analysed on a genomic scale are relatively scarce. To fill this need, we have developed a transposon-tagging strategy for the genome-wide analysis of disruption phenotypes, gene expression and protein localization, and have applied this method to the large-scale analysis of gene function in the budding yeast Saccharomyces cerevisiae. Here we present the largest collection of defined yeast mutants ever generated within a single genetic background—a collection of over 11,000 strains, each carrying a transposon inserted within a region of the genome expressed during vegetative growth and/or sporulation. These insertions affect nearly 2,000 annotated genes, representing about one-third of the 6,200 predicted genes in the yeast genome. We have used this collection to determine disruption phenotypes for nearly 8,000 strains using 20 different growth conditions; the resulting data sets were clustered to identify groups of functionally related genes. We have also identified over 300 previously non-annotated open reading frames and analysed by indirect immunofluorescence over 1,300 transposon-tagged proteins. In total, our study encompasses over 260,000 data points, constituting the largest functional analysis of the yeast genome ever undertaken.


Nature Biotechnology | 2010

The BioPAX community standard for pathway data sharing

Emek Demir; Michael P. Cary; Suzanne M. Paley; Ken Fukuda; Christian Lemer; Imre Vastrik; Guanming Wu; Peter D'Eustachio; Carl F. Schaefer; Joanne S. Luciano; Frank Schacherer; Irma Martínez-Flores; Zhenjun Hu; Verónica Jiménez-Jacinto; Geeta Joshi-Tope; Kumaran Kandasamy; Alejandra López-Fuentes; Huaiyu Mi; Elgar Pichler; Igor Rodchenkov; Andrea Splendiani; Sasha Tkachev; Jeremy Zucker; Gopal Gopinath; Harsha Rajasimha; Ranjani Ramakrishnan; Imran Shah; Mustafa Syed; Nadia Anwar; Özgün Babur

Biological Pathway Exchange (BioPAX) is a standard language to represent biological pathways at the molecular and cellular level and to facilitate the exchange of pathway data. The rapid growth of the volume of pathway data has spurred the development of databases and computational tools to aid interpretation; however, use of these data is hampered by the current fragmentation of pathway information across many databases with incompatible formats. BioPAX, which was created through a community process, solves this problem by making pathway data substantially easier to collect, index, interpret and share. BioPAX can represent metabolic and signaling pathways, molecular and genetic interactions and gene regulation networks. Using BioPAX, millions of interactions, organized into thousands of pathways, from many organisms are available from a growing number of databases. This large amount of pathway data in a computable form will support visualization, analysis and biological discovery.


Intelligent Systems in Molecular Biology | 2005

YeastHub: a semantic web use case for integrating data in the life sciences domain

Kei-Hoi Cheung; Kevin Y. Yip; Andrew Smith; Remko deKnikker; Andy Masiar; Mark Gerstein

MOTIVATION: As the semantic web technology is maturing and the need for life sciences data integration over the web is growing, it is important to explore how data integration needs can be addressed by the semantic web. The main problem that we face in data integration is a lack of widely-accepted standards for expressing the syntax and semantics of the data. We address this problem by exploring the use of semantic web technologies, including resource description framework (RDF), RDF site summary (RSS), relational-database-to-RDF mapping (D2RQ) and native RDF data repository, to represent, store and query both metadata and data across life sciences datasets. RESULTS: As many biological datasets are presently available in tabular format, we introduce an RDF structure into which they can be converted. Also, we develop a prototype web-based application called YeastHub that demonstrates how a life sciences data warehouse can be built using a native RDF data store (Sesame). This data warehouse allows integration of different types of yeast genome data provided by different resources in different formats including the tabular and RDF formats. Once the data are loaded into the data warehouse, RDF-based queries can be formulated to retrieve and query the data in an integrated fashion. AVAILABILITY: The YeastHub website is accessible via the following URL: http://yeasthub.gersteinlab.org.
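
As a rough illustration of the tabular-to-RDF conversion the abstract mentions, the sketch below turns each row of a tab-delimited table into N-Triples statements. It is not the YeastHub implementation: the `http://example.org/yeast/` namespace, the function names, the column names and the sample values are all assumptions made for this example.

```python
import csv
import io

# Hypothetical namespace; YeastHub's actual URIs are not given in the abstract.
BASE = "http://example.org/yeast/"


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def table_to_ntriples(tsv_text, id_column):
    """Convert tab-delimited text into a list of N-Triples lines.

    Each row becomes one subject (built from id_column); every other
    non-empty cell becomes a (subject, predicate, literal) statement.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    lines = []
    for row in reader:
        subject = f"<{BASE}gene/{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # skip the identifier itself and empty cells
            predicate = f"<{BASE}property/{column}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


# Illustrative sample table (values chosen for the example only).
SAMPLE = "orf\tname\tlocalization\nYAL001C\tTFC3\tnucleus\nYAL002W\tVPS8\t\n"
triples = table_to_ntriples(SAMPLE, "orf")
for line in triples:
    print(line)
```

Treating each column header as a predicate keeps the mapping generic, so tables from different sources can be loaded into one triple store and queried together, which is the integration idea the abstract describes.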


Nature Biotechnology | 2002

An integrated approach for finding overlooked genes in yeast

Anuj Kumar; Paul M. Harrison; Kei-Hoi Cheung; Ning Lan; Nathaniel Echols; Paul Bertone; Perry L. Miller; Mark Gerstein; Michael Snyder

We report here the discovery of 137 previously unappreciated genes in yeast through a widely applicable and highly scalable approach integrating methods of gene-trapping, microarray-based expression analysis, and genome-wide homology searching. Our approach is a multistep process in which expressed sequences are first trapped using a modified transposon that produces protein fusions to β-galactosidase (β-gal); non-annotated open reading frames (ORFs) translated as β-gal chimeras are selected as a candidate pool of potential genes. To verify expression of these sequences, labeled RNA is hybridized against a microarray of oligonucleotides designed to detect gene transcripts in a strand-specific manner. In complement to this experimental method, novel genes are also identified in silico by homology to previously annotated proteins. As these methods are capable of identifying both short ORFs and antisense ORFs, our approach provides an effective supplement to current gene-finding schemes. In total, the genes discovered using this approach constitute 2% of the yeast genome and represent a wealth of overlooked biology.


Briefings in Bioinformatics | 2008

Bringing Web 2.0 to bioinformatics

Zhang Zhang; Kei-Hoi Cheung; Jeffrey P. Townsend

Enabling deft data integration from numerous, voluminous and heterogeneous data sources is a major bioinformatic challenge. Several approaches have been proposed to address this challenge, including data warehousing and federated databasing. Yet despite the rise of these approaches, integration of data from multiple sources remains problematic and toilsome. These two approaches follow a user-to-computer communication model for data exchange, and do not facilitate a broader concept of data sharing or collaboration among users. In this report, we discuss the potential of Web 2.0 technologies to transcend this model and enhance bioinformatics research. We propose a Web 2.0-based Scientific Social Community (SSC) model for the implementation of these technologies. By establishing a social, collective and collaborative platform for data creation, sharing and integration, we promote a web services-based pipeline featuring web services for computer-to-computer data exchange as users add value. This pipeline aims to simplify data integration and creation, to realize automatic analysis, and to facilitate reuse and sharing of data. SSC can foster collaboration and harness collective intelligence to create and discover new knowledge. In addition to its research potential, we also describe its potential role as an e-learning platform in education. We discuss lessons from information technology, predict the next generation of Web (Web 3.0), and describe its potential impact on the future of bioinformatics studies.


Nucleic Acids Research | 2000

ALFRED: an allele frequency database for diverse populations and DNA polymorphisms

Kei-Hoi Cheung; Michael V. Osier; Judith R. Kidd; Andrew J. Pakstis; Perry L. Miller; Kenneth K. Kidd

We have developed a publicly accessible database (ALFRED, the ALlele FREquency Database) that catalogues allele frequency data for a wide range of population samples and DNA polymorphisms. This database is web-accessible through our laboratory (Kidd Lab) Web site: http://info.med.yale.edu/genetics/kkidd. ALFRED currently contains data on 60 populations and 156 genetic systems including single nucleotide polymorphisms (SNPs), short tandem repeat polymorphisms (STRPs), variable number of tandem repeats (VNTRs) and insertion-deletion polymorphisms. While data are not available for all population-DNA polymorphism combinations, over 2000 allele frequency tables have been entered. Our database is designed (i) to address our specific research requirements as well as broader scientific objectives; (ii) to allow researchers and interested educators to easily navigate and retrieve data of interest to them; and (iii) to integrate links to other related public databases such as dbSNP, GenBank and PubMed.
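
To make the idea of allele frequency tables indexed by population and polymorphism concrete, here is a minimal sketch of such a lookup structure. It does not reflect ALFRED's actual schema: the class name, the method names, and the population and site labels are hypothetical.

```python
import math


class FrequencyStore:
    """Toy store of allele frequency tables keyed by (population, site)."""

    def __init__(self):
        self._tables = {}

    def add(self, population, site, frequencies):
        """Register one frequency table; allele frequencies must sum to 1."""
        total = sum(frequencies.values())
        if not math.isclose(total, 1.0, abs_tol=1e-6):
            raise ValueError(f"frequencies sum to {total}, expected 1.0")
        self._tables[(population, site)] = dict(frequencies)

    def get(self, population, site):
        """Return the table, or None when that combination has no data."""
        return self._tables.get((population, site))


# Hypothetical population and site labels, with made-up frequencies.
store = FrequencyStore()
store.add("PopA", "SNP1", {"A": 0.25, "G": 0.75})
print(store.get("PopA", "SNP1"))
print(store.get("PopB", "SNP1"))  # no table entered for this combination
```

Returning `None` for a missing pair mirrors the abstract's point that data are not available for every population-polymorphism combination.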


Nucleic Acids Research | 2000

TRIPLES: a database of gene function in Saccharomyces cerevisiae

Anuj Kumar; Kei-Hoi Cheung; Petra Ross-Macdonald; Paulo S. R. Coelho; Perry L. Miller; Michael Snyder

Using a novel multipurpose mini-transposon, we have generated a collection of defined mutant alleles for the analysis of disruption phenotypes, protein localization, and gene expression in Saccharomyces cerevisiae. To catalog this unique data set, we have developed TRIPLES, a Web-accessible database of TRansposon-Insertion Phenotypes, Localization and Expression in Saccharomyces. Encompassing over 250 000 data points, TRIPLES provides convenient access to information from nearly 7800 transposon-mutagenized yeast strains; within TRIPLES, complete data reports of each strain may be viewed in table format, or if desired, downloaded as tab-delimited text files. Each report contains external links to corresponding entries within the Saccharomyces Genome Database and International Nucleic Acid Sequence Data Library (GenBank). Unlike other yeast databases, TRIPLES also provides on-line order forms linked to each clone report; users may immediately request any desired strain free-of-charge by submitting a completed form. In addition to presenting a wealth of information for over 2300 open reading frames, TRIPLES constitutes an important medium for the distribution of useful reagents throughout the yeast scientific community. Maintained by the Yale Genome Analysis Center, TRIPLES may be accessed at http://ycmi.med.yale.edu/ygac/triples.htm


Journal of Biomedical Informatics | 2008

Methodological Review: HCLS 2.0/3.0: Health care and life sciences data mashup using Web 2.0/3.0

Kei-Hoi Cheung; Kevin Y. Yip; Jeffrey P. Townsend; Matthew Scotch

We describe the potential of current Web 2.0 technologies to achieve data mashup in the health care and life sciences (HCLS) domains, and compare that potential to the nascent trend of performing semantic mashup. After providing an overview of Web 2.0, we demonstrate two scenarios of data mashup, facilitated by the following Web 2.0 tools and sites: Yahoo! Pipes, Dapper, Google Maps and GeoCommons. In the first scenario, we exploited Dapper and Yahoo! Pipes to implement a challenging data integration task in the context of DNA microarray research. In the second scenario, we exploited Yahoo! Pipes, Google Maps, and GeoCommons to create a geographic information system (GIS) interface that allows visualization and integration of diverse categories of public health data, including cancer incidence and pollution prevalence data. Based on these two scenarios, we discuss the strengths and weaknesses of these Web 2.0 mashup technologies. We then describe Semantic Web, the mainstream Web 3.0 technology that enables more powerful data integration over the Web. We discuss the areas of intersection of Web 2.0 and Semantic Web, and describe the potential benefits that can be brought to HCLS research by combining these two sets of technologies.


Nucleic Acids Research | 2003

ALFRED: the ALlele FREquency Database. Update

Haseena Rajeevan; Michael V. Osier; Kei-Hoi Cheung; H. Deng; L. Druskin; R. Heinzen; Judith R. Kidd; Sol Stein; Andrew J. Pakstis; Nick P. Tosches; C.-C. Yeh; Perry L. Miller; Kenneth K. Kidd

Elaboration of ALFRED (http://alfred.med.yale.edu) is continuing in two directions. One is developing tools for efficiently annotating the entries and checking the integrity of the data already in the database; the other is increasing the quantity and accessibility of data. The information contained in ALFRED, such as polymorphic sites, number of populations and frequency tables (one sample typed for one site), has significantly increased.


Methods | 2013

Review of software tools for design and analysis of large scale MRM proteomic datasets

Christopher M. Colangelo; Lisa Chung; Can Bruce; Kei-Hoi Cheung

Selective or Multiple Reaction Monitoring (SRM/MRM) is a liquid-chromatography (LC)/tandem-mass spectrometry (MS/MS) method that enables the quantitation of specific proteins in a sample by analyzing precursor ions and the fragment ions of their selected tryptic peptides. Instrumentation software has advanced to the point that thousands of transitions (pairs of primary and secondary m/z values) can be measured in a triple quadrupole instrument coupled to an LC, through well-designed scheduling and selection of m/z windows. The design of a good MRM assay relies on the availability of peptide spectra from previous discovery-phase LC-MS/MS studies. The tedious aspect of manually developing and processing MRM assays involving thousands of transitions has spurred the development of software tools to automate this process. Software packages have been developed for project management, assay development, assay validation, data export, peak integration, quality assessment, and biostatistical analysis. No single tool provides a complete end-to-end solution; this article therefore reviews the current state and discusses future directions of these software tools in order to enable researchers to combine these tools for a comprehensive targeted proteomics workflow.
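
The notion of a transition as a pair of precursor and fragment m/z values, measured only within a scheduled retention-time window, can be sketched as follows. This is an assumption-laden illustration, not the method of any tool covered by the review: the `Transition` fields, the `max_concurrent` helper, the peptide labels and all numeric values are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    peptide: str         # hypothetical peptide label
    precursor_mz: float  # primary (precursor ion) m/z
    fragment_mz: float   # secondary (fragment ion) m/z
    rt_start: float      # start of scheduled window, minutes
    rt_end: float        # end of scheduled window, minutes


def max_concurrent(transitions):
    """Return the largest number of transitions monitored at the same time.

    Windows are treated as half-open [rt_start, rt_end), so a window that
    ends at time t does not overlap one that starts at t.
    """
    events = []
    for t in transitions:
        events.append((t.rt_start, 1))
        events.append((t.rt_end, -1))
    events.sort()  # at equal times, -1 (window end) sorts before +1 (start)
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Made-up transitions; the m/z values and windows are illustrative only.
SAMPLE = [
    Transition("PEP_A", 500.3, 650.4, 10.0, 12.0),
    Transition("PEP_A", 500.3, 763.5, 10.0, 12.0),
    Transition("PEP_B", 612.8, 701.2, 11.0, 13.0),
    Transition("PEP_C", 433.7, 588.3, 12.0, 14.0),
]
print(max_concurrent(SAMPLE))
```

A check like this is one way to see whether a proposed schedule keeps the number of simultaneously monitored transitions within whatever limit an instrument method allows.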

Collaboration


Dive into Kei-Hoi Cheung's collaborations.

Top Co-Authors

Anuj Kumar

University of Michigan

Joanne S. Luciano

Rensselaer Polytechnic Institute

Matthias Samwald

Medical University of Vienna
