Yum Lina Yip
Swiss Institute of Bioinformatics
Publication
Featured research published by Yum Lina Yip.
Human Mutation | 2008
Yum Lina Yip; Maria Livia Famiglietti; Arnaud Gos; Paula D. Duek; Fabrice David; Alain Gateau; Amos Marc Bairoch
UniProtKB/Swiss‐Prot (http://beta.uniprot.org/uniprot; last accessed: 19 October 2007) is a manually curated knowledgebase providing information on protein sequences and functional annotation. It is part of the Universal Protein Resource (UniProt). The knowledgebase currently records a total of 32,282 single amino acid polymorphisms (SAPs) affecting 6,086 human proteins (Release 53.2, 26 June 2007). Nearly all SAPs are derived from literature reports using strict inclusion criteria. For each SAP, the knowledgebase provides, apart from the position of the mutation and the resulting change in amino acid, information on the effects of SAPs on protein structure and function, as well as their potential involvement in diseases. Presently, there are 16,043 disease‐related SAPs, 14,266 polymorphisms, and 1,973 unclassified variants recorded in UniProtKB/Swiss‐Prot. Relevant information on SAPs can be found in various sections of a UniProtKB/Swiss‐Prot entry. In addition, cross‐references to human disease databases and to other gene‐specific databases are being added regularly. In 2003, the Swiss‐Prot variant pages were created to provide a concise view of the information related to the SAPs recorded in the knowledgebase. Compared with the information on missense variants listed in other mutation databases, UniProtKB/Swiss‐Prot additionally records information on direct protein sequencing and characterization, including posttranslational modifications (PTMs). The direct links to Online Mendelian Inheritance in Man (OMIM) database entries further enhance the integration of phenotype information with data at the protein level. In this regard, SAP information in UniProtKB/Swiss‐Prot complements the information in genomic and phenotypic databases, and is valuable for the understanding of SAPs and diseases. Hum Mutat 29(3), 361–366, 2008.
Bioinformatics | 2010
Anaïs Mottaz; Fabrice Pierre André David; Anne-Lise Veuthey; Yum Lina Yip
Summary: The SwissVar portal provides access to a comprehensive collection of single amino acid polymorphisms and diseases in the UniProtKB/Swiss-Prot database via a unique search engine. In particular, it gives direct access to the newly improved Swiss-Prot variant pages. The key strength of this portal is that it makes it possible to query for similar diseases, as well as the underlying protein products and the molecular details of each variant. In the context of the recently proposed molecular view on diseases, the SwissVar portal should be in a unique position to provide valuable information for researchers and to advance research in this area.
Availability: The SwissVar portal is available at www.expasy.org/swissvar
Supplementary information: Supplementary data are available at Bioinformatics online.
BMC Bioinformatics | 2008
Anaïs Mottaz; Yum Lina Yip; Patrick Ruch; Anne-Lise Veuthey
Background: Although the UniProt KnowledgeBase is not a medical-oriented database, it contains information on more than 2,000 human proteins involved in pathologies. However, these annotations are not standardized, which impairs the interoperability between biological and clinical resources. In order to make these data easily accessible to clinical researchers, we have developed a procedure to link diseases described in the UniProtKB/Swiss-Prot entries to the MeSH disease terminology.
Results: We mapped disease names extracted either from the UniProtKB/Swiss-Prot entry comment lines or from the corresponding OMIM entry to MeSH. Different methods were assessed on a benchmark set of 200 disease names manually mapped to MeSH terms. The performance of the retained procedure in terms of precision and recall was 86% and 64%, respectively. Using the same procedure, more than 3,000 disease names in Swiss-Prot were mapped to MeSH with comparable efficiency.
Conclusions: This study is a first attempt to link proteins in UniProtKB to medical resources. The indexing we provided will help clinicians and researchers navigate from diseases to genes and from genes to diseases in an efficient way. The mapping is available at: http://research.isb-sib.ch/unimed.
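As an illustration of how precision and recall could be computed for such a mapping against a manually curated benchmark, here is a minimal Python sketch. The function name, the toy disease names and the term labels are all hypothetical, and the definitions used (precision over proposed mappings, recall over all benchmark names) are a common convention that the paper may or may not follow exactly.

```python
def precision_recall(predicted, gold):
    """Score an automatic mapping against a manual benchmark.

    predicted: dict of disease name -> proposed term (None when no term is proposed)
    gold:      dict of disease name -> manually assigned term
    Both structures are assumptions made for this sketch.
    """
    # Only names for which the procedure actually proposed a term count as "proposed".
    proposed = {name: term for name, term in predicted.items() if term is not None}
    correct = sum(1 for name, term in proposed.items() if gold.get(name) == term)
    precision = correct / len(proposed) if proposed else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy benchmark with placeholder labels (not real Swiss-Prot or MeSH data).
gold = {"disease 1": "term_A", "disease 2": "term_B",
        "disease 3": "term_C", "disease 4": "term_D"}
predicted = {"disease 1": "term_A",   # correct
             "disease 2": "term_X",   # wrong term proposed
             "disease 3": "term_C",   # correct
             "disease 4": None}       # nothing proposed

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the three proposed mappings are correct and two of the four benchmark names are recovered, which shows how a procedure can be precise while still missing a share of the benchmark.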
Journal of Bioinformatics and Computational Biology | 2007
Yum Lina Yip; Nathalie Lachenal; Violaine Pillet; Anne-Lise Veuthey
The UniProt/Swiss-Prot Knowledgebase records about 30,500 variants in 5,664 proteins (Release 52.2). Most of these variants are manually curated single amino acid polymorphisms (SAPs) with references to the literature. In order to keep the list of published documents related to SAPs up to date, an automatic information retrieval method was developed to recover texts mentioning SAPs. The method is based on the use of regular expressions (patterns) and rules for the detection and validation of mutations. When evaluated using a corpus of 9,820 PubMed references, the precision of the retrieval was determined to be 89.5% over all variants. It was also found that handling nonstandard mutation nomenclature and applying sequence positional correction are necessary to retrieve a significant number of relevant articles. The method was applied to the 5,664 proteins with variants. This was performed by first submitting a PubMed query to retrieve articles using gene or protein names and a list of mutation-related keywords; the SAP detection procedure was then used to recover relevant documents. The method was found to be efficient in retrieving new references on known polymorphisms. New references on known SAPs will be rendered accessible to the public via the Swiss-Prot variant pages.
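The general idea of pattern-based detection followed by sequence-based validation can be sketched in Python as below. This is only an illustration under stated assumptions: the two regular expressions, the function names `find_mutations` and `validate`, the offset-based positional correction and the toy sequence are hypothetical, and the patterns and rules actually used in the paper are not given in the abstract.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
_AA3 = "|".join(THREE_TO_ONE)
_AA1 = "ACDEFGHIKLMNPQRSTVWY"

# Illustrative patterns: "Arg123Cys", "p.Arg123Cys", "Arg-123 to Cys", "Arg123 -> Cys".
PATTERN_3 = re.compile(
    rf"\b(?:p\.)?({_AA3})-?(\d+)(?:\s*(?:->|to)\s*|-)?({_AA3})\b")
# One-letter form such as "R123C"; prone to false positives, hence the validation step.
PATTERN_1 = re.compile(rf"\b([{_AA1}])(\d+)([{_AA1}])\b")


def find_mutations(text):
    """Return (wild_type, position, mutant) tuples in order of appearance."""
    hits = []
    for m in PATTERN_3.finditer(text):
        hits.append((m.start(), (THREE_TO_ONE[m.group(1)], int(m.group(2)),
                                 THREE_TO_ONE[m.group(3)])))
    for m in PATTERN_1.finditer(text):
        hits.append((m.start(), (m.group(1), int(m.group(2)), m.group(3))))
    return [mutation for _, mutation in sorted(hits)]


def validate(mutation, sequence, offsets=(0,)):
    """Return the first offset at which the wild-type residue matches the
    sequence (a simple stand-in for positional correction), or None."""
    wild_type, position, _ = mutation
    for offset in offsets:
        index = position + offset
        if 1 <= index <= len(sequence) and sequence[index - 1] == wild_type:
            return offset
    return None


sequence = "MKRLAC"  # hypothetical toy protein sequence
text = "The R3C and Leu4Pro variants were studied; T4L was used as control."
for mutation in find_mutations(text):
    print(mutation, "valid" if validate(mutation, sequence) is not None else "rejected")
```

Here the candidate "T4L" is detected by the one-letter pattern but rejected because residue 4 of the toy sequence is not a threonine, which is the kind of false positive a validation rule is meant to filter out.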
BMC Bioinformatics | 2008
Fabrice David; Yum Lina Yip
Background: Sequences and structures provide valuable complementary information on protein features and functions. However, it is not always straightforward for users to gather information concurrently from the sequence and structure levels. The UniProt knowledgebase (UniProtKB) strives to help users in this undertaking by providing complete cross-references to the Protein Data Bank (PDB) as well as coherent feature annotation using available structural information. In this study, SSMap, a new UniProt-PDB residue-residue level mapping, was generated. The primary objective of this mapping is not only to facilitate the two tasks mentioned above, but also to mitigate a number of shortcomings of existing mappings. SSMap is the first isoform sequence-specific mapping resource and is up-to-date for UniProtKB annotation tasks. The method employed by SSMap differs from the other mapping resources in that it emphasizes the correct reconstruction of the PDB sequence from structures, and the correct attribution of a UniProtKB entry to each PDB chain by using a series of post-processing steps.
Results: SSMap was compared to other existing mapping resources in terms of the correctness of the attribution of PDB chains to UniProtKB entries, and of the quality of the pairwise alignments supporting the residue-residue mapping. It was found that SSMap shared about 80% of the mappings with other mapping sources. New and alternative mappings proposed by SSMap were mostly good as assessed by manual verification of data subsets. As for local pairwise alignments, it was shown that major discrepancies (both in terms of alignment lengths and boundaries), when present, were often due to differences in methodologies used for the mappings.
Conclusion: SSMap provides an independent, good quality UniProt-PDB mapping. The systematic comparison conducted in this study allows the further identification of general problems in UniProt-PDB mappings so that both the coverage and the quality of the mappings can be systematically improved for the benefit of the scientific community. SSMap mapping is currently used to provide PDB cross-references in UniProtKB.
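To make the notion of a residue-residue mapping concrete, the following Python sketch derives a position-to-position map between a PDB chain sequence and a UniProtKB sequence. It is not the SSMap method: it uses the matching blocks of the standard library's `difflib.SequenceMatcher` as a simple stand-in for a proper pairwise alignment, and the function name and both sequences are hypothetical.

```python
from difflib import SequenceMatcher


def map_residues(pdb_seq, uniprot_seq):
    """Map 1-based positions in pdb_seq to 1-based positions in uniprot_seq.

    Only residues inside identical matching blocks are mapped; mismatched
    residues (for example engineered mutations) are left unmapped.
    """
    matcher = SequenceMatcher(None, pdb_seq, uniprot_seq, autojunk=False)
    mapping = {}
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[block.a + k + 1] = block.b + k + 1
    return mapping


uniprot_seq = "MKTAYIAKQRQISFVK"   # hypothetical UniProtKB sequence
pdb_fragment = "TAYIAKQR"          # hypothetical chain covering residues 3 to 10
pdb_mutant = "TAYIGKQR"            # same chain with one substituted residue

print(map_residues(pdb_fragment, uniprot_seq))
print(map_residues(pdb_mutant, uniprot_seq))
```

In the second call the substituted residue at chain position 5 has no counterpart in the map, while the residues on either side keep their UniProtKB numbering. A production mapping would also need to handle isoforms, gaps and chain attribution, which this sketch does not attempt.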
Journal of Integrative Bioinformatics | 2007
Anaïs Mottaz; Yum Lina Yip; Patrick Ruch; Anne-Lise Veuthey
In order to improve the accessibility of genomic and proteomic information to medical researchers, we have developed a procedure to link biological information on proteins involved in diseases to the MeSH and ICD-10 disease terminologies. For this purpose, we took advantage of the manually curated disease annotations in more than 2,000 human protein entries of the UniProt KnowledgeBase. We mapped disease names extracted from the entry comment lines or from the corresponding OMIM entry to MeSH. The method was assessed on a benchmark set of 200 manually mapped disease comment lines. We obtained a recall of 54% at 91% precision. The same procedure was used to map the more than 3,000 diseases in Swiss-Prot to MeSH with comparable efficiency. Tested on ICD-10, the coverage of the mapped terms was lower, which could be explained by the coarse-grained structure of this terminology for hereditary disease description. The mapping is provided as supplementary material at http://research.isb-sib.ch/unimed.
Human Mutation | 2004
Yum Lina Yip; Holger Scheib; Alexander Diemand; Alexandre Gattiker; L Famiglietti; Elisabeth Gasteiger; Amos Marc Bairoch
Computers in Biology and Medicine | 2006
Anand Kumar; Yum Lina Yip; Barry Smith; Pierre Grenon
Human Mutation | 2006
Yum Lina Yip; Vincent Zoete; Holger Scheib; Olivier Michielin
Medical Informatics Europe | 2005
Anand Kumar; Yum Lina Yip; Barry Smith; Dirk Marwede; Daniel D. Novotny