Publication


Featured research published by Lorraine K. Tanabe.


Nature Genetics | 2000

A gene expression database for the molecular pharmacology of cancer.

Uwe Scherf; Douglas T. Ross; Mark Waltham; Lawrence H. Smith; Jae K. Lee; Lorraine K. Tanabe; Kurt W. Kohn; William C. Reinhold; Timothy G. Myers; Darren T. Andrews; Dominic A. Scudiero; Michael B. Eisen; Edward A. Sausville; Yves Pommier; David Botstein; Patrick O. Brown; John N. Weinstein

We used cDNA microarrays to assess gene expression profiles in 60 human cancer cell lines used in a drug discovery screen by the National Cancer Institute. Using these data, we linked bioinformatics and chemoinformatics by correlating gene expression and drug activity patterns in the NCI60 lines. Clustering the cell lines on the basis of gene expression yielded relationships very different from those obtained by clustering the cell lines on the basis of their response to drugs. Gene-drug relationships for the clinical agents 5-fluorouracil and L-asparaginase exemplify how variations in the transcript levels of particular genes relate to mechanisms of drug sensitivity and resistance. This is the first study to integrate large databases on gene expression and molecular pharmacology.
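
The correlation step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which correlation measure or data layout was used, so the Pearson coefficient, the function names, and the toy values below are all assumptions made for illustration, not the authors' method.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


def gene_drug_correlations(expression, activity):
    """Correlate every gene profile with every drug profile.

    Both arguments map a name to a list of values, one value per cell line,
    with the cell lines in the same order in every list (an assumption).
    """
    return {
        (gene, drug): pearson(gene_profile, drug_profile)
        for gene, gene_profile in expression.items()
        for drug, drug_profile in activity.items()
    }


# Made-up values for four hypothetical cell lines, not data from the study.
expression = {"GENE_A": [1.0, 2.0, 3.0, 4.0], "GENE_B": [4.0, 3.0, 2.0, 1.0]}
activity = {"DRUG_X": [2.0, 4.0, 6.0, 8.0]}
table = gene_drug_correlations(expression, activity)
```

In this toy input, GENE_A rises with DRUG_X activity across the cell lines and GENE_B falls, so the two pairs sit at opposite ends of the correlation range.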


Pacific Symposium on Biocomputing | 1999

EDGAR: Extraction of Drugs, Genes And Relations from the Biomedical Literature

Thomas C. Rindflesch; Lorraine K. Tanabe; John N. Weinstein; Lawrence Hunter

EDGAR (Extraction of Drugs, Genes and Relations) is a natural language processing system that extracts information about drugs and genes relevant to cancer from the biomedical literature. This automatically extracted information has remarkable potential to facilitate computational analysis in the molecular biology of cancer, and the technology is straightforwardly generalizable to many areas of biomedicine. This paper reports on the mechanisms for automatically generating such assertions and on a simple application, conceptual clustering of documents. The system uses a stochastic part of speech tagger, generates an underspecified syntactic parse and then uses semantic and pragmatic information to construct its assertions. The system builds on two important existing resources: the MEDLINE database of biomedical citations and abstracts and the Unified Medical Language System, which provides syntactic and semantic information about the terms found in biomedical abstracts.


Bioinformatics | 2002

Tagging gene and protein names in biomedical text

Lorraine K. Tanabe; W. John Wilbur

MOTIVATION: The MEDLINE database of biomedical abstracts contains scientific knowledge about thousands of interacting genes and proteins. Automated text processing can aid in the comprehension and synthesis of this valuable information. The fundamental task of identifying gene and protein names is a necessary first step towards making full use of the information encoded in biomedical text. This remains a challenging task due to the irregularities and ambiguities in gene and protein nomenclature. We propose to approach the detection of gene and protein names in scientific abstracts as part-of-speech tagging, the most basic form of linguistic corpus annotation. RESULTS: We present a method for tagging gene and protein names in biomedical text using a combination of statistical and knowledge-based strategies. This method incorporates automatically generated rules from a transformation-based part-of-speech tagger, and manually generated rules from morphological clues, low frequency trigrams, indicator terms, suffixes and part-of-speech information. Results of an experiment on a test corpus of 56K MEDLINE documents demonstrate that our method to extract gene and protein names can be applied to large sets of MEDLINE abstracts, without the need for special conditions or human experts to predetermine relevant subsets. AVAILABILITY: The programs are available on request from the authors.


Genome Biology | 2008

Overview of BioCreative II gene mention recognition

Larry Smith; Lorraine K. Tanabe; Rie Johnson nee Ando; Cheng-Ju Kuo; I-Fang Chung; Chun-Nan Hsu; Yu-Shi Lin; Roman Klinger; Christoph M. Friedrich; Kuzman Ganchev; Manabu Torii; Hongfang Liu; Barry Haddow; Craig A. Struble; Richard J. Povinelli; Andreas Vlachos; William A. Baumgartner; Lawrence Hunter; Bob Carpenter; Richard Tzong-Han Tsai; Hong-Jie Dai; Feng Liu; Yifei Chen; Chengjie Sun; Sophia Katrenko; Pieter W. Adriaans; Christian Blaschke; Rafael Torres; Mariana Neves; Preslav Nakov

Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task participants designed systems to identify substrings in sentences corresponding to gene name mentions. A variety of different methods were used and the results varied with a highest achieved F1 score of 0.8721. Here we present brief descriptions of all the methods used and a statistical analysis of the results. We also demonstrate that, by combining the results from all submissions, an F score of 0.9066 is feasible, and furthermore that the best result makes use of the lowest scoring submissions.
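
For readers unfamiliar with the F1 score quoted above, here is a minimal sketch of how it can be computed for span predictions. It assumes exact-match scoring of (sentence id, start, end) triples, which is a simplification chosen for illustration and not necessarily the workshop's official scoring procedure; the function name and sample spans are hypothetical.

```python
def span_f1(gold, predicted):
    """F1 score for predicted spans against gold spans, using exact matches.

    Each span is a hashable tuple such as (sentence_id, start, end).
    """
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)    # spans found and correct
    false_pos = len(predicted - gold)   # spans predicted but not in gold
    false_neg = len(gold - predicted)   # gold spans that were missed
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical spans: two of three predictions match two of four gold spans.
gold_spans = [("s1", 0, 4), ("s1", 10, 15), ("s2", 3, 9), ("s3", 0, 6)]
predicted_spans = [("s1", 0, 4), ("s2", 3, 9), ("s3", 1, 6)]
score = span_f1(gold_spans, predicted_spans)
```

Here precision is 2/3 and recall is 2/4, so the harmonic mean falls between the two.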


Meeting of the Association for Computational Linguistics | 2002

Tagging gene and protein names in full text articles

Lorraine K. Tanabe; W. John Wilbur

Current information extraction efforts in the biomedical domain tend to focus on finding entities and facts in structured databases or MEDLINE® abstracts. We apply a gene and protein name tagger trained on MEDLINE abstracts (ABGene) to a randomly selected set of full text journal articles in the biomedical domain. We show the effect of adaptations made in response to the greater heterogeneity of full text.


Journal of Bioinformatics and Computational Biology | 2004

Generation of a Large Gene/Protein Lexicon by Morphological Pattern Analysis

Lorraine K. Tanabe; W. John Wilbur

The identification of gene/protein names in natural language text is an important problem in named entity recognition. In previous work we have processed MEDLINE documents to obtain a collection of over two million names of which we estimate that perhaps two thirds are valid gene/protein names. Our problem has been how to purify this set to obtain a high quality subset of gene/protein names. Here we describe an approach which is based on the generation of certain classes of names that are characterized by common morphological features. Within each class inductive logic programming (ILP) is applied to learn the characteristics of those names that are gene/protein names. The criteria learned in this manner are then applied to our large set of names. We generated 193 classes of names and ILP led to criteria defining a select subset of 1,240,462 names. A simple false positive filter was applied to remove 8% of this set leaving 1,145,913 names. Examination of a random sample from this gene/protein name lexicon suggests it is composed of 82% (+/-3%) complete and accurate gene/protein names, 12% names related to genes/proteins (too generic, a valid name plus additional text, part of a valid name, etc.), and 6% names unrelated to genes/proteins. The lexicon is freely available at ftp.ncbi.nlm.nih.gov/pub/tanabe/Gene.Lexicon.


North American Chapter of the Association for Computational Linguistics | 2006

A Priority Model for Named Entities

Lorraine K. Tanabe; W. John Wilbur

We introduce a new approach to named entity classification which we term a Priority Model. We also describe the construction of a semantic database called SemCat consisting of a large number of semantically categorized names relevant to biomedicine. We used SemCat as training data to investigate name classification techniques. We generated a statistical language model and probabilistic context-free grammars for gene and protein name classification, and compared the results with the new model. For all three methods, we used a variable order Markov model to predict the nature of strings not represented in the training data. The Priority Model achieves an F-measure of 0.958-0.960, consistently higher than the statistical language model and probabilistic context-free grammar.


Intelligent Systems in Molecular Biology | 2005

MedTag: A Collection of Biomedical Annotations

Lawrence H. Smith; Lorraine K. Tanabe; Thomas C. Rindflesch; W. John Wilbur

We present a database of annotated biomedical text corpora merged into a portable data structure with uniform conventions. MedTag combines three corpora, MedPost, ABGene and GENETAG, within a common relational database data model. The GENETAG corpus has been modified to reflect new definitions of genes and proteins. The MedPost corpus has been updated to include 1,000 additional sentences from the clinical medicine domain. All data have been updated with original MEDLINE text excerpts, PubMed identifiers, and tokenization independence to facilitate data accuracy, consistency and usability. The data are available in flat files along with software to facilitate loading the data into a relational SQL database from ftp://ftp.ncbi.nlm.nih.gov/pub/lsmith/MedTag/medtag.tar.gz.


Proceedings of SPIE - The International Society for Optical Engineering | 2001

Analysis of gene expression data of the NCI 60 cancer cell lines using Bayesian hierarchical effects model

Jae K. Lee; Uwe Scherf; Lawrence H. Smith; Lorraine K. Tanabe; John N. Weinstein

Since the end of the last decade, NCI has been performing large-scale screening of anticancer drug compounds and molecular targets on a pool of 60 cell lines of various types of cancer. In particular, a complete set of cDNA expression array data on the 60 cell lines is now available. To discover differentially-expressed genes in each type of cancer cell line, we need to estimate a large number of genetic parameters, especially interaction effects for all combinations of cancer types and genes, by decomposing the total variance into biological and array instrumental components. This error decomposition is important to identify subtle genes with low biological variability. An innovative statistical method is required for simultaneously estimating more than 100,000 parameters of interaction effects and error components. We propose a Bayesian statistical approach based on the construction of a hierarchical model adopting parameterization of a linear effects model. The estimation of the model parameters is performed by Markov Chain Monte Carlo, a recent computer-intensive statistical resampling technique. We have identified novel genes whose effects have not been revealed by the previous clustering approaches to the gene expression data.


Archive | 2005

The Genomic Data Mine

Lorraine K. Tanabe

The genomic data mine represents a fundamental shift from genetics to genomics, essentially from the study of one gene at a time to the study of entire genetic metabolic networks and whole genomes. Experimental laboratory data are deposited into large public repositories and a wealth of computational data mining algorithms and tools are applied to mine the data. The integration of different types of data in the genomic data mine will contribute towards an understanding of the systems biology of living organisms, contributing to improved diagnoses and individualized medicine. This chapter focuses on the genomic data mine consisting of text data, map data, sequence data, and expression data, and concludes with a case study of the Gene Expression Omnibus (GEO).

Collaboration


Dive into Lorraine K. Tanabe's collaborations.

Top Co-Authors

W. John Wilbur, National Institutes of Health
Lawrence H. Smith, National Institutes of Health
Alan R. Aronson, National Institutes of Health
Dina Demner-Fushman, National Institutes of Health
Susanne M. Humphrey, National Institutes of Health
Jae K. Lee, University of Virginia
John N. Weinstein, National Institutes of Health
Lawrence Hunter, University of Colorado Denver
Nicholas C. Ide, National Institutes of Health