Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where John C. Henderson is active.

Publication


Featured research published by John C. Henderson.


Journal of Computational Biology | 1997

Finding Genes in DNA with a Hidden Markov Model

John C. Henderson; Kenneth H. Fasman

This study describes a new Hidden Markov Model (HMM) system for segmenting uncharacterized genomic DNA sequences into exons, introns, and intergenic regions. Separate HMM modules were designed and trained for specific regions of DNA: exons, introns, intergenic regions, and splice sites. The models were then tied together to form a biologically feasible topology. The integrated HMM was trained further on a set of eukaryotic DNA sequences and tested by using it to segment a separate set of sequences. The resulting HMM system, called VEIL (Viterbi Exon-Intron Locator), obtains an overall accuracy on test data of 92% of total bases correctly labelled, with a correlation coefficient of 0.73. Using the more stringent test of exact exon prediction, VEIL correctly located both ends of 53% of the coding exons, and 49% of the exons it predicts are exactly correct. These results compare favorably to the best previous results for gene structure prediction and demonstrate the benefits of using HMMs for this problem.
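The abstract reports accuracy figures but no algorithmic detail; the decoding step that VEIL's name refers to is the Viterbi algorithm. A minimal sketch, using an invented two-state toy model rather than VEIL's tied exon/intron/splice-site modules (all parameters below are illustrative, not the trained values):

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for an observation sequence (log space)."""
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p] + math.log(trans_p[p][s]))
            V[t][s] = V[t - 1][prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Toy two-state model with invented parameters (not VEIL's trained values)
states = ["exon", "intron"]
start = {"exon": 0.5, "intron": 0.5}
trans = {"exon": {"exon": 0.9, "intron": 0.1},
         "intron": {"exon": 0.1, "intron": 0.9}}
emit = {"exon": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "intron": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}
print(viterbi("GGTT", states, start, trans, emit))   # -> ['exon', 'exon', 'intron', 'intron']
```

The sticky transition probabilities keep the decoder from switching labels on every base, which is why the GC-rich prefix and AT-rich suffix each come out as one run.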


Journal of Computational Biology | 1998

A Decision Tree System for Finding Genes in DNA

Arthur L. Delcher; Kenneth H. Fasman; John C. Henderson

MORGAN is an integrated system for finding genes in vertebrate DNA sequences. MORGAN uses a variety of techniques to accomplish this task, the most distinctive of which is a decision tree classifier. The decision tree system is combined with new methods for identifying start codons, donor sites, and acceptor sites, and these are brought together in a frame-sensitive dynamic programming algorithm that finds the optimal segmentation of a DNA sequence into coding and noncoding regions (exons and introns). The optimal segmentation is dependent on a separate scoring function that takes a subsequence and assigns to it a score reflecting the probability that the sequence is an exon. The scoring functions in MORGAN are sets of decision trees that are combined to give a probability estimate. Experimental results on a database of 570 vertebrate DNA sequences show that MORGAN has excellent performance by many different measures. On a separate test set, it achieves an overall accuracy of 95%, with a correlation coefficient of 0.78, and a sensitivity and specificity for coding bases of 83% and 79%. In addition, MORGAN identifies 58% of coding exons exactly; i.e., both the beginning and end of the coding regions are predicted correctly. This paper describes the MORGAN system, including its decision tree routines and the algorithms for site recognition, and its performance on a benchmark database of vertebrate DNA.
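The dynamic programming step can be illustrated with a simplified sketch that omits the reading-frame bookkeeping: given any scoring function over candidate subsequences, find the segmentation maximizing total score. The GC-content scorer below is a hypothetical stand-in for MORGAN's decision-tree scores, not the actual system:

```python
def best_segmentation(seq, score):
    """Optimal labeled segmentation by dynamic programming.
    score(i, j) returns (label, value) for the subsequence seq[i:j]."""
    n = len(seq)
    best = [float("-inf")] * (n + 1)
    best[0] = 0.0
    back = [None] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            label, value = score(i, j)
            if best[i] + value > best[j]:
                best[j] = best[i] + value
                back[j] = (i, label)
    # Recover the segments from the backpointers
    segments, j = [], n
    while j > 0:
        i, label = back[j]
        segments.append((i, j, label))
        j = i
    return list(reversed(segments))

def gc_scorer(seq):
    """Hypothetical stand-in scorer: GC-rich runs look like exons."""
    def score(i, j):
        sub = seq[i:j]
        gc = sum(c in "GC" for c in sub)
        if 2 * gc > len(sub):
            return "exon", 2 * gc - len(sub)
        return "intron", len(sub) - 2 * gc
    return score

seq = "GGGAATTGG"
print(best_segmentation(seq, gc_scorer(seq)))
```

The quadratic loop over all (i, j) spans is the simplest formulation; real gene finders restrict candidate boundaries to predicted splice sites to keep this tractable.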


Meeting of the Association for Computational Linguistics | 1998

Beyond N-Grams: Can Linguistic Sophistication Improve Language Modeling?

Eric Brill; Radu Florian; John C. Henderson; Lidia Mangu

It seems obvious that a successful model of natural language would incorporate a great deal of both linguistic and world knowledge. Interestingly, state-of-the-art language models for speech recognition are based on a very crude linguistic model, namely conditioning the probability of a word on a small fixed number of preceding words. Despite many attempts to incorporate more sophisticated information into the models, the n-gram model remains the state of the art, used in virtually all speech recognition systems. In this paper we address the question of whether there is hope of improving language modeling by incorporating more sophisticated linguistic and world knowledge, or whether the n-grams are already capturing the majority of the information that can be employed.
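The baseline the paper interrogates conditions each word on a fixed window of predecessors. A minimal maximum-likelihood bigram version (toy corpus and sentence markers invented for illustration, no smoothing):

```python
from collections import Counter

def train_bigram(corpus):
    """Maximum-likelihood bigram model: P(w | prev) = c(prev, w) / c(prev)."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])            # count contexts only
        bigrams.update(zip(tokens, tokens[1:]))
    return lambda prev, w: (bigrams[(prev, w)] / unigrams[prev]
                            if unigrams[prev] else 0.0)

# Toy corpus, invented for illustration
corpus = ["the cat sat", "the dog sat", "the cat ran"]
p = train_bigram(corpus)
print(p("the", "cat"))   # 2 of the 3 "the" contexts are followed by "cat"
```

Everything the model knows is in those conditional counts, which is exactly the "crude linguistic model" the paper contrasts with syntactic and world knowledge.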


North American Chapter of the Association for Computational Linguistics | 2004

Direct maximization of average precision by hill-climbing, with a comparison to a maximum entropy approach

William T. Morgan; Warren R. Greiff; John C. Henderson

We describe an algorithm for choosing term weights to maximize average precision. The algorithm performs successive exhaustive searches through single directions in weight space. It makes use of a novel technique for considering all possible values of average precision that arise in searching for a maximum in a given direction. We apply the algorithm and compare this algorithm to a maximum entropy approach.
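A simplified sketch of the idea: compute average precision from a ranking, then ascend one weight coordinate at a time. The grid search below is a stand-in for the paper's exact enumeration of average-precision values along a direction; the feature vectors and grid are invented:

```python
def average_precision(ranked_relevance):
    """AP of a ranked list of 0/1 relevance judgments."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, 1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def hill_climb(docs, rels, n_terms, steps=21, rounds=3):
    """Coordinate ascent: search one weight direction at a time on a grid."""
    w = [1.0] * n_terms

    def ap_for(weights):
        # Rank documents by weighted term score, then measure average precision.
        ranked = sorted(zip(docs, rels),
                        key=lambda dr: -sum(x * wi for x, wi in zip(dr[0], weights)))
        return average_precision([rel for _, rel in ranked])

    for _ in range(rounds):
        for k in range(n_terms):
            w[k] = max((2.0 * i / (steps - 1) for i in range(steps)),
                       key=lambda v: ap_for(w[:k] + [v] + w[k + 1:]))
    return w, ap_for(w)

# Hypothetical toy collection: per-document term-match features and relevance
docs = [(1, 0), (0, 1), (1, 1)]
rels = [1, 0, 1]
weights, ap = hill_climb(docs, rels, n_terms=2)
print(weights, ap)
```

Because average precision is piecewise constant in any single weight, the paper's contribution is enumerating exactly the weight values where the ranking changes instead of sampling a grid as done here.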


Empirical Methods in Natural Language Processing | 2000

Coaxing Confidences from an Old Friend: Probabilistic Classifications from Transformation Rule Lists

Radu Florian; John C. Henderson; Grace Ngai

Transformation-based learning has been successfully employed to solve many natural language processing problems. It has many positive features, but one drawback is that it does not provide estimates of class membership probabilities. In this paper, we present a novel method for obtaining class membership probabilities from a transformation-based rule list classifier. Three experiments are presented which measure the modeling accuracy and cross-entropy of the probabilistic classifier on unseen data and the degree to which the output probabilities from the classifier can be used to estimate confidences in its classification decisions. The results of these experiments show that, for the task of text chunking, the estimates produced by this technique are more informative than those generated by a state-of-the-art decision tree.
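One way to read the idea, sketched under the assumption (not taken from the paper) that examples reaching the same sequence of fired rules form a leaf: estimate class probabilities from the training distribution at each leaf. The one-rule list and data are invented:

```python
from collections import Counter, defaultdict

def trace(rules, x):
    """Run the rule list; return which rules fired (the 'leaf') and the final label."""
    label, fired = "default", []
    for i, (condition, new_label) in enumerate(rules):
        if condition(x, label):
            label = new_label
            fired.append(i)
    return tuple(fired), label

def fit_probs(rules, train):
    """Estimate P(class | leaf) from the training distribution at each leaf."""
    counts = defaultdict(Counter)
    for x, y in train:
        leaf, _ = trace(rules, x)
        counts[leaf][y] += 1

    def predict_proba(x):
        leaf, label = trace(rules, x)
        c = counts[leaf]
        n = sum(c.values())
        if n == 0:                        # unseen leaf: trust the rule's own label
            return {label: 1.0}
        return {y: k / n for y, k in c.items()}
    return predict_proba

# Hypothetical one-rule list and training set
rules = [(lambda x, label: x > 5, "big")]
train = [(3, "small"), (7, "big"), (8, "big"), (6, "small")]
predict_proba = fit_probs(rules, train)
```

The rule list's hard label is unchanged; what the leaf counts add is a graded confidence, which is the quantity the paper's experiments evaluate.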


North American Chapter of the Association for Computational Linguistics | 2015

MITRE: Seven Systems for Semantic Similarity in Tweets

Guido Zarrella; John C. Henderson; Elizabeth M. Merkhofer; Laura Strickhart

This paper describes MITRE’s participation in the Paraphrase and Semantic Similarity in Twitter task (SemEval-2015 Task 1). This effort placed first in Semantic Similarity and second in Paraphrase Identification with scores of Pearson’s r of 61.9%, F1 of 66.7%, and maxF1 of 72.4%. We detail the approaches we explored including mixtures of string matching metrics, alignments using tweet-specific distributed word representations, recurrent neural networks for modeling similarity with those alignments, and distance measurements on pooled latent semantic features. Logistic regression is used to tie the systems together into the ensembles submitted for evaluation.
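The ensemble step can be sketched with one invented string-matching feature (token Jaccard) fused by a tiny logistic regression; the actual systems used many more metrics plus learned representations:

```python
import math

def jaccard(a, b):
    """Token-overlap similarity: one hypothetical string-matching feature."""
    A, B = set(a.split()), set(b.split())
    return len(A & B) / len(A | B) if A | B else 0.0

def train_logreg(X, y, lr=0.5, epochs=2000):
    """Tiny SGD logistic regression that fuses feature columns into one score."""
    w = [0.0] * (len(X[0]) + 1)                     # final weight is the bias
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wi * f for wi, f in zip(w, xi + [1.0]))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, f in enumerate(xi + [1.0]):
                w[j] += lr * (yi - p) * f           # gradient step toward the label
    return w

# Invented training pairs: (sentence 1, sentence 2, paraphrase label)
pairs = [("the cat sat", "the cat sat down", 1),
         ("a dog ran", "stocks fell sharply", 0),
         ("rain is coming", "rain coming soon", 1),
         ("hello world", "goodbye moon", 0)]
X = [[jaccard(s, t)] for s, t, _ in pairs]
y = [label for _, _, label in pairs]
w = train_logreg(X, y)

def similarity(s, t):
    """Fused similarity score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(w[0] * jaccard(s, t) + w[1])))
```

With several feature columns instead of one, the same regression ties independently built subsystems into a single submitted score.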


North American Chapter of the Association for Computational Linguistics | 2004

MiTAP for SARS detection

Laurie E. Damianos; Samuel Bayer; Michael Chisholm; John C. Henderson; Lynette Hirschman; William T. Morgan; Marc Ubaldino; Guido Zarrella; James M. Wilson; Marat G. Polyak

The MiTAP prototype for SARS detection uses human language technology for detecting, monitoring, and analyzing potential indicators of infectious disease outbreaks and reasoning for issuing warnings and alerts. MiTAP focuses on providing timely, multilingual information access to analysts, domain experts, and decision-makers worldwide. Data sources are captured, filtered, translated, summarized, and categorized by content. Critical information is automatically extracted and tagged to facilitate browsing, searching, and scanning, and to provide key terms at a glance. The processed articles are made available through an easy-to-use news server and cross-language information retrieval system for access and analysis anywhere, any time. Specialized newsgroups and customizable filters or searches on incoming stories allow users to create their own view into the data while a variety of tools summarize, indicate trends, and provide alerts to potentially relevant spikes of activity.


North American Chapter of the Association for Computational Linguistics | 2003

Word alignment baselines

John C. Henderson

Simple baselines provide insights into the value of scoring functions and give starting points for measuring the performance improvements of technological advances. This paper presents baseline unsupervised techniques for performing word alignment based on geometric and word edit distances as well as supervised fusion of the results of these techniques using the nearest neighbor rule.
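Both baseline families can be sketched directly: a geometric baseline linking each source position to the diagonally nearest target position, and a word-edit-distance baseline. The functions below are illustrative reconstructions, not the paper's exact definitions:

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[-1] + 1,                # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def diagonal_align(src, tgt):
    """Geometric baseline: link each source position to the nearest diagonal target."""
    scale = (len(tgt) - 1) / max(len(src) - 1, 1)
    return [(i, round(i * scale)) for i in range(len(src))]

def edit_align(src, tgt):
    """String baseline: link each source word to its closest target word by edit distance."""
    return [(i, min(range(len(tgt)), key=lambda j: edit_distance(w, tgt[j])))
            for i, w in enumerate(src)]
```

The geometric baseline exploits the tendency of translations to track the diagonal; the edit-distance baseline exploits cognates and shared names between related languages.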


International Conference on Human Language Technology Research | 2001

Integrated Feasibility Experiment for Bio-Security: IFE-Bio, a TIDES demonstration

Lynette Hirschman; Kris Concepcion; Laurie E. Damianos; David S. Day; John Delmore; Lisa Ferro; John Griffith; John C. Henderson; Jeff Kurtz; Inderjeet Mani; Scott A. Mardis; Tom McEntee; Keith J. Miller; Beverly Nunan; Jay M. Ponte; Florence Reeder; Ben Wellner; George Wilson; Alex Yeh

As part of MITRE's work under the DARPA TIDES (Translingual Information Detection, Extraction and Summarization) program, we are preparing a series of demonstrations to showcase the TIDES Integrated Feasibility Experiment on Bio-Security (IFE-Bio). The current demonstration illustrates some of the resources that can be made available to analysts tasked with monitoring infectious disease outbreaks and other biological threats.


North American Chapter of the Association for Computational Linguistics | 2003

Exploiting diversity for answering questions

John D. Burger; John C. Henderson

We describe initial experiments in combining the output of question answering systems using data from the 2002 TREC Question Answering task. We explore several distance-based combining methods, as well as a number of distance metrics involving both word and character n-grams.
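A minimal sketch of distance-based combining over character n-grams, assuming a simple centroid rule (the paper explores several combining methods and metrics; the Dice-overlap distance and candidate answers below are illustrative):

```python
def char_ngrams(s, n=3):
    """Set of character n-grams, padded with boundary spaces."""
    s = f" {s} "
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def ngram_distance(a, b, n=3):
    """One minus the Dice overlap of the two n-gram sets."""
    A, B = char_ngrams(a, n), char_ngrams(b, n)
    if not (A or B):
        return 0.0
    return 1.0 - 2 * len(A & B) / (len(A) + len(B))

def combine(candidates):
    """Centroid rule: pick the answer with the smallest total distance to the rest."""
    return min(candidates, key=lambda a: sum(ngram_distance(a, b) for b in candidates))

# Hypothetical answers from four QA systems to the same question
print(combine(["Washington", "George Washington", "Washington", "Lincoln"]))
```

Character n-grams let near-duplicate answer strings reinforce each other even when the systems return slightly different surface forms, which is the diversity the title refers to.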

Collaboration


Dive into John C. Henderson's collaborations.
