John S. Garofolo
National Institute of Standards and Technology
Publication
Featured research published by John S. Garofolo.
international conference on machine learning | 2005
Jonathan G. Fiscus; Nicolas Radde; John S. Garofolo; Audrey N. Le; Jerome Ajot; Christophe Laprun
This paper presents the design and results of the Rich Transcription Spring 2005 (RT-05S) Meeting Recognition Evaluation. This evaluation is the third in a series of community-wide evaluations of language technologies in the meeting domain. For 2005, four evaluation tasks were supported. These included a speech-to-text (STT) transcription task and three diarization tasks: “Who Spoke When”, “Speech Activity Detection”, and “Source Localization.” The latter two were first-time experimental proof-of-concept tasks and were treated as “dry runs”. For the STT task, the lowest word error rate for the multiple distant microphone condition was 30.0%, which represented an impressive 33% relative reduction from the best result obtained in the last such evaluation, the Rich Transcription Spring 2004 Meeting Recognition Evaluation (RT-04S). For the diarization “Who Spoke When” task, the lowest diarization error rate was 18.56%, which represented a 19% relative reduction from that of RT-04S.
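For context, the word error rate and the relative reductions quoted above follow the standard NIST-style arithmetic. The Python sketch below is purely illustrative: it shows the formulas with made-up numbers and does not reproduce the actual RT-04S or RT-05S figures.

    # Illustrative sketch (not from the paper): standard WER and relative-reduction arithmetic.
    def word_error_rate(substitutions, deletions, insertions, reference_words):
        # WER = (S + D + I) / N, returned here as a fraction of the reference word count.
        return (substitutions + deletions + insertions) / reference_words

    def relative_reduction(old_rate, new_rate):
        # Fractional improvement of a new error rate over an old one, e.g. between two evaluations.
        return (old_rate - new_rate) / old_rate

    # Example with made-up numbers: a drop from 40.0% to 30.0% WER is a 25% relative reduction.
    print(relative_reduction(40.0, 30.0))  # 0.25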
human language technology | 1993
Lynette Hirschman; Madeleine Bates; Deborah Dahl; William M. Fisher; John S. Garofolo; David S. Pallett; Kate Hunicke-Smith; Patti Price; Alexander I. Rudnicky; Evelyne Tzoukermann
The Air Travel Information System (ATIS) domain serves as the common task for DARPA spoken language system research and development. The approaches and results possible in this rapidly growing area are structured by available corpora, annotations of that data, and evaluation methods. Coordination of this crucial infrastructure is the charter of the Multi-Site ATIS Data COllection Working group (MADCOW). We focus here on selection of training and test data, evaluation of language understanding, and the continuing search for evaluation methods that will correlate well with expected performance of the technology in applications.
CLEAR | 2006
Rainer Stiefelhagen; Keni Bernardin; Rachel Bowers; John S. Garofolo; Djamel Mostefa; Padmanabhan Soundararajan
This paper is a summary of the first CLEAR evaluation on CLassification of Events, Activities and Relationships, which took place in early 2006 and concluded with a two-day evaluation workshop in April 2006. CLEAR is an international effort to evaluate systems for the multimodal perception of people, their activities, and interactions. It provides a new international evaluation framework for such technologies. It aims to support the definition of common evaluation tasks and metrics, to coordinate and leverage the production of the necessary multimodal corpora, and to make it possible to compare different algorithms and approaches on common benchmarks, which will result in faster progress in the research community. This paper describes the evaluation tasks conducted in CLEAR 2006, including the metrics and databases used, and provides an overview of the results. The evaluation tasks in CLEAR 2006 included person tracking, face detection and tracking, person identification, head pose estimation, vehicle tracking, and acoustic scene analysis. Overall, more than 20 subtasks were conducted, which included acoustic, visual, and audio-visual analysis for many of the main tasks, as well as different data domains and evaluation conditions.
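Person tracking in the CLEAR evaluations is commonly scored with the MOTA/MOTP family of metrics. The following is a minimal sketch of MOTA, assuming per-frame counts of misses, false positives, and identity switches have already been computed; it is an illustration, not the evaluation scoring software.

    # Minimal illustrative sketch of Multiple Object Tracking Accuracy (MOTA).
    def mota(misses, false_positives, id_switches, ground_truth_counts):
        # Each argument is a list with one entry per frame.
        errors = sum(misses) + sum(false_positives) + sum(id_switches)
        return 1.0 - errors / sum(ground_truth_counts)

    # Example with made-up per-frame counts over three frames.
    print(mota([1, 0, 0], [0, 1, 0], [0, 0, 1], [4, 4, 4]))  # 0.75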
human language technology | 1993
David S. Pallett; Jonathan G. Fiscus; William M. Fisher; John S. Garofolo
This paper documents benchmark tests implemented within the DARPA Spoken Language Program during the period November, 1992 - January, 1993. Tests were conducted using the Wall Street Journal-based Continuous Speech Recognition (WSJ-CSR) corpus and the Air Travel Information System (ATIS) corpus collected by the Multi-site ATIS Data COllection Working (MADCOW) Group. The WSJ-CSR tests consist of tests of large vocabulary (lexicons of 5,000 to more than 20,000 words) continuous speech recognition systems. The ATIS tests consist of tests of (1) ATIS-domain spontaneous speech (lexicons typically less than 2,000 words), (2) natural language understanding, and (3) spoken language understanding. These tests were reported on and discussed in detail at the Spoken Language Systems Technology Workshop held at the Massachusetts Institute of Technology, January 20-22, 1993.
international conference on acoustics, speech, and signal processing | 2003
Vincent M. Stanford; John S. Garofolo; Olivier Galibert; Martial Michel; Christophe Laprun
Pervasive computing devices, sensors, and networks provide infrastructure for context-aware smart meeting rooms that sense ongoing human activities and respond to them. This requires advances in areas including networking, distributed computing, sensor data acquisition, signal processing, speech recognition, human identification, and natural language processing. Open interoperability and metrology standards for the sensor and recognition technologies can aid R&D programs in making these advances. The NIST (National Institute of Standards and Technology) Smart Space and Meeting Room projects are developing tools for data formats, transport, distributed processing, and metadata. We are using them to create annotated multimodal research corpora and measurement algorithms for smart meeting rooms, which we are making available to the research and development community.
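As one way to picture the kind of annotated multimodal corpora described above, the sketch below shows a hypothetical time-stamped sensor annotation record in Python; the record type and field names are illustrative assumptions, not the actual NIST Smart Space or Meeting Room data formats.

    # Hypothetical annotation record; field names are illustrative, not NIST formats.
    from dataclasses import dataclass

    @dataclass
    class SensorSegment:
        sensor_id: str     # e.g. a particular distant microphone or camera
        modality: str      # "audio", "video", ...
        start_time: float  # seconds from session start
        end_time: float
        label: str         # e.g. speaker identity or annotated activity

    segment = SensorSegment("mic_03", "audio", 12.4, 15.1, "speaker_A")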
Multimodal Technologies for Perception of Humans | 2008
Rainer Stiefelhagen; Keni Bernardin; Rachel Bowers; R. Travis Rose; Martial Michel; John S. Garofolo
This paper is a summary of the 2007 CLEAR Evaluation on the Classification of Events, Activities, and Relationships, which took place in early 2007 and culminated with a two-day workshop held in May 2007. CLEAR is an international effort to evaluate systems for the perception of people, their activities, and interactions. In its second year, CLEAR has developed a following from the computer vision and speech communities, spawning a more multimodal perspective of research evaluation. This paper describes the evaluation tasks, including the metrics and databases used, and discusses the results achieved. The CLEAR 2007 tasks comprise person, face, and vehicle tracking, head pose estimation, and acoustic scene analysis. These include subtasks performed in the visual, acoustic, and audio-visual domains for meeting room and surveillance data.
document analysis systems | 2006
Vasant Manohar; Padmanabhan Soundararajan; Matthew Boonstra; Harish Raju; Dmitry B. Goldgof; Rangachar Kasturi; John S. Garofolo
Text detection and tracking is an important step in a video content analysis system, as it provides semantic clues that are a vital supplemental source of index information. While there has been a significant amount of research on video text detection and tracking, there are very few works on the performance evaluation of such systems. Evaluations of this nature have not been attempted because of the extensive effort required to establish a reliable ground truth even for a moderate video dataset. However, such efforts are now gaining importance. In this paper, we propose a generic method for the evaluation of object detection and tracking systems in video domains where ground truth objects can be bounded by simple geometric shapes (polygons, ellipses). Two comprehensive measures, one each for detection and tracking, are proposed and substantiated to capture different aspects of the task in a single score. We choose text detection and tracking tasks to show the effectiveness of our evaluation framework. Results are presented from evaluations of existing algorithms using real-world data, and the metrics are shown to be effective in measuring the total accuracy of these detection and tracking algorithms.
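As a rough illustration of overlap-based detection scoring in the spirit of such frameworks, the sketch below computes a per-frame detection accuracy from axis-aligned bounding boxes. It assumes matched ground-truth/detection pairs are already given and is not the paper's exact measures.

    # Illustrative overlap-based per-frame detection scoring (not the paper's exact measures).
    def overlap_ratio(a, b):
        # Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2).
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / union if union > 0 else 0.0

    def frame_detection_accuracy(gt_boxes, det_boxes, matches):
        # Sum of overlaps over matched pairs, normalised by the mean object count in the frame.
        overlaps = sum(overlap_ratio(gt_boxes[i], det_boxes[j]) for i, j in matches)
        denom = (len(gt_boxes) + len(det_boxes)) / 2.0
        return overlaps / denom if denom > 0 else 1.0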
asian conference on computer vision | 2006
Vasant Manohar; Padmanabhan Soundararajan; Harish Raju; Dmitry B. Goldgof; Rangachar Kasturi; John S. Garofolo
The need for empirical evaluation metrics and algorithms is well acknowledged in the field of computer vision. The process yields precise insights into current technological capabilities and also helps in measuring progress. Hence, designing good and meaningful performance measures is critical. In this paper, we propose two comprehensive measures, one each for detection and tracking, for video domains where an object-bounding approach to ground truthing can be followed. A thorough analysis explaining the behavior of the measures for different types of detection and tracking errors is presented. Face detection and tracking is chosen as a prototype task where such an evaluation is relevant. Results on real data comparing existing algorithms are presented, and the measures are shown to be effective in capturing the accuracy of the detection/tracking systems.
human language technology | 1992
David S. Pallett; Nancy L. Dahlgren; Jonathan G. Fiscus; William M. Fisher; John S. Garofolo; Brett C. Tjaden
This paper documents the third in a series of Benchmark Tests for the DARPA Air Travel Information System (ATIS) common task domain. The first results in this series were reported at the June 1990 Speech and Natural Language Workshop [1], and the second at the February 1991 Speech and Natural Language Workshop [2]. The February 1992 Benchmark Tests include: (1) ATIS domain spontaneous speech recognition system tests, (2) ATIS natural language understanding tests, and (3) ATIS spoken language understanding tests.
human language technology | 1990
David S. Pallett; William M. Fisher; Jonathan G. Fiscus; John S. Garofolo
The first Spoken Language System tests to be conducted in the DARPA Air Travel Information System (ATIS) domain took place during the period June 15-20, 1989. This paper presents a brief description of the test protocol, the comparator software used for scoring results at NIST, the test material selection process, and a preliminary tabulation of the scored results for seven SLS systems from five sites: BBN, CMU, MIT/LCS, SRI, and Unisys. One system, designated cmu-spi(r) in this paper, made use of digitized speech as input (.wav files) and generated CAS-format answers. Other systems made use of SNOR transcriptions (.snr files) as input.