Publication


Featured research published by Susan E. Hauser.


Journal of the American Medical Informatics Association | 2006

Automatically Identifying Health Outcome Information in MEDLINE Records

Dina Demner-Fushman; Barbara Few; Susan E. Hauser; George R. Thoma

OBJECTIVE: Understanding the effect of a given intervention on the patient's health outcome is one of the key elements in providing optimal patient care. This study presents a methodology for automatic identification of outcomes-related information in medical text and evaluates its potential in satisfying clinical information needs related to health care outcomes.

DESIGN: An annotation scheme based on an evidence-based medicine model for critical appraisal of evidence was developed and used to annotate 633 MEDLINE citations. Textual, structural, and meta-information features essential to outcome identification were learned from the created collection and used to develop an automatic system. Accuracy of automatic outcome identification was assessed in an intrinsic evaluation and in an extrinsic evaluation, in which ranking of MEDLINE search results obtained using PubMed Clinical Queries relied on identified outcome statements.

MEASUREMENTS: The accuracy and positive predictive value of outcome identification were calculated. Effectiveness of the outcome-based ranking was measured using mean average precision and precision at rank 10.

RESULTS: Automatic outcome identification achieved 88% to 93% accuracy. The positive predictive value of individual sentences identified as outcomes ranged from 30% to 37%. Outcome-based ranking improved retrieval accuracy, tripling mean average precision and achieving a 389% improvement in precision at rank 10.

CONCLUSION: Preliminary results in outcome-based document ranking show potential validity of the evidence-based medicine-model approach in timely delivery of information critical to clinical decision support at the point of service.
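The two ranking measures named in the abstract can be illustrated with a minimal Python sketch. This is not the study's evaluation code: the function names are hypothetical, each ranked result list is assumed to be reduced to a sequence of relevant/not-relevant flags, and average precision is normalized by the number of relevant items that appear in the list (a simplifying assumption; some definitions divide by all relevant items in the collection).

```python
from typing import Sequence


def precision_at_k(relevance: Sequence[bool], k: int = 10) -> float:
    """Fraction of the top-k ranked results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevance[:k]) / k


def average_precision(relevance: Sequence[bool]) -> float:
    """Mean of the precision values at each rank holding a relevant result.

    Assumption: normalized by the relevant results present in the list.
    """
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Sequence[bool]]) -> float:
    """Average of average_precision over several queries."""
    if not runs:
        return 0.0
    return sum(average_precision(run) for run in runs) / len(runs)
```

A ranking that moves relevant citations toward the top raises both measures, which is the effect the outcome-based ranking is reported to have.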


Document Recognition and Retrieval | 1999

Automated zone correction in bitmapped document images

Susan E. Hauser; Daniel X. Le; George R. Thoma

The optical character recognition (OCR) system selected by the National Library of Medicine (NLM) as part of its system for automating the production of MEDLINE® records frequently segments the scanned page images into zones which are inappropriate for NLM's application. Software has been created in-house to correct the zones using character coordinate and character attribute information provided as part of the OCR output data. The software correctly delineates over 97% of the zones of interest tested to date.


Computer-Based Medical Systems | 2004

A testbed system for mobile point-of-care information delivery

Susan E. Hauser; Dina Demner-Fushman; Glenn Ford; George R. Thoma

PubMed on Tap is a testbed system that supports search of and retrieval from the National Library of Medicine's MEDLINE® database from a PDA. The goal of the PubMed on Tap project is to discover and implement design principles for point-of-care delivery of clinical support information. The project explores user interface issues, information content and organization, and system performance. Here we present our progress in these areas.


IEEE Journal on Selected Areas in Communications | 1989

Networking AT-class computers for image distribution

Susan E. Hauser; Marc I. Felsen; Michael Gill; George R. Thoma

Using the prototype electronic document storage and retrieval (EDSR) system as a testbed, the Ethernet-based image transmission protocol is demonstrated to be a fast, reliable, and efficient mechanism for transmitting document image files among AT-class computers. The principal bottleneck in the server's performance, i.e., the rate at which data are read from the optical disk via the SCSI host adapter, has been identified and corrected. The transfer rate is expected to increase from approximately 60 kb/s to approximately 480 kb/s, an 800% improvement. A method is suggested for designing a network to support multiuser access to a document image database on optical disks. Depending on the final service rate and the rate at which actual users request images, a network with up to five servers should be able to support from five to ten users per server.


Document Recognition and Retrieval | 2003

Correcting OCR text by association with historical datasets

Susan E. Hauser; Jonathan Schlaifer; Tehseen F. Sabir; Dina Demner-Fushman; Scott Straughan; George R. Thoma

The Medical Article Records System (MARS) developed by the Lister Hill National Center for Biomedical Communications uses scanning, OCR and automated recognition and reformatting algorithms to generate electronic bibliographic citation data from paper biomedical journal articles. The OCR server incorporated in MARS performs well in general, but fares less well with text printed in small or italic fonts. Affiliations are often printed in small italic fonts in the journals processed by MARS. Consequently, although the automatic processes generate much of the citation data correctly, the affiliation field frequently contains incorrect data, which must be manually corrected by verification operators. In contrast, author names are usually printed in large, normal fonts that are correctly converted to text by the OCR server. The National Library of Medicine’s MEDLINE database contains 11 million indexed citations for biomedical journal articles. This paper documents our effort to use the historical author/affiliation relationships from this large dataset to find potential correct affiliations for MARS articles based on the author and the affiliation in the OCR output. Preliminary tests using a table of about 400,000 author/affiliation pairs extracted from the corrected data from MARS indicated that about 44% of the author/affiliation pairs were repeats and that about 47% of newly converted author names would be found in this set. A text-matching algorithm was developed to determine the likelihood that an affiliation found in the table corresponding to the OCR text of the first author was the current, correct affiliation. This matching algorithm compares an affiliation found in the author/affiliation table (found with the OCR text of the first author) to the OCR output affiliation, and calculates a score indicating the similarity of the affiliation found in the table to the OCR affiliation.
Using a ground truth set of 519 OCR author/OCR affiliation/correct affiliation triples, the matching algorithm is able to select a correct affiliation for the author 43% of the time with a false positive rate of 6%, a true negative rate of 44% and a false negative rate of 7%. MEDLINE citations with United States affiliations typically include the zip code. In addition to using author names as clues to correct affiliations, we are investigating the value of the OCR text of zip codes as clues to correct USA affiliations. Current work includes generation of an author/affiliation/zipcode table from the entire MEDLINE database and development of a daemon module to implement affiliation selection and matching for the MARS system using both author names and zip codes. Preliminary results from the initial version of the daemon module and the partially filled author/affiliation/zipcode table are encouraging.
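The abstract does not specify how the similarity score is computed, so the following Python sketch only illustrates the general idea of scoring table affiliations against the OCR affiliation and accepting the best one above a cutoff. It substitutes the standard library's `difflib.SequenceMatcher` for the authors' matching algorithm; the function names and the 0.8 threshold are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences are ignored."""
    return " ".join(text.lower().split())


def affiliation_score(candidate: str, ocr_text: str) -> float:
    """Similarity in [0, 1] between a table affiliation and the OCR affiliation.

    Stand-in measure (difflib ratio), not the algorithm from the paper.
    """
    return SequenceMatcher(None, _normalize(candidate), _normalize(ocr_text)).ratio()


def select_affiliation(
    candidates: Iterable[str], ocr_text: str, threshold: float = 0.8
) -> Optional[str]:
    """Return the best-scoring candidate, or None if none reaches the threshold."""
    best = None
    best_score = 0.0
    for candidate in candidates:
        score = affiliation_score(candidate, ocr_text)
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None
```

Returning `None` when no candidate clears the threshold corresponds to leaving the field for a verification operator; raising the threshold trades correct selections for fewer false positives.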


Computer-Based Medical Systems | 2001

Automated medical citation records creation for Web-based on-line journals

Daniel X. Le; Loc Q. Tran; Joseph Chow; Jongwoo Kim; Susan E. Hauser; Chan W. Moon; George R. Thoma

With the rapid expansion and utilization of the Internet and Web technologies, there is an increasing number of online medical journals. Online journals pose new challenges in the areas of automated document analysis and content extraction, database citation records creation, data mining and other document-related applications. New techniques are needed to capture, classify, analyze, extract, modify and reformat Web-based document information for computer storage, access and processing. At the National Library of Medicine (NLM), we are developing an automated system, temporarily code-named WebMARS (Web-based Medical Article Record System), to create citation records for the MEDLINE® database. The system downloads and classifies Web document articles, parses and labels the article contents, extracts and reformats the citation information from the article, presents the entire citation to operators for reconciling (validation), and uploads the citation records to the MEDLINE database.


Document Recognition and Retrieval | 2000

Approximate string matching algorithms for limited-vocabulary OCR output correction

Thomas A. Lasko; Susan E. Hauser

Five methods for matching words mistranslated by optical character recognition to their most likely match in a reference dictionary were tested on data from the archives of the National Library of Medicine. The methods, which included an adaptation of the cross correlation algorithm, the generic edit distance algorithm, the edit distance algorithm with a probabilistic substitution matrix, Bayesian analysis, and Bayesian analysis on an actively thinned reference dictionary, were implemented and their accuracy rates compared. Of the five, the Bayesian algorithm produced the most correct matches (87%), and had the advantage of producing scores that have a useful and practical interpretation.
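Of the five methods, the generic edit distance is the simplest to show. The Python sketch below is a textbook Levenshtein implementation with unit costs, not the authors' code; `closest_match` is a hypothetical helper that picks the dictionary entry at the smallest distance from the OCR word.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def closest_match(word: str, dictionary: Iterable[str]) -> str:
    """Dictionary entry with the smallest edit distance to the OCR word.

    Ties go to whichever entry comes first in the iteration order.
    """
    return min(dictionary, key=lambda entry: edit_distance(word, entry))
```

For instance, the OCR confusion of "m" with "rn" leaves "rnedicine" two edits away from "medicine". A probabilistic substitution matrix, as in the third method, would replace the unit substitution cost with a cost derived from how often the OCR engine confuses each character pair.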


Proceedings of SPIE | 1999

Speech recognition for program control and data entry in a production environment

Susan E. Hauser; Tehseen F. Sabir; George R. Thoma

The Lister Hill National Center for Biomedical Communications, an R&D division of the National Library of Medicine, has developed a PC-based system for semi-automated entry of journal citation data into MEDLINE™. The system, called MARS for Medical Article Records System, includes many automated features but requires a few manual tasks such as scanning and the entry of certain data that are not located on the scanned page. Now that considerable computing power and speed are routinely available on desktop PCs, we think it may be possible to include speech recognition as an optional user interface to reduce operator burden and to improve speed and quality for document scanning and data entry. We undertook a study to determine if speech recognition was sufficiently accurate, reliable and immune to noise to warrant integration with MARS workstations.


Computer-Based Medical Systems | 1998

Lexicon assistance reduces manual verification of OCR output

Susan E. Hauser; Allen C. Browne; George R. Thoma; Alexa T. McCray

An OCR system chosen for its high recognition rate and low percentage of false positives also assigns low confidence values to many characters that are actually correct. Human operators must verify all words containing low-confidence characters. We describe the creation of a lexicon optimized for automatically and selectively resetting confidence values to high, thus reducing operator verification time. Two word lists, OCR Correct and OCR Incorrect, were extracted from files that had already been processed and verified, and became the standard for comparing candidate lexicons. A lexicon was selected from several candidate word lists maintained by the National Library of Medicine (NLM). In operation for about six months, lexicon-assisted verification has been reducing the number of words requiring operator verification by over 50%.
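The lexicon-assisted step can be pictured with a short Python sketch. Everything specific here is an assumption: the `OcrWord` type, the 0 to 9 confidence scale, the threshold of 5, and the case-insensitive lookup are hypothetical stand-ins, since the abstract does not describe the OCR output format.

```python
from dataclasses import dataclass
from typing import List, Set

HIGH_CONFIDENCE = 9  # hypothetical top of a 0-9 confidence scale
LOW_THRESHOLD = 5    # hypothetical cutoff below which a character is suspect


@dataclass
class OcrWord:
    text: str
    confidences: List[int]  # one confidence value per character


def needs_verification(word: OcrWord) -> bool:
    """A word goes to an operator if any character has low confidence."""
    return any(value < LOW_THRESHOLD for value in word.confidences)


def apply_lexicon(words: List[OcrWord], lexicon: Set[str]) -> List[OcrWord]:
    """Reset confidences for flagged words found in the lexicon.

    Returns the words that still require operator verification.
    """
    for word in words:
        if needs_verification(word) and word.text.lower() in lexicon:
            word.confidences = [HIGH_CONFIDENCE] * len(word.text)
    return [word for word in words if needs_verification(word)]
```

The choice of lexicon matters in both directions: a word list that is too small clears few words, while one that is too permissive would clear OCR errors that happen to spell valid words, which is presumably why the candidate lexicons were compared against the OCR Correct and OCR Incorrect lists.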


1994 Topical Meeting on Optical Data Storage | 1994

Optical disk jukebox performance in multi-user applications

Susan E. Hauser; Gautam Roy; George R. Thoma

The Lister Hill National Center for Biomedical Communications, a research and development division of the National Library of Medicine, is evaluating an optical disk jukebox as a digital image store to support prototype systems for image distribution over the Internet. This paper summarizes a study undertaken to determine the performance characteristics of the jukebox to support multiple image databases simultaneously accessed by multiple users. A motivation for this investigation is the need to provide users access to digitized images of medical documents and radiographs.

Collaboration


Dive into Susan E. Hauser's collaborations.

Top Co-Authors

George R. Thoma
National Institutes of Health

Dina Demner-Fushman
National Institutes of Health

Glenn Ford
National Institutes of Health

Daniel X. Le
National Institutes of Health

Joshua L. Jacobs
University of Hawaii at Manoa

Tehseen F. Sabir
National Institutes of Health

Jongwoo Kim
National Institutes of Health

Lewis E. Berman
National Institutes of Health

Loc Q. Tran
National Institutes of Health

Susanne M. Humphrey
National Institutes of Health