Joan Bachenko
Bell Labs
Publications
Featured research published by Joan Bachenko.
Meeting of the Association for Computational Linguistics | 1986
Joan Bachenko; Eileen Fitzpatrick; C. E. Wright
While various aspects of syntactic structure have been shown to bear on the determination of phrase-level prosody, the text-to-speech field has lacked a robust working system to test the possible relations between syntax and prosody. We describe an implemented system which uses the deterministic parser Fidditch to create the input for a set of prosody rules. The prosody rules generate a prosody tree that specifies the location and relative strength of prosodic phrase boundaries. These specifications are converted to annotations for the Bell Labs text-to-speech system that dictate modulations in pitch and duration for the input sentence. We discuss the results of an experiment to determine the performance of our system. We are encouraged by an initial 5 percent error rate, and we believe the design of the parser and the modularity of the system will allow changes that improve this rate.
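The pipeline the abstract describes (parse, derive prosodic boundaries, emit synthesizer annotations) can be illustrated with a toy sketch. The single rule used here, placing a boundary before noun, prepositional, and adjective phrases with strength taken from bracket depth, and the `%N` annotation format, are invented simplifications for illustration, not the paper's actual rule set or markup:

```python
# Toy illustration: derive prosodic phrase boundaries from a bracketed
# parse and render them as inline break annotations. The rule (boundary
# before NP/PP/ADJP, strength = bracket depth) is a made-up placeholder.

def boundaries(parse):
    """Return (word, strength) pairs; strength 0 means no boundary."""
    out, depth, i = [], 0, 0
    tokens = parse.replace("(", " ( ").replace(")", " ) ").split()
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            depth += 1
            label = tokens[i + 1]
            # boundaries are keyed to phrase-level constituents,
            # not to clauses or verb phrases (per the abstract)
            if label in ("NP", "PP", "ADJP") and out:
                out[-1] = (out[-1][0], depth)
            i += 2
        elif tok == ")":
            depth -= 1
            i += 1
        else:
            out.append((tok, 0))
            i += 1
    return out

def annotate(parse):
    """Render boundaries as hypothetical '%N' break markers."""
    return " ".join(word + (f" %{s}" if s else "")
                    for word, s in boundaries(parse))
```

For example, `annotate("(S (NP the cat) (VP sat (PP on (NP the mat))))")` yields `"the cat sat %3 on %4 the mat"`: breaks appear before the prepositional and noun phrases but not before the verb phrase.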
North American Chapter of the Association for Computational Linguistics | 2001
Sergey V. Pakhomov; Michael Schonwetter; Joan Bachenko
In automatic speech recognition (ASR)-enabled applications for medical dictations, corpora of literal transcriptions of speech are critical for training both speaker independent and speaker adapted acoustic models. Obtaining these transcriptions is both costly and time consuming. Non-literal transcriptions, on the other hand, are easy to obtain because they are generated in the normal course of a medical transcription operation. This paper presents a method of automatically generating texts that can take the place of literal transcriptions for training acoustic and language models. ATRS is an automatic transcription reconstruction system that can produce near-literal transcriptions with almost no human labor. We will show that (i) adapted acoustic models trained on ATRS data perform as well as or better than adapted acoustic models trained on literal transcriptions (as measured by recognition accuracy) and (ii) language models trained on ATRS data have lower perplexity than language models trained on non-literal data.
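The core idea, recovering a near-literal transcript by combining a clean, non-literal transcript with a recognizer hypothesis, can be sketched with a simple word-level alignment. This `difflib`-based heuristic (trust the transcript's wording, but splice back words only the recognizer heard, such as dropped disfluencies) is an illustration of the idea, not the actual ATRS method:

```python
# Sketch: reconstruct a near-literal transcript by aligning the edited,
# non-literal transcript against a recognizer hypothesis. This is an
# illustrative heuristic, not the ATRS algorithm itself.
from difflib import SequenceMatcher

def reconstruct(non_literal, hypothesis):
    a, b = non_literal.split(), hypothesis.split()
    out = []
    for op, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if op in ("equal", "delete", "replace"):
            out.extend(a[i1:i2])   # trust the edited transcript's wording
        if op == "insert":
            out.extend(b[j1:j2])   # restore words only the recognizer heard
    return " ".join(out)
```

Given the (invented) transcript `"patient presents with dyspnea"` and hypothesis `"patient uh presents with dis nia"`, the sketch restores the dropped `"uh"` while keeping the transcript's correct wording where the recognizer erred, producing `"patient uh presents with dyspnea"`.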
Archive | 2010
Eileen Fitzpatrick; Joan Bachenko
Experimental laboratory studies, often conducted with college student subjects, have proposed several linguistic phenomena as indicative of speaker deception. We have identified a subset of these phenomena that can be formalized as a linguistic model. The model incorporates three classes of language-based deception cues: (1) linguistic devices used to avoid making a direct statement of fact, for example, hedges; (2) preference for negative expressions in word choice, syntactic structure, and semantics; (3) inconsistencies with respect to verb and noun forms, for example, verb tense changes. The question our research addresses is whether the cues we have adapted from laboratory studies will recognize deception in real-world statements by suspects and witnesses. The issue addressed here is how to test the accuracy of these linguistic cues with respect to identifying deception. To perform the test, we assembled a corpus of criminal statements, police interrogations, and civil testimony that we annotated in two distinct ways, first for language-based deception cues and second for verification of the claims made in the narrative data. The paper discusses the possible methods for building a corpus to test the deception cue hypothesis, the linguistic phenomena associated with deception, and the issues involved in assembling a forensic corpus.
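The three cue classes can be pictured as a simple lexical scan over a statement. The tiny cue lexicons and the crude tense-shift check below are illustrative samples chosen for the example, not the inventory or the annotation scheme used in the study:

```python
# Toy cue scanner for the three classes described above. The cue lists
# are small invented samples, not the study's actual cue inventory.
import re

CUES = {
    "hedge":    {"maybe", "possibly", "i think", "i believe", "kind of"},
    "negative": {"never", "no", "not", "nothing", "denied"},
}

def tag_cues(statement):
    """Return a count of each cue class found in the statement."""
    text = statement.lower()
    counts = {cls: 0 for cls in CUES}
    for cls, terms in CUES.items():
        for term in terms:
            counts[cls] += len(re.findall(r"\b" + re.escape(term) + r"\b", text))
    # class 3 (verb/noun inconsistency), crudely approximated here as the
    # co-occurrence of past- and present-tense forms of 'be'
    counts["tense_shift"] = int(bool(re.search(r"\bwas\b", text))
                                and bool(re.search(r"\bis\b", text)))
    return counts
```

On the invented statement "I think he was never there. Maybe he is lying." the scanner reports two hedges, one negative expression, and a tense shift; a real system would of course need syntactic analysis rather than bare pattern matching.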
Conference on Applied Natural Language Processing | 1992
Joan Bachenko; Jeffrey Daugherty; Eileen Fitzpatrick
In this paper, we concern ourselves with an application of text-to-speech for speech-impaired, deaf, and hard of hearing people. The application is unusual because it requires real-time synthesis of unedited, spontaneously generated conversational texts transmitted via a Telecommunications Device for the Deaf (TDD). We describe a parser that we have implemented as a front end for a version of the Bell Laboratories text-to-speech synthesizer (Olive and Liberman 1985). The parser prepares TDD texts for synthesis by (a) performing lexical regularization of abbreviations and some non-standard forms, and (b) identifying prosodic phrase boundaries. Rules for identifying phrase boundaries are derived from the prosodic phrase grammar described in Bachenko and Fitzpatrick (1990). Following the parent analysis, these rules use a mix of syntactic and phonological factors to identify phrase boundaries but, unlike the parent system, they forgo building any hierarchical structure in order to bypass the need for a stacking mechanism; this permits the system to operate in near real time. As a component of the text-to-speech system, the parser has undergone rigorous testing during a successful three-month field trial at an AT&T telecommunications center in California. In addition, laboratory evaluations indicate that the parser's performance compares favorably with human judgments about phrasing.
Journal of the Acoustical Society of America | 1987
Joan Bachenko; Eileen Fitzpatrick; John Lacy
While text‐to‐speech systems tend to perform well on word pronunciation, they fall short when it comes to providing good prosody for complete sentences. An experimental text‐to‐speech system that uses a natural language parser and prosody rules to determine prosodic phrasing for English input to text‐to‐speech synthesis will be described. Building on information from the syntax tree, the prosody rules specify the location and relative strength of prosodic phrase boundaries; these specifications are then used to dictate modulations in pitch and timing for the Olive‐Liberman synthesizer [J. P. Olive and M. Y. Liberman, J. Acoust. Soc. Am. Suppl. 1 78, S6 (1985)]. Two important assumptions motivate the prosody rules. First, constituency below the level of the sentence is the crucial determinant for boundary location; i.e., boundary location is influenced by noun, prepositional, and adjective phrases, but not by clauses or verb phrases. Second, the relative strength of boundaries is determined by balancing pr...
Natural Language Engineering | 1995
Joan Bachenko; Eileen Fitzpatrick; Jeffrey Daugherty
Text-to-speech systems are currently designed to work on complete sentences and paragraphs, thereby allowing front end processors access to large amounts of linguistic context. Problems with this design arise when applications require text to be synthesized in near real time, as it is being typed. How does the system decide which incoming words should be collected and synthesized as a group when prior and subsequent word groups are unknown? We describe a rule-based parser that uses a three-cell buffer and phrasing rules to identify break points for incoming text. Words up to the break point are synthesized as new text is moved into the buffer; no hierarchical structure is built beyond the lexical level. The parser was developed for use in a system that synthesizes written telecommunications by Deaf and hard of hearing people. These are texts written entirely in upper case, with little or no punctuation, and using a nonstandard variety of English (e.g. WHEN DO I WILL CALL BACK YOU ). The parser performed well in a three-month field trial utilizing tens of thousands of texts. Laboratory tests indicate that the parser exhibited a low error rate when compared with a human reader.
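The buffered, near-real-time phrasing idea can be sketched as a generator: words stream into a three-cell buffer, and when the buffer fills, a rule decides where to flush. The single break rule below (break before a preposition or conjunction from a small invented list) stands in for the paper's phrasing rules, which it does not reproduce:

```python
# Sketch of streaming phrasing with a three-cell buffer. BREAK_BEFORE
# is an invented placeholder for the paper's phrasing rules.

BREAK_BEFORE = {"ON", "IN", "AT", "AND", "BUT", "WHEN", "TO"}

def phrase_stream(words):
    """Yield word groups ready for synthesis as text streams in."""
    buf = []
    for word in words:
        buf.append(word)
        if len(buf) == 3:                  # buffer full: look for a break
            for i in (2, 1):               # prefer the latest break point
                if buf[i] in BREAK_BEFORE:
                    yield buf[:i]
                    buf = buf[i:]
                    break
            else:
                yield buf                  # no break point: flush the buffer
                buf = []
    if buf:
        yield buf                          # end of input: flush the remainder
```

On the invented input `"I WILL CALL YOU WHEN I GET HOME"` the sketch emits the groups `I WILL CALL / YOU / WHEN I GET / HOME`, each synthesizable as soon as it is flushed, without waiting for the full sentence.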
Journal of the Acoustical Society of America | 1993
Joan Bachenko; William A. Gale
Studies of interstress intervals tend to be more suggestive than conclusive because they rely on relatively few speech samples. The study reported here is based on observations taken from 32 000 intervals in the read speech of 106 speakers. A phone recognizer was used to label the onset times of each phone; intervals were identified as the span between stressed vowel onsets and each interval was classed according to its structure (the number of consonants and reduced vowels it contained) as well as duration. The data showed strong regularities in the dependence of time on classification. A model mixing duration, interval structure, and prior probabilities was then constructed and tested on phone lattices; the lattices were generated by the phone recognizer for speech from the Resource Management task. When durations were fixed but interval structure varied, prior probabilities pruned incorrect answers significantly better than chance; the mixed model’s improvement was inconclusive. However, when both dura...
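The bookkeeping the abstract describes, classing each interval by its structure and estimating class priors and mean durations, can be pictured in a few lines. The data tuples below are invented for illustration; the paper's model additionally mixes these statistics with duration distributions on phone lattices:

```python
# Toy version of the interval classification above: each interstress
# interval is classed by (consonant count, reduced-vowel count), and
# per-class priors and mean durations are estimated. Sample data invented.
from collections import Counter, defaultdict

def interval_stats(intervals):
    """intervals: (n_consonants, n_reduced_vowels, duration_ms) tuples.
    Returns {class: (prior probability, mean duration)}."""
    counts, total_dur = Counter(), defaultdict(float)
    for n_c, n_v, dur in intervals:
        cls = (n_c, n_v)
        counts[cls] += 1
        total_dur[cls] += dur
    n = sum(counts.values())
    return {cls: (counts[cls] / n, total_dur[cls] / counts[cls])
            for cls in counts}
```

With the invented sample `[(2, 0, 180), (2, 0, 200), (3, 1, 260), (3, 1, 240)]`, each structural class gets prior 0.5 and mean durations of 190 ms and 250 ms, respectively; such priors are what the study used to prune incorrect lattice paths.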
Proceedings of the Second Workshop on Computational Approaches to Deception Detection | 2016
Eileen Fitzpatrick; Joan Bachenko; Linguistech LLC
In this paper we present an initial experiment in the estimation of the amenability of new domains to true/false classification. We choose four domains, two of which have been classified for deception, and use the out-of-rank distance measure on n-grams to aid in deciding whether the third and fourth domains are amenable to T/F classification. We then use a classifier covered in the literature to train on the verified domains and test on the new domains to determine whether the relative distance measure can be a predictor of classification accuracy.
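A rank-displacement distance on n-gram profiles can be sketched as follows, in the style of the classic out-of-place measure: each domain is reduced to its most frequent n-grams ranked by frequency, and the distance sums rank displacements, with a maximum penalty for n-grams missing from the other profile. The parameters (character trigrams, 300-item profiles) are common defaults assumed for the example, not necessarily the paper's settings:

```python
# Sketch of an out-of-place (rank-order) distance on n-gram profiles.
# Profile size and n-gram order are assumed defaults for illustration.
from collections import Counter

def profile(text, n=3, top=300):
    """Rank the `top` most frequent character n-grams of `text`."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return {g: rank for rank, (g, _) in enumerate(grams.most_common(top))}

def out_of_place(p, q, max_penalty=300):
    """Sum of rank displacements between profiles p and q."""
    return sum(abs(rank - q.get(gram, max_penalty))
               for gram, rank in p.items())
```

A profile's distance to itself is 0, and disjoint profiles incur the maximum penalty per n-gram; in the paper's setting, smaller distances between a new domain and a verified one would suggest the new domain is more amenable to the trained classifier.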
Computational Linguistics | 1990
Joan Bachenko; Eileen Fitzpatrick
Journal of the Acoustical Society of America | 1998
Colin W. Wightman; Joan Bachenko