
Publication


Featured research published by Julia Hirschberg.


Computational Linguistics | 1993

Empirical studies on the disambiguation of cue phrases

Julia Hirschberg; Diane J. Litman

Cue phrases are linguistic expressions such as now and well that function as explicit indicators of the structure of a discourse. For example, now may signal the beginning of a subtopic or a return to a previous topic, while well may mark subsequent material as a response to prior material, or as an explanatory comment. However, while cue phrases may convey discourse structure, each also has one or more alternate uses. While incidentally may be used sententially as an adverbial, for example, the discourse use initiates a digression. Although distinguishing discourse and sentential uses of cue phrases is critical to the interpretation and generation of discourse, the question of how speakers and hearers accomplish this disambiguation is rarely addressed.

This paper reports results of empirical studies on discourse and sentential uses of cue phrases, in which both text-based and prosodic features were examined for disambiguating power. Based on these studies, it is proposed that discourse versus sentential usage may be distinguished by intonational features, specifically, pitch accent and prosodic phrasing. A prosodic model that characterizes these distinctions is identified. This model is associated with features identifiable from text analysis, including orthography and part of speech, to permit the application of the results of the prosodic analysis to the generation of appropriate intonational features for discourse and sentential uses of cue phrases in synthetic speech.


Archive | 1997

Progress in speech synthesis

Jan P. H. van Santen; Joseph P. Olive; Richard Sproat; Julia Hirschberg

1. Section Introduction: Signal Processing and Source Modelling
2. Synthesizing Allophonic Glottalization
3. Text-to-Speech Synthesis with Dynamic Control of Speech
4. Modification of the Aperiodic Component of Speech Signals for Synthesis
5. On the Use of a Sinusoidal Model for Speech Synthesis in Text-to-Speech
6. Section Introduction: The Analysis of Text in Text-to-Speech Synthesis
7. Language-Independent Data-Oriented Grapheme-to-Phoneme Conversion
8. All-Prosodic Speech Synthesis
9. A Model of Timing for Non-Segmental Phonological Structure
10. A Complete Linguistic Analysis for an Italian Text-to-Speech System
11. Discourse Structural Constraints on Accent in Narrative
12. Homograph Disambiguation in Text-to-Speech Synthesis
13. Section Introduction: Talking Heads in Speech Synthesis
14. Section Introduction: Articulatory Synthesis and Visual Speech: Bridging the Gap Between Speech Science and Speech Applications
15. Speech Models and Speech Synthesis
16. A 3D Model of the Lips and of the Jaw for Visual Speech Synthesis
17. A Framework for Synthesis of Segments Based on Pseudo-Articulatory Parameters
18. Biomechanical and Physiologically Based Speech


Meeting of the Association for Computational Linguistics | 2004

Identifying Agreement and Disagreement in Conversational Speech: Use of Bayesian Networks to Model Pragmatic Dependencies

Michel Galley; Kathleen R. McKeown; Julia Hirschberg; Elizabeth Shriberg

We describe a statistical approach for modeling agreements and disagreements in conversational interaction. Our approach first identifies adjacency pairs using maximum entropy ranking based on a set of lexical, durational, and structural features that look both forward and backward in the discourse. We then classify utterances as agreement or disagreement using these adjacency pairs and features that represent various pragmatic influences of previous agreement or disagreement on the current utterance. Our approach achieves 86.9% accuracy, a 4.9% increase over previous work.
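The core pragmatic idea above can be illustrated with a minimal sketch: condition the agreement/disagreement decision on lexical cues in the current utterance and, absent such evidence, on the label of the previous utterance in the adjacency pair. The cue-word lists and the fallback rule are invented for illustration; they stand in for the paper's maximum-entropy ranking and Bayesian-network model.

```python
# Toy sketch of agreement/disagreement classification with a pragmatic
# dependency on the previous utterance. Cue words are illustrative only.
AGREE_CUES = {"yeah", "right", "exactly", "true"}
DISAGREE_CUES = {"no", "but", "actually", "disagree"}

def classify(utterance, previous_label):
    """Label an utterance as "agreement" or "disagreement"."""
    words = set(utterance.lower().split())
    if words & DISAGREE_CUES:
        return "disagreement"
    if words & AGREE_CUES:
        return "agreement"
    # No lexical evidence: fall back on the pragmatic tendency for
    # speakers to continue the stance of the prior utterance.
    return previous_label

print(classify("yeah exactly", "disagreement"))  # agreement
print(classify("well I see", "disagreement"))    # disagreement (carried over)
```

The real model replaces the hand-written rules with learned feature weights, but the structure is the same: current-utterance evidence plus dependencies on prior labels.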


Computer Speech & Language | 1992

Automatic classification of intonational phrase boundaries

Michelle Wang; Julia Hirschberg

The relationship between the intonational characteristics of an utterance and other features inferable from its text represents an important source of information both for speech recognition, to constrain the set of allowable hypotheses, and for speech synthesis, to assign intonational features appropriately from text. This work investigates the usefulness of a number of textual features and additional intonational features in predicting the location of one particular intonational feature—intonational phrase boundaries—in natural speech. The corpus for this investigation is 298 utterances from the 774 in the DARPA-collected Air Travel Information Service (ATIS) database. For statistical modeling, we employ classification and regression tree (CART) techniques. We achieve success rates of just over 90%, representing a major improvement over previous attempts at boundary prediction for spontaneous speech.
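A CART classifier of the kind used above amounts to a learned sequence of threshold tests over features. The following sketch hand-writes one such decision rule for intuition; the features and thresholds are invented, not taken from the paper's ATIS experiments, where the tree structure was induced automatically from data.

```python
# Toy sketch of a CART-style decision rule for predicting intonational
# phrase boundaries from textual features. Thresholds are invented.
def predict_boundary(words_since_last_boundary, next_is_function_word):
    """Return True if an intonational phrase boundary is predicted here."""
    if words_since_last_boundary > 5:        # long stretch since last boundary
        # Boundaries rarely occur directly before function words.
        return not next_is_function_word
    return False

print(predict_boundary(7, False))  # True: long stretch, content word follows
print(predict_boundary(2, False))  # False: too soon after last boundary
```

An induced CART tree is just a deeper, data-driven version of this nesting, with split variables and thresholds chosen to minimize classification error.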


Journal of the Acoustical Society of America | 1994

A CORPUS-BASED STUDY OF REPAIR CUES IN SPONTANEOUS SPEECH

Christine H. Nakatani; Julia Hirschberg

The occurrence of disfluencies in fully natural speech poses difficult challenges for spoken language understanding systems. For example, although self-repairs occur in about 10% of spontaneous utterances, they are often unmodeled in speech recognition systems. This is partly due to the fact that little is known about the extent to which cues in the speech signal may facilitate automatic repair processing. In this paper, acoustic and prosodic cues to self-repairs are identified, based on an analysis of a corpus taken from the ARPA Air Travel Information System database, and methods are proposed for exploiting these cues for repair detection, especially the task of modeling word fragments, and repair correction. The relative contributions of these speech-based cues, as well as other text-based repair cues, are examined in a statistical model of repair site detection that achieves a precision rate of 91% and recall of 86% on a prosodically labeled corpus of repair utterances.
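The evaluation metrics reported above, precision and recall, are computed from detection counts as follows; the counts in the example are toy numbers, not the paper's actual confusion matrix.

```python
# How the precision and recall figures for repair detection are computed.
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision: fraction of detections that are real repairs.
    Recall: fraction of real repairs that are detected."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Toy counts: 91 correctly detected repairs, 9 false alarms, 14 missed.
p, r = precision_recall(91, 9, 14)
print(round(p, 2), round(r, 2))
```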


Meeting of the Association for Computational Linguistics | 1996

A Prosodic Analysis of Discourse Segments in Direction-Giving Monologues

Julia Hirschberg; Christine H. Nakatani

This paper reports on corpus-based research into the relationship between intonational variation and discourse structure. We examine the effects of speaking style (read versus spontaneous) and of discourse segmentation method (text-alone versus text-and-speech) on the nature of this relationship. We also compare the acoustic-prosodic features of initial, medial, and final utterances in a discourse segment.


Science | 2015

Advances in natural language processing.

Julia Hirschberg; Christopher D. Manning

Natural language processing employs computational techniques for the purpose of learning, understanding, and producing human language content. Early computational approaches to language research focused on automating the analysis of the linguistic structure of language and developing basic technologies such as machine translation, speech recognition, and speech synthesis. Today’s researchers refine and make use of such tools in real-world applications, creating spoken dialogue systems and speech-to-speech translation engines, mining social media for information about health or finance, and identifying sentiment and emotion toward products and services. We describe successes and challenges in this rapidly advancing area.


Speech Communication | 2002

Communication and prosody: functional aspects of prosody

Julia Hirschberg

Interest in the contribution prosodic information makes to human communication has led to increasing expectations that such information could be of use in text-to-speech and speech understanding systems, and in application of these technologies to spoken dialogue systems. To date, research results far exceed their technology applications. This paper suggests some areas in which progress has been made, and some in which more might be made, with particular emphasis upon text-to-speech synthesis and spoken dialogue systems.


Conference of the International Speech Communication Association | 2011

Measuring acoustic-prosodic entrainment with respect to multiple levels and dimensions

Rivka Levitan; Julia Hirschberg

In conversation, speakers become more like each other in various dimensions. This phenomenon, commonly called entrainment, coordination, or alignment, is widely believed to be crucial to the success and naturalness of human interactions. We investigate entrainment in four acoustic and prosodic dimensions. We explore whether speakers coordinate with each other in these dimensions over the conversation as a whole as well as on a turn-by-turn basis and in both relative and absolute terms, and whether this coordination improves over the course of the conversation.
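One simple session-level measure of the coordination described above is proximity: how close the two speakers' mean values of an acoustic-prosodic feature are over the whole conversation. The sketch below assumes per-turn feature values have already been extracted; the numbers are invented, and real work would compute pitch, intensity, and other features from the audio.

```python
# Sketch of a session-level proximity measure of entrainment: the negated
# absolute difference of the two speakers' mean feature values. Values
# closer to zero indicate greater similarity on that feature.
def session_proximity(speaker_a_values, speaker_b_values):
    """Negated absolute difference of the speakers' session means."""
    mean_a = sum(speaker_a_values) / len(speaker_a_values)
    mean_b = sum(speaker_b_values) / len(speaker_b_values)
    return -abs(mean_a - mean_b)

# Hypothetical mean pitch (Hz) per turn for two speakers:
a = [210.0, 205.0, 200.0]
b = [198.0, 202.0, 203.0]
print(session_proximity(a, b))  # -abs(205.0 - 201.0) = -4.0
```

Turn-by-turn variants apply the same comparison to adjacent turns rather than session means, which is how coordination can be examined both globally and locally.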


Human Factors in Computing Systems | 2002

SCANMail: a voicemail interface that makes speech browsable, readable and searchable

Steve Whittaker; Julia Hirschberg; Brian Amento; Litza A. Stark; Michiel Bacchiani; Philip L. Isenhour; Larry Stead; Gary Zamchick; Aaron E. Rosenberg

Increasing amounts of public, corporate, and private speech data are now available on-line. These are limited in their usefulness, however, by the lack of tools to permit their browsing and search. The goal of our research is to provide tools to overcome the inherent difficulties of speech access, by supporting visual scanning, search, and information extraction. We describe a novel principle for the design of UIs to speech data: What You See Is Almost What You Hear (WYSIAWYH). In WYSIAWYH, automatic speech recognition (ASR) generates a transcript of the speech data. The transcript is then used as a visual analogue to that underlying data. A graphical user interface allows users to visually scan, read, annotate and search these transcripts. Users can also use the transcript to access and play specific regions of the underlying message. We first summarize previous studies of voicemail usage that motivated the WYSIAWYH principle, and describe a voicemail UI, SCANMail, that embodies WYSIAWYH. We report on a laboratory experiment and a two-month field trial evaluation. SCANMail outperformed a state-of-the-art voicemail system on core voicemail tasks. This was attributable to SCANMail's support for visual scanning, search and information extraction. While the ASR transcripts contain errors, they nevertheless improve the efficiency of voicemail processing. Transcripts either provide enough information for users to extract key points or to navigate to important regions of the underlying speech, which they can then play directly.

Collaboration


Julia Hirschberg's top collaborators.

Top Co-Authors

Agustín Gravano

University of Buenos Aires
