Publication
Featured research published by Jeffrey S. Sorensen.
Journal of the Acoustical Society of America | 2002
Robert E. Donovan; Martin Franz; Salim Roukos; Jeffrey S. Sorensen
In accordance with the present invention, a method for providing generation of speech includes the steps of: providing input to be acoustically produced; comparing the input to training data or application-specific splice files to identify words or word sequences corresponding to the input for constructing a phone sequence; using a search algorithm to identify a segment sequence to construct output speech according to the phone sequence; and concatenating the segments and modifying their characteristics to be substantially equal to the requested characteristics. Application-specific data is advantageously used to make pertinent information available for synthesizing both the phone sequence and the output speech. Also described is a system for performing operations in accordance with the disclosure.
meeting of the association for computational linguistics | 2006
Imed Zitouni; Jeffrey S. Sorensen; Ruhi Sarikaya
Short vowels and other diacritics are not part of written Arabic script. Exceptions are made for important political and religious texts and in texts for beginning students of Arabic. Script without diacritics has considerable ambiguity because many words with different diacritic patterns appear identical in a diacritic-less setting. We propose in this paper a maximum entropy approach for restoring diacritics in a document. The approach can easily integrate and make effective use of diverse types of information; the model we propose integrates a wide array of lexical, segment-based, and part-of-speech tag features. The combination of these feature types leads to a state-of-the-art diacritization model. Using a publicly available corpus (LDC's Arabic Treebank Part 3), we achieve a diacritic error rate of 5.1%, a segment error rate of 8.5%, and a word error rate of 17.3%. In the case-ending-less setting, we obtain a diacritic error rate of 2.2%, a segment error rate of 4.0%, and a word error rate of 7.2%.
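The maximum entropy idea above can be sketched with a toy multinomial logistic scorer: features fire jointly with a candidate diacritic, and the model picks the diacritic with the highest normalized exponential score. The feature set, the weight values, and the three-vowel diacritic inventory below are all illustrative placeholders, not the paper's actual model.

```python
import math

# Hypothetical weights for (feature, diacritic) pairs; a real model
# learns these from an annotated corpus such as the Arabic Treebank.
WEIGHTS = {
    ("char=k", "a"): 1.2,
    ("char=k", "i"): 0.3,
    ("next=t", "a"): 0.8,
    ("next=t", "i"): 0.1,
}

DIACRITICS = ["a", "i", "u"]  # toy short-vowel inventory

def features(word, pos):
    # Simple lexical context features for the consonant at `pos`.
    feats = [f"char={word[pos]}"]
    if pos + 1 < len(word):
        feats.append(f"next={word[pos + 1]}")
    return feats

def predict(word, pos):
    # Maximum entropy rule: p(d | context) ∝ exp(sum of firing weights).
    scores = {
        d: math.exp(sum(WEIGHTS.get((f, d), 0.0) for f in features(word, pos)))
        for d in DIACRITICS
    }
    z = sum(scores.values())
    probs = {d: s / z for d, s in scores.items()}
    return max(probs, key=probs.get), probs

best, probs = predict("ktb", 0)  # restore the diacritic after the first consonant
```

With these toy weights, the "a" diacritic wins because both context features vote for it most strongly; the real model combines many more feature types (segment-based, part-of-speech) in the same log-linear form.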
international conference on acoustics, speech, and signal processing | 2001
Qiang Cheng; Jeffrey S. Sorensen
The technique of embedding a digital signal into an audio recording or image, using methods that render the signal imperceptible, has received significant attention. Embedding an imperceptible, cryptographically secure signal, or watermark, is seen as a potential mechanism that may be used to prove ownership or detect tampering. While a considerable amount of attention has been devoted to spread-spectrum signaling techniques for image and audio watermarking applications, there has been only limited study of embedding data signals in speech. Speech is an uncharacteristically narrow-band signal given the perceptual capabilities of the human hearing system. However, using speech analysis techniques, one may design an effective data signal that can be used to hide an arbitrary message in a speech signal. Also included are experiments demonstrating the subliminal channel capacity of the speech data embedding technique developed here.
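The core spread-spectrum signaling idea can be sketched in a few lines: a key-derived pseudo-random ±1 chip sequence is scaled down and added to the host signal, and the hidden bit is recovered by correlating against the same sequence. This is a generic sketch of spread-spectrum embedding, not the speech-specific design in the paper; the function names and the `alpha` amplitude are assumptions.

```python
import random

def chip_sequence(length, key):
    # Pseudo-random +/-1 spreading sequence derived from a shared key.
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]

def embed(signal, bit, key, alpha=0.01):
    # Add a low-amplitude spread-spectrum watermark carrying one bit (+1/-1).
    chips = chip_sequence(len(signal), key)
    return [s + alpha * bit * c for s, c in zip(signal, chips)]

def detect(signal, key):
    # Correlate with the spreading sequence; the sign of the correlation
    # recovers the bit when the watermark energy dominates the host's
    # incidental correlation with the chip sequence.
    chips = chip_sequence(len(signal), key)
    corr = sum(s * c for s, c in zip(signal, chips))
    return 1 if corr >= 0 else -1
```

Detection is reliable when `alpha * len(signal)` exceeds the host signal's chance correlation with the chips, which is why spread-spectrum watermarks trade payload rate for robustness; the paper's contribution is shaping such a signal to survive in narrow-band speech.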
public key cryptography | 2010
Rosario Gennaro; Carmit Hazay; Jeffrey S. Sorensen
This paper presents an efficient protocol for securely computing the fundamental problem of pattern matching. This problem is defined in the two-party setting, where party P1 holds a pattern and party P2 holds a text. The goal of P1 is to learn where the pattern appears in the text, without revealing it to P2 or learning anything else about P2’s text. Our protocol is the first to address this problem with full security in the face of malicious adversaries. The construction is based on a novel protocol for secure oblivious automata evaluation which is of independent interest. In this problem party P1 holds an automaton and party P2 holds an input string, and they need to decide if the automaton accepts the input, without learning anything else.
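The plaintext analogue of the oblivious automata evaluation can be sketched as follows: P1's pattern compiles into a KMP-style deterministic automaton, and running P2's text through it reports every match position. The secure protocol evaluates this same kind of automaton without either party seeing the other's input; the sketch below omits all of the cryptography and is only the underlying automaton computation.

```python
def build_automaton(pattern):
    # KMP-style DFA: state = number of pattern characters matched so far.
    # delta[state] maps a character to the next state; accepting state = len(pattern).
    m = len(pattern)
    alphabet = set(pattern)
    delta = [dict() for _ in range(m + 1)]
    delta[0][pattern[0]] = 1
    restart = 0  # state the automaton falls back to on a mismatch
    for state in range(1, m + 1):
        for ch in alphabet:
            delta[state][ch] = delta[restart].get(ch, 0)
        if state < m:
            delta[state][pattern[state]] = state + 1
            restart = delta[restart].get(pattern[state], 0)
    return delta, m

def find_matches(pattern, text):
    # Run the text through the automaton; characters outside the
    # pattern's alphabet reset the state to 0.
    delta, accept = build_automaton(pattern)
    state, hits = 0, []
    for i, ch in enumerate(text):
        state = delta[state].get(ch, 0)
        if state == accept:
            hits.append(i - accept + 1)
    return hits
```

Because the accepting state keeps its fallback transitions, overlapping occurrences are found as well, e.g. both positions of "aba" in "ababa". In the secure version, the automaton's transition table is P1's private input and the string is P2's, evaluated state by state under encryption.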
meeting of the association for computational linguistics | 2005
Imed Zitouni; Jeffrey S. Sorensen; Xiaoqiang Luo; Radu Florian
Arabic presents an interesting challenge to natural language processing, being a highly inflected and agglutinative language. In particular, this paper presents an in-depth investigation of the entity detection and recognition (EDR) task for Arabic. We start by highlighting why segmentation is a necessary prerequisite for EDR, continue by presenting a finite-state statistical segmenter, and then examine how the resulting segments can be better incorporated into a mention detection system and an entity recognition system; both systems are statistical, built around the maximum entropy principle. Experiments on a clearly stated partition of the ACE 2004 data show that stem-based features can significantly improve the performance of the EDR system by 2 absolute F-measure points. The system presented here had a competitive performance in the ACE 2004 evaluation.
international conference on acoustics speech and signal processing | 1998
Homayoon S. M. Beigi; Stephane Herman Maes; Jeffrey S. Sorensen
This paper presents a distance measure for evaluating the closeness of two sets of distributions. The problem of finding the distance between two individual distributions has been addressed by many solutions in the literature. To cluster speakers using pre-computed models of their speech, a need arises for computing a distance between these models, which are normally built from a collection of distributions such as Gaussians (e.g., a comparison between two HMMs). The definition of this distance measure creates many possibilities for speaker verification, speaker adaptation, speaker segmentation, and many other related applications. A distance measure is presented for evaluating the closeness of collections of distributions with centralized atoms, such as Gaussians (but not limited to Gaussians). Several applications, including some in speaker recognition, are presented together with results using this distance measure.
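One plausible instantiation of such a set-to-set distance, offered only as an illustration and not necessarily the measure defined in the paper: compare individual diagonal Gaussians by symmetrized KL divergence, then average each component's distance to its nearest counterpart in the other set, in both directions. All function names are hypothetical.

```python
import math

def gauss_kl(mu1, var1, mu2, var2):
    # Closed-form KL divergence between two 1-D Gaussians.
    return math.log(math.sqrt(var2 / var1)) + (var1 + (mu1 - mu2) ** 2) / (2 * var2) - 0.5

def sym_kl(g1, g2):
    # Symmetrized KL between diagonal Gaussians, summed over dimensions.
    # Each Gaussian is a list of (mean, variance) pairs, one per dimension.
    return sum(
        gauss_kl(m1, v1, m2, v2) + gauss_kl(m2, v2, m1, v1)
        for (m1, v1), (m2, v2) in zip(g1, g2)
    )

def set_distance(A, B):
    # Distance between two collections of Gaussians: average, over both
    # directions, of each component's distance to its closest counterpart.
    d_ab = sum(min(sym_kl(a, b) for b in B) for a in A) / len(A)
    d_ba = sum(min(sym_kl(a, b) for a in A) for b in B) / len(B)
    return 0.5 * (d_ab + d_ba)
```

Averaging nearest-neighbour distances in both directions keeps the measure symmetric and zero for identical sets, which is the behaviour one wants when clustering speaker models built from mixtures of Gaussians.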
international conference on acoustics speech and signal processing | 1999
Robert E. Donovan; Martin Franz; Jeffrey S. Sorensen; Salim Roukos
This paper describes a phrase splicing and variable substitution system which offers an intermediate form of automated speech production lying in-between the extremes of recorded utterance playback and full text-to-speech synthesis. The system incorporates a trainable speech synthesiser and an application specific set of pre-recorded phrases. The text to be synthesised is converted to a phone sequence using phone sequences present in the pre-recorded phrases wherever possible, and a pronunciation dictionary elsewhere. The synthesis inventory of the synthesiser is augmented with the synthesis information associated with the pre-recorded phrases used to construct the phone sequence. The synthesiser then performs a dynamic programming search over the augmented inventory to select a segment sequence to produce the output speech. The system enables the seamless splicing of pre-recorded phrases both with other phrases and with synthetic speech. It enables very high quality speech to be produced automatically within a limited domain.
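The phone-sequence construction step can be sketched with a greedy longest-match splicer: prefer the phone sequence of the longest pre-recorded phrase starting at the current word, and fall back to a pronunciation dictionary otherwise. This is a simplified stand-in for the system described above, which additionally carries synthesis information with each phrase and selects final segments by dynamic programming; all names and the sample pronunciations are illustrative.

```python
def text_to_phones(words, phrase_phones, lexicon):
    # Build a phone sequence for `words`, preferring the longest
    # pre-recorded phrase at each position and falling back to the
    # pronunciation lexicon for uncovered words.
    phones, i = [], 0
    while i < len(words):
        for j in range(len(words), i, -1):  # try longest phrase first
            key = tuple(words[i:j])
            if key in phrase_phones:
                phones.extend(phrase_phones[key])
                i = j
                break
        else:  # no pre-recorded phrase starts here
            phones.extend(lexicon[words[i]])
            i += 1
    return phones
```

Splicing at the phone level is what lets pre-recorded material and synthetic fallback share one inventory, so the downstream segment search can join them seamlessly.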
international conference on acoustics, speech, and signal processing | 2007
Ahmad Emami; Kishore Papineni; Jeffrey S. Sorensen
A novel distributed language model that has no constraints on the n-gram order and no practical constraints on vocabulary size is presented. This model is scalable and allows for an arbitrarily large corpus to be queried for statistical estimates. Our distributed model is capable of producing n-gram counts on demand. By using a novel heuristic estimate for the interpolation weights of a linearly interpolated model, it is possible to dynamically compute the language model probabilities. The distributed architecture follows the client-server paradigm and allows for each client to request an arbitrary weighted mixture of the corpus. This allows easy adaptation of the language model to particular test conditions. Experiments using the distributed LM for re-ranking N-best lists of a speech recognition system resulted in considerable improvements in word error rate (WER), while integration with a machine translation decoder resulted in significant improvements in translation quality as measured by the BLEU score.
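The linear interpolation at the heart of the model can be sketched with a toy single-machine bigram/unigram mixture: counts are collected once and probabilities are computed on demand as a weighted sum of the two estimates. The class name and the fixed weight `lam` are illustrative; the paper's system serves counts from distributed servers and sets the interpolation weights with a heuristic estimate.

```python
from collections import Counter

class InterpolatedBigramLM:
    # Linearly interpolated bigram/unigram model over on-demand counts.
    def __init__(self, tokens, lam=0.7):
        self.lam = lam
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.total = len(tokens)

    def prob(self, word, prev):
        # p(word | prev) = lam * p_bigram + (1 - lam) * p_unigram
        uni = self.unigrams[word] / self.total
        if self.unigrams[prev]:
            bi = self.bigrams[(prev, word)] / self.unigrams[prev]
        else:
            bi = 0.0  # unseen history: back off entirely to the unigram
        return self.lam * bi + (1 - self.lam) * uni
```

Because each term is a proper distribution over the vocabulary for a seen history, the mixture sums to one as well; in the distributed setting the same formula is evaluated from counts fetched from an arbitrary weighted mixture of corpus servers.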
Machine Translation | 2002
Yuqing Gao; Bowen Zhou; Zijian Diao; Jeffrey S. Sorensen; Michael Picheny
We present MARS (Multilingual Automatic tRanslation System), a research prototype speech-to-speech translation system. MARS is aimed at two-way conversational spoken language translation between English and Mandarin Chinese for limited domains, such as air travel reservations. In MARS, machine translation is embedded within a complex speech processing task, and the translation performance is highly affected by the performance of other components, such as the recognizer and the semantic parser. All components in the proposed system are statistically trained using an appropriate training corpus. The speech signal is first recognized by an automatic speech recognizer (ASR). Next, the ASR-transcribed text is analyzed by a semantic parser, which uses a statistical decision-tree model that does not require hand-crafted grammars or rules. Furthermore, the parser provides semantic information that helps further re-scoring of the speech recognition hypotheses. The semantic content extracted by the parser is formatted into a language-independent tree structure, which is used for an interlingua-based translation. A maximum-entropy-based sentence-level natural language generation (NLG) approach is used to generate sentences in the target language from the semantic tree representations. Finally, the generated target sentence is synthesized into speech by a speech synthesizer. Many new features and innovations have been incorporated into MARS: the translation is based on understanding the meaning of the sentence; the semantic parser uses a statistical model and is trained from a semantically annotated corpus; the output of the semantic parser is used to select a more specific language model to refine speech recognition performance; and the NLG component uses a statistical model and is also trained from the same annotated corpus.
These features give MARS the advantages of robustness to speech disfluencies and recognition errors, tighter integration of semantic information into speech recognition, and portability to new languages and domains. These advantages are verified by our experimental results.
meeting of the association for computational linguistics | 2005
Hany Hassan; Jeffrey S. Sorensen
Translation of named entities (NEs), such as person names, organization names, and location names, is crucial for cross-lingual information retrieval, machine translation, and many other natural language processing applications. Newly named entities are introduced on a daily basis in newswire, and this greatly complicates the translation task. Also, while some names can be translated, others must be transliterated, and still others are a mix of the two. In this paper we introduce an integrated approach for named entity translation, deploying phrase-based translation, word-based translation, and transliteration modules in a single framework. While Arabic-based, the approach introduced here is a unified approach that can be applied to NE translation for any language pair.