Publication


Featured research published by Ron Hoory.


international acm sigir conference on research and development in information retrieval | 2006

Spoken document retrieval from call-center conversations

Jonathan Mamou; David Carmel; Ron Hoory

We are interested in retrieving information from conversational speech corpora, such as call-center data. This data comprises spontaneous speech conversations with low recording quality, which makes automatic speech recognition (ASR) a highly difficult task. For typical call-center data, even state-of-the-art large vocabulary continuous speech recognition systems produce a transcript with a word error rate of 30% or higher. In addition to the output transcript, advanced systems provide word confusion networks (WCNs), a compact representation of word lattices associating each word hypothesis with its posterior probability. Our work exploits the information provided by WCNs in order to improve retrieval performance. In this paper, we show that the mean average precision (MAP) is improved using WCNs compared to the raw word transcripts. Finally, we analyze the effect of increasing ASR word error rate on search effectiveness. We show that MAP remains reasonable even under extremely high error rates.
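
As a rough illustration of the idea (not the authors' implementation), the sketch below indexes WCN word hypotheses with posterior-weighted soft counts instead of binary term occurrences; the data layout and the tf-idf style scoring are assumptions made for the example.

```python
# Minimal sketch: posterior-weighted indexing over word confusion network (WCN)
# hypotheses. Each WCN slot holds (word, posterior) pairs; term weights
# accumulate posteriors instead of raw 0/1 counts.
from collections import defaultdict
import math

def index_wcn(wcn_slots):
    """Build posterior-weighted term frequencies from one document's WCN."""
    tf = defaultdict(float)
    for hypotheses in wcn_slots:          # one slot = competing word hypotheses
        for word, posterior in hypotheses:
            tf[word] += posterior         # soft count instead of 0/1
    return tf

def score(query_terms, doc_tf, doc_freq, num_docs):
    """Simple tf-idf style score over the soft counts (illustrative only)."""
    s = 0.0
    for term in query_terms:
        if term in doc_tf:
            idf = math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1))
            s += doc_tf[term] * idf
    return s

# Toy example: two-slot WCN for one call-center snippet.
wcn = [[("refund", 0.7), ("refined", 0.3)], [("policy", 0.9), ("please", 0.1)]]
doc_tf = index_wcn(wcn)
print(score(["refund", "policy"], doc_tf, doc_freq={"refund": 1, "policy": 1}, num_docs=1))
```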


conference on information and knowledge management | 2005

Automatic analysis of call-center conversations

Gilad Mishne; David Carmel; Ron Hoory; Alexey Roytman; Aya Soffer

We describe a system for automating call-center analysis and monitoring. Our system integrates transcription of incoming calls with analysis of their content; for the analysis, we introduce a novel method of estimating the domain-specific importance of conversation fragments, based on divergence of corpus statistics. Combining this method with Information Retrieval approaches, we provide knowledge-mining tools both for the call-center agents and for administrators of the center.
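
A minimal sketch of one way the divergence idea could look, assuming a simple log-likelihood ratio between domain and background unigram statistics; the function name and probability tables are hypothetical, not the paper's method.

```python
# Minimal sketch: a conversation fragment is "important" when its terms are much
# more frequent in the call-center domain corpus than in a background corpus.
import math

def importance(fragment_terms, domain_freq, background_freq):
    """Average log-ratio of domain vs. background unigram probabilities."""
    eps = 1e-9
    scores = []
    for term in fragment_terms:
        p_domain = domain_freq.get(term, eps)
        p_background = background_freq.get(term, eps)
        scores.append(math.log(p_domain / p_background))
    return sum(scores) / max(len(scores), 1)

domain = {"invoice": 0.01, "refund": 0.008, "the": 0.05}
background = {"invoice": 0.0005, "refund": 0.0004, "the": 0.05}
print(importance(["invoice", "refund", "the"], domain, background))
```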


international conference on acoustics, speech, and signal processing | 2000

Speech reconstruction from mel frequency cepstral coefficients and pitch frequency

Dan Chazan; Ron Hoory; Gilad Cohen; Meir Zibulski

This paper presents a novel low-complexity, frequency-domain algorithm for reconstruction of speech from the mel-frequency cepstral coefficients (MFCC), commonly used by speech recognition systems, and the pitch frequency values. The reconstruction technique is based on the sinusoidal speech representation. A set of sine-wave frequencies is derived using the pitch frequency and voicing decisions, and synthetic phases are then assigned to each respective sine wave. The sine-wave amplitudes are generated by sampling a linear combination of frequency domain basis functions. The basis function gains are determined such that the mel-frequency binned spectrum of the reconstructed speech is similar to the mel-frequency binned spectrum obtained from the original MFCC vector by IDCT and antilog operations. Natural-sounding, good-quality, intelligible speech is obtained by this procedure.
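
The IDCT-and-antilog step mentioned above can be sketched as follows; the number of mel bins and the scaling are assumptions, not the paper's exact configuration.

```python
# Minimal sketch: invert truncated-DCT MFCCs back to a (smoothed) mel-binned
# magnitude spectrum via inverse DCT followed by an antilog.
import numpy as np
from scipy.fftpack import idct

def mfcc_to_mel_spectrum(mfcc, num_mel_bins=24):
    """Recover an approximate mel-binned spectrum from an MFCC vector."""
    padded = np.zeros(num_mel_bins)
    padded[: len(mfcc)] = mfcc            # zero-pad the cepstral vector
    log_mel = idct(padded, type=2, norm="ortho")
    return np.exp(log_mel)                # antilog -> linear mel-band energies

# Toy example with a 13-dimensional MFCC vector.
mel_spectrum = mfcc_to_mel_spectrum(np.random.randn(13))
print(mel_spectrum.shape)                 # (24,)
```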


Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring | 2015

Automatic speech analysis for the assessment of patients with predementia and Alzheimer's disease

Alexandra König; Aharon Satt; Alexander Sorin; Ron Hoory; Orith Toledo-Ronen; Alexandre Derreumaux; Valeria Manera; Frans R.J. Verhey; Pauline Aalten; P. H. Robert; Renaud David

To evaluate the usefulness of automatic speech analysis for the assessment of mild cognitive impairment (MCI) and early-stage Alzheimer's disease (AD).


international conference on acoustics, speech, and signal processing | 2013

F0 contour prediction with a deep belief network-Gaussian process hybrid model

Raul Fernandez; Asaf Rendel; Bhuvana Ramabhadran; Ron Hoory

In this work we look at using non-parametric, exemplar-based regression for the prediction of prosodic contour targets from textual features in a speech synthesis system. We investigate the performance of Gaussian Process regression on this task when the covariance kernel operates on a variety of input feature spaces. In particular, we consider non-linear features extracted via Deep Belief Networks. We motivate the use of this hybrid model by considering the initial deep-layer model as a feature extractor that can summarize high-level structure from the raw inputs to improve the regression of an exemplar-based model in the second part of the approach. By looking at both objective metrics and perceptual listening tests, we evaluate these proposals against each other, and against the standard clustering-tree techniques implemented in parametric synthesis for the prediction of prosodic targets.
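
A minimal sketch of the two-stage idea, with a single RBM standing in for the deep belief network as the feature extractor and a Gaussian Process regressor on top; all dimensions and hyperparameters below are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch: learn non-linear features from the raw linguistic inputs,
# then regress F0 contour targets on them with a Gaussian Process.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.RandomState(0)
X = rng.rand(200, 50)                 # toy textual/linguistic input features
y = rng.randn(200)                    # toy per-unit F0 targets

rbm = BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=20, random_state=0)
H = rbm.fit_transform(X)              # deep-layer features summarizing the input

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
gp.fit(H, y)
print(gp.predict(rbm.transform(X[:3])))
```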


international conference on acoustics, speech, and signal processing | 2004

The ETSI extended distributed speech recognition (DSR) standards: client side processing and tonal language recognition evaluation

Alexander Sorin; Tenkasi V. Ramabadran; Dan Chazan; Ron Hoory; Michael J. McLaughlin; David Pearce; Fan Cr Wang; Yaxin Zhang

We present work that has been carried out in developing the ETSI extended DSR standards ES 202 211 and ES 202 212 (2003). These standards extend the previous ETSI DSR standards: basic front-end ES 201 108 and advanced (noise robust) front-end ES 202 050, respectively. The extensions enable enhanced tonal language recognition as well as server-side speech reconstruction capability. The paper discusses the client-side estimation of pitch and voicing class parameters, whereas a companion paper discusses the server-side speech reconstruction. Experimental results show enhancement of tonal language recognition rates of proprietary recognition engines when the standard extensions are used.


international conference on acoustics, speech, and signal processing | 2006

High Quality Sinusoidal Modeling of Wideband Speech for the Purposes of Speech Synthesis and Modification

Dan Chazan; Ron Hoory; Ariel Sagi; Slava Shechtman; Alexander Sorin; Zhiwei Shuang; Raimo Bakis

This paper describes an efficient sinusoidal modeling framework for high quality wideband (WB) speech synthesis and modification. This technique may serve as a basis for speech compression in the context of small-footprint concatenative text-to-speech systems. In addition, it is a useful representation for voice transformation and morphing purposes, e.g., simultaneous pitch modification and spectral envelope warping. Conventional sinusoidal modeling is enhanced with an adaptive frequency dithering mechanism, based on a degree-of-voicing analysis. Considerable reduction of the number of model parameters is achieved by high-band phase extension. The proposed model is evaluated and compared to the alternative STRAIGHT framework [1]. While being simpler and considerably more efficient than STRAIGHT, it outperforms it in speech quality for both speech reconstruction and transformation.
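
As a rough sketch of the underlying sinusoidal representation (omitting the paper's dithering and phase-extension refinements), a voiced frame can be synthesized as a sum of harmonically spaced sinusoids; the amplitudes and phases below are placeholders.

```python
# Minimal sketch: reconstruct one voiced frame as a sum of sine waves at
# multiples of the pitch frequency.
import numpy as np

def synthesize_frame(f0_hz, amplitudes, phases, sample_rate=16000, frame_len=400):
    """Sum of harmonically spaced sinusoids for one voiced frame."""
    t = np.arange(frame_len) / sample_rate
    frame = np.zeros(frame_len)
    for k, (a, phi) in enumerate(zip(amplitudes, phases), start=1):
        frame += a * np.cos(2 * np.pi * k * f0_hz * t + phi)
    return frame

frame = synthesize_frame(120.0, amplitudes=[1.0, 0.5, 0.25], phases=[0.0, 0.3, 0.6])
print(frame.shape)   # (400,)
```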


international joint conference on biometrics | 2014

Multi-modal biometrics for mobile authentication

Hagai Aronowitz; Min Li; Orith Toledo-Ronen; Sivan Harary; Amir B. Geva; Shay Ben-David; Asaf Rendel; Ron Hoory; Nalini K. Ratha; Sharath Pankanti; David Nahamoo

User authentication in the context of a secure transaction needs to be continuously evaluated for the risks associated with the transaction authorization. The situation becomes even more critical when there are regulatory compliance requirements. The need for such systems has grown dramatically with the introduction of smart mobile devices, which make it far easier for the user to complete such transactions quickly, but with a huge exposure to risk. Biometrics can play a very significant role in addressing such problems as a key indicator of the user's identity, thus reducing the risk of fraud. While unimodal biometric authentication systems are increasingly being experimented with by mainstream mobile system manufacturers (e.g., fingerprint in iOS), we explore various opportunities for reducing risk in a multimodal biometrics system. The multimodal system is based on fusion of several biometrics combined with a policy manager. A new biometric modality, chirography, based on user writing on multi-touch screens using their finger, is introduced. Coupled with chirography, we also use two other biometrics: face and voice. Our fusion strategy is based on inter-modality score-level fusion that takes into account a voice quality measure. The proposed system has been evaluated on an in-house database that reflects the latest smart mobile devices. On this database, we demonstrate a highly accurate multi-modal authentication system, reaching an EER of 0.1% in an office environment and an EER of 0.5% in challenging noisy environments.
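
A minimal sketch of score-level fusion with a quality-dependent voice weight, as the abstract suggests; the weights, score ranges, and quality mapping are illustrative assumptions, not the system's actual parameters.

```python
# Minimal sketch: weighted sum of per-modality match scores in [0, 1], with the
# voice weight scaled down when the voice quality measure is low (e.g., noise).
def fuse_scores(face_score, voice_score, chirography_score, voice_quality):
    """Inter-modality score-level fusion with a voice quality measure."""
    w_face, w_chiro = 0.4, 0.3
    w_voice = 0.3 * voice_quality          # down-weight voice in noisy conditions
    total = w_face + w_voice + w_chiro
    return (w_face * face_score + w_voice * voice_score +
            w_chiro * chirography_score) / total

print(fuse_scores(0.92, 0.60, 0.88, voice_quality=0.5))
```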


international conference on acoustics, speech, and signal processing | 2012

Towards automatic phonetic segmentation for TTS

Asaf Rendel; Alexander Sorin; Ron Hoory; Andrew P. Breen

Phonetic segmentation is an important step in the development of a concatenative TTS voice. This paper introduces a segmentation process consisting of two phases. First, forced alignment is performed using an HMM-GMM model. The resulting segmentation is then locally refined using an SVM-based boundary model. Both models are derived from multi-speaker data using a speaker adaptive training procedure. Evaluation results are obtained on the TIMIT corpus and on a proprietary single-speaker TTS corpus.
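
A minimal sketch of the second, refinement phase only: candidate frames near the forced-alignment boundary are rescored with an SVM boundary classifier; the features and the toy classifier below are placeholders, not the paper's models.

```python
# Minimal sketch: locally refine a forced-alignment boundary by picking the
# nearby frame with the highest boundary probability under an SVM.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Stand-in boundary classifier, trained elsewhere on acoustic boundary features.
svm = SVC(probability=True).fit(rng.randn(100, 10), rng.randint(0, 2, 100))

def refine_boundary(initial_frame, frame_features, search_radius=3):
    """Search a small window around the HMM boundary for the best frame."""
    best_frame, best_prob = initial_frame, -1.0
    for f in range(initial_frame - search_radius, initial_frame + search_radius + 1):
        prob = svm.predict_proba(frame_features[f].reshape(1, -1))[0, 1]
        if prob > best_prob:
            best_frame, best_prob = f, prob
    return best_frame

features = rng.randn(50, 10)               # toy per-frame acoustic features
print(refine_boundary(initial_frame=20, frame_features=features))
```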


international conference on acoustics, speech, and signal processing | 2016

Using continuous lexical embeddings to improve symbolic-prosody prediction in a text-to-speech front-end

Asaf Rendel; Raul Fernandez; Ron Hoory; Bhuvana Ramabhadran

The prediction of symbolic prosodic categories from text is an important, but challenging, natural-language processing task given the various ways in which an input can be realized, and the fact that knowledge about what features determine this realization is incomplete or inaccessible to the model. In this work, we look at augmenting baseline features with lexical representations that are derived from text, providing continuous embeddings of the lexicon in a lower-dimensional space. Although learned in an unsupervised fashion, such features capture semantic and syntactic properties that make them amenable for prosody prediction. We deploy various embedding models on prominence- and phrase-break prediction tasks, showing substantial gains, particularly for prominence prediction.
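
A minimal sketch of the feature-augmentation step: baseline front-end features are concatenated with a word's continuous embedding before a prominence classifier; the embedding table, dimensions, and classifier choice are assumptions made for the example.

```python
# Minimal sketch: augment baseline prosody-prediction features with continuous
# lexical embeddings and train a simple prominence classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
embedding_table = {w: rng.randn(50) for w in ["the", "cat", "sat", "mat"]}

def features_for_word(word, baseline_features):
    """Concatenate baseline front-end features with the word's embedding."""
    emb = embedding_table.get(word, np.zeros(50))
    return np.concatenate([baseline_features, emb])

words = ["the", "cat", "sat", "the", "mat"]
baseline = rng.randn(len(words), 12)           # toy POS/position features
X = np.vstack([features_for_word(w, b) for w, b in zip(words, baseline)])
y = np.array([0, 1, 1, 0, 1])                  # prominent / not prominent labels

clf = LogisticRegression(max_iter=200).fit(X, y)
print(clf.predict(X))
```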
