Publication


Featured research published by Alexander Sorin.


IBM Journal of Research and Development | 1996

Automated forms-processing software and services

Sandeep Gopisetty; Raymond A. Lorie; Jianchang Mao; K. Moidin Mohiuddin; Alexander Sorin; Eyal Yair

While document-image systems for the management of collections of documents, such as forms, offer significant productivity improvements, the entry of information from documents remains a labor-intensive and costly task for most organizations. In this paper, we describe a software system for the machine reading of forms data from their scanned images. We describe its major components: form recognition and “dropout,” intelligent character recognition (ICR), and contextual checking. Finally, we describe applications for which our automated forms reader has been successfully used.


Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring | 2015

Automatic speech analysis for the assessment of patients with predementia and Alzheimer's disease

Alexandra König; Aharon Satt; Alexander Sorin; Ron Hoory; Orith Toledo-Ronen; Alexandre Derreumaux; Valeria Manera; Frans R.J. Verhey; Pauline Aalten; P. H. Robert; Renaud David

To evaluate the interest of using automatic speech analyses for the assessment of mild cognitive impairment (MCI) and early‐stage Alzheimer's disease (AD).


International Conference on Acoustics, Speech, and Signal Processing | 2004

The ETSI extended distributed speech recognition (DSR) standards: client side processing and tonal language recognition evaluation

Alexander Sorin; Tenkasi V. Ramabadran; Dan Chazan; Ron Hoory; Michael J. McLaughlin; David Pearce; Fan Cr Wang; Yaxin Zhang

We present work that has been carried out in developing the ETSI extended DSR standards ES 202 211 and ES 202 212 (2003). These standards extend the previous ETSI DSR standards: basic front-end ES 201 108 and advanced (noise robust) front-end ES 202 050 respectively. The extensions enable enhanced tonal language recognition as well as server-side speech reconstruction capability. The paper discusses the client-side estimation of pitch and voicing class parameters whereas a companion paper discusses the server-side speech reconstruction. Experimental results show enhancement of tonal language recognition rates of proprietary recognition engines, when the standard extensions are used.


International Conference on Acoustics, Speech, and Signal Processing | 2006

High Quality Sinusoidal Modeling of Wideband Speech for the Purposes of Speech Synthesis and Modification

Dan Chazan; Ron Hoory; Ariel Sagi; Slava Shechtman; Alexander Sorin; Zhiwei Shuang; Raimo Bakis

This paper describes an efficient sinusoidal modeling framework for high quality wide band (WB) speech synthesis and modification. This technique may serve as a basis for speech compression in the context of small footprint concatenative Text to Speech systems. In addition, it is a useful representation for voice transformation and morphing purposes, e.g., simultaneous pitch modification and spectral envelope warping. The conventional sinusoidal modeling is enhanced with an adaptive frequency dithering mechanism, based on a degree of voicing analysis. Considerable reduction of the amount of model parameters is achieved by high band phase extension. The proposed model is evaluated and compared to the alternative STRAIGHT framework [1]. Being simpler and considerably more efficient than STRAIGHT, it outperforms it in speech quality for both speech reconstruction and transformation.


International Conference on Acoustics, Speech, and Signal Processing | 2012

Towards automatic phonetic segmentation for TTS

Asaf Rendel; Alexander Sorin; Ron Hoory; Andrew P. Breen

Phonetic segmentation is an important step in the development of a concatenative TTS voice. This paper introduces a segmentation process consisting of two phases. First, forced alignment is performed using an HMM-GMM model. The resulting segmentation is then locally refined using an SVM based boundary model. Both the models are derived from multi-speaker data using a speaker adaptive training procedure. Evaluation results are obtained on the TIMIT corpus and on a proprietary single-speaker TTS corpus.


International Conference on Acoustics, Speech, and Signal Processing | 2015

Coherent modification of pitch and energy for expressive prosody implantation

Alexander Sorin; Slava Shechtman; Vincent Pollet

In expressive TTS and voice transformation systems, implantation of expressive prosody derived from external out-of-domain sources often leads to extreme pitch modification that compromises the naturalness of the synthesized speech. In this work we investigate and prove a hypothesis that the naturalness loss is in part attributed to a violation of a fundamental relationship between the instantaneous pitch frequency and instantaneous energy of a speech signal. We propose an enhancement for pitch modification where the instantaneous energy is modified coherently with the pitch frequency and demonstrate the potential of this method in a subjective listening evaluation. The proposed approach is complementary to and can be combined with spectrum shape transformation methods for achieving the maximal possible quality of pitch modification.


International Conference on Acoustics, Speech, and Signal Processing | 2017

Voice-transformation-based data augmentation for prosodic classification

Raul Fernandez; Andrew Rosenberg; Alexander Sorin; Bhuvana Ramabhadran; Ron Hoory

In this work we explore data-augmentation techniques for the task of improving the performance of a supervised recurrent-neural-network classifier tasked with predicting prosodic-boundary and pitch-accent labels. The technique is based on applying voice transformations to the training data that modify the pitch baseline and range, as well as the vocal-tract and vocal-source characteristics of the speakers to generate further training examples. We demonstrate the validity of the approach by improving performance when the amount of base labeled examples is small (showing reductions in the range of 7%–12% for reduced-data conditions) as well as in terms of its generalization to speakers unseen in the training set (showing a relative reduction in the error rate of 8.74% and 4.75%, on the average, for boundaries and accent tasks respectively, in leave-one-speaker-out validation).


Alzheimer's & Dementia | 2014

THE DEM@CARE PROJECT SPEECH RECORDING AND AUTOMATIC ANALYSIS FOR THE ASSESSMENT OF ALZHEIMER DISEASE AND RELATED DISORDERS

Aharon Satt; Alexandra König; Alexander Sorin; Orith Toledo-Ronen; Ron Hoory; Renaud David; Frans R.J. Verhey; Pauline Aalten; Philippe Robert

increase the risk for developing AD. Results: The study population consisted of 183 MCI patients at baseline. At follow-up, 74 patients were stable and 109 patients progressed to AD. The presence of significant depressive symptoms in MCI as measured by the CSDD (HR: 2.06; 95% CI: 1.23–3.44; p = 0.011) and the GDS-30 (HR: 1.77; 95% CI: 1.10–2.85; p = 0.025) was associated with an increased risk of progression to AD. The severity of depressive symptoms as measured by the GDS-30 was also a predictor of progression (HR: 1.06; 95% CI: 1.01–1.11; p = 0.020). Furthermore, the severity of agitated behavior, especially verbal agitation, and the presence of purposeless activity were associated risk factors for progression, whereas diurnal rhythm disturbances were associated with a decreased risk of progression in our study. Conclusions: Depressive symptoms in MCI appear to be associated with an increased risk of progression to AD.


International Conference on Acoustics, Speech, and Signal Processing | 2011

Speech processing and retrieval in a personal memory aid system for the elderly

Alexander Sorin; Hagai Aronowitz; Jonathan Mamou; Orith Toledo-Ronen; Ron Hoory; Michael Kuritzky; Yael Erez; Bhuvana Ramabhadran; Abhinav Sethy

The paper presents a new application of automatic speech processing in the Ambient Assisted Living area, developed in the course of a three-year research project. Recording and automatic processing of spoken conversations play a major role in this solution, enabling effective search in a personal audio archive and fast browsing of conversations. Processing of elderly conversational speech recorded by a distant PDA microphone poses a great challenge. The speech processing flow includes transcription, speaker tracking, and combined indexing and search of spoken terms and participating speakers' identities extracted from the audio. We present the entire application and individual speech processing components as well as evaluation results of the individual components and of the end-to-end spoken information retrieval solution.


Journal of the Acoustical Society of America | 2007

System and method for combined frequency-domain and time-domain pitch extraction for speech signals

Tenkasi V. Ramabadran; Alexander Sorin
