Publications


Featured research published by Homayoon S. M. Beigi.


Journal of the Acoustical Society of America | 2003

Methods and apparatus for concurrent speech recognition, speaker segmentation and speaker classification

Homayoon S. M. Beigi; Alain Tritschler; Mahesh Viswanathan

A method and apparatus are disclosed for automatically transcribing audio information from an audio-video source and concurrently identifying the speakers. The disclosed audio transcription and speaker classification system includes a speech recognition system, a speaker segmentation system and a speaker identification system. A common front-end processor computes feature vectors that are processed along parallel branches in a multi-threaded environment by the speech recognition system, speaker segmentation system and speaker identification system, for example, using a shared memory architecture that acts in a server-like manner to distribute the computed feature vectors to a channel associated with each parallel branch. The speech recognition system produces transcripts with time-alignments for each word in the transcript. The speaker segmentation system separates the speakers and identifies all possible frames where there is a segment boundary between non-homogeneous speech portions. The speaker identification system thereafter uses an enrolled speaker database to assign a speaker to each identified segment. The audio information from the audio-video source is concurrently transcribed and segmented to identify segment boundaries. Thereafter, the speaker identification system assigns a speaker label to each portion of the transcribed text.
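The front-end fan-out described above can be sketched as a producer feeding three consumer branches through per-branch channels. This is an illustrative sketch only, assuming a simple queue-per-branch design; the feature computation is a placeholder, and all names are made up rather than taken from the patent.

```python
import queue
import threading

def front_end(frames, channels):
    """Compute a stand-in feature per frame and distribute it, server-like,
    to the channel of each parallel branch."""
    for frame in frames:
        feature = sum(frame) / len(frame)   # placeholder for real acoustic features
        for ch in channels:
            ch.put(feature)
    for ch in channels:
        ch.put(None)                        # end-of-stream marker

def consumer(name, ch, results):
    """Each branch (recognition, segmentation, identification) drains its channel."""
    feats = []
    while (f := ch.get()) is not None:
        feats.append(f)
    results[name] = len(feats)

channels = [queue.Queue() for _ in range(3)]
results = {}
threads = [threading.Thread(target=consumer, args=(n, c, results))
           for n, c in zip(["asr", "segmentation", "speaker_id"], channels)]
for t in threads:
    t.start()
front_end([[1, 2], [3, 4], [5, 6]], channels)
for t in threads:
    t.join()
print(results)  # every branch has seen every feature vector
```

The key property, as in the patent's shared-memory design, is that one pass of front-end computation serves all three branches concurrently.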


International Conference on Acoustics, Speech and Signal Processing | 1998

A distance measure between collections of distributions and its application to speaker recognition

Homayoon S. M. Beigi; Stephane Herman Maes; Jeffrey S. Sorensen

This paper presents a distance measure for evaluating the closeness of two sets of distributions. The problem of finding the distance between two individual distributions has been addressed by many solutions in the literature. To cluster speakers using the pre-computed models of their speech, a need arises for computing a distance between these models, which are normally built of a collection of distributions such as Gaussians (e.g., comparison between two HMM models). The definition of this distance measure creates many possibilities for speaker verification, speaker adaptation, speaker segmentation and many other related applications. A distance measure is presented for evaluating the closeness of a collection of distributions with centralized atoms such as Gaussians (but not limited to Gaussians). Several applications, including some in speaker recognition, are presented with results using this distance measure.
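To make the idea concrete, here is a hedged sketch of one plausible collection-to-collection distance: each atom is matched to its closest counterpart in the other collection, and the matches are weight-averaged and symmetrized. The per-pair divergence below is the closed-form symmetric KL between 1-D Gaussians; the paper's exact formulation is not reproduced here.

```python
def gauss_dist(g1, g2):
    """Symmetric KL divergence between two 1-D Gaussians.
    Each Gaussian is a (weight, mean, variance) triple; weight is ignored here."""
    _, m1, v1 = g1
    _, m2, v2 = g2
    return 0.5 * ((v1 / v2 + v2 / v1 - 2) + (m1 - m2) ** 2 * (1 / v1 + 1 / v2))

def collection_distance(a, b):
    """Weighted average of each atom's distance to its nearest counterpart,
    symmetrized over both collections (an illustrative measure, not the paper's)."""
    def one_way(p, q):
        return sum(g[0] * min(gauss_dist(g, h) for h in q) for g in p)
    return 0.5 * (one_way(a, b) + one_way(b, a))

# Two toy speaker models, each a mixture of two Gaussians: (weight, mean, variance)
model_a = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]
model_b = [(0.5, 0.1, 1.0), (0.5, 5.2, 1.0)]
print(collection_distance(model_a, model_b))  # small value: the models are close
```

Such a measure lets pre-computed speaker models be compared directly, without re-touching the underlying speech data.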


Asian Conference on Computer Vision | 1998

Open Sesame! Speech, Password or Key to Secure Your Door?

Stéphane H. Maes; Homayoon S. M. Beigi

This paper reviews the state of the art in speaker recognition. It clarifies the different technical solutions that have been explored with some success, as well as the challenges and limitations of current systems. It also describes the different functions and modalities involved in speaker recognition, where the terminology is still amazingly confused: specialists often use the same terms for different concepts. We review the classical techniques used in speaker recognition. Finally, we introduce the revolutionary concepts of speech biometrics. By discussing the impact of these new concepts, the maturity of speaker recognition is refocused.


International Conference on Document Analysis and Recognition | 1999

Retrieval from spoken documents using content and speaker information

Mahesh Viswanathan; Homayoon S. M. Beigi; Satya Dharanipragada; Alain Tritschler

There has been a recent upsurge in the deployment of emerging technologies such as speech and speaker recognition which are reaching maturity. We discuss the details of the components required to build a system for audio indexing and retrieval for spoken documents using content and speaker based information facilitated by speech and speaker recognition. The real power of spoken document analysis is in using both content and speaker information together in retrieval by combining the results. The experiments described here are in the broadcast news domain, but the underlying techniques can easily be extended to other speech-centric applications and transactions.


International Conference on Image Processing | 1994

Size normalization in on-line unconstrained handwriting recognition

Homayoon S. M. Beigi; Krishna S. Nathan; Gregory James Clary; Jayashree Subrahmonia

In an on-line handwriting recognition system, the motion of the tip of the stylus (pen) is sampled at equal time intervals using a digitizer tablet and the sampled points are passed to a computer which performs the handwriting recognition. In most cases, the basic recognition algorithm performs best for a nominal size of writing as well as a standard orientation (normally horizontal) and a nominal slant (normally fully upright). We discuss and provide solutions to these normalization problems in the context of on-line handwriting recognition. Most of the results presented are also valid for optical character recognition (OCR). Error rate reductions of 54.3% and 35.8% were obtained for the writer-dependent and writer-independent samples by using the proposed normalization scheme.
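The size-normalization step can be sketched as rescaling the sampled stylus points so every word reaches a nominal height while preserving aspect ratio. This is a minimal sketch under assumed conventions: the nominal height, the (x, y) point format, and the word-level granularity are all illustrative, not the paper's exact scheme.

```python
NOMINAL_HEIGHT = 100.0  # target height in tablet units (an assumed value)

def normalize_size(points):
    """points: list of (x, y) stylus samples for one word.
    Translates the word to the origin and scales it to NOMINAL_HEIGHT,
    applying the same factor to both axes to preserve aspect ratio."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    height = (max(ys) - min(ys)) or 1.0   # guard against flat strokes
    scale = NOMINAL_HEIGHT / height
    x0, y0 = min(xs), min(ys)
    return [((x - x0) * scale, (y - y0) * scale) for x, y in points]

word = [(10, 20), (15, 70), (30, 45)]
print(normalize_size(word))  # word now spans exactly NOMINAL_HEIGHT vertically
```

Because the sampling is at equal time intervals, rescaling positions rescales velocities by the same factor, so dynamic information survives the normalization.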


International Journal on Document Analysis and Recognition | 2000

Multimedia document retrieval using speech and speaker recognition

Mahesh Viswanathan; Homayoon S. M. Beigi; Satya Dharanipragada; Fereydoun Maali; Alain Tritschler

Speech and speaker recognition systems are rapidly being deployed in real-world applications. In this paper, we discuss the details of a system and its components for indexing and retrieving multimedia content derived from broadcast news sources. The audio analysis component calls for real-time speech recognition for converting the audio to text and concurrent speaker analysis consisting of the segmentation of audio into acoustically homogeneous sections followed by speaker identification. The output of these two simultaneous processes is used to abstract statistics to automatically build indexes for text-based and speaker-based retrieval without user intervention. The real power of multimedia document processing is the possibility of Boolean queries in the form of combined text- and speaker-based user queries. Retrieval for such queries entails combining the results of individual text and speaker based searches. The underlying techniques discussed here can easily be extended to other speech-centric applications and transactions.


International Conference on Multimedia and Expo | 2000

Information access using speech, speaker and face recognition

Mahesh Viswanathan; Homayoon S. M. Beigi; Alain Tritschler; Fereydoun Maali

We describe a scheme to combine the results of audio and face identification for multimedia indexing and retrieval. Audio analysis consists of speech and speaker recognition derived from a broadcast news video clip. The video component is analyzed to identify the persons in the same video clip using face recognition. When applied individually both speaker and face recognition schemes have limitations on conditions under which they perform reasonably well. By integrating the match-score results of both audio and video analysis, we find that the two techniques can complement each other. We discuss the system architecture for such a combined system, and discuss how decision fusion is applied to disparate match-scoring systems to yield the final speaker identity.
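One common way to fuse disparate match scores, consistent with the idea above though not necessarily the paper's exact method, is to normalize each modality's scores to a common scale and then take a weighted sum. The weights, the z-score normalization, and the anchor names below are all illustrative assumptions.

```python
def zscore(scores):
    """Map raw match scores (dict: identity -> score) to zero mean, unit variance,
    so scores from different systems become comparable."""
    vals = list(scores.values())
    mean = sum(vals) / len(vals)
    sd = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5 or 1.0
    return {k: (v - mean) / sd for k, v in scores.items()}

def fuse(audio_scores, face_scores, w_audio=0.6, w_face=0.4):
    """Weighted-sum fusion of normalized speaker and face match scores."""
    a, f = zscore(audio_scores), zscore(face_scores)
    return {k: w_audio * a[k] + w_face * f[k] for k in a}

# Toy scores on different raw scales (hypothetical identities)
audio = {"anchor1": 12.0, "anchor2": 9.5, "anchor3": 8.0}
face = {"anchor1": 0.55, "anchor2": 0.80, "anchor3": 0.40}
fused = fuse(audio, face)
print(max(fused, key=fused.get))
```

The normalization step is what lets the two disparate scoring systems complement each other: neither modality's raw scale dominates the final decision.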


International Conference on Multimedia and Expo | 2004

Aggressive compression of the dynamics of handwriting and signature signals

Homayoon S. M. Beigi

The fields of biometrics and handwriting recognition can benefit from the presented aggressive compression techniques. A novel approach to compressing the on-line handwriting signal without losing the dynamic information (i.e., velocity) is presented; it provides the means of fitting almost any signature sample within the 57-byte limit of one of the free magnetic strips on conventional credit cards. The method starts with segmenting the on-line handwriting signal into portions which may be approximated by a set of features from which the velocity and position may be reconstructed. These features are then compressed further using a standard compression technique. A Huffman coding technique is used to bring the size of the average signature, including its dynamic information, below the 57-byte limit of credit card magnetic strips. Normally, two of these strips are blank and may be used for storing the template of the owner's signature for future verification.
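The final Huffman stage can be sketched as follows. This is a generic Huffman coder applied to a made-up stream of quantized segment features; the paper's actual feature alphabet and quantization are not reproduced here.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code from symbol frequencies; returns {symbol: bitstring}.
    Frequent symbols get short codes, which is what shrinks the signature."""
    heap = [[freq, i, {sym: ""}] for i, (sym, freq) in
            enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)   # two least-frequent subtrees
        hi = heapq.heappop(heap)
        for s in lo[2]:
            lo[2][s] = "0" + lo[2][s]   # extend codes going up the tree
        for s in hi[2]:
            hi[2][s] = "1" + hi[2][s]
        heapq.heappush(heap, [lo[0] + hi[0], count, {**lo[2], **hi[2]}])
        count += 1
    return heap[0][2]

stream = list("AAAABBBCCD")   # stand-in for quantized handwriting features
code = huffman_code(stream)
bits = "".join(code[s] for s in stream)
print(len(bits), "bits vs", 8 * len(stream), "bits raw")
```

On a skewed feature distribution like a real signature's, this entropy coding is what pushes the average template under the 57-byte magnetic-strip budget.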


Journal of the Acoustical Society of America | 1999

Methods and apparatus for audio-visual speaker recognition and utterance verification

Sankar Basu; Homayoon S. M. Beigi; Stephane Herman Maes; Benoît Maison; Chalapathy Neti; Andrew W. Senior


International Conference on Acoustics, Speech, and Signal Processing | 1995

Real-time on-line unconstrained handwriting recognition using statistical methods

Krishna S. Nathan; Homayoon S. M. Beigi; Jayashree Subrahmonia; Gregory James Clary; Hiroshi Maruyama
