Publication


Featured research published by Alain Tritschler.


Journal of the Acoustical Society of America | 2003

Methods and apparatus for unknown speaker labeling using concurrent speech recognition, segmentation, classification and clustering

Alain Tritschler; Mahesh Viswanathan

A method and apparatus are disclosed for identifying speakers participating in an audio-video source, whether or not such speakers have been previously registered or enrolled. The speaker identification system uses an enrolled speaker database that includes background models for unenrolled speakers, such as “unenrolled male” or “unenrolled female,” to assign a speaker label to each identified segment. Speaker labels are identified for each speech segment by comparing the segment utterances to the enrolled speaker database and finding the “closest” speaker, if any. A speech segment having an unknown speaker is initially assigned a general speaker label from the set of background models. The “unenrolled” segment is assigned a segment number and receives a cluster identifier assigned by the clustering system. If a given segment is assigned a temporary speaker label associated with an unenrolled speaker, the user can be prompted by the present invention to identify the speaker. Once the user assigns a speaker label to an audio segment having an unknown speaker, the same speaker name can be automatically assigned to any segments that are assigned to the same cluster and the enrolled speaker database can be automatically updated to enroll the previously unknown speaker.
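The labeling step described above can be sketched as follows. This is a minimal illustration, not the patented method: it assumes Euclidean distance over toy feature vectors and an arbitrary acceptance threshold (the abstract specifies neither), and the function and variable names (`label_segment`, `enrolled`, `backgrounds`) are hypothetical.

```python
from math import dist

def label_segment(segment_vec, enrolled, backgrounds, threshold=0.5):
    """Assign the closest enrolled speaker if its distance beats the
    threshold; otherwise fall back to the closest background model
    ("unenrolled male" / "unenrolled female") as a temporary label."""
    def closest(models):
        return min(models.items(), key=lambda kv: dist(segment_vec, kv[1]))

    name, vec = closest(enrolled)
    if dist(segment_vec, vec) <= threshold:
        return name                     # known, enrolled speaker
    bg_name, _ = closest(backgrounds)
    return bg_name                      # temporary label; clustering groups such segments

enrolled = {"alice": (0.1, 0.2), "bob": (0.9, 0.8)}
backgrounds = {"unenrolled male": (0.5, 0.5), "unenrolled female": (0.4, 0.6)}
print(label_segment((0.12, 0.21), enrolled, backgrounds))   # alice
print(label_segment((2.0, 2.0), enrolled, backgrounds))     # unenrolled male
```

In the full system, segments that receive a background label keep their cluster identifier, so one user-supplied name can later propagate to every segment in the same cluster.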


Journal of the Acoustical Society of America | 2003

Methods and apparatus for concurrent speech recognition, speaker segmentation and speaker classification

Homayoon S. M. Beigi; Alain Tritschler; Mahesh Viswanathan

A method and apparatus are disclosed for automatically transcribing audio information from an audio-video source and concurrently identifying the speakers. The disclosed audio transcription and speaker classification system includes a speech recognition system, a speaker segmentation system and a speaker identification system. A common front-end processor computes feature vectors that are processed along parallel branches in a multi-threaded environment by the speech recognition system, speaker segmentation system and speaker identification system, for example, using a shared memory architecture that acts in a server-like manner to distribute the computed feature vectors to a channel associated with each parallel branch. The speech recognition system produces transcripts with time-alignments for each word in the transcript. The speaker segmentation system separates the speakers and identifies all possible frames where there is a segment boundary between non-homogeneous speech portions. The speaker identification system thereafter uses an enrolled speaker database to assign a speaker to each identified segment. The audio information from the audio-video source is concurrently transcribed and segmented to identify segment boundaries. Thereafter, the speaker identification system assigns a speaker label to each portion of the transcribed text.
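The fan-out architecture described above, where one front-end computes feature vectors once and distributes them to three parallel branches over per-branch channels, can be sketched with standard threads and queues. This is a simplified stand-in, not the patent's shared-memory server: the averaged "feature" and the names (`front_end`, `branch`) are hypothetical.

```python
import queue
import threading

def front_end(frames, channels):
    """Compute one feature vector per frame, then fan it out to every
    per-branch channel (recognition, segmentation, identification)."""
    for frame in frames:
        feats = sum(frame) / len(frame)      # stand-in for real cepstral features
        for ch in channels:
            ch.put(feats)
    for ch in channels:
        ch.put(None)                         # sentinel: end of stream

def branch(name, ch, results):
    """Consume feature vectors; a real branch would decode, segment,
    or identify speakers here."""
    count = 0
    while (feats := ch.get()) is not None:
        count += 1
    results[name] = count

channels = [queue.Queue() for _ in range(3)]
results = {}
workers = [threading.Thread(target=branch, args=(n, ch, results))
           for n, ch in zip(["asr", "segmentation", "speaker-id"], channels)]
for w in workers:
    w.start()
front_end([[0.1, 0.2], [0.3, 0.4]], channels)
for w in workers:
    w.join()
print(results)   # every branch saw every feature vector
```

The key property is that the front-end cost is paid once: each branch reads from its own channel, so a slow consumer never forces the features to be recomputed.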


international conference on document analysis and recognition | 1999

Retrieval from spoken documents using content and speaker information

Mahesh Viswanathan; Homayoon S. M. Beigi; Satya Dharanipragada; Alain Tritschler

There has been a recent upsurge in the deployment of emerging technologies such as speech and speaker recognition which are reaching maturity. We discuss the details of the components required to build a system for audio indexing and retrieval for spoken documents using content and speaker based information facilitated by speech and speaker recognition. The real power of spoken document analysis is in using both content and speaker information together in retrieval by combining the results. The experiments described here are in the broadcast news domain, but the underlying techniques can easily be extended to other speech-centric applications and transactions.


International Journal on Document Analysis and Recognition | 2000

Multimedia document retrieval using speech and speaker recognition

Mahesh Viswanathan; Homayoon S. M. Beigi; Satya Dharanipragada; Fereydoun Maali; Alain Tritschler

Speech and speaker recognition systems are rapidly being deployed in real-world applications. In this paper, we discuss the details of a system and its components for indexing and retrieving multimedia content derived from broadcast news sources. The audio analysis component calls for real-time speech recognition for converting the audio to text and concurrent speaker analysis consisting of the segmentation of audio into acoustically homogeneous sections followed by speaker identification. The output of these two simultaneous processes is used to abstract statistics to automatically build indexes for text-based and speaker-based retrieval without user intervention. The real power of multimedia document processing is the possibility of Boolean queries in the form of combined text- and speaker-based user queries. Retrieval for such queries entails combining the results of individual text and speaker based searches. The underlying techniques discussed here can easily be extended to other speech-centric applications and transactions.
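The combined Boolean query described above (documents matching a term AND spoken by a given speaker) can be sketched as a set intersection over two inverted indexes. A minimal illustration, assuming both indexes map keys to document-ID sets; the names (`combined_query`, `text_index`, `speaker_index`) are hypothetical.

```python
def combined_query(text_index, speaker_index, term, speaker):
    """Boolean AND of a text-based and a speaker-based search:
    documents containing the term AND attributed to the speaker."""
    return sorted(text_index.get(term, set()) & speaker_index.get(speaker, set()))

# Toy inverted indexes built from transcripts and speaker labels
text_index = {"election": {"doc1", "doc2", "doc3"}}
speaker_index = {"anchor_a": {"doc2", "doc3", "doc5"}}
print(combined_query(text_index, speaker_index, "election", "anchor_a"))  # ['doc2', 'doc3']
```

Both indexes are built automatically from the recognition and speaker-identification output, so no manual annotation is needed before querying.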


international conference on multimedia and expo | 2000

Information access using speech, speaker and face recognition

Mahesh Viswanathan; Homayoon S. M. Beigi; Alain Tritschler; Fereydoun Maali

We describe a scheme to combine the results of audio and face identification for multimedia indexing and retrieval. Audio analysis consists of speech and speaker recognition derived from a broadcast news video clip. The video component is analyzed to identify the persons in the same video clip using face recognition. When applied individually both speaker and face recognition schemes have limitations on conditions under which they perform reasonably well. By integrating the match-score results of both audio and video analysis, we find that the two techniques can complement each other. We discuss the system architecture for such a combined system, and discuss how decision fusion is applied to disparate match-scoring systems to yield the final speaker identity.
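The decision-fusion step described above can be sketched as min-max normalization of each modality's match scores followed by a weighted sum. This is one common fusion recipe, not necessarily the paper's exact scheme; the weight, the toy scores, and the names (`normalize`, `fuse`) are illustrative assumptions.

```python
def normalize(scores):
    """Map raw match scores to [0, 1] so disparate scorers are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    return {k: (v - lo) / (hi - lo) if hi > lo else 0.0 for k, v in scores.items()}

def fuse(audio_scores, face_scores, w_audio=0.5):
    """Weighted-sum fusion of normalized audio and face match scores;
    returns the identity with the highest combined score."""
    a, f = normalize(audio_scores), normalize(face_scores)
    combined = {k: w_audio * a[k] + (1 - w_audio) * f[k] for k in a}
    return max(combined, key=combined.get)

audio = {"alice": 12.0, "bob": 9.5, "carol": 7.0}    # e.g. speaker-model log-likelihoods
face  = {"alice": 0.55, "bob": 0.80, "carol": 0.40}  # e.g. face-match confidences
print(fuse(audio, face))   # bob
```

Normalizing first matters because the two scorers live on different scales; after normalization, the weight controls how much each modality is trusted when the two disagree.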


Archive | 1999

Methods and apparatus for retrieving audio information using content and speaker information

Homayoon S. M. Beigi; Alain Tritschler; Mahesh Viswanathan


Archive | 1999

Methods and apparatus for tracking speakers in an audio stream

Scott Shaobing Chen; Alain Tritschler; Mahesh Viswanathan


Archive | 2000

Method and device for tracking speakers in an audio stream

Scott Shaobing Chen; Alain Tritschler; Mahesh Viswanathan


Archive | 2000

Method and device for simultaneous voice recognition, speaker segmentation and speaker classification

Homayoon S. M. Beigi; Alain Tritschler; Mahesh Viswanathan


Archive | 1999

TranSegId: A System for Concurrent Speech Transcription, Speaker Segmentation and Speaker Identification

Mahesh Viswanathan; Homayoon S. M. Beigi; Alain Tritschler; Fereydoun Maali
