
Publications


Featured research published by Michael W. Frandsen.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

The CALO Meeting Assistant System

Gökhan Tür; Andreas Stolcke; L. Lynn Voss; Stanley Peters; Dilek Hakkani-Tür; John Dowding; Benoit Favre; Raquel Fernández; Matthew Frampton; Michael W. Frandsen; Clint Frederickson; Martin Graciarena; Donald Kintzing; Kyle Leveque; Shane Mason; John Niekrasz; Matthew Purver; Korbinian Riedhammer; Elizabeth Shriberg; Jing Tien; Dimitra Vergyri; Fan Yang

The CALO Meeting Assistant (MA) provides for distributed meeting capture, annotation, automatic transcription and semantic analysis of multiparty meetings, and is part of the larger CALO personal assistant system. This paper presents the CALO-MA architecture and its speech recognition and understanding components, which include real-time and offline speech transcription, dialog act segmentation and tagging, topic identification and segmentation, question-answer pair identification, action item recognition, decision extraction, and summarization.
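
As a rough, illustrative sketch of the kind of processing chain described above, the Python snippet below wires toy stand-ins for a few of the listed components (dialog act tagging, action item recognition, summarization) into one pass over a meeting transcript. All class and function names here are hypothetical simplifications for illustration, not the CALO-MA implementation.

```python
# Illustrative sketch only: hypothetical, greatly simplified stand-ins for
# meeting-understanding components; not the CALO-MA architecture or API.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Utterance:
    speaker: str
    text: str
    dialog_act: str = ""        # filled in by the tagger


@dataclass
class MeetingRecord:
    utterances: List[Utterance] = field(default_factory=list)
    action_items: List[str] = field(default_factory=list)
    summary: str = ""


def tag_dialog_acts(utts: List[Utterance]) -> None:
    """Placeholder dialog-act tagger; a real system would use a trained classifier."""
    for u in utts:
        u.dialog_act = "question" if u.text.rstrip().endswith("?") else "statement"


def detect_action_items(utts: List[Utterance]) -> List[str]:
    """Placeholder action-item detector keyed on simple cue phrases."""
    cues = ("will do", "i'll", "action item", "by friday")
    return [u.text for u in utts if any(c in u.text.lower() for c in cues)]


def summarize(utts: List[Utterance], max_sents: int = 2) -> str:
    """Trivial extractive summary: the first few statements."""
    picked = [u.text for u in utts if u.dialog_act == "statement"][:max_sents]
    return " ".join(picked)


def process_meeting(transcript: List[Utterance]) -> MeetingRecord:
    record = MeetingRecord(utterances=transcript)
    tag_dialog_acts(record.utterances)
    record.action_items = detect_action_items(record.utterances)
    record.summary = summarize(record.utterances)
    return record


if __name__ == "__main__":
    demo = [
        Utterance("A", "Can we ship the build by Friday?"),
        Utterance("B", "I'll prepare the release notes by Friday."),
        Utterance("A", "The demo went well last week."),
    ]
    result = process_meeting(demo)
    print(result.action_items)
    print(result.summary)
```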


Spoken Language Technology Workshop | 2008

The CALO meeting speech recognition and understanding system

Gökhan Tür; Andreas Stolcke; L. Lynn Voss; John Dowding; Benoit Favre; Raquel Fernández; Matthew Frampton; Michael W. Frandsen; Clint Frederickson; Martin Graciarena; Dilek Hakkani-Tür; Donald Kintzing; Kyle Leveque; Shane Mason; John Niekrasz; Stanley Peters; Matthew Purver; Korbinian Riedhammer; Elizabeth Shriberg; Jing Tien; Dimitra Vergyri; Fan Yang

The CALO meeting assistant provides for distributed meeting capture, annotation, automatic transcription and semantic analysis of multiparty meetings, and is part of the larger CALO personal assistant system. This paper summarizes the CALO-MA architecture and its speech recognition and understanding components, which include real-time and offline speech transcription, dialog act segmentation and tagging, question-answer pair identification, action item recognition, decision extraction, and summarization.


International Conference on Acoustics, Speech, and Signal Processing | 2009

Recent advances in SRI's IraqComm™ Iraqi Arabic-English speech-to-speech translation system

Murat Akbacak; Horacio Franco; Michael W. Frandsen; Saša Hasan; Huda Jameel; Andreas Kathol; Shahram Khadivi; Xin Lei; Arindam Mandal; Saab Mansour; Kristin Precoda; Colleen Richey; Dimitra Vergyri; Wen Wang; Mei Yang; Jing Zheng

We summarize recent progress on SRI's IraqComm™ Iraqi Arabic-English two-way speech-to-speech translation system. In the past year we made substantial developments in our speech recognition and machine translation technology, leading to significant improvements in both accuracy and speed of the IraqComm system. On the 2008 NIST evaluation dataset our two-way speech-to-text (S2T) system achieved 6% to 8% absolute improvement in BLEU in both directions, compared to our previous year's system [1].


International Conference on Acoustics, Speech, and Signal Processing | 2013

“Can you give me another word for hyperbaric?”: Improving speech translation using targeted clarification questions

Necip Fazil Ayan; Arindam Mandal; Michael W. Frandsen; Jing Zheng; Peter Blasco; Andreas Kathol; Frédéric Béchet; Benoit Favre; Alex Marin; Tom Kwiatkowski; Mari Ostendorf; Luke Zettlemoyer; Philipp Salletmayr; Julia Hirschberg; Svetlana Stoyanchev

We present a novel approach for improving communication success between users of speech-to-speech translation systems by automatically detecting errors in the output of automatic speech recognition (ASR) and statistical machine translation (SMT) systems. Our approach initiates system-driven targeted clarification about errorful regions in user input and repairs them given user responses. Our system has been evaluated by unbiased subjects in live mode, and results show improved success of communication between users of the system.
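
The minimal Python sketch below illustrates the general shape of such a clarification loop: detect a low-confidence span in an ASR hypothesis, ask a targeted question about it, and splice the user's reply back into the input before translation. The function names, the 0.5 confidence threshold, and the example values are assumptions made for illustration, not the system described in the paper.

```python
# Hedged sketch only: names, threshold, and example values are illustrative.
from typing import Callable, List, Optional, Tuple


def find_errorful_span(words: List[Tuple[str, float]],
                       threshold: float = 0.5) -> Optional[Tuple[int, int]]:
    """Return the first run of words whose ASR confidence falls below threshold."""
    start = None
    for i, (_, conf) in enumerate(words):
        if conf < threshold and start is None:
            start = i
        elif conf >= threshold and start is not None:
            return start, i
    return (start, len(words)) if start is not None else None


def clarify(words: List[Tuple[str, float]], ask: Callable[[str], str]) -> str:
    """Run one clarification turn; `ask` plays the prompt and returns the user's reply."""
    tokens = [w for w, _ in words]
    span = find_errorful_span(words)
    if span is None:
        return " ".join(tokens)                        # nothing to repair
    low_conf = " ".join(tokens[span[0]:span[1]])
    reply = ask(f"Can you give me another word for '{low_conf}'?")
    return " ".join(tokens[:span[0]] + [reply] + tokens[span[1]:])


# Example: "hyperbaric" comes back with low ASR confidence; the user rephrases.
hyp = [("the", 0.95), ("patient", 0.92), ("needs", 0.90),
       ("hyperbaric", 0.30), ("treatment", 0.88)]
print(clarify(hyp, ask=lambda prompt: "high-pressure oxygen"))
# -> "the patient needs high-pressure oxygen treatment"
```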


the second international conference | 2002

DynaSpeak: SRI's scalable speech recognizer for embedded and mobile systems

Horacio Franco; Jing Zheng; John Butzberger; Federico Cesari; Michael W. Frandsen; Jim Arnold; Venkata Ramana Rao Gadde; Andreas Stolcke; Victor Abrash

We introduce SRI's new speech recognition engine, DynaSpeak™, which is characterized by its scalability and flexibility, high recognition accuracy, memory and speed efficiency, adaptation capability, efficient grammar optimization, support for natural language parsing functionality, and operation based on integer arithmetic. These features are designed to address the needs of the fast-developing and changing domain of embedded and mobile computing platforms.


International Conference on Acoustics, Speech, and Signal Processing | 2006

Speech Recognition Engineering Issues in Speech to Speech Translation System Design for Low Resource Languages and Domains

Shrikanth Narayanan; Panayiotis G. Georgiou; Abhinav Sethy; Dagen Wang; Murtaza Bulut; Shiva Sundaram; Emil Ettelaie; Sankaranarayanan Ananthakrishnan; Horacio Franco; Kristin Precoda; Dimitra Vergyri; Jing Zheng; Wen Wang; Ramana Rao Gadde; Martin Graciarena; Victor Abrash; Michael W. Frandsen; Colleen Richey

Engineering automatic speech recognition (ASR) for speech-to-speech (S2S) translation systems, especially targeting languages and domains that do not have readily available spoken language resources, is immensely challenging for a number of reasons. In addition to contending with the conventional data-hungry speech acoustic and language modeling needs, these designs have to accommodate varying requirements imposed by the domain needs and characteristics, the target device and usage modality (such as phrase-based or spontaneous free-form interactions, with or without visual feedback), and the huge spoken language variability arising from socio-linguistic and cultural differences among users. This paper, using case studies of creating speech translation systems between English and languages such as Pashto and Farsi, describes some of the practical issues and the solutions that were developed for multilingual ASR development. These include novel acoustic and language modeling strategies such as language-adaptive recognition, active-learning-based language modeling, and class-based language models that can better exploit resource-poor language data; efficient search strategies, including N-best and confidence generation to aid multiple-hypothesis translation; use of dialog information and clever interface choices to facilitate ASR; and audio interface design for meeting both usability and robustness requirements.
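
As one concrete illustration of the class-based language modeling idea mentioned above, the toy Python sketch below scores a word bigram through word classes, so word pairs never observed together in sparse training data still receive probability mass. The corpus, class map, and estimation scheme are invented for the example and are not the paper's models.

```python
# Toy illustration only: corpus, classes, and estimation scheme are invented.
from collections import Counter, defaultdict


class ClassBigramLM:
    """P(w | prev) approximated as P(class(w) | class(prev)) * P(w | class(w))."""

    def __init__(self, sentences, word2class):
        self.word2class = word2class
        self.class_bigrams = Counter()     # counts of (class_prev, class_next)
        self.class_histories = Counter()   # counts of class_prev as a history
        self.word_given_class = defaultdict(Counter)
        for sent in sentences:
            classes = [word2class[w] for w in sent]
            for w, c in zip(sent, classes):
                self.word_given_class[c][w] += 1
            for c1, c2 in zip(classes, classes[1:]):
                self.class_bigrams[(c1, c2)] += 1
                self.class_histories[c1] += 1

    def prob(self, prev: str, word: str) -> float:
        c_prev, c = self.word2class[prev], self.word2class[word]
        p_class = self.class_bigrams[(c_prev, c)] / max(self.class_histories[c_prev], 1)
        counts = self.word_given_class[c]
        p_word = counts[word] / max(sum(counts.values()), 1)
        return p_class * p_word


word2class = {"go": "VERB", "walk": "VERB", "to": "PREP",
              "clinic": "PLACE", "home": "PLACE"}
lm = ClassBigramLM([["go", "to", "clinic"], ["walk", "home"]], word2class)
# "walk to" never occurs in the tiny corpus, yet it still gets probability
# mass because "walk" shares the VERB class with "go".
print(lm.prob("walk", "to"))  # 0.5
```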


Spoken Language Technology Workshop | 2010

Implementing SRI's Pashto speech-to-speech translation system on a smart phone

Jing Zheng; Arindam Mandal; Xin Lei; Michael W. Frandsen; Necip Fazil Ayan; Dimitra Vergyri; Wen Wang; Murat Akbacak; Kristin Precoda

We describe our recent effort implementing SRI's UMPC-based Pashto speech-to-speech (S2S) translation system on a smart phone running the Android operating system. In order to maintain very low latencies of system response on computationally limited smart phone platforms, we developed efficient algorithms and data structures and optimized model sizes for various system components. Our current Android-based S2S system requires less than one-fourth the system memory and a significantly lower processor speed than a laptop-based platform, at the cost of a 15% relative loss in system accuracy.


North American Chapter of the Association for Computational Linguistics | 2004

Limited-domain speech-to-speech translation between English and Pashto

Kristin Precoda; Horacio Franco; Ascander Dost; Michael W. Frandsen; John Fry; Andreas Kathol; Colleen Richey; Susanne Z. Riehemann; Dimitra Vergyri; Jing Zheng; Christopher Culy

This paper describes a prototype system for near-real-time spontaneous, bidirectional translation between spoken English and Pashto, a language presenting many technological challenges because of its lack of resources, including both data and expert knowledge. Development of the prototype is ongoing, and we propose to demonstrate a fully functional version which shows the basic capabilities, though not yet their final depth and breadth.


Computer, Information, and Systems Sciences, and Engineering | 2010

IraqComm and FlexTrans: A Speech Translation System and Flexible Framework

Michael W. Frandsen; Susanne Z. Riehemann; Kristin Precoda

SRI International's IraqComm system performs bidirectional speech-to-speech machine translation between English and Iraqi Arabic in the domains of force protection, municipal and medical services, and training. The system was developed primarily under DARPA's TRANSTAC Program and includes: speech recognition components using SRI's DynaSpeak engine; MT components using SRI's Gemini and SRInterp; and speech synthesis from Cepstral, LLC. The communication between these components is coordinated by SRI's Flexible Translation (FlexTrans) Framework, which has an intuitive easy-to-use graphical user interface and an eyes-free hands-free mode, and is highly configurable and adaptable to user needs. It runs on a variety of standard portable hardware platforms and was designed to make it as easy as possible to build systems for other languages, as shown by the rapid development of an analogous system in English/Malay.
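
A minimal sketch of what such a coordination layer might look like is shown below: a turn object that wires pluggable recognition, translation, and synthesis callables together per language pair, so adding a new language mostly means registering new components. The interfaces are assumptions made for illustration and do not reflect the actual FlexTrans API.

```python
# Illustrative only: hypothetical interfaces, not the FlexTrans API.
from typing import Callable, Dict, Tuple


class TranslationTurn:
    """Coordinates one speech-to-speech turn: ASR -> MT -> TTS."""

    def __init__(self,
                 recognizers: Dict[str, Callable[[bytes], str]],
                 translators: Dict[Tuple[str, str], Callable[[str], str]],
                 synthesizers: Dict[str, Callable[[str], bytes]]):
        self.recognizers = recognizers      # language -> speech recognizer
        self.translators = translators      # (src, tgt) -> translator
        self.synthesizers = synthesizers    # language -> speech synthesizer

    def run(self, audio: bytes, src: str, tgt: str) -> bytes:
        text = self.recognizers[src](audio)               # speech recognition
        translated = self.translators[(src, tgt)](text)   # machine translation
        return self.synthesizers[tgt](translated)         # speech synthesis


# Building a system for a new language pair only requires new entries in the
# three registries (dummy lambdas stand in for real components here).
turn = TranslationTurn(
    recognizers={"en": lambda audio: "where is the clinic"},
    translators={("en", "ar"): lambda text: "[ar] " + text},
    synthesizers={"ar": lambda text: text.encode("utf-8")},
)
print(turn.run(b"...", "en", "ar"))
```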


International Conference on Acoustics, Speech, and Signal Processing | 2017

SenSay Analytics™: A real-time speaker-state platform

Andreas Tsiartas; Cory Albright; Nikoletta Bassiou; Michael W. Frandsen; I. Miller; Elizabeth Shriberg; Jennifer Smith; L. Lynn Voss; Valerie Wagner

Growth in voice-based applications and personalized systems has led to increasing demand for speech-analytics technologies that estimate the state of a speaker from speech. Such systems support a wide range of applications, from more traditional call-center monitoring, to health monitoring, to human-robot interactions, and more. To work seamlessly in real-world contexts, such systems must meet certain requirements, including for speed, customizability, ease of use, robustness, and live integration of both acoustic and lexical cues. This demo introduces SenSay Analytics™, a platform that performs real-time speaker-state classification from spoken audio. SenSay is easily configured and is customizable to new domains, while its underlying architecture offers extensibility and scalability.
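
A toy sketch of real-time speaker-state scoring that fuses one acoustic cue with one lexical cue over a sliding window is given below. The feature choices, weights, threshold, and state labels are assumptions made for illustration; they do not describe SenSay's actual models.

```python
# Illustrative only: features, weights, and labels are assumptions.
from collections import deque
from typing import Optional
import statistics


class SpeakerStateScorer:
    """Keeps a sliding window of acoustic and lexical cues and emits a state label."""

    def __init__(self, window: int = 10):
        self.energy = deque(maxlen=window)   # recent frame energies (acoustic cue)
        self.words = deque(maxlen=window)    # recent recognized words (lexical cue)

    def update(self, frame_energy: float, word: Optional[str] = None) -> str:
        self.energy.append(frame_energy)
        if word:
            self.words.append(word.lower())
        arousal = statistics.mean(self.energy)                 # crude acoustic proxy
        stress_hits = sum(w in {"urgent", "now", "help"} for w in self.words)
        score = 0.7 * arousal + 0.3 * min(stress_hits / 3.0, 1.0)
        return "stressed" if score > 0.6 else "neutral"


scorer = SpeakerStateScorer()
print(scorer.update(0.4, "hello"))    # neutral
print(scorer.update(1.0, "help"))     # still neutral: 0.7*0.7 + 0.3*(1/3) ≈ 0.59
print(scorer.update(1.0, "urgent"))   # stressed: 0.7*0.8 + 0.3*(2/3) = 0.76
```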
