Pekka Kapanen | Researchain

Publication

Featured research published by Pekka Kapanen.

International Conference on Acoustics, Speech, and Signal Processing | 1997

GSM enhanced full rate speech codec

Kari Jarvinen; Janne Vainio; Pekka Kapanen; Tero Honkanen; Petri Haavisto; Redwan Salami; Claude Laflamme; Jean-Pierre Adoul

This paper describes the GSM enhanced full rate (EFR) speech codec that has been standardised for the GSM mobile communication system. The GSM EFR codec has been jointly developed by Nokia and the University of Sherbrooke. It provides speech quality at least equivalent to that of a wireline telephony reference (32 kbit/s ADPCM). The EFR codec uses 12.2 kbit/s for speech coding and 10.6 kbit/s for error protection. Speech coding is based on the ACELP algorithm (algebraic code excited linear prediction). The codec provides a substantial quality improvement over the existing GSM full rate and half rate codecs. The old GSM codecs lack wireline quality even in error-free channel conditions, while the EFR codec provides wireline quality not only in error-free conditions but also in the most typical error conditions. With the EFR codec, wireline quality is also sustained in the presence of background noise and in tandem connections (mobile-to-mobile calls).
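The two rates quoted in the abstract can be turned into a per-frame bit budget. The short sketch below does that arithmetic; the 20 ms frame length is an assumption that the abstract itself does not state, and the function and constant names are illustrative only.

```python
# Rates quoted in the abstract, in bit/s.
SPEECH_RATE_BPS = 12_200
PROTECTION_RATE_BPS = 10_600

# Assumption (not stated in the abstract): speech is processed in 20 ms frames.
FRAME_MS = 20


def bits_per_frame(rate_bps: int, frame_ms: int = FRAME_MS) -> int:
    """Number of bits a stream at rate_bps carries in one frame."""
    return rate_bps * frame_ms // 1000


gross_rate_bps = SPEECH_RATE_BPS + PROTECTION_RATE_BPS
speech_bits = bits_per_frame(SPEECH_RATE_BPS)
protection_bits = bits_per_frame(PROTECTION_RATE_BPS)

print(gross_rate_bps)                # 22800
print(speech_bits, protection_bits)  # 244 212
```

Under that frame-length assumption, nearly half of every transmitted frame is spent on error protection rather than on speech bits.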

Journal of the Acoustical Society of America | 1996

Method and apparatus for decoding LPC-encoded speech using a median filter modification of LPC filter factors to compensate for transmission errors

Pekka Kapanen; Kari Jarvinen

Disclosed herein are methods and apparatus for improving the quality of synthesized speech that is transmitted through a channel that is susceptible to transmission errors. In a presently preferred embodiment of the invention, a speech signal is assumed to be first encoded using a Linear Predictive Coding (LPC) technique prior to transmission. The parameters that describe the short-term spectral behavior of the speech signal are received and then applied to and processed by a non-linear median processing block only on an occurrence of a predetermined number of transmission errors in the received LPC speech signal. The median-processed short-term speech parameters are subsequently employed, together with a received excitation signal, in a synthesis filter to synthesize a speech signal of improved quality over what would be obtained if the short-term speech parameters were not median processed to compensate for the transmission errors.
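The conditional median step described above might look roughly like the following. This is a minimal sketch and not the patented method: the class name `MedianConcealer`, the window of three frames, the error threshold, and the representation of the short-term parameters as a plain list of floats are all assumptions made for illustration.

```python
from collections import deque
from statistics import median


class MedianConcealer:
    """Median-smooths short-term parameters only for error-prone frames."""

    def __init__(self, window: int = 3, error_threshold: int = 1):
        # Hypothetical settings: how many recent frames to keep, and how many
        # detected transmission errors trigger the median processing.
        self.history = deque(maxlen=window)
        self.error_threshold = error_threshold

    def process(self, params, error_count):
        """Return the parameters to feed into the synthesis filter."""
        self.history.append(list(params))
        if error_count < self.error_threshold:
            return list(params)  # too few errors: pass through unchanged
        # Per-coefficient median over the frames currently in the window.
        return [median(column) for column in zip(*self.history)]


concealer = MedianConcealer(window=3, error_threshold=1)
concealer.process([0.10, 0.50], error_count=0)
concealer.process([0.12, 0.48], error_count=0)
out = concealer.process([0.90, -0.70], error_count=2)  # corrupted frame
print(out)  # [0.12, 0.48]
```

Because a median ignores outliers, the single corrupted frame in the usage example is replaced by values taken from its well-received neighbours, while error-free frames are left untouched.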

International Conference on Multimodal Interfaces | 2002

CATCH-2004 multi-modal browser: overview description with usability analysis

Jan Kleindienst; Ladislav Seredi; Pekka Kapanen; Janne Bergman

This paper takes a closer look at the user interface issues in our research multi-modal browser architecture. The browser framework, also briefly introduced in this paper, reuses single-modal browser technologies available for VoiceXML, WML, and HTML browsing. User interface actions on a particular browser are captured, converted to events, and distributed to the other browsers participating (possibly on different hosts) in the multi-modal framework. We have defined a synchronization protocol, which distributes such events with the help of the central component called the Virtual Proxy. The choice of the architecture and the synchronization primitives has profound consequences for handling certain interesting UI use cases. We particularly address those specified by the W3C MultiModal Requirements, which are related to the design of possible strategies for dealing with simultaneous input, resolving input inconsistencies, and defining synchronization points. The proposed approaches are illustrated by examples.
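The capture, convert, and distribute flow described above can be sketched as a small in-process event hub. This is only an illustration under stated assumptions: the `UIEvent` fields, the `register` and `publish` method names, and the use of plain callables in place of networked browsers are hypothetical and do not reproduce the authors' synchronization protocol.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class UIEvent:
    source: str      # name of the browser where the user action happened
    kind: str        # hypothetical event type, e.g. "focus" or "input"
    target: str      # identifier of the affected form field
    value: str = ""


class VirtualProxy:
    """Central component that relays each event to every other browser."""

    def __init__(self) -> None:
        self._browsers: Dict[str, Callable[[UIEvent], None]] = {}

    def register(self, name: str, handler: Callable[[UIEvent], None]) -> None:
        self._browsers[name] = handler

    def publish(self, event: UIEvent) -> None:
        for name, handler in self._browsers.items():
            if name != event.source:  # do not echo back to the originator
                handler(event)


# Stand-ins for the component browsers: each just records what it receives.
received = {"html": [], "voicexml": [], "wml": []}
proxy = VirtualProxy()
for name, inbox in received.items():
    proxy.register(name, inbox.append)

proxy.publish(UIEvent(source="html", kind="input", target="city", value="Prague"))
print({name: len(inbox) for name, inbox in received.items()})
# {'html': 0, 'voicexml': 1, 'wml': 1}
```

Routing everything through one hub keeps the component browsers unaware of each other, which is one plausible reason a central coordinator suits browsers running on different hosts.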

Universal Access in The Information Society | 2003

Loosely-coupled approach towards multi-modal browsing

Jan Kleindienst; Ladislav Seredi; Pekka Kapanen; Janne Bergman

Universal-access multi-modal browsing is one of the emerging “killer” technologies that promise broader and more flexible access to information, faster task completion, and an advanced user experience. Inheriting the best from GUI and speech, depending on the circumstances, hardware capabilities, and environment, multi-modality’s great advantage is to provide application developers with a scalable blend of input and output channels that may accommodate any user, device, and platform. This article describes a flexible multi-modal browser architecture, named Ferda the Ant, which reuses uni-modal browser technologies available for VoiceXML, WML, and HTML browsing. A central component, the Virtual Proxy, acts as a synchronization coordinator. This browser architecture can be implemented either in a single-client configuration or by distributing the browser components across the network. We have defined and implemented a synchronization protocol to communicate the changes occurring in the context of a component browser to the other browsers participating in the multi-modal browser framework. Browser wrappers implement the required synchronization protocol functionality at each of the component browsers. The component browsers comply with existing content authoring standards, and we have designed a set of markup-level authoring conventions that facilitate maintaining the browser synchronization.

Archive | 1997