Publication


Featured research published by Aki Härmä.


International Conference on Multimedia and Expo | 2005

Automatic surveillance of the acoustic activity in our living environment

Aki Härmä; Martin F. McKinney; Janto Skowronek

We report an experiment with an acoustic surveillance system consisting of a computer and a microphone placed in a typical office environment. The system continuously analyzes the acoustic activity at the recording site, separates all interesting events, and stores them in a database. All interesting acoustic events over a period of more than two months were recorded. A number of low-level signal features are computed from the audio signal and used to classify and identify sound events. The analysis reveals patterns and activities that would be difficult to find by any other means.
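The abstract does not say how events are separated from the background or which features are used. The following is a minimal sketch, assuming a fixed frame-energy threshold as the event detector and RMS level plus zero-crossing rate as two example low-level features; the function names and the threshold are hypothetical and not taken from the system described above.

```python
import math
from typing import List, Tuple

def frame_features(frame: List[float]) -> Tuple[float, float]:
    """Return (RMS level, zero-crossing rate) of one audio frame (needs >= 2 samples)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    # Count sign changes between consecutive samples, normalized by the number of pairs.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return rms, crossings / (len(frame) - 1)

def detect_events(signal: List[float], frame_len: int,
                  rms_threshold: float) -> List[Tuple[int, int]]:
    """Group consecutive frames whose RMS exceeds the threshold into (start, end) sample ranges."""
    events: List[Tuple[int, int]] = []
    start = None
    n_frames = len(signal) // frame_len   # a trailing partial frame is ignored
    for i in range(n_frames):
        rms, _ = frame_features(signal[i * frame_len:(i + 1) * frame_len])
        if rms >= rms_threshold and start is None:
            start = i * frame_len                      # event onset
        elif rms < rms_threshold and start is not None:
            events.append((start, i * frame_len))      # event offset
            start = None
    if start is not None:                              # event still active at the end
        events.append((start, n_frames * frame_len))
    return events

# Silence, a 100-sample burst, silence.
signal = [0.0] * 100 + [0.5, -0.5] * 50 + [0.0] * 100
print(detect_events(signal, frame_len=50, rms_threshold=0.1))  # [(100, 200)]
```

In a real deployment the threshold would have to be tuned to the background noise of the site, and each detected event would then be summarized by features computed over its frames before classification.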


International Conference on Distributed Smart Cameras | 2010

On efficient use of multi-view data for activity recognition

Tommi Määttä; Aki Härmä; Hamid K. Aghajan

The focus of the paper is on studying five different methods for combining multi-view data from an uncalibrated smart camera network for human activity recognition. The multi-view classification scenarios studied can be divided into two categories: view selection and view fusion methods. Selection uses a single view to classify, whereas fusion merges multi-view data at either the feature level or the label level. The five methods are compared in the task of classifying human activities in three fully annotated datasets, MAS, VIHASI and HOMELAB, and in a combined dataset, MAS+VIHASI. Classification is based on image features computed from silhouette images, using a binary-tree-structured classifier with a 1D CRF for temporal modeling. The results show that fusion methods outperform practical selection methods. Selection methods have their advantages, but they depend strongly on how good the selection criterion is and how well it adapts to different environments. Furthermore, feature fusion outperforms the other scenarios in more controlled settings. However, the more variability there is in camera placement and in the characteristics of the persons, the more likely it is that combining candidate labels improves the accuracy of multi-view activity recognition.
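The difference between feature-level and label-level fusion can be illustrated with a toy example. This is a sketch only: it assumes a hypothetical nearest-centroid classifier as a stand-in for the binary-tree classifier with 1D CRF used in the paper, and a single feature vector per view.

```python
from collections import Counter
from typing import Dict, List, Sequence

Centroids = Dict[str, Sequence[float]]

def nearest_centroid(x: Sequence[float], centroids: Centroids) -> str:
    """Hypothetical stand-in classifier: label of the closest centroid (squared Euclidean)."""
    return min(centroids, key=lambda lbl: sum((a - b) ** 2 for a, b in zip(x, centroids[lbl])))

def feature_level_fusion(views: List[Sequence[float]], joint_centroids: Centroids) -> str:
    """Concatenate the per-view feature vectors, then classify once."""
    fused = [v for view in views for v in view]
    return nearest_centroid(fused, joint_centroids)

def label_level_fusion(views: List[Sequence[float]], view_centroids: List[Centroids]) -> str:
    """Classify each view separately, then majority-vote the candidate labels."""
    labels = [nearest_centroid(x, c) for x, c in zip(views, view_centroids)]
    # Counter.most_common breaks ties in favor of the label encountered first.
    return Counter(labels).most_common(1)[0][0]

# Toy example: one feature per view, three views.
per_view = [{"walk": [0.0], "sit": [1.0]}] * 3
joint = {"walk": [0.0, 0.0, 0.0], "sit": [1.0, 1.0, 1.0]}
views = [[0.1], [0.9], [0.2]]
print(feature_level_fusion(views, joint), label_level_fusion(views, per_view))  # walk walk
```

Label-level fusion only needs each view's decision, so it does not require the feature vectors of differently placed cameras to be comparable; feature-level fusion, by contrast, needs a consistent feature layout across views.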


Ambient Intelligence | 2010

Ambient Human-to-Human Communication

Aki Härmä

In the current technological landscape, colored by environmental and security concerns, the logic of replacing travel with technical means of communication is indisputable. For example, consider a comparison between a normal family car and a video conference system with two laptop computers connected over the Internet. The power consumption of the car is approximately 25 kW, while the two computers, together with their share of the power consumed by the intermediate routers, draw in total in the range of 50 W. Therefore, meeting a person at a one-hour driving distance (a two-hour round trip, about 50 kWh) consumes as much energy as 1000 hours of video conference. The difference in costs is also increasing. An estimate of the same cost difference between travel and video conference twenty years ago gave only three days of continuous video conference for the same situation [29]. The cost of a video conference depends on the duration of the session, while the cost of traveling depends only on the distance. However, in a strict economic and environmental sense, even a five-minute car trip in 2008 becomes more economical than a video conference only when the meeting lasts more than three and a half days.
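The energy comparison can be checked with a few lines of arithmetic. This sketch assumes the car trip is a round trip, which is what makes 25 kW against 50 W come out at 1000 hours for a one-hour drive; the two power constants are the figures quoted in the text, and the function name is hypothetical.

```python
CAR_POWER_W = 25_000.0     # family car, approx. 25 kW (from the text)
VIDEO_CONF_POWER_W = 50.0  # two laptops plus their share of router power (from the text)

def break_even_hours(one_way_drive_hours: float) -> float:
    """Hours of video conference using the same energy as a round trip by car."""
    round_trip_energy_wh = CAR_POWER_W * 2.0 * one_way_drive_hours
    return round_trip_energy_wh / VIDEO_CONF_POWER_W

print(break_even_hours(1.0))          # 1000.0 (one-hour drive each way)
print(break_even_hours(5 / 60) / 24)  # five-minute drive each way, in days
```

Under the same round-trip assumption, a five-minute drive corresponds to about 83 hours, or roughly three and a half days, of video conference, which matches the figure in the text.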


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Speaker Distance Detection Using a Single Microphone

Eleftheria Georganti; Tobias May; Steven van de Par; Aki Härmä; John Mourjopoulos

A method to detect the distance of a speaker from a single microphone in a room environment is proposed. Several features related to statistical parameters of speech source excitation signals are introduced and shown to depend on the distance between source and receiver. These features are used to train a pattern recognizer for distance detection. The method is tested using a database of speech recordings in four rooms with different acoustical properties. Performance is shown to be independent of the signal gain and level, but it depends on the reverberation time and the characteristics of the room. Overall, the system performs well, especially at close distances and in rooms with low reverberation time, and it appears to be robust to small distance mismatches. Finally, a listening test is conducted to compare the results of the proposed method with the performance of human listeners.
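The abstract does not list the excitation-signal features. One plausible example, assumed here for illustration only, is the kurtosis of the linear-prediction (LP) residual: the residual of clean voiced speech is peaky, and reverberation, which grows relative to the direct sound as distance increases, is generally reported to make it more Gaussian and lower its kurtosis. A pure-Python sketch for one analysis frame, with hypothetical function names:

```python
from typing import List

def autocorr(x: List[float], max_lag: int) -> List[float]:
    """Unnormalized autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r: List[float], order: int) -> List[float]:
    """LP prediction-error filter a[0..order] with a[0] = 1, via the Levinson-Durbin recursion."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                       # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                   # updated prediction-error power
    return a

def lp_residual(x: List[float], order: int) -> List[float]:
    """Inverse-filter the frame: e[n] = sum_j a[j] * x[n - j] for n >= order."""
    a = levinson_durbin(autocorr(x, order), order)
    return [sum(a[j] * x[n - j] for j in range(order + 1)) for n in range(order, len(x))]

def kurtosis(e: List[float]) -> float:
    """Non-excess kurtosis m4 / m2^2 (3.0 for a Gaussian signal)."""
    n = len(e)
    mu = sum(e) / n
    m2 = sum((v - mu) ** 2 for v in e) / n
    m4 = sum((v - mu) ** 4 for v in e) / n
    return m4 / (m2 * m2)
```

Applied frame by frame, such a statistic would form one entry of the feature vector given to the pattern recognizer; the paper's actual feature set may differ.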


International Conference on Acoustics, Speech, and Signal Processing | 2009

Conversation detection in ambient telephony

Aki Härmä; Kien Pham

In some speech communication applications, such as distributed hands-free telephony, it is important that the system can detect the conversational state of a call. This cannot be done from speech activity alone, because the captured signal may also contain a conversation between two local people or other speech sources, such as a radio or television. In this paper we compare known algorithms and introduce a new algorithm for the real-time detection of active conversation between an incoming caller and a local user. The method is based on the mutual information in speech activity, the detection of back-channel speech activity, and the statistics of overlapping speech. The proposed method gives over 90% accuracy within a one-minute observation period, a clear improvement over the performance of earlier techniques.
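The mutual-information cue can be illustrated on binary voice-activity (VAD) sequences for the far-end caller and the local microphone. This sketch assumes MI is computed from frame-synchronous VAD decisions; the back-channel and overlap statistics of the paper are not reproduced, and the function name is hypothetical. Turn-taking in a real conversation makes the two streams dependent, whereas speech from a television is roughly independent of the caller's turns.

```python
import math
from collections import Counter
from typing import Sequence

def speech_activity_mi(far: Sequence[int], near: Sequence[int]) -> float:
    """Mutual information (bits) between two frame-synchronous binary VAD sequences."""
    if len(far) != len(near) or not far:
        raise ValueError("VAD sequences must be non-empty and of equal length")
    n = len(far)
    joint = Counter(zip(far, near))
    p_far, p_near = Counter(far), Counter(near)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # Only observed (a, b) pairs contribute; 0 * log 0 is taken as 0.
        mi += p_ab * math.log2(p_ab / ((p_far[a] / n) * (p_near[b] / n)))
    return mi

# Strict turn-taking: the local user talks exactly when the caller is silent.
caller = [1, 1, 0, 0] * 10
local = [0, 0, 1, 1] * 10
# Speech from a TV that ignores the caller's turns.
tv = [1, 0, 1, 0] * 10
print(speech_activity_mi(caller, local), speech_activity_mi(caller, tv))  # 1.0 0.0
```

In practice the MI would be estimated over a sliding window, which is one reason an observation period on the order of a minute is needed before the decision becomes reliable.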


International Conference on Acoustics, Speech, and Signal Processing | 2011

Stereo audio classification for audio enhancement

Aki Härmä

Stereo audio enhancement and upmixing techniques require a spatial analysis of the mixture in order to work optimally for different types of content. In this paper a method is proposed that classifies the time-frequency regions of stereo audio data into six different classes. The individual classes represent special cases of a generic stereo signal model, which is introduced and characterized in the paper. Finally, the classifier is tested using realistic stereo audio data.
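The abstract does not define the six classes or the signal model. As an assumption, two spatial cues commonly used for this kind of analysis are the inter-channel level difference (ICLD) and the inter-channel coherence of a time-frequency region. A sketch computing both from the complex STFT coefficients of the left and right channels in one region; the function name is hypothetical:

```python
import math
from typing import Sequence, Tuple

def spatial_cues(left: Sequence[complex], right: Sequence[complex]) -> Tuple[float, float]:
    """Return (ICLD in dB, coherence in [0, 1]) for one time-frequency region.

    `left` and `right` hold the STFT coefficients of the two channels over the
    same frequency bins and frames.
    """
    p_l = sum(abs(x) ** 2 for x in left)
    p_r = sum(abs(x) ** 2 for x in right)
    cross = sum(xl * xr.conjugate() for xl, xr in zip(left, right))
    eps = 1e-12  # guards against log(0) and division by zero in silent regions
    icld_db = 10.0 * math.log10((p_l + eps) / (p_r + eps))
    coherence = abs(cross) / math.sqrt(p_l * p_r + eps)
    return icld_db, coherence

# A source panned about 6 dB towards the left: the right channel is a scaled copy.
left = [1 + 1j, 2 - 1j, 0.5j]
right = [0.5 * x for x in left]
print(spatial_cues(left, right))
```

A coherence near 1 with a nonzero level difference is typical of an amplitude-panned source, while low coherence typically indicates diffuse or ambient content; how the paper maps its own features to the six classes is not stated in the abstract.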


Affective Computing and Intelligent Interaction | 2009

Ambient telephony: Designing a communication system for enhancing social presence in home mediated communication

Jorge Peregrín Emparanza; Pavan Dadlani; Boris E. R. de Ruyter; Aki Härmä

The experience of telephone communication in the home has remained very similar for decades: practical, but intrusive, and providing little sense of social presence. This paper presents work aimed at improving the experience of social presence in telephony. We present the results of several user studies on telephone usage and, based on these, propose the use of distributed speakerphone systems (or ambient telephones). We report empirical research comparing two different ambient telephone systems. In the first system, arrays of loudspeakers and microphones are embedded in the ceiling and in the home audio system around the home. In the second experiment, we replaced the embedded system with a distributed set of clearly visible and tangible speakerphone units. We report lessons learned and implications for the design of ambient telephone systems.


Information Fusion | 2015

Collaborative detection of repetitive behavior by multiple uncalibrated cameras

Tommi Määttä; Aki Härmä; Hamid K. Aghajan; Henk Corporaal

In smart environments, embedded sensing systems should adapt intelligently to the behavior of their users. Many interesting types of behavior are characterized by the repetition of actions, such as certain activities or movements. This paper introduces a generic methodology, called Action History Matrices (AHM), to detect and classify repetitions that may occur at different scales. The properties of AHM for detecting repetitive movement behavior are demonstrated by analyzing four customer behavior classes in a shop environment observed by multiple uncalibrated cameras. Two datasets, video recordings in the shop environment and motion path simulations, are created and used in the experiments. The AHM-based system achieves an accuracy of 97% on the single-view simulated movement data with the most suitable scale and a naive Bayesian classifier. In addition, two fusion levels and three fusion methods are compared with the AHM method on the multi-view recordings. In our results, fusion at the decision level offers consistently better accuracy than at the feature level, and the coverage-based view-selection fusion method (51%) marginally outperforms the majority method. On the recorded data, the upper limit of the accuracy achievable by view selection is found to be 75%.
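The decision-level strategies compared above can be contrasted in a few lines. This sketch assumes each camera view yields one predicted behavior label and a scalar coverage score (defined here, hypothetically, as the fraction of the customer's path visible in that view); the class names and function names are invented for illustration. The `oracle_selection` helper shows what an upper limit on view selection means: the best any selector could do is pick a view whose label is correct.

```python
from collections import Counter
from typing import List, Tuple

# One camera view's output: (predicted behavior class, coverage score in [0, 1]).
ViewDecision = Tuple[str, float]

def majority_fusion(decisions: List[ViewDecision]) -> str:
    """Decision-level fusion by majority vote; ties go to the label seen first."""
    return Counter(label for label, _ in decisions).most_common(1)[0][0]

def coverage_selection(decisions: List[ViewDecision]) -> str:
    """View selection: trust only the view with the highest coverage score."""
    label, _ = max(decisions, key=lambda d: d[1])
    return label

def oracle_selection(decisions: List[ViewDecision], truth: str) -> bool:
    """Upper limit for view selection: correct if at least one view is correct."""
    return any(label == truth for label, _ in decisions)

# Hypothetical example with three cameras.
views = [("browsing", 0.2), ("browsing", 0.3), ("queuing", 0.9)]
print(majority_fusion(views), coverage_selection(views))  # browsing queuing
```

Averaging `oracle_selection` over a labeled test set gives the kind of view-selection upper bound reported in the abstract; this is not the AHM method itself, which is not shown.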


Biomedical and Health Informatics | 2014

Bed exit prediction based on movement and posture data

Aki Härmä; Warner ten Kate; Javier Espina

Falls in nursing homes and hospitals often take place immediately after a patient exits the bed. An alarm that signals the bed exit itself may already come too late for staff to react. In this paper we explore the possibility of detecting the sequences of preparatory movements that precede a bed exit and thereby creating an early warning that the patient is preparing to leave the bed. The method is described and tested using annotated accelerometer data collected from volunteers. A plausibility assessment is also carried out by comparing accelerometer data from hospital patients with the output of a bed alarm system. It is demonstrated that the proposed method is able to detect a bed exit seconds before the patient actually leaves the bed.
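The paper detects sequences of preparatory movements; the following is not that method but a much simpler rule-based sketch, assuming a trunk-worn 3-axis accelerometer whose x axis points along the spine. It estimates trunk inclination from gravity and raises a warning once the patient has sat up for a few consecutive samples, that is, before actually leaving the bed. The thresholds and function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

def inclination_deg(ax: float, ay: float, az: float) -> float:
    """Angle between the sensor x axis (along the spine) and vertical, in degrees."""
    g = math.sqrt(ax * ax + ay * ay + az * az)
    if g == 0.0:
        raise ValueError("zero acceleration vector")
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, ax / g))))

def early_warning(samples: Sequence[Tuple[float, float, float]],
                  upright_deg: float = 30.0, hold: int = 3) -> Optional[int]:
    """Index at which the trunk has been upright for `hold` consecutive samples, else None."""
    run = 0
    for i, (ax, ay, az) in enumerate(samples):
        if inclination_deg(ax, ay, az) < upright_deg:
            run += 1
            if run == hold:
                return i
        else:
            run = 0
    return None

lying = [(0.0, 0.0, 9.81)] * 5      # gravity across the trunk
sitting = [(9.81, 0.0, 0.0)] * 4    # gravity along the spine
print(early_warning(lying + sitting))  # 7
```

A rule this simple would also fire when a patient merely sits up to eat or read, which is one reason to model whole sequences of preparatory movements instead.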


International Conference on Digital Signal Processing | 2013

Compressive sensing in footstep sounds, hand tremors and speech using K-SVD dictionaries

Andreas I. Koutrouvelis; Aki Härmä; Athanasios Mouchtaris

The application of Compressive Sensing is explored in three signal categories: footstep sounds, hand tremors and speech. The reconstruction performance of various dictionaries is investigated. It is demonstrated that these signal categories are reconstructed with higher SNR using K-SVD dictionaries than using fixed dictionaries. In particular, for footstep sounds and hand tremors, the K-SVD dictionaries outperform the fixed dictionaries: the Discrete Cosine Transform (DCT), the order-8 Symlet wavelet transform, and the union of the DCT and the Discrete Sine Transform. Moreover, in speech reconstruction, using a codebook of K-SVD dictionaries instead of a codebook of impulse response matrices improves performance.
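The abstract compares dictionaries but does not name the sparse-recovery algorithm. As an assumption, a common choice is Orthogonal Matching Pursuit (OMP). The sketch below recovers a sparse coefficient vector from measurements y, given the atoms (columns) of A = ΦD, where D could be a learned K-SVD dictionary; learning D is not shown, the atoms are assumed to be unit-norm, and all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]

def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))

def solve(m: List[Vector], b: Vector) -> Vector:
    """Solve the small system m x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def omp(atoms: List[Vector], y: Vector, sparsity: int) -> Vector:
    """Orthogonal Matching Pursuit over the columns (atoms) of A = Phi D."""
    support: List[int] = []
    coeffs: Vector = []
    residual = y[:]
    for _ in range(sparsity):
        # Greedy step: the unused atom most correlated with the current residual.
        j = max((k for k in range(len(atoms)) if k not in support),
                key=lambda k: abs(dot(atoms[k], residual)))
        support.append(j)
        # Least-squares refit of y on all selected atoms via the normal equations.
        gram = [[dot(atoms[a], atoms[b]) for b in support] for a in support]
        coeffs = solve(gram, [dot(atoms[a], y) for a in support])
        approx = [sum(c * atoms[a][i] for c, a in zip(coeffs, support)) for i in range(len(y))]
        residual = [yi - ai for yi, ai in zip(y, approx)]
    x = [0.0] * len(atoms)
    for c, a in zip(coeffs, support):
        x[a] = c
    return x

s = 1 / math.sqrt(2)
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, 0.0]]
y = [3 * s, 3 * s, 2.0]                # = 3 * atoms[3] + 2 * atoms[2]
print(omp(atoms, y, sparsity=2))       # approximately [0, 0, 2, 3]
```

The dictionary comparison in the abstract amounts to swapping which D is used to build the atoms while keeping the recovery step fixed; the reconstructed signal would then be D applied to the recovered coefficients.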
