Publication


Featured research published by Marko Kos.


Digital Signal Processing | 2013

Acoustic classification and segmentation using modified spectral roll-off and variance-based features

Marko Kos; Zdravko Kacic; Damjan Vlaj

This paper presents novel features and an architecture for an automatic on-line acoustic classification and segmentation system. The system includes speech/non-speech segmentation (with the emphasis on accurate speech/music segmentation), gender segmentation, and speech bandwidth segmentation. This automatic segmentation system can be easily integrated into an automatic continuous speech recognition system, where information about individual acoustic segments can be used for acoustic model selection and adaptation, or as additional information for rich transcription output. Acoustic model adaptation can improve the speech recognition rate, and additional information in rich transcription can be useful when searching for certain events or circumstances (a male speaker talking over a phone line, etc.). For speech/non-speech segmentation we propose a new set of features, which are based on an energy variance in a narrow frequency sub-band, called EVFB (Energy Variance of Filter Bank). The proposed features also prove to be an efficient discriminator between speech and music. Segmentation cross-test results show that EVFB features are more robust than MFCC features. Two new features (modified spectral roll-off and high-frequency energy variance) are also proposed for speech bandwidth classification and segmentation. The results show a good and robust performance by the automatic on-line acoustic segmentation system. All experiments and tests were performed on a radio broadcast database and a Slovenian BNSI Broadcast News database. Integration of the automatic on-line acoustic segmentation system into a continuous speech recognition system based on MFCC (mel-frequency cepstral coefficients) features requires only a small additional computational cost, because many of the proposed system's feature calculation procedures are shared with the MFCC feature calculation procedure.
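
The abstract does not say how the roll-off feature was modified, so the following is only a sketch of the conventional spectral roll-off: the frequency below which a chosen fraction of the spectral power lies. The function name, the 0.85 fraction, and the toy spectra are assumptions made for illustration and are not taken from the paper.

```python
def spectral_rolloff(power_spectrum, sample_rate, fraction=0.85):
    """Return the frequency (Hz) below which `fraction` of the total power lies.

    power_spectrum: power values for bins 0..N-1, evenly covering
    0..sample_rate/2 (N must be at least 2).
    """
    total = sum(power_spectrum)
    if total == 0:
        return 0.0  # silent frame: no meaningful roll-off
    bin_width = (sample_rate / 2) / (len(power_spectrum) - 1)
    cumulative = 0.0
    for k, power in enumerate(power_spectrum):
        cumulative += power
        if cumulative >= fraction * total:
            return k * bin_width
    return (len(power_spectrum) - 1) * bin_width


# Toy spectra (hypothetical): band-limited vs. full-band frames at 16 kHz.
narrowband = [5, 4, 1, 0, 0]
wideband = [2, 2, 2, 2, 2]
print(spectral_rolloff(narrowband, 16000))
print(spectral_rolloff(wideband, 16000))
```

A band-limited frame, such as telephone speech, concentrates its power in the low bins and so yields a lower roll-off frequency than a full-band frame, which is why a roll-off style feature is a plausible bandwidth discriminator.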


EURASIP Journal on Advances in Signal Processing | 2009

Online speech/music segmentation based on the variance mean of filter bank energy

Marko Kos; Matej Grasic; Zdravko Kacic

This paper presents a novel feature for online speech/music segmentation based on the variance mean of filter bank energy (VMFBE). The idea behind the feature's construction is energy variation in a narrow frequency sub-band. The energy varies more rapidly, and to a greater extent, for speech than for music. Therefore, the energy variance in such a sub-band is greater for speech than for music. The radio broadcast database and the BNSI broadcast news database were used to evaluate the feature's discrimination and segmentation ability. The calculation procedure of the VMFBE feature has 4 out of 6 steps in common with the MFCC feature calculation procedure. Therefore, it is a very convenient speech/music discriminator for use in real-time automatic speech recognition systems based on MFCC features, because valuable processing time can be saved and the computation load is only slightly increased. Analysis of the feature's speech/music discriminative ability shows an average error rate below 10% for radio broadcast material, and it outperforms the other features used for comparison by more than 8%. The proposed feature as a stand-alone speech/music discriminator in a segmentation system achieves an overall accuracy of over 94% on radio broadcast material.
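
A minimal sketch of the idea, assuming the feature is the per-band variance of filter bank energies over a window of frames, averaged across bands. The abstract does not give the exact six-step procedure, so the function name `variance_mean`, the window contents, and the toy values are hypothetical.

```python
from statistics import mean, pvariance


def variance_mean(fbank_frames):
    """Mean over bands of the per-band energy variance across frames.

    fbank_frames: list of frames, each a list of filter bank energies
    (one value per band), e.g. the log energies an MFCC front end
    already computes before its DCT step.
    """
    n_bands = len(fbank_frames[0])
    per_band_variance = [
        pvariance([frame[band] for frame in fbank_frames])
        for band in range(n_bands)
    ]
    return mean(per_band_variance)


# Hypothetical windows: energies fluctuate strongly for speech, weakly for music.
speech_like = [[1.0, 5.0], [9.0, 1.0], [2.0, 8.0], [8.0, 2.0]]
music_like = [[5.0, 4.0], [5.0, 4.1], [5.1, 4.0], [5.0, 4.0]]
print(variance_mean(speech_like) > variance_mean(music_like))
```

Because the input is the same filter bank output that an MFCC front end produces, a feature of this kind adds only the variance and averaging steps on top of work already done.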


IEEE Access | 2017

A Wearable Device and System for Movement and Biometric Data Acquisition for Sports Applications

Marko Kos; Iztok Kramberger

This paper presents a miniature wearable device and a system for detecting and recording the movement and biometric information of a user during sport activities. The wearable device is designed to be worn on the wrist and can monitor skin temperature and pulse rate. Furthermore, it can monitor arm movement and detect gestures using an inertial measurement unit. The device can be used for various professional and amateur sport applications and for health monitoring. Because of its small size and minimal weight, it is especially appropriate for swing-based sports like tennis or golf, where any additional weight on the arms would most likely disturb the player and influence the player’s performance. Basic signal processing is performed directly on the wearable device, but for more complex signal analysis the data can be uploaded via the Internet to a cloud service, where it can be processed by a dedicated application. The device is powered by a lightweight miniature LiPo battery and has about 6 h of autonomy at maximum performance.


International Conference on Systems, Signals and Image Processing | 2009

On-Line Speech/Music Segmentation for Broadcast News Domain

Marko Kos; Matej Grasic; Damjan Vlaj; Zdravko Kacic

This paper presents a novel feature group for on-line speech/music segmentation in the broadcast news domain. The features are based on mel-frequency cepstral coefficients variance (MFCCV). The idea behind the feature group's construction is the energy variation in a narrow frequency sub-band, which is larger for speech than for music. The radio broadcast database was used to evaluate the features' discrimination and segmentation ability. Results show that MFCCV features perform better than the classic MFCC features. The MFCCV features are a very convenient speech/music discriminator for automatic speech recognition systems that use MFCC features, as they perform better than classic MFCC features and need only one additional calculation step.


Computers & Electrical Engineering | 2012

Voice activity detection algorithm using nonlinear spectral weights, hangover and hangbefore criteria

Damjan Vlaj; Zdravko Kacic; Marko Kos

This paper introduces a nonlinear function into the frequency spectrum that improves the detection of vowels, diphthongs, and semivowels within the speech signal. The lower efficiency of consonant detection was addressed by implementing the hangover and hangbefore criteria. This paper presents a procedure for faster definition of the optimal constants used by the hangover and hangbefore criteria. A nonlinearly changed frequency spectrum is used in the proposed GMM (Gaussian Mixture Model) based VAD (Voice Activity Detection) algorithm. Comparative tests between the proposed VAD algorithm and seven other VAD algorithms were made on the Aurora 2 database. The experiments were based on frame error detection and on speech recognition performance for two types of acoustic training modes (multi-condition and clean only). The lowest average percentage of frame errors was obtained by the proposed VAD algorithm, which also improved speech recognition performance for both types of acoustic training modes.


International Conference on Systems, Signals and Image Processing | 2009

Influence of Hangover and Hangbefore Criteria on Automatic Speech Recognition

Damjan Vlaj; Marko Kos; Matej Grasic; Zdravko Kacic

This paper presents the influence of hangover and hangbefore criteria on automatic speech recognition. A voice activity detection (VAD) algorithm is nowadays almost always part of an automatic speech recognition system. Hangover and hangbefore criteria can be integrated into the VAD algorithm after the basic VAD decision, and they can improve speech recognition results. The open question is how many frames should be used for the hangover and hangbefore criteria. The duration of vowels, diphthongs and semivowels determines how many frames must be detected as speech before deciding whether the hangover and hangbefore criteria will be applied at all. Consonant frames have low spectral energy; the energy of unvoiced fricatives, unvoiced stops and nasals is especially low. These frames are first detected as silence, but with the hangover and hangbefore criteria they are reclassified as speech. Speech recognition experiments show that hangover and hangbefore criteria can improve speech recognition results, and that the hangbefore criterion plays a more important role in speech recognition than the hangover criterion.
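
The mechanism can be sketched as a post-processing pass over per-frame VAD decisions: each sufficiently long speech run is extended backwards (hangbefore) and forwards (hangover) by a fixed number of frames. This is an illustrative sketch only. The function name `apply_hang`, the parameter names, and the default frame counts are assumptions, since the abstract does not state the values used.

```python
def apply_hang(vad, hangbefore=2, hangover=2, min_run=1):
    """Extend speech runs in a binary VAD sequence.

    vad: list of 0/1 decisions per frame (1 = speech).
    hangbefore / hangover: frames relabelled as speech before / after a run.
    min_run: a run must contain at least this many speech frames before
             it is extended; shorter runs are left as they are.
    """
    out = list(vad)
    n = len(vad)
    i = 0
    while i < n:
        if vad[i]:
            j = i
            while j < n and vad[j]:
                j += 1  # j ends one past the last frame of the run
            if j - i >= min_run:
                for k in range(max(0, i - hangbefore), i):
                    out[k] = 1
                for k in range(j, min(n, j + hangover)):
                    out[k] = 1
            i = j
        else:
            i += 1
    return out


decisions = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
print(apply_hang(decisions, hangbefore=2, hangover=1))
# → [0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
```

Low-energy consonant frames adjacent to a detected vowel are recovered this way without changing the underlying VAD decision rule.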


International Conference on Systems, Signals and Image Processing | 2016

Tennis stroke detection and classification using miniature wearable IMU device

Marko Kos; Jernej Zenko; Damjan Vlaj; Iztok Kramberger

This paper presents work related to tennis stroke detection and classification. For arm movement acquisition, a miniature wearable IMU device positioned on the player's forearm (right above the wrist) is proposed and presented. The device uses a MEMS-based accelerometer and gyroscope with 6-DOF. Accelerometer data is used for reliable and accurate tennis stroke detection, and information extracted from gyroscope data is used for tennis stroke classification. The proposed system is able to detect and classify the three most common tennis strokes: forehand, backhand, and serve. Because of limited memory and lack of processing power, the proposed algorithms for stroke detection and classification are quite simple, but are nonetheless capable of achieving a high classification rate. An overall tennis stroke classification accuracy of 98.1% was achieved.
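
The abstract does not describe the detection algorithm itself. One simple approach that fits a memory-limited device is thresholding the acceleration magnitude and taking the peak inside a short window. The sketch below illustrates that general technique only. The function name, the threshold, the window length, and the sample data are hypothetical and are not the authors' values.

```python
import math


def detect_strokes(accel, threshold=8.0, refractory=50):
    """Return sample indices of acceleration-magnitude peaks above threshold.

    accel: list of (x, y, z) accelerometer samples (units assumed to be g).
    threshold: magnitude that must be exceeded to start a detection.
    refractory: window length in samples; the peak is searched inside it
                and no new detection starts until it has passed.
    """
    mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in accel]
    events = []
    i = 0
    while i < len(mags):
        if mags[i] > threshold:
            window = mags[i:i + refractory]
            events.append(i + window.index(max(window)))
            i += refractory  # skip the rest of this swing
        else:
            i += 1
    return events


# Hypothetical recording: rest (1 g) with two short bursts.
rest = [(0, 0, 1)]
samples = rest * 10 + [(3, 4, 0), (6, 8, 0)] + rest * 10 + [(0, 10, 0)] + rest * 5
print(detect_strokes(samples, threshold=4.0, refractory=5))
# → [11, 22]
```

Each detected index could then mark the segment of gyroscope data passed to a classifier, which is the division of labour between the two sensors that the abstract describes.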


International Conference on Systems, Signals and Image Processing | 2016

Pulse rate variability and blood oxidation content identification using miniature wearable wrist device

Jernej Zenko; Marko Kos; Iztok Kramberger

This paper presents work related to the identification of PRV (pulse rate variability) and SpO2 (blood oxidation content) using a miniature wearable wrist device. An extension of the currently widely used PPT (photoplethysmography) measuring technique is proposed. Most PPT devices only measure PR (pulse rate), but with minimal adaptations to the sensor, a wider range of parameters can be measured. Two such parameters are PRV and SpO2. This set of parameters gives a better insight into one's physical and mental state than the PR parameter alone. In order to develop a device that does not obstruct the user in any way, we propose a miniature non-invasive wearable device and algorithms that do not need any significant processing power to identify the biometric parameters. The device is a lightweight, battery-powered embedded system that measures and analyses biometric data for later analysis in correlation with one's activity.
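
As an illustration of how PRV can be derived cheaply once pulse peaks have been located, the sketch below computes two widely used time-domain variability measures (SDNN and RMSSD) from peak timestamps. The choice of these two metrics and the function name `prv_metrics` are assumptions. The abstract does not state which PRV measures the device computes.

```python
import math


def prv_metrics(peak_times_s):
    """Compute pulse rate and time-domain PRV metrics from pulse peak times.

    peak_times_s: ascending timestamps (seconds) of detected pulse peaks.
    Returns mean pulse rate (beats/min), SDNN (ms) and RMSSD (ms).
    """
    if len(peak_times_s) < 3:
        raise ValueError("need at least three peaks")
    # Inter-beat intervals in milliseconds.
    ibis = [(b - a) * 1000.0 for a, b in zip(peak_times_s, peak_times_s[1:])]
    mean_ibi = sum(ibis) / len(ibis)
    # SDNN: sample standard deviation of the intervals.
    sdnn = math.sqrt(sum((x - mean_ibi) ** 2 for x in ibis) / (len(ibis) - 1))
    # RMSSD: root mean square of successive interval differences.
    diffs = [b - a for a, b in zip(ibis, ibis[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"pulse_rate_bpm": 60000.0 / mean_ibi, "sdnn_ms": sdnn, "rmssd_ms": rmssd}


print(prv_metrics([0.0, 0.8, 1.8, 2.6]))
```

Both metrics need only subtraction, squaring, and one square root per window, which is consistent with the stated goal of algorithms that do not need significant processing power.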


International Conference on Systems, Signals and Image Processing | 2007

Noise Reduction Algorithm for Robust Speech Recognition Using Minimum Statistics Method and Neural Network VAD

Marko Kos

In this paper we present the basic ideas of noise reduction for robust speech recognition using the minimum statistics algorithm and a VAD based on neural networks. Noise estimation is based on the minimum statistics procedure, and noise subtraction in the spectral domain is performed based on the neural network VAD output. Two different subtraction factors are used for noise subtraction. If the VAD output indicates a noise frame, subtraction is carried out with one subtraction factor. If the VAD output indicates a speech frame, then subtraction is carried out with the other subtraction factor. Research and tests were performed on the German part of the Aurora3 database. Performance was tested according to the ETSI ES 201 108 standard. During testing, several combinations of parameters were tried and optimum values were determined.
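
The VAD-dependent subtraction step can be sketched as follows. The factor values, the spectral floor, and the function name are assumptions made for illustration. The abstract gives neither the optimum factors nor the flooring rule, and the noise estimate is taken here as a given input rather than computed by minimum statistics.

```python
def subtract_noise(power_spectrum, noise_estimate, is_speech,
                   alpha_speech=1.0, alpha_noise=2.0, floor=0.01):
    """Spectral subtraction with a subtraction factor chosen by the VAD.

    power_spectrum: power per frequency bin for the current frame.
    noise_estimate: estimated noise power per bin (e.g. from a
                    minimum statistics tracker, not shown here).
    is_speech: VAD decision for this frame.
    alpha_speech / alpha_noise: hypothetical subtraction factors.
    floor: fraction of the input power kept as a lower bound, so the
           result never becomes negative.
    """
    alpha = alpha_speech if is_speech else alpha_noise
    return [
        max(power - alpha * noise, floor * power)
        for power, noise in zip(power_spectrum, noise_estimate)
    ]


frame = [10.0, 4.0]
noise = [2.0, 3.0]
print(subtract_noise(frame, noise, is_speech=True))   # gentler subtraction
print(subtract_noise(frame, noise, is_speech=False))  # stronger subtraction
```

Subtracting less in speech frames limits distortion of the speech spectrum, while subtracting more in noise frames suppresses residual noise. That trade-off is the usual motivation for using two factors.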


Computers & Electrical Engineering | 2017

A speech-based distributed architecture platform for an intelligent ambience

Marko Kos; Matej Rojc; Andrej Žgank; Zdravko Kacic; Damjan Vlaj

This paper presents a speech-based platform for intelligent ambience and/or supportive environment applications. The platform has a distributed architecture, which enables extended connectivity and support for multiple intelligent ambience services. The mobile unit Genesis is an integral part of the distributed platform, enabling interaction between several users and the environment. Furthermore, the platform's sophisticated client/server architecture incorporates robust speech recognition and text-to-speech synthesis engines for more natural human-machine interaction between users and the mobile unit Genesis. Both engines are multilingual. Although the whole system was developed for the Slovenian language, it can be quickly adapted to other languages when appropriate language resources are available. With high speaker-independent speech recognition accuracy and low command-to-operation delay, Genesis proves to have good manoeuvrability and is easy to operate even for an inexperienced operator.

Collaboration


Dive into Marko Kos's collaboration.

Top Co-Authors

Andrej Zgank

École Polytechnique Fédérale de Lausanne
