Publication


Featured research published by Ivan Tashev.


acm multimedia | 2002

Distributed meetings: a meeting capture and broadcasting system

Ross Cutler; Yong Rui; Anoop Gupta; Jonathan J. Cadiz; Ivan Tashev; Li-wei He; Alex Colburn; Zhengyou Zhang; Zicheng Liu; Steve Silverberg

The common meeting is an integral part of everyday life for most workgroups. However, due to travel, time, or other constraints, people are often not able to attend all the meetings they need to. Teleconferencing and recording of meetings can address this problem. In this paper we describe a system that provides these features, as well as a user study evaluation of the system. The system uses a variety of capture devices (a novel 360° camera, a whiteboard camera, an overview camera, and a microphone array) to provide a rich experience for people who want to participate in a meeting from a distance. The system is also combined with speaker clustering, spatial indexing, and time compression to provide a rich experience for people who miss a meeting and want to watch it afterward.


international conference on acoustics, speech, and signal processing | 2005

A new beamformer design algorithm for microphone arrays

Ivan Tashev; Henrique S. Malvar

This paper presents a generic beamformer design algorithm for arbitrary microphone array geometry. It makes efficient use of models of ambient and instrumental noise and of the microphone directivity patterns. By using a new definition of the target criterion and replacing a multi-dimensional optimization with a much simpler one-dimensional search, we can compute near-optimal solutions in reasonable time. The designed beams achieve noise suppression levels between 10 and 15 dB for microphone arrays with four to eight elements and linear or circular geometries. The fast real-time beamformer processing engine consumes less than 2% of the CPU power of a modern personal computer for a four-microphone array.
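The abstract summarizes the design approach without giving its equations. As an illustrative sketch only (not the paper's algorithm), a basic delay-and-sum beamformer for a linear microphone array shows the kind of steered-weight computation such a design generalizes; the array geometry, frequency, and angles below are made up for the example:

```python
import numpy as np

def steering_vector(freq, mic_positions, angle, c=343.0):
    """Far-field steering vector for a linear array at one frequency."""
    delays = mic_positions * np.cos(angle) / c      # per-mic propagation delay (s)
    return np.exp(-2j * np.pi * freq * delays)

def delay_and_sum_weights(freq, mic_positions, look_angle):
    """Uniform delay-and-sum weights steered toward look_angle."""
    d = steering_vector(freq, mic_positions, look_angle)
    return d / len(mic_positions)                   # unit gain in the look direction

def beampattern(freq, mic_positions, look_angle, test_angle):
    """Array response magnitude toward test_angle when steered to look_angle."""
    w = delay_and_sum_weights(freq, mic_positions, look_angle)
    d = steering_vector(freq, mic_positions, test_angle)
    return abs(np.vdot(w, d))                       # vdot conjugates the weights

mics = np.array([0.0, 0.05, 0.10, 0.15])   # 4-element linear array, 5 cm spacing
# Unit gain toward the steering direction, attenuation off-axis:
on_axis = beampattern(1000.0, mics, np.pi / 2, np.pi / 2)
off_axis = beampattern(1000.0, mics, np.pi / 2, np.pi / 6)
```

The paper's contribution is precisely what this sketch omits: choosing non-uniform, frequency-dependent weights via a one-dimensional search against noise and directivity models rather than using fixed uniform weights.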


international conference on acoustics, speech, and signal processing | 2014

HRTF magnitude synthesis via sparse representation of anthropometric features

Piotr Tadeusz Bilinski; Jens Ahrens; Mark R. P. Thomas; Ivan Tashev; John Platt

We propose a method for the synthesis of the magnitudes of Head-Related Transfer Functions (HRTFs) using a sparse representation of anthropometric features. Our approach treats the HRTF synthesis problem as finding a sparse representation of the subject's anthropometric features w.r.t. the anthropometric features in the training set. The fundamental assumption is that the magnitudes of a given HRTF set can be described by the same sparse combination as the anthropometric data. Thus, we learn a sparse vector that represents the subject's anthropometric features as a linear superposition of the anthropometric features of a small subset of subjects from the training data. Then, we apply the same sparse vector directly on the HRTF tensor data. For evaluation purposes we use a new dataset containing both anthropometric features and HRTFs. We compare the proposed sparse-representation-based approach with ridge regression and with the data of a manikin (which was designed based on average anthropometric data), and we simulate the best and the worst possible classifiers to select one of the HRTFs from the dataset. For instrumental evaluation we use log-spectral distortion. Experiments show that our sparse representation outperforms all other evaluated techniques, and that the synthesized HRTFs are almost as good as the best possible HRTF classifier.
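As a hedged illustration of the core idea (learn a sparse vector on anthropometric features, then reuse the same vector on HRTF data), here is a toy sketch using an ISTA solver for the LASSO. The solver choice, data, and dimensions are assumptions for the example, not the paper's setup:

```python
import numpy as np

def ista_lasso(A, y, lam=0.1, n_iter=1000):
    """Solve min_x 0.5*||A x - y||^2 + lam*||x||_1 with ISTA."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

rng = np.random.default_rng(0)
# Columns of A: anthropometric feature vectors of 20 training subjects (toy data).
A = rng.standard_normal((8, 20))
true_w = np.zeros(20)
true_w[[3, 11]] = [1.0, -0.5]              # the new subject resembles 2 subjects
y = A @ true_w                             # new subject's anthropometric features
w = ista_lasso(A, y, lam=0.01)
# Apply the same sparse weights directly to the training HRTF magnitudes.
H_train = rng.standard_normal((20, 64))    # per-subject HRTF magnitude vectors (toy)
H_synth = w @ H_train
```

The design choice illustrated is that the sparse weights are fitted only in the low-dimensional anthropometric space, then transferred unchanged to the high-dimensional HRTF data, which avoids ever regressing directly on the HRTFs.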


IEEE Signal Processing Magazine | 2013

Kinect Development Kit: A Toolkit for Gesture- and Speech-Based Human-Machine Interaction [Best of the Web]

Ivan Tashev

Kinect is a device for human-machine interaction that adds two more input modalities to the palette of the user interface designer: gestures and speech. Kinect is transforming how people interact with computers, kiosks, and other motion-controlled devices, from fun applications like playing a virtual violin to applications in health care and physical therapy, retail, education, and training.


international conference on acoustics, speech, and signal processing | 2009

Voice search of structured media data

Young-In Song; Ye-Yi Wang; Yun-Cheng Ju; Michael L. Seltzer; Ivan Tashev; Alex Acero

This paper addresses the problem of using unstructured queries to search a structured database in voice search applications. By incorporating structural information in music metadata, the end-to-end search error has been reduced by 15% on text queries and up to 11% on spoken queries. Based on that, an HMM sequential rescoring model has reduced the error rate by 28% on text queries and up to 23% on spoken queries compared to the baseline system. Furthermore, a phonetic similarity model has been introduced to compensate for speech recognition errors, which has improved the end-to-end search accuracy consistently across different levels of speech recognition accuracy.


international conference on acoustics, speech, and signal processing | 2007

Robust Adaptive Beamforming Algorithm using Instantaneous Direction of Arrival with Enhanced Noise Suppression Capability

Byung-Jun Yoon; Ivan Tashev; Alex Acero

In this paper, we propose a novel adaptive beamforming algorithm with enhanced noise suppression capability. The proposed algorithm incorporates the sound-source presence probability into the adaptive blocking matrix, which is estimated based on the instantaneous direction of arrival of the input signals and voice activity detection. The proposed algorithm guarantees robustness to steering vector errors without imposing ad hoc constraints on the adaptive filter coefficients. It can provide good suppression performance for both directional interference signals and isotropic ambient noise. For an in-car environment, the proposed beamformer shows an SNR improvement of up to 12 dB without using an additional noise suppressor.
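The instantaneous direction of arrival is one ingredient of the algorithm above. As a simplified sketch under far-field, free-field assumptions (two microphones only; the paper's full method additionally derives a sound-source presence probability and combines it with voice activity detection), the per-bin DOA can be read off the inter-microphone phase difference:

```python
import numpy as np

def instantaneous_doa(x1_bin, x2_bin, freq, mic_dist, c=343.0):
    """Per-bin DOA estimate from the phase difference of two microphone
    spectra (far-field, free-field, no spatial-aliasing assumption)."""
    phase_diff = np.angle(x2_bin * np.conj(x1_bin))
    # phase_diff = 2*pi*f * d*cos(theta)/c  ->  solve for theta
    cos_theta = np.clip(phase_diff * c / (2 * np.pi * freq * mic_dist), -1.0, 1.0)
    return np.arccos(cos_theta)

# Simulate a 1 kHz tone arriving from 60 degrees on a 10 cm two-mic pair.
f, d, theta = 1000.0, 0.10, np.deg2rad(60.0)
delay = d * np.cos(theta) / 343.0
x1 = np.exp(1j * 0.3)                        # arbitrary absolute phase at mic 1
x2 = x1 * np.exp(1j * 2 * np.pi * f * delay)
est = instantaneous_doa(x1, x2, f, d)        # recovers ~60 degrees
```

Bins whose estimated DOA falls near the look direction would then be treated as likely speech and excluded from the blocking matrix adaptation.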


conference on emerging network experiment and technology | 2011

Reclaiming the white spaces: spectrum efficient coexistence with primary users

George Nychis; Ranveer Chandra; Thomas Moscibroda; Ivan Tashev; Peter Steenkiste

TV white spaces offer an exciting opportunity for increasing spectrum availability, but white space devices (WSDs) cannot interfere with primary users, including TV channels and wireless microphones (mics). Mics are particularly challenging because their use is dynamic and it is hard to avoid interference since mic receivers are receive-only devices. For this reason the FCC and other regulatory agencies have made very conservative rules that require WSDs to vacate any TV channel that is used by a mic. However, our measurements show that mics typically require only 5% of a channel, wasting as much as 95% of the spectrum. We present SEISMIC, a system that enables WSDs and mics to operate on the same TV channel with zero audible mic interference. SEISMIC implements a MicProtector to measure the interference at the mic receiver and a signaling protocol to notify the WSD of impending interference. This allows the WSD to optimize its transmission (e.g., through subcarrier suppression) without impacting mics. We motivate and describe SEISMIC and present a detailed performance analysis that shows that SEISMIC can regain up to 95% of the spectrum in single-mic scenarios, and up to 85% in many-mic (10+) environments.


information theory and applications | 2014

HRTF phase synthesis via sparse representation of anthropometric features

Ivan Tashev

We propose a method for the synthesis of the phases of Head-Related Transfer Functions (HRTFs) using a sparse representation of anthropometric features. Our approach treats the HRTF synthesis problem as finding a sparse representation of the subject's anthropometric features w.r.t. the anthropometric features in the training set. The fundamental assumption is that the group delay of a given HRTF set can be described by the same sparse combination as the anthropometric data. Thus, we learn a sparse vector that represents the subject's anthropometric features as a linear superposition of the anthropometric features of a small subset of subjects from the training data. Then, we apply the same sparse vector directly on the HRTF group delay data. For evaluation purposes we use a new dataset containing both anthropometric features and HRTFs. We compare the proposed sparse-representation-based approach with ridge regression and with the data of a manikin (which was designed based on average anthropometric data), and we simulate the best and the worst possible classifiers to select one of the HRTFs from the dataset. For objective evaluation we use the mean square error of the group delay scaling factor. Experiments show that our sparse representation outperforms all other evaluated techniques, and that the synthesized HRTFs are almost as good as the best possible HRTF classifier.


international conference on acoustics, speech, and signal processing | 2007

Microphone Array Post-Filter using Incremental Bayes Learning to Track the Spatial Distributions of Speech and Noise

Michael L. Seltzer; Ivan Tashev; Alex Acero

While current post-filtering algorithms for microphone array applications can enhance beamformer output signals, they assume that the noise is either incoherent or diffuse, and make no allowances for point noise sources which may be strongly correlated across the microphones. In this paper, we present a novel post-filtering algorithm that relaxes this assumption by tracking the spatial as well as spectral distribution of the speech and noise sources present. A generative statistical model is employed to model the speech and noise sources at distinct regions in the soundfield, and incremental Bayesian learning is used to track the model parameters over time. This approach allows a post-filter derived from these parameters to effectively suppress both diffuse ambient noise and interfering point sources. The performance of the proposed approach is evaluated on multiple recordings made in a realistic office environment.
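For orientation only, here is the simplest form of spectral post-filtering on a beamformer output: a per-bin Wiener gain computed from a known noise variance. This is a toy stand-in, not the paper's method, which instead tracks spatial speech/noise models with incremental Bayesian learning; the noise level and signal below are invented:

```python
import numpy as np

def wiener_postfilter(beam_spec, noise_var):
    """Per-bin Wiener gain from an assumed-known noise variance."""
    power = np.abs(beam_spec) ** 2
    speech_var = np.maximum(power - noise_var, 0.0)   # crude speech PSD estimate
    gain = speech_var / (speech_var + noise_var + 1e-12)
    return gain * beam_spec

rng = np.random.default_rng(1)
noise_var = 0.01
clean = np.zeros(256, complex)
clean[10] = 4.0                                       # one strong speech bin
noisy = clean + np.sqrt(noise_var / 2) * (rng.standard_normal(256)
                                          + 1j * rng.standard_normal(256))
out = wiener_postfilter(noisy, noise_var)             # speech bin kept, noise cut
```

The limitation motivating the paper is visible here: a single scalar noise variance cannot describe a correlated point interferer, which is why the proposed post-filter models noise per spatial region instead.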


pacific rim conference on communications, computers and signal processing | 2009

Unified framework for single channel speech enhancement

Ivan Tashev; Andrew William Lovitt; Alex Acero

In this paper we describe a generic architecture for single channel speech enhancement. We assume processing in the frequency domain and suppression-based speech enhancement methods. The framework consists of a two-stage voice activity detector, a noise variance estimator, a suppression rule, and a modifier accounting for the uncertain presence of the speech signal. The evaluation corpus is a synthetic mixture of clean speech (TIMIT database) and in-car recorded noises. Using the framework, multiple speech enhancement algorithms are tuned for maximum performance. We propose a formalized procedure for automated tuning of these algorithms. The optimization criterion is a weighted sum of the mean opinion score (PESQ-MOS), signal-to-noise ratio (SNR), log-spectral distance (LSD), and mean square error (MSE). The proposed framework provides a complete speech enhancement chain and can be used for evaluation and tuning of other suppression rules and voice activity detector algorithms.
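The stages named in the abstract (voice activity detection, noise variance estimation, suppression rule) can be sketched as one minimal chain. The concrete choices below, a VAD-gated recursive noise tracker feeding a spectral-subtraction gain, are hypothetical stand-ins for the framework's pluggable stages, not the paper's tuned algorithms:

```python
import numpy as np

def enhance_frames(frames_spec, vad, alpha=0.9):
    """Suppression-based enhancement over STFT frames: update a recursive
    noise-variance estimate in speech-absent frames (per an external VAD),
    then apply a spectral-subtraction suppression rule to every frame."""
    noise_var = np.zeros(frames_spec.shape[1])
    out = np.empty_like(frames_spec)
    for t, spec in enumerate(frames_spec):
        power = np.abs(spec) ** 2
        if not vad[t]:                       # speech absent: update noise model
            noise_var = alpha * noise_var + (1 - alpha) * power
        gain = np.sqrt(np.maximum(power - noise_var, 0.0) / (power + 1e-12))
        out[t] = gain * spec
    return out

rng = np.random.default_rng(2)
n_bins = 64
noise = lambda: np.sqrt(0.005) * (rng.standard_normal(n_bins)
                                  + 1j * rng.standard_normal(n_bins))
frames = np.stack([noise() for _ in range(6)])
frames[-1, 8] += 3.0                         # speech-like tone in the last frame
vad = [False] * 5 + [True]                   # oracle VAD for the toy example
out = enhance_frames(frames, vad)
```

In the paper's framework each of these stages is a replaceable module, and the tuning procedure searches their parameters (e.g., the smoothing constant `alpha` here) against the weighted metric described above.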

