Serap Kirbiz
Istanbul Technical University
Publication
Featured research published by Serap Kirbiz.
International Conference on Acoustics, Speech, and Signal Processing | 2006
Serap Kirbiz; Bilge Gunsel
Most watermark (WM) decoding schemes use correlation-based methods because of their simplicity. In these methods, the WM signal embedded through a secret key is assumed to be uncorrelated with the host signal. This is a hard restriction that can never be achieved in practice, and the correlation between the received signal and the secret key becomes greater than zero even when the received signal is un-watermarked. Typically, a semi-automatically specified decision threshold is used at the decoding site. Since audio watermarking is a nonlinear process that guarantees inaudibility, there is no analytic way of determining an optimal threshold value, which makes the WM decoding problem harder. This paper introduces a learning scheme followed by nonlinear classification, which eliminates the threshold specification problem. The decoding process is modelled as a three-class classification problem, and support vector machines (SVMs) are used to learn the embedded data. The decoding and detection performances of the developed system are greater than 98% and 95%, respectively. When the watermark-to-signal ratio (WSR) is higher than -30 dB, system false alarm ratios remain below 2%. It is shown that the introduced WM decoding method is robust to additive noise and to most of the add/remove and filter attacks of Stirmark.
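The threshold problem described in this abstract can be illustrated with a minimal sketch. It is not the paper's method: it assumes a simplified additive embedding (the abstract itself notes that real audio watermarking is nonlinear), and the names `embed`, `correlate`, the key sequence, and the embedding strength are all hypothetical. It only shows why a correlator needs a threshold: the un-watermarked host already has a small, generally nonzero correlation with the key.

```python
import random

def correlate(signal, key):
    """Average product of a signal with a +/-1 key sequence."""
    return sum(s * k for s, k in zip(signal, key)) / len(signal)

def embed(host, key, strength):
    """Simplified additive embedding (an assumption): host + strength * key."""
    return [h + strength * k for h, k in zip(host, key)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 4096
host = [rng.gauss(0.0, 1.0) for _ in range(n)]      # stand-in for audio samples
key = [rng.choice((-1.0, 1.0)) for _ in range(n)]   # hypothetical secret key

marked = embed(host, key, strength=0.1)

# The host/key correlation is small but generally not exactly zero,
# so a decision threshold has to sit somewhere between the two values.
c_host = correlate(host, key)
# Because key[i] * key[i] == 1, embedding shifts the statistic by `strength`.
c_marked = correlate(marked, key)

print(f"un-watermarked: {c_host:+.4f}   watermarked: {c_marked:+.4f}")
```

In this toy setting the gap between the two statistics equals the embedding strength, so a weaker watermark or a noisier channel leaves less room for any fixed threshold. That is the difficulty the learning-based decoder is meant to avoid.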
Digital Signal Processing | 2013
Serap Kirbiz; Bilge Gunsel
This paper proposes a 2D Non-negative Matrix Factorization (NMF) based single-channel source separation algorithm that emphasizes perceptually important components of audio. Unlike existing methods, the proposed scheme performs psychoacoustic pre-processing on the mixture spectrogram in order to suppress audio components that are not critical to human hearing sensation while amplifying the perceptually important ones. This yields the auditory spectrogram, referred to as the sonogram, of the observed audio mixture, and the individual sources are then extracted by 2D NMF. Test results reported in terms of Signal-to-Distortion-Ratio (SDR), Signal-to-Interference-Ratio (SIR) and Signal-to-Artifact-Ratio (SAR) show that the proposed perceptually enhanced separation improves the quality of decomposed audio sources by 1.5-6.5 dB with reduced computational complexity.
International Conference on Acoustics, Speech, and Signal Processing | 2011
Serap Kirbiz; Paris Smaragdis
In this paper, we propose an adaptive time-frequency resolution approach for the single-channel source separation problem. The aim is to improve the quality and intelligibility of the separated sources by adapting the time-frequency resolution of the analysis window to the characteristics of the signal under consideration. Results evaluated on a large test set show the improvements obtained by the proposed algorithm.
IEEE Transactions on Information Forensics and Security | 2007
Serap Kirbiz; Aweke Negash Lemma; Mehmet Utku Celik; Stefan Katzenbeisser
In digital rights-management systems, forensic watermarking complements encryption and deters the capture and unauthorized redistribution of the rendered content. In this paper, we propose a novel watermarking method which is integrated into the advanced audio coding (AAC) standard's decoding process. For predefined frequency bands, the method intercepts and modifies the scale factors, which are utilized for dequantization of spectral coefficients. It thereby modulates the short-time envelope of the bandlimited audio and embeds a watermark which is robust to various attacks, such as capture with a microphone and recompression at lower bit rates. Inclusion of watermark embedding in the AAC decoder has practically no effect on the decoding complexity. As a result, the proposed method can be integrated even into resource-constrained devices, such as portable players, without any additional hardware.
Signal Processing | 2014
Serap Kirbiz; Bilge Gunsel
We propose a single-channel audio source separation method to alleviate the smearing effects caused by the fixed time-frequency (TF) resolution of the Short-Time Fourier Transform (STFT). We introduce a multiresolution representation based on Non-negative Tensor Factorization (NTF) where each layer of the tensor represents the mixture signal at a different time-frequency resolution. In order to fuse the information at different layers, the source separation is modeled as a joint optimization problem where the optimal solution is derived based on the Kullback–Leibler (KL) divergence. The resynthesis is made through an additional adaptive weighted fusion procedure which combines the sources separated at different scales by maximizing energy concentration. Numerical results over a large sound database indicate that the proposed joint optimization scheme enhances the quality of the separated sources in terms of both the conventional and the perceptual distortion measures.
International Conference on Pattern Recognition | 2010
Serap Kirbiz; A. Taylan Cemgil; Bilge Gunsel
In this paper, we develop a probabilistic interpretation and a full Bayesian inference for the non-negative matrix factor deconvolution (NMFD) model. Our ultimate goal is unsupervised extraction of multiple sound objects from a single-channel auditory scene. The proposed method facilitates automatic model selection and determination of the sparsity criteria. Our approach retains attractive features of standard NMFD-based methods such as fast convergence and easy implementation. We demonstrate the use of this algorithm in the log-frequency magnitude spectrum domain, where we employ it to perform model order selection and control sparseness directly.
Signal Processing and Communications Applications Conference | 2009
Serap Kirbiz; Bilge Gunsel
This paper proposes a single-channel audio source decomposition method that integrates perceptual quality criteria into source separation. Unlike existing methods, the proposed method applies a perceptually weighted non-negative matrix factorization to the log-frequency spectrogram of the mixed signal. The weights are adaptively calculated for each critical band based on the perceptual model described by the ITU-R BS.1387 perceptual quality standard. It is shown that the proposed adaptive weighting scheme significantly improves the quality of audio sources estimated by minimizing the weighted divergence between the observed log-frequency spectrogram and the model.
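Minimizing a weighted divergence between a spectrogram and an NMF model can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: it uses the generalized Kullback-Leibler divergence with standard multiplicative updates, fixed per-band weights in place of the adaptive BS.1387-based weights, and a toy strictly positive matrix in place of a log-frequency spectrogram. All function names and numeric values are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def weighted_kl(v, model, w):
    """Weighted generalized KL divergence; v and model must be strictly positive."""
    return sum(wt * (x * math.log(x / y) - x + y)
               for vr, mr, wr in zip(v, model, w)
               for x, y, wt in zip(vr, mr, wr))

def weighted_ratio(v, model, w):
    """Element-wise w * v / model."""
    return [[wt * x / y for x, y, wt in zip(vr, mr, wr)]
            for vr, mr, wr in zip(v, model, w)]

def scale(m, num, den):
    """Element-wise multiplicative update m * num / den."""
    return [[a * n / d for a, n, d in zip(mr, nr, dr)]
            for mr, nr, dr in zip(m, num, den)]

def weighted_nmf(v, w, rank, iters=100, seed=0):
    """Factor v ~ basis @ gains by multiplicative updates on the weighted KL cost."""
    rng = random.Random(seed)
    basis = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in v]
    gains = [[rng.uniform(0.1, 1.0) for _ in v[0]] for _ in range(rank)]
    history = [weighted_kl(v, matmul(basis, gains), w)]
    for _ in range(iters):
        r = weighted_ratio(v, matmul(basis, gains), w)
        gt = transpose(gains)
        basis = scale(basis, matmul(r, gt), matmul(w, gt))
        r = weighted_ratio(v, matmul(basis, gains), w)
        bt = transpose(basis)
        gains = scale(gains, matmul(bt, r), matmul(bt, w))
        history.append(weighted_kl(v, matmul(basis, gains), w))
    return basis, gains, history

# Toy stand-in for a spectrogram: 4 bands x 6 frames, strictly positive.
v = [[1.0, 2.0, 3.0, 2.0, 1.0, 0.5],
     [0.5, 1.0, 1.5, 1.0, 0.5, 0.25],
     [2.0, 0.5, 0.2, 0.5, 2.0, 3.0],
     [1.0, 0.3, 0.1, 0.3, 1.0, 1.5]]
band_weights = [2.0, 1.0, 1.0, 0.5]  # hypothetical per-band weights
w = [[bw] * len(v[0]) for bw in band_weights]

basis, gains, history = weighted_nmf(v, w, rank=2)
print(f"weighted divergence: {history[0]:.4f} -> {history[-1]:.6f}")
```

Bands with larger weights contribute more to the cost, so the factorization fits them more closely at the expense of the lower-weighted bands. That is the general mechanism by which a perceptual weighting can steer the decomposition.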
International Conference on Pattern Recognition | 2006
Bilge Gunsel; Serap Kirbiz
Conventional blind watermark (WM) decoding schemes use correlation-based decision rules because of their simplicity. A drawback of correlator decoders is that their performance relies on the decision threshold. The existence of an undesirable correlation between the WM data embedded through a secret key and the host signal makes the decision threshold specification harder, especially in noisy channels. To overcome this drawback, we propose an SVM-based decoding scheme which is capable of learning the embedded WM data in the wavelet domain. It is shown that both the decoding and detection performance of the introduced WM extraction technique exceed those of state-of-the-art correlation-based schemes. Test results demonstrate that learning in the wavelet domain improves robustness to attacks while reducing complexity.
International Conference on Pattern Recognition | 2008
Yener Ulker; Bilge Gunsel; Serap Kirbiz
In contrast to the fixed rate modeling of conventional methods, the recently introduced variable rate particle filter (VRPF) tracks maneuvering objects with a small number of states by imposing a probability distribution on state arrival times. Although this makes VRPF an appealing method, representing the target motion dynamics with a single model hinders its ability to estimate maneuver parameters precisely. To overcome this weakness, we have incorporated a multiple model approach into the variable rate model structure. The introduced model, referred to as the Multiple Model Variable Rate Particle Filter (MM-VRPF), utilizes a parsimonious representation for smooth regions of the trajectory while adaptively locating frequent state points in high maneuver regions, resulting in much more accurate tracking. Simulation results obtained in a bearings-only target tracking problem show that the proposed model outperforms the conventional VRPF, the fixed rate multiple model particle filter (MMPF) and the interacting multiple model using extended Kalman filters (IMM-EKF).
ACM Multimedia | 2006
Bilge Gunsel; Yener Ulker; Serap Kirbiz
This paper introduces an integrated GMM-based blind audio watermark (WM) detection and decoding scheme that eliminates the decision threshold specification problem, which is a drawback of conventional decoders. The proposed method models the statistics of watermarked and original audio signals by Gaussian mixture models (GMM) with K components. Learning of the WM data is achieved in the wavelet domain, and a Maximum Likelihood (ML) classifier is designed for the WM decoding. The dimension of the learning space is optimized by PCA transformation. Robustness to compression, additive noise and the Stirmark benchmark attacks has been evaluated. It is shown that both the WM decoding and detection performance of the introduced integrated scheme exceed those of conventional correlation-based decoders. Test results demonstrate that learning in the wavelet domain improves robustness to attacks while reducing complexity. Although the performance of the proposed GMM modeling is only slightly better than that of the SVM-based decoder introduced in [1], a significant decrease in computational complexity makes the new method appealing.