Tolga Ciloglu
Middle East Technical University
Publications
Featured research published by Tolga Ciloglu.
Content-Based Multimedia Indexing | 2009
Ahmet Saracoglu; Ersin Esen; Tuğrul K. Ateş; Banu Oskay Acar; Ünal Zubari; Ezgi Can Ozan; Egemen Özalp; A. Aydin Alatan; Tolga Ciloglu
Content-Based Copy Detection (CBCD) emerges as a viable alternative to the active detection methodology of watermarking. The first reason is that media already in circulation cannot be marked; secondly, CBCD can inherently withstand various severe attacks that watermarking cannot. Although media content is generally handled independently as visual and audio, in this work both information sources are utilized in a unified framework in which coarse representations of fundamental features are employed. From the copy detection perspective, the number of possible attacks on audio content is limited compared to the visual case. Therefore audio, if present, is an indispensable part of a robust video copy detection system. In this study, the validity of this statement is demonstrated through various experiments on a large data set.
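The coarse-feature idea behind copy detection can be illustrated with a toy audio fingerprinting sketch. Everything here (the band-energy bit rule, `n_bands`, `frame_len`, the function names) is a hypothetical simplification for illustration, not the framework of the paper:

```python
import numpy as np

def coarse_fingerprint(signal, n_bands=8, frame_len=1024):
    """Per-frame binary fingerprint from coarse spectral band energies.
    A toy stand-in for a coarse feature representation; n_bands and
    frame_len are hypothetical parameters."""
    n_frames = len(signal) // frame_len
    bits = []
    for i in range(n_frames):
        spec = np.abs(np.fft.rfft(signal[i * frame_len:(i + 1) * frame_len]))
        energies = np.array([b.sum() for b in np.array_split(spec, n_bands)])
        # one bit per adjacent band pair: which band is stronger
        bits.append((energies[1:] > energies[:-1]).astype(np.uint8))
    return np.concatenate(bits)

def fingerprint_similarity(fp_a, fp_b):
    """Fraction of matching bits; near 1.0 indicates a likely copy."""
    n = min(len(fp_a), len(fp_b))
    return float(np.mean(fp_a[:n] == fp_b[:n]))
```

Because the bits only encode coarse energy orderings, the fingerprint of a mildly attacked copy (e.g. low-level additive noise) stays close to the original, while unrelated content matches only at chance level.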
International Conference on Multimedia and Expo | 2000
Tolga Ciloglu; S. Utku Karaaslan
This paper investigates some problems encountered in the all-pass watermarking scheme developed by Yardımcı et al. (1997) and suggests ways to eliminate them. The system under consideration uses all-pass filters to embed data into a speech or music signal by modifying the phase in consecutive blocks, creating some artefacts in the process. The two approaches we suggest remove these artefacts while keeping the watermark detectable and preserving all the advantages of the method. The results of experimental studies on the relationship between block length, perceptibility, and misdetection rate are presented. Studies are carried out to determine immunity against additive noise, and ways to improve the method are suggested.
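The core mechanism, embedding data by phase modification with all-pass filters, can be sketched as follows. The one-bit-per-block pole-sign rule, `block_len`, and `a` are illustrative assumptions, not the designed phase responses of the actual scheme:

```python
import numpy as np

def allpass(x, a):
    """First-order all-pass filter H(z) = (-a + z^-1) / (1 - a z^-1).
    For real |a| < 1 the magnitude response is identically 1, so only
    the phase of the signal is modified."""
    y = np.zeros(len(x))
    prev_x = prev_y = 0.0
    for n, xn in enumerate(x):
        y[n] = -a * xn + prev_x + a * prev_y
        prev_x, prev_y = xn, y[n]
    return y

def embed_bits(signal, bits, block_len=2048, a=0.6):
    """Embed one bit per consecutive block via the sign of the all-pass
    pole (a toy embedding rule for illustration)."""
    out = np.asarray(signal, dtype=float).copy()
    for i, bit in enumerate(bits):
        s, e = i * block_len, (i + 1) * block_len
        out[s:e] = allpass(out[s:e], a if bit else -a)
    return out
```

The unit magnitude response is what makes the embedding perceptually gentle: the signal's spectrum is untouched, and only phase discontinuities at block boundaries produce the artefacts the paper sets out to remove.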
Speech Communication | 2008
Eren Akdemir; Tolga Ciloglu
The use of articulator motion information in automatic speech segmentation is investigated. Automatic speech segmentation is an essential task in speech processing applications such as speech synthesis, where the accuracy and consistency of segmentation are firmly connected to the quality of the synthetic speech. The motions of the upper and lower lips are incorporated into a hidden Markov model based segmentation process. The MOCHA-TIMIT database, which contains simultaneous articulograph and microphone recordings, was used to develop and test the models. Different feature vector compositions are proposed for incorporating articulator motion parameters into the automatic segmentation system. The average absolute boundary error of the system with respect to manual segmentation is decreased by 10.1%. The results are examined in a boundary-class-dependent manner using both acoustic and visual phone classes, and the performance of the system on different boundary types is discussed. After analyzing the boundary-class-dependent performance, the error reduction is increased to 18.0% by using the appropriate feature vectors at selected boundaries.
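Feature-level incorporation of articulator motion can be sketched as per-frame concatenation of acoustic and articulatory vectors. The dimensions and the simple first-difference dynamics below are illustrative, not the specific feature vector compositions evaluated in the paper:

```python
import numpy as np

def deltas(features):
    """First-difference dynamic features, padded so the frame count
    is unchanged."""
    return np.diff(features, axis=0, prepend=features[:1])

def fuse_features(acoustic, articulatory):
    """Feature-level fusion: concatenate per-frame acoustic vectors
    (e.g. MFCCs) with articulator-motion parameters (e.g. upper- and
    lower-lip positions) and their dynamics. Frame counts must match."""
    assert acoustic.shape[0] == articulatory.shape[0], "frame counts must match"
    return np.hstack([acoustic, articulatory, deltas(articulatory)])
```

The fused vectors then replace the purely acoustic observations in the HMM-based segmenter, which is what lets boundary placement exploit lip motion cues.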
Computer Speech & Language | 2007
Özgül Salor; Bryan L. Pellom; Tolga Ciloglu; Mübeccel Demirekler
This paper presents work on developing speech corpora and recognition tools for Turkish by porting SONIC, a speech recognition tool initially developed for English at the Center for Spoken Language Research of the University of Colorado at Boulder. The work presented in this paper had two objectives. The first was to collect a standard, phonetically balanced Turkish microphone speech corpus for general research use. A 193-speaker, triphone-balanced audio corpus and a pronunciation lexicon for Turkish have been developed. The corpus was accepted for distribution by the Linguistic Data Consortium (LDC) of the University of Pennsylvania in October 2005, and it will serve as a standard corpus for Turkish speech researchers. The second objective was to develop speech recognition tools (a phonetic aligner and a phone recognizer) for Turkish, which provided a starting point for obtaining a multilingual speech recognizer by porting SONIC to Turkish. This was the first port of this particular recognizer to a language other than English; subsequently, SONIC has been ported to over 15 languages. Using the phonetic aligner, the audio corpus has been provided with word-, phone-, and HMM-state-level alignments. For the phonetic aligner, it is shown that 92.6% of the automatically labeled phone boundaries are placed within 20 ms of the manually labeled locations for the Turkish audio corpus. Finally, a phone recognition error rate of 29.2% is demonstrated for the phone recognizer.
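The "92.6% within 20 ms" figure is a standard alignment-agreement metric; a minimal sketch of how such a number is computed (the function and its pairing-by-index convention are illustrative):

```python
def boundary_agreement(auto_ms, manual_ms, tol_ms=20.0):
    """Percentage of automatically placed phone boundaries lying within
    tol_ms of the corresponding manually labelled boundary, with
    boundaries paired by index (times in milliseconds)."""
    assert len(auto_ms) == len(manual_ms) and auto_ms
    hits = sum(abs(a - m) <= tol_ms for a, m in zip(auto_ms, manual_ms))
    return 100.0 * hits / len(auto_ms)
```

For example, `boundary_agreement([100, 250, 410], [110, 240, 500])` counts two of three boundaries as hits, since the third pair differs by 90 ms.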
Speech Communication | 2011
Eren Akdemir; Tolga Ciloglu
Bimodal automatic speech segmentation using visual information together with audio data is introduced. The accuracy of automatic segmentation directly affects the quality of speech processing systems that use the segmented database. The collaboration of audio and visual data results in a lower average absolute boundary error between the manual and automatic segmentation results. The information from the two modalities is fused at the feature level and used in an HMM-based speech segmentation system. A Turkish audiovisual speech database was prepared and used in the experiments. The average absolute boundary error decreases by up to 18% when different audiovisual feature vectors are used. The benefits of incorporating visual information are discussed for different phoneme boundary types. Each audiovisual feature vector results in a different performance at different types of phoneme boundaries. The average absolute boundary error decreases by approximately 25% when audiovisual feature vectors are used selectively for different boundary classes. The visual data were collected using an ordinary webcam, so the proposed method is very convenient to use in practice.
Computer Speech & Language | 2016
Turgay Koç; Tolga Ciloglu
Highlights: Two interactive source-filter models (ISFMs) are proposed for speech production; ISFMs are capable of producing the fine details of glottal flow; a parameter estimation method is developed for determining the model parameters; the algorithm yields ISFMs that perform better than the linear source-filter model.

The linear source-filter model of speech production assumes that the source of the speech sounds is independent of the filter. However, acoustic simulations based on physical speech production models show that when the fundamental frequency of the source harmonics approaches the first formant of the vocal tract filter, the filter has significant effects on the source due to the nonlinear coupling between them. In this study, two interactive system models are proposed under the quasi-steady Bernoulli flow and linear vocal tract assumptions. An algorithm is developed to estimate the model parameters. The glottal flow and the linear vocal tract parameters are found by conventional methods. The Rosenberg model is used to synthesize the glottal waveform. A recursive optimization method is proposed to find the parameters of the interactive model. Finally, the glottal flow produced by the nonlinear interactive system is computed. The experimental results show that the interactive system model produces the fine details of the glottal flow source accurately.
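The Rosenberg model used to synthesize the glottal waveform has several variants; a common trigonometric one can be sketched as below. The opening and closing fractions are illustrative defaults, not fitted values from the paper:

```python
import numpy as np

def rosenberg_pulse(n_samples, open_frac=0.4, close_frac=0.16):
    """One period of a Rosenberg-style glottal flow pulse
    (trigonometric variant): a raised-cosine rise over the opening
    phase, a quarter-cosine fall over the closing phase, and zero
    flow during the closed phase."""
    t = np.arange(n_samples) / n_samples
    tp, tn = open_frac, close_frac
    g = np.zeros(n_samples)
    rising = t <= tp
    g[rising] = 0.5 * (1.0 - np.cos(np.pi * t[rising] / tp))
    falling = (t > tp) & (t <= tp + tn)
    g[falling] = np.cos(np.pi * (t[falling] - tp) / (2.0 * tn))
    return g
```

This smooth, asymmetric pulse is the non-interactive glottal source that the proposed ISFMs refine: the interaction with the vocal tract adds the fine ripple details that the stand-alone model cannot produce.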
Journal of the Acoustical Society of America | 2012
Erdal Mehmetcik; Tolga Ciloglu
The aim of speech enhancement algorithms is to increase the quality and intelligibility of noise-degraded speech signals. Classical algorithms modify the magnitude spectrum of the noise-degraded signal and leave the phase spectrum unchanged. Leaving the phase spectrum unchanged relies on the results of early listening tests, which concluded that a better phase estimate does not have a significant effect on speech quality. However, a poor phase estimate causes a noticeable distortion in the reconstructed signal. In this work, a new phase estimation method (for the voiced segments of speech) is proposed. This method is then incorporated into classical (magnitude-based) enhancement algorithms, and a new enhancement method is put forward. The performance of the proposed enhancement method is tested with the ITU-T standard PESQ measure, and the results are presented.
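A minimal sketch of the classical magnitude-only baseline, plain spectral subtraction that reuses the noisy phase at reconstruction. The frame length, oversubtraction factor `alpha`, and leading-noise assumption are illustrative; the paper's contribution would replace the noisy-phase line for voiced segments:

```python
import numpy as np

def enhance(noisy, frame_len=512, alpha=2.0, noise_frames=5):
    """Spectral subtraction with overlap-add: attenuate the magnitude
    spectrum, keep the noisy phase unchanged."""
    hop = frame_len // 2
    win = np.hanning(frame_len)
    specs = [np.fft.rfft(win * noisy[s:s + frame_len])
             for s in range(0, len(noisy) - frame_len + 1, hop)]
    # Noise magnitude estimated from the first few (assumed noise-only) frames
    noise_mag = np.mean([np.abs(s) for s in specs[:noise_frames]], axis=0)
    out = np.zeros(len(noisy))
    for i, spec in enumerate(specs):
        mag = np.maximum(np.abs(spec) - alpha * noise_mag, 0.0)  # magnitude update
        phase = np.angle(spec)                # noisy phase kept unchanged
        frame = np.fft.irfft(mag * np.exp(1j * phase), frame_len)
        out[i * hop:i * hop + frame_len] += frame
    return out
```

Keeping `phase` equal to the noisy phase is exactly the convention the abstract questions: substituting a better phase estimate at that line is where the proposed method plugs in.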
International Symposium on Circuits and Systems | 1994
Tolga Ciloglu; Zafer Ünver
A local search algorithm for discrete-coefficient FIR filter design is presented. The minimax objective function is minimized by moving along low-gradient directions, and a new method to forecast these directions is proposed. The algorithm is suitable for designing high-order filters in a short time. The results are compared with other methods in terms of both quality and computational load.
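A bare-bones version of discrete-coefficient local search can be sketched as follows. The paper's gradient-based forecasting of promising move directions is replaced here by exhaustive neighbour checks, and the quantization step and iteration limit are illustrative:

```python
import numpy as np

def freq_response(h, n_grid=256):
    """Magnitude response of an FIR filter h on a uniform grid over [0, pi]."""
    w = np.linspace(0.0, np.pi, n_grid)
    return np.abs(np.exp(-1j * np.outer(w, np.arange(len(h)))) @ h)

def minimax_error(h, desired):
    """Peak absolute deviation from the desired magnitude response."""
    return np.max(np.abs(freq_response(h) - desired))

def local_search(h_ideal, desired, step=1 / 64, max_iter=200):
    """Greedy local search over quantized coefficients: start from the
    rounded design and accept any single +/- one-step coefficient move
    that lowers the minimax error; stop when no move improves."""
    h = np.round(h_ideal / step) * step
    best = minimax_error(h, desired)
    for _ in range(max_iter):
        improved = False
        for i in range(len(h)):
            for d in (step, -step):
                trial = h.copy()
                trial[i] += d
                e = minimax_error(trial, desired)
                if e < best:
                    h, best, improved = trial, e, True
        if not improved:
            break
    return h, best
```

Naive rounding of the real-valued design is rarely minimax-optimal on the quantized grid, which is why even this simple descent typically improves on it; the direction-forecasting of the paper is what makes the idea scale to high-order filters.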
International Conference on Acoustics, Speech, and Signal Processing | 1993
Tolga Ciloglu; Zafer Ünver
A novel method for the application of simulated annealing to the discrete-coefficient design of FIR (finite impulse response) digital filters in the minimax sense is presented. The characteristics of the problem are investigated to tailor the simulated annealing algorithm to the particular case at hand and, especially, to make it capable of handling high-order filters in reasonable computation time. The details of the algorithm and examples, including filters of length 125, are given, and the proposed method is compared with other methods.
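A generic simulated-annealing skeleton for the same quantized design problem looks like this. The temperature schedule, move set, and parameters below are textbook illustrations, not the problem-tailored choices the paper develops:

```python
import numpy as np

def freq_response(h, n_grid=256):
    """Magnitude response of an FIR filter h on a uniform grid over [0, pi]."""
    w = np.linspace(0.0, np.pi, n_grid)
    return np.abs(np.exp(-1j * np.outer(w, np.arange(len(h)))) @ h)

def minimax_error(h, desired):
    """Peak absolute deviation from the desired magnitude response."""
    return np.max(np.abs(freq_response(h) - desired))

def anneal_fir(h_ideal, desired, step=1 / 64, t0=0.05, cooling=0.995,
               iters=2000, seed=0):
    """Simulated annealing over quantized FIR coefficients: random
    +/- one-step perturbations accepted by the Metropolis rule under
    a geometric cooling schedule; the best state seen is returned."""
    rng = np.random.default_rng(seed)
    h = np.round(h_ideal / step) * step
    err = minimax_error(h, desired)
    best_h, best_err = h.copy(), err
    t = t0
    for _ in range(iters):
        i = rng.integers(len(h))
        trial = h.copy()
        trial[i] += step if rng.random() < 0.5 else -step
        e = minimax_error(trial, desired)
        # Metropolis: always take improvements, sometimes accept worse moves
        if e < err or rng.random() < np.exp((err - e) / t):
            h, err = trial, e
            if err < best_err:
                best_h, best_err = h.copy(), err
        t *= cooling
    return best_h, best_err
```

Unlike pure greedy descent, the occasional acceptance of worse moves lets the search escape local minima of the minimax objective; the paper's tailoring is what keeps this affordable for filters of length 125.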
Journal of Applied Mathematics | 2014
Turgay Koç; Tolga Ciloglu
An automatic method for segmenting the glottis in high-speed endoscopic video (HSV) images of the vocal folds is proposed. The method is based on image histogram modeling. Three fundamental problems in automatic histogram-based processing of HSV images are addressed: automatic localization of the vocal folds, deformation of the intensity distribution by nonuniform illumination, and ambiguous segmentation when the glottal gap is small. The problems are solved by novel masking, illumination, and reflectance modeling methods. The overall algorithm has three stages: masking, illumination modeling, and segmentation. First, a mask for the region of interest in the HSV images is determined based on the total variation norm. Second, a planar illumination model is estimated from consecutive HSV images and a reflectance image is obtained. The reflectance images of the masked HSV are used to form a vertical slice image whose reflectance distribution is modeled by a Gaussian mixture model (GMM). Finally, the estimated GMM is used to isolate the glottis from the background. Results show that the proposed method provides about 94% improvement with respect to manually segmented data, in contrast to the conventional method, which uses a Rayleigh intensity distribution to extract the glottal areas.
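The final GMM stage can be sketched with a minimal two-component 1-D mixture fitted by EM, separating dark glottis pixels from the brighter background. The initialization, iteration count, and lower-mean-equals-glottis rule are illustrative assumptions, not the paper's full pipeline:

```python
import numpy as np

def fit_gmm2(x, iters=50):
    """EM for a two-component 1-D Gaussian mixture over pixel
    intensities (dark glottis vs. brighter background)."""
    mu = np.array([x.min(), x.max()], dtype=float)
    var = np.full(2, x.var() + 1e-6)
    w = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        p = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = p / p.sum(axis=1, keepdims=True)
        # M-step: update weights, means, variances
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        w = nk / len(x)
    return mu, var, w

def glottis_mask(x, mu, var, w):
    """Label each sample with the darker (lower-mean) component."""
    p = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return p.argmax(axis=1) == np.argmin(mu)
```

Fitting the mixture on reflectance values (after the illumination model has been divided out) rather than raw intensities is what makes this classification stable under nonuniform lighting.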