Mehmet Ugur Dogan
Scientific and Technological Research Council of Turkey
Publication
Featured research published by Mehmet Ugur Dogan.
signal processing and communications applications conference | 2009
Cemil Demir; Mehmet Ugur Dogan
Segmenting an audio signal into speech and music using posterior-probability-based features is a commonly used method. In this study, Hidden Markov Model (HMM) based acoustic models are used to calculate posterior probabilities. The acoustic models use the states of context-independent phones as the modeling unit. Entropy and dynamism are computed from the posterior probabilities, and these values are used as features for speech-music discrimination. An HMM-based classifier that uses Viterbi decoding is implemented, and audio signals are segmented into speech and music using these discriminative features. Tests showed that the applied speech-music segmentation method decreases the word error rate and increases recognition speed.
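As a rough illustration of the two features named in this abstract, the sketch below computes per-frame entropy and dynamism from a list of posterior vectors. The definitions used here (Shannon entropy of the frame posterior, and the squared change of the posterior vector between consecutive frames) are one common formulation and are an assumption; the abstract does not give the exact formulas, and the function names are hypothetical.

```python
import math

def frame_entropy(posteriors):
    """Shannon entropy of one frame's posterior distribution over phone states."""
    return -sum(p * math.log(p) for p in posteriors if p > 0.0)

def frame_dynamism(prev, curr):
    """Squared change of the posterior vector between two consecutive frames."""
    return sum((a - b) ** 2 for a, b in zip(prev, curr))

def entropy_dynamism_features(frames):
    """Return one (entropy, dynamism) pair per frame.

    The first frame has no predecessor, so its dynamism is set to 0.0
    (an arbitrary choice made for this sketch).
    """
    feats = []
    for i, curr in enumerate(frames):
        dyn = frame_dynamism(frames[i - 1], curr) if i > 0 else 0.0
        feats.append((frame_entropy(curr), dyn))
    return feats
```

In practice such values are usually averaged over a window of frames before being passed to a classifier; the window length would be a tuning parameter.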
signal processing and communications applications conference | 2012
Cemil Demir; Mehmet Ugur Dogan; A. Taylan Cemgil; Murat Saraclar
In this study, we analyze the effect of the catalog-based single-channel speech-music separation method, which we proposed previously, on speech recognition performance. In the proposed method, assuming that a catalog of the background music is known, we developed a generative model for the superposed speech and music spectrograms. We represent the speech spectrogram by a Non-negative Matrix Factorization (NMF) model and the music spectrogram by a conditional Poisson Mixture Model (PMM). In this paper, we propose to recover the speech signals from the mixed signal in the time domain by detecting the active catalog frames using the catalog-based method. We compare the performance of three signal reconstruction techniques: Expectation-Based, Posterior-Based and Time-Domain reconstruction. Moreover, we compare the performance of our system with that of the traditional NMF model. Our method outperforms the NMF method in ASR performance and separation performance in most experimental conditions.
international symposium on computer and information sciences | 2005
Hasan Palaz; Alper Kanak; Yücel Bicil; Mehmet Ugur Dogan
Turkish Recognition ENgine (TREN) is a modular, Hidden Markov Model based (HMM-based), speaker independent and Distributed Component Object Model based (DCOM-based) speech recognition system. TREN contains specialized modules that allow a fully interoperable platform including a Turkish speech recognizer, a feature extractor, an end-point detector and a performance monitoring module. TREN deals with the interaction between two layers constituting the distributed architecture of TREN. The first layer is the central server, which applies some speech signal preprocessing and distributes the recognition calls to the appropriate remote servers according to their current CPU load of the recognition process. The second layer is composed of the remote servers performing the critical recognition task. In order to increase the recognition performance, a Turkish telephony speech database with a very large word corpus is collected and statistically the widest span of triphones representing Turkish is examined. TREN has been used to assist speech technologies which require a modular and multithreaded recognizer with dynamic load sharing facilities.
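The load-based distribution step described in this abstract can be pictured with a very small sketch: the central server looks at the CPU load reported by each remote server and forwards the call to the least loaded one. The function names, the load dictionary, and the callable "servers" are all hypothetical stand-ins; the actual system is DCOM-based and its interfaces are not given in the abstract.

```python
def pick_server(cpu_loads):
    """Return the name of the remote server with the lowest reported CPU load."""
    if not cpu_loads:
        raise ValueError("no remote servers available")
    return min(cpu_loads, key=cpu_loads.get)

def dispatch(call, cpu_loads, servers):
    """Forward a recognition call to the least loaded server.

    `servers` maps a server name to a callable that performs recognition;
    both are placeholders for this sketch.
    """
    name = pick_server(cpu_loads)
    return name, servers[name](call)
```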
signal processing and communications applications conference | 2011
Erdem Ünal; Mehmet Kayaoglu; Berkay Topcu; Mehmet Ugur Dogan
In this work, the geometry of an acoustic-based sniper detection system prototype and its operational principles are presented from the signal processing perspective. The prototype consists of a microphone network positioned in a specific geometric structure in three-dimensional space. The system estimates the delay of arrival of sound waves reaching the microphones and uses the delay information to calculate the direction of the sound source. The time difference of arrival problem is solved by using the generalized cross correlation approach. The estimated delay is then transformed into an angle using the far-field approximation. The resulting angle is the angle between the sound source and the microphone pair axis. Using the angles calculated for different microphone pairs, the direction is reported as azimuth and elevation. The study reports the simulation results and laboratory experiments.
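The delay-to-angle step can be sketched as follows. This example uses a plain (unweighted) time-domain cross-correlation rather than a weighted generalized cross correlation, and it assumes the far-field relation cos(theta) = c * tau / d, where tau is the delay in seconds and d is the microphone spacing. The speed of sound, the sign convention, and the function names are assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C

def estimate_delay(x, y, max_lag):
    """Return the lag in samples that maximizes the cross-correlation of x and y.

    A positive lag means y is a delayed copy of x.
    """
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = 0.0
        for i in range(len(x)):
            j = i + lag
            if 0 <= j < len(y):
                val += x[i] * y[j]
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

def far_field_angle(delay_s, mic_distance_m, c=SPEED_OF_SOUND):
    """Angle in degrees between the source direction and the microphone-pair axis."""
    cos_theta = c * delay_s / mic_distance_m
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding and noise
    return math.degrees(math.acos(cos_theta))
```

A lag in samples is converted to seconds by dividing by the sampling rate before calling `far_field_angle`. A zero delay corresponds to a source broadside to the pair (90 degrees).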
signal processing and communications applications conference | 2011
Cemil Demir; Mehmet Ugur Dogan; A. Taylan Cemgil; Murat Saraclar
In this study, single-channel source separation is carried out to separate speech from background music, which degrades speech recognition performance, especially in broadcast news transcription systems. Since the separation is done using a single observation of the source signals, the sources have to be modeled beforehand using training data. Non-negative Matrix Factorization (NMF) methods are used to model the sources. To model the source signals, different training data sets containing different music and speech data are created, and the effect of the training data sets is analyzed. The performance of the methods is measured using both separation performance measures and speech recognition performance measures.
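For readers unfamiliar with NMF, the sketch below factorizes a small non-negative matrix V into W and H using the standard multiplicative updates for the squared-error cost. The choice of cost function, the initialization, and all names are assumptions for illustration; the abstract does not state which NMF variant was used, and this toy example does not perform source separation itself.

```python
import random

EPS = 1e-9  # keeps the denominators strictly positive

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iters=200, seed=0):
    """Approximate V (m x n) as W (m x rank) times H (rank x n), all non-negative."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + EPS) for j in range(n)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + EPS) for j in range(rank)]
             for i in range(m)]
    return w, h
```

In a separation setting, the columns of W learned from speech-only and music-only training spectrograms would typically be kept fixed while H is estimated on the mixture.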
signal processing and communications applications conference | 2009
Erdem Ünal; Ahmet Afsin Akin; Alper Kanak; Mehmet Ugur Dogan
In this paper, we present a system that robustly searches for and matches a music input signal against a music collection database using a hash table constructed from n-grams of a reduced tonal profile. Since the problem requires high performance, efficiency and scalability, not only should the retrieval accuracy be high, but the system's workload on the processing unit and the memory should also stay within acceptable ranges. Compared with other conventional features, the tonal profile features extracted in this work require much less space. N-gram blocks are constructed from the tonal features and used in a lookup table. Whenever the input signal satisfies certain constraints, matching and retrieval are performed. The results show that the computational performance and the retrieval accuracy are at promising levels.
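A minimal sketch of an n-gram lookup table is given below. It assumes each track has already been reduced to a sequence of discrete symbols (here, integers standing in for quantized tonal-profile values); that representation, the voting scheme, and the function names are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter, defaultdict

def ngrams(seq, n):
    """All contiguous n-grams of a symbol sequence, as hashable tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def build_index(collection, n=3):
    """Map each n-gram to the set of track ids that contain it."""
    index = defaultdict(set)
    for track_id, seq in collection.items():
        for gram in ngrams(seq, n):
            index[gram].add(track_id)
    return index

def match(query, index, n=3):
    """Rank track ids by the number of query n-grams found in the index."""
    votes = Counter()
    for gram in ngrams(query, n):
        for track_id in index.get(gram, ()):
            votes[track_id] += 1
    return votes.most_common()
```

Because each lookup is a hash-table access, query cost grows with the query length rather than with the size of the collection, which is the property such a design relies on for scalability.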
signal processing and communications applications conference | 2005
Hasan Palaz; Alper Kanak; Yücel Bicil; Mehmet Ugur Dogan; Tuba Islam
TREN (Turkish Recognition ENgine) is a modular, HMM-based (Hidden Markov Model) and speaker-independent speech recognition system whose software architecture is based on the Distributed Component Object Model (DCOM). TREN contains specialized modules that provide a fully interoperable platform, including a Turkish speech recognizer, a feature extractor, an end-point detector and a performance monitoring module. TREN has two layers. The first layer is the central server, which applies some speech signal preprocessing and then distributes the recognition calls to the appropriate remote servers according to their current CPU load. The second layer consists of the remote servers, which perform the critical recognition task. This component-based architecture makes TREN applicable to distributed environments. TREN is also trained on a wide variety of very common words that best represent the Turkish language. To obtain such a database, a very large word corpus is collected and statistically the widest span of triphones representing Turkish is examined. TREN has been used to assist speech technologies which require a modular and multithreaded recognizer with dynamic load sharing facilities.
signal processing and communications applications conference | 2008
Alper Kanak; Mehmet Kayaoglu; Mehmet Ugur Dogan; İbrahim Soğukpınar
When used with a good fingerprint enhancement algorithm, cost-effective and less complex features may perform well for recognition of low-quality fingerprints. Improving systems that use such features is especially promising for large-scale fingerprint verification. In this study, scores obtained from features extracted from the wedge- and ring-based tessellation of a fingerprint frequency image are fused with the scores of a phase-only correlation based matching method. The matching scores obtained from these two independent information sources are improved by applying the proposed cascade score-level fusion scheme. In this scheme, the reliability of the wedge-ring verification score is evaluated first; if the wedge-ring features are not sufficiently reliable, the phase-only correlation scores are used to make the final decision. To improve the verification system, a Short-Time Fourier Transform (STFT) based enhancement algorithm and complex filtering for detecting the reference point on a fingerprint image are applied.
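The cascade logic described here can be sketched in a few lines. The abstract does not say how reliability is measured, so this example assumes a wedge-ring score is "reliable" when it lies far enough from its decision threshold; that criterion, the thresholds, the margin, and the function name are all hypothetical.

```python
def cascade_decision(wedge_ring_score, poc_score,
                     wr_threshold=0.5, reliability_margin=0.2,
                     poc_threshold=0.5):
    """Return True to accept the claimed identity, False to reject it.

    Stage 1: trust the wedge-ring score if it is at least
    `reliability_margin` away from its threshold (assumed criterion).
    Stage 2: otherwise fall back to the phase-only correlation score.
    """
    if abs(wedge_ring_score - wr_threshold) >= reliability_margin:
        return wedge_ring_score >= wr_threshold
    return poc_score >= poc_threshold
```

The appeal of a cascade like this is that the more expensive matcher only runs on the ambiguous cases, which matters in large-scale verification.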
signal processing and communications applications conference | 2012
Erdem Ünal; Mehmet Kayaoglu; Berkay Topcu; Hamza Kaya; Mehmet Ugur Dogan
In this work, design and experimental studies related to the TUBITAK-BILGEM Shot Estimation System (AKS in Turkish) are discussed. AKS is composed of three parts: a microphone array with a specific structural design, an electronic unit that enables synchronous recording of the microphone signals, and a graphical user interface that displays the output of the system. The basic goal is to detect the position of a pre-defined sound source in Cartesian coordinates with respect to the original position of the system. First, the position of the sound source is detected in Cartesian coordinates using only the time difference of arrival information. The time difference of arrival problem is solved using the generalized cross correlation function. To compensate for the false alarms that are very common in these systems, a two-stage classifier based on energy and Mel Frequency Cepstrum Coefficients is used. The system is tested with shots from four different hand-held rifles, from two different distances, at 10-degree intervals spanning 180 degrees. The average directional error for 100 m shots is 2.69 degrees, while the error declines to 1.51 degrees for 200 m shots. The precision of the system is 84.6% and the recall is 96.2%.
signal processing and communications applications conference | 2012
Ahmet Afsin Akin; Cemil Demir; Mehmet Ugur Dogan
In this study, some solutions to the out-of-vocabulary (OOV) word problem of automatic speech recognition (ASR) systems developed for agglutinative languages like Turkish are examined, and an improvement is proposed. It has been shown that sub-word language models outperform word-based models by reducing the OOV word ratio in languages with complex morphology. In this work, we propose improvements to both statistical and morphological sub-word language modelling techniques by applying language-dependent pre-processing to words before sub-word segmentation. In our tests, using the largest Turkish broadcast news corpus to date, our proposed models gave better results than the baseline statistical and morphological sub-word language models.
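The effect of sub-word units on the OOV ratio can be seen in a toy example. The sketch below computes the OOV rate of a test set against a training vocabulary, first with whole words and then with sub-word units. The hand-written segmentation table is purely hypothetical and stands in for a real statistical or morphological segmenter; the four Turkish words are chosen only to make the point visible.

```python
def oov_rate(tokens, vocabulary):
    """Fraction of tokens that do not appear in the vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for t in tokens if t not in vocabulary)
    return missing / len(tokens)

# Hypothetical segmentations (stem plus suffix), written by hand for this sketch.
SEGMENTS = {
    "evler": ["ev", "+ler"],    # houses
    "evde": ["ev", "+de"],      # in the house
    "gözler": ["göz", "+ler"],  # eyes
    "gözde": ["göz", "+de"],    # in the eye
}

def segment(words):
    """Replace each word by its sub-word units; unknown words are kept whole."""
    units = []
    for w in words:
        units.extend(SEGMENTS.get(w, [w]))
    return units

train_words = ["evler", "gözde"]
test_words = ["evde", "gözler"]

word_oov = oov_rate(test_words, set(train_words))
subword_oov = oov_rate(segment(test_words), set(segment(train_words)))
```

Here neither test word was seen in training, so the word-level OOV rate is 1.0, while every stem and suffix was seen, so the sub-word OOV rate is 0.0.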