Umit Guz
Işık University
Publications
Featured research published by Umit Guz.
Spoken Language Technology Workshop | 2006
Gökhan Tür; Umit Guz; Dilek Hakkani-Tür
In this paper, we analyze the effect of model adaptation for dialog act tagging. The goal of adaptation is to improve the performance of the tagger using out-of-domain data or models. Dialog act tagging aims to provide a basis for further discourse analysis and understanding in conversational speech. In this study, we used the ICSI meeting corpus with high-level Meeting Recorder Dialog Act (MRDA) tags, that is, questions, statements, backchannels, disruptions, and floor grabbers/holders. We performed controlled adaptation experiments using the Switchboard (SWBD) corpus with SWBD-DAMSL tags as the out-of-domain corpus. Our results indicate that dialog act tagging can be significantly improved by automatically selecting a subset of the Switchboard corpus and combining the confidences of the in-domain and out-of-domain models via logistic regression, especially when in-domain data is limited.
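The confidence-combination step can be sketched as a small logistic regression over two inputs. This is a minimal sketch under stated assumptions, not the authors' implementation: each tagger is assumed to emit one confidence in [0, 1] that a given tag (for example, question) applies, and the combiner is fit with plain batch gradient descent on log loss. The function names, learning rate, and epoch count are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_combiner(
    scores: List[Tuple[float, float]],  # (in-domain conf, out-of-domain conf)
    labels: List[int],                  # 1 if the tag applies, else 0
    lr: float = 0.5,                    # hypothetical learning rate
    epochs: int = 2000,                 # hypothetical number of passes
) -> List[float]:
    """Fit weights [bias, w_in, w_out] by batch gradient descent on log loss."""
    w = [0.0, 0.0, 0.0]
    n = len(scores)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (s_in, s_out), y in zip(scores, labels):
            err = sigmoid(w[0] + w[1] * s_in + w[2] * s_out) - y
            grad[0] += err
            grad[1] += err * s_in
            grad[2] += err * s_out
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def combined_confidence(w: List[float], s_in: float, s_out: float) -> float:
    """Combined probability that the tag applies, given both model confidences."""
    return sigmoid(w[0] + w[1] * s_in + w[2] * s_out)
```

Fitting `train_combiner` on held-out in-domain utterances and scoring new ones with `combined_confidence` mirrors the combination described above; the automatic selection of a Switchboard subset is not shown.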
IEEE Transactions on Audio, Speech, and Language Processing | 2010
Umit Guz; Sébastien Cuendet; Dilek Hakkani-Tür; Gökhan Tür
Sentence segmentation of speech aims at determining sentence boundaries in the stream of words output by a speech recognizer. Statistical methods are typically used for sentence segmentation; however, they require significant amounts of labeled data, the preparation of which is time-consuming, labor-intensive, and expensive. This work investigates the application of multi-view semi-supervised learning algorithms to the sentence boundary classification problem using lexical and prosodic information. The aim is to find an effective semi-supervised machine learning strategy when only small sets of sentence boundary-labeled data are available. We focus on two semi-supervised learning approaches, namely self-training and co-training. We also compare two example selection strategies for co-training, namely agreement and disagreement. Furthermore, we propose another method, called self-combined, which is a combination of self-training and co-training. The experimental results obtained on the ICSI Meeting (MRDA) Corpus show that both multi-view methods outperform self-training, and the best results are obtained using co-training alone. This study shows that sentence segmentation is well suited to multi-view learning, since the data can be represented by two disjoint and redundantly sufficient feature sets, namely lexical and prosodic features. When only a small set of manually labeled examples is used, the performance of the lexical and prosodic models improves by 26% and 11% relative, respectively. When both information sources are combined, the semi-supervised learning methods improve the F-measure from a baseline of 69.8% to 74.2%.
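Co-training, the approach that performed best above, can be sketched with one classifier per view. In this hedged sketch, a nearest-centroid classifier stands in for whatever classifiers the paper used, confidence is the distance margin between classes, and each view in turn pseudo-labels its `k` most confident unlabeled examples for the shared labeled pool. Feature values, `rounds`, and `k` are hypothetical, and only the basic selection rule is shown (not the agreement/disagreement variants or self-combined).

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def train_centroids(xs: List[Vector], ys: List[int]) -> Dict[int, List[float]]:
    """Nearest-centroid 'training': mean feature vector per class label."""
    cents = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        cents[label] = [sum(col) / len(members) for col in zip(*members)]
    return cents


def predict(cents: Dict[int, List[float]], x: Vector) -> Tuple[int, float]:
    """Return (label, confidence), where confidence is the distance margin
    between the nearest and second-nearest class centroids."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, label)
        for label, c in cents.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin


def co_train(lex_l, pro_l, y_l, lex_u, pro_u, rounds=3, k=1):
    """Two-view co-training: in each round, each view labels its k most
    confident unlabeled examples and adds them to the shared labeled pool."""
    lex_l, pro_l, y_l = list(lex_l), list(pro_l), list(y_l)
    pool = list(range(len(lex_u)))
    for _ in range(rounds):
        for view_l, view_u in ((lex_l, lex_u), (pro_l, pro_u)):
            if not pool:
                break
            cents = train_centroids(view_l, y_l)
            ranked = sorted(pool, key=lambda i: predict(cents, view_u[i])[1], reverse=True)
            for i in ranked[:k]:
                label, _ = predict(cents, view_u[i])
                # The pseudo-labeled example joins BOTH views' training data.
                lex_l.append(lex_u[i])
                pro_l.append(pro_u[i])
                y_l.append(label)
                pool.remove(i)
    return train_centroids(lex_l, y_l), train_centroids(pro_l, y_l)
```

Because each pseudo-labeled example is added to both views, a confident lexical decision can teach the prosodic model and vice versa, which is the intuition behind relying on two redundantly sufficient feature sets.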
Computer Speech & Language | 2010
Umit Guz; Gökhan Tür; Dilek Hakkani-Tür; Sébastien Cuendet
Many speech and language processing problems require cascaded classification tasks. While model adaptation has been shown to be useful in isolated speech and language processing tasks, it is not clear what constitutes system adaptation for such complex systems. This paper studies the following questions: When a sequence of classification tasks is employed, how important is it to adapt the earlier or the later systems? Is the performance improvement obtained in the earlier stages via adaptation carried over to the later stages when the later stages perform adaptation using similar data and/or methods? In this study, as part of a larger-scale multiparty meeting understanding system, we analyze various methods for adapting dialog act segmentation and tagging models trained on conversational telephone speech (CTS) to meeting-style conversations. We investigate the effect of using adapted and unadapted models for dialog act segmentation in combination with those for tagging, showing the effect of model adaptation for cascaded classification tasks. Our results indicate that dialog act segmentation and tagging can be significantly improved by adapting the out-of-domain models, especially when the amount of in-domain data is limited. Experimental results show that, when dealing with a sequence of cascaded classification tasks, it is more effective to adapt the models of the later classification tasks, in our case dialog act tagging.
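A cascade of this kind can be sketched as a segmenter whose output feeds a tagger. As an assumption for illustration only, adaptation here is linear interpolation of in-domain and out-of-domain boundary probabilities with weight `lam`; the paper's adaptation methods are not reproduced. The toy tagger and probabilities are hypothetical. The point of the sketch is that the tagger only ever sees the segments stage 1 produces, so segmentation errors propagate downstream.

```python
from typing import Callable, List, Tuple


def interpolate(p_in: float, p_out: float, lam: float) -> float:
    """Hypothetical adaptation: mix in-domain and out-of-domain probabilities."""
    return lam * p_in + (1.0 - lam) * p_out


def segment(words: List[str], p_in: List[float], p_out: List[float],
            lam: float, threshold: float = 0.5) -> List[List[str]]:
    """Stage 1: close a segment after word i when the adapted boundary
    probability for the gap after word i exceeds the threshold."""
    segments, current = [], []
    for i, w in enumerate(words):
        current.append(w)
        if interpolate(p_in[i], p_out[i], lam) > threshold:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def cascade(words: List[str], p_in: List[float], p_out: List[float], lam: float,
            tagger: Callable[[List[str]], str]) -> List[Tuple[List[str], str]]:
    """Stage 2 tags whatever segments stage 1 produced."""
    return [(seg, tagger(seg)) for seg in segment(words, p_in, p_out, lam)]


def toy_tagger(seg: List[str]) -> str:
    """Hypothetical rule-based tagger: wh-word at segment start -> question."""
    return "question" if seg[0] in {"what", "who", "when"} else "statement"
```

With `words = ["yeah", "what", "time", "works"]`, `p_in = [0.9, 0.1, 0.1, 0.9]`, and `p_out = [0.3, 0.2, 0.1, 0.8]`, setting `lam=1.0` splits off the backchannel "yeah" and tags the rest as a question, while `lam=0.0` misses that boundary and the merged unit is tagged as a statement.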
IEEE Transactions on Audio, Speech, and Language Processing | 2009
Umit Guz; Benoit Favre; Dilek Hakkani-Tür; Gökhan Tür
This paper presents novel methods for generative, discriminative, and hybrid sequence classification for segmenting Turkish word sequences into sentences. In the literature, this task is generally solved using statistical models that exploit lexical information, among other cues. However, Turkish has a productive morphology that generates a very large vocabulary, making the task much harder. In this paper, we introduce a new set of morphological features extracted from words and their morphological analyses. We also extend the established method of hidden event language modeling (HELM) to factored hidden event language modeling (fHELM) to handle morphological information. To capture non-lexical information, we extract a set of prosodic features, mainly motivated by our previous work on other languages. We then employ discriminative classification techniques, boosting and conditional random fields (CRFs), combined with fHELM, for Turkish sentence segmentation.
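The core idea of a hidden event language model is to treat a sentence boundary as an extra token in an n-gram model and ask, at each word gap, whether inserting that token makes the sequence more likely. The sketch below is a deliberately simplified, hypothetical version: a bigram model with add-k smoothing and a local per-gap decision, rather than a full search over boundary sequences, and without the morphological factors that fHELM adds. The token `<b>` and the toy Turkish training text are assumptions.

```python
import math
from collections import Counter
from typing import List

BOUNDARY = "<b>"  # hidden event token marking a sentence boundary


class BigramHELM:
    """Toy hidden-event bigram LM with add-k (Lidstone) smoothing."""

    def __init__(self, tokens: List[str], k: float = 0.1):
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)
        self.k = k

    def logp(self, prev: str, word: str) -> float:
        """Smoothed log P(word | prev)."""
        num = self.bigrams[(prev, word)] + self.k
        den = self.unigrams[prev] + self.k * self.vocab_size
        return math.log(num / den)

    def boundaries(self, words: List[str]) -> List[bool]:
        """For each gap between adjacent words, decide whether inserting the
        hidden boundary event scores higher than leaving the gap empty."""
        decisions = []
        for prev, nxt in zip(words, words[1:]):
            no_event = self.logp(prev, nxt)
            event = self.logp(prev, BOUNDARY) + self.logp(BOUNDARY, nxt)
            decisions.append(event > no_event)
        return decisions
```

Trained on `"ben geldim <b> sen gittin <b> ben geldim <b> o geldi <b>".split()`, the model places a boundary only between "geldim" and "sen" in `["ben", "geldim", "sen", "gittin"]`. A word-level vocabulary like this one is exactly what becomes sparse for Turkish, which is the motivation for factoring in morphological information.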
International Symposium on Circuits and Systems | 2004
Umit Guz; Hakan Gurkan; B. Siddik Yarman
In this paper, a new method for modeling speech signals is introduced. The proposed method is based on the generation of the so-called predefined signature and envelope function sets (PSEFS), $S = \{s_R(t)\}$ and $E = \{e_K(t)\}$. The function sets are independent of any speaker and any language. Once the speech signal is divided into frames of selected length, each frame signal piece $x_i(t)$ is synthesized in the mathematical form $x_i(t) = C_i \, e_K(t) \, s_R(t)$. In this representation, $C_i$ is called the frame coefficient, and $s_R(t)$ and $e_K(t)$ are properly assigned from the PSEFS. It is shown that the proposed method provides fast reconstruction and substantial compression with acceptable hearing quality.
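Per frame, the model $x_i(t) = C_i \, e_K(t) \, s_R(t)$ reduces to choosing indices R and K and a coefficient $C_i$. One plausible way to do this, assumed here rather than taken from the paper, is an exhaustive search over all signature/envelope pairs with $C_i$ set to its least-squares value $C_i = \langle x_i, g \rangle / \langle g, g \rangle$, where $g(t) = e_K(t)\,s_R(t)$. The toy function sets are hypothetical; the paper's PSEFS construction is not shown.

```python
from typing import List, Sequence, Tuple


def best_frame_model(
    x: Sequence[float],
    signatures: List[Sequence[float]],  # hypothetical stand-in for S = {s_R(t)}
    envelopes: List[Sequence[float]],   # hypothetical stand-in for E = {e_K(t)}
) -> Tuple[int, int, float, float]:
    """Return (R, K, C, squared_error) minimizing ||x - C * e_K * s_R||^2."""
    best = None
    for r, s in enumerate(signatures):
        for k, e in enumerate(envelopes):
            g = [ei * si for ei, si in zip(e, s)]  # sample-wise product e_K(t) s_R(t)
            energy = sum(v * v for v in g)
            if energy == 0.0:
                continue
            c = sum(xi * gi for xi, gi in zip(x, g)) / energy  # least-squares C_i
            err = sum((xi - c * gi) ** 2 for xi, gi in zip(x, g))
            if best is None or err < best[3]:
                best = (r, k, c, err)
    if best is None:
        raise ValueError("no usable signature/envelope pair")
    return best


def synthesize(c: float, s: Sequence[float], e: Sequence[float]) -> List[float]:
    """Receiver side: rebuild a frame from its coefficient and assigned functions."""
    return [c * ei * si for ei, si in zip(e, s)]
```

Only R, K, and $C_i$ need to be stored or transmitted per frame, since both ends hold the same function sets; that is where the compression comes from.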
European Conference on Circuit Theory and Design | 2007
Hakan Gurkan; Umit Guz; Binboga Siddik Yarman
In this paper, a novel method to compress electroencephalogram (EEG) signals is proposed. The proposed method is based on the generation of classified signature and envelope vector sets (CSEVS) using an effective k-means clustering algorithm. In this work, on a frame basis, any EEG signal is modeled as the product of three parameters, called the classified signature vector, the classified envelope vector, and the frame-scaling coefficient. Each EEG frame is thus described by the two CSEVS indices R and K and the frame-scaling coefficient. The proposed method is assessed using the root-mean-square error (RMSE) and visual inspection. The proposed method achieves good compression ratios with low reconstruction error while preserving diagnostic information in the reconstructed EEG signal.
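The k-means step can be sketched as building a codebook of frame shapes from normalized training frames, then encoding each frame as an index plus a frame-scaling coefficient. This is a simplified, hypothetical sketch: it uses a single shape codebook (gain-shape vector quantization) instead of the separate signature and envelope sets of CSEVS, and a deterministic farthest-first initialization chosen only to keep the example reproducible.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> Vec:
    """Scale a frame to unit energy so clustering groups shapes, not amplitudes."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def kmeans(data: List[Vec], k: int, iters: int = 20) -> List[Vec]:
    """Lloyd's k-means with deterministic farthest-first initialization."""
    cents = [list(data[0])]
    while len(cents) < k:
        far = max(data, key=lambda p: min(dist2(p, c) for c in cents))
        cents.append(list(far))
    for _ in range(iters):
        groups: List[List[Vec]] = [[] for _ in range(k)]
        for p in data:
            j = min(range(k), key=lambda i: dist2(p, cents[i]))
            groups[j].append(p)
        for i, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                cents[i] = [sum(col) / len(g) for col in zip(*g)]
    return cents


def encode(frame: Sequence[float], codebook: List[Vec]) -> Tuple[int, float]:
    """Return (index, frame-scaling coefficient) with least squared error."""
    best = None
    for i, cw in enumerate(codebook):
        energy = sum(x * x for x in cw)
        if energy == 0:
            continue
        c = sum(f * x for f, x in zip(frame, cw)) / energy
        err = dist2(frame, [c * x for x in cw])
        if best is None or err < best[2]:
            best = (i, c, err)
    if best is None:
        raise ValueError("codebook has no nonzero codewords")
    return best[0], best[1]


def decode(index: int, scale: float, codebook: List[Vec]) -> Vec:
    return [scale * x for x in codebook[index]]
```

Normalizing before clustering lets one codeword serve frames that differ only in amplitude, with the amplitude carried by the scaling coefficient.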
International Conference of the IEEE Engineering in Medicine and Biology Society | 2013
Hakan Gurkan; Umit Guz; Binboga Siddik Yarman
In this work, we present a novel biometric authentication approach based on a combination of AC/DCT features, MFCC features, and QRS beat information from ECG signals. The proposed approach is tested on a subset of 30 subjects selected from the PTB database. This subset consists of 13 healthy and 17 non-healthy subjects, each of whom has two ECG records. The proposed approach achieves an average frame recognition rate of 97.31% on the selected subset. Our experimental results imply that the frame recognition rate of the proposed approach is better than that of the AC/DCT-based and MFCC-based biometric authentication systems used individually.
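AC/DCT features, as commonly used in ECG biometrics, are computed by taking the autocorrelation of an ECG window and then applying a discrete cosine transform to keep a compact set of coefficients. The sketch below assumes a normalized autocorrelation, an unnormalized DCT-II, and hypothetical values for the window length, number of lags, and number of retained coefficients; the MFCC and QRS components of the proposed approach are not shown.

```python
import math
from typing import List, Sequence


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Normalized autocorrelation r[m] / r[0] for lags 0..max_lag-1 (max_lag <= len(x))."""
    n = len(x)
    r = [sum(x[i] * x[i + m] for i in range(n - m)) for m in range(max_lag)]
    return [v / r[0] for v in r] if r[0] != 0 else r


def dct_ii(v: Sequence[float]) -> List[float]:
    """Unnormalized DCT-II: X[k] = sum_n v[n] * cos(pi / N * (n + 0.5) * k)."""
    n_len = len(v)
    return [
        sum(v[n] * math.cos(math.pi / n_len * (n + 0.5) * k) for n in range(n_len))
        for k in range(n_len)
    ]


def ac_dct_features(frame: Sequence[float], max_lag: int, n_coeffs: int) -> List[float]:
    """Autocorrelate the window, transform with DCT-II, keep the first n_coeffs."""
    return dct_ii(autocorrelation(frame, max_lag))[:n_coeffs]
```

Autocorrelation removes the need to align individual heartbeats within a window, and the DCT compacts most of its energy into a few low-order coefficients.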
EURASIP Journal on Advances in Signal Processing | 2011
Umit Guz
In this paper, a novel image compression method based on the generation of so-called classified energy and pattern blocks (CEPB) is introduced, and evaluation results are presented. The CEPB is constructed from training images and then stored at both the transmitter and receiver sides of the communication system. The energy and pattern blocks of the input images to be reconstructed are then determined in the same way as in the construction of the CEPB. This process is combined with a matching procedure that determines the index numbers of the classified energy and pattern blocks in the CEPB that best represent (match) the energy and pattern blocks of the input images. The encoding parameters are the block scaling coefficient and the index numbers of the energy and pattern blocks determined for each block of the input images. These parameters are sent from the transmitter to the receiver, where the classified energy and pattern blocks associated with the index numbers are retrieved from the CEPB. The input image is then reconstructed block by block at the receiver using the proposed mathematical model. Evaluation results show that the method provides considerable image compression ratios and image quality even at low bit rates.
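The transmitter/receiver split can be sketched with a codebook shared by both ends: the transmitter sends only an (index, scale) pair per block, and the receiver rebuilds the image from the codebook. This hedged sketch uses a single codebook of flattened blocks and least-squares scaling; the separate energy and pattern blocks and the paper's own reconstruction model are not reproduced, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]


def split_blocks(img: Image, b: int) -> List[List[float]]:
    """Flatten non-overlapping b x b blocks in raster order."""
    h, w = len(img), len(img[0])
    if h % b or w % b:
        raise ValueError("image dimensions must be multiples of b")
    return [
        [img[r + i][c + j] for i in range(b) for j in range(b)]
        for r in range(0, h, b)
        for c in range(0, w, b)
    ]


def encode_block(block: Sequence[float], codebook: List[List[float]]) -> Tuple[int, float]:
    """Transmitter: choose (index, scale) minimizing ||block - scale * codeword||^2."""
    best = None
    for idx, cw in enumerate(codebook):
        energy = sum(v * v for v in cw)
        if energy == 0:
            continue
        scale = sum(a * v for a, v in zip(block, cw)) / energy
        err = sum((a - scale * v) ** 2 for a, v in zip(block, cw))
        if best is None or err < best[2]:
            best = (idx, scale, err)
    if best is None:
        raise ValueError("codebook has no nonzero codewords")
    return best[0], best[1]


def decode_image(params: List[Tuple[int, float]], codebook: List[List[float]],
                 h: int, w: int, b: int) -> Image:
    """Receiver: rebuild the image block by block from (index, scale) pairs."""
    img = [[0.0] * w for _ in range(h)]
    blocks_per_row = w // b
    for n, (idx, scale) in enumerate(params):
        r, c = (n // blocks_per_row) * b, (n % blocks_per_row) * b
        for i in range(b):
            for j in range(b):
                img[r + i][c + j] = scale * codebook[idx][i * b + j]
    return img
```

Usage: `decode_image([encode_block(blk, cb) for blk in split_blocks(img, b)], cb, h, w, b)` runs the full round trip; only the index and scale per block cross the channel.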
Signal Processing and Communications Applications Conference | 2008
Hakan Gurkan; Umit Guz; B.S. Yarman
In this paper, a novel method to compress electroencephalogram (EEG) signals is proposed. The proposed method is based on the generation of classified signature and envelope vector sets (CSEVS) using an effective k-means clustering algorithm. In this work, on a frame basis, any EEG signal is modeled as the product of three parameters, called the classified signature vector, the classified envelope vector, and the frame-scaling coefficient. Each EEG frame is thus described by the two CSEVS indices R and K and the frame-scaling coefficient. The proposed method is assessed using the root-mean-square error (RMSE) and visual inspection. The proposed method achieves good compression ratios with low reconstruction error while preserving diagnostic information in the reconstructed EEG signal.
International Symposium on Circuits and Systems | 2006
Umit Guz; Hakan Gurkan; B. Siddik Yarman
In this paper, the speech modeling method called SYMPES is introduced and compared with commercially available methods. It is shown that, at the same or a better compression ratio, SYMPES yields considerably better hearing quality than coders such as G.726 at 16 kbps and voice-excited LPC-10E at 2.4 kbps.
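To put these bit rates in perspective, a compression ratio is the reference bit rate divided by the coded bit rate. The sketch below assumes a reference of 8 kHz, 16-bit linear PCM (128 kbps); the abstract does not state the reference format, so the ratios are illustrative only.

```python
def compression_ratio(ref_bps: float, coded_bps: float) -> float:
    """Ratio of the uncompressed reference bit rate to the coded bit rate."""
    return ref_bps / coded_bps


# Assumed reference: 8000 samples/s * 16 bits/sample = 128 kbps linear PCM.
PCM_BPS = 8000 * 16

for name, bps in (("G.726 at 16 kbps", 16000), ("LPC-10E at 2.4 kbps", 2400)):
    print(f"{name}: {compression_ratio(PCM_BPS, bps):.1f}:1")
```

Under this assumption, G.726 at 16 kbps corresponds to 8:1 and LPC-10E at 2.4 kbps to roughly 53:1, which frames what "the same or a better compression ratio" means for each comparison.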