Erinç Dikici
Boğaziçi University
Publications
Featured research published by Erinç Dikici.
Journal on Multimodal User Interfaces | 2008
Oya Aran; Ismail Ari; Lale Akarun; Erinç Dikici; Siddika Parlak; Murat Saraclar; Pavel Campr; Marek Hrúz
The objective of this study is to automatically extract annotated sign data from broadcast news recordings for the hearing impaired. These recordings are an excellent source for automatically generating annotated data: in news for the hearing impaired, the speaker signs with her hands as she talks, and corresponding sliding text is superimposed on the video. The video of the signer can be segmented with the help of the speech alone, or of both the speech and the text, producing segmented and annotated sign videos. We call this application Signiary, and aim to use it as a sign dictionary where users enter a word as text and retrieve videos of the related sign. The application can also be used to automatically create annotated sign databases for training recognizers.
IEEE Transactions on Audio, Speech, and Language Processing | 2013
Erinç Dikici; Murat Semerci; Murat Saraclar; Ethem Alpaydin
Discriminative language modeling (DLM) is a feature-based approach that is used as an error-correcting step after hypothesis generation in automatic speech recognition (ASR). We formulate this both as a classification and a ranking problem and employ the perceptron, the margin infused relaxed algorithm (MIRA) and the support vector machine (SVM). To decrease training complexity, we try count-based thresholding for feature selection and data sampling from the list of hypotheses. On a Turkish morphology-based feature set we examine the use of first- and higher-order n-grams and present an extensive analysis of the complexity and accuracy of the models with an emphasis on statistical significance. We find that feature selection and data sampling yield significant computational savings without a significant loss in accuracy. Using MIRA or the SVM does not lead to any further improvement over the perceptron, but the use of ranking as opposed to classification leads to a 0.4% reduction in word error rate (WER), which is statistically significant.
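The perceptron training referred to in this abstract can be sketched as an update over an N-best list: move the weight vector toward the lowest-error ("oracle") hypothesis and away from the model's current pick. This is a minimal illustration with hypothetical dictionary-valued feature vectors, not the paper's Turkish morphological features.

```python
# Sketch of a structured-perceptron update for N-best reranking
# (hypothetical feature vectors; not the paper's Turkish morph features).

def perceptron_rerank_update(weights, hypotheses, step=1.0):
    """One perceptron update on a single N-best list.

    hypotheses: list of (feature_vector, word_error_count) tuples,
    where feature_vector is a dict mapping feature name -> count.
    """
    def score(feats):
        return sum(weights.get(f, 0.0) * v for f, v in feats.items())

    # Oracle: the hypothesis with the fewest word errors.
    oracle = min(hypotheses, key=lambda h: h[1])
    # The model's current best-scoring hypothesis.
    best = max(hypotheses, key=lambda h: score(h[0]))

    if best is not oracle:
        # Move weights toward the oracle's features,
        # away from the features of the model's pick.
        for f, v in oracle[0].items():
            weights[f] = weights.get(f, 0.0) + step * v
        for f, v in best[0].items():
            weights[f] = weights.get(f, 0.0) - step * v
    return weights
```

In the ranking variant studied in the paper, the update instead compares pairs of hypotheses ordered by error rate; the classification variant shown here is the simpler baseline.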
international conference on acoustics, speech, and signal processing | 2012
Arda Çelebi; Hasim Sak; Erinç Dikici; Murat Saraclar; Maider Lehr; Emily Prud'hommeaux; Puyang Xu; Nathan Glenn; Damianos Karakos; Sanjeev Khudanpur; Brian Roark; Kenji Sagae; Izhak Shafran; Daniel M. Bikel; Chris Callison-Burch; Yuan Cao; Keith B. Hall; Eva Hasler; Philipp Koehn; Adam Lopez; Matt Post; Darcey Riley
We present our work on semi-supervised learning of discriminative language models where the negative examples for sentences in a text corpus are generated using confusion models for Turkish at various granularities, specifically, word, sub-word, syllable and phone levels. We experiment with different language models and various sampling strategies to select competing hypotheses for training with a variant of the perceptron algorithm. We find that morph-based confusion models with a sample selection strategy aiming to match the error distribution of the baseline ASR system gives the best performance. We also observe that substituting half of the supervised training examples with those obtained in a semi-supervised manner gives similar results.
signal processing and communications applications conference | 2008
Erinç Dikici; Murat Saraclar
In this study, a method is proposed for recognizing sliding text in broadcast news videos. Video frames are converted into binary images, from which horizontal and vertical projection histograms are extracted to determine the position of the text band. After noise removal operations that exploit the redundancy between video frames, the text image is segmented into individual characters by connected component analysis. Template matching is used for character recognition. The strings recognized in consecutive images are aligned in space and time, yielding a complete transcription of the whole program. Since similarly shaped characters may be confused with each other, a transformation-based learning algorithm is used to learn corrective rules from a training text. The proposed method achieves 99% character recognition accuracy and 92% word recognition accuracy.
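The text-band localization step described above can be illustrated with a projection histogram: count foreground pixels per row of the binarized frame and take the widest contiguous run of dense rows. This is a minimal sketch assuming the frame is already binarized as nested lists of 0/1 pixels; the threshold value is hypothetical.

```python
# Sketch of locating a horizontal text band via a projection histogram,
# assuming a binarized frame given as a list of rows of 0/1 pixels.

def horizontal_projection(binary_image):
    """Count foreground pixels in each row."""
    return [sum(row) for row in binary_image]

def find_text_band(binary_image, threshold):
    """Return (top_row, bottom_row) of the widest contiguous run of rows
    whose projection exceeds the threshold, or None if no row qualifies."""
    proj = horizontal_projection(binary_image)
    best, start = None, None
    for i, v in enumerate(proj + [0]):  # trailing 0 closes a final run
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            run = (start, i - 1)
            if best is None or (run[1] - run[0]) > (best[1] - best[0]):
                best = run
            start = None
    return best
```

A vertical projection over the located band would then separate characters in the same way, before the connected-component segmentation the abstract mentions.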
Journal on Multimodal User Interfaces | 2011
Marek Hrúz; Pavel Campr; Erinç Dikici; Ahmet Alp Kindiroglu; Z. Krňoul; Alexander L. Ronzhin; Hasim Sak; Daniel Schorno; Hulya Yalcin; Lale Akarun; Oya Aran; Alexey Karpov; Murat Saraclar; M. Železný
The aim of this paper is to help the communication of two people, one hearing impaired and one visually impaired by converting speech to fingerspelling and fingerspelling to speech. Fingerspelling is a subset of sign language, and uses finger signs to spell letters of the spoken or written language. We aim to convert finger spelled words to speech and vice versa. Different spoken languages and sign languages such as English, Russian, Turkish and Czech are considered.
international symposium on computer and information sciences | 2009
Erinç Dikici; Murat Saraclar
GMM supervectors are among the most popular feature sets used in SVM-based text-independent speaker verification systems. Most studies use only a single supervector to represent speaker characteristics, against a set of background samples. An alternative is to divide the total training duration into smaller pieces to increase the number of supervectors for training the minority (speaker) class. Similarly, the total test duration can be partitioned, with the final verification decision made by majority voting over decisions on the smaller durations. We explore the performance of speaker verification systems in terms of EER and minDCF by breaking the input sequence down into durations of 4 minutes, 1 minute and 10 seconds. We try different training/test data amounts to investigate the generalizability of this approach. Working on the CSLU Speaker Recognition Dataset, we show that the lowest error rates are obtained when the training supervector representative duration is set equal to that of the test samples.
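The partition-and-vote scheme described above can be sketched as follows: split the frame sequence into fixed-length chunks and accept the claimed speaker only if a majority of per-chunk decisions accept. The chunk-level scoring function is left abstract here, since in the paper it would be an SVM over GMM supervectors.

```python
# Sketch of duration-matched chunking and majority voting for verification.
# decide_chunk stands in for the paper's per-chunk SVM decision (hypothetical).

def chunk(frames, chunk_len):
    """Split a frame sequence into fixed-length chunks, dropping any remainder."""
    return [frames[i:i + chunk_len]
            for i in range(0, len(frames) - chunk_len + 1, chunk_len)]

def verify_by_majority(frames, chunk_len, decide_chunk):
    """Accept the claimed speaker iff a majority of chunk-level decisions
    accept. decide_chunk(chunk) -> bool."""
    decisions = [decide_chunk(c) for c in chunk(frames, chunk_len)]
    return sum(decisions) > len(decisions) / 2
```

The paper's finding that matched train/test durations work best corresponds to choosing the same `chunk_len` on both sides of this scheme.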
international conference on speech and computer | 2015
Murat Saraclar; Erinç Dikici; Ebru Arısoy
This paper summarizes the research on discriminative language modeling focusing on its application to automatic speech recognition (ASR). A discriminative language model (DLM) is typically a linear or log-linear model consisting of a weight vector associated with a feature vector representation of a sentence. This flexible representation can include linguistically and statistically motivated features that incorporate morphological and syntactic information. At test time, DLMs are used to rerank the output of an ASR system, represented as an N-best list or lattice. During training, both negative and positive examples are used with the aim of directly optimizing the error rate. Various machine learning methods, including the structured perceptron, large margin methods and maximum regularized conditional log-likelihood, have been used for estimating the parameters of DLMs. Typically positive examples for DLM training come from the manual transcriptions of acoustic data while the negative examples are obtained by processing the same acoustic data with an ASR system. Recent research generalizes DLM training by either using automatic transcriptions for the positive examples or simulating the negative examples.
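The test-time reranking described in this summary amounts to scoring each N-best hypothesis with the ASR score plus a dot product between a learned weight vector and the hypothesis's feature vector. A minimal sketch, using hypothetical unigram/bigram word features in place of the morphological and syntactic features the survey discusses:

```python
# Sketch of test-time DLM reranking of an N-best list. The features here
# (word unigrams/bigrams) and the weights are illustrative placeholders.

def dlm_score(weights, hypothesis, asr_score, alpha=1.0):
    """Combine the ASR score with a linear feature score."""
    words = hypothesis.split()
    feats = {}
    for w in words:                      # unigram features
        feats[("1g", w)] = feats.get(("1g", w), 0) + 1
    for a, b in zip(words, words[1:]):   # bigram features
        feats[("2g", a, b)] = feats.get(("2g", a, b), 0) + 1
    return asr_score + alpha * sum(
        weights.get(f, 0.0) * v for f, v in feats.items())

def rerank(weights, nbest):
    """nbest: list of (hypothesis_text, asr_score). Return the top hypothesis."""
    return max(nbest, key=lambda h: dlm_score(weights, h[0], h[1]))[0]
```

In the lattice setting mentioned in the abstract, the same linear score is applied to arcs of a WFST rather than to whole N-best entries.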
signal processing and communications applications conference | 2014
Erinç Dikici; Murat Saraclar
As a final stage in an automatic speech recognition system, discriminative language modeling (DLM) aims to choose the most accurate word sequence among alternative hypotheses, which also serve as training examples. For supervised training, manual transcriptions of the spoken utterances are available. For unsupervised training this information is not present, so the accuracy of the training examples is unknown. In this study we investigate methods to estimate these accuracies, and perform DLM training using the perceptron algorithm adapted for structured prediction and reranking problems. The results show that unsupervised training can achieve improvements of up to half of the gains obtained in the supervised case.
european signal processing conference | 2015
Erinç Dikici; Murat Saraclar
Discriminative language modeling (DLM) is used as a postprocessing step to correct automatic speech recognition (ASR) errors. Traditional DLM training requires a large number of ASR N-best lists together with their reference transcriptions. It is possible to incorporate additional text data into training via artificial hypothesis generation through confusion modeling. A weighted finite-state transducer (WFST) or a machine translation (MT) system can be used to generate the artificial hypotheses. When reference transcriptions are not available, training can be done in an unsupervised way via a target output selection scheme. In this paper we adapt the MT-based artificial hypothesis generation approach to unsupervised discriminative language modeling, and compare it with the WFST-based setting. We achieve improvements in word error rate of up to 0.7% over the generative baseline, which is significant at p < 0.001.
signal processing and communications applications conference | 2013
Erinç Dikici; Murat Saraclar
Discriminative language modeling is a technique for correcting automatic speech recognition errors, and can be handled as a classification or a ranking problem. The aim of curriculum learning is to train the model with examples or concepts of gradually increasing difficulty. In this work, we use the classification and ranking versions of the perceptron algorithm and investigate three different curriculum learning approaches based on selection, ordering and clustering of the training examples. The results show that curriculum learning can help increase the performance of a classifying perceptron system, and that with the ranking perceptron it is possible to achieve similar system performance with a shorter training time.
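The ordering-based curriculum described in this abstract can be sketched generically: sort the training examples by a difficulty score and run the perceptron epoch easiest-first. The difficulty function is a hypothetical placeholder for whatever criterion (e.g. baseline error rate of the N-best list) a curriculum would use.

```python
# Sketch of an ordering-based curriculum for perceptron training.
# `difficulty` and `update` are hypothetical placeholders: difficulty
# scores an example (lower = easier), update applies one perceptron step.

def curriculum_order(examples, difficulty):
    """Return training examples sorted easiest-first."""
    return sorted(examples, key=difficulty)

def train_with_curriculum(examples, difficulty, update, weights):
    """One training epoch over the examples, easiest first.
    update(weights, example) mutates the weights in place."""
    for ex in curriculum_order(examples, difficulty):
        update(weights, ex)
    return weights
```

The selection and clustering curricula mentioned in the abstract would replace `curriculum_order` with, respectively, a filter over the examples or a schedule over example clusters.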