Askar Hamdulla
Xinjiang University
Publication
Featured research published by Askar Hamdulla.
International Conference on Signal Processing | 2008
Kurban Ubul; Askar Hamdulla; Alim Aysa; Abdiryim Raxidin; Rahim Mahmut
Many techniques have been reported for handwriting-based writer identification, but none of them addresses text written in Uyghur. In this paper, we propose a technique for off-line writer identification for Uyghur handwriting. Because of the nature of Uyghur handwriting, texture features were extracted over a wide range of frequencies and orientations. 144 features are extracted from each handwriting sample using 2-D Gabor filters. By applying feature selection and extraction methods to this feature set, subsets of lower dimensionality are obtained. The most discriminant features were selected with a feature selection model based on genetic algorithm techniques. Three classification techniques were used: support vector machine (SVM), weighted Euclidean distance (WED), and the K-nearest-neighbour (K-NN) classifier. Experiments were performed using Uyghur handwriting samples from 23 different people, and a very promising correct identification rate of 88.0% was achieved.
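As a rough illustration of the weighted Euclidean distance (WED) classifier named in this abstract, the following Python sketch assumes the common formulation in which each feature difference is scaled by that writer's per-feature standard deviation. The function names (`fit_writer_stats`, `wed`, `identify`) and the toy two-dimensional vectors are hypothetical; the abstract does not give the authors' exact formulation or their 144 Gabor features.

```python
import math

def fit_writer_stats(samples):
    """Return per-feature (means, stds) over one writer's training vectors."""
    n = len(samples)
    dims = len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((s[d] - means[d]) ** 2 for s in samples) / n
        # Guard against a zero spread so the division in wed() stays defined.
        stds.append(math.sqrt(var) or 1e-9)
    return means, stds

def wed(features, means, stds):
    """Weighted Euclidean distance: squared differences scaled by the std."""
    return sum(((f - m) / s) ** 2 for f, m, s in zip(features, means, stds))

def identify(features, writers):
    """Pick the writer whose statistics give the smallest WED.

    writers maps a writer name to the (means, stds) pair from fit_writer_stats.
    """
    return min(writers, key=lambda name: wed(features, *writers[name]))

# Toy usage with made-up 2-D feature vectors (assumption, not real data).
writers = {
    "A": fit_writer_stats([[1.0, 2.0], [1.2, 2.2]]),
    "B": fit_writer_stats([[5.0, 5.0], [5.5, 4.5]]),
}
print(identify([1.1, 2.1], writers))
```

Scaling by the standard deviation lets features with a naturally large spread contribute less to the decision than tightly clustered ones.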
International Conference on Signal Processing | 2010
Mijit Ablimit; Graham Neubig; Masato Mimura; Shinsuke Mori; Tatsuya Kawahara; Askar Hamdulla
Uyghur is an agglutinative language in which words are formed by attaching suffixes to a stem (or root). Because the vocabulary of agglutinative languages grows explosively, several morpheme-based language models are built and evaluated experimentally. A morpheme is the smallest meaning-bearing unit; in this research, the term refers to any prefix, stem, or suffix. A large-vocabulary ASR system is built on the basis of the Julius system, and ASR results for language models based on different units (word, morpheme, and syllable) are compared.
Information Sciences, Signal Processing and their Applications | 2012
Kurban Ubul; Andy Adler; Gulirana Abliz; Maimaitijiang Yasheng; Askar Hamdulla
Many techniques have been published on handwritten signature recognition, but none of them addresses Uyghur handwritten signatures, owing to their complex nature. In this paper, we propose, for the first time, methods for off-line recognition of Uyghur handwritten signatures. The signature images were pre-processed according to the nature of Uyghur signatures; the preprocessing included noise reduction, binarization, and normalization. Then multi-dimensional modified grid information features were extracted according to the characteristics of Uyghur signatures and their writing style. Finally, three kinds of classification techniques were used: the Euclidean distance (ED) classifier, the K-nearest-neighbor (K-NN) classifier, and the Bayes classifier. Experiments were performed using 1000 Uyghur signature samples from 50 different people. A promising average correct recognition rate of 93.53% was achieved.
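The abstract does not spell out the "modified grid information features", so the following Python sketch only illustrates the general idea of grid features followed by a Euclidean distance classifier. It assumes the simplest variant: the fraction of ink pixels in each cell of a grid laid over a binarized image. The function names and the tiny 4x4 image are hypothetical.

```python
import math

def grid_features(image, rows, cols):
    """Fraction of ink pixels (value 1) in each cell of a rows x cols grid."""
    h, w = len(image), len(image[0])
    feats = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            cell = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(cell) / len(cell) if cell else 0.0)
    return feats

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_label(query, references):
    """references is a list of (label, feature_vector) pairs."""
    return min(references, key=lambda ref: euclidean(query, ref[1]))[0]

# Toy binarized "signature" (assumption, for illustration only).
image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
]
print(grid_features(image, 2, 2))
```

A real system would apply the noise reduction, binarization, and normalization steps listed in the abstract before computing the grid.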
International Conference on Computer Science and Education | 2009
Kurban Ubul; Askar Hamdulla; Alim Aysa
Digital Signal Processing (DSP) is an important and growing subject area in Electrical/Computer Engineering (ECE), Computer Science, and other Engineering/Science disciplines. Since 1997, the authors have taught an undergraduate DSP course at Xinjiang University (XJU). While DSP has become very popular with ECE students and the DSP job market is growing, students still consider the subject matter difficult and complex. This paper presents an approach to teaching discrete-time (or digital) signal processing using the speech analysis software Praat. The authors at XJU have enhanced the learning experience for their students by adding the software to their course, making the theoretical side of DSP easier to understand.
Speech Communication | 2014
Mijit Ablimit; Tatsuya Kawahara; Askar Hamdulla
For automatic speech recognition (ASR) of agglutinative languages, selection of a lexical unit is not obvious. The morpheme unit is usually adopted to ensure sufficient coverage, but many morphemes are short, resulting in weak constraints and possible confusion. We propose a discriminative approach for lexicon optimization that directly contributes to ASR error reduction by taking into account not only linguistic constraints but also acoustic–phonetic confusability. It is based on an evaluation function for each word defined by a set of features and their weights, which are optimized by the difference in word error rates (WERs) between ASR hypotheses obtained by the morpheme-based model and those by the word-based model. Then, word or sub-word entries with higher evaluation scores are selected to be added to the lexicon. We investigate several discriminative models to realize this approach. Specifically, we implement it with support vector machines (SVM), a logistic regression (LR) model, and the simple perceptron algorithm. This approach was successfully applied to a Uyghur large-vocabulary continuous speech recognition system, resulting in a significant reduction of WER with a modest lexicon size and a small out-of-vocabulary rate. The use of SVM for a sub-word lexicon results in the best performance, outperforming the word-based model as well as conventional statistical concatenation approaches. The proposed learning approach is realized in an unsupervised manner because it does not require correct transcription for training data.
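To make the idea of a weighted evaluation function concrete, here is a minimal Python sketch of the simple perceptron variant mentioned in the abstract. It assumes each candidate entry is described by a small feature vector and carries a label of +1 when adding it helped and -1 otherwise. The feature layout (a bias term plus one numeric feature), the labels, and all names are hypothetical; the paper's actual features and its training signal derived from WER differences are not reproduced here.

```python
def score(weights, feats):
    """Evaluation function: weighted sum of an entry's features."""
    return sum(w * f for w, f in zip(weights, feats))

def train_perceptron(examples, epochs=10, lr=1.0):
    """examples is a list of (feature_vector, label) with label +1 or -1."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for feats, label in examples:
            # Update only when the current weights misclassify the example.
            if label * score(weights, feats) <= 0:
                weights = [w + lr * label * f for w, f in zip(weights, feats)]
    return weights

def select_entries(candidates, weights, threshold=0.0):
    """Keep candidate entries whose evaluation score exceeds the threshold."""
    return [name for name, feats in candidates.items()
            if score(weights, feats) > threshold]

# Made-up training data: [bias, hypothetical feature] with +1/-1 labels.
examples = [([1.0, 2.0], 1), ([1.0, -1.0], -1),
            ([1.0, 3.0], 1), ([1.0, -2.0], -1)]
weights = train_perceptron(examples)
print(select_entries({"cand_a": [1.0, 2.5], "cand_b": [1.0, -3.0]}, weights))
```

Swapping the perceptron for an SVM or logistic regression changes only how the weights are fitted; the selection step by score stays the same.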
Wuhan University Journal of Natural Sciences | 2012
Mayire Ibrayim; Askar Hamdulla
Based on an analysis of the unique shapes and writing styles of Uyghur characters, we design a framework for a prototype character recognition system and carry out systematic theoretical and experimental research on its modules. In the preprocessing procedure, we use linear and nonlinear normalization based on the dot density method. Both structural and statistical features are extracted because some Uyghur characters are very similar to one another. In clustering analysis, we adopt a dynamic clustering algorithm based on the minimum spanning tree (MST), and use k-nearest-neighbor matching as the classifier. The test results of the prototype system show that the recognition rates for characters of the four different types (independent, suffix, intermediate, and initial type) are 74.67%, 70.42%, 63.33%, and 72.02%, respectively; when five candidates are returned for each character, the recognition rates are 94.34%, 94.19%, 93.15%, and 95.86%, respectively. The ideas and methods used in this paper are also applicable to the recognition of characters of other languages in the Altaic language family.
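The gap between the first-candidate rates and the five-candidate rates is easier to see with a small example. The Python sketch below ranks class labels by the distance of their nearest prototype and counts a test sample as recognized when its true label appears among the first k candidates. The prototypes, test vectors, and function names are invented for illustration and do not come from the paper.

```python
def knn_candidates(query, prototypes, k):
    """Return up to k distinct labels, ordered by nearest-prototype distance."""
    ranked = sorted(
        prototypes,
        key=lambda p: sum((a - b) ** 2 for a, b in zip(query, p[1])),
    )
    labels = []
    for label, _ in ranked:
        if label not in labels:
            labels.append(label)
        if len(labels) == k:
            break
    return labels

def top_k_rate(test_set, prototypes, k):
    """Share of test samples whose true label is among the top-k candidates."""
    hits = sum(1 for true, feats in test_set
               if true in knn_candidates(feats, prototypes, k))
    return hits / len(test_set)

# Hypothetical 2-D prototypes and test samples.
prototypes = [("a", [0.0, 0.0]), ("b", [1.0, 0.0]), ("c", [5.0, 5.0])]
test_set = [("a", [0.1, 0.0]), ("b", [0.4, 0.0]), ("c", [0.9, 0.2])]
for k in (1, 2, 3):
    print(k, top_k_rate(test_set, prototypes, k))
```

A large jump from k = 1 to k = 5, as reported in the abstract, indicates that the correct character is usually near the top of the ranking even when it is not ranked first.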
International Conference on Advanced Language Processing and Web Information Technology | 2008
Turdi Tohti; Winira Musajan; Askar Hamdulla
Spelling errors often occur in web pages and in user query phrases, and the non-Unicode character coding schemes used by some Uyghur, Kazak, and Kyrgyz language websites have a serious impact on the recall and accuracy of a Uyghur, Kazak, and Kyrgyz information retrieval system (UKKIRS). In this paper, we study these practical problems and propose solutions: for the variety of character codings, we propose a method for converting non-Unicode character codes to Unicode; for spelling errors, we propose a reconstruction method and a root-expansion method based on user query phrases. The experimental results indicate that the proposed algorithms solve the above problems well and are well suited to the UKKIRS.
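A table-driven conversion is one straightforward way to map a legacy, non-Unicode coding scheme to Unicode, and the Python sketch below shows that idea only. The legacy code points in the table (private-use values U+E001 to U+E003) are invented placeholders, since the abstract does not name the actual coding schemes or describe the authors' conversion method; the Unicode targets are standard Arabic-script letters used in Uyghur.

```python
# Hypothetical legacy-to-Unicode table. The keys are made-up legacy code
# points; a real table would be built from the specific font or encoding.
LEGACY_TO_UNICODE = {
    0xE001: "\u0626",  # ARABIC LETTER YEH WITH HAMZA ABOVE
    0xE002: "\u06D5",  # ARABIC LETTER AE
    0xE003: "\u06C7",  # ARABIC LETTER U
}

def to_unicode(text, table=LEGACY_TO_UNICODE):
    """Replace every mapped legacy code point; leave other characters as-is."""
    return text.translate(table)

legacy_text = "\ue001\ue002 x"
print([hex(ord(ch)) for ch in to_unicode(legacy_text)])
```

Normalizing both indexed pages and incoming queries through the same table means they are compared in a single encoding, which is what retrieval recall depends on.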
International Symposium on Signal Processing and Information Technology | 2013
Wujiahemaiti Simayi; Mayire Ibrayim; Dilmurat Tursun; Askar Hamdulla
In this paper, the center distance feature (CDF) is presented as an efficient approach for on-line Uyghur handwritten character recognition, building on earlier research in this area. The paper introduces the extraction of the center distance feature in three variants, CDF-2, CDF-4, and CDF-8, which improve the average recognition accuracy to 78.17%, 90.47%, and 94.50%, respectively, for the 32 isolated forms of Uyghur characters. 12,800 samples from 400 different writers were used in the experiments. The system is trained on 70 percent of the samples and tested on the remaining 30 percent.
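The abstract does not define how the center distance feature is computed, so the Python sketch below is only one plausible reading, stated as an assumption: the pen trajectory is split into n equal segments (n = 2, 4, or 8), and each segment contributes the mean distance of its points from the character's centroid, normalized by the largest distance. The function name and the sample trajectory are hypothetical.

```python
import math

def center_distance_feature(points, n_segments):
    """Mean normalized distance to the centroid for each trajectory segment.

    points is a list of (x, y) pen positions in writing order.
    This is an assumed definition, not the one from the paper.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    dists = [math.hypot(x - cx, y - cy) for x, y in points]
    scale = max(dists) or 1.0  # normalize so the feature ignores overall size
    feats = []
    for i in range(n_segments):
        start = i * len(dists) // n_segments
        end = (i + 1) * len(dists) // n_segments
        seg = dists[start:end]
        feats.append(sum(seg) / len(seg) / scale if seg else 0.0)
    return feats

# A made-up square trajectory: every point is equally far from the centroid.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(center_distance_feature(square, 2))
```

Under this reading, a larger n describes the stroke shape in finer detail, which would be consistent with the accuracy rising from CDF-2 to CDF-8.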
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) | 2011
Mijit Ablimit; Askar Hamdulla; Tatsuya Kawahara
For large-vocabulary continuous speech recognition (LVCSR) of highly-inflected languages, selection of an appropriate recognition unit is the first important step. The morpheme-based approach is often adopted because of its high coverage and linguistic properties. But morpheme units are short, often consisting of one or two phonemes, and are thus more likely to be confused in ASR than word units. Generally, word units provide better linguistic constraints, but they increase the vocabulary size explosively, causing OOV (out-of-vocabulary) and data sparseness problems in language modeling. In this research, we investigate approaches to selecting word entries by concatenating morpheme sequences in ways that reduce the word error rate (WER). Specifically, we compare the ASR results of the word-based model and those of the morpheme-based model, and extract typical patterns that would reduce the WER. This method has been successfully applied to a Uyghur LVCSR system, resulting in a significant reduction of WER without a drastic increase of the vocabulary size.
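Since the comparison between the word-based and morpheme-based models rests on WER, a short reminder of how WER is usually computed may help. The Python sketch below uses the standard word-level edit distance (substitutions, deletions, and insertions) divided by the reference length; the example token strings are placeholders, not Uyghur data, and the function name is an assumption.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

# Placeholder tokens: one substitution and one deletion over four words.
print(word_error_rate("a b c d", "a x c"))
```

Morpheme-level output would need to be joined back into words before scoring so that both models are measured on the same unit.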
Computational Intelligence | 2009
Kurban Ubul; Dilmurat Tursun; Askar Hamdulla; Alim Aysa
This paper proposes a method for texture feature extraction that integrates Gabor filters and independent component analysis (ICA) for writer identification based on Uyghur handwriting. The texture image is first filtered by a given bank of Gabor filters, and higher-dimensional feature vectors are then constructed from the filtered texture images. Next, the dimensionality of these vectors is reduced by means of principal component analysis (PCA). Finally, independent components are extracted from the resulting reduced-dimensionality vectors. Experiments were performed using a KNN-5 classifier on Uyghur handwriting samples from 55 different people, and a promising correct identification rate of 92.5% was achieved.
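The first stage of this pipeline, the Gabor filter bank, can be sketched in a few lines of Python. The code below builds the real part of a 2-D Gabor kernel from the standard formula (a Gaussian envelope multiplied by a cosine carrier) and generates a small bank over several wavelengths and orientations. The parameter values, kernel size, and function names are assumptions chosen for illustration; the paper's actual filter settings are not given in the abstract, and the PCA, ICA, and KNN stages are not shown.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0, psi=0.0):
    """Real part of a 2-D Gabor filter as a size x size list (size odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates to the filter orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2)
                                / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def filter_bank(size, wavelengths, n_orientations, sigma):
    """One kernel per (wavelength, orientation) pair."""
    return [gabor_kernel(size, lam, k * math.pi / n_orientations, sigma)
            for lam in wavelengths
            for k in range(n_orientations)]

bank = filter_bank(size=5, wavelengths=[4.0, 8.0], n_orientations=4, sigma=2.0)
print(len(bank))
```

Each kernel would be convolved with the handwriting texture image, and statistics of the responses would form the high-dimensional vector that PCA and ICA then reduce.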