Publication


Featured research published by Yating Yang.


Secure Web Services | 2010

An ontology-based semantic retrieval model for Uyghur search engine

Bo Ma; Yating Yang; Xi Zhou; Junlin Zhou

In recent years, semantic search has been used successfully for information retrieval. However, it is still at an early stage, and there is no semantic search engine for minorities in China. In this paper, we propose a semantic retrieval model comprising resource collection, semantic annotation, query analysis, and results ranking. For semantic annotation, we use a domain ontology and two bilingual dictionaries to extract keywords for annotation. For query analysis, we present a method that combines lexical and semantic relationships to analyse users' queries. For results ranking, we propose a modulative method that ranks results based on how predictable a result might be for users, blending semantic and information-theoretic techniques. Preliminary experimental results show that the proposed model can boost the precision and recall of webpage search.


ACM Transactions on Asian and Low-Resource Language Information Processing | 2016

A Semisupervised Tag-Transition-Based Markovian Model for Uyghur Morphology Analysis

Eziz Tursun; Debasis Ganguly; Turghun Osman; Yating Yang; Ghalip Abdukerim; Jun-Lin Zhou; Qun Liu

Morphological analysis, which includes part-of-speech (POS) tagging, stemming, and morpheme segmentation, is one of the key components in natural language processing (NLP), particularly for agglutinative languages. In this article, we investigate the morphological analysis of the Uyghur language, which is the native language of the people in the Xinjiang Uyghur autonomous region of western China. Morphological analysis of Uyghur is challenging primarily because of (1) ambiguities arising from the likely association of multiple POS tags with a word stem or multiple functional tags with a word suffix, (2) ambiguous morpheme boundaries, and (3) the complex morphophonology of the language. Further, the unavailability of a manually annotated training set in the Uyghur language for the purpose of word segmentation makes Uyghur morphological analysis more difficult. In our proposed work, we address these challenges with a semisupervised approach that learns a Markov model with the help of a manually constructed dictionary of “suffix to tag” mappings in order to predict the most likely tag transitions in the Uyghur morpheme sequence. Due to the linguistic characteristics of Uyghur, we incorporate a prior belief in our model that favors word segmentations with fewer morpheme units. Empirical evaluation of our proposed model shows an accuracy of about 82%. We further improve the effectiveness of the tag transition model with an active learning paradigm. In particular, we manually investigated a subset of words for which the model prediction ambiguity was within the top 20%. Manually incorporating rules to handle these erroneous cases resulted in an overall accuracy of 93.81%.


NLPCC | 2014

Detection of Loan Words in Uyghur Texts

Chenggang Mi; Yating Yang; Lei Wang; Xiao Li; Kamali Dalielihan

For low-resource languages like Uyghur, data sparseness is always a serious problem in related information processing, especially in tasks based on parallel texts. To enrich bilingual resources, we detect Chinese and Russian loan words in Uyghur texts according to the phonetic similarity between a loan word and its corresponding donor-language word. In this paper, we propose a novel approach based on a perceptron model to discover loan words in Uyghur texts, which treats the detection of loan words in Uyghur as a classification procedure. The experimental results show that our method is capable of detecting Chinese and Russian loan words in Uyghur texts effectively.
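The classification view described above can be illustrated with a minimal sketch. It is not the authors' implementation: the single similarity feature (normalized edit distance over romanized strings), the training pairs, and all function names are assumptions made for illustration, and the strings are synthetic placeholders rather than real Uyghur, Chinese, or Russian words.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Assumed stand-in for phonetic similarity: 1 - normalized edit distance."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def features(word, donor):
    # Bias term plus one similarity feature (an illustrative choice).
    return [1.0, similarity(word, donor)]


def score(weights, word, donor):
    return sum(w * x for w, x in zip(weights, features(word, donor)))


def train_perceptron(samples, epochs=20):
    """Classic perceptron updates; labels are +1 (loan word) or -1 (not)."""
    weights = [0.0, 0.0]
    for _ in range(epochs):
        for word, donor, label in samples:
            predicted = 1 if score(weights, word, donor) > 0 else -1
            if predicted != label:
                x = features(word, donor)
                weights = [w + label * xi for w, xi in zip(weights, x)]
    return weights


def is_loan_word(weights, word, donor):
    return score(weights, word, donor) > 0


# Synthetic placeholder pairs: (candidate word, donor-language word, label).
training_pairs = [
    ("tavola", "tavola", 1),
    ("mirado", "merado", 1),
    ("abcde", "vwxyz", -1),
    ("abcdef", "abxxxx", -1),
]
weights = train_perceptron(training_pairs)
```

With weights trained this way, `is_loan_word(weights, word, donor)` flags a pair whenever its similarity score clears the learned decision boundary. A realistic system would use richer phonetic features than a single string distance.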


Journal of Computers | 2014

A Phrase Table Filtering Model Based on Binary Classification for Uyghur-Chinese Machine Translation

Chenggang Mi; Yating Yang; Xi Zhou; Lei Wang; Xiao Li; Eziz Tursun

In statistical machine translation, a large number of unreasonable phrase pairs in a phrase table can affect decoding efficiency and overall translation performance, especially in Uyghur-Chinese machine translation. In this paper, we present a novel phrase table filtering model based on binary classification, which considers the differences between Uyghur and Chinese and draws on binary classification techniques from machine learning. Our model considers four features: 1) the difference in length between the source and target phrase; 2) the proportion of translated words in a phrase pair; 3) the proportion of symbol words; and 4) the average number of co-occurrence words in the training corpus. We use this model to generate a filtered phrase table. Experimental results show that the new filtering model can improve the performance and efficiency of our current Uyghur-Chinese machine translation system.
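As a rough illustration of the first three features listed above, the sketch below computes them for a single phrase pair. The exact feature definitions are not given in the abstract, so the ones used here (token-count difference, share of source tokens with a lexicon translation on the target side, share of punctuation-only tokens) are assumptions, as are the function name and the toy lexicon. The fourth feature is omitted because it depends on corpus statistics the abstract does not specify.

```python
import string


def phrase_pair_features(source, target, lexicon):
    """Compute three illustrative filtering features for one phrase pair.

    `lexicon` is a hypothetical dict mapping a source token to a set of
    acceptable target tokens.
    """
    src_tokens = source.split()
    tgt_tokens = target.split()

    # 1) Difference in length, measured in whitespace-separated tokens.
    length_diff = abs(len(src_tokens) - len(tgt_tokens))

    # 2) Proportion of source tokens that have a translation in the target.
    translated = sum(
        1 for s in src_tokens
        if any(t in lexicon.get(s, ()) for t in tgt_tokens)
    )
    translated_ratio = translated / len(src_tokens) if src_tokens else 0.0

    # 3) Proportion of tokens made up only of punctuation symbols.
    all_tokens = src_tokens + tgt_tokens
    symbols = sum(
        1 for tok in all_tokens
        if all(ch in string.punctuation for ch in tok)
    )
    symbol_ratio = symbols / len(all_tokens) if all_tokens else 0.0

    return {
        "length_diff": length_diff,
        "translated_ratio": translated_ratio,
        "symbol_ratio": symbol_ratio,
    }


# Placeholder tokens, not real Uyghur or Chinese words.
toy_lexicon = {"s1": {"t1"}, "s2": {"t2"}}
example = phrase_pair_features("s1 s2 ,", "t1 ,", toy_lexicon)
```

A feature vector of this kind could then be passed to any binary classifier that decides whether to keep or discard the phrase pair.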


China Workshop on Machine Translation | 2014

Character Tagging-Based Word Segmentation for Uyghur

Yating Yang; Chenggang Mi; Bo Ma; Rui Dong; Lei Wang; Xiao Li

To obtain information from Uyghur words effectively, we present a novel method based on character tagging for Uyghur word segmentation. In this paper, we suggest five labels for the characters in a Uyghur word: Su, Bu, Iu, Eu and Au. With these labels, we treat Uyghur word segmentation as a sequence labeling procedure that uses Conditional Random Fields (CRFs) as the basic labeling model. Experimental results show that our method collects more features from Uyghur words and therefore significantly outperforms several traditionally used word segmentation models.


International Conference on Signal Processing | 2010

Speech endpoint detection algorithm for Uyghur based on acoustic frequency feature

Yating Yang; Bo Ma; Osman Turghun; Xiao Li

Accurate endpoint detection is important for improving speech recognition capability. This paper proposes an effective endpoint detection algorithm for Uyghur based on the acoustic frequency feature. The spectrum of each speech frame is divided into several sub-bands, and the maximum average spectral density of these sub-bands is used as the detection criterion to distinguish speech from noise. In addition, a dynamically updated threshold and a smoothing window are used to improve the performance of the algorithm. The algorithm is characterized by higher accuracy and flexibility, faster processing speed, and less computation. Experimental results show that the proposed algorithm achieves better performance than energy-based and zero-crossing-rate-based algorithms.
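The pipeline described above (per-frame spectrum, sub-band averaging, dynamic threshold, smoothing window) can be sketched as follows. This is a simplified illustration, not the paper's algorithm: the frame length, number of sub-bands, threshold ratio, exponential noise update, and majority-vote smoothing are all assumed choices, and a plain DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum of one frame via a plain DFT (first half of the bins)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2):
        acc = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def max_subband_density(frame, n_bands=4):
    """Largest average spectral density over equal-width sub-bands."""
    spectrum = power_spectrum(frame)
    width = len(spectrum) // n_bands
    return max(sum(spectrum[b * width:(b + 1) * width]) / width
               for b in range(n_bands))


def detect_speech(signal, frame_len=64, n_bands=4, ratio=3.0,
                  alpha=0.9, smooth=3):
    """Return one speech/non-speech flag per frame."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    densities = [max_subband_density(f, n_bands) for f in frames]

    noise = densities[0]  # assumption: the first frame contains noise only
    raw = []
    for value in densities:
        is_speech = value > ratio * max(noise, 1e-12)
        if not is_speech:
            # Dynamic threshold: track the noise floor on non-speech frames.
            noise = alpha * noise + (1 - alpha) * value
        raw.append(is_speech)

    # Smoothing window: majority vote over neighbouring frames.
    half = smooth // 2
    flags = []
    for i in range(len(raw)):
        window = raw[max(0, i - half):i + half + 1]
        flags.append(sum(window) * 2 > len(window))
    return flags


# Synthetic test signal: a weak background tone with a strong tone in frames 3-6.
signal = []
for t in range(640):
    sample = 0.01 * math.sin(2 * math.pi * 5 * t / 64)
    if 3 * 64 <= t < 7 * 64:
        sample += math.sin(2 * math.pi * 10 * t / 64)
    signal.append(sample)

flags = detect_speech(signal)
```

The first and last `True` entries in `flags` mark the estimated start and end frames of the speech segment. A practical implementation would use an FFT and windowed, overlapping frames.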


Recent Advances in Natural Language Processing | 2017

Log-linear Models for Uyghur Segmentation in Spoken Language Translation

Chenggang Mi; Yating Yang; Rui Dong; Xi Zhou; Lei Wang; Xiao Li; Tonghai Jiang

To alleviate data sparsity in spoken Uyghur machine translation, we propose a log-linear morphological segmentation approach. Instead of learning a model only from a monolingual annotated corpus, this approach optimizes Uyghur segmentation for spoken translation based on both bilingual and monolingual corpora. Our approach relies on several features, such as a traditional conditional random field (CRF) feature, a bilingual word alignment feature, and a monolingual suffix-word co-occurrence feature. Experimental results show that our proposed segmentation model for Uyghur spoken translation achieves an improvement of 1.6 BLEU points over the state-of-the-art baseline.
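A log-linear model of this kind scores each candidate segmentation as a weighted sum of feature values and picks the highest-scoring one. The sketch below shows only that general mechanism. The feature names mirror the three features mentioned above, but the weights, feature values, and candidate strings are hypothetical placeholders, not values from the paper.

```python
import math


def log_linear_score(features, weights):
    """Weighted sum of feature values: sum_i lambda_i * h_i."""
    return sum(weights[name] * value for name, value in features.items())


def best_segmentation(candidates, weights):
    """Pick the candidate with the highest log-linear score."""
    return max(candidates,
               key=lambda c: log_linear_score(c["features"], weights))


def posterior(candidates, weights):
    """Normalize the scores into a probability distribution (softmax)."""
    scores = [log_linear_score(c["features"], weights) for c in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights and log-domain feature values for two candidates.
weights = {"crf": 1.0, "alignment": 0.5, "cooccurrence": 0.5}
candidates = [
    {"segmentation": "stem+suf1+suf2",
     "features": {"crf": -1.2, "alignment": -0.4, "cooccurrence": -0.3}},
    {"segmentation": "stem+suf12",
     "features": {"crf": -1.0, "alignment": -1.5, "cooccurrence": -0.9}},
]

best = best_segmentation(candidates, weights)
probabilities = posterior(candidates, weights)
```

Here the second candidate has the better CRF score alone, but the bilingual and co-occurrence features tip the decision toward the first. In practice the weights would be tuned on held-out data rather than set by hand.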


National CCF Conference on Natural Language Processing and Chinese Computing | 2017

Learning Bilingual Lexicon for Low-Resource Language Pairs

ShaoLin Zhu; Xiao Li; Yating Yang; Lei Wang; Chenggang Mi

Learning a bilingual lexicon from monolingual data is a novel idea in natural language processing that can benefit many low-resource language pairs. In this paper, we present an approach for obtaining a bilingual lexicon from monolingual data. Our method requires only a small seed bilingual lexicon, and we use Canonical Correlation Analysis to construct a shared latent space that explains how the two monolingual embeddings are linked. Experimental results show that a bilingual lexicon of considerable precision and size can be learned from Chinese-Uyghur and Chinese-Kazakh monolingual data.


Mathematical Problems in Engineering | 2017

Filtering Reordering Table Using a Novel Recursive Autoencoder Model for Statistical Machine Translation

Jinying Kong; Yating Yang; Lei Wang; Xi Zhou; Tonghai Jiang; Xiao Li

In phrase-based machine translation (PBMT) systems, the reordering table and phrase table are very large and redundant. Unlike most previous work, which aims to filter the phrase table, this paper proposes a novel deep neural network model to prune the reordering table. We cast the task as a deep learning problem in which we jointly train two models: a generative model that implements rule embedding and a discriminative model that classifies rules. The main contribution of this paper is that we optimize the reordering model in PBMT by filtering the reordering table with a recursive autoencoder model. To evaluate the proposed model, we applied it to a public corpus to measure its reordering ability. The experimental results show that our approach obtains a large improvement in BLEU score with a smaller reordering table on two language pairs: English-Chinese (


Journal of Information Processing Systems | 2017

Using Semantic Knowledge in the Uyghur-Chinese Person Name Transliteration

Alim Murat; Osman Turghun; Yating Yang; Xi Zhou; Lei Wang; Xiao Li

In this paper, we propose a transliteration approach based on semantic information (i.e., language origin and gender) that is automatically learnt from the person name, with the aim of transliterating Uyghur person names into Chinese. The proposed approach integrates semantic scores (i.e., performance on language origin and gender detection) with a general transliteration model to produce a semantic knowledge-based model that generates the best candidate transliterations. In the experiment, we use datasets that contain person names of different language origins: Uyghur and Chinese. The results show that the proposed semantic transliteration model substantially outperforms the general transliteration model and greatly improves mean reciprocal rank (MRR) on two datasets, and that it aids in developing more efficient transliteration for named entities.
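For readers unfamiliar with the evaluation metric mentioned above, mean reciprocal rank averages, over all test names, the reciprocal of the rank at which the correct transliteration first appears in the system's candidate list (counting zero when it does not appear). A minimal sketch with placeholder data follows. The function name and the sample lists are illustrative only.

```python
def mean_reciprocal_rank(ranked_candidates, references):
    """MRR over parallel lists of ranked candidates and reference answers."""
    if not references:
        return 0.0
    total = 0.0
    for candidates, reference in zip(ranked_candidates, references):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate == reference:
                total += 1.0 / rank
                break  # only the first correct candidate counts
    return total / len(references)


# Placeholder candidate lists: the reference is ranked 1st, 2nd, and missing.
ranked = [["A", "B", "C"], ["B", "A"], ["C", "D"]]
gold = ["A", "A", "X"]
mrr = mean_reciprocal_rank(ranked, gold)
```

In this toy case the reciprocal ranks are 1, 1/2, and 0, so the metric rewards systems that place the correct transliteration near the top of the list.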

Collaboration

Top Co-Authors

Xi Zhou, Chinese Academy of Sciences
Lei Wang, Chinese Academy of Sciences
Xiao Li, Chinese Academy of Sciences
Chenggang Mi, Chinese Academy of Sciences
Bo Ma, Chinese Academy of Sciences
Tonghai Jiang, Chinese Academy of Sciences
Rui Dong, Chinese Academy of Sciences
Junlin Zhou, Chinese Academy of Sciences
Jinying Kong, Chinese Academy of Sciences
Osman Turghun, Chinese Academy of Sciences