Publication


Featured research published by Virach Sornlertlamvanich.


Proceedings of the Fifth International Workshop on Information Retrieval with Asian Languages | 2000

Character cluster based Thai information retrieval

Thanaruk Theeramunkong; Virach Sornlertlamvanich; Thanasan Tanhermhong; Wirat Chinnan

Some languages, including Thai, Japanese and Chinese, do not have explicit word boundaries. This causes word boundary ambiguity, which decreases the accuracy of information retrieval. This paper proposes a new technique called character clustering to reduce word boundary ambiguity in Thai documents and hence improve search efficiency. To investigate the efficiency, a set of experiments using Thai newspapers is conducted with both non-indexing and indexing search approaches. The experimental results show that our method outperforms the traditional methods in both non-indexing and indexing approaches on all measures.
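
The clustering idea can be illustrated with a minimal Python sketch: characters that cannot be separated in Thai script are grouped into one unit before indexing or matching. The rule set here is an assumption made for illustration (a combining mark attaches to the preceding character, and a leading vowel attaches to the following consonant); the abstract does not list the paper's actual rules, and the name `char_clusters` is hypothetical.

```python
import unicodedata

# Thai vowels written before the consonant they belong to
# (U+0E40..U+0E44). Assumed rule set, not taken from the paper.
LEADING_VOWELS = set("\u0e40\u0e41\u0e42\u0e43\u0e44")


def char_clusters(text):
    """Group characters into inseparable units using simplified rules."""
    clusters = []
    for ch in text:
        # Upper/lower vowels and tone marks are nonspacing marks (Mn).
        is_mark = unicodedata.category(ch) == "Mn"
        # True when the previous cluster is a lone leading vowel.
        after_leading_vowel = bool(clusters) and clusters[-1] in LEADING_VOWELS
        if clusters and (is_mark or after_leading_vowel):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters
```

For a three-character word made of a consonant, an upper vowel and a final consonant, this returns two units, so a search pattern can no longer match across the consonant and its vowel mark.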


Meeting of the Association for Computational Linguistics | 2006

Infrastructure for Standardization of Asian Language Resources

Takenobu Tokunaga; Virach Sornlertlamvanich; Thatsanee Charoenporn; Nicoletta Calzolari; Monica Monachini; Claudia Soria; Chu-Ren Huang; Yingju Xia; Hao Yu; Laurent Prévot; Kiyoaki Shirai

Although Asia is an area of great linguistic and cultural diversity, Asian language resources have received much less attention than their western counterparts. Creating a common standard for Asian language resources that is compatible with an international standard has at least three strong advantages: to increase the competitive edge of Asian countries, to bring Asian countries closer to their western counterparts, and to bring more cohesion among Asian countries. To achieve this goal, we have launched a two-year project to create a common standard for Asian language resources. The project comprises four research items: (1) building a description framework of lexical entries, (2) building sample lexicons, (3) building an upper-layer ontology and (4) evaluating the proposed framework through an application. This paper outlines the project in terms of its aim and approach.


International Conference on Computational Linguistics | 2002

Improving translation quality of rule-based machine translation

Paisarn Charoenpornsawat; Virach Sornlertlamvanich; Thatsanee Charoenporn

This paper proposes machine learning techniques that help disambiguate word meaning. These methods focus on the relationship between a word and its surroundings, described as context information in the paper. Context information, such as part-of-speech tags, semantic concepts and case relations, is produced by rule-based translation. To automatically extract the context information, we apply three machine learning algorithms: C4.5, C4.5rule and RIPPER. In this paper, we test on ParSit, an interlingual-based machine translation system for English to Thai. To evaluate our approach, the verb "to be" is selected because it has increased in frequency and is quite difficult to translate into Thai using only linguistic rules. The results show that the accuracies of C4.5, C4.5rule and RIPPER are 77.7%, 73.1% and 76.1% respectively, whereas ParSit alone gives an accuracy of only 48%.


Proceedings of the 7th Workshop on Asian Language Resources | 2009

Thai WordNet Construction

Sareewan Thoongsup; Thatsanee Charoenporn; Kergrit Robkop; Tan Sinthurahat; Chumpol Mokarat; Virach Sornlertlamvanich; Hitoshi Isahara

This paper describes the semi-automatic construction of Thai WordNet and the method applied for Asian WordNet. Based on the Princeton WordNet (PWN), we develop a method for generating a WordNet by using an existing bilingual dictionary. We automatically align each PWN synset to a bilingual dictionary through the English equivalent and its part-of-speech (POS). Manual translation is also employed after the alignment. We also develop a web-based collaborative workbench, called KUI (Knowledge Unifying Initiator), for revising the result of synset assignment, and provide a framework to create Asian WordNet via linkage through PWN synsets.
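
The alignment step can be pictured with a small Python sketch. It assumes a toy data layout that does not come from the paper: synsets are stored as `{id: (pos, [english_lemmas])}`, and the bilingual dictionary as `{(english_word, pos): [translations]}`. The function name `align_synsets` and the identifiers in the usage example are hypothetical.

```python
def align_synsets(synsets, bilingual):
    """Collect translation candidates for each synset by matching
    every English lemma together with its part-of-speech."""
    aligned = {}
    for synset_id, (pos, lemmas) in synsets.items():
        candidates = []
        for lemma in lemmas:
            # A dictionary entry only matches if both word and POS agree.
            for target in bilingual.get((lemma, pos), []):
                if target not in candidates:
                    candidates.append(target)
        aligned[synset_id] = candidates
    return aligned


# Illustrative toy data only.
toy_synsets = {"syn-1": ("n", ["dog", "domestic dog"]), "syn-2": ("v", ["dog"])}
toy_dictionary = {("dog", "n"): ["สุนัข", "หมา"]}
print(align_synsets(toy_synsets, toy_dictionary))
```

Synsets left with an empty candidate list are the ones that would still need manual translation or review.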


North American Chapter of the Association for Computational Linguistics | 2003

A context-sensitive homograph disambiguation in Thai text-to-speech synthesis

Virongrong Tesprasit; Paisarn Charoenpornsawat; Virach Sornlertlamvanich

Homograph ambiguity is an original issue in Text-to-Speech (TTS). To disambiguate homographs, several efficient approaches have been proposed, such as part-of-speech (POS) n-gram, Bayesian classifier, decision tree, and Bayesian-hybrid approaches. These methods need the words and/or POS tags surrounding the homographs in question for disambiguation. Some languages, such as Thai, Chinese, and Japanese, have no word-boundary delimiter. Therefore, before solving homograph ambiguity, we need to identify word boundaries. In this paper, we propose a unique framework that solves both the word segmentation and homograph ambiguity problems together. Our model employs both local and long-distance contexts, which are automatically extracted by a machine learning technique called Winnow.
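
Winnow is a mistake-driven linear classifier over binary features that updates its weights multiplicatively. The Python sketch below shows the basic update rule only. The choice of threshold (the number of features), the promotion factor of 2, and the integer feature indices are assumptions; the abstract does not say which Winnow variant or which context features the paper uses.

```python
class Winnow:
    """Basic Winnow with multiplicative promotion and demotion."""

    def __init__(self, n_features, alpha=2.0):
        self.alpha = alpha                    # assumed update factor
        self.threshold = float(n_features)    # assumed decision threshold
        self.weights = [1.0] * n_features

    def predict(self, active):
        # `active` is the set of indices of features that are present.
        return sum(self.weights[i] for i in active) >= self.threshold

    def update(self, active, label):
        # Weights change only when the prediction is wrong.
        if self.predict(active) == label:
            return
        factor = self.alpha if label else 1.0 / self.alpha
        for i in active:
            self.weights[i] *= factor


# Toy run: the label is True exactly when feature 0 is active.
examples = [({0}, True), ({1, 2}, False), ({0, 3}, True), ({1, 2, 3}, False)]
model = Winnow(n_features=4)
for _ in range(5):
    for active, label in examples:
        model.update(active, label)
```

In a disambiguation setting, each index would stand for a context feature such as a nearby word, and one such classifier could be trained per candidate reading.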


Archive | 2011

Knowledge, Information and Creativity Support Systems

Thanaruk Theeramunkong; Susumu Kunifuji; Virach Sornlertlamvanich; Cholwich Nattee

This volume consists of a number of selected papers that were presented at the 9th International Conference on Knowledge, Information and Creativity Support Systems (KICSS 2014) in Limassol, Cyprus, after they were substantially revised and extended. The 26 regular papers and 19 short papers included in these proceedings cover all aspects of knowledge management, knowledge engineering, intelligent information systems, and creativity in an information technology context, including computational creativity and its cognitive and collaborative aspects.


Meeting of the Association for Computational Linguistics | 2000

The state of the art in Thai language processing

Virach Sornlertlamvanich; Tanapong Potipiti; Chai Wutiwiwatchai; Pradit Mittrapiyanuruk

This paper reviews the current state of technology and research progress in Thai language processing. It summarizes the characteristics of the Thai language and the approaches used to overcome the difficulties in each processing task.


International Conference on Computational Linguistics | 1996

The automatic extraction of open compounds from text corpora

Virach Sornlertlamvanich; Hozumi Tanaka

This paper describes a new method for extracting open compounds (uninterrupted sequences of words) from text corpora of languages, such as Thai, Japanese and Korean, that have no explicit word segmentation. Without applying word segmentation techniques to the input plain text, we generate n-gram data from it. We then count the occurrences of each string and sort the strings in alphabetical order. It is significant that the frequency of occurrence of strings decreases when the window size of observation is extended. From the statistical point of view, a word is a string with a fixed pattern that is used repeatedly, meaning that it should occur with a higher frequency than a string that is not a word. We observe the variation in frequency of the sorted n-gram data and extract the strings that experience a significant change in frequency of occurrence when their length is extended. We apply this occurrence test to both the right- and left-hand sides of all strings to ensure the accurate detection of both boundaries of the string. The method returned satisfying results regardless of the size of the input file.
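
The frequency test described in this abstract can be sketched roughly in Python. This is a simplified illustration, not the authors' procedure: it counts character n-grams in a dictionary instead of sorting them, and the thresholds `min_count` and `drop`, like the function names, are assumptions.

```python
from collections import Counter


def ngram_counts(text, max_n):
    """Count every character n-gram of length 1..max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def extract_candidates(text, max_n=6, min_count=3, drop=0.5):
    """Return strings whose frequency falls sharply when they are
    extended by one character on the right AND on the left."""
    counts = ngram_counts(text, max_n + 1)
    alphabet = set(text)
    found = []
    for s, c in counts.items():
        if len(s) < 2 or len(s) > max_n or c < min_count:
            continue
        # Most frequent one-character extension on each side.
        right = max(counts[s + ch] for ch in alphabet)
        left = max(counts[ch + s] for ch in alphabet)
        if right <= drop * c and left <= drop * c:
            found.append(s)
    return sorted(found)
```

A string that is only part of a longer unit keeps nearly the same frequency when extended toward the rest of that unit, so it fails the test on that side; only strings with a sharp drop on both sides are kept.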


Computer and Information Technology | 2009

Applying Collective Intelligence for Search Improvement on Thai Herbal Information

Verayuth Lertnattee; Sinthop Chomya; Thanaruk Theeramunkong; Virach Sornlertlamvanich

Knowledge about herbal medicine can be contributed by experts from several cultures. With conventional techniques, it is hard to find a way for the experts to build a self-sustainable community for exchanging their information. In this paper, the Knowledge Unifying Initiator for Herbal Information (KUIHerb) is used as a platform for building a web community that collects intercultural herbal knowledge using the concept of collective intelligence. With this system, herb identification, herbal vocabulary and medicinal usages can be collected. KUIHerb provides herbal vocabulary which is dynamically and confidentially applied to improve searching on the Thai herbal search engine. Three strategies are utilized: (1) a set of technical terms in Thai is provided, which can be added to the dictionary and is used by Thai word segmentation to improve the indexing process; (2) a set of synonyms of these technical terms in both Thai and English is built to help users cope with the many keywords for the same term; and (3) a set of keywords from herbal usages can be combined with the name keyword. The results show that information collected from KUIHerb is useful for searching.


IEICE Transactions on Information and Systems | 2006

Construction of Thai Lexicon from Existing Dictionaries and Texts on the Web

Thatsanee Charoenporn; Canasai Kruengkrai; Thanaruk Theeramunkong; Virach Sornlertlamvanich

A lexicon is an important linguistic resource needed for both shallow and deep language processing. Currently, there are few machine-readable Thai dictionaries available, and most of them do not satisfy the computational requirements. This paper presents the design of a Thai lexicon named the TCLs Computational Lexicon (TCLLEX) and proposes a method to construct a large-scale Thai lexicon by re-using two existing dictionaries and a large number of texts on the Internet. In addition to the morphological, syntactic, semantic case role and logical information in the existing dictionaries, a kind of semantic constraint called selectional preference is automatically acquired by analyzing Thai texts on the web and then added to the lexicon. In the acquisition process of the selectional preferences, the Bayesian Information Criterion (BIC) is applied as the measure in a tree cut model. Experiments are conducted to verify the feasibility and effectiveness of the obtained selectional preferences.
