Publications

Featured research published by Enya Kong Tang.


2009 Oriental COCOSDA International Conference on Speech Database and Assessments | 2009

MASS: A Malay language LVCSR corpus resource

Tien-Ping Tan; Xiong Xiao; Enya Kong Tang; Eng Siong Chng; Haizhou Li

This paper presents the development of the speech, text and pronunciation dictionary resources required to build a large-vocabulary speech recognizer for the Malay language. The project is a collaboration among three universities: USM and MMU in Malaysia and NTU in Singapore. The Malay speech corpus consists of read speech (speaker-independent/-dependent and accent-independent/-dependent) and broadcast news. To date, 90 speakers have been recorded, amounting to nearly 70 hours of read speech, and 10 hours of broadcast news from local TV stations in Malaysia have been transcribed. The text corpus consists of 700 MB of data extracted from Malaysia's local news web pages from 1998 to 2008, and a rule-based grapheme-to-phoneme (G2P) tool was developed to generate the pronunciation dictionary.


POLIBITS | 2011

Low Cost Construction of a Multilingual Lexicon from Bilingual Lists

Lian Tze Lim; Bali Ranaivo-Malançon; Enya Kong Tang

Manually constructing multilingual translation lexicons can be very costly in both time and human effort. Although there have been many efforts at (semi-)automatically merging bilingual machine-readable dictionaries to produce a multilingual lexicon, most of these approaches place quite specific requirements on the input bilingual resources. Unfortunately, not all bilingual dictionaries fulfil these criteria, especially for under-resourced language pairs. We describe a low-cost method for constructing a multilingual lexicon using only simple lists of bilingual translation mappings. The method is especially suitable for under-resourced language pairs, as such bilingual resources are often freely available and easily obtainable from the Internet, or can be digitised from simple, conventional paper-based dictionaries. The precision of random samples of the resulting multilingual lexicon is around 0.70–0.82, while coverage for each language, precision and recall can be controlled by varying threshold values. Given the very simple input resources, our results are encouraging, especially for incorporating under-resourced languages into multilingual lexical resources.


Expert Systems With Applications | 2017

Meaning preservation in Example-based Machine Translation with structural semantics

Chong Chai Chua; Tek Yong Lim; Lay-Ki Soon; Enya Kong Tang; Bali Ranaivo-Malançon

The main tasks in Example-based Machine Translation (EBMT) comprise source text decomposition, followed by translation example matching and selection, and finally adaptation and recombination of the target translation. As natural language is inherently ambiguous, preserving the source text's meaning throughout these processes is complex and challenging. A structural semantics is introduced as a step towards a meaning-based approach to improving the EBMT system. The structural semantics is used to support deeper semantic similarity measurement and to impose structural constraints on translation example selection. A semantic compositional structure is derived from the structural semantics of the selected translation examples. This semantic compositional structure serves as a representation that preserves the consistency and integrity of the input sentence's meaning structure throughout the recombination process. In this paper, an English-to-Malay EBMT system is presented to demonstrate the practical application of this structural semantics. Evaluation of the translation test results shows that the new translation framework based on the structural semantics outperforms the previous EBMT framework.


Conference on Information and Knowledge Management | 2014

Learning to Match Heterogeneous Structures using Partially Labeled Data

Saravadee Sae Tan; Tek Yong Lim; Lay-Ki Soon; Enya Kong Tang

This paper addresses the problem of matching highly heterogeneous structures. The problem is modeled as a classification task in which training examples are used to learn the matching between structures. In our approach, training is performed using partially labeled data. We propose a Greedy Mapping approach to generate training examples from partially labeled data. Different types of structures may have different types of attributes that can be exploited to improve matching. We utilize three types of attributes in the matching problem: text content, structure name and path correspondence. Experiments are performed on two types of structures: semantic domain and semantic role. We evaluate the effectiveness of Greedy Mapping as well as the performance of the different attribute types. Finally, the results are presented and discussed.


International Conference on Computational Linguistics | 2013

Semi-automatic acquisition of two-level morphological rules for Iban language

Suhaila Saee; Lay-Ki Soon; Tek Yong Lim; Bali Ranaivo-Malançon; Enya Kong Tang

In this paper, we describe the semi-automatic acquisition of morphological rules for a morphological analyser for an under-resourced language, Iban. We adapt ideas from previous approaches to automatic morphological rule acquisition, whose input requirements become constraints when developing an analyser for an under-resourced language. This work introduces three main steps for acquiring rules for an under-resourced language: morphological data acquisition, morphological information validation and morphological rule extraction. The experiment shows that this approach achieves a precision of 0.76 and a recall of 0.99. Our findings also suggest that the availability of linguistic references and the selection of assorted techniques for morphological analysis can guide the design of the workflow. We believe this workflow will assist other researchers in building morphological analysers with validated morphological rules for under-resourced languages.


International Conference on Asian Language Processing | 2014

Automatic acquisition of morphological resources for Melanau language

Suhaila Saee; Lay-Ki Soon; Tek Yong Lim; Bali Ranaivo-Malançon; Jovianna Juk; Enya Kong Tang

Computational morphological resources are a crucial component for providing the morphological information needed to build a morphological analyser. Acquiring these resources manually requires two main components, preprocessing and morphology induction, which raise two issues from the perspective of under-resourced languages: (i) the process is time-consuming, and (ii) managing the resources involves ambiguity. We propose an automatic tool for acquiring morphological resources, an extension of the manual approach, to overcome these issues. The proposed tool has three main modules: (i) tokenization, which tokenises raw text and generates a wordlist; (ii) conversion, which converts a softcopy of morphological resources into the required formats; and (iii) integration of segmentation tools, which integrates two established segmentation tools, Linguistica and Morfessor, to obtain morphological information from the generated wordlist. The tool was evaluated with two testing methods: component testing and integration testing. The results demonstrate the tool's effectiveness, allowing linguists to obtain their wordlists and segmented data easily. We believe the proposed tool will assist other researchers in constructing computational morphological resources for under-resourced languages in an automated way.


International Conference on Asian Language Processing | 2012

From Raw Text to Morphological Rules for Iban Morphological Analyser

Suhaila Saee; Lay-Ki Soon; Tek Yong Lim; Bali Ranaivo-Malançon; Enya Kong Tang

To extend complete workflows for the automatic acquisition of morphological rules for morphological analysers, we propose a semi-automatic workflow for an under-resourced language, Iban. The workflow focuses on determining the rules to be used for building an Iban morphological analyser without prior knowledge of language-specific morphological rules. This work introduces three main steps for acquiring rules for an under-resourced language: morphological rule extraction, validation of the extracted rules and evaluation of the generated rules. Using the proposed workflow, 25 rules were generated from 744 candidate rules. This work achieves 76% precision and 99% recall. We believe the workflow will assist other researchers in building morphological analysers with validated morphological rules for under-resourced languages.


2011 International Conference on Semantic Technology and Information Retrieval | 2011

Modeling semantic correspondence in heterogeneous structured document collection

Saravadee Sae Tan; Enya Kong Tang; Bali Ranaivo-Malançon; Gian Chand Sodhy

On the web, most structured document collections consist of documents from different sources, marked up with different types of structures. This diversity of structures has led to the emergence of heterogeneous structured documents. The heterogeneity of structured documents is one of the reasons for query-document mismatch in structured document retrieval. In structured document retrieval, a user is assumed to have intimate knowledge of the document structures and to be able to specify contextual constraints in their queries. However, it is impossible for the user to know all structures in heterogeneous structured document collections. In this paper, we propose including similar correspondence relations in the representation model for structured document retrieval. The similar correspondences make the relations between similar contents explicit in order to improve structured document retrieval effectiveness. We introduce a generic and flexible structured document model to represent heterogeneous structured documents as well as the similar correspondences in the document collections. We also illustrate how the proposed model can be used in structured document retrieval.


European Conference on Research and Advanced Technology for Digital Libraries | 2000

Effects of Cognitive and Problem Solving Style on Internet Search Tool

Tek Yong Lim; Enya Kong Tang

This paper presents a research proposal for a user-oriented evaluation method to compare the usability of Internet search tools. Cognitive style and problem-solving style are identified as individual-difference factors. The Internet search tools considered are meta-search engines, portals and individual search engines. The usability of each search tool, based on relevance and satisfaction, is another factor in this study. The ultimate aim of the research is to contribute to knowledge concerning individual differences and information retrieval technology. In particular, we hope to gain a better understanding of which presentation structures and user interface attributes work best and why.


Expert Systems With Applications | 2016

Learning to extract domain-specific relations from complex sentences

Saravadee Sae Tan; Tek Yong Lim; Lay-Ki Soon; Enya Kong Tang

Collaboration

Top Co-Authors

Tek Yong Lim

Universiti Sains Malaysia

Tien-Ping Tan

Universiti Sains Malaysia