Bali Ranaivo-Malançon
Universiti Malaysia Sarawak
Publication
Featured research published by Bali Ranaivo-Malançon.
International Conference on Asian Language Processing | 2009
Ee-Lee Ng; Alvin W. Yeo; Bali Ranaivo-Malançon
The main focus of this study is to identify closely related languages among the indigenous languages of Sarawak and major languages such as Bahasa Melayu and English. The indigenous languages involved in this study are Iban (standard), Bidayuh (Bau-Jagoi), Kelabit (Bario), Melanau (Matu-Daro), Sa’ban (Long Banga) and Penan (East Baram). The relationship between languages is established via the proportion of cognates in the Swadesh lists of each language pair. The orthographic approach, which primarily examines the spelling of the vocabulary words, is used. The outcome of this study reveals that some indigenous languages are more closely related to Bahasa Melayu than others. The findings serve as an initial step towards answering greater challenges in computational linguistics, such as the use of closely related languages as pivots in problems involving under-resourced languages.
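The cognate-proportion idea in the abstract above can be illustrated with a minimal sketch. The abstract does not state the similarity measure or cut-off used, so `difflib.SequenceMatcher`, the `threshold` of 0.6, and the function names `is_cognate` and `cognate_proportion` are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def is_cognate(word_a: str, word_b: str, threshold: float = 0.6) -> bool:
    """Treat two spellings as likely cognates when their similarity ratio
    reaches the threshold (an assumed cut-off, not taken from the study)."""
    ratio = SequenceMatcher(None, word_a.lower(), word_b.lower()).ratio()
    return ratio >= threshold


def cognate_proportion(list_a, list_b, threshold: float = 0.6) -> float:
    """Proportion of aligned Swadesh-list entries judged to be cognates.

    list_a and list_b hold the words for the same concepts, in the same
    order; entries missing in either language (empty strings) are skipped.
    """
    pairs = [(a, b) for a, b in zip(list_a, list_b) if a and b]
    if not pairs:
        return 0.0
    hits = sum(is_cognate(a, b, threshold) for a, b in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    # Toy strings only, not real Swadesh-list data.
    print(cognate_proportion(["mata", "api"], ["mata", "xyz"]))
```

A purely orthographic measure like this one depends on the spelling conventions of each language, so the threshold would need tuning against pairs judged by a linguist.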
POLIBITS | 2011
Lian Tze Lim; Bali Ranaivo-Malançon; Enya Kong Tang
Manually constructing multilingual translation lexicons can be very costly, both in terms of time and human effort. Although there have been many efforts at (semi-)automatically merging bilingual machine readable dictionaries to produce a multilingual lexicon, most of these approaches place quite specific requirements on the input bilingual resources. Unfortunately, not all bilingual dictionaries fulfil these criteria, especially in the case of under-resourced language pairs. We describe a low cost method for constructing a multilingual lexicon using only simple lists of bilingual translation mappings. The method is especially suitable for under-resourced language pairs, as such bilingual resources are often freely available and easily obtainable from the Internet, or digitised from simple, conventional paper-based dictionaries. The precision of random samples of the resultant multilingual lexicon is around 0.70-0.82, while coverage for each language, precision and recall can be controlled by varying threshold values. Given the very simple input resources, our results are encouraging, especially in incorporating under-resourced languages into multilingual lexical resources.
Expert Systems With Applications | 2017
Chong Chai Chua; Tek Yong Lim; Lay-Ki Soon; Enya Kong Tang; Bali Ranaivo-Malançon
The main tasks in Example-based Machine Translation (EBMT) comprise source text decomposition, followed by matching and selection of translation examples, and finally adaptation and recombination of the target translation. As natural language is inherently ambiguous, preserving the meaning of the source text throughout these processes is complex and challenging. A structural semantics is introduced as an attempt at a meaning-based approach to improve the EBMT system. The structural semantics is used to support deeper semantic similarity measurement and to impose structural constraints on the selection of translation examples. A semantic compositional structure is derived from the structural semantics of the selected translation examples. This semantic compositional structure serves as a representation structure that preserves the consistency and integrity of the input sentence’s meaning structure throughout the recombination process. In this paper, an English-to-Malay EBMT system is presented to demonstrate the practical application of this structural semantics. Evaluation of the translation test results shows that the new translation framework based on the structural semantics outperforms the previous EBMT framework.
International Conference on Computational Linguistics | 2013
Suhaila Saee; Lay-Ki Soon; Tek Yong Lim; Bali Ranaivo-Malançon; Enya Kong Tang
This paper describes the semi-automatic acquisition of morphological rules for a morphological analyser for an under-resourced language, namely Iban. We modify ideas from previous approaches to automatic morphological rule acquisition, whose input requirements have become constraints on developing an analyser for an under-resourced language. This work introduces three main steps for acquiring the rules from the under-resourced language: morphological data acquisition, morphological information validation and morphological rule extraction. The experiment shows that this approach gives successful results, with a precision of 0.76 and a recall of 0.99. Our findings also suggest that the availability of linguistic references and the selection of assorted techniques for morphological analysis could inform the design of the workflow. We believe this workflow will help other researchers build morphological analysers with validated morphological rules for under-resourced languages.
Knowledge Technology Week | 2011
Jason Yong-Jin Tee; Lay-Ki Soon; Bali Ranaivo-Malançon
This paper presents an approach to find associations between Web documents using collocated word pairs. Given two Web documents which are connected via a hyperlink, we attempt to find the contextual association of these two Web pages by using collocations of word pairs from a statistical point of view. Our preliminary experimental results show that our approach is able to extract fairly coherent word pairs to derive associations between hyperlinked Web documents.
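As a rough illustration of scoring collocated word pairs statistically, the sketch below uses pointwise mutual information (PMI) over a small token window. The abstract above does not name the statistic used, so PMI, the `window` and `min_count` parameters, and the function names are assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def collocated_pairs(tokens, window=2):
    """Yield ordered word pairs that occur within `window` tokens of each other."""
    for i, first in enumerate(tokens):
        for second in tokens[i + 1 : i + 1 + window]:
            yield (first, second)


def pmi_scores(tokens, window=2, min_count=2):
    """Score each sufficiently frequent pair by pointwise mutual information."""
    word_freq = Counter(tokens)
    pair_freq = Counter(collocated_pairs(tokens, window))
    n_words = len(tokens)
    n_pairs = sum(pair_freq.values())
    scores = {}
    for (first, second), count in pair_freq.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_pairs
        p_first = word_freq[first] / n_words
        p_second = word_freq[second] / n_words
        scores[(first, second)] = math.log2(p_pair / (p_first * p_second))
    return scores


def shared_collocations(tokens_a, tokens_b, window=2, min_count=2):
    """Pairs that are collocations in both of two hyperlinked documents."""
    pairs_a = pmi_scores(tokens_a, window, min_count)
    pairs_b = pmi_scores(tokens_b, window, min_count)
    return sorted(set(pairs_a) & set(pairs_b))


if __name__ == "__main__":
    page = "new york is big new york is old".split()
    print(pmi_scores(page, window=1))
```

Pairs shared by the source page and the linked page are one simple candidate signal for a contextual association between the two.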
International Conference on Electronics and Information Engineering | 2010
Hossein Shahsavand Baghdadi; Bali Ranaivo-Malançon
Capturing specific data from an HTML file and encapsulating it so that it is usable by other tools is a significant challenge in web mining. This paper introduces HT2X[ML], a tool that extracts customized information from HTML files in both user-customized and automatic ways and converts it into well-formed XML and plain text formats. The result is suitable for use by other tools for any purpose.
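The general idea of pulling selected content out of HTML and re-emitting it as well-formed XML can be sketched with the Python standard library. This is not the HT2X[ML] implementation, whose internals the abstract does not describe; the class `TagTextExtractor`, the function `html_to_xml`, and the output element names `document` and `item` are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagTextExtractor(HTMLParser):
    """Collect the text content of user-selected tags (nesting is ignored)."""

    def __init__(self, wanted_tags):
        super().__init__()
        self.wanted_tags = set(wanted_tags)
        self.current = None   # wanted tag currently open, if any
        self.buffer = []      # text chunks seen inside that tag
        self.records = []     # (tag, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if self.current is None and tag in self.wanted_tags:
            self.current = tag
            self.buffer = []

    def handle_data(self, data):
        if self.current is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.current:
            text = " ".join("".join(self.buffer).split())  # collapse whitespace
            self.records.append((tag, text))
            self.current = None


def html_to_xml(html, wanted_tags=("h1", "p")):
    """Return a well-formed XML string holding the text of the wanted tags."""
    parser = TagTextExtractor(wanted_tags)
    parser.feed(html)
    parser.close()
    root = ET.Element("document")
    for tag, text in parser.records:
        item = ET.SubElement(root, "item", {"source": tag})
        item.text = text  # ElementTree escapes &, < and > on output
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(html_to_xml("<body><h1>Demo</h1><p>A &amp; B</p><div>skip</div></body>"))
```

Building the output through `xml.etree.ElementTree` rather than string concatenation keeps the result well-formed even when the extracted text contains reserved characters.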
Computer and Information Technology | 2017
Abrar Noor Akramin Kamarudin; Bali Ranaivo-Malançon; Nadianatra Musa
The Internet has recently become accessible to children in rural areas. The purpose of this paper is to understand, based on an analysis of survey data, what affects their ability to find true information. Data were collected from secondary school students (N=237) in Serian, a rural area of Sarawak. A self-administered questionnaire was conducted in three public secondary schools to obtain a socio-demographic profile, language usage, medium of assistance required, Internet access platforms and Internet content exposure. A logistic regression model tested the parents' educational level, Internet availability at home, English usage when online and the ease of using mobile devices for learning as explanations of the respondents' ability to find true information on the Internet. The target group is able to obtain true information from the Internet if they are able to use English as their main Internet language, alongside other factors. Based on the findings, an initial system framework is designed to personalize the children's Internet access.
Archive | 2015
Daniel Yong Wen Tan; Bali Ranaivo-Malançon; Narayanan Kulathuramaiyer
Searching for information inside a repository of digitised historical documents is a very common task. A timeline interface that represents the historical content and can perform the same search function will reveal better results to researchers. This paper presents the integration of SIMILE Timeline within a wiki, named Wiki SaGa, containing a digitised version of the Sarawak Gazette. In contrast to a traditional list of documents, the proposed approach allows events to be displayed and relevant information to be searched.
International Conference on Asian Language Processing | 2014
Bali Ranaivo-Malançon; Suhaila Saee; Jennifer Fiona Wilfred Busu
The goal of the project presented in this paper is to explore the linguistic knowledge hidden in printed dictionaries of minority languages. The first step is to convert the printed dictionary into a machine-readable dictionary. The second step is to use existing language processing tools to discover the hidden knowledge. To illustrate the proposed idea, a version of an English-Penan dictionary is used as a case study. It appears that even with a small amount of data, some interesting information can be discovered, such as a first list of functional words, some collocations, and some insight into the morphological structure of the Penan language.
International Conference on Asian Language Processing | 2014
Suhaila Saee; Lay-Ki Soon; Tek Yong Lim; Bali Ranaivo-Malançon; Jovianna Juk; Enya Kong Tang
Computational morphological resources are a crucial component in providing the morphological information needed to create a morphological analyser. Acquiring the morphological resources manually requires two main components, preprocessing and morphology induction, which have led to two issues from the perspective of under-resourced languages: i) the process is time consuming and ii) managing the resources is ambiguous. We propose a tool for the automatic acquisition of morphological resources, which extends the manual process, to overcome these issues. The proposed tool has three main modules: i) tokenization, which tokenises a raw text and generates a wordlist; ii) conversion, which converts a softcopy of morphological resources into the required formats; and iii) integration of segmentation tools, which integrates two established segmentation tools, namely Linguistica and Morfessor, to obtain morphological information from the generated wordlist. Two testing methods were conducted: component testing and integration testing. The results show that the proposed tool is effective and allows linguists to obtain their wordlists and segmented data easily. We believe the proposed tool will help other researchers construct computational morphological resources for under-resourced languages in an automated way.
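The tokenization module described above (raw text in, wordlist out) might look roughly like the following sketch. The tokenisation rule, the "count word" output layout, and the function names `build_wordlist` and `format_wordlist` are assumptions for illustration; the abstract does not specify the tool's actual rules or file formats.

```python
import re
from collections import Counter

# A token is a run of letters, optionally joined by internal hyphens or
# apostrophes (an assumed rule; real orthographies may need a different one).
TOKEN_PATTERN = re.compile(r"[^\W\d_]+(?:[-'’][^\W\d_]+)*")


def build_wordlist(raw_text: str) -> Counter:
    """Tokenise raw text and count how often each lower-cased word form occurs."""
    return Counter(TOKEN_PATTERN.findall(raw_text.lower()))


def format_wordlist(counts: Counter) -> str:
    """Render the wordlist as one 'count word' entry per line, most frequent
    first and alphabetical within ties (a hypothetical layout)."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return "\n".join(f"{count} {word}" for word, count in ordered)


if __name__ == "__main__":
    sample = "The cat sat. The well-fed cat's mat!"
    print(format_wordlist(build_wordlist(sample)))
```

A wordlist of this kind is the sort of input that unsupervised segmentation tools typically consume, though the exact format each tool expects should be checked against its documentation.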