Rachel Edita Roxas
De La Salle University
Publication
Featured research published by Rachel Edita Roxas.
natural language processing and knowledge engineering | 2009
Tin Tin Cheng; Jeffrey Leonard Cua; Mark Davies Tan; Kenneth Gerard Yao; Rachel Edita Roxas
Legal TRUTHS (TuRning Unstructured Texts to Helpful Structure) is a system that extracts relevant information from Philippine Supreme Court decisions, specifically on criminal cases. We describe the processes involved in developing Legal TRUTHS, focusing on issues relating to the domain and the geographical setting of the source documents, and present the performance evaluation results. The pertinent information to be extracted for criminal cases, such as the crime, the date and time of commission, the plaintiff, and the penalty, was determined from a sample set of documents. Sections of these documents were identified for initial segmentation of the data. Automatic filtering of the data was used to draw out relevant information from the texts. On the 25 training documents, which were also used for testing, overall precision was 91.7%, recall 99.5%, and F-measure 95.6%. Testing on another 50 documents showed overall precision of 84.3%, recall of 95.8%, and F-measure of 91.0%.
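As an illustration of how precision, recall, and F-measure relate for one extracted field, here is a minimal sketch in Python. The function name and the sample values are hypothetical and are not taken from the paper. The abstract does not state how the overall figures were aggregated across fields, so this sketch does not attempt to reproduce the reported numbers.

```python
def precision_recall_f1(extracted, gold):
    """Score a set of extracted values against a set of gold-standard values."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Precision: share of extracted items that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of gold items that were found.
    recall = true_positives / len(gold) if gold else 0.0
    # F-measure (F1): harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: four gold items, five extracted, one of them spurious.
gold = {"murder", "1998-03-02", "reclusion perpetua", "People of the Philippines"}
extracted = gold | {"Manila"}
p, r, f = precision_recall_f1(extracted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

One spurious item lowers precision while leaving recall untouched. The figures above show the same pattern, with recall higher than precision.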
natural language processing and knowledge engineering | 2009
Nathalie Rose T. Lim; Patrick Saint-Dizier; Rachel Edita Roxas
Comparative and evaluative question answering (QA) systems provide objective answers to questions that involve comparisons and evaluations based on a quantifiable set of criteria. As evaluations involve inferences and computations, answers are not lifted from source text. This entails correctly interpreting the semantics of comparative expressions, converting them to quantifiable criteria before data can be obtained from source text, processing this information, and formulating natural language answers from the result of the processing. As business intelligence (BI) requires comparisons and interpretations of seemingly unrelated facts, a QA system for this domain would be beneficial. This paper presents a study of some comparative and evaluative questions that are raised in the domain of business intelligence. How these questions are processed is also discussed.
Proceedings of the 2009 Workshop on Knowledge and Reasoning for Answering Questions (KRAQ 2009) | 2009
Nathalie Rose Lim; Patrick Saint-Dizier; Rachel Edita Roxas
Comparative and evaluative question answering (QA) requires a detailed semantic analysis of comparative expressions and complex processing. Semantics of predicates from questions have to be translated to quantifiable criteria before extraction of information can be done. This paper presents some challenges faced in answering comparative and evaluative questions. An application on the domain of business intelligence is discussed.
international conference oriental cocosda held jointly with conference on asian spoken language research and evaluation | 2013
Nathaniel Oco; Rachel Edita Roxas; Joel Ilao
In this study, we present Dice's coefficient on trigram profiles as a metric for language similarity. As a testbed, we focused on eight Philippine languages. No known language similarity value for these languages exists. Documents containing transcribed audio recordings, news articles, and religious and literary texts were taken from an online corpus and used as training data. Character trigram profiles were then generated using an n-gram generator, and language similarity was computed. The results were matched against those reported in the literature and against the language family tree. To evaluate the metric, it was applied to five languages with known similarity values. The results were then compared with an existing lexical similarity metric. The average difference is 27%. Analyses of the results reveal that phonetic spelling plays an important role in language similarity. As future work, the metric can be used on phonetic transcriptions.
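Dice's coefficient on character trigram profiles can be sketched as follows. This is a minimal Python illustration that treats each profile as a plain set of trigrams. The paper's actual profile construction (for example, any frequency cutoff or ranking) is not described in the abstract, so the normalization step and the function names here are assumptions.

```python
def trigram_profile(text):
    """Return the set of character trigrams in a text.

    Assumption: the text is lowercased and runs of whitespace are
    collapsed to one space before trigrams are taken.
    """
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}


def dice_coefficient(profile_a, profile_b):
    """Dice's coefficient: 2 * |A ∩ B| / (|A| + |B|)."""
    if not profile_a and not profile_b:
        return 0.0
    shared = len(profile_a & profile_b)
    return 2 * shared / (len(profile_a) + len(profile_b))


# Toy strings: "abcd" yields {abc, bcd}; "bcde" yields {bcd, cde}.
a = trigram_profile("abcd")
b = trigram_profile("bcde")
print(dice_coefficient(a, b))  # one shared trigram out of 2 + 2 -> 0.5
```

In practice each profile would be built from a large sample of text per language, and the coefficient computed for every language pair.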
Proceedings of the 7th Workshop on Asian Language Resources | 2009
Rachel Edita Roxas; Charibeth Cheng; Nathalie Rose Lim
We present the diverse research activities on Philippine languages from all over the country, with a focus on the Center for Language Technologies of the College of Computer Studies, De La Salle University, Manila, where the majority of the work is conducted. These projects include the formal representation of Philippine languages and the processes involving these languages. Language representation entails the manual and automatic development of language resources such as lexicons and corpora for various human languages, including Philippine languages, across various forms such as text, speech, and video files. Tools and applications on languages that we have worked on include morphological processes, part-of-speech tagging, language grammars, machine translation, sign language processing, and speech systems. Future directions are also presented.
information technology based higher education and training | 2006
Rachel Edita Roxas; Nathalie Rose Lim; Natasja Gail Bautista
A system for the automatic generation of plagiarism detectors that find similar programs in a set of student programs is presented. Existing plagiarism detectors are either applied to a programming language or a pre-defined set of programming languages. General-purpose detectors usually employ string matching, with similarity measures designed for plagiarism detection among documents in general rather than programs in particular, thus losing much of the structure and logic of programs in the process. On the other hand, plagiarism detectors for specific languages only cater to that particular set of languages. This study provides a means for the user to specify the programming language of the student programs to be analyzed. Moreover, an automatic plagiarism detector system must be immune to the transformations that students perform on copied programs. These transformations usually depend on several factors, namely the type of programming problem and, correspondingly, the complexity of the project to be implemented by the students, and the programming language paradigm of the programs. Thus, the similarity measures employed by the system should be determined by these factors and can be specified by the professor, who has the option to specify how the similarities among the student programs will be captured. The system provides an interface for the specification of the particular programming language in which the student programs are implemented, and a knowledge base of similarity measures that the user would like to include in the analysis of the student programs. Hence, the system provides flexibility in the programming language of the student programs to be analyzed and the similarity measures that the professor wishes to employ.
Initial qualitative and quantitative evaluations illustrate a flexible, convenient and cost-effective tool for building plagiarism detectors for effective detection of programs in various imperative and procedural programming languages. The approach also addresses some of the changes that students perform on copied programs which JPlag fails to handle, thus allowing for improved accuracy in terms of the reduction of false positives and increasing the chance of catching plagiarized programs. These changes include modification of control structures, use of temporary variables and subexpressions, inlining and refactoring of methods, and redundancy (variables or methods that were not used). Comprehensive tests on other programming languages under various programming language paradigms, such as object-oriented, logic and functional languages, considering the different changes that students make to copied programs (such as the tests done in JPlag), are also recommended for empirical evaluation.
international conference on recent trends in information technology | 2013
Nathaniel Oco; Joel Ilao; Rachel Edita Roxas
Computational approaches in language identification often result in a high number of false positives and low recall rates, especially if the languages involved come from the same subfamily. In this paper, we aim to determine the cause of this problem by measuring language similarity through trigrams. Religious and literary texts were used as training data. Our experiments involving language identification show that the number of common trigrams for a given language pair is inversely proportional to precision and recall rates, whereas the average word length is directly proportional to the number of true positives. Future directions include improving language modeling and providing an approach to increase precision and recall.
meeting of the association for computational linguistics | 2000
Rachel Edita Roxas; Allan Borra
This paper describes computational linguistic activities on Philippine languages. The Philippines is an archipelago with a vast number of islands and numerous languages. The tasks of understanding, representing and implementing these languages require enormous work. An extensive amount of work has been done on understanding at least some of the major Philippine languages, but little has been done on the computational aspect. The majority of the latter has been for the purpose of machine translation.
international conference on humanoid nanotechnology information technology communication and control environment and management | 2014
Nicco Nocon; Nathaniel Oco; Joel Ilao; Rachel Edita Roxas
Communication between different nations is essential. Foreign languages are difficult to understand. To resolve this problem, the options are limited to learning the language, using a dictionary as a guide, or making use of a translator. This paper discusses the development of ASEANMT-Phil, a phrase-based statistical machine translator, intended as a tool to assist ASEAN countries. The data used for training and testing came from Wikipedia articles, comprising 124,979 and 1,000 sentence pairs, respectively. ASEANMT-Phil was tested under different settings, producing BLEU scores of 32.71 for Filipino-English and 31.15 for English-Filipino. Future directions for the translator include the following: improving the data by changing or adding to the domain or size; implementing an additional approach; and utilizing a larger dictionary in the approach.
language resources and evaluation | 2008
Rachel Edita Roxas; Allan Borra; Charibeth Cheng; Nathalie Rose Lim; Ethel Ong; Michelle Wendy Tan
In this paper, we present the building of various language resources for a multi-engine bi-directional English-Filipino Machine Translation (MT) system. Linguistic information on Philippine languages is available, but the focus so far has been on theoretical linguistics, and little has been done on the computational aspects of these languages. We therefore report attempts at the manual construction of language resources such as the grammar, lexicon, morphological information, and corpora, which were built from almost non-existent digital forms. Due to the inherent difficulties of manual construction, we also discuss our experiments on various technologies for the automatic extraction of these resources to handle the intricacies of the Filipino language, designed with the intention of using them for the MT system. To implement the different MT engines and to ensure the improvement of translation quality, other language tools (such as the morphological analyzer and generator, and the part-of-speech tagger) were developed.