Rayner Alfred
Universiti Malaysia Sabah
Publication
Featured research published by Rayner Alfred.
International Journal of Machine Learning and Computing | 2014
Rayner Alfred; Leow Chin Leong; Chin Kim On; Patricia Anthony
Named-Entity Recognition (NER) is a step in text mining that is very useful for information extraction. An NER tool can assist users in identifying and detecting entities such as persons, locations, or organizations. However, different languages have different morphologies and thus require different NER processes. For instance, an English NER process cannot be applied to Malay articles because of the morphological differences between the two languages. This paper proposes a rule-based Named-Entity Recognition algorithm for Malay articles. The proposed Malay NER is designed around Malay part-of-speech (POS) tagging features and contextual features that have been implemented to handle Malay articles. Based on the POS results, proper names are identified as possible candidates for annotation. In addition, certain symbols and conjunctions are considered when identifying named entities in Malay articles. Several manually constructed dictionaries are used to handle three named-entity types: Person, Location, and Organization. The experimental results show a reasonable F-measure of 89.47%. The proposed Malay NER algorithm can be further improved with more complete dictionaries and refined rules for identifying the correct Malay entities.
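The pipeline described in this abstract (find proper-name candidates, then apply dictionaries and contextual rules) can be sketched roughly as follows. This is a toy illustration only, not the paper's algorithm: capitalisation stands in for the proper-noun tag a POS tagger would supply, and the dictionaries `PERSON_TITLES`, `ORG_KEYWORDS` and `LOCATIONS` are hypothetical examples.

```python
import re

# Hypothetical toy dictionaries; the paper's manually built dictionaries
# are not reproduced here.
PERSON_TITLES = {"Encik", "Puan", "Datuk", "Dr"}
ORG_KEYWORDS = {"Universiti", "Syarikat", "Jabatan"}
LOCATIONS = {"Sabah", "Kota Kinabalu", "Kuala Lumpur"}


def candidate_spans(tokens):
    """Group consecutive capitalised tokens into candidate proper-name spans.

    Assumption: capitalisation approximates the proper-noun POS tag.
    """
    spans, current = [], []
    for tok in tokens:
        if tok[:1].isupper():
            current.append(tok)
        else:
            if current:
                spans.append(current)
            current = []
    if current:
        spans.append(current)
    return spans


def classify(span):
    """Apply simple dictionary and contextual rules to one candidate span."""
    text = " ".join(span)
    if span[0] in PERSON_TITLES and len(span) > 1:
        # A title followed by a name suggests a person; drop the title.
        return ("PERSON", " ".join(span[1:]))
    if span[0] in ORG_KEYWORDS:
        return ("ORGANIZATION", text)
    if text in LOCATIONS:
        return ("LOCATION", text)
    return None


def tag_entities(sentence):
    """Return (type, text) pairs for every span that matches a rule."""
    tokens = re.findall(r"[A-Za-z]+", sentence)
    results = []
    for span in candidate_spans(tokens):
        entity = classify(span)
        if entity is not None:
            results.append(entity)
    return results


entities = tag_entities(
    "Encik Ahmad bekerja di Universiti Malaysia Sabah di Kota Kinabalu."
)
```

Rules are checked in a fixed order, so a span that begins with an organisation keyword is never tested against the location dictionary; a fuller system would need a more careful conflict-resolution policy.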
advanced data mining and applications | 2006
Rayner Alfred; Dimitar Kazakov
A new approach is needed to handle huge datasets stored in multiple tables in a very large database. Data mining and Knowledge Discovery in Databases (KDD) promise to play a crucial role in the way people interact with databases, especially decision support databases where analysis and exploration operations are essential. In this paper, we present related work in relational data mining and define the basic notions of data mining for decision support and the types of data aggregation as a means of categorizing or summarizing data. We then present a novel approach to relational domain learning that supports the development of decision-making models by introducing the automated construction of a hierarchical multi-attribute model for decision making. We describe how a relational dataset can naturally be handled to support the construction of a hierarchical multi-attribute model by using relational aggregation based on pattern distance. We also present a prototype of “Dynamic Aggregation of Relational Attributes” (hereafter DARA) that is capable of supporting the construction of a hierarchical multi-attribute model for decision making. We show experimentally that, in a multi-relational domain, this approach yields a higher percentage of correctly classified instances, and we illustrate a set of rules extracted from the relational domains to support decision making.
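As a rough illustration of data aggregation as a way of summarizing a one-to-many relation, the sketch below collapses the rows of a non-target table into one summary per target record. It shows plain count/mean/max aggregation only, not DARA's distance-based aggregation; the table and column names (`orders`, `customer_id`, `amount`) are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate(rows, key, value):
    """Summarize many non-target rows into one record per target key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        k: {"count": len(v), "mean": mean(v), "max": max(v)}
        for k, v in groups.items()
    }


# Hypothetical non-target table: several orders per customer.
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 2, "amount": 15.0},
]

summary = aggregate(orders, "customer_id", "amount")
```

Each summary can then be joined back to the target table as extra attributes, which is what lets a single-table learner work on relational data.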
ieee international conference on control system, computing and engineering | 2013
Chang Sim Vui; Gan Kim Soon; Chin Kim On; Rayner Alfred; Patricia Anthony
The stock market is a promising financial investment that can generate great wealth. However, its volatile nature makes it a very high-risk investment. Consequently, many researchers have worked on forecasting stock market prices and average movements. Researchers have used various methods from computer science and economics in their quest to gain a piece of this volatile information and profit from stock market investment. This paper investigates various techniques for stock market prediction using artificial neural networks (ANN). Its aim is to review the applications of ANN in stock market prediction in order to determine what can be done in the future.
international conference hybrid intelligent systems | 2011
Joe Henry Obit; Djamila Ouelhadj; Dario Landa-Silva; Teong Khan Vun; Rayner Alfred
This paper proposes tackling the difficult course timetabling problem using a multi-agent approach. The proposed design seeks to deal with the problem using a distributed solution environment in which a mediator agent coordinates various timetabling agents that cooperate to improve a common global solution. Initial timetables provided to the multi-agent system are generated using several hybrid heuristics that combine graph colouring heuristics and local search in different ways. The hybrid heuristics are capable of generating feasible timetables for all instances of the two sets of benchmark problems used here. We discuss how these initialisation hybrid heuristics can be incorporated into the proposed multi-agent approach in order to conduct distributed timetabling. This preliminary work serves as a solid basis towards the design of an effective multi-agent distributed timetabling system.
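A graph colouring heuristic of the kind mentioned here can be sketched as below: courses are vertices, an edge joins two courses that cannot share a timeslot, and each course receives the lowest-numbered free slot. This is a generic largest-degree-first greedy colouring, shown only as an assumption about what one such heuristic might look like; it omits the local search and the multi-agent coordination, and the course names are hypothetical.

```python
def greedy_timetable(conflicts):
    """Assign timeslots by greedy colouring, largest degree first.

    conflicts maps each course to the set of courses it clashes with.
    """
    # Schedule the most constrained courses first; break ties by name.
    order = sorted(conflicts, key=lambda c: (-len(conflicts[c]), c))
    slot = {}
    for course in order:
        taken = {slot[n] for n in conflicts[course] if n in slot}
        s = 0
        while s in taken:
            s += 1
        slot[course] = s
    return slot


# Hypothetical conflict graph (symmetric by construction).
conflicts = {
    "AI": {"DB", "ML"},
    "DB": {"AI", "ML"},
    "ML": {"AI", "DB", "OS"},
    "OS": {"ML"},
}

timetable = greedy_timetable(conflicts)
```

The result is feasible with respect to the clash constraints by construction, but it is not guaranteed to use the minimum number of slots, which is why such heuristics are typically followed by an improvement phase.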
asia international conference on modelling and simulation | 2008
Rayner Alfred
This paper addresses the question of whether the descriptive accuracy of the DARA (Dynamic Aggregation of Relational Attributes) algorithm benefits from the feature construction process. This involves solving the problem of constructing a set of relevant features used to generate patterns representing records in the TF-IDF weighted frequency matrix in order to cluster these records. In this paper, feature construction is applied to enhance the results of the data summarisation approach in learning from data stored in multiple tables with high cardinality of one-to-many relations. It is expected that the predictive accuracy of a classification problem can be improved by improving the descriptive accuracy of the data summarisation approach, provided that the summarised data is fed into the target table as one of the features considered in the classification task.
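To make the TF-IDF weighted frequency matrix concrete, the sketch below treats each target record as a bag of patterns drawn from its related non-target rows and weights each pattern by term frequency times inverse document frequency. The weighting used is the common `tf * log(n / df)` variant, which is an assumption; the abstract does not state the exact formula, and the record and pattern names are hypothetical.

```python
import math
from collections import Counter


def tfidf(bags):
    """Return {record: {pattern: weight}} using tf * log(n / df)."""
    n = len(bags)
    # Document frequency: in how many records does each pattern occur?
    df = Counter()
    for patterns in bags.values():
        df.update(set(patterns))
    weights = {}
    for record, patterns in bags.items():
        tf = Counter(patterns)
        weights[record] = {p: tf[p] * math.log(n / df[p]) for p in tf}
    return weights


# Hypothetical bags of patterns, one bag per target record.
bags = {
    "r1": ["a", "a", "b"],
    "r2": ["b", "c"],
    "r3": ["b"],
}

weights = tfidf(bags)
```

Pattern `b` occurs in every record and therefore gets weight zero, which is the intended effect: patterns that do not distinguish records contribute nothing to the clustering distance.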
Journal of Advances in Computer Networks | 2014
Haviluddin; Rayner Alfred
This paper presents an approach to network traffic characterization using an ARIMA (Autoregressive Integrated Moving Average) technique. The dataset used in this study was obtained from the internet network traffic activities of Mulawarman University over a period of one week. The results are obtained using the Box-Jenkins methodology, under which five ARIMA models are considered: ARIMA (2, 1, 1) (1, 1, 1) ¹², ARIMA (1, 1, 1) (1, 1, 1) ¹², ARIMA (2, 1, 0) (1, 1, 1) ¹², ARIMA (0, 1, 0) (1, 1, 1) ¹², and ARIMA (0, 1, 0) (1, 2, 1) ¹². In this paper, ARIMA (0, 1, 0) (1, 2, 1) ¹² was selected as the best model for the internet network traffic.
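In the usual seasonal notation ARIMA (p, d, q) (P, D, Q) with period s, d and D give the number of regular and seasonal differences applied before the autoregressive and moving-average terms are fitted. The sketch below shows only that differencing step for the selected model's orders (d = 1, D = 2, s = 12); it does not fit the model, and the function names are hypothetical.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: x[t] - x[t - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def apply_differencing(series, d, D, s):
    """Apply D seasonal differences of period s, then d regular differences."""
    out = list(series)
    for _ in range(D):
        out = difference(out, s)
    for _ in range(d):
        out = difference(out, 1)
    return out


# A purely quadratic trend is removed entirely by these differences.
quadratic = [t * t for t in range(40)]
stationary = apply_differencing(quadratic, d=1, D=2, s=12)
```

Each seasonal difference consumes s observations and each regular difference consumes one, so a series of length 40 shrinks to 15 values here. Short series therefore limit how much differencing is practical.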
advances in databases and information systems | 2007
Rayner Alfred; Dimitar Kazakov
Handling numerical data stored in a relational database is different from handling numerical data stored in a single table because of the multiple occurrences of an individual record in the non-target table and the nondeterminate relations between tables. Most traditional data mining methods deal only with a single table and discretize columns that contain continuous numbers into nominal values. In a relational database, multiple records with numerical attributes are stored separately from the target table, and these records are usually associated with a single structured individual stored in the target table. Numbers in multi-relational data mining (MRDM) are often discretized, after considering the schema of the relational database, in order to reduce the continuous domains to more manageable symbolic domains of low cardinality, and the loss of precision is assumed to be acceptable. In this paper, we consider different alternatives for dealing with continuous attributes in MRDM. The discretization procedures considered include algorithms that do not depend on the multi-relational structure of the data as well as those that are sensitive to this structure. In our experiments, we study the effects of taking the one-to-many association issue into consideration when discretizing continuous numbers. We implement a new discretization method, called the entropy-instance-based discretization method, and evaluate it with respect to C4.5 on three varieties of a well-known multi-relational database (Mutagenesis), where numeric attributes play an important role. The empirical results demonstrate that entropy-based discretization can be improved by taking the multiple-instance problem into consideration.
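For reference, ordinary entropy-based discretization chooses the cut point on a numeric attribute that minimizes the weighted class entropy of the two resulting intervals. The sketch below implements only that single-table baseline; it is not the entropy-instance-based method proposed in the paper, which additionally accounts for several rows belonging to one individual. Function names and data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (cut, score): the midpoint with lowest weighted entropy."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_score, best_point = float("inf"), None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_score = score
            best_point = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_point, best_score


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["neg", "neg", "neg", "pos", "pos", "pos"]
cut, score = best_cut(values, labels)
```

Applied recursively to each interval with a stopping criterion, this yields a multi-interval discretization. In the relational setting, the open question the abstract raises is how to count rows when many of them describe the same individual.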
asian conference on intelligent information and database systems | 2013
Rayner Alfred; Adam Mujat; Joe Henry Obit
The Malay language is an Austronesian language spoken in most countries in the Southeast Asia region, including Malaysia, Indonesia, Singapore, Brunei and Thailand. Traditional linguistics is well developed for Malay, but very limited resources and tools are available or accessible for computational linguistic analysis of the Malay language. Assigning a part of speech (POS) to running words in a Malay sentence is one of the pipeline processes in Natural Language Processing (NLP) tasks, and it has not been well investigated. This paper outlines an approach to POS tagging for Malay text articles. We apply a simple Rule-based Part of Speech (RPOS) tagger to perform the tagging operation on Malay text articles. POS tagging can be described as the task of automatically annotating each word in a text document with its syntactic category. A rule-based POS tagger generally involves a POS tag dictionary and a set of rules used to identify the part of speech of each word. In this paper, we propose a framework that applies Malay affixing rules to identify the Malay POS tag and uses the relation between words to select the best POS tag for words that have two or more valid POS tags. The results show that the accuracy of the rule-based POS tagger is higher than that of a statistical POS tagger. This indicates that the proposed RPOS tagger is able to predict the POS of unknown words with promising accuracy.
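The combination of a tag dictionary and affixing rules can be illustrated with the small sketch below. The lexicon and the handful of affix patterns (`ke-...-an`, `me-`, `ber-`, `-kan`, `pe-`, `-an`) are simplified, hypothetical examples chosen for illustration; they are not the rule set from the paper, and the context-based disambiguation step is omitted.

```python
# Hypothetical miniature POS tag dictionary.
LEXICON = {"saya": "PRON", "makan": "VERB", "di": "PREP", "rumah": "NOUN"}


def tag_word(word):
    """Tag one word: dictionary lookup first, then crude affix rules."""
    w = word.lower()
    if w in LEXICON:
        return LEXICON[w]
    # Circumfix ke-...-an typically forms nouns (e.g. "kebersihan").
    if w.startswith("ke") and w.endswith("an"):
        return "NOUN"
    # Prefixes me-/ber- and suffix -kan typically form verbs.
    if w.startswith(("me", "ber")) or w.endswith("kan"):
        return "VERB"
    # Prefix pe- and suffix -an typically form nouns.
    if w.startswith("pe") or w.endswith("an"):
        return "NOUN"
    return "UNK"


def tag_sentence(sentence):
    """Return (word, tag) pairs for a whitespace-tokenised sentence."""
    return [(w, tag_word(w)) for w in sentence.split()]


tagged = tag_sentence("Saya makan makanan di rumah")
```

String-prefix tests like these over-generate (any root that happens to start with "me" is tagged as a verb), so a practical tagger would check that stripping the affix leaves a known root before trusting the rule.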
data mining and optimization | 2011
Mohd Shamrie Sainin; Rayner Alfred
Feature selection for data mining optimization is in high demand, especially for data with high-dimensional feature vectors. Feature selection is a method used to select the best feature (or combination of features) for the data in order to achieve a similar or better classification rate. Currently, there are three types of feature selection methods: filter, wrapper and embedded. This paper describes a genetic-based wrapper approach that optimizes the feature selection process embedded in a classification technique called the supervised Nearest Neighbour Distance Matrix (NNDM). This method is implemented and tested on several datasets obtained from the UCI Machine Learning Repository and other datasets. The results demonstrate a significant impact on predictive accuracy when feature selection is combined with the supervised NNDM in classifying new instances. It can therefore be used in other applications that require feature dimension reduction, such as image and bioinformatics classification.
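A genetic wrapper scores each candidate feature subset by running a classifier on it and evolves the subsets toward higher accuracy. The sketch below is a minimal version under stated assumptions: leave-one-out 1-nearest-neighbour accuracy stands in for the supervised NNDM classifier, which is not reproduced here, and the data, parameters and function names are all hypothetical.

```python
import random


def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out 1-NN accuracy using only features where mask is 1."""
    idx = [j for j, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = None, None
        for k, xk in enumerate(X):
            if k == i:
                continue
            d = sum((xi[j] - xk[j]) ** 2 for j in idx)
            if best_d is None or d < best_d:
                best_d, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def ga_select(X, y, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Evolve binary feature masks; return (best_mask, best_accuracy)."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(mask):
        return loo_1nn_accuracy(X, y, mask)

    # Seed the population with the all-features mask plus random masks.
    pop = [tuple([1] * n)]
    pop += [tuple(rng.randint(0, 1) for _ in range(n))
            for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]  # elitism: the best half survives
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability p_mut.
            child = tuple(bit ^ (rng.random() < p_mut) for bit in child)
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best)


# Hypothetical data: feature 0 separates the classes, features 1-2 are noise.
data_rng = random.Random(1)
X = [[label * 5.0 + data_rng.random(),
      data_rng.uniform(0, 10),
      data_rng.uniform(0, 10)]
     for label in (0, 1) for _ in range(10)]
y = [label for label in (0, 1) for _ in range(10)]

mask, accuracy = ga_select(X, y)
```

Because the all-features mask is in the initial population and the best half always survives, the returned subset is never worse than using every feature under this fitness measure. The cost is one full classifier evaluation per candidate per generation, which is the usual trade-off of wrapper methods against filters.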
computational intelligence | 2010
Rayner Alfred
The importance of input representation has already been recognized in machine learning. This article discusses the application of genetic-based feature construction methods to generate input data for the data summarization method called Dynamic Aggregation of Relational Attributes (DARA). Here, feature construction methods are applied to improve the descriptive accuracy of the DARA algorithm. The DARA algorithm is designed to summarize data stored in the nontarget tables by clustering them into groups, where multiple records stored in nontarget tables correspond to a single record stored in a target table. This article addresses the question of whether the descriptive accuracy of the DARA algorithm benefits from the feature construction process. This involves solving the problem of constructing a relevant set of features for the DARA algorithm by using a genetic-based algorithm. This work also evaluates several scoring measures used as fitness functions to find the best set of constructed features.