Amal Shehan Perera
University of Moratuwa
Publications
Featured research published by Amal Shehan Perera.
ieee international advance computing conference | 2009
D. A. Meedeniya; Amal Shehan Perera
The wide availability of electronic data has led to broad interest in text analysis, information retrieval, and text categorization methods. To provide a better service, there is a need for document analysis and categorization systems for non-English languages, comparable to those currently available for English text. This study focuses on categorizing Indic-language documents. The main techniques examined include data pre-processing and document clustering. The approach uses a transformation based on term frequency and inverse document frequency, which enhances clustering performance, and builds on Latent Semantic Analysis, k-means clustering, and Gaussian Mixture Model clustering. A text corpus categorized by human readers is used to test the validity of the suggested approach. The technique introduced in this work enables the processing of text documents written in Sinhala, empowering citizens and organizations to do their daily work efficiently.
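The term-frequency / inverse-document-frequency transformation mentioned in the abstract can be sketched as follows; this is a minimal generic TF-IDF weighting, not the paper's Sinhala-specific pipeline, and the variable names are illustrative:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute TF-IDF weight vectors for a list of tokenized documents.

    A minimal sketch of the TF-IDF transformation applied before
    clustering; language-specific pre-processing is not shown.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors

docs = [["data", "mining"], ["data", "clustering"], ["text", "clustering"]]
vecs = tfidf_vectors(docs)
# Terms occurring in fewer documents receive higher weights.
```

The resulting sparse weight vectors are what clustering methods such as k-means would then operate on.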
acm multimedia | 2002
William Perrizo; William Jockheck; Amal Shehan Perera; Dongmei Ren; Weihua Wu; Yi Zhang
The DataSURG group at NDSU has a long-standing interest in data mining remotely sensed imagery (RSI) for agricultural, forestry, and other prediction and analysis applications. A spatial data structure, the Peano count tree, was developed that provides an efficient, lossless, data-mining-ready representation of the many types of data involved in these applications. This data structure has made possible the mining of multiple very large data sets, including time sequences of RSI and multimedia land data. The Peano count tree (P-tree) technology provides an efficient way to store and mine images of any format, together with pertinent land data of still other formats. With the invention of gene chips and gene expression microarrays (MA data) for use in medicine, plant science, and many other application areas, new multimedia data mining challenges have appeared. MA data presents a one-time, gene-expression-level map of thousands of genes subjected to hundreds of conditions. An important multimedia plant science application of the near future is to integrate macro-scale analysis of RSI with micro-scale analysis of MA data, and to do the latter across multiple organisms. Most MA research has been done for a particular organism, and the results have been archived as text abstracts (e.g., Medline abstracts). It will therefore be necessary to combine text mining with most multimedia RSI and MA mining; this is truly a multimedia data mining setting. The way text is almost always mined today is to extract pertinent features into tables and then mine the tables (i.e., structured records are extracted from the unstructured text first). P-trees are a convenient technology for mining all media involved in this research. In fact, in almost all multimedia data mining applications, feature extraction converts the pertinent data to relational or tabular form, and the tuples or rows are then data mined.
If multiple media are to be mined by first converting them to a common format, the P-tree is a good candidate common data structure for that purpose: it is designed for just such a data mining setting.
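The core idea of a Peano count tree can be illustrated with a simplified sketch: recursively split a bit matrix into quadrants, storing the count of 1-bits per quadrant and stopping at pure (all-0 or all-1) quadrants. This is an assumption-laden toy version, not the DataSURG implementation:

```python
def ptree(bits):
    """Build a simplified Peano count tree for a square bit matrix
    whose side is a power of two.

    Each node stores the count of 1-bits in its quadrant; pure
    quadrants (all 0s or all 1s) become leaves, which is where the
    lossless compression comes from.
    """
    n = len(bits)
    total = sum(sum(row) for row in bits)
    if total == 0 or total == n * n:        # pure quadrant: leaf node
        return {"count": total}
    h = n // 2
    quads = [
        [row[c:c + h] for row in bits[r:r + h]]
        for r in (0, h) for c in (0, h)
    ]
    return {"count": total, "children": [ptree(q) for q in quads]}

bitmap = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
tree = ptree(bitmap)
# Root count is 7; the upper-left quadrant is pure 1s and stays a leaf.
```

Counting queries (e.g., "how many pixels match a bit pattern") can then be answered from the node counts without touching the raw data.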
acm symposium on applied computing | 2005
Imad Rahal; Dongmei Ren; Amal Shehan Perera; Hassan Najadat; William Perrizo; Riad M. Rahhal; Willy Valdivia
Data arising from genomic and proteomic experiments is amassing at high speed, resulting in huge amounts of raw data; consequently, the need to analyze such biological data --- the understanding of which still lags far behind --- has become prominent in the post-genomic era we are currently witnessing. In this paper we analyze annotated genome data by applying a central data-mining technique, association rule mining, with the aim of discovering rules capable of yielding deeper insights into this type of data. We propose a new technique that uses domain knowledge, expressed as queries, to efficiently mine only the subset of the associations that are of interest to the researcher, in an incremental and interactive mode.
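Query-constrained rule mining can be sketched as follows: only rules whose itemsets contain an item of interest are evaluated against support and confidence thresholds. This is a rough illustration under assumed toy annotations (the item names are hypothetical), not the paper's P-tree based incremental algorithm:

```python
from itertools import combinations
from collections import Counter

def constrained_rules(transactions, interest, min_support, min_confidence):
    """Mine two-item association rules restricted to itemsets
    containing `interest`, a sketch of query-constrained mining."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in (1, 2):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    rules = []
    for itemset, c in counts.items():
        if len(itemset) != 2 or interest not in itemset:
            continue                      # constraint: skip uninteresting sets
        if c / n < min_support:
            continue
        a, b = itemset
        for x, y in ((a, b), (b, a)):
            confidence = c / counts[(x,)]
            if confidence >= min_confidence:
                rules.append((x, y, round(c / n, 3), round(confidence, 3)))
    return rules

transactions = [
    {"kinase", "membrane"},
    {"kinase", "membrane"},
    {"kinase", "nucleus"},
    {"membrane", "nucleus"},
]
rules = constrained_rules(transactions, "kinase", 0.4, 0.6)
```

Restricting the search to the queried item avoids enumerating the full rule space, which is the efficiency gain the constraint buys.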
international conference on advances in ict for emerging regions | 2014
U. L. D. N. Gunasinghe; W. A. M. De Silva; N. H. N. D. de Silva; Amal Shehan Perera; W. A. D. Sashika; W. D. T. P. Premasiri
In Natural Language Processing and text mining, measuring sentence similarity is an important task. There are three major approaches: measuring similarity based on the semantic structure of sentences, measuring syntactic similarity, and hybrid measures. Syntactic similarity methods take into account the co-occurring words in strings, while semantic similarity measures consider the similarity between words based on a semantic net. Most of the time, the easiest way to calculate sentence similarity is to use syntactic measures, which do not consider the grammatical structure of sentences; however, sentences can express the same meaning with different words. By considering both semantic and syntactic similarity, we can improve the quality of the similarity measure beyond what either provides alone. This paper presents a sentence similarity algorithm based on both syntactic and semantic similarity measures. The algorithm measures sentence similarity using a vector space model generated for the word nodes in the sentences. In this implementation we consider two types of relationships: the relationship between verbs in the sentence pairs and the relationship between nouns in the sentence pairs. One major advantage of this method is that it can be used for variable-length sentences. In the experiments and results section we report the results obtained with this algorithm for a selected set of sentence pairs and compare them with actual human ratings of sentence-pair similarity.
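The hybrid idea can be illustrated with a toy blend of a syntactic (word-overlap) score and a crude semantic score in which words are matched through a synonym table. The synonym table and weighting are assumptions for illustration; the paper's vector-space algorithm over noun and verb relationships is considerably richer:

```python
def combined_similarity(s1, s2, synonyms, alpha=0.5):
    """Blend a syntactic Jaccard overlap with a semantic overlap
    computed after mapping words to canonical concepts."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    syntactic = len(w1 & w2) / len(w1 | w2)      # raw word overlap

    def canon(w):
        return synonyms.get(w, w)                # word -> concept id

    c1, c2 = {canon(w) for w in w1}, {canon(w) for w in w2}
    semantic = len(c1 & c2) / len(c1 | c2)       # overlap after mapping
    return alpha * syntactic + (1 - alpha) * semantic

sim = combined_similarity(
    "the car is fast",
    "the automobile is quick",
    {"automobile": "car", "quick": "fast"},
)
# Syntactic overlap is 1/3, semantic overlap is 1.0, blended score 2/3:
# the semantic component rescues sentences that use different words.
```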
Sigkdd Explorations | 2006
William Perrizo; Amal Shehan Perera
In this paper, we describe a reliable, high-performance classification system that combines Nearest Neighbor Vote based classification and Local Decision Boundary based classification with an evolutionary algorithm for parameter optimization and a vertical data structure (Predicate Tree, or P-tree) for processing efficiency.
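The nearest-neighbor-vote component can be sketched as plain kNN voting over Euclidean distance; the P-tree based evaluation and the GA-tuned parameters the paper describes are omitted here:

```python
import math
from collections import Counter

def nn_vote_classify(x, training, k=3):
    """Classify point x by a majority vote among its k nearest
    training points (Euclidean distance)."""
    neighbors = sorted(training, key=lambda tp: math.dist(x, tp[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

training = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
            ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
```

In the paper's setting the evolutionary algorithm would tune parameters such as k and per-attribute weights rather than leaving them fixed.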
international conference on innovative computing technology | 2017
Vindula Jayawardana; Dimuthu Lakmal; Nisansa de Silva; Amal Shehan Perera; Keet Sugathadasa; Buddhi Ayesha
Selecting a representative vector for a set of vectors is a very common requirement in many algorithmic tasks. Traditionally, the mean or median vector is selected. Ontology classes are sets of homogeneous instance objects that can be converted to a vector space by word vector embeddings. This study proposes a methodology for deriving a representative vector for ontology classes whose instances have been converted to the vector space. We start by deriving five candidate vectors, which are then used to train a machine learning model that calculates a representative vector for the class. We show that our methodology outperforms the traditional mean and median vector representations.
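The two traditional baselines the paper benchmarks against are straightforward to state; a minimal sketch of the component-wise mean and median representatives (the paper's five learned candidates are not reproduced here):

```python
import statistics

def mean_vector(vectors):
    """Component-wise mean: the traditional representative vector."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def median_vector(vectors):
    """Component-wise median: the other traditional baseline,
    less sensitive to outlying instances."""
    return [statistics.median(col) for col in zip(*vectors)]

instances = [[1.0, 2.0], [3.0, 4.0], [11.0, 6.0]]
# The outlier 11.0 pulls the mean but not the median in dimension 0.
```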
vehicular technology conference | 2016
Charith Chitraranjan; Anne M. Denton; Amal Shehan Perera
Tracking vehicles from mobile phones has applications in traffic monitoring, location-based services, and personal navigation, among others. We address the problem of tracking vehicles from received signal strength (RSS) sequences generated by mobile phones carried by passengers. A mobile phone periodically measures the RSS levels from the associated cell tower and the several (six, for GSM) strongest neighbor cell towers; each such measurement is known as an RSS fingerprint. However, due to various effects, the contents of fingerprints may vary over time even when measured at the same location. These variations have two components: the fluctuation of the RSS levels, and the variation of the set of cell towers reported in fingerprints. The latter is not properly modeled by traditional methods. To address both components of variation, we propose a probabilistic model for RSS fingerprints that specifies, for each grid location in the area of interest, the probability distribution of observing any given fingerprint at that location. We then use it as the observation model of a Dynamic Bayesian Network to track vehicles. Experiments on several roads demonstrate a 40% reduction in average error with our method compared to its traditional counterparts. Using RSS sequences of phone calls made by road users, our algorithm produced better travel-time estimates than comparison methods for a selected road segment, with an average error of 13% with respect to travel-times computed through manual license plate recognition.
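The two components of fingerprint variation can be illustrated with a toy per-location likelihood: a Gaussian models RSS fluctuation per tower, and a report probability models whether a tower appears in the fingerprint at all. All parameters here are invented for illustration; the paper's actual model and its DBN integration are not reproduced:

```python
import math

def fingerprint_log_likelihood(fingerprint, location_model):
    """Log-likelihood of an RSS fingerprint at one grid location.

    location_model maps tower -> (mean_rss, std_rss, p_report):
    a Gaussian for RSS fluctuation plus a Bernoulli for whether
    the tower is reported at this location at all.
    """
    logp = 0.0
    for tower, (mean, std, p_report) in location_model.items():
        if tower in fingerprint:
            rss = fingerprint[tower]
            logp += math.log(p_report)
            logp += (-0.5 * ((rss - mean) / std) ** 2
                     - math.log(std * math.sqrt(2 * math.pi)))
        else:
            logp += math.log(1 - p_report)   # tower absent from fingerprint
    return logp

# Hypothetical models for two grid locations and one observed fingerprint.
model_here = {"A": (-70.0, 5.0, 0.9), "B": (-85.0, 5.0, 0.8)}
model_far = {"A": (-95.0, 5.0, 0.3), "B": (-60.0, 5.0, 0.9)}
fp = {"A": -72.0, "B": -84.0}
```

A tracker would evaluate such a likelihood per grid cell as the observation term of the filtering recursion.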
international conference on pervasive computing | 2015
Charith Chitraranjan; Amal Shehan Perera; Anne M. Denton
Tracking vehicles has many applications, especially in traffic engineering, including estimation of travel time/speed, traffic density, and origin-destination matrices. In this paper, we propose local alignment of mobile phone signal strength measurements to track the movement of vehicles, and demonstrate its application to travel-time estimation for a road segment. We use local alignment instead of the traditionally used global alignment to allow for vehicles changing roads. More specifically, we use local dynamic time warping (LDTW) to align the signal strength trace of a phone carried in a vehicle to a reference trace that we collected for the relevant road segment. The signal strength trace from a mobile phone includes the strengths of the signals received from the serving cell and six neighbor cells, which together form a multivariate time series. We perform the alignments on these multi-dimensional time series because they provide better location specificity than the univariate time series of the strongest cell used in existing alignment-based methods. Experiments on drive-test data show that our LDTW-based algorithm yields a lower positioning error with respect to ground truth (GPS traces) than comparison methods. Applying LDTW to real-world call traces, made available to us by a mobile service provider, produced travel-time estimates with an average error of 11% and significant correlation with respect to travel-times computed through manual number plate recognition of vehicles.
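The core alignment machinery can be illustrated with the classic dynamic time warping recurrence. This sketch is global and univariate; the paper uses a local, multivariate variant (LDTW) over six-cell RSS traces, which is not reproduced here:

```python
import math

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences,
    using absolute difference as the per-step cost."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# Warping absorbs the repeated sample, so these align at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

A local variant relaxes the requirement that the alignment span both sequences end to end, which is what allows a phone trace to match only part of a reference road segment.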
ieee international conference on teaching assessment and learning for engineering | 2012
Shahani Markus Weerawarana; Amal Shehan Perera; Vishaka Nanayakkara
We present the design of a software engineering project course aimed at fostering innovation and creativity and at instilling a sense of software engineering rigor in students. The discussion includes our experiences in conducting this course over the past three years for students following the Bachelor of Engineering (Honors) degree program in the Department of Computer Science and Engineering of the Faculty of Engineering at the University of Moratuwa. We describe the challenges we faced and our approaches to addressing them, as well as our encouraging successes.
intelligent data analysis | 2015
Thilina Rathnayake; Maheshakya Wijewardena; Thimal Kempitiya; Kevin Rathnasekara; Thushan Ganegedara; Amal Shehan Perera; Damminda Alahakoon
Self-Organizing Maps (SOM) are widely used in data mining and high-dimensional data visualization due to their unsupervised nature and robustness. The Growing Self-Organizing Map (GSOM) is a variant of the SOM algorithm that allows nodes to be grown so that the map can represent the input space better. Instead of using a fixed 2-D grid like the SOM, the GSOM starts with four nodes and keeps track of the quantization error of each node; new nodes are grown from an existing node when its error value exceeds a pre-defined threshold. The ability of the GSOM algorithm to represent the input space accurately is vital to extending its applicability to a wider spectrum of problems. This ability can be improved by identifying nodes that represent low-probability regions of the input space and periodically removing them from the map, which improves the homogeneity and completeness of the final clustering result. This paper proposes a new extension to the GSOM algorithm based on node deletion as a solution to this problem. Furthermore, two new algorithms inspired by cache replacement policies are presented. The first is based on the Adaptive Replacement Cache (ARC) and maintains two separate Least Recently Used (LRU) lists of nodes; the second builds on the Frequency Based Replacement (FBR) policy and maintains a single LRU list. These algorithms consider both recent and frequent trends in the GSOM grid before deciding which nodes to delete. The experiments conducted suggest that the FBR-based node deletion method outperforms the standard algorithm and other existing node deletion methods.
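The growth rule described above, accumulate quantization error on the winning node and spawn neighbors once it crosses a threshold, can be sketched skeletally. Weight initialization and the ARC/FBR deletion policies are omitted, and the grid representation here is an assumption for illustration:

```python
def update_and_grow(grid, winner, quant_error, growth_threshold):
    """Accumulate quantization error on the winning node and grow new
    neighbor nodes once the accumulated error exceeds the threshold.

    grid maps (x, y) coordinates to node state dicts.
    Returns the list of newly created node coordinates.
    """
    grid[winner]["error"] += quant_error
    new_nodes = []
    if grid[winner]["error"] > growth_threshold:
        x, y = winner
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb not in grid:            # grow only into free positions
                grid[nb] = {"error": 0.0}
                new_nodes.append(nb)
        grid[winner]["error"] = 0.0       # reset after spreading growth
    return new_nodes

grid = {(0, 0): {"error": 0.0}, (1, 0): {"error": 0.0}}
grown = update_and_grow(grid, (0, 0), 1.5, 1.0)
# (1, 0) already exists, so three new boundary nodes appear.
```

A deletion policy in the spirit of the paper would then track how recently and how frequently each node wins, and periodically remove nodes representing low-probability regions.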