
Publication


Featured research published by Rivindu Perera.


Pacific Rim International Conference on Artificial Intelligence | 2014

The Role of Linked Data in Content Selection

Rivindu Perera; Parma Nand

This paper explores the appropriateness of utilizing Linked Data as a knowledge source for content selection. Content selection is a crucial subtask in Natural Language Generation which determines the relevance of content from a knowledge source with respect to a communicative goal. The recent online era has enabled us to accumulate extensive amounts of generic online knowledge, some of which has been made available as structured knowledge sources for computational natural language processing. This paper proposes a model for content selection that utilizes a generic structured knowledge source, DBpedia, which is a structured counterpart of the unstructured Wikipedia. The proposed model uses log likelihood to rank content from DBpedia Linked Data for relevance to a communicative goal. We performed experiments using DBpedia as the Linked Data resource with two keyword datasets as communicative goals: keywords extracted from the QALD-2 training dataset were used to optimize parameters, and the QALD-2 testing dataset was used for testing. The results were evaluated against a verbatim-based selection strategy and showed that our model performs 18.03% better than verbatim selection.
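
The paper itself does not include code; the sketch below is only a rough illustration of the ranking idea, scoring verbalised DBpedia content against a keyword-based communicative goal with a Dunning-style log-likelihood statistic. The function names, the toy reference corpus and the exact statistic are assumptions, not the authors' implementation.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Dunning-style G2 for a term occurring a times in a text of size c
    and b times in a reference corpus of size d."""
    e1 = c * (a + b) / (c + d)   # expected count in the text
    e2 = d * (a + b) / (c + d)   # expected count in the reference
    g2 = 0.0
    if a:
        g2 += 2 * a * math.log(a / e1)
    if b:
        g2 += 2 * b * math.log(b / e2)
    return g2

def rank_contents(candidates, keywords, reference_counts, reference_size):
    """Score each candidate snippet by the summed G2 of the goal keywords it contains."""
    scored = []
    for text in candidates:
        counts = Counter(text.lower().split())
        size = sum(counts.values())
        score = sum(
            log_likelihood(counts[k], reference_counts[k], size, reference_size)
            for k in keywords if counts[k]
        )
        scored.append((score, text))
    return sorted(scored, reverse=True)

# Toy usage: the keywords stand in for a QALD-style communicative goal and the
# candidate strings for verbalised DBpedia content.
reference = Counter("the city has a large population and a long history".split())
candidates = [
    "berlin is the capital city of germany",
    "berlin has a temperate seasonal climate",
]
for score, text in rank_contents(candidates, ["capital", "city"], reference, sum(reference.values())):
    print(round(score, 2), text)
```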


Conference on Intelligent Text Processing and Computational Linguistics | 2015

A Multi-strategy Approach for Lexicalizing Linked Open Data

Rivindu Perera; Parma Nand

This paper aims at exploiting Linked Data for generating natural text, often referred to as lexicalization. We propose a framework that can generate patterns which can be used to lexicalize Linked Data triples. Linked Data is structured knowledge organized in the form of triples consisting of a subject, a predicate and an object. We use DBpedia as the Linked Data source, which is not only free but is currently the fastest growing data source organized as Linked Data. The proposed framework utilizes Open Information Extraction (OpenIE) to extract relations from natural text, and these relations are then aligned with triples to identify lexicalization patterns. We also exploit lexical semantic resources which encode lexical, semantic and syntactic knowledge about entities; our framework uses VerbNet and WordNet as semantic resources. The extracted patterns are ranked and categorized based on the DBpedia ontology class hierarchy, then sorted by the assigned score and stored in an indexed embedded database for use in the framework as well as a future lexical resource. The framework was evaluated for syntactic accuracy and validity by measuring the Mean Reciprocal Rank (MRR) of the first correct pattern. The results indicated that the framework can achieve 70.36% accuracy and an MRR of 0.72 for five DBpedia ontology classes, generating 101 accurate lexicalization patterns.
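
As a loose illustration of the alignment step described above (a sketch, not the framework's actual code), the snippet below derives a lexicalization pattern by matching an OpenIE-style extraction against a DBpedia-style triple. The `s?`/`o?` placeholder syntax and the simple containment-based alignment rule are assumptions.

```python
def derive_pattern(extraction, triple):
    """extraction: (arg1, relation phrase, arg2) from an OpenIE-style system.
    triple: (subject label, predicate, object label).
    Returns a pattern with the subject/object replaced by placeholders,
    or None when the extraction does not align with the triple."""
    arg1, rel, arg2 = extraction
    subj, _pred, obj = triple
    if subj.lower() in arg1.lower() and obj.lower() in arg2.lower():
        return f"s? {rel} o?"
    return None

def apply_pattern(pattern, triple):
    subj, _pred, obj = triple
    return pattern.replace("s?", subj).replace("o?", obj) + "."

triple = ("Steve Jobs", "birthPlace", "San Francisco")
extraction = ("Steve Jobs", "was born in", "San Francisco")

pattern = derive_pattern(extraction, triple)
print(pattern)                         # s? was born in o?
print(apply_pattern(pattern, triple))  # Steve Jobs was born in San Francisco.
```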


International Conference on Tools with Artificial Intelligence | 2014

Real Text-CS - Corpus Based Domain Independent Content Selection Model

Rivindu Perera; Parma Nand

Content selection is a highly domain-dependent task responsible for retrieving relevant information from a knowledge source for a given communicative goal. This paper presents a domain-independent content selection model that uses keywords as the communicative goal. We employ the DBpedia triple store as our knowledge source, and triples are selected based on weights assigned to each triple. The weights are calculated from the log likelihood distance between a domain corpus and a general reference corpus. The method was evaluated using keywords extracted from the QALD dataset, and its performance was compared with cross-entropy based statistical content selection. The evaluation results showed that the proposed method performs 32% better than cross-entropy based statistical content selection.
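
A rough sketch of the weighting idea, under stated assumptions: a smoothed log-ratio between a domain corpus and a general reference corpus stands in for the paper's log-likelihood distance, and the toy corpora and triples are invented for illustration.

```python
import math
from collections import Counter

def term_weight(term, domain, reference, alpha=0.5):
    """Log of the term's smoothed relative frequency in the domain corpus
    over its relative frequency in the general reference corpus."""
    p_domain = (domain[term] + alpha) / (sum(domain.values()) + alpha * len(domain))
    p_ref = (reference[term] + alpha) / (sum(reference.values()) + alpha * len(reference))
    return math.log(p_domain / p_ref)

def triple_weight(triple, domain, reference):
    """Average the term weights over the triple's object value."""
    _s, _p, obj = triple
    terms = obj.lower().split()
    return sum(term_weight(t, domain, reference) for t in terms) / max(len(terms), 1)

domain = Counter("apple was founded by steve jobs and steve wozniak in california".split())
reference = Counter("a company is an organisation that sells goods or services".split())

triples = [
    ("Apple_Inc.", "foundedBy", "Steve Jobs"),
    ("Apple_Inc.", "type", "company"),
]
for t in sorted(triples, key=lambda t: triple_weight(t, domain, reference), reverse=True):
    print(round(triple_weight(t, domain, reference), 3), t)
```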


5th International Conference on Knowledge Engineering and Semantic Web (KESW) | 2014

Interaction History Based Answer Formulation for Question Answering

Rivindu Perera; Parma Nand

With the rapid growth in information access methodologies, question answering has drawn considerable attention. Although question answering has emerged as an interesting new research domain, it is still largely concentrated on question processing and answer extraction; later steps such as answer ranking, formulation and presentation are not treated in depth. A weakness we found in this area is that the answers a particular user has already acquired are not considered when processing new questions. As a result, current systems are not capable of linking a question such as “When was Apple founded?” with a previously processed question “When was Microsoft founded?” to generate an answer of the form “Apple was founded one year after Microsoft, in 1976”. In this paper we present an approach to question answering that devises an answer based on the questions already processed by the system for a particular user, termed the user's interaction history. Our approach combines question processing, relation extraction and knowledge representation with inference models. During the process we primarily focus on acquiring knowledge and building a scalable user model to formulate future answers based on the answers the same user has already received. An evaluation carried out using TREC resources shows that the proposed approach is promising and effective for question answering.
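
The following toy sketch only illustrates the idea of an interaction history, not the paper's combination of question processing, relation extraction and inference models; the class names, the relation representation and the surface template are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Interaction:
    entity: str
    relation: str
    answer: int          # e.g. a year, so answers in the toy are comparable

@dataclass
class UserModel:
    history: list = field(default_factory=list)

    def answer(self, entity, relation, value):
        """Formulate an answer, linking it to a prior question on the same relation."""
        related = next((i for i in self.history if i.relation == relation), None)
        self.history.append(Interaction(entity, relation, value))
        if related is None:
            return f"{entity} was founded in {value}."
        diff = value - related.answer
        direction = "later" if diff > 0 else "earlier"
        return (f"{entity} was founded {abs(diff)} year(s) {direction} than "
                f"{related.entity}, in {value}.")

user = UserModel()
print(user.answer("Microsoft", "foundingYear", 1975))
print(user.answer("Apple", "foundingYear", 1976))
# -> "Apple was founded 1 year(s) later than Microsoft, in 1976."
```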


The Prague Bulletin of Mathematical Linguistics | 2016

RealText-lex: A Lexicalization Framework for RDF Triples

Rivindu Perera; Parma Nand; Gisela Klette

The online era has made available almost cosmic amounts of information in the public and semi-restricted domains, prompting the development of a corresponding host of technologies to organize and navigate this information. One of these developing technologies deals with encoding information from free-form natural language into a structured form as RDF triples. This representation enables machine processing of the data; however, the processed information cannot be directly converted back into human language. This has created a need to lexicalize machine-processed data existing as triples into natural language, so that there is a seamless transition between machine representation of information and information meant for human consumption. This paper presents a framework to lexicalize RDF triples extracted from DBpedia, a central interlinking hub for the emerging Web of Data. The framework comprises four pattern mining modules which generate lexicalization patterns to transform triples into natural language sentences. Three of these modules are based on lexicons, while the fourth extracts relations from unstructured text to generate lexicalization patterns. A linguistic accuracy evaluation and a human evaluation on a sub-sample showed that the framework can produce patterns which are accurate and exhibit qualities of human-generated language.
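
As an illustration of what applying a lexicalization pattern to an RDF triple might look like (a sketch only, not the framework's pattern resources), the snippet below looks up a pattern by predicate and realises a sentence; the pattern store, placeholder syntax and fallback rule are assumptions.

```python
PATTERNS = {
    "birthPlace": "s? was born in o?",
    "author":     "s? was written by o?",
    "capital":    "the capital of s? is o?",
}

def humanise(label: str) -> str:
    """Turn a DBpedia-style identifier such as 'Charles_Dickens' into plain text."""
    return label.replace("_", " ")

def lexicalize(triple):
    subject, predicate, obj = triple
    pattern = PATTERNS.get(predicate)
    if pattern is None:
        # Fallback: verbalise the predicate itself, e.g. "X has birth place Y".
        return f"{humanise(subject)} has {humanise(predicate)} {humanise(obj)}."
    sentence = pattern.replace("s?", humanise(subject)).replace("o?", humanise(obj))
    return sentence[0].upper() + sentence[1:] + "."

print(lexicalize(("Oliver_Twist", "author", "Charles_Dickens")))
# -> "Oliver Twist was written by Charles Dickens."
```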


Australasian Joint Conference on Artificial Intelligence | 2015

Answer Presentation with Contextual Information: A Case Study Using Syntactic and Semantic Models

Rivindu Perera; Parma Nand

Answer presentation is a subtask of Question Answering that investigates ways of presenting an acquired answer to the user in a format close to a human-generated answer. In this research we explore models that retrieve additional, relevant, contextual information corresponding to a question and present an enriched answer by integrating the additional information as natural language. We investigate the role of Bag of Words (BoW) and Bag of Concepts (BoC) models in retrieving the relevant contextual information. The information source used is a Linked Data resource, DBpedia, which encodes large amounts of knowledge corresponding to Wikipedia in a structured form as triples. The experiments utilize the QALD question sets, consisting of training and testing sets each containing 100 questions. The results show that pragmatic aspects, which are often neglected by BoW (syntactic) and BoC (semantic) models, form a critical part of contextual information selection.
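
The toy comparison below illustrates why a Bag of Concepts view can match context that a Bag of Words view misses; the miniature concept dictionary, the cosine scoring and the examples are assumptions rather than the models evaluated in the paper.

```python
import math
from collections import Counter

CONCEPTS = {  # tiny stand-in for a lexical-semantic resource such as WordNet
    "founded": "CREATE", "established": "CREATE",
    "company": "ORGANISATION", "firm": "ORGANISATION",
}

def tokens(text):
    return [w.strip("?.,!") for w in text.lower().split()]

def bag_of_words(text):
    return Counter(tokens(text))

def bag_of_concepts(text):
    # Map surface words onto shared concept identifiers before counting.
    return Counter(CONCEPTS.get(w, w) for w in tokens(text))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

question = "Who established the firm Apple?"
candidate = "Steve Jobs founded the company Apple in 1976"
print("BoW:", round(cosine(bag_of_words(question), bag_of_words(candidate)), 3))
print("BoC:", round(cosine(bag_of_concepts(question), bag_of_concepts(candidate)), 3))
# The BoC score is higher because "established/founded" and "firm/company"
# collapse onto shared concepts.
```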


Pacific Rim International Conference on Artificial Intelligence | 2014

A HMM POS Tagger for Micro-blogging Type Texts

Parma Nand; Rivindu Perera; Ramesh Lal

The high volume of communication via micro-blogging type messages has created an increased demand for text processing tools customised to the unstructured text genre. Text processing tools developed on structured texts have been shown to deteriorate significantly when used on unstructured, micro-blogging type texts. In this paper, we present the results of testing a HMM-based POS (Part-Of-Speech) tagging model customized for unstructured texts. We also evaluated the tagger against published CRF-based state-of-the-art POS tagging models customized for Tweet messages, using three publicly available Tweet corpora. Finally, we performed cross-validation tests with both taggers by training them on one Tweet corpus and testing them on another.
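
For readers unfamiliar with HMM tagging, the snippet below is a compact Viterbi decoder over toy probabilities; it shows the general mechanism only and is not the customised tagger evaluated in the paper.

```python
import math

STATES = ["DET", "NOUN", "VERB"]
# Toy log-probabilities for a three-tag model.
START = {"DET": math.log(0.6), "NOUN": math.log(0.3), "VERB": math.log(0.1)}
TRANS = {
    "DET":  {"DET": math.log(0.05), "NOUN": math.log(0.9),  "VERB": math.log(0.05)},
    "NOUN": {"DET": math.log(0.1),  "NOUN": math.log(0.3),  "VERB": math.log(0.6)},
    "VERB": {"DET": math.log(0.5),  "NOUN": math.log(0.4),  "VERB": math.log(0.1)},
}
EMIT = {
    "DET":  {"the": math.log(0.9)},
    "NOUN": {"dog": math.log(0.5), "barks": math.log(0.1)},
    "VERB": {"barks": math.log(0.7)},
}
UNK = math.log(1e-6)   # crude smoothing for unseen (e.g. Twitter-specific) tokens

def viterbi(words):
    v = [{s: START[s] + EMIT[s].get(words[0], UNK) for s in STATES}]
    back = []
    for w in words[1:]:
        scores, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: v[-1][p] + TRANS[p][s])
            scores[s] = v[-1][best_prev] + TRANS[best_prev][s] + EMIT[s].get(w, UNK)
            pointers[s] = best_prev
        v.append(scores)
        back.append(pointers)
    # Trace back the best tag sequence.
    last = max(STATES, key=lambda s: v[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))

print(viterbi("the dog barks".split()))   # expected: ['DET', 'NOUN', 'VERB']
```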


Progress in Artificial Intelligence | 2017

Utilizing Typed Dependency Subtree Patterns for Answer Sentence Generation in Question Answering Systems

Rivindu Perera; Parma Nand; Asif Naeem

Question Answering over Linked Data (QALD) refers to the use of Linked Data by question answering systems, and in recent times this has become increasingly popular as it opens up the massive Linked Data cloud, a rich source of encoded knowledge. However, a major shortfall of current QALD systems is that they focus on presenting a single fact or factoid answer derived using SPARQL (SPARQL Protocol and RDF Query Language) queries. There is now an increased interest in the development of human-like systems able to answer questions and even hold conversations by constructing sentences akin to humans. In this paper, we introduce a new answer construction and presentation system which utilizes the linguistic structure of the source question together with the factoid answer to construct an answer sentence that closely resembles a human-generated answer. We employ both Semantic Web technology and linguistic structure to construct the answer sentences. The core of the research lies in extracting typed dependency subtree patterns from questions and utilizing them in conjunction with the factoid answer to generate an answer sentence with the natural feel of a human response. We evaluated the system for both linguistic accuracy and naturalness using human evaluation, which showed that the proposed approach is able to generate answer sentences with linguistic accuracy and natural readability quotients of more than 70%. In addition, we carried out a feasibility analysis on using automatic metrics for answer sentence evaluation; the results from this phase showed that there is no strong correlation between automatic metric scores and the human ratings of the machine-generated answers.
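
As a heavily simplified illustration (the paper operates on typed dependency subtrees, not string rules), the sketch below reuses the question's surface structure and the factoid answer to build an answer sentence; the regular-expression rules and examples are assumptions.

```python
import re

RULES = [
    # "Who/What <verb phrase>?"      ->  "<answer> <verb phrase>."
    (re.compile(r"^(?:who|what)\s+(.+?)\?$", re.I),
     lambda m, ans: f"{ans} {m.group(1)}."),
    # "When was <subject> <verb>?"   ->  "<subject> was <verb> in <answer>."
    (re.compile(r"^when\s+was\s+(.+)\s+(\S+)\?$", re.I),
     lambda m, ans: f"{m.group(1)} was {m.group(2)} in {ans}."),
    # "Where is <subject>?"          ->  "<subject> is in <answer>."
    (re.compile(r"^where\s+is\s+(.+?)\?$", re.I),
     lambda m, ans: f"{m.group(1)} is in {ans}."),
]

def answer_sentence(question, answer):
    for pattern, realise in RULES:
        match = pattern.match(question.strip())
        if match:
            sentence = realise(match, answer)
            return sentence[0].upper() + sentence[1:]
    return answer  # fall back to the bare factoid answer

print(answer_sentence("Who founded Apple?", "Steve Jobs"))
# -> "Steve Jobs founded Apple."
print(answer_sentence("When was Apple founded?", "1976"))
# -> "Apple was founded in 1976."
```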


International Conference on Computational Linguistics | 2017

An Ensemble Architecture for Linked Data Lexicalization

Rivindu Perera; Parma Nand

Linked Data has revamped the representation of knowledge by introducing the triple data structure, which can encode knowledge with its associated semantics, including context, by interlinking with external resources across documents. Although Linked Data is an attractive and effective mechanism for representing knowledge that is created and consumed by humans in the form of natural language, it still has a dimension of separation from natural language. Hence, in recent times there has been an increased interest in transforming Linked Data into natural language in order to harness its benefits in applications that interact with natural language. This paper presents a framework that lexicalizes Linked Data triples into natural language using an ensemble architecture. The proposed architecture comprises four different pattern-based modules which lexicalize triples by analysing triple features. The four pattern mining modules are based on occupational metonyms, Context Free Grammar (CFG), relation extraction using Open Information Extraction (OpenIE), and triple properties. The framework was evaluated using a two-fold process consisting of a linguistic accuracy analysis and a human evaluation on a test sample. The linguistic accuracy evaluation showed that the framework can produce 283 accurate lexicalization patterns for a set of 25 ontology classes, resulting in 70.75% accuracy, an approximately 91% increase compared to the existing state-of-the-art model.
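
A sketch of the ensemble flow under stated assumptions: specialised pattern modules are tried in priority order with a generic property-based fallback. Only three of the four modules are stubbed here (the CFG module is omitted), and all module internals are invented for illustration.

```python
def occupational_metonym(triple):
    """Verbalise role-like predicates, e.g. author/director, as 'O is the <role> of S'."""
    subject, predicate, obj = triple
    roles = {"author": "author", "director": "director"}
    if predicate in roles:
        return f"{obj} is the {roles[predicate]} of {subject}."
    return None

OPENIE_PATTERNS = {"birthPlace": "s? was born in o?"}  # would be mined from free text

def openie_module(triple):
    subject, predicate, obj = triple
    pattern = OPENIE_PATTERNS.get(predicate)
    return pattern.replace("s?", subject).replace("o?", obj) + "." if pattern else None

def property_module(triple):
    """Generic fallback based on the triple's own properties."""
    subject, predicate, obj = triple
    return f"{subject} has {predicate} {obj}."

MODULES = [occupational_metonym, openie_module, property_module]

def lexicalize(triple):
    """Run the modules in priority order and keep the first sentence produced."""
    for module in MODULES:
        sentence = module(triple)
        if sentence:
            return sentence

print(lexicalize(("Hamlet", "author", "William Shakespeare")))
print(lexicalize(("Barack Obama", "birthPlace", "Honolulu")))
print(lexicalize(("Berlin", "elevation", "34 m")))
```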


International Conference on Cloud Computing | 2017

Lexicalizing Linked Data for a Human Friendly Web

Rivindu Perera; Parma Nand; Wen-Hsin Yang; Kohichi Toshioka

The consumption of Linked Data has dramatically increased with the growing momentum towards the Semantic Web. Linked Data is an essentially simple format for representing knowledge, in that all knowledge is expressed as triples which can be linked using one or more components of the triple. To date, most of the effort has gone either into creating Linked Data by mining the web or into making it available to users as a knowledge base for knowledge engineering applications. In recent times there has been a growing need for these applications to interact with users in natural language, which requires the transformation of Linked Data knowledge into natural language. The aim of the RealText project described in this paper is to build a scalable framework to transform Linked Data into natural language by generating lexicalization patterns for triples. A lexicalization pattern is a syntactical pattern that transforms a given triple into a syntactically correct natural language sentence. Using DBpedia as the Linked Data resource, we have generated 283 accurate lexicalization patterns for a sample set of 25 ontology classes. We performed a human evaluation on a test sub-sample with inter-rater agreement of 0.86 and 0.80 for readability and accuracy respectively. These results showed that the lexicalization patterns generated language that is accurate, readable and exhibits qualities of human-produced language.
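
Inter-rater agreement figures like the ones quoted above are commonly computed with a coefficient such as Cohen's kappa; the snippet below shows one way to compute it for two raters. Whether the project used kappa or another coefficient is not stated here, and the example ratings are made up.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters over the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical readability judgements (1 = acceptable, 0 = not) from two raters.
rater_a = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
rater_b = [1, 1, 1, 0, 1, 0, 0, 1, 1, 1]
print(round(cohens_kappa(rater_a, rater_b), 2))
```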

Collaboration


Dive into Rivindu Perera's collaborations.

Top Co-Authors


Parma Nand

Auckland University of Technology


Asif Naeem

Auckland University of Technology


Ramesh Lal

Auckland University of Technology
