Publications


Featured research published by Norisma Idris.


Expert Systems With Applications | 2015

PDLK: Plagiarism detection using linguistic knowledge

Asad Abdi; Norisma Idris; Rasim M. Alguliyev; Ramiz M. Aliguliyev

Plagiarism is described as the reuse of someone else's previous ideas, work or even words without sufficient attribution to the source. This paper presents a method to detect external plagiarism by integrating semantic relations between words with their syntactic composition. The problem with existing methods is that they fail to capture the meaning when comparing a source-document sentence with a suspicious-document sentence that has the same surface text (the words are the same) or is a paraphrase of it, which leads to inaccurate or unnecessary matching results. The proposed method improves plagiarism-detection performance because it avoids selecting a source sentence whose similarity to the suspicious sentence is high but whose meaning is different. It does so by computing sentence-to-sentence semantic and syntactic similarity. In addition, the method expands the words in sentences to tackle the problem of limited information: it bridges the lexical gap between semantically similar contexts that are expressed in different wording. The method can also identify various kinds of plagiarism, such as exact copying, paraphrasing, sentence transformation and changes of word structure within sentences. The experimental results show that the proposed method improves on the performance of the systems that participated in PAN-PC-11, and that it outperforms other existing techniques on the PAN-PC-10 and PAN-PC-11 datasets.
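The paper's core idea, combining semantic word relations with syntactic (word-order) information for sentence-to-sentence comparison, can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny `SYNONYMS` table stands in for a lexical database such as WordNet, the word-order measure is a standard rank-vector formula, and the mixing weight `alpha` is chosen arbitrarily.

```python
import math

# Toy synonym table; a real system would query a lexical database instead.
SYNONYMS = {"reuse": {"recycling"}}

def words_match(a, b):
    """Two words match if identical or listed as synonyms."""
    return a == b or b in SYNONYMS.get(a, set()) or a in SYNONYMS.get(b, set())

def semantic_similarity(s1, s2):
    """Symmetrized fraction of words in one sentence that have a
    synonym-aware match in the other (sentences assumed non-empty)."""
    def cover(src, tgt):
        return sum(any(words_match(w, t) for t in tgt) for w in src) / len(src)
    return (cover(s1, s2) + cover(s2, s1)) / 2

def word_order_similarity(s1, s2):
    """Syntactic signal: agreement of the relative order of shared words."""
    shared = [w for w in s1 if w in s2]
    if not shared:
        return 0.0
    r1 = [s1.index(w) for w in shared]
    r2 = [s2.index(w) for w in shared]
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    den = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    return 1 - num / den if den else 1.0

def sentence_similarity(s1, s2, alpha=0.8):
    """Weighted combination of the semantic and syntactic scores."""
    return alpha * semantic_similarity(s1, s2) + (1 - alpha) * word_order_similarity(s1, s2)
```

With this measure, a paraphrase built from synonyms scores higher than an unrelated sentence that merely shares a few function words, which is the behavior the abstract describes.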


International Journal of Information Management | 2016

Towards knowledge modeling and manipulation technologies

Andrew Thomas Bimba; Norisma Idris; Ahmed Al-Hunaiyyan; Rohana Mahmud; Ahmed Abdelaziz; Suleman Khan; Victor Chang

Highlights: we identify different knowledge base modelling and manipulation techniques based on four categories; we compare knowledge base modelling and manipulation technologies by their underlying theories, knowledge representation techniques, knowledge acquisition techniques, challenges, applications, development tools and development languages; we discuss the relevance of knowledge-based business; and we propose a promising technique for knowledge-based business management and other knowledge-related applications. A system which represents knowledge is normally referred to as a knowledge-based system (KBS). This article surveys publications related to knowledge base modelling and manipulation technologies between the years 2000 and 2015. A total of 185 articles, excluding the subject-descriptive articles mentioned in the introductory parts, were evaluated in this survey. The main aim of this study is to identify different knowledge base modelling and manipulation techniques based on four categories: (1) linguistic knowledge base; (2) expert knowledge base; (3) ontology; and (4) cognitive knowledge base. This led to the proposition of eight research questions focusing on the different categories of knowledge base modelling technologies, their underlying theories, knowledge representation techniques, knowledge acquisition techniques, challenges, applications, development tools and development languages. One finding from this survey is the high dependence of linguistic knowledge bases, expert knowledge bases and ontologies on volatile expert knowledge. A promising technique for knowledge-based business management and other knowledge-related applications is also discussed.


Information Processing and Management | 2014

An architecture for Malay Tweet normalization

Mohammad Arshi Saloot; Norisma Idris; Rohana Mahmud

Research in natural language processing has increasingly focused on normalizing Twitter messages. While well-defined approaches have been proposed for English, the problem remains far from solved for other languages, such as Malay. In this paper, we therefore propose an approach to normalizing Malay Twitter messages based on corpus-driven analysis. An architecture for Malay Tweet normalization is presented, comprising seven main modules: (1) enhanced tokenization, (2) in-vocabulary (IV) detection, (3) specialized dictionary query, (4) repeated-letter elimination, (5) abbreviation adjustment, (6) English word translation, and (7) de-tokenization. A parallel Tweet dataset consisting of 9,000 Malay Tweets is used in the development and testing stages. An evaluation is carried out to measure the performance of the system; the result is promising, with a BLEU score of 0.83 against a baseline of 0.46. To compare the accuracy of the architecture with other statistical approaches, an SMT-like normalization system is implemented, trained, and evaluated on an identical parallel dataset. The experimental results demonstrate that the normalization system, designed around the features of Malay Tweets, achieves higher accuracy than the SMT-like system.
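The seven-module pipeline can be sketched as a chain of token-level lookups and rewrites. This is a heavily simplified illustration, not the paper's system: the lexicons below are tiny hand-made stand-ins for the real IV lexicon, specialized dictionary, abbreviation list and translation table, and the tokenizer is a plain whitespace split.

```python
import re

# Toy lexicons (illustrative only; the paper's resources are far larger).
IN_VOCABULARY = {"saya", "tidak", "makan", "sudah"}   # standard Malay words
SPECIAL_DICT = {"x": "tidak"}                          # chat shorthand -> standard form
ABBREVIATIONS = {"sy": "saya", "dh": "sudah"}          # common abbreviations
EN_TO_MS = {"eat": "makan"}                            # English -> Malay word translation

def collapse_repeats(token):
    """Module 4: limit letter runs to one, e.g. 'makaaan' -> 'makan'.
    (Crude: also collapses legitimate double letters.)"""
    return re.sub(r"(.)\1+", r"\1", token)

def normalize_token(token):
    if token in IN_VOCABULARY:            # module 2: already in-vocabulary
        return token
    if token in SPECIAL_DICT:             # module 3: specialized dictionary query
        return SPECIAL_DICT[token]
    collapsed = collapse_repeats(token)   # module 4: repeated-letter elimination
    if collapsed in IN_VOCABULARY:
        return collapsed
    if token in ABBREVIATIONS:            # module 5: abbreviation adjustment
        return ABBREVIATIONS[token]
    if token in EN_TO_MS:                 # module 6: English word translation
        return EN_TO_MS[token]
    return token                          # unknown tokens pass through unchanged

def normalize_tweet(tweet):
    tokens = tweet.lower().split()        # module 1 (simplified tokenization)
    return " ".join(normalize_token(t) for t in tokens)  # module 7: de-tokenization
```

For example, `normalize_tweet("sy x makaaan")` expands the abbreviation, resolves the shorthand, and removes the stretched letters, yielding `"saya tidak makan"`.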


Soft Computing | 2017

Query-based multi-documents summarization using linguistic knowledge and content word expansion

Asad Abdi; Norisma Idris; Rasim M. Alguliyev; Ramiz M. Aliguliyev

In this paper, a query-based summarization method that uses a combination of semantic relations between words and their syntactic composition to extract meaningful sentences from document sets is introduced. The problem with current statistical methods is that they fail to capture the meaning when comparing a sentence and a user query; hence there is often a conflict between the extracted sentences and the user's requirements. The proposed method improves the quality of document summaries because it avoids extracting a sentence whose similarity to the query is high but whose meaning is different. The method works by computing sentence-to-sentence and sentence-to-query semantic and syntactic similarity. To reduce redundancy in the summary, the method uses a greedy algorithm to impose a diversity penalty on the sentences. In addition, the proposed method expands the words in both the query and the sentences to tackle the problem of limited information: it bridges the lexical gap between semantically similar contexts that are expressed in different wording. The experimental results show that the proposed method improves on the performance of the systems that participated in DUC 2006, and that it outperforms other existing techniques on the DUC 2005 and DUC 2006 datasets.
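The greedy selection with a diversity penalty can be sketched as follows. This is an MMR-style stand-in, not the paper's exact algorithm: plain Jaccard word overlap replaces the semantic/syntactic similarity measure, and the `penalty` weight is an arbitrary choice.

```python
def jaccard(a, b):
    """Word-overlap similarity (a crude stand-in for the paper's measure)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def greedy_select(query, sentences, k, penalty=0.7):
    """Greedily pick up to k sentences: each step chooses the sentence most
    similar to the query after subtracting a penalty proportional to its
    similarity to sentences already selected (the diversity penalty)."""
    selected = []
    remaining = list(sentences)
    while remaining and len(selected) < k:
        def score(s):
            redundancy = max((jaccard(s, t) for t in selected), default=0.0)
            return jaccard(s, query) - penalty * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Given two near-duplicate query-relevant sentences and one off-topic sentence, the penalty keeps the second duplicate out of the summary, which is exactly the redundancy-reduction behavior described above.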


Literary and Linguistic Computing | 2016

Twitter corpus creation: The case of a Malay Chat-style-text Corpus (MCC)

Mohammad Arshi Saloot; Norisma Idris; AiTi Aw; Dirk Thorleuchter

In recent years, social networks, microblogs, and short message services have deeply penetrated people's lives, making chat-style text a common phenomenon. This chat-style text has many features unknown to linguists, which can be discovered by analyzing a chat-style corpus. The process of constructing a corpus conforms to specific corpus criteria, such as representativeness, sampling, variety, and chronology. Up to now, the literature has not provided specific corpus criteria for creating a chat-style-text corpus. In contrast to related work, corpus criteria for creating a chat-style corpus are provided here. An exhaustive and reliable Malay chat-style text corpus is still lacking; thus, the provided criteria are used to demonstrate the process of constructing a Twitter corpus known as the Malay Chat-style Corpus (MCC). The MCC contains 1 million Twitter messages, comprising 14,484,384 word instances and 646,807 terms, along with metadata such as posting time, the Twitter client application used, and the type of Twitter message (simple Tweet, Retweet, Reply). Analysis of the MCC reveals characteristics of the corpus, including the most frequent terms and collocations, a Zipf's-law diagram, Twitter peak hours, and the percentages of message types. Finally, the representativeness of the corpus is evaluated using cartography and automatic language identification methods. The corpus and the process of its creation are valuable for researchers working in linguistics, natural language processing, and data mining.


Swarm and Evolutionary Computation | 2018

Algorithmic design issues in adaptive differential evolution schemes: Review and taxonomy

Rawaa Dawoud Al-Dabbagh; Ferrante Neri; Norisma Idris; Mohd Sapiyan Baba

The performance of most metaheuristic algorithms depends on parameters whose settings essentially serve as a key function in determining the quality of the solution and the efficiency of the search. A recent trend is to make the algorithm parameters adapt automatically to different problems during optimization, thereby liberating the user from the tedious and time-consuming task of manual setting. These fine-tuning techniques continue to be the object of ongoing research. Differential evolution (DE) is a simple yet powerful population-based metaheuristic. It has demonstrated good convergence, and its principles are easy to understand. DE is, however, very sensitive to its parameter settings and mutation strategy; this study therefore investigates these settings across the diverse versions of adaptive DE algorithms. The study has two main objectives: (1) to present an extension of the original taxonomy of evolutionary algorithm (EA) parameter settings that has been overlooked by prior research, and thereby minimize any confusion that might arise from the former taxonomy; and (2) to investigate the various algorithmic design schemes used in the different variants of adaptive DE and convey them in a new classification. In other words, this study describes in depth the structural analysis and working principles that underlie promising recent work in this field, analyzes their advantages and disadvantages, and draws insights that can further improve these algorithms. Finally, the interpretation of the literature and the comparative analysis of the algorithmic schemes offer several guidelines for designing and implementing adaptive DE algorithms. The proposed design framework provides readers with the main steps required to integrate any proposed meta-algorithm into parameter and/or strategy adaptation schemes.
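One well-known example of the adaptive schemes such a survey covers is jDE-style self-adaptation, where each individual carries its own F and CR values that are occasionally resampled and survive only when the resulting trial vector wins selection. The minimal sketch below illustrates that idea on a standard rand/1/bin DE loop; population size, generations, and the resampling probability of 0.1 are conventional illustrative choices, not values taken from the article.

```python
import random

def jde_minimize(f, dim, bounds, pop_size=20, generations=200, seed=0):
    """Self-adaptive DE sketch (jDE-style): per-individual F and CR are
    resampled with probability 0.1 and kept only if the trial survives."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    F = [0.5] * pop_size                      # per-individual mutation factors
    CR = [0.9] * pop_size                     # per-individual crossover rates
    fitness = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Self-adaptation step: occasionally propose fresh F and CR.
            Fi = rng.uniform(0.1, 1.0) if rng.random() < 0.1 else F[i]
            CRi = rng.random() if rng.random() < 0.1 else CR[i]
            # rand/1 mutation with binomial crossover.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)
            trial = [
                pop[a][k] + Fi * (pop[b][k] - pop[c][k])
                if rng.random() < CRi or k == j_rand else pop[i][k]
                for k in range(dim)
            ]
            ft = f(trial)
            if ft <= fitness[i]:              # greedy selection; parameters survive
                pop[i], fitness[i], F[i], CR[i] = trial, ft, Fi, CRi
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]
```

The key design choice, which the taxonomy discussion above classifies, is that parameter adaptation is tied to selection: successful control-parameter values propagate with their individuals rather than being tuned globally.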


Proceedings of the Workshop on Noisy User-generated Text | 2015

Toward Tweets Normalization Using Maximum Entropy

Mohammad Arshi Saloot; Norisma Idris; Liyana Shuib; Ram Gopal Raj; AiTi Aw

The use of social network services and microblogs, such as Twitter, has created valuable text resources that contain extremely noisy text. Twitter messages contain so much noise that it is difficult to use them in natural language processing tasks. This paper presents a new approach to normalizing Tweets using the maximum entropy model. Although maximum entropy needs a training dataset to adjust its parameters, the proposed approach can also normalize words unseen in the training set. The principle of maximum entropy emphasizes incorporating the available features into a uniform model. First, we generate a set of normalization candidates for each out-of-vocabulary word based on lexical, phonemic, and morphophonemic similarities. Then, three different probability scores are calculated for each candidate using positional indexing, a dependency-based frequency feature, and a language model. After the optimal values of the model parameters are obtained in a training phase, the model can calculate the final probability value for each candidate. The approach achieved an 83.12 BLEU score in testing on 2,000 Tweets. Our experimental results show that the maximum entropy approach significantly outperforms previous well-known normalization approaches.
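The candidate-scoring step can be illustrated with a log-linear (maximum-entropy style) combination of feature scores. Everything concrete here is hypothetical: the feature names, their values for the OOV token "gud", and the weights (which a real system would learn in training) are made up for illustration.

```python
import math

# Hand-picked weights standing in for trained model parameters.
WEIGHTS = {"lexical": 1.0, "phonemic": 0.8, "lm": 1.5}

def maxent_probs(candidates):
    """Log-linear scoring: exponentiate each candidate's weighted feature
    sum, then normalize so the scores form a probability distribution."""
    scores = {
        cand: math.exp(sum(WEIGHTS[f] * v for f, v in feats.items()))
        for cand, feats in candidates.items()
    }
    z = sum(scores.values())
    return {cand: s / z for cand, s in scores.items()}

# Hypothetical candidates for the OOV token "gud" with per-feature scores.
candidates = {
    "good": {"lexical": 0.8, "phonemic": 0.9, "lm": 0.7},
    "gold": {"lexical": 0.6, "phonemic": 0.2, "lm": 0.3},
}
probs = maxent_probs(candidates)
best = max(probs, key=probs.get)
```

The normalization over candidates is what lets the model rank replacements for words never seen in training: only the features need to generalize, not the words themselves.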


Soft Computing | 2018

QAPD: an ontology-based question answering system in the physics domain

Asad Abdi; Norisma Idris; Zahrah Binti Ahmad

The tremendous development in information technology has led to an explosion of data and motivated the need for powerful yet efficient strategies for knowledge discovery. Question answering (QA) systems make it possible to ask questions and retrieve answers using natural language queries. In an ontology-based QA system, the knowledge-based data in which the answers are sought have a structured organization, and question-answer retrieval over an ontology knowledge base provides a convenient way to obtain knowledge. In this paper, QAPD, an ontology-based QA system for the physics domain, is presented; it integrates natural language processing, ontologies and information retrieval technologies to provide informative answers to users. The system allows users to retrieve information from formal ontologies using input queries formulated in natural language. We propose a schema-mapping inference method, which uses a combination of semantic and syntactic information together with attribute-based inference, to transform users' questions into ontological knowledge base queries. In addition, a novel domain ontology for physics, called EAEONT, is presented; relevant standards and regulations were utilized extensively during the ontology-building process. The original characteristic of the system is the strategy used to bridge the gap between users' expressiveness and formal knowledge representation. The system has been developed and tested on the English language using an ontology modeling the physics domain, and the performance level achieved enables its use in real environments.


International Journal of Intelligent Information Technologies | 2017

A Model for Text Summarization

Rasim M. Alguliyev; Ramiz M. Aliguliyev; Nijat R. Isazade; Asad Abdi; Norisma Idris

Text summarization is the process of creating a concise version of a document while preserving its main content. In this paper, to cover all topics and reduce redundancy in summaries, a two-stage sentence selection method for text summarization is proposed. In the first stage, the sentence set is clustered using the k-means method to discover all topics. In the second stage, an optimal selection of sentences is made: from each cluster, salient sentences are selected according to their contribution to the topic of the cluster and their proximity to other sentences in the cluster, to avoid redundancy, until the target summary length is reached. Sentence selection is modeled as an optimization problem, which is solved using an adaptive differential evolution algorithm with a novel mutation strategy. In tests on the benchmark DUC2001 and DUC2002 datasets, the ROUGE scores of the summaries produced by the proposed approach demonstrated its validity compared with traditional sentence selection methods and the top three performing systems for DUC2001 and DUC2002.
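The two-stage structure, cluster first for topic coverage, then select within clusters, can be sketched as follows. This is a simplified illustration, not the paper's method: bag-of-words vectors and a from-scratch k-means replace the paper's representation, and "closest sentence to the centroid" stands in for the differential-evolution-based selection.

```python
import random

def vectorize(sentences):
    """Bag-of-words count vectors over the joint vocabulary."""
    vocab = sorted({w for s in sentences for w in s.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = []
    for s in sentences:
        v = [0.0] * len(vocab)
        for w in s.split():
            v[index[w]] += 1.0
        vecs.append(v)
    return vecs

def dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vecs, k, iters=20, seed=0):
    """Minimal Lloyd's algorithm returning a cluster label per vector."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vecs, k)]
    labels = [0] * len(vecs)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(v, centers[c])) for v in vecs]
        for c in range(k):
            members = [v for v, l in zip(vecs, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def two_stage_summary(sentences, k):
    """Stage 1: cluster sentences into k topics. Stage 2: from each cluster,
    keep the sentence closest to the cluster centroid as its representative."""
    vecs = vectorize(sentences)
    labels = kmeans(vecs, k)
    summary = []
    for c in range(k):
        members = [(i, v) for i, (v, l) in enumerate(zip(vecs, labels)) if l == c]
        if not members:
            continue
        center = [sum(col) / len(members) for col in zip(*(v for _, v in members))]
        best_i = min(members, key=lambda m: dist(m[1], center))[0]
        summary.append(sentences[best_i])
    return summary
```

Picking one representative per cluster is what enforces topic coverage while keeping near-duplicate sentences from the same cluster out of the summary.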


Computer and Information Science | 2009

A Summary Sentence Decomposition Algorithm for Summarizing Strategies Identification

Norisma Idris; Sapiyan Baba; Rukaini Abdullah

Expert summarizers employ a number of strategies to produce summaries. Teachers need to identify which strategies are used by students to help them improve their summary writing. However, the task is time consuming. This paper reports on our effort to develop an algorithm to identify the summarizing strategies employed by students using summary sentence decomposition. The summarizing strategies used by experts are identified and translated into a set of heuristic rules. A summary sentence decomposition algorithm is then developed based on the heuristic rules. A preliminary test was carried out and the results are discussed.
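The decomposition idea, matching a student's summary sentence against the source sentences with heuristic rules to name the strategy used, can be illustrated with a toy rule set. The paper's actual rules are richer; the three labels below (copy, deletion, combination) are a simplified stand-in chosen for illustration.

```python
def identify_strategy(summary_sentence, source_sentences):
    """Toy heuristic rules: 'copy' if the sentence matches a source sentence
    exactly; 'deletion' if its words form an in-order subsequence of one
    source sentence; 'combination' if its words are drawn from several
    source sentences; otherwise the sentence was reworded or invented."""
    words = summary_sentence.split()
    for src in source_sentences:
        if summary_sentence == src:
            return "copy"
    for src in source_sentences:
        it = iter(src.split())            # 'w in it' consumes the iterator,
        if all(w in it for w in words):   # so this checks an ordered subsequence
            return "deletion"
    pool = {w for s in source_sentences for w in s.split()}
    if all(w in pool for w in words):
        return "combination"
    return "paraphrase-or-invention"
```

Run on a short source text, the rules distinguish verbatim copying, word deletion within one sentence, and recombination of words across sentences, the kind of distinction a teacher would otherwise make by hand.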

Collaboration

Top co-authors of Norisma Idris:

Asad Abdi (Information Technology University)
Ramiz M. Aliguliyev (Azerbaijan National Academy of Sciences)
Rasim M. Alguliyev (Azerbaijan National Academy of Sciences)
Ram Gopal Raj (Information Technology University)
Ahmed Al-Hunaiyyan (The Public Authority for Applied Education and Training)
Andrew Thomas Bimba (Information Technology University)