Publication


Featured research published by Naomie Salim.


Systems, Man, and Cybernetics | 2012

Understanding Plagiarism Linguistic Patterns, Textual Features, and Detection Methods

Salha Alzahrani; Naomie Salim; Ajith Abraham

Plagiarism can take many different forms, ranging from copying texts to adopting ideas, without giving credit to the originator. This paper presents a new taxonomy of plagiarism that highlights differences between literal plagiarism and intelligent plagiarism, from the plagiarist's behavioral point of view. The taxonomy supports deep understanding of different linguistic patterns in committing plagiarism, for example, changing texts into semantically equivalent ones with different words and organization, shortening texts with concept generalization and specification, and adopting ideas and important contributions of others. Different textual features that characterize different plagiarism types are discussed. Systematic frameworks and methods of monolingual, extrinsic, intrinsic, and cross-lingual plagiarism detection are surveyed and correlated with plagiarism types, which are listed in the taxonomy. We conduct an extensive study of state-of-the-art techniques for plagiarism detection, including character n-gram-based (CNG), vector-based (VEC), syntax-based (SYN), semantic-based (SEM), fuzzy-based (FUZZY), structural-based (STRUC), stylometric-based (STYLE), and cross-lingual techniques (CROSS). Our study corroborates that existing systems for plagiarism detection focus on copying text but fail to detect intelligent plagiarism when ideas are presented in different words.
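
As a rough illustration of the character n-gram-based (CNG) family mentioned in the abstract above, the following minimal sketch scores two texts by the Jaccard overlap of their character trigrams. It is not the paper's method: the function names, the trigram size, the Jaccard measure, and the example sentences are all assumptions chosen for illustration.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, whitespace-normalised text."""
    normalised = " ".join(text.lower().split())
    return {normalised[i:i + n] for i in range(len(normalised) - n + 1)}


def ngram_jaccard(text_a, text_b, n=3):
    """Jaccard overlap of two texts' character n-gram sets (hypothetical helper)."""
    grams_a, grams_b = char_ngrams(text_a, n), char_ngrams(text_b, n)
    if not grams_a and not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Illustrative sentences, not taken from any corpus.
source = "The quick brown fox jumps over the lazy dog."
literal_copy = "The quick brown fox jumps over the lazy dog."
paraphrase = "A fast, dark-coloured fox leaps above a sleepy hound."

literal_score = ngram_jaccard(source, literal_copy)
paraphrase_score = ngram_jaccard(source, paraphrase)
print(literal_score, paraphrase_score)
```

The literal copy shares every trigram with the source, while the paraphrase shares very few, which is the kind of gap between literal and intelligent plagiarism that the abstract describes.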


Journal of Chemical Information and Computer Sciences | 2003

Combination of fingerprint-based similarity coefficients using data fusion

Naomie Salim; John D. Holliday; Peter Willett

Many different types of similarity coefficients have been described in the literature. Since different coefficients take into account different characteristics when assessing the degree of similarity between molecules, it is reasonable to combine them to further optimize the measures of similarity between molecules. This paper describes experiments in which data fusion is used to combine several binary similarity coefficients to get an overall estimate of similarity for searching databases of bioactive molecules. The results show that search performances can be improved by combining coefficients with little extra computational cost. However, there is no single combination which gives a consistently high performance for all search types.
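
The idea of fusing several binary similarity coefficients can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which coefficients or fusion rule were used, so the choice of Tanimoto, cosine and Russell-Rao, the rank-sum fusion rule, the tie-breaking by name, and all identifiers and fingerprints below are hypothetical.

```python
import math


def tanimoto(a, b):
    """Tanimoto coefficient for fingerprints given as sets of on-bit positions."""
    common = len(a & b)
    denom = len(a) + len(b) - common
    return common / denom if denom else 0.0


def cosine(a, b):
    """Cosine (Ochiai) coefficient for binary fingerprints."""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


def russell_rao(a, b, n_bits):
    """Russell-Rao coefficient: common bits over fingerprint length."""
    return len(a & b) / n_bits


def ranks(scores):
    """Rank keys by descending score (1 = most similar); ties broken by name."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: position + 1 for position, name in enumerate(ordered)}


def fuse_by_rank_sum(query, database, n_bits):
    """Order database molecules by the sum of their ranks under each coefficient."""
    per_coefficient = [
        {name: tanimoto(query, fp) for name, fp in database.items()},
        {name: cosine(query, fp) for name, fp in database.items()},
        {name: russell_rao(query, fp, n_bits) for name, fp in database.items()},
    ]
    rank_tables = [ranks(scores) for scores in per_coefficient]
    fused = {name: sum(table[name] for table in rank_tables) for name in database}
    return sorted(database, key=lambda name: (fused[name], name))


# Toy fingerprints (assumed data): sets of on-bit positions in a 16-bit string.
query = {1, 2, 3, 4}
database = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 3, 4, 5, 6, 7, 8},
    "mol_c": {1, 9},
}
ranking = fuse_by_rank_sum(query, database, n_bits=16)
print(ranking)
```

Fusing ranks rather than raw scores avoids having to put coefficients with different value ranges on a common scale, at the cost of discarding the score magnitudes.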


Journal of Chemical Information and Computer Sciences | 2003

Analysis and display of the size dependence of chemical similarity coefficients

John D. Holliday; Naomie Salim; Martin Whittle; Peter Willett

We discuss the size-bias inherent in several chemical similarity coefficients when used for the similarity searching or diversity selection of compound collections. Limits to the upper bounds of 14 standard similarity coefficients are investigated, and the results are used to identify some exceptional characteristics of a few of the coefficients. An additional numerical contribution to the known size bias in the Tanimoto coefficient is identified. Graphical plots with respect to relative bit density are introduced to further assess the coefficients. Our methods reveal the asymmetries inherent in most similarity coefficients that lead to bias in selection, most notably with the Forbes and Russell-Rao coefficients. Conversely, when applied to the recently introduced Modified Tanimoto coefficient our methods provide support for the view that it is less biased toward molecular size than most. In this work we focus our discussion on fragment-based bit strings, but we demonstrate how our approach can be generalized to continuous representations.
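
One simple way to see a size effect of the kind discussed above is to compute the largest value a coefficient can take for two bit strings with given numbers of set bits. The sketch below is not the paper's analysis; it relies only on the fact that the number of common bits cannot exceed the smaller bit count, and the function names and example sizes are assumptions.

```python
def tanimoto_upper_bound(bits_a, bits_b):
    """Largest possible Tanimoto value for bit strings with bits_a and bits_b bits set.

    The common-bit count c is at most min(bits_a, bits_b), so
    c / (bits_a + bits_b - c) is at most min / max.
    """
    if bits_a == 0 or bits_b == 0:
        return 0.0
    return min(bits_a, bits_b) / max(bits_a, bits_b)


def russell_rao_upper_bound(bits_a, bits_b, n_bits):
    """Largest possible Russell-Rao value: at most min(bits_a, bits_b) common bits out of n_bits."""
    return min(bits_a, bits_b) / n_bits


# Hypothetical query with 40 bits set, compared against candidates of three sizes
# in a 1024-bit fingerprint.
query_bits = 40
for candidate_bits in (10, 40, 160):
    print(
        candidate_bits,
        tanimoto_upper_bound(query_bits, candidate_bits),
        russell_rao_upper_bound(query_bits, candidate_bits, n_bits=1024),
    )
```

In this toy setting the Tanimoto bound falls off symmetrically as the candidate becomes smaller or larger than the query, whereas the Russell-Rao bound never decreases as the candidate grows, which illustrates one kind of asymmetry between coefficients.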


Applied Soft Computing | 2012

An improved plagiarism detection scheme based on semantic role labeling

Ahmed Hamza Osman; Naomie Salim; Mohammed Salem Binwahlan; Rihab Alteeb; Albaraa Abuobieda

Plagiarism occurs when content is copied without permission or citation. One of the contributing factors is that many text documents on the internet are easily accessed and copied. This paper introduces a plagiarism detection technique based on Semantic Role Labeling (SRL). The technique analyses and compares text based on the semantic role assigned to each term in a sentence. SRL is well suited to generating the semantic arguments of each sentence. The paper also introduces a weighting of each argument generated by SRL in order to study its behaviour. It was found that not all arguments affect the plagiarism detection process. In addition, experimental results on the PAN-PC-09 data sets showed that our method significantly outperforms recent methods for plagiarism detection in terms of Recall, Precision and F-measure.


Computer Methods and Programs in Biomedicine | 2016

Arrhythmia recognition and classification using combined linear and nonlinear features of ECG signals

Fatin A. Elhaj; Naomie Salim; Arief R. Harris; Tan Tian Swee; Taquia Ahmed

Arrhythmia is a cardiac condition caused by abnormal electrical activity of the heart, and an electrocardiogram (ECG) is the non-invasive method used to detect arrhythmias or heart abnormalities. Due to the presence of noise, the non-stationary nature of the ECG signal (i.e. the changing morphology of the ECG signal with respect to time) and the irregularity of the heartbeat, physicians face difficulties in the diagnosis of arrhythmias. The computer-aided analysis of ECG results helps physicians detect cardiovascular diseases. The development of many existing arrhythmia systems has depended on the findings from linear experiments on ECG data which achieve high performance on noise-free data. However, nonlinear experiments characterize the ECG signal more effectively, extract hidden information in the ECG signal, and achieve good performance under noisy conditions. This paper investigates the representation ability of linear and nonlinear features and proposes a combination of such features in order to improve the classification of ECG data. In this study, five types of beat classes of arrhythmia as recommended by the Association for Advancement of Medical Instrumentation are analyzed: non-ectopic beats (N), supra-ventricular ectopic beats (S), ventricular ectopic beats (V), fusion beats (F) and unclassifiable and paced beats (U). The characterization ability of nonlinear features such as high order statistics and cumulants and nonlinear feature reduction methods such as independent component analysis are combined with linear features, namely, the principal component analysis of discrete wavelet transform coefficients. The features are tested for their ability to differentiate different classes of data using different classifiers, namely, the support vector machine and neural network methods with tenfold cross-validation. Our proposed method is able to classify the N, S, V, F and U arrhythmia classes with high accuracy (98.91%) using a combined support vector machine and radial basis function method.


Journal of Chemical Information and Modeling | 2010

Ligand-based virtual screening using Bayesian networks

Ammar Abdo; Beining Chen; Christoph Mueller; Naomie Salim; Peter Willett

A Bayesian inference network (BIN) provides an interesting alternative to existing tools for similarity-based virtual screening. The BIN is particularly effective when the active molecules being sought have a high degree of structural homogeneity but has been found to perform less well with structurally heterogeneous sets of actives. In this paper, we introduce an alternative network model, called a Bayesian belief network (BBN), that seeks to overcome this limitation of the BIN approach. Simulated virtual screening experiments with the MDDR, WOMBAT and MUV data sets show that the BIN and BBN methods allow effective screening searches to be carried out. However, the results obtained are not obviously superior to those obtained using a much simpler approach that is based on the use of the Tanimoto coefficient and of the square roots of fragment occurrence frequencies.
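
The "much simpler approach" mentioned at the end of the abstract above can be sketched roughly as follows. The abstract only names the Tanimoto coefficient and square roots of fragment occurrence frequencies; the use of the continuous (non-binary) form of the Tanimoto coefficient, the function name, and the fragment labels and counts are assumptions made for illustration.

```python
import math
from collections import Counter


def sqrt_weighted_tanimoto(counts_a, counts_b):
    """Continuous Tanimoto on vectors whose entries are square roots of fragment counts.

    Uses T = dot / (|a|^2 + |b|^2 - dot); this form is an assumption, since the
    abstract does not spell out the formula.
    """
    weights_a = {frag: math.sqrt(count) for frag, count in counts_a.items()}
    weights_b = {frag: math.sqrt(count) for frag, count in counts_b.items()}
    dot = sum(weights_a[f] * weights_b[f] for f in weights_a.keys() & weights_b.keys())
    norm_a = sum(w * w for w in weights_a.values())
    norm_b = sum(w * w for w in weights_b.values())
    denom = norm_a + norm_b - dot
    return dot / denom if denom else 0.0


# Hypothetical fragment occurrence counts for a reference structure and a candidate.
reference = Counter({"C-C": 4, "C=O": 1, "C-N": 1})
candidate = Counter({"C-C": 9, "C=O": 1})

similarity = sqrt_weighted_tanimoto(reference, candidate)
print(round(similarity, 3))
```

Taking square roots damps the influence of fragments that occur many times in a molecule, so a single very frequent fragment does not dominate the score.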


Journal of Cheminformatics | 2014

Chemical named entities recognition: a review on approaches and applications

Safaa Eltyeb; Naomie Salim

The rapid increase in the flow rate of published digital information in all disciplines has resulted in a pressing need for techniques that can simplify the use of this information. The chemistry literature is very rich with information about chemical entities. Extracting molecules and their related properties and activities from the scientific literature to “text mine” these extracted data and determine contextual relationships helps research scientists, particularly those in drug development. One of the most important challenges in chemical text mining is the recognition of chemical entities mentioned in the texts. In this review, the authors briefly introduce the fundamental concepts of chemical literature mining, the textual contents of chemical documents, and the methods of naming chemicals in documents. We sketch out dictionary-based, rule-based and machine learning, as well as hybrid chemical named entity recognition approaches with their applied solutions. We end with an outlook on the pros and cons of these approaches and the types of chemical entities extracted.


Information Processing and Management | 2010

Fuzzy swarm diversity hybrid model for text summarization

Mohammed Salem Binwahlan; Naomie Salim; Ladda Suanmali

A high-quality summary is the target and challenge of any automatic text summarization system. In this paper, we introduce a different hybrid model for the automatic text summarization problem. We exploit the strengths of different techniques in building our model: we use a diversity-based method to filter similar sentences and select the most diverse ones, differentiate between the more important and less important features using a swarm-based method, and use fuzzy logic so that the risk, uncertainty, ambiguity and imprecision in the text feature weights are flexibly tolerated. The diversity-based method focuses on reducing redundancy, and the other two techniques concentrate on the scoring mechanism of the sentences. We present the proposed model in two forms. In the first form of the model, diversity measures dominate the behavior of the model. In the second form, the diversity constraint is no longer imposed on the model behavior, which means the diversity-based method works the same as the fuzzy swarm-based method. The results showed that the proposed model in the second form performs better than the first form, the swarm model, the fuzzy swarm method and the benchmark methods. Overall, the results show that a combination of diversity measures, swarm techniques and fuzzy logic can generate a good summary containing the most important parts of the document.


Applied Soft Computing | 2015

A framework for multi-document abstractive summarization based on semantic role labelling

Atif Khan; Naomie Salim; Yogan Jaya Kumar

We have proposed a framework for multi-document abstractive summarization based on semantic role labeling (SRL). To the best of our knowledge, SRL has not previously been employed for abstractive summarization. Integrating a genetic algorithm with the SRL-based framework gives improved summarization results. The study focuses on two highlights, and the discussion is based on them. We propose a framework for abstractive summarization of multiple documents, which aims to select the contents of the summary not from the source document sentences but from the semantic representation of the source documents. In this framework, the contents of the source documents are represented by predicate argument structures obtained by semantic role labeling. Content selection for the summary is made by ranking the predicate argument structures based on optimized features, and language generation is used to generate sentences from the predicate argument structures. Our proposed framework differs from other abstractive summarization approaches in a few aspects. First, it employs semantic role labeling for semantic representation of text. Secondly, it analyzes the source text semantically by utilizing a semantic similarity measure in order to cluster semantically similar predicate argument structures across the text; and finally it ranks the predicate argument structures based on features weighted by a genetic algorithm (GA). The experiments in this study were carried out using DUC-2002, a standard corpus for text summarization. Results indicate that the proposed approach performs better than other summarization systems.


Journal of Chemical Information and Modeling | 2011

New fragment weighting scheme for the Bayesian inference network in ligand-based virtual screening

Ammar Abdo; Naomie Salim

Many of the conventional similarity methods assume that molecular fragments that do not relate to biological activity carry the same weight as the important ones. One possible approach to this problem is to use the Bayesian inference network (BIN), which models molecules and reference structures as probabilistic inference networks. The relationships between molecules and reference structures in the Bayesian network are encoded using a set of conditional probability distributions, which can be estimated by the fragment weighting function, a function of the frequencies of the fragments in the molecule or the reference structure as well as throughout the collection. The weighting function combines one or more fragment weighting schemes. In this paper, we have investigated five different weighting functions and present a new fragment weighting scheme. These functions were then modified to incorporate the new weighting scheme. Simulated virtual screening experiments with the MDL Drug Data Report (23) and maximum unbiased validation data sets show that the use of the new weighting scheme can provide significantly more effective screening when compared with the use of current weighting schemes.

Collaboration


Naomie Salim's collaborations.

Top Co-Authors

Faisal Saeed
Universiti Teknologi Malaysia

Yogan Jaya Kumar
Universiti Teknikal Malaysia Melaka

Atif Khan
Universiti Teknologi Malaysia

Saidah Saad
National University of Malaysia

Adekunle Isiaka Obasa
Universiti Teknologi Malaysia

Ameer Tawfik Albaham
Universiti Teknologi Malaysia