Salha Alzahrani
Taif University
Publication
Featured research published by Salha Alzahrani.
systems man and cybernetics | 2012
Salha Alzahrani; Naomie Salim; Ajith Abraham
Plagiarism takes many forms, ranging from copying text to adopting ideas, without giving credit to the originator. This paper presents a new taxonomy of plagiarism that highlights differences between literal plagiarism and intelligent plagiarism, from the plagiarist's behavioral point of view. The taxonomy supports deep understanding of the different linguistic patterns used in committing plagiarism, for example, changing texts into semantically equivalent ones with different words and organization, shortening texts with concept generalization and specification, and adopting the ideas and important contributions of others. Different textual features that characterize different plagiarism types are discussed. Systematic frameworks and methods of monolingual, extrinsic, intrinsic, and cross-lingual plagiarism detection are surveyed and correlated with the plagiarism types listed in the taxonomy. We conduct an extensive study of state-of-the-art techniques for plagiarism detection, including character n-gram-based (CNG), vector-based (VEC), syntax-based (SYN), semantic-based (SEM), fuzzy-based (FUZZY), structural-based (STRUC), stylometric-based (STYLE), and cross-lingual techniques (CROSS). Our study corroborates that existing systems for plagiarism detection focus on copying text but fail to detect intelligent plagiarism when ideas are presented in different words.
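As a rough illustration of the character n-gram (CNG) family named in this abstract, the sketch below compares two texts by the overlap of their character trigrams. It is not taken from the paper: the function names, the trigram size, and the use of Jaccard overlap as the measure are all assumptions chosen for brevity.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of a lowercased, whitespace-normalised text."""
    normalised = " ".join(text.lower().split())
    return {normalised[i:i + n] for i in range(len(normalised) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of the two n-gram sets (an assumed, illustrative measure)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty texts are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    source = "the cat sat on the mat"
    literal_copy = "The cat sat on the mat"
    paraphrase = "a feline rested upon the rug"
    print(ngram_similarity(source, literal_copy))
    print(ngram_similarity(source, paraphrase))
```

A literal copy scores at the top of the range while the paraphrase scores low, which matches the abstract's point that surface-matching techniques handle copied text but miss ideas restated in different words.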
international conference on applications of digital information and web technologies | 2009
Salha Alzahrani; Naomie Salim
As Arabic is one of the richest human languages in terms of word construction and diversity of meanings, judging similarity among statements in Arabic documents is complex. In this paper, we present a mechanism for gauging the similarity of Arabic documents using a fuzzy IR model. The similarity degree of two documents is the averaged similarity among statements that are treated as equal even though they have been restructured or reworded. We introduce fuzzy similarity sets such as near duplicate, very similar, similar, slightly similar, dissimilar and very dissimilar. These similarity sets can be implemented as a spectrum of values ranging from 1 (duplicate) to 0 (different). We built a corpus in which all stop words were removed and non-stop words were stemmed using typical Arabic IR techniques. The corpus has 100 documents with 4477 statements and 54346 non-stop-word, stemmed words in total. Another 15 query documents with 303 statements and 1620 words were specifically constructed for our test. Experimental results show that fuzzy IR can be used to define the extent to which documents are similar or dissimilar, where similarity can be mapped to one of the proposed fuzzy sets. The performance of our fuzzy IR system, measured in fuzzy precision and fuzzy recall, shows that it outperforms Boolean IR in retrieving more documents that have similar content but with different synonyms.
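The mapping from a similarity score to one of the named sets can be sketched as follows. This is a simplification under stated assumptions: the cut-off values are hypothetical (the abstract does not give the ranges), and crisp thresholds stand in for proper fuzzy membership functions. The function names are illustrative only.

```python
from statistics import mean

# Hypothetical lower bounds for each set; the actual ranges are not given in the abstract.
SIMILARITY_SETS = [
    (0.9, "near duplicate"),
    (0.7, "very similar"),
    (0.5, "similar"),
    (0.3, "slightly similar"),
    (0.1, "dissimilar"),
    (0.0, "very dissimilar"),
]


def label_similarity(score: float) -> str:
    """Map a score in [0, 1] to the first set whose lower bound it reaches."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 (different) and 1 (duplicate)")
    for lower_bound, label in SIMILARITY_SETS:
        if score >= lower_bound:
            return label
    return SIMILARITY_SETS[-1][1]


def document_similarity(statement_scores: list[float]) -> float:
    """Average the per-statement similarity scores of a document pair."""
    return mean(statement_scores) if statement_scores else 0.0


if __name__ == "__main__":
    scores = [1.0, 0.5]  # made-up per-statement similarities
    overall = document_similarity(scores)
    print(overall, label_similarity(overall))
```

Averaging per-statement scores follows the abstract's description of document similarity; how each statement score is computed is left out of this sketch.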
world congress on information and communication technologies | 2011
Salha Alzahrani; Naomie Salim; Ajith Abraham; Vasile Palade
Existing anti-plagiarism tools are, in fact, text matching systems that do not make accurate judgments about plagiarism. Texts that are acceptably redundant and texts that are properly cited are all highlighted as plagiarism, and the real decision about plagiarism is left to the user. To reduce human input and to allow greater reliance on automatic plagiarism detectors, we propose an Intelligent Plagiarism Reasoner (iPlag), which works by combining several analytical procedures. Scholarly documents under investigation are segmented into a logical tree-structured representation using a procedure called D-SEGMENT. Statistical methods are utilised to assign numerical weights to structural components under a technique called C-WEIGHT. Relevance ranking (R-RANK) and plagiarism screening (P-SCREEN) approaches are adjusted to incorporate structural weights, citation evidence, and syntax-based and semantic-based methods into plagiarism detection results. We encourage current plagiarism detection systems to adapt the proposed framework.
Online Information Review | 2015
Taiseer Abdalla Elfadil Eisa; Naomie Salim; Salha Alzahrani
Purpose – The purpose of this paper is to analyse the state-of-the-art techniques used to detect plagiarism in terms of their limitations, features, taxonomies and processes. Design/methodology/approach – The method used to execute this study consisted of a comprehensive search for relevant literature via six online database repositories, namely IEEE Xplore, ACM Digital Library, ScienceDirect, EI Compendex, Web of Science and Springer, using search strings obtained from the subject of discussion. Findings – The findings revealed that existing plagiarism detection techniques require further enhancement, as they are incapable of efficiently detecting plagiarised ideas, figures, tables, formulas and scanned documents. Originality/value – The contribution of this study lies in exposing the current trends in plagiarism detection research and identifying areas where further improvements are required to complement the performance of existing techniques.
intelligent systems design and applications | 2010
Salha Alzahrani; Naomie Salim; Chow Kok Kent; Mohammed Salem Binwahlan; Ladda Suanmali
This work presents the design and development of a web-based system that supports cross-language similarity analysis and plagiarism detection. A suspicious document dq in a language Lq is submitted to the system via a PHP web-based interface. The system accepts the text either by upload or by pasting it directly into a text area. In order to lighten large texts and provide an ideal set of queries, we introduce the idea of query document reduction via summarisation. Our proposed system utilises a fuzzy swarm-based summarisation tool originally built in Java. Then, the summary is used as a query to find similar web resources in languages Lx other than Lq via a dictionary-based translation. Thereafter, a detailed similarity analysis across the languages Lq and Lx is performed and a user-friendly report of results is produced. The report includes a global similarity score for the whole document, which assures high flexibility of utilisation.
asia information retrieval symposium | 2013
Salha Alzahrani
The aim of this paper is to give a detailed and explicit account of the design, composition and documentation of a new Arabic News Corpus (ArNeCo). We used RSS feeds from Google News as a large container of article titles, and crawled the web to extract the text. About 11,000 documents with more than 6 million words were tagged as belonging to one of 6 domains: Business, Entertainment, Health, Science-Technology, Sports, and World. Metadata has been added to the corpus as a whole and to each domain independently. The developed corpus, called ArNeCo, has been analysed to ensure that it is of considerable quality and quantity, and has been published on the Internet for research purposes. This article aims to help potential users of ArNeCo to understand the nature of the corpus and to carry out information retrieval research in many ways, such as in the formulation of queries, justification of decisions taken or interpretation of results gained. Besides the corpus, this article presents a method for developing corpora that can keep track of recent natural language texts posted on the Internet by using RSS feeds.
international conference on intelligent computing | 2017
Taiseer Abdalla Elfadil Eisa; Naomie Salim; Salha Alzahrani
Plagiarism is the process of copying someone else’s text or figure verbatim or without due recognition of the source. Many techniques have been proposed for detecting plagiarism in texts, but only a few exist for detecting figure plagiarism. This paper focuses on detecting plagiarism in scientific figures. Existing techniques are not applicable to figures. Detecting plagiarism in figures requires extracting information from their components to enable comparison between figures. Consequently, a content-based figure plagiarism detection technique is proposed and evaluated in light of the existing limitations. The proposed technique is based on feature extraction and similarity computation methods. The feature extraction method is capable of extracting contextual features of figures to aid understanding of the components contained in them, while the similarity detection method is capable of categorizing a figure as either plagiarized or non-plagiarized depending on a threshold value. Empirical results showed that the proposed technique was accurate and scalable.
2017 6th ICT International Student Project Conference (ICT-ISPC) | 2017
Taiseer Abdalla Elfadil Eisa; Naomie Salim; Salha Alzahrani
In an academic environment, plagiarism is the process of copying someone else's text, idea or data verbatim or without due recognition of the source, which is a serious academic offence. Many techniques have been proposed in the literature for detecting plagiarism in texts, but only a few techniques exist for detecting figure plagiarism. The main problem associated with existing techniques of plagiarism detection is that they are not applicable to non-textual elements of figures in research publications. This paper focuses on detecting plagiarism in scientific figures. Figure plagiarism detection techniques based on textual-reference representation are proposed and evaluated in light of existing limitations. The proposed techniques use enhanced feature extraction, such as textual features, and similarity computation methods, such as similarity based on the textual references of figures. The enhanced feature extraction method was found to be capable of extracting textual references such as captions and description texts. The similarity detection method was capable of categorising a given figure as either plagiarised or non-plagiarised from a source collection of scientific publications, depending on a certain threshold value. Results showed that the proposed technique achieved a precision of 0.78 and a recall of 0.67.
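The threshold-based categorisation and the precision and recall measures mentioned in this abstract can be sketched as below. The figure identifiers, similarity scores, threshold and ground truth are all made up for illustration and are not data from the paper; the function names are likewise hypothetical.

```python
def classify(scores: dict[str, float], threshold: float) -> set[str]:
    """Flag every figure whose similarity score reaches the threshold as plagiarised."""
    return {figure for figure, score in scores.items() if score >= threshold}


def precision_recall(predicted: set[str], actual: set[str]) -> tuple[float, float]:
    """Precision = correct flags / all flags; recall = correct flags / all true cases."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical similarity of each suspicious figure to its best source match.
    scores = {"fig1": 0.9, "fig2": 0.6, "fig3": 0.2, "fig4": 0.8}
    truly_plagiarised = {"fig1", "fig3", "fig4"}  # hypothetical ground truth
    flagged = classify(scores, threshold=0.5)
    print(precision_recall(flagged, truly_plagiarised))
```

Raising the threshold generally trades recall for precision, which is why the abstract ties the categorisation to "a certain threshold value".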
cross-language evaluation forum | 2010
Salha Alzahrani; Naomie Salim
Journal of the Association for Information Science and Technology | 2012
Salha Alzahrani; Vasile Palade; Naomie Salim; Ajith Abraham