Publication


Featured research published by Vani K.


Advances in Computing and Communications | 2014

Using Natural Language Processing techniques and fuzzy-semantic similarity for automatic external plagiarism detection

Deepa Gupta; Vani K; Charan Kamal Singh

Plagiarism is one of the most serious crimes in academia and research. In this modern era, where access to information has become much easier, plagiarism is rapidly increasing. This paper focuses on external plagiarism detection, where a source collection of documents is available against which the suspicious documents are compared. The primary focus is to detect intelligent plagiarism cases, where semantic and linguistic variations play an important role. The paper explores different preprocessing methods based on Natural Language Processing (NLP) techniques. It further explores fuzzy-semantic similarity measures for document comparison. The system is finally evaluated on the PAN 2012 data set, and the performance of the different methods is compared.


Journal of Engineering Science and Technology Review | 2016

Study on Extrinsic Text Plagiarism Detection Techniques and Tools

Vani K; Deepa Gupta

The swift evolution of technology has made information accessible through many different means, which has opened the door to plagiarism. In today's world of rapid technological growth, plagiarism is worsening and has become a serious concern in academia, research and many other fields. To curb this intellectual theft and to ensure academic integrity, efficient software systems to detect it are urgently needed. This paper presents a study of plagiarism with a focus on extrinsic text plagiarism detection, a fast-emerging research area in this domain. The different extrinsic detection techniques and the methodologies involved are reviewed based on the current state of the art. Further, an overview of some of the available detection software tools, their features and their detection efficiency is given, along with some output demos. The paper also sheds light on the popular PAN competition, which has been conducted yearly since 2009 in the plagiarism domain, and the major tasks involved in it. Finally, it attempts to identify the problems in the available tools and the research gaps that remain open to further exploration.


Expert Systems With Applications | 2017

Detection of idea plagiarism using syntax–semantic concept extractions with genetic algorithm

Vani K; Deepa Gupta

Plagiarism is increasingly becoming a major issue in the academic and educational domains. Automated and effective plagiarism detection systems are direly required to curtail this information breach, especially in tackling idea plagiarism. The proposed approach aims to detect such plagiarism cases, where the idea of a third party is adopted and presented intelligently so that plagiarism cannot be unmasked at the surface level. The reported work explores syntax-semantic concept extractions with a genetic algorithm for detecting cases of idea plagiarism. The work mainly focuses on idea plagiarism where the source ideas are plagiarized and represented in a summarized form. Plagiarism detection is employed at both the document and passage levels by exploiting the document concepts at various structural levels. Initially, the idea embedded within the given source document is captured using sentence-level concept extraction with a genetic algorithm. Document-level detection is facilitated with word-level concepts, where syntactic information is extracted and the non-plagiarized documents are pruned. A combined similarity metric that utilizes semantic-level concept extraction is then employed for passage-level detection. The proposed approach is tested on the PAN13-14 plagiarism corpus for summary obfuscation data, which represents a challenging case of idea plagiarism. The performance of the current approach and its variations is evaluated at both the document and passage levels, using information retrieval and PAN plagiarism measures respectively. The results are also compared against six top-ranked plagiarism detection systems submitted as part of the PAN13-14 competition. The results obtained exhibit significant improvement over the compared systems and hence reflect the potency of the proposed syntax-semantic concept extractions in detecting idea plagiarism.


Advances in Computing and Communications | 2015

Investigating the impact of combined similarity metrics and POS tagging in extrinsic text plagiarism detection system

Vani K; Deepa Gupta

Plagiarism is an illicit act which has become a prime concern, mainly in the educational and research domains. This deceitful act is usually referred to as intellectual theft, and it has swiftly increased with rapid technological developments and information accessibility. Thus a system or mechanism for efficient plagiarism detection is urgently needed. In this paper, different combined similarity metrics for extrinsic plagiarism detection are investigated, with a focus on the importance of combined similarity metrics over the commonly used single metric in the plagiarism detection task. Further, the impact of utilizing part of speech (POS) tagging in the plagiarism detection model is analyzed. Different combinations of the four single metrics, Cosine similarity, Dice coefficient, Match coefficient and Fuzzy-Semantic measure, are used with and without POS tag information. These systems are evaluated using the PAN-2014 training and test data sets, and the results are analyzed and compared using the standard PAN measures, viz., recall, precision, granularity and plagdet_score.
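
As an illustration of what combining single metrics can look like, the following minimal sketch computes Cosine similarity and the Dice coefficient over word tokens and averages them. The averaging rule, the whitespace tokenizer, and all function names are assumptions made for this example; the abstract does not state how the paper combines its metrics, and the Match coefficient and Fuzzy-Semantic measure are not covered here.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def cosine_similarity(tokens_a, tokens_b):
    # Cosine of the angle between the two term-frequency vectors.
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dice_coefficient(tokens_a, tokens_b):
    # Dice coefficient on the sets of distinct tokens: 2|A ∩ B| / (|A| + |B|).
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def combined_similarity(tokens_a, tokens_b):
    # Assumption: combine the two metrics with an unweighted mean.
    return (cosine_similarity(tokens_a, tokens_b)
            + dice_coefficient(tokens_a, tokens_b)) / 2


if __name__ == "__main__":
    source = tokenize("plagiarism is an illicit act in research")
    suspicious = tokenize("plagiarism is a deceitful act in academia")
    print(round(combined_similarity(source, suspicious), 3))
```

A weighted combination, or a rule that requires every metric to pass its own threshold, would be an equally plausible design; the unweighted mean is used here only because it is the simplest to read.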


International Conference on Contemporary Computing | 2014

Using K-means cluster based techniques in external plagiarism detection

Vani K; Deepa Gupta

Text document categorization is a rapidly emerging research field, in which documents are identified, differentiated and classified manually or algorithmically. The paper focuses on the application of automatic text document categorization in the plagiarism detection domain. In today's world, plagiarism has become a prime concern, especially in the research and educational fields. This paper studies and compares different methods of document categorization in external plagiarism detection. The primary focus is to explore unsupervised document categorization (clustering) methods using different variations of the K-means algorithm and to compare them with the general N-gram based method and the Vector Space Model based method. Finally, the analysis and evaluation are done using a data set from PAN-2013, and performance is compared based on precision, recall and efficiency in terms of the time taken for algorithm execution.
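
To make the clustering idea concrete, here is a minimal sketch of plain K-means over term-frequency vectors. It is not one of the K-means variations studied in the paper: the raw term-frequency features, the deterministic initialization from the first k documents, and all names are assumptions chosen to keep the example small and reproducible.

```python
from collections import Counter


def tf_vector(text, vocab):
    # Term-frequency vector of a document over a fixed vocabulary.
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, max_iterations=20):
    # Assumption: the first k points serve as initial centroids (deterministic).
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels


docs = [
    "plagiarism detection text",
    "protein gene sequence",
    "text plagiarism detection system",
    "gene sequence protein cell",
]
vocab = sorted({word for doc in docs for word in doc.lower().split()})
labels = kmeans([tf_vector(doc, vocab) for doc in docs], k=2)

if __name__ == "__main__":
    print(labels)
```

In a candidate retrieval setting, a suspicious document would then only be compared against source documents that fall in the same cluster, which is where the time savings over exhaustive comparison would come from.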


Information Processing and Management | 2018

Unmasking text plagiarism using syntactic-semantic based natural language processing techniques: Comparisons, analysis and challenges

Vani K; Deepa Gupta

The proposed work aims to explore and compare the potency of syntactic-semantic linguistic structures in plagiarism detection using natural language processing techniques. The current work explores linguistic features, viz., part of speech tags, chunks and semantic roles, in detecting plagiarized fragments and utilizes a combined syntactic-semantic similarity metric, which extracts the semantic concepts from the WordNet lexical database. The linguistic information is utilized for effective pre-processing and for obtaining semantically relevant comparisons. Another major contribution is the analysis of the proposed approach on plagiarism cases of various complexity levels. The impact of plagiarism types and complexity levels on the extracted features is analyzed and discussed. Further, unlike existing systems, which were evaluated on limited data sets, the proposed approach is evaluated on a larger scale using the plagiarism corpora provided by the PAN competitions (http://pan.webis.de) from 2009 to 2014. The approach showed considerable improvement in comparison with the top-ranked systems of the respective years. The evaluation and analysis with various cases of plagiarism also reflected the superiority of deeper linguistic features for identifying manually plagiarized data.


Expert Systems With Applications | 2017

Text plagiarism classification using syntax based linguistic features

Vani K; Deepa Gupta

An approach that utilizes minimal and effective syntax-based linguistic features, extracted using shallow natural language processing techniques, for plagiarism classification. A two-phase feature selection approach that identifies a minimal and best set of features for plagiarism classification. A detailed analysis of the impact and dependencies of plagiarism types and complexities on the extracted features.

The proposed work models document-level text plagiarism detection as a binary classification problem, where the task is to classify a given suspicious-source document pair as plagiarized or non-plagiarized. The objective is to explore the potency of syntax-based linguistic features, extracted using shallow natural language processing techniques, for the plagiarism classification task. Shallow syntactic features, viz., part of speech tags and chunks, are utilized after effective pre-processing and filtering to prune irrelevant information. The work further proposes modelling this classification phase as an intermediate stage, placed after candidate source retrieval and before exhaustive passage-level detection. A two-phase feature selection approach is proposed, which improves the effectiveness of classification by selecting an appropriate set of features as the input to machine learning based classifiers. The proposed approach is evaluated under smaller and larger test conditions using the corpus of plagiarized short answers (PSA) and plagiarism instances collected from the PAN corpus respectively. Under both test conditions, performance is evaluated using general as well as advanced classification metrics. Another main contribution of the current work is the analysis of the dependencies and impact of the extracted features on the type and complexity of plagiarism imposed in the documents. The results are compared with two state-of-the-art approaches, and the proposed approach outperforms these baselines significantly. This in turn reflects the cogency of syntactic linguistic features in document-level plagiarism classification, especially for instances close to manual or real plagiarism scenarios.


Intelligent Systems Technologies and Applications | 2016

Exploration of Fuzzy C Means Clustering Algorithm in External Plagiarism Detection System

N. Riya Ravi; Vani K; Deepa Gupta

With the advent of the World Wide Web, plagiarism has become a prime issue in academia. A plagiarized document may contain content from a number of sources available on the web, and it is beyond any individual to detect such plagiarism manually. This paper focuses on the exploration of soft clustering, viz., the Fuzzy C Means algorithm, in the candidate retrieval stage of the external plagiarism detection task. Partial data sets from the PAN 2013 corpus are used for the evaluation of the system, and the results are compared with existing approaches, viz., N-gram and K Means clustering. The performance of the systems is measured and compared using the standard measures of precision and recall.
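
The difference between soft and hard clustering can be sketched with the standard Fuzzy C Means membership formula, in which a point's membership in cluster i is 1 / Σ_k (d_i / d_k)^(2/(m-1)), where d_i is its distance to centroid i and m > 1 is the fuzzifier. The snippet below is only that one step, under assumed inputs: the function name, the Euclidean distance, the fixed centroids and m = 2 are illustrative choices, not details taken from the paper.

```python
import math


def fcm_memberships(point, centroids, m=2.0):
    # Degree of membership of `point` in each cluster, using the standard
    # Fuzzy C Means formula. Unlike K-means, every cluster gets a share.
    distances = [math.dist(point, c) for c in centroids]
    # A point sitting exactly on one or more centroids belongs fully to them.
    zeros = [d == 0 for d in distances]
    if any(zeros):
        return [1.0 / sum(zeros) if z else 0.0 for z in zeros]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


if __name__ == "__main__":
    # Hypothetical 2-D document embedding and two cluster centroids.
    print(fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)]))
```

Because the memberships sum to one, a document lying between two clusters can be kept as a candidate for both, which is the property that makes soft clustering attractive for candidate retrieval.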


Journal of the Association for Information Science and Technology | 2018

Integrating syntax-semantic-based text analysis with structural and citation information for scientific plagiarism detection

Vani K; Deepa Gupta

The objective of the work is to explore the potency of integrating structural and citation information with effective syntax-semantic text-based analysis for scientific plagiarism detection. One of the major limitations of today's plagiarism checkers is their sole dependence on text-based detection, which ignores citation and structural information. Further, the text-based detection approaches that they employ usually fail to trace intelligent manipulations. In the proposed work, a plagiarism detection system is presented that couples various modules, namely, logical structure classification and citation parsing, two-stage candidate document selection, and syntax-semantic-based exhaustive passage-level analysis, with plagiarism analysis using structural and citation information. Further, a new plagiarism score, namely, the weighted overall similarity index, is proposed as opposed to the general plagiarism scores. The proposed approach is evaluated on the data set created by Alzahrani et al. ( ), which contains scientific publications imposed with various plagiarism complexities. The final system results are compared against a potential baseline approach. The proposed approach exhibits considerable improvement over the comparative baseline, and hence reflects the potency of syntax-semantic text-based analysis with structural and citation information.


Journal of Engineering Science and Technology | 2016

Plagiarism detection in text documents using sentence bounded stop word n-grams

Deepa Gupta; Vani K; L.M. Leema

Collaboration


Dive into Vani K's collaborations.

Top Co-Authors

Deepa Gupta

Amrita Vishwa Vidyapeetham

N. Riya Ravi

Amrita Vishwa Vidyapeetham
