Publication


Featured research published by Deepa Gupta.


workshop on statistical machine translation | 2006

Morpho-syntactic Information for Automatic Error Analysis of Statistical Machine Translation Output

Maja Popović; Adrià de Gispert; Deepa Gupta; Patrik Lambert; Hermann Ney; José B. Mariño; Marcello Federico; Rafael E. Banchs

Evaluation of machine translation output is an important but difficult task. Over the last years, a variety of automatic evaluation measures have been studied, and some of them, such as Word Error Rate (WER), Position Independent Word Error Rate (PER), and the BLEU and NIST scores, have become widely used tools for comparing different systems as well as for evaluating improvements within one system. However, these measures do not give any details about the nature of translation errors. Therefore some analysis of the generated output is needed in order to identify the main problems and to focus the research efforts. On the other hand, human evaluation is a time-consuming and expensive task. In this paper, we investigate methods for using morpho-syntactic information for automatic evaluation: the standard error measures WER and PER are calculated on distinct word classes and forms in order to get a better idea about the nature of translation errors and possibilities for improvements.
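The idea of computing WER and PER on distinct word classes can be illustrated with a minimal sketch. It assumes tokens are already paired with POS tags; the tag names, the helper `filter_by_class`, and the particular PER formulation used here (length-penalized bag-of-words mismatch) are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Token-level Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word Error Rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def per(ref, hyp):
    """Position-independent error rate (one common formulation, assumed here)."""
    if not ref:
        raise ValueError("reference must not be empty")
    overlap = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - overlap) / len(ref)


def filter_by_class(tagged, word_class):
    """Keep only the words whose (hypothetical) POS tag equals word_class."""
    return [word for word, tag in tagged if tag == word_class]


reference = [("the", "DET"), ("cats", "NOUN"), ("sleep", "VERB"), ("quietly", "ADV")]
hypothesis = [("the", "DET"), ("cat", "NOUN"), ("quietly", "ADV"), ("sleeps", "VERB")]

ref_words = [w for w, _ in reference]
hyp_words = [w for w, _ in hypothesis]
print("overall WER:", wer(ref_words, hyp_words))
print("overall PER:", per(ref_words, hyp_words))
for cls in ("NOUN", "VERB"):
    print(cls, "WER:", wer(filter_by_class(reference, cls),
                           filter_by_class(hypothesis, cls)))
```

Comparing the overall scores with the per-class scores shows which word classes contribute errors, and the gap between WER and PER hints at how much of the error is due to word order.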


international conference natural language processing | 2006

Improving statistical word alignments with morpho-syntactic transformations

Adrià de Gispert; Deepa Gupta; Maja Popović; Patrik Lambert; José B. Mariño; Marcello Federico; Hermann Ney; Rafael E. Banchs

This paper presents a wide range of statistical word alignment experiments incorporating morphosyntactic information. By transforming the parallel corpus according to POS tagging, lemmatization, or stemming, we explore which linguistic information helps improve alignment error rates. For this, evaluation against a human word alignment reference is performed, aiming at an improved machine translation training scheme which eventually leads to improved SMT performance. Experiments are carried out on a Spanish–English European Parliament Proceedings parallel corpus, in both a large and a small data track. As expected, improvements due to introducing morphosyntactic information are bigger in cases of data scarcity, but significant improvement is also achieved in the large data task, meaning that certain linguistic knowledge is relevant even in situations of large data availability.


advances in computing and communications | 2014

Using Natural Language Processing techniques and fuzzy-semantic similarity for automatic external plagiarism detection

Deepa Gupta; Vani K; Charan Kamal Singh

Plagiarism is one of the most serious crimes in academia and research fields. In this modern era, where access to information has become much easier, the act of plagiarism is rapidly increasing. This paper focuses on the external plagiarism detection method, where a source collection of documents is available against which the suspicious documents are compared. The primary focus is to detect intelligent plagiarism cases where semantics and linguistic variations play an important role. The paper explores different preprocessing methods based on Natural Language Processing (NLP) techniques. It further explores fuzzy-semantic similarity measures for document comparisons. The system is finally evaluated using the PAN 2012 data set, and the performance of the different methods is compared.


Journal of Engineering Science and Technology Review | 2016

Study on Extrinsic Text Plagiarism Detection Techniques and Tools

Vani K; Deepa Gupta

The swift evolution of technology has made information accessible through many different means, which has opened the doors to plagiarism. In today's world of rapid technological growth, plagiarism is worsening and has become a serious concern in academia, research and many other fields. To curb this intellectual theft and to ensure academic integrity, efficient software systems to detect it are urgently needed. In this paper, a study on plagiarism is done with the focus on extrinsic text plagiarism detection, which is a fast-emerging research area in this domain. The different extrinsic detection techniques and the methodologies involved are reviewed based on the current state of the art. Further, an overview of some of the available detection software tools, their features and detection efficiency is given, with some output demos. The paper also sheds light on the popular PAN competition, which has been conducted yearly since 2009 in the plagiarism domain, and the major tasks involved in it. Finally, it attempts to identify the problems existing in available tools and the research gaps where further exploration is possible.


advances in computing and communications | 2013

Identifying the best feature combination for sentiment analysis of customer reviews

C.a Priyanka; Deepa Gupta

Opinions are increasingly available in the form of reviews and feedback on websites, blogs, and microblogs, and they influence future customers. From a human perspective, it is difficult to read and summarize all the opinions, which calls for automated and faster opinion mining to classify the reviews. In this paper, different features, namely N-gram features, POS-based features and features based on the lexicon SentiWordNet, have been investigated. A Support Vector Machine (SVM) classifier has been modeled with presence as the feature representation for classifying the reviews into positive and negative classes, thereby identifying the best feature combination. Results of experiments conducted on smart phone reviews for different feature combinations are presented. Highest accuracies of up to 92% and 95% have been obtained for the small and large datasets, respectively.
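A presence-based feature representation records only whether a feature occurs in a review, not how often. The sketch below shows this for unigram and bigram features using plain Python; the function names, the whitespace tokenizer and the toy reviews are assumptions for illustration, and the POS-based features, SentiWordNet features and the SVM classifier itself are left out.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, max_n=2):
    """Collect the set of n-grams (n = 1..max_n) occurring in a text."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    features = set()
    for n in range(1, max_n + 1):
        features.update(ngrams(tokens, n))
    return features


def build_vocabulary(reviews, max_n=2):
    """Map every n-gram seen in the training reviews to a column index."""
    all_features = set().union(*(extract_features(r, max_n) for r in reviews))
    return {feat: idx for idx, feat in enumerate(sorted(all_features))}


def presence_vector(text, vocab, max_n=2):
    """Binary vector: 1 if the feature occurs in the text, 0 otherwise."""
    vector = [0] * len(vocab)
    for feat in extract_features(text, max_n):
        if feat in vocab:
            vector[vocab[feat]] = 1
    return vector


reviews = ["good battery good screen", "bad battery"]
vocab = build_vocabulary(reviews)
for review in reviews:
    print(review, "->", presence_vector(review, vocab))
```

Note that "good" occurs twice in the first toy review but still contributes a single 1, which is what distinguishes presence from term-frequency representations. Vectors of this kind could then be passed to any binary classifier.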


Expert Systems With Applications | 2017

Detection of idea plagiarism using syntax–semantic concept extractions with genetic algorithm

Vani K; Deepa Gupta

Plagiarism is increasingly becoming a major issue in the academic and educational domains. Automated and effective plagiarism detection systems are badly needed to curtail this information breach, especially in tackling idea plagiarism. The proposed approach aims to detect such plagiarism cases, where the idea of a third party is adopted and presented intelligently so that, at the surface level, plagiarism cannot be unmasked. The reported work explores syntax-semantic concept extractions with a genetic algorithm in detecting cases of idea plagiarism. The work mainly focuses on idea plagiarism where the source ideas are plagiarized and represented in a summarized form. Plagiarism detection is employed at both the document and passage levels by exploiting the document concepts at various structural levels. Initially, the idea embedded within the given source document is captured using sentence-level concept extraction with a genetic algorithm. Document-level detection is facilitated with word-level concepts, where syntactic information is extracted and the non-plagiarized documents are pruned. A combined similarity metric that utilizes the semantic-level concept extraction is then employed for passage-level detection. The proposed approach is tested on the PAN13-14 plagiarism corpus for summary obfuscation data, which represents a challenging case of idea plagiarism. The performance of the current approach and its variations is evaluated at both the document and passage levels, using information retrieval and PAN plagiarism measures respectively. The results are also compared against six top-ranked plagiarism detection systems submitted as part of the PAN13-14 competition. The results obtained exhibit significant improvement over the compared systems and hence reflect the potency of the proposed syntax-semantic concept extractions in detecting idea plagiarism.


international conference on contemporary computing | 2015

Exploring sentiment analysis on twitter data

Manju Venugopalan; Deepa Gupta

The growing popularity of microblogging websites has transformed them into rich resources for sentiment mining. Even though opinion mining has more than a decade of research to boast about, it is mostly confined to the exploration of formal text patterns such as online reviews and news articles. Exploration of the challenges offered by informal and terse microblogging text has taken root, but there is still a long way to go. The proposed work aims at developing a hybrid model for sentiment classification that explores tweet-specific features and uses domain-independent and domain-specific lexicons to offer a domain-oriented approach, and hence to analyze and extract the consumer sentiment towards popular smart phone brands over the past few years. The experiments show that the results improve by around 2 points on average over the unigram baseline.


intelligent systems design and applications | 2012

Analysis of multimodal time series data of robotic environment

G Radhakrishnan; Deepa Gupta; R. Abhishek; Ankita Ajith; T. S. B. Sudarshan

Autonomous mobile robots equipped with an array of sensors are being increasingly deployed in disaster environments to assist rescue teams. The sensors attached to the robots send multimodal time series data about the disaster environments, which can be analyzed to extract useful information about the environment in which the robots are deployed. A set of data mining tasks that effectively cluster various robotic environments has been investigated. The effectiveness of these data mining techniques has been demonstrated using an available robotic dataset. The accuracy of the proposed technique has been measured using a manual reference cluster set.


advances in computing and communications | 2015

Investigating the impact of combined similarity metrics and POS tagging in extrinsic text plagiarism detection system

Vani K; Deepa Gupta

Plagiarism is an illicit act which has become a prime concern, mainly in educational and research domains. This deceitful act is usually referred to as intellectual theft, and it has increased swiftly with rapid technological developments and information accessibility. Thus there is an urgent need for a system or mechanism for efficient plagiarism detection. In this paper, different combined similarity metrics for extrinsic plagiarism detection are investigated, with a focus on showing the importance of combined similarity metrics over the commonly used single metrics in the plagiarism detection task. Further, the impact of utilizing part-of-speech (POS) tagging in the plagiarism detection model is analyzed. Different combinations of the four single metrics, Cosine similarity, Dice coefficient, Match coefficient and Fuzzy-Semantic measure, are used with and without POS tag information. These systems are evaluated using the PAN-2014 training and test data sets, and results are analyzed and compared using standard PAN measures, viz. recall, precision, granularity and plagdet_score.
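To make the notion of a combined similarity metric concrete, here is a minimal sketch of two of the named single metrics, Cosine similarity and Dice coefficient, plus a weighted combination. The weighted average and its default weights are assumptions for illustration; the abstract does not state how the paper combines the metrics, and the Match coefficient, Fuzzy-Semantic measure and POS information are not covered here.

```python
import math
from collections import Counter


def cosine_similarity(a_tokens, b_tokens):
    """Cosine of the angle between term-frequency vectors of two token lists."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def dice_coefficient(a_tokens, b_tokens):
    """Dice coefficient over the sets of distinct tokens."""
    a, b = set(a_tokens), set(b_tokens)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def combined_similarity(a_tokens, b_tokens, weights=(0.5, 0.5)):
    """Weighted average of the two metrics (combination scheme is an assumption)."""
    w_cos, w_dice = weights
    return (w_cos * cosine_similarity(a_tokens, b_tokens)
            + w_dice * dice_coefficient(a_tokens, b_tokens))


suspicious = "the cat sat on the mat".split()
source = "the cat lay on a mat".split()
print("cosine:  ", round(cosine_similarity(suspicious, source), 4))
print("dice:    ", round(dice_coefficient(suspicious, source), 4))
print("combined:", round(combined_similarity(suspicious, source), 4))
```

In a detection setting, a sentence or passage pair would typically be flagged when such a score exceeds a chosen threshold; the threshold would need tuning on training data.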


international conference on contemporary computing | 2014

Using K-means cluster based techniques in external plagiarism detection

Vani K; Deepa Gupta

Text document categorization is a rapidly emerging research field, where documents are identified, differentiated and classified manually or algorithmically. The paper focuses on the application of automatic text document categorization in the plagiarism detection domain. In today's world, plagiarism has become a prime concern, especially in research and educational fields. This paper aims to study and compare different methods of document categorization in external plagiarism detection. The primary focus is to explore unsupervised document categorization/clustering methods using different variations of the K-means algorithm and to compare them with the general N-gram based method and the Vector Space Model based method. Finally, the analysis and evaluation are done using a data set from PAN-2013, and performance is compared based on precision, recall and efficiency in terms of the time taken for algorithm execution.

Collaboration


Dive into Deepa Gupta's collaborations.

Top Co-Authors

Vani K
Amrita Vishwa Vidyapeetham

Sangita Khare
Amrita Vishwa Vidyapeetham

G Radhakrishnan
Amrita Vishwa Vidyapeetham

Niladri Chatterjee
Indian Institute of Technology Delhi

Manju Venugopalan
Amrita Vishwa Vidyapeetham

Amalendu Jyotishi
Amrita Vishwa Vidyapeetham

G. Veena
Amrita Vishwa Vidyapeetham