Parth Mehta

Dhirubhai Ambani Institute of Information and Communication Technology

Publication

Featured research published by Parth Mehta.

Meeting of the Association for Computational Linguistics | 2016

From Extractive to Abstractive Summarization: A Journey

Parth Mehta

The availability of large document-summary corpora has opened up new possibilities for using statistical text generation techniques for abstractive summarization. Progress in extractive text summarization has been stagnant for a while now, and in this work we compare the two possible alternatives to it. We present an argument in favor of

Journal of Quantitative Linguistics | 2016

Large Scale Quantitative Analysis of three Indo-Aryan Languages

Parth Mehta; Prasenjit Majumder

In this paper, we present a thorough quantitative analysis of large-scale media text in three Indo-Aryan languages, viz. Hindi, Gujarati and Bengali. Together they account for 600 million speakers. Understanding and processing media text is very important from sociological, cultural and information science/theoretic standpoints. We did a detailed study to understand the statistical nature of these data. The study demonstrates the effect of the size and category of media text on term distributions. We establish that while higher-order n-grams tend to follow Zipf’s law, the same is not always true for unigrams. We attempt to model the change in term distribution in two separate parts: the effect on the steepness of the term distribution and the effect on its tail. To the best of our knowledge, this is the first exploratory study of these three languages on such a large scale.

Information Processing and Management | 2018

Effective aggregation of various summarization techniques

Parth Mehta; Prasenjit Majumder

A large number of extractive summarization techniques have been developed in the past decade, but very few enquiries have been made into how these differ from each other or which factors actually affect these systems. Such a meaningful comparison, if available, can be used to create a robust ensemble of these approaches, which has the potential to consistently outperform each individual summarization system. In this work we examine the roles of three principal components of an extractive summarization technique: the sentence ranking algorithm, the sentence similarity metric and the text representation scheme. We show that using a combination of several different sentence similarity measures, rather than only one, significantly improves the performance of the resultant meta-system. Even simple ensemble techniques, when used in an informed manner, prove to be very effective in improving the overall performance and consistency of summarization systems. A statistically significant improvement of about 5% to 10% in ROUGE-1 recall was achieved by aggregating various sentence similarity measures. In contrast, aggregating several ranking algorithms did not show a significant improvement in ROUGE score, but even in this case the resultant meta-systems were more robust than the candidate systems. The results suggest that new extractive summarization techniques should particularly focus on defining a better sentence similarity metric, and should use multiple sentence similarity scores and ranking algorithms rather than one particular combination.

European Conference on Information Retrieval | 2018

Content Based Weighted Consensus Summarization

Parth Mehta; Prasenjit Majumder

Multi-document summarization has received a great deal of attention in the past couple of decades. Several approaches have been proposed, many of which perform equally well, and it is becoming increasingly difficult to choose one particular system over another. An ensemble of such systems that is able to leverage the strengths of each individual system can build a better and more robust summary. Despite this, few attempts have been made in this direction. In this paper, we describe a category of ensemble systems which use consensus between the candidate systems to build a better meta-summary. We highlight two major shortcomings of such systems: the inability to take into account the relative performance of individual systems, and overlooking the content of candidate summaries in favour of the sentence rankings. We propose an alternative method, content-based weighted consensus summarization, which addresses these concerns. We use pseudo-relevant summaries to estimate the performance of individual candidate systems, and then use this information to generate a better aggregate ranking. Experiments on the DUC 2003 and DUC 2004 datasets show that the proposed system outperforms existing consensus-based techniques by a large margin.

CLEF (Working Notes) | 2016

Author Masking through Translation.

Yashwant Keswani; Harsh Trivedi; Parth Mehta; Prasenjit Majumder

arXiv: Information Retrieval | 2018

Exploiting local and global performance of candidate systems for aggregation of summarization techniques

Parth Mehta; Prasenjit Majumder

arXiv: Information Retrieval | 2018

Attention based Sentence Extraction from Scientific Articles using Pseudo-Labeled data.

Parth Mehta; Gaurav Arora; Prasenjit Majumder

Forum for Information Retrieval Evaluation | 2016

Proceedings of the 8th annual meeting of the Forum on Information Retrieval Evaluation

Prasenjit Majumder; Mandar Mitra; Jainisha Sankhavara; Parth Mehta

Forum for Information Retrieval Evaluation | 2014

Proceedings of the 7th Forum for Information Retrieval Evaluation

Prasenjit Majumder; Mandar Mitra; Sukomal Pal; Madhulika Agrawal; Parth Mehta

International Joint Conference on Natural Language Processing | 2013

Optimum Parameter Selection for K.L.D. Based Authorship Attribution in Gujarati

Parth Mehta; Prasenjit Majumder

Collaboration

Dive into Parth Mehta's collaborations.

Top Co-Authors

Prasenjit Majumder

Dhirubhai Ambani Institute of Information and Communication Technology

Mandar Mitra

Indian Statistical Institute

Madhulika Agrawal

Dhirubhai Ambani Institute of Information and Communication Technology

Gaurav Arora

Indian Institute of Chemical Technology

Harsh Trivedi

Dhirubhai Ambani Institute of Information and Communication Technology

Sukomal Pal

Indian School of Mines
