Parth Mehta
Dhirubhai Ambani Institute of Information and Communication Technology
Publication
Featured research published by Parth Mehta.
Meeting of the Association for Computational Linguistics | 2016
Parth Mehta
The availability of large document-summary corpora has opened up new possibilities for using statistical text generation techniques for abstractive summarization. Progress in extractive text summarization has been stagnant for a while now, and in this work we compare the two possible alternatives to it. We present an argument in favor of
Journal of Quantitative Linguistics | 2016
Parth Mehta; Prasenjit Majumder
In this paper, we present a thorough quantitative analysis of large-scale media text in three Indo-Aryan languages, viz. Hindi, Gujarati and Bengali, which together account for 600 million speakers. Understanding and processing media text is very important from sociological, cultural, and information-science and information-theoretic standpoints. We conducted a detailed study of the statistical nature of these data. The study demonstrates the effect of the size and category of media text on term distributions. We establish that while higher-order n-grams tend to follow Zipf's law, the same is not always true for unigrams. We attempt to model the change in term distribution in two separate parts: the effect on the steepness of the term distribution and the effect on its tail. To the best of our knowledge, this is the first exploratory study of these three languages on such a large scale.
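Whether a corpus follows Zipf's law can be checked roughly with a rank-frequency fit. The sketch below is a minimal illustration, not the authors' procedure: it assumes already-tokenized text, counts n-grams, and estimates the Zipf exponent as the least-squares slope of log frequency against log rank (a slope near -1 indicates classic Zipfian behaviour). It addresses only the steepness side of the distribution, not the tail modelling described above, and the function names `ngrams` and `zipf_slope` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams (as tuples) of a token sequence, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def zipf_slope(tokens, n=1):
    """Least-squares slope of log(frequency) against log(rank) for n-grams.

    Under Zipf's law, frequency is roughly proportional to rank ** (-s),
    so the fitted slope approximates -s (about -1 for classic Zipf).
    """
    # Frequencies sorted from most to least common; rank 1 is the top n-gram.
    counts = sorted(Counter(ngrams(tokens, n)).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct n-grams to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(freq) for freq in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least squares: slope = cov(x, y) / var(x).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```

Comparing `zipf_slope(tokens, n=1)` with `zipf_slope(tokens, n=2)` on the same corpus gives a quick, if crude, view of how unigram and bigram distributions differ. Plain least squares in log-log space is sensitive to the sparse low-frequency tail, so maximum-likelihood estimators are a common alternative when precision matters.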
Information Processing and Management | 2018
Parth Mehta; Prasenjit Majumder
A large number of extractive summarization techniques have been developed in the past decade, but few studies have examined how these techniques differ from each other or which factors actually affect them. Such a comparison, if available, could be used to build a robust ensemble of these approaches with the potential to consistently outperform each individual summarization system. In this work we examine the roles of three principal components of an extractive summarization technique: the sentence ranking algorithm, the sentence similarity metric and the text representation scheme. We show that using a combination of several different sentence similarity measures, rather than only one, significantly improves the performance of the resulting meta-system. Even simple ensemble techniques, when used in an informed manner, prove very effective in improving the overall performance and consistency of summarization systems. Aggregating various sentence similarity measures achieved a statistically significant improvement of about 5% to 10% in ROUGE-1 recall. In contrast, aggregating several ranking algorithms did not significantly improve ROUGE scores, although even in this case the resulting meta-systems were more robust than the candidate systems. The results suggest that new extractive summarization techniques should focus in particular on defining a better sentence similarity metric, and should use multiple sentence similarity scores and ranking algorithms rather than committing to one particular combination.
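The idea of aggregating several sentence similarity measures can be illustrated with a small sketch. The abstract does not name the measures, ranking algorithms or aggregation scheme used, so everything here is an illustrative assumption: two common measures (term-frequency cosine and Jaccard overlap), max-normalisation so their scales are comparable, a simple average as the ensemble, and degree centrality (row sums of the similarity graph) as a stand-in for a graph-based sentence ranker. All function names are hypothetical.

```python
import math
from collections import Counter


def cosine_sim(a, b):
    """Cosine similarity between two token lists using raw term frequencies."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def jaccard_sim(a, b):
    """Jaccard overlap between the vocabularies of two token lists."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def similarity_matrix(sentences, sim):
    """Pairwise similarity matrix with a zeroed diagonal (no self-loops)."""
    n = len(sentences)
    return [
        [0.0 if i == j else sim(sentences[i], sentences[j]) for j in range(n)]
        for i in range(n)
    ]


def normalise(matrix):
    """Scale a matrix into [0, 1] by its largest entry so metrics are comparable."""
    peak = max(max(row) for row in matrix)
    return [[v / peak if peak else 0.0 for v in row] for row in matrix]


def aggregate(sentences, sims):
    """Average the normalised similarity matrices produced by several metrics."""
    mats = [normalise(similarity_matrix(sentences, s)) for s in sims]
    n = len(sentences)
    return [
        [sum(m[i][j] for m in mats) / len(mats) for j in range(n)]
        for i in range(n)
    ]


def rank_by_degree(matrix):
    """Rank sentence indices by degree centrality (row sums), highest first.

    Ties are broken by sentence index so the ranking is deterministic.
    """
    scores = [sum(row) for row in matrix]
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))
```

For example, `rank_by_degree(aggregate(sentences, [cosine_sim, jaccard_sim]))` ranks tokenized sentences using both measures at once. Because the similarity metrics and the ranker are separate arguments, adding a third measure or swapping in a different ranking algorithm changes only one call, which mirrors the component-wise comparison described in the abstract.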
European Conference on Information Retrieval | 2018
Parth Mehta; Prasenjit Majumder
Multi-document summarization has received a great deal of attention in the past couple of decades. Several approaches have been proposed, many of which perform equally well, and it is becoming increasingly difficult to choose one particular system over another. An ensemble of such systems that is able to leverage the strengths of each individual system can build a better and more robust summary. Despite this, few attempts have been made in this direction. In this paper, we describe a category of ensemble systems that use consensus between the candidate systems to build a better meta-summary. We highlight two major shortcomings of such systems: they do not take into account the relative performance of individual systems, and they overlook the content of candidate summaries in favour of the sentence rankings. We propose an alternative method, content-based weighted consensus summarization, which addresses these concerns. We use pseudo-relevant summaries to estimate the performance of individual candidate systems, and then use this information to generate a better aggregate ranking. Experiments on the DUC 2003 and DUC 2004 datasets show that the proposed system outperforms existing consensus-based techniques by a large margin.
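Weighting candidate systems by their estimated quality can be illustrated with a toy rank-aggregation sketch. This is not the paper's content-based weighted consensus method: it assumes each candidate system supplies a ranked list of sentence ids, that a pseudo-relevant summary is already available as a token list, and it substitutes a weighted Borda count for the aggregation step. Each system's weight is its ROUGE-1 recall (clipped unigram overlap divided by reference length) against the pseudo-reference, computed on the system's top-k sentences. All function names and the default `k=2` are hypothetical.

```python
from collections import Counter


def rouge1_recall(candidate_tokens, reference_tokens):
    """Clipped unigram overlap divided by the reference length (ROUGE-1 recall)."""
    cand, ref = Counter(candidate_tokens), Counter(reference_tokens)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    return overlap / total


def borda_scores(ranking, weight=1.0):
    """Weighted Borda points: with n items, the top item gets n-1, the last 0."""
    n = len(ranking)
    return {sid: weight * (n - 1 - pos) for pos, sid in enumerate(ranking)}


def weighted_consensus(rankings, sentences, pseudo_reference, k=2):
    """Combine per-system sentence rankings, weighting each system by how well
    its top-k summary overlaps a pseudo-relevant reference summary.

    rankings: dict of system name -> list of sentence ids, best first
    sentences: dict of sentence id -> token list
    pseudo_reference: token list standing in for a human reference (assumption)
    """
    totals = Counter()
    for ranking in rankings.values():
        # Build this system's k-sentence summary and score it against the pseudo-reference.
        summary = [tok for sid in ranking[:k] for tok in sentences[sid]]
        weight = rouge1_recall(summary, pseudo_reference)
        for sid, points in borda_scores(ranking, weight).items():
            totals[sid] += points
    # Highest total first; ties broken by sentence id for determinism.
    return sorted(totals, key=lambda sid: (-totals[sid], sid))
```

For example, if one system ranks sentences as `["s3", "s1", "s2"]` and another as `["s2", "s1", "s3"]`, an unweighted Borda count ties all three sentences at 2 points. If the first system's top-2 summary has a higher ROUGE-1 recall against the pseudo-reference (say 0.8 versus 0.4), the weighting breaks the tie in its favour and yields `s3`, `s1`, `s2`.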
CLEF (Working Notes) | 2016
Yashwant Keswani; Harsh Trivedi; Parth Mehta; Prasenjit Majumder
arXiv: Information Retrieval | 2018
Parth Mehta; Prasenjit Majumder
arXiv: Information Retrieval | 2018
Parth Mehta; Gaurav Arora; Prasenjit Majumder
Forum for Information Retrieval Evaluation | 2016
Prasenjit Majumder; Mandar Mitra; Jainisha Sankhavara; Parth Mehta
Forum for Information Retrieval Evaluation | 2014
Prasenjit Majumder; Mandar Mitra; Sukomal Pal; Madhulika Agrawal; Parth Mehta
International Joint Conference on Natural Language Processing | 2013
Parth Mehta; Prasenjit Majumder
Collaboration
Dhirubhai Ambani Institute of Information and Communication Technology