Publication


Featured research published by Pawan Goyal.


Conference on Information and Knowledge Management | 2015

Extracting Situational Information from Microblogs during Disaster Events: a Classification-Summarization Approach

Koustav Rudra; Subham Ghosh; Niloy Ganguly; Pawan Goyal; Saptarshi Ghosh

Microblogging sites like Twitter have become important sources of real-time information during disaster events. A significant amount of valuable situational information is available on these sites; however, this information is buried among hundreds of thousands of tweets, mostly containing the sentiments and opinions of the masses, that are posted during such events. To utilize microblogging sites effectively during disaster events, it is necessary to (i) extract the situational information from among the large amounts of sentiment and opinion, and (ii) summarize the situational information, to help decision-making processes when time is critical. In this paper, we develop a novel framework which first classifies tweets to extract situational information, and then summarizes the information. The proposed framework takes into consideration the typical characteristics of disaster events, where (i) the same tweet often contains a mixture of situational and non-situational information, and (ii) certain numerical information, such as the number of casualties, varies rapidly with time. It thus achieves superior performance compared to state-of-the-art tweet summarization approaches.


ACM/IEEE Joint Conference on Digital Libraries | 2014

Towards a stratified learning approach to predict future citation counts

Tanmoy Chakraborty; Suhansanu Kumar; Pawan Goyal; Niloy Ganguly; Animesh Mukherjee

In this paper, we study the problem of predicting the future citation count of a scientific article a given time interval after its publication. To this end, we gather and conduct an exhaustive analysis of a dataset of more than 1.5 million scientific papers from the computer science domain. On analysis of the dataset, we notice that the citation counts of the articles over the years follow a diverse set of patterns; on closer inspection we identify six broad categories of citation patterns. This important observation motivates us to adopt a stratified learning approach in the prediction task, whereby we propose a two-stage prediction model: in the first stage, the model maps a query paper into one of the six categories, and in the second stage a regression module is run only on the subpopulation corresponding to that category to predict the future citation count of the query paper. Experimental results show that the categorization of this huge dataset during the training phase leads to a remarkable improvement (around 50%) in comparison to the well-known baseline system.
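
The two-stage idea in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the nearest-centroid classifier, the through-the-origin regression, the two category names (the abstract mentions six categories without naming them), and the toy numbers.

```python
def nearest_category(profile, centroids):
    """Stage 1 (assumed classifier): pick the category whose centroid is
    closest to the paper's early citation profile (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda name: sq_dist(profile, centroids[name]))


def fit_growth_ratios(training):
    """Stage 2 training (assumed regressor): for each category, fit a
    least-squares slope through the origin mapping early citation totals
    to future citation counts, using only that category's papers."""
    by_cat = {}
    for cat, early_total, future in training:
        by_cat.setdefault(cat, []).append((early_total, future))
    ratios = {}
    for cat, pairs in by_cat.items():
        num = sum(e * f for e, f in pairs)
        den = sum(e * e for e, _ in pairs)
        ratios[cat] = num / den if den else 0.0
    return ratios


def predict(profile, centroids, ratios):
    """Run stage 1, then apply only the regressor of the chosen category."""
    cat = nearest_category(profile, centroids)
    return cat, ratios[cat] * sum(profile)


# Hypothetical toy data: yearly citation counts for the first three years.
CENTROIDS = {"early_peak": [5, 3, 1], "late_riser": [1, 3, 5]}
TRAINING = [
    ("early_peak", 9, 12), ("early_peak", 10, 13),
    ("late_riser", 9, 30), ("late_riser", 8, 26),
]
RATIOS = fit_growth_ratios(TRAINING)
```

Calling `predict([1, 2, 6], CENTROIDS, RATIOS)` routes the query paper to the hypothetical `late_riser` category and scales its early total by that category's slope only, which is the point of stratifying before regressing.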


IEEE Transactions on Knowledge and Data Engineering | 2013

A Context-Based Word Indexing Model for Document Summarization

Pawan Goyal; Laxmidhar Behera; Tm McGinnity

Existing models for document summarization mostly use the similarity between sentences in the document to extract the most salient sentences. The documents as well as the sentences are indexed using traditional term indexing measures, which do not take the context into consideration. Therefore, the sentence similarity values remain independent of the context. In this paper, we propose a context-sensitive document indexing model based on the Bernoulli model of randomness. The Bernoulli model of randomness has been used to find the probability of the co-occurrences of two terms in a large corpus. A new approach using the lexical association between terms to give a context-sensitive weight to the document terms has been proposed. The resulting indexing weights are used to compute the sentence similarity matrix. The proposed sentence similarity measure has been used with the baseline graph-based ranking models for sentence extraction. Experiments have been conducted over the benchmark DUC data sets and it has been shown that the proposed Bernoulli-based sentence similarity model provides consistent improvements over the baseline IntraLink and UniformLink methods [1].
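
One plausible reading of "probability of the co-occurrences of two terms" under a Bernoulli model is sketched below. The abstract does not give the formula, so the binomial form, the function names, and the use of a negative log-probability as an association score are all assumptions for illustration.

```python
from math import comb, log2


def cooccurrence_probability(n1, n2, k, total_docs):
    """Chance probability that term 1 (present in n1 documents) shares
    exactly k documents with term 2 (present in n2 documents).

    Assumption: each of term 1's documents is an independent Bernoulli
    trial that contains term 2 with probability p = n2 / total_docs.
    """
    p = n2 / total_docs
    return comb(n1, k) * p ** k * (1 - p) ** (n1 - k)


def association_score(n1, n2, k, total_docs):
    """Assumed association measure: the less likely the observed
    co-occurrence is by chance, the higher the score (in bits)."""
    return -log2(cooccurrence_probability(n1, n2, k, total_docs))
```

With these definitions, two terms that co-occur far more often than chance predicts receive a high score, which could then feed a context-sensitive term weight of the kind the abstract describes.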


ACM Conference on Hypertext | 2016

Summarizing Situational Tweets in Crisis Scenario

Koustav Rudra; Siddhartha Banerjee; Niloy Ganguly; Pawan Goyal; Muhammad Imran; Prasenjit Mitra

During mass convergence events such as natural disasters, microblogging platforms like Twitter are widely used by affected people to post situational awareness messages. These crisis-related messages are spread across multiple categories, such as infrastructure damage and information about missing, injured, and dead people. The challenge here is to extract important situational updates from these messages, assign them appropriate informational categories, and finally summarize the large trove of information in each category. In this paper, we propose a novel framework which first assigns tweets to different situational classes and then summarizes those tweets. In the summarization phase, we propose a two-stage summarization framework which first extracts a set of important tweets from the whole set of information through an integer linear programming (ILP) based optimization technique and then follows a word graph and content word based abstractive summarization technique to produce the final summary. Our method is time and memory efficient and outperforms the baseline in terms of quality, coverage of events, locations, etc., effectiveness, and utility in disaster scenarios.


Proceedings of the 3rd International Symposium on Sanskrit Computational Linguistics | 2008

Translation Divergence in English-Sanskrit-Hindi Language Pairs

Pawan Goyal; R. Mahesh K. Sinha

The development of a machine translation system requires identifying the patterns of divergence between two languages. Though a number of MT developers have given attention to this problem, it is difficult to derive general strategies which can be used for any language pair. Therefore, further exploration is always needed to identify different sources of translation divergence in different pairs of translation languages. In this paper, we discuss translation patterns between English-Sanskrit and Hindi-Sanskrit for various constructions to identify the divergence in English-Sanskrit-Hindi language pairs. This will enable us to come up with strategies to handle these situations and produce correct translations. The basis has been the classification of translation divergence presented by Dorr [Dorr, 1994].


Natural Language Engineering | 2015

An automatic approach to identify word sense changes in text media across timescales

Sunny Mitra; Ritwik Mitra; Suman Kalyan Maity; Martin Riedl; Chris Biemann; Pawan Goyal; Animesh Mukherjee

In this paper, we propose an unsupervised and automated method to identify noun sense changes based on rigorous analysis of time-varying text data available in the form of millions of digitized books and millions of tweets posted per day. We construct distributional-thesaurus-based networks from data at different time points and cluster each of them separately to obtain word-centric sense clusters corresponding to the different time points. Subsequently, we propose a split/join based approach to compare the sense clusters at two different time points to find if there is a ‘birth’ of a new sense. The approach also helps us to find if an older sense was ‘split’ into more than one sense, a newer sense has been formed from the ‘join’ of older senses, or a particular sense has undergone ‘death’. We use this completely unsupervised approach (a) within the Google books data to identify word sense differences within a medium, and (b) across Google books and Twitter data to identify differences in word sense distribution across different media. We conduct a thorough evaluation of the proposed methodology both manually and through comparison with WordNet.
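
The split/join comparison can be sketched as a set-overlap test between the sense clusters of one word at two time points. The overlap measure, the 0.3 threshold, and the event tuples below are assumptions chosen for illustration; the abstract does not specify them.

```python
def compare_sense_clusters(old_clusters, new_clusters, min_overlap=0.3):
    """Label sense changes for one word between two time points.

    Each cluster is a set of neighbouring words. The overlap of cluster a
    with cluster b is the fraction of a's words that also occur in b
    (an assumed measure, as is the default threshold).
    """
    def frac(a, b):
        return len(a & b) / len(a) if a else 0.0

    events = []
    for j, new in enumerate(new_clusters):
        # Old clusters that account for a large share of this new cluster.
        sources = [i for i, old in enumerate(old_clusters)
                   if frac(new, old) >= min_overlap]
        if not sources:
            events.append(("birth", j))
        elif len(sources) > 1:
            events.append(("join", tuple(sources), j))
    for i, old in enumerate(old_clusters):
        # New clusters that absorb a large share of this old cluster.
        targets = [j for j, new in enumerate(new_clusters)
                   if frac(old, new) >= min_overlap]
        if not targets:
            events.append(("death", i))
        elif len(targets) > 1:
            events.append(("split", i, tuple(targets)))
    return events


# Hypothetical clusters for the word "mouse" at two time points.
OLD = [{"rodent", "rat", "squirrel", "hamster"}]
NEW = [{"rodent", "rat", "squirrel", "hamster"},
       {"keyboard", "cursor", "click", "usb"}]
```

Here the second cluster in `NEW` shares no words with any cluster in `OLD`, so the sketch reports it as the birth of a new sense while the animal sense persists unchanged.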


IETE Technical Review | 2008

Application of Bayesian Framework In Natural Language Understanding

Pawan Goyal; Laxmidhar Behera; Tm McGinnity

A natural language understanding (NLU) system has to handle a large amount of data. A graphical model serves as an advantageous tool for data analysis, encoding the dependencies among variables and learning causal relationships. Over the last two decades, the Bayesian network has become a popular representation for encoding uncertain expert knowledge in expert systems. It is an ideal representation for combining prior knowledge with data, and it avoids overfitting of data. Efficient algorithms have been developed for learning Bayesian networks from data, allowing Bayesian networks to be applied to a wide category of problems. In this paper, we give a comprehensive and state-of-the-art introduction to the application of Bayesian networks in different aspects of an NLU system, with emphasis on information retrieval. The extensions and variants of Bayesian networks applied to NLU problems are described. Application examples are given to illustrate the use of Bayesian networks.


Conference on Information and Knowledge Management | 2015

The Role Of Citation Context In Predicting Long-Term Citation Profiles: An Experimental Study Based On A Massive Bibliographic Text Dataset

Mayank Singh; Vikas Patidar; Suhansanu Kumar; Tanmoy Chakraborty; Animesh Mukherjee; Pawan Goyal

The impact and significance of a scientific publication are measured mostly by the number of citations it accumulates over the years. Early prediction of the citation profile of research articles is a significant as well as challenging problem. In this paper, we argue that features gathered from the citation contexts of the research papers can be very relevant for citation prediction. Analyzing a massive dataset of nearly 1.5 million computer science articles and more than 26 million citation contexts, we show that average countX (number of times a paper is cited within the same article) and average citeWords (number of words within the citation context) discriminate between various citation ranges as well as citation categories. We use these features in a stratified learning framework for future citation prediction. Experimental results show that the proposed model significantly outperforms the existing citation prediction models by a margin of 8-10% on average under various experimental settings. Specifically, the features derived from the citation context help in predicting long-term citation behavior.
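
The two features defined in this abstract can be computed with a short sketch. The input layout (a list of citing-paper ID and context text pairs for one cited paper), whitespace tokenization, and the choice of what each average runs over are assumptions; the abstract only gives the definitions of countX and citeWords.

```python
from collections import defaultdict


def citation_context_features(contexts):
    """Compute (average countX, average citeWords) for ONE cited paper.

    contexts: list of (citing_paper_id, context_text) pairs (assumed layout).
    Assumptions: countX is averaged over distinct citing papers, and
    citeWords is averaged over individual citation contexts, with words
    obtained by splitting on whitespace.
    """
    contexts_per_citer = defaultdict(int)
    total_words = 0
    for citing_id, text in contexts:
        contexts_per_citer[citing_id] += 1
        total_words += len(text.split())
    if not contexts:
        return 0.0, 0.0
    avg_count_x = len(contexts) / len(contexts_per_citer)
    avg_cite_words = total_words / len(contexts)
    return avg_count_x, avg_cite_words


# Hypothetical contexts: paper P1 cites the target twice, P2 once.
EXAMPLE = [
    ("P1", "we build on the stratified model of [X]"),
    ("P1", "as in [X]"),
    ("P2", "see [X] for details"),
]
```

For `EXAMPLE`, three contexts come from two citing papers, so the average countX is 1.5, and the 15 words spread over three contexts give an average citeWords of 5.0.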


Information Retrieval | 2013

A novel neighborhood based document smoothing model for information retrieval

Pawan Goyal; Laxmidhar Behera; Tm McGinnity

In this paper, a novel neighborhood based document smoothing model for information retrieval has been proposed. Lexical association between terms is used to provide a context sensitive indexing weight to the document terms, i.e. the term weights are redistributed based on the lexical association with the context words. A generalized retrieval framework has been presented and it has been shown that the vector space model (VSM), divergence from randomness (DFR), Okapi Best Matching 25 (BM25) and the language model (LM) based retrieval frameworks are special cases of this generalized framework. Being proposed in the generalized retrieval framework, the neighborhood based document smoothing model is applicable to all the indexing models that use the term-document frequency scheme. The proposed smoothing model is as efficient as the baseline retrieval frameworks at runtime. Experiments over the TREC datasets show that the neighborhood based document smoothing model consistently improves the retrieval performance of VSM, DFR, BM25 and LM and the improvements are statistically significant.


IEEE Internet Computing | 1999

Integration of call signaling and resource management for IP telephony

Pawan Goyal; Albert G. Greenberg; Charles Robert Kalmanek; William Todd Marshall; Partho Pratim Mishra; Doug Nortz; K. K. Ramakrishnan

One challenge to realizing a robust Internet Telephony service is the need to integrate resource management with a call signaling architecture. Also important in deploying the service is the need to preserve user privacy and security. The Distributed Open Signaling Architecture meets these requirements. DOSA establishes a framework for coordination between call signaling, which controls access to telephony-specific services, and resource management, which controls access to network-layer resources. The authors describe the architecture and an implementation approach that has proved viable in detailed simulations from real usage data.

Collaboration


Dive into Pawan Goyal's collaborations.

Top Co-Authors

Animesh Mukherjee

Indian Institute of Technology Kharagpur

Mayank Singh

Indian Institute of Technology Kharagpur

Niloy Ganguly

Indian Institute of Technology Kharagpur

Amrith Krishna

Indian Institute of Technology Kharagpur

Laxmidhar Behera

Indian Institute of Technology Kanpur

Suman Kalyan Maity

Indian Institute of Technology Kharagpur

Tanmoy Chakraborty

Indian Institute of Technology Kharagpur

Koustav Rudra

Indian Institute of Technology Kharagpur

Tm McGinnity

Nottingham Trent University

Muhammad Imran

Qatar Computing Research Institute
