Koustav Rudra
Indian Institute of Technology Kharagpur
Publication
Featured research published by Koustav Rudra.
conference on information and knowledge management | 2015
Koustav Rudra; Subham Ghosh; Niloy Ganguly; Pawan Goyal; Saptarshi Ghosh
Microblogging sites like Twitter have become important sources of real-time information during disaster events. A significant amount of valuable situational information is available on these sites; however, this information is immersed among hundreds of thousands of tweets, mostly containing sentiments and opinions of the masses, that are posted during such events. To effectively utilize microblogging sites during disaster events, it is necessary to (i) extract the situational information from among the large amounts of sentiment and opinion, and (ii) summarize the situational information, to help decision-making processes when time is critical. In this paper, we develop a novel framework which first classifies tweets to extract situational information, and then summarizes the information. The proposed framework takes into consideration the typicalities pertaining to disaster events, where (i) the same tweet often contains a mixture of situational and non-situational information, and (ii) certain numerical information, such as the number of casualties, varies rapidly with time, and thus achieves superior performance compared to state-of-the-art tweet summarization approaches.
acm conference on hypertext | 2016
Koustav Rudra; Siddhartha Banerjee; Niloy Ganguly; Pawan Goyal; Muhammad Imran; Prasenjit Mitra
During mass convergence events such as natural disasters, microblogging platforms like Twitter are widely used by affected people to post situational awareness messages. These crisis-related messages are spread across multiple categories, such as infrastructure damage and information about missing, injured, and dead people. The challenge is to extract important situational updates from these messages, assign them to appropriate informational categories, and finally summarize the large trove of information in each category. In this paper, we propose a novel framework which first assigns tweets to different situational classes and then summarizes those tweets. In the summarization phase, we propose a two-stage summarization framework which first extracts a set of important tweets from the whole set of information through an Integer Linear Programming (ILP) based optimization technique and then applies a word-graph and content-word based abstractive summarization technique to produce the final summary. Our method is time and memory efficient and outperforms the baseline in terms of quality, coverage of events, locations, etc., effectiveness, and utility in disaster scenarios.
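The extractive selection step described in this abstract can be illustrated with a toy sketch. It assumes, purely for illustration, that the objective is to maximize the total weight of distinct content words covered by the chosen tweets under a word budget; the function name `select_tweets`, the weights, and the budget are hypothetical and not taken from the paper. The small integer program is solved here by exhaustive enumeration rather than by an ILP solver, so it is only practical for a handful of tweets.

```python
from itertools import combinations


def select_tweets(tweets, word_weights, budget):
    """Pick the subset of tweets that maximizes the total weight of distinct
    content words covered, subject to a total word-count budget.

    Hypothetical objective for illustration; solved by brute force.
    """
    tokenized = [t.lower().split() for t in tweets]
    best_score, best_subset = 0.0, ()
    for r in range(1, len(tweets) + 1):
        for subset in combinations(range(len(tweets)), r):
            # Constraint: total length of the chosen tweets must fit the budget.
            length = sum(len(tokenized[i]) for i in subset)
            if length > budget:
                continue
            # Objective: each content word counts once, however often it occurs.
            covered = set()
            for i in subset:
                covered.update(w for w in tokenized[i] if w in word_weights)
            score = sum(word_weights[w] for w in covered)
            if score > best_score:
                best_score, best_subset = score, subset
    return [tweets[i] for i in best_subset], best_score
```

Because every content word is counted once, a tweet that only repeats already covered words adds length without adding score, which is what discourages redundancy in the selected set.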
communication systems and networks | 2015
Anirban Sen; Koustav Rudra; Saptarshi Ghosh
Microblogging sites such as Twitter and Weibo are increasingly being used to enhance situational awareness during various natural and man-made disaster events such as floods, earthquakes, and bomb blasts. During any such event, thousands of microblogs (tweets) are posted in short intervals of time. Typically, only a small fraction of these tweets contribute to situational awareness, while the majority merely reflect the sentiment or opinion of people. Real-time extraction of tweets that contribute to situational awareness is especially important for relief operations when time is critical. However, automatically differentiating such tweets from those that reflect opinion / sentiment is a non-trivial challenge, mainly because of the very small size of tweets and the informal way in which tweets are written (frequent use of emoticons, abbreviations, and so on). This study applies Natural Language Processing (NLP) techniques to address this challenge. We extract low-level syntactic features from the text of tweets, such as the presence of specific types of words and parts-of-speech, to develop a classifier to distinguish between tweets which contribute to situational awareness and tweets which do not. Experiments over tweets related to four diverse disaster events show that the proposed features identify situational awareness tweets with significantly higher accuracy than classifiers based on standard bag-of-words models.
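As a rough illustration of what low-level lexical features of a tweet might look like, the sketch below counts a few surface cues. The specific features, the word lists, and the function name `lexical_features` are assumptions made for this example and are not the feature set used in the paper; a real system would feed such features to a trained classifier.

```python
import re

# Hypothetical word lists, chosen only for illustration.
PERSONAL_PRONOUNS = {"i", "me", "my", "we", "our", "us", "you", "your"}
INTENSIFIERS = {"so", "very", "too", "really", "totally"}


def lexical_features(tweet):
    """Return a small dictionary of surface-level features for one tweet."""
    # Lowercase, then keep alphabetic tokens and digit runs.
    tokens = re.findall(r"[a-z']+|\d+", tweet.lower())
    return {
        "num_numerals": sum(tok.isdigit() for tok in tokens),
        "num_personal_pronouns": sum(tok in PERSONAL_PRONOUNS for tok in tokens),
        "num_intensifiers": sum(tok in INTENSIFIERS for tok in tokens),
        "has_exclamation": "!" in tweet,
        "has_question": "?" in tweet,
    }
```

The intuition behind features of this kind is that factual updates tend to carry numbers, while opinionated tweets tend to carry first-person pronouns, intensifiers, and emphatic punctuation.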
empirical methods in natural language processing | 2016
Koustav Rudra; Shruti Rijhwani; Rafiya Begum; Kalika Bali; Monojit Choudhury; Niloy Ganguly
Linguistic research on multilingual societies has indicated that there is usually a preferred language for expression of emotion and sentiment (Dewaele, 2010). Paucity of data has limited such studies to participant interviews and speech transcriptions from small groups of speakers. In this paper, we report a study on 430,000 unique tweets from Indian users, specifically Hindi-English bilinguals, to understand the language of preference, if any, for expressing opinion and sentiment. To this end, we develop classifiers for opinion detection in these languages and for further classifying opinionated tweets into positive, negative, and neutral sentiment. Our study indicates that Hindi (i.e., the native language) is preferred over English for expression of negative opinion and swearing. As an aside, we explore some common pragmatic functions of code-switching through sentiment detection.
advances in social networks analysis and mining | 2016
Koustav Rudra; Ashish Sharma; Niloy Ganguly; Saptarshi Ghosh
Millions of microblogs are posted during disasters, which include not only information about the present situation, but also the emotions / opinions of the masses. While most of the prior research has been on extracting situational information, this work focuses on a particular type of non-situational tweets - communal tweets, i.e., abusive posts targeting specific religious / racial groups. We characterize the communal tweets posted during five recent disaster events, and the users who posted such tweets. We find that communal tweets are posted not only by common users, but also by many popular users (having tens of thousands of followers), most of whom are related to the media and politics. As a result, communal tweets get much higher exposure (retweets) than non-communal tweets. Further, users posting communal tweets form strong connected groups in the social network. Considering the potentially adverse effects of communal tweets during disasters, we also indicate a way to counter such tweets, by utilizing anti-communal tweets posted by some users during such events.
Information Systems Frontiers | 2018
Koustav Rudra; Ashish Sharma; Niloy Ganguly; Muhammad Imran
During a new disease outbreak, frustration and uncertainty among the affected and vulnerable population increase. Affected communities look for known symptoms, prevention measures, and treatment strategies. On the other hand, health organizations try to get situational updates to assess the severity of the outbreak, known affected cases, and other details. The recent emergence of social media platforms such as Twitter provides convenient and fast ways to disseminate information to, and consume information from, a wider audience. Research studies have shown the potential of this online information to address the information needs of concerned authorities during outbreaks, epidemics, and pandemics. In this work, we target three types of end-users: (i) the vulnerable population, i.e., people who are not yet affected and are looking for prevention-related information; (ii) the affected population, i.e., people who are affected and are looking for treatment-related information; and (iii) health organizations, like WHO, who are interested in gaining situational awareness to make timely decisions. We use Twitter data from two recent outbreaks (Ebola and MERS) to build an automatic classification approach that categorizes tweets into different disease-related categories. Moreover, the classified messages are used to generate different kinds of summaries useful for affected and vulnerable communities as well as health organizations. Results obtained from extensive experimentation show the effectiveness of the proposed approach.
international conference on digital health | 2017
Koustav Rudra; Ashish Sharma; Niloy Ganguly; Muhammad Imran
At the outbreak of an epidemic, affected communities need to become aware of disease symptoms, preventive measures, and treatment strategies. On the other hand, health organizations try to get situational updates to assess the severity of the outbreak, known affected cases, and other details. The recent emergence of social media platforms such as Twitter provides convenient and fast ways to disseminate information to, and consume information from, a wider audience. Research studies have shown the potential of this online information to address the information needs of concerned authorities during outbreaks, epidemics, and pandemics. In this work, we target three communities: (i) people who are not affected yet and are looking for prevention-related information; (ii) people who are affected and are looking for treatment-related information; and (iii) health organizations like WHO, who are interested in gaining situational awareness to make timely decisions. We use Twitter data from two recent outbreaks (Ebola and MERS) to build an automatic classification approach that uses low-level lexical features to categorize tweets into different disease-related categories.
pacific-asia conference on knowledge discovery and data mining | 2015
Koustav Rudra; Abhijnan Chakraborty; Manav Sethi; Shreyasi Das; Niloy Ganguly; Saptarshi Ghosh
To help users find popular topics of discussion, Twitter periodically publishes ‘trending topics’ (trends), which are the most discussed keywords (e.g., hashtags) at a certain point in time. Inspection of the trends over several months reveals that while most of the trends are related to events in the off-line world, such as popular television shows, sports events, or emerging technologies, a significant fraction are not related to any topic / event in the off-line world. Such trends are usually known as idioms, examples being #4WordsBeforeBreakup, #10thingsIHateAboutYou, etc. We perform the first systematic measurement study on Twitter idioms. We find that tweets related to a particular idiom normally do not cluster around any particular topic or event. There is a set of users on Twitter who predominantly discuss idioms (common, not-so-popular, but active users who mostly use Twitter as a conversational platform), as opposed to other users who primarily discuss topical content. The implication of these findings is that within a single online social network, activities of users may have very different semantics; thus, tasks like community detection and recommendation may not be accomplished perfectly using a single universal algorithm. Specifically, we run two algorithms (link-based and content-based) for community detection on the Twitter social network, and show that idiom-oriented users are clustered better by one and topical users by the other. Finally, we build a novel service which shows trending idioms and recommends idiom users to follow.
international conference on mining intelligence and knowledge exploration | 2013
Manjira Sinha; Koustav Rudra; Tirthankar Dasgupta; Anupam Basu
Sentence comprehension is an integral and important part of whole-text comprehension. It involves complex cognitive actions, as a reader has to work through lexical, syntactic, and semantic aspects in order to understand a sentence. One of the vital features of a sentence is its word order, or surface form. Different languages have evolved different systems of word order, which reflect the cognitive structure of the native users of that language. Therefore, word order affects the cognitive load exerted by a sentence as experienced by the reader. A computational modeling approach to quantify the effect of word order on the difficulty of sentence understanding can provide a great advantage in the study of text readability and its applications. A handful of works have addressed the issue in English and other languages. Bangla, which is the fifth most spoken language in the world and a relatively free word order language, still does not have any computational model to quantify the reading difficulty of a sentence. In this paper, we develop models to predict the comprehension difficulty of a simple sentence according to its different surface forms in Bangla. In the process, we also establish that difficulty measures for English do not hold in Bangla. Our model has been validated against an extensive user survey.
international acm sigir conference on research and development in information retrieval | 2018
Koustav Rudra; Pawan Goyal; Niloy Ganguly; Prasenjit Mitra; Muhammad Imran
In recent times, humanitarian organizations increasingly rely on social media to search for information useful for disaster response. These organizations have varying information needs, ranging from general situational awareness (i.e., understanding the bigger picture) to focused information needs, e.g., about infrastructure damage or urgent needs of affected people. This research proposes a novel approach to help crisis responders fulfill their information needs at different levels of granularity. Specifically, the proposed approach presents simple algorithms to identify sub-events and generate summaries of the large volume of messages around those events using an Integer Linear Programming (ILP) technique. Extensive evaluation on a large real-world Twitter dataset shows that (a) our algorithm can identify important sub-events with high recall, and (b) the summarization scheme achieves 6 to 30% higher accuracy than many other state-of-the-art techniques. The simplicity of the algorithms ensures that the entire task is done in real time, which is needed for practical deployment of the system.