Publication


Featured research published by Shantanu Godbole.


conference on information and knowledge management | 2010

Building re-usable dictionary repositories for real-world text mining

Shantanu Godbole; Indrajit Bhattacharya; Ajay Gupta; Ashish Verma

Text mining, though still a nascent industry, has been growing quickly along with the awareness of the importance of unstructured data in business analytics, customer retention and extension, social media, and legal applications. There has been a recent increase in the number of commercial text mining product and service offerings, but successful or widespread deployments are rare, mainly due to a dependence on the expertise and skill of practitioners. Accordingly, there is a growing need for re-usable repositories for text mining. In this paper, we focus on dictionary-based text mining and its role in enabling practitioners to understand and analyze large text datasets. We motivate and define the problem of exploratory dictionary construction for capturing concepts of interest, and propose a framework for efficient construction, tuning, and re-use of these dictionaries across datasets. The construction framework offers a range of interaction modes to the user to quickly build concept dictionaries over large datasets. We also show how to adapt one or more dictionaries across domains and tasks, thereby enabling reuse of knowledge and effort in industrial practice. We present results and case studies on real-life CRM analytics datasets, where such repositories and tooling significantly cut down practitioner time and effort for dictionary-based text mining.
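
As a rough illustration of what dictionary-based text mining involves, the sketch below tags a document with concepts by matching dictionary terms. It is not the framework from the paper: the `DICTIONARIES` contents, the `tag_document` helper, and the whole-word matching rule are assumptions made for this example only.

```python
import re
from collections import Counter

# Hypothetical concept dictionaries; the concepts and terms are illustrative only.
DICTIONARIES = {
    "billing": {"invoice", "overcharged", "refund"},
    "network": {"signal", "dropped call", "coverage"},
}


def tag_document(text, dictionaries):
    """Return a Counter mapping each concept to its number of term hits in text."""
    lowered = text.lower()
    hits = Counter()
    for concept, terms in dictionaries.items():
        for term in terms:
            # Whole-word (or whole-phrase) match, case-insensitive via lowercasing.
            pattern = r"\b" + re.escape(term) + r"\b"
            count = len(re.findall(pattern, lowered))
            if count:
                hits[concept] += count
    return hits


if __name__ == "__main__":
    doc = "I was overcharged and want a refund. Another dropped call today."
    print(tag_document(doc, DICTIONARIES))
```

Because the dictionaries are plain data, the same mapping could in principle be applied to a second dataset unchanged, which is the kind of re-use the abstract describes.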


international conference on data engineering | 2009

Business Intelligence from Voice of Customer

L. Venkata Subramaniam; Tanveer A. Faruquie; Shajith Ikbal; Shantanu Godbole; Mukesh K. Mohania

In this paper, we present a first-of-a-kind system, called Business Intelligence from Voice of Customer (BIVoC), that can: 1) combine unstructured information and structured information in an information-intensive enterprise and 2) derive richer business insights from the combined data. Unstructured information, in this paper, refers to Voice of Customer (VoC) obtained from interactions of customers with the enterprise, namely conversations with call-center agents, email, and SMS. Structured databases reflect only those business variables that are static over (a longer window of) time, such as educational qualification, age group, and employment details. In contrast, a combination of unstructured and structured data provides access to business variables that reflect up-to-date dynamic requirements of the customers and, more importantly, indicate trends that are difficult to derive from a larger population of customers through any other means. For example, some of the variables reflected in unstructured data are problem/interest in a certain product, expression of dissatisfaction with the business provided, and some unexplored category of people showing certain interest/problem. This gives the BIVoC system the ability to derive business insights that are richer, more valuable, and more crucial to enterprises than those of traditional business intelligence systems, which utilize only structured information. We demonstrate the effectiveness of the BIVoC system through one of our real-life engagements, where the problem is to determine how to improve agent productivity in a call center scenario. We also highlight major challenges faced while dealing with unstructured information, such as handling noise and linking with structured data.


knowledge discovery and data mining | 2008

Text classification, business intelligence, and interactivity: automating C-Sat analysis for services industry

Shantanu Godbole; Shourya Roy

Text classification has matured as a research discipline over the last decade. Independently, business intelligence over structured databases has long been a source of insights for enterprises. In this work, we bring the two together for Customer Satisfaction (C-Sat) analysis in the services industry. We present ITACS, a solution combining text classification and business intelligence integrated with a novel interactive text labeling interface. ITACS has been deployed in multiple client accounts in contact centers. It can be extended to any services industry setting to analyze unstructured text data and derive operational and business insights. We highlight the importance of interactivity in real-life text classification settings. We bring out some unique research challenges about label-sets, measuring accuracy, and interpretability that need serious attention in both academic and industrial research. We recount invaluable experiences and lessons learned as data mining researchers working toward seeing research technology deployed in the services industry.


knowledge discovery and data mining | 2008

Structured entity identification and document categorization: two tasks with one joint model

Indrajit Bhattacharya; Shantanu Godbole; Sachindra Joshi

Traditionally, research in identifying structured entities in documents has proceeded independently of document categorization research. In this paper, we observe that these two tasks have much to gain from each other. Apart from direct references to entities in a database, such as names of person entities, documents often also contain words that are correlated with discriminative entity attributes, such as age-group and income-level of persons. This happens naturally in many enterprise domains such as CRM, Banking, etc. Then, entity identification, which is typically vulnerable to noise and incompleteness in direct references to entities in documents, can benefit from document categorization with respect to such attributes. In return, entity identification enables documents to be categorized according to different label-sets arising from entity attributes without requiring any supervision. In this paper, we propose a probabilistic generative model for joint entity identification and document categorization. We show how the parameters of the model can be estimated using an EM algorithm in an unsupervised fashion. Using extensive experiments over real and semi-synthetic data, we demonstrate that the two tasks can benefit immensely from each other when performed jointly using the proposed model.


ieee international conference on services computing | 2008

Text to Intelligence: Building and Deploying a Text Mining Solution in the Services Industry for Customer Satisfaction Analysis

Shantanu Godbole; Shourya Roy

We present our experiences in building and deploying a text mining solution in services industry settings, specifically in contact centers. We describe the voice of customer (VoC) and customer satisfaction (C-Sat) analysis settings and outline several unique research challenges brought about by this confluence of text mining and industrial services research. We describe our system for integrated text classification, business intelligence and interactive text labeling for C-Sat analysis. We recount invaluable lessons learned as computer science researchers in services research engagements. The system has been deployed in multiple accounts in contact centers and can be extended to any industrial CRM service practice to analyze unstructured text data.


european conference on machine learning | 2010

A cluster-level semi-supervision model for interactive clustering

Avinava Dubey; Indrajit Bhattacharya; Shantanu Godbole

Semi-supervised clustering models that incorporate user-provided constraints to yield meaningful clusters have recently become a popular area of research. In this paper, we propose a cluster-level semi-supervision model for interactive clustering. Prototype-based clustering algorithms typically alternate between updating cluster descriptions and assigning data items to clusters. In our model, the user provides semi-supervision directly for these two steps. Assignment feedback re-assigns data items among existing clusters, while cluster description feedback helps to position existing cluster centers more meaningfully. We argue that providing such supervision is more natural for exploratory data mining, where the user discovers and interprets clusters as the algorithm progresses, in comparison to the pair-wise instance-level supervision model, particularly for high-dimensional data such as document collections. We show how such feedback can be interpreted as constraints and incorporated within the k-means clustering framework. Using experimental results on multiple real-world datasets, we show that this framework improves clustering performance significantly beyond traditional k-means. Interestingly, when given the same amount of feedback from the user, the proposed framework significantly outperforms the pair-wise supervision model.
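
The alternation between assignment and center updates, with assignment feedback layered on top, can be sketched as follows. This is a deliberately simplified illustration and not the model from the paper: user feedback is treated as a hard pin (`pins`, a hypothetical mapping from point index to cluster index) that overrides the nearest-center rule, and cluster description feedback is not modeled at all.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_with_pins(points, centers, pins, iterations=10):
    """Plain k-means in which pinned points always keep the user's cluster.

    pins maps a point index to a cluster index (assumed user feedback).
    Returns (assignments, centers).
    """
    centers = [tuple(c) for c in centers]
    assign = []
    for _ in range(iterations):
        # Assignment step: pinned points follow the user, others the nearest center.
        assign = []
        for i, p in enumerate(points):
            if i in pins:
                assign.append(pins[i])
            else:
                nearest = min(range(len(centers)),
                              key=lambda k: sq_dist(p, centers[k]))
                assign.append(nearest)
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [points[i] for i, a in enumerate(assign) if a == k]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(centers[k])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return assign, centers


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
    init = [(0.0, 0.0), (5.0, 0.0)]
    print(kmeans_with_pins(pts, init, pins={}))
    print(kmeans_with_pins(pts, init, pins={2: 0}))  # user moves point 2 to cluster 0
```

Pinning a single point shifts the center of the cluster it joins, which in turn can change how unpinned points are assigned in later iterations.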


international conference on data mining | 2009

Cross-Guided Clustering: Transfer of Relevant Supervision across Domains for Improved Clustering

Indrajit Bhattacharya; Shantanu Godbole; Sachindra Joshi; Ashish Verma

Lack of supervision in clustering algorithms often leads to clusters that are not useful or interesting to human reviewers. We investigate if supervision can be automatically transferred to a clustering task in a target domain, by providing a relevant supervised partitioning of a dataset from a different source domain. The target clustering is made more meaningful for the human user by trading off intrinsic clustering goodness on the target dataset for alignment with relevant supervised partitions in the source dataset, wherever possible. We propose a cross-guided clustering algorithm that builds on traditional k-means by aligning the target clusters with source partitions. The alignment process makes use of a cross-domain similarity measure that discovers hidden relationships across domains with potentially different vocabularies. Using multiple real-world datasets, we show that our approach improves clustering accuracy significantly over traditional k-means.


knowledge discovery and data mining | 2008

An integrated system for automatic customer satisfaction analysis in the services industry

Shantanu Godbole; Shourya Roy

Text classification has matured well as a research discipline over the years. At the same time, business intelligence over databases has long been a source of insights for enterprises. With the growing importance of the services industry, customer relationship management and contact center operations have become very important. Specifically, the voice of the customer and customer satisfaction (C-Sat) have emerged as invaluable sources of insights about how an enterprise's products and services are perceived by customers. In this demonstration, we present the IBM Technology to Automate Customer Satisfaction analysis (ITACS) system that combines text classification technology and a business intelligence solution along with an interactive document labeling interface for automating C-Sat analysis. This system has been successfully deployed in client accounts in large contact centers and can be extended to any services industry setting for analyzing unstructured text data. This demonstration will highlight the importance of intervention and interactivity in real-world text classification settings. We will point out unique research challenges in this domain regarding label-sets, measuring accuracy, and interpretability of results, and we will discuss solutions and open questions.


ACM Transactions on Knowledge Discovery From Data | 2012

Cross-Guided Clustering: Transfer of Relevant Supervision across Tasks

Indrajit Bhattacharya; Shantanu Godbole; Sachindra Joshi; Ashish Verma

Lack of supervision in clustering algorithms often leads to clusters that are not useful or interesting to human reviewers. We investigate if supervision can be automatically transferred for clustering a target task, by providing a relevant supervised partitioning of a dataset from a different source task. The target clustering is made more meaningful for the human user by trading off intrinsic clustering goodness on the target task for alignment with relevant supervised partitions in the source task, wherever possible. We propose a cross-guided clustering algorithm that builds on traditional k-means by aligning the target clusters with source partitions. The alignment process makes use of a cross-task similarity measure that discovers hidden relationships across tasks. When the source and target tasks correspond to different domains with potentially different vocabularies, we propose a projection approach using pivot vocabularies for the cross-domain similarity measure. Using multiple real-world and synthetic datasets, we show that our approach improves clustering accuracy significantly over traditional k-means and state-of-the-art semi-supervised clustering baselines, over a wide range of data characteristics and parameter settings.


acm conference on hypertext | 2007

Toward interactive learning by concept ordering

Shantanu Godbole; Sachindra Joshi; Sameep Mehta; Ganesh Ramakrishnan

In this paper we present a visual education tool for efficient and effective learning. The toolkit is based on a simple premise: simple concepts should be learned before advanced ones. We propose algorithms to automatically capture such pre-requisite dependence relationships between concepts. We extract concept definitions from the web's hyperlinked environment and create a concept graph that is arranged in a hierarchical structure and presented to the user in an interactive fashion. Thereafter, the user guides the learning process in a hyperlinked environment by selecting a target concept, exploring the associated learning graph, learning pre-requisite concepts, and repeating this process until her learning goal is reached. To measure usefulness and correctness of our approach, we conducted a user study with 25 users using precision and recall measures. Overall, the feedback from users was encouraging. We believe this is a positive step toward building user-driven interactive learning systems.
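
The premise that pre-requisite concepts come before advanced ones can be illustrated with a topological ordering over a concept graph. The sketch below is only an illustration under stated assumptions, not the algorithm from the paper: the `PREREQS` graph is hand-written (the paper extracts relationships automatically), and `learning_path` is a hypothetical helper built on Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical pre-requisite graph: concept -> concepts to learn first.
PREREQS = {
    "calculus": {"algebra"},
    "derivative": {"calculus"},
    "vector": {"algebra"},
    "gradient descent": {"derivative", "vector"},
    "probability": {"algebra"},
}


def learning_path(target, prereqs):
    """Return the target and its transitive pre-requisites, pre-requisites first."""
    # Collect only the concepts reachable from the target.
    needed = {}
    stack = [target]
    while stack:
        concept = stack.pop()
        if concept in needed:
            continue
        needed[concept] = set(prereqs.get(concept, ()))
        stack.extend(needed[concept])
    # static_order() yields each concept after all of its predecessors.
    return list(TopologicalSorter(needed).static_order())


if __name__ == "__main__":
    print(learning_path("gradient descent", PREREQS))
```

Concepts unrelated to the chosen target (here, "probability") are left out of the path, and a cyclic graph would raise `graphlib.CycleError` rather than produce an ordering.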
