Publication


Featured research published by Sachin Pawar.


Applications of Natural Language to Data Bases | 2016

Aspects from Appraisals!! A Label Propagation with Prior Induction Approach

Nitin Ramrakhiyani; Sachin Pawar; Girish Keshav Palshikar; Manoj Apte

Performance appraisal (PA) is an important Human Resources exercise conducted by most organizations. The text data generated during the PA process can be a source of valuable insights for management. As a new application area, analysis of a large PA dataset (100K sentences) of supervisor feedback text is carried out. As the first contribution, the paper redefines the notion of an aspect in the feedback text. Aspects in PA text are like activities characterized by verb-noun pairs. These activities vary dynamically from employee to employee (e.g. conduct training, improve coding) and can be more challenging to identify than the static properties of products like a camera (e.g. price, battery life). Another important contribution of the paper is a novel enhancement to the Label Propagation (LP) algorithm to identify aspects from PA text. It involves induction of a prior distribution for each node and iterative identification of new aspects starting from a seed set. Evaluation using a manually labelled set of 500 verb-noun pairs suggests an improvement over multiple baselines.
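The general idea of label propagation with a per-node prior can be illustrated with a minimal sketch. This is not the authors' algorithm: the update rule (a weighted blend of the neighbour average and the node's prior), the `alpha` parameter, the edge weights, and the toy verb-noun pairs are all assumptions chosen for illustration.

```python
from collections import defaultdict

def propagate_with_prior(edges, seeds, prior, alpha=0.8, iterations=20):
    """Toy label propagation over an undirected weighted graph.

    edges: dict mapping (u, v) -> similarity weight (hypothetical input)
    seeds: dict mapping node -> 1.0 (aspect) or 0.0 (non-aspect); kept fixed
    prior: dict mapping node -> prior probability of being an aspect
    alpha: how much a node trusts its neighbours versus its own prior
    """
    neighbours = defaultdict(dict)
    for (u, v), w in edges.items():
        neighbours[u][v] = w
        neighbours[v][u] = w

    nodes = set(neighbours) | set(seeds) | set(prior)
    # Start every node at its prior (0.5 if none is given); seeds override.
    score = {n: seeds.get(n, prior.get(n, 0.5)) for n in nodes}

    for _ in range(iterations):
        updated = {}
        for n in nodes:
            if n in seeds:
                updated[n] = seeds[n]  # seed labels are clamped
                continue
            total = sum(neighbours[n].values())
            if total == 0:
                updated[n] = prior.get(n, 0.5)  # isolated node keeps its prior
                continue
            neighbour_avg = sum(w * score[m] for m, w in neighbours[n].items()) / total
            updated[n] = alpha * neighbour_avg + (1 - alpha) * prior.get(n, 0.5)
        score = updated
    return score

# Hypothetical verb-noun pairs, similarities, and priors.
edges = {
    ("conduct training", "deliver training"): 1.0,
    ("deliver training", "improve coding"): 0.5,
    ("be good", "improve coding"): 0.2,
}
seeds = {"conduct training": 1.0, "be good": 0.0}
prior = {"deliver training": 0.7, "improve coding": 0.6}

scores = propagate_with_prior(edges, seeds, prior)
for pair, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{pair}: {value:.3f}")
```

In this sketch, new aspects would be the non-seed pairs whose score exceeds a chosen threshold; they could then be added to the seed set and the propagation repeated.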


European Conference on Information Retrieval | 2018

Co-training for Extraction of Adverse Drug Reaction Mentions from Tweets

Shashank Gupta; Manish Gupta; Vasudeva Varma; Sachin Pawar; Nitin Ramrakhiyani; Girish Keshav Palshikar

Adverse drug reactions (ADRs) are one of the leading causes of mortality in health care. Current ADR surveillance systems are often associated with a substantial time lag before such events are officially published. On the other hand, online social media such as Twitter contain information about ADR events in real-time, much before any official reporting. Current state-of-the-art methods in ADR mention extraction use Recurrent Neural Networks (RNN), which typically need large labeled corpora. Towards this end, we propose a semi-supervised method based on co-training which can exploit a large pool of unlabeled tweets to augment the limited supervised training data, and as a result enhance the performance. Experiments with 0.1M tweets show that the proposed approach outperforms the state-of-the-art methods for the ADR mention extraction task by 5% in terms of F1 score.
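The co-training loop itself can be sketched independently of the model used. The example below is a toy under stated assumptions: it replaces the RNN taggers with nearest-centroid classifiers over two numeric "views" of each example, and the data, function names, and confidence measure are hypothetical rather than taken from the paper.

```python
def fit_centroids(xs, ys):
    """Return the mean feature value for each class (0 and 1)."""
    means = {}
    for label in (0, 1):
        vals = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(vals) / len(vals)
    return means

def predict_with_confidence(means, x):
    """Predict the nearer centroid; confidence is the gap between the two distances."""
    d0, d1 = abs(x - means[0]), abs(x - means[1])
    return (0 if d0 <= d1 else 1), abs(d0 - d1)

def co_train(labeled, unlabeled, rounds=3):
    """Grow the labeled set by letting each view label its most confident example.

    labeled:   list of ((view_a, view_b), label), with both classes present
    unlabeled: list of (view_a, view_b)
    """
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            # Train this view's classifier on everything labeled so far,
            # including examples labeled earlier by the other view.
            means = fit_centroids([x[view] for x, _ in labeled],
                                  [y for _, y in labeled])
            best = max(pool, key=lambda x: predict_with_confidence(means, x[view])[1])
            label, _ = predict_with_confidence(means, best[view])
            pool.remove(best)
            labeled.append((best, label))
    return labeled

# Hypothetical data: two numeric views per example.
seed_examples = [((0.1, 0.2), 0), ((0.9, 0.8), 1)]
unlabeled_pool = [(0.2, 0.1), (0.8, 0.95), (0.15, 0.3), (0.85, 0.7)]

augmented = co_train(seed_examples, unlabeled_pool)
print(len(augmented), "labeled examples after co-training")
```

The point of the design is that each view's pseudo-labels become training data for the other view, so the two classifiers teach each other from the unlabeled pool.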


BMC Bioinformatics | 2018

Semi-Supervised Recurrent Neural Network for Adverse Drug Reaction mention extraction

Shashank Gupta; Sachin Pawar; Nitin Ramrakhiyani; Girish Keshav Palshikar; Vasudeva Varma

Background: Social media is a useful platform to share health-related information due to its vast reach. This makes it a good candidate for public-health monitoring tasks, specifically for pharmacovigilance. We study the problem of extraction of Adverse-Drug-Reaction (ADR) mentions from social media, particularly from Twitter. Medical information extraction from social media is challenging, mainly due to the short and highly informal nature of the text, as compared to more technical and formal medical reports.

Methods: Current methods in ADR mention extraction rely on supervised learning methods, which suffer from the labeled-data scarcity problem. The state-of-the-art method uses deep neural networks, specifically the Long Short-Term Memory (LSTM) network, a class of Recurrent Neural Network (RNN). Deep neural networks, due to their large number of free parameters, rely heavily on large annotated corpora for learning the end task. But in the real world, it is hard to get large labeled data, mainly due to the heavy cost associated with manual annotation.

Results: To this end, we propose a novel semi-supervised learning based RNN model, which can also leverage the unlabeled data present in abundance on social media. Through experiments we demonstrate the effectiveness of our method, achieving state-of-the-art performance in ADR mention extraction.

Conclusion: In this study, we tackle the problem of labeled data scarcity for Adverse Drug Reaction mention extraction from social media and propose a novel semi-supervised learning based method which can leverage the large unlabeled corpus available in abundance on the web. Through an empirical study, we demonstrate that our proposed method outperforms a fully supervised learning based baseline which relies on a large manually annotated corpus for good performance.


Applications of Natural Language to Data Bases | 2013

Unsupervised Gazette Creation Using Information Distance

Sangameshwar Patil; Sachin Pawar; Girish Keshav Palshikar; Savita Suhas Bhat; Rajiv Radheyshyam Srivastava

The Named Entity extraction (NEX) problem consists of automatically constructing a gazette containing instances for each NE of interest. NEX is important for domains which lack a corpus with tagged NEs. In this paper, we propose a new unsupervised (bootstrapping) NEX technique, based on a new variant of the Multiword Expression Distance (MED) [1] and information distance [2]. The efficacy of our method is shown by comparison with BASILISK and PMI in the agriculture domain. Our method discovered 8 new diseases which are not found in Wikipedia.


Text, Speech and Dialogue | 2018

Identifying Participant Mentions and Resolving Their Coreferences in Legal Court Judgements

Ajay Gupta; Devendra Verma; Sachin Pawar; Sangameshwar Patil; Swapnil Hingmire; Girish Keshav Palshikar; Pushpak Bhattacharyya

Legal court judgements have multiple participants (e.g. judge, complainant, petitioner, lawyer, etc.). They may be referred to in multiple ways, e.g., the same person may be referred to as lawyer, counsel, learned counsel, advocate, as well as by his/her proper name. For any analysis of legal texts, it is important to resolve such multiple mentions which are coreferences of the same participant. In this paper, we propose a supervised approach to this challenging task. To avoid human annotation efforts for Legal domain data, we exploit the ACE 2005 dataset by mapping its entities to participants in the Legal domain. We use a basic Transfer Learning paradigm by training classification models on general purpose text (news in ACE 2005 data) and applying them to Legal domain text. We evaluate our approach on a sample annotated test dataset in the Legal domain and demonstrate that it outperforms state-of-the-art baselines.


European Conference on Information Retrieval | 2018

Multi-task Learning for Extraction of Adverse Drug Reaction Mentions from Tweets

Shashank Gupta; Manish Gupta; Vasudeva Varma; Sachin Pawar; Nitin Ramrakhiyani; Girish Keshav Palshikar

Adverse drug reactions (ADRs) are one of the leading causes of mortality in health care. Current ADR surveillance systems are often associated with a substantial time lag before such events are officially published. On the other hand, online social media such as Twitter contain information about ADR events in real-time, much before any official reporting. The current state of the art in ADR mention extraction uses Recurrent Neural Networks (RNN), which typically need large labeled corpora. Towards this end, we propose a multi-task learning based method which can utilize a similar auxiliary task (adverse drug event detection) to enhance the performance of the main task, i.e., ADR extraction. Furthermore, in the absence of an auxiliary task dataset, we propose a novel joint multi-task learning method to automatically generate a weak supervision dataset for the auxiliary task when a large pool of unlabeled tweets is available. Experiments with 0.48M tweets show that the proposed approach outperforms the state-of-the-art methods for the ADR mention extraction task by 7.2% in terms of F1 score.


Applications of Natural Language to Data Bases | 2018

An Unsupervised Approach for Cause-Effect Relation Extraction from Biomedical Text

Raksha Sharma; Girish Keshav Palshikar; Sachin Pawar

Identification of Cause-effect (CE) relation mentions, along with their arguments, is crucial for creating a scientific knowledge-base. Linguistically complex constructs are used to express CE relations in text, mainly using generic causative (causal) verbs (cause, lead, result, etc.). We observe that some generic verbs have a domain-specific causative sense (inhibit, express) and some domains have altogether new causative verbs (down-regulate). Not every mention of a generic causative verb (e.g., lead) indicates a CE relation mention. We propose a linguistically-oriented unsupervised iterative co-discovery approach to identify domain-specific causative verbs, starting from a small set of seed causative verbs and an unlabeled corpus. We use known causative verbs to extract CE arguments, and use known CE arguments to discover causative verbs (hence co-discovery). Since causes and effects are typically agents, events, actions, or conditions, we use WordNet hypernym categories to identify suitable CE arguments. PMI is used to measure linguistic associations between a causative verb and its argument. Once we have a list of domain-specific causative verbs, we use it to extract CE relation mentions from a given corpus in an unsupervised manner, filtering out non-causative uses of a causative verb using a WordNet hypernym check of its arguments. Our approach extracts 256 domain-specific causative verbs from 10,000 PubMed abstracts of Leukemia papers, and outperforms several baselines for extracting intra-sentence CE relation mentions.
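The PMI measure mentioned in the abstract can be illustrated with a short sketch. It uses the standard definition PMI(v, a) = log2(p(v, a) / (p(v) p(a))) estimated from co-occurrence counts; the (verb, argument) pairs below are made up, and how the paper actually extracts and counts pairs is not specified here.

```python
import math
from collections import Counter

def pmi_table(pairs):
    """Compute PMI for every (verb, argument) pair observed in `pairs`.

    Probabilities are simple relative frequencies over the list of pairs.
    """
    pair_counts = Counter(pairs)
    verb_counts = Counter(v for v, _ in pairs)
    arg_counts = Counter(a for _, a in pairs)
    n = len(pairs)
    return {
        (v, a): math.log2((c / n) / ((verb_counts[v] / n) * (arg_counts[a] / n)))
        for (v, a), c in pair_counts.items()
    }

# Hypothetical (verb, argument) observations.
observations = [
    ("inhibit", "apoptosis"),
    ("inhibit", "apoptosis"),
    ("inhibit", "growth"),
    ("lead", "team"),
    ("lead", "team"),
    ("lead", "apoptosis"),
]

table = pmi_table(observations)
for (verb, arg), value in sorted(table.items(), key=lambda kv: -kv[1]):
    print(f"PMI({verb}, {arg}) = {value:.3f}")
```

A positive PMI means the verb and argument co-occur more often than chance would predict, while a negative value suggests a weak association.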


IEEE International Conference on Data Science and Advanced Analytics | 2017

HiSPEED: A System for Mining Performance Appraisal Data and Text

Girish Keshav Palshikar; Manoj Apte; Sachin Pawar; Nitin Ramrakhiyani

Performance appraisal (PA) is a crucial HR process that enables an organization to periodically measure and evaluate every employee’s performance and also to drive performance improvements. In this paper, we describe a novel system called HiSPEED to analyze PA data using automated statistical, data mining and text mining techniques, to generate novel and actionable insights/patterns and to help in improving the quality and effectiveness of the PA process. The goal is to produce insights that can be used to answer (in part) the crucial “business questions” that HR executives and business leadership face in talent management. The business questions pertain to (1) improving the quality of the goal setting process, (2) improving the quality of the self-appraisal comments and supervisor feedback comments, (3) discovering high-quality supervisor suggestions for performance improvements, (4) discovering evidence provided by employees to support their self-assessments, (5) measuring the quality of supervisor assessments, (6) understanding the root causes of poor and exceptional performances, (7) detecting instances of personal and systemic biases and so forth. The paper discusses specially designed algorithms to answer these business questions and illustrates them by reporting the insights produced on a real-life PA dataset from a large multinational IT services organization.


IEEE International Conference on Data Science and Advanced Analytics | 2016

Role Models: Mining Role Transitions Data in IT Project Management

Girish Keshav Palshikar; Sachin Pawar; Nitin Ramrakhiyani

The notion of roles is crucial in project management across various domains. A role indicates a broad set of tasks, activities, deliverables and responsibilities that the person needs to carry out within a project. Assigning roles to team members clarifies the expectations of work items to be delivered by each and structures the interactions of the team among themselves as well as with external stakeholders. This paper analyzes a sizeable real-life dataset regarding the actual usage of roles in software development and maintenance projects in a large multinational IT organization. The paper introduces and formalizes concepts such as seniority level of a role, career progression and career lines, formulates various business questions related to role-based project management, proposes analytics techniques to answer them and outlines the actual results produced to answer the business questions. The business questions are related to dependencies between roles, patterns in role assignments and durations, predicting role changes, discovering insights useful for meeting career aspirations, interesting role sequences etc. The proposed analytics algorithms are based on Markov models, sequence mining, classification and survival analysis.
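One of the techniques named above, Markov models, can be sketched in its simplest form: estimating first-order transition probabilities between roles from observed role sequences. The role names and sequences below are hypothetical, and the paper's actual models may differ (for example in order or in how durations are handled).

```python
from collections import Counter, defaultdict

def transition_probabilities(sequences):
    """Estimate P(next role | current role) from lists of consecutive roles."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        role: {nxt: c / sum(following.values()) for nxt, c in following.items()}
        for role, following in counts.items()
    }

# Hypothetical role histories, one list per employee.
histories = [
    ["Developer", "Module Lead", "Project Lead"],
    ["Developer", "Developer", "Module Lead"],
    ["Tester", "Developer", "Module Lead"],
    ["Developer", "Tester"],
]

probs = transition_probabilities(histories)
for role, following in probs.items():
    for nxt, p in sorted(following.items(), key=lambda kv: -kv[1]):
        print(f"P({nxt} | {role}) = {p:.2f}")
```

Given such a table, the most likely next role for an employee is the highest-probability entry in the row for their current role.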


Forum for Information Retrieval Evaluation | 2013

A System for Classification of Propositions of the Indian Supreme Court Judgements

Nitin Ramrakhiyani; Sachin Pawar; Girish Keshav Palshikar

In this work, we describe a system for classification of propositions from legal judgements of the Supreme Court of India. The system was submitted for participation to the Information Access in the Legal Domain track at the Forum for Information Retrieval Evaluation (FIRE) 2013. The system uses a multi-class Maximum Entropy classifier and various specially designed features to capture the underlying characteristics of the legal propositions. The best performing feature set was chosen by 10-fold cross-validation over the training set. The system achieved an accuracy of 65.03% on the training set and an accuracy of 51.02% on the test set.
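Choosing a feature set by k-fold cross-validation, as described above, can be sketched as follows. This is an illustration under stated assumptions: a simple lookup-table classifier stands in for the Maximum Entropy classifier, and the feature extractors, class labels, and toy data are hypothetical.

```python
from collections import Counter, defaultdict

def k_fold_indices(n, k=10):
    """Yield (train_idx, test_idx) pairs that partition range(n) into k folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, held_out in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, held_out

def fit(features, labels):
    """Stand-in classifier: remember label counts per feature value."""
    by_feature = defaultdict(Counter)
    for f, y in zip(features, labels):
        by_feature[f][y] += 1
    overall = Counter(labels).most_common(1)[0][0]
    return by_feature, overall

def predict(model, feature):
    """Return the most frequent label seen for this feature value."""
    by_feature, overall = model
    if feature in by_feature:
        return by_feature[feature].most_common(1)[0][0]
    return overall  # unseen feature value: fall back to the majority label

def cross_val_accuracy(features, labels, k=10):
    """Average accuracy over k held-out folds."""
    correct = 0
    for train, test in k_fold_indices(len(features), k):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, features[i]) == labels[i] for i in test)
    return correct / len(features)

def best_feature_set(feature_sets, examples, labels, k=10):
    """Score each candidate feature extractor and return the best name and all scores."""
    scores = {
        name: cross_val_accuracy([extract(x) for x in examples], labels, k)
        for name, extract in feature_sets.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical data: each example is (cue_word, unrelated_number).
examples = [("held", i % 3) for i in range(10)] + [("argued", i % 3) for i in range(10)]
labels = ["class_a"] * 10 + ["class_b"] * 10
feature_sets = {"cue": lambda x: x[0], "noise": lambda x: x[1]}

best, scores = best_feature_set(feature_sets, examples, labels, k=10)
print(best, scores)
```

Because every fold is scored on examples the model did not see during fitting, the comparison between feature sets is less prone to overfitting than training-set accuracy alone.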

Collaboration


Top Co-Authors

Nitin Ramrakhiyani
Tata Research Development and Design Centre

Pushpak Bhattacharyya
Indian Institute of Technology Bombay

Vasudeva Varma
International Institute of Information Technology

Manoj Apte
Tata Consultancy Services

Shashank Gupta
Indian Institute of Technology

Manish Gupta
International Institute of Information Technology