Publication


Featured research published by Sauptik Dhar.


IEEE Transactions on Neural Networks | 2011

Practical Conditions for Effectiveness of the Universum Learning

Vladimir Cherkassky; Sauptik Dhar; Wuyang Dai

Many applications of machine learning involve analysis of sparse high-dimensional data, in which the number of input features is larger than the number of data samples. Standard inductive learning methods may not be sufficient for such data, and this provides motivation for nonstandard learning settings. This paper investigates a new learning methodology called learning through contradictions or Universum support vector machine (U-SVM). U-SVM incorporates a priori knowledge about application data, in the form of additional Universum samples, into the learning process. The paper examines possible advantages of U-SVM versus standard SVM, and describes the practical conditions necessary for the effectiveness of the U-SVM. These conditions are based on the analysis of the univariate histograms of projections of training samples onto the normal direction vector of the (standard) SVM decision boundary. Several empirical comparisons are presented to illustrate the practical utility of the proposed approach.
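
The histogram-of-projections idea mentioned in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the weight vector `w`, the bias `b`, the toy samples, and the helper names `project` and `histogram` are all hypothetical, and `w` and `b` are assumed to come from an already trained linear SVM. The sketch only computes per-class histograms; it does not encode the paper's effectiveness conditions.

```python
from collections import Counter
import math


def project(x, w, b):
    """Signed decision value f(x) = w.x + b, i.e. the projection of x onto
    the normal direction of the decision boundary (up to scaling by ||w||)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def histogram(values, bin_width=0.5):
    """Count values per bin; bin k covers [k*bin_width, (k+1)*bin_width)."""
    return Counter(math.floor(v / bin_width) for v in values)


# Hypothetical parameters of a trained linear SVM and toy training data.
w = [1.0, -1.0]
b = 0.0
X = [[2.0, 0.5], [1.5, 0.5], [0.5, 2.0], [0.0, 1.2]]
y = [+1, +1, -1, -1]

# Project every training sample, then build one histogram per class.
proj = [project(x, w, b) for x in X]
hist_pos = histogram(p for p, label in zip(proj, y) if label == +1)
hist_neg = histogram(p for p, label in zip(proj, y) if label == -1)
```

Plotting `hist_pos` and `hist_neg` on a shared axis gives a univariate picture of how the training samples are distributed relative to the decision boundary at `f(x) = 0`.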


IEEE Transactions on Systems, Man, and Cybernetics | 2015

Development and Evaluation of Cost-Sensitive Universum-SVM

Sauptik Dhar; Vladimir Cherkassky

Many machine learning applications involve analysis of high-dimensional data, where the number of input features is larger than or comparable to the number of data samples. Standard classification methods may not be sufficient for such data, and this provides motivation for nonstandard learning settings. One such new learning methodology is called learning through contradiction or Universum-support vector machine (U-SVM). Recent studies have shown U-SVM to be quite effective for sparse high-dimensional data sets. However, all these earlier studies have used balanced data sets with equal misclassification costs. This paper extends the U-SVM formulation to problems with different misclassification costs, and presents practical conditions for the effectiveness of this cost-sensitive U-SVM. Several empirical comparisons are presented to validate the proposed approach.


Archive | 2015

Interpretation of Black-Box Predictive Models

Vladimir Cherkassky; Sauptik Dhar

Many machine learning applications involve predictive data-analytic modeling using black-box techniques. A common problem in such studies is understanding/interpretation of estimated nonlinear high-dimensional models. Whereas human users naturally favor simple interpretable models, such models may not be practically feasible with modern adaptive methods such as Support Vector Machines (SVMs), Multilayer Perceptron Networks (MLPs), AdaBoost, etc. This chapter provides a brief survey of the current techniques for visualization and interpretation of SVM-based classification models, and then highlights potential problems with such methods. We argue that, under the VC-theoretical framework, model interpretation cannot be achieved via technical analysis of predictive data-analytic models. That is, any meaningful interpretation should incorporate application domain knowledge outside data analysis. We also describe a simple graphical technique for visualization of SVM classification models.


International Conference on Machine Learning and Applications | 2012

Cost-Sensitive Universum-SVM

Sauptik Dhar; Vladimir Cherkassky

Many applications of machine learning involve analysis of sparse high-dimensional data, where the number of input features is larger than the number of data samples. Standard classification methods may not be sufficient for such data, and this provides motivation for non-standard learning settings. One such new learning methodology is called Learning through Contradictions or Universum support vector machine (U-SVM) [1, 2]. Recent studies [2-10] have shown U-SVM to be quite effective for such sparse high-dimensional data settings. However, these studies use balanced data sets with equal misclassification costs. This paper extends the U-SVM to problems with different misclassification costs, and presents practical conditions for the effectiveness of the cost-sensitive U-SVM. Finally, several empirical comparisons are presented to illustrate the utility of the proposed approach.


International Symposium on Neural Networks | 2017

Universum learning for SVM regression

Sauptik Dhar; Vladimir Cherkassky

This paper extends the idea of Universum learning to regression problems. We propose a new Universum-SVM formulation for regression problems that incorporates a priori knowledge in the form of additional data samples. These additional data samples, or Universum samples, belong to the same application domain as the training samples, but they follow a different distribution. Several empirical comparisons are presented to illustrate the utility of the proposed approach.


International Conference on Big Data | 2015

A data-driven approach towards patient identification for telehealth programs

Martha Ganser; Sauptik Dhar; Unmesh Kurup; Carlos Cunha; Aca Gacic

Telehealth provides an opportunity to reduce healthcare costs through remote patient monitoring, but is not appropriate for all individuals. Our goal was to identify the patients for whom telehealth has the greatest impact, as measured through cost savings and patient engagement. For prediction of cost savings, challenges included the high variability of medical costs and the effect of selection bias on the cost difference between intervention patients and controls. Using Medicare claims data, we computed cost savings by comparing each telehealth patient to a group of control patients who had similar healthcare resource utilization. These estimates were then used to train a predictive model using logistic regression. Filtering the patients based on the model resulted in an average cost savings of $10K in the group of patients with the highest healthcare utilization, an improvement over the current expected loss of $2K (without filtering). Groups of patients with lower healthcare utilization also showed improvement, though less pronounced. To identify highly engaged patients, we developed predictive models of telehealth compliance and of patient satisfaction. Performance of these models was generally poor, with an AUC ranging from 0.54 to 0.64.


International Conference on Big Data | 2015

ADMM based scalable machine learning on Spark

Sauptik Dhar; Congrui Yi; Naveen Ramakrishnan; Mohak Shah

Most machine learning algorithms involve solving a convex optimization problem. Traditional in-memory convex optimization solvers do not scale well with the increase in data. This paper identifies a generic convex problem for most machine learning algorithms and solves it using the Alternating Direction Method of Multipliers (ADMM). Finally, such an ADMM problem transforms to an iterative system of linear equations, which can be easily solved at scale in a distributed fashion. We implement this framework in Apache Spark and compare it with the widely used Machine Learning LIBrary (MLLIB) in Apache Spark 1.3.


International Conference on Machine Learning and Applications | 2015

Patient Identification for Telehealth Programs

Martha Ganser; Sauptik Dhar; Unmesh Kurup; Carlos Cunha; Aca Gacic

Telehealth provides an opportunity to reduce healthcare costs through remote patient monitoring, but is not appropriate for all individuals. Our goal was to identify the patients for whom telehealth has the greatest impact. Challenges included the high variability of medical costs and the effect of selection bias on the cost difference between intervention patients and controls. Using Medicare claims data, we computed cost savings by comparing each telehealth patient to a group of control patients who had similar healthcare resource utilization. These estimates were then used to train a predictive model using logistic regression. Filtering the patients based on the model resulted in an average cost savings of


International Symposium on Neural Networks | 2011

Application of SOM to analysis of Minnesota soil survey data

Sauptik Dhar; Vladimir Cherkassky


DMIN | 2010

Simple Method for Interpretation of High-Dimensional Nonlinear SVM Classification Models.

Vladimir Cherkassky; Sauptik Dhar

Collaboration


Dive into Sauptik Dhar's collaborations.

Top Co-Authors

Wuyang Dai

University of Minnesota
