Publication


Featured research published by Budhaditya Saha.


International Conference on Data Mining | 2007

Infrequent Item Mining in Multiple Data Streams

Budhaditya Saha; Mihai Lazarescu; Svetha Venkatesh

The problem of extracting infrequent patterns from streams and building associations between these patterns is becoming increasingly relevant today, as many events of interest, such as attacks in network data or unusual stories in news data, occur rarely. The complexity of the problem is compounded when a system is required to deal with data from multiple streams. To address these problems, we present a framework that combines time-based association mining with a pyramidal structure that allows a rolling analysis of the stream and maintains a synopsis of the data without requiring increasing memory resources. We apply the algorithms and show the usefulness of the techniques.
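
As a rough illustration of the rolling-synopsis idea, the sketch below keeps per-item counts in a small pyramidal (tilted-time) structure so that memory stays bounded while older history remains summarised at coarser granularity. The class name, the level/slot parameters and the infrequency threshold are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch of a pyramidal (tilted-time) synopsis for a single stream,
# assuming items arrive in fixed-size batches.
from collections import Counter, deque


class PyramidalSynopsis:
    """Keep per-item counts at exponentially coarser time granularities so
    memory stays bounded while old history is still summarised."""

    def __init__(self, levels=4, slots_per_level=4):
        # level 0 holds the most recent batches; each higher level holds
        # merged counts covering a longer span of the stream
        self.levels = [deque(maxlen=slots_per_level) for _ in range(levels)]
        self.slots_per_level = slots_per_level

    def add_batch(self, items):
        """Insert one batch of items, cascading merged counts upward."""
        carry = Counter(items)
        for level in self.levels:
            if len(level) < self.slots_per_level:
                level.append(carry)
                return
            # level is full: merge its two oldest slots and push them upward
            merged = level.popleft() + level.popleft()
            level.append(carry)
            carry = merged
        # counts falling off the top level are discarded (bounded memory)

    def infrequent_items(self, threshold):
        """Items whose total count across the synopsis is below threshold."""
        total = Counter()
        for level in self.levels:
            for slot in level:
                total += slot
        return {item for item, count in total.items() if count < threshold}


# Usage: feed batches from a stream and query rare items.
syn = PyramidalSynopsis()
for batch in (["a", "b", "a"], ["b", "c"], ["a", "b"]):
    syn.add_batch(batch)
print(syn.infrequent_items(threshold=2))  # e.g. {'c'}
```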


Knowledge and Information Systems | 2016

Multiple task transfer learning with small sample sizes

Budhaditya Saha; Sunil Kumar Gupta; Dinh Q. Phung; Svetha Venkatesh

Prognosis, such as predicting mortality, is common in medicine. When confronted with small numbers of samples, as in rare medical conditions, the task is challenging. We propose a framework for classification from small numbers of samples. Conceptually, our solution is a hybrid of multi-task and transfer learning: it employs data samples from source tasks as in transfer learning, but considers all tasks together as in multi-task learning. Each task is modelled jointly with other related tasks by directly augmenting its data with data from those tasks. The degree of augmentation depends on task relatedness and is estimated directly from the data. We apply the model to three diverse real-world data sets (healthcare data, handwritten digit data and face data) and show that our method outperforms several state-of-the-art multi-task learning baselines. We extend the model to online multi-task learning, where the model parameters are incrementally updated given new data or new tasks. The novelty of our method lies in offering a hybrid multi-task/transfer learning model that exploits sharing across tasks at the data level together with joint parameter learning.
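
The data-level augmentation can be pictured with a short sketch: each task first gets its own rough model, a relatedness score is derived from those rough models, and the target task is then retrained on a pool in which source samples are down-weighted by that score. The cosine-similarity relatedness proxy and the use of scikit-learn's LogisticRegression are assumptions made for illustration, not the estimator or solver from the paper.

```python
# A minimal sketch of data-level sharing across tasks, assuming task
# relatedness is estimated from per-task linear model parameters.
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_with_task_sharing(tasks, target_idx):
    """tasks: list of (X, y) pairs; returns a classifier for tasks[target_idx]
    trained on its own data plus relatedness-weighted data from other tasks."""
    # Step 1: fit a separate model per task to get rough parameter estimates.
    params = []
    for X, y in tasks:
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        params.append(clf.coef_.ravel())

    # Step 2: relatedness of each source task to the target task
    # (cosine similarity of parameter vectors, clipped at zero).
    w_t = params[target_idx]
    relatedness = [
        max(0.0, float(np.dot(w_t, w_s) /
                       (np.linalg.norm(w_t) * np.linalg.norm(w_s) + 1e-12)))
        for w_s in params
    ]

    # Step 3: pool data, weighting source samples by relatedness
    # (target samples keep weight 1).
    X_all, y_all, weights = [], [], []
    for i, (X, y) in enumerate(tasks):
        w = 1.0 if i == target_idx else relatedness[i]
        X_all.append(X)
        y_all.append(y)
        weights.append(np.full(len(y), w))

    return LogisticRegression(max_iter=1000).fit(
        np.vstack(X_all), np.concatenate(y_all),
        sample_weight=np.concatenate(weights))
```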


Knowledge and Information Systems | 2013

Detection of cross-channel anomalies

Duc-Son Pham; Budhaditya Saha; Dinh Q. Phung; Svetha Venkatesh

The data deluge has created a great challenge for data mining applications wherein the rare topics of interest are often buried in the flood of major headlines. We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are common among the individual channel anomalies and are often a portent of significant events. Central to this new problem is the development of a theoretical foundation and methodology. Using a spectral approach, we propose a two-stage detection method: anomaly detection at the single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. We also derive an extension of the proposed detection method to an online setting, which automatically adapts to changes in the data over time at low computational complexity using incremental algorithms. Our mathematical analysis shows that our method is likely to reduce the false alarm rate, by establishing theoretical results on the reduction of an impurity index. We demonstrate our method in two applications: document understanding with multiple text corpora and detection of repeated anomalies in large-scale video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis.
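
A compact way to see the two-stage structure is the sketch below, which uses a PCA-residual score as a generic stand-in for the single-channel spectral detector and then keeps only the time points flagged in several channels. The rank, quantile and min_channels parameters, and the synthetic example, are illustrative assumptions.

```python
# A minimal sketch of two-stage cross-channel anomaly detection, assuming each
# channel is a (time x features) matrix.
import numpy as np


def channel_anomalies(X, rank=2, quantile=0.95):
    """Flag time points whose residual energy outside the top-`rank`
    principal subspace is unusually large."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ Vt[:rank].T @ Vt[:rank]          # projection onto top subspace
    residual = np.linalg.norm(Xc - proj, axis=1)
    return residual > np.quantile(residual, quantile)


def cross_channel_anomalies(channels, min_channels=2, **kwargs):
    """Stage 2: keep time points flagged in at least `min_channels` channels."""
    flags = np.stack([channel_anomalies(X, **kwargs) for X in channels])
    return flags.sum(axis=0) >= min_channels


# Usage on synthetic data: a shared anomalous event at t = 50 in both channels.
rng = np.random.default_rng(0)


def make_channel():
    # low-rank background (rank 2) plus small noise
    B = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 10))
    return B + 0.1 * rng.normal(size=(100, 10))


channels = [make_channel() for _ in range(2)]
for X in channels:
    X[50] += rng.normal(size=10)
print(np.where(cross_channel_anomalies(channels))[0])  # t = 50 should appear
```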


Knowledge and Information Systems | 2016

A new transfer learning framework with application to model-agnostic multi-task learning

Sunil Kumar Gupta; Santu Rana; Budhaditya Saha; Dinh Q. Phung; Svetha Venkatesh

Learning from a small number of examples is a challenging problem in machine learning. An effective way to improve performance is to exploit knowledge from other related tasks. Multi-task learning (MTL) is one such useful paradigm that aims to improve performance by jointly modeling multiple related tasks. Although there exist numerous classification and regression models in the machine learning literature, most MTL models are built around ridge or logistic regression. A limited number of works propose multi-task extensions of techniques such as support vector machines and Gaussian processes. However, all these MTL models are tied to specific classification or regression algorithms, and there is no single MTL algorithm that can be used at a meta level for any given learning algorithm. Addressing this problem, we propose a generic, model-agnostic joint modeling framework that can take any classification or regression algorithm of a practitioner's choice (standard or custom-built) and build its MTL variant. The key observation that drives our framework is that, due to the small number of examples, the estimates of task parameters are usually poor, and we show that this leads to an under-estimation of the relatedness between any two tasks with high probability. We derive an algorithm that brings the tasks closer to their true relatedness by improving the estimates of the task parameters. This is achieved by appropriate sharing of data across tasks. We provide the detailed theoretical underpinnings of the algorithm. Through our experiments with both synthetic and real datasets, we demonstrate that the multi-task variants of several classifiers/regressors (logistic regression, support vector machine, k-nearest neighbour, random forest, ridge regression, support vector regression) convincingly outperform their single-task counterparts. We also show that the proposed model performs comparably to or better than many state-of-the-art MTL and transfer learning baselines.
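
To make the model-agnostic idea concrete, the sketch below wraps an arbitrary scikit-learn estimator: per-task models provide a crude relatedness proxy (here, how well one task's current model scores on another task's data, which is an assumption made for illustration rather than the estimate derived in the paper), and each task is then refit on a pool in which borrowed samples are weighted by that proxy.

```python
# A minimal sketch of wrapping an arbitrary estimator into a multi-task
# variant by sharing data across tasks.
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier


def model_agnostic_mtl(base_estimator, tasks, n_rounds=2):
    """tasks: list of (X, y); returns one fitted model per task, each trained
    on its own data plus data borrowed from related tasks."""
    models = [clone(base_estimator).fit(X, y) for X, y in tasks]
    for _ in range(n_rounds):
        new_models = []
        for t, (Xt, yt) in enumerate(tasks):
            X_pool, y_pool, w_pool = [Xt], [yt], [np.ones(len(yt))]
            for s, (Xs, ys) in enumerate(tasks):
                if s == t:
                    continue
                # relatedness proxy: accuracy of task t's model on task s
                r = models[t].score(Xs, ys)
                X_pool.append(Xs)
                y_pool.append(ys)
                w_pool.append(np.full(len(ys), r))
            model = clone(base_estimator)
            model.fit(np.vstack(X_pool), np.concatenate(y_pool),
                      sample_weight=np.concatenate(w_pool))
            new_models.append(model)
        models = new_models
    return models


# Usage with any estimator that accepts sample_weight, e.g. a random forest:
# models = model_agnostic_mtl(RandomForestClassifier(), [(X1, y1), (X2, y2)])
```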


Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2015

Prediction of Emergency Events: A Multi-Task Multi-Label Learning Approach

Budhaditya Saha; Sunil Kumar Gupta; Svetha Venkatesh

Prediction of patient outcomes is critical for planning resources in a hospital emergency department. We present a method that exploits longitudinal data from Electronic Medical Records (EMRs) whilst modelling multiple patient outcomes. We divide the EMR data into segments, where each segment is a task, and all tasks are associated with multiple patient outcomes over 3-, 6- and 12-month periods. We propose a model that learns a prediction function for each task-label pair, interacting through two subspaces: the first imposes sharing across all tasks for a given label; the second captures task-specific variations and is shared across all labels for a given task. The proposed model is formulated as an iterative optimization problem and solved using a scalable and efficient block coordinate descent (BCD) method. We apply the proposed model to two hospital cohorts, cancer and acute myocardial infarction (AMI) patients, collected over a two-year period from a large hospital emergency department. We show that the predictive performance of our proposed models is significantly better than that of several state-of-the-art multi-task and multi-label learning methods.
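
The two-subspace interaction can be sketched under squared loss by decomposing each task-label predictor into a label-shared part and a task-specific part and alternating closed-form ridge updates between the two blocks. The additive decomposition w[t, l] = u[l] + v[t], the ridge penalties and the function names below are simplifications for illustration, not the exact model or solver from the paper.

```python
# A minimal sketch of block coordinate descent over a label-shared and a
# task-specific parameter block.
import numpy as np


def bcd_fit(X_tasks, Y_tasks, lam=1.0, n_iters=20):
    """X_tasks[t]: (n_t, d) features; Y_tasks[t]: (n_t, L) multi-label targets.
    Returns (U, V) with U: (L, d) label-shared and V: (T, d) task-specific."""
    T, L, d = len(X_tasks), Y_tasks[0].shape[1], X_tasks[0].shape[1]
    U, V = np.zeros((L, d)), np.zeros((T, d))
    I = np.eye(d)
    for _ in range(n_iters):
        # Block 1: update each label-shared direction u_l with V fixed.
        for l in range(L):
            A, b = lam * I, np.zeros(d)
            for t in range(T):
                X = X_tasks[t]
                r = Y_tasks[t][:, l] - X @ V[t]
                A += X.T @ X
                b += X.T @ r
            U[l] = np.linalg.solve(A, b)
        # Block 2: update each task-specific direction v_t with U fixed.
        for t in range(T):
            X = X_tasks[t]
            A = lam * I + L * (X.T @ X)
            b = np.zeros(d)
            for l in range(L):
                b += X.T @ (Y_tasks[t][:, l] - X @ U[l])
            V[t] = np.linalg.solve(A, b)
    return U, V


def predict(X, U, V, t):
    """Scores for every label of task t; threshold as needed for binary labels."""
    return X @ (U + V[t]).T
```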


Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2013

Clustering Patient Medical Records via Sparse Subspace Representation

Budhaditya Saha; Duc-Son Pham; Dinh Q. Phung; Svetha Venkatesh

The health industry faces an increasing challenge with “big data” as traditional methods fail to manage its scale and complexity. This paper examines clustering of patient records for chronic diseases to facilitate a better construction of care plans. We solve this problem under the framework of subspace clustering. Our novel contribution lies in the exploitation of sparse representation to discover subspaces automatically and in a domain-specific construction of weighting matrices for patient records. We show that the new formulation is readily solved by extending existing ℓ1-regularized optimization algorithms. Using cohorts of both diabetes and stroke patients, we show that our method outperforms existing benchmark clustering techniques in the literature.
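
A minimal sketch of this pipeline is given below, assuming patient records are rows of a numeric matrix and approximating the domain-specific weighting matrix by a vector of per-feature clinical weights; the Lasso-based sparse coding and the spectral clustering step follow the standard sparse-subspace recipe rather than the exact formulation in the paper.

```python
# A minimal sketch of weighted sparse-subspace clustering of patient records.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.linear_model import Lasso


def cluster_patient_records(R, feature_weights, n_clusters, alpha=0.05):
    """R: (n_patients, n_features) record matrix; feature_weights: length
    n_features vector emphasising clinically important variables."""
    X = R * np.asarray(feature_weights)      # apply the (diagonal) weighting
    n = X.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        # express record i as a sparse combination of the other records
        C[i, others] = Lasso(alpha=alpha, max_iter=5000).fit(
            X[others].T, X[i]).coef_
    affinity = np.abs(C) + np.abs(C).T        # symmetrise the sparse codes
    return SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                              random_state=0).fit_predict(affinity)
```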


IEEE Journal of Biomedical and Health Informatics | 2016

A Framework for Classifying Online Mental Health-Related Communities With an Interest in Depression

Budhaditya Saha; Thin Nguyen; Dinh Q. Phung; Svetha Venkatesh

Mental illness has a deep impact on individuals, families, and, by extension, society as a whole. Social networks allow individuals with mental disorders to communicate with other sufferers via online communities, providing an invaluable resource for studies on textual signs of psychological health problems. Mental disorders often occur in combination; for example, a patient with an anxiety disorder may also develop depression. This co-occurrence of mental health conditions provides the focus for our work on classifying online communities with an interest in depression. For this, we have crawled a large body of 620,000 posts made by 80,000 users in 247 online communities. We have extracted the topics and psycholinguistic features expressed in the posts, using these as inputs to our model. We formulate a joint modeling framework to classify mental health-related co-occurring online communities from these features. Finally, we empirically validate the model on the crawled dataset, where it outperforms recent state-of-the-art baselines.
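
A minimal sketch of the feature side of this pipeline is shown below, assuming topics come from LDA and approximating the psycholinguistic signals with simple lexicon counts (a hypothetical stand-in for LIWC-style features); the joint model itself is replaced here by whatever downstream classifier the features are fed into.

```python
# A minimal sketch of building topic + psycholinguistic features per community.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# hypothetical affect lexicons, used only for illustration
LEXICONS = {
    "negative_emotion": {"sad", "hopeless", "worthless", "tired"},
    "positive_emotion": {"happy", "grateful", "hope", "better"},
}


def community_features(posts_per_community, n_topics=10):
    """posts_per_community: list where each element is all posts of one
    community concatenated into a single string."""
    counts = CountVectorizer(stop_words="english").fit_transform(
        posts_per_community)
    topics = LatentDirichletAllocation(
        n_components=n_topics, random_state=0).fit_transform(counts)
    # normalised lexicon hit rates as crude psycholinguistic features
    lexicon = np.array([
        [sum(doc.lower().split().count(w) for w in words)
         / max(len(doc.split()), 1)
         for words in LEXICONS.values()]
        for doc in posts_per_community
    ])
    return np.hstack([topics, lexicon])


# Usage: X = community_features(docs); LogisticRegression().fit(X, labels)
```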


International Conference on Data Mining | 2012

Sparse Subspace Representation for Spectral Document Clustering

Budhaditya Saha; Dinh Q. Phung; Duc-Son Pham; Svetha Venkatesh

We present a novel method for document clustering using sparse representation of documents in conjunction with spectral clustering. An ℓ1-norm optimization formulation is posed to learn the sparse representation of each document, allowing us to characterize the affinity between documents by considering the overall information instead of traditional pairwise similarities. This document affinity is encoded through a graph on which spectral clustering is performed. The decomposition into multiple subspaces allows documents to be part of a sub-group that shares a smaller set of similar vocabulary, thus allowing for cleaner clusters. Extensive experimental evaluations on two real-world datasets from the Reuters-21578 and 20 Newsgroups corpora show that our proposed method consistently outperforms state-of-the-art algorithms. Significantly, the performance improvement over other methods is prominent for these datasets.
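
A minimal sketch of the method's structure, assuming TF-IDF document vectors: each document is expressed as a sparse, ℓ1-regularised combination of the other documents, and the magnitudes of those coefficients form the affinity on which spectral clustering runs. The alpha value and vectoriser settings are illustrative.

```python
# A minimal sketch of sparse-representation affinities plus spectral clustering.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Lasso


def sparse_spectral_clusters(docs, n_clusters, alpha=0.01):
    X = TfidfVectorizer(stop_words="english").fit_transform(docs).toarray()
    n = X.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        # represent document i using all other documents as the dictionary
        others = np.delete(np.arange(n), i)
        C[i, others] = Lasso(alpha=alpha, max_iter=5000).fit(
            X[others].T, X[i]).coef_
    affinity = np.abs(C) + np.abs(C).T        # symmetric, non-negative graph
    return SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                              random_state=0).fit_predict(affinity)
```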


IEEE Journal of Biomedical and Health Informatics | 2017

A Framework for Mixed-Type Multioutcome Prediction With Applications in Healthcare

Budhaditya Saha; Sunil Kumar Gupta; Dinh Q. Phung; Svetha Venkatesh

Health analysis often involves the prediction of multiple outcomes of mixed type. Existing work is restricted to either a limited number of outcomes or to specific outcome types. We propose a framework for mixed-type multioutcome prediction. The framework uses a cumulative loss function composed of a specific loss function for each outcome type: for example, least squares (continuous outcomes), hinge (binary outcomes), Poisson (count outcomes), and exponential (nonnegative outcomes). To model these outcomes jointly, we impose commonality across the prediction parameters through a common matrix-normal prior. The framework is formulated as an iterative optimization problem and solved using an efficient block coordinate descent method. We empirically demonstrate both scalability and convergence. We apply the proposed model to a synthetic dataset and then to two real-world cohorts: a cancer cohort and an acute myocardial infarction cohort collected over a two-year period. We predict multiple emergency-related outcomes, for example future emergency presentations (binary), emergency admissions (count), emergency length of stay in days (nonnegative), and time to next emergency admission in days (nonnegative). We show that the predictive performance of the proposed model is better than that of several state-of-the-art baselines.
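
A minimal sketch of the cumulative loss is given below, assuming one linear predictor per outcome and replacing the matrix-normal coupling with a simple penalty that pulls every outcome's weights toward their common mean; the optimisation is plain block-wise gradient descent rather than the solver used in the paper, and the count and nonnegative losses are standard GLM-style choices assumed for illustration.

```python
# A minimal sketch of a cumulative loss over mixed outcome types.
import numpy as np


def grad_single(w, X, y, kind):
    """Gradient of one outcome-specific loss at weights w."""
    z = X @ w
    if kind == "continuous":      # least squares
        g = X.T @ (z - y)
    elif kind == "binary":        # hinge loss, labels in {-1, +1}
        active = (1 - y * z) > 0
        g = -(X[active].T @ y[active])
    elif kind == "count":         # Poisson negative log-likelihood
        g = X.T @ (np.exp(z) - y)
    elif kind == "nonnegative":   # exponential NLL with mean exp(z)
        g = X.T @ (1 - y * np.exp(-z))
    return g / len(y)


def fit_mixed_outcomes(X, outcomes, lam=1.0, lr=0.01, n_iters=500):
    """outcomes: list of (y, kind) pairs sharing the same design matrix X.
    Assumes standardised features so the exponential terms stay well behaved."""
    d, K = X.shape[1], len(outcomes)
    W = np.zeros((K, d))
    for _ in range(n_iters):
        for k, (y, kind) in enumerate(outcomes):     # one block per outcome
            shared_pull = W[k] - W.mean(axis=0)      # coupling toward the mean
            W[k] -= lr * (grad_single(W[k], X, y, kind) + lam * shared_pull)
    return W
```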


International Conference on Pattern Recognition | 2016

Transfer learning for rare cancer problems via Discriminative Sparse Gaussian Graphical model

Budhaditya Saha; Sunil Kumar Gupta; Dinh Q. Phung; Svetha Venkatesh

Mortality prediction for rare cancer types with a small number of high-dimensional samples is a challenging task. We propose a transfer learning model where both classes in the rare cancer (target task) are modeled in a joint framework by transferring knowledge from a source task. The knowledge transfer is at the data level, where only “related” data points are chosen to train the target task. Moreover, using both positive and negative classes in training enhances the discriminative power of the proposed framework. Overall, this approach boosts the generalization performance of the target task when only a small number of data points is available. The formulation of the proposed framework is convex and expressed as a primal problem. We convert this to a dual problem and solve it efficiently using the alternating direction method of multipliers (ADMM). Our experiments with both synthetic and three real-world datasets show that our framework outperforms state-of-the-art single-task, multi-task, and transfer learning baselines.
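
A minimal sketch of the data-level transfer step, assuming source and target tasks share a label space and using distance to the target class centroids as an illustrative rule for picking “related” source points; the target model here is an ordinary linear SVM rather than the dual/ADMM formulation solved in the paper.

```python
# A minimal sketch of data-level transfer for a small target task.
import numpy as np
from sklearn.svm import LinearSVC


def transfer_fit(X_src, y_src, X_tgt, y_tgt, keep_fraction=0.3):
    """Train a target classifier on the target data plus the most related
    source points from each class (both tasks share the same label space)."""
    selected_X, selected_y = [X_tgt], [y_tgt]
    for label in np.unique(y_tgt):
        centroid = X_tgt[y_tgt == label].mean(axis=0)
        Xc = X_src[y_src == label]
        dist = np.linalg.norm(Xc - centroid, axis=1)
        k = max(1, int(keep_fraction * len(Xc)))
        nearest = np.argsort(dist)[:k]           # most related source points
        selected_X.append(Xc[nearest])
        selected_y.append(np.full(k, label))
    X_all = np.vstack(selected_X)
    y_all = np.concatenate(selected_y)
    return LinearSVC(C=1.0, max_iter=10000).fit(X_all, y_all)
```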
