R. Bharat Rao | Researchain

Publication

Featured research published by R. Bharat Rao.

International Conference on Machine Learning | 2008

Bayesian multiple instance learning: automatic feature selection and inductive transfer

Vikas C. Raykar; Balaji Krishnapuram; Jinbo Bi; Murat Dundar; R. Bharat Rao

We propose a novel Bayesian multiple instance learning (MIL) algorithm. This algorithm automatically identifies the relevant feature subset, and utilizes inductive transfer when learning multiple (conceptually related) classifiers. Experimental results indicate that the proposed MIL method is more accurate than previous MIL algorithms and selects a much smaller set of useful features. Inductive transfer further improves the accuracy of the classifier as compared to learning each task individually.
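The abstract does not spell out the model, so the following is only a minimal sketch of one common MIL formulation, not necessarily the authors' exact method. It assumes a linear logistic model per instance and a noisy-OR rule that marks a bag positive when at least one of its instances is positive. The names `instance_prob`, `bag_prob`, `weights`, and `bag` are hypothetical, and the zero weight only illustrates how an irrelevant feature drops out once it has been deselected.

```python
import math


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def instance_prob(weights, features):
    """Probability that one instance is positive under a linear logistic model."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def bag_prob(weights, bag):
    """Noisy-OR: a bag is positive if at least one of its instances is positive."""
    prob_all_negative = 1.0
    for features in bag:
        prob_all_negative *= 1.0 - instance_prob(weights, features)
    return 1.0 - prob_all_negative


# Hypothetical weights; the zero entry means feature 1 has no influence,
# which is the effect a feature-selecting prior aims for (assumption).
weights = [2.0, 0.0, -1.0]
bag = [[0.1, 5.0, 0.3], [1.5, -2.0, 0.0]]

print(bag_prob(weights, bag))
```

Under this rule the bag probability can never fall below the probability of its most confident instance, which is why a single strongly positive candidate is enough to flag the whole bag.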

SIGKDD Explorations | 2006

Data mining for improved cardiac care

R. Bharat Rao; Sriram Krishnan; Radu Stefan Niculescu

Cardiovascular Disease (CVD) is the single largest killer in the world. Although several CVD treatment guidelines have been developed to improve quality of care and reduce healthcare costs, adherence to these guidelines remains poor for a number of reasons. Further, due to the extremely poor quality of data in medical patient records, most of today's healthcare IT systems cannot provide significant support to improve the quality of CVD care (particularly in chronic CVD situations, which contribute to the majority of costs). We present REMIND, a probabilistic framework for Reliable Extraction and Meaningful Inference from Nonstructured Data. REMIND integrates the structured and unstructured clinical data in patient records to automatically create high-quality structured clinical data. There are two principal factors that enable REMIND to overcome the barriers associated with inference from medical records. First, patient data is highly redundant -- exploiting this redundancy allows us to deal with the inherent errors in the data. Second, REMIND performs inference based on external medical domain knowledge to combine data from multiple sources and to enforce consistency between different medical conclusions drawn from the data -- via a probabilistic reasoning framework that overcomes the incomplete, inconsistent, and incorrect nature of data in medical patient records. This high-quality structuring allows existing patient records to be mined to support guideline compliance and to improve patient care. Moreover, once REMIND is configured for an institution's data repository, many other important clinical applications are also enabled, including quality assurance; therapy selection for individual patients; automated patient identification for clinical trials; data extraction for research studies; and relating financial and clinical factors. REMIND provides value across the continuum of healthcare, ranging from small physician practice databases to the most complex hospital IT systems, from acute cardiac care to chronic CVD management, and to experimental research studies. REMIND is currently deployed across multiple disease areas over a total of 5,000,000 patients across the US.

European Conference on Machine Learning | 2008

An Improved Multi-task Learning Approach with Applications in Medical Diagnosis

Jinbo Bi; Tao Xiong; Shipeng Yu; Murat Dundar; R. Bharat Rao

We propose a family of multi-task learning algorithms for collaborative computer aided diagnosis, which aims to diagnose multiple clinically-related abnormal structures from medical images. Our formulations eliminate features irrelevant to all tasks, and identify discriminative features for each of the tasks. A probabilistic model is derived to justify the proposed learning formulations. By equivalence proof, some existing regularization-based methods can also be interpreted by our probabilistic model as imposing a Wishart hyperprior. Convergence analysis highlights the conditions under which the formulations achieve convexity and global convergence. Two real-world medical problems, lung cancer prognosis and heart wall motion analysis, are used to validate the proposed algorithms.

International Conference on Machine Learning | 2004

A fast iterative algorithm for Fisher discriminant using heterogeneous kernels

Glenn Fung; Murat Dundar; Jinbo Bi; R. Bharat Rao

We propose a fast iterative classification algorithm for Kernel Fisher Discriminant (KFD) using heterogeneous kernel models. In contrast with the standard KFD, which requires the user to predefine a kernel function, we incorporate the task of choosing an appropriate kernel into the optimization problem to be solved. The choice of kernel is defined as a linear combination of kernels belonging to a potentially large family of different positive semidefinite kernels. The complexity of our algorithm does not increase significantly with the number of kernels in the kernel family. Experiments on several benchmark datasets demonstrate that the generalization performance of the proposed algorithm is not significantly different from that achieved by the standard KFD in which the kernel parameters have been tuned using cross validation. We also present results on a real-life colon cancer dataset that demonstrate the efficiency of the proposed method.
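To illustrate the idea of a kernel built as a linear combination of positive semidefinite kernels, here is a small sketch. It only evaluates a combined kernel for fixed coefficients; it does not implement the paper's iterative optimization. The kernel family, the coefficient values, and the restriction to nonnegative coefficients (a sufficient condition for the combination to remain positive semidefinite) are assumptions made for this example.

```python
import math


def linear_kernel(x, y):
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x, y, width):
    """Gaussian (RBF) kernel with the given width."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def combined_kernel(x, y, kernels, coeffs):
    """Nonnegative linear combination of PSD kernels (itself PSD)."""
    if any(c < 0 for c in coeffs):
        raise ValueError("coefficients must be nonnegative in this sketch")
    return sum(c * k(x, y) for c, k in zip(coeffs, kernels))


# Hypothetical kernel family: one linear kernel and two Gaussian widths.
kernel_family = [
    linear_kernel,
    lambda x, y: gaussian_kernel(x, y, 0.5),
    lambda x, y: gaussian_kernel(x, y, 2.0),
]
coeffs = [0.2, 0.5, 0.3]  # fixed here; a learning method would choose these

points = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
gram = [[combined_kernel(p, q, kernel_family, coeffs) for q in points]
        for p in points]
```

In a kernel-learning setting the coefficients would be optimization variables alongside the classifier, so the user no longer has to pick a single kernel in advance.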

Knowledge Discovery and Data Mining | 2006

Computer aided detection via asymmetric cascade of sparse hyperplane classifiers

Jinbo Bi; Senthil Periaswamy; Kazunori Okada; Toshiro Kubota; Glenn Fung; Marcos Salganicoff; R. Bharat Rao

This paper describes a novel classification method for computer aided detection (CAD) that identifies structures of interest from medical images. CAD problems are challenging largely due to the following three characteristics. Typical CAD training data sets are large and extremely unbalanced between positive and negative classes. When searching for descriptive features, researchers often deploy a large set of experimental features, which consequently introduces irrelevant and redundant features. Finally, a CAD system has to satisfy stringent real-time requirements. This work is distinguished by three key contributions. The first is a cascade classification approach which is able to tackle all the above difficulties in a unified framework by employing an asymmetric cascade of sparse classifiers, each trained to achieve high detection sensitivity and satisfactory false positive rates. The second is the incorporation of feature computational costs in a linear program formulation that allows the feature selection process to take into account different evaluation costs of various features. The third is a boosting algorithm derived from column generation optimization to effectively solve the proposed cascade linear programs. We apply the proposed approach to the problem of detecting lung nodules from helical multi-slice CT images. Our approach demonstrates superior performance in comparison against support vector machines, linear discriminant analysis and cascade AdaBoost. In particular, the resulting detection system is significantly faster with our approach.
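The run-time benefit of a cascade of sparse classifiers can be sketched as follows. This is an illustration of the prediction step only, under assumed names and numbers: `stages`, `feature_fns`, and the candidate fields are hypothetical, and the training procedure (linear programs solved by column generation) is not shown. Each stage uses a few features, features are computed lazily, and a candidate is rejected as soon as one stage scores it below zero.

```python
def cascade_predict(stages, feature_fns, candidate):
    """Run a candidate through the cascade.

    Returns (accepted, n_features_computed). Features are computed only
    when a stage needs them, so early rejections stay cheap.
    """
    cache = {}
    for weights, bias in stages:
        score = bias
        for idx, weight in weights.items():
            if idx not in cache:
                cache[idx] = feature_fns[idx](candidate)
            score += weight * cache[idx]
        if score < 0.0:
            return False, len(cache)  # rejected; later features never computed
    return True, len(cache)


# Hypothetical feature extractors, ordered roughly from cheap to expensive.
feature_fns = [
    lambda c: c["size"],
    lambda c: c["contrast"],
    lambda c: c["shape"],
]

# Each stage is (sparse weights keyed by feature index, bias).
stages = [
    ({0: 1.0}, -0.5),
    ({0: 0.5, 1: 1.0}, -1.0),
    ({1: 0.5, 2: 1.0}, -1.2),
]

easy_negative = {"size": 0.1, "contrast": 0.9, "shape": 0.9}
likely_nodule = {"size": 0.8, "contrast": 0.9, "shape": 0.9}

print(cascade_predict(stages, feature_fns, easy_negative))
print(cascade_predict(stages, feature_fns, likely_nodule))
```

Because negatives vastly outnumber positives in CAD data, most candidates exit at the first stage after a single feature evaluation, while only the few survivors pay for the full feature set.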

Knowledge Discovery and Data Mining | 2008

Privacy-preserving Cox regression for survival analysis

Shipeng Yu; Glenn Fung; Rómer Rosales; Sriram Krishnan; R. Bharat Rao; Cary Dehing-Oberije; Philippe Lambin

Privacy-preserving data mining (PPDM) is an emergent research area that addresses the incorporation of privacy-preserving concerns into data mining techniques. In this paper we propose a privacy-preserving (PP) Cox model for survival analysis, and consider a real clinical setting where the data is horizontally distributed among different institutions. The proposed model is based on linearly projecting the data to a lower dimensional space through an optimal mapping obtained by solving a linear programming problem. Our approach differs from the commonly used random projection approach since it instead finds a projection that is optimal at preserving the properties of the data that are important for the specific problem at hand. Since our proposed approach produces a sparse mapping, it also generates a PP mapping that not only projects the data to a lower dimensional space but also depends on a smaller subset of the original features (it provides explicit feature selection). Real data from several European healthcare institutions are used to test our model for survival prediction of non-small-cell lung cancer patients. These results are also confirmed using publicly available benchmark datasets. Our experimental results show that we are able to achieve near-optimal performance without directly sharing the data across different data sources. This model makes it possible to conduct large-scale multi-centric survival analysis without violating privacy-preserving requirements.

IWDM '08: Proceedings of the 9th International Workshop on Digital Mammography | 2008

Multiple-Instance Learning Improves CAD Detection of Masses in Digital Mammography

Balaji Krishnapuram; Jonathan Stoeckel; Vikas C. Raykar; R. Bharat Rao; Philippe Bamberger; Eli Ratner; Nicolas J. Merlet; Inna Stainvas; Menahem Abramov; Alexandra Manevitch

We propose a novel multiple-instance learning (MIL) algorithm for designing classifiers for use in computer aided detection (CAD). The proposed algorithm has three advantages over classical methods. First, unlike traditional learning algorithms that minimize the candidate-level misclassification error, the proposed algorithm directly optimizes the patient-wise sensitivity. Second, this algorithm automatically selects a small subset of statistically useful features. Third, this algorithm is very fast, utilizes all of the available training data (without the need for cross-validation etc.), and requires no human hand tuning or intervention. Experimentally, the algorithm is more accurate than a state-of-the-art support vector machine (SVM) classifier, and substantially reduces the number of features that have to be computed.
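The distinction between candidate-level and patient-wise sensitivity can be made concrete with a small evaluation sketch. This shows only how the two metrics differ on the same predictions; it is not the training objective from the paper. The record fields (`patient`, `label`, `predicted`) and the toy data are assumptions for illustration.

```python
def candidate_sensitivity(candidates):
    """Fraction of truly positive candidates that were flagged."""
    positives = [c for c in candidates if c["label"] == 1]
    if not positives:
        raise ValueError("no positive candidates")
    detected = [c for c in positives if c["predicted"] == 1]
    return len(detected) / len(positives)


def patient_sensitivity(candidates):
    """Fraction of diseased patients with at least one positive candidate flagged."""
    diseased, found = set(), set()
    for c in candidates:
        if c["label"] == 1:
            diseased.add(c["patient"])
            if c["predicted"] == 1:
                found.add(c["patient"])
    if not diseased:
        raise ValueError("no diseased patients")
    return len(found) / len(diseased)


# Toy data: patient A has three positive candidates (one flagged),
# patient B has one positive candidate (flagged).
candidates = [
    {"patient": "A", "label": 1, "predicted": 1},
    {"patient": "A", "label": 1, "predicted": 0},
    {"patient": "A", "label": 1, "predicted": 0},
    {"patient": "A", "label": 0, "predicted": 0},
    {"patient": "B", "label": 1, "predicted": 1},
    {"patient": "B", "label": 0, "predicted": 0},
]

print(candidate_sensitivity(candidates), patient_sensitivity(candidates))
```

On this toy data half of the positive candidates are missed, yet every diseased patient is still caught, which is the clinically relevant outcome that a patient-wise objective targets.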

Knowledge Discovery and Data Mining | 2007

LungCAD: a clinically approved, machine learning system for lung cancer detection

R. Bharat Rao; Jinbo Bi; Glenn Fung; Marcos Salganicoff; Nancy A. Obuchowski; David P. Naidich

We present LungCAD, a computer aided diagnosis (CAD) system that employs a classification algorithm for detecting solid pulmonary nodules from CT thorax studies. We briefly describe some of the machine learning techniques developed to overcome the real-world challenges in this medical domain. The most significant hurdle in transitioning from a machine learning research prototype that performs well on an in-house dataset into a clinically deployable system is the requirement that the CAD system be tested in a clinical trial. We describe the clinical trial in which LungCAD was tested: a large-scale multi-reader, multi-case (MRMC) retrospective observational study to evaluate the effect of CAD in clinical practice for detecting solid pulmonary nodules from CT thorax studies. The clinical trial demonstrates that every radiologist who participated in the trial had a significantly greater accuracy with LungCAD, both for detecting nodules and identifying potentially actionable nodules; this, along with other findings from the trial, resulted in FDA approval for LungCAD in late 2006.

European Conference on Machine Learning | 2006

Batch classification with applications in computer aided diagnosis

Volkan Vural; Glenn Fung; Balaji Krishnapuram; Jennifer G. Dy; R. Bharat Rao

Most classification methods assume that the samples are drawn independently and identically from an unknown data generating distribution, yet this assumption is violated in several real life problems. In order to relax this assumption, we consider the case where batches or groups of samples may have internal correlations, whereas the samples from different batches may be considered to be uncorrelated. Two algorithms are developed to classify all the samples in a batch jointly, one based on a probabilistic analysis and another based on a mathematical programming approach. Experiments on three real-life computer aided diagnosis (CAD) problems demonstrate that the proposed algorithms are significantly more accurate than a naive SVM which ignores the correlations among the samples.

SIGKDD Explorations | 2008

KDD Cup 2008 and the workshop on mining medical data

R. Bharat Rao; Oksana Yakhnenko; Balaji Krishnapuram

In this report we summarize the KDD Cup 2008 task, which addressed the problem of early breast cancer detection. We describe the data, the challenges, and the results, and summarize the algorithms used by the winning teams. We also summarize the workshop on Mining Medical Data, held in conjunction with SIGKDD on August 24, 2008 in Las Vegas, NV, which brought together researchers working on various aspects of applying machine learning and data mining to challenging tasks in medical and health care domains.
