Chandan K. Reddy | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Chandan K. Reddy is active.

Explore More

Publication

Featured researches published by Chandan K. Reddy.

Archive | 2013

Data Clustering: Algorithms and Applications

Charu C. Aggarwal; Chandan K. Reddy

Research on the problem of clustering tends to be fragmented across the pattern recognition, database, data mining, and machine learning communities. Addressing this problem in a unified way, Data Clustering: Algorithms and Applications provides complete coverage of the entire area of clustering, from basic methods to more refined and complex data clustering approaches. It pays special attention to recent issues in graphs, social networks, and other domains. The book focuses on three primary aspects of data clustering: Methods, describing key techniques commonly used for clustering, such as feature selection, agglomerative clustering, partitional clustering, density-based clustering, probabilistic clustering, grid-based clustering, spectral clustering, and nonnegative matrix factorization Domains, covering methods used for different domains of data, such as categorical data, text data, multimedia data, graph data, biological data, stream data, uncertain data, time series clustering, high-dimensional clustering, and big data Variations and Insights, discussing important variations of the clustering process, such as semisupervised clustering, interactive clustering, multiview clustering, cluster ensembles, and cluster validation In this book, top researchers from around the world explore the characteristics of clustering problems in a variety of application areas. They also explain how to glean detailed insight from the clustering processincluding how to verify the quality of the underlying clustersthrough supervision, human intervention, or the automated generation of alternative clusters.

Journal of Big Data | 2015

A survey on platforms for big data analytics

Dilpreet Singh; Chandan K. Reddy

The primary purpose of this paper is to provide an in-depth analysis of different platforms available for performing big data analytics. This paper surveys different hardware platforms available for big data analytics and assesses the advantages and drawbacks of each of these platforms based on various metrics such as scalability, data I/O rate, fault tolerance, real-time processing, data size supported and iterative task support. In addition to the hardware, a detailed description of the software frameworks used within each of these platforms is also discussed along with their strengths and drawbacks. Some of the critical characteristics described here can potentially aid the readers in making an informed decision about the right choice of platforms depending on their computational needs. Using a star ratings table, a rigorous qualitative comparison between different platforms is also discussed for each of the six characteristics that are critical for the algorithms of big data analytics. In order to provide more insights into the effectiveness of each of the platform in the context of big data analytics, specific implementation level details of the widely used k-means clustering algorithm on various platforms are also described in the form pseudocode.

IEEE Transactions on Visualization and Computer Graphics | 2013

UTOPIAN: User-Driven Topic Modeling Based on Interactive Nonnegative Matrix Factorization

Jaegul Choo; Changhyun Lee; Chandan K. Reddy; Haesun Park

Topic modeling has been widely used for analyzing text document collections. Recently, there have been significant advancements in various topic modeling techniques, particularly in the form of probabilistic graphical modeling. State-of-the-art techniques such as Latent Dirichlet Allocation (LDA) have been successfully applied in visual text analytics. However, most of the widely-used methods based on probabilistic modeling have drawbacks in terms of consistency from multiple runs and empirical convergence. Furthermore, due to the complicatedness in the formulation and the algorithm, LDA cannot easily incorporate various types of user feedback. To tackle this problem, we propose a reliable and flexible visual analytics system for topic modeling called UTOPIAN (User-driven Topic modeling based on Interactive Nonnegative Matrix Factorization). Centered around its semi-supervised formulation, UTOPIAN enables users to interact with the topic modeling method and steer the result in a user-driven manner. We demonstrate the capability of UTOPIAN via several usage scenarios with real-world document corpuses such as InfoVis/VAST paper data set and product review data sets.

IEEE Transactions on Knowledge and Data Engineering | 2012

Scalable and Parallel Boosting with MapReduce

Indranil Palit; Chandan K. Reddy

In this era of data abundance, it has become critical to process large volumes of data at much faster rates than ever before. Boosting is a powerful predictive model that has been successfully used in many real-world applications. However, due to the inherent sequential nature, achieving scalability for boosting is nontrivial and demands the development of new parallelized versions which will allow them to efficiently handle large-scale data. In this paper, we propose two parallel boosting algorithms, AdaBoost.PL and LogitBoost.PL, which facilitate simultaneous participation of multiple computing nodes to construct a boosted ensemble classifier. The proposed algorithms are competitive to the corresponding serial versions in terms of the generalization performance. We achieve a significant speedup since our approach does not require individual computing nodes to communicate with each other for sharing their data. In addition, the proposed approach also allows for preserving privacy of computations in distributed environments. We used MapReduce framework to implement our algorithms and demonstrated the performance in terms of classification accuracy, speedup and scaleup using a wide variety of synthetic and real-world data sets.

knowledge discovery and data mining | 2013

Big data analytics for healthcare

Jimeng Sun; Chandan K. Reddy

Large amounts of heterogeneous medical data have become available in various healthcare organizations (payers, providers, pharmaceuticals). Those data could be an enabling resource for deriving insights for improving care delivery and reducing waste. The enormity and complexity of these datasets present great challenges in analyses and subsequent applications to a practical clinical environment. In this tutorial, we introduce the characteristics and related mining challenges on dealing with big medical data. Many of those insights come from medical informatics community, which is highly related to data mining but focuses on biomedical specifics. We survey various related papers from data mining venues as well as medical informatics venues to share with the audiences key problems and trends in healthcare analytics research, with different applications ranging from clinical text mining, predictive modeling, survival analysis, patient similarity, genetic data analysis, and public health. The tutorial will include several case studies dealing with some of the important healthcare applications.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2008

TRUST-TECH-Based Expectation Maximization for Learning Finite Mixture Models

Chandan K. Reddy; Hsiao-Dong Chiang; Bala Rajaratnam

The expectation maximization (EM) algorithm is widely used for learning finite mixture models despite its greedy nature. Most popular model-based clustering techniques might yield poor clusters if the parameters are not initialized properly. To reduce the sensitivity of initial points, a novel algorithm for learning mixture models from multivariate data is introduced in this paper. The proposed algorithm takes advantage of TRUST-TECH (TRansformation Under STability-reTaining Equilibria CHaracterization) to compute neighborhood local maxima on the likelihood surface using stability regions. Basically, our method coalesces the advantages of the traditional EM with that of the dynamic and geometric characteristics of the stability regions of the corresponding nonlinear dynamical system of the log-likelihood function. Two phases, namely, the EM phase and the stability region phase, are repeated alternatively in the parameter space to achieve local maxima with improved likelihood values. The EM phase obtains the local maximum of the likelihood function and the stability region phase helps to escape out of the local maximum by moving toward the neighboring stability regions. Though applied to Gaussian mixtures in this paper, our technique can be easily generalized to any other parametric finite mixture model. The algorithm has been tested on both synthetic and real data sets and the improvements in the performance compared to other approaches are demonstrated. The robustness with respect to initialization is also illustrated experimentally.

Health Care Management Science | 2011

A probabilistic model for predicting the probability of no-show in hospital appointments

Adel Alaeddini; Kai Yang; Chandan K. Reddy; Susan Yu

The number of no-shows has a significant impact on the revenue, cost and resource utilization for almost all healthcare systems. In this study we develop a hybrid probabilistic model based on logistic regression and empirical Bayesian inference to predict the probability of no-shows in real time using both general patient social and demographic information and individual clinical appointments attendance records. The model also considers the effect of appointment date and clinic type. The effectiveness of the proposed approach is validated based on a patient dataset from a VA medical center. Such an accurate prediction model can be used to enable a precise selective overbooking strategy to reduce the negative effect of no-shows and to fill appointment slots while maintaining short wait times.

Microbial Ecology | 2010

CMEIAS color segmentation: an improved computing technology to process color images for quantitative microbial ecology studies at single-cell resolution.

Colin Gross; Chandan K. Reddy; Frank B. Dazzo

Quantitative microscopy and digital image analysis are underutilized in microbial ecology largely because of the laborious task to segment foreground object pixels from background, especially in complex color micrographs of environmental samples. In this paper, we describe an improved computing technology developed to alleviate this limitation. The system’s uniqueness is its ability to edit digital images accurately when presented with the difficult yet commonplace challenge of removing background pixels whose three-dimensional color space overlaps the range that defines foreground objects. Image segmentation is accomplished by utilizing algorithms that address color and spatial relationships of user-selected foreground object pixels. Performance of the color segmentation algorithm evaluated on 26 complex micrographs at single pixel resolution had an overall pixel classification accuracy of 99+%. Several applications illustrate how this improved computing technology can successfully resolve numerous challenges of complex color segmentation in order to produce images from which quantitative information can be accurately extracted, thereby gain new perspectives on the in situ ecology of microorganisms. Examples include improvements in the quantitative analysis of (1) microbial abundance and phylotype diversity of single cells classified by their discriminating color within heterogeneous communities, (2) cell viability, (3) spatial relationships and intensity of bacterial gene expression involved in cellular communication between individual cells within rhizoplane biofilms, and (4) biofilm ecophysiology based on ribotype-differentiated radioactive substrate utilization. The stand-alone executable file plus user manual and tutorial images for this color segmentation computing application are freely available at http://cme.msu.edu/cmeias/. This improved computing technology opens new opportunities of imaging applications where discriminating colors really matter most, thereby strengthening quantitative microscopy-based approaches to advance microbial ecology in situ at individual single-cell resolution.

bioinformatics and biomedicine | 2008

Boosting Methods for Protein Fold Recognition: An Empirical Comparison

Yazhene Krishnaraj; Chandan K. Reddy

Protein fold recognition is the prediction of proteins tertiary structure (Fold) given the proteins sequence without relying on sequence similarity. Using machine learning techniques for protein fold recognition, most of the state-of-the-art research has focused on more traditional algorithms such as support vector machines (SVM), k-nearest neighbor (KNN) and neural networks (NN). In this paper, we present an empirical study of two variants of boosting algorithms - AdaBoost and LogitBoost for the problem of fold recognition. Prediction accuracy is measured on a dataset with proteins from 27 most populated folds from the SCOP database, and is compared with results from other literature using SVM, KNN and NN algorithms on the same dataset. Overall, boosting methods achieve 60% fold recognition accuracy on an independent test protein dataset which is the highest prediction achieved when compared with the accuracy values obtained with other methods proposed in the literature. Boosting algorithms have the potential to build efficient classification models in a very fast manner.

international conference on data mining | 2013

Cox Regression with Correlation Based Regularization for Electronic Health Records

Bhanukiran Vinzamuri; Chandan K. Reddy

Survival Regression models play a vital role in analyzing time-to-event data in many practical applications ranging from engineering to economics to healthcare. These models are ideal for prediction in complex data problems where the response is a time-to-event variable. An event is defined as the occurrence of a specific event of interest such as a chronic health condition. Cox regression is one of the most popular survival regression model used in such applications. However, these models have the tendency to over fit the data which is not desirable for healthcare applications because it limits their generalization to other hospital scenarios. In this paper, we address these challenges for the cox regression model. We combine two unique correlation based regularizers with cox regression to handle correlated and grouped features which are commonly seen in many practical problems. The proposed optimization problems are solved efficiently using cyclic coordinate descent and Alternate Direction Method of Multipliers algorithms. We conduct experimental analysis on the performance of these algorithms over several synthetic datasets and electronic health records (EHR) data about heart failure diagnosed patients from a hospital. We demonstrate through our experiments that these regularizers effectively enhance the ability of cox regression to handle correlated features. In addition, we extensively compare our results with other regularized linear and logistic regression algorithms. We validate the goodness of the features selected by these regularized cox regression models using the biomedical literature and different feature selection algorithms.

Explore More