Subhash C. Bagui | Researchain

Publication

Featured research published by Subhash C. Bagui.

Pattern Recognition | 2003

Breast cancer detection using rank nearest neighbor classification rules

Subhash C. Bagui; Sikha Bagui; Kuhu Pal; Nikhil R. Pal

In this article, we propose a new generalization of the rank nearest neighbor (RNN) rule for multivariate data for the diagnosis of breast cancer. We study the performance of this rule using two well-known databases and compare the results with the conventional k-NN rule. We observe that this rule performed remarkably well, and that the computational complexity of the proposed k-RNN rule is much lower than that of the conventional k-NN rule.

Pattern Recognition Letters | 1995

A multistage generalization of the rank nearest neighbor classification rule

Subhash C. Bagui; Nikhil R. Pal

We consider the problem of classifying an unknown observation from one of s (⩾ 2) univariate classes (or populations) using a multi-stage left and right rank nearest neighbor (RNN) rule. We derive the asymptotic error rate (i.e., total probability of misclassification (TPMC)) of the m-stage univariate RNN (m-URNN) rule, and show that as the number of stages increases, the limiting TPMC of the m-stage univariate rule decreases. Monte Carlo simulations are used to study the behavior of the m-URNN rule and compare it with the conventional k-NN rule. Finally, we incorporate an extension of the m-URNN rule to multivariate observations with empirical results.

International Journal of Data Analysis Techniques and Strategies | 2009

Deriving strong association mining rules using a dependency criterion, the lift measure

Sikha Bagui; Jiri Just; Subhash C. Bagui

Traditional association rule mining algorithms have two major drawbacks: first, they need to scan the dataset repeatedly, and second, they generate too many association rules. In this paper, we present a dependency-based association rule mining algorithm, implemented using an array list structure in Java, that requires no more than one scan of the full dataset and generates far fewer strong association rules. The additional dependency criterion used is the lift measure.
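As an illustration of the lift measure mentioned in this abstract, the following minimal Python sketch computes lift(A → B) = supp(A ∪ B) / (supp(A) · supp(B)) over a small list of transactions. It is not the authors' Java algorithm: the function names `support` and `lift` and the toy transactions are assumptions made only for this example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def lift(transactions, antecedent, consequent):
    """Lift of the rule antecedent -> consequent.

    A value above 1 suggests positive dependence between the two itemsets,
    a value of 1 independence, and a value below 1 negative dependence.
    """
    s_a = support(transactions, antecedent)
    s_c = support(transactions, consequent)
    if s_a == 0 or s_c == 0:
        raise ValueError("lift is undefined when either side has zero support")
    s_both = support(transactions, set(antecedent) | set(consequent))
    return s_both / (s_a * s_c)


# Hypothetical toy data, not taken from the paper.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

print(lift(transactions, {"bread"}, {"butter"}))  # above 1: positively dependent
print(lift(transactions, {"bread"}, {"milk"}))    # below 1: negatively dependent
```

Filtering candidate rules by a lift threshold in addition to support and confidence is one way such a dependency criterion can reduce the number of reported rules.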

Journal of Statistical Planning and Inference | 1999

Classification of multiple observations using multi-stage rank nearest neighbor rule

Subhash C. Bagui; K. L. Mehra

In this article, a multi-stage (M-stage) rank nearest-neighbor (MRNN)-type rule is proposed and studied for the classification of a sample of multiple (m) independent univariate observations between two populations. The asymptotic total probability of misclassification (TPMC) – viz., the asymptotic risk R(M)(m) – for the proposed MRNN rule is derived. It is shown firstly that (i) the asymptotic risk R(1)(2) of the 1st stage RNN rule for m=2 is lower than the corresponding risk R(1)(1) for m=1, by a factor less than one, and secondly that (ii) for m=2, the M-stage rule asymptotic risk R(M)(2) decreases as the number M of the stages employed increases. The former result leads to an improved upper bound on R(1)(2) in terms of Bayes risk R*(1) (cf. Cover and Hart (1967) IEEE Trans. Inform. Theory, Das Gupta and Lin (1980) Sankhyā A). Also, a cross-validation-type estimator for the asymptotic risk R(1)(m) is shown to be asymptotically unbiased and L2-consistent. Finally, some comparative Monte-Carlo results are reported to illuminate the performance characteristics of the proposed rule in small sample situations.

Communications in Statistics - Theory and Methods | 2002

Statistical tests for nested designs under partial balance

Dulal K. Bhaumik; Ravindra Khattree; Subhash C. Bagui

In this article, we derive locally optimum tests and Rao's score tests for a two-stage nested design under a certain partial balance. Specifically, it is assumed that for any given level of the leading factor, the numbers of observations at every error stage below it are equal. Formulas are provided so as to efficiently compute the test statistics.

Pattern Recognition Letters | 1993

Classification using first-stage rank nearest neighbor rule for multiple classes

Subhash C. Bagui

In this article, the first-stage rank nearest neighbor (RNN) rule is used to classify an unknown observation into one of s (⩾ 2) populations (or classes). We derive the asymptotic risk (i.e., the total probability of misclassification (TPMC)) of this rule, which turns out to be exactly the same as the limiting risk of the 1-NN rule of Cover and Hart (1967) for s classes. The proposed estimate of the limiting TPMC of the first-stage RNN rule is shown to be asymptotically unbiased and consistent. Finally, Monte Carlo results are reported to assess the performance of the first-stage RNN rule in comparison with the 1-NN rule in a small sample situation.
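To make the idea of a first-stage rank nearest neighbor rule concrete, here is a minimal Python sketch of one common univariate formulation: pool the labeled observations with the unknown value, order them, and look at the immediate left and right rank neighbors of the unknown value. The tie-breaking by a fair coin flip, the handling of the extremes, and the function name `first_stage_rnn` are assumptions of this sketch and may differ in detail from the rule analysed in the paper.

```python
import bisect
import random


def first_stage_rnn(train, z, rng=random):
    """Classify z from its immediate left and right rank neighbors.

    train: list of (value, label) pairs with distinct values
           (continuous data are assumed, so ties are ignored here).
    z:     the unknown univariate observation.
    """
    if not train:
        raise ValueError("training sample must not be empty")
    ordered = sorted(train, key=lambda pair: pair[0])
    values = [v for v, _ in ordered]
    pos = bisect.bisect_left(values, z)  # rank position of z in the pooled order

    left = ordered[pos - 1][1] if pos > 0 else None
    right = ordered[pos][1] if pos < len(ordered) else None

    if left is None:       # z is the smallest pooled value
        return right
    if right is None:      # z is the largest pooled value
        return left
    if left == right:      # both neighbors agree
        return left
    return rng.choice([left, right])  # neighbors disagree: fair coin flip


# Hypothetical toy sample with two classes.
train = [(0.1, "A"), (0.4, "A"), (0.9, "B"), (1.5, "B")]
print(first_stage_rnn(train, 0.2))  # neighbors are both "A"
print(first_stage_rnn(train, 1.2))  # neighbors are both "B"
```

Because only the ordering of the pooled sample is used, the decision depends on ranks rather than on distances, which is the feature that distinguishes this kind of rule from the 1-NN rule it is compared against.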

The American Statistician | 2013

A Few Counter Examples Useful in Teaching Central Limit Theorems

Subhash C. Bagui; Dulal K. Bhaumik; K. L. Mehra

In probability theory, central limit theorems (CLTs), broadly speaking, state that the distribution of the sum of a sequence of random variables (r.v.s), suitably normalized, converges to a normal distribution as their number n increases indefinitely. However, the preceding convergence in distribution holds only under certain conditions, depending on the underlying probabilistic nature of this sequence of r.v.s. If some of the assumed conditions are violated, the convergence may or may not hold, or if it does, this convergence may be to a nonnormal distribution. We shall illustrate this via a few counter examples. While teaching CLTs at an advanced level, counter examples can serve as useful tools for explaining the true nature of these CLTs and the consequences when some of the assumptions made are violated.
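A classical illustration of this kind of failure, given here as a standard textbook example and not necessarily one of those used in the article, is a sequence of i.i.d. standard Cauchy random variables, for which the mean does not exist. Using the characteristic function of the standard Cauchy distribution, the sample mean has the same distribution as a single observation for every n:

```latex
\[
\varphi_{X_1}(t) = e^{-|t|},
\qquad
\varphi_{\bar X_n}(t)
  = \bigl[\varphi_{X_1}(t/n)\bigr]^{n}
  = \bigl(e^{-|t|/n}\bigr)^{n}
  = e^{-|t|} .
\]
\[
\varphi_{S_n/\sqrt{n}}(t)
  = \bigl[\varphi_{X_1}(t/\sqrt{n})\bigr]^{n}
  = e^{-\sqrt{n}\,|t|}
  \;\longrightarrow\; 0
  \quad \text{for every } t \neq 0 .
\]
```

Here S_n denotes the sum X_1 + ... + X_n. The first line shows that averaging does not concentrate the distribution at all; the second shows that the usual √n normalization cannot yield a normal limit, since the pointwise limit of the characteristic functions is not continuous at t = 0 and so is not the characteristic function of any distribution.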

Calcutta Statistical Association Bulletin | 1993

An NN Classification Rule for Multiple Observations

Subhash C. Bagui

We consider the problem of classifying multiple (m) observations into one of two populations using a nearest neighbor (NN) type rule. We derive the limiting risk R(m) of the proposed NN rule. For m = 2, we obtain an improved upper bound for R(m) and show that R(m) ⩽ R(m-1) for m = 2, 3. AMS (1980) Subject classification: Primary 62H30; Secondary 62F15.

Computational Biology and Chemistry | 2017

Improving virtual screening predictive accuracy of Human kallikrein 5 inhibitors using machine learning models

Xingang Fang; Sikha Bagui; Subhash C. Bagui

The readily available high throughput screening (HTS) data from the PubChem database provide an opportunity for mining small molecules in a variety of biological systems using machine learning techniques. Given the thousands of available molecular descriptors developed to encode useful chemical information representing the characteristics of molecules, descriptor selection is an essential step in building an optimal quantitative structure-activity relationship (QSAR) model. To develop a systematic descriptor selection strategy, we need to understand the relationship among: (i) the descriptor selection; (ii) the choice of the machine learning model; and (iii) the characteristics of the target bio-molecule. In this work, we employed the Signature descriptor to generate a dataset on the Human kallikrein 5 (hK 5) inhibition confirmatory assay data and compared multiple classification models, including logistic regression, support vector machine, random forest and k-nearest neighbor. Under optimal conditions, the logistic regression model provided extremely high overall accuracy (98%) and precision (90%), with good sensitivity (65%) in the cross validation test. In testing the primary HTS screening data with more than 200K molecular structures, the logistic regression model exhibited the capability of eliminating more than 99.9% of the inactive structures. As part of our exploration of the descriptor-model-target relationship, the excellent predictive performance of the combination of the Signature descriptor and the logistic regression model on the assay data of the Human kallikrein 5 (hK 5) target suggested a feasible descriptor/model selection strategy for similar targets.
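The three metrics quoted in this abstract can be read off a binary confusion matrix. The short Python sketch below shows the standard definitions; the counts are hypothetical and chosen only to show how a very high accuracy can coexist with a moderate sensitivity when inactive compounds dominate the data. They are not the paper's results, and `screening_metrics` is a name invented for this example.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return accuracy, precision and sensitivity from confusion-matrix counts.

    tp: actives predicted active      fp: inactives predicted active
    tn: inactives predicted inactive  fn: actives predicted inactive
    """
    total = tp + fp + tn + fn
    if total == 0 or tp + fp == 0 or tp + fn == 0:
        raise ValueError("metrics are undefined for these counts")
    return {
        "accuracy": (tp + tn) / total,     # share of all predictions that are right
        "precision": tp / (tp + fp),       # share of predicted actives that are active
        "sensitivity": tp / (tp + fn),     # share of true actives that are found
    }


# Hypothetical counts for an imbalanced screen: 20 actives among 1000 compounds.
metrics = screening_metrics(tp=13, fp=2, tn=978, fn=7)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With so few actives, accuracy is driven almost entirely by the correctly rejected inactives, which is why precision and sensitivity are reported alongside it.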

International Journal of Sustainable Society | 2012

Designing a relational database for tracking and analysis of atmospheric deposition of mercury and trace metals in the Pensacola (Florida) Bay Watershed

Sikha Bagui; Jessie Brown; Jane M. Caffrey; Subhash C. Bagui

The need to track and analyse the atmospheric deposition of mercury and trace metals in the Pensacola (Florida) Bay Watershed in recent years has resulted in the need for a data management system that will allow data to be efficiently stored, checked for errors, manipulated, retrieved for analysis and shared within the research community. In this paper, we describe a relational database that was developed as a data management tool to address the needs of those maintaining and using atmospheric deposition of mercury and trace metal data in the Pensacola Bay Watershed area. We present the overall design of the database and show useful queries that can be used to clean and maintain the integrity of the data, perform calculations on the data, join and union tables and retrieve the data for presentation and analysis.
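The kinds of queries this abstract mentions (integrity checks, calculations, joins) can be sketched with Python's built-in `sqlite3` module. The schema below, with tables `site` and `sample` and their columns, is entirely hypothetical and is not the database design described in the paper; the data values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical two-table schema: collection sites and measured samples.
conn.executescript("""
CREATE TABLE site (
    site_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE sample (
    sample_id     INTEGER PRIMARY KEY,
    site_id       INTEGER NOT NULL REFERENCES site(site_id),
    analyte       TEXT NOT NULL,
    concentration REAL
);
""")
conn.executemany("INSERT INTO site VALUES (?, ?)",
                 [(1, "Site A"), (2, "Site B")])
conn.executemany("INSERT INTO sample VALUES (?, ?, ?, ?)", [
    (1, 1, "Hg", 10.0),
    (2, 1, "Hg", 14.0),
    (3, 2, "Hg", -1.0),   # physically impossible value, should be flagged
    (4, 2, "Hg", 8.0),
])

# Integrity check: flag samples with negative concentrations.
suspect = conn.execute(
    "SELECT sample_id FROM sample WHERE concentration < 0"
).fetchall()

# Join and calculation: mean concentration per site, excluding flagged rows.
means = conn.execute("""
    SELECT s.name, AVG(m.concentration)
    FROM sample AS m
    JOIN site AS s ON s.site_id = m.site_id
    WHERE m.concentration >= 0
    GROUP BY s.name
    ORDER BY s.name
""").fetchall()

print(suspect)
print(means)
```

Keeping site metadata in its own table and referencing it by key is the usual relational way to avoid repeating, and possibly misspelling, site details in every measurement row.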
