Subhash C. Bagui
University of West Florida
Publication
Featured research published by Subhash C. Bagui.
Pattern Recognition | 2003
Subhash C. Bagui; Sikha Bagui; Kuhu Pal; Nikhil R. Pal
In this article, we propose a new generalization of the rank nearest neighbor (RNN) rule for multivariate data for the diagnosis of breast cancer. We study the performance of this rule on two well-known databases and compare the results with those of the conventional k-NN rule. We observe that the rule performs remarkably well, and that the computational complexity of the proposed k-RNN rule is much lower than that of the conventional k-NN rule.
Pattern Recognition Letters | 1995
Subhash C. Bagui; Nikhil R. Pal
We consider the problem of classifying an unknown observation from one of s (⩾ 2) univariate classes (or populations) using a multi-stage left and right rank nearest neighbor (RNN) rule. We derive the asymptotic error rate (i.e., the total probability of misclassification (TPMC)) of the m-stage univariate RNN (m-URNN) rule, and show that as the number of stages increases, the limiting TPMC of the m-stage univariate rule decreases. Monte Carlo simulations are used to study the behavior of the m-URNN rule and to compare it with the conventional k-NN rule. Finally, we extend the m-URNN rule to multivariate observations and report empirical results.
International Journal of Data Analysis Techniques and Strategies | 2009
Sikha Bagui; Jiri Just; Subhash C. Bagui
Traditional association rule mining algorithms have two major drawbacks: first, they need to scan the dataset repeatedly, and second, they generate too many association rules. In this paper, we present a dependency-based association rule mining algorithm, implemented using an array list structure in Java, that requires no more than one scan of the full dataset and generates far fewer strong association rules. The additional dependency criterion used is the lift measure.
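As a small illustration of the lift measure mentioned above, the sketch below computes lift for a single rule from a toy list of transactions. It uses only the standard definition, lift(A -> C) = support(A and C) / (support(A) * support(C)); it is not the paper's single-scan algorithm, and the function name and the sample transactions are hypothetical.

```python
def lift(transactions, antecedent, consequent):
    """Lift of the rule antecedent -> consequent over a list of item sets.

    A value above 1 suggests positive dependence between the two item sets,
    a value near 1 suggests independence.
    """
    if not transactions:
        raise ValueError("transactions must not be empty")
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    # Support = fraction of transactions containing the item set.
    support_a = sum(a <= t for t in transactions) / n
    support_c = sum(c <= t for t in transactions) / n
    support_ac = sum((a | c) <= t for t in transactions) / n
    if support_a == 0 or support_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    return support_ac / (support_a * support_c)


# Hypothetical toy data, not taken from the paper.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
bread_milk_lift = lift(transactions, {"bread"}, {"milk"})
```

Here bread and milk each appear in 3 of 4 transactions and together in 2 of 4, so the lift is 0.5 / (0.75 * 0.75), slightly below 1.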
Journal of Statistical Planning and Inference | 1999
Subhash C. Bagui; K. L. Mehra
In this article, a multi-stage (M-stage) rank nearest-neighbor (MRNN)-type rule is proposed and studied for the classification of a sample of multiple (m) independent univariate observations between two populations. The asymptotic total probability of misclassification (TPMC), viz. the asymptotic risk R(M)(m), is derived for the proposed MRNN rule. It is shown firstly that (i) the asymptotic risk R(1)(2) of the first-stage RNN rule for m=2 is lower than the corresponding risk R(1)(1) for m=1, by a factor less than one, and secondly that (ii) for m=2, the M-stage rule asymptotic risk R(M)(2) decreases as the number M of stages employed increases. The former result leads to an improved upper bound on R(1)(2) in terms of the Bayes risk R*(1) (cf. Cover and Hart (1967), IEEE Trans. Inform. Theory; Das Gupta and Lin (1980), Sankhyā A). Also, a cross-validation-type estimator for the asymptotic risk R(1)(m) is shown to be asymptotically unbiased and L2-consistent. Finally, some comparative Monte Carlo results are reported to illustrate the performance characteristics of the proposed rule in small-sample situations.
Communications in Statistics - Theory and Methods | 2002
Dulal K. Bhaumik; Ravindra Khattree; Subhash C. Bagui
In this article, we derive locally optimum tests and Rao's score tests for a two-stage nested design under a certain partial balance. Specifically, it is assumed that for any given level of the leading factor, the number of observations at every error stage below it is the same. Formulas are provided so that the test statistics can be computed efficiently.
Pattern Recognition Letters | 1993
Subhash C. Bagui
In this article, the first-stage rank nearest neighbor (RNN) rule is used to classify an unknown observation into one of s (⩾ 2) populations (or classes). We derive the asymptotic risk (i.e., the total probability of misclassification (TPMC)) of this rule, which turns out to be exactly the same as the limiting risk of the 1-NN rule of Cover and Hart (1967) for s classes. The proposed estimate of the limiting TPMC of the first-stage RNN rule is shown to be asymptotically unbiased and consistent. Finally, Monte Carlo results are reported to assess the performance of the first-stage RNN rule in comparison with the 1-NN rule in a small-sample situation.
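The abstract does not spell out the rule itself, so the following is only a minimal sketch under an assumption: that the first-stage univariate RNN rule ranks the unknown observation within the pooled training sample and assigns it the class of its immediate left or right rank neighbor, choosing between the two with probability 1/2 when they disagree. The function name, the data, and the tie handling are hypothetical.

```python
import bisect
import random


def first_stage_rnn(train, z, rng=random):
    """Classify z by its immediate rank neighbors in a univariate sample.

    train: list of (value, label) pairs from all classes, pooled.
    Assumes continuous data, so z does not tie with any training value.
    """
    if not train:
        raise ValueError("training sample must not be empty")
    ordered = sorted(train, key=lambda pair: pair[0])
    values = [value for value, _ in ordered]
    pos = bisect.bisect_left(values, z)  # rank position of z in the pooled sample
    left = ordered[pos - 1][1] if pos > 0 else None
    right = ordered[pos][1] if pos < len(ordered) else None
    if left is None:       # z is the smallest value: only a right neighbor exists
        return right
    if right is None:      # z is the largest value: only a left neighbor exists
        return left
    if left == right:      # both neighbors agree
        return left
    return rng.choice([left, right])  # neighbors disagree: coin flip


# Hypothetical toy sample with two classes.
train = [(1.0, "A"), (2.0, "A"), (5.0, "B"), (6.0, "B")]
label_inside_a = first_stage_rnn(train, 1.5)
label_between = first_stage_rnn(train, 3.0)
```

Because only ranks are used, the decision depends on the ordering of the pooled sample and not on the actual distances between values.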
The American Statistician | 2013
Subhash C. Bagui; Dulal K. Bhaumik; K. L. Mehra
In probability theory, central limit theorems (CLTs), broadly speaking, state that the distribution of the sum of a sequence of random variables (r.v.s), suitably normalized, converges to a normal distribution as their number n increases indefinitely. However, this convergence in distribution holds only under certain conditions, depending on the underlying probabilistic nature of the sequence of r.v.s. If some of the assumed conditions are violated, the convergence may or may not hold, or, if it does, it may be to a nonnormal distribution. We illustrate this via a few counterexamples. When teaching CLTs at an advanced level, counterexamples can serve as useful tools for explaining the true nature of these CLTs and the consequences when some of the assumptions are violated.
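One standard textbook counterexample, chosen here for illustration and not necessarily one used in the article, is the Cauchy distribution: it has no finite mean or variance, and the mean of n i.i.d. standard Cauchy variables is again standard Cauchy. The sketch below checks this by simulation. The function names, sample sizes, and seed are arbitrary assumptions.

```python
import math
import random
import statistics


def cauchy_sample_mean(n, rng):
    """Mean of n standard Cauchy draws, generated by the inverse-CDF method."""
    return sum(math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)) / n


def iqr_of_sample_means(n, reps=2000, seed=0):
    """Interquartile range of `reps` simulated sample means of size n."""
    rng = random.Random(seed)
    means = [cauchy_sample_mean(n, rng) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


# For a finite-variance distribution the spread of the sample mean shrinks
# like 1/sqrt(n). For the Cauchy it stays near 2, the IQR of a standard Cauchy.
iqr_n1 = iqr_of_sample_means(1)
iqr_n200 = iqr_of_sample_means(200)
```

Comparing `iqr_n1` and `iqr_n200` shows that averaging does not concentrate the distribution, so no normal limit arises under the usual normalization.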
Calcutta Statistical Association Bulletin | 1993
Subhash C. Bagui
We consider the problem of classifying multiple (m) observations into one of two populations using a nearest neighbor (NN) type rule. We derive the limiting risk R(m) of the proposed NN rule. For m = 2, we obtain an improved upper bound for R(m) and show that R(m) ⩽ R(m-1) for m = 2, 3. AMS (1980) Subject classification: Primary 62H30; Secondary 62F15.
Computational Biology and Chemistry | 2017
Xingang Fang; Sikha Bagui; Subhash C. Bagui
The readily available high throughput screening (HTS) data from the PubChem database provides an opportunity for mining small molecules in a variety of biological systems using machine learning techniques. Among the thousands of available molecular descriptors developed to encode useful chemical information representing the characteristics of molecules, descriptor selection is an essential step in building an optimal quantitative structure-activity relationship (QSAR) model. To develop a systematic descriptor selection strategy, we need to understand the relationship between: (i) the descriptor selection; (ii) the choice of the machine learning model; and (iii) the characteristics of the target bio-molecule. In this work, we employed the Signature descriptor to generate a dataset from the Human kallikrein 5 (hK 5) inhibition confirmatory assay data and compared multiple classification models, including logistic regression, support vector machine, random forest and k-nearest neighbor. Under optimal conditions, the logistic regression model provided extremely high overall accuracy (98%) and precision (90%), with good sensitivity (65%) in the cross-validation test. In testing the primary HTS screening data, with more than 200K molecular structures, the logistic regression model was able to eliminate more than 99.9% of the inactive structures. As part of our exploration of the descriptor-model-target relationship, the excellent predictive performance of the combination of the Signature descriptor and the logistic regression model on the assay data of the hK 5 target suggests a feasible descriptor/model selection strategy for similar targets.
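To make the three reported metrics concrete, the sketch below computes accuracy, precision, and sensitivity from a confusion matrix. The counts are entirely hypothetical and are not taken from the paper. They only illustrate how, on a screening set dominated by inactive compounds, overall accuracy can be very high while sensitivity is considerably lower.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, and sensitivity (recall) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0 or (tp + fp) == 0 or (tp + fn) == 0:
        raise ValueError("counts must include predicted and actual positives")
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are correct
        "precision": tp / (tp + fp),     # share of predicted actives that are active
        "sensitivity": tp / (tp + fn),   # share of true actives that are recovered
    }


# Hypothetical, heavily imbalanced example: 20 actives among 1000 compounds.
metrics = classification_metrics(tp=13, fp=2, tn=978, fn=7)
```

With these made-up counts, the many correctly rejected inactives dominate the accuracy, while sensitivity reflects only the 13 of 20 actives that were found.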
International Journal of Sustainable Society | 2012
Sikha Bagui; Jessie Brown; Jane M. Caffrey; Subhash C. Bagui
The need to track and analyse the atmospheric deposition of mercury and trace metals in the Pensacola (Florida) Bay Watershed in recent years has resulted in the need for a data management system that will allow data to be efficiently stored, checked for errors, manipulated, retrieved for analysis and shared within the research community. In this paper, we describe a relational database that was developed as a data management tool to address the needs of those maintaining and using atmospheric deposition of mercury and trace metal data in the Pensacola Bay Watershed area. We present the overall design of the database and show useful queries that can be used to clean and maintain the integrity of the data, perform calculations on the data, join and union tables and retrieve the data for presentation and analysis.