Publication


Featured research published by Subhash C. Basak.


Journal of Chemical Information and Computer Sciences | 2003

Assessing model fit by cross-validation.

Douglas M. Hawkins; Subhash C. Basak; Denise Mills

When QSAR models are fitted, it is important to validate any fitted model, that is, to check that it is plausible that its predictions will carry over to fresh data not used in the model-fitting exercise. There are two standard ways of doing this: using a separate hold-out test sample, or using the computationally much more burdensome leave-one-out cross-validation, in which the entire pool of available compounds is used both to fit the model and to assess its validity. We show by theoretical argument and empirical study of a large QSAR data set that when the available sample size is small (in the dozens or scores rather than the hundreds), holding a portion of it back for testing is wasteful, and that it is much better to use cross-validation, provided that it is done properly.
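The leave-one-out procedure described in this abstract can be illustrated with a minimal sketch. It assumes a single descriptor and an ordinary least-squares line, which is far simpler than a real QSAR model; the function names `fit_line` and `loo_q2` and the toy data are hypothetical and do not come from the paper.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def loo_q2(xs, ys):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(xs)):
        # Refit on every compound except i, then predict compound i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        press += (ys[i] - (a + b * xs[i])) ** 2
    my = mean(ys)
    ss_total = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / ss_total

# Hypothetical toy data: one descriptor value and one property per compound.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
q2 = loo_q2(descriptor, activity)
```

Every compound is predicted by a model that never saw it, so no data are set aside permanently. If descriptors are selected from a larger pool, that selection would also have to be repeated inside the loop for the estimate to stay honest.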


Discrete Applied Mathematics | 1988

Determining structural similarity of chemicals using graph-theoretic indices

Subhash C. Basak; Gerald J. Niemi

Ninety graph-theoretic indices were calculated for a diverse set of 3692 chemicals to test the efficacy of using graph-theoretic indices in determining the similarity of chemicals in a large, diverse database of structures. Principal component analysis was used to reduce the 90-dimensional space to a 10-dimensional subspace that explains 93% of the variance. Distance between chemicals in this 10-dimensional space was used to measure similarity. To test this approach, ten chemicals were chosen at random from the set of 3692 chemicals, and the five nearest neighbors of each of these ten target chemicals were determined. The results show that this measure of similarity reflects intuitive notions of chemical similarity.
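The neighbor-selection step can be sketched as follows. The sketch assumes the principal-component scores have already been computed and that the distance is Euclidean, which the abstract does not state explicitly. The chemical names and 2-D coordinates are hypothetical stand-ins for the 10-dimensional scores.

```python
import math

def nearest_neighbors(scores, target, k=5):
    """Return the names of the k chemicals closest to `target` in PC space."""
    t = scores[target]
    dists = [
        (math.dist(t, vec), name)
        for name, vec in scores.items()
        if name != target
    ]
    # Sort by distance (ties broken by name) and keep the k closest.
    return [name for _, name in sorted(dists)[:k]]

# Hypothetical PC scores; real use would have 10 coordinates per chemical.
scores = {
    "chem_A": (0.0, 0.0),
    "chem_B": (0.1, 0.0),
    "chem_C": (1.0, 1.0),
    "chem_D": (0.0, 0.3),
    "chem_E": (5.0, 5.0),
}
neighbors = nearest_neighbors(scores, "chem_A", k=2)
```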


Journal of Chemical Information and Computer Sciences | 2001

On the characterization of DNA primary sequences by triplet of nucleic acid bases

Milan Randić; Xiaofeng Guo; Subhash C. Basak

We consider the construction of a set of smaller 4 x 4 matrices to represent DNA primary sequences, based on the enumeration of all 64 triplets of nucleic acid bases. The leading eigenvalue of each constructed matrix has been selected as an invariant for the construction of a vector to characterize DNA. Additional invariants derived from the condensed matrices of DNA include a 64-component vector, the components of which correspond to the ordered triplets XYZ, with X, Y, Z = A, C, G, T. Construction of similarity/dissimilarity tables based on different invariants for a set of DNA sequences belonging to the first exon of the beta-globin gene of eight species illustrates the utility of the newly formulated invariants for DNA.
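A minimal sketch of the 64-component triplet vector follows. It assumes that triplets are counted over overlapping windows and that dissimilarity is the Euclidean distance between count vectors; the abstract specifies neither, and the 4 x 4 matrix construction is not reproduced here. The names `triplet_vector` and `dissimilarity` are hypothetical.

```python
import math
from itertools import product

BASES = "ACGT"
# All 64 ordered triplets XYZ in a fixed order: AAA, AAC, ..., TTT.
TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]

def triplet_vector(seq):
    """Count each ordered triplet over overlapping windows of the sequence."""
    counts = dict.fromkeys(TRIPLETS, 0)
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:  # skip windows containing symbols other than A, C, G, T
            counts[tri] += 1
    return [counts[t] for t in TRIPLETS]

def dissimilarity(seq_a, seq_b):
    """Euclidean distance between the triplet vectors of two sequences."""
    return math.dist(triplet_vector(seq_a), triplet_vector(seq_b))

v = triplet_vector("ATGGTG")  # windows: ATG, TGG, GGT, GTG
```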


Chemical Physics Letters | 2001

A novel 2-D graphical representation of DNA sequences of low degeneracy

Xiaofeng Guo; Milan Randić; Subhash C. Basak

Some 2-D and 3-D graphical representations of DNA sequences have been given by Nandy, by Leong and Morgenthaler, and by Randić et al., which give visual characterizations of DNA sequences. In this Letter, we introduce a novel graphical representation of DNA sequences by taking four special vectors in 2-D space to represent the four nucleic acid bases in DNA sequences, so that a DNA sequence is denoted on a plane by a successive vector sequence, which is also a directed walk on the plane. It is shown that the novel graphical representation of DNA sequences has lower degeneracy and less overlapping.
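The idea of a directed walk built from one vector per base can be sketched as below. The four step vectors in `STEP` are hypothetical values chosen only for illustration, since the abstract does not list the vectors used in the Letter. They are picked so that no two steps are exact opposites, so a pair such as AG does not lead the walk back to its starting point.

```python
# Hypothetical step vectors, one per base (not the values from the Letter).
STEP = {
    "A": (-1.0, 0.5),
    "G": (1.0, 0.5),
    "C": (0.5, -1.0),
    "T": (0.5, 1.0),
}

def walk(seq):
    """Return the points visited by the directed walk, starting at the origin."""
    x, y = 0.0, 0.0
    points = [(x, y)]
    for base in seq:
        dx, dy = STEP[base]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

path = walk("AG")
```

Plotting the returned points in order gives the graphical representation of the sequence.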


Journal of Chemical Information and Computer Sciences | 2000

Topological indices: their nature and mutual relatedness

Subhash C. Basak; Alexandru T. Balaban; Brian D. Gute

We calculated 202 molecular descriptors (topological indices, TIs) for two chemical databases (a set of 139 hydrocarbons and another set of 1037 diverse chemicals). Variable cluster analysis of these TIs grouped these structures into 14 clusters for the first set and 18 clusters for the second set. Correspondences between the same TIs in the two sets reveal how and why the various classes of TIs are mutually related and provide insight into what aspects of chemical structure they are expressing.


Journal of Mathematical Chemistry | 1991

Predicting properties of molecules using graph invariants

Subhash C. Basak; Gerald J. Niemi; Gilman D. Veith

Topological indices (TIs) have been used to study structure-activity relationships (SAR) with respect to the physical, chemical, and biological properties of congeneric sets of molecules. Since there are many TIs and many are correlated, it is important that we identify redundancies and extract useful information from TIs into a smaller number of parameters. Moreover, it is important to determine if TIs, or parameters derived from TIs, can be used for global SAR models of diverse sets of chemicals. We calculated seventy-one TIs for three groups of molecules of increasing complexity and diversity: (a) 74 alkanes, (b) 29 alkylbenzenes, and (c) 37 polycyclic aromatic hydrocarbons (PAHs). Principal components analysis (PCA) revealed that a few principal components (PCs) could extract most of the information encoded by the seventy-one TIs. The structural basis of the first few PCs could be derived from their pattern of correlation with individual TIs. For the three sets of molecules, viz. alkanes, alkylbenzenes and PAHs, PCs were able to predict the boiling points reasonably well. However, for the combined set of 140 chemicals consisting of the alkanes, alkylbenzenes and PAHs, the derived PCs were not as effective in predicting properties as in the case of individual classes of compounds.


Journal of Chemical Information and Computer Sciences | 1996

A Comparative Study of Topological and Geometrical Parameters in Estimating Normal Boiling Point and Octanol/Water Partition Coefficient

Subhash C. Basak; Brian D. Gute

We have used topological, topochemical and geometrical parameters in predicting: (a) normal boiling point of a set of 1023 chemicals and (b) lipophilicity (log P, octanol/water) of 219 chemicals. The results show that topological and topochemical variables can explain most of the variance in the data. The addition of geometrical parameters to the models provides marginal improvement in the models' predictive power. Among the three classes of descriptors, the topochemical indices were the most effective in predicting properties.


Journal of Chemical Information and Computer Sciences | 2001

Prediction of Mutagenicity of Aromatic and Heteroaromatic Amines from Structure: A Hierarchical QSAR Approach

Subhash C. Basak; Denise Mills; Alexandru T. Balaban; Brian D. Gute

Due to the lack of experimental data, there has been increasing use of theoretical structural descriptors in the hazard assessment of chemicals. We have used a hierarchical approach to develop class-specific quantitative structure-activity relationship (QSAR) models for the prediction of mutagenicity of a set of 95 aromatic and heteroaromatic amines. The hierarchical approach begins with the simplest molecular descriptors, the topostructural, which encode limited chemical information. The complexity is then increased, adding topochemical, geometric, and finally quantum chemical parameters. We have also added log P to the set of independent variables. The results indicate that the topological parameters, i.e., the topostructural and topochemical indices, explain the majority of the variance, and that the inclusion of log P, geometric, and quantum chemical parameters does not result in significantly improved predictive models.


Mathematical Modelling | 1987

Topological indices: their nature, mutual relatedness, and applications

Subhash C. Basak; V.R. Magnuson; Gerald J. Niemi; Ronald R. Regal; Gilman D. Veith

During the last two decades a large number of numerical graph invariants (topological indices) have been defined and used for correlation analysis in theoretical chemistry, pharmacology, toxicology, and environmental chemistry. However, no systematic study has been undertaken to determine to what extent these indices are correlated with each other. In the present paper we have carried out a principal component analysis (PCA) of 90 topological parameters derived from 3692 distinct chemicals taken from an environmental database consisting of nearly nineteen thousand compounds. The PCA using the correlation matrix resulted in 10 principal components (PCs) with eigenvalues greater than 1. These ten PCs explained over 92% of the variance in the standardized data. The first four PCs explained over 78% of the variance, and an interpretation of these four PCs is given in terms of the chemical structures at the extremes of these PCs.
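The retention rule mentioned in this abstract (PCA on the correlation matrix, keeping components with eigenvalues greater than 1) can be sketched with NumPy. The function name `kaiser_pca` and the synthetic four-column data are assumptions made for illustration; they stand in for the 90 topological parameters and are not the study's data.

```python
import numpy as np

def kaiser_pca(X):
    """PCA on the correlation matrix; keep components with eigenvalue > 1."""
    R = np.corrcoef(X, rowvar=False)        # columns are descriptors
    eigvals, eigvecs = np.linalg.eigh(R)    # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]       # sort eigenvalues descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0
    # Fraction of total standardized variance carried by retained components.
    explained = eigvals[keep].sum() / eigvals.sum()
    return eigvals[keep], eigvecs[:, keep], explained

# Synthetic data: two independent signals, each measured twice with noise,
# mimicking descriptors that are strongly correlated in pairs.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([
    base[:, 0],
    base[:, 0] + 0.1 * rng.normal(size=200),
    base[:, 1],
    base[:, 1] + 0.1 * rng.normal(size=200),
])
vals, vecs, explained = kaiser_pca(X)
```

With four descriptors that carry only two independent signals, two components pass the threshold and account for nearly all of the variance.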


Journal of Chemical Information and Computer Sciences | 1999

Optimal Molecular Descriptors Based on Weighted Path Numbers

Milan Randić; Subhash C. Basak

We consider weighted path numbers as molecular descriptors for structure-property-activity studies. However, instead of using prescribed weights for paths, we have optimized the weights so that the standard error in regression analysis is as small as possible. In particular, we consider the boiling points of alcohols and the use of weighted paths to differentiate an oxygen from a carbon atom.
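A minimal sketch of a weighted path number follows. It assumes a hydrogen-suppressed molecular graph given as an adjacency dictionary and one weight per path length, which is only one possible reading of the abstract. The weights shown are fixed hypothetical values rather than optimized ones, and the sketch does not treat oxygen differently from carbon.

```python
def path_counts(adj, max_len):
    """Count simple paths of each length 1..max_len in an undirected graph."""
    counts = [0] * (max_len + 1)

    def extend(node, visited, length):
        if length == max_len:
            return
        for nxt in adj[node]:
            if nxt not in visited:
                counts[length + 1] += 1
                extend(nxt, visited | {nxt}, length + 1)

    for start in adj:
        extend(start, {start}, 0)
    # Each path is found twice, once from each end.
    return [c // 2 for c in counts[1:]]

def weighted_path_number(adj, weights):
    """Sum of path counts, each multiplied by the weight for its length."""
    counts = path_counts(adj, len(weights))
    return sum(w * p for w, p in zip(weights, counts))

# Heavy-atom skeletons: a 4-atom chain (as in 1-propanol) and a
# 4-atom star (as in 2-propanol); atom labels are arbitrary indices.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
weights = [1.0, 0.5, 0.25]  # hypothetical weights for paths of length 1, 2, 3

chain_value = weighted_path_number(chain, weights)
star_value = weighted_path_number(star, weights)
```

In an optimization setting, the weights would be varied and the regression against boiling points refitted until the standard error stops decreasing.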

Collaboration


Dive into Subhash C. Basak's collaborations.

Top Co-Authors

Jessica J. Kraker

University of Wisconsin–Eau Claire

Gilman D. Veith

United States Environmental Protection Agency
