Publication


Featured research published by Wojtek J. Krzanowski.


Biometrics | 1988

Principles of multivariate analysis: a user's perspective

Wojtek J. Krzanowski

Contents: Part I: Looking at multivariate data; Part II: Samples, populations, and models; Part III: Analysing ungrouped data; Part IV: Analysing grouped data; Part V: Analysing association among variables; Appendix: Some basic matrix theory (A1 Definitions, A2 Elementary arithmetic operations, A3 Determinants and inverses, A4 Quadratic forms, A5 Latent roots and vectors, A6 Matrix square root, A7 Partitioned matrices, A8 Vector differentiation); References; Index.


Biometrics | 1988

A Criterion for Determining the Number of Groups in a Data Set Using Sum-of-Squares Clustering

Wojtek J. Krzanowski; Y. T. Lai

Marriott (1971, Biometrics 27, 501-514) used a heuristic argument to derive the criterion g²|W| for determining the number of groups in a data set when the clustering objective function is the within-group determinant |W|. An analogous argument is employed to derive a criterion for use with the within-group sum-of-squares objective function trace(W). The behaviour of both Marriott's criterion and the new criterion is investigated by Monte Carlo methods. For homogeneous data based on uniform and independent variables, the performance of the new criterion is close to expectation while Marriott's criterion shows much more extreme behaviour. For grouped data, the new criterion correctly identifies the number of groups in 85% of data sets under a wide range of conditions, while Marriott's criterion shows a success rate of less than 40%. The new criterion is illustrated on the well-known Iris data, and some cautionary comments are made about its use.
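
To make the trace(W) criterion concrete, here is a minimal sketch (not the authors' code) of the index in the form it is commonly quoted in later software: choose the g that maximises |DIFF(g)| / |DIFF(g+1)|, where DIFF(g) = (g-1)^(2/p) trace(W_{g-1}) - g^(2/p) trace(W_g). It assumes scikit-learn's KMeans as the sum-of-squares clustering routine; the Monte Carlo settings of the paper are not reproduced.

```python
# Minimal sketch of a trace(W)-based criterion for the number of groups.
# Assumption: scikit-learn's KMeans stands in for sum-of-squares clustering.
import numpy as np
from sklearn.cluster import KMeans

def trace_W(X, g, seed=0):
    """Within-group sum of squares trace(W_g) from a k-means fit with g clusters."""
    km = KMeans(n_clusters=g, n_init=10, random_state=seed).fit(X)
    return km.inertia_   # sum of squared distances to cluster centres

def kl_index(X, g_max):
    """Index |DIFF(g)| / |DIFF(g+1)| for g = 2..g_max, where
    DIFF(g) = (g-1)^(2/p) trace(W_{g-1}) - g^(2/p) trace(W_g)."""
    n, p = X.shape
    tw = {g: trace_W(X, g) for g in range(1, g_max + 2)}
    diff = {g: (g - 1) ** (2 / p) * tw[g - 1] - g ** (2 / p) * tw[g]
            for g in range(2, g_max + 2)}
    return {g: abs(diff[g]) / abs(diff[g + 1]) for g in range(2, g_max + 1)}

# Example: three well-separated groups; the index should peak at g = 3.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in (0, 3, 6)])
scores = kl_index(X, g_max=6)
print(max(scores, key=scores.get), scores)
```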


Journal of the American Statistical Association | 1979

Between-Groups Comparison of Principal Components

Wojtek J. Krzanowski

Abstract A method is given for comparing principal component analyses conducted on the same variables in two different groups of individuals, and an extension to the case of more than two groups is outlined. The technique leads to a latent root and vector problem, which has also arisen in the comparison of factor patterns in separate factor analyses. Emphasis in the present article is on the underlying geometry and interpretation of the results. An illustrative example is provided.
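
The latent root and vector problem referred to above can be sketched directly (an illustration, not the paper's code): if L1 and L2 hold the first k principal component loading vectors for each group, the latent roots of L1'L2L2'L1 are the squared cosines of the angles between the two k-dimensional subspaces, and the cosines themselves are the singular values of L1'L2.

```python
# Sketch: comparing the subspaces spanned by the first k principal components
# in two groups via the singular values of L1' L2.
import numpy as np

def pc_loadings(X, k):
    """First k principal component loading vectors (columns) of the centred data."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))   # ascending roots
    order = np.argsort(vals)[::-1]                           # largest roots first
    return vecs[:, order[:k]]

def compare_subspaces(X1, X2, k):
    """Cosines of the angles between the two k-dimensional PC subspaces."""
    L1, L2 = pc_loadings(X1, k), pc_loadings(X2, k)
    # singular values of L1' L2 = square roots of latent roots of L1' L2 L2' L1
    return np.linalg.svd(L1.T @ L2, compute_uv=False)

# Example with hypothetical data: values near 1 mean the subspaces nearly coincide.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(100, 5)) @ np.diag([3, 2, 1, 0.5, 0.2])
X2 = rng.normal(size=(120, 5)) @ np.diag([3, 2, 1, 0.5, 0.2])
print(compare_subspaces(X1, X2, k=2))
```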


Technometrics | 1982

Cross-Validatory Choice of the Number of Components From a Principal Component Analysis

H. T. Eastment; Wojtek J. Krzanowski

A method is described for choosing the number of components to retain in a principal component analysis when the aim is dimensionality reduction. The correspondence between principal component analysis and the singular value decomposition of the data matrix is used. The method is based on successively predicting each element in the data matrix after deleting the corresponding row and column of the matrix, and makes use of recently published algorithms for updating a singular value decomposition. These are very fast, which renders the proposed technique a practicable one for routine data analysis.
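
The scheme can be sketched naively as follows (assumptions: the matrix is column-centred once at the start, and every SVD is recomputed from scratch rather than updated with the fast algorithms the paper relies on, so this version is slow but shows the logic). Each element is predicted from loadings fitted with its row deleted and scores fitted with its column deleted, so no element contributes to its own prediction, and PRESS(m) is accumulated for each candidate number of components m.

```python
# Naive sketch of cross-validatory PRESS for choosing the number of components.
# Requires m_max <= min(n - 1, p - 1).
import numpy as np

def align(Vt, Vt_ref, m):
    """Sign-align the first m right singular vectors with a reference SVD
    (singular vectors are only defined up to sign)."""
    s = np.sum(Vt[:m] * Vt_ref[:m], axis=1)
    return np.where(s < 0, -1.0, 1.0)

def press(X, m_max):
    """PRESS(m) for m = 1..m_max on the column-centred data matrix."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)                       # centred once, for simplicity
    _, _, Vt0 = np.linalg.svd(Xc, full_matrices=False)

    # scores from SVDs with one column deleted (row i untouched by column j)
    col = []
    for j in range(p):
        U_c, d_c, Vt_c = np.linalg.svd(np.delete(Xc, j, axis=1), full_matrices=False)
        col.append((U_c, d_c, align(Vt_c, np.delete(Vt0, j, axis=1), m_max)))

    # loadings from SVDs with one row deleted (column j untouched by row i)
    row = []
    for i in range(n):
        _, d_r, Vt_r = np.linalg.svd(np.delete(Xc, i, axis=0), full_matrices=False)
        row.append((d_r, Vt_r, align(Vt_r, Vt0, m_max)))

    out = np.zeros(m_max)
    for i in range(n):
        d_r, Vt_r, s_r = row[i]
        for j in range(p):
            U_c, d_c, s_c = col[j]
            # term t of the prediction: (score from column-deleted SVD)
            # times (loading from row-deleted SVD), with signs aligned
            terms = (s_c * U_c[i, :m_max] * np.sqrt(d_c[:m_max]) *
                     np.sqrt(d_r[:m_max]) * Vt_r[:m_max, j] * s_r)
            for m in range(1, m_max + 1):
                out[m - 1] += (Xc[i, j] - terms[:m].sum()) ** 2
    return out

# Example: PRESS should stop improving much beyond the true dimensionality (2 here).
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(30, 6))
print(press(X, m_max=4))
```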


Applied Statistics | 1995

Discriminant Analysis with Singular Covariance Matrices: Methods and Applications to Spectroscopic Data

Wojtek J. Krzanowski; Philip Jonathan; W. V. Mccarthy; M. R. Thomas

SUMMARY Currently popular techniques such as experimental spectroscopy and computer-aided molecular modelling lead to data having very many variables observed on each of relatively few individuals. A common objective is discrimination between two or more groups, but the direct application of standard discriminant methodology fails because of singularity of covariance matrices. The problem has been circumvented in the past by prior selection of a few transformed variables, using either principal component analysis or partial least squares. Although such selection ensures nonsingularity of matrices, the decision process is arbitrary and valuable information on group structure may be lost. We therefore consider some ways of estimating linear discriminant functions without such prior selection. Several spectroscopic data sets are analysed with each method, and questions of bias of assessment procedures are investigated. All proposed methods seem worthy of consideration in practice.
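
As one concrete illustration of the kind of route considered (a sketch under simplifying assumptions, not the paper's implementation), the inverse of the singular pooled within-group covariance matrix can be replaced by its Moore-Penrose pseudo-inverse in a two-group linear discriminant. The resubstitution rates printed at the end illustrate the kind of assessment whose bias the abstract mentions.

```python
# Sketch: a two-group linear discriminant using a pseudo-inverse of the
# (singular) pooled within-group covariance matrix, for p >> n data.
import numpy as np

def pinv_lda(X1, X2):
    """Return a scoring function f(x); f(x) > 0 assigns x to group 1."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S = ((X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)) / (len(X1) + len(X2) - 2)
    a = np.linalg.pinv(S) @ (m1 - m2)     # discriminant direction
    c = a @ (m1 + m2) / 2                 # midpoint cut-off
    return lambda x: x @ a - c

# Example with hypothetical high-dimensional data (p = 200 >> n = 40).
rng = np.random.default_rng(3)
X1 = rng.normal(loc=0.0, size=(20, 200))
X2 = rng.normal(loc=0.5, size=(20, 200))
rule = pinv_lda(X1, X2)
# Resubstitution rates (fraction of each training group assigned to group 1);
# these are typically optimistic when p greatly exceeds n.
print((rule(X1) > 0).mean(), (rule(X2) > 0).mean())
```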


Journal of the American Statistical Association | 1975

Discrimination and Classification Using Both Binary and Continuous Variables

Wojtek J. Krzanowski

Abstract The likelihood ratio classification rule is derived from the location model, applicable when the data contain both binary and continuous variables. A method is proposed for estimating the rule in practical situations and assessing its performance. Losses incurred by the estimation procedure are investigated, and use of Fisher's linear discriminant function on such data is studied for the case of known population parameters. Finally, the proposed rule is applied to some data sets, and its performance is compared with that of some other classification rules.
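
A toy sketch of the location-model rule follows, with a single binary variable defining two cells, equal group priors, and every cell assumed to be observed in both groups; the paper's estimation procedure for practical (sparser) situations is not reproduced.

```python
# Toy sketch of the location-model likelihood ratio rule: the binary variable
# picks a cell; within each cell the continuous variables are Gaussian with a
# common covariance; allocate to the group with the larger estimated likelihood.
import numpy as np

def fit_location_model(X_cont, x_bin, y):
    """Estimate cell probabilities, cell/group means and a pooled covariance.
    y is the group label (0/1), x_bin the binary variable (0/1)."""
    prob, mean, resid = {}, {}, []
    for g in (0, 1):
        for m in (0, 1):
            sel = (y == g) & (x_bin == m)
            prob[g, m] = sel.mean() / (y == g).mean()   # P(cell m | group g)
            mean[g, m] = X_cont[sel].mean(axis=0)
            resid.append(X_cont[sel] - mean[g, m])
    cov = np.cov(np.vstack(resid), rowvar=False)        # common covariance
    return prob, mean, np.linalg.inv(cov)

def classify(x_cont, m, prob, mean, cov_inv):
    """Likelihood ratio rule (equal group priors assumed)."""
    def log_lik(g):
        d = x_cont - mean[g, m]
        return np.log(prob[g, m]) - 0.5 * d @ cov_inv @ d
    return int(log_lik(1) > log_lik(0))

# Example with simulated data: group and cell membership shift the continuous means.
rng = np.random.default_rng(4)
n = 200
y, x_bin = rng.integers(0, 2, n), rng.integers(0, 2, n)
X_cont = rng.normal(size=(n, 2)) + y[:, None] * 1.0 + x_bin[:, None] * 0.5
prob, mean, cov_inv = fit_location_model(X_cont, x_bin, y)
preds = [classify(X_cont[i], x_bin[i], prob, mean, cov_inv) for i in range(n)]
print(np.mean(np.array(preds) == y))
```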


Biometrics | 1987

Cross-Validation in Principal Component Analysis

Wojtek J. Krzanowski

SUMMARY This paper describes a form of cross-validation, in the context of principal component analysis, which has a number of useful aspects as regards multivariate data inspection and description. Topics covered include choice of dimensionality, identification of influential observations, and selection of important variables. The methods are motivated by and illustrated on a well-known data set.

1. Data Set and Objectives. Jeffers (1967) described two detailed multivariate case studies, one of which concerned 19 variables measured on each of 40 winged aphids (alate adelges) that had been caught in a light trap. The 19 variables are listed in Table 1. Principal component analysis (PCA) was used to examine the structure in the data, and if possible to answer the following questions: (i) How many dimensions of the individuals are being measured? (ii) How many distinct taxa are present in the habitat? (iii) Which variables among the 19 are redundant for distinguishing between taxa, and which must be retained in future work?

Of the 19 variables, 14 are length or width measurements, four are counts, and one (anal fold) is a presence/absence variable scored 0 or 1. In view of this disparity in variable type, Jeffers elected to standardise the data and thus effect the PCA by finding the latent roots and vectors of the correlation (rather than covariance) matrix of the data. The elements of each latent vector provide the coefficients of one of 19 linear combinations of the standardised original variables that successively maximise sample variance subject to being orthogonal to each other, and the corresponding latent root is the sample variance of that linear combination. The 19 observed values for each aphid were subjected to each of these 19 linear transformations to form the 19 principal component scores for that aphid.

The above questions were then answered as follows: (i) The latent roots of the correlation matrix were as given in Table 1. The four largest comprise 73.0%, 12.5%, 3.9%, and 2.6%, respectively, of the total variance (19.0) of the standardised variables; the dimensionality of the data was therefore taken to be 2. (ii) When the scores of the first two principal components for the 40 aphids were plotted against orthogonal axes, the resulting 40 points divided into four groups as shown in Figure 1. Hence, four distinct species were identified for the aphids. (iii) From consideration of the size of coefficients in the first three principal components, it was concluded that only the four variables length of tibia, number of ovipositor
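
The correlation-matrix PCA mechanics described above can be reproduced in a few lines (a sketch run on random data, since the aphid measurements themselves are not included here): the latent roots give the variance of each component, and the standardised observations multiplied by the latent vectors give the component scores used in the plots.

```python
# Sketch: PCA via the latent roots and vectors of the correlation matrix,
# with percentage of total variance per component and the component scores.
import numpy as np

def correlation_pca(X):
    """Latent roots (descending), latent vectors, and component scores."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardised variables
    roots, vectors = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(roots)[::-1]
    roots, vectors = roots[order], vectors[:, order]
    return roots, vectors, Z @ vectors                 # scores for each observation

# Illustrative data of the same shape as the aphid study (40 observations, 19 variables).
rng = np.random.default_rng(5)
X = rng.normal(size=(40, 19))
roots, vectors, scores = correlation_pca(X)
print(100 * roots / roots.sum())   # percentage of total variance per component
```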


Information & Software Technology | 1997

Software diversity: practical statistics for its measurement and exploitation

Derek Partridge; Wojtek J. Krzanowski

Abstract The topic of this paper is the exploitation of diversity to enhance computer system reliability. It is well established that a diverse system composed of multiple alternative versions is more reliable than any single version alone, and this knowledge has occasionally been exploited in safety-critical applications. However, it is not clear how this diversity is best measured, nor how the available diversity in a collection of versions is best exploited. We develop, define, illustrate and assess diversity measures, voting strategies for diversity exploitation, and interactions between the two. We take the view that a proper understanding of such issues is required if multiversion software engineering is to be elevated from the current “try it and see” procedure to a systematic technology. In addition, we introduce inductive programming techniques, particularly neural computing, as a cost-effective route to the practical use of multiversion systems outside the demanding requirements of safety-critical systems, i.e. in general software engineering.
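
As a rough illustration of the two ingredients named above (the paper's own measures and voting strategies are more refined than this), the sketch below computes a simple pairwise-disagreement diversity measure over version outputs and combines the versions' answers by majority voting.

```python
# Sketch: a pairwise-disagreement diversity measure and a majority vote
# over the outputs of several software versions on the same inputs.
import numpy as np
from collections import Counter

def pairwise_disagreement(outputs):
    """Mean proportion of inputs on which a pair of versions disagree.
    outputs: array of shape (n_versions, n_inputs)."""
    v = len(outputs)
    pairs = [(i, j) for i in range(v) for j in range(i + 1, v)]
    return np.mean([np.mean(outputs[i] != outputs[j]) for i, j in pairs])

def majority_vote(outputs):
    """Most common answer per input across versions (ties broken arbitrarily)."""
    return np.array([Counter(col).most_common(1)[0][0] for col in outputs.T])

# Example: three versions, each wrong on different inputs, give a reliable vote.
truth   = np.array([1, 0, 1, 1, 0, 1, 0, 0])
outputs = np.array([[1, 0, 1, 0, 0, 1, 0, 0],
                    [1, 0, 0, 1, 0, 1, 0, 1],
                    [0, 0, 1, 1, 0, 1, 1, 0]])
print(pairwise_disagreement(outputs), (majority_vote(outputs) == truth).mean())
```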


Computational Statistics & Data Analysis | 2005

Improved biclustering of microarray data demonstrated through systematic performance tests

Heather Turner; Trevor C. Bailey; Wojtek J. Krzanowski

A new algorithm is presented for fitting the plaid model, a biclustering method developed for clustering gene expression data. The approach is based on speedy individual differences clustering and uses binary least squares to update the cluster membership parameters, making use of the binary constraints on these parameters and simplifying the other parameter updates. The performance of both algorithms is tested on simulated data sets designed to imitate (normalised) gene expression data, covering a range of biclustering configurations. Empirical distributions for the components of these data sets, including non-systematic error, are derived from a real set of microarray data. A set of two-way quality measures is proposed, based on one-way measures commonly used in information retrieval, to evaluate the quality of a retrieved bicluster with respect to a target bicluster in terms of both genes and samples. By defining a one-to-one correspondence between target biclusters and retrieved biclusters, the performance of each algorithm can be assessed. The results show that, using appropriately selected starting criteria, the proposed algorithm out-performs the original plaid model algorithm across a range of data sets. Furthermore, through the rigorous assessment of the plaid model a benchmark for future evaluation of biclustering methods is established.
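
A sketch of a two-way quality measure in the spirit described (the paper's exact definitions may differ): an information-retrieval style score is applied separately to the gene (row) set and the sample (column) set of a retrieved bicluster against a target bicluster, and the pair of scores is reported together.

```python
# Sketch: two-way bicluster quality as an IR-style score on the gene (row)
# and sample (column) sets of a retrieved bicluster versus a target bicluster.
def f1(retrieved, target):
    """Harmonic mean of precision and recall of the retrieved set against the target."""
    retrieved, target = set(retrieved), set(target)
    if not retrieved or not target:
        return 0.0
    precision = len(retrieved & target) / len(retrieved)
    recall = len(retrieved & target) / len(target)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

def bicluster_quality(retrieved_rows, retrieved_cols, target_rows, target_cols):
    """Two-way score: the gene-wise and sample-wise one-way scores reported together."""
    return f1(retrieved_rows, target_rows), f1(retrieved_cols, target_cols)

# Example: a retrieved bicluster recovering most of the target genes and samples.
print(bicluster_quality(retrieved_rows=[1, 2, 3, 4], retrieved_cols=[0, 1],
                        target_rows=[2, 3, 4, 5],   target_cols=[0, 1, 2]))
```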


Technometrics | 1977

The Performance of Fisher's Linear Discriminant Function Under Non-Optimal Conditions

Wojtek J. Krzanowski

A review is given of the published work on the performance of Fisher's linear discriminant function when underlying assumptions are violated. Some new results are presented for the case of classification using both binary and continuous variables, and conditions for success or failure of the linear discriminant function are investigated.
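
A small Monte Carlo sketch in the same spirit (the design is illustrative, not the paper's): the linear discriminant function is applied naively to data containing one binary and two continuous variables, and its misclassification rate is estimated on independent test sets.

```python
# Sketch: Monte Carlo estimate of the error rate of the linear discriminant
# function when one of the variables is binary rather than Gaussian.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def simulate(n, rng):
    """One binary variable plus two continuous variables per observation."""
    y = rng.integers(0, 2, n)
    b = rng.binomial(1, np.where(y == 1, 0.7, 0.3))            # binary variable
    cont = rng.normal(size=(n, 2)) + y[:, None] + 0.5 * b[:, None]
    return np.column_stack([b, cont]), y

rng = np.random.default_rng(6)
errors = []
for _ in range(100):
    X_train, y_train = simulate(200, rng)
    X_test, y_test = simulate(1000, rng)
    lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
    errors.append(np.mean(lda.predict(X_test) != y_test))
print(np.mean(errors))   # estimated error rate of the LDF on mixed data
```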

Collaboration


Dive into Wojtek J. Krzanowski's collaboration.

Top Co-Authors

Vitaly Schetinin

University of Bedfordshire

Livia Jakaite

University of Bedfordshire
