Krzysztof J. Cios | Researchain

Publication

Featured research published by Krzysztof J. Cios.

Artificial Intelligence in Medicine | 2002

Uniqueness of medical data mining

Krzysztof J. Cios; G. William Moore

This article addresses the special features of data mining with medical data. Researchers in other fields may not be aware of the particular constraints and difficulties of the privacy-sensitive, heterogeneous, but voluminous data of medicine. Ethical and legal aspects of medical data mining are discussed, including data ownership, fear of lawsuits, expected benefits, and special administrative issues. The mathematical understanding of estimation and hypothesis formation in medical data may be fundamentally different from that in other data collection activities. Medicine is primarily directed at patient care and serves only secondarily as a research resource; almost the only justification for collecting medical data is to benefit the individual patient. Finally, medical data have a special status based upon their applicability to all people; their urgency (including life-or-death); and a moral obligation to be used for beneficial purposes.

IEEE Transactions on Knowledge and Data Engineering | 2004

CAIM discretization algorithm

Lukasz Kurgan; Krzysztof J. Cios

The task of extracting knowledge from databases is quite often performed by machine learning algorithms. The majority of these algorithms can be applied only to data described by discrete numerical or nominal attributes (features). In the case of continuous attributes, there is a need for a discretization algorithm that transforms continuous attributes into discrete ones. We describe such an algorithm, called CAIM (class-attribute interdependence maximization), which is designed to work with supervised data. The goal of the CAIM algorithm is to maximize the class-attribute interdependence and to generate a (possibly) minimal number of discrete intervals. The algorithm does not require the user to predefine the number of intervals, as opposed to some other discretization algorithms. The tests performed using CAIM and six other state-of-the-art discretization algorithms show that discrete attributes generated by the CAIM algorithm almost always have the lowest number of intervals and the highest class-attribute interdependency. Two machine learning algorithms, the CLIP4 rule algorithm and the decision tree algorithm, are used to generate classification rules from data discretized by CAIM. For both the CLIP4 and decision tree algorithms, the accuracy of the generated rules is higher and the number of the rules is lower for data discretized using the CAIM algorithm when compared to data discretized using six other discretization algorithms. The highest classification accuracy was achieved for data sets discretized with the CAIM algorithm, as compared with the other six algorithms.
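As a rough illustration of the criterion that CAIM maximizes, the sketch below scores one candidate discretization of a single continuous attribute. It uses the published form of the criterion, the average over intervals of (largest class count in the interval)^2 divided by the interval's total count. The function name `caim_score` and the interval convention (each cut point closes the interval to its left) are assumptions made for this example; the greedy search over candidate boundaries is not shown.

```python
from bisect import bisect_left


def caim_score(values, labels, boundaries):
    """Score one discretization scheme with the CAIM criterion.

    values     -- continuous attribute values
    labels     -- class label for each value
    boundaries -- sorted interior cut points; a value v falls into the
                  first interval whose upper cut point is >= v
    """
    classes = sorted(set(labels))
    n_intervals = len(boundaries) + 1

    # Quanta matrix: counts[r][c] = number of examples of class c in interval r.
    counts = [dict.fromkeys(classes, 0) for _ in range(n_intervals)]
    for v, c in zip(values, labels):
        counts[bisect_left(boundaries, v)][c] += 1

    total = 0.0
    for interval in counts:
        interval_size = sum(interval.values())
        if interval_size:  # skip empty intervals to avoid division by zero
            total += max(interval.values()) ** 2 / interval_size
    return total / n_intervals


values = [1, 2, 3, 4, 5, 6]
labels = ["a", "a", "a", "b", "b", "b"]
print(caim_score(values, labels, [3.5]))  # cut that separates the classes
print(caim_score(values, labels, [2.5]))  # cut that mixes the classes
```

A cut point that separates the classes cleanly scores higher than one that mixes them, which is the behavior the abstract describes as maximizing class-attribute interdependence.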

Artificial Intelligence in Medicine | 2001

Knowledge discovery approach to automated cardiac SPECT diagnosis

Lukasz Kurgan; Krzysztof J. Cios; Ryszard Tadeusiewicz; Marek R. Ogiela; Lucy S. Goodenday

The paper describes a computerized process of myocardial perfusion diagnosis from cardiac single photon emission computed tomography (SPECT) images using a data mining and knowledge discovery approach. We use a six-step knowledge discovery process. A database consisting of 267 cleaned patient SPECT images (about 3000 2D images), accompanied by clinical information and physician interpretation, was created first. Then, a new user-friendly algorithm for computerizing the diagnostic process was designed and implemented. SPECT images were processed to extract a set of features, and then explicit rules were generated, using inductive machine learning and heuristic approaches, to mimic a cardiologist's diagnosis. The system is able to provide a set of computer diagnoses for cardiac SPECT studies, and can be used as a diagnostic tool by a cardiologist. The achieved results are encouraging because of the high correctness of diagnoses.

Neurocomputing | 1996

Time series forecasting by combining RBF networks, certainty factors, and the Box-Jenkins model

Donald K. Wedding; Krzysztof J. Cios

A method is described for using Radial Basis Function (RBF) neural networks to generate certainty factors along with normal output. When RBF outputs with low certainty factor values are discarded, the overall accuracy of the network is increased. In this paper, RBF networks are used in a time series application. The RBF neural networks are trained to generate both time series forecasts and certainty factors. Their output is then combined with the Univariant Box-Jenkins (UBJ) models to predict future values of data. This combination approach is shown to improve the overall reliability of time series forecasting. Three possible methods for combining the two forecasts into one hybrid forecast are discussed.
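The abstract does not spell out the three combination methods, so the sketch below shows only two plausible ways a certainty factor could be used to merge an RBF forecast with a Box-Jenkins forecast. Both rules, the function names, the threshold value, and the assumption that the certainty factor lies in [0, 1] are hypothetical and are not taken from the paper.

```python
def threshold_forecast(rbf_forecast, certainty, ubj_forecast, threshold=0.5):
    """Use the RBF forecast only when its certainty factor is high enough.

    Hypothetical rule: below the threshold the RBF output is discarded
    and the Box-Jenkins forecast is used instead.
    """
    if not 0.0 <= certainty <= 1.0:
        raise ValueError("certainty is assumed to lie in [0, 1]")
    return rbf_forecast if certainty >= threshold else ubj_forecast


def weighted_forecast(rbf_forecast, certainty, ubj_forecast):
    """Blend the two forecasts, weighting the RBF output by its certainty.

    Hypothetical rule: a convex combination of the two forecasts.
    """
    if not 0.0 <= certainty <= 1.0:
        raise ValueError("certainty is assumed to lie in [0, 1]")
    return certainty * rbf_forecast + (1.0 - certainty) * ubj_forecast


print(threshold_forecast(10.0, 0.9, 12.0))  # confident RBF output is kept
print(threshold_forecast(10.0, 0.2, 12.0))  # low certainty falls back to UBJ
print(weighted_forecast(10.0, 0.5, 12.0))   # equal blend of both forecasts
```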

BMC Bioinformatics | 2008

SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences

Lukasz Kurgan; Krzysztof J. Cios; Ke Chen

Background: Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds, and thus determining structural similarity without sequence similarity would be desirable for structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for datasets in which the sequence identity of any pair of sequences belongs to the twilight zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with the sequences used for the prediction.

Results: SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble-of-classifiers predictors.

Conclusion: SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that SCPRED's predictions can be successfully used as a post-processing filter to improve the performance of modern fold classification methods.

Archive | 2005

Trends in Data Mining and Knowledge Discovery

Krzysztof J. Cios; Lukasz Kurgan

Data mining and knowledge discovery (DMKD) is a fast-growing field of research. Its popularity is caused by an ever increasing demand for tools that help in revealing and comprehending information hidden in huge amounts of data. Such data are generated on a daily basis by federal agencies, banks, insurance companies, retail stores, and on the WWW. This explosion came about through the increasing use of computers, scanners, digital cameras, bar codes, etc. We are in a situation where rich sources of data, stored in databases, warehouses, and other data repositories, are readily available but not easily analyzable. This causes pressure from the federal, business, and industry communities for improvements in the DMKD technology. What is needed is a clear and simple methodology for extracting the knowledge hidden in the data. In this chapter, an integrated DMKD process model based on technologies like XML, PMML, SOAP, UDDI, and OLE DB-DM is introduced. These technologies help to design flexible, semiautomated, and easy-to-use DMKD models that enable building knowledge repositories and allow communication between several data mining tools, databases, and knowledge repositories. They also enable integration and automation of the DMKD tasks. This chapter describes a six-step DMKD process model and its component technologies.

IEEE Transactions on Neural Networks | 1992

A machine learning method for generation of a neural network architecture: a continuous ID3 algorithm

Krzysztof J. Cios; Ning Liu

The relation between the decision trees generated by a machine learning algorithm and the hidden layers of a neural network is described. A continuous ID3 algorithm is proposed that converts decision trees into hidden layers. The algorithm allows self-generation of a feedforward neural network architecture. In addition, it allows interpretation of the knowledge embedded in the generated connections and weights. A fast simulated annealing strategy, known as Cauchy training, is incorporated into the algorithm to escape from local minima. The performance of the algorithm is analyzed on spiral data.

IEEE International Conference on Fuzzy Systems | 1992

Continuous ID3 algorithm with fuzzy entropy measures

Krzysztof J. Cios; L.M. Sztandera

Fuzzy entropy measures are used to obtain a quick convergence of a continuous ID3 (CID3) algorithm proposed by K.J. Cios and N. Liu (1991), which allows for self-generation of a hierarchical feedforward neural network architecture by converting decision trees into hidden layers of a neural network. To demonstrate the learning capacity of the fuzzy version of the CID3 algorithm, it was tested on difficult spiral data consisting of 192 points, with 96 points for each spiral. One spiral is generated as a reflection of the other, making the problem far from linearly separable. A remarkable decrease in convergence time is achieved by using a fuzzy entropy measure with generalized Dombi operations.
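For readers unfamiliar with this kind of benchmark, the following sketch generates a two-spiral data set with 96 points per spiral (192 in total), where the second spiral is a reflection of the first. The abstract does not give the spiral parameters or the type of reflection, so the number of turns, the radii, and the choice of a point reflection through the origin are assumptions made for this example.

```python
import math


def two_spirals(points_per_spiral=96, turns=3.0, min_radius=0.5, max_radius=6.5):
    """Return a list of (x, y, label) tuples for two interleaved spirals.

    Label 0 is the first spiral; label 1 is its reflection through the
    origin. All numeric parameters are illustrative assumptions.
    """
    data = []
    for i in range(points_per_spiral):
        t = i / (points_per_spiral - 1)            # runs from 0 to 1
        angle = turns * 2.0 * math.pi * t
        radius = min_radius + (max_radius - min_radius) * t
        x, y = radius * math.cos(angle), radius * math.sin(angle)
        data.append((x, y, 0))
        data.append((-x, -y, 1))                   # reflected point
    return data


data = two_spirals()
print(len(data))  # total number of points across both spirals
```

Because each arm winds around the other, no single straight line separates the two classes, which is what makes the data a demanding test for a learning algorithm.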

BioMed Research International | 2014

Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records

Beata Strack; Jonathan P. DeShazo; Chris Gennings; Juan Luis Olmo; Sebastián Ventura; Krzysztof J. Cios; John N. Clore

Management of hyperglycemia in hospitalized patients has a significant bearing on outcome, in terms of both morbidity and mortality. However, there are few national assessments of diabetes care during hospitalization which could serve as a baseline for change. This analysis of a large clinical database (74 million unique encounters corresponding to 17 million unique patients) was undertaken to provide such an assessment and to find future directions which might lead to improvements in patient safety. Almost 70,000 inpatient diabetes encounters were identified with sufficient detail for analysis. Multivariable logistic regression was used to fit the relationship between the measurement of HbA1c and early readmission while controlling for covariates such as demographics, severity and type of the disease, and type of admission. Results show that the measurement of HbA1c was performed infrequently (18.4%) in the inpatient setting. The statistical model suggests that the relationship between the probability of readmission and the HbA1c measurement depends on the primary diagnosis. The data suggest further that the greater attention to diabetes reflected in HbA1c determination may improve patient outcomes and lower cost of inpatient care.

Information Sciences | 2004

CLIP4: hybrid inductive machine learning algorithm that generates inequality rules

Krzysztof J. Cios; Lukasz Kurgan

The paper describes a hybrid inductive machine learning algorithm called CLIP4. The algorithm first partitions data into subsets using a tree structure and then generates production rules only from subsets stored at the leaf nodes. The unique feature of the algorithm is generation of rules that involve inequalities. The algorithm works with data that have a large number of examples and attributes, can cope with noisy data, and can use numerical, nominal, continuous, and missing-value attributes. The algorithm's flexibility and efficiency are shown on several well-known benchmarking data sets, and the results are compared with other machine learning algorithms. The benchmarking results in each instance show CLIP4's accuracy, CPU time, and rule complexity. CLIP4 has built-in features like tree pruning, methods for partitioning the data (for data with a large number of examples and attributes, and for data containing noise), a data-independent mechanism for dealing with missing values, genetic operators to improve accuracy on small data, and discretization schemes. CLIP4 generates a model of the data that consists of well-generalized rules, and ranks attributes and selectors that can be used for feature selection.
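To make the idea of inequality rules concrete, the sketch below evaluates a rule whose conditions all have the form "attribute is not equal to value". The data representation (a rule as a list of attribute and excluded-value pairs joined by AND) and the function name `rule_covers` are assumptions made for illustration; this is not CLIP4's rule-generation procedure, and missing values are not handled.

```python
def rule_covers(rule, example):
    """Return True if the example satisfies every inequality selector.

    rule    -- list of (attribute, excluded_value) pairs, joined by AND
    example -- dict mapping attribute names to values
    """
    return all(example[attribute] != excluded for attribute, excluded in rule)


# Hypothetical rule: IF color != red AND size != small THEN class = positive
rule = [("color", "red"), ("size", "small")]

print(rule_covers(rule, {"color": "blue", "size": "large"}))  # both selectors hold
print(rule_covers(rule, {"color": "red", "size": "large"}))   # first selector fails
```

One inequality selector can stand in for several equality tests on a nominal attribute with many values, which is one way such rules can stay compact.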
