Pawan Lingras
Saint Mary's University
Publication
Featured research published by Pawan Lingras.
intelligent information systems | 2004
Pawan Lingras; Chad West
Data collection and analysis in web mining face certain unique challenges. For a variety of reasons inherent in web browsing and web logging, the likelihood of bad or incomplete data is higher than in conventional applications. The analytical techniques in web mining need to accommodate such data. Fuzzy and rough sets provide the ability to deal with incomplete and approximate information. Fuzzy set theory has been shown to be useful in three important aspects of web and data mining, namely clustering, association, and sequential analysis. There is increasing interest in research on clustering based on rough set theory. Clustering is an important part of web mining that involves finding natural groupings of web resources or web users. Researchers have pointed out some important differences between clustering in conventional applications and clustering in web mining. For example, the clusters and associations in web mining do not necessarily have crisp boundaries. As a result, researchers have studied the possibility of using fuzzy sets in web mining clustering applications. Recent attempts have used genetic algorithms based on rough set theory for clustering. However, clustering based on genetic algorithms may not be able to handle the large amount of data typical in a web mining application. This paper proposes a variation of the K-means clustering algorithm based on properties of rough sets. The proposed algorithm represents clusters as interval or rough sets. The paper also describes the design of an experiment including data collection and the clustering process. The experiment is used to create interval set representations of clusters of web visitors.
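The abstract above does not spell out the algorithm, so the following is only a minimal sketch of one common formulation of rough K-means, in which each cluster is an interval set with a lower and an upper approximation. The distance-difference test, the `threshold` parameter, the `w_lower` weight, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import math

def rough_assign(points, centroids, threshold):
    """Place each point in one lower approximation, or in several upper ones."""
    k = len(centroids)
    lower = [[] for _ in range(k)]
    upper = [[] for _ in range(k)]
    for p in points:
        dists = [math.dist(p, c) for c in centroids]
        nearest = min(range(k), key=dists.__getitem__)
        # Assumed rule: other clusters almost as close as the nearest one
        # make the point ambiguous.
        close = [j for j in range(k)
                 if j != nearest and dists[j] - dists[nearest] <= threshold]
        if close:
            for j in [nearest] + close:
                upper[j].append(p)      # boundary point: upper bounds only
        else:
            lower[nearest].append(p)    # unambiguous: lower and upper bound
            upper[nearest].append(p)
    return lower, upper

def mean(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def update_centroids(lower, upper, old, w_lower):
    """Weighted mix of the lower-approximation mean and the boundary mean."""
    new = []
    for lo, up, c in zip(lower, upper, old):
        boundary = [p for p in up if p not in lo]
        if lo and boundary:
            ml, mb = mean(lo), mean(boundary)
            new.append(tuple(w_lower * a + (1 - w_lower) * b
                             for a, b in zip(ml, mb)))
        elif lo:
            new.append(mean(lo))
        elif boundary:
            new.append(mean(boundary))
        else:
            new.append(c)               # empty cluster: keep old centroid
    return new

def rough_kmeans(points, centroids, threshold=1.0, w_lower=0.7, iterations=10):
    for _ in range(iterations):
        lower, upper = rough_assign(points, centroids, threshold)
        centroids = update_centroids(lower, upper, centroids, w_lower)
    lower, upper = rough_assign(points, centroids, threshold)
    return centroids, lower, upper

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (5.0, 5.0)]                   # the last point sits between both groups
centroids, lower, upper = rough_kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

In this toy data the point `(5.0, 5.0)` ends up in the upper approximation of both clusters and in the lower approximation of neither, which is the kind of non-crisp boundary the abstract describes for web visitors.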
knowledge discovery and data mining | 1998
Pawan Lingras; Yiyu Yao
This article examines basic issues of data mining using the theory of rough sets, which is a recent proposal for generalizing classical set theory. The Pawlak rough set model is based on the concept of an equivalence relation. Recent research has shown that a generalized rough set model need not be based on equivalence relation axioms. The Pawlak rough set model has been used for deriving deterministic as well as probabilistic rules from a complete database. This article demonstrates that a generalized rough set model can be used for generating rules from incomplete databases. These rules are based on plausibility functions proposed by Shafer. The article also discusses the importance of rule extraction from incomplete databases in data mining.
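As a small illustration of the Pawlak model mentioned above, the sketch below computes the lower and upper approximations of a target set from the equivalence classes of an equivalence relation. The toy universe, the `key` function that induces the relation, and the helper names are hypothetical and chosen only for this example.

```python
from collections import defaultdict

def equivalence_classes(universe, key):
    """Partition `universe` into classes of objects that share key(x)."""
    classes = defaultdict(set)
    for x in universe:
        classes[key(x)].add(x)
    return list(classes.values())

def approximations(classes, target):
    """Return the (lower, upper) Pawlak approximations of `target`."""
    lower, upper = set(), set()
    for c in classes:
        if c <= target:      # class lies entirely inside the target
            lower |= c
        if c & target:       # class overlaps the target
            upper |= c
    return lower, upper

universe = range(1, 9)
# Hypothetical indiscernibility: objects 1-2, 3-4, 5-6, 7-8 look alike.
classes = equivalence_classes(universe, key=lambda x: (x - 1) // 2)
target = {1, 2, 3, 7}
lower, upper = approximations(classes, target)
print(sorted(lower))  # [1, 2]
print(sorted(upper))  # [1, 2, 3, 4, 7, 8]
```

Objects in the lower approximation certainly belong to the target; objects in the upper approximation but not the lower one only possibly belong to it.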
soft computing | 1998
Yiyu Yao; Pawan Lingras
This paper reviews and examines interpretations of belief functions in the theory of rough sets with a finite universe. The concept of standard rough set algebras is generalized in two directions. One is based on the use of nonequivalence relations. The other is based on relations over two universes, which leads to the notion of interval algebras. Pawlak rough set algebras may be used to interpret belief functions whose focal elements form a partition of the universe. Generalized rough set algebras using nonequivalence relations may be used to interpret belief functions that have fewer than |U| focal elements, where |U| is the cardinality of the universe U on which belief functions are defined. Interval algebras may be used to interpret any belief function.
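For the partition case, a standard way to make the connection concrete is to give each block E of the partition the mass |E| / |U|; belief and plausibility of a set A then equal the relative sizes of its lower and upper approximations. The sketch below assumes that mass assignment, and the function name and toy partition are hypothetical.

```python
from fractions import Fraction

def belief_and_plausibility(partition, target):
    """Bel and Pl of `target` when each block E carries mass |E| / |U|."""
    n = sum(len(block) for block in partition)
    # Belief sums the mass of blocks contained in the target
    # (the lower approximation).
    bel = sum(Fraction(len(b), n) for b in partition if b <= target)
    # Plausibility sums the mass of blocks that intersect the target
    # (the upper approximation).
    pl = sum(Fraction(len(b), n) for b in partition if b & target)
    return bel, pl

partition = [{1, 2}, {3, 4}, {5, 6}, {7, 8}]
bel, pl = belief_and_plausibility(partition, {1, 2, 3, 7})
print(bel, pl)  # 1/4 3/4
```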
Information Sciences | 2007
Pawan Lingras; Cory J. Butz
Support vector machines (SVMs) are essentially binary classifiers. To improve their applicability, several methods have been suggested for extending SVMs for multi-classification, including one-versus-one (1-v-1), one-versus-rest (1-v-r) and DAGSVM. In this paper, we first describe how binary classification with SVMs can be interpreted using rough sets. A rough set approach to SVM classification removes the necessity of exact classification and is especially useful when dealing with noisy data. Next, by utilizing the boundary region in rough sets, we suggest two new approaches, extensions of 1-v-r and 1-v-1, to SVM multi-classification that allow for an error rate. We explicitly demonstrate how our extended 1-v-r may shorten the training time of the conventional 1-v-r approach. In addition, we show that our 1-v-1 approach may have reduced storage requirements compared to the conventional 1-v-1 and DAGSVM techniques. Our techniques also provide better semantic interpretations of the classification process. The theoretical conclusions are supported by experimental findings involving a synthetic dataset.
IEEE Transactions on Knowledge and Data Engineering | 2009
Pawan Lingras; Min Chen; Duoqian Miao
Quality of clustering is an important issue in the application of clustering techniques. Most traditional cluster validity indices are geometry-based cluster quality measures. This paper proposes a cluster validity index based on the decision-theoretic rough set model by considering various loss functions. Experiments with synthetic, standard, and real-world retail data show the usefulness of the proposed validity index for the evaluation of rough and crisp clustering. The measure is shown to help determine the optimal number of clusters, as well as an important parameter called the threshold in rough clustering. The experiments with a promotional campaign for the retail data illustrate the ability of the proposed measure to incorporate financial considerations in evaluating the quality of a clustering scheme. This ability to deal with monetary values distinguishes the proposed decision-theoretic measure from other distance-based measures. The proposed validity index can also be extended for evaluating other clustering algorithms such as fuzzy clustering.
intelligent information systems | 2001
Pawan Lingras
The rough set is a useful notion for the classification of objects when the available information is not adequate to represent classes using precise sets. Rough sets have been successfully used in information systems for learning rules from an expert. This paper describes how genetic algorithms can be used to develop rough sets. The proposed rough set theoretic genetic encoding will be especially useful in unsupervised learning. A rough set genome consists of upper and lower bounds for sets in a partition. The partition may be as simple as the conventional expert class and its complement or a more general classification scheme. The paper provides a complete description of the design and implementation of rough set genomes. The proposed design and implementation are used to provide an unsupervised rough set classification of highway sections.
International Journal of Approximate Reasoning | 2013
Georg Peters; Fernando Crespo; Pawan Lingras; Richard Weber
Clustering is one of the most widely used approaches in data mining, with real-life applications in virtually any domain. The huge interest in clustering has led to a possibly three-digit number of algorithms, with the k-means family probably the most widely used group of methods. Besides classic bivalent approaches, clustering algorithms belonging to the domain of soft computing have been proposed and successfully applied in the past four decades. Bezdek's fuzzy c-means is a prominent example of such soft computing cluster algorithms, with many effective real-life applications. More recently, Lingras and West enriched this area by introducing rough k-means. In this article we compare k-means to fuzzy c-means and rough k-means as important representatives of soft clustering. On the basis of this comparison, we then survey important extensions and derivatives of these algorithms; our particular interest here is in hybrid clustering, merging fuzzy and rough concepts. We also give some examples where k-means, rough k-means, and fuzzy c-means have been used in studies.
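To make the contrast between bivalent and soft assignment concrete, the sketch below compares the hard k-means assignment of a single point with its fuzzy c-means membership degrees, using the usual membership formula with fuzzifier m. The function name, the toy centroids, and the choice m = 2 are assumptions for illustration only.

```python
import math

def fcm_memberships(point, centroids, m=2.0):
    """Fuzzy c-means membership of `point` in each cluster (fuzzifier m > 1)."""
    dists = [math.dist(point, c) for c in centroids]
    if any(d == 0.0 for d in dists):
        # The point coincides with a centroid: give it full membership there.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists]

centroids = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)

# k-means: the point belongs to exactly one cluster, the nearest one.
hard = min(range(len(centroids)),
           key=lambda j: math.dist(point, centroids[j]))

# Fuzzy c-means: the point belongs to every cluster to some degree.
u = fcm_memberships(point, centroids)
```

Here k-means assigns the point to cluster 0 outright, while fuzzy c-means gives it memberships of roughly 0.9 and 0.1 that sum to one.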
Information Sciences | 1998
Pawan Lingras
Conventional neural network architectures generally lack semantics. Both rough and neofuzzy neurons introduce semantic structures in the conventional neural network models. Rough neurons make it possible to process data points with a range of values instead of a single precise value. Neofuzzy neurons make it possible to convert crisp values into fuzzy values. This paper compares rough and neofuzzy neural networks. Rough and neofuzzy neurons are demonstrated to be complementary to each other. It is shown that the introduction of rough and fuzzy semantic structures in neural networks can increase the accuracy of predictions.
ieee international conference on fuzzy systems | 2002
Pawan Lingras
Similar to traditional data mining, three important Web mining operations include clustering, association, and sequential analysis. Typical clustering operations in Web mining involve finding natural groupings of Web resources or Web users. Researchers have pointed out some important differences between clustering in conventional applications and clustering in Web mining. For example, the clusters and associations in Web mining do not necessarily have crisp boundaries. Moreover, due to a variety of reasons inherent in Web browsing and Web logging, the likelihood of bad or incomplete data is higher. As a result, researchers have studied the possibility of using fuzzy sets in Web mining clustering applications. The paper describes how rough set theory can also be used to develop clustering schemes for Web mining. The unsupervised classification described in the paper uses properties of rough sets along with genetic algorithms to represent clusters as interval sets. The paper also describes the design of an experiment including data collection and the clustering process. The experiment is used to create interval set representations of groups of Web visitors.
European Journal of Operational Research | 2003
Cedric Davies; Pawan Lingras
This paper considers the problem of finding the shortest path in a dynamic network, where the weights change as yet-to-be-known functions of time. Routing decisions are based on constantly changing predictions of the weights. The problem has some useful applications in computer and highway networks. The Genetic Algorithm (GA) based strategy presented in this paper adapts to the changing network information by rerouting during the course of its execution. The paper describes the implementation of the algorithm and the results of experiments. A brief discussion of potential applications is also provided.