Dan Li

University of Nebraska–Lincoln

Publication

Featured research published by Dan Li.

Lecture Notes in Computer Science | 2004

Towards Missing Data Imputation: A Study of Fuzzy K-means Clustering Method

Dan Li; Jitender S. Deogun; William D. Spaulding; Bill Shuart

In this paper, we present a missing data imputation method based on clustering, one of the most popular techniques in Knowledge Discovery in Databases (KDD). We combine the clustering method with soft computing, which tends to be more tolerant of imprecision and uncertainty, and apply a fuzzy clustering algorithm to deal with incomplete data. Our experiments show that the fuzzy imputation algorithm performs better than the basic clustering algorithm.
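
A minimal sketch of fuzzy K-means imputation, assuming a common formulation rather than the authors' exact algorithm: distances use only the observed attributes of each record, centroids are updated per attribute from the records where that attribute is present, and each missing value is filled with the membership-weighted average of the centroids. The function name `fuzzy_kmeans_impute`, the fuzzifier `m=2.0`, the farthest-first initialization, and the sample data are all hypothetical.

```python
import math


def fuzzy_kmeans_impute(data, k, m=2.0, iters=50):
    """Fill None entries in `data` (a list of equal-length lists of floats).

    Hypothetical sketch: assumes at least k complete rows and at least one
    observed value in every row.
    """
    n, p = len(data), len(data[0])
    complete = [row for row in data if None not in row]
    # Farthest-first initialization from complete rows (an assumed choice).
    centroids = [list(complete[0])]
    while len(centroids) < k:
        far = max(complete, key=lambda r: min(math.dist(r, c) for c in centroids))
        centroids.append(list(far))
    u = [[0.0] * n for _ in range(k)]

    def dist(row, c):
        # Partial Euclidean distance over observed attributes, rescaled to p.
        sq = [(x - cj) ** 2 for x, cj in zip(row, c) if x is not None]
        return math.sqrt(sum(sq) * p / len(sq))

    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1)).
        for idx, row in enumerate(data):
            d = [dist(row, c) for c in centroids]
            if 0.0 in d:
                hit = d.index(0.0)
                for i in range(k):
                    u[i][idx] = 1.0 if i == hit else 0.0
                continue
            for i in range(k):
                u[i][idx] = 1.0 / sum((d[i] / d[j]) ** (2 / (m - 1)) for j in range(k))
        # Centroid update per attribute, using only rows where it is observed.
        for i in range(k):
            for j in range(p):
                pairs = [(u[i][idx] ** m, row[j]) for idx, row in enumerate(data)
                         if row[j] is not None]
                total = sum(w for w, _ in pairs)
                if total > 0:
                    centroids[i][j] = sum(w * x for w, x in pairs) / total

    # Imputation: memberships sum to 1, so this is a weighted centroid average.
    return [[x if x is not None else sum(u[i][idx] * centroids[i][j] for i in range(k))
             for j, x in enumerate(row)]
            for idx, row in enumerate(data)]


data = [[1.0, 2.0], [1.2, 1.9], [8.0, 9.0], [8.2, None], [None, 2.1]]
print(fuzzy_kmeans_impute(data, k=2))
```

Here the record `[8.2, None]` is filled with a value close to 9, because nearly all of its membership goes to the cluster around `[8.1, 9.0]`, while `[None, 2.1]` receives a value close to 1.1. A crisp K-means imputer would instead copy a single centroid's value.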

international symposium on methodologies for intelligent systems | 2005

Discovering partial periodic sequential association rules with time lag in multiple sequences for prediction

Dan Li; Jitender S. Deogun

A periodic pattern indicates something persistent and predictable, so it is important to identify and characterize periodicity. This paper presents an approach for mining partial periodic association rules in temporal databases. The approach allows the discovery of periodic episodes in which the events are not restricted to a fixed order. Moreover, it treats the antecedent and consequent of a rule separately and allows a time lag between them, so the discovered rules are useful for prediction in many applications. The approach is implemented with two algorithms based on two data structures: an event-based linked list and a window-based linked list.
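
To make the rule form concrete, the sketch below computes support and confidence for one rule with an unordered antecedent and a time-lagged consequent, counted once per period so that a rule can hold in some periods but not others (partial periodicity). It is a brute-force scan for illustration only, not the event-based or window-based linked-list algorithms from the paper; the function name, parameters, and event labels are hypothetical.

```python
from collections import defaultdict


def lagged_rule_stats(events, antecedent, consequent, period, window, lag):
    """Support and confidence of `antecedent => consequent` with a time lag.

    events: list of (time, label) pairs with non-negative integer times.
    For each period start s = 0, period, 2 * period, ... the antecedent matches
    if every label in `antecedent` occurs in [s, s + window), in any order; the
    consequent matches if it occurs in [s + lag, s + lag + window).
    """
    by_time = defaultdict(set)
    for t, label in events:
        by_time[t].add(label)
    horizon = max(t for t, _ in events) + 1

    def labels_in(start, end):
        seen = set()
        for t in range(start, end):
            seen |= by_time.get(t, set())
        return seen

    periods = ante_hits = rule_hits = 0
    for s in range(0, horizon, period):
        periods += 1
        if set(antecedent) <= labels_in(s, s + window):
            ante_hits += 1
            if consequent in labels_in(s + lag, s + lag + window):
                rule_hits += 1
    support = rule_hits / periods
    confidence = rule_hits / ante_hits if ante_hits else 0.0
    return support, confidence


events = [(0, "cold"), (1, "rain"), (5, "flood"),
          (11, "rain"), (12, "cold"), (15, "flood"),
          (21, "cold"), (22, "rain")]
print(lagged_rule_stats(events, {"cold", "rain"}, "flood", period=10, window=3, lag=4))
```

The antecedent appears in all three periods (in either order), and the consequent follows four time units later in two of them, so both support and confidence come out to 2/3.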

Lecture Notes in Computer Science | 2005

Dealing with missing data: algorithms based on fuzzy set and rough set theories

Dan Li; Jitender S. Deogun; William D. Spaulding; Bill Shuart

Missing data, commonly encountered in many fields of study, introduce inaccuracy into analysis and evaluation. Previous methods for handling missing data (e.g., deleting cases with incomplete information, or substituting estimated mean scores for the missing values), though simple to implement, are problematic because they may result in biased data models. Fortunately, recent advances in theoretical and computational statistics have led to more flexible techniques for dealing with the missing data problem. In this paper, we present missing data imputation methods based on clustering, one of the most popular techniques in Knowledge Discovery in Databases (KDD). We combine clustering with soft computing, which tends to be more tolerant of imprecision and uncertainty, and apply fuzzy and rough clustering algorithms to deal with incomplete data. The experiments show that hybridizing fuzzy set and rough set theories in missing data imputation leads to the best performance among our four algorithms, namely the crisp K-means, fuzzy K-means, rough K-means, and rough-fuzzy K-means imputation algorithms.
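
The rough side of this hybrid can be sketched with a commonly used rough K-means formulation, assumed here rather than taken from the paper: a record joins the lower approximation of its nearest cluster unless another centroid is almost as close (distance ratio at most `epsilon`), in which case it joins the boundary of every such cluster; each centroid then weights lower-approximation members by `w_lower` and boundary members by `1 - w_lower`. The helper names, the parameter values, and the imputation rule in `rough_impute` (average the centroids of every cluster a partial record is roughly assigned to) are hypothetical.

```python
import math


def _mean(points, idxs):
    return [sum(points[i][j] for i in idxs) / len(idxs) for j in range(len(points[0]))]


def rough_kmeans(points, centroids, epsilon=1.2, w_lower=0.7, iters=20):
    """Rough K-means on complete records; returns (centroids, lower, boundary)."""
    k = len(centroids)
    centroids = [list(c) for c in centroids]
    for _ in range(iters):
        lower = [[] for _ in range(k)]
        boundary = [[] for _ in range(k)]
        for idx, x in enumerate(points):
            d = [math.dist(x, c) for c in centroids]
            near = min(range(k), key=d.__getitem__)
            # Clusters almost as close as the nearest one (ratio test).
            close = [j for j in range(k) if j != near and d[j] <= epsilon * d[near]]
            if close:
                for j in [near] + close:
                    boundary[j].append(idx)  # upper approximation only
            else:
                lower[near].append(idx)      # lower (and upper) approximation
        for i in range(k):
            if lower[i] and boundary[i]:
                lo, bd = _mean(points, lower[i]), _mean(points, boundary[i])
                centroids[i] = [w_lower * a + (1 - w_lower) * b for a, b in zip(lo, bd)]
            elif lower[i] or boundary[i]:
                centroids[i] = _mean(points, lower[i] or boundary[i])
    return centroids, lower, boundary


def rough_impute(row, centroids, epsilon=1.2):
    """Fill None entries from the centroid(s) the row is roughly assigned to."""
    obs = [j for j, x in enumerate(row) if x is not None]
    d = [math.sqrt(sum((row[j] - c[j]) ** 2 for j in obs)) for c in centroids]
    near = min(range(len(d)), key=d.__getitem__)
    members = [i for i in range(len(d)) if d[i] <= epsilon * d[near]]
    return [x if x is not None else sum(centroids[i][j] for i in members) / len(members)
            for j, x in enumerate(row)]


points = [[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.2, 7.9], [4.6, 4.5]]
cents, lower, boundary = rough_kmeans(points, [[1.0, 1.0], [8.0, 8.0]])
print(lower, boundary)  # [[0, 1], [2, 3]] [[4], [4]]
print(rough_impute([8.1, None], cents))
```

The point midway between the two groups lands in both boundaries instead of being forced into one cluster, which is the kind of uncertainty that a rough-fuzzy hybrid can additionally grade with fuzzy memberships.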

granular computing | 2007

Gene Function Classification Using Fuzzy K-Nearest Neighbor Approach

Dan Li; Jitender S. Deogun; Kefei Wang

Prediction of gene function is a classification problem. Given its simplicity and relatively high accuracy, K-Nearest Neighbor (KNN) classification has become a popular choice for many real-life applications. However, the traditional KNN approach has two drawbacks. First, it cannot identify classes that do not exist in the training data sets. Second, it treats all K neighbors alike, without considering the differences in distance between the test instance and its neighbors. In this paper, exploiting the potential of fuzzy set theory to handle uncertainty in data sets, we develop a fuzzy KNN approach for gene function classification. Experiments show that integrating fuzzy set theory into the original KNN approach improves the overall performance of the classification model.
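
The second drawback (all neighbors treated alike) is what distance-weighted fuzzy memberships address. The sketch below follows the classic fuzzy KNN of Keller et al. with crisp training labels, which may differ from the paper's variant; the `classify` reject threshold is a hypothetical illustration of how a low maximum membership could flag an ambiguous instance or one from a class missing from the training data. Function names and data are made up.

```python
import math
from collections import defaultdict


def fuzzy_knn(train, labels, x, k=3, m=2.0):
    """Class memberships of `x` from its k nearest neighbors (Keller-style).

    Crisp training labels are assumed; neighbor j gets weight
    1 / d(x, x_j) ** (2 / (m - 1)), so closer neighbors count more.
    """
    ranked = sorted(range(len(train)), key=lambda i: math.dist(x, train[i]))[:k]
    weights = {}
    for i in ranked:
        d = math.dist(x, train[i])
        if d == 0.0:
            return {labels[i]: 1.0}  # exact match: full membership
        weights[i] = 1.0 / d ** (2 / (m - 1))
    total = sum(weights.values())
    memberships = defaultdict(float)
    for i, w in weights.items():
        memberships[labels[i]] += w / total
    return dict(memberships)


def classify(train, labels, x, k=3, threshold=0.6):
    """Return the best class, or None when no class membership is strong enough."""
    u = fuzzy_knn(train, labels, x, k)
    best = max(u, key=u.get)
    # A low maximum membership may mean an ambiguous instance or one from a
    # class absent from the training data; this reject rule is hypothetical.
    return best if u[best] >= threshold else None


train = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
labels = ["A", "A", "B", "B"]
print(fuzzy_knn(train, labels, [0.2, 0.5]))      # mostly "A"
print(classify(train, labels, [2.5, 3.0], k=2))  # None: equidistant from A and B
```

Unlike plain majority voting, the far-away "B" neighbor in the first query contributes almost nothing to the result.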

granular computing | 2003

Interpolation techniques for geo-spatial association rule mining

Dan Li; Jitender S. Deogun; Sherri K. Harms

Association rule mining has become an important component of information processing systems because of the significant increase in its applications. In this paper, our main objective is to determine which interpolation approaches are best suited for discovering geo-spatial association rules at unsampled points. We investigate two interpolation approaches, which we call pre-interpolation and post-interpolation, and integrate them into our geo-spatial association rule mining algorithm.
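
The difference between the two approaches is where interpolation happens relative to mining. The sketch below uses inverse distance weighting (IDW) as a stand-in interpolator, an assumption since the abstract does not name the methods: pre-interpolation estimates the raw series at the unsampled site before mining, while post-interpolation mines each sampled site first and then interpolates a rule measure. The mining step itself is omitted, and all coordinates, series, and confidence values are made up.

```python
import math


def idw(known, target, power=2.0):
    """Inverse distance weighting; `known` is a list of ((x, y), value) pairs."""
    num = den = 0.0
    for (px, py), value in known:
        d = math.hypot(px - target[0], py - target[1])
        if d == 0.0:
            return value  # target coincides with a sampled site
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Hypothetical data: two sampled stations, one unsampled target site.
target = (1.0, 0.0)
raw = {(0.0, 0.0): [10.0, 2.0, 8.0], (4.0, 0.0): [12.0, 1.0, 9.0]}

# Pre-interpolation: estimate the raw series at the target, then mine it.
series = [idw([(site, vals[t]) for site, vals in raw.items()], target) for t in range(3)]

# Post-interpolation: mine each station first, then interpolate a rule measure
# (here, a made-up confidence for the same rule at each station).
confidence = {(0.0, 0.0): 0.8, (4.0, 0.0): 0.5}
conf_at_target = idw(list(confidence.items()), target)
print(series, conf_at_target)
```

Because the target is three times closer to the first station, that station gets nine times the weight, so the estimates stay close to its values (for example, a confidence of 0.77).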

rough sets and knowledge technology | 2006

FADS: a fuzzy anomaly detection system

Dan Li; Kefei Wang; Jitender S. Deogun

In this paper, we propose a novel anomaly detection framework that integrates soft computing techniques to eliminate the sharp boundary between normal and anomalous behavior. The proposed method also improves the data pre-processing step by identifying important features for intrusion detection. Furthermore, we develop a learning algorithm that finds classifiers for imbalanced training data, avoiding assumptions made by most learning algorithms that are not necessarily sound. Preliminary experimental results indicate that our approach is very effective in anomaly detection.
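
One way to picture "eliminating the sharp boundary" is to replace a crisp threshold with a fuzzy membership function. The sketch below is a generic illustration, not the FADS framework: each feature gets a linear ramp between two breakpoints, and the record's degree of anomaly is the fuzzy OR (maximum) of the per-feature memberships. The feature names and breakpoints are hypothetical.

```python
def ramp(x, low, high):
    """Membership in 'suspicious': 0 at or below low, 1 at or above high, linear between."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def anomaly_degree(record, rules):
    """Degree of anomaly as the fuzzy OR (max) of per-feature memberships."""
    return max(ramp(record[f], low, high) for f, (low, high) in rules.items())


# Hypothetical features and breakpoints, chosen only for illustration.
rules = {"failed_logins": (2, 6), "bytes_out_mb": (50, 200)}
record = {"failed_logins": 4, "bytes_out_mb": 20}
degree = anomaly_degree(record, rules)
print(degree, 1.0 - degree)  # degree of "anomalous" and of "normal"
```

A crisp rule such as "anomalous if failed_logins >= 5" would label this record simply normal; the fuzzy version reports it as anomalous to degree 0.5, so borderline behavior is not forced onto one side.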

intelligence and security informatics | 2006

A fuzzy anomaly detection system

Dan Li; Kefei Wang; Jitender S. Deogun

Due to the increasing number of cyber attack incidents and heightened concern about cyber terrorism, implementing effective intrusion detection systems (IDSs) is an essential task for protecting cyber security. Intrusion detection is the process of monitoring and analyzing the events occurring in a computer system in order to detect signs of security problems [1]. Even though the intrusion detection problem has been studied intensively [2], current intrusion detection techniques still have limitations in three respects: (1) It is very common to focus on the data mining step while largely ignoring the other Knowledge Discovery in Databases (KDD) steps [4]. (2) Many intrusion detection systems assume a sharp boundary between normal and anomalous behavior, which causes an abrupt separation between normality and anomaly. (3) Many intrusion detection systems are built on strong assumptions about the input data set that limit their practical applicability. Considering all of these limitations, in this paper we propose a novel anomaly detection framework that has several desirable features.

international symposium on methodologies for intelligent systems | 2003

Spatio-Temporal Association Mining for Un-sampled Sites

Dan Li; Jitender S. Deogun

In this paper, we investigate interpolation methods that are suitable for discovering spatio-temporal association rules for unsampled points, with an initial focus on drought risk management. For drought risk management, raw weather data are collected, converted to various indices, and then mined for association rules. To generate association rules for unsampled sites, interpolation can be applied at any stage of this data mining process. We develop three interpolation models and integrate them into our association rule mining algorithm. The performance of these three models is evaluated experimentally by comparing the interpolated association rules with the rules discovered from actual raw data.
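
Comparing interpolated rules with rules mined from actual raw data can be framed as a set comparison. The sketch below assumes precision and recall as the measures (the abstract does not say which were used) and represents each rule as an (antecedent, consequent) pair; the index labels and rules are made up.

```python
def compare_rule_sets(interpolated, actual):
    """Precision and recall of interpolated rules against rules mined from raw data.

    Each rule is a hashable (antecedent, consequent) pair with a frozenset antecedent.
    """
    interpolated, actual = set(interpolated), set(actual)
    hits = len(interpolated & actual)
    precision = hits / len(interpolated) if interpolated else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical rules over made-up index labels.
actual = {
    (frozenset({"idx1=dry"}), "crop=poor"),
    (frozenset({"idx2=dry"}), "crop=poor"),
    (frozenset({"idx1=wet"}), "crop=good"),
}
interpolated = {
    (frozenset({"idx1=dry"}), "crop=poor"),
    (frozenset({"idx1=wet"}), "crop=good"),
    (frozenset({"idx2=wet"}), "crop=good"),
    (frozenset({"idx1=dry", "idx2=dry"}), "crop=poor"),
}
print(compare_rule_sets(interpolated, actual))
```

Two of the four interpolated rules also appear among the three actual rules, giving a precision of 0.5 and a recall of 2/3. Using frozensets makes antecedents order-independent, so `{"idx1=dry", "idx2=dry"}` matches regardless of how the items were listed.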

Archive | 2009

Applications of Fuzzy and Rough Set Theory in Data Mining

Dan Li; Jitender S. Deogun

The explosion of very large databases has created extraordinary opportunities for monitoring, analyzing, and predicting global economic, geographical, demographic, medical, political, and other processes in the world. Statistical analysis and data mining techniques have emerged for these purposes. Data mining is the process of discovering previously unknown but potentially useful patterns, rules, or associations from huge quantities of data. Data mining can be performed on different data repositories such as relational databases, data warehouses, transactional databases, sequence databases, spatial databases, spatio-temporal databases, and text databases. Typically, data mining functionalities can be classified into two categories: descriptive and predictive. Descriptive mining tasks aim at characterizing the general properties of the data in the databases, while predictive mining tasks perform inference on the current data in order to make predictions.

digital government research | 2006

Periodic association mining in a geospatial decision support system

Dan Li; Jitender S. Deogun

This paper presents an approach for mining partial periodic association rules in temporal databases. The approach allows the discovery of periodic episodes in which the events are not restricted to a fixed order. Moreover, it treats the antecedent and consequent of a rule separately and allows a time lag between them. As a result, the discovered rules are useful for prediction in many applications.
