Juzhen Dong | Researchain

Publication

Featured research published by Juzhen Dong.

soft computing | 1999

Using Rough Sets with Heuristics for Feature Selection

Juzhen Dong; Ning Zhong; Setsuo Ohsuga

Practical machine learning algorithms are known to degrade in performance when faced with many features that are not necessary for rule discovery. To cope with this problem, many methods have been proposed for selecting a subset of features with similar-enough behaviors to merit focused analysis. Two typical ones are the filter approach, which selects a feature subset in a preprocessing step, and the wrapper approach, which selects an optimal feature subset from the space of possible subsets, using the induction algorithm itself as part of the evaluation function. The filter approach is faster, but it is somewhat blind: it does not consider the performance of induction. The wrapper approach can find optimal feature subsets, but it is hard to use because of its time and space complexity. In this paper, we propose an algorithm that uses the rough set methodology with greedy heuristics for feature selection. Our approach selects features much as the filter approach does, but the evaluation criterion takes the performance of induction into account. That is, we select the features that damage the performance of induction as little as possible.
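The abstract does not spell out the heuristic, so the following is only a minimal sketch of one common way to combine rough sets with greedy search, not the authors' algorithm. It assumes the usual rough-set dependency degree (the fraction of rows whose equivalence class under the chosen attributes is label-consistent) as the evaluation criterion. The names `partition`, `dependency`, and `greedy_select`, and the toy data, are hypothetical.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Fraction of rows lying in blocks whose rows all share one label."""
    if not rows:
        return 0.0
    consistent = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({labels[i] for i in block}) == 1
    )
    return consistent / len(rows)


def greedy_select(rows, labels, all_attrs):
    """Add the attribute with the best dependency until the full set is matched."""
    selected = []
    target = dependency(rows, labels, list(all_attrs))
    current = dependency(rows, labels, selected)
    while current < target:
        remaining = [a for a in all_attrs if a not in selected]
        best = max(remaining, key=lambda a: dependency(rows, labels, selected + [a]))
        selected.append(best)
        current = dependency(rows, labels, selected)
    return selected


# Toy data (made up for illustration): the label simply copies attribute "b".
rows = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 1, "c": 0},
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 1},
]
labels = [0, 1, 0, 1]
print(greedy_select(rows, labels, ["a", "b", "c"]))  # prints ['b']
```

Like a filter, this search never runs the induction algorithm. The dependency degree stands in for "damage to induction": a subset is accepted once it classifies as many rows consistently as the full attribute set does.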

Archive | 1998

Data Mining: A Probabilistic Rough Set Approach

Ning Zhong; Juzhen Dong; Setsuo Ohsuga

This paper introduces a new approach for mining if-then rules in databases with uncertainty and incompleteness. The approach is based on a combination of the Generalization Distribution Table (GDT) and the rough set methodology. A GDT is a table that represents the probabilistic relationships between concepts and instances over discrete domains. By using a GDT as a hypothesis search space and combining it with the rough set methodology, noise and unseen instances can be handled, biases can be flexibly selected, background knowledge can be used to constrain rule generation, and if-then rules with strengths can be effectively acquired from large, complex databases in an incremental, bottom-up mode. In this paper, we focus on the basic concepts and an implementation of our methodology.

Neurocomputing | 2001

Rule discovery by soft induction techniques

Ning Zhong; Juzhen Dong; Setsuo Ohsuga

The paper describes two soft induction techniques, GDT-NR and GDT-RS, for discovering classification rules from databases with uncertainty and incompleteness. The techniques are based on a generalization distribution table (GDT), in which the probabilistic relationships between concepts and instances over discrete domains are represented. By using the GDT as a probabilistic search space, (1) unseen instances can be considered in the rule discovery process and the uncertainty of a rule, including its ability to predict unseen instances, can be explicitly represented in the strength of the rule; (2) biases can be flexibly selected for search control and background knowledge can be used as a bias to control the creation of a GDT and the rule discovery process. We describe how a GDT can be represented by a variant of connectionist networks (GDT-NR for short) and how rules can be discovered by learning on the GDT-NR. Furthermore, we combine the GDT with the rough set methodology (GDT-RS for short). By using GDT-RS, a minimal set of rules with larger strengths can be acquired from databases with noisy, incomplete data. We compare GDT-NR with GDT-RS and argue that GDT-RS is better suited than GDT-NR to large, complex databases.

Lecture Notes in Computer Science | 1998

Frameworks for Mining Binary Relations in Data

Tsau Young Lin; Ning Zhong; Juzhen Dong; Setsuo Ohsuga

This paper extends the notions of information tables and concept hierarchies from equivalence relations to binary relations. With these extensions, rough set theory and attribute-oriented generalization techniques can be used to mine binary relations in data.

international symposium on methodologies for intelligent systems | 1999

Probabilistic Rough Induction: The GDT-RS Methodology and Algorithms

Juzhen Dong; Ning Zhong; Setsuo Ohsuga

In this paper, we introduce a probabilistic rough induction methodology and discuss two algorithms for its implementation. The methodology is based on a combination of the Generalization Distribution Table (GDT) and rough set theory (GDT-RS for short). A GDT is a table that represents the probabilistic relationships between concepts and instances over discrete domains. The GDT provides a probabilistic basis for evaluating the strength of a rule. Rough set theory is used to find minimal relative reducts from the set of rules with larger strengths. The main features of GDT-RS are that (1) biases can be selected flexibly for search control, and background knowledge can be used as a bias to control the creation of a GDT and the rule induction process; and (2) the uncertainty of a rule, including its prediction of possible instances, can be represented explicitly in the strength of the rule.
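The abstract gives no formulas, so the sketch below is only an illustration of what a table of "probabilistic relationships between concepts and instances" might look like. It makes two loud assumptions that are not stated in the text: a generalization is an attribute tuple with some values replaced by a wildcard, and, with no background knowledge, every instance covered by a generalization is equally likely. The names `build_gdt`, `covers`, `strength`, and the toy domains are hypothetical.

```python
from itertools import product

WILDCARD = "*"


def covers(generalization, instance):
    """A generalization covers an instance if every non-wildcard value matches."""
    return all(g == WILDCARD or g == v for g, v in zip(generalization, instance))


def build_gdt(domains):
    """Map each generalization to {instance: probability} (assumed uniform prior)."""
    instances = list(product(*domains))
    candidates = product(*[list(d) + [WILDCARD] for d in domains])
    table = {}
    for g in candidates:
        if WILDCARD not in g:
            continue  # a tuple without wildcards is an instance, not a generalization
        covered = [x for x in instances if covers(g, x)]
        table[g] = {x: 1.0 / len(covered) for x in covered}
    return table


def strength(table, generalization, observed):
    """Assumed strength: probability mass of the covered instances actually observed."""
    return sum(p for x, p in table[generalization].items() if x in observed)


# Toy discrete domains and observations (made up for illustration).
domains = [("a0", "a1"), ("b0", "b1", "b2")]
gdt = build_gdt(domains)
observed = {("a0", "b0"), ("a0", "b1")}
print(round(strength(gdt, ("a0", WILDCARD), observed), 3))  # prints 0.667
```

Under these assumptions, a generalization scores below 1 whenever some instance it covers has not been seen. That is one way to read the claim that a rule's strength can carry its uncertainty about unseen instances.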

Pattern Recognition Letters | 2003

Meningitis data mining by cooperatively using GDT-RS and RSBR

Ning Zhong; Juzhen Dong; Setsuo Ohsuga

This paper describes an application of two rough-set-based systems, generalized distribution table and rough set (GDT-RS) and rough sets with Boolean reasoning (RSBR), to mining if-then rules in a meningitis dataset. GDT-RS is a soft hybrid induction system, and RSBR is used to discretize real-valued attributes as a pre-processing step before GDT-RS starts. We argue that discretization of continuous-valued attributes is an important pre-processing step in the rule discovery process. We illustrate that the quality of the rules discovered by GDT-RS is strongly affected by the result of discretization.

Knowledge Based Systems | 2001

A hybrid model for rule discovery in data

Ning Zhong; Juzhen Dong; Chunnian Liu; Setsuo Ohsuga

This paper presents a hybrid model for rule discovery in real-world data with uncertainty and incompleteness. The hybrid model is created by introducing an appropriate relationship between deductive reasoning and stochastic processes, and extending the relationship so as to include abduction. Furthermore, a generalization distribution table (GDT), which is a variant of the transition matrix in a stochastic process, is defined. Thus, the typical methods of symbolic reasoning such as deduction, induction, and abduction, as well as methods based on soft computing techniques such as rough sets, fuzzy sets, and granular computing, can be used cooperatively by taking the GDT and/or the transition matrix as media. Ways of implementing the hybrid model are also discussed.

Lecture Notes in Computer Science | 2002

Gastric Cancer Data Mining with Ordered Information

Ning Zhong; Juzhen Dong; Yiyu Yao; Setsuo Ohsuga

Ordered information is a useful kind of background knowledge that can guide a discovery process toward finding different types of novel rules and improving their quality in many real-world data mining tasks. In this paper, we investigate ways of using ordered information for gastric cancer data mining, based on rough set theory and granular computing. With respect to the notion of ordered information tables, we describe how to mine ordering rules and how to form granules of attribute values in a pre/post-processing step to improve the quality of the mined classification rules. Experimental results in gastric cancer data mining show the usefulness and effectiveness of our approaches.

knowledge discovery and data mining | 1998

Data Mining Based on the Generalization Distribution Table and Rough Sets

Ning Zhong; Juzhen Dong; Setsuo Ohsuga

This paper introduces a new approach for mining if-then rules in databases with uncertainty and incompleteness. This approach is based on a combination of the Generalization Distribution Table (GDT) and the rough set methodology. The GDT provides a probabilistic basis for evaluating the strength of a rule. It is used to find the rules with larger strengths among the possible rules. Furthermore, the rough set methodology is used to find minimal relative reducts from the set of rules with larger strengths. The strength of a rule represents the uncertainty of the rule, which is influenced by both unseen instances and noise. By using our approach, a minimal set of rules with larger strengths can be acquired from databases with noisy, incomplete data. We have applied this approach to discover rules from some real databases.

european conference on principles of data mining and knowledge discovery | 2000

Using Background Knowledge as a Bias to Control the Rule Discovery Process

Ning Zhong; Juzhen Dong; Setsuo Ohsuga

This paper investigates a way of using background knowledge in the rule discovery process. The technique is based on the Generalization Distribution Table (GDT for short), in which the probabilistic relationships between concepts and instances over discrete domains are represented. We describe how to use background knowledge as a bias to adjust the prior distribution so that better knowledge can be discovered.