Christoph F. Eick | Researchain

Publication

Featured research published by Christoph F. Eick.

Evolutionary Programming | 1997

Supporting Polyploidy in Genetic Algorithms Using Dominance Vectors

Ben S. Hadad; Christoph F. Eick

By memorizing alleles that have been successful in the past, polyploidy has been found to be beneficial for adapting to changing environments. This paper explores the benefits of using polyploidy in Genetic Algorithms. Polyploidy is provided in our approach by using a local chromosome to reflect dominance in diploid and tetraploid organisms, with and without evolving crossover points, added to provide linkage between chromosomes and the dominance control vector. We compare our polyploid approach to a haploid implementation for a benchmark that involves a 0/1 knapsack problem with time varying weight constraints.

advances in geographic information systems | 2008

Finding regional co-location patterns for sets of continuous variables in spatial datasets

Christoph F. Eick; Rachana Parmar; Wei Ding; Tomasz F. Stepinski; Jean-Philippe Nicot

This paper proposes a novel framework for mining regional co-location patterns with respect to sets of continuous variables in spatial datasets. The goal is to identify regions in which multiple continuous variables with values from the wings of their statistical distribution are co-located. A co-location mining framework is introduced that operates in the continuous domain and views regional co-location mining as a clustering problem in which an externally given fitness function has to be maximized. Interestingness of co-location patterns is assessed using products of z-scores of the relevant continuous variables. The proposed framework is evaluated by a domain expert in a case study that analyzes arsenic contamination in Texas water wells, centering on regional co-location patterns. Our approach is able to identify known and unknown regional co-location patterns, and different sets of algorithm parameters lead to the characterization of arsenic distribution at different scales. Moreover, inconsistent co-location sets are found for regions in South Texas and West Texas that can be clearly attributed to geological differences in the two regions, emphasizing the need for regional co-location mining techniques. In addition, a novel, prototype-based region discovery algorithm named CLEVER is introduced that uses randomized hill climbing and searches a variable number of clusters and larger neighborhood sizes.
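The z-score product idea mentioned in the abstract can be illustrated with a small Python sketch. This is not the paper's interestingness function: averaging the per-object products over a region is an assumption made here for illustration, and the function names and the toy measurements are hypothetical.

```python
import statistics

def z_scores(values):
    """Standardize one continuous variable over the whole dataset."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def region_interestingness(region, z_by_variable):
    """Mean product of z-scores over the objects in a region.

    Assumption: the region score is a plain average of per-object
    products; the paper's actual fitness function may differ.
    """
    products = []
    for i in region:
        prod = 1.0
        for z in z_by_variable:
            prod *= z[i]
        products.append(prod)
    return statistics.fmean(products)

# Made-up measurements for two hypothetical variables at five locations.
var_a = [1.0, 2.0, 3.0, 10.0, 12.0]
var_b = [0.5, 0.6, 0.4, 2.0, 2.2]
z = [z_scores(var_a), z_scores(var_b)]

high_region = region_interestingness([3, 4], z)    # both variables high
low_region = region_interestingness([0, 1, 2], z)  # both variables low
```

Both regions score positively because values from either wing of the distribution multiply to a positive product when they lie on the same side of the mean; the region with the more extreme values scores higher.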

international conference on data mining | 2004

Using representative-based clustering for nearest neighbor dataset editing

Christoph F. Eick; Nidal M. Zeidat; Ricardo Vilalta

The goal of dataset editing in instance-based learning is to remove objects from a training set in order to increase the accuracy of a classifier. For example, Wilson editing removes training examples that are misclassified by a nearest neighbor classifier so as to smooth the shape of the resulting decision boundaries. This paper revolves around the use of representative-based clustering algorithms for nearest neighbor dataset editing. We term this approach supervised clustering editing. The main idea is to replace a dataset by a set of cluster prototypes. A clustering approach called supervised clustering is introduced for this purpose. Our empirical evaluation using eight UCI datasets shows that both Wilson and supervised clustering editing improve accuracy on more than 50% of the datasets tested. However, supervised clustering editing achieves four times higher compression rates than Wilson editing.
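Wilson editing as described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the leave-one-out evaluation, the default `k=3`, the Euclidean distance, and the toy data are assumptions made here.

```python
from collections import Counter
import math

def wilson_edit(points, labels, k=3):
    """Return indices of the examples kept after Wilson editing.

    Each example is classified by a k-NN vote among all *other*
    examples and is dropped if the vote disagrees with its label.
    """
    kept = []
    for i, p in enumerate(points):
        # Distances from example i to every other example, nearest first.
        others = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        votes = Counter(labels[j] for _, j in others[:k])
        predicted = votes.most_common(1)[0][0]
        if predicted == labels[i]:
            kept.append(i)
    return kept

# Toy data: example 3 carries label "b" but sits inside the "a" cluster.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.05, 0.05),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = ["a", "a", "a", "b", "b", "b", "b"]
kept = wilson_edit(points, labels, k=3)
```

Here only the mislabeled-looking example is removed, which is the smoothing effect on decision boundaries that the abstract refers to. Supervised clustering editing, by contrast, would replace the whole dataset with cluster prototypes and is not shown here.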

european conference on principles of data mining and knowledge discovery | 2006

Discovery of interesting regions in spatial data sets using supervised clustering

Christoph F. Eick; Banafsheh Vaezian; Dan Jiang; Jing Wang

The discovery of interesting regions in spatial datasets is an important data mining task. In particular, we are interested in identifying disjoint, contiguous regions that are unusual with respect to the distribution of a given class; i.e. a region that contains an unusually low or high number of instances of a particular class. This paper centers on the discussion of techniques, methodologies, and algorithms to discover such regions. A measure of interestingness and a supervised clustering framework are introduced for this purpose. Moreover, three supervised clustering algorithms are proposed in the paper: an agglomerative hierarchical supervised clustering named SCAH, an agglomerative, grid-based clustering method named SCHG, and lastly an algorithm named SCMRG which searches a multi-resolution grid structure top down for interesting regions. Finally, experimental results of applying the proposed framework and algorithms to the problem of identifying hotspots in spatial datasets are discussed.

international conference on management of data | 1985

Acquisition of terminological knowledge using database design techniques

Christoph F. Eick; Peter C. Lockemann

One of the most difficult problems in knowledge base design is the acquisition and formalization of an expert's rules concerning a special universe of discourse. In most cases different experts and the knowledge base designer himself will use different terminologies, and will represent rules concerning the same objects in a different way. Therefore, one of the first steps in knowledge base design has to be the construction of an integrated, commonly accepted terminology that can be shared by all persons involved in the design process. This design step will be the topic of the paper. The paper proposes concepts, methods and tools to support the extraction, integration, transformation and evaluation of terminological knowledge that are based on database design techniques and discusses the possibilities and limitations of automating these.

Keywords and phrases: knowledge based systems, knowledge base design, database design, conceptual modelling, semantic modelling, terminological knowledge acquisition, knowledge integration, design automation

data warehousing and knowledge discovery | 2007

MOSAIC: a proximity graph approach for agglomerative clustering

Jiyeon Choo; Rachsuda Jiamthapthaksin; Chun-Sheng Chen; Oner Ulvi Celepcikay; Christian Giusti; Christoph F. Eick

Representative-based clustering algorithms are quite popular due to their relatively high speed and their sound theoretical foundation. On the other hand, the clusters they can obtain are limited to convex shapes, and clustering results are highly sensitive to initialization. In this paper, a novel agglomerative clustering algorithm called MOSAIC is proposed which greedily merges neighboring clusters, maximizing a given fitness function. MOSAIC uses Gabriel graphs to determine which clusters are neighboring and approximates non-convex shapes as the unions of small clusters that have been computed using a representative-based clustering algorithm. The experimental results show that this technique leads to clusters of higher quality compared to running a representative-based clustering algorithm standalone. Given a suitable fitness function, MOSAIC is able to detect clusters of arbitrary shape. In addition, MOSAIC is capable of dealing with high-dimensional data.
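The neighborhood test behind a Gabriel graph can be sketched as follows. Two points are Gabriel neighbors when no third point lies strictly inside the ball whose diameter is the segment between them. The brute-force construction, the function names, and the sample cluster representatives are assumptions for illustration; how MOSAIC itself builds or uses the graph is not specified beyond the abstract.

```python
from itertools import combinations

def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def gabriel_edges(points):
    """Brute-force Gabriel graph: O(n^3), fine for a small set of representatives.

    Points i and j are connected if every other point r satisfies
    d(i, r)^2 + d(j, r)^2 >= d(i, j)^2, i.e. r is not strictly inside
    the ball with diameter (i, j).
    """
    edges = []
    for i, j in combinations(range(len(points)), 2):
        d_ij = sq_dist(points[i], points[j])
        if all(
            sq_dist(points[i], r) + sq_dist(points[j], r) >= d_ij
            for k, r in enumerate(points)
            if k not in (i, j)
        ):
            edges.append((i, j))
    return edges

# Hypothetical cluster representatives; point 2 sits between points 0 and 1.
representatives = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.1), (1.0, 3.0)]
edges = gabriel_edges(representatives)
```

In this example points 0 and 1 are not neighbors because point 2 falls inside the disc between them, so an agglomerative step restricted to Gabriel neighbors would only consider merging the pairs (0, 2), (1, 2), and (2, 3).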

IEEE Intelligent Systems | 1993

Integrating sets, rules, and data in an object-oriented environment

Bogdan D. Czejdo; Christoph F. Eick; Malcolm C. Taylor

The Tanguy knowledge-base management system, which integrates rule-base, database, and object-oriented paradigms to capture the advantages of each, is discussed. Tanguy provides set-oriented interfaces, data-driven production rules, and permanent object storage in a C++ environment. The need for integration of the three paradigms is reviewed. The Tanguy architecture, the data model used in Tanguy, and Tanguy's production rules are described.

international conference on data mining | 2006

A Framework for Regional Association Rule Mining in Spatial Datasets

Wei Ding; Christoph F. Eick; Jing Wang; Xiaojing Yuan

The immense explosion of geographically referenced data calls for efficient discovery of spatial knowledge. One of the special challenges for spatial data mining is that information is usually not uniformly distributed in spatial datasets. Consequently, the discovery of regional knowledge is of fundamental importance for spatial data mining. This paper centers on discovering regional association rules in spatial datasets. In particular, we introduce a novel framework to mine regional association rules relying on a given class structure. A reward-based regional discovery methodology is introduced, and a divisive, grid-based supervised clustering algorithm is presented that identifies interesting subregions in spatial datasets. Then, an integrated approach is discussed to systematically mine regional rules. The proposed framework is evaluated in a real-world case study that identifies spatial risk patterns from arsenic in the Texas water supply.

knowledge discovery and data mining | 2008

Towards region discovery in spatial datasets

Wei Ding; Rachsuda Jiamthapthaksin; Rachana Parmar; Dan Jiang; Tomasz F. Stepinski; Christoph F. Eick

This paper presents a novel region discovery framework geared towards finding scientifically interesting places in spatial datasets. We view region discovery as a clustering problem in which an externally given fitness function has to be maximized. The framework adapts four representative clustering algorithms, exemplifying prototype-based, grid-based, density-based, and agglomerative clustering, and the four algorithms are systematically evaluated in a real-world case study. The task is to find feature-based hotspots where extreme densities of deep ice and shallow ice co-locate on Mars. The results reveal that the density-based algorithm outperforms the other algorithms inasmuch as it discovers more regions with higher interestingness, the grid-based algorithm can provide acceptable solutions quickly, and the agglomerative clustering algorithm performs best at identifying larger regions of arbitrary shape. Moreover, the results indicate that there are only a few regions on Mars where shallow and deep ground ice co-locate, suggesting that they have been deposited at different geological times.

Information Sciences | 2005

A database clustering methodology and tool

Tae-Wan Ryu; Christoph F. Eick

Clustering is a popular data analysis and data mining technique. However, applying traditional clustering algorithms directly to a database is not straightforward because a database usually consists of structured and related data; moreover, there might be several object views of the database to be clustered, depending on a data analyst's particular interest. Finally, in many cases, there is a data model discrepancy between the format used to store the database to be analyzed and the representation format that clustering algorithms expect as their input. These discrepancies have been mostly ignored by current research. This paper focuses on identifying those discrepancies and on analyzing their impact on the application of clustering techniques to databases. We are particularly interested in the question of how clustering algorithms can be generalized to become more directly applicable to real-world databases. The paper introduces methodologies, techniques, and tools that serve this purpose. We propose a data set representation framework for database clustering that characterizes objects to be clustered through sets of tuples, and introduce preprocessing techniques and tools to generate object views based on this framework. Moreover, we introduce bag-oriented similarity measures and clustering algorithms that are suitable for our proposed data set representation framework. We also demonstrate that our approach is capable of dealing with relationship information commonly found in databases through bag-oriented clustering. Finally, we argue that our bag-oriented data representation framework is more suitable for database clustering than the commonly used flat file format and produces clusters of better quality.
