Andreas Hapfelmeier | Researchain

Publication

Featured research published by Andreas Hapfelmeier.

KDID'05: Proceedings of the 4th International Conference on Knowledge Discovery in Inductive Databases | 2005

Inductive databases in the relational model: the data as the bridge

Stefan Kramer; Volker Aufschild; Andreas Hapfelmeier; Alexander Jarasch; Kristina Kessler; Stefan Reckow; Jörg Wicker; Lothar Richter

We present a new and comprehensive approach to inductive databases in the relational model. The main contribution is a new inductive query language extending SQL, with the goal of supporting the whole knowledge discovery process, from pre-processing via data mining to post-processing. A prototype system supporting the query language was developed in the SINDBAD (structured inductive database development) project. Setting aside models and focusing on distance-based and instance-based methods, closure can easily be achieved. An example scenario from the area of gene expression data analysis demonstrates the power and simplicity of the concept. We hope that this preliminary work will help to bring the fundamental issues, such as the integration of various pattern domains and data mining techniques, to the attention of the inductive database community.

International Conference on Data Mining | 2008

Interpreting PET Scans by Structured Patient Data: A Data Mining Case Study in Dementia Research

Andreas Hapfelmeier; Jana Schmidt; Marianne Mueller; Stefan Kramer; Robert Perneczky; Alexander Kurz; Alexander Drzezga

One of the goals of medical research in the area of dementia is to correlate images of the brain with other variables, for instance, demographic information or outcomes of clinical tests. The usual approach is to select a subset of patients based on such variables and analyze the images associated with those patients. In this paper, we apply data mining techniques to take the opposite approach: We start with the images and explain the differences and commonalities in terms of the other variables. In the first step, we cluster PET scans of patients to form groups sharing similar features in brain metabolism. To the best of our knowledge, this is the first time that clustering has been applied to whole PET scans. In the second step, we explain the clusters by relating them to non-image variables. To do so, we employ RSD, an algorithm for relational subgroup discovery, with the cluster membership of patients as the target variable. Our results enable interesting interpretations of differences in brain metabolism in terms of demographic and clinical variables. The approach was implemented and tested on an exceptionally large pre-existing data collection of patients with different types of dementia. It comprises 10 GB of image data from 454 PET scans, and 42 variables from psychological and demographic data organized in 11 relations of a relational database. We believe that explaining medical images in terms of other variables (patient records, demographic information, etc.) is a challenging and rewarding new area for data mining research.

Artificial Intelligence in Medicine in Europe | 2011

A case study of stacked multi-view learning in dementia research

Rui Li; Andreas Hapfelmeier; Jana Schmidt; Robert Perneczky; Alexander Drzezga; Alexander Kurz; Stefan Kramer

Classification of different types of dementia commonly involves examination from several perspectives, e.g., medical images and neuropsychological tests. Thus, dementia classification should lend itself to so-called multi-view learning. Instead of simply combining several views, we use stacking to make the most of the information from the various views (PET scans, MMSE, CERAD and demographic variables). In the paper, we not only show the performance of stacked multi-view learning on classifying dementia data, but also try to explain the factors contributing to its performance. More specifically, we show that the correlation of views on the base and the meta level should be within certain ranges to facilitate successful stacked multi-view learning.

IEEE Transactions on Knowledge and Data Engineering | 2014

Pruning Incremental Linear Model Trees with Approximate Lookahead

Andreas Hapfelmeier; Bernhard Pfahringer; Stefan Kramer

Incremental linear model trees with approximate lookahead are fast, but produce overly large trees. This is due to non-optimal splitting decisions boosted by a possibly unlimited number of examples obtained from a data source. To keep the processing speed high and the tree complexity low, appropriate incremental pruning techniques are needed. In this paper, we introduce a pruning technique for the class of incremental linear model trees with approximate lookahead on stationary data sources. Experimental results show that the advantage of approximate lookahead in terms of processing speed can be further improved by producing much smaller and consequently more explanatory, less memory consuming trees on high-dimensional data. This is done at the expense of only a small increase in prediction error. Additionally, the pruning algorithm can be tuned to either produce less accurate model trees at a much higher processing speed or, alternatively, more accurate trees at the expense of higher processing times.

International Wound Journal | 2012

Improving wound score classification with limited remission spectra

Jana Schmidt; Andreas Hapfelmeier; Wolf-Dieter Schmidt; Uwe Wollina

The classification of wounds into healing states depending on their absorption spectrum of visible and near infrared light remains an important task in dermatology. Moreover, a reduction of the spectrum that is used in the classification task to fewer but important wavelengths is desirable, as each measured wavelength increases the examination costs without necessarily providing further information to the classification of wound healing states. This paper addresses two aspects: first, the improvement of the classification of wounds into healing states and second, a cost reduction by choosing only important wavelengths. Standard data mining methods are evaluated for their classification accuracy (CA) and compared to their performance when applying feature selection techniques that are used to reduce the number of necessary wavelengths. The results indicate that the 1-nearest-neighbor approach (IB1 algorithm) achieves the best CA, while relying on only a fraction (4%) of the standard wavelength spectrum.
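To make the idea of 1-nearest-neighbor classification on a reduced set of wavelengths concrete, here is a minimal sketch. It is not the paper's implementation or the IB1 code: the toy spectra, the class labels "healing" and "stagnating", the function name, and the selected indices are all made-up assumptions for illustration.

```python
import math


def nearest_neighbor_predict(train, query, wavelengths):
    """1-NN: return the label of the closest training spectrum,
    comparing only the selected wavelength indices."""
    def dist(a, b):
        # Euclidean distance restricted to the chosen wavelengths
        return math.sqrt(sum((a[i] - b[i]) ** 2 for i in wavelengths))

    _, label = min(train, key=lambda item: dist(item[0], query))
    return label


# Hypothetical toy data: four "spectra" with four measured wavelengths each.
train = [
    ([0.9, 0.1, 0.5, 0.3], "healing"),
    ([0.8, 0.2, 0.1, 0.9], "healing"),
    ([0.2, 0.9, 0.6, 0.2], "stagnating"),
    ([0.1, 0.8, 0.0, 0.8], "stagnating"),
]

# Assumed outcome of some feature selection step: keep only indices 0 and 1.
selected = [0, 1]
query = [0.85, 0.15, 0.9, 0.9]
prediction = nearest_neighbor_predict(train, query, selected)
```

In this toy setup only two of the four wavelengths would need to be measured for a new wound, which is the cost argument the abstract makes.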

The Internet of Things | 2016

Detecting Data Stream Dependencies on High Dimensional Data

Jonathan Boidol; Andreas Hapfelmeier

Intelligent production in smart factories and wearable devices that measure our activities produce an ever-growing amount of sensor data. In these environments, the validation of measurements to distinguish sensor flukes from significant events is of particular importance. We developed an algorithm that detects dependencies between sensor readings. These can be used, for instance, to verify or analyze large-scale measurements. An entropy-based approach allows us to detect dependencies beyond linear correlation and is well suited to deal with high-dimensional and high-volume data streams. Results show statistically significant improvements in reliability and on-par execution time over other stream monitoring systems.
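As a rough illustration of why an entropy-based measure can detect dependencies that linear correlation misses, the following sketch estimates the mutual information between two discretized sensor readings. It is not the algorithm from the paper: the equal-width binning, the helper names, and the synthetic quadratic signal are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def discretize(values, bins=4):
    """Map real values to equal-width bin indices (hypothetical helper)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero for constant input
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys, bins=4):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), computed on discretized readings."""
    dx, dy = discretize(xs, bins), discretize(ys, bins)
    return entropy(dx) + entropy(dy) - entropy(list(zip(dx, dy)))


def pearson(xs, ys):
    """Pearson correlation coefficient, for comparison."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Synthetic "sensors": the second reading depends on the first quadratically.
xs = [(i - 50) / 50 for i in range(101)]  # symmetric around zero
ys = [x * x for x in xs]

linear = pearson(xs, ys)                # close to zero: no linear relationship
dependence = mutual_information(xs, ys)  # clearly positive: readings are dependent
```

A streaming system would have to maintain such estimates incrementally over a window rather than recomputing them from scratch; that part is left out here.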

Intelligent Data Analysis | 2013

Learning probabilistic real-time automata from multi-attribute event logs

Jana Schmidt; Asghar Ghorbani; Andreas Hapfelmeier; Stefan Kramer

The growing number of time-labeled datasets in science and industry increases the need for algorithms that automatically induce process models. Existing methods for identifying process models typically only work on single-attribute events. We propose a new model type to address the problem of mining multi-attribute events, meaning that each event is described by a vector of attributes. The model is based on timed automata, includes expressive descriptions of states and can be used for making predictions. A probabilistic real-time automaton is created, where each state is annotated by a profile of events. To identify the states of the automaton, similar events are combined by a clustering approach. The method was implemented and tested on a synthetic, a medical and a biological dataset. Its prediction accuracy was evaluated on a medical dataset and compared to a combined logistic regression, which is considered a standard in this application domain. Moreover, the method was experimentally compared to Multi-Output HMMs and Petri nets learned by standard process mining algorithms. The experimental comparison suggests that the automaton-based approach performs favorably in several dimensions. Most importantly, we show that meaningful medical and biological process knowledge can be extracted from such automata.

ACM Symposium on Applied Computing | 2013

Incremental linear model trees on massive datasets: keep it simple, keep it fast

Andreas Hapfelmeier; Jana Schmidt; Stefan Kramer

The existence of massive datasets raises the need for algorithms that make efficient use of resources like memory and computation time. Besides well-known approaches such as sampling, online algorithms are being recognized as good alternatives, as they often process datasets faster using much less memory. The important class of algorithms learning linear model trees online (incremental linear model trees or ILMTs in the following) offers interesting options for regression tasks in this sense. However, surprisingly little is known about their performance, as there exists no large-scale evaluation on massive stationary datasets under equal conditions. Therefore, this paper shows their applicability on massive stationary datasets under various parameter settings. To reduce biases arising from the choice of a programming language or programming skills, all algorithms were reimplemented within the same framework and tested under the same conditions. Results on real-world datasets indicate that for massive stationary datasets parameter settings leading to complex models do not pay off, as there is at most a small accuracy gain at a much larger running time. Experimental evidence suggests that simple and fast algorithms perform best.

Knowledge and Information Systems | 2010