Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Michael R. Berthold is active.

Publication


Featured research published by Michael R. Berthold.


4th Annual Industrial Simulation Conference (ISC) | 2008

KNIME: The Konstanz Information Miner

Michael R. Berthold; Nicolas Cebron; Fabian Dill; Thomas R. Gabriel; Tobias Kötter; Thorsten Meinl; Peter Ohl; Christoph Sieb; Kilian Thiel; Bernd Wiswedel

The Konstanz Information Miner is a modular environment that enables easy visual assembly and interactive execution of a data pipeline. It is designed as a teaching, research, and collaboration platform that allows simple integration of new algorithms and tools, as well as data manipulation or visualization methods, in the form of new modules or nodes. In this paper we describe some of the design aspects of the underlying architecture and briefly sketch how new nodes can be incorporated.
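The node-and-pipeline design described above can be sketched in a few lines. The class and method names below (Node, Pipeline, execute) are illustrative assumptions, not KNIME's actual Java API: each node consumes a table, transforms it, and hands the result to the next node.

```python
# Minimal sketch (NOT KNIME's API): nodes as composable table transforms.

class Node:
    """Hypothetical base class: one processing step in a pipeline."""
    def execute(self, table):
        raise NotImplementedError

class FilterRows(Node):
    """Keep only the rows that satisfy a predicate."""
    def __init__(self, predicate):
        self.predicate = predicate
    def execute(self, table):
        return [row for row in table if self.predicate(row)]

class AddColumn(Node):
    """Append a derived column computed from each row."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
    def execute(self, table):
        return [{**row, self.name: self.fn(row)} for row in table]

class Pipeline:
    """Nodes are assembled first, then executed in sequence."""
    def __init__(self, nodes):
        self.nodes = nodes
    def run(self, table):
        for node in self.nodes:
            table = node.execute(table)
        return table

data = [{"x": 1}, {"x": 5}, {"x": 9}]
pipe = Pipeline([FilterRows(lambda r: r["x"] > 2),
                 AddColumn("x2", lambda r: r["x"] ** 2)])
result = pipe.run(data)
# result == [{"x": 5, "x2": 25}, {"x": 9, "x2": 81}]
```

Adding a new "node" here only requires a new subclass with an `execute` method, which mirrors the extensibility point the abstract describes.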


International Conference on Data Mining | 2002

Mining molecular fragments: finding relevant substructures of molecules

Christian Borgelt; Michael R. Berthold

We present an algorithm to find fragments in a set of molecules that help to discriminate between different classes of, for instance, activity in a drug discovery context. Instead of carrying out a brute-force search, our method generates fragments by embedding them in all appropriate molecules in parallel and prunes the search tree based on a local order of the atoms and bonds, which results in a substantially faster search by eliminating the need for frequent, computationally expensive re-embeddings and by suppressing redundant search. We prove the usefulness of our algorithm by demonstrating the discovery of activity-related groups of chemical compounds in the well-known National Cancer Institute's HIV screening dataset.
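The search strategy can be illustrated with a heavily simplified sketch. It mines only linear (path-shaped) fragments and re-tests embeddings from scratch rather than maintaining parallel embeddings under the paper's local atom/bond order, but it shows the core prune rule: a fragment's support can only shrink as the fragment grows, so an infrequent branch can be cut off entirely. All names and the toy molecules are illustrative.

```python
# Toy fragment miner: frequent linear atom-label paths in small graphs.
# A molecule is (labels, adjacency): dicts keyed by atom index.

def embeds(mol, path):
    """Does the label sequence `path` occur as a simple path in `mol`?"""
    labels, adj = mol
    def dfs(node, i, used):
        if i == len(path):
            return True
        return any(nxt not in used and labels[nxt] == path[i]
                   and dfs(nxt, i + 1, used | {nxt})
                   for nxt in adj[node])
    if not path:
        return True
    return any(labels[n] == path[0] and dfs(n, 1, {n}) for n in labels)

def mine(mols, alphabet, minsup, maxlen):
    """Grow fragments atom by atom; prune when support drops below minsup."""
    frequent = []
    def grow(path):
        support = sum(1 for m in mols if embeds(m, path))
        if support < minsup:        # prune: support never grows again
            return
        if path:
            frequent.append(("".join(path), support))
        if len(path) < maxlen:
            for a in alphabet:
                grow(path + [a])
    grow([])
    return frequent

mol1 = ({0: "C", 1: "C", 2: "O"}, {0: {1}, 1: {0, 2}, 2: {1}})  # C-C-O
mol2 = ({0: "C", 1: "O"}, {0: {1}, 1: {0}})                     # C-O
mol3 = ({0: "C", 1: "C"}, {0: {1}, 1: {0}})                     # C-C
frequent = mine([mol1, mol2, mol3], "CO", minsup=2, maxlen=2)
# frequent == [("C", 3), ("CC", 2), ("CO", 2), ("O", 2), ("OC", 2)]
```

The real algorithm operates on general subgraphs and avoids the repeated `embeds` calls by carrying all embeddings along as fragments grow; this sketch only conveys the support-based pruning.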


SIGKDD Explorations | 2009

KNIME - the Konstanz information miner: version 2.0 and beyond

Michael R. Berthold; Nicolas Cebron; Fabian Dill; Thomas R. Gabriel; Tobias Kötter; Thorsten Meinl; Peter Ohl; Kilian Thiel; Bernd Wiswedel

The Konstanz Information Miner is a modular environment that enables easy visual assembly and interactive execution of a data pipeline. It is designed as a teaching, research, and collaboration platform that allows simple integration of new algorithms and tools, as well as data manipulation or visualization methods, in the form of new modules or nodes. In this paper we describe some of the design aspects of the underlying architecture, briefly sketch how new nodes can be incorporated, and highlight some of the new features of version 2.0.


Intelligent Data Analysis | 2000

Advances in intelligent data analysis

David J. Hand; Douglas H. Fisher; Michael R. Berthold

Department of Mathematics, Imperial College, 180 Queen's Gate, London, SW7 2BZ, UK. E-mail: [email protected]; URL: http://www.ma.ic.ac.uk/statistics/djhand.html
Department of Computer Science, Box 1679, Station B, Vanderbilt University, Nashville, TN 37235, USA. E-mail: [email protected]; URL: http://cswww.vuse.vanderbilt.edu/~dfisher/
Berkeley Initiative in Soft Computing (BISC), Department of EECS, CS Division, University of California, Berkeley, CA 94720, USA. E-mail: [email protected]; URL: http://www.cs.berkeley.edu/~berthold


Neurocomputing | 1998

Constructive training of probabilistic neural networks

Michael R. Berthold; Jay Diamond

This paper presents an easy-to-use, constructive training algorithm for probabilistic neural networks, a special type of radial basis function network. In contrast to other algorithms, predefinition of the network topology is not required: the proposed algorithm introduces new hidden units whenever necessary and adjusts the shape of already existing units individually to minimize the risk of misclassification. This leads to smaller networks than classical PNNs and therefore enables the use of large data sets. Using eight classification benchmarks from the StatLog project, the new algorithm is compared to other state-of-the-art classification methods, and it is demonstrated that the generated probabilistic neural networks achieve comparable classification performance on these data sets. Only two rather uncritical parameters need to be adjusted manually, and there is no danger of overtraining: the algorithm clearly indicates the end of training. In addition, the networks generated are small due to the lack of redundant neurons in the hidden layer.
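A toy sketch of one constructive training pass, in the spirit of the algorithm rather than a faithful reimplementation: a new Gaussian unit is committed whenever no unit of the correct class covers a pattern, and units of conflicting classes are shrunk so they no longer cover it. The two thresholds below stand in for the "two rather uncritical parameters" the abstract mentions; all names and values are assumptions.

```python
import math

THETA_PLUS, THETA_MINUS = 0.4, 0.2   # hypothetical cover/conflict thresholds

def activation(unit, x):
    """Gaussian response of one hidden unit."""
    d2 = sum((a - b) ** 2 for a, b in zip(unit["center"], x))
    return math.exp(-d2 / unit["sigma"] ** 2)

def train_epoch(units, samples):
    for x, label in samples:
        own = (u for u in units if u["label"] == label)
        # Commit a new unit if no unit of the correct class covers x.
        if not any(activation(u, x) >= THETA_PLUS for u in own):
            units.append({"center": x, "sigma": 1.0, "label": label})
        # Shrink conflicting units so their response at x drops to THETA_MINUS.
        for u in units:
            if u["label"] != label and activation(u, x) > THETA_MINUS:
                d2 = sum((a - b) ** 2 for a, b in zip(u["center"], x))
                if d2 > 0:
                    u["sigma"] = math.sqrt(d2 / -math.log(THETA_MINUS))
    return units

def classify(units, x):
    return max(units, key=lambda u: activation(u, x))["label"]

samples = [((0.0, 0.0), "A"), ((3.0, 0.0), "B")]
units = train_epoch([], samples)
# classify(units, (0.5, 0.0)) == "A"; classify(units, (2.5, 0.0)) == "B"
```

Because units are only added when needed, the network stays small, and training can stop as soon as an epoch commits no new units and performs no shrinking.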


Intelligent Data Analysis | 1998

Reasoning about Data

Xiaohui Liu; Paul R. Cohen; Michael R. Berthold

Two factors have affected the work of modern data analysts more than any others. First, the size of machine-readable data sets has increased, especially during the last decade or so. Second, computational methods and tools are being developed that enhance traditional statistical analysis. These two developments have created a new range of problems and challenges for analysts, as well as new opportunities for intelligent systems in data analysis. To provide an international forum for the discussion of these topics, a series of symposia on Intelligent Data Analysis was initiated in 1995 [4]. The second Intelligent Data Analysis conference (IDA-97) was held at Birkbeck College, University of London, 4-6 August 1997. Almost 130 people from twenty countries on four continents took part in the symposium. A total of 107 papers were submitted to the IDA-97 conference, of which 50 were accepted as either oral or poster presentations. After the conference, five papers were chosen from the conference program and their authors were invited to prepare extended versions for publication in the Intelligent Data Analysis (IDA) Journal. A second round of review provided additional feedback to the authors, and the papers are now presented in this special issue.


Archive | 2010

Guide to Intelligent Data Analysis

Michael R. Berthold; Christian Borgelt; Frank Höppner; Frank Klawonn

Each passing year bears witness to the development of ever more powerful computers, increasingly fast and cheap storage media, and even higher-bandwidth data connections. This makes it easy to believe that we can now, at least in principle, solve any problem we are faced with, so long as we only have enough data. Yet this is not the case. Although large databases allow us to retrieve many different single pieces of information and to compute simple aggregations, general patterns and regularities often go undetected, and it is exactly these patterns, regularities, and trends that are often most valuable. To avoid the danger of drowning in information but starving for knowledge, the branch of research known as data analysis has emerged, and a considerable number of methods and software tools have been developed. However, it is not these tools alone but the intelligent application of human intuition in combination with computational power, of sound background knowledge with computer-aided modeling, and of critical reflection with convenient automatic model construction, that results in successful intelligent data analysis projects. Guide to Intelligent Data Analysis provides a hands-on instructional approach to many basic data analysis techniques, and explains how these are used to solve data analysis problems.

Topics and features: guides the reader through the process of data analysis, following the interdependent steps of project understanding, data understanding, data preparation, modeling, and deployment and monitoring; equips the reader with the necessary information to obtain hands-on experience of the topics under discussion; provides a review of the basics of classical statistics that support and justify many data analysis methods, and a glossary of statistical terms; includes numerous examples using R and KNIME, together with appendices introducing the open-source software; integrates illustrations and case-study-style examples to support pedagogical exposition. This practical and systematic textbook/reference for graduate and advanced undergraduate students is also essential reading for all professionals who face data analysis problems. Moreover, it is a book to be used following one's exploration of it.

Dr. Michael R. Berthold is Nycomed Professor of Bioinformatics and Information Mining at the University of Konstanz, Germany. Dr. Christian Borgelt is Principal Researcher at the Intelligent Data Analysis and Graphical Models Research Unit of the European Centre for Soft Computing, Spain. Dr. Frank Höppner is Professor of Information Systems at Ostfalia University of Applied Sciences, Germany. Dr. Frank Klawonn is a Professor in the Department of Computer Science and Head of the Data Analysis and Pattern Recognition Laboratory at Ostfalia University of Applied Sciences, Germany. He is also Head of the Bioinformatics and Statistics group at the Helmholtz Centre for Infection Research, Braunschweig, Germany.


COMPLIFE | 2005

Computational Life Sciences II

Michael R. Berthold; Robert C. Glen; Kay Diederichs; Oliver Kohlbacher; Ingrid Fischer

Contents:
- Systems Biology: Structural Protein Interactions Predict Kinase-Inhibitor Interactions in Upregulated Pancreas Tumour Genes Expression Data; Biochemical Pathway Analysis via Signature Mining; Recurrent Neuro-fuzzy Network Models for Reverse Engineering Gene Regulatory Interactions.
- Data Analysis and Integration: Some Applications of Dummy Point Scatterers for Phasing in Macromolecular X-Ray Crystallography; BioRegistry: A Structured Metadata Repository for Bioinformatic Databases; Robust Perron Cluster Analysis for Various Applications in Computational Life Science.
- Structural Biology: Multiple Alignment of Protein Structures in Three Dimensions; Protein Annotation by Secondary Structure Based Alignments (PASSTA); MAPPIS: Multiple 3D Alignment of Protein-Protein Interfaces.
- Genomics: Frequent Itemsets for Genomic Profiling; Gene Selection Through Sensitivity Analysis of Support Vector Machines; The Breakpoint Graph in Ciliates.
- Computational Proteomics: ProSpect: An R Package for Analyzing SELDI Measurements Identifying Protein Biomarkers; Algorithms for the Automated Absolute Quantification of Diagnostic Markers in Complex Proteomics Samples; Detection of Protein Assemblies in Crystals.
- Molecular Informatics: Molecular Similarity Searching Using COSMO Screening Charges (COSMO/3PP); Increasing Diversity in In-silico Screening with Target Flexibility; Multiple Semi-flexible 3D Superposition of Drug-Sized Molecules.
- Molecular Structure Determination and Simulation: Efficiency Considerations in Solving Smoluchowski Equations for Rough Potentials; Fast and Accurate Structural RNA Alignment by Progressive Lagrangian Optimization; Visual Analysis of Molecular Conformations by Means of a Dynamic Density Mixture Model.
- Distributed Data Mining: Distributed BLAST in a Grid Computing Context; Parallel Tuning of Support Vector Machine Learning Parameters for Large and Unbalanced Data Sets; The Architecture of a Proteomic Network in the Yeast.


International Symposium on Neural Networks | 1994

A time delay radial basis function network for phoneme recognition

Michael R. Berthold

This paper presents the time delay radial basis function network (TDRBF) for the recognition of phonemes. The TDRBF combines features from time delay neural networks (TDNN) and radial basis functions (RBF). The ability to detect acoustic features and their temporal relationship independent of position in time is inherited from the TDNN; the use of RBFs leads to shorter training times and fewer parameters to adjust, which makes it easier to apply the TDRBF to new tasks. The recognition of three phonemes with about 750 training and testing tokens each was chosen as an evaluation task. The results suggest performance equivalent to the TDNN presented in Waibel et al. (1989), but the TDRBF requires much less training time to reach good performance and, in addition, gives a clear indication when the minimum error is reached, so no danger of overtraining exists.
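The time-delay idea can be sketched as follows: the same RBF unit is applied to every time-shifted window of the input, and the maximal response over all shifts is taken, so a feature is detected regardless of where it occurs in time. This is an illustrative toy, not the network from the paper.

```python
import math

def rbf(center, window, sigma=1.0):
    """Gaussian radial basis function over one time window."""
    d2 = sum((c - w) ** 2 for c, w in zip(center, window))
    return math.exp(-d2 / (2 * sigma ** 2))

def time_delay_response(center, sequence):
    """Max response of one unit over all windows of len(center):
    the shift-invariant detection inherited from the TDNN."""
    k = len(center)
    return max(rbf(center, sequence[t:t + k])
               for t in range(len(sequence) - k + 1))

pattern = [0.0, 1.0, 0.0]           # the feature this unit responds to
early = [0.0, 1.0, 0.0, 0.0, 0.0]   # feature at the start of the sequence
late  = [0.0, 0.0, 0.0, 1.0, 0.0]   # same feature, shifted in time
# both sequences yield the same maximal response of 1.0
```

In the real network such windows slide over frames of acoustic feature vectors rather than scalars, but the shift invariance works the same way.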


International Journal of Approximate Reasoning | 2003

Mixed fuzzy rule formation

Michael R. Berthold

Many fuzzy rule induction algorithms have been proposed during the past decade or so. Most of these algorithms tend to scale badly to large dimensions of the feature space and, in addition, have trouble dealing with different feature types or noisy data. In this paper, an algorithm is proposed that extracts a set of so-called mixed fuzzy rules. These rules can be extracted from feature spaces with diverse types of attributes and handle the corresponding different types of constraints in parallel. The extracted rules depend on individual subsets of only a few attributes, which is especially useful in high-dimensional feature spaces. The algorithm is presented along with results on several classification benchmarks, and it is briefly sketched how the method can be extended to handle outliers or noisy training instances.
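A minimal sketch of what such a "mixed" rule might look like: each attribute carries its own type of constraint (a fuzzy trapezoid for numeric attributes, an allowed set for nominal ones), the constraints are evaluated in parallel, and unconstrained attributes act as don't-cares, so a rule depends on only a few attributes. Names and the trapezoid shape are illustrative assumptions, not the paper's exact formulation.

```python
def trapezoid(x, a, b, c, d):
    """Fuzzy membership: 0 outside (a, d), 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def rule_membership(rule, sample):
    """Minimum degree over the constrained attributes only; attributes
    absent from the rule are don't-cares and do not restrict the match."""
    degrees = []
    for attr, constraint in rule.items():
        value = sample[attr]
        if isinstance(constraint, tuple):        # numeric: fuzzy trapezoid
            degrees.append(trapezoid(value, *constraint))
        else:                                    # nominal: allowed value set
            degrees.append(1.0 if value in constraint else 0.0)
    return min(degrees) if degrees else 1.0

rule = {"temp": (10, 20, 30, 40), "colour": {"red", "green"}}
sample = {"temp": 25.0, "colour": "red", "weight": 3.2}  # weight: don't-care
# rule_membership(rule, sample) == 1.0
```

A sample with temp 15 would match the same rule to degree 0.5, illustrating the graded (fuzzy) boundary of the numeric constraint.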

Collaboration


Dive into Michael R. Berthold's collaborations.

Top Co-Authors

Klaus-Peter Huber
Karlsruhe Institute of Technology

Christian Borgelt
Otto-von-Guericke University Magdeburg

Ingrid Fischer
University of Erlangen-Nuremberg