Mark J. Embrechts | Researchain

Publication

Featured research published by Mark J. Embrechts.

International Conference on Artificial Neural Networks | 2009

On the Use of the Adjusted Rand Index as a Metric for Evaluating Supervised Classification

Jorge M. Santos; Mark J. Embrechts

The Adjusted Rand Index (ARI) is frequently used in cluster validation since it is a measure of agreement between two partitions: one given by the clustering process and the other defined by external criteria. In this paper we investigate the usability of this clustering validation measure in supervised classification problems by two different approaches: as a performance measure and in feature selection. Since ARI measures the relation between pairs of dataset elements without using class (label) information, it can be used to detect problems with the classification algorithm, especially when combined with conventional performance measures. If we instead use the class information, we can also apply ARI to feature selection. We present the results of several experiments in which we applied ARI both as a performance measure and for feature selection, showing the validity of this index for the given tasks.
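As a rough illustration of the index described in this abstract, the sketch below computes the ARI from pair counts using the standard chance-corrected formula. It is not the authors' code; the function name and the toy labels are assumptions made for the example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: trivially identical partitions

    # Pairs of items that share a group in both labelings.
    together_in_both = sum(
        comb(count, 2) for count in Counter(zip(labels_a, labels_b)).values()
    )
    # Pairs of items that share a group within each labeling separately.
    pairs_a = sum(comb(count, 2) for count in Counter(labels_a).values())
    pairs_b = sum(comb(count, 2) for count in Counter(labels_b).values())

    expected = pairs_a * pairs_b / total_pairs  # agreement expected by chance
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single group
    return (together_in_both - expected) / (maximum - expected)


true_classes = [0, 0, 0, 1, 1, 1]   # toy data, not from the paper
predicted = [1, 1, 1, 0, 0, 0]      # same grouping, label names swapped
print(adjusted_rand_index(true_classes, predicted))
```

On this toy input the index is 1 even though plain accuracy would be 0, because the ARI compares how pairs of items are grouped rather than which label name each item receives. That is one way the two kinds of measure can disagree and flag a problem.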

Systems, Man and Cybernetics | 2003

Use of machine learning for classification of magnetocardiograms

Mark J. Embrechts; Boleslaw K. Szymanski; Karsten Sternickel; Thanakorn Naenna; Ramathilagam Bragaspathi

We describe the use of machine learning for pattern recognition in magnetocardiography (MCG), which measures magnetic fields emitted by the electrophysiological activity of the heart. We used direct kernel methods to separate abnormal MCG heart patterns from normal ones. For unsupervised learning, we introduced Direct Kernel based Self-Organizing Maps. For supervised learning we used Direct Kernel Partial Least Squares and (Direct) Kernel Ridge Regression. These results are then compared with classical Support Vector Machines and Kernel Partial Least Squares. The hyperparameters for these methods were tuned on a validation subset of the training data before testing. We also investigated the most effective pre-processing, using local, vertical, horizontal and two-dimensional (global) Mahalanobis scaling and wavelet transforms, and experimented with variable selection by filtering. The results, similar for all three methods, were encouraging, exceeding the quality of classification achieved by trained experts.

Winter Simulation Conference | 2006

Taming the Curse of Dimensionality in Kernels and Novelty Detection

Paul F. Evangelista; Mark J. Embrechts; Boleslaw K. Szymanski

The curse of dimensionality is a well-known but not entirely well-understood phenomenon. Too much data, in terms of the number of input variables, is not always a good thing. This is especially true when the problem involves unsupervised learning or supervised learning with unbalanced data (many negative observations but minimal positive observations). This paper addresses two issues involving high-dimensional data. The first part explores the behavior of kernels in high-dimensional data. It is shown that variance, especially when contributed by meaningless noisy variables, confounds learning methods. The second part illustrates methods to overcome dimensionality problems with unsupervised learning utilizing subspace models. The modeling approach involves novelty detection with the one-class SVM.

International Conference on Artificial Neural Networks | 2007

Some properties of the Gaussian kernel for one class learning

Paul F. Evangelista; Mark J. Embrechts; Boleslaw K. Szymanski

This paper proposes a novel approach for directly tuning the Gaussian kernel matrix for one-class learning. The popular Gaussian kernel includes a free parameter, σ, that requires tuning, typically performed through validation. The value of this parameter impacts model performance significantly. This paper explores an automated method for tuning this kernel based upon a hill-climbing optimization of statistics obtained from the kernel matrix.
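To make the role of σ concrete, the sketch below builds a Gaussian kernel matrix for a few made-up points and prints one simple statistic of it (the mean off-diagonal entry) for several values of σ. The exp(-d²/(2σ²)) parametrization is one common convention and is assumed here. The statistic is an illustrative choice, and the abstract does not say which statistics the paper's hill-climbing procedure optimizes.

```python
import math


def gaussian_kernel_matrix(points, sigma):
    """Return K with K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    def squared_distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    return [
        [math.exp(-squared_distance(p, q) / (2.0 * sigma ** 2)) for q in points]
        for p in points
    ]


def off_diagonal_mean(kernel):
    """Mean of the kernel entries off the main diagonal."""
    n = len(kernel)
    values = [kernel[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(values) / len(values)


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]  # made-up data
for sigma in (0.1, 1.0, 10.0):
    kernel = gaussian_kernel_matrix(points, sigma)
    print(sigma, round(off_diagonal_mean(kernel), 4))
```

With a very small σ the matrix approaches the identity (every point looks dissimilar to every other), and with a very large σ it approaches a matrix of ones. Useful values lie between these extremes, which is why the parameter needs tuning.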

Systems, Man and Cybernetics | 2002

Computational military tactical planning system

Robert H. Kewley; Mark J. Embrechts

A computational system called fuzzy-genetic decision optimization combines two soft computing methods, genetic optimization and fuzzy ordinal preference, and a traditional hard computing method, stochastic system simulation, to tackle the difficult task of generating battle plans for military tactical forces. Planning for a tactical military battle is a complex, high-dimensional task which often bedevils experienced professionals. In fuzzy-genetic decision optimization, the military commander enters his battle outcome preferences into a user interface to generate a fuzzy ordinal preference model that scores his preference for any battle outcome. A genetic algorithm iteratively generates populations of battle plans for evaluation in a stochastic combat simulation. The fuzzy preference model converts the simulation results into a fitness value for each population member, allowing the genetic algorithm to generate the next population. Evolution continues until the system produces a final population of high-performance plans which achieve the commander's intent for the mission. Analysis of experimental results shows that co-evolution of friendly and enemy plans by competing genetic algorithms improves the performance of the planning system. If allowed to evolve long enough, the plans produced by automated algorithms had a significantly higher mean performance than those generated by experienced military experts.

Journal of Computer-aided Molecular Design | 2003

New developments in PEST shape/property hybrid descriptors

Curt M. Breneman; C. Matthew Sundling; N. Sukumar; Ling-Ling Shen; William P. Katt; Mark J. Embrechts

Recent investigations have shown that the inclusion of hybrid shape/property descriptors together with 2D topological descriptors increases the predictive capability of QSAR and QSPR models. Property-Encoded Surface Translator (PEST) descriptors may be computed using ab initio or semi-empirical electron density surfaces and/or electronic properties, as well as atomic fragment-based TAE/RECON property-encoded surface reconstructions. The RECON and PEST algorithms also include rapid fragment-based wavelet coefficient descriptor (WCD) computation. These descriptors enable a compact encoding of chemical information. We also briefly discuss the use of the RECON/PEST methodology in a virtual high-throughput mode, as well as the use of TAE properties for molecular surface autocorrelation analysis.

Computational Intelligence and Data Mining | 2011

Opening black box Data Mining models using Sensitivity Analysis

Paulo Cortez; Mark J. Embrechts

There are several supervised learning Data Mining (DM) methods, such as Neural Networks (NN), Support Vector Machines (SVM) and ensembles, that often attain high-quality predictions, although the obtained models are difficult for humans to interpret. In this paper, we open these black box DM models by using a novel visualization approach that is based on a Sensitivity Analysis (SA) method. In particular, we propose a Global SA (GSA), which extends the applicability of previous SA methods (e.g. to classification tasks), and several visualization techniques (e.g. variable effect characteristic curve), for assessing input relevance and effects on the model's responses. We show the GSA capabilities by conducting several experiments, using an NN ensemble and an SVM model, on both synthetic and real-world datasets.
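The general idea of probing a black box through its inputs can be sketched with a simple one-at-a-time sensitivity analysis: sweep one input over a set of levels while holding the others at a baseline, then record how much the response moves. This is a generic illustration, not the GSA method proposed in the paper. The function names, the stand-in model, and the range-based relevance measure are all assumptions.

```python
def sensitivity_ranges(model, baseline, levels):
    """Sweep each input over `levels`, holding the others at `baseline`.

    Returns, per input, the range (max - min) of the model response.
    """
    ranges = []
    for index in range(len(baseline)):
        responses = []
        for level in levels:
            probe = list(baseline)
            probe[index] = level
            responses.append(model(probe))
        ranges.append(max(responses) - min(responses))
    return ranges


def black_box(x):
    # Hypothetical stand-in for a fitted NN or SVM; the third input is ignored.
    return 3.0 * x[0] + 0.5 * x[1] ** 2 + 0.0 * x[2]


levels = [0.0, 0.25, 0.5, 0.75, 1.0]
ranges = sensitivity_ranges(black_box, baseline=[0.5, 0.5, 0.5], levels=levels)
total = sum(ranges)
print([round(r / total, 3) for r in ranges])  # relative input relevance
```

Plotting the recorded responses against the swept levels for a single input would give a curve of that input's effect on the response, similar in spirit to the variable effect characteristic curve mentioned in the abstract.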

Drug Metabolism and Disposition | 2006

Classification of Metabolites with Kernel-Partial Least Squares (K-PLS)

Mark J. Embrechts; Sean Ekins

Numerous experimental and computational approaches have been developed to predict human drug metabolism. Since databases of human drug metabolism information are widely available, these can be used to train computational algorithms and generate predictive approaches. In turn, they may be used to assist in the identification of possible metabolites from a large number of molecules in drug discovery based on molecular structure alone. In the current study we have used a commercially available database (MetaDrug) and extracted a fraction of the human drug metabolism data. These data were used along with augmented atom descriptors in a predictive machine learning model, kernel-partial least squares (K-PLS). A total of 317 molecules, including parent drugs and their primary and secondary (sequential) metabolites, were used to build these models corresponding to individual metabolism rules, representing the formation of discrete metabolites, e.g., N-dealkylation. Each model was internally validated to assess the capability to classify other molecules that were left out. Using receiver operator curve statistics, models for N-dealkylation, O-dealkylation, aromatic hydroxylation, aliphatic hydroxylation, O-glucuronidation, and O-sulfation gave area under the curve values from 0.75 to 0.84 and were able to predict between 61% and 79% of active molecules upon leave-one-out testing. This preliminary study indicates that K-PLS and possibly other similar machine learning methods (such as support vector machines) can be applied to predicting human drug metabolite formation in a classification manner. Improvements can be achieved using considerably larger datasets that contain more positive examples for the less frequently occurring metabolite rules, as well as the external evaluation of novel molecules.

Soft Computing | 2001

Feature selection for in-silico drug design using genetic algorithms and neural networks

Muhsin Ozdemir; Mark J. Embrechts; F. Arciniegas; C.M. Breneman; L. Lockwood; K.P. Bennett

QSAR (quantitative structure activity relationship) is a discipline within computational chemistry that deals with predictive modeling, often for relatively small datasets where the number of features might exceed the number of data points, leading to extreme dimensionality problems. The paper addresses a novel feature selection procedure for QSAR based on genetic algorithms to mitigate the curse of dimensionality. In this case the genetic algorithm minimizes a cost function derived from the correlation matrix between the features and the activity of interest that is being modeled. From a QSAR dataset with 160 features, the genetic algorithm selected a feature subset (40 features), which built a better predictive model than the full feature set. The results for feature reduction with the genetic algorithm were also compared with neural network sensitivity analysis.

Computers in Biology and Medicine | 2008

Identification of ischemic heart disease via machine learning analysis on magnetocardiograms

Tanawut Tantimongcolwat; Thanakorn Naenna; Chartchalerm Isarankura-Na-Ayudhya; Mark J. Embrechts; Virapong Prachayasittikul

Ischemic heart disease (IHD) is the leading cause of death worldwide. Early detection of IHD may effectively limit severity and reduce the mortality rate. Recently, magnetocardiography (MCG) has been developed for the detection of heart malfunction. Although MCG is capable of monitoring the abnormal patterns of the magnetic field emitted by a physiologically defective heart, data interpretation is time-consuming and requires highly trained professionals. Hence, we propose an automatic method for the interpretation of IHD patterns in MCG recordings using machine learning approaches. Two types of machine learning techniques, namely back-propagation neural network (BNN) and direct kernel self-organizing map (DK-SOM), were applied to explore the IHD pattern recorded by MCG. Data sets were obtained by sequential measurement of the magnetic field emitted by the cardiac muscle of 125 individuals. Data were divided into a training set and a testing set of 74 cases and 51 cases, respectively. Predictive performance was evaluated for both machine learning approaches. The BNN exhibited sensitivity of 89.7%, specificity of 54.5% and accuracy of 74.5%, while the DK-SOM provided relatively higher prediction performance with a sensitivity, specificity and accuracy of 86.2%, 72.7% and 80.4%, respectively. This finding suggests a high potential of applying machine learning approaches for high-throughput detection of IHD from MCG data.
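The three reported measures follow directly from a confusion matrix, as the short sketch below shows. The counts used are hypothetical: the abstract gives only the percentages and the size of the testing set (51 cases). A split into 29 diseased and 22 healthy test cases is assumed here because it reproduces all six reported figures, but it is not stated in the source.

```python
def classification_rates(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = true_pos / (true_pos + false_neg)   # diseased cases detected
    specificity = true_neg / (true_neg + false_pos)   # healthy cases cleared
    accuracy = (true_pos + true_neg) / (true_pos + false_neg + true_neg + false_pos)
    return sensitivity, specificity, accuracy


# Hypothetical counts (assumed split: 29 diseased, 22 healthy test cases).
bnn = classification_rates(true_pos=26, false_neg=3, true_neg=12, false_pos=10)
dk_som = classification_rates(true_pos=25, false_neg=4, true_neg=16, false_pos=6)

for name, rates in (("BNN", bnn), ("DK-SOM", dk_som)):
    print(name, [f"{100 * rate:.1f}%" for rate in rates])
```

Under these assumed counts, the DK-SOM misses one more diseased case than the BNN but raises four fewer false alarms, which is how a slightly lower sensitivity can coexist with a higher overall accuracy.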