Herna L. Viktor
University of Ottawa
Publications
Featured research published by Herna L. Viktor.
SIGKDD Explorations | 2004
Hongyu Guo; Herna L. Viktor
Learning from imbalanced data sets, where the number of examples of one (majority) class is much higher than that of the others, presents an important challenge to the machine learning community. Traditional machine learning algorithms may be biased towards the majority class, thus producing poor predictive accuracy over the minority class. In this paper, we describe a new approach that combines boosting, an ensemble-based learning algorithm, with data generation to improve the predictive power of classifiers against imbalanced data sets consisting of two classes. In the DataBoost-IM method, hard examples from both the majority and minority classes are identified during execution of the boosting algorithm. Subsequently, the hard examples are used to separately generate synthetic examples for the majority and minority classes. The synthetic data are then added to the original training set, and the class distribution and the total weights of the different classes in the new training set are rebalanced. The DataBoost-IM method was evaluated, in terms of the F-measures, G-mean and overall accuracy, against seventeen highly and moderately imbalanced data sets using decision trees as base classifiers. Our results are promising and show that the DataBoost-IM method compares well with a base classifier, a standard benchmarking boosting algorithm and three advanced boosting-based algorithms for imbalanced data sets. Results indicate that our approach does not sacrifice one class in favor of the other, but achieves high predictive accuracy on both the minority and majority classes.
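The evaluation measures named in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the standard definitions (F-measure as the harmonic mean of precision and recall, G-mean as the square root of the true positive rate times the true negative rate) and treats the minority class as the positive class. The function name `imbalance_metrics` and the confusion counts are hypothetical.

```python
import math


def imbalance_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Accuracy, F-measure and G-mean from two-class confusion counts.

    Assumption of this sketch: the minority class is the positive class.
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "f_measure": f_measure,
        "g_mean": math.sqrt(recall * specificity),
    }


# Hypothetical data set: 10 minority and 90 majority examples.
# A classifier that always predicts the majority class.
trivial = imbalance_metrics(tp=0, fn=10, fp=0, tn=90)
# A classifier that recovers 9 of the 10 minority examples.
useful = imbalance_metrics(tp=9, fn=1, fp=9, tn=81)
```

Both hypothetical classifiers reach the same overall accuracy, but only the second has a non-zero F-measure and G-mean, which is why accuracy alone is a weak measure on imbalanced data.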
Knowledge and Information Systems | 2008
Hongyu Guo; Herna L. Viktor
Multirelational classification aims at discovering useful patterns across multiple inter-connected tables (relations) in a relational database. Many traditional learning techniques, however, assume a single table or a flat file as input (the so-called propositional algorithms). Existing multirelational classification approaches either “upgrade” mature propositional learning methods to deal with relational presentation or extensively “flatten” multiple tables into a single flat file, which is then solved by propositional algorithms. This article reports a multiple view strategy for mining in relational databases, in which neither “upgrading” nor “flattening” is required. Our approach learns from multiple views (feature sets) of a relational database, and then integrates the information acquired by individual view learners to construct a final model. Our empirical studies show that the method compares well with the classifiers induced by the majority of multirelational mining systems, in terms of accuracy obtained and running time needed. The paper explores the implications of this finding for multirelational research and applications. In addition, the method has practical significance: it is appropriate for directly mining many real-world databases.
European Conference on Principles of Data Mining and Knowledge Discovery | 2006
Herna L. Viktor; Eric Paquet; Hongyu Guo
Clothes should be designed to tailor well, fit the body elegantly and hide obvious body flaws. To attain this goal, it is crucial to know the interrelationships between different body measurements, for example the interplay between shoulder width, neck circumference and waist. This paper discusses a study to better understand the typical consumer, from a virtual tailor's perspective. Cluster analysis was used to group the population into five clothing sizes. Next, multi-relational classification was applied to analyze the interplay between each group's anthropometric body measurements. Throughout this study, three-dimensional (3-D) body scans were used to verify the validity of our findings. Our results indicate that different sets of body measurements are used to characterize each clothing size. This information, together with the demographic profiles of the typical consumer, provides us with new insight into our evolving population.
BioMed Research International | 2015
Eric Paquet; Herna L. Viktor
Macromolecular structures, such as neuraminidases, hemagglutinins, and monoclonal antibodies, are not rigid entities. Rather, they are characterised by their flexibility, which is the result of the interaction and collective motion of their constituent atoms. This conformational diversity has a significant impact on their physicochemical and biological properties. Among these are their structural stability, the transport of ions through the M2 channel, drug resistance, macromolecular docking, binding energy, and rational epitope design. To assess these properties and to calculate the associated thermodynamical observables, the conformational space must be efficiently sampled and the dynamics of the constituent atoms must be simulated. This paper presents algorithms and techniques that address the above-mentioned issues. To this end, a computational review of molecular dynamics, Monte Carlo simulations, Langevin dynamics, and free energy calculation is presented. The exposition is made from first principles to promote a better understanding of the potentialities, limitations, applications, and interrelations of these computational methods.
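As a small illustration of what simulating atomic dynamics involves, the sketch below integrates Newton's equations for a single particle with the velocity Verlet scheme, a standard molecular dynamics integrator. It is not drawn from the review itself; the one-dimensional harmonic potential, unit mass, time step, and the name `velocity_verlet` are all assumptions chosen so the result can be checked against the analytic solution.

```python
import math
from typing import Callable, Tuple


def velocity_verlet(x: float, v: float, force: Callable[[float], float],
                    mass: float, dt: float, steps: int) -> Tuple[float, float]:
    """Advance position x and velocity v by `steps` time steps of size dt."""
    a = force(x) / mass
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity update with averaged acceleration
        a = a_new
    return x, v


# Toy system (assumed): harmonic spring with stiffness K and mass M.
K, M = 1.0, 1.0
x_end, v_end = velocity_verlet(x=1.0, v=0.0, force=lambda x: -K * x,
                               mass=M, dt=0.01, steps=1000)

# Total energy should stay close to its initial value of 0.5.
energy = 0.5 * M * v_end ** 2 + 0.5 * K * x_end ** 2
```

For this toy system the exact position after a total time of 10 is cos(10), so `x_end` and `energy` give a quick check on the integrator's accuracy and energy conservation.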
SAE Transactions | 2004
Osama Abdali; Herna L. Viktor; Eric Paquet; Marc Rioux
Anthropometric databases consisting of both multimedia and relational content are becoming increasingly commonplace. These databases are huge and contain data with diverse formats, representations and models. Data mining provides a powerful mechanism to further explore and explain the data as contained in these heterogeneous repositories, focusing on discovering new relationships which cannot be found using standard information retrieval techniques. In particular, cluster analysis is a data mining technique which is used to group data records into unlabeled classes, e.g. to group individuals with similar body types, income and education levels into a cluster, using unsupervised learning. This paper introduces cluster analysis as a method to explore 3D body scans together with the relational anthropometric and demographic data as contained in an integrated multimedia anthropometric database. The paper provides an overview of different cluster analysis algorithms and discusses the strengths and weaknesses of each approach when mining 3D objects together with relational attributes. Cluster analysis algorithms are evaluated in terms of scalability, the number of attributes that can be processed, the level of human intervention required and the characteristics of the clusters, amongst others. This is followed by a discussion on the application of cluster analysis to anthropometric data. The use of cluster analysis to group the data records into clusters based on both the 3D body scans and the relational attributes leads to a new understanding of the data and their interrelationships.
International Journal on Digital Libraries | 2009
Julie Doyle; Herna L. Viktor; Eric Paquet
Long-term digital preservation, the process of maintaining digital objects through time to ensure continued access, has become a crucial issue in recent years. Whilst the amount of digitised information is constantly increasing, so too is the pace of progress in information technology, resulting in obsolescence of the software and hardware required to access and view digital information. Despite many organisations recognising this threat and the resulting need for preservation action, more work is required to effectively address the issue. We present in this article a framework for the long-term digital preservation of 3-D data. This framework is based on two pertinent preservation practices, emulation and metadata, which ensure that the authenticity and usability, respectively, of a preserved digital object remain intact through time. An evaluation of our framework is presented which illustrates the viability of our approach in retaining accessibility, authenticity and usability for future end users.
Industrial and Engineering Applications of Artificial Intelligence and Expert Systems | 2004
Hongyu Guo; Herna L. Viktor
One of the difficulties of using Artificial Neural Networks (ANNs) to estimate atmospheric temperature is the large number of potential input variables available. In this study, four different feature extraction methods were used to reduce the input vector to train four networks to estimate temperature at different atmospheric levels. The four techniques used were: genetic algorithms (GA), coefficient of determination (CoD), mutual information (MI) and simple neural analysis (SNA). The results demonstrate that of the four methods used for this data set, mutual information and simple neural analysis can generate networks that have a smaller input parameter set, while still maintaining a high degree of accuracy.
European Conference on Machine Learning | 2016
Ali Pesaranghader; Herna L. Viktor
Decision makers increasingly require near-instant models to make sense of fast evolving data streams. Learning from such evolving environments is, however, a challenging task. This challenge is partially due to the fact that the distribution of data often changes over time, thus potentially leading to degradation in the overall performance. In particular, classification algorithms need to adapt their models after facing such distributional changes (also referred to as concept drifts). Usually, drift detection methods are utilized in order to accomplish this task. It follows that detecting concept drifts as soon as possible, while resulting in fewer false positives and false negatives, is a major objective of drift detectors. To this end, we introduce the Fast Hoeffding Drift Detection Method (FHDDM) which detects the drift points using a sliding window and Hoeffding’s inequality. FHDDM detects a drift when a significant difference between the maximum probability of correct predictions and the most recent probability of correct predictions is observed. Experimental results confirm that FHDDM detects drifts with shorter detection delay, fewer false positives and fewer false negatives than state-of-the-art methods.
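The detection rule described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `FHDDMSketch`, the default window size and confidence parameter, the use of the usual Hoeffding bound sqrt(ln(1/delta) / (2n)) as the threshold, and the reset after a detection are assumptions of the sketch.

```python
import math
from collections import deque
from typing import Iterable, Optional


class FHDDMSketch:
    """Sliding-window drift detector based on a Hoeffding bound (illustrative)."""

    def __init__(self, window_size: int = 25, delta: float = 1e-7) -> None:
        self.n = window_size
        # Threshold on the drop in windowed accuracy (assumed Hoeffding bound).
        self.epsilon = math.sqrt(math.log(1.0 / delta) / (2.0 * window_size))
        self.window = deque(maxlen=window_size)
        self.p_max = 0.0

    def add(self, correct: bool) -> bool:
        """Record one prediction outcome; return True if a drift is signalled."""
        self.window.append(1 if correct else 0)
        if len(self.window) < self.n:
            return False                      # wait until the window is full
        p = sum(self.window) / self.n         # most recent probability of correct predictions
        self.p_max = max(self.p_max, p)       # maximum probability seen so far
        if self.p_max - p >= self.epsilon:
            self.window.clear()               # assumed: restart after a detection
            self.p_max = 0.0
            return True
        return False


def first_drift(outcomes: Iterable[bool]) -> Optional[int]:
    """Index of the first outcome at which a drift is signalled, or None."""
    detector = FHDDMSketch()
    for i, ok in enumerate(outcomes):
        if detector.add(ok):
            return i
    return None


# Synthetic stream: 100 correct predictions followed by 50 errors.
drift_index = first_drift([True] * 100 + [False] * 50)
```

On this synthetic stream the detector stays silent during the correct predictions and signals a drift a few errors into the second segment, once the windowed accuracy has dropped by more than the threshold.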
IEEE Transactions on Instrumentation and Measurement | 2007
Eric Paquet; Herna L. Viktor
Computer-generated models of the human body generally do not adequately model the complex human morphology. These models, therefore, do not reflect the anthropometric realities and are not specific enough for commercial use. This paper presents an approach to adjust virtual models through the use of body measurements, as obtained from an anthropometric data set. In this approach, the measurements are used in grouping 3D body scans, which are obtained using precise optoelectronic measurement devices, into clusters. The virtual mannequins are then adjusted by using the measurements of the nearest cluster member. In this way, realistic, accurate virtual mannequins are created.
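The lookup of the nearest cluster member can be sketched as below. The abstract does not state the distance measure or how clusters are represented, so the Euclidean distance, the two-stage search (nearest centroid, then nearest member), the function name `nearest_member`, and the toy measurements are all assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Measurements = Tuple[float, ...]


def centroid(members: Sequence[Measurements]) -> Measurements:
    """Component-wise mean of a non-empty list of measurement vectors."""
    return tuple(sum(values) / len(members) for values in zip(*members))


def nearest_member(target: Measurements,
                   clusters: Dict[str, List[Measurements]]) -> Tuple[str, Measurements]:
    """Return the closest cluster (by centroid) and its member closest to target."""
    best = min(clusters, key=lambda name: math.dist(target, centroid(clusters[name])))
    member = min(clusters[best], key=lambda m: math.dist(target, m))
    return best, member


# Hypothetical (height in cm, waist in cm) measurements grouped into two clusters.
clusters = {
    "small": [(160.0, 70.0), (162.0, 72.0)],
    "large": [(185.0, 95.0), (190.0, 100.0)],
}
cluster_name, member = nearest_member((163.0, 73.0), clusters)
```

The measurements of the returned member would then serve as the reference for adjusting the virtual mannequin; how that adjustment is carried out is not covered by this sketch.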
NFMCP'14 Proceedings of the 3rd International Conference on New Frontiers in Mining Complex Patterns | 2014
Parinaz Sobhani; Herna L. Viktor; Stan Matwin
Imbalanced data, where the number of instances of one class is much higher than that of the others, are frequent in many domains such as fraud detection, telecommunications management, oil spill detection, and text classification. Traditional classifiers do not perform well when considering data that are susceptible to both within-class and between-class imbalances. In this paper, we propose the ClusFirstClass algorithm that employs cluster analysis to aid classifiers when aiming to build accurate models against such imbalanced datasets. In order to work with balanced classes, all minority instances are used together with the same number of majority instances. To further reduce the impact of within-class imbalance, majority instances are clustered into different groups and at least one instance is selected from each cluster. Experimental results demonstrate that our proposed ClusFirstClass algorithm yields promising results compared to the state-of-the-art classification approaches, when evaluated against a number of highly imbalanced datasets.
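The sampling step described above (as many majority instances as there are minority instances, with every majority cluster represented) can be sketched as follows. This is an illustration under assumptions, not the ClusFirstClass implementation: the clusters are taken as already computed, instances are drawn round-robin from shuffled clusters, and the name `balanced_majority_sample` is hypothetical.

```python
import random
from typing import Dict, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def balanced_majority_sample(clusters: Dict[Hashable, Sequence[T]],
                             n_minority: int, seed: int = 0) -> List[T]:
    """Pick n_minority majority instances, cycling over the clusters.

    Every non-empty cluster contributes at least one instance whenever
    n_minority is at least the number of non-empty clusters.
    """
    rng = random.Random(seed)
    pools = [list(members) for members in clusters.values() if members]
    for pool in pools:
        rng.shuffle(pool)                     # random choice within each cluster
    target = min(n_minority, sum(len(pool) for pool in pools))
    selected: List[T] = []
    while len(selected) < target:
        for pool in pools:                    # one instance per cluster per round
            if pool and len(selected) < target:
                selected.append(pool.pop())
    return selected


# Hypothetical majority class, pre-clustered into three groups of unequal size.
majority_clusters = {
    "a": list(range(1, 11)),   # 10 instances
    "b": [11, 12],
    "c": [13],
}
sample = balanced_majority_sample(majority_clusters, n_minority=5)
```

The returned sample would be combined with all minority instances to form a balanced training set; small clusters are kept in the sample even though they would rarely be picked by uniform random undersampling.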