John A. Doucette
University of Waterloo
Publications
Featured research published by John A. Doucette.
Genetic Programming and Evolvable Machines | 2012
John A. Doucette; Andrew R. McIntyre; Peter Lichodzijewski; Malcolm I. Heywood
Classification under large attribute spaces represents a dual learning problem in which attribute subspaces need to be identified at the same time as the classifier design is established. Embedded, as opposed to filter or wrapper, methodologies address both tasks simultaneously. The motivation for this work stems from the observation that team-based approaches to Genetic Programming (GP) have the potential to design multiple classifiers per class, each with a potentially unique attribute subspace, without recourse to filter- or wrapper-style preprocessing steps. Specifically, competitive coevolution provides the basis for scaling the algorithm to data sets with large instance counts, whereas cooperative coevolution provides a framework for problem decomposition under a bid-based model for establishing program context. Symbiosis is used to separate the task of team/ensemble composition from the design of specific team members. Team composition is specified in terms of a combinatorial search performed by a Genetic Algorithm (GA), whereas the properties of individual team members, and therefore subspace identification, are established under an independent GP population. Teaming implies that the members of the resulting ensemble of classifiers should have explicitly non-overlapping behaviour. Performance evaluation is conducted over data sets taken from the UCI repository with 649–102,660 attributes and 2–10 classes. The resulting teams identify attribute spaces 1–4 orders of magnitude smaller than the original data set. Moreover, team members generally consist of fewer than 10 instructions; thus, small attribute subspaces are not being traded for opaque models.
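To make the bid-based teaming idea concrete, here is a minimal sketch (an illustration, not the authors' implementation; in SBB each bid would be a short evolved linear-GP program rather than a Python callable) of how a team classifies an exemplar: each member bids using only its own small attribute subspace, and the highest bidder's class label wins.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class TeamMember:
    """One evolved program: a bid over a small attribute subspace plus a class label."""
    attribute_subspace: Sequence[int]   # indices of the few attributes this member reads
    bid_fn: Callable[[Sequence[float]], float]  # stand-in for the evolved GP program
    label: int

def team_predict(team: List[TeamMember], x: Sequence[float]) -> int:
    """Winner-take-all bidding: the member producing the highest bid sets the class."""
    best = max(team, key=lambda m: m.bid_fn([x[i] for i in m.attribute_subspace]))
    return best.label

# Toy usage: two members, each indexing a tiny subspace of a 10,000-attribute vector.
team = [
    TeamMember([3, 17], lambda v: v[0] - v[1], label=0),
    TeamMember([42],    lambda v: v[0],        label=1),
]
x = [0.0] * 10_000
x[42] = 0.9
print(team_predict(team, x))  # -> 1
```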
European Conference on Genetic Programming | 2010
John A. Doucette; Malcolm I. Heywood
We present an empirical analysis of the effects of incorporating novelty-based fitness (phenotypic behavioral diversity) into Genetic Programming with respect to training, test, and generalization performance. Three novelty-based approaches are considered: novelty comparison against a finite archive of behavioral archetypes, novelty comparison against all previously seen behaviors, and a simple linear combination of the first method with a standard fitness measure. Performance is evaluated on the Santa Fe Trail, a well-known GP benchmark selected for its deceptiveness and established generalization test procedures. Results are compared to a standard quality-based fitness function (count of food eaten). Ultimately, the quality-style objective provided better overall performance; however, solutions identified under novelty-based fitness functions generally provided much better test performance than their corresponding training performance. This is interpreted as representing a requirement for layered learning/symbiosis when assuming novelty-based fitness functions, in order to more quickly achieve the integration of diverse behaviors into a single cohesive strategy.
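A hedged sketch of the archive-based novelty measure described above: novelty is scored as the mean distance to the k nearest behaviors in an archive, and the third variant blends it linearly with a quality score. The distance metric, k, and weight w are illustrative choices, not the study's exact settings.

```python
import math
from typing import List, Sequence

def novelty(behavior: Sequence[float], archive: List[Sequence[float]], k: int = 5) -> float:
    """Mean distance to the k nearest archived behaviors (higher = more novel)."""
    if not archive:
        return float("inf")
    dists = sorted(math.dist(behavior, b) for b in archive)
    return sum(dists[:k]) / min(k, len(dists))

def blended_fitness(quality: float, nov: float, w: float = 0.5) -> float:
    """Third variant from the paper: a simple linear combination of quality
    (e.g. count of food eaten on the Santa Fe Trail) and novelty.
    The weight w is an illustrative parameter, not a value from the study."""
    return w * quality + (1.0 - w) * nov
```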
ACM Transactions on Intelligent Systems and Technology | 2013
Atif Khan; John A. Doucette; Robin Cohen
In this article, we begin by presenting OMeD, a medical decision support system, and argue for its value over purely probabilistic approaches that reason about patients in time-critical decision scenarios. We then progress to present Holmes, a Hybrid Ontological and Learning MEdical System that supports decision making about patient treatment. This system is introduced in order to cope with the case of missing data. We demonstrate its effectiveness by operating on an extensive set of real-world patient health data from the CDC, applied to the decision-making scenario of administering sleeping pills. In particular, we clarify how the combination of semantic, ontological representations and probabilistic reasoning enables the proposal of effective patient treatments. Our focus is thus on presenting an approach for interpreting medical data in the context of real-time decision making. This constitutes a comprehensive framework for the design of medical recommendation systems for potential use by both medical professionals and patients, with the end result being personalized patient treatment. We conclude with a discussion of the value of our particular approach for such diverse considerations as coping with misinformation provided by patients, performing effectively in time-critical environments where real-time decisions are necessary, and potential applications facilitating patient information gathering.
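As a rough illustration of the hybrid design (the field names, rule, and fallback probability below are invented for the example, not drawn from Holmes itself): answer from the rule base when the needed facts are present, and fall back to a learned probabilistic estimate when they are missing.

```python
from typing import Dict, Optional

def ontological_answer(patient: Dict[str, object]) -> Optional[bool]:
    """Stand-in for rule-based reasoning: returns None when required facts are missing."""
    needed = ("age", "current_medications")
    if any(patient.get(f) is None for f in needed):
        return None  # cannot fire the rule: missing data
    # Hypothetical rule: avoid sleeping pills when an interacting drug is present.
    return "warfarin" not in patient["current_medications"]

def learned_answer(patient: Dict[str, object]) -> bool:
    """Stand-in for a probabilistic model trained on population data (e.g. CDC records)."""
    p_safe = 0.7  # a trained model would condition on whatever fields are available
    return p_safe > 0.5

def hybrid_recommend(patient: Dict[str, object]) -> bool:
    answer = ontological_answer(patient)
    return answer if answer is not None else learned_answer(patient)
```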
International Health Informatics Symposium | 2012
John A. Doucette; Atif Khan; Robin Cohen
Modern medical decision-making systems require users to manually collect and process information from distributed and heterogeneous repositories to facilitate the decision-making process. There are many factors (such as time, volume of information, and technical ability) that can potentially compromise the quality of decisions made for patients. In this work we demonstrate and evaluate a new medical decision-making support system, called OMeD, which automatically answers medical queries in real time by collecting and processing medical information. OMeD utilizes a natural-language-like user interface (for querying) and semantic web techniques (for knowledge representation and reasoning) to answer queries. We compare OMeD to a set of standard machine learning techniques across a series of benchmarks based on simulated patient data. The conventional techniques attempt to learn the answer to a query by analyzing simulated patient records. The sparsity of the simulated data leads conventional techniques to frequently misidentify the relationships between medical concepts. In contrast, OMeD is able to reliably provide correct answers to queries. Unlike conventional automated decision support systems, OMeD also generates independently verifiable proofs for its answers, providing healthcare workers with confidence in the system's recommendations.
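The idea of an independently verifiable proof can be sketched with a toy forward-chaining reasoner over subject-predicate-object triples that records which facts and rule produced each derived conclusion. This is a drastic simplification of OMeD's semantic-web machinery; the drug names and the symmetry rule are invented for the example.

```python
# Facts are (subject, predicate, object) triples; each derived fact carries
# the premises that produced it, giving a checkable proof trace.
facts = {("aspirin", "interactsWith", "warfarin")}
proofs = {f: ["asserted"] for f in facts}

def symmetric_interaction(fact):
    """One illustrative rule: drug interaction is symmetric."""
    s, p, o = fact
    return (o, p, s) if p == "interactsWith" else None

# Forward chaining to a fixed point.
changed = True
while changed:
    changed = False
    for f in list(facts):
        g = symmetric_interaction(f)
        if g and g not in facts:
            facts.add(g)
            proofs[g] = [f"derived from {f} by symmetry of interactsWith"]
            changed = True

print(("warfarin", "interactsWith", "aspirin") in facts)  # True
print(proofs[("warfarin", "interactsWith", "aspirin")])   # the verifiable justification
```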
Archive | 2010
John A. Doucette; Peter Lichodzijewski; Malcolm I. Heywood
Model building under the supervised learning domain potentially faces a dual learning problem: identifying both the parameters of the model and the subset of (domain) attributes necessary to support the model, thus implying an embedded as opposed to wrapper- or filter-based design. Genetic Programming (GP) has always addressed this dual problem; however, further implicit assumptions are made which potentially increase the complexity of the resulting solutions. In this work we are specifically interested in the case of classification under very large attribute spaces. As such, it might be expected that multiple independent/overlapping attribute subspaces support the mapping to class labels, whereas GP approaches to classification generally assume a single binary classifier per class, forcing the model to provide a solution in terms of a single attribute subspace and a single mapping to class labels. Supporting the more general goal is considered a requirement for identifying a ‘team’ of classifiers with non-overlapping classifier behaviors, in which each classifier responds to different subsets of exemplars. Moreover, the subset of attributes associated with each team member might utilize a unique ‘subspace’ of attributes. This work investigates the utility of coevolutionary model building for the case of classification problems with attribute vectors consisting of 650 to 100,000 dimensions. The resulting team-based coevolutionary method, Symbiotic Bid-Based (SBB) GP, is compared to the alternative embedded classifier approaches of C4.5 and Maximum Entropy Classification (MaxEnt). SBB solutions demonstrate up to an order of magnitude lower attribute count relative to C4.5 and up to two orders of magnitude lower attribute count than MaxEnt, while retaining comparable or better classification performance. Moreover, relative to the attribute count of individual models participating within a team, no more than six attributes are ever utilized, adding a further level of simplicity to the resulting solutions.
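The attribute-count comparison reduces to measuring the union of attribute indices a team actually references; a toy illustration (the index sets are invented for the example):

```python
from typing import Iterable, Set

def team_attribute_subspace(members: Iterable[Set[int]]) -> Set[int]:
    """Union of the attribute indices referenced by each team member's program."""
    used: Set[int] = set()
    for indices in members:
        used |= indices
    return used

# A team whose three members each index at most a handful of attributes of a
# 100,000-dimensional input touches very few attributes overall.
team = [{3, 17, 88}, {42}, {3, 9_001}]
print(len(team_attribute_subspace(team)))  # 5 of 100,000
```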
International Conference on Machine Learning and Applications | 2012
Atif Khan; John A. Doucette; Robin Cohen; Daniel J. Lizotte
In this paper, we present a framework which enables medical decision making in the presence of partial information. At its core is ontology-based automated reasoning; machine learning techniques are integrated to enhance existing patient datasets in order to address the issue of missing data. Our approach supports interoperability between different health information systems. This is clarified in a sample implementation that combines three separate datasets (patient data, drug-drug interactions, and drug prescription rules) to demonstrate the effectiveness of our algorithms in producing effective medical decisions. In short, we demonstrate the potential for machine learning to support a task where there is a critical need from medical professionals, by coping with missing or noisy patient data and enabling the use of multiple medical datasets.
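A minimal sketch of the impute-then-reason pipeline, assuming scikit-learn for the learning step; the threshold rule and field choices are invented for illustration, and richer models than mean imputation could fill the gaps.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Patient records with missing entries (np.nan): [age, systolic_bp].
X = np.array([
    [63.0, 150.0],
    [41.0, np.nan],   # missing blood pressure
    [np.nan, 128.0],  # missing age
])

# Learning step: fill gaps from the dataset before rule-based reasoning.
X_complete = SimpleImputer(strategy="mean").fit_transform(X)

# Rule step on the completed records; this threshold rule is hypothetical.
for age, bp in X_complete:
    print("flag for review" if age > 60 and bp > 140 else "ok")
```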
Privacy Enhancing Technologies | 2016
Tariq Elahi; John A. Doucette; Hadi Hosseini; Steven J. Murdoch; Ian Goldberg
We present a game-theoretic analysis of optimal solutions for interactions between censors and censorship resistance systems (CRSs) by focusing on the data channel used by the CRS to smuggle clients’ data past the censors. This analysis leverages the inherent errors (false positives and negatives) made by the censor when trying to classify traffic as either non-circumvention traffic or as CRS traffic, as well as the underlying rate of CRS traffic. We identify Nash equilibrium solutions for several simple censorship scenarios and then extend those findings to more complex scenarios where we find that the deployment of a censorship apparatus does not qualitatively change the equilibrium solutions, but rather only affects the amount of traffic a CRS can support before being blocked. By leveraging these findings, we describe a general framework for exploring and identifying optimal strategies for the censorship circumventor, in order to maximize the amount of CRS traffic not blocked by the censor. We use this framework to analyze several scenarios with multiple data-channel protocols used as cover for the CRS. We show that it is possible to gain insights through this framework even without perfect knowledge of the censor’s (secret) values for the parameters in their utility function.
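To illustrate the kind of trade-off being analyzed (with made-up payoffs and error rates, not the paper's parameters): a censor deciding whether to block flagged traffic must weigh true positives against collateral damage from false positives, and the sign of the expected utility flips with the base rate of CRS traffic.

```python
def censor_utility_of_blocking(base_rate, tpr, fpr, gain=1.0, collateral_cost=5.0):
    """Expected utility to the censor of blocking flagged traffic.

    base_rate: fraction of all traffic that is CRS traffic
    tpr, fpr:  the censor's classifier true/false positive rates
    gain, collateral_cost: illustrative payoffs, not values from the paper
    """
    blocked_crs = base_rate * tpr           # circumvention traffic caught
    blocked_legit = (1 - base_rate) * fpr   # legitimate traffic blocked by mistake
    return gain * blocked_crs - collateral_cost * blocked_legit

# Sweep the CRS traffic rate: below some threshold, blocking has negative
# utility and the censor's best response is to allow the traffic through.
for rate in (0.001, 0.01, 0.05, 0.20):
    u = censor_utility_of_blocking(rate, tpr=0.9, fpr=0.02)
    print(f"base rate {rate:>5}: utility of blocking = {u:+.4f}")
```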
ACM Transactions on Intelligent Systems and Technology | 2016
John A. Doucette; Graham Pinhey; Robin Cohen
In this article, we present a distributed algorithm for allocating resources to tasks in multiagent systems, one that adapts well to dynamic task arrivals where new work arises at short notice. Our algorithm is designed to leverage preemption if it is available, revoking resource allocations from tasks in progress if new opportunities arise that those resources are better suited to handle. Our multiagent model assigns a task agent to each task that must be completed and a proxy agent to each resource that is available. Preemption occurs when a task agent approaches a proxy agent with a sufficiently compelling need that the proxy agent determines the newcomer derives more benefit from the proxy agent’s resource than the task agent currently using that resource. Task agents reason about which resources to request based on learned models of churn and congestion. We compare against a well-established multiagent resource allocation framework that permits preemption under more conservative assumptions, and show through simulation that our model allows for improved allocations through more permissive preemption. In all, we offer a novel approach for multiagent resource allocation that is able to cope well with dynamic task arrivals.
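A hedged sketch of the core preemption decision: a proxy agent revokes its resource from the current task only when a newcomer's reported benefit exceeds the incumbent's by some margin. The data shape and the margin value are illustrative, not the paper's mechanism design.

```python
from dataclasses import dataclass

@dataclass
class TaskRequest:
    task_id: str
    benefit: float  # the task agent's valuation of holding this resource

def should_preempt(current: TaskRequest, newcomer: TaskRequest,
                   margin: float = 0.1) -> bool:
    """Proxy agent's decision: preempt only if the newcomer derives
    sufficiently more benefit than the task currently holding the resource.
    The margin discourages thrashing; its value here is illustrative."""
    return newcomer.benefit > current.benefit * (1.0 + margin)

print(should_preempt(TaskRequest("t1", 3.0), TaskRequest("t2", 3.5)))  # True
print(should_preempt(TaskRequest("t1", 3.0), TaskRequest("t2", 3.1)))  # False
```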
Congress on Evolutionary Computation | 2011
John A. Doucette; Malcolm I. Heywood
Evolutionary methods for addressing the temporal sequence learning problem generally fall into policy search as opposed to value function optimization approaches. Various recent results have claimed that the policy search approach is at best inefficient at solving episodic ‘goal seeking’ tasks, i.e., tasks in which the reward is limited to describing properties associated with a successful outcome and provides no qualification for degrees of failure. This work demonstrates that such a conclusion is due to a lack of diversity in the training scenarios. We therefore return to the Acrobot ‘height’ task domain originally used to demonstrate complete failure in evolutionary policy search. This time, a very simple stochastic sampling heuristic for defining a population of training configurations is introduced. Benchmarking two recent evolutionary policy search algorithms, NeuroEvolution of Augmenting Topologies (NEAT) and Symbiotic Bid-Based (SBB) Genetic Programming, under this condition demonstrates solutions as effective as those returned by advanced value function methods. Moreover, this is achieved while remaining within the evaluation limit imposed by the original study.
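The stochastic sampling heuristic amounts to drawing each training episode's initial state at random rather than always starting from one fixed configuration; a minimal sketch (the state ranges are illustrative, not the study's exact bounds):

```python
import math
import random

def sample_acrobot_start(rng: random.Random):
    """Draw a random initial configuration for one training episode.
    The point is that each episode starts from a different state,
    diversifying the training scenarios seen by the evolving policies."""
    return {
        "theta1": rng.uniform(-math.pi, math.pi),
        "theta2": rng.uniform(-math.pi, math.pi),
        "dtheta1": rng.uniform(-1.0, 1.0),
        "dtheta2": rng.uniform(-1.0, 1.0),
    }

rng = random.Random(0)
training_configs = [sample_acrobot_start(rng) for _ in range(50)]
```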
Physical Review A | 2013
Catherine Holloway; John A. Doucette; Christopher Erven; Jean-Philippe Bourgoin; Thomas Jennewein
In entanglement-based quantum key distribution (QKD), the generation and detection of multi-photon modes lead to a trade-off between entanglement visibility and two-fold coincidence events when maximizing the secure key rate (SKR). We produce a predictive model for the optimal two-fold coincidence probability per coincidence window, given the channel efficiency and detector dark count rate of a given system. This model is experimentally validated and used in simulations of QKD with satellites as well as optical fibers.
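The optimization can be caricatured numerically: sweep the per-window two-fold coincidence probability q, model how the error rate grows with multi-photon events while dark counts dominate at low q, and keep the value maximizing a secure-key-rate proxy. The error model and the textbook asymptotic rate R ≈ Q · [1 − 2·H2(e)] below are stand-ins, not the paper's model; all coefficients are invented for the example.

```python
import math

def h2(p: float) -> float:
    """Binary entropy function."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def qber(q: float, dark: float = 1e-5, multi: float = 0.5) -> float:
    """Toy error model: dark-count accidentals dominate at low coincidence
    probability q, multi-photon emissions dominate at high q."""
    return 0.5 * dark / (q + dark) + multi * q

def skr_proxy(q: float) -> float:
    """Textbook-style asymptotic rate Q * [1 - 2 * h2(e)] as a stand-in
    for the paper's secure key rate model."""
    return max(0.0, q * (1.0 - 2.0 * h2(qber(q))))

# Sweep q and report the maximizing two-fold coincidence probability per window.
qs = [i * 1e-4 for i in range(1, 2000)]
best = max(qs, key=skr_proxy)
print(f"optimal q ~ {best:.4f}, proxy rate {skr_proxy(best):.4f}")
```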