Anna M. Manning
University of Manchester
Publication
Featured research published by Anna M. Manning.
Data Mining and Knowledge Discovery | 2008
Anna M. Manning; David J. Haglin; John A. Keane
A new algorithm, SUDA2, is presented which finds minimally unique itemsets, i.e., minimal itemsets of frequency one. These itemsets, referred to as Minimal Sample Uniques (MSUs), are important for statistical agencies that wish to estimate the risk of disclosure of their datasets. SUDA2 is a recursive algorithm which uses new observations about the properties of MSUs to prune and traverse the search space. Experimental comparisons with previous work demonstrate that SUDA2 is several orders of magnitude faster, enabling datasets with significantly more columns to be addressed. The ability of SUDA2 to identify the boundaries of the search space for MSUs is clearly demonstrated.
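To make the definition of a Minimal Sample Unique concrete, here is a minimal brute-force sketch in Python. It is not SUDA2: it uses none of the pruning or recursive traversal the abstract describes and simply enumerates column combinations in order of increasing size. The function name `minimal_sample_uniques`, the `max_size` parameter, and the toy data are all illustrative assumptions.

```python
from itertools import combinations


def minimal_sample_uniques(rows, max_size=None):
    """Brute-force MSU search (illustration only, NOT the SUDA2 algorithm).

    rows: list of equal-length tuples, one tuple per record.
    Returns a list of (row_index, itemset) pairs, where itemset is a
    frozenset of (column_index, value) pairs that occurs in exactly one
    record and has no proper subset that is also unique.
    """
    if not rows:
        return []
    n_cols = len(rows[0])
    max_size = max_size or n_cols
    msus = []
    # Visiting sizes in increasing order means every smaller MSU is already
    # known when a larger unique itemset is tested for minimality.
    for size in range(1, max_size + 1):
        for cols in combinations(range(n_cols), size):
            # Group record indices by their values on the chosen columns.
            groups = {}
            for i, row in enumerate(rows):
                key = tuple(row[c] for c in cols)
                groups.setdefault(key, []).append(i)
            for key, idxs in groups.items():
                if len(idxs) != 1:
                    continue  # frequency > 1, so not unique
                itemset = frozenset(zip(cols, key))
                row_id = idxs[0]
                # A unique proper subset can only occur in this same record,
                # and it would contain an MSU that was found earlier.
                if any(r == row_id and prev < itemset for r, prev in msus):
                    continue
                msus.append((row_id, itemset))
    return msus


if __name__ == "__main__":
    toy = [
        ("a", "x", "1"),
        ("a", "y", "1"),
        ("b", "x", "1"),
        ("a", "x", "2"),
    ]
    for row_id, itemset in minimal_sample_uniques(toy):
        print(row_id, sorted(itemset))
```

The cost of this enumeration grows exponentially with the number of columns, which is the kind of blow-up that the pruning in SUDA2 is meant to avoid.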
Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2001
Anna M. Manning; John A. Keane
Association rule discovery techniques have gradually been adapted to parallel systems in order to take advantage of the higher speed and greater storage capacity that they offer. The transition to a distributed memory system requires the partitioning of the database among the processors, a procedure that is generally carried out indiscriminately. However, for some techniques the nature of the database partitioning can have a pronounced impact on execution time. This paper focuses on one such algorithm, Fast Parallel Mining (FPM). A new algorithm, Data Allocation Algorithm (DAA), is presented that uses Principal Component Analysis to improve the data distribution prior to FPM.
European Conference on Principles of Data Mining and Knowledge Discovery | 1997
Anna M. Manning; Andy Brass; Carole A. Goble; John A. Keane
In biological sequence analysis, many DNA and RNA sequences discovered in laboratory experiments are not properly identified. Here the focus is on using clustering algorithms to provide a structure to the data. The approach is inter-disciplinary, using domain knowledge to identify such sequences. The enormous volume and high dimensionality of unidentified biological sequence data present a challenge. Nonetheless, useful and interesting results have been obtained, both directly and indirectly, by applying clustering to the data.
International Conference on Cluster Computing | 2008
Paraskevas Yiapanis; David J. Haglin; Anna M. Manning; Ken Mayes; John A. Keane
SUDA2 is a recursive search algorithm for minimal unique itemset detection. Such sets of items are formed via combinations of non-obvious attributes enabling individual record identification. The nature of SUDA2 allows work to be divided into non-overlapping tasks enabling parallel execution. Earlier work developed a parallel implementation for SUDA2 on an SMP cluster, and this was found to be several orders of magnitude faster than sequential SUDA2. However, if fixed-granularity parallel tasks are scheduled naively in the order of their generation, the system load tends to be imbalanced with little work at the beginning and end of the search. This paper investigates the effectiveness of variable-grained and dynamic work generation strategies for parallel SUDA2. These methods restrict the number of sub-tasks to be generated, based on the criterion of probable work size. Tasks become smaller the further the search descends in the recursion tree, so only the largest tasks at each level of recursion are selected as suitable for scheduling. The revised algorithm runs approximately twice as fast as the existing parallel SUDA2 for finer levels of granularity when variable-grained work generation is applied. The dynamic method, performing level-wise task selection based on size, outperforms the other techniques investigated.
European Conference on Parallel Processing | 1999
Anna M. Manning; John A. Keane
Many association rule algorithms operate in a parallel environment where the database is divided up among a number of processors, a procedure which is usually carried out indiscriminately. The nature of the database partitioning can affect both the number of candidate sets produced and the workload at each processor. This paper demonstrates that Principal Component Analysis can be used successfully to help arrange the records of a database among processors so that efficient load balancing is enabled and candidate set duplication minimised.
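The abstract does not say how the principal components are turned into a record arrangement, so the following Python sketch shows only one plausible reading. It projects each record onto the first principal component, sorts the records by that score, and cuts the sorted order into near-equal contiguous blocks, one per processor. This is not the authors' published method. The names `first_principal_component` and `allocate`, the use of power iteration, and the contiguous-block split are all assumptions made for illustration.

```python
import random


def first_principal_component(data, iters=200, seed=0):
    """Estimate the leading eigenvector of the sample covariance matrix
    by power iteration. data: list of equal-length numeric rows (>= 2 rows).
    Returns (column_means, unit_vector)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break  # zero variance along v; keep the current vector
        v = [x / norm for x in w]
    return means, v


def allocate(data, n_procs):
    """Assumed allocation scheme: sort records by their score on the first
    principal component, then give each processor one contiguous block.
    Returns a list of n_procs lists of record indices."""
    means, v = first_principal_component(data)
    d = len(v)
    scores = [sum((row[j] - means[j]) * v[j] for j in range(d)) for row in data]
    order = sorted(range(len(data)), key=scores.__getitem__)
    base, extra = divmod(len(order), n_procs)
    partitions, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)  # block sizes differ by at most 1
        partitions.append(order[start:start + size])
        start += size
    return partitions


if __name__ == "__main__":
    # Toy transaction matrix: rows are records, columns are items (1 = present).
    transactions = [
        [1, 1, 0, 0],
        [0, 0, 1, 1],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
    ]
    print(allocate(transactions, 2))
```

Contiguous blocks place similar records on the same processor. Dealing the sorted records out round-robin would instead spread similar records evenly, and the abstract alone does not settle which arrangement the paper uses.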
DMIN | 2007
David J. Haglin; Anna M. Manning
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems | 2002
Mark Elliot; Anna M. Manning; Rupert W. Ford
Concurrency and Computation: Practice and Experience | 2009
David J. Haglin; Kenneth R. Mayes; Anna M. Manning; John Feo; John R. Gurd; Mark Elliot; John A. Keane
Proceedings of GSS Methodology Conference, London | 2001
Mark Elliot; Anna M. Manning
Archive | 2006
Kenneth R. Mayes; Mark Elliot; Anna M. Manning; David J. Haglin; John R. Gurd