Ryan R. Curtin
Symantec
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Ryan R. Curtin.
similarity search and applications | 2016
Ryan R. Curtin; Andrew B. Gardner
We present a novel strategy for approximate furthest neighbor search that selects a candidate set using the data distribution. This strategy leads to an algorithm, which we call DrusillaSelect, that is able to outperform existing approximate furthest neighbor strategies. Our strategy is motivated by an empirical study of the behavior of the furthest neighbor search problem, which lends intuition for where our algorithm is most useful. We also present a variant of the algorithm that gives an absolute approximation guarantee; under some assumptions, the guaranteed approximation can be achieved in provably less time than brute-force search. Performance studies indicate that DrusillaSelect can achieve comparable levels of approximation to other algorithms while giving up to an order of magnitude speedup. An implementation is available in the mlpack machine learning library (found at http://www.mlpack.org).
international conference on signal processing and communication systems | 2017
Conrad Sanderson; Ryan R. Curtin
Modelling of multivariate densities is a core component in many signal processing, pattern recognition and machine learning applications. The modelling is often done via Gaussian mixture models (GMMs), which use computationally expensive and potentially unstable training algorithms. We provide an overview of a fast and robust implementation of GMMs in the C++ language, employing multi-threaded versions of the Expectation Maximisation (EM) and k-means training algorithms. Multi-threading is achieved through reformulation of the EM and k-means algorithms into a MapReduce-like framework. Furthermore, the implementation uses several techniques to improve numerical stability and modelling accuracy. We demonstrate that the multi-threaded implementation achieves a speedup of an order of magnitude on a recent 16 core machine, and that it can achieve higher modelling accuracy than a previously well-established publically accessible implementation. The multi-threaded implementation is included as a user-friendly class in recent releases of the open source Armadillo C++ linear algebra library. The library is provided under the permissive Apache 2.0 license, allowing unencumbered use in commercial products.
Journal of Social Structure | 2017
Conrad Sanderson; Ryan R. Curtin
Statistical modelling of multivariate data through a convex mixture of Gaussians, also known as a Gaussian mixture model (GMM), has many applications in fields such as signal processing, econometrics, and pattern recognition (Bishop 2006). Each component (Gaus-sian) in a GMM is parameterised with a weight, mean vector (centroid), and covariance matrix.
international congress on mathematical software | 2018
Conrad Sanderson; Ryan R. Curtin
When implementing functionality which requires sparse matrices, there are numerous storage formats to choose from, each with advantages and disadvantages. To achieve good performance, several formats may need to be used in one program, requiring explicit selection and conversion between the formats. This can be both tedious and error-prone, especially for non-expert users. Motivated by this issue, we present a user-friendly sparse matrix class for the C++ language, with a high-level application programming interface deliberately similar to the widely used MATLAB language. The class internally uses two main approaches to achieve efficient execution: (i) a hybrid storage framework, which automatically and seamlessly switches between three underlying storage formats (compressed sparse column, coordinate list, Red-Black tree) depending on which format is best suited for specific operations, and (ii) template-based meta-programming to automatically detect and optimise execution of common expression patterns. To facilitate relatively quick conversion of research code into production environments, the class and its associated functions provide a suite of essential sparse linear algebra functionality (eg., arithmetic operations, submatrix manipulation) as well as high-level functions for sparse eigendecompositions and linear equation solvers. The latter are achieved by providing easy-to-use abstractions of the low-level ARPACK and SuperLU libraries. The source code is open and provided under the permissive Apache 2.0 license, allowing unencumbered use in commercial products.
Journal of Social Structure | 2018
Ryan R. Curtin; Marcus Edel; Mikhail Lozhnikov; Yannis Mentekidis; Sumedh Ghaisas; Shangtong Zhang
In the past several years, the field of machine learning has seen an explosion of interest and excitement, with hundreds or thousands of algorithms developed for different tasks every year. But a primary problem faced by the field is the ability to scale to larger and larger data—since it is known that training on larger datasets typically produces better results (Halevy, Norvig, and Pereira 2009). Therefore, the development of new algorithms for the continued growth of the field depends largely on the existence of good tooling and libraries that enable researchers and practitioners to quickly prototype and develop solutions (Sonnenburg et al. 2007). Simultaneously, useful libraries must also be efficient and well-implemented. This has motivated our development of mlpack.
Information Systems | 2018
Ryan R. Curtin; Javier Echauz; Andrew B. Gardner
Abstract We present a novel strategy for approximate furthest neighbor search that selects a set of candidate points using the data distribution. This strategy leads to an algorithm, which we call DrusillaSelect , that is able to outperform existing approximate furthest neighbor strategies. Our strategy is motivated by a study of the behavior of the furthest neighbor search problem, which has significantly different structure than the nearest neighbor search problem, and can be understood with the help of an information-theoretic hardness measure that we introduce. We also present a variant of the algorithm that gives an absolute approximation guarantee; under some assumptions, the guaranteed approximation can be achieved in provably less time than brute-force search. Performance studies indicate that DrusillaSelect can achieve comparable levels of approximation to other algorithms, even on the hardest datasets, while giving up to an order of magnitude speedup. An implementation is available in the mlpack machine learning library (found at http://www.mlpack.org ).
Journal of Social Structure | 2016
Conrad Sanderson; Ryan R. Curtin
Archive | 2017
Ryan R. Curtin; Marcus Edel
arXiv: Mathematical Software | 2018
Shikhar Bhardwaj; Ryan R. Curtin; Marcus Edel; Yannis Mentekidis; Conrad Sanderson
Archive | 2018
Ryan R. Curtin; Marcus Edel; Mikhail Lozhnikov; Yannis Mentekidis; Sumedh Ghaisas; Shangtong Zhang