Jaroslav M. Fowkes
University of Edinburgh
Publication
Featured research published by Jaroslav M. Fowkes.
Journal of Global Optimization | 2013
Jaroslav M. Fowkes; Nicholas I. M. Gould; Chris L. Farmer
We present a branch and bound algorithm for the global optimization of a twice differentiable nonconvex objective function with a Lipschitz continuous Hessian over a compact, convex set. The algorithm is based on applying cubic regularisation techniques to the objective function within an overlapping branch and bound algorithm for convex constrained global optimization. Unlike other branch and bound algorithms, lower bounds are obtained via nonconvex underestimators of the function. For a numerical example, we apply the proposed branch and bound algorithm to radial basis function approximations.
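The nonconvex underestimators mentioned in the abstract above can be illustrated in one dimension. The following is a minimal sketch, not the authors' implementation: it assumes the Hessian (here the second derivative) is Lipschitz continuous with a known constant, and uses sin as a hypothetical objective whose third derivative is bounded by 1. The function name `cubic_lower_bound` and all parameters are illustrative.

```python
import math

def cubic_lower_bound(f, df, d2f, lipschitz_hessian, centre):
    """Return a cubic function that underestimates f everywhere.

    Assumes |d2f(x) - d2f(y)| <= lipschitz_hessian * |x - y| for all x, y.
    """
    f_c, g_c, h_c = f(centre), df(centre), d2f(centre)

    def bound(x):
        s = x - centre
        # Second-order Taylor model minus the worst-case cubic remainder.
        return f_c + g_c * s + 0.5 * h_c * s * s - (lipschitz_hessian / 6.0) * abs(s) ** 3

    return bound

# Hypothetical objective: sin has |f'''| <= 1, so the Lipschitz constant is 1.
lower = cubic_lower_bound(math.sin, math.cos, lambda x: -math.sin(x), 1.0, centre=0.5)

# Compare the underestimator with f on sample points in [-1.5, 2.5].
grid = [0.5 + (i - 50) * 0.04 for i in range(101)]
gaps = [math.sin(x) - lower(x) for x in grid]
```

Every entry of `gaps` is nonnegative up to rounding, and the gap is zero at the centre. In a branch and bound setting, the minimum of such an underestimator over a subregion gives a lower bound on the objective there.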
Knowledge Discovery and Data Mining | 2016
Jaroslav M. Fowkes; Charles A. Sutton
Recent sequential pattern mining methods have used the minimum description length (MDL) principle to define an encoding scheme which describes an algorithm for mining the most compressing patterns in a database. We present a novel subsequence interleaving model based on a probabilistic model of the sequence database, which allows us to search for the most compressing set of patterns without designing a specific encoding scheme. Our proposed algorithm is able to efficiently mine the most relevant sequential patterns and rank them using an associated measure of interestingness. The efficient inference in our model is a direct result of our use of a structural expectation-maximization framework, in which the expectation-step takes the form of a submodular optimization problem subject to a coverage constraint. We show on both synthetic and real-world datasets that our model mines a set of sequential patterns with low spuriousness and redundancy, high interpretability and usefulness in real-world applications. Furthermore, we demonstrate that the quality of the patterns from our approach is comparable to, if not better than, existing state-of-the-art sequential pattern mining algorithms.
Foundations of Software Engineering | 2016
Jaroslav M. Fowkes; Charles A. Sutton
Existing API mining algorithms can be difficult to use as they require expensive parameter tuning and the returned set of API calls can be large, highly redundant and difficult to understand. To address this, we present PAM (Probabilistic API Miner), a near parameter-free probabilistic algorithm for mining the most interesting API call patterns. We show that PAM significantly outperforms both MAPO and UPMiner, achieving 69% test-set precision, at retrieving relevant API call sequences from GitHub. Moreover, we focus on libraries for which the developers have explicitly provided code examples, yielding over 300,000 LOC of hand-written API example code from the 967 client projects in the data set. This evaluation suggests that the hand-written examples actually have limited coverage of real API usages.
Journal of Global Optimization | 2015
Coralia Cartis; Jaroslav M. Fowkes; Nicholas I. M. Gould
We present improvements to branch and bound techniques for globally optimizing functions with Lipschitz continuity properties by developing novel bounding procedures and parallelisation strategies. The bounding procedures involve nonconvex quadratic or cubic lower bounds on the objective and use estimates of the spectrum of the Hessian or derivative tensor, respectively. As the nonconvex lower bounds are only tractable if solved over Euclidean balls, we implement them in the context of a recent branch and bound algorithm (Fowkes et al. in J Glob Optim 56:1791–1815, 2013) that uses overlapping balls. Compared to the rectangular tessellations of traditional branch and bound, overlapping ball coverings result in an increased number of subproblems that need to be solved and hence make the need for their parallelization even more pressing and challenging. We develop parallel variants based on both data- and task-parallel paradigms, which we test on an HPC cluster on standard test problems with promising results.
IEEE Transactions on Software Engineering | 2017
Jaroslav M. Fowkes; Pankajan Chanthirasegaran; Razvan Ranca; Miltiadis Allamanis; Mirella Lapata; Charles A. Sutton
Developers spend much of their time reading and browsing source code, raising new opportunities for summarization methods. Indeed, modern code editors provide code folding, which allows one to selectively hide blocks of code. However, this is impractical to use as folding decisions must be made manually or based on simple rules. We introduce the autofolding problem, which is to automatically create a code summary by folding less informative code regions. We present a novel solution by formulating the problem as a sequence of AST folding decisions, leveraging a scoped topic model for code tokens. On an annotated set of popular open source projects, we show that our summarizer outperforms simpler baselines, yielding a 28 percent error reduction. Furthermore, we find through a case study that our summarizer is strongly preferred by experienced developers. More broadly, we hope this work will aid program comprehension by turning code folding into a usable and valuable tool.
12th European Conference on the Mathematics of Oil Recovery | 2010
Chris L. Farmer; Jaroslav M. Fowkes; Nicholas I. M. Gould
One is often faced with the problem of finding the optimal location and trajectory for an oil well. Increasingly this includes the additional complication of optimising the design of a multilateral well. We present a new approach based on the theory of expensive function optimisation. The key idea is to replace the underlying expensive function (i.e. the simulator response) by a cheap approximation (i.e. an emulator). This enables one to apply existing optimisation techniques to the emulator. Our approach uses a radial basis function interpolant to the simulator response as the emulator. Note that the case of a Gaussian radial basis function is equivalent to the geostatistical method of Kriging and radial basis functions can be interpreted as a single-layer neural network. We use a stochastic model of the simulator response to adaptively refine the emulator and optimise it using a branch and bound global optimisation algorithm. To illustrate our approach we apply it numerically to finding the optimal location and trajectory of a multilateral well in a reservoir simulation model using the industry standard ECLIPSE simulator. We compare our results to existing approaches and show that our technique is comparable, if not superior, in performance to these approaches.
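The emulator idea in the abstract above, replacing an expensive function by a cheap radial basis function interpolant, can be sketched in a few lines. This is an illustrative one-dimensional example under stated assumptions, not the method from the paper: `expensive_simulator` is a hypothetical stand-in for a real simulation run, the Gaussian shape parameter is chosen arbitrarily, and the adaptive refinement and branch and bound steps are omitted.

```python
import math

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def gaussian_rbf(r, shape=1.0):
    """Gaussian radial basis function of the distance r."""
    return math.exp(-(shape * r) ** 2)

def fit_rbf_emulator(points, values, shape=1.0):
    """Return s(x) = sum_j w_j * phi(|x - x_j|) that interpolates the samples."""
    kernel = [[gaussian_rbf(abs(p - q), shape) for q in points] for p in points]
    weights = solve_linear_system(kernel, values)

    def emulator(x):
        return sum(w * gaussian_rbf(abs(x - p), shape) for w, p in zip(weights, points))

    return emulator

def expensive_simulator(x):
    # Hypothetical stand-in for a costly simulation run.
    return math.sin(3.0 * x) + 0.5 * x

samples = [0.0, 0.5, 1.0, 1.5, 2.0]
responses = [expensive_simulator(x) for x in samples]
emulator = fit_rbf_emulator(samples, responses, shape=1.5)
```

The emulator reproduces the simulator response at the sampled points and is cheap to evaluate elsewhere, so an optimiser can query it many times without further simulator runs.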
International Conference on Software Engineering | 2016
Jaroslav M. Fowkes; Pankajan Chanthirasegaran; Razvan Ranca; Miltiadis Allamanis; Mirella Lapata; Charles A. Sutton
We present a novel tool, TASSAL, that automatically creates a summary of each source file in a project by folding its least salient code regions. The intended use-case for our tool is the first-look problem: to help developers who are unfamiliar with a new codebase and are attempting to understand it. TASSAL is intended to aid developers in this task by folding away less informative regions of code and allowing them to focus their efforts on the most informative ones. While modern code editors do provide code folding to selectively hide blocks of code, it is impractical to use as folding decisions must be made manually or based on simple rules. We find through a case study that TASSAL is strongly preferred by experienced developers over simple folding baselines, demonstrating its usefulness. In short, we strongly believe TASSAL can aid program comprehension by turning code folding into a usable and valuable tool. A video highlighting the main features of TASSAL can be found at https://youtu.be/_yu7JZgiBA4.
European Conference on Machine Learning | 2016
Jaroslav M. Fowkes; Charles A. Sutton
Mining itemsets that are the most interesting under a statistical model of the underlying data is a commonly used and well-studied technique for exploratory data analysis, with the most recent interestingness models exhibiting state-of-the-art performance. Continuing this highly promising line of work, we propose the first, to the best of our knowledge, generative model over itemsets, in the form of a Bayesian network, and an associated novel measure of interestingness. Our model is able to efficiently infer interesting itemsets directly from the transaction database using structural EM, in which the E-step employs the greedy approximation to weighted set cover. Our approach is theoretically simple, straightforward to implement, trivially parallelizable and retrieves itemsets whose quality is comparable to, if not better than, existing state-of-the-art algorithms, as we demonstrate on several real-world datasets.
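The greedy approximation to weighted set cover named in the abstract above is a standard technique and can be sketched as follows. This is a generic illustration, not the authors' E-step: the transaction, candidate itemsets and costs are hypothetical, and how costs would be derived from a probabilistic model is not shown.

```python
def greedy_weighted_set_cover(universe, subsets, weights):
    """Cover the universe greedily by lowest cost per newly covered element.

    subsets maps a name to a set of elements; weights maps a name to a positive cost.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = None
        best_ratio = float("inf")
        for name in sorted(subsets):  # sorted for deterministic tie-breaking
            gain = len(subsets[name] & uncovered)
            if gain == 0:
                continue
            ratio = weights[name] / gain
            if ratio < best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen

# Hypothetical transaction and candidate itemsets with illustrative costs.
transaction = {"bread", "butter", "milk", "eggs"}
candidate_itemsets = {
    "A": {"bread", "butter"},
    "B": {"milk"},
    "C": {"eggs"},
    "D": {"milk", "eggs"},
    "E": {"bread", "butter", "milk"},
}
costs = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.5, "E": 2.5}
cover = greedy_weighted_set_cover(transaction, candidate_itemsets, costs)
print(cover)  # ['A', 'D']
```

Here the first pick is A (cost 0.5 per new item) and the second is D (0.75 per new item), which together cover the whole transaction.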
arXiv: Software Engineering | 2015
Jaroslav M. Fowkes; Charles A. Sutton
Archive | 2010
Chris L. Farmer; Jaroslav M. Fowkes; Nicholas I. M. Gould