Kristian Woodsend
University of Edinburgh
Publications
Featured research published by Kristian Woodsend.
Journal of Machine Learning Research | 2009
Kristian Woodsend; Jacek Gondzio
Support vector machines are a powerful machine learning technology, but the training process involves a dense quadratic optimization problem and is computationally challenging. A parallel implementation of linear Support Vector Machine training has been developed, using a combination of MPI and OpenMP. Using an interior point method for the optimization and a reformulation that avoids the dense Hessian matrix, the structure of the augmented system matrix is exploited to partition data and computations amongst parallel processors efficiently. The new implementation has been applied to solve problems from the PASCAL Challenge on Large-scale Learning. We show that our approach is competitive, and is able to solve problems in the Challenge many times faster than other parallel approaches. We also demonstrate that the hybrid version performs more efficiently than the version using pure MPI.
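To make the "dense Hessian" issue concrete, here is a sketch in our own notation, consistent with the abstract but not copied from the paper. The standard linear SVM dual is
\[
\max_{\alpha}\; e^{T}\alpha - \tfrac{1}{2}\,\alpha^{T} Y X X^{T} Y \alpha
\qquad \text{s.t.}\quad y^{T}\alpha = 0,\; 0 \le \alpha \le C e,
\]
where the n training points form the rows of X and Y = diag(y). The Hessian Q = Y X X^{T} Y is a dense n-by-n matrix, but its rank is at most the number of features m. Introducing the weight vector w = X^{T} Y \alpha as an explicit variable gives the equivalent problem
\[
\min_{\alpha, w}\; \tfrac{1}{2}\, w^{T} w - e^{T}\alpha
\qquad \text{s.t.}\quad w = X^{T} Y \alpha,\; y^{T}\alpha = 0,\; 0 \le \alpha \le C e,
\]
whose quadratic term is separable (the Hessian is diagonal), so the interior point linear algebra can be organised around the much smaller m-by-m blocks and partitioned across processors.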
Empirical Methods in Natural Language Processing | 2014
Michael Roth; Kristian Woodsend
State-of-the-art semantic role labelling systems require large annotated corpora to achieve full performance. Unfortunately, such corpora are expensive to produce and often do not generalize well across domains. Even in-domain, errors are often made where syntactic information does not provide sufficient cues. In this paper, we mitigate both of these problems by employing distributional word representations gathered from unlabelled data. While straightforward word representations of predicates and arguments improve performance, we show that further gains are achieved by composing representations that model the interaction between predicate and argument, and capture full argument spans.
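As a rough illustration of composed predicate-argument representations: the composition function and span representation below are assumptions made for exposition, not the ones evaluated in the paper.

    import numpy as np

    def compose(pred_vec: np.ndarray, arg_vec: np.ndarray) -> np.ndarray:
        # Combine the two embeddings with their element-wise product so the
        # feature reflects the predicate-argument interaction rather than
        # each word in isolation (illustrative composition only).
        return np.concatenate([pred_vec, arg_vec, pred_vec * arg_vec])

    def argument_span_vec(token_vecs) -> np.ndarray:
        # One simple way to represent a full argument span: average its
        # token embeddings instead of using the head word alone.
        return np.mean(token_vecs, axis=0)

    # Features for one (predicate, argument span) pair, which a role
    # classifier could use alongside the usual syntactic features.
    pred = np.random.randn(50)
    arg = argument_span_vec([np.random.randn(50) for _ in range(3)])
    features = compose(pred, arg)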
Computational Optimization and Applications | 2011
Kristian Woodsend; Jacek Gondzio
Linear support vector machine training can be represented as a large quadratic program. We present an efficient and numerically stable algorithm for this problem using interior point methods, which requires only $\mathcal{O}(n)$ operations per iteration. Through exploiting the separability of the Hessian, we provide a unified approach, from an optimization perspective, to 1-norm classification, 2-norm classification, universum classification, ordinal regression and ε-insensitive regression. Our approach has the added advantage of obtaining the hyperplane weights and bias directly from the solver. Numerical experiments indicate that, in contrast to existing methods, the algorithm is largely unaffected by noisy data, and they show training times for our implementation are consistent and highly competitive. We discuss the effect of using multiple correctors, and monitoring the angle of the normal to the hyperplane to determine termination.
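One way to read the termination heuristic mentioned above (our interpretation, not a formula taken from the paper): stop once the normal of the separating hyperplane has effectively stopped rotating between interior point iterations,
\[
\cos\theta_{k} = \frac{w_{k}^{T} w_{k-1}}{\lVert w_{k}\rVert\,\lVert w_{k-1}\rVert},
\qquad \text{terminate when } 1 - \cos\theta_{k} < \epsilon .
\]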
Mathematical Programming Computation | 2009
Marco Colombo; Andreas Grothey; Jonathan D. Hogg; Kristian Woodsend; Jacek Gondzio
We present a structure-conveying algebraic modelling language for mathematical programming. The proposed language extends AMPL with object-oriented features that allow the user to construct models from sub-models, and is implemented as a combination of pre- and post-processing phases for AMPL. Unlike traditional modelling languages, the new approach does not scramble the block structure of the problem, and thus it enables the passing of this structure on to the solver. Interior point solvers that exploit block linear algebra and decomposition-based solvers can therefore directly take advantage of the problem's structure. The language contains features to conveniently model stochastic programming problems, although it is designed for a much broader application spectrum.
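The block structure such a language preserves looks, for example, like the bordered block-diagonal constraint matrix typical of stochastic programs (a generic illustration, not a figure from the paper):
\[
A =
\begin{pmatrix}
A_{1} &        &        &        & B_{1} \\
      & A_{2}  &        &        & B_{2} \\
      &        & \ddots &        & \vdots \\
      &        &        & A_{N}  & B_{N}
\end{pmatrix},
\]
where each diagonal block A_i involves only the variables of one scenario or sub-model and the border blocks B_i link them to the complicating variables; structure-exploiting interior point and decomposition solvers can then work block by block instead of on the assembled matrix.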
Empirical Methods in Natural Language Processing | 2015
Kristian Woodsend; Mirella Lapata
We present a new approach for unsupervised semantic role labeling that leverages distributed representations. We induce embeddings to represent a predicate, its arguments and their complex interdependence. Argument embeddings are learned from surrounding contexts involving the predicate and neighboring arguments, while predicate embeddings are learned from argument contexts. The induced representations are clustered into roles using a linear programming formulation of hierarchical clustering, where we can model task-specific knowledge. Experiments show improved performance over previous unsupervised semantic role labeling approaches and other distributed word representation models.
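A minimal sketch of the final clustering step, assuming argument embeddings are already induced; an off-the-shelf agglomerative clusterer stands in for the paper's linear programming formulation of hierarchical clustering.

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    # Stand-in data: one embedding per argument occurrence of a predicate
    # (in the paper these are induced from predicate and neighbouring-argument
    # contexts rather than drawn at random).
    arg_embeddings = np.random.randn(200, 50)

    # Group argument occurrences into semantic roles. The paper solves a
    # linear program so that task-specific constraints can shape the
    # hierarchy; the clusterer here is used only for illustration.
    roles = AgglomerativeClustering(n_clusters=10).fit_predict(arg_embeddings)
    print(np.bincount(roles))  # size of each induced role cluster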
Journal of Artificial Intelligence Research | 2014
Kristian Woodsend; Mirella Lapata
Large-scale annotated corpora are a prerequisite to developing high-performance NLP systems. Such corpora are expensive to produce, limited in size, and often demand linguistic expertise. In this paper we use text rewriting as a means of increasing the amount of labeled data available for model training. Our method uses automatically extracted rewrite rules from comparable corpora and bitexts to generate multiple versions of sentences annotated with gold standard labels. We apply this idea to semantic role labeling and show that a model trained on rewritten data outperforms the state of the art on the CoNLL-2009 benchmark dataset.
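A toy sketch of the data-expansion idea, assuming single-token rewrite rules so that gold labels carry over one-to-one; the paper's rules are phrase-level, extracted from comparable corpora and bitexts, and require projecting the labels onto the rewritten spans.

    def rewrite(tokens, labels, rules):
        """Generate rewritten copies of a labelled sentence.

        tokens, labels: parallel lists (one role label per token).
        rules: dict mapping a token to alternative tokens.
        """
        variants = []
        for i, tok in enumerate(tokens):
            for alt in rules.get(tok, []):
                new_tokens = tokens[:i] + [alt] + tokens[i + 1:]
                variants.append((new_tokens, list(labels)))  # labels unchanged
        return variants

    # Example: two extra training sentences from a single lexical rule.
    sent = ["The", "firm", "bought", "the", "plant"]
    tags = ["O", "A0", "V", "O", "A1"]
    extra = rewrite(sent, tags, {"bought": ["acquired", "purchased"]})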
Archive | 2009
Kristian Woodsend; Jacek Gondzio
Support vector machines are a powerful machine learning technology, but the training process involves a dense quadratic optimization problem and is computationally expensive. We show how the problem can be reformulated to become suitable for high-performance parallel computing. In our algorithm, data is pre-processed in parallel to generate an approximate low-rank Cholesky decomposition. Our optimization solver then exploits the problem's structure to perform many linear algebra operations in parallel, with relatively low data transfer between processors, resulting in excellent parallel efficiency for very-large-scale problems.
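A serial sketch of the pre-processing step, assuming a pivoted partial Cholesky factorisation is what produces the low-rank approximation; function and variable names are ours, and the paper performs this step in parallel.

    import numpy as np

    def partial_cholesky(diag, get_col, rank):
        """Greedy pivoted partial Cholesky: returns L (n x rank) with K ~ L L^T.

        diag: the diagonal of the kernel/Gram matrix K.
        get_col(j): returns column j of K on demand, so K is never formed densely.
        """
        d = diag.astype(float).copy()          # residual diagonal
        L = np.zeros((d.shape[0], rank))
        for k in range(rank):
            j = int(np.argmax(d))              # pivot on the largest residual
            col = get_col(j) - L[:, :k] @ L[j, :k]
            L[:, k] = col / np.sqrt(d[j])
            d -= L[:, k] ** 2
        return L

    # Example with a linear kernel K = X X^T, never built explicitly.
    X = np.random.randn(1000, 20)
    L = partial_cholesky((X * X).sum(axis=1), lambda j: X @ X[j], rank=15)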
Archive | 2009
Andreas Grothey; Jonathan D. Hogg; Kristian Woodsend; Marco Colombo; Jacek Gondzio
Modeling languages are an important tool for the formulation of mathematical programming problems. Many real-life mathematical programming problems are of sizes that make their solution by parallel techniques the only viable option. Increasingly, even their generation by a modeling language cannot be achieved on a single processor. Surprisingly, however, there has been no effort so far to develop a parallelizable modeling language. We present a modeling language that enables the modular formulation of optimization problems. Apart from often being more natural for the modeler, this enables the parallelization of the problem generation process, making the modeling and solution of truly large problems feasible. The proposed structured modeling language is based on the popular modeling language AMPL and implemented as a pre-/post-processor to AMPL. Unlike traditional modeling languages, it does not scramble the block structure of the problem but can pass it on to the solver. Interior point solvers that exploit block linear algebra and decomposition-based solvers can therefore directly exploit the structure of the problem.
Empirical Methods in Natural Language Processing | 2011
Kristian Woodsend; Mirella Lapata
Empirical Methods in Natural Language Processing | 2012
Kristian Woodsend; Mirella Lapata