Aoife Cahill | Researchain

Publication

Featured research published by Aoife Cahill.

Meeting of the Association for Computational Linguistics | 2004

Long-Distance Dependency Resolution in Automatically Acquired Wide-Coverage PCFG-Based LFG Approximations

Aoife Cahill; Michael Burke; Ruth O'Donovan; Josef van Genabith; Andy Way

This paper shows how finite approximations of long distance dependency (LDD) resolution can be obtained automatically for wide-coverage, robust, probabilistic Lexical-Functional Grammar (LFG) resources acquired from treebanks. We extract LFG subcategorisation frames and paths linking LDD reentrancies from f-structures generated automatically for the Penn-II treebank trees and use them in an LDD resolution algorithm to parse new text. Unlike (Collins, 1999; Johnson, 2000), in our approach resolution of LDDs is done at f-structure (attribute-value structure representations of basic predicate-argument or dependency structure) without empty productions, traces and coindexation in CFG parse trees. Currently our best automatically induced grammars achieve 80.97% f-score for f-structures parsing section 23 of the WSJ part of the Penn-II treebank and evaluating against the DCU 105, and 80.24% against the PARC 700 Dependency Bank (King et al., 2003), performing at the same or a slightly better level than state-of-the-art hand-crafted grammars (Kaplan et al., 2004).

Meeting of the Association for Computational Linguistics | 2006

QuestionBank: Creating a Corpus of Parse-Annotated Questions

John Judge; Aoife Cahill; Josef van Genabith

This paper describes the development of QuestionBank, a corpus of 4000 parse-annotated questions for (i) use in training parsers employed in QA, and (ii) evaluation of question parsing. We present a series of experiments to investigate the effectiveness of QuestionBank as both an exclusive and supplementary training resource for a state-of-the-art parser in parsing both question and non-question test sets. We introduce a new method for recovering empty nodes and their antecedents (capturing long distance dependencies) from parser output in CFG trees using LFG f-structure reentrancies. Our main findings are (i) using QuestionBank training data improves parser performance to 89.75% labelled bracketing f-score, an increase of almost 11% over the baseline; (ii) back-testing experiments on non-question data (Penn-II WSJ Section 23) show that the retrained parser does not suffer a performance drop on non-question material; (iii) ablation experiments show that the size of training material provided by QuestionBank is sufficient to achieve optimal results; (iv) our method for recovering empty nodes captures long distance dependencies in questions from the ATIS corpus with high precision (96.82%) and low recall (39.38%). In summary, QuestionBank provides a useful new resource in parser-based QA research.
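
As a reading aid, the precision and recall quoted in finding (iv) can be combined into a single balanced f-score. The sketch below uses the standard F-beta formula; the function name `f_score` is ours, and the resulting figure is plain arithmetic on the two quoted numbers, not a result reported in the abstract.

```python
def f_score(precision, recall, beta=1.0):
    """Standard F-beta score; beta=1 gives the harmonic mean of P and R."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision and recall (in percent) as quoted for empty-node recovery on ATIS.
ldd_f1 = f_score(96.82, 39.38)
print(round(ldd_f1, 2))  # 55.99
```

The low recall dominates the harmonic mean, which is why the combined score sits much closer to 39% than to 97%.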

Meeting of the Association for Computational Linguistics | 2006

Robust PCFG-Based Generation Using Automatically Acquired LFG Approximations

Aoife Cahill; Josef van Genabith

We present a novel PCFG-based architecture for robust probabilistic generation based on wide-coverage LFG approximations (Cahill et al., 2004) automatically extracted from treebanks, maximising the probability of a tree given an f-structure. We evaluate our approach using string-based evaluation. We currently achieve coverage of 95.26%, a BLEU score of 0.7227 and string accuracy of 0.7476 on the Penn-II WSJ Section 23 sentences of length ≤20.

Computational Linguistics | 2005

Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II and Penn-III Treebanks

Ruth O'Donovan; Michael Burke; Aoife Cahill; Josef van Genabith; Andy Way

We present a methodology for extracting subcategorization frames based on an automatic lexical-functional grammar (LFG) f-structure annotation algorithm for the Penn-II and Penn-III Treebanks. We extract syntactic-function-based subcategorization frames (LFG semantic forms) and traditional CFG category-based subcategorization frames as well as mixed function/category-based frames, with or without preposition information for obliques and particle information for particle verbs. Our approach associates probabilities with frames conditional on the lemma, distinguishes between active and passive frames, and fully reflects the effects of long-distance dependencies in the source data structures. In contrast to many other approaches, ours does not predefine the subcategorization frame types extracted, learning them instead from the source data. Including particles and prepositions, we extract 21,005 lemma frame types for 4,362 verb lemmas, with a total of 577 frame types and an average of 4.8 frame types per verb. We present a large-scale evaluation of the complete set of forms extracted against the full COMLEX resource. To our knowledge, this is the largest and most complete evaluation of subcategorization frames acquired automatically for English.
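
To make "probabilities with frames conditional on the lemma" concrete, here is a minimal sketch that assumes a plain relative-frequency estimate, which the abstract does not specify. The function name, the frame labels, and the toy observations are all hypothetical and are not taken from the extracted lexicon.

```python
from collections import Counter, defaultdict

def frame_probabilities(observations):
    """Relative-frequency estimate of P(frame | lemma).

    observations: iterable of (lemma, frame) pairs, one per verb token.
    Assumption: simple count ratios, with no smoothing or thresholding.
    """
    counts = defaultdict(Counter)
    for lemma, frame in observations:
        counts[lemma][frame] += 1
    return {
        lemma: {frame: n / sum(frames.values()) for frame, n in frames.items()}
        for lemma, frames in counts.items()
    }

# Hypothetical toy data; the frame notation is illustrative only.
toy = [
    ("give", "subj,obj,obj2"),
    ("give", "subj,obj,obl:to"),
    ("give", "subj,obj,obl:to"),
    ("sleep", "subj"),
]
probs = frame_probabilities(toy)
```

Because frames are keyed by whatever labels appear in the data, the inventory of frame types is learned from the observations and not fixed in advance, in the spirit of the approach described above.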

Meeting of the Association for Computational Linguistics | 2004

Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II Treebank

Ruth O'Donovan; Michael Burke; Aoife Cahill; Josef van Genabith; Andy Way

In this paper we present a methodology for extracting subcategorisation frames based on an automatic LFG f-structure annotation algorithm for the Penn-II Treebank. We extract abstract syntactic function-based subcategorisation frames (LFG semantic forms), traditional CFG category-based subcategorisation frames as well as mixed function/category-based frames, with or without preposition information for obliques and particle information for particle verbs. Our approach does not predefine frames, associates probabilities with frames conditional on the lemma, distinguishes between active and passive frames, and fully reflects the effects of long-distance dependencies in the source data structures. We extract 3586 verb lemmas, 14348 semantic form types (an average of 4 per lemma) with 577 frame types. We present a large-scale evaluation of the complete set of forms extracted against the full COMLEX resource.

Empirical Methods in Natural Language Processing | 2014

Can characters reveal your native language? A language-independent approach to native language identification

Radu Tudor Ionescu; Marius Popescu; Aoife Cahill

A common approach in text mining tasks such as text categorization, authorship identification or plagiarism detection is to rely on features like words, part-of-speech tags, stems, or some other high-level linguistic features. In this work, an approach that uses character n-grams as features is proposed for the task of native language identification. Instead of doing standard feature selection, the proposed approach combines several string kernels using multiple kernel learning. Kernel Ridge Regression and Kernel Discriminant Analysis are independently used in the learning stage. The empirical results obtained in all the experiments conducted in this work indicate that the proposed approach achieves state of the art performance in native language identification, reaching an accuracy that is 1.7% above the top scoring system of the 2013 NLI Shared Task. Furthermore, the proposed approach has an important advantage in that it is language independent and linguistic theory neutral. In the cross-corpus experiment, the proposed approach shows that it can also be topic independent, improving the state of the art system by 32.3%.
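
The idea of comparing texts directly through character n-grams can be sketched with a generic spectrum kernel. This is an illustration under stated assumptions and not the authors' exact kernels: the function names are ours, and the unweighted sum over n-gram lengths stands in for the multiple kernel learning step that the abstract mentions without detailing.

```python
from collections import Counter
import math

def char_ngrams(text, n):
    """Count every character n-gram in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def spectrum_kernel(a, b, n):
    """Inner product of the n-gram count vectors of a and b."""
    ca, cb = char_ngrams(a, n), char_ngrams(b, n)
    return sum(ca[g] * cb[g] for g in ca.keys() & cb.keys())

def blended_kernel(a, b, n_min, n_max):
    """Assumption: unweighted sum of spectrum kernels for n_min..n_max."""
    return sum(spectrum_kernel(a, b, n) for n in range(n_min, n_max + 1))

def normalized(kernel, a, b, *args):
    """Cosine-style normalization so that identical texts score 1.0."""
    denom = math.sqrt(kernel(a, a, *args) * kernel(b, b, *args))
    return kernel(a, b, *args) / denom if denom else 0.0

similarity = normalized(blended_kernel, "the colour of", "the color of", 2, 4)
```

No tokenizer, tagger, or stemmer is involved, which is what makes a feature set of this kind language independent; a kernel matrix built this way could then be passed to any kernel-based learner.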

Meeting of the Association for Computational Linguistics | 2007

Pruning the Search Space of a Hand-Crafted Parsing System with a Probabilistic Parser

Aoife Cahill; Tracy Holloway King; John T. Maxwell

The demand for deep linguistic analysis for huge volumes of data means that it is increasingly important that the time taken to parse such data is minimized. In the XLE parsing model, which is a hand-crafted, unification-based parsing system, most of the time is spent on unification, searching for valid f-structures (dependency attribute-value matrices) within the space of the many valid c-structures (phrase structure trees). We carried out an experiment to determine whether pruning the search space at an earlier stage of the parsing process results in an improvement in the overall time taken to parse, while maintaining the quality of the f-structures produced. We retrained a state-of-the-art probabilistic parser and used it to pre-bracket input to the XLE, constraining the valid c-structure space for each sentence. We evaluated against the PARC 700 Dependency Bank and show that it is possible to decrease the time taken to parse by ~18% while maintaining accuracy.

Archive | 2008

Deriving Quasi-Logical Forms From F-Structures For The Penn Treebank

Aoife Cahill; Mairéad McCarthy; Michael Burke; Josef van Genabith; Andy Way

In this paper we show how the trees in the Penn treebank can be associated automatically with simple quasi-logical forms. Our approach is based on combining two independent strands of work: the first is the observation that there is a close correspondence between quasi-logical forms and LFG f-structures [van Genabith and Crouch, 1996]; the second is the development of an automatic f-structure annotation algorithm for the Penn treebank [Cahill et al., 2002a; Cahill et al., 2002b]. We compare our approach with that of [Liakata and Pulman, 2002].

Machine Translation | 2007

Harry Bunt, John Carroll, Giorgio Satta (eds.) New Developments in Parsing Technology: Kluwer Academic Publishers, Dordrecht, The Netherlands, 2004, xi + 405pp

Aoife Cahill

This book grew out of contributions to the 2000 and 2001 workshops in the series “International Workshop on Parsing Technology”. The best papers were selected and authors were invited to revise, update and extend their work to appear as chapters in this volume. There are 18 content chapters in all, organised into five broadly defined topic areas: statistical parsing methods, new and improved parsing techniques, theoretical advances in parsing technology, spoken language parsing and mathematical and engineering aspects of parsing. Chapter 2 by Michael Collins was prepared as an invited talk at IWPT2001 but not presented because of travel restrictions. The book concentrates on the technology of parsing rather than applications of parsing, though the editors point out that the practical application of parsing often motivates and drives technological advancement in the area.

Machine Translation | 2006