Paul Nulty
University College Dublin
Publications
Featured research published by Paul Nulty.
Meeting of the Association for Computational Linguistics | 2007
Paul Nulty
This paper investigates the use of machine learning algorithms to label modifier-noun compounds with a semantic relation. The attributes used as input to the learning algorithms are the web frequencies of phrases containing the modifier, the noun, and a prepositional joining term. We compare and evaluate different algorithms and different joining phrases on Nastase and Szpakowicz's (2003) dataset of 600 modifier-noun compounds. We find that a Support Vector Machine classifier obtains better performance on this dataset than a current state-of-the-art system, even with a relatively small set of prepositional joining terms.
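As an illustration of this style of classifier (not the paper's actual code), here is a minimal R sketch using e1071's SVM over web-frequency features for prepositional joining terms; the feature names, counts, and relation labels are all invented for the example.

```r
# Hypothetical sketch only: e1071's SVM standing in for the paper's setup.
library(e1071)

# Toy data: rows are modifier-noun compounds; columns hold log-scaled web
# hit counts for the nouns joined by a prepositional term, e.g. counts for
# "virus of flu", "virus for flu", "virus in flu".
train_x <- matrix(
  c(8.2, 3.1, 5.5,
    2.0, 7.4, 1.2,
    7.9, 2.8, 6.0,
    1.5, 8.0, 0.9),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("joined_of", "joined_for", "joined_in"))
)
train_y <- factor(c("causal", "purpose", "causal", "purpose"))

# Linear-kernel SVM; predictions on the training data as a smoke test.
model <- svm(x = train_x, y = train_y, kernel = "linear")
predict(model, train_x)
```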
Meeting of the Association for Computational Linguistics | 2007
Paul Nulty
For our system we use the SMO implementation of a support vector machine provided with the WEKA machine learning toolkit. As with all machine learning approaches, the most important step is to choose a set of features that reliably help to predict the label of the example. We used 76 features drawn from two very different knowledge sources. The first 48 features are Boolean values indicating whether each of the nominals in the sentence is linked to certain other words in the WordNet hypernym and meronym networks. The remaining 28 features are web frequency counts for the two nominals joined by certain common prepositions and verbs. Our system performed well on all but two of the relations: theme-tool and origin-entity.
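Purely as a hypothetical sketch of how such a mixed feature vector might be assembled: `linked_in_wordnet` and `web_count` below are stand-ins for the WordNet lookups and web-frequency queries the abstract describes, not real APIs.

```r
# Illustrative stubs: neither function is a real API.
linked_in_wordnet <- function(nominal, cue_word) {
  # Would return TRUE if `nominal` connects to `cue_word` via WordNet
  # hypernym/meronym links; stubbed here.
  FALSE
}

build_features <- function(noun1, noun2, cue_words, joining_terms, web_count) {
  # Boolean WordNet features: one per (nominal, cue word) pair.
  wn_feats <- c(
    vapply(cue_words, function(w) linked_in_wordnet(noun1, w), logical(1)),
    vapply(cue_words, function(w) linked_in_wordnet(noun2, w), logical(1))
  )
  # Web-frequency features: counts for "noun1 <joining term> noun2".
  web_feats <- vapply(
    joining_terms,
    function(t) as.numeric(web_count(paste(noun1, t, noun2))),
    numeric(1)
  )
  c(as.numeric(wn_feats), web_feats)
}

# Demo with a stub web counter:
build_features("flu", "virus",
               cue_words = c("disease", "organism"),
               joining_terms = c("of", "for", "in"),
               web_count = function(q) nchar(q))
```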
European Knowledge Acquisition Workshop | 2016
Gabriel Recchia; Ewan Jones; Paul Nulty; John Regan; Peter de Bolla
This paper presents work in progress on an algorithm to track and identify changes in the vocabulary used to describe particular concepts over time, with emphasis on treating concepts as distinct from changes in word meaning. We apply the algorithm to word vectors generated from Google Books n-grams from 1800–1990 and evaluate the induced networks with respect to their flexibility (robustness to changes in vocabulary) and stability (they should not leap from topic to topic). We also describe work in progress using the British National Bibliography Linked Open Data Serials to construct a "ground truth" evaluation dataset for algorithms that aim to detect shifts in the vocabulary used to describe concepts. Finally, we discuss limitations of the proposed method and ways in which it could be improved.
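The core operation such an algorithm needs is finding a concept's nearest-neighbour vocabulary within each time slice. Below is a minimal cosine nearest-neighbour helper in R, offered as a sketch only; the embedding matrix and word names are invented for the demo.

```r
# Sketch: cosine nearest neighbours of a seed word in one embedding matrix
# (rows = words). Applied per decade, this yields the vocabulary network.
cosine_neighbours <- function(embeddings, seed, k = 5) {
  v <- embeddings[seed, ]
  sims <- (embeddings %*% v) /
    (sqrt(rowSums(embeddings^2)) * sqrt(sum(v^2)))
  # Drop the seed itself (its self-similarity of 1 sorts first).
  head(sort(sims[, 1], decreasing = TRUE), k + 1)[-1]
}

# Invented demo data: 10 words in a 5-dimensional space.
set.seed(42)
emb <- matrix(rnorm(50), nrow = 10,
              dimnames = list(paste0("word", 1:10), NULL))
cosine_neighbours(emb, "word1", k = 3)
```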
Natural Language Engineering | 2013
Paul Nulty; Fintan Costello
Many English noun pairs suggest an almost limitless array of semantic interpretations. A fruit bowl might be described as a bowl for fruit, a bowl that contains fruit, a bowl for holding fruit, or even, perhaps in a modern sculpture class, a bowl made out of fruit. These interpretations vary in syntax, semantic denotation, plausibility, and level of semantic detail. For example, a headache pill is usually a pill for preventing headaches, but might, perhaps in the context of a list of side effects, be a pill that causes headaches (Levi, J. N. 1978. The Syntax and Semantics of Complex Nominals. New York: Academic Press). In addition to lexical ambiguity, both relational ambiguity and relational vagueness make automatic semantic interpretation of these combinations difficult. While humans parse these possibilities with ease, computational systems are only recently gaining the ability to deal with the complexity of lexical expressions of semantic relations. In this paper, we describe techniques for paraphrasing the semantic relations that can hold between nouns in a noun compound, using a semi-supervised probabilistic method to rank candidate paraphrases, and we describe a new method for selecting plausible relational paraphrases at arbitrary levels of semantic specification. These methods are motivated by the observation that existing semantic relation classification schemes often exhibit a highly skewed class distribution, and that lexical paraphrases of semantic relations vary widely in semantic precision.
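As a rough sketch of the general idea of ranking relational paraphrases by corpus evidence (a simplification, not the paper's probabilistic model), the R function below ranks candidate joining phrases by additively smoothed relative frequency; `corpus_count` and the toy counts are hypothetical.

```r
# Simplified sketch: rank candidate paraphrases of "modifier + head" by
# Laplace-smoothed relative frequency of the paraphrased phrase.
rank_paraphrases <- function(modifier, head, paraphrases, corpus_count,
                             alpha = 1) {
  counts <- vapply(
    paraphrases,
    function(p) as.numeric(corpus_count(paste(head, p, modifier))),
    numeric(1)
  )
  # Smoothing keeps unseen paraphrases at nonzero probability.
  probs <- (counts + alpha) / (sum(counts) + alpha * length(counts))
  sort(probs, decreasing = TRUE)
}

# Demo with invented counts for the "fruit bowl" example above.
counts_tbl <- c("bowl for fruit" = 120, "bowl that contains fruit" = 45,
                "bowl made out of fruit" = 2)
rank_paraphrases("fruit", "bowl",
                 paraphrases = c("for", "that contains", "made out of"),
                 corpus_count = function(q) counts_tbl[[q]])
```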
AICS'09: Proceedings of the 20th Irish Conference on Artificial Intelligence and Cognitive Science | 2009
Paul Nulty; Fintan Costello
Noun compounds occur frequently in many languages, and the problem of semantic disambiguation of these phrases has many potential applications in natural language processing and other areas. One very common approach is to define a set of semantic relations that capture the interaction between the modifier and the head noun, and then attempt to assign one of these relations to each compound. For example, the compound phrase flu virus could be assigned the semantic relation causal (the virus causes the flu), and desert wind the relation location (the wind is located in the desert). In this paper we investigate methods for learning the correct semantic relation for a given noun compound by comparing the new compound to a training set of hand-tagged instances, using the similarity of the words in each compound. The main contribution of this paper is to directly compare distributional and knowledge-based word similarity measures for this task, using various datasets and corpora. We find that the knowledge-based system performs much better when adequate training data is available.
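A toy R version of this comparison step, assigning a new compound the relation of its most similar training compound; `word_sim` stands in for whichever distributional or knowledge-based similarity measure is under evaluation, and the training data are invented.

```r
# Toy sketch: label a new compound with the relation of its most similar
# training compound, where compound similarity is the sum of the modifier
# and head word similarities.
nearest_relation <- function(mod, hd, train, word_sim) {
  scores <- mapply(
    function(m, h) word_sim(mod, m) + word_sim(hd, h),
    train$modifier, train$head
  )
  train$relation[which.max(scores)]
}

# Invented training set and a stub similarity (1 if identical, else 0).
train <- data.frame(modifier = c("flu", "desert"),
                    head     = c("virus", "wind"),
                    relation = c("causal", "location"))
nearest_relation("storm", "wind", train,
                 word_sim = function(a, b) as.numeric(a == b))
# Returns "location": "storm wind" is closest to "desert wind".
```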
Patient Education and Counseling | 2018
Susan Kamal; Paul Nulty; Olivier Bugnon; Matthias Cavassini; Marie P. Schneider
OBJECTIVE: To identify factors associated with low or high antiretroviral (ARV) adherence through computational text analysis of interview reports from an adherence-enhancing programme. METHODS: Using text from 8428 interviews with 522 patients, we constructed a term-frequency matrix for each patient, retaining words that occurred at least ten times overall and were used in at least six interviews with six different patients. The text included both the pharmacists' and the patients' verbalizations. We investigated the association of these terms with an adherence threshold (above or below 90%) using a regularized logistic regression model. In addition to this data-driven approach, we studied the contexts of words with a focus group. RESULTS: The analysis yielded 7608 terms associated with low or high adherence. Terms associated with low adherence included disruption of the daily schedule, side effects, socio-economic factors, stigma, cognitive factors, and smoking. Terms associated with high adherence included fixed medication intake timing, absence of side effects, and a positive psychological state. CONCLUSION: Computational text analysis makes it possible to analyse a large corpus of adherence-enhancing interviews. It confirms the main known themes affecting ARV adherence and sheds light on emerging ones. PRACTICE IMPLICATIONS: Health care providers should be aware of the factors associated with low or high adherence, reinforcing the supporting factors and working with the patient to resolve the barriers.
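A hedged sketch of this pipeline in R, pairing quanteda (to build and trim the term-frequency matrix) with glmnet (for a lasso-regularized logistic regression); the toy transcripts, patient identifiers, and adherence labels are invented, and the trimming thresholds approximate the inclusion criteria above.

```r
library(quanteda)
library(glmnet)

# Invented toy data: 20 interview transcripts for 4 patients.
interview_texts <- rep(
  c("missed dose because of work stress and side effects",
    "fixed morning routine no side effects feeling positive"),
  each = 10
)
patient_id <- rep(paste0("p", 1:4), each = 5)

dfmat <- dfm(tokens(corpus(interview_texts)))
# Keep terms occurring >= 10 times overall and in >= 6 documents,
# approximating the inclusion criteria in the abstract.
dfmat <- dfm_trim(dfmat, min_termfreq = 10, min_docfreq = 6)
# One row per patient, as in the per-patient term-frequency matrix.
dfmat <- dfm_group(dfmat, groups = patient_id)

adherence <- factor(c("low", "low", "high", "high"))
# Lasso-regularized logistic regression; nonzero coefficients flag terms
# associated with low or high adherence.
fit <- glmnet(as(dfmat, "dgCMatrix"), adherence,
              family = "binomial", alpha = 1)
coef(fit, s = 0.05)
```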
Journal of Social Structure | 2018
Kenneth Benoit; Kohei Watanabe; Haiyan Wang; Paul Nulty; Adam Obeng; Stefan Müller; Akitaka Matsuo
quanteda is an R package providing a comprehensive workflow and toolkit for natural language processing tasks such as corpus management, tokenization, analysis, and visualization. It has extensive functions for applying dictionary analysis, exploring texts using keywords-in-context, computing document and feature similarities, and discovering multi-word expressions through collocation scoring. Based entirely on sparse operations, it provides highly efficient methods for compiling document-feature matrices and for manipulating them or using them in further quantitative analysis. Using C++ and multi-threading extensively, quanteda is also considerably faster and more efficient than other R and Python packages at processing large textual data. The package is designed for R users needing to apply natural language processing to texts, from documents to final analysis. Its capabilities match or exceed those provided in many end-user software applications, many of which are expensive and not open source. The package is therefore of great benefit to researchers, students, and other analysts with fewer financial resources. While using quanteda requires R programming knowledge, its API is designed to enable powerful, efficient analysis with a minimum of steps. By emphasizing consistent design, quanteda furthermore lowers the barriers to learning and using NLP and quantitative text analysis even for proficient R programmers.
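A brief usage example of this workflow, assuming quanteda v3 or later, where the textstat_* functions live in the companion quanteda.textstats package.

```r
library(quanteda)
library(quanteda.textstats)

# Example corpus shipped with quanteda.
toks <- tokens(data_corpus_inaugural, remove_punct = TRUE)

# Keywords-in-context for exploratory reading.
kwic(toks, pattern = "liberty", window = 3)

# Sparse document-feature matrix, trimmed to frequent features.
dfmat <- dfm_trim(dfm(toks), min_termfreq = 5)

# Multi-word expressions via collocation scoring.
head(textstat_collocations(toks, size = 2))
```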
Electoral Studies | 2016
Paul Nulty; Yannis Theocharis; Sebastian Adrian Popa; Olivier Parnet; Kenneth Benoit
Meeting of the Association for Computational Linguistics | 2010
Paul Nulty; Fintan Costello
North American Chapter of the Association for Computational Linguistics | 2009
Paul Nulty; Fintan Costello