Anne Cocos
University of Pennsylvania
Publications
Featured research published by Anne Cocos.
North American Chapter of the Association for Computational Linguistics | 2016
Anne Cocos; Chris Callison-Burch
Automatically generated databases of English paraphrases have the drawback that they return a single list of paraphrases for an input word or phrase. This means that all senses of polysemous words are grouped together, unlike WordNet, which partitions different senses into separate synsets. We present a new method for clustering paraphrases by word sense, and apply it to the Paraphrase Database (PPDB). We investigate the performance of hierarchical and spectral clustering algorithms, and systematically explore different ways of defining the similarity matrix that they use as input. Our method produces sense clusters that are qualitatively and quantitatively good, and that represent a substantial improvement to the PPDB resource.
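The clustering idea above can be sketched with a toy example. This is a minimal, illustrative version (hypothetical words and similarity scores, not the actual PPDB pipeline): average-link agglomerative clustering over a pairwise similarity matrix, which groups the paraphrases of a polysemous word into sense clusters.

```python
# Toy sketch of sense-clustering paraphrases: greedy average-link
# agglomerative clustering over a similarity matrix. All data below is
# hypothetical and only illustrates the idea described in the abstract.

def cluster_paraphrases(words, sim, threshold=0.5):
    """Merge clusters while their average pairwise similarity >= threshold."""
    clusters = [[w] for w in words]

    def avg_link(a, b):
        return sum(sim[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))

    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = avg_link(clusters[i], clusters[j])
                if s >= threshold and (best is None or s > best[0]):
                    best = (s, i, j)
        if best is None:  # no pair is similar enough; stop merging
            return [sorted(c) for c in clusters]
        _, i, j = best
        clusters[i] += clusters.pop(j)

# Hypothetical paraphrases of the polysemous word "bug"
words = ["insect", "beetle", "glitch", "error"]
sim = {
    frozenset(("insect", "beetle")): 0.9,
    frozenset(("insect", "glitch")): 0.1,
    frozenset(("insect", "error")): 0.1,
    frozenset(("beetle", "glitch")): 0.1,
    frozenset(("beetle", "error")): 0.1,
    frozenset(("glitch", "error")): 0.8,
}
senses = cluster_paraphrases(words, sim)
print(senses)  # two sense clusters: an insect sense and a defect sense
```

The paper explores both hierarchical and spectral algorithms and, crucially, how the similarity matrix itself is defined; this sketch only shows the hierarchical variant.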
Joint Conference on Lexical and Computational Semantics | 2017
Anne Cocos; Marianna Apidianaki; Chris Callison-Burch
WordNet has facilitated important research in natural language processing but its usefulness is somewhat limited by its relatively small lexical coverage. The Paraphrase Database (PPDB) covers 650 times more words, but lacks the semantic structure of WordNet that would make it more directly useful for downstream tasks. We present a method for mapping words from PPDB to WordNet synsets with 89% accuracy. The mapping also lays important groundwork for incorporating WordNet’s relations into PPDB so as to increase its utility for semantic reasoning in applications.
Journal of Biomedical Informatics | 2017
Anne Cocos; Ting Qian; Chris Callison-Burch; Aaron J. Masino
Annotating unstructured texts in Electronic Health Records data is usually a necessary step for conducting machine learning research on such datasets. Manual annotation by domain experts provides data of the best quality, but has become increasingly impractical given the rapid increase in the volume of EHR data. In this article, we examine the effectiveness of crowdsourcing with unscreened online workers as an alternative for transforming unstructured texts in EHRs into annotated data that are directly usable in supervised learning models. We find the crowdsourced annotation data to be just as effective as expert data in training a sentence classification model to detect mentions of abnormal ear anatomy in audiology radiology reports. Furthermore, we find that enabling workers to self-report a confidence level with each annotation can help researchers pinpoint less-accurate annotations requiring expert scrutiny. Our findings suggest that even crowd workers without specific domain knowledge can contribute effectively to the task of annotating unstructured EHR datasets.
Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications | 2017
Anne Cocos; Marianna Apidianaki; Chris Callison-Burch
The role of word sense disambiguation in lexical substitution has been questioned due to the high performance of vector space models, which propose good substitutes without explicitly accounting for sense. We show that a filtering mechanism based on a sense inventory optimized for substitutability can improve the results of these models. Our sense inventory is constructed using a clustering method which generates paraphrase clusters that are congruent with lexical substitution annotations in a development set. The results show that lexical substitution can still benefit from sense information, which improves the output of vector space paraphrase ranking models.
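The filtering mechanism described above can be illustrated with a short sketch. Everything here is hypothetical (toy substitutes, clusters, and a crude overlap-based sense choice), not the paper's actual model: the point is simply that substitutes proposed by a ranking model are pruned against the sense cluster matching the context.

```python
# Illustrative sketch of sense-based filtering for lexical substitution:
# drop ranked substitutes that fall outside the sense cluster inferred
# for the context. Data and the overlap heuristic are hypothetical.

def filter_by_sense(ranked_subs, sense_clusters, context_words):
    # Crude sense choice: pick the cluster overlapping the context most.
    best = max(sense_clusters, key=lambda c: len(set(c) & set(context_words)))
    # Keep only substitutes belonging to that cluster, preserving rank order.
    return [s for s in ranked_subs if s in best]

# Substitutes for "bright" proposed by a (hypothetical) vector space model
ranked = ["smart", "shiny", "clever", "luminous"]
clusters = [
    {"smart", "clever", "intelligent"},   # intelligence sense
    {"shiny", "luminous", "radiant"},     # light sense
]
context = ["a", "bright", "clever", "student"]
print(filter_by_sense(ranked, clusters, context))  # ['smart', 'clever']
```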
Empirical Methods in Natural Language Processing | 2015
Anne Cocos; Aaron J. Masino; Ting Qian; Ellie Pavlick; Chris Callison-Burch
Crowdsourcing platforms are a popular choice for researchers to gather text annotations quickly at scale. We investigate whether crowdsourced annotations are useful when the labeling task requires medical domain knowledge. Comparing a sentence classification model trained with expert-annotated sentences to the same model trained on crowd-labeled sentences, we find the crowdsourced training data to be just as effective as the manually produced dataset. We can improve the accuracy of the crowd-fueled model without collecting further labels by filtering out worker labels applied with low confidence.
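The confidence-based filtering step can be sketched as follows. This is a minimal illustration with made-up labels and field names, not the paper's exact pipeline: low-confidence worker labels are discarded before aggregating the rest by majority vote.

```python
# Minimal sketch (hypothetical data): aggregate crowd labels by majority
# vote after discarding annotations below a confidence cutoff.

from collections import Counter

def aggregate(annotations, min_confidence=0.7):
    """annotations: list of (sentence_id, label, confidence) tuples."""
    votes = {}
    for sid, label, conf in annotations:
        if conf >= min_confidence:  # filter out low-confidence labels
            votes.setdefault(sid, []).append(label)
    return {sid: Counter(ls).most_common(1)[0][0] for sid, ls in votes.items()}

crowd = [
    ("s1", "abnormal", 0.9), ("s1", "normal", 0.4), ("s1", "abnormal", 0.8),
    ("s2", "normal", 0.95), ("s2", "normal", 0.8), ("s2", "abnormal", 0.3),
]
print(aggregate(crowd))  # {'s1': 'abnormal', 's2': 'normal'}
```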
Conference of the European Chapter of the Association for Computational Linguistics | 2017
Anne Cocos; Chris Callison-Burch
North American Chapter of the Association for Computational Linguistics | 2018
Marianna Apidianaki; Guillaume Wisniewski; Anne Cocos; Chris Callison-Burch
North American Chapter of the Association for Computational Linguistics | 2018
Anne Cocos; Marianna Apidianaki; Chris Callison-Burch
Empirical Methods in Natural Language Processing | 2018
Anne Cocos; Veronica Wharton; Ellie Pavlick; Marianna Apidianaki; Chris Callison-Burch
Empirical Methods in Natural Language Processing | 2017
Ross Mechanic; Dean Fulgoni; Hannah Cutler; Sneha Rajana; Zheyuan Liu; Bradley Jackson; Anne Cocos; Chris Callison-Burch; Marianna Apidianaki