Carmen Banea
University of North Texas
Publication
Featured research published by Carmen Banea.
empirical methods in natural language processing | 2008
Carmen Banea; Rada Mihalcea; Janyce Wiebe; Samer Hassan
Although research in other languages is increasing, much of the work in subjectivity analysis has been applied to English data, mainly due to the large body of electronic resources and tools that are available for this language. In this paper, we propose and evaluate methods that can be employed to transfer a repository of subjectivity resources across languages. Specifically, we attempt to leverage the resources available for English and, by employing machine translation, generate resources for subjectivity analysis in other languages. Through comparative evaluations on two different languages (Romanian and Spanish), we show that automatic translation is a viable alternative for the construction of resources and tools for subjectivity analysis in a new target language.
international conference on computational linguistics | 2014
Eneko Agirre; Carmen Banea; Claire Cardie; Daniel M. Cer; Mona T. Diab; Aitor Gonzalez-Agirre; Weiwei Guo; Rada Mihalcea; German Rigau; Janyce Wiebe
In Semantic Textual Similarity, systems rate the degree of semantic equivalence between two text snippets. This year, the participants were challenged with new data sets for English, as well as the introduction of Spanish as a new language in which to assess semantic similarity. For the English subtask, we exposed the systems to a diversity of testing scenarios by preparing additional OntoNotes-WordNet sense mappings and news headlines, as well as introducing new genres, including image descriptions, DEFT discussion forums, DEFT newswire, and tweet-newswire headline mappings. For Spanish, since, to our knowledge, this is the first time that official evaluations have been conducted, we used well-formed text, featuring sentences extracted from encyclopedic content and newswire. The annotations for both tasks leveraged crowdsourcing. The Spanish subtask engaged 9 teams participating with 22 system runs, and the English subtask attracted 15 teams with 38 system runs.
workshop on graph based methods for natural language processing | 2006
Samer Hassan; Carmen Banea
This paper describes a new approach for estimating term weights in a text classification task. The approach uses term co-occurrence as a measure of dependency between word features. A random walk model is applied on a graph encoding words and co-occurrence dependencies, resulting in scores that represent a quantification of how a particular word feature contributes to a given context. We argue that by modeling feature weights using these scores, as opposed to the traditional frequency-based scores, we can achieve better results in a text classification task. Experiments performed on four standard classification datasets show that the new random-walk based approach outperforms the traditional term frequency approach to feature weighting.
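The idea in this abstract can be illustrated with a minimal sketch. It assumes a PageRank-style power iteration over an undirected, unweighted co-occurrence graph; the abstract only says "a random walk model", so the damping factor, window size, iteration count, and the function names `cooccurrence_graph` and `random_walk_scores` are all illustrative assumptions rather than the authors' implementation.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Link every pair of distinct words that co-occur within `window` tokens."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def random_walk_scores(graph, damping=0.85, iterations=50):
    """PageRank-style power iteration (an assumed instance of a random walk model)."""
    nodes = list(graph)
    scores = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_scores = {}
        for n in nodes:
            # Each neighbour m passes on an equal share of its current score.
            incoming = sum(scores[m] / len(graph[m]) for m in graph[n])
            new_scores[n] = (1 - damping) / len(nodes) + damping * incoming
        scores = new_scores
    return scores


# Toy document: "a" co-occurs with every other word, so it should score highest.
tokens = ["a", "b", "a", "c", "a", "d"]
scores = random_walk_scores(cooccurrence_graph(tokens, window=1))
top_word = max(scores, key=scores.get)
```

In a classifier, these scores would replace raw term frequency as the weight of each word feature in a document vector.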
north american chapter of the association for computational linguistics | 2015
Eneko Agirre; Carmen Banea; Claire Cardie; Daniel M. Cer; Mona T. Diab; Aitor Gonzalez-Agirre; Weiwei Guo; Iñigo Lopez-Gazpio; Montse Maritxalar; Rada Mihalcea; German Rigau; Larraitz Uria; Janyce Wiebe
In semantic textual similarity (STS), systems rate the degree of semantic equivalence between two text snippets. This year, the participants were challenged with new datasets in English and Spanish. The annotations for both subtasks leveraged crowdsourcing. The English subtask attracted 29 teams with 74 system runs, and the Spanish subtask engaged 7 teams participating with 16 system runs. In addition, this year we ran a pilot task on interpretable STS, where the systems needed to add an explanatory layer, that is, they had to align the chunks in the sentence pair, explicitly annotating the kind of relation and the score of the chunk pair. The train and test data were manually annotated by an expert, and included headline and image sentence pairs from previous years. 7 teams participated with 29 runs.
north american chapter of the association for computational linguistics | 2016
Eneko Agirre; Carmen Banea; Daniel M. Cer; Mona T. Diab; Aitor Gonzalez-Agirre; Rada Mihalcea; German Rigau; Janyce Wiebe
Paper presented at the 10th International Workshop on Semantic Evaluation (SemEval-2016), held on June 16 and 17, 2016, in San Diego, California.
meeting of the association for computational linguistics | 2007
Samer Hassan; Andras Csomai; Carmen Banea; Ravi Som Sinha; Rada Mihalcea
This paper describes the University of North Texas SubFinder system. The system is able to provide the most likely set of substitutes for a word in a given context, by combining several techniques and knowledge sources. SubFinder has successfully participated in the best and out of ten (oot) tracks in the SemEval lexical substitution task, consistently ranking in the first or second place.
Computer Speech & Language | 2014
Carmen Banea; Rada Mihalcea; Janyce Wiebe
This paper explores the ability of senses aligned across languages to carry coherent subjectivity information. We start out with a manual annotation study, and then seek to create an automatic framework to determine subjectivity labeling for unseen senses. We identify two methods that are able to incorporate subjectivity information originating from different languages, namely co-training and multilingual vector spaces, and show that for this task the latter method is better suited and obtains superior results.
IEEE Transactions on Affective Computing | 2013
Carmen Banea; Rada Mihalcea; Janyce Wiebe
Subjectivity analysis focuses on the automatic extraction of private states in natural language. In this paper, we explore methods for generating subjectivity analysis resources in a new language by leveraging the tools and resources available in English. Given a bridge between English and the selected target language (e.g., a bilingual dictionary or a parallel corpus), the methods can be used to rapidly create tools for subjectivity analysis in the new language.
international conference on computational linguistics | 2014
Carmen Banea; Di Chen; Rada Mihalcea; Claire Cardie; Janyce Wiebe
This article presents our team’s participating system at SemEval-2014 Task 3. Using a meta-learning framework, we experiment with traditional knowledge-based metrics, as well as novel corpus-based measures based on deep learning paradigms, paired with varying degrees of context expansion. The framework enabled us to reach the highest overall performance among all competing systems.
social informatics | 2017
Shibamouli Lahiri; Carmen Banea; Rada Mihalcea
Every year, millions of students apply to universities for admission to graduate programs (Master’s and Ph.D.). The applications are individually evaluated and forwarded to appropriate faculty members. Considering human subjectivity and processing latency, this is a highly tedious and time-consuming job that has to be performed every year. In this paper, we propose several information retrieval models aimed at partially or fully automating the task. Applicants are represented by their statements of purpose (SOP), and faculty members are represented by the papers they authored. We extract keywords from papers and SOPs using a state-of-the-art keyword extractor. A detailed exploratory analysis of keywords yields several insights into the contents of SOPs and papers. We report results on several information retrieval models employing keywords and bag-of-words content modeling, with the former offering significantly better results. While we are able to correctly retrieve research areas for a given statement of purpose (F-score of 57.7% at rank 2 and 61.8% at rank 3), the task of matching applicants and faculty members is more difficult, and we are able to achieve an F-measure of 21% at rank 2 and 24% at rank 3, when making a selection among 73 faculty members.
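The rank-based scores quoted above can be made concrete with a short sketch. It assumes the common definition in which precision and recall are computed over the top k retrieved items and combined as a harmonic mean; the abstract does not spell out its exact formula, so this definition, the function name `f_score_at_rank`, and the toy labels are assumptions for illustration only.

```python
def f_score_at_rank(ranked, relevant, k):
    """F-score over the top-k items of a ranked list (assumed definition)."""
    top_k = ranked[:k]
    hits = sum(1 for item in top_k if item in relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(top_k)   # share of retrieved items that are relevant
    recall = hits / len(relevant)   # share of relevant items that were retrieved
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: research areas ranked for one statement of purpose.
ranked_areas = ["nlp", "vision", "systems"]
true_areas = {"nlp", "systems"}

f_at_2 = f_score_at_rank(ranked_areas, true_areas, k=2)
f_at_3 = f_score_at_rank(ranked_areas, true_areas, k=3)
```

Under this definition, raising k can increase recall while lowering precision, which is why results are reported at more than one rank.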