Arman Cohan
Georgetown University
Publication
Featured research published by Arman Cohan.
european conference on information retrieval | 2015
Luca Soldaini; Arman Cohan; Andrew Yates; Nazli Goharian; Ophir Frieder
Keeping current given the vast volume of medical literature published yearly poses a serious challenge for medical professionals. Thus, interest in systems that aid physicians in making clinical decisions is intensifying. A task of Clinical Decision Support (CDS) systems is retrieving highly relevant medical literature that could help healthcare professionals in formulating diagnoses or determining treatments. This search task is atypical as the queries are medical case reports, which differ in size and structure from queries in other, more common search tasks. We apply query reformulation techniques to address literature search based on case reports. The proposed system achieves a statistically significant improvement over the baseline (29% – 32%) and the state-of-the-art (12% – 59%).
north american chapter of the association for computational linguistics | 2015
Arman Cohan; Luca Soldaini; Nazli Goharian
Citation sentences (citances) to a reference article have been extensively studied for summarization tasks. However, citances might not accurately represent the content of the cited article, as they often fail to capture the context of the reported findings and can be affected by epistemic value drift. Following the intuition behind the TAC (Text Analysis Conference) 2014 Biomedical Summarization track, we propose a system that identifies text spans in the reference article that are related to a given citance. We refer to this problem as citance-reference spans matching. We approach the problem as a retrieval task; in this paper, we detail a comparison of different citance reformulation methods and their combinations. While our results show improvement over the baseline (up to 25.9%), their absolute magnitude implies that there is ample room for future improvement.
empirical methods in natural language processing | 2015
Arman Cohan; Nazli Goharian
We propose a summarization approach for scientific articles which takes advantage of citation-context and the document discourse model. While citations have been previously used in generating scientific summaries, they lack the related context from the referenced article and therefore do not accurately reflect the article’s content. Our method overcomes the problem of inconsistency between the citation summary and the article’s content by providing context for each citation. We also leverage the inherent discourse structure of scientific articles to produce better summaries. We show that our proposed method effectively improves over existing summarization approaches (greater than 30% improvement over the best performing baseline) in terms of ROUGE scores on the TAC 2014 scientific summarization dataset. While the dataset we use for evaluation is in the biomedical domain, most of our approaches are general and therefore adaptable to other domains.
north american chapter of the association for computational linguistics | 2016
Arman Cohan; Sydney Young; Nazli Goharian
Online mental health forums provide users with an anonymous support platform that is facilitated by moderators responsible for finding and addressing critical posts, especially those related to self-harm. Given the seriousness of these posts, it is important that the moderators are able to locate these critical posts quickly in order to respond with timely support. We approached the task of automatically triaging forum posts as a multiclass classification problem. Our model uses a supervised classifier with various features including lexical, psycholinguistic, and topic modeling features. On a dataset of mental health forum posts from ReachOut.com, our approach identified critical cases with an F-score of over 80%, showing the effectiveness of the model. Among 16 participating teams and 60 total runs, our best run achieved a macro-average F1-score of 41% for the critical categories (the best score among all the runs was 42%).
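As a reading aid for the macro-average F1-score reported above, the following is a minimal sketch of how a macro-averaged F1 over a chosen subset of categories can be computed. The function name `macro_f1` and the severity labels are hypothetical and chosen only for illustration; this is not the evaluation script used in the paper or the shared task.

```python
def macro_f1(gold, pred, labels):
    """Compute F1 separately for each label, then take the unweighted mean."""
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels; only the "critical" ones are averaged.
gold = ["severe", "crisis", "none", "severe", "crisis"]
pred = ["severe", "none", "none", "crisis", "crisis"]
score = macro_f1(gold, pred, labels=["severe", "crisis"])
```

Because each label contributes equally to the mean, a rare category with poor F1 lowers the macro-average as much as a frequent one would, which is why this measure is often preferred when the critical classes are small.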
International Journal on Digital Libraries | 2018
Arman Cohan; Nazli Goharian
The rapid growth of scientific literature has made it difficult for researchers to quickly learn about the developments in their respective fields. Scientific summarization addresses this challenge by providing summaries of the important contributions of scientific papers. We present a framework for scientific summarization which takes advantage of the citations and the scientific discourse structure. Citation texts often lack the evidence and context to support the content of the cited paper and are even sometimes inaccurate. We first address the problem of inaccuracy of the citation texts by finding the relevant context from the cited paper. We propose three approaches for contextualizing citations which are based on query reformulation, word embeddings, and supervised learning. We then train a model to identify the discourse facets for each citation. We finally propose a method for summarizing scientific papers by leveraging the faceted citations and their corresponding contexts. We evaluate our proposed method on two scientific summarization datasets in the biomedical and computational linguistics domains. Extensive evaluation results show that our methods can improve over the state of the art by large margins.
north american chapter of the association for computational linguistics | 2016
Arman Cohan; Kevin Meurer; Nazli Goharian
Extraction and interpretation of temporal information from clinical text is essential for clinical practitioners and researchers. SemEval 2016 Task 12 (Clinical TempEval) addressed this challenge using the THYME corpus, a corpus of clinical narratives annotated with a schema based on TimeML guidelines. We developed and evaluated approaches for: extraction of temporal expressions (TIMEX3) and EVENTs; TIMEX3 and EVENT attributes; document-time relations; and narrative container relations. Our approach is based on supervised learning (CRF and logistic regression), utilizing various sets of syntactic, lexical and semantic features with the addition of manually crafted rules. Our system demonstrated substantial improvements over the baselines in all the tasks.
international conference on bioinformatics | 2014
Arman Cohan; Luca Soldaini; Andrew Yates; Nazli Goharian; Ophir Frieder
Recent interest in search tools for Clinical Decision Support (CDS) has dramatically increased. These tools help clinicians assess a medical situation by providing actionable information in the form of a select few highly relevant recent medical papers. Unlike traditional search, which is designed to deal with short queries, queries in CDS are long and narrative. We investigate the utility of applying pseudo-relevance feedback (PRF), a query expansion method that performs well in keyword-based medical literature search, to CDS search. Using the optimum combination of PRF parameters, we obtained a statistically significant retrieval efficiency improvement in terms of nDCG over the baseline.
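The general idea of pseudo-relevance feedback can be sketched as follows: treat the top-ranked documents from an initial retrieval as if they were relevant, and append their most frequent new terms to the query. This is a deliberately simplified, frequency-based illustration; the function name `expand_query` and the parameters `top_k` and `num_terms` are assumptions made for the example, and the term-weighting scheme and parameter settings used in the paper are not given in the abstract.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, top_k=2, num_terms=2):
    """Append the most frequent new terms from the top_k ranked documents.

    ranked_docs is a list of tokenized documents, best-ranked first.
    """
    seen = set(query_terms)
    feedback = Counter()
    for doc in ranked_docs[:top_k]:
        # Count only terms that are not already part of the query.
        feedback.update(term for term in doc if term not in seen)
    expansion = [term for term, _ in feedback.most_common(num_terms)]
    return list(query_terms) + expansion


# Toy initial ranking for a short query (illustrative data only).
ranked = [
    ["chest", "pain", "angina", "angina", "ecg"],
    ["angina", "ecg", "troponin", "pain"],
    ["fracture", "cast"],
]
expanded = expand_query(["chest", "pain"], ranked, top_k=2, num_terms=2)
```

The two parameters correspond to the usual PRF knobs, namely how many feedback documents to trust and how many expansion terms to add. The expanded query would then be issued against the same index for a second retrieval pass.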
european conference on information retrieval | 2017
Arman Cohan; Allan Fong; Nazli Goharian; Raj M. Ratwani
Patient Safety Event reports are narratives describing potential adverse events to patients and are important in identifying and preventing medical errors. We present a neural network architecture for identifying the type of safety event, which is the first step in understanding these narratives. Our proposed model is based on a soft neural attention model to improve the effectiveness of encoding long sequences. Empirical results on two large-scale real-world datasets of patient safety reports demonstrate the effectiveness of our method with significant improvements over existing methods.
Journal of the Association for Information Science and Technology | 2017
Arman Cohan; Sydney Young; Andrew Yates; Nazli Goharian
In recent years, social media has become a significant resource for improving healthcare and mental health. Mental health forums are online communities where people express their issues and seek help from moderators and other users. In such forums, there are often posts with severe content indicating that the user is in acute distress and there is a risk of attempted self‐harm. Moderators need to respond to these severe posts in a timely manner to prevent potential self‐harm. However, the large volume of daily posted content makes it difficult for the moderators to locate and respond to these critical posts. We propose an approach for triaging user content into four severity categories that are defined based on an indication of self‐harm ideation. Our models are based on a feature‐rich classification framework, which includes lexical, psycholinguistic, contextual, and topic modeling features. Our approaches improve over the state of the art in triaging the content severity in mental health forums by large margins (up to 17% improvement over the F‐1 scores). Furthermore, using our proposed model, we analyze the mental state of users and we show that overall, long‐term users of the forum demonstrate decreased severity of risk over time. Our analysis of the interaction of the moderators with the users further indicates that without an automatic way to identify critical content, it is indeed challenging for the moderators to provide a timely response to the users in need.
international acm sigir conference on research and development in information retrieval | 2017
Arman Cohan; Nazli Goharian
Citation texts are sometimes not very informative or in some cases inaccurate by themselves; they need the appropriate context from the referenced paper to reflect its exact contributions. To address this problem, we propose an unsupervised model that uses distributed representation of words as well as domain knowledge to extract the appropriate context from the reference paper. Evaluation results show the effectiveness of our model, which significantly outperforms the state of the art. We furthermore demonstrate how an effective contextualization method results in improving citation-based summarization of scientific articles.