Dragomir R. Radev | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Dragomir R. Radev is active.

Explore More

Publication

Featured researches published by Dragomir R. Radev.

Journal of Artificial Intelligence Research | 2004

LexRank: graph-based lexical centrality as salience in text summarization

Güneş Erkan; Dragomir R. Radev

We introduce a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a document or set of documents. Salience is typically defined in terms of the presence of particular important words or in terms of similarity to a centroid pseudo-sentence. We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences. Our system, based on LexRank ranked in first place in more than one task in the recent DUC 2004 evaluation. In this paper we present a detailed analysis of our approach and apply it to a larger data set including data from earlier DUC evaluations. We discuss several methods to compute centrality using the similarity graph. The results show that degree-based methods (including LexRank) outperform both centroid-based methods and other systems participating in DUC in most of the cases. Furthermore, the LexRank with threshold method outperforms the other degree-based techniques including continuous LexRank. We also show that our approach is quite insensitive to the noise in the data that may result from an imperfect topical clustering of documents.

Information Processing and Management | 2004

Centroid-based summarization of multiple documents

Dragomir R. Radev; Hongyan Jing; Małgorzata Styś; Daniel Tam

We present a multi-document summarizer, MEAD, which generates summaries using cluster centroids produced by a topic detection and tracking system. We describe two new techniques, a centroid-based summarizer, and an evaluation scheme based on sentence utility and subsumption. We have applied this evaluation to both single and multiple document summaries. Finally, we describe two user studies that test our models of multi-document summarization.

north american chapter of the association for computational linguistics | 2000

Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies

Dragomir R. Radev; Hongyan Jing; Malgorzata Budzikowska

We present a multi-document summarizer, called MEAD, which generates summaries using cluster centroids produced by a topic detection and tracking system. We also describe two new techniques, based on sentence utility and subsumption, which we have applied to the evaluation of both single and multiple document summaries. Finally, we describe two user studies that test our models of multi-document summarization.

natural language generation | 1998

Generating natural language summaries from multiple on-line sources

Dragomir R. Radev; Kathleen R. McKeown

We present a methodology for summarization of news about current events in the form of briefings that include appropriate background (historical) information. The system that we developed, SUMMONS, uses the output of systems developed for the DARPA Message Understanding Conferences to generate summaries of multiple documents on the same or related events, presenting similarities and differences, contradictions, and generalizations among sources of information. We describe the various components of the system, showing how information from multiple articles is combined, organized into a paragraph, and finally, realized as English sentences. A feature of our work is the extraction of descriptions of entities such as people and places for reuse to enhance a briefing.

international acm sigir conference on research and development in information retrieval | 1995

Generating summaries of multiple news articles

Kathleen R. McKeown; Dragomir R. Radev

We present a natural language system which summarizes a series of news articles on the same event. It uses summarization operators, identified through empirical analysis of a corpus of news summaries, to group together templates from the output of the systems developed for ARPA’s Message Understanding Conferences. Depending on the available resources (e.g., space), summaries of different length can be produced. Our research also provides a methodological framework for future work on the summarization task and on the evaluation of news summarization systems.

Computational Linguistics | 2002

Introduction to the special issue on summarization

Dragomir R. Radev; Eduard H. Hovy; Kathleen R. McKeown

generation based on rhetorical structure extraction. In Proceedings of the International Conference on Computational Linguistics, Kyoto, Japan, pages 344–348. Otterbacher, Jahna, Dragomir R. Radev, and Airong Luo. 2002. Revisions that improve cohesion in multi-document summaries: A preliminary study. In ACL Workshop on Text Summarization, Philadelphia. Papineni, K., S. Roukos, T. Ward, and W-J. Zhu. 2001. BLEU: A method for automatic evaluation of machine translation. Research Report RC22176, IBM. Radev, Dragomir, Simone Teufel, Horacio Saggion, Wai Lam, John Blitzer, Arda Celebi, Hong Qi, Elliott Drabek, and Danyu Liu. 2002. Evaluation of text summarization in a cross-lingual information retrieval framework. Technical Report, Center for Language and Speech Processing, Johns Hopkins University, Baltimore, June. Radev, Dragomir R., Hongyan Jing, and Malgorzata Budzikowska. 2000. Centroid-based summarization of multiple documents: Sentence extraction, utility-based evaluation, and user studies. In ANLP/NAACL Workshop on Summarization, Seattle, April. Radev, Dragomir R. and Kathleen R. McKeown. 1998. Generating natural language summaries from multiple on-line sources. Computational Linguistics, 24(3):469–500. Rau, Lisa and Paul Jacobs. 1991. Creating segmented databases from free text for text retrieval. In Proceedings of the 14th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, New York, pages 337–346. Saggion, Horacio and Guy Lapalme. 2002. Generating indicative-informative summaries with SumUM. Computational Linguistics, 28(4), 497–526. Salton, G., A. Singhal, M. Mitra, and C. Buckley. 1997. Automatic text structuring and summarization. Information Processing & Management, 33(2):193–207. Silber, H. Gregory and Kathleen McCoy. 2002. Efficiently computed lexical chains as an intermediate representation for automatic text summarization. Computational Linguistics, 28(4), 487–496. Sparck Jones, Karen. 1999. Automatic summarizing: Factors and directions. In I. Mani and M. T. Maybury, editors, Advances in Automatic Text Summarization. MIT Press, Cambridge, pages 1–13. Strzalkowski, Tomek, Gees Stein, J. Wang, and Bowden Wise. 1999. A robust practical text summarizer. In I. Mani and M. T. Maybury, editors, Advances in Automatic Text Summarization. MIT Press, Cambridge, pages 137–154. Teufel, Simone and Marc Moens. 2002. Summarizing scientific articles: Experiments with relevance and rhetorical status. Computational Linguistics, 28(4), 409–445. White, Michael and Claire Cardie. 2002. Selecting sentences for multidocument summaries using randomized local search. In Proceedings of the Workshop on Automatic Summarization (including DUC 2002), Philadelphia, July. Association for Computational Linguistics, New Brunswick, NJ, pages 9–18. Witbrock, Michael and Vibhu Mittal. 1999. Ultra-summarization: A statistical approach to generating highly condensed non-extractive summaries. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Berkeley, pages 315–316. Zechner, Klaus. 2002. Automatic summarization of open-domain multiparty dialogues in diverse genres. Computational Linguistics, 28(4), 447–485.

language resources and evaluation | 2004

MEAD - A Platform for Multidocument Multilingual Text Summarization

Dragomir R. Radev; Timothy Allison; Sasha Blair-Goldensohn; John Blitzer; Arda Çelebi; Stanko Dimitrov; Elliott Franco Drábek; Ali Hakim; Wai Lam; Danyu Liu; Jahna Otterbacher; Hong Qi; Horacio Saggion; Simone Teufel; Michael Topper; Adam Winkel; Zhu Zhang

Abstract This paper describes the functionality of MEAD, a comprehensive, public domain, open source, multidocument multilingual summarization environment that has been thus far downloaded by more than 500 organizations. MEAD has been used in a variety of summarization applications ranging from summarization for mobile devices to Web page summarization within a search engine and to novelty detection.

intelligent systems in molecular biology | 2008

Identifying gene-disease associations using centrality on a literature mined gene-interaction network

Arzucan Özgür; Thuy Vu; Güneş Erkan; Dragomir R. Radev

Motivation: Understanding the role of genetics in diseases is one of the most important aims of the biological sciences. The completion of the Human Genome Project has led to a rapid increase in the number of publications in this area. However, the coverage of curated databases that provide information manually extracted from the literature is limited. Another challenge is that determining disease-related genes requires laborious experiments. Therefore, predicting good candidate genes before experimental analysis will save time and effort. We introduce an automatic approach based on text mining and network analysis to predict gene-disease associations. We collected an initial set of known disease-related genes and built an interaction network by automatic literature mining based on dependency parsing and support vector machines. Our hypothesis is that the central genes in this disease-specific network are likely to be related to the disease. We used the degree, eigenvector, betweenness and closeness centrality metrics to rank the genes in the network. Results: The proposed approach can be used to extract known and to infer unknown gene-disease associations. We evaluated the approach for prostate cancer. Eigenvector and degree centrality achieved high accuracy. A total of 95% of the top 20 genes ranked by these methods are confirmed to be related to prostate cancer. On the other hand, betweenness and closeness centrality predicted more genes whose relation to the disease is currently unknown and are candidates for experimental study. Availability: A web-based system for browsing the disease-specific gene-interaction networks is available at: http://gin.ncibi.org Contact: [email protected]

international acm sigir conference on research and development in information retrieval | 2000

Question-answering by predictive annotation

John M. Prager; Eric W. Brown; Anni Coden; Dragomir R. Radev

We present a new technique for question answering called Predictive Annotation. Predictive Annotation identifies potential answers to questions in text, annotates them accordingly and indexes them. This technique, along with a complementary analysis of questions, passage-level ranking and answer selection, produces a system effective at answering natural-language fact-seeking questions posed against large document collections. Experimental results show the effects of different parameter settings and lead to a number of general observations about the question-answering problem.

international acm sigir conference on research and development in information retrieval | 2003

Challenges in information retrieval and language modeling: report of a workshop held at the center for intelligent information retrieval, University of Massachusetts Amherst, September 2002

James Allan; Jay Aslam; Nicholas J. Belkin; Chris Buckley; James P. Callan; W. Bruce Croft; Susan T. Dumais; Norbert Fuhr; Donna Harman; David J. Harper; Djoerd Hiemstra; Thomas Hofmann; Eduard H. Hovy; Wessel Kraaij; John D. Lafferty; Victor Lavrenko; David Lewis; Liz Liddy; R. Manmatha; Andrew McCallum; Jay M. Ponte; John M. Prager; Dragomir R. Radev; Philip Resnik; Stephen E. Robertson; Ron G. Rosenfeld; Salim Roukos; Mark Sanderson; Richard M. Schwartz; Amit Singhal

Information retrieval (IR) research has reached a point where it is appropriate to assess progress and to define a research agenda for the next five to ten years. This report summarizes a discussion of IR research challenges that took place at a recent workshop. The attendees of the workshop considered information retrieval research in a range of areas chosen to give broad coverage of topic areas that engage information retrieval researchers. Those areas are retrieval models, cross-lingual retrieval, Web search, user modeling, filtering, topic detection and tracking, classification, summarization, question answering, metasearch, distributed retrieval, multimedia retrieval, information extraction, as well as testbed requirements for future work. The potential use of language modeling techniques in these areas was also discussed. The workshop identified major challenges within each of those areas. The following are recurring themes that ran throughout: • User and context sensitive retrieval • Multi-lingual and multi-media issues • Better target tasks • Improved objective evaluations • Substantially more labeled data • Greater variety of data sources • Improved formal models Contextual retrieval and global information access were identified as particularly important long-term challenges.

Explore More