
Publication


Featured research published by Hongyan Jing.


north american chapter of the association for computational linguistics | 2000

Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies

Dragomir R. Radev; Hongyan Jing; Malgorzata Budzikowska

We present a multi-document summarizer, called MEAD, which generates summaries using cluster centroids produced by a topic detection and tracking system. We also describe two new techniques, based on sentence utility and subsumption, which we have applied to the evaluation of both single and multiple document summaries. Finally, we describe two user studies that test our models of multi-document summarization.
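The core centroid idea can be sketched in a few lines. This is a minimal, hypothetical illustration, not MEAD's actual implementation (which also uses positional features, redundancy penalties, and clusters produced by a topic detection and tracking system):

```python
from collections import Counter

def centroid_scores(cluster):
    """Score each sentence in a document cluster by its overlap with the
    cluster centroid. `cluster` is a list of sentences, each a list of
    lowercase tokens."""
    # Build the centroid: average term frequency across all sentences.
    counts = Counter()
    for sent in cluster:
        counts.update(sent)
    n = len(cluster)
    centroid = {w: c / n for w, c in counts.items()}
    # A sentence's score is the sum of centroid weights of its distinct words.
    return [sum(centroid.get(w, 0.0) for w in set(sent)) for sent in cluster]

def extract_summary(cluster, k=1):
    """Pick the k highest-scoring sentences as an extractive summary,
    returned in original document order."""
    scores = centroid_scores(cluster)
    ranked = sorted(range(len(cluster)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

Sentences that share many words with the cluster as a whole score highest, which is why centroid extraction tends to pick sentences about the cluster's dominant topic.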


meeting of the association for computational linguistics | 2003

Discourse Segmentation of Multi-Party Conversation

Michel Galley; Kathleen R. McKeown; Eric Fosler-Lussier; Hongyan Jing

We present a domain-independent topic segmentation algorithm for multi-party speech. Our feature-based algorithm combines knowledge about content using a text-based algorithm as a feature and about form using linguistic and acoustic cues about topic shifts extracted from speech. This segmentation algorithm uses automatically induced decision rules to combine the different features. The embedded text-based algorithm builds on lexical cohesion and has performance comparable to state-of-the-art algorithms based on lexical information. A significant error reduction is obtained by combining the two knowledge sources.
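The embedded text-based feature can be illustrated with a toy lexical-cohesion segmenter: measure word overlap between adjacent sentences and propose a boundary where cohesion dips. This is a simplified sketch with an arbitrary threshold; the paper's algorithm uses larger windows and combines this feature with linguistic and acoustic cues via induced decision rules:

```python
import math
from collections import Counter

def _cosine(a, b):
    """Cosine similarity between two word-count dictionaries."""
    num = sum(a[w] * b.get(w, 0) for w in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def cohesion_boundaries(sentences, threshold=0.1):
    """Toy lexical-cohesion segmenter: compare adjacent sentences' word
    counts and propose a topic boundary wherever similarity falls below
    `threshold`. Returns indices i meaning "boundary after sentence i"."""
    vecs = [Counter(s) for s in sentences]
    sims = [_cosine(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]
    return [i for i, s in enumerate(sims) if s < threshold]
```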


meeting of the association for computational linguistics | 2004

A Mention-Synchronous Coreference Resolution Algorithm Based On the Bell Tree

Xiaoqiang Luo; Abraham Ittycheriah; Hongyan Jing; Nanda Kambhatla; Salim Roukos

This paper proposes a new approach for coreference resolution which uses the Bell tree to represent the search space and casts the coreference resolution problem as finding the best path from the root of the Bell tree to the leaf nodes. A Maximum Entropy model is used to rank these paths. We report coreference performance on the 2002 and 2003 Automatic Content Extraction (ACE) data. We also train a coreference system on the MUC6 data and obtain competitive results.
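The Bell tree's leaves are the possible partitions of the mention sequence into entities, built mention-synchronously: each new mention either joins an existing partial entity or starts a new one. The sketch below only enumerates that search space to make it concrete; the actual algorithm scores and prunes paths with a Maximum Entropy model rather than enumerating everything:

```python
def bell_tree_paths(mentions):
    """Enumerate the leaves of the Bell tree: every partition of the
    mention sequence into entities. At each step, the next mention is
    linked to one of the existing entities or starts a new entity, so
    the number of leaves for n mentions is the n-th Bell number."""
    partitions = [[]]  # start with a single empty partition
    for m in mentions:
        nxt = []
        for p in partitions:
            # Link m to each existing entity in turn.
            for i in range(len(p)):
                nxt.append([e + [m] if j == i else list(e)
                            for j, e in enumerate(p)])
            # Or let m start a new entity.
            nxt.append([list(e) for e in p] + [[m]])
        partitions = nxt
    return partitions
```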


international acm sigir conference on research and development in information retrieval | 1999

The decomposition of human-written summary sentences

Hongyan Jing; Kathleen R. McKeown

We define the problem of decomposing human-written summary sentences and propose a novel Hidden Markov Model solution to the problem. Human summarizers often rely on cutting and pasting of the full document to generate summaries. Decomposing a human-written summary sentence requires determining: (1) whether it is constructed by cutting and pasting, (2) what components in the sentence come from the original document, and (3) where in the document the components come from. Solving the decomposition problem can potentially lead to the automatic acquisition of large corpora for summarization. It also sheds light on the generation of summary text by cutting and pasting. The evaluation shows that the proposed decomposition algorithm performs well.
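The intuition behind the HMM solution can be sketched as a tiny Viterbi alignment: map each summary word to a document position, preferring runs of adjacent positions (the signature of cut-and-paste). This is a hypothetical simplification with a 0/1 transition cost, not the paper's trained model, and it assumes every summary word occurs in the document:

```python
def decompose(summary_words, doc_words):
    """Toy Viterbi alignment: choose one document position per summary
    word, paying cost 0 to move to the very next document position and
    cost 1 for any other jump, so contiguous cut-and-paste fragments
    come out cheapest. Returns the best position sequence."""
    best = {None: (0, [])}  # previous position -> (cost, path so far)
    for s in summary_words:
        cands = [i for i, w in enumerate(doc_words) if w == s]
        assert cands, f"{s!r} not in document (sketch assumes pure cut-and-paste)"
        nxt = {}
        for p in cands:
            cost, path = min(
                ((c + (0 if prev is not None and p == prev + 1 else 1), path)
                 for prev, (c, path) in best.items()),
                key=lambda t: t[0])
            nxt[p] = (cost, path + [p])
        best = nxt
    return min(best.values(), key=lambda t: t[0])[1]
```

On a summary stitched from two document fragments, the alignment recovers each fragment as a run of consecutive positions.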


international acm sigir conference on research and development in information retrieval | 1999

Information retrieval based on context distance and morphology

Hongyan Jing; Evelyne Tzoukermann

We present an approach to information retrieval based on context distance and morphology. Context distance is a measure we use to assess the closeness of word meanings. This context distance model measures semantic distances between words using the local contexts of words within a single document as well as the lexical co-occurrence information in the set of documents to be retrieved. We also propose to integrate the context distance model with morphological analysis in determining word similarity so that the two can enhance each other. Using the standard vector-space model, we evaluated the proposed method on a subset of the TREC-4 corpus (AP88 and AP90 collection, 158,240 documents, 49 queries). Results show that this method improves the 11-point average precision by 8.6%.
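For reference, the standard vector-space framework the evaluation builds on can be sketched as TF-IDF weighting plus cosine ranking. This shows only the baseline retrieval model; the paper's contribution, context-distance word similarity combined with morphology, would act as an additional term-matching and re-weighting step on top of it:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Standard vector-space model: TF-IDF weight per term per document.
    `docs` is a list of token lists."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))      # document frequency
    idf = {w: math.log(n / c) for w, c in df.items()}  # inverse doc frequency
    return [{w: tf * idf[w] for w, tf in Counter(d).items()} for d in docs]

def rank(query, docs):
    """Rank document indices by cosine similarity to the query terms."""
    vecs = tfidf_vectors(docs)
    qv = Counter(query)
    def cos(v):
        num = sum(qv[w] * v.get(w, 0.0) for w in qv)
        den = math.sqrt(sum(x * x for x in v.values())) or 1.0
        return num / den
    return sorted(range(len(docs)), key=lambda i: cos(vecs[i]), reverse=True)
```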


meeting of the association for computational linguistics | 1998

Combining Multiple, Large-Scale Resources in a Reusable Lexicon for Natural Language Generation

Hongyan Jing; Kathleen R. McKeown

A lexicon is an essential component in a generation system but few efforts have been made to build a rich, large-scale lexicon and make it reusable for different generation applications. In this paper, we describe our work to build such a lexicon by combining multiple, heterogeneous linguistic resources which have been developed for other purposes. Novel transformation and integration of resources is required to reuse them for generation. We also applied the lexicon to the lexical choice and realization component of a practical generation application by using a multi-level feedback architecture. The integration of the lexicon and the architecture is able to effectively improve the system paraphrasing power, minimize the chance of grammatical errors, and simplify the development process substantially.


international conference on natural language generation | 2000

Integrating a Large-Scale, Reusable Lexicon with a Natural Language Generator

Hongyan Jing; Yael Dahan; Michael Elhadad; Kathy McKeown

This paper presents the integration of a large-scale, reusable lexicon for generation with the FUF/SURGE unification-based syntactic realizer. The lexicon was combined from multiple existing resources in a semi-automatic process. The integration is a multi-step unification process. This integration allows the reuse of the lexical, syntactic, and semantic knowledge encoded in the lexicon in the development of the lexical chooser module of a generation system. The lexicon also brings other benefits to a generation system: for example, the ability to generate many lexical and syntactic paraphrases and the ability to avoid non-grammatical output.


Archive | 1997

Building a Rich Large-scale Lexical Base for Generation

Hongyan Jing; Kathleen R. McKeown; Rebecca J. Passonneau

Most large lexical resources have been developed with language interpretation in mind and cannot be used directly for generation. We present a rich, large-scale lexical base for generation, constructed by merging various linguistic resources. Our approach meets the needs of language generation systems by providing the facilities for mapping from semantic concepts to verb sense pairs, for identifying the valid subcategorization forms for a given verb sense, and for representing alternations for paraphrasing power. Information from different resources enriches and constrains each other, so the final result is complete as well as accurate. We show by example how this lexical base can be integrated into a generation package and how it simplifies the development process while improving system performance.


north american chapter of the association for computational linguistics | 2004

A Statistical Model for Multilingual Entity Detection and Tracking

Radu Florian; Hany Hassan; Abraham Ittycheriah; Hongyan Jing; Nanda Kambhatla; Xiaoqiang Luo; H. Nicolov; Salim Roukos


north american chapter of the association for computational linguistics | 2000

Cut and paste based text summarization

Hongyan Jing; Kathleen R. McKeown
