Jackie Chi Kit Cheung

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Jackie Chi Kit Cheung is active.

Explore More

Publication

Featured researches published by Jackie Chi Kit Cheung.

computational intelligence | 2013

MULTI-DOCUMENT SUMMARIZATION OF EVALUATIVE TEXT

Giuseppe Carenini; Jackie Chi Kit Cheung; Adam Pauls

In many decision‐making scenarios, people can benefit from knowing what other peoples opinions are. As more and more evaluative documents are posted on the Web, summarizing these useful resources becomes a critical task for many organizations and individuals. This paper presents a framework for summarizing a corpus of evaluative documents about a single entity by a natural language summary. We propose two summarizers: an extractive summarizer and an abstractive one. As an additional contribution, we show how our abstractive summarizer can be modified to generate summaries tailored to a model of the user preferences that is solidly grounded in decision theory and can be effectively elicited from users. We have tested our framework in three user studies. In the first one, we compared the two summarizers. They performed equally well relative to each other quantitatively, while significantly outperforming a baseline standard approach to multidocument summarization. Trends in the results as well as qualitative comments from participants suggest that the summarizers have different strengths and weaknesses. After this initial user study, we realized that the diversity of opinions expressed in the corpus (i.e., its controversiality) might play a critical role in comparing abstraction versus extraction. To clearly pinpoint the role of controversiality, we ran a second user study in which we controlled for the degree of controversiality of the corpora that were summarized for the participants. The outcome of this study indicates that for evaluative text abstraction tends to be more effective than extraction, particularly when the corpus is controversial. In the third user study we assessed the effectiveness of our user tailoring strategy. The results of this experiment confirm that user tailored summaries are more informative than untailored ones.

international conference on natural language generation | 2008

Extractive vs. NLG-based abstractive summarization of evaluative text: the effect of corpus controversiality

Giuseppe Carenini; Jackie Chi Kit Cheung

Extractive summarization is the strategy of concatenating extracts taken from a corpus into a summary, while abstractive summarization involves paraphrasing the corpus using novel sentences. We define a novel measure of corpus controversiality of opinions contained in evaluative text, and report the results of a user study comparing extractive and NLG-based abstractive summarization at different levels of controversiality. While the abstractive summarizer performs better overall, the results suggest that the margin by which abstraction outperforms extraction is greater when controversiality is high, providing a context in which the need for generation-based methods is especially great.

web search and data mining | 2012

Sequence clustering and labeling for unsupervised query intent discovery

Jackie Chi Kit Cheung; Xiao Li

One popular form of semantic search observed in several modern search engines is to recognize query patterns that trigger instant answers or domain-specific search, producing semantically enriched search results. This often requires understanding the query intent in addition to the meaning of the query terms in order to access structured data sources. A major challenge in intent understanding is to construct a domain-dependent schema and to annotate search queries based on such a schema, a process that to date has required much manual annotation effort. We present an unsupervised method for clustering queries with similar intent and for producing a pattern consisting of a sequence of semantic concepts and/or lexical items for each intent. Furthermore, we leverage the discovered intent patterns to automatically annotate a large number of queries beyond those used in clustering. We evaluated our method on 10 selected domains, discovering over 1400 intent patterns and automatically annotating 125K (and potentially many more) queries. We found that over 90% of patterns and 80% of instance annotations tested are judged to be correct by a majority of annotators.

empirical methods in natural language processing | 2014

Unsupervised Sentence Enhancement for Automatic Summarization

Jackie Chi Kit Cheung; Gerald Penn

We present sentence enhancement as a novel technique for text-to-text generation in abstractive summarization. Compared to extraction or previous approaches to sentence fusion, sentence enhancement increases the range of possible summary sentences by allowing the combination of dependency subtrees from any sentence from the source text. Our experiments indicate that our approach yields summary sentences that are competitive with a sentence fusion baseline in terms of content quality, but better in terms of grammaticality, and that the benefit of sentence enhancement relies crucially on an event coreference resolution algorithm using distributional semantics. We also consider how text-to-text generation approaches to summarization can be extended beyond the source text by examining how human summary writers incorporate source-text-external elements into their summary sentences.

international joint conference on natural language processing | 2009

Topological Field Parsing of German

Jackie Chi Kit Cheung; Gerald Penn

Freer-word-order languages such as German exhibit linguistic phenomena that present unique challenges to traditional CFG parsing. Such phenomena produce discontinuous constituents, which are not naturally modelled by projective phrase structure trees. In this paper, we examine topological field parsing, a shallow form of parsing which identifies the major sections of a sentence in relation to the clausal main verb and the subordinating heads. We report the results of topological field parsing of German using the unlexicalized, latent variable-based Berkeley parser (Petrov et al., 2006) Without any language- or model-dependent adaptation, we achieve state-of-the-art results on the TuBa-D/Z corpus, and a modified NE-GRA corpus that has been automatically annotated with topological fields (Becker and Frank, 2002). We also perform a qualitative error analysis of the parser output, and discuss strategies to further improve the parsing results.

meeting of the association for computational linguistics | 2016

Leveraging Lexical Resources for Learning Entity Embeddings in Multi-Relational Data.

Teng Long; Ryan Lowe; Jackie Chi Kit Cheung; Doina Precup

Recent work in learning vector-space embeddings for multi-relational data has focused on combining relational information derived from knowledge bases with distributional information derived from large text corpora. We propose a simple approach that leverages the descriptions of entities or phrases available in lexical resources, in conjunction with distributional semantics, in order to derive a better initialization for training relational models. Applying this initialization to the TransE model results in significant new state-of-the-art performances on the WordNet dataset, decreasing the mean rank from the previous best of 212 to 51. It also results in faster convergence of the entity representations. We find that there is a trade-off between improving the mean rank and the hits@10 with this approach. This illustrates that much remains to be understood regarding performance improvements in relational models.

Proceedings of the 2009 Workshop on Language Generation and Summarisation (UCNLG+Sum 2009) | 2009

Optimization-based Content Selection for Opinion Summarization

Jackie Chi Kit Cheung; Giuseppe Carenini; Raymond T. Ng

We introduce a content selection method for opinion summarization based on a well-studied, formal mathematical model, the p-median clustering problem from facility location theory. Our method replaces a series of local, myopic steps to content selection with a global solution, and is designed to allow content and realization decisions to be naturally integrated. We evaluate and compare our method against an existing heuristic-based method on content selection, using human selections as a gold standard. We find that the algorithms perform similarly, suggesting that our content selection method is robust enough to support integration with other aspects of summarization.

Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods | 2016

Stylistic Transfer in Natural Language Generation Systems Using Recurrent Neural Networks

Jad Kabbara; Jackie Chi Kit Cheung

Linguistic style conveys the social context in which communication occurs and defines particular ways of using language to engage with the audiences to which the text is accessible. In this work, we are interested in the task of stylistic transfer in natural language generation (NLG) systems, which could have applications in the dissemination of knowledge across styles, automatic summarization and author obfuscation. The main challenges in this task involve the lack of parallel training data and the difficulty in using stylistic features to control generation. To address these challenges, we plan to investigate neural network approaches to NLG to automatically learn and incorporate stylistic features in the process of language generation. We identify several evaluation criteria, and propose manual and automatic evaluation approaches.

empirical methods in natural language processing | 2015

Indicative Tweet Generation: An Extractive Summarization Problem?

Priya Sidhaye; Jackie Chi Kit Cheung

Social media such as Twitter have become an important method of communication, with potential opportunities for NLG to facilitate the generation of social media content. We focus on the generation of indicative tweets that contain a link to an external web page. While it is natural and tempting to view the linked web page as the source text from which the tweet is generated in an extractive summarization setting, it is unclear to what extent actual indicative tweets behave like extractive summaries. We collect a corpus of indicative tweets with their associated articles and investigate to what extent they can be derived from the articles using extractive methods. We also consider the impact of the formality and genre of the article. Our results demonstrate the limits of viewing indicative tweet generation as extractive summarization, and point to the need for the development of a methodology for tweet generation that is sensitive to genre-specific issues.

empirical methods in natural language processing | 2016

Verb Phrase Ellipsis Resolution Using Discriminative and Margin-Infused Algorithms

Kian Kenyon-Dean; Jackie Chi Kit Cheung; Doina Precup

Verb Phrase Ellipsis (VPE) is an anaphoric construction in which a verb phrase has been elided. It occurs frequently in dialogue and informal conversational settings, but despite its evident impact on event coreference resolution and extraction, there has been relatively little work on computational methods for identifying and resolving VPE. Here, we present a novel approach to detecting and resolving VPE by using supervised discriminative machine learning techniques trained on features extracted from an automatically parsed, publicly available dataset. Our approach yields state-of-the-art results for VPE detection by improving F1 score by over 11%; additionally, we explore an approach to antecedent identification that uses the Margin-Infused-RelaxedAlgorithm, which shows promising results.

Explore More