Publication


Featured research published by Hongzhi Xu.


North American Chapter of the Association for Computational Linguistics | 2015

LLT-PolyU: Identifying Sentiment Intensity in Ironic Tweets

Hongzhi Xu; Enrico Santus; Anna Laszlo; Chu-Ren Huang

In this paper, we describe the system we built for Task 11 of SemEval-2015, which aims at identifying the sentiment intensity of figurative language in tweets. We use various features, including some specifically concerned with the identification of irony and sarcasm. The features are evaluated with a decision tree regression model and a support vector regression model. Five-fold cross-validation on the training data shows that the decision tree regression model outperforms the support vector regression model, so the former is used for the final evaluation of the task. The results show that our model performs especially well in predicting the sentiment intensity of tweets involving irony and sarcasm.
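The model-selection step described above (five-fold cross-validation on the training data, then keeping the model with the lower error) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' system: the two models are hypothetical stand-ins (a mean-value baseline and a 1-nearest-neighbour regressor) in place of the paper's decision tree and support vector regressors, the folds are contiguous and unshuffled, and mean squared error is an assumed metric.

```python
from statistics import mean
from typing import Callable, List, Tuple

# A "model" is any function that takes training data and returns a predictor.
# (Hypothetical interface chosen for this sketch.)
Model = Callable[[List[List[float]], List[float]], Callable[[List[float]], float]]


def k_fold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; return (train, test) pairs."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        splits.append((train, test))
        start += size
    return splits


def cross_val_mse(model: Model, X: List[List[float]], y: List[float], k: int = 5) -> float:
    """Average mean squared error of `model` over k held-out folds."""
    fold_errors = []
    for train, test in k_fold_indices(len(X), k):
        predict = model([X[i] for i in train], [y[i] for i in train])
        fold_errors.append(mean((predict(X[i]) - y[i]) ** 2 for i in test))
    return mean(fold_errors)


def mean_model(X_train: List[List[float]], y_train: List[float]):
    """Hypothetical baseline: always predict the training-set mean."""
    m = mean(y_train)
    return lambda x: m


def nearest_neighbour_model(X_train: List[List[float]], y_train: List[float]):
    """Hypothetical 1-NN regressor: return the target of the closest training vector."""
    def predict(x: List[float]) -> float:
        dists = [sum((a - b) ** 2 for a, b in zip(x, xt)) for xt in X_train]
        return y_train[dists.index(min(dists))]
    return predict


if __name__ == "__main__":
    # Hypothetical toy data: one feature, target equal to the feature.
    X = [[float(i)] for i in range(20)]
    y = [float(i) for i in range(20)]
    scores = {name: cross_val_mse(m, X, y)
              for name, m in [("mean", mean_model), ("1-NN", nearest_neighbour_model)]}
    best = min(scores, key=scores.get)  # keep the model with the lowest CV error
    print(scores, "->", best)
```

As in the abstract, the model with the better cross-validated score is the one carried forward to the final evaluation; shuffling the data before splitting is a common refinement omitted here for brevity.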


International Conference on Computational Linguistics | 2014

Corpus-based Study and Identification of Mandarin Chinese Light Verb Variations

Chu-Ren Huang; Jingxia Lin; Menghan Jiang; Hongzhi Xu

When the PRC was founded on mainland China and the KMT retreated to Taiwan in 1949, the relation between mainland China and Taiwan became a classic Cold War instance. Neither travel, visits, nor correspondence were allowed between the people on the two sides until 1987, when the governments on both sides started to allow a small number of Taiwan residents with relatives in mainland China to return for visits through a third location. Although the thaw eventually led to frequent exchanges, direct travel links, and close commercial ties between Taiwan and mainland China today, 38 years of total isolation allowed language use on each side to develop into distinct varieties, which have become a popular topic, mainly for lexical studies (e.g., Xu, 1995; Zeng, 1995; Wang & Li, 1996). Grammatical differences between these two variants, however, have not been well studied beyond anecdotal observation, partly because of the near identity of their grammatical systems. This paper focuses on light verb variations in the Mainland and Taiwan variants and finds that the light verbs of these two variants indeed show different distributional tendencies. Light verbs are chosen for two reasons: first, they are semantically bleached and hence more susceptible to change and variation; second, the classification of light verbs is a challenging topic in NLP. We hope our study will contribute to the study of Chinese light verbs in general. The data adopted for this study is a comparable corpus extracted from the Chinese Gigaword Corpus and manually annotated with contextual features that may contribute to light verb variations. A multivariate analysis shows that for each light verb there is at least one context in which the two variants show different tendencies (usually the presence or absence of a tendency rather than contrasting tendencies) and can therefore be differentiated. In addition, we carried out a K-Means clustering analysis of the variations, and its results are consistent with the multivariate analysis, i.e., the light verbs in Mainland and Taiwan Mandarin indeed vary and the variations can be successfully differentiated.
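The K-Means step mentioned above can be illustrated with a minimal implementation of Lloyd's algorithm. This is only a sketch under stated assumptions: each annotated light verb instance is assumed to be encoded as a numeric feature vector (the encoding and the toy points below are hypothetical, not the paper's annotation scheme), and the initial centres are drawn at random from the data with a fixed seed.

```python
import random
from typing import List, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points: List[Point], k: int, iterations: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's algorithm: return a cluster label (0..k-1) for each point."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # initialise with k input points
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centre.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centres[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties out
                centres[c] = centroid(members)
    return labels


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors forming two well-separated groups.
    points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(k_means(points, k=2))
```

If the clusters found this way line up with the Mainland/Taiwan split, that agrees with a separate (e.g., multivariate) analysis in the sense the abstract describes; the cluster numbering itself is arbitrary.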


International Conference on Computational Linguistics | 2014

Annotation and Classification of Light Verbs and Light Verb Variations in Mandarin Chinese

Jingxia Lin; Hongzhi Xu; Menghan Jiang; Chu-Ren Huang

Light verbs pose a challenge in linguistics because of their syntactic and semantic versatility and their unique distribution, which differs from that of regular verbs with higher semantic content and selectional restrictions. Because of their light grammatical content, earlier natural language processing studies typically put light verbs in a stop word list and ignored them. Recently, however, the classification and identification of light verbs and light verb constructions have become a focus of study in computational linguistics, especially in the context of multi-word expressions, information retrieval, disambiguation, and parsing. Past linguistic and computational studies on light verbs have had very different foci. Linguistic studies tend to focus on the status of light verbs and their various selectional constraints, while NLP studies have treated light verbs as part of either a multi-word expression (MWE) or a construction to be identified, classified, or translated, trying to overcome the apparent poverty of their semantic content. There has been almost no work attempting to bridge these two lines of research. This paper takes up this challenge by proposing a corpus-based study that classifies and captures the syntactic-semantic differences among all light verbs. In this study, we first incorporate results from past linguistic studies to create light verb corpora annotated with syntactic-semantic features. We then adopt a statistical method for the automatic identification of light verbs based on these annotated corpora. Our results show that a language-resource-based methodology that optimally incorporates linguistic information can resolve the challenges posed by light verbs in NLP.


International Conference on Computational Linguistics | 2014

Annotate and Identify Modalities, Speech Acts and Finer-Grained Event Types in Chinese Text

Hongzhi Xu; Chu-Ren Huang

Discriminating sentences that denote modalities and speech acts from those that describe or report events is a fundamental task for accurate event processing. However, little attention has been paid to this issue, and to date no Chinese corpus annotates all types of sentences with their main function in terms of modality, speech act, or event. This paper describes a Chinese corpus with all of this information annotated. Building on the five event types usually adopted in previous studies of event classification, namely state, activity, achievement, accomplishment, and semelfactive, we further provide finer-grained categories, since each of the finer-grained event types has different semantic entailments. Differentiating them is useful for deep semantic processing and will thus benefit NLP applications such as question answering and machine translation. We also present experiments showing that the different types of sentences can be differentiated with promising performance.


Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages | 2017

Case Studies in the Automatic Characterization of Grammars from Small Wordlists

Jordan Kodner; Spencer Kaplan; Hongzhi Xu; Mitchell P. Marcus; Charles Yang

We present two novel examples of simple algorithms that characterize the grammars of low-resource languages: a tool for the characterization of vowel harmony, and a framework for unsupervised morphological segmentation that achieves state-of-the-art performance. Accurate characterization of grammars jump-starts the process of description by a trained linguist. Furthermore, morphological segmentation also provides gains in machine translation, a perennial challenge for low-resource, undocumented, and endangered languages.


Workshop on Chinese Lexical Semantics | 2015

A New Categorization Framework for Chinese Adverbs

Hongzhi Xu; Dingxu Shi; Chu-Ren Huang

Previous studies on the categorization of Chinese adverbs have not reached a conclusive result, in part because of their varying criteria. While many studies have focused on subcategories of adverbs, the boundaries between the subcategories themselves are not clear. As a result, there is still no clear picture of where Chinese adverbs stand in the whole field of Chinese semantics. In addition, not enough features have been explored to derive highly cohesive categories. In this paper, we present a new categorization framework for Chinese adverbs. First, four coarse-grained categories are proposed according to the semantic structure of sentences, covering the proposition, modality, aspect, and other meaning components. Then, several semantic and syntactic features are used to further divide the four categories into more than ninety finer-grained subcategories. Based on the new framework, we find that adverbs in the same category function in similar ways, both semantically and syntactically.


Intelligent Systems Design and Applications | 2007

A Novel Term Weighting Scheme for Automated Text Categorization

Hongzhi Xu; Chunping Li


Pacific Asia Conference on Language, Information and Computation | 2010

Expanding Chinese Sentiment Dictionaries from Large Scale Unlabeled Corpus

Hongzhi Xu; Kai Zhao; Likun Qiu; Changjian Hu


PACLIC | 2015

Sentiment Analyzer with Rich Features for Ironic and Sarcastic Tweets

Piyoros Tungthamthiti; Enrico Santus; Hongzhi Xu; Chu-Ren Huang; Kiyoaki Shirai


Archive | 2010

Method and system for extracting entity relationship by using structural information

Changjian Hu; Guoyang Shen; Hongzhi Xu

Collaboration

Top Co-Authors

Chu-Ren Huang

Hong Kong Polytechnic University

Menghan Jiang

Hong Kong Polytechnic University

Jingxia Lin

Nanyang Technological University

Dingxu Shi

Hong Kong Polytechnic University

Enrico Santus

Japan Advanced Institute of Science and Technology

Charles Yang

University of Pennsylvania

Qin Lu

Hong Kong Polytechnic University
