Publication


Featured research published by Alex Chengyu Fang.


BioMed Research International | 2014

A Novel Feature Selection Strategy for Enhanced Biomedical Event Extraction Using the Turku System

Jingbo Xia; Alex Chengyu Fang; Xing Zhang

Feature selection is of paramount importance for text-mining classifiers with high-dimensional features. The Turku Event Extraction System (TEES), the best-performing tool in the GENIA BioNLP 2009/2011 shared tasks, relies heavily on high-dimensional features. This paper describes research that implements an accumulated effect evaluation (AEE) algorithm with a greedy search strategy and uses it to analyse the contribution of every feature class in TEES, with a view to identifying important features and modifying the feature set accordingly. With the updated feature set, the resulting system achieves enhanced performance: the F-score for Task 1 rises from 51.21% to 53.27% under strict evaluation criteria, and reaches 57.24% under the approximate span and recursive criterion.
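As an illustration of the greedy search idea mentioned in this abstract, the following is a minimal sketch of generic greedy forward selection over feature classes. It is not the paper's AEE algorithm, whose details are not given here. The function name `greedy_forward_selection`, the feature-class names, and the toy scoring function are all hypothetical; in practice the score would come from training and evaluating a classifier on each candidate feature set.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def greedy_forward_selection(
    classes: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], float]:
    """Add one feature class at a time, keeping the one that helps most.

    Stops as soon as no remaining class improves the score.
    """
    selected: List[str] = []
    best = score(frozenset())
    remaining = list(classes)
    while remaining:
        # Score every candidate extension of the current selection.
        candidates = [(score(frozenset(selected + [c])), c) for c in remaining]
        top_score, top_class = max(candidates, key=lambda pair: pair[0])
        if top_score <= best:
            break  # no candidate improves the score
        selected.append(top_class)
        remaining.remove(top_class)
        best = top_score
    return selected, best


# Hypothetical contributions per feature class (assumed additive for the toy).
TOY_WEIGHTS = {"token": 30, "dependency": 15, "sentence": -2}


def toy_score(subset: FrozenSet[str]) -> float:
    """Stand-in for an evaluation run; sums the assumed contributions."""
    return sum(TOY_WEIGHTS[c] for c in subset)


selected, best = greedy_forward_selection(list(TOY_WEIGHTS), toy_score)
print(selected, best)
```

In this toy setting the class with a negative contribution is never added, which mirrors the idea of pruning feature classes that do not help.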


International Conference Natural Language Processing | 2010

Term recognition using Conditional Random Fields

Xing Zhang; Yan Song; Alex Chengyu Fang

This study constructs a machine learning framework based on Conditional Random Fields (CRF) that exploits syntactic information to recognize biomedical terms. The features used in this CRF framework capture syntactic information at different levels, including parent nodes, syntactic functions, syntactic paths and term ratios. A series of experiments was conducted to study the effects of training size, general term recognition and novel term recognition. The experimental results show that features such as syntactic paths and term ratios can achieve good precision in term recognition for both general and novel terms. However, the recall of novel term recognition is still unsatisfactory, which calls for more effective features. Because this research studies in depth the use of some unique syntactic features, it is innovative with respect to constructing a machine-learning-based term recognition system.


IEEE International Conference on Semantic Computing | 2011

Relating the Semantics of Dialogue Acts to Linguistic Properties: A Machine Learning Perspective through Lexical Cues

Alex Chengyu Fang; Harry Bunt; Jing Cao; Xiaoyue Liu

This paper describes a corpus-based investigation of dialogue acts. In particular, it attempts to answer questions about the empirical distribution of dialogue acts and to what extent dialogue acts can be automatically predicted from their lexical features. The Switchboard Dialogue Act Corpus is adopted and the SWBD-DAMSL tags used for automatic prediction. We show that 60-70% of the dialogue acts can be predicted from lexical features alone depending on different levels of granularity. We also present a mapping from SWBD-DAMSL tags to the tags of the new ISO standard for dialogue act annotation, as part of an ongoing investigation into the relationship between the structure and granularity of the tag set and classification accuracy. The paper concludes with discussions and suggestions for future work.


BioMed Research International | 2013

Gene Prioritization of Resistant Rice Gene against Xanthomas oryzae pv. oryzae by Using Text Mining Technologies

Jingbo Xia; Xing Zhang; Daojun Yuan; Lingling Chen; Jonathan J. Webster; Alex Chengyu Fang

To effectively assess the likelihood that an unknown rice protein is resistant to Xanthomonas oryzae pv. oryzae, a hybrid strategy is proposed that enhances gene prioritization by combining text mining technologies with a sequence-based approach. The text mining technique of term frequency-inverse document frequency (TF-IDF) is used to measure the importance of distinguishing terms that reflect biomedical activity in rice, before candidate genes are screened and vital terms are produced. Afterwards, a built-in classifier based on the chaos game representation algorithm is used to sieve out the best possible candidate gene. Our experimental results show that the combination of these two methods achieves enhanced gene prioritization.
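For readers unfamiliar with term frequency-inverse document frequency, here is a minimal sketch of one common formulation: relative term frequency multiplied by the natural log of (number of documents / number of documents containing the term). The abstract does not say which variant the paper uses, so this weighting, the function name `tf_idf`, and the toy documents are assumptions for illustration only.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} mapping per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenized abstracts, not data from the paper.
toy_docs = [
    ["rice", "blight", "resistance"],
    ["rice", "yield"],
    ["rice", "blight"],
]
toy_weights = tf_idf(toy_docs)
print(toy_weights)
```

A term that occurs in every document, such as "rice" in the toy corpus, receives a weight of zero, while rarer terms are weighted up; this is what makes the measure useful for picking out distinguishing terms.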


IEEE International Conference on Semantic Computing | 2011

Multilingual Verification of the Annotation Scheme ISO-Space

Ki Yong Lee; Alex Chengyu Fang; James Pustejovsky

ISO-Space ([1], [2]) is an emerging annotation scheme for spatial information in language. The purpose of this paper is to verify its descriptive adequacy and semantic transparency for multilingual application. As a starting point, the present verification task works on three languages, namely English, Korean and Chinese. These three are chosen because they are typologically different from one another: English represents an inflectional analytic language, Korean an agglutinative language, and Chinese an isolating language. Such multilingual verification is required to justify ISO-Space as an international standard by showing its applicability to languages other than English.


Archive | 2011

Age tagging and word frequency for learners’ dictionaries

Hanhong Li; Alex Chengyu Fang

In contemporary lexicography, particularly in learners’ dictionaries, word frequency information from large corpora has been used for entry selection, sense ranking, and collocation identification as well as for selecting defining vocabulary. However, age information in linguistic corpora has not been adequately highlighted or exploited. Early experiments have demonstrated that word retrieval from long-term memory is influenced much more by age of acquisition than by word frequency. EFL learners therefore need to know not only which words are frequent but also which words native speakers tend to use at different ages. Core vocabulary contains not simply those words with high frequency but also those that are evenly distributed across age groups. Learners’ dictionaries with this kind of core vocabulary will be of much help for English learning and teaching as well as for research on core vocabulary. Our research makes use of the age group information in the British National Corpus XML Edition (BNC XML 2007). It turns out that higher lexical coverage can be achieved when core vocabulary is selected by the combined parameters of a word’s dispersion index and its distributed frequency across age groups rather than by raw frequency alone. Moreover, our study shows that the age group under 15 relies more on core vocabulary than adults do, owing to its fundamental role in language learning. For the age groups over 15, core vocabulary occupies a stable proportion of vocabulary size regardless of age. Another interesting finding is that each age group tends to acquire more of the core words selected on a frequency-age basis than of those selected on a raw-frequency basis.


Archive | 2015

Text genres and registers: the computation of linguistic features

Alex Chengyu Fang; Jing Cao

Introduction; Language Resources; Corpus Annotation and Usable Linguistic Features; Etymological Features across Genres and Registers; Part-of-Speech Tags and ICE Text Classification; Verbs and Text Classification; Adjectives and Text Categories; Adverbial Clauses across Text Categories and Registers; Coordination across Modes, Genres and Registers; Semantic Features and Authorship Attribution; Pragmatics and Dialogue Acts; The Future; Bibliography; Appendix; Index.


Language Resources and Evaluation | 2012

Creating an interoperable language resource for interoperable linguistic studies

Alex Chengyu Fang

There are two different levels of interoperability for language resources: operational interoperability and conceptual interoperability. The former refers to the standardization of the formal aspects of language resources so that different resources can work together. The latter refers to the standardization of the notional representation of the semantic content of the analysis. This article addresses both issues but focuses on the latter through a description of the annotation and analysis of the International Corpus of English, which is a corpus for the study of English as a global language. The project is parameterised by component, regional sub-corpora and a set of pre-defined textual categories. The one-million-word British component has been constructed, grammatically tagged, and syntactically parsed. This article is first of all a description of steps taken to ensure conformity within the project. These include corpus design, part-of-speech tagging, and syntactic parsing. The article will then present a study that examines the use of adverbial clauses across speech and writing, illustrating the imminent necessity for interoperable analysis of linguistic data.


International Conference on Computational Linguistics | 2006

Evaluating the performance of the survey parser with the NIST scheme

Alex Chengyu Fang

Different metrics have been proposed for the estimation of how good a parser-produced syntactic tree is when judged by a correct tree from the treebank. The emphasis of measurement has been on the number of correct constituents in terms of constituent labels and bracketing accuracy. This article proposes the use of the NIST scheme as a better alternative for the evaluation of parser output in terms of correct match, substitution, deletion, and insertion. It describes an experiment to measure the performance of the Survey Parser that was used to complete the syntactic annotation of the International Corpus of English. This article will finally report empirical scores for the performance of the parser and outline some future research.
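The four outcome types named in this abstract (correct match, substitution, deletion, insertion) are the categories produced by a minimum-edit-distance alignment between a reference sequence and a system output. The sketch below shows that general technique on two label sequences. It is an assumption that the evaluation can be pictured this way: the abstract does not describe how trees are linearised or scored, and the function name `align_counts` and the example labels are hypothetical.

```python
from typing import Dict, Sequence


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Dict[str, int]:
    """Count matches, substitutions, deletions and insertions in one
    minimum-edit-distance alignment of hyp against ref."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = fewest edits needed to turn ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diagonal, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the bottom-right corner to recover the operations.
    counts = {"match": 0, "substitution": 0, "deletion": 0, "insertion": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            kind = "match" if ref[i - 1] == hyp[j - 1] else "substitution"
            counts[kind] += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts["deletion"] += 1  # reference item missing from output
            i -= 1
        else:
            counts["insertion"] += 1  # extra item in output
            j -= 1
    return counts


# Hypothetical constituent labels from a treebank tree and a parser tree.
reference = ["NP", "VP", "PP", "AJP"]
parser_output = ["NP", "VP", "AVP", "AJP", "NP"]
result = align_counts(reference, parser_output)
print(result)
```

When several alignments share the same minimum cost, this sketch prefers the diagonal step, so the split between substitutions and insertion/deletion pairs can differ from other tools that break ties differently.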


International Conference Natural Language Processing | 2006

A corpus-based empirical account of adverbial clauses across speech and writing in contemporary British English

Alex Chengyu Fang

Adverbial subordinators are an important index of different types of discourse and have been used, for example, in automatic text classification. This article reports an investigation of the use of adverbial clauses based on a corpus of contemporary British English. It demonstrates on the basis of empirical evidence that it is simply a misconceived notion that adverbial clauses are typically associated with informal, unplanned types of discourse and hence spoken English. The investigation initially examined samples from both spoken and written English, followed by a contrastive analysis of spontaneous and prepared speech, to be finally confirmed by evidence from a further experiment based on timed and untimed university essays. The three sets of experiments consistently produced empirical evidence which irrefutably suggests that, contrary to claims by previous studies, the proportion of adverbial clauses is consistently much lower in speech than in writing and that adverbial clauses are a significant characteristic of planned, elaborated discourse.

Collaboration


Dive into Alex Chengyu Fang's collaborations.

Top Co-Authors

Jing Cao (City University of Hong Kong)
Xing Zhang (City University of Hong Kong)
Jonathan J. Webster (City University of Hong Kong)
Xiaoyue Liu (City University of Hong Kong)
Yan Song (City University of Hong Kong)
Ying Liu (City University of Hong Kong)